Thanks for adding your script to the post. I wouldn't worry too much about it being specific to your setup; no one expects you to generalize it for everyone. It's a great starting point for someone else to customize for their own system. The only thing you should make sure of is that it doesn't leak any of your sensitive information.
At a certain point, the model no longer fits in GPU memory and layers will be pushed to system RAM, leading to incredibly slow inference. You don't want to wait hours for the model to generate a single response.
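If it helps, here is a rough back-of-the-envelope sketch for guessing where that point is. It assumes every layer is the same size and reserves a fixed chunk of GPU memory for context and buffers. The function name, the 1.5 GiB overhead, and the example numbers are all made up for illustration, so treat the result as a ballpark only.

```python
def layers_that_fit(vram_gib, n_layers, model_gib, overhead_gib=1.5):
    """Estimate how many layers fit on the GPU.

    Assumptions (not from any specific tool): layers are equally sized,
    and overhead_gib is a guessed reserve for context and buffers.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - overhead_gib, 0.0)
    # Never report more layers than the model actually has.
    return min(n_layers, int(usable_gib // per_layer_gib))


# Hypothetical example: a 20 GiB model with 40 layers on a 12 GiB card.
gpu_layers = layers_that_fit(vram_gib=12, n_layers=40, model_gib=20)
print(f"{gpu_layers} of 40 layers on the GPU, {40 - gpu_layers} pushed to RAM")
```

The more layers end up in the second number, the slower generation is likely to be, so a smaller model or a smaller quantization may be the better choice if most layers don't fit.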
He has EA in his name ffs, how do people fall for it smh