The problem is you are trying to use a system for something it was never intended for. Persistent notifications were only ever intended for long-running background services.
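For context, this is roughly what that intended use looks like: a minimal Kotlin sketch of a foreground service that owns a persistent notification (assumes minSdk 26; `SyncService` and the channel id are placeholder names, not anything from the article):

```kotlin
import android.app.Notification
import android.app.NotificationChannel
import android.app.NotificationManager
import android.app.Service
import android.content.Intent
import android.os.IBinder

class SyncService : Service() {

    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        val channelId = "sync_channel"
        // Notification channels are mandatory from API 26 onward.
        val channel = NotificationChannel(
            channelId, "Background sync", NotificationManager.IMPORTANCE_LOW
        )
        getSystemService(NotificationManager::class.java).createNotificationChannel(channel)

        val notification: Notification = Notification.Builder(this, channelId)
            .setContentTitle("Syncing…")
            .setSmallIcon(android.R.drawable.stat_notify_sync)
            .setOngoing(true) // the "persistent" part: not swipe-dismissable
            .build()

        // Tying the service's lifetime to a visible, ongoing notification is
        // exactly the long-running-background-work contract described above.
        startForeground(1, notification)
        return START_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null
}
```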
You're just moving the goalposts. I ran an LLM on-device in an Android app I built a month ago. Does that make me the first to do it? No. They are the first to production with an actual product.
LLMs and diffusion models have been in apps for months.
Show me a single example of an app that has an LLM on device. Find a single one that isn't making an API call to a powerful server running the LLM. Show me the app update that adds a multi-gigabyte LLM onto the device. I'll wait...
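To be concrete about what "an LLM on device" would even look like in an app, here's a minimal Kotlin sketch. The `NativeLlm` wrapper, its methods, and the `llm_runtime` library name are hypothetical stand-ins for a llama.cpp-style native runtime, not a real published API:

```kotlin
import java.io.File

// Hypothetical JNI wrapper around a llama.cpp-style native inference runtime.
class NativeLlm(modelPath: String) : AutoCloseable {
    private val handle: Long = nativeLoad(modelPath)

    fun complete(prompt: String, maxTokens: Int): String =
        nativeComplete(handle, prompt, maxTokens)

    override fun close() = nativeFree(handle)

    private external fun nativeLoad(path: String): Long
    private external fun nativeComplete(handle: Long, prompt: String, maxTokens: Int): String
    private external fun nativeFree(handle: Long)

    companion object { init { System.loadLibrary("llm_runtime") } }
}

fun runOnDevice(filesDir: File): String {
    // The multi-gigabyte weights are the sticking point: far too big for an
    // APK, so they'd be fetched once (e.g. via Play Asset Delivery or an
    // in-app download) into app storage before inference can run.
    val model = File(filesDir, "model-q4.gguf")
    require(model.exists()) { "model weights not downloaded yet" }
    NativeLlm(model.absolutePath).use { llm ->
        return llm.complete("Summarize: on-device inference is…", maxTokens = 64)
    }
}
```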
Feel free to not respond when you realize you are wrong and you have no clue what everyone else is talking about.
At a glance, I was confused (and a bit angry) about why this would only be for the Pixel 8 Pro and not the standard Pixel 8, considering they both have the same Tensor G3.
However, from my own testing, it seems very likely that the full 12 GB of RAM the Pro has (vs. the 8 GB in the Pixel 8) is needed for some of these tasks, like summarization.
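A rough sketch of the memory math (the model size, quantization, and overhead figures below are my assumptions, not published numbers):

```kotlin
// Back-of-the-envelope memory math: why 12 GB vs 8 GB can decide
// whether an on-device summarization model fits at all.
fun main() {
    val params = 3.25e9      // assumed ~3B-parameter model
    val bitsPerWeight = 4    // assumed 4-bit quantization
    val weightsGb = params * bitsPerWeight / 8 / 1e9
    val kvCacheGb = 0.5      // rough KV-cache + activation budget
    val osAndAppsGb = 5.0    // Android + running apps easily claim this much
    val totalGb = weightsGb + kvCacheGb + osAndAppsGb
    println("model: %.1f GB, total: %.1f GB".format(weightsGb, totalGb))
    // ~7 GB total leaves headroom on a 12 GB device, almost none on 8 GB.
}
```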
Gemini Nano now powers on-device generative AI features for Pixel 8 Pro
Technically autocomplete can be considered Gen AI, but it obviously lacks the creativity that we all associate with Gen AI today. You don't need a model that is generally useful to do autocomplete, as the sketch below shows.
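To illustrate: a toy bigram table is enough for next-word autocomplete, no generative model required. A minimal Kotlin sketch (`buildBigrams` and `suggest` are made-up names for this example):

```kotlin
// Toy bigram autocomplete: just a frequency lookup table, no neural net.
fun buildBigrams(corpus: String): Map<String, Map<String, Int>> {
    val words = corpus.lowercase().split(Regex("\\W+")).filter { it.isNotEmpty() }
    val counts = mutableMapOf<String, MutableMap<String, Int>>()
    for ((prev, next) in words.zipWithNext()) {
        val followers = counts.getOrPut(prev) { mutableMapOf() }
        followers[next] = (followers[next] ?: 0) + 1
    }
    return counts
}

// Suggest the most frequent follower of the given word, if any.
fun suggest(bigrams: Map<String, Map<String, Int>>, word: String): String? =
    bigrams[word.lowercase()]?.maxByOrNull { it.value }?.key

fun main() {
    val bigrams = buildBigrams("see you soon. see you later. see you soon.")
    println(suggest(bigrams, "you")) // "soon", the most frequent follower
}
```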
The point is that it didn't take a generally useful Gen AI model to do autocomplete before, but Google is now shipping features (beyond autocomplete) that use such a model. Gen AI on-device is novel.
No, that might be accurate for what they are talking about. The absolute smallest generative AI models (that are still generally useful) are starting to shrink, but they're still several GB in size. Doing this on-device is actually new.