I’ve been seeing a lot of news about TurboQuant’s ability to compress LLMs. It would be great if the Starlight Fast2 local model (and future models) could be made to run well on 12GB of VRAM, since most people aren’t using cards with 16GB or more. Please consider it. Good idea? Any support? A rough back-of-envelope sketch of why quantization helps is below.
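For context, quantization shrinks the weight footprint roughly in proportion to bits per parameter. Here is a minimal sketch of the arithmetic; the parameter counts and the overhead factor are illustrative guesses, not Starlight Fast2’s actual size or memory behavior:

```python
# Back-of-envelope VRAM estimate for model weights at different precisions.
# The parameter counts and overhead factor are hypothetical examples.

def weight_vram_gb(num_params: float, bits_per_param: int, overhead: float = 1.2) -> float:
    """Estimate VRAM (GB) for weights alone, padded by a rough overhead
    factor for activations / runtime buffers (the 1.2 is a guess)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total * overhead / 1e9

for params in (7e9, 13e9):       # hypothetical model sizes
    for bits in (16, 8, 4):      # fp16, int8, int4
        print(f"{params / 1e9:.0f}B params @ {bits}-bit: "
              f"~{weight_vram_gb(params, bits):.1f} GB")
```

On those made-up numbers, a 7B-class model drops from roughly 17 GB at fp16 to around 4 GB at 4-bit, which is the kind of margin that would make a 12GB card comfortable.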
I don’t know if you saw Topaz’s post in March, but it sounds like they are already working on similar tech. It’s branded NeuroStream: “Topaz Labs Introduces Topaz NeuroStream — Breakthrough Tech for Running Large AI Models Locally.”
I had not. Nice to know it’s being used for the Wonder2 model. If it’s on the roadmap for integration into Starlight (or a new Wonder model for video), let me know. Thanks!