The failure rate of this version is a bit high. Could you add an automatic retry parameter to specify how many times it should rerun automatically after a failure?
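Something along these lines is what I have in mind; as a sketch of an external workaround, where the render command and flags are placeholders, not real TVAI CLI syntax:

# Conceptual sketch of an automatic retry: rerun the export up to max_retries
# times after a failure. "my_render_command" is a placeholder, not real TVAI CLI syntax.
import subprocess
import sys

def run_with_retries(render_cmd, max_retries=3):
    for attempt in range(1, max_retries + 1):
        result = subprocess.run(render_cmd)
        if result.returncode == 0:
            return True                       # render finished cleanly
        print(f"Attempt {attempt} failed (exit code {result.returncode}), retrying...")
    return False                              # still failing after max_retries attempts

if not run_with_retries(["my_render_command", "--input", "clip.mp4"]):
    sys.exit(1)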
I can understand Topaz's move; what makes me shake my head again is the poor communication, the pricing confusion, and above all the fact that they released a buggy Studio version.
Dear Topaz team, why not test more thoroughly before a release? Just wait a bit instead of releasing it right away.
Communication!
Had there been clarity in the initial email (that some of us received), these forums would have a small fraction of the posts wondering how much it is going to cost us.
There is still time for an official, somewhat simple ‘sticky’ post to provide this information. It’s mind-blowing that the CEO of the company has resorted to making individual forum replies to users asking what their subscription price is going to be.
I look forward to a time when this community settles down and we can concentrate on how to use the apps, report problems and suggest improvements.
Experiencing the same installation loop.
So far I have rendered some old footage with the new Starlight Sharp.
I love the results. And I love the significantly higher render speed of this optimized Starlight version.
The problem is that the preview during rendering mostly causes Topaz Studio Video to crash.
After reopening the app, it resumes rendering from the last render point, but that does not result in a finished SLS video. In the end there are only video fragments (mostly without audio) in the output folder.
So, for now I no longer use the preview function, in order to make sure I get a fully rendered SLS video file.
So the app needs some adjustments.
Which GPU do you have?
I need to upgrade my 4070, and I'm currently looking at the 5070 Ti as the most affordable 16 GB GPU.
I am going to try to uninstall and reinstall the new 1.0.0 because it totally slows down my computer now. I have the same settings as before. The VRAM setting is still set to 70%, just like I have on version 7.14 or whatever that last Video AI version is.
I have this problem as well: I cannot close the program without it getting stuck, and then I have to redownload the Starlight model every time I restart…
Is anyone experiencing the “this model is only available in the cloud” after downloading Starlight?
I can't seem to get past this error no matter how many times it redownloads.
Shut down Topaz Video, go into your Documents folder, open the Topaz Video folder and delete the project folders, and then restart Topaz Video. Usually that resolves the issue.
I noticed there's some kind of issue when you have an older project file: for some reason the new session gets corrupted by it, and deleting the project folders helps.
I do wish there were an option to delete the project file on program close, since I've noticed the project folder gets corrupted, especially if the program crashed or had an issue encoding. In that case you definitely want to delete the project folder and restart the Topaz Video app.
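For what it's worth, here is a minimal sketch of that cleanup as a script, assuming the project folders live under Documents\Topaz Video as described above (adjust the path if yours differs, and check what is in there before running it):

# Minimal sketch: delete the project folders under Documents\Topaz Video while
# the app is closed. The location is taken from the advice above; adjust if needed.
import pathlib
import shutil

projects_dir = pathlib.Path.home() / "Documents" / "Topaz Video"  # assumed location

for folder in projects_dir.iterdir():
    if folder.is_dir():
        shutil.rmtree(folder)   # remove the (possibly corrupted) project folder
        print(f"Deleted {folder}")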
Despite the weak yen, I've been paying through the nose to support TVAI with my business because I see its potential for legacy media preservation and I wanted to support the company developing the technology, but I will not support subscription-only software under any circumstances. Since this is the path your business has chosen to go down, it has lost me as a paying customer. Reconsider, and my wallet and I will be back, assuming a competitor doesn't win me over in the meantime. Otherwise, so long and... well, I was going to say good luck, but frankly, I don't care anymore.
A suggestion for Starlight local
It would be great to have more technical information and control over how the software is using memory (VRAM/DRAM).
The difference in render time between a run that just fits in VRAM and a run that is a few hundred MB over is (as we all know) massive.
At a minimum, it would be very helpful to understand how TVAI is using memory with its default settings. If you can't get to any new user controls right now, at least provide this. Clever workarounds could be developed in the meantime.
In my case, at the moment I'm running a job on a 5090 with 10 GB of unused VRAM.
This machine has another GPU that is used for the display, so the 5090 (and its VRAM) is 100% dedicated to TVAI (although my memory slider is at 90%).
I estimated this job to take 8.7 hours, but, once it began, the estimate was 1 day 4 hours.
This is because the model is falling back to DRAM. As I watch the DRAM usage fluctuate, it looks like the model could have fit into VRAM.
It’s not entirely clear how the user could manage this situation since there is no official guidance on this.
All we have right now is a slider 0-100%.
Do we put that at 100%?
It seems the "unofficial" guidance, arrived at through crashes, tech-support help and user testing feedback, is to keep it well off 100%.
It would be nice to have some sophisticated insight on this topic given the vast amounts of time and money wrapped up in that one setting.
Some things I would like to know are:
- Could a user force a smaller model version that might produce 95% of the ultimate result, but fits in VRAM and finishes 3x faster? Could the user be given another slider that indicates how strongly they prefer speed over quality?
- Could a user actually put the slider at 100% if the GPU has no other work except TVAI processing?
- Conversely, in the situation I described above with 10 GB of free VRAM and the job still spilling to DRAM, would I have been better off setting the slider to 60-70% to trick the software into using less VRAM? Some users say yes…
- Should Resizable BAR (Base Address Register) be enabled or not? I have heard that for some products/games, disabling it can heavily bias toward staying in VRAM…
- In the past, I thought the NVIDIA CUDA memory fallback policy was totally black or white: if I set it to "no fallback" and then max out VRAM, the job would 100% fail. But would it? What is confusing is that if I wanted to run this job on a 24 GB 4090 with memory fallback off, it seems the SLM hardware capability test would pass, but then what? Would the job I'm running now somehow work entirely in VRAM?
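Until there is official guidance, one thing we can at least do is watch what the card is doing while a job runs, to see whether it is approaching the VRAM ceiling and therefore likely spilling into system memory. Here is a minimal sketch in Python, assuming the nvidia-ml-py (pynvml) package is installed; it only observes usage and does not change how TVAI allocates anything:

# Poll VRAM usage once a second during a render to spot when the job
# approaches the limit and is likely falling back to system memory.
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # 0 = first GPU; pick your render card
try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"VRAM: {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GB")
        time.sleep(1.0)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()

Watching that number next to the DRAM usage graph makes it much easier to tell whether a run is spilling even when there appears to be VRAM to spare.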
There’s more to this topic, but you get the picture:
It would be great to get some insight and control over the memory.
I think for power users this matters a lot.
While power users may not (yet) be the majority… they may have more at stake… a job on the line along with more time and money invested. Beyond that though, Topaz wants this group especially to be telling the world that this software is slick PLUS tunable and sophisticated. And when a product really works well, this group has a lot more potential to spend money (for longer) than the average user.
I’m crossing my fingers.
I bought Video AI a year ago (I have read there is such a thing as a perpetual license?). Anyway, I could not keep using it until I paid $299 for another year. But in my Products area last year's purchase says "owned"; this year's "purchase" isn't there, although it shows up in Orders. I don't know what is going on. With version 1.0.0, do I have to pay more for that, or do I just download it? Thanks.
Maybe there's an issue with your account transition, as local Starlight is normally locked behind the "Pro" paywall except for "founders".
Contact support.
Instead of limiting to 90%, it might be better to start Starlight via the command line with just a 10 GB VRAM limit in your case. I don't have the CLI commands right now, but it's definitely possible to run Starlight from the command line via Runner.exe, and a GPU VRAM size value can be added at the end of the command string; you could set 10 GB there instead of TVAI's 32 GB GPU readout. Maybe worth a try.
I tested the Sharp model. When the model runs, it creates a directory called .triton_cache, where it records, in JSON files, all the analyses and the model configuration it will run. I analyzed it and came to some conclusions.
TVAI's Sharp model uses the Triton implementation (GitHub - triton-lang/triton: Development repository for the Triton language and compiler).
Triton is a programming language and compiler created by OpenAI. Programs like Topaz Video AI use Triton in the background (via PyTorch) to generate super-fast code specifically optimized for your NVIDIA graphics card.
Triton actually decides how much VRAM and other resources are used. It looks at your hardware, determines what and how much to use, creates a JSON file, and SLS runs on the system accordingly. You can think of it as a kind of runtime prompt.
Even if you set the VRAM slider to 100%, Triton decides how much to use; in my case it only uses 4-10 GB of my 16 GB card.
I made changes to the JSON file, but it didn't help much. In particular, debug needs to be turned off (set to false); leaving it on is a waste of time.
SLM is currently 3x faster than SLS. However, I analyzed the logs with ChatGPT, and it says the hardware is being used at less than half capacity, meaning it's not running as efficiently as it could because the tuning optimization isn't complete. The problem is that there's no accessible tuning file in SLM; in fact there is one, but it's encrypted and can't be manipulated. Most likely, the entire tuning section is in the model.part4.bin.enc file in the models directory, but because it's an encrypted file we can't change the settings.
A new version of Triton has actually been released (Triton 3.4.0, Jul 31), but the version SLM uses is 3.3.1. Perhaps they'll update to the new version in the next release.
This is an example of my JSON file:
{
  "hash": "cbd04fe993f8775fdd18202e8675c36562bc3036fd6e8fe26a34877da466c0b3",
  "target": {
    "backend": "cuda",
    "arch": 120,
    "warp_size": 32
  },
  "num_warps": 8,
  "num_ctas": 1,
  "num_stages": 1,
  "num_buffers_warp_spec": 0,
  "num_consumer_groups": 0,
  "reg_dec_producer": 0,
  "reg_inc_consumer": 0,
  "maxnreg": null,
  "cluster_dims": [
    1,
    1,
    1
  ],
  "ptx_version": null,
  "enable_fp_fusion": true,
  "launch_cooperative_grid": false,
  "supported_fp8_dtypes": [
    "fp8e4b15",
    "fp8e4nv",
    "fp8e5"
  ],
  "deprecated_fp8_dtypes": [
    "fp8e4b15"
  ],
  "default_dot_input_precision": "tf32",
  "allowed_dot_input_precisions": [
    "tf32",
    "tf32x3",
    "ieee"
  ],
  "max_num_imprecise_acc_default": 0,
  "extern_libs": [
    [
      "libdevice",
      "C:\\ProgramData\\Topaz Labs LLC\\Topaz Video\\models\\Lib\\site-packages\\triton\\backends\\nvidia\\lib\\libdevice.10.bc"
    ]
  ],
  "debug": false,
  "backend_name": "cuda",
  "sanitize_overflow": false,
  "arch": "sm120",
  "triton_version": "3.3.1",
  "shared": 3072,
  "tmem_size": 0,
  "global_scratch_size": 0,
  "global_scratch_align": 1,
  "name": "triton_red_fused__to_copy_native_layer_norm_4"
}
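If anyone wants to pull the same fields out of their own cache, here is a minimal sketch that walks a .triton_cache directory and prints the tuning-related keys from each kernel's JSON metadata. The cache path below is a placeholder; point it at wherever SLS created the cache on your machine.

# Scan a .triton_cache directory and print the tuning-related fields from each
# kernel's JSON metadata (the same keys shown in the example above).
import json
import pathlib

cache_dir = pathlib.Path(r".triton_cache")  # placeholder; set to your actual cache path

for meta_file in cache_dir.rglob("*.json"):
    data = json.loads(meta_file.read_text())
    print(meta_file.name,
          "num_warps =", data.get("num_warps"),
          "num_ctas =", data.get("num_ctas"),
          "num_stages =", data.get("num_stages"),
          "precision =", data.get("default_dot_input_precision"))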
ChatGPT's recommended best practices for speedup (a generic sketch of how some of these knobs look in Triton code follows after the list):
- Lower Precision
Currently: tf32 (TensorFloat-32).
My suggestion:
If the application (Topaz / Triton kernel) supports it → choose FP16 or FP8.
RTX 50xx series uses FP8 Tensor Cores very powerfully → speed can increase by 2–3 times.
There may be a small difference in quality, but it is not very noticeable (especially in video).
- Increase num_ctas
Currently: num_ctas=1 → only 1 block works.
My suggestion: num_ctas=2 or 4 → more blocks load the GPU at the same time.
This improves GPU scheduling and pipeline overlapping → speed increases.
- Increase num_stages
Currently: num_stages=1.
My suggestion: num_stages=2 or 3.
This allows for overlapped load + compute (i.e., computing continues while data is being loaded) → latency decreases, throughput increases.
- Increase the tile size
If you are currently processing a lot of tiles, the kernel call for each tile introduces overhead.
Use a larger tile (if VRAM allows) → less overhead → fps increases.
- Optimizing the batch size
If the batch is too small → GPU compute is not fully loaded.
If the batch is too large (for example, you said 101) → kernel overhead increases.
Suggestion: a medium batch size (32–64) → balanced VRAM usage and faster throughput.
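For anyone curious what these knobs actually are: num_warps, num_stages and the tile (block) size are per-kernel launch parameters in Triton itself. Below is a minimal, generic Triton kernel showing how they are normally exposed through triton.autotune. This is only an illustration of the mechanism, not TVAI's (encrypted) tuning, and it assumes the triton package and an NVIDIA GPU are available.

# Generic example of Triton's tuning knobs: autotune benchmarks each Config on
# first launch and caches the winner (the kind of metadata seen in .triton_cache).
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK": 1024}, num_warps=4, num_stages=1),
        triton.Config({"BLOCK": 2048}, num_warps=8, num_stages=2),  # bigger tile, overlapped stages
    ],
    key=["n_elements"],
)
@triton.jit
def scale_kernel(x_ptr, y_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(y_ptr + offsets, x * 2.0, mask=mask)

x = torch.randn(1 << 20, device="cuda")
y = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK"]),)
scale_kernel[grid](x, y, x.numel())

In TVAI's case those values are presumably chosen by the pipeline (and by whatever is inside the encrypted tuning file), which would explain why editing the cached JSON alone doesn't help much.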
For the past six months I have had a second PC (running in the background) with an RTX 5090 dedicated solely to Topaz Video rendering.
Topaz's development policy is completely inconsistent. It has been going on like this for years, as far as I am concerned, with the impossibility of properly testing before buying a license. My PC has a recent Radeon graphics card, but it would need an NVIDIA. My Intel Mac, although quite recent, runs it badly, and now I learn that the Studio version requires a Mac with an Apple chip. Not everyone can change their computer or graphics card at the whim of Topaz and its multiple interface changes, where everything has to be relearned, not to mention the new bugs rarely fixed by your multiple updates that have been dragging on for so many years. Too much is too much, so I will not buy a license because of this. You don't like Apple and it shows. Farewell.
Yes, seems to be a good idea in your case with a fresh/new install of Studio/TVAI.
In my case the software upgrade didn't cause any (negative) issues.
I also mostly keep the old VRAM settings in order to keep the room temperature down.
This actually depends on your budget, but my suggestion is to buy no less than the 5070 TI. That’s what I use. It’s the best model for the price.
5090 > 5080 > 5070 TI