Users with AMD GPUs, please let us know the average performance improvement you are noticing between 3.0.x and 3.1.x of TVAI. Please include the following information in your reply:
GPU utilization (using GPU-Z or a similar app), CPU, RAM, GPU, VRAM, input resolution, model used, output resolution, and fps or slow motion amount.
I have to look at the logs again. What I meant was that it runs at 36 fps for a second or two right after processing starts, and then it collapses.
But I may have been mistaken, hence the verification.
The performance on my AMD GPU is so slow that I rarely test it, but I might give it a try again. Here are my specs: AMD Ryzen 7 5800U (8 cores, 16 threads), 16 GB 4.1 GHz RAM, AMD RX Vega 8 (5000) 2 GHz, 2 GB VRAM.

Model | V2.6.4 (sec/frame) | RAM (MB) | VRAM (MB) | CPU-Load | GPU-Load |
---|---|---|---|---|---|
Proteus | 0.12 | 15300 | 2629 | 10-20% | 2-94% |
Artemis LQ | 0.09 | 15300 | 2690 | 8-30% | 5-90% |
Chronos v3 f | 0.03 | 14500 | 1718 | 19-38% | 6-8% |
Stabilisation | | | | | |
Gaia HQ | 0.12 | 15500 | 3847 | 5-20% | 20-90% |

Model | V3.0.12 (sec/frame) | RAM (MB) | VRAM (MB) | CPU-Load | GPU-Load |
---|---|---|---|---|---|
Proteus | 0.15 | 16300 | 3911 | 9-15% | 5-45% |
Artemis LQ | 0.12 | 15900 | 3802 | 13-20% | 8-70% |
Chronos v3 f | 0.12 | 16300 | 4054 | 10-18% | 6-60% |
Stabilisation | 0.41 | 16500 | 5421 | 5-15% | 9-60% |
Gaia HQ | 0.07 | 16000 | 4817 | 12-20% | 6-100% |

Model | V3.1.1 (sec/frame) | RAM (MB) | VRAM (MB) | CPU-Load | GPU-Load |
---|---|---|---|---|---|
Proteus | 0.12 | 15400 | 4726 | 6-13% | 6-80% |
Artemis LQ | 0.08 | 15300 | 4581 | 3-10% | 30-70% |
Chronos v3 f | 0.07 | 15600 | 5643 | 8-15% | 30-70% |
Stabilisation | 0.11 | 16300 | 6727 | 5-15% | 30-80% |
Gaia HQ | 0.09 | 15300 | 5956 | 5-10% | 50-98% |
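
As a rough way to read the V3.0.12 and V3.1.1 numbers above, the sec/frame values can be converted to fps and to a per-model speedup. This is only a sketch in Python: the values are copied from the two tables, while the names `SEC_PER_FRAME`, `fps` and `speedup` are made up here for illustration.

```python
# Sec/frame values copied from the V3.0.12 and V3.1.1 tables in this post.
SEC_PER_FRAME = {
    "Proteus":       {"3.0.12": 0.15, "3.1.1": 0.12},
    "Artemis LQ":    {"3.0.12": 0.12, "3.1.1": 0.08},
    "Chronos v3 f":  {"3.0.12": 0.12, "3.1.1": 0.07},
    "Stabilisation": {"3.0.12": 0.41, "3.1.1": 0.11},
    "Gaia HQ":       {"3.0.12": 0.07, "3.1.1": 0.09},
}


def fps(sec_per_frame: float) -> float:
    """Frames per second is the reciprocal of seconds per frame."""
    return 1.0 / sec_per_frame


def speedup(old_sec_per_frame: float, new_sec_per_frame: float) -> float:
    """Ratio above 1.0 means the newer version is faster, below 1.0 slower."""
    return old_sec_per_frame / new_sec_per_frame


for model, t in SEC_PER_FRAME.items():
    old, new = t["3.0.12"], t["3.1.1"]
    print(f"{model:14s} {fps(old):5.1f} fps -> {fps(new):5.1f} fps  (x{speedup(old, new):.2f})")
```

A ratio above 1.0 means 3.1.1 is faster for that model. With these numbers Gaia HQ comes out below 1.0, since it went from 0.07 to 0.09 sec/frame.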

Closed, no active app | MB |
---|---|
RAM | 13424 |
VRAM | 1215 |

Input file | Value |
---|---|
Resolution | 720x576 |
Format | .mov |

Output file | Value |
---|---|
Resolution | 1080p |
Format | .mp4 |
Scale | 188% |
Encoder | H.264 (AMD) |

System specs | Details |
---|---|
CPU | AMD Threadripper 3960X, SMT off (24 cores only) |
RAM | 128 GB, ECC on |
GPU | AMD Radeon PRO W6800, 32 GB, ECC on |
OS | Microsoft Windows 11 Pro for Workstations |
Version | 10.0.22621 Build 22621 |
Mainboard | PRIME TRX40-PRO |
GPU driver | Radeon PRO V22.11.2, dated 30.11.22 |

GPU Memory Visualisation while Proteus is processing.
Visualisation of Proteus during export.
This is outstanding, a very information-rich post. Thanks for sharing. I am on a 6900 XT LC myself and will try to provide results in a similar format. Please keep it coming.
Thanks.
If you enter the data in a spreadsheet and copy it here, the tables are created automatically in the post.
Thank you @TPX
Still waiting on AMD to get back to me with some details. Hopefully will have the first alphas out in a few weeks.
OK,
what's the problem?
DirectML or something else?
Sometimes it feels like it's the driver, because there is no way to change it.
Most likely. I'm hoping it is something we can work around. The gist of it is that we need to copy data from RAM to VRAM, process it, and copy the output from VRAM back to RAM in a staggered manner. With the ORT+DirectML implementation that is not happening, which leads to the GPU waiting. The multiple-process approach overcomes this, but that's not really the solution.
There also seems to be some kind of wastage within ORT+DirectML. Lots of possibilities, the hope is to get the same level of performance in a single process that we get on multiple processes currently. We will be looking at RML to see if it provides more boost.
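To illustrate why the staggered copy matters, here is a toy timing model in Python. It does not reflect how TVAI, ORT or DirectML actually schedule work. The per-frame costs (20 ms upload, 50 ms processing, 20 ms download) and all names are invented purely for illustration, and the staggered case assumes transfers and compute can fully overlap with unlimited buffering between steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCost:
    """Per-frame cost of each step in milliseconds (made-up numbers)."""
    upload_ms: int    # copy input frame RAM -> VRAM
    process_ms: int   # run the model on the GPU
    download_ms: int  # copy output frame VRAM -> RAM


def serial_total_ms(frames: int, cost: StageCost) -> int:
    """Each frame finishes all three steps before the next frame starts."""
    return frames * (cost.upload_ms + cost.process_ms + cost.download_ms)


def staggered_total_ms(frames: int, cost: StageCost) -> int:
    """Steps of different frames overlap; each step handles one frame at a time."""
    stages = (cost.upload_ms, cost.process_ms, cost.download_ms)
    stage_free_at = [0] * len(stages)  # time at which each step becomes idle
    for _ in range(frames):
        frame_ready_at = 0
        for i, duration in enumerate(stages):
            # A step starts once the frame has arrived and the step is idle.
            start = max(frame_ready_at, stage_free_at[i])
            frame_ready_at = start + duration
            stage_free_at[i] = frame_ready_at
    return stage_free_at[-1]


def gpu_busy_fraction(frames: int, cost: StageCost, total_ms: int) -> float:
    """Share of the total time the GPU spends processing rather than waiting."""
    return frames * cost.process_ms / total_ms


cost = StageCost(upload_ms=20, process_ms=50, download_ms=20)
frames = 100
for name, total in (
    ("serial", serial_total_ms(frames, cost)),
    ("staggered", staggered_total_ms(frames, cost)),
):
    busy = gpu_busy_fraction(frames, cost, total)
    print(f"{name:9s} total={total} ms  GPU busy={busy:.0%}")
```

With these made-up costs the serial schedule keeps the GPU busy only a bit over half the time, while the staggered schedule is limited almost entirely by the processing step.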
Thanks for the Info!
I turned L1 prefetch on again, but it does not seem to matter for TVAI.
It feels more like giving the CPU a bit of extra power.
You only notice it while working: the screen builds up faster here and there with L1 prefetch enabled.
AMD Threadripper 3960X, 128 GB ECC on, SMT off (24 cores only)
AMD Radeon PRO W6800 32GB ECC on
Microsoft Windows 11 Pro for Workstations
Version 10.0.22621 Build 22621
Main Board PRIME TRX40-PRO
Radeon PRO GPU Driver V22.11.2 Date 30.11.22
Mem = 50%

Model | V3.1.2.0.b (sec/frame) | RAM (MB) | VRAM (MB) | CPU-Load | GPU-Load | 10 sec preview (time) | AI-Processor |
---|---|---|---|---|---|---|---|
Proteus | 0.09 | 15700 | 2657 | 20-35% | 2-94% | 56 sec | Auto |
Artemis LQ | 0.08 | 15700 | 2625 | 8-30% | 5-90% | 49 sec | Auto |
Chronos v3 f | 0.08 | 15800 | 4945 | 19-38% | 2-94% | 47 sec | Auto |
Stabilisation | 0.22 | 16123 | 3861 | 12-30% | 60-80% | 1 min 45 sec | Auto |
Gaia HQ | 0.08 | 15700 | 6204 | 5-20% | 60-90% | 43 sec | Auto |

Just for Comparison
Nvidia Quadro RTX 5000 / 16GB VRAM (like 2080S), Driver: 528.24
Intel 7820X, 8C/16T, 64 GB Ram
Win 11, PRO
Mem = 50%

Model | V3.1.2.0.b (sec/frame) | RAM (MB) | VRAM (MB) | CPU-Load | GPU-Load | 10 sec preview (time) | AI-Processor |
---|---|---|---|---|---|---|---|
Proteus | 0.05 | 11715 | 2584 | 12-20% | 40-50% | 30 sec | Auto |
Artemis LQ | 0.04 | 11234 | 2461 | 12-20% | 60-66% | 22 sec | Auto |
Chronos v3 f | 0.04 | 11584 | 3075 | 12-24% | 55-62% | 37 sec | Auto |
Stabilisation | 0.12 | 12200 | 4816 | 16-30% | 30-45% | 1 min 40 sec | Auto |
Gaia HQ | 0.08 | 10899 | 3763 | 9-16% | 95-97% | 45 sec | Auto |

Mem = 50%

Model | V3.1.1, AMD sys (sec/frame) | RAM (MB) | VRAM (MB) | CPU-Load | GPU-Load | 10 sec preview (time) | AI-Processor |
---|---|---|---|---|---|---|---|
Proteus | 0.11 | 16100 | 5183 | 6-13% | 6-80% | 58 sec | Auto |
Artemis LQ | 0.08 | 15900 | 5071 | 13-30% | 30-70% | 46 sec | Auto |
Chronos v3 f | 0.07 | 16200 | 6100 | 20-30% | 30-70% | 58 sec | Auto |
Stabilisation | 0.11 | 17200 | 7200 | 20-30% | 30-80% | 1 min 45 sec | Auto |
Gaia HQ | 0.08 | 15400 | 6420 | 10-20% | 50-98% | 48 sec | Auto |

Output file | Value |
---|---|
Resolution | 1080p |
Format | .mp4 |
Scale | 188% |
Encoder | H.264 (AMD) |

Can Topaz distribute test videos and benchmarks to beta testers, so that we can investigate the bottlenecks by running a common test?
Currently, each tester uses different videos and different methods, so the test results are inconsistent.
For example, if we prepare 480i, 480p, 1080i, and 1080p test videos, run common tests such as 2x scale and 4x scale, and have testers compare the results against the specs of each PC (CPU, GPU, MEM, SSD, HDD…), it would be easier to grasp the cause of the problem.
The benchmark results would also help users work out what they can do to improve the speed of their PCs.
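The common test set proposed above could be written down as a small matrix so that every tester runs the same cases. A minimal sketch in Python, assuming only the inputs and scales named in the post; `build_matrix` and the dictionary keys are made-up names, and nothing here is an actual TVAI feature.

```python
from itertools import product

# Test inputs and scale factors as proposed in the post above.
INPUTS = ["480i", "480p", "1080i", "1080p"]
SCALES = [2, 4]


def build_matrix(inputs=INPUTS, scales=SCALES):
    """Return every input/scale combination as one test case."""
    return [{"input": src, "scale": f"{factor}x"} for src, factor in product(inputs, scales)]


for case in build_matrix():
    print(f"{case['input']:6s} {case['scale']}")
```

This gives 8 cases (4 inputs times 2 scales), which each tester could run and post together with their PC specs.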
I agree. Suggested something similar as well some time ago.
I have always wanted it too
Will add a benchmark menu item that runs a few models and reports the numbers, along with system info, so they are easy to post.
Will do this once all the performance improvements are finished.
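Until something like that exists, a report that is easy to paste could look roughly like the sketch below. This is purely illustrative Python and not the planned TVAI benchmark: `system_info` and `format_report` are hypothetical helpers, the sample numbers are copied from the V3.1.1 table earlier in the thread, and only the standard-library `platform` module is used to read basic system details.

```python
import platform


def system_info() -> dict:
    """Collect a few basic system details from the standard library."""
    return {
        "OS": f"{platform.system()} {platform.release()}",
        "Machine": platform.machine(),
    }


def format_report(version: str, results: dict, info: dict) -> str:
    """Format sec/frame results and system info as pipe tables for a post."""
    lines = [f"Model | {version} (sec/frame) |", "---|---|"]
    for model, sec_per_frame in results.items():
        lines.append(f"{model} | {sec_per_frame:.2f} |")
    lines.append("")  # blank line between the two tables
    lines += ["System | Value |", "---|---|"]
    for key, value in info.items():
        lines.append(f"{key} | {value} |")
    return "\n".join(lines)


# Sample values copied from the V3.1.1 table posted earlier in this thread.
results = {"Proteus": 0.12, "Artemis LQ": 0.08}
print(format_report("V3.1.1", results, system_info()))
```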
TVAI will become a new benchmark tool for GPUs and CPUs.
At the same time as you bring back the model manager menu, right?
Here is a version for anyone with the issue to test: