VAI v3.3.8 crashes on Win10 | GeForce GTX 1080 | model: apo-v8-fp32-1152x1344-ox

When trying to do a frame rate conversion from 15 → 30 FPS using Apollo (v8), ffmpeg crashes.
Apollo Fast works, and so do the Chronos variants.

  • The input clip is yuv420p at 1280x960. The model VAI picks, and that then crashes, is apo-v8-fp32-1152x1344-ox.tz.
  • When rescaling the input clip to half size (640x480) and trying Apollo again from the UI, no crash happens; VAI then uses a different model (apo-v8-fp32-480x384-ox.tz). A command-line sketch of that workaround follows below.
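
For reference, this is roughly what the half-size workaround looks like from the CLI. It’s a hedged sketch, untested as a single command on my side: I only verified the workaround through the UI, the tvai_fi parameters are copied from my export command further down, and I’m assuming VAI again picks the smaller 480x384 model when it’s fed a 640x480 frame.

$ # Downscale to 640x480 first, then run Apollo (apo-8) frame interpolation on the smaller frames.
$ ffmpeg -hide_banner -y -i "a.mkv" -filter_complex scale=640:480:flags=spline,tvai_fi=model=apo-8:slowmo=1:rdt=-0.000001:fps=30:device=0:vram=1:instances=1 -c:v h264_nvenc -profile:v high -preset medium -pix_fmt yuv420p -c:a copy "a-apo8-halfres.mp4"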

The key bit of the error message:

2023-07-27 14-12-02.529 Thread: 23228 Info OUT: 2 2023-07-27 14:12:02.5281511 [E:onnxruntime:, sequential_executor.cc:494 onnxruntime::ExecuteKernel] Non-zero status code returned while running DmlFusedNode_0_3 node. Name:'DmlFusedNode_0_3' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\FusedGraphKernel.cpp(397)\onnxruntime.dll!00007FF8C43E92B0: (caller: 00007FF8C43E7575) Exception(2) tid(5600) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.
2023-07-27 14:12:02 22016  CRITICAL:  ONNX problem: Run: Non-zero status code returned while running DmlFusedNode_0_3 node. Name:'DmlFusedNode_0_3' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\FusedGraphKernel.cpp(397)\onnxruntime.dll!00007FF8C43E92B0: (caller: 00007FF8C43E7575) Exception(2) tid(5600) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.
2023-07-27 14:12:02 22016  CRITICAL:  Model couldn't be run for outputName ["out"]
2023-07-27 14-12-02.529 Thread: 23228 Info OUT: 2 
2023-07-27 14:12:02 22016  CRITICAL:  Unable to run model with index  0  it had error:  no backend available
2023-07-27 14-12-02.530 Thread: 23228 Info OUT: 2 
2023-07-27 14:12:02 22016  CRITICAL:  Caught exception in tile processing thread and stopped it msg: unable to run model with index 01

This smells like a GPU OOM: GPU RAM utilization right before the crash was ~6 GB, out of 8 GB total (minus ~0.5 GB that other Windows programs claim for themselves).

My guess is a bug where VAI fails to correctly calculate whether the full model and frame buffers will fit into GPU memory. Or perhaps the model is corrupt; I deleted the model file and let VAI re-download it several times, so if it is model corruption, it would have to be corrupt server-side.
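
The kind of preflight check I mean is trivial to sketch from the shell. This is purely illustrative (a hypothetical guard of my own, not anything TVAI provides), and the 7000 MiB threshold is just a placeholder for whatever this model actually needs:

$ # Hypothetical preflight guard: check free VRAM before launching the Apollo export.
$ free_mib=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n1)
$ [ "$free_mib" -ge 7000 ] && echo "OK to start (${free_mib} MiB free)" || echo "Only ${free_mib} MiB of VRAM free, refusing to initialize this model"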

PS: If you’d like to receive bug reports via a different channel, let me know the appropriate one. Some details in the “logsForSupport.tar.gz” file are sensitive and fall under the EU GDPR; they should be handed to you with care rather than published widely here, so that you stay compliant with the regulation.

logsForSupport.tar.gz (58.6 KB)

I tested this on a beefier graphics card as well (an RTX 3090 on Linux/Ubuntu), and it definitely looks like a GPU OOM bug in the handling of this specific model. On Linux, VAI chose the fp16 variant instead, but even with the reduced parameter size, memory consumption spiked to 20 GB during initialization before dropping to a steady 2.3 GB during the model run. 20 GB (or more for fp32) is clearly never going to fit into the GTX 1080’s 8 GB.

$ ffmpeg -hide_banner -y -i "a.mkv" -flush_packets 1 -sws_flags spline+accurate_rnd+full_chroma_int -color_trc 2 -colorspace 2 -color_primaries 2 -filter_complex tvai_fi=model=apo-8:slowmo=1:rdt=-0.000001:fps=30:device=0:vram=1:instances=1,tvai_up=model=amq-13:scale=1:blend=0.2:device=0:vram=1:instances=1 -c:v h264_nvenc -profile:v high -preset medium -pix_fmt yuv420p -b:v 0 -map_metadata 0 -movflags use_metadata_tags+write_colr -map_metadata:s:v 0:s:v -map_metadata:s:a 0:s:a -c:a copy "a-apo8.mp4"

 Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN 4@16x RX: 72.27 MiB/s TX: 15.62 MiB/s
 GPU 1905MHz MEM 9501MHz TEMP  68°C FAN  74% POW 378 / 390 W
 GPU[||||||||||||||||||||||||||||||||97%] MEM[||||||||||||||||||20.983Gi/24.000Gi]
   ┌────────────────────────────────────────────────────────────────────────────────────┐
100│GPU0 %                            ┌───────────────────┐ ┌───────────────────────────│
   │GPU0 mem%                       ┌─┘                   └─┘                          ┌│
 75│                                │                                                  ││
   │                                │                                                  ││
   │                                │                                                  ││
 50│                                │                            ┌───┐                 ││
   │                                │                            │   │                 ││
 25│                                │┌─┐ ┌───┐ ┌─────────────────┘   └───────────┐ ┌───┘│
   │                                ││ └─┘   └─┘                                 └─┘    │
  0│────────────────────────────────┴┘                                                  │
   └────────────────────────────────────────────────────────────────────────────────────┘
    PID USER DEV    TYPE  GPU        GPU MEM    CPU  HOST MEM Command
1982729 root   0 Compute  96%  21158MiB  86%    13%   2019MiB ffmpeg -hide_banner -y -i a.

It would be great if you could track down why the initialization of this model peaks so extremely in memory use.
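
If it helps reproduction on your side, the initialization spike is easy to capture alongside the export with a plain nvidia-smi sampling loop (this is simply how one could log it; the nvtop output above shows the same picture):

$ # Sample VRAM usage once per second into a CSV while the ffmpeg export runs in another terminal.
$ nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1 > vram-during-init.csv

In my runs the peak showed up early, during model initialization, before usage settled back to the steady 2-3 GiB level.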

PS: I even got an OOM on the RTX 3090 when I combined “apo-v8-fp16-1152x1344-ox.tz” (frame rate conversion) with amq-13 (image enhancement).

$ ffmpeg -hide_banner -y -i "a.mkv" -flush_packets 1 -sws_flags spline+accurate_rnd+full_chroma_int -color_trc 2 -colorspace 2 -color_primaries 2 -filter_complex tvai_fi=model=apo-8:slowmo=1:rdt=-0.000001:fps=30:device=0:vram=1:instances=1,tvai_up=model=amq-13:scale=1:blend=0.2:device=0:vram=1:instances=1 -c:v h264_nvenc -profile:v high -preset medium -pix_fmt yuv420p -b:v 0 -map_metadata 0 -movflags use_metadata_tags+write_colr -map_metadata:s:v 0:s:v -map_metadata:s:a 0:s:a -c:a copy "a-apo8-2.mp4"

 Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN 4@16x RX: 30.27 MiB/s TX: 1.953 MiB/s
 GPU 1950MHz MEM 9501MHz TEMP  60°C FAN  69% POW 335 / 390 W
 GPU[||||||||||||||||||||||||||||||| 88%] MEM[|||||||||||||      9.182Gi/24.000Gi]
   ┌────────────────────────────────────────────────────────────────────────────────────┐
100│GPU0 %┌─────────────────────────────────────────────────────────────────────┐   ┌─┐ │
   │GPU0 mem%                ┌─┐                                                └───┘ └─│
 75│      │                  │ │                                                 ┌───┐  │
   │      │                  │ │                                                 │   │  │
   │      │                  │ │                                                 │   │  │
 50│      │                  │ │         ┌─┐                             ┌───┐ ┌─┘   └─┐│
   │      │                  │ │         │ │                             │   └─┘       └│
 25│      │┌─┐ ┌─────┐ ┌─┐ ┌─┘ └─┐ ┌─────┘ │ ┌───────────────────┐ ┌─────┘              │
   │     ┌┼┘ └─┘     └─┘ └─┘     └─┘       └─┘                   └─┘                    │
  0│─────┴┘                                                                             │
   └────────────────────────────────────────────────────────────────────────────────────┘
    PID USER DEV    TYPE  GPU        GPU MEM    CPU  HOST MEM Command                     
1983289 root   0 Compute  94%  13752MiB  56%   139%   2550MiB ffmpeg -hide_banner -y -i a.

Error log:

[swscaler @ 0x55f52bb52dc0] [swscaler @ 0x55f52d935480] No accelerated colorspace conversion found from yuv420p to rgb48le.
2023-07-27 13:57:44 140408069296128  INFO:  ---TBlockProc::TBlockProc W: 1344 H: 1152 C: 1 R: 1 X: 0 Y: 0
2023-07-27 13:57:44 140408069296128  INFO:  ---TBlockProc::TBlockProc W: 1344 H: 1152 C: 1 R: 1 X: 0 Y: 0
2023-07-27 13:58:19 140408069296128  INFO:  ---TBlockProc::TBlockProc W: 672 H: 576 C: 2 R: 2 X: 608 Y: 384
2023-07-27 13:58:19 140408069296128  INFO:  ---TBlockProc::TBlockProc W: 672 H: 576 C: 2 R: 2 X: 608 Y: 3842023-07-27 13:58:23.890303959 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=ryzen ; expr=cudaMalloc((void**)&p, size);
2023-07-27 13:58:23 140392716541952  CRITICAL:  ONNX problem: Run: Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=ryzen ; expr=cudaMalloc((void**)&p, size);
2023-07-27 13:58:23 140392716541952  CRITICAL:  Model couldn't be run for outputName ["generator/output:0"]
2023-07-27 13:58:23 140392716541952  CRITICAL:  Unable to run model with index  0  it had error:  no backend available
2023-07-27 13:58:23 140392716541952  CRITICAL:  Caught exception in tile processing thread and stopped it msg: std::exception12023-07-27 13:58:24.512706881 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392741720064  CRITICAL:  ONNX problem: Run: Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392741720064  CRITICAL:  Model couldn't be run for outputName ["generator/output:0"]
2023-07-27 13:58:24 140392741720064  CRITICAL:  Unable to run model with index  0  it had error:  no backend available
2023-07-27 13:58:24 140392741720064  CRITICAL:  Caught exception in tile processing thread and stopped it msg: std::exception22023-07-27 13:58:24.537308021 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392724934656  CRITICAL:  ONNX problem: Run: Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392724934656  CRITICAL:  Model couldn't be run for outputName ["generator/output:0"]
2023-07-27 13:58:24 140392724934656  CRITICAL:  Unable to run model with index  0  it had error:  no backend available
2023-07-27 13:58:24 140392724934656  CRITICAL:  Caught exception in tile processing thread and stopped it msg: std::exception2terminate called after throwing an instance of 'std::system_error'
  what():  Resource deadlock avoided

It looks like your machine is running out of RAM. When you open TVAI your machine has slightly under 11 GB of RAM left, and this RAM is being exceeded during your exporting queue.

Do you mean system RAM or GPU RAM?

Where did you get 11 GB from?

This?

2023-07-27 14:03:58 15080  INFO: RAM 31.94 GB Total / 21.1339 Free/Used

I still get the same crash when I free up memory so that there is over 20 GB free:

2023-07-27 16:20:35 20692  INFO: RAM 31.94 GB Total / 9.35011 Free/Used

So that is clearly not the reason.
Did you even look at the DML error? The crash comes from DirectML on the GPU, not from the OS running out of system memory.

EDIT: Oh, and please don’t mark your replies as a “solution”.
A solution has two parts to it:

  1. it contains a suggestion on how to resolve the problem, and
  2. it contains a confirmation from the customer that the advice solved the problem.

You didn’t provide any advice, and you did not wait for confirmation of whether the [non-]proposal addressed the problem.

Yes, RAM.

The app is closing because you only have 11 GB of RAM available when the app is opened, and that is quickly exceeded by Topaz Video AI.

Please cease the use of other RAM-intensive apps and try again. If the issue persists, share the new log.

All I have to go on is the information and logs provided, and at this time all of the failures shared are due to not having enough RAM.

In reply to the comment about marking this as resolved: in general, this section is for bugs, not for processing errors and app crashes. For those cases you would normally open a support ticket. We can continue to work on this in this thread; however, this does not appear to be a bug.

  1. Programs crash because of bugs.

A software bug is an error, flaw or fault in the design, development, or operation of computer software that causes it to produce an incorrect or unexpected result, or to behave in unintended ways

How do you define a bug?

  1. As I wrote, I got the same crash when I freed up system memory, having 22 GiB free when TVAI was launched.

  2. I could reproduce the bug on Linux, as the nvtop data provided shows: a sharp spike during initialization of this specific model (20 GiB of GPU RAM consumed) before dwindling down to the expected 2-3 GiB range that other models consume. That initial spike happens on both Windows and Linux, and nothing short of an RTX 3090/4090 can survive it.

  3. The exact amount of GPU RAM used each time this specific model is initialized varies, so the behavior is stochastic: in roughly 1 run out of 5 I get an ONNX GPU OOM even on the 3090.

  4. The crash originates on the GPU, as evidenced by the error message above, the error message in the log, and the similar error from Linux (where CUDA is more informative than DirectML), also posted above [1]. So the signs point to GPU RAM being exhausted. ONNX Runtime executes on the GPU, CUDA runs on the GPU, and that is where the tensors live. I don’t see what system RAM has to do with this issue.

The problem I reported is that GPU memory spikes during the initialization of this specific model, to an extent that leads to an ONNX GPU OOM, not a system RAM OOM. Since the OOM also happens on one of the largest consumer cards available, that is a strong indication of a fault (bug): the TVAI engine does not check whether enough GPU memory is available before it loads and initializes this particular model, at least.

Do you now understand the problem?

[1]: CUDA failure 2: out of memory ; GPU=0 ; hostname=ryzen ; expr=cudaMalloc((void**)&p, size);

If you were able to reproduce the crash when the RAM was free, then this would not be the same issue, and I would need to take a look at the new logs from any machine you reproduced this behavior on.

It is the same issue, reproduced on two different machines.
I got the exact same crash message on the Windows machine (the one I opened this ticket for) with system RAM freed up as described, which confirms that the ffmpeg crash was not caused by a lack of system RAM, as your initial answer suggested.

As such, the data I already provided is still valid and up to date, save for the free system RAM number (which has been shown to be irrelevant to this issue).

Can you please post both logs from the two different machines here?

It seems that you may be latching on to information in the logs that does not mean what you think it means. The logs are not meant to be read by users, since they are created for our developers; however, there is always room for our understanding to be incorrect.

I would like to compare both to be sure we are not missing anything.