I also tested this on a beefier graphics card (an RTX 3090 on Linux/Ubuntu, per the nvtop output below), and it definitely looks like a GPU OOM bug in the handling of this specific model. On Linux, VAI picked the fp16 variant instead, but even with the reduced parameter size, memory consumption spiked to 20 GB during initialization before dropping to a steady 2.3 GB for the rest of the model run. 20 GB (or more for fp32) is clearly never going to fit into a GTX 1080.
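For reference, the initialization spike is easy to capture outside of nvtop as well. A minimal sketch using plain nvidia-smi polling, run alongside the ffmpeg command below (the 500 ms interval and log file name are arbitrary):

$ nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -lms 500 > gpu-mem.log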
$ ffmpeg -hide_banner -y -i "a.mkv" -flush_packets 1 -sws_flags spline+accurate_rnd+full_chroma_int -color_trc 2 -colorspace 2 -color_primaries 2 -filter_complex tvai_fi=model=apo-8:slowmo=1:rdt=-0.000001:fps=30:device=0:vram=1:instances=1,tvai_up=model=amq-13:scale=1:blend=0.2:device=0:vram=1:instances=1 -c:v h264_nvenc -profile:v high -preset medium -pix_fmt yuv420p -b:v 0 -map_metadata 0 -movflags use_metadata_tags+write_colr -map_metadata:s:v 0:s:v -map_metadata:s:a 0:s:a -c:a copy "a-apo8.mp4"
Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN 4@16x RX: 72.27 MiB/s TX: 15.62 MiB/s
GPU 1905MHz MEM 9501MHz TEMP 68°C FAN 74% POW 378 / 390 W
GPU[||||||||||||||||||||||||||||||||97%] MEM[||||||||||||||||||20.983Gi/24.000Gi]
[nvtop history graph of GPU0 % and GPU0 mem% over time; memory climbs to the ~21 GiB peak shown above]
PID USER DEV TYPE GPU GPU MEM CPU HOST MEM Command
1982729 root 0 Compute 96% 21158MiB 86% 13% 2019MiB ffmpeg -hide_banner -y -i a.
It would be great if you could track down why memory use peaks so extremely during model initialization.
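In case it helps narrow things down: the two filters can be initialized in isolation, which should show whether tvai_fi or tvai_up (or only their combination) drives the spike. A hypothetical minimal repro built from the command above, with the output simply discarded via ffmpeg's null muxer:

$ ffmpeg -hide_banner -y -i "a.mkv" -filter_complex tvai_fi=model=apo-8:slowmo=1:rdt=-0.000001:fps=30:device=0:vram=1:instances=1 -f null -
$ ffmpeg -hide_banner -y -i "a.mkv" -filter_complex tvai_up=model=amq-13:scale=1:blend=0.2:device=0:vram=1:instances=1 -f null -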
PS: I even got an OOM on the RTX 3090 when I combined “apo-v8-fp16-1152x1344-ox.tz” (frame rate conversion) with amq-13 (image enhancement).
$ ffmpeg -hide_banner -y -i "a.mkv" -flush_packets 1 -sws_flags spline+accurate_rnd+full_chroma_int -color_trc 2 -colorspace 2 -color_primaries 2 -filter_complex tvai_fi=model=apo-8:slowmo=1:rdt=-0.000001:fps=30:device=0:vram=1:instances=1,tvai_up=model=amq-13:scale=1:blend=0.2:device=0:vram=1:instances=1 -c:v h264_nvenc -profile:v high -preset medium -pix_fmt yuv420p -b:v 0 -map_metadata 0 -movflags use_metadata_tags+write_colr -map_metadata:s:v 0:s:v -map_metadata:s:a 0:s:a -c:a copy "a-apo8-2.mp4"
Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN 4@16x RX: 30.27 MiB/s TX: 1.953 MiB/s
GPU 1950MHz MEM 9501MHz TEMP 60°C FAN 69% POW 335 / 390 W
GPU[||||||||||||||||||||||||||||||| 88%] MEM[||||||||||||| 9.182Gi/24.000Gi]
[nvtop history graph of GPU0 % and GPU0 mem% over time for the combined run]
PID USER DEV TYPE GPU GPU MEM CPU HOST MEM Command
1983289 root 0 Compute 94% 13752MiB 56% 139% 2550MiB ffmpeg -hide_banner -y -i a.
Error log:
[swscaler @ 0x55f52bb52dc0] [swscaler @ 0x55f52d935480] No accelerated colorspace conversion found from yuv420p to rgb48le.
2023-07-27 13:57:44 140408069296128 INFO: ---TBlockProc::TBlockProc W: 1344 H: 1152 C: 1 R: 1 X: 0 Y: 0
2023-07-27 13:57:44 140408069296128 INFO: ---TBlockProc::TBlockProc W: 1344 H: 1152 C: 1 R: 1 X: 0 Y: 0
2023-07-27 13:58:19 140408069296128 INFO: ---TBlockProc::TBlockProc W: 672 H: 576 C: 2 R: 2 X: 608 Y: 384
2023-07-27 13:58:19 140408069296128 INFO: ---TBlockProc::TBlockProc W: 672 H: 576 C: 2 R: 2 X: 608 Y: 384
2023-07-27 13:58:23.890303959 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=ryzen ; expr=cudaMalloc((void**)&p, size);
2023-07-27 13:58:23 140392716541952 CRITICAL: ONNX problem: Run: Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=ryzen ; expr=cudaMalloc((void**)&p, size);
2023-07-27 13:58:23 140392716541952 CRITICAL: Model couldn't be run for outputName ["generator/output:0"]
2023-07-27 13:58:23 140392716541952 CRITICAL: Unable to run model with index 0 it had error: no backend available
2023-07-27 13:58:23 140392716541952 CRITICAL: Caught exception in tile processing thread and stopped it msg: std::exception1
2023-07-27 13:58:24.512706881 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392741720064 CRITICAL: ONNX problem: Run: Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392741720064 CRITICAL: Model couldn't be run for outputName ["generator/output:0"]
2023-07-27 13:58:24 140392741720064 CRITICAL: Unable to run model with index 0 it had error: no backend available
2023-07-27 13:58:24 140392741720064 CRITICAL: Caught exception in tile processing thread and stopped it msg: std::exception2
2023-07-27 13:58:24.537308021 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392724934656 CRITICAL: ONNX problem: Run: Non-zero status code returned while running Conv node. Name:'fnet/autoencode_unit/decoder_2/conv_1/Conv/BiasAdd' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2999025170055637504
2023-07-27 13:58:24 140392724934656 CRITICAL: Model couldn't be run for outputName ["generator/output:0"]
2023-07-27 13:58:24 140392724934656 CRITICAL: Unable to run model with index 0 it had error: no backend available
2023-07-27 13:58:24 140392724934656 CRITICAL: Caught exception in tile processing thread and stopped it msg: std::exception2
terminate called after throwing an instance of 'std::system_error'
what(): Resource deadlock avoided
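One more data point from the log above: the BFC arena failure asks for a buffer of 2999025170055637504 bytes, which works out to roughly 2.6 exbibytes. The requested size itself therefore looks corrupted (e.g. a bad tensor shape or an integer overflow) rather than a plausible allocation that merely didn't fit. Quick sanity check of the arithmetic:

$ echo '2999025170055637504 / 1024^6' | bc -l    # bytes -> EiB, prints ~2.6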