Question: What is required to load the TensorRT models on Linux (Ubuntu)?
After finally getting a dockerized version of VAI ffmpeg to work with the fp16 ONNX models, I noticed that Topaz Labs is offering TensorRT models as well. In fact, VAI tries to load them even before the fp16/32 ONNX variants.
Unfortunately, they fail to load for me with either the official NVIDIA CUDA image or the TensorRT image: nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04 and nvcr.io/nvidia/tensorrt:23.06-py3, respectively.
The error I get:
CRITICAL: TRT Issue ERROR1: [stdArchiveReader.cpp::StdArchiveReader::42] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 232, Serialized Engine Version: 237)
CRITICAL: TRT Issue ERROR4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
CRITICAL: Can’t deserialize CUDA engine
CRITICAL: Loading default error reading file: /opt/TopazVideoAIBETA/models/amq-v13-fgnet-fp16-288x288-2x-rt806-8517.tz
… and this repeats 18 times, once for each TensorRT model variant, before an fp16 ONNX model I tried finally loaded and executed successfully.
I have the latest version of TensorRT, as indicated by the NVIDIA Docker image I’m using, which ships TensorRT 8.6.1.6-1+cuda12.0, the latest published by NVIDIA.
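For reference, a quick way to confirm which TensorRT build is actually present inside the container (the python3 check assumes the Python bindings that the NGC tensorrt image ships; the package-name pattern in the dpkg query is a guess):

```
# List any TensorRT / nvinfer packages installed in the container
dpkg -l | grep -iE 'tensorrt|nvinfer'

# The NGC tensorrt images ship the Python bindings, so this prints the runtime version
python3 -c "import tensorrt; print(tensorrt.__version__)"
```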
My system details, though I don’t think they matter in this case:
TensorRT tends to be quite specific about version numbering, so you’ll likely need to downgrade if you’re running the newest versions. That said, there shouldn’t be a need to install TensorRT yourself; we include the minimal set of libraries needed to run our TRT models. Is it possible that the version in your Docker container is being preferred over the one we’re shipping?
Edit: And, just to note, we don’t support 30-series cards for using TensorRT on Linux so it’s not guaranteed the model will load even if you do have the correct versions.
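If it helps to check, something along these lines will show which libnvinfer the binary resolves at runtime; note that the path to the bundled ffmpeg is only a guess based on the install prefix in your log, so adjust it to wherever the binary actually lives:

```
# If this points at /usr/lib/x86_64-linux-gnu rather than the Topaz install
# directory, the container's TensorRT is being picked up instead of the
# libraries shipped alongside the application.
ldd /opt/TopazVideoAIBETA/bin/ffmpeg | grep -i nvinfer
```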
How unfortunate I am: I got the 3090 three weeks before NVIDIA launched the 4090 replacement.
Good to know about the RTX support matrix. I’ll stick to the fp16 versions then and not waste time trying to get TensorRT to work.
PS: Perhaps you’ll consider providing support for the 30-series in the future, seeing as it has a lot of users; more than the 40-series at present.
Indeed, I do find it rather a shame, and fairly surprising, that the 30 series cards aren’t supported for this. I would have thought them to be more prevalent in use compared to the more anemic 20 series cards and the new-and-expensive 40 series cards.
I would hope 30 series support can be reconsidered.
No, we support both the regular 40 series cards and several of the workstation cards with the same compute level, such as the RTX 6000 Ada.
We’ve mainly been converting the Linux TensorRT models as we’ve hit a need for them internally. We don’t currently have any 30 series or compatible cards involved with the Linux project. We’re unfortunately not able to reuse the Windows models here, so for the time being we have no Linux models for 30 series.
The confusion arose because you differentiate between “GEN 40” and “Ada Lovelace”, which are actually the same thing. It would be like saying: “we provide AVC encoding and also H.264”.
It is not clear to me why Ampere is not supported if Turing is supported. I have not encountered a single TensorRT scenario where Turing support was present but Ampere was excluded. Could you clarify this particular case?
The Linux machines we’re using for developing Video AI do not have Ampere graphics cards; the models must be converted for each compute level we wish to target, and it has been our experience that the conversions are not reusable across devices of different compute levels.
We do intend to do the conversion at some point in the future; however, it’s not the highest-priority task for the Linux version at the moment.
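As a rough sketch of why the conversions aren’t reusable: a TensorRT engine is tied to the compute level of the GPU it was built on (Turing is 7.5, the Ampere 30-series is 8.6, Ada is 8.9), so a build along these lines has to be rerun per target. The trtexec invocation below is illustrative only, with a placeholder model name, and isn’t our actual conversion pipeline:

```
# Report the compute level of the installed GPU (needs a reasonably recent driver)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# Building an engine from an ONNX model ties the result to the GPU it was
# built on, which is why each compute level needs its own conversion pass.
trtexec --onnx=model.onnx --fp16 --saveEngine=model.trt
```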
For reference, on Windows with the equivalent settings I’m seeing nearer 27 fps across all Legion 5 Pro power options (quiet, normal, all-the-watts).
Quick question: I’m new to Topaz but familiar with Ubuntu/CUDA, etc.
I’m on a headless server, inside Docker; everything’s installed and working, but how can I log in so it can download the models? It’s looking for an auth.tpz file, and if I launch ./login it wants to open a window.
I don’t have much insight into why the TensorRT models would be performing differently. If it’s running a non-TensorRT model, we do use a different method of executing ONNX models (CUDA vs DirectML).
For your Nyx and Iris issues (sorry for the delay, Discourse didn’t notify me there were updates in the thread), could you try updating to the 3.4.4 beta if you haven’t already? I believe that Nyx had an issue with some GPUs in the 3.4.3 beta.
We don’t currently support headless login. If you’re unable to get a browser working for logins, you can reach out to our support team to perform an offline login. Press the chat button in the bottom right, here.