Question: What is required to load the TensorRT models on Linux (Ubuntu)?
After finally getting a dockerized version of VAI ffmpeg to work with the fp16 ONNX models, I noticed that Topaz Labs is offering TensorRT models as well. In fact, VAI tries to load them even before the fp16/32 ONNX variants.
Unfortunately, they fail to load for me with either the official NVIDIA CUDA image or the TensorRT image: nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04 and nvcr.io/nvidia/tensorrt:23.06-py3, respectively.
The error I get:
CRITICAL: TRT Issue ERROR1: [stdArchiveReader.cpp::StdArchiveReader::42] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 232, Serialized Engine Version: 237)
CRITICAL: TRT Issue ERROR4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
CRITICAL: Can’t deserialize CUDA engine
CRITICAL: Loading default error reading file: /opt/TopazVideoAIBETA/models/amq-v13-fgnet-fp16-288x288-2x-rt806-8517.tz
… and this repeats 18 times, once for each TensorRT model variant, before an fp16 ONNX model I tried finally loaded and executed successfully.
I have the latest version of TensorRT, as indicated by the NVIDIA Docker image I’m using, which ships TensorRT 8.6.1.6-1+cuda12.0, the latest published by NVIDIA.
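For reference, a quick way to confirm which TensorRT build is actually present inside the container (the python3 check assumes the Python bindings that the NGC tensorrt image ships; the package-name pattern in the dpkg query is a guess):

```
# List any TensorRT / nvinfer packages installed in the container
dpkg -l | grep -iE 'tensorrt|nvinfer'

# The NGC tensorrt images ship the Python bindings, so this prints the runtime version
python3 -c "import tensorrt; print(tensorrt.__version__)"
```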
My system details, though I don’t think they matter in this case:
TensorRT tends to be quite specific about version numbering, so you’ll likely need to downgrade if you’re running the newest versions. That said, there shouldn’t be a need to install TensorRT yourself; we include the minimal set of libraries needed to run our TRT models. Is it possible that the version in your Docker container is being preferred over the one we’re shipping?
Edit: And, just to note, we don’t support 30-series cards for using TensorRT on Linux so it’s not guaranteed the model will load even if you do have the correct versions.
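If it helps to check, something along these lines will show which libnvinfer the binary resolves at runtime; note that the path to the bundled ffmpeg is only a guess based on the install prefix in your log, so adjust it to wherever the binary actually lives:

```
# If this points at /usr/lib/x86_64-linux-gnu rather than the Topaz install
# directory, the container's TensorRT is being picked up instead of the
# libraries shipped alongside the application.
ldd /opt/TopazVideoAIBETA/bin/ffmpeg | grep -i nvinfer
```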
How unfortunate I am: I got the 3090 three weeks before NVIDIA launched the 4090 replacement.
Good to know about the RTX support matrix. I’ll stick to the fp16 versions then and not waste time trying to get TensorRT to work.
PS: Perhaps you’ll consider providing support for the 30-series in the future, seeing as it has a lot of users; more than the 40-series at present.
Indeed, I do find it rather a shame, and fairly surprising, that the 30 series cards aren’t supported for this. I would have thought them to be more prevalent in use compared to the more anemic 20 series cards and the new-and-expensive 40 series cards.
I would hope 30 series support can be reconsidered.
No, we support both the regular 40 series cards and several of the workstation cards with the same compute level, such as the RTX 6000 Ada.
We’ve mainly been converting the Linux TensorRT models as we’ve hit a need for them internally. We don’t currently have any 30 series or compatible cards involved with the Linux project. We’re unfortunately not able to reuse the Windows models here, so for the time being we have no Linux models for 30 series.
The confusion arose because you differentiate between “GEN 40” and “Ada Lovelace”, which are actually the same thing. It would be like saying: “we provide AVC encoding and also H.264”.
It is not clear to me why Ampere is not supported if Turing is supported. I have not encountered a single TensorRT scenario where Turing support was present but Ampere was excluded. Could you clarify this particular case?
The Linux machines we’re using for developing Video AI do not have Ampere graphics cards; the models must be converted for each compute level we wish to target, and it has been our experience that the conversions are not reusable across devices of different compute levels.
We do intend to do the conversion at some point in the future; however, it’s not the highest-priority task for the Linux version at the moment.
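As a rough sketch of why the conversions aren’t reusable: a TensorRT engine is tied to the compute level of the GPU it was built on (Turing is 7.5, the Ampere 30-series is 8.6, Ada is 8.9), so a build along these lines has to be rerun per target. The trtexec invocation below is illustrative only, with a placeholder model name, and isn’t our actual conversion pipeline:

```
# Report the compute level of the installed GPU (needs a reasonably recent driver)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# Building an engine from an ONNX model ties the result to the GPU it was
# built on, which is why each compute level needs its own conversion pass.
trtexec --onnx=model.onnx --fp16 --saveEngine=model.trt
```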
For reference, on Windows with the equivalent settings I’m seeing nearer 27 fps across all Legion 5 Pro power options (quiet, normal, all-the-watts).
Quick question: I’m new to Topaz but familiar with Ubuntu/CUDA, etc.
I’m on a headless server, inside Docker; everything’s installed and working, but how can I log in so it can download the models? It’s looking for an auth.tpz file, and if I launch ./login it wants to open a window.
I don’t have much insight into why the TensorRT models would be performing differently. If it’s running a non-TensorRT model, we do use a different method of executing ONNX models (CUDA vs DirectML).
For your Nyx and Iris issues (sorry for the delay, Discourse didn’t notify me there were updates in the thread), could you try updating to the 3.4.4 beta if you haven’t already? I believe that Nyx had an issue with some GPUs in the 3.4.3 beta.
We don’t currently support headless login. If you’re unable to get a browser working for logins, you can reach out to our support team to perform an offline login. Press the chat button in the bottom right, here.