I see you use some old cublas* modules from CUDA toolkit 11.
Maybe it would be better to update them to the latest 11.x version.
It might be even better to update to CUDA toolkit 12.
CUDA 12 Features
- CUDA 12 introduces support for the NVIDIA Hopper and Ada Lovelace architectures
Does this mean we can get better performance on our RTX 4090s if they update the CUDA toolkit to v12?
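
One way to reason about this is to check whether the toolkit the cublas* modules come from can target the card's architecture at all. Below is a minimal sketch, not anything from the project itself: `MIN_TOOLKIT` and `natively_supported` are hypothetical names, and the version numbers in the table are assumptions from memory (to my knowledge CUDA 11.8 already added the sm_89 and sm_90 targets), so please verify them against the official CUDA release notes.

```python
# Minimum CUDA toolkit release assumed to compile natively for a given
# compute capability. These values are assumptions, not taken from the
# project; check the CUDA release notes before relying on them.
MIN_TOOLKIT = {
    (8, 9): (11, 8),  # Ada Lovelace, e.g. RTX 4090
    (9, 0): (11, 8),  # Hopper, e.g. H100
}


def natively_supported(compute_capability, toolkit_version):
    """Return True if the toolkit version can target the architecture directly."""
    required = MIN_TOOLKIT.get(compute_capability)
    if required is None:
        raise KeyError(f"unknown compute capability: {compute_capability}")
    # Tuples compare element by element, so (11, 8) >= (11, 8) and (12, 0) >= (11, 8).
    return toolkit_version >= required
```

For example, `natively_supported((8, 9), (11, 4))` is false under these assumptions while `natively_supported((8, 9), (12, 0))` is true. Whether a newer toolkit actually gives measurable speedups on a 4090 would still need a benchmark.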