Speed Tests v2.64 vs V3.03 (3060 vs 3080)...Slow v3.03

So I did some speed tests and v3.03 still seems extremely slow.
File: exact 2 min file, same settings for all scenarios
System: i9 9900k, 32gb ram (3060 12gb and 3080 12gb) Swapped and rebooted before each test.
v2.64 on 3060: 6 mins 31 secs
v3.03 on 3060: 7 mins 53 secs
v2.64 on 3080: 4 mins 50 secs
v3.03 on 3080: 6 mins 34 secs

So what’s going on with v3.xx? A 3060 on v2.64 is faster than a 3080 on v3.03, even though the 3080 has 2.5 times as many CUDA cores, which I was led to believe VEAI uses.
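
To put numbers on that, here is a quick Python sketch that converts the timings above to seconds and works out the slowdown. The helper names (`to_seconds`, `slowdown_pct`) are made up for illustration; the only inputs are the four timings and the 2-minute clip length from this post.

```python
# Timings reported above for the same 2-minute clip, as (minutes, seconds).
CLIP_SECONDS = 120

def to_seconds(minutes, seconds):
    """Convert a 'X mins Y secs' timing to seconds."""
    return minutes * 60 + seconds

def slowdown_pct(old_seconds, new_seconds):
    """Percentage by which new_seconds is longer than old_seconds."""
    return (new_seconds - old_seconds) / old_seconds * 100

runs = {
    ("v2.64", "3060"): to_seconds(6, 31),
    ("v3.03", "3060"): to_seconds(7, 53),
    ("v2.64", "3080"): to_seconds(4, 50),
    ("v3.03", "3080"): to_seconds(6, 34),
}

for gpu in ("3060", "3080"):
    old = runs[("v2.64", gpu)]
    new = runs[("v3.03", gpu)]
    print(f"{gpu}: v3.03 takes {slowdown_pct(old, new):.1f}% longer than v2.64")

# Processing time relative to the clip length (1.0 would be real time).
for (version, gpu), secs in runs.items():
    print(f"{version} on {gpu}: {secs / CLIP_SECONDS:.2f}x clip length")
```

By this arithmetic v3.03 takes roughly 21% longer on the 3060 and roughly 36% longer on the 3080, and the 3080 on v3.03 (394 s) lands 3 seconds behind the 3060 on v2.64 (391 s).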

There is one report of better speeds in the main 3.0.3 release thread. They did not say which model or input/output resolution they used, so it was not helpful to me.
From my own testing I tried a 720x480 4:3 AR source to 1920x1080 square pixels with black borders added using Artemis High Quality. PNG output.
I set the speed to display in fps.
Using a Ryzen 5900X and an RTX 3080ti it was running at 4.3 fps.

When I manually change the ffmpeg command to veai_up=model=ahq-12:scale=2.25 instead of veai_up=model=ahq-12:scale=0, it runs at about 14 fps and looks better.
I did not double check with 3.0.3, but with 3.0.1 it was the same speed as 2.6.4.
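
For anyone wondering where 2.25 comes from, it matches the height ratio of the test above (480 to 1080). A small Python sketch of that arithmetic follows; the function names are hypothetical, and it assumes the scale value is simply output height divided by source height, which the post does not confirm.

```python
def scale_for_height(src_height, out_height):
    """Scale factor needed to take the source height to the output height."""
    return out_height / src_height

def pillarbox(out_width, out_height, ar_w, ar_h):
    """Picture width at the given display aspect ratio, plus the black
    border on each side when it is centred in the output frame."""
    picture_width = out_height * ar_w // ar_h
    border_each_side = (out_width - picture_width) // 2
    return picture_width, border_each_side

# 720x480 source shown at 4:3, output 1920x1080 with square pixels.
print(scale_for_height(480, 1080))   # 2.25
print(pillarbox(1920, 1080, 4, 3))   # (1440, 240)

# Speed difference reported above: about 14 fps versus 4.3 fps.
print(f"{14 / 4.3:.1f}x faster")
```

So the 4:3 picture occupies 1440x1080 with 240 pixels of black on each side, and the manual scale setting ran a bit over three times faster in that test.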

Not all models are affected by changing the scale like that.

(I did not bother to update my video card drivers, so maybe I ought to do that.)

Not sure about those commands, but somewhere around a scale of 240% VEAI switches from a 2x model to a 4x model. Some 4x models will do a lot of weird interpolation depending on how the source looks. Sometimes the 4x model is much cleaner, sometimes it’s weird looking. I mostly use Gaia for non-interlaced stuff. A lot of the time I will use 250% scale just to activate the 4x model. Usually once the 4x model kicks in, the frame times get much longer.
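
As a rough illustration of that switch, here is a Python sketch. The 240% threshold is only the approximate figure mentioned above, the function name is made up, and this is not how VEAI actually picks its model internally.

```python
# Approximate switch point from the post above; the real value is unknown.
APPROX_SWITCH_PCT = 240

def guessed_model_factor(scale_pct, switch_pct=APPROX_SWITCH_PCT):
    """Guess whether a 2x or a 4x model would be used at a given scale."""
    return 4 if scale_pct >= switch_pct else 2

print(guessed_model_factor(225))  # 2
print(guessed_model_factor(250))  # 4
```

Under that assumption, a 225% scale would stay on the 2x model while 250% would activate the 4x model, which would fit the longer frame times described above.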