Well, as @jo.vo hinted at and dakota confirmed, that speedup had nothing to do with the speed of the model itself; they just split the uploaded clips, parcel the pieces out to their various servers, then concatenate the results at the end.
This is the SOP for any cloud-based sequence (video or otherwise) processing, since it costs the provider basically nothing but yields massive value for the customers. That value is saved time, the scarcest resource in the world.
I’m certain you already know this, but most people tend to forget the basic economics of computation.
$ = &lt;number of calculations&gt; x &lt;cost per calculation&gt;. For sequences like video, breaking that first term apart fits better: &lt;element sequence length&gt; x &lt;# calculations needed per element&gt; x &lt;cost per calculation&gt;, which for video maps to &lt;frames per clip&gt; x &lt;pixels per frame&gt; x &lt;calculations per pixel&gt; x &lt;cost per calculation&gt;. But it’s still just the same &lt;amount of work&gt; x &lt;cost of work&gt; expressed differently.
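To make that concrete, here’s a toy version of the equation in code (every number below is invented purely for illustration; these are not TVAI’s actual rates or workloads):

```python
# Toy cost model: <amount of work> x <cost of work>.
# All numbers are made up purely for illustration.
frames_per_clip = 30 * 60          # a 60 s clip at 30 fps
pixels_per_frame = 1920 * 1080     # 1080p frames
calcs_per_pixel = 5_000            # hypothetical model workload
cost_per_calc = 1e-13              # hypothetical $ per calculation

total_calcs = frames_per_clip * pixels_per_frame * calcs_per_pixel
print(f"work: {total_calcs:.2e} calcs -> cost: ${total_calcs * cost_per_calc:.2f}")
```

Note that nothing in there says how long the job takes or how many machines run it; the dollar amount is fixed by the amount of work.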
For most of us, who typically run only one job at a time, on one saturated machine, with the hardware already paid for, this cost equation isn’t top of mind; we just focus on “how long the job takes”. But the formula still holds for local use: we can calculate that $ amount by amortizing our hardware purchase cost plus electricity across the time frame we expect to keep that hardware around.
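For example, a minimal sketch of that amortization (again with made-up numbers, just to show the arithmetic):

```python
# Amortized cost per hour of owning and running a local machine.
# All figures are illustrative assumptions, not real quotes.
hardware_cost = 2_500.0          # $ paid up front for the workstation
lifetime_hours = 3 * 365 * 24    # keep it around ~3 years
power_kw = 0.5                   # draw under load, in kW
electricity_rate = 0.15          # $ per kWh

cost_per_hour = hardware_cost / lifetime_hours + power_kw * electricity_rate
print(f"~${cost_per_hour:.2f} per hour of processing, hardware amortized")
```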
Now in the “cloud”, where you have practically infinite machines and rent them rather than buy them, that formula above is all that matters, since you only pay for the machines while you’re actually using them. No amortization needed (for typical SMB operations), as opposed to local hardware, which we pay for even when we’re not using it.
As such, since the cost is a function of the number of calculations and the cost per calculation, it doesn’t matter how fast or slow the processing is: one machine for 100 minutes costs the same as 100 machines for one minute. The cost remains the same.
So if you have to pay $X for a given processing job, why not choose the option that does it fast rather than slow, given the cost is the same?
E.g. split the source clip into 100 parts, send them out to 100 machines, have each process its little chunk for 10 s, and then stitch the results back together at the end. 10 seconds of processing time instead of 1000 seconds on a single machine, at the same price.
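A minimal sketch of that split/process/stitch pattern, assuming ffmpeg is on PATH and an input.mp4 exists; the placeholder scale filter stands in for whatever model actually runs on each worker (this is my illustration, not TL’s pipeline):

```python
# Sketch: split a clip, "process" chunks in parallel, stitch back.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# 1. Split on keyframes into ~10 s chunks (stream copy, no re-encode).
subprocess.run(["ffmpeg", "-i", "input.mp4", "-c", "copy",
                "-f", "segment", "-segment_time", "10",
                "-reset_timestamps", "1", "chunk_%03d.mp4"], check=True)

def process(chunk: Path) -> Path:
    """Stand-in for per-chunk model inference on one worker."""
    out = chunk.with_name(f"done_{chunk.name}")
    subprocess.run(["ffmpeg", "-y", "-i", str(chunk),
                    "-vf", "scale=3840:2160", str(out)], check=True)
    return out

# 2. "Parcel out" the chunks; locally this is a thread pool,
#    in the cloud it would be 100 separate machines.
chunks = sorted(Path(".").glob("chunk_*.mp4"))
with ThreadPoolExecutor(max_workers=8) as pool:
    done = list(pool.map(process, chunks))

# 3. Stitch the processed chunks back with the concat demuxer.
Path("list.txt").write_text("".join(f"file '{p}'\n" for p in done))
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "list.txt", "-c", "copy", "output.mp4"], check=True)
```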
So it seems to me the TL people made a rational business decision here: “Why not make cloud processing a fast experience, given it costs us basically nothing?”
* Now of course, I’m simplifying a bit. There’s usually ~20% additional cost to segmenting the work, since you have to do some redundant/duplicated processing at the chunk boundaries so you can blend the chunks back together without visible artifacts in the stitched clip, but that’s a rounding error overall.
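Roughly, that overlap scheme looks like the sketch below (my illustration; the chunk and overlap sizes are guesses, not TL’s actual numbers):

```python
# Sketch: chunk boundaries with overlap, so neighbouring chunks share
# frames that can be cross-faded at the seams. Sizes are illustrative.
def chunk_ranges(total_frames: int, chunk: int = 300, overlap: int = 60):
    """Yield (start, end) frame ranges; each chunk overruns into the next."""
    start = 0
    while start < total_frames:
        yield (start, min(start + chunk + overlap, total_frames))
        start += chunk

print(list(chunk_ranges(1000)))
# [(0, 360), (300, 660), (600, 960), (900, 1000)]
# 60 extra frames per 300-frame chunk = the ~20% redundant work.
```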
Well, I think you could have picked a better example, like introducing carousels or moving fields, menus, and buttons around in the GUI, etc., but I get your point.
Scene detection was actually a very relevant feature for me, since it was a problem I’d solved in my decade-old prior pipeline using avisynth filters to preserve hard cuts. I always found it a shame that TVAI lacked this rudimentary 101 feature that any video processing requires. Without it, someone has to go in and manually replace those ugly blended hard cuts, which are now warped transitions, with frame duplications; time-consuming and error-prone work. Or they’d have to write their own custom tool to detect scene cuts and automate that exact same correction. Not many users are going to do that, leading to bad output unless the person who requested the framerate upscale was a professional who only processed one scene at a time. Again, I doubt even you do that with your DVD rips.
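For anyone wanting to bolt this on themselves today, a minimal sketch with PySceneDetect (my tool choice for illustration, not what TVAI uses internally; the threshold is a starting guess you’d tune per source):

```python
# Sketch: find hard cuts so an interpolation pipeline can duplicate
# frames across each cut instead of warping between two scenes.
# Requires: pip install scenedetect[opencv]
from scenedetect import detect, ContentDetector

scenes = detect("input.mp4", ContentDetector(threshold=27.0))
for start, end in scenes:
    print(f"scene: frames {start.get_frames()}-{end.get_frames()}")
```

Feed those boundaries to the interpolator so no interpolation window ever straddles a cut.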