Is Topaz Labs considering Nvidia's PID?

NVIDIA PiD (Pixel Diffusion Decoder) is an open-source generative module that replaces traditional VAE decoders in latent diffusion models. By unifying decoding and 4× or 8× upsampling into one 4-step generative process, it bypasses blurry reconstructions to generate sharp, high-resolution details natively from image latents.

Key Advantages

  • Drop-in Replacement: PiD slots directly into existing pipelines (like FLUX, SD3, and DINOv2) without requiring you to retrain your base models.

  • Speed: Thanks to DMD2 distillation, it decodes a 512×512 latent to 2048×2048 pixels in just 4 sampling steps.

  • Hardware Efficiency: On a consumer RTX 5090, it generates a 2K image in under a second (roughly 6× faster than traditional cascades).

I wonder if Topaz Labs engineers could make a video upscaling model that is similar to StarLight Precise, but much, much faster, using Nvidia’s latest technique. The paper was published only a couple of weeks ago on arXiv: “PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion.”

Prompt: Please determine the feasibility of TopazLabs (topazlabs.com) engineers incorporating the technique recently published by Nvidia (https://arxiv.org/abs/2605.23902) in a new model that replaces, or supplements, their existing StarLight models. Nvidia’s technique is “PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion.” Would Nvidia’s PiD technique make their video upscaling more performant, and at the same or higher level of quality, compared to their StarLight models?

ChatGPT 5.5:

Bottom line: PiD looks highly feasible as a supplement to Topaz’s StarLight family, especially as a faster latent-to-4K decoder / final upsampling stage. It is not, by itself, a drop-in replacement for StarLight, because StarLight’s core value is degraded-video restoration with temporal consistency, while PiD is primarily a high-resolution image latent decoder. A “StarLight + PiD-like decoder” architecture is plausible; “PiD replaces StarLight” is much less plausible without major Topaz-specific retraining.

PiD’s core idea is to replace the usual “latent → VAE decode → separate super-resolution” cascade with one conditional pixel-diffusion decoder. NVIDIA reports that PiD can decode 512×512 latents to 2048×2048 pixels in under one second on an RTX 5090 using 13 GB peak memory, and that it is about 6× faster than cascaded diffusion super-resolution pipelines while producing better visual fidelity in their tests. It also supports partially denoised latents, so the upstream latent diffusion model can potentially stop early. (NVIDIA)

For Topaz, the important distinction is that StarLight is a video restoration model, not merely an image upscaler. Topaz says Project Starlight uses diffusion for video enhancement, upscaling, denoising, de-aliasing, and sharpening, and emphasizes temporal consistency by analyzing hundreds of surrounding frames. Topaz also says diffusion is the reason for the quality boost, but that these models are larger and slower than older GAN-style models. (topazlabs.com)

So PiD could help most in these places:

| Possible use inside Topaz | Feasibility | Likely benefit |
| --- | --- | --- |
| Replace final VAE decoder / final 2K–4K decoder in a latent StarLight pipeline | High, if StarLight uses compatible latent representations | Faster high-res output, potentially lower VRAM |
| Replace StarLight Sharp’s second-pass 4K upscaler | High to moderate | Cleaner single-stage 4K output; fewer pipeline artifacts |
| Decode partially denoised StarLight latents to reduce diffusion steps | Moderate, but requires retraining | Potentially large speedup if diffusion-step count is the bottleneck |
| Use PiD per-frame as a standalone upscaler | Moderate technically, weak product fit | May upscale frames, but risks flicker and degraded restoration quality |
| Replace the whole StarLight model | Low | PiD does not solve full video restoration or temporal consistency by itself |

Topaz’s own docs show exactly why a PiD-like module is attractive. StarLight Mini requires strong GPUs and can take hours; StarLight Fast 2 is reported by Topaz as 3–5× faster than StarLight Mini, but it is still a diffusion model with long local render times; StarLight Sharp and Precise also remain resource-intensive. (docs.topazlabs.com) Topaz’s system requirements also show that the StarLight family is unusually heavy compared with the rest of Topaz Video, with VRAM requirements ranging roughly from 10 GB to 16+ GB depending on variant, and higher recommendations for best quality. (docs.topazlabs.com)

The most promising engineering path would be a temporal PiD decoder, not plain PiD. Instead of conditioning only on a single image latent, Topaz would condition the decoder on the target frame latent plus nearby-frame latents, optical-flow/feature alignment, or whatever temporal representation StarLight already uses. That would preserve StarLight’s main advantage—stable details over motion—while letting the final decode/upscale stage run much faster.
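As a rough illustration of the "target frame plus nearby-frame latents" idea, here is a minimal sketch of how a conditioning window could be gathered for each frame. Everything in it is an assumption for illustration: the function name `temporal_window`, the `radius` parameter, and the edge-clamping policy are hypothetical and do not come from the PiD paper or from anything Topaz has published.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def temporal_window(latents: Sequence[T], index: int, radius: int) -> List[T]:
    """Collect latents for frames index-radius .. index+radius.

    Hypothetical helper: frames past either end of the clip are replaced by
    the nearest valid frame (edge clamping), so every window has the same
    length, 2 * radius + 1.
    """
    if not latents:
        raise ValueError("latents must be non-empty")
    if radius < 0:
        raise ValueError("radius must be non-negative")
    if not 0 <= index < len(latents):
        raise IndexError("index is outside the clip")
    last = len(latents) - 1
    return [
        latents[min(max(i, 0), last)]
        for i in range(index - radius, index + radius + 1)
    ]


# Stand-in "latents": strings instead of tensors, just to show the indexing.
clip = [f"latent_{i}" for i in range(5)]
print(temporal_window(clip, 0, 2))
# ['latent_0', 'latent_0', 'latent_0', 'latent_1', 'latent_2']
```

A real decoder would take tensors rather than strings and would probably align neighbors with optical flow or feature warping before conditioning on them; the sketch only shows the windowing step.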

A second promising path is to use PiD-like decoding for StarLight’s 4K stage. Topaz notes that StarLight Sharp can export 4K using a special second pass with a Nyx variant under the hood. (docs.topazlabs.com) That sounds like the kind of cascaded “enhance, then upscale again” path PiD is specifically designed to avoid. A trained StarLight-specific PiD decoder could plausibly generate the final 4K frame directly from the model’s latent output, eliminating an entire second stage.

The big caveat is quality. NVIDIA’s published PiD results are strong, but they are not direct evidence that PiD beats StarLight on old VHS, 8/16 mm film, deinterlaced MiniDV, low-bitrate archival footage, faces, text, or camera-motion-heavy shots. PiD’s own public code currently offers image-to-latent and latent-diffusion decoding entry points, and the project lists backbones such as Flux, SD3, SDXL, Z-Image, Qwen-Image, DINOv2, and SigLIP—not a Topaz video-restoration backbone. (GitHub) Topaz would almost certainly need to train its own PiD-style decoder on its own paired video-restoration data.

My best estimate:

Performance: likely improved, possibly substantially, if PiD replaces a decoder/upscale bottleneck or enables fewer diffusion steps. A 2×–6× gain for the decode/upscale portion is plausible from NVIDIA’s published numbers, but end-to-end StarLight speedup could be much smaller if most time is spent in the temporally conditioned diffusion core rather than final decoding.
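The gap between stage speedup and end-to-end speedup follows Amdahl's law, which can be sketched in a few lines. The decode/upscale shares below are hypothetical placeholders, since neither Topaz nor NVIDIA has published how StarLight's render time splits between the diffusion core and the final decode.

```python
def end_to_end_speedup(stage_fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: overall speedup when only one stage gets faster.

    stage_fraction: share of total runtime spent in the accelerated stage (0..1).
    stage_speedup:  how many times faster that stage becomes.
    """
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be between 0 and 1")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup)


# Hypothetical shares of render time spent in the decode/upscale stage.
for fraction in (0.1, 0.3, 0.6):
    for speedup in (2.0, 6.0):
        overall = end_to_end_speedup(fraction, speedup)
        print(f"decode share {fraction:.0%}, stage {speedup:.0f}x faster "
              f"-> end-to-end {overall:.2f}x")
```

Under these assumptions, a 6× faster decoder yields only about 1.09× overall if decoding is 10% of the render time, but 2× overall if it is 60%.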

Quality: potentially equal or higher for still-frame detail and 4K texture synthesis, but not guaranteed for video. Without temporal conditioning, a PiD-style decoder could increase shimmer, face inconsistency, texture crawling, or hallucinated detail. With temporal conditioning and Topaz-specific training, it could plausibly match or exceed existing StarLight variants.

Verdict: Topaz engineers should probably treat PiD as an architectural idea worth prototyping, especially as a StarLight decoder/4K-output replacement, not as an off-the-shelf model replacement. The strongest product would likely be something like “StarLight Fast/HQ with PiD-style temporal latent decoding”: StarLight for restoration and temporal reasoning, PiD-style decoding for faster high-resolution synthesis.

@partha.acharjee @Ange.topazlabs

