Nvidia Launches Rubin CPX AI Chip for Generative Video

Unified media and AI inference for long-context creation without the seams

Nvidia has unveiled Rubin CPX, a purpose-built AI GPU for massive-context inference across video, audio, and software, aimed squarely at generative content that runs for minutes to hours without losing the plot. Announced September 9, 2025, the chip integrates high-throughput media engines and transformer inference in one device to keep timelines “in memory” from ingest to render, with no manual chunking and no mid-scene context drop. The company framed CPX as optimized for million-token and hour-scale workflows in its launch materials and press briefings (press release).

Bottom line: Rubin CPX is built to hold the whole story, including video, audio, and code, so generative models can maintain tone, pacing, and continuity across long-form projects.

What’s actually new in CPX

Nvidia’s positioning is clear: rather than simply adding raw compute, Rubin CPX is tuned for the mashup that modern creative pipelines demand, pairing fast media decode and encode with heavy transformer inference, and moving data across that boundary without leaving the chip.

Rubin CPX at a glance: details (as announced)
AI compute: Up to 30 PFLOPs (NVFP4) for high-throughput inference
Memory: 128 GB GDDR7 per device for long-context sessions
Media engines: Dedicated concurrent video decode and encode blocks for ingest and render pipelines
Software stack: CUDA, TensorRT, and new long-sequence schedulers
Target workloads: Generative video, speech-to-speech, dubbing and translation, and million-token code inference

Why this matters: hour-long context windows let models track narrative, style, and continuity across an entire edit, reducing the seams that appear when projects are chunked and reassembled.
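To see why per-device memory matters for long contexts, it helps to run the standard key-value cache arithmetic for a transformer. The sketch below is a back-of-the-envelope estimate, not a description of any model Nvidia has named: the layer count, head count, head size, and 8-bit cache precision are all hypothetical values chosen for illustration. Only the 128 GB figure comes from the announced specs.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Bytes needed to hold cached keys and values for `tokens` positions."""
    # Factor of 2: one key vector and one value vector per layer, head, and token.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS = 48
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 1  # assumes an 8-bit KV cache

DEVICE_MEMORY_BYTES = 128 * 10**9  # 128 GB per device, as announced

tokens = 1_000_000
needed = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)

print(f"KV cache for {tokens:,} tokens: {needed / 10**9:.1f} GB")
print(f"Share of one 128 GB device: {needed / DEVICE_MEMORY_BYTES:.0%}")
```

Under these assumed dimensions, a single million-token session takes up most of one device before model weights are counted, which is why long-context work puts so much pressure on memory capacity.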

Disaggregated inference: CPX’s role in Nvidia’s split-brain architecture

Rubin CPX is one half of a two-part, disaggregated inference design that splits duties between compute-optimized and bandwidth-optimized silicon: CPX takes the compute-heavy context phase, while standard Rubin GPUs take the bandwidth-bound generation phase. The goal is to scale FLOPs and memory independently so operators are not overbuying one to get enough of the other (Tom’s Hardware).
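The overbuying problem described in the paragraph above can be made concrete with a toy sizing exercise. This is a deliberately simplified sketch, not Nvidia’s sizing method: the function names, the unitless capacity numbers, and the assumption that each specialized device supplies only one resource are all hypothetical.

```python
import math


def stranded_capacity(need_compute, need_bandwidth, dev_compute, dev_bandwidth):
    """Device count and idle capacity when one device type must cover both needs."""
    # The scarcer resource sets the device count; the other is overbought.
    devices = max(
        math.ceil(need_compute / dev_compute),
        math.ceil(need_bandwidth / dev_bandwidth),
    )
    idle_compute = devices * dev_compute - need_compute
    idle_bandwidth = devices * dev_bandwidth - need_bandwidth
    return devices, idle_compute, idle_bandwidth


def pool_sizes(need_compute, need_bandwidth, compute_dev, bandwidth_dev):
    """Device counts when each resource is served by its own specialized pool."""
    return (
        math.ceil(need_compute / compute_dev),
        math.ceil(need_bandwidth / bandwidth_dev),
    )


# Hypothetical workload: heavy on compute, light on bandwidth (arbitrary units).
print(stranded_capacity(100, 20, dev_compute=10, dev_bandwidth=10))
print(pool_sizes(100, 20, compute_dev=10, bandwidth_dev=10))
```

In this example the single-device design needs ten devices to meet the compute target and leaves 80 units of bandwidth idle, while the split design sizes each pool to its own requirement.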

Nvidia’s flagship rack for this approach is the Vera Rubin NVL144 CPX platform, a liquid-cooled, rack-scale system that pairs Rubin GPUs with Rubin CPX and orchestrates them with Vera CPUs. It is pitched as a turnkey engine for long-context inference and media-heavy generative workloads at data center scale (DataCenterDynamics).

Vera Rubin NVL144 CPX (rack-scale): platform snapshot
Accelerators: Rubin GPUs (generation) + Rubin CPX GPUs (context processing and media)
Aggregate performance: Up to 8 exaFLOPs (NVFP4) per rack
Fast memory: About 100 TB in-rack
Memory bandwidth: About 1.7 PB/s per rack
Cooling: Liquid-cooled design

The takeaway for platforms building creative AI at scale: CPX slots into a rack design meant to run whole-series timelines and entire application codebases in context, feeding models rich, uninterrupted sequences across text, audio, and video.

How Nvidia is framing the opportunity

Nvidia’s messaging leans hard into generative AI economics. At launch, the company argued that a $100 million Rubin-class deployment could underpin as much as $5 billion in token-metered revenue over its lifecycle, an aggressive projection that reflects a broader thesis: cheaper, longer, more coherent inference unlocks new products and usage at scale (Reuters).
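The arithmetic behind that projection is simple, and spelling it out shows how much token volume it implies. In the sketch below, the two dollar figures come from Nvidia’s claim as reported; the price per million tokens is a purely hypothetical placeholder, since no pricing was given.

```python
def revenue_multiple(deployment_cost_usd, projected_revenue_usd):
    """Projected lifetime revenue per dollar of deployment cost."""
    return projected_revenue_usd / deployment_cost_usd


def tokens_needed(projected_revenue_usd, price_per_million_tokens_usd):
    """Tokens that must be sold at a given price to reach the revenue figure."""
    return projected_revenue_usd / price_per_million_tokens_usd * 1_000_000


DEPLOYMENT_COST_USD = 100_000_000      # from Nvidia's claim
PROJECTED_REVENUE_USD = 5_000_000_000  # from Nvidia's claim
PRICE_PER_MILLION_TOKENS_USD = 2.0     # hypothetical, for illustration only

multiple = revenue_multiple(DEPLOYMENT_COST_USD, PROJECTED_REVENUE_USD)
tokens = tokens_needed(PROJECTED_REVENUE_USD, PRICE_PER_MILLION_TOKENS_USD)

print(f"Revenue multiple: {multiple:.0f}x")
print(f"Tokens to sell at ${PRICE_PER_MILLION_TOKENS_USD}/M: {tokens:.2e}")
```

The claim works out to a 50x return on deployment cost. At the assumed price, reaching it would mean selling on the order of quadrillions of tokens over the hardware’s lifetime.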

As context windows stretch from clips to full timelines, the unit economics of creative AI bend toward longer, more valuable sessions, including feature-length edits, multi-episode arcs, or full-repo copilots.

Where Rubin CPX fits in the creative stack

This launch aims directly at the friction points creators and platforms have been wrestling with over the past year: coherence across long-form video, real-time dubbing that keeps performance intact, translation that preserves pacing and intent, and software copilots that understand entire projects rather than isolated files. Nvidia positions CPX as the silicon that lets those workflows run continuously instead of in stitched fragments.

Key points from the announcement:

  • Context continuity: maintaining tone, style, and timing across hour-scale sequences.
  • Throughput for media-heavy inference: dedicated codecs on-die to keep frames flowing while the transformer reasons.
  • Scheduling for long sequences: new runtime features in CUDA and TensorRT to keep latency predictable as context grows.

None of this is about training frontier models. CPX is squarely an inference story. In a market that has been focused on peak training FLOPs, Nvidia is carving out a lane for the creative runtime, where latency, memory bandwidth, and I/O scheduling matter as much as peak math.

Competitive and industry context

Rubin CPX arrives in a cycle where long-context AI is quickly becoming table stakes for creative and coding tools. Hardware is adapting accordingly: disaggregated designs, memory-centric scaling, and media engines as first-class citizens. With CPX, Nvidia extends its position in the inference stack by making video and audio throughput integral rather than bolted on.

The open question is software. How quickly creative tools and cloud platforms expose the full CPX feature set, including massive context sessions, long-sequence schedulers, and inline video pipelines, will determine how fast these capabilities show up in everyday editing and localization workflows.

Availability, rollout, and what to watch

Rubin CPX and the Vera Rubin NVL144 CPX platform are slated for early access with select partners before broader availability near the end of 2026, according to Nvidia’s announcement materials (press release). Pricing was not disclosed. Expect the rollout to follow Nvidia’s typical pattern: hyperscalers and major platforms first, with ISVs tapping CUDA and TensorRT updates as they land.

Timeline and notes
Now: Product announced, software enablement underway in CUDA and TensorRT
2025 to 2026: Partner integrations and rack-scale pilots for Vera Rubin NVL144 CPX
Late 2026: Expected wider availability for data center deployments

The signal in the noise

Rubin CPX is not a spec-sheet flex just for the sake of it. The point is coherence at length. For the creator economy, that is the difference between AI that can produce a convincing clip and AI that can carry a narrative. By pulling media IO and long-sequence inference onto a single, workflow-native device, then scaling it with a rack that treats memory like a first-class resource, Nvidia is betting that the next wave of generative video and audio will be created in uninterrupted, hour-scale sessions.

If Nvidia’s projections around cost and throughput hold up in production, long-form generative content moves from demo to default. That is the story Rubin CPX is trying to write. Whether the industry reads along will depend on software support, access, and how quickly platforms ship features that make those long contexts invisible to the user.