ShengShu Technology announced a Vidu Q1 model update introducing a multi-reference “Reference-to-Video” workflow that uses up to seven image inputs to guide AI video generation, aimed at improving continuity and visual control across shots. The company detailed the update in a PR Newswire announcement.

What the Multi-Reference Update Introduces

At the core of the update is the ability to upload and designate up to seven distinct reference images per sequence. According to the company, these inputs act as visual anchors for characters, wardrobe, props, and environments – elements that often “drift” in generative video as scenes evolve.

The announcement describes the update as introducing a multi-reference feature that supports up to seven image inputs, positioning the capability as a step toward more consistent, controllable AI video.

Vidu’s stated goal is to keep recurring details stable across shot changes and transitions – an area that has been challenging in AI video, especially for multi-character scenes, signature costumes, and branded settings. The update also emphasizes improved semantic understanding: when prompts call for actions or objects not depicted in the reference set, the model is designed to infer and add those elements while preserving the established look.

Why This Matters for Creative Teams

For creators working in narrative video, branded content, animation, and previsualization, continuity issues translate into time-consuming cleanup and compromised storytelling. The multi-reference update is framed as a bid to reduce that friction. For solo entrepreneurs and small teams, the ability to hold a character’s identity, a product’s design language, or a location’s visual markers steady from clip to clip could help shorten the path from concept to client-ready edits. For marketers, the potential to maintain brand assets and hero products across variants may support more consistent campaigns without heavy postproduction.

Feature Highlights at a Glance

Item                   Details
Multi-reference limit  Up to seven image inputs per sequence
Intended benefit       Improved character, prop, and background consistency across shots
Prompt semantics       Model can infer and introduce prompt-described elements not present in references
Availability           Rolling out in the Vidu Q1 model update
Access point           Reference-to-Video workflow within the Vidu platform

Positioning Within Vidu Q1

Vidu Q1 is ShengShu Technology’s current-generation model, emphasizing cinematic visuals and broader multimodal control. The company has previously highlighted a focus on continuity and transitions in its recent communications about Q1, alongside attention to audio fidelity. The multi-reference update fits into that trajectory by expanding the tools creators can use to assert control over key on-screen elements throughout a sequence.

Context: Recognition and Roadmap

ShengShu Technology’s broader momentum has also been in the spotlight. The company was named to the World Economic Forum’s 2025 Technology Pioneers list, with communications around the recognition underscoring efforts to push multimodal generation for visual storytelling and production. That nod situates the Vidu Q1 update in a landscape where global attention is on practical, creator-facing improvements to stability and control in AI media tools. Source: WEF Technology Pioneers announcement.

How the Update Aligns With Vidu’s Platform Direction

Vidu’s platform messaging has consistently focused on speed, visual quality, and accessibility for non-technical creators – attributes designed to make AI video viable for solo creators, indie studios, agencies, and in-house brand teams. The multi-reference feature complements that positioning by addressing a widely cited blocker in AI video pipelines: maintaining identity and scene integrity beyond a single shot.

For readers tracking the product’s evolution, Vidu’s public materials present a stack built to move from single-shot experiments toward multi-shot storytelling. The new multi-reference workflow appears to extend the system’s ability to carry a cast of characters, props, and backgrounds across cuts while still allowing prompt-level direction. Official product information: Vidu by ShengShu Technology.

Ecosystem and API Considerations

Beyond the web platform, ShengShu has promoted an API layer intended for enterprises and developers. While the multi-reference update announced here focuses on the Vidu platform itself, the company’s API communications describe integration paths for text-to-video, image-to-video, and reference-driven workflows. If those capabilities continue to align, multi-reference workflows could become relevant well beyond the browser – spanning brand asset managers, ad-tech creative engines, and production toolchains. Background: Vidu API announcement.
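To make the integration idea concrete, the sketch below assembles a request body for a reference-driven generation call and enforces the seven-image cap described in the announcement. This is a hypothetical illustration only: the field names, model identifier, and helper function are assumptions, not the documented Vidu API schema; consult the official API documentation for the real request format.

```python
# Hypothetical sketch of a multi-reference request payload.
# Field names and the "viduq1" model identifier are assumptions
# for illustration; the real Vidu API schema may differ.

MAX_REFERENCES = 7  # per the announcement, up to seven image inputs


def build_reference_to_video_payload(prompt: str, reference_urls: list[str]) -> dict:
    """Assemble a request body for a reference-driven generation call,
    enforcing the seven-reference upper bound."""
    if len(reference_urls) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} reference images are supported")
    return {
        "model": "viduq1",               # assumed model identifier
        "prompt": prompt,
        "images": list(reference_urls),  # visual anchors: characters, props, settings
    }


payload = build_reference_to_video_payload(
    "The hero walks through the branded storefront at dusk",
    ["https://example.com/hero.png", "https://example.com/storefront.png"],
)
```

The cap check mirrors the announcement’s framing of seven references as a pragmatic upper bound; a production client would also handle authentication and polling for the generated clip.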

Where It May Matter Most

As generative video moves from single shots to sequences, consistency underpins professional viability. Reported areas where multi-reference could be consequential include:

  • Brand storytelling: recurring product visuals and campaign motifs
  • Character-driven narratives: stable appearance across scene changes
  • Animation and previz: repeatable set pieces and backplates
  • Social and performance marketing: variant testing without losing visual identity

For creators, marketers, and early-stage founders, the practical implication is fewer continuity breaks across edits, potentially fewer pick-up shots or design fixes, and a tighter feedback loop between intent and output.

Additional Notes From the Announcement

The company emphasizes that the multi-reference system is meant to balance input flexibility with artistic cohesion. Seven references are positioned as a pragmatic upper bound for describing multi-character scenes, key props, and background context without overcomplicating the model’s interpretation. The stated semantic improvements are meant to keep the door open to new actions or objects added via prompt while the model maintains the visual logic learned from references.

Quick Comparison: Single vs. Multi-Reference Scenarios

Scenario                          Single Reference                      Multi-Reference (Up to 7)
Character identity across angles  More prone to drift between shots     Multiple angles increase identity stability
Prop and wardrobe consistency     May shift under lighting/transitions  Multiple anchors reinforce consistent details
Environment continuity            Backgrounds can vary between cuts     Backplates and setting references steady the scene
Introducing new prompt elements   Risk of visual mismatch               Semantic inference aims to blend additions into the set look

Availability and Access

Per the company, the multi-reference feature is now accessible within the Vidu platform’s Reference-to-Video workflow as part of the Vidu Q1 model update. The announcement frames the update as immediately relevant to creators building longer, more stable sequences while preserving prompt-based direction and image-driven guidance.

Company Perspective

ShengShu’s communications point to a broader ambition: bringing creator-focused controls to multimodal models so that narrative continuity, brand fidelity, and stylistic intent are preserved as scenes evolve. Recognition from industry observers and organizations has centered on these creator-facing outcomes rather than purely technical milestones, underlining a trend toward tools that reflect the needs of visual artists, storytellers, and marketing teams.

Key Takeaways

  • Vidu Q1’s multi-reference update supports up to seven image inputs for more consistent AI video.
  • Improved semantic understanding is intended to add prompt-described elements while respecting the look established by references.
  • The feature is positioned for creators and teams prioritizing continuity across sequences and edits.
  • The update is live in the platform’s Reference-to-Video workflow, with broader platform and API context continuing to evolve.