Veo 3.1 Adds Vertical Video and End-Frame Control

Google Veo 3.1 is now live in AI Studio and the Gemini API, and the headline is simple: native 9:16 vertical video is here. For creators publishing to Shorts, Reels, TikTok, and stories, that one change brings Veo squarely into the most important frame of the modern internet. The update also tightens motion realism and narrative continuity, signaling Google’s most direct challenge yet to OpenAI’s Sora 2. You can review the official model details in the Google Developers Blog announcement.

Why this matters to creators right now

Vertical is no longer an afterthought; it’s where attention lives. Native 9:16 output means clips are composed for the vertical frame from the start rather than cropped from a wider one, so there are fewer compromises and fewer fixes in post. Framing is more likely to hold: faces stay in the shot, typography lands where you placed it, and brand assets survive the jump across platforms. For marketers and founders running lean, that can translate to faster approvals and fewer re-renders. For visual artists and storytellers, it means less technical wrangling and more focus on pacing, tone, and payoff.

What’s new in focus: format, finish, and feel

Three themes define this release: full-format flexibility, precise scene endings, and more believable motion. The net effect is a tool that doesn’t just create striking clips; it aims to land moments on cue and hold character across shots.

1. End Frames: Define your final shot. Lock in continuity, guide your story’s finish, and control how scenes land.
2. Enhanced Image-to-Video Fidelity: Experience richer realism, improved motion depth, and more consistent detail, frame to frame.
3. Smarter Physics & Expression: Refined human movement and emotion. Every gesture, every reaction now feels more alive and real.

Full-format flexibility: 9:16 vertical joins square and widescreen

Veo 3.1 adds native vertical (9:16) alongside square (1:1) and widescreen (16:9). That alignment with social-first canvases is more than convenience; it preserves creative intent from prompt to publish. Whether you’re shipping a branded story for a product drop or a character-led vignette, you can now originate in the format audiences will actually see, without cropping compromises.
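To put a number on those "cropping compromises", here is a small arithmetic sketch of how much of a frame survives a center crop from one aspect ratio to another. The function name is illustrative and not part of any Veo tooling; it assumes a plain center crop with no scaling tricks.

```python
from fractions import Fraction


def retained_fraction(source, target):
    """Share of the source frame's area kept by a center crop to the target ratio.

    Both arguments are (width, height) aspect-ratio pairs, e.g. (16, 9).
    """
    src = Fraction(*source)
    tgt = Fraction(*target)
    # A narrower target keeps full height and trims width;
    # a wider target keeps full width and trims height.
    return min(src, tgt) / max(src, tgt)


if __name__ == "__main__":
    for target in [(9, 16), (1, 1)]:
        kept = retained_fraction((16, 9), target)
        print(f"16:9 -> {target[0]}:{target[1]} keeps {float(kept):.1%} of the frame")
```

Cropping widescreen footage to vertical keeps 81/256 of the picture, a little under a third, which is why originating in 9:16 matters for framing.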

Sharper endings and continuity you can plan around

End-frame control gives editorial teams a new lever. Shots can conclude on a precise beat: brand mark in frame, character in a specific pose, or an environment resolved to a clean hold. In an era where AI video often dazzles but drifts, a defined end state becomes the glue that holds multi-shot stories together and makes transitions predictable in the timeline.
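One way to plan around a defined end state is to treat each shot's end frame as the next shot's starting image. The sketch below is a planning aid only: the `Shot` structure, `chain_shots`, and the file names are hypothetical and do not mirror Veo's actual request format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    prompt: str
    end_frame: str                      # image the shot should land on (hypothetical path)
    start_frame: Optional[str] = None   # filled in from the previous shot if left empty


def chain_shots(shots: List[Shot]) -> List[Shot]:
    """Return copies of the shots where each one opens on the previous end frame."""
    chained: List[Shot] = []
    previous_end: Optional[str] = None
    for shot in shots:
        # Respect an explicit start frame; otherwise inherit the prior ending.
        start = shot.start_frame if shot.start_frame is not None else previous_end
        chained.append(Shot(shot.prompt, shot.end_frame, start))
        previous_end = shot.end_frame
    return chained


if __name__ == "__main__":
    plan = chain_shots([
        Shot("Runner laces up at dawn", end_frame="frames/shoe_closeup.png"),
        Shot("Runner crosses the bridge", end_frame="frames/logo_hold.png"),
    ])
    for shot in plan:
        print(shot.start_frame, "->", shot.end_frame)
```

Laid out this way, every cut has a known image on both sides, which is what makes transitions predictable in the timeline.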

Motion that reads as intention, not accident

Improved image-to-video fidelity aims to keep seeded details and textures stable as scenes move. That steadiness matters for identity, wardrobe, logos, and products that must look the same from cut to cut. Alongside that, refinements to physics and expression are meant to reduce uncanny gestures, so reactions, eyelines, and micro-movements follow the emotional logic of the scene rather than breaking immersion.

Creator-facing capabilities remain intact

Veo’s feature set remains broad, with a tighter feel in this release. The model continues to support native audio generation, text-to-video, and multimodal creation that blends prompts with reference images or other inputs. The difference now is how consistently those tools honor framing and finish.

Still includes everything you love about Veo – seamless video and audio generation, text-to-video, and multi-modal creation – now sharper and more intuitive.

Key updates at a glance

Capability | What’s new in Veo 3.1
Aspect ratios | Native 9:16 vertical joins 1:1 and 16:9 for social-first and cross-platform delivery.
End-frame control | Define how shots conclude to improve continuity, timing, and editorial precision.
Image-to-video fidelity | Greater detail retention and frame-to-frame stability for identity, props, and environments.
Physics & expression | More grounded, lifelike motion and reactions, reducing uncanny moments in performance.
Audio + multimodal | Continues to support native audio, text-to-video, and mixed-reference inputs with a more intuitive feel.
Access | Available in Google AI Studio and via Gemini API, with examples and configuration details in Google’s docs.

Head-to-head: Sora 2 vs. Veo 3.1

This release is a clear signal that Google and OpenAI are competing directly for the workflows that matter to creators: short-form storytelling, brand-safe identity, and consistent, publishable results. Sora 2 has been pushing toward more realistic physics and synchronized audio; Veo 3.1 answers with native vertical output and explicit story endpoints.

For teams evaluating both, the question isn’t which model can generate a stunning 8-second clip, but which one offers repeatable control across a series: the same character across shots, the same product across environments, and the same framing across platforms. Veo’s end-frame handle and stability improvements are designed to reduce editorial entropy in those scenarios.

Signals for marketing and brand teams

  • Vertical as the master format: Campaign concepts can be originated in 9:16 and adapted outward, not the other way around.
  • Identity holds: Better adherence to seeded details makes character-driven and product-led stories more feasible at scale.
  • Predictable pacing: End-frame control helps align AI output with storyboard beats, easing approvals and re-cuts.

Access and documentation

Veo 3.1 is accessible through Google AI Studio and via the Gemini API, with configuration notes, examples, and operational guidance published by Google. The Gemini API video docs include example flows for handling scenes and dialogue-style prompts, clarifying how creative inputs map to the model’s output. For non-technical teams, the Google AI Studio page centralizes the practical controls and lets collaborators align on format and finish before a handoff.
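For developers, a vertical request through the Gemini API might look roughly like the sketch below, which uses the `google-genai` Python SDK. Treat it as a starting point under stated assumptions: the model ID is an assumption to confirm against Google's docs, `build_request` and `generate_clip` are illustrative helpers, and an API key is expected in the environment.

```python
import time

# Assumed model identifier; confirm the current name in the Gemini API docs.
MODEL_ID = "veo-3.1-generate-preview"


def build_request(prompt, aspect_ratio="9:16"):
    """Collect the request fields in one place (illustrative helper)."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "config": {"aspect_ratio": aspect_ratio},
    }


def generate_clip(prompt, out_path="clip.mp4", poll_seconds=10):
    """Submit a video job, wait for it to finish, and save the first result."""
    # Imported here so the helper above works without the SDK installed.
    from google import genai
    from google.genai import types

    request = build_request(prompt)
    client = genai.Client()  # picks up the API key from the environment
    operation = client.models.generate_videos(
        model=request["model"],
        prompt=request["prompt"],
        config=types.GenerateVideosConfig(**request["config"]),
    )
    # Video generation is a long-running operation, so poll until it is done.
    while not operation.done:
        time.sleep(poll_seconds)
        operation = client.operations.get(operation)

    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    video.video.save(out_path)
    return out_path


if __name__ == "__main__":
    # Prints the request only; call generate_clip(...) to actually render.
    print(build_request("A barista pours latte art, handheld close-up"))
```

Keeping the request fields in a plain dictionary makes it easy for a team to review format choices before any render time is spent.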

Editorial perspective: what to watch next

Two questions will define the next phase of this rivalry. First, how well do identity and framing hold over longer, multi-shot sequences? Second, how cleanly do these models integrate with existing editorial pipelines and asset libraries? Veo 3.1’s aim is to convert impressive demos into consistent sequences. If that promise holds, vertical storytelling could move from experiments to everyday deliverables, even on tight timelines and lean budgets.

Bottom line for creators

Veo 3.1’s native vertical format, end-frame control, and steadier motion are aimed squarely at the realities of today’s social-first distribution. For visual storytellers, brand builders, and indie founders, the change is less about raw capability and more about control: the difference between a cool clip and a finished piece that lands where you intended.