Shelly Palmer

Midjourney Set to Release its First Video Model

Midjourney has introduced its first video generation model. Called “V1,” it creates 10-second, 24 fps clips from text, image, or mixed prompts. Early tests show support for dynamic motion, basic scene transitions, and a broad range of camera moves. Aspect ratios include 16:9, 1:1, and 9:16. The model was trained on a mix of image and video data.

This is not a photorealistic model. Founder David Holz says the goal is aesthetic control, not realism. Think art direction over live action.

The alpha is private. There’s no timeline for general access or pricing. Holz says they’re prioritizing safety and alignment before scaling.

Midjourney joins OpenAI, Google, and Runway in the text-to-video sprint. Each is approaching the medium with different training data, guardrails, and use cases. So far, only Google’s Veo 3 is ready for prime time (assuming you can tell your story by grouping scenes of 8 seconds or less)… but the race has really just begun.

Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.