

Generate high-quality videos from images using xAI's Grok Imagine Video 1.5 Preview model via AIML API.
What exactly is Grok Imagine Video 1.5 Preview?
Grok Imagine Video 1.5 Preview is xAI's latest image-to-video generation model, designed to transform static images into smooth, high-fidelity video sequences. It builds on the original Grok Imagine Video with improved temporal coherence, motion quality, and prompt adherence — delivering cinematic output from a single image and a text description.
API Pricing
* Video generation: $0.104 / second of generated video
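Because billing is per second of generated video, the cost of a job can be estimated before submitting it. The sketch below is only an illustration of that arithmetic: the function name `estimate_cost` and its parameters are hypothetical, and it assumes the listed rate is the only charge and that billing is strictly linear in duration.

```python
from decimal import Decimal

# Rate taken from the pricing line above: USD per second of generated video.
PRICE_PER_SECOND_USD = Decimal("0.104")


def estimate_cost(duration_seconds, clip_count=1):
    """Estimate the USD cost of generating `clip_count` clips.

    Hypothetical helper: assumes linear per-second billing with no
    minimum charge or rounding, which the pricing line does not specify.
    """
    if duration_seconds < 0 or clip_count < 0:
        raise ValueError("duration_seconds and clip_count must be non-negative")
    # Convert through str() so float inputs such as 2.5 stay exact.
    seconds = Decimal(str(duration_seconds))
    return PRICE_PER_SECOND_USD * seconds * clip_count


if __name__ == "__main__":
    print(f"One 5-second clip: ${estimate_cost(5)}")
    print(f"Three 10-second clips: ${estimate_cost(10, clip_count=3)}")
```

Using `Decimal` rather than floats keeps the currency arithmetic exact, which matters once estimates are summed across many clips.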
Architecture: what makes it work
Image-conditioned video synthesis
The model uses the input image as a visual anchor for the first frame. Subsequent frames are generated autoregressively while maintaining consistency with the original composition, lighting, and subject identity, producing natural-looking motion rather than a slideshow of unrelated frames.
Temporal coherence modeling
Grok Imagine Video 1.5 Preview is trained to preserve object continuity and scene geometry across frames. Motion is physically plausible: subjects move, deform, and interact with the scene in ways consistent with the source image rather than drifting arbitrarily.
Text-guided motion control
A text prompt specifies the type and direction of motion. The model combines visual grounding from the image with semantic direction from the prompt, allowing precise control over what moves, how it moves, and at what intensity.
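One way to keep motion prompts consistent is to build them from the three things the prompt is said to control: what moves, how it moves, and at what intensity. The following is a minimal sketch of that idea, not a documented prompt format. The `MotionSpec` class, the `compose_motion_prompt` function, and the resulting sentence layout are all assumptions made for illustration; the model accepts free-form text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionSpec:
    """Hypothetical container for the three aspects of a motion prompt."""
    subject: str                  # what moves (camera, person, water, ...)
    motion: str                   # how it moves
    intensity: str = "moderate"   # how strong the motion should be


def compose_motion_prompt(spec: MotionSpec) -> str:
    """Join the parts of a MotionSpec into one plain-text prompt sentence."""
    subject = spec.subject.strip()
    motion = spec.motion.strip()
    intensity = spec.intensity.strip()
    if not subject or not motion:
        raise ValueError("subject and motion must be non-empty")
    prompt = f"{subject} {motion}"
    if intensity:
        prompt += f", {intensity} intensity"
    return prompt + "."


if __name__ == "__main__":
    spec = MotionSpec("The camera", "pushes in slowly toward the lighthouse", "subtle")
    print(compose_motion_prompt(spec))
```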
Core capabilities
Image-to-video generation
Provide a source image and a motion description. The model animates the scene by adding camera movement, subject motion, environmental effects, or combinations of all three, while preserving the visual identity of the original.
High visual fidelity
Output maintains the resolution, color grading, and stylistic qualities of the input image. The model does not apply a generic visual filter; generated frames inherit the aesthetic of the source.
Short-to-medium clip output
Suited for product demos, social content, marketing animation, and visual prototyping. Generates consistent clips without external post-processing.
Who should use Grok Imagine Video 1.5 Preview?
Creative and marketing teams
Agencies and in-house teams that need to animate product photography, campaign visuals, or brand assets without a full video production pipeline.
Product and UI designers
Designers prototyping motion concepts, onboarding animations, or interactive mockups from static design exports.
Social media and content creators
Creators producing short-form video content at scale, where sourcing or shooting original footage is impractical.
Developers building visual AI products
Engineers adding video generation to creative tools, e-commerce platforms, or media applications, with a single API call and no video infrastructure required.
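To give a rough idea of what a single API call could look like, the sketch below assembles an HTTP POST request carrying a source image and a motion prompt. Everything specific here is an assumption rather than the documented interface: the endpoint URL, the model identifier, the JSON field names (`model`, `image_url`, `prompt`), and the Bearer-token header are placeholders to be replaced with the values from the AIML API reference. The code only builds the request and does not send it.

```python
import json
import urllib.request


def build_generation_request(endpoint_url, api_key, model_id, image_url, prompt):
    """Build (but do not send) a POST request for image-to-video generation.

    All field names and the auth scheme are hypothetical placeholders;
    consult the official API reference for the real request schema.
    """
    payload = {
        "model": model_id,        # assumed field name
        "image_url": image_url,   # assumed field name
        "prompt": prompt,         # assumed field name
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_generation_request(
        endpoint_url="https://example.invalid/video/generate",  # placeholder URL
        api_key="YOUR_API_KEY",
        model_id="hypothetical-model-id",
        image_url="https://example.invalid/product.png",
        prompt="The camera orbits the product slowly, subtle intensity.",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Once the real endpoint and schema are filled in, the request could be sent with `urllib.request.urlopen(request)`. Video generation services often return a job identifier to poll rather than the finished clip, so the response handling depends on the actual API.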