Delivers optimized reasoning performance over text, image, audio, video, and PDF
Gemini 2.5 Flash‑Lite is Google DeepMind’s most cost-efficient model in the 2.5 lineup, optimized for high-throughput reasoning across text, image, audio, video, and PDF inputs. It supports a 1,048,576-token context window and generates up to 65,536 output tokens per call, which suits enterprise workflows requiring full-document analysis, long-horizon reasoning, or multi-turn state retention. With sub‑250ms latency to first token and ~480 tokens/sec generation speed, Flash‑Lite combines scale, speed, and affordability for production-grade deployment.
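As a rough illustration of what these figures mean for a latency budget, the sketch below estimates end-to-end response time from the time to first token and the generation speed quoted above. The function name and the simple linear model (constant generation rate, no network or queueing overhead) are assumptions made for illustration, not measured behavior.

```python
# Assumed figures taken from the description above: up to 0.25 s to the
# first token and roughly 480 generated tokens per second.
MAX_OUTPUT_TOKENS = 65_536


def estimate_response_seconds(output_tokens: int,
                              first_token_s: float = 0.25,
                              tokens_per_s: float = 480.0) -> float:
    """Estimate total response time as first-token latency plus generation time.

    Hypothetical helper: assumes a constant generation rate and ignores
    network, queueing, and prompt-processing effects.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return first_token_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    for n in (512, 4_096, MAX_OUTPUT_TOKENS):
        print(f"{n:>6} tokens: ~{estimate_response_seconds(n):.1f} s")
```

Under these assumptions, a maximum-length 65,536-token response would take roughly 137 seconds, so long generations are usually better served with streaming than with a single blocking call.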

Reliable AI for Multimodal Automation

Process and analyze large documents, codebases, and PDFs with structured output.

Automatically process and interpret combined text, audio, image, or video inputs.

Power bots and agents with real-time capabilities and web grounding across diverse formats.
How Flash‑Lite stacks up against the most relevant lightweight AI models in the market
Compared with Gemini 2.5 Flash, Flash‑Lite delivers lower latency (0.22s to first token), faster output (480 tokens/sec), and lower pricing, making it well suited to enterprise integrations and high-load workloads.
Learn more about Gemini 2.5 Flash.


Claude 4 Sonnet provides stronger general reasoning and alignment in dialogue, but Gemini 2.5 Flash‑Lite is faster and better suited for high-throughput document workflows thanks to its massive context window and real-time latency.
Learn more about Claude 4 Sonnet.
Command R+ supports longer generations and better JSON formatting, but Flash‑Lite wins on multimodal input, inference speed, and response latency.
Learn more about Command R+.


AI/ML API provides scalability, faster deployment, and access to 200+ advanced machine learning models without the need for extensive in-house expertise or infrastructure.
Our API allows seamless integration of powerful AI capabilities into your applications, regardless of your coding experience. Simply swap your API key to begin using the AI/ML API.
AI/ML API provides flexibility for business growth: you can scale resources by purchasing more tokens as needed, ensuring optimal performance and cost efficiency.
We offer flat, predictable pricing that we keep the lowest on the market and affordable for everyone, payable by card or cryptocurrency.

Visit AI Playground to quickly try Gemini 2.5 Flash‑Lite.
For more information about technical features, please refer to the Gemini 2.5 Flash‑Lite documentation.
import json

import requests

# Replace <YOUR_AIMLAPI_KEY> with your AI/ML API key.
# Values shown as "text" are placeholders to replace with your own content.
response = requests.post(
    "https://api.aimlapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer <YOUR_AIMLAPI_KEY>",
        "Content-Type": "application/json",
    },
    data=json.dumps({
        "model": "google/gemini-2.5-flash-lite-preview",
        "messages": [
            {
                "role": "user",
                "content": "text",  # placeholder: the prompt
                "name": "text",  # placeholder: participant name
            }
        ],
        "max_completion_tokens": 512,
        "max_tokens": 512,
        "stream": False,
        "stream_options": {
            "include_usage": True
        },
        "n": 1,
        "temperature": 1,
        "top_p": 1,
        "stop": "text",  # placeholder: stop sequence
        "frequency_penalty": 1,
        "prediction": {
            "type": "content",
            "content": "text"  # placeholder: predicted output
        },
        "presence_penalty": 1,
        "seed": 1,
        "response_format": {
            "type": "text"
        },
        "reasoning_effort": "low"
    }),
)
# Parse the JSON body of the response into a Python dict.
data = response.json()
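The endpoint path follows the OpenAI-style chat completions convention, so the reply would typically sit under choices[0].message.content and token counts under usage. That response shape is an assumption here, not something this page confirms, and the helper name extract_reply is hypothetical. A minimal sketch for reading the parsed data defensively:

```python
from typing import Any


def extract_reply(data: dict[str, Any]) -> str:
    """Return the text of the first choice.

    Hypothetical helper: assumes an OpenAI-style chat completions body and
    raises ValueError when the expected fields are missing.
    """
    choices = data.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {data!r}")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if not isinstance(content, str):
        raise ValueError("first choice has no text content")
    return content


# Hand-written sample in the assumed shape (not real API output).
sample = {
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5},
}

if __name__ == "__main__":
    print(extract_reply(sample))
    print(sample.get("usage", {}).get("total_tokens"))
```

Raising on an unexpected body, rather than returning an empty string, makes error responses (for example an invalid key or a rejected parameter) visible at the call site.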