


A multimodal AI model that processes both text and images.
Gemma 3 (12B) is a multimodal large language model from Google that accepts text and image input and generates text. Its context window of roughly 131,000 tokens lets it reason over long inputs, such as lengthy documents, in a single pass. It is designed for flexible deployment, balancing performance and efficiency across a range of devices, from mobile phones to high-end workstations.
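To make the context window figure concrete, here is a minimal sketch of a pre-flight check that estimates whether a text prompt fits within about 131,000 tokens. The 4-characters-per-token ratio, the function names, and the reserved output budget are illustrative assumptions, not part of Gemma 3 or any Google API. An exact count would require the model's own tokenizer.

```python
# Context window size quoted in the description above.
CONTEXT_WINDOW_TOKENS = 131_000

# Assumption: a rough average for English text, NOT Gemma's actual tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count from the character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(prompt: str, reserved_for_output: int = 1_000) -> bool:
    """Return True if the estimated prompt size plus an output budget fits the window.

    reserved_for_output is a hypothetical budget for the generated reply.
    """
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

This estimate covers text only. Image inputs also consume tokens, so a multimodal prompt needs additional headroom.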
Google states a commitment to ethical AI development and to transparency about Gemma 3’s capabilities and limitations. The company advocates responsible use of the model to reduce the risk of misuse and of unintended harmful outputs.
Gemma 3 is offered under the Gemma Terms of Use, which permit both research and commercial use, subject to the usage restrictions set out in those terms.