Powerful multimodal AI model for advanced visual and language processing tasks.
Llama 3.2 90B Vision Instruct Turbo is a large-scale multimodal AI model capable of processing both text and images. It represents Meta's first foray into multimodal AI, offering advanced visual reasoning capabilities alongside powerful language processing.
The model is designed for a wide range of applications, including:
The model supports multiple languages, making it suitable for multilingual tasks and applications.
Llama 3.2 90B Vision Instruct Turbo utilizes an optimized transformer architecture. For image processing, it employs separately trained image reasoning adaptor weights that are integrated with the core LLM weights through cross-attention.
The model demonstrates strong performance across various benchmarks:
Llama 3.2 90B Vision Instruct Turbo competes with leading models like Claude 3 Haiku and GPT-4o-mini in image recognition and visual understanding tasks.
The model includes a new Llama guard safety model to ensure responsible and ethical use.
The Llama 3.2 models, including all associated multimodal capabilities, are governed by a specific licensing agreement that restricts commercial use within Europe. According to the Llama 3.2 Acceptable Use Policy, individuals or organizations based in the European Union are not granted rights to utilize these models for commercial purposes. This restriction is crucial for developers and organizations considering the deployment of Llama 3.2 models in their applications within the EU.
For more detailed information on the acceptable use and licensing terms, please refer to the Llama 3.2 Use Policy.