Powerful multimodal AI model for advanced visual and language processing tasks.
Llama 3.2 90B Vision Instruct Turbo is a large-scale multimodal AI model capable of processing both text and images. It represents Meta's first foray into multimodal AI, offering advanced visual reasoning capabilities alongside powerful language processing.
The model is designed for a wide range of applications, including:
The model supports multiple languages, making it suitable for multilingual tasks and applications.
Llama 3.2 90B Vision Instruct Turbo utilizes an optimized transformer architecture. For image processing, it employs separately trained image reasoning adaptor weights that are integrated with the core LLM weights through cross-attention.
The model demonstrates strong performance across various benchmarks:
Llama 3.2 90B Vision Instruct Turbo competes with leading models like Claude 3 Haiku and GPT-4o-mini in image recognition and visual understanding tasks.
The model includes a new Llama guard safety model to ensure responsible and ethical use.
Llama 3.2 90B Vision Instruct Turbo is available under the Llama 3.2 Community License, which allows for fine-tuning and specific applications while maintaining certain restrictions.