LLaVA-NeXT: a multimodal chatbot combining language and vision for diverse AI applications.
Model Name: LLaVA v1.6 - Mistral 7B
Developer/Creator: Haotian Liu
Release Date: December 2023
Version: 1.6
Model Type: Multimodal Language Model (Text and Image)
LLaVA v1.6 - Mistral 7B is an open-source, multimodal chatbot that combines a large language model with a pre-trained vision encoder. It excels in understanding and generating text based on both textual and visual inputs, making it ideal for a wide range of multimodal tasks.
LLaVA v1.6 - Mistral 7B is designed for multimodal tasks such as visual question answering, image captioning, and instruction-following dialogue grounded in images.
The model demonstrates strong multilingual capabilities, with improved bilingual support compared to earlier versions.
LLaVA v1.6 - Mistral 7B utilizes the Mistral-7B-Instruct-v0.2 language model as its backbone, a pre-trained vision encoder, and a special <image> token in prompts that marks where the visual input is inserted.
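To illustrate how the <image> token is used at inference time, here is a minimal sketch based on the Hugging Face transformers integration; the llava-hf/llava-v1.6-mistral-7b-hf checkpoint name and the image URL are assumptions for illustration, not part of this card.

```python
# Minimal inference sketch, assuming the Hugging Face transformers
# integration and the llava-hf/llava-v1.6-mistral-7b-hf checkpoint.
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Load any RGB image; the URL below is a placeholder.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# The Mistral chat template wraps the question in [INST] ... [/INST];
# the <image> token marks where the encoded image features are spliced in.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```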
The model was trained on a diverse dataset:
Data Source and Size: The training data comprises over 1.3 million diverse samples, including image-text pairs and instruction-following data.
Knowledge Cutoff: December 2023
Diversity and Bias: The training data spans a wide range of sources, which may reduce, but does not eliminate, bias in model outputs.
LLaVA v1.6 - Mistral 7B demonstrates strong performance across various benchmarks:
Accuracy: LLaVA v1.6 - Mistral 7B is competitive with similarly sized open multimodal models, scoring, for example, 35.3 on MMMU and 37.7 on MathVista.
Speed: Specific inference speed metrics are not provided, but at 7B parameters the model can run on a single modern GPU, particularly with half-precision or quantized weights (see the sketch after this list).
Robustness: The model demonstrates strong performance across multiple benchmarks and tasks, indicating good generalization capabilities.
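To make the efficiency point above concrete, the sketch below loads the model with 4-bit quantization via the bitsandbytes integration in transformers. The checkpoint name is the same assumed one as in the earlier example; treat this as an illustrative option for constrained GPUs, not an official recipe.

```python
# Hedged sketch: load the model with 4-bit quantized weights via bitsandbytes
# to reduce GPU memory use; assumes the llava-hf/llava-v1.6-mistral-7b-hf checkpoint.
import torch
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights in 4 bit, compute in fp16
)
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```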
While specific ethical guidelines are not detailed, users should adhere to responsible AI practices and consider potential biases in model outputs. The model should not be used for generating harmful or misleading content.
LLaVA v1.6 - Mistral 7B follows the licensing terms of the Mistral-7B-Instruct-v0.2 base model. Users should refer to the official licensing terms for specific usage rights and restrictions.