Model Overview Card for DiscoLM Mixtral 8x7b
Basic Information
- Model Name: DiscoLM Mixtral 8x7b
- Developer/Creator: DiscoResearch, led by Björn Plüster
- Release Date: December 11, 2023
- Version: V2
- Model Type: Text Generation
Description
Overview
DiscoLM Mixtral 8x7b is a state-of-the-art language model designed for advanced text generation tasks. It leverages a sparse mixture of experts (MoE) architecture to optimize performance and efficiency, making it suitable for a wide range of natural language processing (NLP) applications.
Key Features
- Sparse Mixture of Experts (MoE) Architecture: 8 experts per layer, with two routed to each token; 46.7 billion parameters in total, of which only about 12.9 billion are active per token.
- High Performance: Strong results on standard NLP benchmarks (see Performance Metrics below).
- Multi-Language Support: Proficient in English, French, Spanish, Italian, and German.
- Extended Context Length: Supports a context length of up to 32,768 tokens (illustrated in the sketch after this list).
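The extended context window can be exercised through the same OpenAI-compatible endpoint shown later in the Code Samples section. The sketch below is a hedged example: the base URL and model id are borrowed from that sample, the ./long-report.txt path is a placeholder, and the document plus prompt must stay within the 32,768-token limit.

const fs = require('fs');
const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.aimlapi.com/v1',
  apiKey: '<YOUR_API_KEY>',
});

const summarize = async () => {
  // Placeholder file: any long text that, together with the prompt,
  // fits within the 32,768-token context window.
  const report = fs.readFileSync('./long-report.txt', 'utf8');
  const result = await api.chat.completions.create({
    model: 'DiscoResearch/DiscoLM-mixtral-8x7b-v2',
    messages: [
      { role: 'system', content: 'You summarize long documents accurately.' },
      { role: 'user', content: `Summarize the following report in five bullet points:\n\n${report}` },
    ],
    max_tokens: 512,
  });
  console.log(result.choices[0].message.content);
};

summarize();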
Intended Use
DiscoLM Mixtral 8x7b is designed for:
- Text generation and completion
- Conversational AI
- Content creation
- Language translation
- Advanced NLP research
Language Support
The model supports multiple languages, including:
- English
- French
- Spanish
- Italian
- German
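To illustrate the multi-language support, the minimal sketch below asks the model for an English-to-German translation over the same OpenAI-compatible endpoint used in the Code Samples section; the base URL and model id are taken from that sample and may differ for your provider.

const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.aimlapi.com/v1',
  apiKey: '<YOUR_API_KEY>',
});

const translate = async () => {
  const result = await api.chat.completions.create({
    model: 'DiscoResearch/DiscoLM-mixtral-8x7b-v2',
    messages: [
      { role: 'system', content: 'You are a precise translator.' },
      { role: 'user', content: 'Translate into German: "The weather is beautiful today."' },
    ],
  });
  console.log(result.choices[0].message.content);
};

translate();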
Technical Details
Architecture
DiscoLM Mixtral 8x7b employs a sparse mixture of experts (MoE) architecture: for each token, a router activates only a subset of the model's experts, so only a fraction of the total parameters is used per forward pass. This balances computational efficiency with high performance. The architecture is inherited from Mistral AI's Mixtral 8x7b base model and is optimized for causal language modeling.
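The toy sketch below is not the model's implementation; it only illustrates the routing idea behind a sparse MoE layer, selecting the two highest-scoring experts per token (consistent with the active-parameter figure above) with made-up experts, router scores, and dimensions.

// Toy illustration of top-2 expert routing in a sparse MoE layer.
// Experts, router logits, and dimensions are invented for demonstration.
const NUM_EXPERTS = 8;
const TOP_K = 2;

// Hypothetical experts: each maps a token vector to an output vector.
const experts = Array.from({ length: NUM_EXPERTS }, (_, i) =>
  (x) => x.map((v) => v * (i + 1) * 0.1)
);

// Softmax over the router's logits.
const softmax = (logits) => {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
};

const moeLayer = (token, routerLogits) => {
  const probs = softmax(routerLogits);
  // Keep only the two highest-scoring experts for this token.
  const topK = probs
    .map((p, i) => [p, i])
    .sort((a, b) => b[0] - a[0])
    .slice(0, TOP_K);
  const norm = topK.reduce((acc, [p]) => acc + p, 0);
  // Weighted sum of just the selected experts' outputs.
  return topK.reduce((out, [p, i]) => {
    const y = experts[i](token);
    return out.map((v, d) => v + (p / norm) * y[d]);
  }, new Array(token.length).fill(0));
};

// One 4-dimensional token and made-up router logits for the 8 experts.
console.log(moeLayer([1, 2, 3, 4], [0.1, 2.3, -0.5, 1.7, 0.0, 0.4, -1.2, 0.9]));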
Training Data
The model was fine-tuned on a diverse set of datasets, including:
- Synthia: A synthetic dataset designed for general NLP tasks.
- MetaMathQA: A dataset focused on mathematical problem-solving.
- Capybara: A comprehensive dataset for conversational AI.
Data Source and Size
The training data draws on a wide range of sources for robustness and diversity. Its exact size is not specified, but it includes substantial text from varied domains to improve the model's generalization.
Knowledge Cutoff
The model's knowledge is up-to-date as of December 2023.
Diversity and Bias
Efforts were made to include diverse datasets to minimize biases. However, as with any large language model, some biases may still be present due to the nature of the training data.
Performance Metrics
Key Performance Metrics
- ARC (25-shot): 67.32
- HellaSwag (10-shot): 86.25
- MMLU (5-shot): 70.72
- TruthfulQA (0-shot): 54.17
- Winogrande (5-shot): 80.72
- GSM8k (5-shot): 25.09
Comparison to Other Models
DiscoLM Mixtral 8x7b outperforms many contemporary models, including Meta's Llama 2 70B, on several of these benchmarks.
Speed
The MoE architecture keeps inference efficient: although all 46.7 billion parameters are held in memory, only about 12.9 billion are used for each token, which reduces per-token compute compared with a dense model of the same total size.
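A back-of-the-envelope calculation with the parameter counts from the Key Features section shows the effect; it ignores attention, memory bandwidth, and routing overhead, so treat it as a rough estimate rather than a measured speedup.

// Rough per-token compute comparison: active vs. total parameters.
// Figures are taken from the Key Features section above.
const totalParams = 46.7e9;   // all experts must be held in memory
const activeParams = 12.9e9;  // parameters actually used per token

const activeFraction = activeParams / totalParams;
console.log(`Per-token compute touches ~${(activeFraction * 100).toFixed(1)}% of the parameters`);
console.log(`Roughly ${(1 / activeFraction).toFixed(1)}x fewer parameter FLOPs per token than a dense 46.7B model`);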
Robustness
DiscoLM Mixtral 8x7b generalizes well across diverse inputs and maintains consistent performance across topics and its supported languages.
Usage
Code Samples
const { OpenAI } = require('openai');

// Point the OpenAI client at the API endpoint that hosts the model.
const api = new OpenAI({
  baseURL: 'https://api.aimlapi.com/v1',
  apiKey: '<YOUR_API_KEY>',
});

const main = async () => {
  // Chat-completion request against DiscoLM Mixtral 8x7b v2.
  const result = await api.chat.completions.create({
    model: 'DiscoResearch/DiscoLM-mixtral-8x7b-v2',
    messages: [
      {
        role: 'system',
        content: 'You are an AI assistant who knows everything.',
      },
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  // Print the assistant's reply.
  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
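If the endpoint supports the OpenAI-compatible streaming option (an assumption worth verifying with your provider), the same request can stream tokens as they are generated:

const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.aimlapi.com/v1',
  apiKey: '<YOUR_API_KEY>',
});

// Streaming variant of the request above.
const streamDemo = async () => {
  const stream = await api.chat.completions.create({
    model: 'DiscoResearch/DiscoLM-mixtral-8x7b-v2',
    messages: [{ role: 'user', content: 'Tell me, why is the sky blue?' }],
    stream: true,
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
  console.log();
};

streamDemo();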
Ethical Guidelines
The model should be used responsibly, considering potential biases and ethical implications. It is intended for research purposes and should not be used for harmful activities.
Licensing
DiscoLM Mixtral 8x7b is released under the Apache 2.0 license, allowing for both commercial and non-commercial use.