Model Overview Card for Llama-3 70B Gradient Instruct 1048k
Basic Information
- Model Name: Llama-3 70B Gradient Instruct 1048k
- Developer/Creator: Gradient
- Release Date: May 16, 2024
- Version: 1.0
- Model Type: Text-based LLM
Description
Overview
The Llama-3 70B Gradient Instruct 1048k model is a state-of-the-art, text-based large language model from Gradient AI that extends Llama-3 70B's context length from the original 8k tokens to over 1048k (roughly one million) tokens. This extended window lets the model reason over and generate coherent outputs from much larger inputs, making it suitable for applications that require deep understanding and long-range context retention.
Key Features
- Extended context length from 8k to over 1048k tokens
- Instruction-tuned for improved dialogue and chat abilities
- Minimal training data required (< 0.003% of Llama-3's original pre-training data)
- Progressive training on increasing context lengths for optimal performance (sketched below)
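To make the last point concrete, the sketch below represents progressive training as a list of stages with growing sequence lengths and token budgets. Only the final-stage budget (34M tokens) comes from this card; the stage lengths and the other token counts are placeholders for illustration, not the schedule Gradient actually used.

// Illustrative only: a hypothetical progressive context-extension schedule.
// Stage lengths and most token budgets are placeholders, not Gradient's figures.
const stages = [
  { contextLength: 65_536, trainTokens: 100_000_000 },
  { contextLength: 262_144, trainTokens: 150_000_000 },
  { contextLength: 524_288, trainTokens: 150_000_000 },
  { contextLength: 1_048_576, trainTokens: 34_000_000 }, // final stage: 34M tokens (from this card)
];

// Each stage resumes from the previous checkpoint and trains on longer
// sequences, so the model adapts gradually instead of jumping from 8k to ~1M tokens.
for (const { contextLength, trainTokens } of stages) {
  console.log(`stage: context ${contextLength} tokens, budget ${trainTokens} tokens`);
}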
Intended Use
This model is designed for various applications, including but not limited to:
- Document summarization
- Question answering systems
- Long-form content generation
- Autonomous agents for business operations
Technical Details
Architecture
The Llama-3 70B Gradient Instruct 1048k model is based on the decoder-only Transformer architecture of Llama-3 70B. The extended context window is achieved primarily by adjusting RoPE theta (the base frequency of the rotary position embeddings) and by progressive training on longer sequences, rather than by changing the underlying architecture.
Training Data
The model was trained on a total of approximately 430 million tokens, with 34 million tokens specifically for the final training stage. The data sources include augmented datasets from SlimPajama and UltraChat, ensuring a diverse range of contexts and styles.
Data Source and Size
- Total Training Tokens: ~430M
- Final Stage Tokens: 34M
- Fraction of Llama-3's Original Pre-training Data: less than 0.003% (Llama-3 was pre-trained on over 15 trillion tokens, so ~430M tokens is roughly 0.003% of that corpus).
Performance Metrics
- Context Length Evaluation: Capable of processing contexts up to 1,048,576 (1048k) tokens.
- Inference Speed: Optimized for real-time applications with high throughput.
Benchmarks
The Llama-3 70B Gradient Instruct 1048k demonstrates impressive performance on common industry benchmarks, outperforming many available open-source chat models. It also showcases the potential for SOTA LLMs to learn to operate on long contexts with minimal training by appropriately adjusting RoPE theta.
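As a rough sketch of the mechanism (standard rotary position embeddings, not Gradient's exact recipe): RoPE rotates each pair of head dimensions by position × theta^(-2i/d), so raising the base theta slows the rotations and keeps the angles at million-token positions in a range comparable to what the model saw at 8k. The snippet below only illustrates this scaling; the larger theta value and the chosen dimension pair are arbitrary examples, while 500,000 is Llama-3's published default base and 128 its head dimension.

// Sketch: how the RoPE base frequency (theta) affects per-position rotation angles.
// Standard RoPE uses inverse frequencies theta^(-2i/d) for each dimension pair i;
// a larger theta means slower rotation, so distant positions remain distinguishable.
function ropeAngle(position, dimPair, headDim, theta) {
  const invFreq = Math.pow(theta, -(2 * dimPair) / headDim);
  return position * invFreq; // rotation angle (radians) for this dimension pair
}

const headDim = 128;       // Llama-3 attention head dimension
const dimPair = 32;        // an arbitrary mid-range dimension pair
const position = 1_000_000;

// Llama-3's default base vs. a much larger, purely illustrative base.
for (const theta of [500_000, 3_000_000_000]) {
  console.log(`theta=${theta}: angle at position 1M = ${ropeAngle(position, dimPair, headDim, theta).toFixed(2)} rad`);
}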
Usage
Code Samples
The model is available on the AI/ML API platform as "gradientai/Llama-3-70B-Instruct-Gradient-1048k". The Node.js example below calls it through the OpenAI-compatible client:
// Node.js example using the OpenAI SDK pointed at the AI/ML API endpoint.
const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.aimlapi.com/v1',
  apiKey: '<YOUR_API_KEY>',
});

const main = async () => {
  const result = await api.chat.completions.create({
    model: 'gradientai/Llama-3-70B-Instruct-Gradient-1048k',
    messages: [
      {
        role: 'system',
        content: 'You are an SQL code assistant.',
      },
      {
        role: 'user',
        content: 'Could you please provide me with an example of a database structure that I could use for a project in MySQL?',
      },
    ],
  });

  // Print the assistant's reply.
  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
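Building on the sample above, a long-context request can pass an entire document in a single message rather than chunking it. The sketch below is illustrative: the file name, system prompt, and max_tokens value are placeholders, and the accepted input size should be confirmed in the AI/ML API documentation.

const fs = require('fs');
const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.aimlapi.com/v1',
  apiKey: '<YOUR_API_KEY>',
});

// Illustrative long-context usage: summarize a large document in one call.
// 'report.txt' is a placeholder file name.
const summarize = async () => {
  const document = fs.readFileSync('report.txt', 'utf8');

  const result = await api.chat.completions.create({
    model: 'gradientai/Llama-3-70B-Instruct-Gradient-1048k',
    messages: [
      { role: 'system', content: 'You summarize long documents accurately and concisely.' },
      { role: 'user', content: `Summarize the key points of the following document:\n\n${document}` },
    ],
    max_tokens: 1024, // cap the summary length; illustrative value
  });

  console.log(result.choices[0].message.content);
};

summarize();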
API Documentation
Detailed API Documentation is available on the AI/ML API website, providing comprehensive guidelines for integration.
Ethical Guidelines
The development of the Llama-3 70B Gradient Instruct 1048k model adheres to ethical AI principles, focusing on transparency, fairness, and accountability in its applications.
Licensing
The Llama-3 70B Gradient Instruct 1048k model is released under the Meta Llama 3 Community License, which permits both commercial and non-commercial use subject to the terms of that license.