Textembedding-gecko@003 Description
Basic Information
- Model Name: Textembedding-gecko@003
- Developer/Creator: Google
- Release Date: April 2024
- Version: 003
- Model Type: Text Embedding
Overview
Textembedding-gecko@003 is a state-of-the-art text embedding model developed by Google, designed to generate high-quality vector representations of text. This model excels in capturing semantic meanings and relationships between textual inputs, making it suitable for various natural language processing tasks.
Key Features
- Embedding Dimensionality: Produces 768-dimensional vectors.
- Efficiency: Competes effectively with much larger models while remaining compact and fast.
- Performance: Optimized for both accuracy and speed in generating embeddings.
Intended Use
This model is intended for applications where understanding the contextual meaning of text is crucial, such as the following (a brief semantic-search sketch follows the list):
- Semantic search
- Text classification
- Clustering
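To make the semantic-search use case concrete, the sketch below ranks candidate documents against a query by cosine similarity between their embedding vectors. It assumes the embeddings have already been obtained from textembedding-gecko@003 (for example, via the API call shown in the Usage section); the helper names cosineSimilarity and rankBySimilarity are illustrative and not part of any SDK.

// Rank candidate documents against a query by cosine similarity.
// Embeddings are assumed to be 768-dimensional vectors from textembedding-gecko@003.
const cosineSimilarity = (a, b) => {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

// documents: [{ text, embedding }], queryEmbedding: number[]
const rankBySimilarity = (queryEmbedding, documents) =>
  documents
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryEmbedding, doc.embedding) }))
    .sort((a, b) => b.score - a.score);

The same similarity scores can also drive the other listed tasks: clustering (grouping documents whose pairwise similarity exceeds a threshold) or text classification (using the vectors as input features to a downstream classifier).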
Language Support
Textembedding-gecko@003 is primarily designed for English but can be adapted for other languages depending on the training data used.
Technical Details
Architecture
The model is based on a transformer architecture, which allows it to effectively process and understand complex language patterns and relationships.
Training Data
Textembedding-gecko@003 was trained on a diverse dataset comprising over 8 trillion tokens, including web text, books, and other textual sources. This extensive training enables the model to generalize well across various topics.
Data Source and Size
The training data includes a mix of structured and unstructured text, giving the model a broad understanding of language; its ability to generalize benefits directly from this vast and varied dataset.
Knowledge Cutoff
The model has a knowledge cutoff date of April 2024.
Diversity and Bias
Efforts were made to include a diverse range of sources to minimize biases. However, like all models, it may still reflect some biases present in the training data.
Performance Metrics
Textembedding-gecko@003 delivers strong performance across a range of natural language processing benchmarks.
Benchmark Performance
Massive Text Embedding Benchmark (MTEB)
- Average score of 66.31, outperforming models with up to 7 billion parameters despite using only 1.2 billion parameters itself.
Task-Specific Performance
- Text Classification: Average score of 81.17.
- Semantic Textual Similarity: Average score of 85.06.
- Summarization: Average score of 32.63.
- Retrieval Tasks: Average score of 55.70.
Zero-Shot Generalization
Textembedding-gecko@003 demonstrates strong zero-shot performance, generalizing effectively to unseen tasks and outperforming several competitive baselines.
Usage
Code Samples
The model is available on the AI/ML API platform as "textembedding-gecko@003".
const { OpenAI } = require('openai');

const main = async () => {
  // The OpenAI SDK is pointed at the AI/ML API base URL.
  const api = new OpenAI({ apiKey: '<YOUR_API_KEY>', baseURL: 'https://api.aimlapi.com/v1' });
  const text = 'Your text string goes here';
  const response = await api.embeddings.create({
    input: text,
    model: 'textembedding-gecko@003',
  });
  // Each item in response.data holds the embedding vector for the corresponding input.
  const embedding = response.data[0].embedding;
  console.log(embedding);
};

main();
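The OpenAI-compatible embeddings endpoint also accepts an array of strings as input, which is convenient when embedding many documents at once. Whether the AI/ML API imposes batch-size limits for textembedding-gecko@003 is not covered here, so treat the following as a sketch under that assumption rather than a guaranteed-supported call.

const { OpenAI } = require('openai');

const embedBatch = async (texts) => {
  const api = new OpenAI({ apiKey: '<YOUR_API_KEY>', baseURL: 'https://api.aimlapi.com/v1' });
  // Pass an array of strings to embed several documents in a single request.
  const response = await api.embeddings.create({
    input: texts,
    model: 'textembedding-gecko@003',
  });
  // response.data preserves input order; each item should carry a 768-dimensional vector
  // (per the dimensionality stated in Key Features).
  return response.data.map((item) => item.embedding);
};

embedBatch(['first document', 'second document']).then((vectors) => {
  console.log(vectors.length, vectors[0].length); // expected: 2 768
});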
API Documentation
Detailed API Documentation is available on the AI/ML API website, providing comprehensive guidelines for integration.
Ethical Guidelines
The development of Textembedding-gecko@003 adheres to ethical AI principles, focusing on transparency, fairness, and accountability in its use and deployment.
Licensing
Textembedding-gecko@003 is available under a permissive license that permits both commercial and non-commercial use.