Model Overview Card for FastChat-T5
Basic Information
Model Name: FastChat-T5 (3B)
Developer/Creator: LMSYS (primarily Dacheng Li, Lianmin Zheng, and Hao Zhang)
Release Date: April 2023
Version: v1.0 (fastchat-t5-3b-v1.0)
Model Type: Text generation (chatbot)
Description
Overview:
FastChat-T5 is an open-source chatbot model that enhances the Flan-T5-XL model (3B parameters) through fine-tuning on conversations collected from ShareGPT. It uses an encoder-decoder transformer architecture to generate responses to user inputs.
Key Features:
- Encoder-decoder transformer architecture
- Fine-tuned on 70,000 ShareGPT conversations
- Autoregressive response generation
- Optimized learning rate and warmup ratio for fine-tuning
- Licensed under Apache License 2.0
Intended Use:
FastChat-T5 is designed for commercial chatbot applications and research in natural language processing. It can be used for generating responses in conversational agents and other NLP tasks.
Language Support:
Primarily supports English. Other languages may work, but with reduced accuracy, since the training data is predominantly English.
Technical Details
Architecture:
FastChat-T5 is based on an encoder-decoder transformer architecture. The encoder processes the input text bidirectionally, creating hidden representations. The decoder then uses cross-attention to focus on these representations while generating the response autoregressively from a starting token.
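The encoder-decoder flow can be traced with the Hugging Face transformers library. The sketch below is illustrative only and uses the small public google/flan-t5-small checkpoint as a stand-in so it runs quickly; the same pattern applies to the FastChat-T5 checkpoint.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Small public T5 checkpoint used as a stand-in; FastChat-T5 shares the flow.
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("What is the capital of France?", return_tensors="pt")

# The encoder reads the whole input bidirectionally and produces one hidden
# state per input token.
with torch.no_grad():
    encoder_states = model.get_encoder()(**inputs).last_hidden_state
print(encoder_states.shape)  # (batch, input_length, hidden_size)

# The decoder cross-attends to those hidden states while emitting the response
# one token at a time from the decoder start token; generate() wraps that loop.
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```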
Training Data:
- Source: Conversations collected from ShareGPT.com
- Size: 70,000 conversations
- Diversity: Includes various types of conversational data, reflecting diverse scenarios and user interactions. However, it may contain biases inherent in the ShareGPT dataset.
Data Source and Size:
- Nature: User-shared conversations, processed into question-answer pairs (see the illustrative sketch after this list)
- Volume: 70,000 conversations
- Knowledge Cutoff: April 2023
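For illustration, the sketch below shows one way such conversations could be flattened into question-answer pairs; the record schema ("from", "value") and the pairing rule are assumptions, not the actual FastChat preprocessing code.

```python
# Illustrative only: flatten a multi-turn, ShareGPT-style conversation into
# question-answer pairs. Field names and pairing rule are assumptions.
conversation = [
    {"from": "human", "value": "What is an encoder-decoder model?"},
    {"from": "gpt", "value": "It encodes the input, then decodes a response."},
    {"from": "human", "value": "Name one example."},
    {"from": "gpt", "value": "T5 is a well-known encoder-decoder model."},
]

def to_qa_pairs(turns):
    """Pair each human turn with the assistant turn that immediately follows it."""
    pairs = []
    for prev, nxt in zip(turns, turns[1:]):
        if prev["from"] == "human" and nxt["from"] == "gpt":
            pairs.append({"question": prev["value"], "answer": nxt["value"]})
    return pairs

for pair in to_qa_pairs(conversation):
    print(pair)
```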
Diversity and Bias:
The training data reflects the conversations shared by users on ShareGPT, which may introduce certain biases. The diversity is limited to the topics and styles of interactions present in these conversations.
Performance Metrics
Accuracy:
- Task performance: FastChat-T5 generally outperforms the Dolly-V2-12B model on several task categories despite having far fewer parameters, for example scoring higher on generic, role-play, common-sense, and counterfactual questions.
Speed:
- Optimized for inference on GPU-enabled systems.
- Fine-tuning used a cosine learning-rate schedule with a warmup ratio of 0.03 (a schedule sketch follows this list).
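As a minimal sketch of that schedule, using the transformers helper get_cosine_schedule_with_warmup: the warmup ratio of 0.03 comes from this card, while the step count and base learning rate below are placeholders rather than the values used to train FastChat-T5.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 1_000                     # placeholder; depends on data and batch size
warmup_steps = int(0.03 * total_steps)  # warmup ratio of 0.03 from this card

params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for model parameters
optimizer = torch.optim.AdamW(params, lr=2e-5)  # base learning rate is assumed
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for _ in range(total_steps):
    optimizer.step()   # in real fine-tuning, loss.backward() would precede this
    scheduler.step()   # LR warms up linearly, then decays on a cosine curve
```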
Robustness:
- Handles diverse inputs well but shows limitations in programming and mathematical tasks, scoring lower in these areas compared to other models.
Usage
Code Samples
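A minimal inference sketch with the Hugging Face transformers library, assuming the published lmsys/fastchat-t5-3b-v1.0 checkpoint; the conversation template that FastChat applies when serving the model is omitted here for brevity.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "lmsys/fastchat-t5-3b-v1.0"  # published checkpoint on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = "Explain the difference between an encoder and a decoder."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The FastChat repository also provides a command-line chat interface (python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0), which applies the model's conversation template automatically.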
Ethical Considerations:
FastChat-T5 may inherit biases from the ShareGPT dataset. Users should be cautious of potential ethical issues, including biased or harmful outputs, and use the model responsibly.
License Type
FastChat-T5 is licensed under the Apache License 2.0, which allows for commercial and non-commercial use.