

FastChat-T5 (3B) is an open-source chatbot model by LMSYS (the team behind Vicuna and FastChat), fine-tuned from Flan-T5 for diverse conversations.
Model Name: FastChat-T5 (3B)
Developer/Creator: LM-SYS (primarily Dacheng Li, Lianmin Zheng, and Hao Zhang)
Release Date: April 2023
Version: v1.0 (fastchat-t5-3b-v1.0)
Model Type: Text (Chatbot)
FastChat-T5 is designed for commercial chatbot applications and research in natural language processing. It can be used for generating responses in conversational agents and other NLP tasks.
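As a minimal sketch of how the model might be used for response generation, the snippet below loads it through the Hugging Face transformers seq2seq API. The model id `lmsys/fastchat-t5-3b-v1.0` is assumed from the Hugging Face Hub; the heavy download is kept behind a main guard so the helper can be defined without fetching weights.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed Hugging Face model id for FastChat-T5 (3B).
MODEL_ID = "lmsys/fastchat-t5-3b-v1.0"

def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Load FastChat-T5 and generate one response (downloads ~6 GB on first use)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_reply("What is an encoder-decoder model?"))
```

The same helper can back a conversational agent loop by concatenating prior turns into the prompt before each call.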
FastChat-T5 primarily supports English. Other languages may work, but with reduced accuracy, since the training data is predominantly English.
FastChat-T5 is based on an encoder-decoder transformer architecture. The encoder processes the input text bidirectionally, creating hidden representations. The decoder then uses cross-attention to focus on these representations while generating the response autoregressively from a starting token.
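The cross-attention step described above can be illustrated with a toy NumPy sketch: each decoder position scores all encoder hidden states, turns the scores into a probability distribution, and takes a weighted sum as its context vector. This is a simplified single-head version with identity projections (real models learn separate query/key/value matrices).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states):
    """Toy single-head cross-attention: decoder positions attend over encoder states."""
    d = encoder_states.shape[1]
    # Scaled dot-product scores: (dec_len, enc_len).
    scores = decoder_states @ encoder_states.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    context = weights @ encoder_states       # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))  # 5 source tokens, hidden size 8
dec = rng.standard_normal((3, 8))  # 3 target positions generated so far
ctx, w = cross_attention(dec, enc)
print(ctx.shape)  # (3, 8): one context vector per decoder position
```

During autoregressive decoding, this lookup is repeated at every step, so each newly generated token can draw on the full bidirectional encoding of the input.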
The training data reflects the conversations shared by users on ShareGPT, which may introduce certain biases. The diversity is limited to the topics and styles of interactions present in these conversations.
FastChat-T5 may inherit biases from the ShareGPT dataset. Users should be cautious of potential ethical issues, including biased or harmful outputs, and use the model responsibly.
FastChat-T5 is licensed under the Apache License 2.0, which allows for commercial and non-commercial use.