Replit's powerful code completion model, replit-code-v1-3b, supports 20 languages.
Basic Information
replit-code-v1-3b is a 2.7B-parameter causal language model developed by Replit, Inc. The model focuses on code completion and was trained on 525B tokens drawn from a diverse dataset covering 20 programming languages, including Markdown, Java, JavaScript, Python, and more.
The model is intended to be used by anyone as a foundation for application-specific fine-tuning without strict limitations on commercial use.
The model supports 20 different programming languages: Markdown, Java, JavaScript, Python, TypeScript, PHP, SQL, JSX, reStructuredText, Rust, C, CSS, Go, C++, HTML, Vue, Ruby, Jupyter Notebook, R, Shell.
The model uses techniques such as Flash Attention and ALiBi positional biases to enable efficient training and inference on long input sequences.
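To make the ALiBi idea concrete, here is a minimal, self-contained sketch of how it biases attention: instead of adding positional embeddings to the inputs, each attention head subtracts a head-specific slope times the query-key distance from the raw attention scores. The function names below are illustrative, not part of any library, and the slope formula assumes a power-of-two head count as in the original ALiBi recipe.

```python
def alibi_slopes(n_heads):
    # Geometric sequence of per-head slopes: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    # Assumes n_heads is a power of two, as in the standard ALiBi setup.
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(seq_len, slope):
    # Causal bias matrix for one head: the score for query position i
    # attending to key position j (j <= i) is penalized by slope * (i - j),
    # so more distant tokens receive a larger negative bias.
    return [[-slope * (i - j) for j in range(i + 1)] for i in range(seq_len)]

# Example: with 8 heads, the first head has slope 0.5; for a 4-token
# sequence, its last row of biases is [-1.5, -1.0, -0.5, 0.0].
slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])
```

Because the bias depends only on relative distance, it extrapolates naturally to sequences longer than those seen in training, which is what makes ALiBi attractive for long-context code completion.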
The model's training data was passed through data-cleansing filters; users are still advised to exercise reasonable caution when deploying the model in production systems.
The model checkpoint and vocabulary file are licensed under the Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0). The source code files are licensed under the Apache 2.0 license.