OpenCoder is an open-source code language model built on a transparent data processing pipeline and reproducible datasets, with the aim of advancing research on code intelligence. Its data preparation process includes deduplication and filtering, and yields a high-quality dataset for pretraining.
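To make the deduplication and filtering steps concrete, here is a minimal Python sketch of what such a stage could look like. It is not OpenCoder's actual pipeline: the function names (`content_hash`, `deduplicate`, `passes_filters`, `prepare`), the exact-match hashing strategy, and the thresholds are all assumptions chosen for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash a file after light normalization (hypothetical normalization rule)."""
    # Strip trailing whitespace per line so trivially different copies collide.
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(files):
    """Exact deduplication: keep the first file seen for each content hash."""
    seen = set()
    unique = []
    for text in files:
        digest = content_hash(text)
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def passes_filters(text, max_line_length=1000, min_alnum_fraction=0.25):
    """Heuristic quality filter; both thresholds are illustrative assumptions."""
    if not text.strip():
        return False  # drop empty files
    if max(len(line) for line in text.splitlines()) > max_line_length:
        return False  # very long lines often indicate minified or generated code
    alnum = sum(ch.isalnum() for ch in text)
    return alnum / len(text) >= min_alnum_fraction


def prepare(files):
    """Deduplicate first, then apply the heuristic filter."""
    return [text for text in deduplicate(files) if passes_filters(text)]
```

Running `prepare` on a list of source-file strings returns the unique files that pass both checks, in their original order. Deduplicating before filtering means each distinct file is scored only once.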