
# dots.llm1

## Overview

The dots.llm1 model was proposed in the *dots.llm1 technical report* by the rednote-hilab team.
The abstract from the report is the following:
Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on high-quality corpus and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.
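Because the checkpoints are released on the Hugging Face Hub, the model can be loaded through the standard `AutoModelForCausalLM` interface. The sketch below is a minimal generation example; the repository id `rednote-hilab/dots.llm1.inst` is an assumption, so replace it with the checkpoint you actually want to use.

```python
# Minimal generation sketch. The Hub id below is an assumption; swap in the
# checkpoint you intend to load (e.g. a base or instruct variant).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rednote-hilab/dots.llm1.inst"  # assumed Hub id for the instruct checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # MoE: only ~14B of the 142B parameters are active per token
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain mixture-of-experts routing in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```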
## Dots1Config

[[autodoc]] Dots1Config
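A randomly initialized model can also be built directly from the configuration, which is useful for quick shape or integration checks without downloading weights. The sketch below keeps the default `Dots1Config` values; note that the defaults may correspond to the full-size model, so this can be memory-hungry.

```python
# Build a randomly initialized dots.llm1 model from its configuration.
# No pretrained weights are loaded; default hyperparameters are used,
# which may match the full 142B-parameter architecture.
from transformers import Dots1Config, Dots1Model

config = Dots1Config()
model = Dots1Model(config)

print(model.config)  # inspect the resolved hyperparameters
```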
## Dots1Model

[[autodoc]] Dots1Model
    - forward

## Dots1ForCausalLM

[[autodoc]] Dots1ForCausalLM
    - forward