
# dots.llm1

## Overview

The dots.llm1 model was proposed in the *dots.llm1 technical report* by the rednote-hilab team.
The abstract from the report is the following:
Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on high-quality corpus and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.
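Because the checkpoints are released on the Hugging Face Hub, the model can be loaded through the standard `AutoModelForCausalLM` interface. The sketch below is a minimal generation example; the repository id `rednote-hilab/dots.llm1.inst` is an assumption, so replace it with the checkpoint you actually want to use.

```python
# Minimal generation sketch. The Hub id below is an assumption; swap in the
# checkpoint you intend to load (e.g. a base or instruct variant).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rednote-hilab/dots.llm1.inst"  # assumed Hub id for the instruct checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # MoE: only ~14B of the 142B parameters are active per token
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain mixture-of-experts routing in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```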
## Dots1Config

[[autodoc]] Dots1Config
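A randomly initialized model can also be built directly from the configuration, which is useful for quick shape or integration checks without downloading weights. The sketch below keeps the default `Dots1Config` values; note that the defaults may correspond to the full-size model, so this can be memory-hungry.

```python
# Build a randomly initialized dots.llm1 model from its configuration.
# No pretrained weights are loaded; default hyperparameters are used,
# which may match the full 142B-parameter architecture.
from transformers import Dots1Config, Dots1Model

config = Dots1Config()
model = Dots1Model(config)

print(model.config)  # inspect the resolved hyperparameters
```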
## Dots1Model

[[autodoc]] Dots1Model
    - forward

## Dots1ForCausalLM

[[autodoc]] Dots1ForCausalLM
    - forward