# OLMo2

## Overview
The OLMo2 model is the successor to the OLMo model, which was proposed in *OLMo: Accelerating the Science of Language Models*.
The architectural changes from the original OLMo model to this model are as follows (see the sketch after this list):
- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.
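
The following is a minimal PyTorch sketch of this post-norm layout, not the actual `transformers` implementation; the class and argument names (`Olmo2StyleBlock`, `attn`, `mlp`) are hypothetical, and the attention and feedforward modules are assumed to be defined elsewhere:

```python
import torch
import torch.nn as nn


class Olmo2StyleBlock(nn.Module):
    """Illustrative decoder block showing OLMo2-style norm placement."""

    def __init__(self, hidden_size: int, attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.attn = attn
        self.mlp = mlp
        # RMSNorm replaces the standard LayerNorm (requires PyTorch >= 2.4).
        self.post_attention_norm = nn.RMSNorm(hidden_size)
        self.post_feedforward_norm = nn.RMSNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Norm is applied *after* each sublayer (post-norm), inside the
        # residual branch, rather than before it as in the original OLMo.
        x = x + self.post_attention_norm(self.attn(x))
        x = x + self.post_feedforward_norm(self.mlp(x))
        return x
```

Query/key normalization happens inside the attention module itself: the projected query and key tensors each pass through their own RMSNorm before attention scores are computed.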
This model was contributed by shanearora. The original code can be found here.
## Olmo2Config

[[autodoc]] Olmo2Config
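
As with other `transformers` configurations, you can instantiate a config and build a randomly initialized model from it, for example:

```python
from transformers import Olmo2Config, Olmo2Model

# Initialize an OLMo2 configuration with default values
configuration = Olmo2Config()

# Build a model (with random weights) from that configuration
model = Olmo2Model(configuration)

# The configuration is accessible from the model afterwards
configuration = model.config
```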
## Olmo2Model

[[autodoc]] Olmo2Model
    - forward
## Olmo2ForCausalLM

[[autodoc]] Olmo2ForCausalLM
    - forward
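
A typical generation example, using `allenai/OLMo-2-1124-7B` here only as an example checkpoint (substitute whichever OLMo2 checkpoint you intend to use):

```python
from transformers import AutoTokenizer, Olmo2ForCausalLM

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
model = Olmo2ForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B")

# Tokenize a prompt and generate a continuation
inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```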