# OLMo2
## Overview

The OLMo2 model is the successor of the OLMo model, which was proposed in [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838).

The architectural changes from the original OLMo model to this model are:

- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.

This model was contributed by [shanearora](https://huggingface.co/shanearora).
The original code can be found [here](https://github.com/allenai/OLMo/tree/main/olmo).
A minimal usage sketch is included after the API reference below.

## Olmo2Config

[[autodoc]] Olmo2Config

## Olmo2Model

[[autodoc]] Olmo2Model
    - forward

## Olmo2ForCausalLM

[[autodoc]] Olmo2ForCausalLM
    - forward
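
A minimal usage sketch with the generic Auto classes, as referenced in the overview. The checkpoint name `allenai/OLMo-2-1124-7B` is an assumption here; substitute whichever OLMo2 checkpoint you intend to use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; substitute any OLMo2 checkpoint from the Hub.
model_name = "allenai/OLMo-2-1124-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# SDPA is requested here; "flash_attention_2" is another supported backend
# if the flash-attn package is installed.
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="sdpa")

inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```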