[mistral] Fix FA2 attention reshape for Mistral Nemo (#32065)

* [mistral] Fix FA2 attention reshape

* [run-slow] mistral
commit 22f888b3fa
parent cd48553fc8
Author: Joshua Lochner
Date:   2024-07-19 11:19:35 +02:00 (committed via GitHub)

@@ -387,7 +387,7 @@ class MistralFlashAttention2(MistralAttention):
             is_causal=self.is_causal,
         )
-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size).contiguous()
+        attn_output = attn_output.reshape(bsz, q_len, self.num_heads * self.head_dim).contiguous()
         attn_output = self.o_proj(attn_output)
         if not output_attentions:
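
The change matters because Mistral Nemo is a model where `num_heads * head_dim` does not equal `hidden_size`, so reshaping the Flash Attention output to `hidden_size` raises a shape error. The sketch below reproduces that failure mode with NumPy and hypothetical, illustrative dimensions (the variable values are assumptions chosen only to make the mismatch visible, not Nemo's real config):

```python
import numpy as np

# Hypothetical dimensions where the attention width differs from the
# model width, as in Mistral Nemo.
bsz, q_len = 2, 4
num_heads, head_dim = 8, 16   # attention output width: 8 * 16 = 128
hidden_size = 160             # model width != num_heads * head_dim

# FA2 returns attention output shaped (bsz, q_len, num_heads, head_dim).
attn_output = np.zeros((bsz, q_len, num_heads, head_dim))

# Old code: reshaping to hidden_size fails whenever
# num_heads * head_dim != hidden_size.
old_reshape_failed = False
try:
    attn_output.reshape(bsz, q_len, hidden_size)
except ValueError:
    old_reshape_failed = True  # 128 values per token cannot fill 160

# Fixed code: reshape to the true attention width always works;
# the o_proj linear layer then maps num_heads * head_dim -> hidden_size.
fixed = attn_output.reshape(bsz, q_len, num_heads * head_dim)

print(old_reshape_failed, fixed.shape)
```

In other words, the patch defers the width change to `o_proj`, which is the layer actually parameterized to project from the attention width to the model width.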