Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-31 02:02:21 +06:00
chore(pixtral): emit block attention mask when using flash attention (#38741)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* chore(pixtral): emit block attention mask when using flash attention

  Since flash_attention_2 relies solely on position_ids, skipping the block attention mask avoids unnecessary memory usage and prevents OOM on large inputs.

* remove unnecessary attention_mask assignment
parent 60d4b35b20
commit 1dcb022e8f
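To make the memory claim above concrete, here is a minimal sketch (the helper name and shapes are illustrative assumptions, not the transformers implementation) of what a dense block-diagonal attention mask over a packed patch sequence costs compared to the position_ids that flash_attention_2 relies on:

import torch

def dense_block_mask_sketch(block_lens, dtype=torch.float32):
    # Illustrative only: a dense additive mask over the packed sequence where
    # each token may attend only within its own image block. Its footprint
    # grows quadratically with the total number of packed patches, which is
    # exactly the allocation the commit skips when flash_attention_2 is used.
    total = sum(block_lens)
    mask = torch.full((total, total), torch.finfo(dtype).min, dtype=dtype)
    start = 0
    for length in block_lens:
        mask[start : start + length, start : start + length] = 0.0
        start += length
    return mask

# Four packed 1024-patch images -> a 4096 x 4096 fp32 mask (~64 MB before any
# broadcasting over heads), while position_ids for the same input are just
# 4096 integers.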
@@ -214,7 +214,6 @@ class PixtralAttention(nn.Module):
         # Since we use packing, if flash_attention_2 is selected we rely on position_ids
         if self.config._attn_implementation == "flash_attention_2":
             kwargs["position_ids"] = kwargs["position_ids"].to(hidden_states.device, non_blocking=True)
-            attention_mask = None
 
         attn_output, attn_weights = attention_interface(
             self,
@@ -508,9 +507,13 @@ class PixtralVisionModel(PixtralPreTrainedModel):
 
         position_embeddings = self.patch_positional_embedding(patch_embeds, position_ids)
 
-        attention_mask = generate_block_attention_mask(
-            [p.shape[-2] * p.shape[-1] for p in patch_embeds_list], patch_embeds
-        )
+        if self.config._attn_implementation == "flash_attention_2":
+            # We only rely on position_ids when using flash_attention_2
+            attention_mask = None
+        else:
+            attention_mask = generate_block_attention_mask(
+                [p.shape[-2] * p.shape[-1] for p in patch_embeds_list], patch_embeds
+            )
 
         return self.transformer(
             patch_embeds,
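Why position_ids alone are enough here: with packed inputs they restart at 0 for each image, so per-image boundaries can be recovered from them without any dense mask. A minimal sketch under that assumption (the helper below is hypothetical and not the actual flash-attention integration), assuming 1-D packed position_ids:

import torch
import torch.nn.functional as F

def cu_seqlens_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: rebuild the cumulative sequence lengths that a
    # varlen flash-attention kernel consumes from position_ids that reset to 0
    # at every image boundary.
    position_ids = position_ids.flatten()
    starts = torch.nonzero(position_ids == 0, as_tuple=True)[0]
    ends = torch.cat([starts[1:], torch.tensor([position_ids.numel()], device=position_ids.device)])
    seq_lens = ends - starts
    return F.pad(torch.cumsum(seq_lens, dim=0), (1, 0)).to(torch.int32)

# Two packed images of 3 and 2 patches: position_ids = [0, 1, 2, 0, 1]
print(cu_seqlens_from_position_ids(torch.tensor([0, 1, 2, 0, 1])))  # tensor([0, 3, 5], dtype=torch.int32)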