Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-20 04:58:22 +06:00)
Commit message ([run_slow] gpt_neox):

* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
* added head mask check to sdpa mask creation
* handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
* fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
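The core change the commit describes is routing GPT-NeoX attention through PyTorch's `torch.nn.functional.scaled_dot_product_attention`, while warning and falling back to the eager path when `output_attentions=True` or a `head_mask` is passed, since SDPA cannot return attention weights or apply a per-head mask. Below is a minimal sketch of that fallback pattern; the function name and signature are hypothetical and not taken from the actual transformers implementation.

```python
# Minimal sketch (not the transformers implementation) of the SDPA-with-eager-fallback
# pattern described in the commit message.
import warnings

import torch
import torch.nn.functional as F


def attention_forward(query, key, value, attention_mask=None, head_mask=None,
                      output_attentions=False, dropout_p=0.0, training=False):
    """query/key/value have shape [batch, num_heads, seq_len, head_dim]."""
    if output_attentions or head_mask is not None:
        # SDPA cannot return attention probabilities or apply a head mask,
        # so warn and use the explicit matmul + softmax ("eager") path.
        warnings.warn(
            "Falling back to eager attention: SDPA does not support "
            "output_attentions=True or a head_mask."
        )
        scale = query.size(-1) ** -0.5
        attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scale
        if attention_mask is not None:
            attn_weights = attn_weights + attention_mask  # additive mask
        attn_weights = torch.softmax(attn_weights, dim=-1)
        if head_mask is not None:
            attn_weights = attn_weights * head_mask
        attn_weights = F.dropout(attn_weights, p=dropout_p, training=training)
        attn_output = torch.matmul(attn_weights, value)
        return attn_output, attn_weights

    # SDPA path: dispatches to flash / memory-efficient kernels when available.
    attn_output = F.scaled_dot_product_attention(
        query, key, value,
        attn_mask=attention_mask,
        dropout_p=dropout_p if training else 0.0,
    )
    return attn_output, None
```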
Files in this directory:

* __init__.py
* test_modeling_gpt_neox.py