Cyril Vallez
9afb904b15
Refactor (and fix) gpt_neox (#35610)
...
* start a nice modular
* Update modular_gpt_neox.py
* Update modular_gpt_neox.py
* Update modular_gpt_neox.py
* Update modular_gpt_neox.py
* update
* Update modular_gpt_neox.py
* convert
* fix attribute
* fix attrs
* oops
* fix
* fix
* fix
* fix
* fix
* fix order to pass test (to check with the accelerate team)
* trigger CIs
* modular
* update
* up
* Update test_modeling_gpt_neox.py
* Update test_modeling_gpt_neox.py
* trigger CIs
* correctly pass arg
* simplify
* remove key warning
* update TP -> it's compatible since the view happens before
* trigger CIs
2025-02-04 11:18:43 +01:00
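The modular_gpt_neox.py file above uses the transformers "modular" mechanism: a short file declares the model's classes by inheriting from another model, and a converter script expands it into the full, standalone modeling_gpt_neox.py (the "convert" step in the list). A minimal sketch of the pattern; the class body here is illustrative, not the real file's contents:

```python
# Hypothetical modular file: inherit from an existing model and override
# only what differs; a converter then generates the flat modeling file.
from transformers.models.llama.modeling_llama import LlamaAttention


class GPTNeoXAttention(LlamaAttention):
    # The real GPT-NeoX attention differs from Llama's (e.g. a fused QKV
    # projection); those overrides would live here in the actual file.
    pass
```

Regeneration goes through the repo's converter script (utils/modular_model_converter.py), which is why several commits above touch only the modular file.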
Arthur
b912f5ee43
use torch.testing.assert_close instead to get more details about errors in CIs (#35659)
...
* use torch.testing.assert_close instead to get more details about errors in CIs
* fix
* style
* test_all
* revert for IBert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
2025-01-24 16:55:28 +01:00
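For reference, the change above swaps bare torch.allclose assertions for torch.testing.assert_close, whose failure message reports the number of mismatched elements and the greatest absolute and relative differences. A minimal illustration (the tensors are made up; the call intentionally fails to show the diagnostics):

```python
import torch

actual = torch.tensor([1.0, 2.0, 3.0])
expected = torch.tensor([1.0, 2.0, 3.1])

# Before: a boolean check fails with no detail about where or by how much.
#     assert torch.allclose(actual, expected)
# After: raises AssertionError reporting "Mismatched elements: 1 / 3" along
# with the greatest absolute and relative difference found.
torch.testing.assert_close(actual, expected)
```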
Arthur
2c47618c1a
🚨 All attention refactor 🚨 (#35235)
...
* refactor LlamaAttention
* minimal changes
* fix llama
* update
* modular gemmas
* modular nits
* modular updates
* nits
* simplify
* gpt2
* more modular and fixes
* granite
* modular modular modular
* nits
* update
* qwen2 + starcoder2
* mostly gemma2
* Update image_processing_auto.py
* fix
* Update modular_starcoder2.py
* fix
* remove all copied from attentions
* remove gcv
* make fix-copies
* oops
* oops 2.0
* fix some modulars + all copied from
* should be good now
* revert unwanted changes
* Update modeling_decision_transformer.py
* finish cleanup
* Update modeling_olmo.py
* consistency
* re-add gradient checkpointing attribute
* fix
* style
* make config necessary
* bis
* bis
* Update modeling_my_new_model2.py
* is_causal attr
* fix
* remove past kv return from decoder layer
* fix
* default rope config
* correctly fix rope config
* fix bias
* fix gpt2 attention output
* fix test
* fix inits
* fix default sdpa
* fix default sdpa implementation
* harmonize classes
* fix mistral
* fix sliding window models
* mixtral
* be more explicit
* style
* fix
* several fixes
* Update modeling_dbrx.py
* fix test
* olmo + phi
* rotary
* style
* phi
* phi again
* again
* kwargs
* Update test_modeling_common.py
* skip fx tracing tests
* Update modeling_utils.py
* gemma 2
* again
* Update modeling_recurrent_gemma.py
* gemma2
* granite
* style
* starcoder
* Update sdpa_attention.py
* switch args
* Update modeling_mllama.py
* fix
* cache type tests
* gpt2
* Update test_modeling_common.py
* fix
* consistency
* fix shape with encoder
* should be the last one
* tests non model
* most comments
* small oopsie
* be more explicit in modulars
* more explicit modulars
* CIs! it works locally
* add kwargs to _flash_attention_forward
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2024-12-18 16:53:39 +01:00
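The net effect of the refactor above is that models stop carrying per-backend attention subclasses and instead look up one attention callable by name. A simplified, self-contained sketch of that dispatch shape (in transformers the real registry is ALL_ATTENTION_FUNCTIONS, with "sdpa", "flash_attention_2", and later "flex_attention" entries):

```python
from typing import Callable
import torch

def eager_attention_forward(q, k, v, mask, scaling, dropout=0.0):
    # Plain softmax attention; the other backends share this signature.
    scores = (q @ k.transpose(-2, -1)) * scaling
    if mask is not None:
        scores = scores + mask
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# One registry instead of LlamaSdpaAttention / LlamaFlashAttention2 / ... classes.
ATTENTION_FUNCTIONS: dict[str, Callable] = {"eager": eager_attention_forward}

q = k = v = torch.randn(2, 4, 8, 16)  # (batch, heads, seq_len, head_dim)
out, attn = ATTENTION_FUNCTIONS["eager"](q, k, v, mask=None, scaling=16 ** -0.5)
```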
Cyril Vallez
d363e71d0e
🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#34858)
...
* update
* style
* fix missing args
* remove last trace of old rope classes
* remove deprecated copied from
* fix copies
* trigger CIs
* post rebase clean-up
* reverse mistral
* cleanup after dropping commits
* Add comment
2024-12-11 11:16:52 +01:00
Anton Vlasjuk
46df859975
[GPTNeoX] Flex Attention + Refactor (#34896)
...
* gpt neox flex attention + refactor
* some formatting
* small fix on dropout
* add assertion on flex attn test
* flaky CI :(
* add head mask support
* style
* handle dtype, replace torch where
* fixup flex with output attns
* code review and several other fixes
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* style
* remove unnecessary comment
* remove incorrect comment
* make flex attn check more agnostic to torch versions and centralized
* change peft input dtype check to value since q and k could be affected by other stuff like RoPE
* I forgot
* flaky
* code review and small fixes
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-04 14:48:28 +01:00
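The FlexAttention backend wired in above is PyTorch's torch.nn.attention.flex_attention (available from torch 2.5), which lets score modifications such as causal masking be fused into the attention kernel. A minimal usage sketch, independent of the GPT-NeoX integration:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, batch, head, q_idx, kv_idx):
    # Block attention to future positions by pushing their scores to -inf.
    return torch.where(q_idx >= kv_idx, score, torch.tensor(-float("inf")))

q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out = flex_attention(q, k, v, score_mod=causal)
```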
Joao Gante
186b8dc190
Tests: upgrade test_eager_matches_sdpa_generate (#34386)
2024-10-25 11:55:07 +01:00
Raushan Turganbay
65bb284448
Compile compatibility for decoder-only models (#32617)
...
* squash into one commit
* add qwen2-vl for rope standardization
* fix mistral compile
* fix qwen2-vl
* fix-copies
2024-09-09 10:59:04 +02:00
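For context, "compile compatibility" here means generation can run under torch.compile once the KV cache has static shapes. A usage sketch, assuming a transformers version with static-cache support (the checkpoint is just an example GPT-NeoX-family model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

# A fixed-shape ("static") KV cache gives the compiler stable tensor shapes.
model.forward = torch.compile(model.forward, mode="reduce-overhead")
inputs = tok("Hello,", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, cache_implementation="static")
print(tok.decode(out[0]))
```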
Anton Vlasjuk
605f3245dc
Fix mask creation of GPTNeoX and GPT2 (#31944)
...
* fix mask creation of gpt2 and gpt_neox caused by me
* forgot the reshape of masks when shape > 2
* add tests for gpt neox and gpt2
* nit on a comment
2024-07-23 10:11:12 +02:00
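The reshape bug above concerns expanding a 2D padding mask into the 4D additive form attention kernels expect. An illustrative helper (not the actual transformers code):

```python
import torch

def expand_padding_mask(mask_2d: torch.Tensor, dtype: torch.dtype = torch.float32):
    # (batch, key_len) of 1s/0s -> (batch, 1, 1, key_len), additive form:
    # 0.0 where attending is allowed, a large negative value at padding.
    mask_4d = mask_2d[:, None, None, :].to(dtype)
    return (1.0 - mask_4d) * torch.finfo(dtype).min

mask = torch.tensor([[1, 1, 1, 0]])          # last position is padding
print(expand_padding_mask(mask).shape)       # torch.Size([1, 1, 1, 4])
```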
Anton Vlasjuk
b07770c5eb
[GPT-NeoX] Add SDPA support (#31031)
...
* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
added head mask check to sdpa mask creation
handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
* [run_slow] gpt_neox
2024-06-26 13:56:36 +01:00
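"SDPA support" means GPT-NeoX can route attention through torch.nn.functional.scaled_dot_product_attention, which dispatches to a fused kernel (FlashAttention, memory-efficient, or plain math) at runtime. A sketch of both the user-facing switch and the underlying call (the checkpoint name is just an example):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

# Opt in at load time (the default in recent versions when available):
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m", attn_implementation="sdpa"
)

# Inside the layer, the attention computation reduces to roughly this call:
q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)
```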
Arthur
673440d073
update ruff version (#30932)
...
* update ruff version
* fix research projects
* Empty
* Fix errors
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-05-22 06:40:15 +02:00
Joao Gante
441de62f49
RoPE models: add numerical sanity-check test for RoPE scaling (#29808)
...
* add hard rope scaling test
* make fixup
* quick rope scaling tests
* add copy statements
2024-03-28 11:25:50 +00:00
Arthur
83f9196cc4
[GPTNeoX] Fix BC issue with 4.36 (#28602)
...
* fix dtype issue
* add a test
* update copied from mentions
* nits
* fixup
* fix copies
* Apply suggestions from code review
2024-01-21 17:01:19 +00:00
Yih-Dar
bd90cda9a6
CI with num_hidden_layers=2 🚀 🚀 🚀 (#25266)
...
* CI with layers=2
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-08-02 20:22:36 +02:00
Joao Gante
34d9409427
Llama/GPTNeoX: add RoPE scaling (#24653)
...
* add rope_scaling
* tmp commit
* add gptneox
* add tests
* GPTNeoX can now handle long inputs, so the pipeline test was wrong
* Update src/transformers/models/open_llama/configuration_open_llama.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove ntk
* remove redundant validation
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-13 16:47:30 +01:00
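The rope_scaling option added above is a config dict selecting a scaling strategy and factor, letting RoPE models handle contexts longer than their pretraining length. A usage sketch with linear scaling (the checkpoint name is just an example):

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("EleutherAI/pythia-70m")
# Stretch position indices so the model accepts ~2x its pretraining context.
config.rope_scaling = {"type": "linear", "factor": 2.0}

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m", config=config)
```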
Arthur
e5c760d636
[GPTNeoX] Nit in config (#24349)
...
* add a ValueError for invalid attention size
* nits to fix test_config
* style
2023-06-20 19:19:19 +02:00
Yih-Dar
dadc9fb427
Update GPTNeoXLanguageGenerationTest (#24193)
...
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-12 15:37:12 +02:00
Yih-Dar
ffad4f1373
Update tiny models and pipeline tests (#23446)
...
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-05-18 17:29:04 +02:00
peter-sk
83b38fbea8
GPTNeoXForQuestionAnswering (#23059)
...
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* initial commit
* formatting
* adding the class to many places
* towards less unhappy checks
* nearly there
* and gpt neox for qa
* use right model
* forgot this one
* base_model_prefix is "gpt_neox" for GPTNeoX* models
* unnecessary stuff
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* format
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* removed gpt2 stuff
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-04 10:15:15 -04:00
peter-sk
614e191c4d
added GPTNeoXForTokenClassification (#23002)
...
* initial commit
* added GPTNeoXForTokenClassification
* typo
* doc
fixed extra comma that turned into a tuple
* unifying variable names
fixing forward call
* classifier_dropout is in config
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-04-27 11:08:26 -04:00
Sugawara
6daa9cb515
add GPTNeoXForSequenceClassification (#22671)
...
* add GPTNeoXForSequenceClassification
* move the labels to logits.device (ref: #22561)
* fix
2023-04-10 11:52:23 -04:00
Joao Gante
7dcd8703ef
Generate: support for left-padding on GPTNeoX and Llama (#22382)
2023-03-27 15:48:23 +01:00
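Left-padding matters for decoder-only models because generated tokens are appended on the right; padding therefore has to sit on the left, flagged by the attention mask. A sketch of the user-facing recipe this PR makes correct (the checkpoint name is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
tok.padding_side = "left"          # pad on the left when batching for generate()
tok.pad_token = tok.eos_token      # GPT-NeoX tokenizers define no pad token

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
batch = tok(["short prompt", "a noticeably longer prompt"],
            return_tensors="pt", padding=True)
out = model.generate(**batch, max_new_tokens=10)
```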
Joao Gante
0fa46524ac
Generate: Add GPTNeoX integration test (#22346)
2023-03-24 11:33:16 +00:00
Joao Gante
502fec779b
Generate: add test for left-padding support (#22322)
2023-03-23 17:00:22 +00:00
Yih-Dar
871c31a6f1
🔥 Rework pipeline testing by removing PipelineTestCaseMeta 🚀 (#21516)
...
* Add PipelineTesterMixin
* remove class PipelineTestCaseMeta
* move validate_test_components
* Add for ViT
* Add to SPECIAL_MODULE_TO_TEST_MAP
* style and quality
* Add feature-extraction
* update
* raise instead of skip
* add tiny_model_summary.json
* more explicit
* skip tasks not in mapping
* add availability check
* Add Copyright
* A way to disable irrelevant tests
* update with main
* remove disable_irrelevant_tests
* skip tests
* better skip message
* better skip message
* Add all pipeline task tests
* revert
* Import PipelineTesterMixin
* subclass test classes with PipelineTesterMixin
* Add pipeline_model_mapping
* Fix import after adding pipeline_model_mapping
* Fix style and quality after adding pipeline_model_mapping
* Fix one more import after adding pipeline_model_mapping
* Fix style and quality after adding pipeline_model_mapping
* Fix test issues
* Fix import requirements
* Fix mapping for MobileViTModelTest
* Update
* Better skip message
* pipeline_model_mapping could not be None
* Remove some PipelineTesterMixin
* Fix typo
* revert tests_fetcher.py
* update
* rename
* revert
* Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests
* style and quality
* test fetcher for all pipeline/model tests
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-02-28 19:40:57 +01:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting (#21480)
...
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
2023-02-06 18:10:56 -05:00
Yih-Dar
14fb8a63b9
skip some gpt_neox tests that require 80G RAM (#17923)
...
* skip some gpt_neox tests that require 80G RAM
* remove tests
* fix quality
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 09:04:38 -04:00
Jason Phang
205bc4152c
Fix GPT-NeoX-20B past handling, attention computation (#17811)
...
* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs
* 20B tests
2022-06-30 08:47:40 -04:00
Sylvain Gugger
fdb120805c
Fix cache for GPT-Neo-X (#17764)
...
* Fix cache for GPT-Neo-X
* Add more tests
2022-06-20 08:43:36 -04:00
Jason Phang
71e602725b
[WIP] Adding GPT-NeoX-20B (#16659)
...
* initial
* first try
* working 20B
* 20B tokenizers
* Docs
* Import fixes for missing classes
* Update docs, fixup
* black formatting
* isort
* flake
* dummy objects
* documentation
* Documentation yml
* more docs
* tweaks for tests
* tokenization auto
* fix neox tests
* test
* test
* einsum
* address PR feedback
* Documentation
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_neox/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_neox/configuration_gpt_neox.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove undefined LaTeX syntax
* Update to full url to avoid confusion about if that's supposed to refer to the Hub
* fix auto
* move tests
* documentation fix
* more doc fixes
* test refactor
* fix import
* fix import
* fix import
* fix import
* fix import
* style fixes
* More modeling fixes
Co-authored-by: Jason Phang <zp489@gr057.hpc.nyu.edu>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-24 09:31:10 -04:00
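For completeness, the model this PR introduced loads through the auto classes like any other causal LM (the full 20B checkpoint is roughly 40 GB of weights, so this is a usage sketch rather than something to run casually):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tok("GPT-NeoX-20B is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0]))
```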