Matt
508a704055
No more Tuple, List, Dict (#38797)
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 19:37:18 +01:00
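For readers unfamiliar with the change: this PR replaces the deprecated typing-module generics with the builtin ones (PEP 585, available since Python 3.9). A minimal before/after sketch with hypothetical function names:

```python
# Before (typing-module generics):
from typing import Dict, List, Tuple

def tokenize_old(texts: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    ...

# After (builtin generics, PEP 585, Python 3.9+): no typing import needed.
def tokenize_new(texts: list[str]) -> dict[str, list[tuple[int, int]]]:
    ...
```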
Quentin Gallouédec
de24fb63ed
Use HF papers (#38184)
* Use hf papers
* Hugging Face papers
* doi to hf papers
* style
2025-06-13 11:07:09 +00:00
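The gist of the change is rewriting arXiv (and DOI) links in docs and docstrings to their Hugging Face Papers equivalents. A sketch of the kind of substitution involved; the exact patterns used in the PR are an assumption:

```python
import re

text = "See https://arxiv.org/abs/1706.03762 for the original Transformer paper."

# Hugging Face Papers indexes papers by arXiv id, so abs/pdf links map directly.
print(re.sub(
    r"https?://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?",
    r"https://huggingface.co/papers/\1",
    text,
))
# -> See https://huggingface.co/papers/1706.03762 for the original Transformer paper.
```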
Cyril Vallez
4602059aae
[modular] Fix the prefix-based renaming when the old and new models share a common name suffix (#37829)
* first try
* Fix and set examples
* style
* fix
* Update modular_test_detr.py
* Update image_processing_new_imgproc_model.py
* Update modular_model_converter.py
2025-04-29 10:43:23 +02:00
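A toy illustration of the edge case being fixed; the names echo the PR's own test_detr example, and the negative look-behind is one way to avoid the collision, not necessarily the PR's exact mechanism. When the new name ends with the old one, a naive textual rename corrupts classes that already carry the new prefix:

```python
import re

old, new = "Detr", "TestDetr"  # new name ends with the old one
source = "class TestDetrModel(DetrModel): ..."

print(source.replace(old, new))
# class TestTestDetrModel(TestDetrModel): ...  <- already-renamed class corrupted

print(re.sub(rf"(?<!Test){old}", new, source))
# class TestDetrModel(TestDetrModel): ...      <- only the base class is renamed
```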
cyyever
0fb8d49e88
Use Python 3.9 syntax in examples (#37279)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-07 12:52:21 +01:00
efsotr
2b4734bd49
Support passing flash_attn_kwargs when gradient_checkpointing is enabled (#37037)
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
2025-03-31 10:53:02 +02:00
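The underlying constraint is that reentrant activation checkpointing only re-invokes the wrapped function with positional arguments, so keyword arguments such as FlashAttention's sequence metadata were silently dropped. A self-contained sketch of the workaround pattern (binding the kwargs with functools.partial before checkpointing); the layer and its kwargs here are stand-ins, not the PR's actual code:

```python
from functools import partial

import torch
from torch.utils.checkpoint import checkpoint

def decoder_layer(hidden_states, attention_mask, cu_seq_lens=None):
    # Stand-in for a real decoder layer; cu_seq_lens mimics a
    # FlashAttention-only keyword argument.
    assert cu_seq_lens is not None, "kwarg was dropped!"
    return hidden_states * 2

hidden_states = torch.randn(2, 8, 16, requires_grad=True)
attention_mask = torch.ones(2, 8)

# Bind the keyword arguments up front so checkpointing only has to
# forward positional tensors.
out = checkpoint(
    partial(decoder_layer, cu_seq_lens=torch.tensor([0, 8, 16])),
    hidden_states,
    attention_mask,
    use_reentrant=True,
)
out.sum().backward()
```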
Cyril Vallez
bc65f3fc1c
[modular] Do not track imports in functions (#36279)
* Add check
* just check for function
* Update examples
2025-02-25 10:29:47 +01:00
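The real converter works on libcst trees; as a simplified illustration of the rule being enforced, only module-level imports are collected as dependencies, so an import that lives inside a function body is ignored:

```python
import ast

source = """
import torch                 # module-level: tracked as a dependency

def helper():
    import numpy as np       # function-local: no longer tracked
    return np.zeros(1)
"""

tree = ast.parse(source)
top_level_imports = [
    node for node in tree.body                         # module body only,
    if isinstance(node, (ast.Import, ast.ImportFrom))  # so nested imports are skipped
]
print([alias.name for node in top_level_imports for alias in node.names])
# ['torch']
```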
Liangliang Ma
315a9f494e
Add XPU type to work around -inf mask causing SDPA NaN issue in modeling files (#35647)
* add xpu for unmask
* change modular for generated matching
* add latest modeling for helium
2025-02-05 13:28:31 +01:00
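Background for the fix, as I understand it: an attention-mask row that is entirely -inf makes softmax return NaN under SDPA, so transformers "unmasks" such fully-padded rows before calling the kernel; this PR extends that workaround, previously gated on CUDA devices, to XPU. A minimal reproduction of the NaN and of the unmasking idea:

```python
import torch

mask = torch.full((1, 1, 2, 4), float("-inf"))
mask[0, 0, 0, :2] = 0.0  # row 0 attends to two tokens; row 1 is fully masked

print(torch.softmax(mask, dim=-1)[0, 0, 1])
# tensor([nan, nan, nan, nan])  <- 0/0 on the fully masked row

# "Unmask" rows that attend to nothing so the kernel stays finite; such rows
# correspond to padding and their outputs are discarded anyway.
fully_masked = torch.isinf(mask).all(dim=-1, keepdim=True)
safe_mask = mask.masked_fill(fully_masked, 0.0)
print(torch.softmax(safe_mask, dim=-1)[0, 0, 1])
# tensor([0.2500, 0.2500, 0.2500, 0.2500])
```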
Cyril Vallez
91be6a5eb2
Modular: support for importing functions from any file (#35692)
* fix function imports
* improve comment
* Update modeling_switch_function.py
* make checks more robust
* improvement
* rename
* final test update
2025-01-16 16:37:53 +00:00
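To make the feature concrete: a modular file can now import a helper such as rotate_half from an unrelated modeling file, and the converter inlines that function's source into the generated model. A rough sketch of the resolution step using inspect (the actual converter manipulates libcst trees rather than live objects):

```python
import inspect

# Hypothetically, a modular_*.py file contains:
#     from transformers.models.llama.modeling_llama import rotate_half
# and the converter copies the function body into the generated file.
from transformers.models.llama.modeling_llama import rotate_half

print(inspect.getsource(rotate_half))
```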
Cyril Vallez
46276f9a7f
Fix modular edge case + modular sorting order (#35562)
* look-ahead negation
* re add examples by default
* Fix the bug in topological sort
* Update create_dependency_mapping.py
* start adding test
* finalize test
* more tests
* style
* style
2025-01-09 17:17:52 +01:00
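For context on the sorting-order part: modular files that depend on other modular files must be converted after their dependencies, which is a topological sort over the dependency mapping. The converter's own implementation lives in create_dependency_mapping.py; here is a generic Kahn's-algorithm sketch of the property the fix restores, not the PR's code:

```python
from collections import deque

def topological_order(deps: dict[str, set[str]]) -> list[str]:
    """Emit each file only after every file it depends on."""
    indegree = {node: len(parents) for node, parents in deps.items()}
    children: dict[str, list[str]] = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return order

print(topological_order({
    "modular_a.py": set(),
    "modular_b.py": {"modular_a.py"},
    "modular_c.py": {"modular_a.py", "modular_b.py"},
}))
# ['modular_a.py', 'modular_b.py', 'modular_c.py']
```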
Arthur
2c47618c1a
🚨 All attention refactor 🚨 (#35235)
* refactor LlamaAttention
* minimal changes
* fix llama
* update
* modular gemmas
* modular nits
* modular updates
* nits
* simplify
* gpt2
* more modular and fixes
* granite
* modular modular modular
* nits
* update
* qwen2 + starcoder2
* mostly gemma2
* Update image_processing_auto.py
* fix
* Update modular_starcoder2.py
* fix
* remove all copied from attentions
* remove gcv
* make fix-copies
* oops
* oops 2.0
* fix some modulars + all copied from
* should be good now
* revert unwanted changes
* Update modeling_decision_transformer.py
* finish cleanup
* Update modeling_olmo.py
* consistency
* re-add gradient checkpointing attribute
* fix
* style
* make config necessary
* bis
* bis
* Update modeling_my_new_model2.py
* is_causal attr
* fix
* remove past kv return from decoder layer
* fix
* default rope config
* correctly fix rope config
* fix bias
* fix gpt2 attention output
* fix test
* fix inits
* fix default sdpa
* fix default sdpa implementation
* harmonize classes
* fix mistral
* fix sliding window models
* mixtral
* be more explicit
* style
* fix
* several fixes
* Update modeling_dbrx.py
* fix test
* olmo + phi
* rotary
* style
* phi
* phi again
* again
* kwargs
* Update test_modeling_common.py
* skip fx tracing tests
* Update modeling_utils.py
* gemma 2
* again
* Update modeling_recurrent_gemma.py
* gemma2
* granite
* style
* starcoder
* Update sdpa_attention.py
* switch args
* Update modeling_mllama.py
* fix
* cache type tests
* gpt2
* Update test_modeling_common.py
* fix
* consistency
* fix shape with encoder
* should be the last one
* tests non model
* most comments
* small oopsie
* be more explicit in modulars
* more explicit modulars
* CIs! it works locally
* add kwargs to _flash_attention_forward
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2024-12-18 16:53:39 +01:00
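The thrust of the refactor is that attention backends become interchangeable functions with one shared signature, looked up by name, instead of per-backend module subclasses full of copied-from code. A simplified, self-contained sketch of that dispatch pattern (signatures abbreviated relative to the real eager/SDPA implementations):

```python
import torch
import torch.nn.functional as F

def eager_attention_forward(query, key, value, attention_mask=None, scaling=1.0, dropout=0.0):
    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    attn_weights = F.dropout(F.softmax(attn_weights, dim=-1), p=dropout)
    return torch.matmul(attn_weights, value)

def sdpa_attention_forward(query, key, value, attention_mask=None, scaling=1.0, dropout=0.0):
    return F.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask, dropout_p=dropout, scale=scaling
    )

# Backends registered by name; a module picks one via its config instead of
# subclassing (e.g. LlamaAttention / LlamaSdpaAttention / LlamaFlashAttention2).
ATTENTION_FUNCTIONS = {"eager": eager_attention_forward, "sdpa": sdpa_attention_forward}

q = k = v = torch.randn(1, 2, 5, 8)  # (batch, heads, seq_len, head_dim)
out = ATTENTION_FUNCTIONS["sdpa"](q, k, v, scaling=8 ** -0.5)
```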
Cyril Vallez
d363e71d0e
🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#34858)
* update
* style
* fix missing args
* remove last trace of old rope classes
* remove deprecated copied from
* fix copies
* trigger CIs
* post rebase clean-up
* reverse mistral
* cleanup after dropping commits
* Add comment
2024-12-11 11:16:52 +01:00
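After this cleanup, the rotary cos/sin tables are produced once per forward pass by a single RotaryEmbedding on the base model and handed down to every layer; attention layers only apply them. The application step, sketched from the standard helpers (shapes simplified to (batch, heads, seq, head_dim) with broadcastable cos/sin):

```python
import torch

def rotate_half(x):
    # Split the head dimension in two and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # Layers receive precomputed cos/sin instead of owning a RotaryEmbedding.
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

q = torch.randn(1, 2, 5, 8)
k = torch.randn(1, 2, 5, 8)
cos = torch.randn(1, 1, 5, 8)
sin = torch.randn(1, 1, 5, 8)
q_rot, k_rot = apply_rotary_pos_emb(q, k, cos, sin)
```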
Cyril Vallez
1da1e0d7f2
Support for easier multimodal use of modular (#35056)
* update modular and add examples
* style
* improve example comments
* style
* fix small logic issue for imports
* fix relative order issue when files do not make sense
* Improve comments
* trigger CIs
2024-12-04 15:13:11 +01:00