Joao Gante
aeeceb9916
[cache] add a test to confirm we can use cache at train time ( #35709 )
...
* add test
* augment test as suggested
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* rerun tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-16 17:02:34 +00:00
Raushan Turganbay
84a6789145
Enable different torch dtype in sub models ( #34873 )
...
* fix
* fix test
* add tests
* add more tests
* fix tests
* supposed to be a torch.dtype test
* handle BC and make fp32 default
2025-01-13 13:42:08 +01:00
Arthur
2c47618c1a
🚨 All attention refactor 🚨 ( #35235 )
...
* refactor LlamaAttention
* minimal changes
* fix llama
* update
* modular gemmas
* modular nits
* modular updates
* nits
* simplify
* gpt2
* more modualr and fixes
* granite
* modular modular modular
* nits
* update
* qwen2 + starcoder2
* mostly gemma2
* Update image_processing_auto.py
* fix
* Update modular_starcoder2.py
* fix
* remove all copied from attentions
* remove gcv
* make fix-copies
* oups
* oups2.0
* fix some modulars + all copied from
* should be good now
* revert unwanted changes
* Update modeling_decision_transformer.py
* finish cleanup
* Update modeling_olmo.py
* consistency
* re-add gradient checkpointing attribute
* fix
* style
* make config necessary
* bis
* bis
* Update modeling_my_new_model2.py
* is_causal attr
* fix
* remove past kv return from decoder layer
* fix
* default rope config
* correctly fix rope config
* fix bias
* fix gpt2 attention output
* fix test
* fix inits
* fix default sdpa
* fix default sdpa implementation
* harmonize classes
* fix mistral
* fix sliding window models
* mixtral
* be more explicit
* style
* fix
* several fixes
* Update modeling_dbrx.py
* fix test
* olmo + phi
* rotary
* syle
* phi
* phi again
* again
* kwargs
* Update test_modeling_common.py
* skip fx tracing tests
* Update modeling_utils.py
* gemma 2
* again
* Update modeling_recurrent_gemma.py
* gemma2
* granite
* style
* starcoder
* Update sdpa_attention.py
* switch args
* Update modeling_mllama.py
* fix
* cache type tests
* gpt2
* Update test_modeling_common.py
* fix
* consistency
* fix shape with encoder
* should be the last one
* tests non model
* most comments
* small oupsi
* be more explicit in modulars
* more explicit modulars
* CIs! it works locally
* add kwargs to _flash_attention_forward
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2024-12-18 16:53:39 +01:00
Marc Sun
1eee1cedfd
Fix loading with only state dict and low_cpu_mem_usage = True ( #35217 )
...
* fix loading with only state dict and config
* style
* add tests
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-18 09:54:32 +01:00
Yih-Dar
b0a51e5cff
Fix flaky Hub CI (test_trainer.py
) ( #35062 )
...
* fix
* Update src/transformers/testing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* check
* check
* check
* check
* check
* check
* Update src/transformers/testing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Update src/transformers/testing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* check
* check
* check
* Final space
* Final adjustment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-12-05 17:02:27 +01:00
Tibor Reiss
f297af55df
Fix: take into account meta device ( #34134 )
...
* Do not load for meta device
* Make some minor improvements
* Add test
* Update tests/utils/test_modeling_utils.py
Update test parameters
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Make the test simpler
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-11-20 11:32:07 +01:00
Joao Gante
13493215ab
🧼 remove v4.44 deprecations ( #34245 )
...
* remove v4.44 deprecations
* PR comments
* deprecations scheduled for v4.50
* hub version update
* make fiuxp
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-15 23:07:24 +01:00
Joao Gante
34927b0f73
MPS: isin_mps_friendly
can support 0D tensors ( #34538 )
...
* apply fix
* tested
* make fixup
2024-11-04 16:18:50 +00:00
kang sheng
655bec2da7
use a tinymodel to test generation config which aviod timeout ( #34482 )
...
* use a tinymodel to test generation config which aviod timeout
* remove tailing whitespace
2024-10-29 09:39:06 +01:00
Lysandre Debut
409dd2d19c
Fix failing conversion ( #34010 )
...
* Fix
* Tests
* Typo
* Typo
2024-10-11 14:59:23 +02:00
Joao Gante
3557f9a14a
Generate: can_generate()
recursive check ( #33718 )
...
* add recursive check and test warnings
* missing space
* models without can_generate
2024-09-26 18:11:14 +01:00
Joao Gante
e15687fffe
Generation: deprecate PreTrainedModel
inheriting from GenerationMixin
( #33203 )
2024-09-23 18:28:36 +01:00
Joao Gante
7542fac2c7
Pipeline: no side-effects on model.config
and model.generation_config
🔫 ( #33480 )
2024-09-18 15:43:06 +01:00
Joao Gante
72d4a3f9c1
mps: add isin_mps_friendly
, a wrapper function for torch.isin
( #33099 )
2024-08-26 15:34:19 +01:00
Joao Gante
970a16ec7f
Forbid PretrainedConfig
from saving generate
parameters; Update deprecations in generate
-related code 🧹 ( #32659 )
...
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 11:12:53 +01:00
Zach Mueller
8ec028aded
Reduce the error log when using core models that need their weights renamed, and provide a step forward ( #32656 )
...
* Fin
* Modify msg
* Finish up nits
2024-08-16 13:05:57 -04:00
Pablo Montalvo
a5a8291ad1
Fix tests ( #32649 )
...
* skip failing tests
* [no-filter]
* [no-filter]
* fix wording catch in FA2 test
* [no-filter]
* trigger normal CI without filtering
2024-08-13 09:46:21 +01:00
Quentin Gallouédec
f1c8542ff7
"to be not" -> "not to be" ( #32636 )
...
* "to be not" -> "not to be"
* Update sam.md
* Update trainer.py
* Update modeling_utils.py
* Update test_modeling_utils.py
* Update test_modeling_utils.py
2024-08-12 20:20:17 +01:00
amyeroberts
7e5d46ded4
Respect the config's attn_implementation if set ( #32383 )
...
* Respect the config's attn if set
* Update test - can override in from_config
* Fix
2024-08-05 16:33:19 +01:00
Yih-Dar
df6eee9201
Follow up for #31973 ( #32025 )
...
* fix
* [test_all] trigger full CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-25 16:12:23 +02:00
amyeroberts
817a676bd7
Don't default to other weights file when use_safetensors=True ( #31874 )
...
* Don't default to other weights file when use_safetensors=True
* Add tests
* Update tests/utils/test_modeling_utils.py
* Add clarifying comments to tests
* Update tests/utils/test_modeling_utils.py
* Update tests/utils/test_modeling_utils.py
2024-07-22 18:29:50 +01:00
Zach Mueller
693cb828ff
Fix bad test about slower init ( #32002 )
...
Bronked main
2024-07-16 10:33:05 -04:00
Zach Mueller
e0dfd7bcaf
Speedup model init on CPU (by 10x+ for llama-3-8B as one example) ( #31771 )
...
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-16 09:32:01 -04:00
Yih-Dar
a1a34657d4
Avoid race condition ( #31973 )
...
* [test_all] hub
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-15 17:56:24 +02:00
Omar Salman
1499a55008
Add warning message for beta and gamma parameters ( #31654 )
...
* Add warning message for and parameters
* Fix when the warning is raised
* Formatting changes
* Improve testing and remove duplicated warning from _fix_key
2024-07-11 13:01:47 +01:00
Joao Gante
4c2538b863
Test loading generation config with safetensor weights ( #31550 )
...
fix test
2024-07-09 16:22:43 +02:00
Marc Sun
8c5c180de0
Fix serialization for offloaded model ( #31727 )
...
* Fix serialization
* style
* add test
2024-07-05 08:07:07 +02:00
Yih-Dar
93cd94b79d
Move some test files (tets/test_xxx_utils.py
) to tests/utils
( #31730 )
...
* move
* move
* move
* move
* Update tests/utils/test_image_processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-02 13:46:03 +02:00