Yih-Dar
95346e9dcd
Add artifact name in job step to maintain job / artifact correspondence ( #28682 )
* avoid using job name
* apply to other files
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-31 15:58:17 +01:00
Joao Gante
beb2a09687
DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization ( #28760 )
2024-01-31 14:39:07 +00:00
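Why the dtype can matter: suppose (an assumption, not stated in the commit) that the incorrect initialization comes from `torch.arange` values being created or cast in a half-precision float dtype, for example under an initialization context that converts new floating-point tensors to half precision. IEEE 754 float16 cannot represent every integer above 2048. The sketch below uses only the standard library's half-precision `struct` format (`'e'`). It illustrates that premise and is not the code changed in #28760.

```python
import struct


def to_float16(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Integer positions, as an integer-dtype range would hold them exactly.
positions = list(range(2040, 2060))

# The same positions stored in float16, as a half-precision range would hold them.
fp16_positions = [to_float16(float(p)) for p in positions]

# Positions that float16 cannot represent exactly.
mismatches = [(p, q) for p, q in zip(positions, fp16_positions) if p != q]
print(mismatches)
```

Creating the range in an integer dtype and converting it to float afterwards keeps every position exact. The commit title suggests a change of that kind, but the affected call sites are not shown here.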
Kian Sierra McGettigan
f7076cd346
Flax mistral ( #26943 )
* direct copy from llama work
* mistral modules forward pass working
* flax mistral forward pass with sliding window
* added tests
* added layer collection approach
* Revert "added layer collection approach"
This reverts commit 0e2905bf22.
* Revert "Revert "added layer collection approach""
This reverts commit fb17b6187a.
* fixed attention outputs
* added mistral to init and auto
* fixed import name
* fixed layernorm weight dtype
* freeze initialized weights
* make sure conversion considers bfloat16
* added backend
* added docstrings
* added cache
* fixed sliding window causal mask
* passes cache tests
* passed all tests
* applied make style
* removed commented out code
* applied fix-copies ignored other model changes
* applied make fix-copies
* removed unused functions
* passed generation integration test
* slow tests pass
* fixed slow tests
* changed default dtype from jax.numpy.float32 to float32 for docstring check
* skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids
* updated checkpoint since from_pt not included
* applied black style
* removed unused args
* Applied styling and fixup
* changed checkpoint for doc back
* fixed rf after adding it to hf hub
* Add dummy ckpt
* applied styling
* added tokenizer to new ckpt
* fixed slice format
* fix init and slice
* changed ref for placeholder TODO
* added copies from Llama
* applied styling
* applied fix-copies
* fixed docs
* update weight dtype reconversion for sharded weights
* removed Nullable input ids
* Removed unnecessary output attentions in Module
* added embedding weight initialization
* removed unused past_key_values
* fixed deterministic
* Fixed RMS Norm and added copied from
* removed input_embeds
* applied make style
* removed nullable input ids from sequence classification model
* added copied from GPTJ
* added copied from Llama on FlaxMistralDecoderLayer
* added copied from to FlaxMistralPreTrainedModel methods
* fix test deprecation warning
* freeze gpt neox random_params and fix copies
* applied make style
* fixed doc issue
* skipped docstring test to align # copied from
* applied make style
* removed FlaxMistralForSequenceClassification
* removed unused padding_idx
* removed more sequence classification
* removed sequence classification
* applied styling and consistency
* added copied from in tests
* removed sequence classification test logic
* applied styling
* applied make style
* removed freeze and fixed copies
* undo test change
* changed repeat_kv to tile
* fixed to key value groups
* updated copyright year
* split causal_mask
* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
* went back to 2023 for tests_pr_documentation_tests
* went back to 2024
* changed tile to repeat
* applied make style
* empty for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00
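The bullets mention a sliding-window causal mask and switching between `repeat` and `tile` for the key/value heads. The pure-Python sketch below covers both ideas. It rests on two assumptions that the log does not state. First, a query at position `i` may attend to keys `j` with `i - window < j <= i`. Second, grouped-query attention pairs query head `h` with key/value head `h // n_rep`, which needs the consecutive (repeat) ordering rather than the cyclic (tile) ordering. All function names are hypothetical.

```python
from typing import List


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def repeat_kv_heads(kv_heads: List[str], n_rep: int) -> List[str]:
    """Repeat each key/value head n_rep times consecutively: [k0, k0, k1, k1, ...]."""
    return [head for head in kv_heads for _ in range(n_rep)]


def tile_kv_heads(kv_heads: List[str], n_rep: int) -> List[str]:
    """Tile the whole list of heads n_rep times: [k0, k1, k0, k1, ...]."""
    return kv_heads * n_rep


mask = sliding_window_causal_mask(seq_len=5, window=3)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))

# With 4 query heads and 2 kv heads (n_rep = 2), query head h should use kv head h // n_rep.
expected = [f"k{h // 2}" for h in range(4)]
print(repeat_kv_heads(["k0", "k1"], 2) == expected)  # consecutive ordering matches
print(tile_kv_heads(["k0", "k1"], 2) == expected)    # cyclic ordering does not
```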
Matt
7a4961007a
Wrap Keras methods to support BatchEncoding ( #28734 )
* Shim the Keras methods to support BatchEncoding
* Extract everything to a convert_batch_encoding function
* Convert BatchFeature too (thanks Amy)
* tf.keras -> keras
2024-01-31 13:18:42 +00:00
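The sketch below shows the shim idea. It assumes only that tokenizer outputs such as `BatchEncoding` (and `BatchFeature`) behave like `collections.UserDict`. A `UserDict` is not a `dict` subclass, so code that checks for real dicts may reject it. `FakeBatchEncoding`, `accepts_batch_encoding` and `ToyModel` are hypothetical. The `convert_batch_encoding` name comes from the commit, but its signature here is an assumption.

```python
import functools
from collections import UserDict
from typing import Any, Callable, Dict, Tuple


class FakeBatchEncoding(UserDict):
    """Hypothetical stand-in for a tokenizer output: dict-like, but not a dict subclass."""


def convert_batch_encoding(*args: Any, **kwargs: Any) -> Tuple[Tuple[Any, ...], Dict[str, Any]]:
    """Replace top-level UserDict arguments with plain dicts; everything else passes through."""
    new_args = tuple(dict(a) if isinstance(a, UserDict) else a for a in args)
    new_kwargs = {k: dict(v) if isinstance(v, UserDict) else v for k, v in kwargs.items()}
    return new_args, new_kwargs


def accepts_batch_encoding(method: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a Keras-style method (fit, predict, ...) so its inputs are converted first."""

    @functools.wraps(method)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        args, kwargs = convert_batch_encoding(*args, **kwargs)
        return method(self, *args, **kwargs)

    return wrapper


class ToyModel:
    @accepts_batch_encoding
    def predict(self, x: Any) -> list:
        # Simulates a backend that only accepts real dicts as named inputs.
        if not isinstance(x, dict):
            raise TypeError("expected a dict of named inputs")
        return sorted(x)


batch = FakeBatchEncoding(input_ids=[101, 102], attention_mask=[1, 1])
print(ToyModel().predict(batch))
```

Converting once at the method boundary keeps the rest of the model code unaware of the wrapper type.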
Julien Chaumond
721e2d94df
canonical repos moves ( #28795 )
* canonical repos moves
* Style
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-01-31 14:18:31 +01:00
Hieu Lam
bebeeee012
Resolve DeepSpeed cannot resume training with PeftModel ( #28746 )
* fix: resolve deepspeed resume peft model issues
* chore: update something
* chore: update model instance pass into is peft model checks
* chore: remove hard code value to tests
* fix: format code
2024-01-31 13:58:26 +01:00
Patrick von Platen
65a926e82b
[Whisper] Refactor forced_decoder_ids & prompt ids ( #28687 )
* up
* Fix more
* Correct more
* Fix more tests
* fix fast tests
* Fix more
* fix more
* push all files
* finish all
* make style
* Fix timestamp wrap
* make style
* make style
* up
* up
* up
* Fix lang detection behavior
* Fix lang detection behavior
* Add lang detection test
* Fix lang detection behavior
* make style
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* better error message
* make style tests
* add warning
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-01-31 14:02:07 +02:00
Younes Belkada
f9f1f2ac5e
[`HFQuantizer`] Remove `check_packages_compatibility` logic ( #28789 )
remove `check_packages_compatibility` logic
2024-01-31 03:21:27 +01:00
tom-p-reichel
ae0c27adfa
don't initialize the output embeddings if we're going to tie them to input embeddings ( #28192 )
* test that tied output embeddings aren't initialized on load
* don't initialize the output embeddings if we're going to tie them to the input embeddings
2024-01-31 02:19:18 +01:00
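A toy illustration of the idea, with hypothetical names throughout. When the output embeddings will be tied to the input embeddings, randomly initializing them first is wasted work, because the tie replaces them anyway. This pure-Python sketch is not the library's loading code.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


class ToyLM:
    """Hypothetical toy model with input embeddings and an optional separate output projection."""

    def __init__(self, vocab_size: int, dim: int, tie_word_embeddings: bool) -> None:
        self.tie_word_embeddings = tie_word_embeddings
        self.init_calls = 0  # counts how many weight matrices were randomly initialized
        self.input_embeddings = self._init_matrix(vocab_size, dim)
        self.output_embeddings: Optional[Matrix] = None
        if not tie_word_embeddings:
            # Only spend time initializing output weights that will actually be used.
            self.output_embeddings = self._init_matrix(vocab_size, dim)

    def _init_matrix(self, rows: int, cols: int) -> Matrix:
        self.init_calls += 1
        return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    def get_output_embeddings(self) -> Matrix:
        # A tied model reuses the input embedding matrix as its output projection.
        if self.tie_word_embeddings:
            return self.input_embeddings
        assert self.output_embeddings is not None
        return self.output_embeddings


tied = ToyLM(vocab_size=4, dim=3, tie_word_embeddings=True)
untied = ToyLM(vocab_size=4, dim=3, tie_word_embeddings=False)
print(tied.init_calls, untied.init_calls)
```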
Alessio Serra
a937425e94
Prevent MLflow exception from disrupting training ( #28779 )
Modified MLflow logging metrics from synchronous to asynchronous
Co-authored-by: codiceSpaghetti <alessio.ser@hotmail.it>
2024-01-31 02:10:44 +01:00
Younes Belkada
d703eaaeff
[`bnb`] Fix bnb slow tests ( #28788 )
fix bnb slow tests
2024-01-31 01:31:20 +01:00
Matt
74c9cfeaa7
Pin Torch to <2.2.0 ( #28785 )
* Pin torch to <2.2.0
* Pin torchvision and torchaudio as well
* Playing around with versions to see if this helps
* twiddle something to restart the CI
* twiddle it back
* Try changing the natten version
* make fixup
* Revert "Try changing the natten version"
This reverts commit de0d6592c3.
* make fixup
* fix fix fix
* fix fix fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-30 23:01:12 +01:00
Matt
415e9a0980
Add tf_keras imports to prepare for Keras 3 ( #28588 )
* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test
* Ensure the pipeline tests correctly revert to float32 after each test
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras
* Add a cast for tf.assign, just in case
* Fix callback imports
2024-01-30 17:26:36 +00:00
amyeroberts
1d489b3e61
Task-specific pipeline init args ( #28439 )
* Abstract out pipeline init args
* Address PR comments
* Reword
* BC PIPELINE_INIT_ARGS
* Remove old arguments
* Small fix
2024-01-30 16:54:57 +00:00
amyeroberts
2fa1c808ae
[`Backbone`] Use `load_backbone` instead of `AutoBackbone.from_config` ( #28661 )
* Enable instantiating model with pretrained backbone weights
* Remove doc updates until changes made in modeling code
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add missing imports and arguments
* Update docstrings
* Make sure test is properly configured
* Include recent DPT updates
2024-01-30 16:54:09 +00:00
Yih-Dar
c24c52454a
Further pin pytest version (in a temporary way) ( #28780 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-30 17:48:49 +01:00
fxmarty
6f7d5db58c
Fix transformers.utils.fx compatibility with torch<2.0 ( #28774 )
guard sdpa on torch>=2.0
2024-01-30 14:54:42 +01:00
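`torch.nn.functional.scaled_dot_product_attention` first appeared in PyTorch 2.0, so any code that references it must be guarded on older versions. Below is a dependency-free sketch of such a version guard. `parse_version`, `supports_sdpa` and the `traceable_functions` list are hypothetical, and real code would more likely rely on `packaging.version`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release components, e.g. '2.1.2+cu118' -> (2, 1, 2)."""
    release = version.split("+")[0]  # drop local build labels such as '+cu118'
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at pre-release tags such as 'rc1' or 'a0'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports_sdpa(torch_version: str) -> bool:
    """scaled_dot_product_attention is only available from PyTorch 2.0 onwards."""
    return parse_version(torch_version) >= (2, 0)


# Hypothetical registry: only list the SDPA function when the installed version has it.
installed = "1.13.1"
traceable_functions = ["softmax", "matmul"]
if supports_sdpa(installed):
    traceable_functions.append("scaled_dot_product_attention")
print(traceable_functions)
```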
Thien Tran
5c8d941d66
Use Conv1d for TDNN ( #25728 )
* use conv for tdnn
* run make fixup
* update TDNN
* add PEFT LoRA check
* propagate tdnn warnings to others
* add missing imports
* update TDNN in wav2vec2_bert
* add missing imports
2024-01-30 09:33:55 +01:00
Younes Belkada
866253f85e
[`HfQuantizer`] Move it to "Developer guides" ( #28768 )
Update _toctree.yml
2024-01-30 07:20:20 +01:00
Poedator
d78e78a0e4
`HfQuantizer` class for quantization-related stuff in `modeling_utils.py` ( #26610 )
* squashed earlier commits for easier rebase
* rm rebase leftovers
* 4bit save enabled @quantizers
* TMP gptq test use exllama
* fix AwqConfigTest::test_wrong_backend for A100
* quantizers AWQ fixes
* _load_pretrained_model low_cpu_mem_usage branch
* quantizers style
* remove require_low_cpu_mem_usage attr
* rm dtype arg from process_model_before_weight_loading
* rm config_origin from Q-config
* rm inspect from q_config
* fixed docstrings in QuantizationConfigParser
* logger.warning fix
* mv is_loaded_in_4(8)bit to BnbHFQuantizer
* is_accelerate_available error msg fix in quantizer
* split is_model_trainable in bnb quantizer class
* rm llm_int8_skip_modules as separate var in Q
* Q rm todo
* fwd ref to HFQuantizer in type hint
* rm note re optimum.gptq.GPTQQuantizer
* quantization_config in __init__ simplified
* replaced NonImplemented with create_quantized_param
* rm load_in_4/8_bit deprecation warning
* QuantizationConfigParser refactoring
* awq-related minor changes
* awq-related changes
* awq config.modules_to_not_convert
* raise error if no q-method in q-config in args
* minor cleanup
* awq quantizer docstring
* combine common parts in bnb process_model_before_weight_loading
* revert test_gptq
* .process_model_ cleanup
* restore dict config warning
* removed typevars in quantizers.py
* cleanup post-rebase 16 jan
* QuantizationConfigParser classmethod refactor
* rework of handling of unexpected aux elements of bnb weights
* moved q-related stuff from save_pretrained to quantizers
* refactor v1
* more changes
* fix some tests
* remove it from main init
* ooops
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix awq issues
* fix
* fix
* fix
* fix
* fix
* fix
* add docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/hf_quantizer.md
* address comments
* fix
* fixup
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address final comment
* update
* Update src/transformers/quantizers/base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/quantizers/auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* add kwargs update
* fixup
* add `optimum_quantizer` attribute
* oops
* rm unneeded file
* fix doctests
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-30 02:48:25 +01:00
Zhan Ling
1f5590d32e
Move CLIP _no_split_modules to CLIPPreTrainedModel ( #27841 )
Add _no_split_modules to CLIPModel
2024-01-30 02:15:58 +01:00
Omar Sanseviero
a989c6c6eb
Don't allow passing `load_in_8bit` and `load_in_4bit` at the same time ( #28266 )
* Update quantization_config.py
* Style
* Protect from setting directly
* add tests
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-01-30 01:43:40 +01:00
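The "Protect from setting directly" bullet suggests that the check also covers attribute assignment after construction. The sketch below shows that pattern with Python properties on a simplified config class. `ToyQuantizationConfig` and its error message are hypothetical and are not `BitsAndBytesConfig` itself.

```python
class ToyQuantizationConfig:
    """Hypothetical config that refuses to enable 8-bit and 4-bit loading at the same time."""

    def __init__(self, load_in_8bit: bool = False, load_in_4bit: bool = False) -> None:
        self._check(load_in_8bit, load_in_4bit)
        self._load_in_8bit = load_in_8bit
        self._load_in_4bit = load_in_4bit

    @staticmethod
    def _check(load_in_8bit: bool, load_in_4bit: bool) -> None:
        if load_in_8bit and load_in_4bit:
            raise ValueError("load_in_8bit and load_in_4bit cannot both be True")

    @property
    def load_in_8bit(self) -> bool:
        return self._load_in_8bit

    @load_in_8bit.setter
    def load_in_8bit(self, value: bool) -> None:
        # Direct assignment is validated against the current 4-bit setting.
        self._check(value, self._load_in_4bit)
        self._load_in_8bit = value

    @property
    def load_in_4bit(self) -> bool:
        return self._load_in_4bit

    @load_in_4bit.setter
    def load_in_4bit(self, value: bool) -> None:
        # Direct assignment is validated against the current 8-bit setting.
        self._check(self._load_in_8bit, value)
        self._load_in_4bit = value


config = ToyQuantizationConfig(load_in_4bit=True)
try:
    config.load_in_8bit = True  # rejected: 4-bit is already enabled
except ValueError as err:
    print(f"rejected: {err}")
```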
ThibaultLengagne
cd2eb8cb2b
Add French translation: french README.md ( #28696 )
* doc: french README
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>
* doc: Add Depth Anything
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>
* doc: Add french link in other docs
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>
* doc: Add missing links in fr docs
* doc: fix several mistakes in translation
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>
---------
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>
Co-authored-by: Sarapuce <alexandreh@padok.fr>
2024-01-29 10:07:49 -08:00
Ajay Patel
a055d09e11
Support saving only PEFT adapter in checkpoints when using PEFT + FSDP ( #28297 )
* Update trainer.py
* Revert "Update trainer.py"
This reverts commit 0557e2cc9effa3a41304322032239a3874b948a7.
* Make trainer.py use adapter_only=True when using FSDP + PEFT
* Support load_best_model with adapter_only=True
* Ruff format
* Inspect function args for save_ load_ fsdp utility functions and only pass adapter_only=True if they support it
2024-01-29 17:10:15 +00:00
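The last bullet describes inspecting a function's parameters before passing an optional keyword, so that older versions of a helper that lack the parameter keep working. Below is a minimal sketch using `inspect.signature`. `call_with_supported_kwargs` and the two `save_fsdp_model_*` functions are hypothetical stand-ins, not the actual FSDP utilities.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], *args: Any, **optional_kwargs: Any) -> Any:
    """Call `func`, forwarding only the optional keyword arguments its signature accepts."""
    params = inspect.signature(func).parameters
    accepts_any_kwarg = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    supported = {
        name: value
        for name, value in optional_kwargs.items()
        if accepts_any_kwarg or name in params
    }
    return func(*args, **supported)


# Hypothetical helper from an older library version: no adapter_only parameter.
def save_fsdp_model_old(model: Any, output_dir: str) -> str:
    return f"full model -> {output_dir}"


# Hypothetical newer helper that can save only the PEFT adapter weights.
def save_fsdp_model_new(model: Any, output_dir: str, adapter_only: bool = False) -> str:
    return f"{'adapter' if adapter_only else 'full model'} -> {output_dir}"


print(call_with_supported_kwargs(save_fsdp_model_old, None, "ckpt", adapter_only=True))
print(call_with_supported_kwargs(save_fsdp_model_new, None, "ckpt", adapter_only=True))
```

Filtering on the signature avoids a hard version pin: the same caller works with both helper versions.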
Sanchit Gandhi
da3c79b245
[Whisper] Make tokenizer normalization public ( #28136 )
* [Whisper] Make tokenizer normalization public
* add to docs
2024-01-29 16:07:35 +00:00
xkszltl
e694e985d7
Fix typo of `Block`. ( #28727 )
2024-01-29 15:25:00 +00:00
amyeroberts
9e8f35fa28
Mark test_constrained_beam_search_generate as flaky ( #28757 )
* Mark test_constrained_beam_search_generate as flaky
* Update tests/generation/test_utils.py
2024-01-29 15:22:25 +00:00
amyeroberts
0f8d015a41
Pin pytest version <8.0.0 ( #28758 )
* Pin pytest version <8.0.0
* Update setup.py
* make deps_table_update
2024-01-29 15:22:14 +00:00
Julien Chaumond
26aa03a252
small doc update for CamemBERT ( #28644 )
2024-01-29 15:46:32 +01:00
Nate Cibik
0548af54cc
Enable Gradient Checkpointing in Deformable DETR ( #28686 )
* Enabled gradient checkpointing in Deformable DETR
* Enabled gradient checkpointing in Deformable DETR encoder
* Removed # Copied from headers in modeling_deta.py to break dependence on Deformable DETR code
2024-01-29 10:10:40 +00:00
Wesley Gifford
f72c7c22d9
PatchTST and PatchTSMixer fixes ( #28083 )
* 🐛 fix .max bug
* remove prediction_length from regression output dimensions
* fix parameter names, fix output names, update tests
* ensure shape for PatchTST
* ensure output shape for PatchTSMixer
* update model, batch, and expected for regression distribution test
* update test expected
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* standardize on patch_length
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make arguments more explicit
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
* adjust prepared inputs
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
---------
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-29 10:09:26 +00:00
Vinyzu
3a08cc485f
[Docs] Fix Typo in English & Japanese CLIP Model Documentation (TMBD -> TMDB) ( #28751 )
* [Docs] Fix Typo in English CLIP model_doc
* [Docs] Fix Typo in Japanese CLIP model_doc
2024-01-29 10:06:51 +00:00
Klaus Hipp
39fa400969
Fix input data file extension in examples ( #28741 )
2024-01-29 10:06:31 +00:00
Yih-Dar
5649c0cbb8
Fix `DepthEstimationPipeline`'s docstring ( #28733 )
* fix
* fix
* Fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-29 10:42:55 +01:00
Angela Yi
243e186efb
Add serialization logic to pytree types ( #27871 )
* Add serialized type name to pytrees
* Modify context
* add serde test
2024-01-29 10:41:20 +01:00
amyeroberts
f1cc615721
[`Siglip`] protect from imports if sentencepiece not installed ( #28737 )
[Siglip] protect from imports if sentencepiece not installed
2024-01-28 15:10:14 +00:00
Joao Gante
03cc17775b
Generate: deprecate old src imports ( #28607 )
2024-01-27 15:54:19 +00:00
Joao Gante
a28a76996c
Falcon: removed unused function ( #28605 )
2024-01-27 15:52:59 +00:00
Sanchit Gandhi
de13a951b3
[Flax] Update no init test for Flax v0.7.1 ( #28735 )
2024-01-26 18:20:39 +00:00
Steven Liu
abe0289e6d
[docs] Fix datasets in guides ( #28715 )
* change datasets
* fix
2024-01-26 09:29:07 -08:00
Yih-Dar
f8b7c4345a
Unpin pydantic ( #28728 )
* try pydantic v2
* try pydantic v2
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-26 17:39:33 +01:00
Scruel Tao
3aea38ce61
fix: suppress `GatedRepoError` to use cache file (fix #28558). ( #28566 )
* fix: suppress `GatedRepoError` to use cache file (fix #28558 ).
* move condition_to_return parameter back to outside.
2024-01-26 16:25:08 +00:00
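Here is a sketch of the fallback described above. It assumes the access error is raised while resolving a file that was already downloaded earlier. `GatedRepoError` below is a local stand-in class (the real one lives in `huggingface_hub`), and `resolve_file` is hypothetical.

```python
import os
import tempfile
from typing import Callable, Optional


class GatedRepoError(Exception):
    """Hypothetical stand-in for the Hub's "gated repository" access error."""


def resolve_file(download: Callable[[], str], cached_path: Optional[str]) -> str:
    """Try to download a file; on a gated-repo error, fall back to an existing cached copy."""
    try:
        return download()
    except GatedRepoError:
        # Suppress the error only when a usable local copy exists; otherwise re-raise it.
        if cached_path is not None and os.path.isfile(cached_path):
            return cached_path
        raise


def gated_download() -> str:
    raise GatedRepoError("access to this repository is restricted")


with tempfile.TemporaryDirectory() as cache_dir:
    cached = os.path.join(cache_dir, "config.json")
    with open(cached, "w") as f:
        f.write("{}")
    resolved = resolve_file(gated_download, cached)
    print(resolved == cached)
```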
Matt
708b19eb09
Stop confusing the TF compiler with ModelOutput objects ( #28712 )
* Stop confusing the TF compiler with ModelOutput objects
* Stop confusing the TF compiler with ModelOutput objects
2024-01-26 12:22:29 +00:00
Yih-Dar
a638de1987
Fix `weights_only` ( #28725 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-26 13:00:49 +01:00
Shukant Pal
d6ac8f4ad2
Initialize _tqdm_active with hf_hub_utils.are_progress_bars_disabled(… ( #28717 )
Initialize _tqdm_active with hf_hub_utils.are_progress_bars_disabled() to respect HF_HUB_DISABLE_PROGRESS_BARS
It seems like enable_progress_bar() and disable_progress_bar() sync up with huggingface_hub, but the initial value is always True. This change makes sure the user's preference is respected implicitly on initialization.
2024-01-26 11:59:34 +00:00
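The sketch below initializes the flag from the user's setting instead of hardcoding `True`. The environment-variable parsing (in particular, which spellings count as "disabled") and the `ProgressBarState` class are assumptions for illustration. The commit itself delegates this check to `hf_hub_utils.are_progress_bars_disabled()`.

```python
import os
from typing import Mapping, Optional

# Assumed spellings that count as "disabled"; the real parsing may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def progress_bars_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical stand-in for a check on HF_HUB_DISABLE_PROGRESS_BARS."""
    env = os.environ if environ is None else environ
    return env.get("HF_HUB_DISABLE_PROGRESS_BARS", "").strip().lower() in _TRUTHY


class ProgressBarState:
    """Holds the module-level on/off switch for progress bars."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        # Start from the user's preference instead of an unconditional True.
        self.tqdm_active = not progress_bars_disabled(environ)

    def enable_progress_bar(self) -> None:
        self.tqdm_active = True

    def disable_progress_bar(self) -> None:
        self.tqdm_active = False


state = ProgressBarState({"HF_HUB_DISABLE_PROGRESS_BARS": "1"})
print(state.tqdm_active)
```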
D
3a46e30dd1
[`docs`] Update preprocessing.md ( #28719 )
* Update preprocessing.md
adjust ImageProcessor link to working target (same as in lower section of file)
* Update preprocessing.md
2024-01-26 11:58:57 +00:00
Turetskii Mikhail
1f47a24aa1
fix: corrected misleading log message in save_pretrained function ( #28699 )
2024-01-26 11:52:53 +00:00
Facico
bbe30c6968
support PeftMixedModel signature inspect ( #28321 )
* support PeftMixedModel signature inspect
* import PeftMixedModel only peft>=0.7.0
* Update src/transformers/trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* fix styling
* Update src/transformers/trainer.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* style fixup
* fix note
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-26 12:05:01 +01:00
fxmarty
8eb74c1c89
Fix duplicate & unnecessary flash attention warnings ( #28557 )
* fix duplicate & unnecessary flash warnings
* trigger ci
* warning_once
* if/else order
---------
Co-authored-by: Your Name <you@example.com>
2024-01-26 09:37:04 +01:00
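The `warning_once` bullet refers to emitting a warning a single time rather than on every call. A common standard-library way to get that behavior is to memoize the logging call with `functools.lru_cache`, which suppresses repeated identical messages. The sketch assumes that approach and uses a hypothetical logger name and message.

```python
import functools
import logging

logger = logging.getLogger("warn_once_demo")
logger.propagate = False  # keep the demo output off the root logger


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Log `message` at WARNING level only the first time it is seen."""
    logger.warning(message)


for _ in range(3):
    warning_once("example: flash attention requested but unavailable")

print(handler.messages)
```

Because the cache key is the message itself, a different message is still logged once.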
Yih-Dar
142ce68389
Don't fail when `LocalEntryNotFoundError` during `processor_config.json` loading ( #28709 )
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-26 09:02:32 +01:00