transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 10:12:23 +06:00

Author	SHA1	Message	Date
Yih-Dar	95346e9dcd	Add artifact name in job step to maintain job / artifact correspondence (#28682 ) * avoid using job name * apply to other files --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-31 15:58:17 +01:00
Joao Gante	beb2a09687	DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760 )	2024-01-31 14:39:07 +00:00
Kian Sierra McGettigan	f7076cd346	Flax mistral (#26943 ) * direct copy from llama work * mistral modules forward pass working * flax mistral forward pass with sliding window * added tests * added layer collection approach * Revert "added layer collection approach" This reverts commit `0e2905bf22`. * Revert "Revert "added layer collection approach"" This reverts commit `fb17b6187a`. * fixed attention outputs * added mistral to init and auto * fixed import name * fixed layernorm weight dtype * freeze initialized weights * make sure conversion consideres bfloat16 * added backend * added docstrings * added cache * fixed sliding window causal mask * passes cache tests * passed all tests * applied make style * removed commented out code * applied fix-copies ignored other model changes * applied make fix-copies * removed unused functions * passed generation integration test * slow tests pass * fixed slow tests * changed default dtype from jax.numpy.float32 to float32 for docstring check * skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids * updated checkpoint since from_pt not included * applied black style * removed unused args * Applied styling and fixup * changed checkpoint for doc back * fixed rf after adding it to hf hub * Add dummy ckpt * applied styling * added tokenizer to new ckpt * fixed slice format * fix init and slice * changed ref for placeholder TODO * added copies from Llama * applied styling * applied fix-copies * fixed docs * update weight dtype reconversion for sharded weights * removed Nullable input ids * Removed unnecessary output attentions in Module * added embedding weight initialziation * removed unused past_key_values * fixed deterministic * Fixed RMS Norm and added copied from * removed input_embeds * applied make style * removed nullable input ids from sequence classification model * added copied from GPTJ * added copied from Llama on FlaxMistralDecoderLayer * added copied from to FlaxMistralPreTrainedModel methods * fix test deprecation warning * freeze gpt neox random_params and fix copies * applied make style * fixed doc issue * skipped docstring test to allign # copied from * applied make style * removed FlaxMistralForSequenceClassification * removed unused padding_idx * removed more sequence classification * removed sequence classification * applied styling and consistency * added copied from in tests * removed sequence classification test logic * applied styling * applied make style * removed freeze and fixed copies * undo test change * changed repeat_kv to tile * fixed to key value groups * updated copyright year * split casual_mask * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest * went back to 2023 for tests_pr_documentation_tests * went back to 2024 * changed tile to repeat * applied make style * empty for retry on Wav2Vec2	2024-01-31 14:19:02 +01:00
Matt	7a4961007a	Wrap Keras methods to support BatchEncoding (#28734 ) * Shim the Keras methods to support BatchEncoding * Extract everything to a convert_batch_encoding function * Convert BatchFeature too (thanks Amy) * tf.keras -> keras	2024-01-31 13:18:42 +00:00
Julien Chaumond	721e2d94df	canonical repos moves (#28795 ) * canonical repos moves * Style --------- Co-authored-by: Lysandre <lysandre@huggingface.co>	2024-01-31 14:18:31 +01:00
Hieu Lam	bebeeee012	Resolve DeepSpeed cannot resume training with PeftModel (#28746 ) * fix: resolve deepspeed resume peft model issues * chore: update something * chore: update model instance pass into is peft model checks * chore: remove hard code value to tests * fix: format code	2024-01-31 13:58:26 +01:00
Patrick von Platen	65a926e82b	[Whisper] Refactor forced_decoder_ids & prompt ids (#28687 ) * up * Fix more * Correct more * Fix more tests * fix fast tests * Fix more * fix more * push all files * finish all * make style * Fix timestamp wrap * make style * make style * up * up * up * Fix lang detection behavior * Fix lang detection behavior * Add lang detection test * Fix lang detection behavior * make style * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * better error message * make style tests * add warning --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-01-31 14:02:07 +02:00
Younes Belkada	f9f1f2ac5e	[`HFQuantizer`] Remove `check_packages_compatibility` logic (#28789 ) remove `check_packages_compatibility` logic	2024-01-31 03:21:27 +01:00
tom-p-reichel	ae0c27adfa	don't initialize the output embeddings if we're going to tie them to input embeddings (#28192 ) * test that tied output embeddings aren't initialized on load * don't initialize the output embeddings if we're going to tie them to the input embeddings	2024-01-31 02:19:18 +01:00
Alessio Serra	a937425e94	Prevent MLflow exception from disrupting training (#28779 ) Modified MLflow logging metrics from synchronous to asynchronous Co-authored-by: codiceSpaghetti <alessio.ser@hotmail.it>	2024-01-31 02:10:44 +01:00
Younes Belkada	d703eaaeff	[`bnb`] Fix bnb slow tests (#28788 ) fix bnb slow tests	2024-01-31 01:31:20 +01:00
Matt	74c9cfeaa7	Pin Torch to <2.2.0 (#28785 ) * Pin torch to <2.2.0 * Pin torchvision and torchaudio as well * Playing around with versions to see if this helps * twiddle something to restart the CI * twiddle it back * Try changing the natten version * make fixup * Revert "Try changing the natten version" This reverts commit `de0d6592c3`. * make fixup * fix fix fix * fix fix fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-30 23:01:12 +01:00
Matt	415e9a0980	Add tf_keras imports to prepare for Keras 3 (#28588 ) * Port core files + ESM (because ESM code is odd) * Search-replace in modelling code * Fix up transfo_xl as well * Fix other core files + tests (still need to add correct import to tests) * Fix cookiecutter * make fixup, fix imports in some more core files * Auto-add imports to tests * Cleanup, add imports to sagemaker tests * Use correct exception for importing tf_keras * Fixes in modeling_tf_utils * make fixup * Correct version parsing code * Ensure the pipeline tests correctly revert to float32 after each test * Ensure the pipeline tests correctly revert to float32 after each test * More tf.keras -> keras * Add dtype cast * Better imports of tf_keras * Add a cast for tf.assign, just in case * Fix callback imports	2024-01-30 17:26:36 +00:00
amyeroberts	1d489b3e61	Task-specific pipeline init args (#28439 ) * Abstract out pipeline init args * Address PR comments * Reword * BC PIPELINE_INIT_ARGS * Remove old arguments * Small fix	2024-01-30 16:54:57 +00:00
amyeroberts	2fa1c808ae	[`Backbone`] Use `load_backbone` instead of `AutoBackbone.from_config` (#28661 ) * Enable instantiating model with pretrained backbone weights * Remove doc updates until changes made in modeling code * Use load_backbone instead * Add use_timm_backbone to the model configs * Add missing imports and arguments * Update docstrings * Make sure test is properly configured * Include recent DPT updates	2024-01-30 16:54:09 +00:00
Yih-Dar	c24c52454a	Further pin pytest version (in a temporary way) (#28780 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-30 17:48:49 +01:00
fxmarty	6f7d5db58c	Fix transformers.utils.fx compatibility with torch<2.0 (#28774 ) guard sdpa on torch>=2.0	2024-01-30 14:54:42 +01:00
Thien Tran	5c8d941d66	Use Conv1d for TDNN (#25728 ) * use conv for tdnn * run make fixup * update TDNN * add PEFT LoRA check * propagate tdnn warnings to others * add missing imports * update TDNN in wav2vec2_bert * add missing imports	2024-01-30 09:33:55 +01:00
Younes Belkada	866253f85e	[`HfQuantizer`] Move it to "Developper guides" (#28768 ) Update _toctree.yml	2024-01-30 07:20:20 +01:00
Poedator	d78e78a0e4	`HfQuantizer` class for quantization-related stuff in `modeling_utils.py` (#26610 ) * squashed earlier commits for easier rebase * rm rebase leftovers * 4bit save enabled @quantizers * TMP gptq test use exllama * fix AwqConfigTest::test_wrong_backend for A100 * quantizers AWQ fixes * _load_pretrained_model low_cpu_mem_usage branch * quantizers style * remove require_low_cpu_mem_usage attr * rm dtype arg from process_model_before_weight_loading * rm config_origin from Q-config * rm inspect from q_config * fixed docstrings in QuantizationConfigParser * logger.warning fix * mv is_loaded_in_4(8)bit to BnbHFQuantizer * is_accelerate_available error msg fix in quantizer * split is_model_trainable in bnb quantizer class * rm llm_int8_skip_modules as separate var in Q * Q rm todo * fwd ref to HFQuantizer in type hint * rm note re optimum.gptq.GPTQQuantizer * quantization_config in __init__ simplified * replaced NonImplemented with create_quantized_param * rm load_in_4/8_bit deprecation warning * QuantizationConfigParser refactoring * awq-related minor changes * awq-related changes * awq config.modules_to_not_convert * raise error if no q-method in q-config in args * minor cleanup * awq quantizer docstring * combine common parts in bnb process_model_before_weight_loading * revert test_gptq * .process_model_ cleanup * restore dict config warning * removed typevars in quantizers.py * cleanup post-rebase 16 jan * QuantizationConfigParser classmethod refactor * rework of handling of unexpected aux elements of bnb weights * moved q-related stuff from save_pretrained to quantizers * refactor v1 * more changes * fix some tests * remove it from main init * ooops * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix awq issues * fix * fix * fix * fix * fix * fix * add docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/hf_quantizer.md * address comments * fix * fixup * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address final comment * update * Update src/transformers/quantizers/base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * add kwargs update * fixup * add `optimum_quantizer` attribute * oops * rm unneeded file * fix doctests --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-01-30 02:48:25 +01:00
Zhan Ling	1f5590d32e	Move CLIP _no_split_modules to CLIPPreTrainedModel (#27841 ) Add _no_split_modules to CLIPModel	2024-01-30 02:15:58 +01:00
Omar Sanseviero	a989c6c6eb	Don't allow passing `load_in_8bit` and `load_in_4bit` at the same time (#28266 ) * Update quantization_config.py * Style * Protect from setting directly * add tests * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-01-30 01:43:40 +01:00
ThibaultLengagne	cd2eb8cb2b	Add French translation: french README.md (#28696 ) * doc: french README Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> * doc: Add Depth Anything Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> * doc: Add french link in other docs Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> * doc: Add missing links in fr docs * doc: fix several mistakes in translation Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> --------- Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> Co-authored-by: Sarapuce <alexandreh@padok.fr>	2024-01-29 10:07:49 -08:00
Ajay Patel	a055d09e11	Support saving only PEFT adapter in checkpoints when using PEFT + FSDP (#28297 ) * Update trainer.py * Revert "Update trainer.py" This reverts commit 0557e2cc9effa3a41304322032239a3874b948a7. * Make trainer.py use adapter_only=True when using FSDP + PEFT * Support load_best_model with adapter_only=True * Ruff format * Inspect function args for save_ load_ fsdp utility functions and only pass adapter_only=True if they support it	2024-01-29 17:10:15 +00:00
Sanchit Gandhi	da3c79b245	[Whisper] Make tokenizer normalization public (#28136 ) * [Whisper] Make tokenizer normalization public * add to docs	2024-01-29 16:07:35 +00:00
xkszltl	e694e985d7	Fix typo of `Block`. (#28727 )	2024-01-29 15:25:00 +00:00
amyeroberts	9e8f35fa28	Mark test_constrained_beam_search_generate as flaky (#28757 ) * Make test_constrained_beam_search_generate as flaky * Update tests/generation/test_utils.py	2024-01-29 15:22:25 +00:00
amyeroberts	0f8d015a41	Pin pytest version <8.0.0 (#28758 ) * Pin pytest version <8.0.0 * Update setup.py * make deps_table_update	2024-01-29 15:22:14 +00:00
Julien Chaumond	26aa03a252	small doc update for CamemBERT (#28644 )	2024-01-29 15:46:32 +01:00
Nate Cibik	0548af54cc	Enable Gradient Checkpointing in Deformable DETR (#28686 ) * Enabled gradient checkpointing in Deformable DETR * Enabled gradient checkpointing in Deformable DETR encoder * Removed # Copied from headers in modeling_deta.py to break dependence on Deformable DETR code	2024-01-29 10:10:40 +00:00
Wesley Gifford	f72c7c22d9	PatchtTST and PatchTSMixer fixes (#28083 ) * 🐛 fix .max bug * remove prediction_length from regression output dimensions * fix parameter names, fix output names, update tests * ensure shape for PatchTST * ensure output shape for PatchTSMixer * update model, batch, and expected for regression distribution test * update test expected Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com> * Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * standardize on patch_length Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com> * Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make arguments more explicit Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com> * adjust prepared inputs Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com> --------- Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com> Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-01-29 10:09:26 +00:00
Vinyzu	3a08cc485f	[Docs] Fix Typo in English & Japanese CLIP Model Documentation (TMBD -> TMDB) (#28751 ) * [Docs] Fix Typo in English CLIP model_doc * [Docs] Fix Typo in Japanese CLIP model_doc	2024-01-29 10:06:51 +00:00
Klaus Hipp	39fa400969	Fix input data file extension in examples (#28741 )	2024-01-29 10:06:31 +00:00
Yih-Dar	5649c0cbb8	Fix `DepthEstimationPipeline`'s docstring (#28733 ) * fix * fix * Fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-29 10:42:55 +01:00
Angela Yi	243e186efb	Add serialization logic to pytree types (#27871 ) * Add serialized type name to pytrees * Modify context * add serde test	2024-01-29 10:41:20 +01:00
amyeroberts	f1cc615721	[`Siglip`] protect from imports if sentencepiece not installed (#28737 ) [Siglip] protect from imports if sentencepiece not installed	2024-01-28 15:10:14 +00:00
Joao Gante	03cc17775b	Generate: deprecate old src imports (#28607 )	2024-01-27 15:54:19 +00:00
Joao Gante	a28a76996c	Falcon: removed unused function (#28605 )	2024-01-27 15:52:59 +00:00
Sanchit Gandhi	de13a951b3	[Flax] Update no init test for Flax v0.7.1 (#28735 )	2024-01-26 18:20:39 +00:00
Steven Liu	abe0289e6d	[docs] Fix datasets in guides (#28715 ) * change datasets * fix	2024-01-26 09:29:07 -08:00
Yih-Dar	f8b7c4345a	Unpin pydantic (#28728 ) * try pydantic v2 * try pydantic v2 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-26 17:39:33 +01:00
Scruel Tao	3aea38ce61	fix: suppress `GatedRepoError` to use cache file (fix #28558 ). (#28566 ) * fix: suppress `GatedRepoError` to use cache file (fix #28558). * move condition_to_return parameter back to outside.	2024-01-26 16:25:08 +00:00
Matt	708b19eb09	Stop confusing the TF compiler with ModelOutput objects (#28712 ) * Stop confusing the TF compiler with ModelOutput objects * Stop confusing the TF compiler with ModelOutput objects	2024-01-26 12:22:29 +00:00
Yih-Dar	a638de1987	Fix `weights_only` (#28725 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-26 13:00:49 +01:00
Shukant Pal	d6ac8f4ad2	Initialize _tqdm_active with hf_hub_utils.are_progress_bars_disabled(… (#28717 ) Initialize _tqdm_active with hf_hub_utils.are_progress_bars_disabled() to respect HF_HUB_DISABLE_PROGRESS_BARS It seems like enable_progress_bar() and disable_progress_bar() sync up with huggingface_hub, but the initial value is always True. This changes will make sure the user's preference is respected implicity on initialization.	2024-01-26 11:59:34 +00:00
D	3a46e30dd1	[`docs`] Update preprocessing.md (#28719 ) * Update preprocessing.md adjust ImageProcessor link to working target (same as in lower section of file) * Update preprocessing.md	2024-01-26 11:58:57 +00:00
Turetskii Mikhail	1f47a24aa1	fix: corrected misleading log message in save_pretrained function (#28699 )	2024-01-26 11:52:53 +00:00
Facico	bbe30c6968	support PeftMixedModel signature inspect (#28321 ) * support PeftMixedModel signature inspect * import PeftMixedModel only peft>=0.7.0 * Update src/transformers/trainer.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * fix styling * Update src/transformers/trainer.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style fixup * fix note --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-01-26 12:05:01 +01:00
fxmarty	8eb74c1c89	Fix duplicate & unnecessary flash attention warnings (#28557 ) * fix duplicate & unnecessary flash warnings * trigger ci * warning_once * if/else order --------- Co-authored-by: Your Name <you@example.com>	2024-01-26 09:37:04 +01:00
Yih-Dar	142ce68389	Don't fail when `LocalEntryNotFoundError` during `processor_config.json` loading (#28709 ) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-26 09:02:32 +01:00

15028 Commits