* try fixing push-ci
* move to new runners
* move benchmark.yml to new runners
* move doctest_job.yml to new runners
* move doctests.yml to new runners
* move push-important-models.yml to new runners
* move self-pr-slow-ci.yml to new runners
* fix typo
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix working directory
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix working directory
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* improve code
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update the KeyError raised in _save_checkpoint to prevent confusion over missing metric keys
* Fix grammar error and case sensitivity.
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Add the same KeyError update to the _evaluate function to align with _save_checkpoint
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
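A minimal sketch of the check these commits describe, assuming a Trainer-like setup; the helper name and message wording are illustrative, not the exact Trainer code:

```python
# Raise an informative KeyError when the configured metric_for_best_model
# is absent from the evaluation metrics, instead of a bare key lookup failure.
def get_best_metric(metrics: dict, metric_for_best_model: str) -> float:
    metric_key = metric_for_best_model
    if not metric_key.startswith("eval_"):
        metric_key = f"eval_{metric_key}"
    try:
        return metrics[metric_key]
    except KeyError as exc:
        raise KeyError(
            f"The `metric_for_best_model` training argument is set to '{metric_key}', "
            f"which is not found in the evaluation metrics. "
            f"The available evaluation metrics are: {list(metrics.keys())}."
        ) from exc
```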
* When we set self.dt_proj.bias = None, the bias parameter is removed from the model; later assigning a plain tensor to self.dt_proj.bias then raises a TypeError because PyTorch expects a Parameter object.
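A small PyTorch sketch of the failure mode and the fix; the Linear layer stands in for dt_proj:

```python
import torch.nn as nn

proj = nn.Linear(16, 16, bias=True)
saved_bias = proj.bias.data.clone()

proj.bias = None          # removes the registered bias parameter
# proj.bias = saved_bias  # TypeError: cannot assign 'torch.FloatTensor' as parameter
#                         # 'bias' (torch.nn.Parameter or None expected)

# Re-wrapping the tensor as a Parameter (or copying into .data before the
# assignment to None) avoids the error.
proj.bias = nn.Parameter(saved_bias)
```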
* Trainer - deprecate tokenizer for processing_class
* Extend change across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version
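A minimal sketch of the deprecation pattern, assuming a Trainer-like constructor; the message and class body are illustrative, not the exact implementation:

```python
import warnings


class Trainer:
    def __init__(self, processing_class=None, tokenizer=None):
        if tokenizer is not None:
            # Keep accepting the old argument, but point users to the new name.
            warnings.warn(
                "`tokenizer` is deprecated; use `processing_class` instead.",
                FutureWarning,
            )
            if processing_class is None:
                processing_class = tokenizer
        self.processing_class = processing_class
```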
* add support for custom inputs and batched inputs in ProcessorTesterMixin
* Fix batch_size behavior ProcessorTesterMixin
* Change the format of batched prepare inputs
* Remove override test pixtral processor
* Remove unnecessary tests and cleanup after new prepare_inputs functions
* Fix InstructBlipVideo image processor
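A hedged sketch of the batched-input idea; the helper names and sample values are hypothetical, not the actual ProcessorTesterMixin API:

```python
import numpy as np

# When batch_size is given, return lists so processors are exercised with
# batched text and image inputs as well as single inputs.
def prepare_text_inputs(batch_size=None):
    text = "lower newer"
    return text if batch_size is None else [text] * batch_size

def prepare_image_inputs(batch_size=None):
    image = np.random.randint(0, 255, (30, 40, 3), dtype=np.uint8)
    return image if batch_size is None else [image] * batch_size
```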
* fix(copy): fixup copy
* fix(deformable_detr): move weight initialization to the right place
* fix(grounding_dino): move weight initialization to the right place
* fix(rt_detr): move weight initialization to the right place
* [run-slow] deformable_detr, grounding_dino, rt_detr
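A sketch of the pattern behind these fixes, assuming the usual PreTrainedModel convention: custom initialization lives in `_init_weights` so `post_init()` / `from_pretrained` apply it, rather than in submodule constructors. The class below is a toy stand-in, not the actual DETR code:

```python
import torch.nn as nn


class ToyPreTrainedModel(nn.Module):
    def _init_weights(self, module):
        # Centralized initialization applied once per module by the framework.
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
```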
* Remove max_new_tokens arg
* Add ASR pipeline to testing
* make fixup
* Factor the output test out into a util
* Full error reporting
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Small comment
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Add include_loss_for_metrics
* Fix styling
* Initialize inputs and losses to avoid AttributeError
* Ruff styling
* Refactor compute_metrics and update EvalPrediction
* Change Naming
* Added include_for_metrics to group both args
* Fix style
* Change warnings to logger
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
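A hedged sketch of what including losses for metrics enables; the compute_metrics body is illustrative and reads the losses defensively via getattr:

```python
import numpy as np

# With losses included for metrics, compute_metrics can use the per-batch
# losses alongside predictions and labels.
def compute_metrics(eval_pred):
    predictions = np.argmax(eval_pred.predictions, axis=-1)
    metrics = {"accuracy": float((predictions == eval_pred.label_ids).mean())}
    losses = getattr(eval_pred, "losses", None)
    if losses is not None:
        metrics["mean_eval_loss"] = float(np.mean(losses))
    return metrics
```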
* Validate the eval dataset in advance.
* format
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* format
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
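A minimal sketch of validating the evaluation setup up front, assuming TrainingArguments-style attribute names; the exact check in Trainer may differ:

```python
# Fail early with a clear error instead of surfacing the problem mid-training.
def validate_eval_setup(args, eval_dataset) -> None:
    if args.eval_strategy != "no" and eval_dataset is None:
        raise ValueError(
            f"`args.eval_strategy` is set to '{args.eval_strategy}', "
            "but no `eval_dataset` was passed to the Trainer."
        )
```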
* fix(m2m_100): skip dropout in eval for flash_attn
* fix(misc): skip dropout in eval for flash attn various models
* chore(m2m_100): copy flash attn from bart
* chore: run make fix-copies
* [run-slow] bart, m2m_100
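A sketch of the essential change, using SDPA as a stand-in for the flash-attention call: the dropout probability passed to the kernel must be zero outside training so eval matches the eager path:

```python
import torch.nn as nn
import torch.nn.functional as F


class ToyAttention(nn.Module):
    def __init__(self, dropout: float = 0.1):
        super().__init__()
        self.dropout = dropout

    def forward(self, query, key, value):
        # Only apply attention dropout while training.
        dropout_p = self.dropout if self.training else 0.0
        return F.scaled_dot_product_attention(query, key, value, dropout_p=dropout_p)
```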
* refactor image features selection
* break line
* remove whitespace
* add pr comments: include projection and rename function
* make fix-copies
* fix get_image_feature in vip llava
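A hedged sketch of the refactored selection helper; the function and argument names are illustrative, not the exact modeling code:

```python
import torch.nn as nn

# Select the hidden states of the chosen vision layer, optionally drop the
# CLS token, and apply the multi-modal projection in one place.
def get_image_features(vision_hidden_states, feature_layer: int, strategy: str, projector: nn.Module):
    features = vision_hidden_states[feature_layer]
    if strategy == "default":
        features = features[:, 1:]  # drop the CLS token
    return projector(features)
```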
* Fix Mamba slow path bug with dtype mismatch.
* Update test_modeling_mamba.py
* Improve style.
* Fix issue with cache position of dtype mismatch test.
* Change test for slow path.
* Revert changes.
* Switch to buggy code and add test to catch it.
* Fix the dtype mismatch bug and add test code to verify it.
* Fix minor bug with test.
* Fix incorrect dtype of model output.
* Fix incorrect dtype of cache.
* Fix incorrect dtype of ssm cache.
* Fix incorrect dtype of conv state.
* Remove assertion for ssm state.
* Add assertion for conv state dtype.
* Fix all issues with dtype mismatch test.
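A hedged sketch of the dtype fix on the slow path: values written into the cached convolution state are cast to the cache's dtype so half-precision inputs don't mismatch it. Shapes and names are illustrative:

```python
import torch

# conv_state: (batch, channels, kernel_size); hidden_states: (batch, channels, seq_len)
def update_conv_state(conv_state: torch.Tensor, hidden_states: torch.Tensor) -> torch.Tensor:
    conv_state = conv_state.roll(shifts=-1, dims=-1)
    # Cast the incoming value to the cache dtype to keep the cache consistent.
    conv_state[:, :, -1] = hidden_states[:, :, 0].to(conv_state.dtype)
    return conv_state
```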
* HQQ model serialization attempt
* fix hqq dispatch and unexpected keys
* style
* remove check_old_param
* revert to check HQQLinear in quantizer_hqq.py
* update HqqConfig default params
* make ci happy
* revert to HQQLinear check in quantizer_hqq.py
* check hqq_min version 0.2.0
* set axis=1 as default in quantization_config.py
* validate_env with hqq>=0.2.0 version message
* deprecated hqq kwargs message
* make ci happy
* remove run_expected_keys_check hack + bump to 0.2.1 min hqq version
* fix unexpected_keys hqq update
* add pre_quantized check
* add update_expected_keys to base quantizer
* ci base.py fix?
* fix "quantization typo" src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix post merge
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
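A hedged sketch of the version gating mentioned above (the 0.2.1 minimum follows the commit messages); the exact messages and call site in the quantizer may differ:

```python
import importlib.metadata

from packaging import version

def validate_hqq_version(min_version: str = "0.2.1") -> None:
    try:
        installed = importlib.metadata.version("hqq")
    except importlib.metadata.PackageNotFoundError as exc:
        raise ImportError("HQQ quantization requires the `hqq` package.") from exc
    if version.parse(installed) < version.parse(min_version):
        raise ImportError(
            f"Serializing HQQ-quantized models requires hqq>={min_version}, found {installed}."
        )
```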
* Enable non-safetensor serialization and deserialization for TorchAoConfig quantized model
Summary:
After https://github.com/huggingface/huggingface_hub/pull/2440 added non-safetensors serialization and deserialization to huggingface_hub, we can now add support for it in transformers.
Note that we don't plan to add safetensors serialization, due to the different goals of wrapper tensor subclasses and safetensors; see the README for more details.
Test Plan:
Tested locally.
* formatting
* minor fix
* formatting
* address comments
* comments
* minor fix
* update doc
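A usage sketch assuming a CUDA machine and the documented TorchAoConfig API; the model id and output directory are illustrative. Since safetensors cannot represent torchao's wrapper tensor subclasses, the model is saved with safe_serialization=False:

```python
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", device_map="auto", quantization_config=quant_config
)

# Non-safetensors serialization, then reload as usual.
model.save_pretrained("llama-3.1-8b-int4wo", safe_serialization=False)
reloaded = AutoModelForCausalLM.from_pretrained("llama-3.1-8b-int4wo", device_map="auto")
```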
* refactor compressed tensor quantizer
* fix return type
* update to union
* fix gate_logits typing
* fix num_experts type
* fix typing
* run fix-copies
* add doc for top_k
* run fix-copies
* empty commit to trigger CI
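A hedged sketch of the corrected typing on the MoE auxiliary-loss helper; the signature mirrors the commit messages (tuple of per-layer gate logits, integer num_experts, documented top_k) while the body is elided:

```python
from typing import Optional, Tuple, Union

import torch

def load_balancing_loss_func(
    gate_logits: Union[torch.Tensor, Tuple[torch.Tensor, ...], None],
    num_experts: Optional[int] = None,
    top_k: int = 2,  # number of experts each token is routed to
    attention_mask: Optional[torch.Tensor] = None,
) -> Union[torch.Tensor, int]:
    if gate_logits is None or num_experts is None:
        return 0
    ...  # auxiliary-loss computation elided
```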