transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Raushan Turganbay	21d5025826	Attn implementation for composite models (#32238 ) * first try * codestyle * idefics2 is happy * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma * fix-copies * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo * blip-2 needs to init vision from config * when was this removed O_o * minor fix * tests * this way? * tests * model-agnostic code * codestyle * add tests for idefics * modify general test for VLMs * no generation test for vlm yet! * no generation test here also * wanr in VIT-SDPA if output attn * add more tests * user can pass dict as attn impl * repo consistency * update * muicgen * no prints * forgot speech enc-dec and clip * how many composite models we have? * musicgen meelody is same as mudicgen * +siglip * fix tests + add some more * remove idefics custom overriden code * make idefics2 automappable * nits * skip tests * doctests * Update src/transformers/models/idefics2/configuration_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/clip/test_modeling_clip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/configuration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * major update, no need for automap * clean up * add FA2 test * more tests * style * skip tests * why did these started failing now? * no attributes for FA2 needed * one tiny test * address comment about FA2 false warning * style * add new models and resolve conflicts * fix copies * let it be this way for now, come back tomorrow to review * some more fixes * update * more updates * update * fix copies * style and tests * another big update * fix tests * fix tests * update * another update * fix tests * fix copies * fix tests --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-10-22 06:54:44 +02:00
Andrés Marafioti	32590b5ecb	Fix method name which changes in tutorial (#34252 ) The method `model_download_tool` was called `model_download_counter` earlier in the tutorial, this raises an error when following the code.	2024-10-21 14:21:52 -03:00
Matt	f701b98e4a	Add a doc section on writing generation prompts (#34248 ) Add a section on writing generation prompts	2024-10-21 14:35:57 +01:00
Yoni Gozlan	a4122813d1	Add DetrImageProcessorFast (#34063 ) * add fully functionning image_processing_detr_fast * Create tensors on the correct device * fix copies * fix doc * add tests equivalence cpu gpu * fix doc en * add relative imports and copied from * Fix copies and nit	2024-10-21 09:05:05 -04:00
Yoni Gozlan	24bdc94da5	Change Paligemma import logging to work with modular (#34211 ) * change import logging * fix CI	2024-10-21 08:55:27 -04:00
Raushan Turganbay	ca541bd4f4	Generation tests: don't rely on main input name (#34228 ) * don't rely on main input name * update	2024-10-21 10:00:14 +02:00
Matthew Hoffman	816f442496	Only cast logits to float when computing loss (#34147 ) * Only cast logits to float when computing loss Some misses from #31292 and #33902 * Move logits.float() into existing if labels is not None branch	2024-10-18 18:15:26 +02:00
Matt	e46e3bc173	Fix UDOP dtype issue (#34180 ) * Trigger UDOP tests * Try forcing dtype in LayoutLMV3 * Do checks to see where uint8 is getting in * Do checks to see where uint8 is getting in * Found it! * Add .astype(np.float32) * Remove forced check, make fixup * Checking where exactly the uint8 creeps in * More checking on the uint8 issues * Manually upcast in rescale() * Remove UDOP trigger	2024-10-18 16:54:58 +01:00
Cyril Vallez	6604764007	add Glm (#33823 ) * Create modular_glm.py * Update modular_glm.py * Finalize architecture without all attentions * Add all attentions modules * Finalize modular * Update given last version * Last update * Finalize model * Finalize converter * Update convert_glm_weights_to_hf.py * style * style * Create __init__.py * Aff all inits * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Correct the rotary embeddings * Remove apply_residual_connection_post_layernorm (always false) * remove use_rms_norm (always true) * remove past_layer_norm (always true) * Update __init__.py * Update config and license * start adding tests and doc * Add doc + style * Update test_modeling_glm.py * Add dummies * Apply correct modeling * Refactor attention to follow llama * Update __init__.py * Update convert_glm_weights_to_hf.py * Correct bias * remove linear_bias and pdrop (never used) * apply modular * Simplify converter * remove dummies + style * add model_input_names * Add pretraining_tp to config for when eager attention is used * Update modular to remove all pretraining_tp * Update test_modeling_glm.py * Update the __all__ * Update __all__ * Update __init__.py * Update test_modeling_glm.py * add revisions * Add the correct repos and revisions * style * Update __init__.py * update exports * remove import of modular files * style * Apply Llama changes + refine converter * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * style * Use new modular converter * add pretrainedmodel to init * style * Update test_modeling_glm.py * Move config outside modular to please CI about docstrings * Add dummies to please CI * Update glm.md * Update glm.md	2024-10-18 17:41:12 +02:00
Lysandre Debut	e95ea479ee	Informative 2 (#34154 ) * Informative * style * Informative 2 * Apply suggestions from code review Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-10-18 14:12:15 +02:00
byi8220	0437d6cd03	Fix broken test decorator `require_torch_up_to_2_accelerators` (#34201 ) * fix broken require_torch_up_to_2_accelerators * make style	2024-10-18 13:54:55 +02:00
Raushan Turganbay	5a5b590d06	BLIP: fix input expansion logic (#34225 ) fix	2024-10-18 12:17:30 +02:00
Arthur	b54109c746	Fix-red-ci (#34230 ) * fix copies, skip fx for llama * styke * re-fix copies * last? * style	2024-10-17 23:38:35 +02:00
Zach Mueller	6ba31a8a94	Enable users to use their own loss functions + deal with prefetching for grad accum (#34198 ) * bookmark * Bookmark * Bookmark * Actually implement * Pass in kwarg explicitly * Adjust for if we do or don't have labels * Bookmark fix for od * bookmark * Fin * closer * Negate accelerate grad accum div * Fixup not training long enough * Add in compute_loss to take full model output * Document * compute_loss -> compute_loss_fn * Add a test * Refactor * Refactor * Uncomment tests * Update tests/trainer/test_trainer.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-10-17 17:01:56 -04:00
Pedro Cuenca	7a06d07e14	Support Llama 3.2 conversion (text models) (#33778 ) * Support Llama 3.2 conversion (text models) Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Fix rope factor * Update chat template Initialize from a well-known template. The guidance is that the changes should be applied to 3.1 models as well. * Remove import * Support Llama Guard 3 conversion * Tokenizer details * Fix eos added token in base models * Fix generation config for base models * Specify revision for known tokenizers * Style * Reuse chat templates for older models * Improve error when converting tokenizer < Llama 3 --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>	2024-10-17 22:37:37 +02:00
Arthur	c1c7e89620	Fix Gradient Accumulation issue (#34191 ) * quick fix * 3 losses * oups * fix * nits * check how it scales for special models * propagate for conditiona detr * propagate * propagate * propagate * fixes * propagate changes * update * fixup * nits * f string * fixes * more fixes * ? * nit * arg annoying f string * nits * grumble * update * nit * refactor * fix fetch tests * nit * nit * Update src/transformers/loss/loss_utils.py Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> * update * nit * fixup * make pass * nits * port code to more models * fixup * ntis * arf * update * update * nits * update * fix * update * nits * fine * agjkfslga.jsdlkgjklas * nits * fix fx? * update * update * styel * fix imports * update * update * fixup to fix the torch fx? --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2024-10-17 22:34:40 +02:00
Joao Gante	f51ac9e059	Generate: visit non-llm `prepare_inputs_for_generation` (#34199 ) * tmp * all visited * test all * Update src/transformers/models/moshi/modeling_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * delete another one :D --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-10-17 16:53:48 +01:00
David Chanin	1d2c29f0b3	Fix bus error when using GPT2 on M1 macs (#34031 ) There's a bug on M1 macs with transformer >= 4.43.0 and torch >= 2.1.0, where if a model has tied embeddings, then the fast loading from #31771 causes a bus error when the model is actually run. This can be solved by disabling `_supports_param_buffer_assignment` for these models. More info in comments in #33357	2024-10-17 17:39:04 +02:00
Guang Yang	9470c00042	Llama3 and Llama2 are ExecuTorch compatible (#34101 ) Llama3_1b and Llama2_7b are ExecuTorch compatible Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-17 17:33:19 +02:00
Name	7f5088503f	removes decord (#33987 ) * removes decord dependency optimize np Revert "optimize" This reverts commit faa136b51ec4ec5858e5b0ae40eb7ef89a88b475. helpers as documentation pydoc missing keys * make fixup * require_av --------- Co-authored-by: ad <hi@arnaudiaz.com>	2024-10-17 17:27:34 +02:00
Sebastian Schoennenbeck	f2846ad2b7	Fix for tokenizer.apply_chat_template with continue_final_message=True (#34214 ) * Strip final message * Do full strip instead of rstrip * Retrigger CI --------- Co-authored-by: Matt <rocketknight1@gmail.com>	2024-10-17 15:45:07 +01:00
Christopher McGirr	b57c7bce21	fix(Wav2Vec2ForCTC): torch export (#34023 ) * fix(Wav2Vec2ForCTC): torch export Resolves the issue described in #34022 by implementing the masking of the hidden states using an elementwise multiplication rather than indexing with assignment. The torch.export functionality seems to mark the tensor as frozen even though the update is legal. This change is a workaround for now to allow the export of the model as a FxGraph. Further investigation is required to find the real solution in pytorch. * [run-slow] hubert, unispeech, unispeech_sat, wav2vec2	2024-10-17 15:41:55 +01:00
Yih-Dar	fce1fcfe71	Ping team members for new failed tests in daily CI (#34171 ) * ping * fix * fix * fix * remove runner * update members --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-17 16:11:52 +02:00
Amos You	aa3e35ac67	Fix warning message for fp32_cpu_offloading in bitsandbytes configs (#34079 ) * change cpu offload warning for fp8 quantization * change cpu offload warning for fp4 quantization * change cpu offload variable name for fp8 and fp4 quantization	2024-10-17 15:11:33 +02:00
larin92	6d2b203339	Update `trainer._get_eval_sampler()` to support `group_by_length` arg (#33514 ) Update 'trainer._get_eval_sampler()' to support 'group_by_length' argument Trainer didn't support grouping by length for evaluation, which made evaluation slow with 'eval_batch_size'>1. Updated 'trainer._get_eval_sampler()' method was based off of 'trainer._get_train_sampler()'.	2024-10-17 14:43:29 +02:00
Marc Sun	3f06f95ebe	Revert "Fix FSDP resume Initialization issue" (#34193 ) Revert "Fix FSDP resume Initialization issue (#34032)" This reverts commit `4de1bdbf63`.	2024-10-16 15:25:18 -04:00
Reza Rahemtola	3a10c6192b	Avoid using torch's Tensor or PIL's Image in chat template utils if not available (#34165 ) * fix(utils): Avoid using torch Tensor or PIL Image if not available * Trigger CI --------- Co-authored-by: Matt <rocketknight1@gmail.com>	2024-10-16 16:01:18 +01:00
Yoni Gozlan	bd5dc10fd2	Fix wrong name for llava onevision and qwen2_vl in tokenization auto (#34177 ) * nit fix wrong llava onevision name in tokenization auto * add qwen2_vl and fix style	2024-10-16 16:48:52 +02:00
steveepreston	cc7d8b87e1	Revert `accelerate` error caused by `46d09af` (#34197 ) Revert `accelerate` bug	2024-10-16 16:13:41 +02:00
alpertunga-bile	98bad9c6d6	[fix] fix token healing tests and usage errors (#33931 ) * auto-gptq requirement is removed & model is changed & tokenizer pad token is assigned * values func is changed with extensions & sequence key value bug is fixed * map key value check is added in ExtensionsTree * empty trimmed_ids bug is fixed * tail_id IndexError is fixed * empty trimmed_ids bug fix is updated for failed test * too much specific case for specific tokenizer is removed * input_ids check is updated * require auto-gptq import is removed * key error check is changed with empty list check * empty input_ids check is added * empty trimmed_ids fix is checked with numel function * usage change comments are added * test changes are commented * comment style and quality bugs are fixed * test comment style and quality bug is fixed	2024-10-16 14:22:55 +02:00
Yoach Lacombe	9ba021ea75	Moshi integration (#33624 ) * clean mimi commit * some nits suggestions from Arthur * make fixup * first moshi WIP * converting weights working + configuration + generation configuration * finalize converting script - still missing tokenizer and FE and processor * fix saving model w/o default config * working generation * use GenerationMixin instead of inheriting * add delay pattern mask * fix right order: moshi codes then user codes * unconditional inputs + generation config * get rid of MoshiGenerationConfig * blank user inputs * update convert script:fix conversion, add tokenizer, feature extractor and bf16 * add and correct Auto classes * update modeling code, configuration and tests * make fixup * fix some copies * WIP: add integration tests * add dummy objects * propose better readiblity and code organisation * update tokenization tests * update docstrigns, eval and modeling * add .md * make fixup * add MoshiForConditionalGeneration to ignore Auto * revert mimi changes * re * further fix * Update moshi.md * correct md formating * move prepare causal mask to class * fix copies * fix depth decoder causal * fix and correct some tests * make style and update .md * correct config checkpoitn * Update tests/models/moshi/test_tokenization_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/models/moshi/test_tokenization_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * make style * Update src/transformers/models/moshi/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * change firm in copyrights * udpate config with nested dict * replace einsum * make style * change split to True * add back splt=False * remove tests in convert * Update tests/models/moshi/test_modeling_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add default config repo + add model to FA2 docstrings * remove logits float * fix some tokenization tests and ignore some others * make style tokenization tests * update modeling with sliding window + update modeling tests * [run-slow] moshi * remove prepare for generation frol CausalLM * isort * remove copied from * ignore offload tests * update causal mask and prepare 4D mask aligned with recent changes * further test refine + add back prepare_inputs_for_generation for depth decoder * correct conditional use of prepare mask * update slow integration tests * fix multi-device forward * remove previous solution to device_map * save_load is flaky * fix generate multi-devices * fix device * move tensor to int --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co>	2024-10-16 11:21:49 +02:00
Raushan Turganbay	d087165db0	IDEFICS: support inputs embeds (#34043 ) * support embeds * use cache from config * style... * fix tests after rebase	2024-10-16 09:25:26 +02:00
Chulhwa (Evan) Han	9d6998c759	🌐 [i18n-KO] Translated `blip-2.md` to Korean (#33516 ) * docs: ko: model_doc/blip-2 * feat: nmt draft * Apply suggestions from code review Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> * Update docs/source/ko/model_doc/blip-2.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> --------- Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>	2024-10-15 11:21:22 -07:00
Yijun Lee	554ed5d1e0	🌐 [i18n-KO] Translated `trainer_utils.md` to Korean (#33817 ) * docs: ko: trainer_utils.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> --------- Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>	2024-10-15 11:21:05 -07:00
Yijun Lee	8c33cf4eec	🌐 [i18n-KO] Translated `gemma2.md` to Korean (#33937 ) * docs: ko: gemma2.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions	2024-10-15 11:20:46 -07:00
Jiwook Han	67acb0b123	🌐 [i18n-KO] Translated `vivit.md` to Korean (#33935 ) * docs: ko: model_doc/vivit.md * feat: nmt draft * fix: manual edits * fix: manual edits	2024-10-15 10:31:44 -07:00
laurentd-lunit	0f49deacbf	[feat] LlavaNext add feature size check to avoid CUDA Runtime Error (#33608 ) * [feat] add feature size check to avoid CUDA Runtime Error * [minor] add error handling to all llava models * [minor] avoid nested if else * [minor] add error message to Qwen2-vl and chameleon * [fix] token dimension for check * [minor] add feature dim check for videos too * [fix] dimension check * [fix] test reference values --------- Co-authored-by: Raushan Turganbay <raushan@huggingface.co>	2024-10-15 16:19:18 +02:00
Marc Sun	d00f1ca860	Fix optuna ddp hp search (#34073 )	2024-10-15 15:42:07 +02:00
Yoni Gozlan	65442718c4	Add support for inheritance from class with different suffix in modular (#34077 ) * add support for different suffix in modular * add dummy example, pull new changes for modular * nide lines order change	2024-10-15 14:55:09 +02:00
Joao Gante	d314ce70bf	Generate: move `logits` to same device as `input_ids` (#34076 ) tmp commit	2024-10-15 14:32:09 +02:00
Subhalingam D	5ee9e786d1	Fix default behaviour in TextClassificationPipeline for regression problem type (#34066 ) * update code * update docstrings * update tests	2024-10-15 13:06:20 +01:00
Shikhar Mishra	4de1bdbf63	Fix FSDP resume Initialization issue (#34032 ) * Fix FSDP Initialization for resume training * Added init_fsdp function to work with dummy values * Fix FSDP initialization for resuming training * Added CUDA decorator for tests * Added torch_gpu decorator to FSDP tests * Fixup for failing code quality tests	2024-10-15 13:48:10 +02:00
Prakarsh Kaushik	293e6271c6	Add sdpa for Vivit (#33757 ) * chore:add sdpa to vivit * fix:failing slow test_inference_interpolate_pos_encoding(failing on main branch too) * chore:fix nits * ci:fix repo consistency failure * chore:add info and benchmark to model doc * [run_slow] vivit * chore:revert interpolation test fix for new issue * [run_slow] vivit * [run_slow] vivit * [run_slow] vivit * chore:add fallback for output_attentions being True * [run_slow] vivit * style:make fixup * [run_slow] vivit	2024-10-15 11:27:54 +02:00
Raushan Turganbay	23874f5948	Idefics: enable generation tests (#34062 ) * add idefics * conflicts after merging main * enable tests but need to fix some * fix tests * no print * fix/skip some slow tests * continue not skip * rebasing broken smth, this is the fix	2024-10-15 11:17:14 +02:00
Victor Muštar	dd4216b766	Update README.md with Enterprise Hub (#34150 )	2024-10-15 10:45:22 +02:00
Arthur	fa3f2db5c7	Add documentation for docker (#33156 ) * initial commit * nit	2024-10-14 11:58:45 +02:00
Lysandre Debut	5114c9b9e9	Specify that users should be careful with their own files (#34153 ) * Informative * style	2024-10-14 11:40:39 +02:00
Diogo Miguel Silva	013d3ac2b5	Fixed error message in mllama (#34106 )	2024-10-14 10:30:35 +02:00
Vladislav Bronzov	cb5ca3265f	Add GGUF for starcoder2 (#34094 ) * add starcoder2 arch support for gguf * fix q6 test	2024-10-14 10:22:49 +02:00
PengWeixuan	4c439173df	Fix a typo (#34148 ) Correct a typo "If you want you tokenizer..."->"If you want your tokenizer...."	2024-10-14 10:15:25 +02:00
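
Commit b57c7bce21 in the log above describes replacing "indexing with assignment" by "an elementwise multiplication" when masking hidden states, so that the model can be exported. The snippet below is only a minimal sketch of that idea using plain Python lists, not the actual Wav2Vec2 code; the function names `mask_by_assignment` and `mask_by_multiplication` and the sample data are hypothetical, chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def mask_by_assignment(hidden: Matrix, keep: List[bool]) -> Matrix:
    """Zero out masked rows by writing into a copy (index-and-assign style)."""
    out = [row[:] for row in hidden]
    for i, keep_row in enumerate(keep):
        if not keep_row:
            out[i] = [0.0] * len(out[i])
    return out


def mask_by_multiplication(hidden: Matrix, keep: List[bool]) -> Matrix:
    """Zero out masked rows by multiplying every element by 1.0 or 0.0.

    No element is assigned in place; the result is a new value computed
    from the inputs, which is the property the commit message relies on.
    """
    return [
        [x * (1.0 if keep_row else 0.0) for x in row]
        for row, keep_row in zip(hidden, keep)
    ]


# Hypothetical data: three time steps, two features each; step 1 is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
keep = [True, False, True]

by_assignment = mask_by_assignment(hidden, keep)
by_multiplication = mask_by_multiplication(hidden, keep)
```

Both functions return the same masked values for this input; the multiplicative form simply avoids mutating a tensor-like object by index, which is what the commit message says the export path objected to.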

17201 Commits
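
Commit 6d2b203339 in the log above notes that evaluation without length grouping was slow when 'eval_batch_size'>1. One plausible reason is padding: each batch is padded to its longest sequence, so mixing short and long inputs wastes computation. The sketch below illustrates that arithmetic under stated assumptions. The helper `padded_tokens` and the sequence lengths are hypothetical, and a plain `sorted` call stands in for the sampler, which in the library may group lengths differently.

```python
from typing import List


def padded_tokens(lengths: List[int], batch_size: int) -> int:
    """Total tokens processed when each batch is padded to its longest item."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


# Hypothetical evaluation set: three short and three long sequences, interleaved.
lengths = [5, 120, 7, 110, 6, 115]

in_order = padded_tokens(lengths, batch_size=3)         # batches mix short and long
grouped = padded_tokens(sorted(lengths), batch_size=3)  # similar lengths batched together
real_tokens = sum(lengths)

print(f"real tokens: {real_tokens}, in order: {in_order}, grouped: {grouped}")
```

With these made-up lengths, grouping brings the padded total much closer to the number of real tokens; how large the gain is in practice depends on the length distribution of the evaluation data.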