transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-18 12:08:22 +06:00

Author	SHA1	Message	Date
hiroaki222	99e0ab6ed8	Fix typo in /docs/source/ja/model_doc/decision_transformer.md URL (#35705 ) doc: Update original code repository URL	2025-01-15 07:36:50 -08:00
Mohamed Mekkouri	12dfd99007	Fix : Nemotron Processor in GGUF conversion (#35708 ) * fixing nemotron processor * make style	2025-01-15 14:25:44 +01:00
jiqing-feng	387663e571	Enable gptqmodel (#35012 ) * gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass *kwargs limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass *kwargs add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version 
(#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (#10) * review: update docs (#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review * doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update 
docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-15 14:22:49 +01:00
Matt	615bf9c5e4	Add future import for Py < 3.10 (#35666 ) * Add future import for Py < 3.10 * make fixup * Same issue in convert_olmo2_weights_to_hf.py	2025-01-15 12:45:43 +00:00
Raushan Turganbay	09d5f76274	Clean-up composite configs (#34603 ) * remove manual assignment tie-word-embeddings * remove another unused attribute * fix tests * fix tests * remove unnecessary overwrites * fix * decoder=True * clean pix2struct * run-all * forgot `_tied_weights_keys` when adding Emu3 * also Aria + fix-copies * and clean aria	2025-01-15 10:04:07 +01:00
Mahdi Baghbanzadeh	c61fcde910	Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities (#35251 ) * DataCollatorForLanguageModeling class was updated with new parameters that provide more control over the token masking and replacing * DataCollatorForLanguageModeling class was updated with new parameters that provide more control over the token masking and replacing * Addressed review comments, modified the docstring and made a test for the DataCollatorForLanguageModeling	2025-01-14 17:01:10 +00:00
Ego Joseph Oborakpororo	b0cdbd9119	Enhanced Installation Section in README.md (#35094 ) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md Enhanced installation section with troubleshooting, GPU setup, and OS-specific details. * Update README.md Enhanced installation section with troubleshooting, GPU setup, and OS-specific details. * Update installation.md Updated installation.md to include virtual environment and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment, troubleshooting and GPU setup instructions. * Update installation.md * Update installation.md * Update installation.md * Update installation.md Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions. * Update README.md Removed numbering from README.md. * Update README.md Removed unnecessary "a)" formatting as per maintainer feedback. * Update README.md Added blank lines around code snippets for better readability. * Update README.md Removed the line "b) Install a backend framework:" from README.md as per feedback. * Update README.md Simplified "For Windows:" to "Windows" in README.md as per feedback as well as "For macOS/Linux:" to "macOS/Linux" * Update README.md Removed unnecessary heading and retained valid code snippet. 
* Update README.md Removed unnecessary heading "d) Optional: Install from source for the latest updates" as per feedback. * Update README.md Removed "GPU Setup (Optional)" section to align with minimal design feedback. * Update installation.md Removed "Create and Activate a Virtual Environment" section from installation.md as per feedback. * Update installation.md Adjusted "Troubleshooting" to a second-level heading and added an introductory line as per feedback. * Update installation.md Updated troubleshooting section with simplified headings and formatted code blocks as per feedback. * Update installation.md Integrated GPU setup instructions into the "Install with pip" section for better content flow. * Update README.md Removed Troubleshooting section from README.md for minimalism as per maintainer feedback.	2025-01-14 08:05:08 -08:00
Mohamed Mekkouri	a11041ffad	Fix : add require_read_token for gemma2 gated model (#35687 ) fix gemma2 gated model test	2025-01-14 11:47:05 +01:00
Mohamed Mekkouri	df2a812e95	Fix expected output for ggml test (#35686 ) fix expected output	2025-01-14 11:46:55 +01:00
Mohamed Mekkouri	050636518a	Fix : HQQ config when hqq not available (#35655 ) * fix * make style * adding require_hqq * make style	2025-01-14 11:37:37 +01:00
Martin	715fdd6459	Update torchao.md: use auto-compilation (#35490 ) * Update torchao.md: use auto-compilation * Update torchao.md: indicate updating transformers to the latest --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-01-14 11:33:48 +01:00
Mohamed Mekkouri	4b8d1f7fca	Fix : adding einops lib in the CI docker for some bitsandbytes tests (#35652 ) * fix docker * fix	2025-01-14 07:36:10 +01:00
RTrace	34f76bb62b	Fix `zero_shot_image_classification` documentation guide link in SigLIP (#35671 )	2025-01-13 11:08:17 -08:00
Arthur	c23a1c1932	Add-helium (#35669 ) * Add the helium model. * Add a missing helium. * And add another missing helium. * Use float for the rmsnorm mul. * Add the Helium tokenizer converter. * Add the pad token as suggested by Arthur. * Update the RMSNorm + some other tweaks. * Fix more rebase issues. * fix copies and style * fixes and add helium.md * add missing tests * update the backlink * oups * style * update init, and expected results * small fixes * match test outputs * style fixup, fix doc builder * add dummies and we should be good to go! * update sdpa and fa2 documentation --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-01-13 18:41:15 +01:00
Ahmed Almaghz	a3f82328ed	[i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic (#35193 ) * Create token_classification.md * Update token_classification.md * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed 
<554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update _toctree.yml --------- Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>	2025-01-13 09:32:15 -08:00
Fanli Lin	2fa876d2d8	[tests] make cuda-only tests device-agnostic (#35607 ) * initial commit * remove unrelated files * further remove * Update test_trainer.py * fix style	2025-01-13 14:48:39 +01:00
Arthur	e6f9b03464	[`Compile`] Only test compiling model forward pass (#35658 ) * rename test to only compile forward! * style emu	2025-01-13 13:43:29 +01:00
Raushan Turganbay	84a6789145	Enable different torch dtype in sub models (#34873 ) * fix * fix test * add tests * add more tests * fix tests * supposed to be a torch.dtype test * handle BC and make fp32 default	2025-01-13 13:42:08 +01:00
Arthur	87089176d9	[`Phi`] bias should be True (#35650 ) bias should be True	2025-01-13 13:15:07 +01:00
Sai-Suraj-27	91f14f1fc4	Removed some duplicated code (#35637 ) * Removed duplicate class field definition. * Removed duplicate code in try-except block. --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2025-01-13 12:34:21 +01:00
jiqing-feng	b8c34d97fc	Fix whisper compile (#35413 ) Fix compile error Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-01-13 11:31:51 +01:00
Cyril Vallez	cd44bdb4b8	Fix device in rope module when using dynamic updates (#35608 ) fix rope device	2025-01-13 10:11:17 +01:00
Matt	15bd3e61f8	Update codeowners with individual model owners (#35595 ) * Update codeowners with individual model owners * rip yoach * add comment * Replace - with _ * Add @qubvel for zero-shot object-detection * Update CODEOWNERS Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update CODEOWNERS Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update CODEOWNERS Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update CODEOWNERS Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add yoni for omdet-turbo * Update CODEOWNERS Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Refactor / comment the CODEOWNERS file * Capture modular files as well * Add dummies without owner * More cleanup * Set Niels on a few more models that he added --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-01-10 17:59:36 +00:00
Yih-Dar	1e3c6c1f7d	Skip `MobileNetV1ModelTest::test_batching_equivalence` for now (#35614 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 18:32:36 +01:00
Yih-Dar	04eae987f3	Fix flaky `test_beam_search_low_memory` (#35611 ) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 17:31:03 +01:00
Zach Mueller	b02828e4af	Let `EarlyStoppingCallback` not require `load_best_model_at_end` (#35101 ) * Bookmark * Add warning	2025-01-10 10:25:32 -05:00
Taha Akbari	0aaf124fb9	Added error when sequence length is bigger than max_position_embeddings (#32156 ) * Added error when sequence length is bigger than max_position_embeddings * Fixed formatting * Fixed bug * Changed copies to match * Fixed bug * Applied suggestions * Removed redundant code * Fixed bugs * Bug fix * Bug fix * Added requested changes * Fixed bug * Fixed unwanted change * Fixed unwanted changes * Fixed formatting	2025-01-10 15:23:54 +00:00
Zach Mueller	1211e616a4	Use inherit tempdir makers for tests + fix failing DS tests (#35600 ) * Use existing APIs to make tempdir folders * Fixup deepspeed too * output_dir -> tmp_dir	2025-01-10 10:01:58 -05:00
Yih-Dar	bbc00046b9	Fix flaky `test_custom_4d_attention_mask` (#35606 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 15:40:04 +01:00
Arthur Zucker	f63829c87b	v4.49.0-dev	2025-01-10 12:31:11 +01:00
Raushan Turganbay	52e1f87c7d	[WIP] Emu3: add model (#33770 ) * model can convert to HF and be loaded back * nit * works in single batch generation but hallucinates * use the image tokens * add image generation * now it works * add tests * update * add modulare but it doesn't work for porting docstring :( * skip some tests * add slow tests * modular removed the import? * guess this works * update * update * fix copies * fix test * fix copies * update * docs * fix tests * last fix tests? * pls * repo consistency * more style * style * remove file * address comments * tiny bits * update after the new modular * fix tests * add one more cond in check attributes * decompose down/up/mid blocks * allow static cache generation in VLMs * nit * fix copies * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix VAE upsampling * Update src/transformers/models/emu3/modular_emu3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * state overwritten stuff explicitly * fix copies * add the flag for flex attn --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> 
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-10 12:23:00 +01:00
Cyril Vallez	ccc0381d36	Fix flex_attention in training mode (#35605 ) * fix flex * add test * style	2025-01-10 11:49:12 +01:00
Arthur Zucker	a9bd1e6284	Remove `benchmark.py` after #34275	2025-01-10 11:09:06 +01:00
Raushan Turganbay	e0646f3dce	Chat template: return vectorized output in processors (#34275 ) * update chat template * style * fix tests * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typehints + docs * fix tests * remove unnecessary warnings * forgot code style :( * allow users to pass backend and num frames * Update docs/source/en/chat_templating.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/processing_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typo fix * style * address comments * align with "pipeline" template * update docs * update docs * unpack for all kwargs? * wrong conflict resolution while rebasing * tmp * update docs * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-10 11:05:29 +01:00
eustlb	5f087d1335	Add Moonshine (#34784 ) * config draft * full encoder forward * full decoder forward * fix sdpa and FA2 * fix sdpa and FA2 * moonshine model * moonshine model forward * fix attention with past_key_values * add MoonshineForConditionalGeneration * fix cache handling and causality for cross attention * no causal attention mask for the encoder * model addition (imports etc) * small nit * nits * Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py Co-authored-by: Joshua Lochner <admin@xenova.com> * add rope_theta * nits * model doc * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: Joshua Lochner <admin@xenova.com> * imports * add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES * updates modular * make * make fix-copies * ruff check examples fix * fix check_modular_conversion * nit * nits * nits * copied from -> imports * imports fix * integrate attention refacto * modular edge case * remove encoder * convolutions params in config * run modular_model_converter * make * Update docs/source/en/model_doc/moonshine.md Co-authored-by: Joshua Lochner <admin@xenova.com> * MoonshineModelTest * correct typo * make style * integration tests * make * modular convert * name conversion update (up_proj -> fc1 etc) * update config * update MLP * update attention * update encoder layer * update decoder layer * update convolutions parameters * update encoder * remove INPUTS_DOCSTRING * update decoder * update conditional generation * update pretrained model * imports * modular converted * update doc * fix * typo * update doc * update license * update init * split config in file * two classes for MLP * attention from GLM * from GlmRotaryEmbedding * split MLP * apply arthur's review suggestions * apply arthur's review suggestions * apply arthur's review suggestions * auto feature extractor * convert modular * fix + make * convert modular * make * unsplit config * use correct checkpoint * wrap generate * update tests * typos * make * 
typo * update doc --------- Co-authored-by: Joshua Lochner <admin@xenova.com>	2025-01-10 11:00:54 +01:00
Yih-Dar	6f127d3f81	Skip `torchscript` tests if a cache object is in model's outputs (#35596 ) * fix 1 * fix 1 * comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 10:46:03 +01:00
Tom Aarsen	6b73ee8905	ModernBert: reuse GemmaRotaryEmbedding via modular + Integration tests (#35459 ) * Introduce 5 integration tests for the 4 model classes + torch export * ModernBert: reuse GemmaRotaryEmbedding via modular * Revert #35589, keep rope_kwargs; rely on them in modular_modernbert * Revert "Revert #35589, keep rope_kwargs; rely on them in modular_modernbert" This reverts commit `11b44b9ee8`. * Don't set rope_kwargs; override 'self.rope_init_fn' call instead	2025-01-10 10:25:10 +01:00
Zach Mueller	8de7b1ba8d	Add flex_attn to diffllama (#35601 ) Add sdpa to diffllama	2025-01-09 20:49:11 +01:00
Benjamin Warner	1e3ddcb2d0	ModernBERT bug fixes (#35404 ) * bug fixes * organize imports * wrap cpu warning in reference_compile * Avoid needing repad_logits_with_grad, always repad with grads when training I'm not 100% that the conditional with "or labels is None" makes sense though - not sure what the intention is there. Perhaps we can remove that? * Revert "Avoid needing repad_logits_with_grad, always repad with grads when training" This reverts commit `cedcb4e89b`. * Fix grammar: keep -> keeps * Propagate grammar fix with modular_model_converter --------- Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>	2025-01-09 20:15:38 +01:00
Arthur	e97d7a5be5	add `_supports_flex_attn = True` for models that do support it (#35598 ) * add `_supports_flex_attn = True` * fix repo consistency	2025-01-09 20:03:33 +01:00
胡译文	c9c682d19c	[doc] deepspeed universal checkpoint (#35015 ) * universal checkpoint * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-09 09:50:51 -08:00
Cyril Vallez	3a4ae6eace	Refactor/fix Cohere2 (#35594 ) * refactor/fix cohere2 * add kwargs * tests * remove func and import it	2025-01-09 17:54:57 +01:00
Tom Aarsen	32e0db8a69	[`tokenizers`] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593 ) * Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this. * Simplify setting self.add_prefix_space, ensure pre_tok exists * Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized' `862d1a346a/bindings/python/src/pre_tokenizers.rs (L672)` produces the Exception. They're triggered by the roformer tests, as the RoFormerTokenizerFast uses a custom PreTokenizer. * Propagate add_prefix_space in T5TokenizerFast to superclass	2025-01-09 17:46:50 +01:00
Cyril Vallez	46276f9a7f	Fix modular edge case + modular sorting order (#35562 ) * look-ahead negation * re add examples by default * Fix the bug in topological sort * Update create_dependency_mapping.py * start adding test * finalize test * more tests * style * style	2025-01-09 17:17:52 +01:00
Amit Luhar	d3fe9fa3fe	PR for Issue #22694 : Fixed Training Evaluation table display for VSCode (#35557 )	2025-01-09 15:05:47 +00:00
Pablo Montalvo	395b114bd1	Small fix rope kwargs (#35589 ) * don't know why this keeps popping up? * remove unused rope_kwargs	2025-01-09 15:40:36 +01:00
Yih-Dar	82dd6c14bb	Fix flaky `SwitchTransformersModelTest::test_training_gradient` (#35587 ) * fix * Update tests/models/switch_transformers/test_modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-09 15:36:22 +01:00
Arthur	eb4579cf43	`tokenizer` train from iterator without pre_tokenizers (#35396 ) * fix if else issues * add a test * fix the test * style	2025-01-09 15:34:43 +01:00
Mehant Kammakomati	320512df46	feat: add TP plan for granite (#35573 ) Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>	2025-01-09 15:25:55 +01:00
Saif Rehman Nasir	633da1b10e	[Idefics3] Move image features to same device as input embeds (#35100 ) * [Idefics3] Move image features to same device as input embeds * Update src/transformers/models/idefics3/modeling_idefics3.py * make style --------- Co-authored-by: Saif Rehman Nasir <shyshin@github.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Co-authored-by: Raushan Turganbay <raushan@huggingface.co>	2025-01-09 14:25:36 +01:00

19383 Commits