transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Jason (Siyu) Zhu	adb91179b9	Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer (#32860 ) * add liger integration * fix syntax * fix import issue * add trainer.md * Use _apply_liger_kernel() * Fixed log message * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update docs/source/en/trainer.md Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Fixed checkstyle and updated readme * Added test * Fixed checkstyle * fix docstring * rename use_liger to use_liger_kernel * Trigger Build * Added test * add fix-copies * Fixed copy inconsistencies --------- Co-authored-by: shimizust <sshimizu@linkedin.com> Co-authored-by: Steven Shimizu <shimizust@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-08-23 13:20:49 +02:00
Joao Gante	970a16ec7f	Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659 ) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-23 11:12:53 +01:00
Cyril Vallez	22e6f14525	Reducing memory usage: removing useless logits computation in generate() (#31292 ) * Add .float() in all generation methods logit outputs * Switch float-casting of logits to training only for main models * Add `num_logits_to_keep` in Llama and add it by default in generate * Apply style * Add num_logits_to_keep as arg in prepare_input_for_generation * Add support for Mistral * Revert models except llama and mistral * Fix default None value in _supports_num_logits_to_keep() * Fix dimension of dummy input * Add exception for prophetnet in _supports_num_logits_to_keep() * Update _supports_num_logits_to_keep() to use inspect.signature() * Add deprecation cycle + remove modification with pretraining_tp * Apply style * Add most used models * Apply style * Make `num_logits_to_keep` an int in all cases to remove if-else clause * Add compile check for the warning * Fix torch versions * style * Add gemma2 * Update warning version * Add comment about .float operations in generation utils * Add tests in GenerationTesterMixin and ModelTesterMixin * Fix batch size for assisted decoding in tests * fix small issues in test * refacor test * fix slicing removing dim issue * Add nemotron support (should fix check-copy issue in CIs) * Trigger new CIs * Trigger new CIs * Bump version * Bump version in TODO * Trigger CIs * remove blank space * Trigger CIs	2024-08-23 11:08:34 +01:00
Stefano Fiorucci	d806fa3e92	docs: fix outdated link to TF32 explanation (#32947 ) fix outdated link	2024-08-22 13:28:00 -07:00
Joao Gante	a26de15139	Generate: Deprecate returning legacy cache by default; Handle `use_cache=False` (#32863 )	2024-08-22 20:01:52 +01:00
Jinuk	09e6579d2d	🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md to Korean" (#32334 ) * docs: ko: tasks/knowledge_distillation_for_image_classification.md * feat: nmt draft * fix: manual edits * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review --------- Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>	2024-08-22 10:42:39 -07:00
Franz Louis Cesista	273c0afc8f	Fix regression on `Processor.save_pretrained` caused by #31691 (#32921 ) fix save_pretrained	2024-08-22 18:42:44 +02:00
Andrés Marafioti	18199b34e5	[run_slow] idefics2 (#32840 )	2024-08-22 18:08:03 +02:00
Joao Gante	975b988bfe	Gemma2: eager attention by default (#32865 )	2024-08-22 15:59:30 +01:00
Shaopeng Fu	f1d822ba33	fix: (issue #32689 ) `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook. (#32849 ) fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.	2024-08-22 16:42:00 +02:00
Isotr0py	ee8c01f839	Add chat_template for tokenizer extracted from GGUF model (#32908 ) * add chat_template to gguf tokenizer * add template through tokenizer config	2024-08-22 16:41:25 +02:00
regisss	99d67f1a09	Improve greedy search memory usage (#32895 ) Do not call torch.repeat_interleave if expand_size is 1	2024-08-22 15:37:44 +01:00
Yih-Dar	bf97d4aa6d	Fix benchmark script (#32635 ) * fix * >= 0.3.0 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-08-22 16:07:47 +02:00
Shubham Ugare	9282413611	Add SynCode to llm_tutorial (#32884 )	2024-08-22 15:30:22 +02:00
Younes Belkada	eeea71209a	FIX / Hub: Also catch for `exceptions.ConnectionError` (#31469 ) * Update hub.py * Update errors * Apply suggestions from code review Co-authored-by: Lucain <lucainp@gmail.com> --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>	2024-08-22 15:29:21 +02:00
Joao Gante	8b94d28f97	CI: separate step to download nltk files (#32935 ) * separate step to download nltk files * duplicated * rm comma	2024-08-22 14:17:24 +01:00
Marc Sun	c42d264549	FEAT / Trainer: Add adamw 4bit optimizer (#31865 ) * add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit `25278e805f`. * style * version check	2024-08-22 15:07:09 +02:00
Gal Cohen (galco)	6baa6f276a	fix: no need to dtype A in jamba (#32924 ) Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-22 15:03:22 +02:00
Sai-Suraj-27	af638c4afe	fix: Added missing `huggingface_hub` installation to workflows (#32891 ) Added missing huggingface_hub installation to workflows.	2024-08-22 12:51:12 +01:00
Joao Gante	f6e2586a36	Jamba: update integration tests (#32250 ) * try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-22 11:46:10 +01:00
Arthur	3bb7b05229	Update docker image building (#32918 ) commit	2024-08-21 21:23:10 +02:00
Ruilin Huang	c6d484e38c	fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function (#31296 ) [whisper] don't overwrite return_timestamps when not passed to generate	2024-08-21 20:21:27 +01:00
Ahmed Almaghz	87134662f7	[i18n-ar] add README_ar.md to README.md (#32583 ) * Update README.md * Update README.md * Add README_ar.md to i18n/README_de.md * Add README_ar.md to i18n/README_es.md * Add README_ar.md to i18n/README_fr.md * Add README_ar.md to i18n/README_hd.md * Add README_ar.md to i18n/README_ja.md * Add README_ar.md to i18n/README_ko.md * Add README_ar.md to i18n/README_pt-br.md * Add README_ar.md to i18n/README_ru.md * Add README_ar.md to i18n/README_te.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_zh-hans.md * Add README_ar.md to i18n/README_zh-hant.md * Create README_ar.md	2024-08-20 16:11:54 -07:00
Nicholas Broad	1dde50c7d2	link for optimizer names (#32400 ) * link for optimizer names Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring. * make fixup	2024-08-20 15:28:24 -07:00
Pavel Iakubovskii	078d5a88cd	Replace `tensor.norm()` with decomposed version for CLIP executorch export (#32887 ) * Replace .norm() with decomposed version for executorch export * [run_slow] clip	2024-08-20 21:27:21 +01:00
dependabot[bot]	9800e6d170	Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_transformer (#32903 ) Bump nltk in /examples/research_projects/decision_transformer Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.7...3.9) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-20 21:02:17 +01:00
Anton Vlasjuk	c63a3d0f17	Fix: Mamba2 `norm_before_gate` usage (#32686 ) * mamba2 uses norm_before_gate=False * small nit * remove norm_before_gate flag and follow False path only	2024-08-20 19:47:34 +02:00
Gal Cohen (galco)	01c4fc455b	fix: jamba cache fails to use torch.nn.module (#32894 ) Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-20 14:50:13 +02:00
Arthur	65f4bc99f9	Fix repr for conv (#32897 ) add nx	2024-08-20 14:34:24 +02:00
Marc Sun	fd06ad5438	🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627 ) * Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci	2024-08-20 11:42:36 +02:00
Arthur	13e645bb40	Allow-head-dim (#32857 ) * support head dim * fix the doc * fixup * add oproj Co-authored-by: Suhara <suhara@users.noreply.github.com>> * update Co-authored-by: bzantium <bzantium@users.noreply.github.com> * Co-authored-by: suhara <suhara@users.noreply.github.com> * Update Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com> --------- Co-authored-by: bzantium <bzantium@users.noreply.github.com> Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>	2024-08-20 10:24:48 +02:00
Matt	85345bb439	Add tip to clarify tool calling (#32883 )	2024-08-19 18:37:35 +01:00
Sai-Suraj-27	37204848f1	Docs: Fixed `whisper-large-v2` model link in docs (#32871 ) Fixed whisper-large-v2 model link in docs.	2024-08-19 09:50:35 -07:00
Anton Vlasjuk	61d89c19d8	Fix: Mamba2 generation mismatch between input_ids and inputs_embeds (#32694 ) * fix cache when using input embeddings * simplify check, we can always add input ids seq len since its 0 in first pass	2024-08-19 16:06:07 +02:00
Younes Belkada	93e538ae2e	Mamba / FalconMamba: Fix mamba left padding (#32677 ) * fix mamba left padding * Apply suggestions from code review Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix copies * test with `inputs_embeds` * Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copies * clairfy * fix last comments * remove --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-19 16:01:35 +02:00
Isotr0py	59e8f1919c	Fix incorrect vocab size retrieval in GGUF config (#32551 ) * fix gguf config vocab size * minor fix * link issue	2024-08-19 15:53:54 +02:00
Alan-Blanchet	5f6c080b62	RT-DETR parameterized batchnorm freezing (#32631 ) * fix: Parameterized norm freezing For the R18 model, the authors don't freeze norms in the backbone. * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2024-08-19 14:50:57 +01:00
Yitong Huang	8a4857c0db	Support save/load ckpt for XLA FSDP (#32311 ) * Support save/load ckpt for XLA FSDP * Fix bug for save * Fix style * reserve sharded ckpt and better file naming * minor fix Co-authored-by: Zach Mueller <muellerzr@gmail.com> * add is_fsdp_xla_v1_enabled --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-08-19 15:44:21 +02:00
Aaron Chung	f1b720ed62	Add __repr__ for Conv1D (#32425 ) * Add representation for Conv1D, for better output info. * code format for Conv1D * We add a __repr__ func for Conv1D, this allows the print (or output) of the model's info has a better description for Conv1D.	2024-08-19 15:26:19 +02:00
Fanli Lin	e55b33ceb4	[tests] make `test_sdpa_can_compile_dynamic` device-agnostic (#32519 ) * enable * fix	2024-08-19 12:46:59 +01:00
Ita Zaporozhets	54b7703682	support torch-speech (#32537 )	2024-08-19 11:26:35 +02:00
Kamil Akesbi	8260cb311e	Add Descript-Audio-Codec model (#31494 ) * dac model * original dac works * add dac model * dac can be instatiated * add forward pass * load weights * all weights are used * convert checkpoint script ready * test * add feature extractor * up * make style * apply cookicutter * fix tests * iterate on FeatureExtractor * nit * update dac doc * replace nn.Sequential with nn.ModuleList * nit * apply review suggestions 1/2 * Update src/transformers/models/dac/modeling_dac.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * up * apply review suggestions 2/2 * update padding in FeatureExtractor * apply review suggestions * iterate on design and tests * add integration tests * feature extractor tests * make style * all tests pass * make style * fixup * apply review suggestions * fix-copies * apply review suggestions * apply review suggestions * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * anticipate transfer weights to descript * up * make style * apply review suggestions * update slow test values * update slow tests * update test values * update with CI values * update with vorace values * update test with slice * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-08-19 10:21:51 +01:00
MAHIR DAIYAN	843e5e20ca	Add Flax Dinov2 (#31960 ) * tfmsenv restored in main * installed flax * forward pass done and all tests passed * make fix-copies and cleaning the scripts * fixup attempt 1 * fixup attempt 2 * fixup third attempt * fixup attempt 4 * fixup attempt 5 * dinov2 doc fixed * FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE * external pos_encoding layer removed * fixup attempt 6 * fixed integration test values * fixup attempt 7 * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts 
<22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * comments removed * comment removed from the test * fixup * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * new fixes 1 * interpolate_pos_encoding function removed * droppath rng fixed, pretrained beit copied-from still not working * modeling_flax_dinov2.py reformatted * Update tests/models/dinov2/test_modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * added Copied from, to the tests * copied from statements removed from tests * fixed copied from statements in the tests * [run_slow] dinov2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-08-19 09:28:13 +01:00
Joao Gante	52cb4034ad	generate: missing `to` in DoLa body, causing exceptions in multi-gpu generation (#32856 )	2024-08-17 16:37:00 +01:00
Alex Calderwood	6806d33567	Make beam_constraints.Constraint.advance() docstring more accurate (#32674 ) * Fix beam_constraints.Constraint.advance() docstring * Update src/transformers/generation/beam_constraints.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2024-08-16 19:36:55 +01:00
Zach Mueller	8ec028aded	Reduce the error log when using core models that need their weights renamed, and provide a step forward (#32656 ) * Fin * Modify msg * Finish up nits	2024-08-16 13:05:57 -04:00
Marc Sun	1c36db697a	fix multi-gpu with static cache (#32543 )	2024-08-16 19:02:37 +02:00
Zach Mueller	0b066bed14	Revert PR 32299, flag users when Zero-3 was missed (#32851 ) Revert PR 32299	2024-08-16 12:35:41 -04:00
Zhan Rongrui	f20d0e81ea	improve _get_is_as_tensor_fns (#32596 ) * improve _get_is_as_tensor_fns * format	2024-08-16 15:59:44 +01:00
Yangshen⚡Deng	a27182b7fc	Fix AutoConfig and AutoModel support for Llava-Next-Video (#32844 ) * Fix: fix all model_type of Llava-Next-Video to llava_next_video * Fix doc for llava_next_video * * Fix formatting issues * Change llava-next-video.md file name into llava_next_video.md to make it compatible with implementation * Fix docs TOC for llava-next-video	2024-08-16 12:41:05 +01:00

16663 Commits