transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Marc Sun	c42d264549	FEAT / Trainer: Add adamw 4bit optimizer (#31865 ) * add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit `25278e805f`. * style * version check	2024-08-22 15:07:09 +02:00
Gal Cohen (galco)	6baa6f276a	fix: no need to dtype A in jamba (#32924 ) Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-22 15:03:22 +02:00
Sai-Suraj-27	af638c4afe	fix: Added missing `huggingface_hub` installation to workflows (#32891 ) Added missing huggingface_hub installation to workflows.	2024-08-22 12:51:12 +01:00
Joao Gante	f6e2586a36	Jamba: update integration tests (#32250 ) * try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-22 11:46:10 +01:00
Arthur	3bb7b05229	Update docker image building (#32918 ) commit	2024-08-21 21:23:10 +02:00
Ruilin Huang	c6d484e38c	fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function (#31296 ) [whisper] don't overwrite return_timestamps when not passed to generate	2024-08-21 20:21:27 +01:00
Ahmed Almaghz	87134662f7	[i18n-ar] add README_ar.md to README.md (#32583 ) * Update README.md * Update README.md * Add README_ar.md to i18n/README_de.md * Add README_ar.md to i18n/README_es.md * Add README_ar.md to i18n/README_fr.md * Add README_ar.md to i18n/README_hd.md * Add README_ar.md to i18n/README_ja.md * Add README_ar.md to i18n/README_ko.md * Add README_ar.md to i18n/README_pt-br.md * Add README_ar.md to i18n/README_ru.md * Add README_ar.md to i18n/README_te.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_zh-hans.md * Add README_ar.md to i18n/README_zh-hant.md * Create README_ar.md	2024-08-20 16:11:54 -07:00
Nicholas Broad	1dde50c7d2	link for optimizer names (#32400 ) * link for optimizer names Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring. * make fixup	2024-08-20 15:28:24 -07:00
Pavel Iakubovskii	078d5a88cd	Replace `tensor.norm()` with decomposed version for CLIP executorch export (#32887 ) * Replace .norm() with decomposed version for executorch export * [run_slow] clip	2024-08-20 21:27:21 +01:00
dependabot[bot]	9800e6d170	Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_transformer (#32903 ) Bump nltk in /examples/research_projects/decision_transformer Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.7...3.9) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-20 21:02:17 +01:00
Anton Vlasjuk	c63a3d0f17	Fix: Mamba2 `norm_before_gate` usage (#32686 ) * mamba2 uses norm_before_gate=False * small nit * remove norm_before_gate flag and follow False path only	2024-08-20 19:47:34 +02:00
Gal Cohen (galco)	01c4fc455b	fix: jamba cache fails to use torch.nn.module (#32894 ) Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-20 14:50:13 +02:00
Arthur	65f4bc99f9	Fix repr for conv (#32897 ) add nx	2024-08-20 14:34:24 +02:00
Marc Sun	fd06ad5438	🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627 ) * Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci	2024-08-20 11:42:36 +02:00
Arthur	13e645bb40	Allow-head-dim (#32857 ) * support head dim * fix the doc * fixup * add oproj Co-authored-by: Suhara <suhara@users.noreply.github.com>> * update Co-authored-by: bzantium <bzantium@users.noreply.github.com> * Co-authored-by: suhara <suhara@users.noreply.github.com> * Update Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com> --------- Co-authored-by: bzantium <bzantium@users.noreply.github.com> Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>	2024-08-20 10:24:48 +02:00
Matt	85345bb439	Add tip to clarify tool calling (#32883 )	2024-08-19 18:37:35 +01:00
Sai-Suraj-27	37204848f1	Docs: Fixed `whisper-large-v2` model link in docs (#32871 ) Fixed whisper-large-v2 model link in docs.	2024-08-19 09:50:35 -07:00
Anton Vlasjuk	61d89c19d8	Fix: Mamba2 generation mismatch between input_ids and inputs_embeds (#32694 ) * fix cache when using input embeddings * simplify check, we can always add input ids seq len since its 0 in first pass	2024-08-19 16:06:07 +02:00
Younes Belkada	93e538ae2e	Mamba / FalconMamba: Fix mamba left padding (#32677 ) * fix mamba left padding * Apply suggestions from code review Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix copies * test with `inputs_embeds` * Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copies * clairfy * fix last comments * remove --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-19 16:01:35 +02:00
Isotr0py	59e8f1919c	Fix incorrect vocab size retrieval in GGUF config (#32551 ) * fix gguf config vocab size * minor fix * link issue	2024-08-19 15:53:54 +02:00
Alan-Blanchet	5f6c080b62	RT-DETR parameterized batchnorm freezing (#32631 ) * fix: Parameterized norm freezing For the R18 model, the authors don't freeze norms in the backbone. * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2024-08-19 14:50:57 +01:00
Yitong Huang	8a4857c0db	Support save/load ckpt for XLA FSDP (#32311 ) * Support save/load ckpt for XLA FSDP * Fix bug for save * Fix style * reserve sharded ckpt and better file naming * minor fix Co-authored-by: Zach Mueller <muellerzr@gmail.com> * add is_fsdp_xla_v1_enabled --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-08-19 15:44:21 +02:00
Aaron Chung	f1b720ed62	Add __repr__ for Conv1D (#32425 ) * Add representation for Conv1D, for better output info. * code format for Conv1D * We add a __repr__ func for Conv1D, this allows the print (or output) of the model's info has a better description for Conv1D.	2024-08-19 15:26:19 +02:00
Fanli Lin	e55b33ceb4	[tests] make `test_sdpa_can_compile_dynamic` device-agnostic (#32519 ) * enable * fix	2024-08-19 12:46:59 +01:00
Ita Zaporozhets	54b7703682	support torch-speech (#32537 )	2024-08-19 11:26:35 +02:00
Kamil Akesbi	8260cb311e	Add Descript-Audio-Codec model (#31494 ) * dac model * original dac works * add dac model * dac can be instatiated * add forward pass * load weights * all weights are used * convert checkpoint script ready * test * add feature extractor * up * make style * apply cookicutter * fix tests * iterate on FeatureExtractor * nit * update dac doc * replace nn.Sequential with nn.ModuleList * nit * apply review suggestions 1/2 * Update src/transformers/models/dac/modeling_dac.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * up * apply review suggestions 2/2 * update padding in FeatureExtractor * apply review suggestions * iterate on design and tests * add integration tests * feature extractor tests * make style * all tests pass * make style * fixup * apply review suggestions * fix-copies * apply review suggestions * apply review suggestions * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * anticipate transfer weights to descript * up * make style * apply review suggestions * update slow test values * update slow tests * update test values * update with CI values * update with vorace values * update test with slice * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-08-19 10:21:51 +01:00
MAHIR DAIYAN	843e5e20ca	Add Flax Dinov2 (#31960 ) * tfmsenv restored in main * installed flax * forward pass done and all tests passed * make fix-copies and cleaning the scripts * fixup attempt 1 * fixup attempt 2 * fixup third attempt * fixup attempt 4 * fixup attempt 5 * dinov2 doc fixed * FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE * external pos_encoding layer removed * fixup attempt 6 * fixed integration test values * fixup attempt 7 * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts 
<22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * comments removed * comment removed from the test * fixup * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * new fixes 1 * interpolate_pos_encoding function removed * droppath rng fixed, pretrained beit copied-from still not working * modeling_flax_dinov2.py reformatted * Update tests/models/dinov2/test_modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * added Copied from, to the tests * copied from statements removed from tests * fixed copied from statements in the tests * [run_slow] dinov2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-08-19 09:28:13 +01:00
Joao Gante	52cb4034ad	generate: missing `to` in DoLa body, causing exceptions in multi-gpu generation (#32856 )	2024-08-17 16:37:00 +01:00
Alex Calderwood	6806d33567	Make beam_constraints.Constraint.advance() docstring more accurate (#32674 ) * Fix beam_constraints.Constraint.advance() docstring * Update src/transformers/generation/beam_constraints.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2024-08-16 19:36:55 +01:00
Zach Mueller	8ec028aded	Reduce the error log when using core models that need their weights renamed, and provide a step forward (#32656 ) * Fin * Modify msg * Finish up nits	2024-08-16 13:05:57 -04:00
Marc Sun	1c36db697a	fix multi-gpu with static cache (#32543 )	2024-08-16 19:02:37 +02:00
Zach Mueller	0b066bed14	Revert PR 32299, flag users when Zero-3 was missed (#32851 ) Revert PR 32299	2024-08-16 12:35:41 -04:00
Zhan Rongrui	f20d0e81ea	improve _get_is_as_tensor_fns (#32596 ) * improve _get_is_as_tensor_fns * format	2024-08-16 15:59:44 +01:00
Yangshen⚡Deng	a27182b7fc	Fix AutoConfig and AutoModel support for Llava-Next-Video (#32844 ) * Fix: fix all model_type of Llava-Next-Video to llava_next_video * Fix doc for llava_next_video * * Fix formatting issues * Change llava-next-video.md file name into llava_next_video.md to make it compatible with implementation * Fix docs TOC for llava-next-video	2024-08-16 12:41:05 +01:00
Joao Gante	cf32ee1753	Cache: use `batch_size` instead of `max_batch_size` (#32657 ) * more precise name * better docstrings * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-16 11:48:45 +01:00
Fanli Lin	8f9fa3b081	[tests] make test_sdpa_equivalence device-agnostic (#32520 ) * fix on xpu * [run_all]	2024-08-16 11:34:13 +01:00
Joao Gante	70d5df6107	Generate: unify `LogitsWarper` and `LogitsProcessor` (#32626 )	2024-08-16 11:20:41 +01:00
Ao Tang	5fd7ca7bc9	Use head_dim if in config for RoPE (#32495 ) * use head_dim if in config for RoPE * typo * simplify with getattr	2024-08-16 11:37:43 +02:00
Arthur	c215523528	add back the position ids (#32554 ) * add back the position ids * fix failing test	2024-08-16 11:00:05 +02:00
Raushan Turganbay	f3c8b18053	VLMs: small clean-up for cache class (#32417 ) * fix beam search in video llava * [run-slow] video_llava	2024-08-16 09:07:05 +05:00
muddlebee	d6751d91c8	fix: update doc link for runhouse in README.md (#32664 )	2024-08-15 20:00:55 +01:00
Sai-Suraj-27	ab7e893d09	fix: Corrected `falcon-mamba-7b` model checkpoint name (#32837 ) Corrected the model checkpoint.	2024-08-15 18:03:18 +01:00
jp	e840127370	reopen: llava-next fails to consider padding_side during Training (#32679 ) restore #32386	2024-08-15 11:44:19 +01:00
Sai-Suraj-27	8820fe8b8c	Updated workflows to the latest versions (#32405 ) Updated few workflows to the latest versions.	2024-08-14 20:18:14 +02:00
Zach Mueller	0cea2081a3	Unpin deepspeed in Docker image/tests (#32572 ) Unpin deepspeed	2024-08-14 18:30:25 +01:00
Sai-Suraj-27	95a77819db	fix: Fixed unknown pytest config option `doctest_glob` (#32475 ) Fixed unknown config option doctest_glob.	2024-08-14 18:30:01 +01:00
Dina Suehiro Jones	6577c77d93	Update the distributed CPU training on Kubernetes documentation (#32669 ) * Update the Kubernetes CPU training example * Add namespace arg Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com> --------- Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>	2024-08-14 09:36:43 -07:00
Yih-Dar	20a04497a8	Fix `JetMoeIntegrationTest` (#32332 ) JetMoeIntegrationTest Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-08-14 16:22:06 +02:00
Jerry Zhang	78d78cdf8a	Add TorchAOHfQuantizer (#32306 ) * Add TorchAOHfQuantizer Summary: Enable loading torchao quantized model in huggingface. Test Plan: local test Reviewers: Subscribers: Tasks: Tags: * Fix a few issues * style * Added tests and addressed some comments about dtype conversion * fix torch_dtype warning message * fix tests * style * TorchAOConfig -> TorchAoConfig * enable offload + fix memory with multi-gpu * update torchao version requirement to 0.4.0 * better comments * add torch.compile to torchao README, add perf number link --------- Co-authored-by: Marc Sun <marc@huggingface.co>	2024-08-14 16:14:24 +02:00
Steven Liu	9485289f37	Update translation docs review (#32662 ) update list of people to tag	2024-08-14 13:57:07 +02:00

16647 Commits