Jinuk
09e6579d2d
🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md` to Korean ( #32334 )
...
* docs: ko: tasks/knowledge_distillation_for_image_classification.md
* feat: nmt draft
* fix: manual edits
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
* Apply suggestions from code review
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
* Apply suggestions from code review
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
---------
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-08-22 10:42:39 -07:00
Franz Louis Cesista
273c0afc8f
Fix regression on `Processor.save_pretrained` caused by #31691 ( #32921 )
...
fix save_pretrained
2024-08-22 18:42:44 +02:00
Andrés Marafioti
18199b34e5
[run_slow] idefics2 ( #32840 )
2024-08-22 18:08:03 +02:00
Joao Gante
975b988bfe
Gemma2: eager attention by default ( #32865 )
2024-08-22 15:59:30 +01:00
Shaopeng Fu
f1d822ba33
fix: (issue #32689 ) `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook. ( #32849 )
...
fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.
2024-08-22 16:42:00 +02:00
Isotr0py
ee8c01f839
Add chat_template for tokenizer extracted from GGUF model ( #32908 )
...
* add chat_template to gguf tokenizer
* add template through tokenizer config
2024-08-22 16:41:25 +02:00
regisss
99d67f1a09
Improve greedy search memory usage ( #32895 )
...
Do not call torch.repeat_interleave if expand_size is 1
2024-08-22 15:37:44 +01:00
Yih-Dar
bf97d4aa6d
Fix benchmark script ( #32635 )
...
* fix
* >= 0.3.0
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-22 16:07:47 +02:00
Shubham Ugare
9282413611
Add SynCode to llm_tutorial ( #32884 )
2024-08-22 15:30:22 +02:00
Younes Belkada
eeea71209a
FIX / Hub: Also catch `exceptions.ConnectionError` ( #31469 )
...
* Update hub.py
* Update errors
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-08-22 15:29:21 +02:00
Joao Gante
8b94d28f97
CI: separate step to download nltk files ( #32935 )
...
* separate step to download nltk files
* duplicated
* rm comma
2024-08-22 14:17:24 +01:00
Marc Sun
c42d264549
FEAT / Trainer: Add adamw 4bit optimizer ( #31865 )
...
* add 4bit optimizer
* style
* fix msg
* style
* add qgalore
* Revert "add qgalore"
This reverts commit 25278e805f.
* style
* version check
2024-08-22 15:07:09 +02:00
Gal Cohen (galco)
6baa6f276a
fix: no need to dtype A in jamba ( #32924 )
...
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 15:03:22 +02:00
Sai-Suraj-27
af638c4afe
fix: Added missing `huggingface_hub` installation to workflows ( #32891 )
...
Added missing huggingface_hub installation to workflows.
2024-08-22 12:51:12 +01:00
Joao Gante
f6e2586a36
Jamba: update integration tests ( #32250 )
...
* try test updates
* a few more changes
* a few more changes
* a few more changes
* [run slow] jamba
* skip logits checks on older gpus
* [run slow] jamba
* oops
* [run slow] jamba
* Update tests/models/jamba/test_modeling_jamba.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/jamba/test_modeling_jamba.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-22 11:46:10 +01:00
Arthur
3bb7b05229
Update docker image building ( #32918 )
...
commit
2024-08-21 21:23:10 +02:00
Ruilin Huang
c6d484e38c
fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to the `generate` function ( #31296 )
...
[whisper] don't overwrite return_timestamps when not passed to generate
2024-08-21 20:21:27 +01:00
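The fix amounts to treating an unset argument as "keep the configured value" rather than force-overwriting it. A minimal sketch of that pattern (function and attribute names are illustrative, not Whisper's actual internals):

```python
from types import SimpleNamespace


def resolve_return_timestamps(generation_config, return_timestamps=None):
    # Only an explicitly passed argument overrides the GenerationConfig;
    # None means "not passed", so the configured value survives.
    if return_timestamps is not None:
        generation_config.return_timestamps = return_timestamps
    return getattr(generation_config, "return_timestamps", False)


cfg = SimpleNamespace(return_timestamps=True)
print(resolve_return_timestamps(cfg))         # True: configured value kept
print(resolve_return_timestamps(cfg, False))  # False: explicit override wins
```

The earlier behavior corresponds to always assigning `return_timestamps` (even when `None`), which clobbered the value users had set on their `GenerationConfig`.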
Ahmed Almaghz
87134662f7
[i18n-ar] add README_ar.md to README.md ( #32583 )
...
* Update README.md
* Update README.md
* Add README_ar.md to i18n/README_de.md
* Add README_ar.md to i18n/README_es.md
* Add README_ar.md to i18n/README_fr.md
* Add README_ar.md to i18n/README_hd.md
* Add README_ar.md to i18n/README_ja.md
* Add README_ar.md to i18n/README_ko.md
* Add README_ar.md to i18n/README_pt-br.md
* Add README_ar.md to i18n/README_ru.md
* Add README_ar.md to i18n/README_te.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_zh-hans.md
* Add README_ar.md to i18n/README_zh-hant.md
* Create README_ar.md
2024-08-20 16:11:54 -07:00
Nicholas Broad
1dde50c7d2
link for optimizer names ( #32400 )
...
* link for optimizer names
Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring.
* make fixup
2024-08-20 15:28:24 -07:00
Pavel Iakubovskii
078d5a88cd
Replace `tensor.norm()` with decomposed version for CLIP executorch export ( #32887 )
...
* Replace .norm() with decomposed version for executorch export
* [run_slow] clip
2024-08-20 21:27:21 +01:00
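Export backends often cannot lower the fused norm op, so the fix rewrites it in terms of primitives. A hedged sketch of the decomposition (the helper name is illustrative):

```python
import torch


def l2_norm_decomposed(x: torch.Tensor, dim: int = -1, keepdim: bool = True) -> torch.Tensor:
    # sqrt(sum(x^2)) uses only elementwise/reduction ops that export
    # pipelines such as ExecuTorch can handle, unlike the fused norm op.
    return torch.sqrt(torch.pow(x, 2).sum(dim=dim, keepdim=keepdim))


x = torch.randn(3, 5)
decomposed = l2_norm_decomposed(x)
reference = x.norm(p=2, dim=-1, keepdim=True)  # numerically equivalent
```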
dependabot[bot]
9800e6d170
Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_transformer ( #32903 )
...
Bump nltk in /examples/research_projects/decision_transformer
Bumps [nltk](https://github.com/nltk/nltk ) from 3.7 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog )
- [Commits](https://github.com/nltk/nltk/compare/3.7...3.9 )
---
updated-dependencies:
- dependency-name: nltk
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-20 21:02:17 +01:00
Anton Vlasjuk
c63a3d0f17
Fix: Mamba2 `norm_before_gate` usage ( #32686 )
...
* mamba2 uses norm_before_gate=False
* small nit
* remove norm_before_gate flag and follow False path only
2024-08-20 19:47:34 +02:00
Gal Cohen (galco)
01c4fc455b
fix: jamba cache fails to use torch.nn.module ( #32894 )
...
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 14:50:13 +02:00
Arthur
65f4bc99f9
Fix repr for conv ( #32897 )
...
add nx
2024-08-20 14:34:24 +02:00
Marc Sun
fd06ad5438
🚨 🚨 🚨 Update min version of accelerate to 0.26.0 ( #32627 )
...
* Update min version of accelerate to 0.26.0
* dev-ci
* update min version in import
* remove useless check
* dev-ci
* style
* dev-ci
* dev-ci
2024-08-20 11:42:36 +02:00
Arthur
13e645bb40
Allow-head-dim ( #32857 )
...
* support head dim
* fix the doc
* fixup
* add oproj
Co-authored-by: Suhara <suhara@users.noreply.github.com>
* update
Co-authored-by: bzantium <bzantium@users.noreply.github.com>
* Co-authored-by: suhara <suhara@users.noreply.github.com>
* Update
Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
---------
Co-authored-by: bzantium <bzantium@users.noreply.github.com>
Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
2024-08-20 10:24:48 +02:00
Matt
85345bb439
Add tip to clarify tool calling ( #32883 )
2024-08-19 18:37:35 +01:00
Sai-Suraj-27
37204848f1
Docs: Fixed `whisper-large-v2` model link in docs ( #32871 )
...
Fixed whisper-large-v2 model link in docs.
2024-08-19 09:50:35 -07:00
Anton Vlasjuk
61d89c19d8
Fix: Mamba2 generation mismatch between input_ids and inputs_embeds ( #32694 )
...
* fix cache when using input embeddings
* simplify check; we can always add the input ids seq len since it's 0 in the first pass
2024-08-19 16:06:07 +02:00
Younes Belkada
93e538ae2e
Mamba / FalconMamba: Fix mamba left padding ( #32677 )
...
* fix mamba left padding
* Apply suggestions from code review
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix copies
* test with `inputs_embeds`
* Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copies
* clarify
* fix last comments
* remove
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-19 16:01:35 +02:00
Isotr0py
59e8f1919c
Fix incorrect vocab size retrieval in GGUF config ( #32551 )
...
* fix gguf config vocab size
* minor fix
* link issue
2024-08-19 15:53:54 +02:00
Alan-Blanchet
5f6c080b62
RT-DETR parameterized batchnorm freezing ( #32631 )
...
* fix: Parameterized norm freezing
For the R18 model, the authors don't freeze norms in the backbone.
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-08-19 14:50:57 +01:00
Yitong Huang
8a4857c0db
Support save/load ckpt for XLA FSDP ( #32311 )
...
* Support save/load ckpt for XLA FSDP
* Fix bug for save
* Fix style
* reserve sharded ckpt and better file naming
* minor fix
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* add is_fsdp_xla_v1_enabled
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-08-19 15:44:21 +02:00
Aaron Chung
f1b720ed62
Add __repr__ for Conv1D ( #32425 )
...
* Add representation for Conv1D, for better output info.
* code format for Conv1D
* We add a `__repr__` func for Conv1D, so the printed model info gives a better description of Conv1D.
2024-08-19 15:26:19 +02:00
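The idea, roughly (a minimal sketch of a GPT-2-style Conv1D, not the exact Transformers class; the follow-up #32897 above adjusts the repr to also show `nx`):

```python
import torch
import torch.nn as nn


class Conv1D(nn.Module):
    # GPT-2-style "Conv1D" is effectively a linear layer with transposed
    # weights: nf output features, nx input features.
    def __init__(self, nf: int, nx: int):
        super().__init__()
        self.nf, self.nx = nf, nx
        self.weight = nn.Parameter(torch.empty(nx, nf))
        self.bias = nn.Parameter(torch.zeros(nf))

    def __repr__(self) -> str:
        # Without a custom __repr__, printed model summaries show a bare
        # "Conv1D()" with no dimension information.
        return f"Conv1D(nf={self.nf}, nx={self.nx})"


print(Conv1D(nf=768, nx=256))  # Conv1D(nf=768, nx=256)
```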
Fanli Lin
e55b33ceb4
[tests] make `test_sdpa_can_compile_dynamic` device-agnostic ( #32519 )
...
* enable
* fix
2024-08-19 12:46:59 +01:00
Ita Zaporozhets
54b7703682
support torch-speech ( #32537 )
2024-08-19 11:26:35 +02:00
Kamil Akesbi
8260cb311e
Add Descript-Audio-Codec model ( #31494 )
...
* dac model
* original dac works
* add dac model
* dac can be instantiated
* add forward pass
* load weights
* all weights are used
* convert checkpoint script ready
* test
* add feature extractor
* up
* make style
* apply cookiecutter
* fix tests
* iterate on FeatureExtractor
* nit
* update dac doc
* replace nn.Sequential with nn.ModuleList
* nit
* apply review suggestions 1/2
* Update src/transformers/models/dac/modeling_dac.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* up
* apply review suggestions 2/2
* update padding in FeatureExtractor
* apply review suggestions
* iterate on design and tests
* add integration tests
* feature extractor tests
* make style
* all tests pass
* make style
* fixup
* apply review suggestions
* fix-copies
* apply review suggestions
* apply review suggestions
* Update docs/source/en/model_doc/dac.md
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update docs/source/en/model_doc/dac.md
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* anticipate transfer weights to descript
* up
* make style
* apply review suggestions
* update slow test values
* update slow tests
* update test values
* update with CI values
* update with vorace values
* update test with slice
* make style
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-08-19 10:21:51 +01:00
MAHIR DAIYAN
843e5e20ca
Add Flax Dinov2 ( #31960 )
...
* tfmsenv restored in main
* installed flax
* forward pass done and all tests passed
* make fix-copies and cleaning the scripts
* fixup attempt 1
* fixup attempt 2
* fixup third attempt
* fixup attempt 4
* fixup attempt 5
* dinov2 doc fixed
* FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE
* external pos_encoding layer removed
* fixup attempt 6
* fixed integration test values
* fixup attempt 7
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* comments removed
* comment removed from the test
* fixup
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* new fixes 1
* interpolate_pos_encoding function removed
* droppath rng fixed, pretrained beit copied-from still not working
* modeling_flax_dinov2.py reformatted
* Update tests/models/dinov2/test_modeling_flax_dinov2.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* added Copied from, to the tests
* copied from statements removed from tests
* fixed copied from statements in the tests
* [run_slow] dinov2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-08-19 09:28:13 +01:00
Joao Gante
52cb4034ad
generate: missing `to` in DoLa body, causing exceptions in multi-gpu generation ( #32856 )
2024-08-17 16:37:00 +01:00
Alex Calderwood
6806d33567
Make beam_constraints.Constraint.advance() docstring more accurate ( #32674 )
...
* Fix beam_constraints.Constraint.advance() docstring
* Update src/transformers/generation/beam_constraints.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-16 19:36:55 +01:00
Zach Mueller
8ec028aded
Reduce the error log when using core models that need their weights renamed, and provide a step forward ( #32656 )
...
* Fin
* Modify msg
* Finish up nits
2024-08-16 13:05:57 -04:00
Marc Sun
1c36db697a
fix multi-gpu with static cache ( #32543 )
2024-08-16 19:02:37 +02:00
Zach Mueller
0b066bed14
Revert PR 32299, flag users when Zero-3 was missed ( #32851 )
...
Revert PR 32299
2024-08-16 12:35:41 -04:00
Zhan Rongrui
f20d0e81ea
improve _get_is_as_tensor_fns ( #32596 )
...
* improve _get_is_as_tensor_fns
* format
2024-08-16 15:59:44 +01:00
Yangshen⚡Deng
a27182b7fc
Fix AutoConfig and AutoModel support for Llava-Next-Video ( #32844 )
...
* Fix: fix all model_type of Llava-Next-Video to llava_next_video
* Fix doc for llava_next_video
* * Fix formatting issues
* Change llava-next-video.md file name into llava_next_video.md to make it compatible with implementation
* Fix docs TOC for llava-next-video
2024-08-16 12:41:05 +01:00
Joao Gante
cf32ee1753
Cache: use `batch_size` instead of `max_batch_size` ( #32657 )
...
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-16 11:48:45 +01:00
Fanli Lin
8f9fa3b081
[tests] make test_sdpa_equivalence device-agnostic ( #32520 )
...
* fix on xpu
* [run_all]
2024-08-16 11:34:13 +01:00
Joao Gante
70d5df6107
Generate: unify `LogitsWarper` and `LogitsProcessor` ( #32626 )
2024-08-16 11:20:41 +01:00
Ao Tang
5fd7ca7bc9
Use head_dim if in config for RoPE ( #32495 )
...
* use head_dim if in config for RoPE
* typo
* simplify with getattr
2024-08-16 11:37:43 +02:00
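The "simplify with getattr" step can be sketched as a small fallback helper (names are illustrative; the real change lives in the RoPE/attention setup):

```python
from types import SimpleNamespace


def get_head_dim(config) -> int:
    # Respect an explicit head_dim when the config defines one; otherwise
    # fall back to the conventional hidden_size // num_attention_heads.
    head_dim = getattr(config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    return config.hidden_size // config.num_attention_heads


print(get_head_dim(SimpleNamespace(hidden_size=4096, num_attention_heads=32)))  # 128
print(get_head_dim(SimpleNamespace(head_dim=96, hidden_size=4096, num_attention_heads=32)))  # 96
```

This matters for models whose per-head dimension is not `hidden_size / num_heads`, where the implicit division would build RoPE embeddings of the wrong size.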
Arthur
c215523528
add back the position ids ( #32554 )
...
* add back the position ids
* fix failing test
2024-08-16 11:00:05 +02:00