Vaibhav Srivastav
6f0ecf1049
[docs] add quick usage snippet to Whisper. ( #31289 )
...
* [docs] add quick usage snippet to Whisper.
* Apply suggestions from review.
* 💉 Fix the device for pipeline.
2024-08-27 14:11:52 +02:00
Boris Feld
892d51caee
Log additional test metrics with the CometCallback ( #33124 )
...
* Log additional test metrics with the CometCallback.
Also follow the same metric naming convention as other callbacks
* Merge 2 subsequent if-statements
* Trigger Build
---------
Co-authored-by: Aliaksandr Kuzmik <alexander.kuzmik99@gmail.com>
2024-08-27 13:40:53 +02:00
dependabot[bot]
746e1148cf
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/jax-projects/hybrid_clip ( #33137 )
...
Bump torch in /examples/research_projects/jax-projects/hybrid_clip
Bumps [torch](https://github.com/pytorch/pytorch ) from 1.13.1 to 2.2.0.
- [Release notes](https://github.com/pytorch/pytorch/releases )
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md )
- [Commits](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0 )
---
updated-dependencies:
- dependency-name: torch
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-27 13:33:37 +02:00
Joao Gante
ab0ac3b98f
CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size ( #33123 )
...
* fix param not being passed in tested; add exceptions
* better source of model name
* Update utils/create_dummy_models.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 11:58:27 +01:00
Yih-Dar
3806faa171
disable scheduled daily CI temporarily ( #33136 )
...
disable scheduled daily CI temporarily
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-27 11:52:15 +02:00
Aya
7562366d4b
fix: multilingual model converted to tflite gets wrong token ( #32079 )
...
* fix: multilingual model converted to tflite gets wrong token
* fix: change the value checked in test_force_tokens_logits_processor to scores.dtype.min
---------
Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com>
Co-authored-by: Aya <kent831217@gmail.com>
2024-08-27 11:44:09 +02:00
Sai-Suraj-27
3bf6dd8aa1
fix: Fixed CodeGenTokenizationTest::test_truncation failing test ( #32850 )
...
* Fixed failing CodeGenTokenizationTest::test_truncation.
* [run_slow] Codegen
* [run_slow] codegen
2024-08-27 09:20:59 +02:00
Zach Mueller
9578c2597e
Fixup py 38 type hints for mps friendly ( #33128 )
...
Fixup py 38
2024-08-26 12:27:39 -04:00
Pablo Montalvo
26f043bd4d
quickfix documentation ( #32566 )
...
* fix documentation
* update config
2024-08-26 17:49:44 +02:00
Sai-Suraj-27
3562772969
fix: Fixed `pydantic` required version in dockerfiles to make it compatible with DeepSpeed ( #33105 )
...
Fixed pydantic required version in dockerfiles.
2024-08-26 17:10:36 +02:00
Ritik Nandwal
a378a54a57
Add changes for uroman package to handle non-Roman characters ( #32404 )
...
* Add changes for uroman package to handle non-Roman characters
* Update docs for uroman changes
* Modifying error message to warning, for backward compatibility
* Update instruction for user to install uroman
* Update docs for uroman python version dependency and backward compatibility
* Update warning message for python version compatibility with uroman
* Refine docs
2024-08-26 17:07:01 +02:00
Joao Gante
72d4a3f9c1
mps: add `isin_mps_friendly`, a wrapper function for `torch.isin` ( #33099 )
2024-08-26 15:34:19 +01:00
Joao Gante
894d421ee5
Test: add higher `atol` in `test_forward_with_num_logits_to_keep` ( #33093 )
2024-08-26 15:23:30 +01:00
Joao Gante
93e0e1a852
CI: add torchvision to the consistency image ( #32941 )
2024-08-26 15:17:45 +01:00
Shijie
19e6e80e10
support qwen2-vl ( #32318 )
...
* support-qwen2-vl
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* hyphen->underscore
* make style
* add-flash2-tipd
* delete-tokenize=False
* remove-image_processor-in-init-file
* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES
* format-doct
* support-Qwen2VLVisionConfig
* remove-standardize_cache_format
* fix-letter-variables
* remove-torch-in-image-processor
* remove-useless-docstring
* fix-one-letter-variable-name
* change-block-name
* default-quick-gelu-in-vision
* remove-useless-doc
* use-preimplemented-flash-forward
* fix-doc
* fix-image-processing-doc
* fix-apply-rotary-embed
* fix-flash-attn-sliding-window
* refactor
* remove-default_template
* remove-reorder_cache
* simple-get-rope_deltas
* update-prepare_inputs_for_generation
* update-attention-mask
* update-rotary_seq_len
* remove-state
* kv_seq_length
* remove-warning
* _supports_static_cache
* remove-legacy-cache
* refactor
* fix-replace
* mrope-section-doc
* code-quality
* code-quality
* polish-doc
* fix-image-processing-test
* update readme
* Update qwen2_vl.md
* fix-test
* Update qwen2_vl.md
* nit
* processor-kwargs
* hard-code-norm_layer
* code-quality
* discard-pixel-values-in-gen
* fix-inconsistent-error-msg
* unify-image-video
* hidden_act
* add-docstring
* vision-encode-as-PreTrainedModel
* pixel-to-target-dtype
* update doc and low memoryvit
* format
* format
* channel-format
* fix vit_flashatt
* format
* inherit-Qwen2VLPreTrainedModel
* simplify
* format-test
* remove-one-line-func-in-image-processing
* avoid-one-line-reshape
* simplify-rotary_seq_len
* avoid-single-letter-variable
* no-for-loop-sdpa
* avoid-single-letter-variable
* remove-one-line-reshape
* remove-one-line-reshape
* remove-no-rope-in-vit-logic
* default-mrope
* add-copied-from
* more-docs-for-mrope
* polish-doc
* comment-and-link
* polish-doc
* single-letter-variables
* simplify-image-processing
* video->images
* kv_seq_len-update
* vision-rope-on-the-fly
* vision-eager-attention
* change-processor-order
---------
Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
S M Jishanul Islam
8defc95df3
Updated the custom_models.md changed cross_entropy code ( #33118 )
2024-08-26 13:15:43 +02:00
Matt
0a7af19f4d
Update Jinja docs with new functions and general cleanup ( #33097 )
2024-08-23 17:40:06 +01:00
Arun Prakash A
e3a5f35cd5
added docstring to SchedulerType class ( #32898 )
...
* added docstring to SchedulerType class
* Remove trailing whitespace src/transformers/trainer_utils.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fixup
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-23 09:15:25 -07:00
Donggeun Yu
1dbd9d3693
DeviceGuard added to use Deformable Attention more safely on multi-GPU ( #32910 )
...
* Update modeling_deformable_detr.py
* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update ms_deform_attn_cuda.cu
* Update modeling_deformable_detr.py
* Update modeling_deformable_detr.py
* [empty] this is an empty commit
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 17:12:10 +01:00
Matt
371b9c1486
Enable some Jinja extensions and add datetime capabilities ( #32684 )
...
* Add new Jinja features:
- Do extension
- Break/continue in loops
- Call strftime to get current datetime in any format
* Add new Jinja features:
- Do extension
- Break/continue in loops
- Call strftime to get current datetime in any format
* Fix strftime template
* Add template strip() just to be safe
* Remove the do extension to make porting easier, and also because it's the least useful
* Rename test
* strftime -> strftime_now
* Split test
* Update test to use strftime_now
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
2024-08-23 14:26:12 +01:00
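The `strftime_now` helper named in commit 371b9c1486 above can be pictured roughly as follows. This is a minimal sketch using only the Python standard library, not the code from the PR: the function body and the parameter name `fmt` are assumptions, and how the helper is exposed to Jinja templates is not shown.

```python
from datetime import datetime


def strftime_now(fmt: str) -> str:
    """Return the current local time rendered with a strftime format string."""
    # Assumption: the helper simply formats "now"; the real implementation may differ.
    return datetime.now().strftime(fmt)


# Example calls with two common format strings.
year = strftime_now("%Y")
stamp = strftime_now("%Y-%m-%d %H:%M")
```

A chat template could then call such a helper to inject the current date into a system prompt, which is presumably the use case behind the "datetime capabilities" in the commit title.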
Jason (Siyu) Zhu
adb91179b9
Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer ( #32860 )
...
* add liger integration
* fix syntax
* fix import issue
* add trainer.md
* Use _apply_liger_kernel()
* Fixed log message
* Update docs/source/en/trainer.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/trainer.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Update docs/source/en/trainer.md
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Fixed checkstyle and updated readme
* Added test
* Fixed checkstyle
* fix docstring
* rename use_liger to use_liger_kernel
* Trigger Build
* Added test
* add fix-copies
* Fixed copy inconsistencies
---------
Co-authored-by: shimizust <sshimizu@linkedin.com>
Co-authored-by: Steven Shimizu <shimizust@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-08-23 13:20:49 +02:00
Joao Gante
970a16ec7f
Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 ( #32659 )
...
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 11:12:53 +01:00
Cyril Vallez
22e6f14525
Reducing memory usage: removing useless logits computation in generate() ( #31292 )
...
* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_input_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refactor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs
2024-08-23 11:08:34 +01:00
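Commit 22e6f14525 above mentions switching `_supports_num_logits_to_keep()` to `inspect.signature()`. The general technique, checking whether a callable declares a given parameter, might look like the sketch below. The helper name `accepts_parameter` and the two stand-in `forward` functions are hypothetical and are not taken from the library.

```python
import inspect


def accepts_parameter(fn, name: str) -> bool:
    """Return True if `fn` declares a parameter called `name`."""
    return name in inspect.signature(fn).parameters


# Hypothetical stand-ins for two models' forward methods.
def forward_with_flag(input_ids, num_logits_to_keep=0):
    return input_ids


def forward_without_flag(input_ids):
    return input_ids


supported = accepts_parameter(forward_with_flag, "num_logits_to_keep")
unsupported = accepts_parameter(forward_without_flag, "num_logits_to_keep")
```

Inspecting the signature avoids maintaining a hand-written list of models that accept the argument, at the cost of relying on a parameter-name convention.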
Stefano Fiorucci
d806fa3e92
docs: fix outdated link to TF32 explanation ( #32947 )
...
fix outdated link
2024-08-22 13:28:00 -07:00
Joao Gante
a26de15139
Generate: Deprecate returning legacy cache by default; Handle `use_cache=False` ( #32863 )
2024-08-22 20:01:52 +01:00
Jinuk
09e6579d2d
🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md` to Korean ( #32334 )
...
* docs: ko: tasks/knowledge_distillation_for_image_classification.md
* feat: nmt draft
* fix: manual edits
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
* Apply suggestions from code review
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
* Apply suggestions from code review
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
---------
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-08-22 10:42:39 -07:00
Franz Louis Cesista
273c0afc8f
Fix regression on `Processor.save_pretrained` caused by #31691 ( #32921 )
...
fix save_pretrained
2024-08-22 18:42:44 +02:00
Andrés Marafioti
18199b34e5
[run_slow] idefics2 ( #32840 )
2024-08-22 18:08:03 +02:00
Joao Gante
975b988bfe
Gemma2: eager attention by default ( #32865 )
2024-08-22 15:59:30 +01:00
Shaopeng Fu
f1d822ba33
fix: (issue #32689 ) `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook. ( #32849 )
...
fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.
2024-08-22 16:42:00 +02:00
Isotr0py
ee8c01f839
Add chat_template for tokenizer extracted from GGUF model ( #32908 )
...
* add chat_template to gguf tokenizer
* add template through tokenizer config
2024-08-22 16:41:25 +02:00
regisss
99d67f1a09
Improve greedy search memory usage ( #32895 )
...
Do not call torch.repeat_interleave if expand_size is 1
2024-08-22 15:37:44 +01:00
Yih-Dar
bf97d4aa6d
Fix benchmark script ( #32635 )
...
* fix
* >= 0.3.0
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-22 16:07:47 +02:00
Shubham Ugare
9282413611
Add SynCode to llm_tutorial ( #32884 )
2024-08-22 15:30:22 +02:00
Younes Belkada
eeea71209a
FIX / Hub: Also catch for `exceptions.ConnectionError` ( #31469 )
...
* Update hub.py
* Update errors
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-08-22 15:29:21 +02:00
Joao Gante
8b94d28f97
CI: separate step to download nltk files ( #32935 )
...
* separate step to download nltk files
* duplicated
* rm comma
2024-08-22 14:17:24 +01:00
Marc Sun
c42d264549
FEAT / Trainer: Add adamw 4bit optimizer ( #31865 )
...
* add 4bit optimizer
* style
* fix msg
* style
* add qgalore
* Revert "add qgalore"
This reverts commit 25278e805f.
* style
* version check
2024-08-22 15:07:09 +02:00
Gal Cohen (galco)
6baa6f276a
fix: no need to dtype A in jamba ( #32924 )
...
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 15:03:22 +02:00
Sai-Suraj-27
af638c4afe
fix: Added missing `huggingface_hub` installation to workflows ( #32891 )
...
Added missing huggingface_hub installation to workflows.
2024-08-22 12:51:12 +01:00
Joao Gante
f6e2586a36
Jamba: update integration tests ( #32250 )
...
* try test updates
* a few more changes
* a few more changes
* a few more changes
* [run slow] jamba
* skip logits checks on older gpus
* [run slow] jamba
* oops
* [run slow] jamba
* Update tests/models/jamba/test_modeling_jamba.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/jamba/test_modeling_jamba.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-22 11:46:10 +01:00
Arthur
3bb7b05229
Update docker image building ( #32918 )
...
commit
2024-08-21 21:23:10 +02:00
Ruilin Huang
c6d484e38c
fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function ( #31296 )
...
[whisper] don't overwrite return_timestamps when not passed to generate
2024-08-21 20:21:27 +01:00
Ahmed Almaghz
87134662f7
[i18n-ar] add README_ar.md to README.md ( #32583 )
...
* Update README.md
* Update README.md
* Add README_ar.md to i18n/README_de.md
* Add README_ar.md to i18n/README_es.md
* Add README_ar.md to i18n/README_fr.md
* Add README_ar.md to i18n/README_hd.md
* Add README_ar.md to i18n/README_ja.md
* Add README_ar.md to i18n/README_ko.md
* Add README_ar.md to i18n/README_pt-br.md
* Add README_ar.md to i18n/README_ru.md
* Add README_ar.md to i18n/README_te.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_zh-hans.md
* Add README_ar.md to i18n/README_zh-hant.md
* Create README_ar.md
2024-08-20 16:11:54 -07:00
Nicholas Broad
1dde50c7d2
link for optimizer names ( #32400 )
...
* link for optimizer names
Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring.
* make fixup
2024-08-20 15:28:24 -07:00
Pavel Iakubovskii
078d5a88cd
Replace `tensor.norm()` with decomposed version for CLIP executorch export ( #32887 )
...
* Replace .norm() with decomposed version for executorch export
* [run_slow] clip
2024-08-20 21:27:21 +01:00
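As an illustration of what a "decomposed" norm means in commit 078d5a88cd above: a vector norm can be rebuilt from elementwise square, sum, and square root, which export backends tend to handle more readily than a fused norm operator. The sketch below uses plain Python lists as a stand-in for tensors and assumes the L2 norm; the function name is hypothetical and this is not the code from the PR.

```python
import math


def l2_norm_decomposed(values):
    """Compute the L2 norm from primitive steps: square, sum, square root."""
    squared = [v * v for v in values]  # elementwise square
    total = sum(squared)               # reduction
    return math.sqrt(total)            # square root of the sum


vector = [3.0, 4.0]
norm = l2_norm_decomposed(vector)
reference = math.hypot(*vector)  # built-in Euclidean norm, for comparison
```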
dependabot[bot]
9800e6d170
Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_transformer ( #32903 )
...
Bump nltk in /examples/research_projects/decision_transformer
Bumps [nltk](https://github.com/nltk/nltk ) from 3.7 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog )
- [Commits](https://github.com/nltk/nltk/compare/3.7...3.9 )
---
updated-dependencies:
- dependency-name: nltk
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-20 21:02:17 +01:00
Anton Vlasjuk
c63a3d0f17
Fix: Mamba2 `norm_before_gate` usage ( #32686 )
...
* mamba2 uses norm_before_gate=False
* small nit
* remove norm_before_gate flag and follow False path only
2024-08-20 19:47:34 +02:00
Gal Cohen (galco)
01c4fc455b
fix: jamba cache fails to use torch.nn.module ( #32894 )
...
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 14:50:13 +02:00
Arthur
65f4bc99f9
Fix repr for conv ( #32897 )
...
add nx
2024-08-20 14:34:24 +02:00
Marc Sun
fd06ad5438
🚨 🚨 🚨 Update min version of accelerate to 0.26.0 ( #32627 )
...
* Update min version of accelerate to 0.26.0
* dev-ci
* update min version in import
* remove useless check
* dev-ci
* style
* dev-ci
* dev-ci
2024-08-20 11:42:36 +02:00