transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-25 15:28:59 +06:00

Author	SHA1	Message	Date
Vaibhav Srivastav	6f0ecf1049	[docs] add quick usage snippet to Whisper. (#31289 ) * [docs] add quick usage snippet to Whisper. * Apply suggestions from review. * 💉 Fix the device for pipeline.	2024-08-27 14:11:52 +02:00
Boris Feld	892d51caee	Log additional test metrics with the CometCallback (#33124 ) * Log additional test metrics with the CometCallback. Also follow the same metric naming convention as other callbacks * Merge 2 subsequent if-statements * Trigger Build --------- Co-authored-by: Aliaksandr Kuzmik <alexander.kuzmik99@gmail.com>	2024-08-27 13:40:53 +02:00
dependabot[bot]	746e1148cf	Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/jax-projects/hybrid_clip (#33137 ) Bump torch in /examples/research_projects/jax-projects/hybrid_clip Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0. - [Release notes](https://github.com/pytorch/pytorch/releases) - [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md) - [Commits](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0) --- updated-dependencies: - dependency-name: torch dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-27 13:33:37 +02:00
Joao Gante	ab0ac3b98f	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123 ) * fix param not being passed in tested; add exceptions * better source of model name * Update utils/create_dummy_models.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 11:58:27 +01:00
Yih-Dar	3806faa171	disable scheduled daily CI temporarily (#33136 ) disable scheduled daily CI temporary Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-08-27 11:52:15 +02:00
Aya	7562366d4b	fix: multilingual midel convert to tflite get wrong token (#32079 ) * fix: multilingual midel convert to tflite get wrong token * fix: modify test_force_tokens_logits_processor the checking value as scores.dtype.min --------- Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com> Co-authored-by: Aya <[kent831217@gmail.com]>	2024-08-27 11:44:09 +02:00
Sai-Suraj-27	3bf6dd8aa1	fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32850 ) * Fixed failing CodeGenTokenizationTest::test_truncation. * [run_slow] Codegen * [run_slow] codegen	2024-08-27 09:20:59 +02:00
Zach Mueller	9578c2597e	Fixup py 38 type hints for mps friendly (#33128 ) Fixup py 38	2024-08-26 12:27:39 -04:00
Pablo Montalvo	26f043bd4d	quickfix documentation (#32566 ) * fix documentation * update config	2024-08-26 17:49:44 +02:00
Sai-Suraj-27	3562772969	fix: Fixed `pydantic` required version in dockerfiles to make it compatible with DeepSpeed (#33105 ) Fixed pydantic required version in dockerfiles.	2024-08-26 17:10:36 +02:00
Ritik Nandwal	a378a54a57	Add changes for uroman package to handle non-Roman characters (#32404 ) * Add changes for uroman package to handle non-Roman characters * Update docs for uroman changes * Modifying error message to warning, for backward compatibility * Update instruction for user to install uroman * Update docs for uroman python version dependency and backward compatibility * Update warning message for python version compatibility with uroman * Refine docs	2024-08-26 17:07:01 +02:00
Joao Gante	72d4a3f9c1	mps: add `isin_mps_friendly`, a wrapper function for `torch.isin` (#33099 )	2024-08-26 15:34:19 +01:00
Joao Gante	894d421ee5	Test: add higher `atol` in `test_forward_with_num_logits_to_keep` (#33093 )	2024-08-26 15:23:30 +01:00
Joao Gante	93e0e1a852	CI: add torchvision to the consistency image (#32941 )	2024-08-26 15:17:45 +01:00
Shijie	19e6e80e10	support qwen2-vl (#32318 ) * support-qwen2-vl * tidy * tidy * tidy * tidy * tidy * tidy * tidy * hyphen->underscore * make style * add-flash2-tipd * delete-tokenize=False * remove-image_processor-in-init-file * add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES * format-doct * support-Qwen2VLVisionConfig * remove-standardize_cache_format * fix-letter-varaibles * remove-torch-in-image-processor * remove-useless-docstring * fix-one-letter-varaible-name * change-block-name * default-quick-gelu-in-vision * remove-useless-doc * use-preimplemented-flash-forward * fix-doc * fix-image-processing-doc * fix-apply-rotary-embed * fix-flash-attn-sliding-window * refactor * remove-default_template * remove-reorder_cache * simple-get-rope_deltas * update-prepare_inputs_for_generation * update-attention-mask * update-rotary_seq_len * remove-state * kv_seq_length * remove-warning * _supports_static_cache * remove-legacy-cache * refactor * fix-replace * mrope-section-doc * code-quality * code-quality * polish-doc * fix-image-processing-test * update readme * Update qwen2_vl.md * fix-test * Update qwen2_vl.md * nit * processor-kwargs * hard-code-norm_layer * code-quality * discard-pixel-values-in-gen * fix-inconsistent-error-msg * unify-image-video * hidden_act * add-docstring * vision-encode-as-PreTrainedModel * pixel-to-target-dtype * update doc and low memoryvit * format * format * channel-foramt * fix vit_flashatt * format * inherit-Qwen2VLPreTrainedModel * simplify * format-test * remove-one-line-func-in-image-processing * avoid-one-line-reshape * simplify-rotary_seq_len * avoid-single-letter-variable * no-for-loop-sdpa * avoid-single-letter-variable * remove-one-line-reshape * remove-one-line-reshape * remove-no-rope-in-vit-logic * default-mrope * add-copied-from * more-docs-for-mrope * polish-doc * comment-and-link * polish-doc * single-letter-variables * simplify-image-processing * video->images * kv_seq_len-update * vision-rope-on-the-fly * vision-eager-attention * change-processor-order --------- Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>	2024-08-26 15:16:44 +02:00
S M Jishanul Islam	8defc95df3	Updated the custom_models.md changed cross_entropy code (#33118 )	2024-08-26 13:15:43 +02:00
Matt	0a7af19f4d	Update Jinja docs with new functions and general cleanup (#33097 )	2024-08-23 17:40:06 +01:00
Arun Prakash A	e3a5f35cd5	added doctring to SchedulerType class (#32898 ) * added doctring to SchedulerType class * Remove trailing whitespace src/transformers/trainer_utils.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fixup --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2024-08-23 09:15:25 -07:00
Donggeun Yu	1dbd9d3693	DeviceGuard added to use Deformable Attention more safely on multi-GPU (#32910 ) * Update modeling_deformable_detr.py * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update ms_deform_attn_cuda.cu * Update modeling_deformable_detr.py * Update modeling_deformable_detr.py * [empty] this is a empty commit --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-23 17:12:10 +01:00
Matt	371b9c1486	Enable some Jinja extensions and add datetime capabilities (#32684 ) * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Fix strftime template * Add template strip() just to be safe * Remove the do extension to make porting easier, and also because it's the least useful * Rename test * strftime -> strftime_now * Split test * Update test to use strftime_now * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils	2024-08-23 14:26:12 +01:00
Jason (Siyu) Zhu	adb91179b9	Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer (#32860 ) * add liger integration * fix syntax * fix import issue * add trainer.md * Use _apply_liger_kernel() * Fixed log message * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update docs/source/en/trainer.md Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Fixed checkstyle and updated readme * Added test * Fixed checkstyle * fix docstring * rename use_liger to use_liger_kernel * Trigger Build * Added test * add fix-copies * Fixed copy inconsistencies --------- Co-authored-by: shimizust <sshimizu@linkedin.com> Co-authored-by: Steven Shimizu <shimizust@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-08-23 13:20:49 +02:00
Joao Gante	970a16ec7f	Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659 ) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-23 11:12:53 +01:00
Cyril Vallez	22e6f14525	Reducing memory usage: removing useless logits computation in generate() (#31292 ) * Add .float() in all generation methods logit outputs * Switch float-casting of logits to training only for main models * Add `num_logits_to_keep` in Llama and add it by default in generate * Apply style * Add num_logits_to_keep as arg in prepare_input_for_generation * Add support for Mistral * Revert models except llama and mistral * Fix default None value in _supports_num_logits_to_keep() * Fix dimension of dummy input * Add exception for prophetnet in _supports_num_logits_to_keep() * Update _supports_num_logits_to_keep() to use inspect.signature() * Add deprecation cycle + remove modification with pretraining_tp * Apply style * Add most used models * Apply style * Make `num_logits_to_keep` an int in all cases to remove if-else clause * Add compile check for the warning * Fix torch versions * style * Add gemma2 * Update warning version * Add comment about .float operations in generation utils * Add tests in GenerationTesterMixin and ModelTesterMixin * Fix batch size for assisted decoding in tests * fix small issues in test * refacor test * fix slicing removing dim issue * Add nemotron support (should fix check-copy issue in CIs) * Trigger new CIs * Trigger new CIs * Bump version * Bump version in TODO * Trigger CIs * remove blank space * Trigger CIs	2024-08-23 11:08:34 +01:00
Stefano Fiorucci	d806fa3e92	docs: fix outdated link to TF32 explanation (#32947 ) fix outdated link	2024-08-22 13:28:00 -07:00
Joao Gante	a26de15139	Generate: Deprecate returning legacy cache by default; Handle `use_cache=False` (#32863 )	2024-08-22 20:01:52 +01:00
Jinuk	09e6579d2d	🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md to Korean" (#32334 ) * docs: ko: tasks/knowledge_distillation_for_image_classification.md * feat: nmt draft * fix: manual edits * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review --------- Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>	2024-08-22 10:42:39 -07:00
Franz Louis Cesista	273c0afc8f	Fix regression on `Processor.save_pretrained` caused by #31691 (#32921 ) fix save_pretrained	2024-08-22 18:42:44 +02:00
Andrés Marafioti	18199b34e5	[run_slow] idefics2 (#32840 )	2024-08-22 18:08:03 +02:00
Joao Gante	975b988bfe	Gemma2: eager attention by default (#32865 )	2024-08-22 15:59:30 +01:00
Shaopeng Fu	f1d822ba33	fix: (issue #32689 ) `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook. (#32849 ) fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.	2024-08-22 16:42:00 +02:00
Isotr0py	ee8c01f839	Add chat_template for tokenizer extracted from GGUF model (#32908 ) * add chat_template to gguf tokenizer * add template through tokenizer config	2024-08-22 16:41:25 +02:00
regisss	99d67f1a09	Improve greedy search memory usage (#32895 ) Do not call torch.repeat_interleave if expand_size is 1	2024-08-22 15:37:44 +01:00
Yih-Dar	bf97d4aa6d	Fix benchmark script (#32635 ) * fix * >= 0.3.0 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-08-22 16:07:47 +02:00
Shubham Ugare	9282413611	Add SynCode to llm_tutorial (#32884 )	2024-08-22 15:30:22 +02:00
Younes Belkada	eeea71209a	FIX / Hub: Also catch for `exceptions.ConnectionError` (#31469 ) * Update hub.py * Update errors * Apply suggestions from code review Co-authored-by: Lucain <lucainp@gmail.com> --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>	2024-08-22 15:29:21 +02:00
Joao Gante	8b94d28f97	CI: separate step to download nltk files (#32935 ) * separate step to download nltk files * duplicated * rm comma	2024-08-22 14:17:24 +01:00
Marc Sun	c42d264549	FEAT / Trainer: Add adamw 4bit optimizer (#31865 ) * add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit `25278e805f`. * style * version check	2024-08-22 15:07:09 +02:00
Gal Cohen (galco)	6baa6f276a	fix: no need to dtype A in jamba (#32924 ) Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-22 15:03:22 +02:00
Sai-Suraj-27	af638c4afe	fix: Added missing `huggingface_hub` installation to workflows (#32891 ) Added missing huggingface_hub installation to workflows.	2024-08-22 12:51:12 +01:00
Joao Gante	f6e2586a36	Jamba: update integration tests (#32250 ) * try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-22 11:46:10 +01:00
Arthur	3bb7b05229	Update docker image building (#32918 ) commit	2024-08-21 21:23:10 +02:00
Ruilin Huang	c6d484e38c	fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function (#31296 ) [whisper] don't overwrite return_timestamps when not passed to generate	2024-08-21 20:21:27 +01:00
Ahmed Almaghz	87134662f7	[i18n-ar] add README_ar.md to README.md (#32583 ) * Update README.md * Update README.md * Add README_ar.md to i18n/README_de.md * Add README_ar.md to i18n/README_es.md * Add README_ar.md to i18n/README_fr.md * Add README_ar.md to i18n/README_hd.md * Add README_ar.md to i18n/README_ja.md * Add README_ar.md to i18n/README_ko.md * Add README_ar.md to i18n/README_pt-br.md * Add README_ar.md to i18n/README_ru.md * Add README_ar.md to i18n/README_te.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_zh-hans.md * Add README_ar.md to i18n/README_zh-hant.md * Create README_ar.md	2024-08-20 16:11:54 -07:00
Nicholas Broad	1dde50c7d2	link for optimizer names (#32400 ) * link for optimizer names Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring. * make fixup	2024-08-20 15:28:24 -07:00
Pavel Iakubovskii	078d5a88cd	Replace `tensor.norm()` with decomposed version for CLIP executorch export (#32887 ) * Replace .norm() with decomposed version for executorch export * [run_slow] clip	2024-08-20 21:27:21 +01:00
dependabot[bot]	9800e6d170	Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_transformer (#32903 ) Bump nltk in /examples/research_projects/decision_transformer Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.7...3.9) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-20 21:02:17 +01:00
Anton Vlasjuk	c63a3d0f17	Fix: Mamba2 `norm_before_gate` usage (#32686 ) * mamba2 uses norm_before_gate=False * small nit * remove norm_before_gate flag and follow False path only	2024-08-20 19:47:34 +02:00
Gal Cohen (galco)	01c4fc455b	fix: jamba cache fails to use torch.nn.module (#32894 ) Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-20 14:50:13 +02:00
Arthur	65f4bc99f9	Fix repr for conv (#32897 ) add nx	2024-08-20 14:34:24 +02:00
Marc Sun	fd06ad5438	🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627 ) * Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci	2024-08-20 11:42:36 +02:00


19383 Commits