Commit Graph

16692 Commits

Author SHA1 Message Date
Juan Pizarro
7591ca5bc5
🚨 Add Blip2ForImageTextRetrieval (#29261)
* add Blip2ForImageTextRetrieval

* use one line and remove unnecessary space in tests

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use the value from the config rather than a hardcoded one

* change order of params in Blip2QFormerModel.forward

* update docstring

* fix style

* update test_inference_opt

* move embeddings out of Blip2QFormerModel

* remove from_vision_qformer_configs

* remove autocast float16 in Blip2QFormerModel

* rename fields to vision_projection, text_projection, use_image_text_matching_head

* use CLIPOutput for Blip2ImageTextMatchingModelOutput

* remove past_key_values_length from Blip2TextEmbeddings

* fix small typo in the CLIPOutput docstring

* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping

* update docstring and add require_torch_fp16

* rollback test_inference_opt

* use use_image_text_matching_head=True in convert

* skip test_model_get_set_embeddings

* fix create_rename_keys error on new itm fields

* revert to doing the scale after the dot product between "query" and "key"

* fix ValueError on convert script for blip2-opt-2.7b

* update org of paths to Salesforce

* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests

* [run_slow] blip_2

* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED

* fix docstring of Blip2ImageTextMatchingModelOutput

* [run_slow] blip_2

* fix multi-gpu tests

* [run_slow] blip_2

* [run_slow] blip_2
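
A minimal usage sketch of the retrieval model added in this commit. Only `Blip2ForImageTextRetrieval` and the `use_image_text_matching_head` flag come from the commit itself; the checkpoint id and image path are assumptions based on the Salesforce BLIP-2 ITM checkpoints.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Blip2ForImageTextRetrieval

# Checkpoint id is an assumption; the class name comes from this PR.
processor = AutoProcessor.from_pretrained("Salesforce/blip2-itm-vit-g")
model = Blip2ForImageTextRetrieval.from_pretrained("Salesforce/blip2-itm-vit-g")

image = Image.open("photo.jpg")  # placeholder image path
inputs = processor(images=image, text="a photo of a cat", return_tensors="pt")

with torch.no_grad():
    # use_image_text_matching_head is the field renamed in this PR
    outputs = model(**inputs, use_image_text_matching_head=True)
```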

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 18:50:27 +01:00
Ali Salamatian
27903de7ec
Very small change to one of the function parameters (#32548)
Very small change to one of the parameters

The second argument of np.random.randint is exclusive (it is not included in the possible values), so the upper bound should be 2 to ensure the generated classification labels include some 1s as well, as illustrated in the sketch below.
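
A minimal illustration of the exclusive upper bound, assuming a binary-classification label array as in the example script:

```python
import numpy as np

# randint samples from [low, high): high is excluded, so high=1 only ever
# yields 0, while high=2 yields both 0s and 1s.
only_zeros = np.random.randint(0, 1, size=8)     # always an array of zeros
binary_labels = np.random.randint(0, 2, size=8)  # mix of 0s and 1s
print(only_zeros, binary_labels)
```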
2024-08-27 09:29:05 -07:00
Sae_Chan_Oh
6101d934a1
🌐 [i18n-KO] Translated conversations.md to Korean (#32468)
* docs: ko: conversations.md

* feat: hand-crafted translate docs

* fix: modify typo after Grammar Check

* Update docs/source/ko/conversations.md

Thank you

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* fix: accept suggestions about anchor and spacing

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* fix: remove the question mark from the 'what happened inside pipeline' anchor

* fix: translate the comments in the code block

---------

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
2024-08-27 09:25:41 -07:00
Marc Sun
7ee4363d19
update torch req for 4-bit optimizer (#33144)
update req
2024-08-27 17:07:10 +02:00
Emin Orhan
d47a9e8ce5
fix redundant checkpointing in example training scripts (#33131)
* fix redundant checkpointing in example scripts

* Update examples/pytorch/image-classification/run_image_classification_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/translation/run_translation_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/token-classification/run_ner_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/text-classification/run_glue_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/summarization/run_summarization_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/language-modeling/run_mlm_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/language-modeling/run_fim_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/language-modeling/run_clm_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/image-pretraining/run_mim_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/multiple-choice/run_swag_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/question-answering/run_qa_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/object-detection/run_object_detection_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-08-27 15:50:00 +02:00
Joao Gante
c6b23fda65
Llama: make slow tests green 🟢 (#33138) 2024-08-27 14:44:42 +01:00
Matt
9956c2bc98
Add a fix for custom code tokenizers in pipelines (#32300)
* Add a fix for the case when tokenizers are passed as a string

* Support image processors and feature extractors as well

* Reverting load_feature_extractor and load_image_processor

* Add test

* Test is torch-only

* Add tests for preprocessors and feature extractors and move test

* Extremely experimental fix

* Revert that change, wrong branch!

* Typo!

* Split tests
2024-08-27 14:39:57 +01:00
Zizhao Chen
834ec7b1cc
fix Idefics2VisionConfig type annotation (#33103)
* fix Idefics2VisionConfig type annotation

* Update modeling_idefics2.py

* Update modeling_idefics2.py

add ignore copy

* Update modeling_idefics2.py

* Update modeling_idefics2.py
2024-08-27 14:43:28 +02:00
pedrobrs
d1f39c484d
Update stateful_callbacks state before saving checkpoint (#32115)
* update ExportableState callbacks state before saving trainer_state on save_checkpoint

* run make fixup and fix format

* manage multiple stateful callbacks of same class
2024-08-27 14:33:35 +02:00
Vaibhav Srivastav
6f0ecf1049
[docs] add quick usage snippet to Whisper. (#31289)
* [docs] add quick usage snippet to Whisper.

* Apply suggestions from review.

* 💉 Fix the device for pipeline.
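
A sketch of what such a quick-usage snippet might look like; the model id, audio file, and device handling are placeholders, not necessarily the exact snippet added to the docs.

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",  # placeholder checkpoint
    device=device,                # explicit device, per the fix above
)
print(asr("sample.flac")["text"])  # placeholder audio file
```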
2024-08-27 14:11:52 +02:00
Boris Feld
892d51caee
Log additional test metrics with the CometCallback (#33124)
* Log additional test metrics with the CometCallback.

Also follow the same metric naming convention as other callbacks

* Merge 2 subsequent if-statements

* Trigger Build

---------

Co-authored-by: Aliaksandr Kuzmik <alexander.kuzmik99@gmail.com>
2024-08-27 13:40:53 +02:00
dependabot[bot]
746e1148cf
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/jax-projects/hybrid_clip (#33137)
Bump torch in /examples/research_projects/jax-projects/hybrid_clip

Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-27 13:33:37 +02:00
Joao Gante
ab0ac3b98f
CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123)
* fix param not being passed in tests; add exceptions

* better source of model name

* Update utils/create_dummy_models.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 11:58:27 +01:00
Yih-Dar
3806faa171
disable scheduled daily CI temporarily (#33136)
disable scheduled daily CI temporarily

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-27 11:52:15 +02:00
Aya
7562366d4b
fix: multilingual model converted to TFLite gets wrong token (#32079)
* fix: multilingual model converted to TFLite gets wrong token

* fix: modify test_force_tokens_logits_processor to check against scores.dtype.min

---------

Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com>
Co-authored-by: Aya <kent831217@gmail.com>
2024-08-27 11:44:09 +02:00
Sai-Suraj-27
3bf6dd8aa1
fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32850)
* Fixed failing CodeGenTokenizationTest::test_truncation.

* [run_slow] Codegen

* [run_slow] codegen
2024-08-27 09:20:59 +02:00
Zach Mueller
9578c2597e
Fix up Python 3.8 type hints for the MPS-friendly helper (#33128)
Fix up Python 3.8 type hints
2024-08-26 12:27:39 -04:00
Pablo Montalvo
26f043bd4d
quickfix documentation (#32566)
* fix documentation

* update config
2024-08-26 17:49:44 +02:00
Sai-Suraj-27
3562772969
fix: Fixed pydantic required version in dockerfiles to make it compatible with DeepSpeed (#33105)
Fixed pydantic required version in dockerfiles.
2024-08-26 17:10:36 +02:00
Ritik Nandwal
a378a54a57
Add changes for uroman package to handle non-Roman characters (#32404)
* Add changes for uroman package to handle non-Roman characters

* Update docs for uroman changes

* Modifying error message to warning, for backward compatibility

* Update instruction for user to install uroman

* Update docs for uroman python version dependency and backward compatibility

* Update warning message for python version compatibility with uroman

* Refine docs
2024-08-26 17:07:01 +02:00
Joao Gante
72d4a3f9c1
mps: add isin_mps_friendly, a wrapper function for torch.isin (#33099) 2024-08-26 15:34:19 +01:00
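
A rough sketch of the idea behind such a wrapper, assuming the fallback is a broadcasted comparison; this is not the exact implementation in transformers.

```python
import torch

def isin_friendly(elements: torch.Tensor, test_elements: torch.Tensor) -> torch.Tensor:
    """Behave like torch.isin, but avoid it on backends lacking support."""
    if elements.device.type == "mps":
        # Fall back to a broadcasted equality check followed by a reduction.
        return (elements.unsqueeze(-1) == test_elements).any(dim=-1)
    return torch.isin(elements, test_elements)
```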
Joao Gante
894d421ee5
Test: add higher atol in test_forward_with_num_logits_to_keep (#33093) 2024-08-26 15:23:30 +01:00
Joao Gante
93e0e1a852
CI: add torchvision to the consistency image (#32941) 2024-08-26 15:17:45 +01:00
Shijie
19e6e80e10
support qwen2-vl (#32318)
* support-qwen2-vl

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* hyphen->underscore

* make style

* add-flash2-tipd

* delete-tokenize=False

* remove-image_processor-in-init-file

* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES

* format-doct

* support-Qwen2VLVisionConfig

* remove-standardize_cache_format

* fix-letter-variables

* remove-torch-in-image-processor

* remove-useless-docstring

* fix-one-letter-variable-name

* change-block-name

* default-quick-gelu-in-vision

* remove-useless-doc

* use-preimplemented-flash-forward

* fix-doc

* fix-image-processing-doc

* fix-apply-rotary-embed

* fix-flash-attn-sliding-window

* refactor

* remove-default_template

* remove-reorder_cache

* simple-get-rope_deltas

* update-prepare_inputs_for_generation

* update-attention-mask

* update-rotary_seq_len

* remove-state

* kv_seq_length

* remove-warning

* _supports_static_cache

* remove-legacy-cache

* refactor

* fix-replace

* mrope-section-doc

* code-quality

* code-quality

* polish-doc

* fix-image-processing-test

* update readme

* Update qwen2_vl.md

* fix-test

* Update qwen2_vl.md

* nit

* processor-kwargs

* hard-code-norm_layer

* code-quality

* discard-pixel-values-in-gen

* fix-inconsistent-error-msg

* unify-image-video

* hidden_act

* add-docstring

* vision-encode-as-PreTrainedModel

* pixel-to-target-dtype

* update doc and low-memory ViT

* format

* format

* channel-format

* fix vit_flashatt

* format

* inherit-Qwen2VLPreTrainedModel

* simplify

* format-test

* remove-one-line-func-in-image-processing

* avoid-one-line-reshape

* simplify-rotary_seq_len

* avoid-single-letter-variable

* no-for-loop-sdpa

* avoid-single-letter-variable

* remove-one-line-reshape

* remove-one-line-reshape

* remove-no-rope-in-vit-logic

* default-mrope

* add-copied-from

* more-docs-for-mrope

* polish-doc

* comment-and-link

* polish-doc

* single-letter-variables

* simplify-image-processing

* video->images

* kv_seq_len-update

* vision-rope-on-the-fly

* vision-eager-attention

* change-processor-order

---------

Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
S M Jishanul Islam
8defc95df3
Updated custom_models.md: changed cross_entropy code (#33118) 2024-08-26 13:15:43 +02:00
Matt
0a7af19f4d
Update Jinja docs with new functions and general cleanup (#33097) 2024-08-23 17:40:06 +01:00
Arun Prakash A
e3a5f35cd5
added docstring to SchedulerType class (#32898)
* added docstring to SchedulerType class

* Remove trailing whitespace in src/transformers/trainer_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fixup

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-23 09:15:25 -07:00
Donggeun Yu
1dbd9d3693
DeviceGuard added to use Deformable Attention more safely on multi-GPU (#32910)
* Update modeling_deformable_detr.py

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update ms_deform_attn_cuda.cu

* Update modeling_deformable_detr.py

* Update modeling_deformable_detr.py

* [empty] this is an empty commit

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 17:12:10 +01:00
Matt
371b9c1486
Enable some Jinja extensions and add datetime capabilities (#32684)
* Add new Jinja features:

- Do extension
- Break/continue in loops
- Call strftime to get current datetime in any format

* Fix strftime template

* Add template strip() just to be safe

* Remove the do extension to make porting easier, and also because it's the least useful

* Rename test

* strftime -> strftime_now

* Split test

* Update test to use strftime_now

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils
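
A standalone sketch of the Jinja features mentioned above (loop break/continue and a `strftime_now` callable), wired up directly with jinja2 rather than through transformers' chat_template_utils; the helper name follows the strftime -> strftime_now rename noted above.

```python
from datetime import datetime
from jinja2 import Environment

# loopcontrols enables {% break %} / {% continue %} inside loops.
env = Environment(extensions=["jinja2.ext.loopcontrols"])
# Expose the current datetime to templates, formatted on demand.
env.globals["strftime_now"] = lambda fmt: datetime.now().strftime(fmt)

template = env.from_string("Today is {{ strftime_now('%d %B %Y') }}.")
print(template.render())
```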
2024-08-23 14:26:12 +01:00
Jason (Siyu) Zhu
adb91179b9
Integrate Liger (LinkedIn GPU Efficient Runtime) Kernel into Trainer (#32860)
* add liger integration

* fix syntax

* fix import issue

* add trainer.md

* Use _apply_liger_kernel()

* Fixed log message

* Update docs/source/en/trainer.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/trainer.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* Update docs/source/en/trainer.md

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* Fixed checkstyle and updated readme

* Added test

* Fixed checkstyle

* fix docstring

* rename use_liger to use_liger_kernel

* Trigger Build

* Added test

* add fix-copies

* Fixed copy inconsistencies
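
A minimal sketch of enabling the integration; the `use_liger_kernel` flag name comes from the commits above, while the other arguments are placeholders (and the liger-kernel package must be installed at training time).

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",               # placeholder
    per_device_train_batch_size=8,  # placeholder
    use_liger_kernel=True,          # patch supported models with Liger's fused kernels
)
```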

---------

Co-authored-by: shimizust <sshimizu@linkedin.com>
Co-authored-by: Steven Shimizu <shimizust@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-08-23 13:20:49 +02:00
Joao Gante
970a16ec7f
Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 (#32659)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 11:12:53 +01:00
Cyril Vallez
22e6f14525
Reducing memory usage: removing useless logits computation in generate() (#31292)
* Add .float() in all generation methods logit outputs

* Switch float-casting of logits to training only for main models

* Add `num_logits_to_keep` in Llama and add it by default in generate

* Apply style

* Add num_logits_to_keep as arg in prepare_inputs_for_generation

* Add support for Mistral

* Revert models except llama and mistral

* Fix default None value in _supports_num_logits_to_keep()

* Fix dimension of dummy input

* Add exception for prophetnet in _supports_num_logits_to_keep()

* Update _supports_num_logits_to_keep() to use inspect.signature()

* Add deprecation cycle + remove modification with pretraining_tp

* Apply style

* Add most used models

* Apply style

* Make `num_logits_to_keep` an int in all cases to remove if-else clause

* Add compile check for the warning

* Fix torch versions

* style

* Add gemma2

* Update warning version

* Add comment about .float operations in generation utils

* Add tests in GenerationTesterMixin and ModelTesterMixin

* Fix batch size for assisted decoding in tests

* fix small issues in test

* refactor test

* fix slicing removing dim issue

* Add nemotron support (should fix check-copy issue in CIs)

* Trigger new CIs

* Trigger new CIs

* Bump version

* Bump version in TODO

* Trigger CIs

* remove blank space

* Trigger CIs
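
A minimal sketch of the memory saving behind `num_logits_to_keep`: during decoding only the last position's logits are needed, so only that slice is projected through the LM head instead of materializing a full (batch, seq_len, vocab) tensor. Shapes are illustrative, not the transformers implementation.

```python
import torch
from torch import nn

batch, seq_len, hidden, vocab = 2, 1024, 512, 32000
hidden_states = torch.randn(batch, seq_len, hidden)
lm_head = nn.Linear(hidden, vocab, bias=False)

num_logits_to_keep = 1
logits = lm_head(hidden_states[:, -num_logits_to_keep:, :])  # (batch, 1, vocab)
print(logits.shape)
```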
2024-08-23 11:08:34 +01:00
Stefano Fiorucci
d806fa3e92
docs: fix outdated link to TF32 explanation (#32947)
fix outdated link
2024-08-22 13:28:00 -07:00
Joao Gante
a26de15139
Generate: Deprecate returning legacy cache by default; Handle use_cache=False (#32863) 2024-08-22 20:01:52 +01:00
Jinuk
09e6579d2d
🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md` to Korean (#32334)
* docs: ko: tasks/knowledge_distillation_for_image_classification.md

* feat: nmt draft

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

---------

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-08-22 10:42:39 -07:00
Franz Louis Cesista
273c0afc8f
Fix regression on Processor.save_pretrained caused by #31691 (#32921)
fix save_pretrained
2024-08-22 18:42:44 +02:00
Andrés Marafioti
18199b34e5
[run_slow] idefics2 (#32840) 2024-08-22 18:08:03 +02:00
Joao Gante
975b988bfe
Gemma2: eager attention by default (#32865) 2024-08-22 15:59:30 +01:00
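If the new Gemma2 default above is not desired, the attention backend can still be selected explicitly; a hedged sketch with a placeholder model id:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",          # placeholder checkpoint
    attn_implementation="eager",  # or "sdpa" / "flash_attention_2" where supported
)
```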
Shaopeng Fu
f1d822ba33
fix: (issue #32689) AttributeError raised when using Trainer with eval_on_start=True in Jupyter Notebook. (#32849)
fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.
2024-08-22 16:42:00 +02:00
Isotr0py
ee8c01f839
Add chat_template for tokenizer extracted from GGUF model (#32908)
* add chat_template to gguf tokenizer

* add template through tokenizer config
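
A hedged sketch of what this enables; the repository id and GGUF filename are placeholders.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "some-org/some-model-GGUF",          # placeholder repo id
    gguf_file="some-model.Q4_K_M.gguf",  # placeholder GGUF filename
)
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```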
2024-08-22 16:41:25 +02:00
regisss
99d67f1a09
Improve greedy search memory usage (#32895)
Do not call torch.repeat_interleave if expand_size is 1
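
A minimal sketch of the optimization, assuming an expansion helper like the one used to expand inputs during generation:

```python
import torch

def expand_inputs(input_ids: torch.Tensor, expand_size: int = 1) -> torch.Tensor:
    # repeat_interleave with expand_size=1 still allocates a copy,
    # so skip the call entirely in that case.
    if expand_size == 1:
        return input_ids
    return input_ids.repeat_interleave(expand_size, dim=0)

print(expand_inputs(torch.tensor([[1, 2, 3]]), expand_size=1).shape)  # torch.Size([1, 3])
```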
2024-08-22 15:37:44 +01:00
Yih-Dar
bf97d4aa6d
Fix benchmark script (#32635)
* fix

* >= 0.3.0

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-22 16:07:47 +02:00
Shubham Ugare
9282413611
Add SynCode to llm_tutorial (#32884) 2024-08-22 15:30:22 +02:00
Younes Belkada
eeea71209a
FIX / Hub: Also catch exceptions.ConnectionError (#31469)
* Update hub.py

* Update errors

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-08-22 15:29:21 +02:00
Joao Gante
8b94d28f97
CI: separate step to download nltk files (#32935)
* separate step to download nltk files

* duplicated

* rm comma
2024-08-22 14:17:24 +01:00
Marc Sun
c42d264549
FEAT / Trainer: Add adamw 4bit optimizer (#31865)
* add 4bit optimizer

* style

* fix msg

* style

* add qgalore

* Revert "add qgalore"

This reverts commit 25278e805f.

* style

* version check
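
A hedged sketch of selecting the new optimizer; the exact `optim` string is an assumption based on this PR, and it relies on torchao's low-bit optimizers plus the torch version check mentioned above.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",          # placeholder
    optim="adamw_torch_4bit",  # assumed name of the 4-bit AdamW option
)
```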
2024-08-22 15:07:09 +02:00
Gal Cohen (galco)
6baa6f276a
fix: no need to cast the dtype of A in Jamba (#32924)
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 15:03:22 +02:00
Sai-Suraj-27
af638c4afe
fix: Added missing huggingface_hub installation to workflows (#32891)
Added missing huggingface_hub installation to workflows.
2024-08-22 12:51:12 +01:00
Joao Gante
f6e2586a36
Jamba: update integration tests (#32250)
* try test updates

* a few more changes

* a few more changes

* a few more changes

* [run slow] jamba

* skip logits checks on older gpus

* [run slow] jamba

* oops

* [run slow] jamba

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-22 11:46:10 +01:00
Arthur
3bb7b05229
Update docker image building (#32918)
commit
2024-08-21 21:23:10 +02:00