Commit Graph

16692 Commits

Author SHA1 Message Date
Juan Pizarro
7591ca5bc5
🚨 Add Blip2ForImageTextRetrieval (#29261)
* add Blip2ForImageTextRetrieval

* use one line and remove unnecessary space in tests

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use the value from the config rather than a hardcoded one

* change order of params in Blip2QFormerModel.forward

* update docstring

* fix style

* update test_inference_opt

* move embeddings out of Blip2QFormerModel

* remove from_vision_qformer_configs

* remove autocast float16 in Blip2QFormerModel

* rename fields to vision_projection, text_projection, use_image_text_matching_head

* use CLIPOutput for Blip2ImageTextMatchingModelOutput

* remove past_key_values_length from Blip2TextEmbeddings

* fix small typo in the CLIPOutput docstring

* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping

* update docstring and add require_torch_fp16

* rollback test_inference_opt

* use use_image_text_matching_head=True in convert

* skip test_model_get_set_embeddings

* fix create_rename_keys error on new itm fields

* revert to doing the scale after the dot product between "query" and "key"

* fix ValueError on convert script for blip2-opt-2.7b

* update org of paths to Salesforce

* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests

* [run_slow] blip_2

* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED

* fix docstring of Blip2ImageTextMatchingModelOutput

* [run_slow] blip_2

* fix multi-gpu tests

* [run_slow] blip_2

* [run_slow] blip_2
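
A minimal usage sketch of the retrieval model added in this commit. Only `Blip2ForImageTextRetrieval` and the `use_image_text_matching_head` flag come from the commit itself; the checkpoint id and image path are assumptions based on the Salesforce BLIP-2 ITM checkpoints.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Blip2ForImageTextRetrieval

# Checkpoint id is an assumption; the class name comes from this PR.
processor = AutoProcessor.from_pretrained("Salesforce/blip2-itm-vit-g")
model = Blip2ForImageTextRetrieval.from_pretrained("Salesforce/blip2-itm-vit-g")

image = Image.open("photo.jpg")  # placeholder image path
inputs = processor(images=image, text="a photo of a cat", return_tensors="pt")

with torch.no_grad():
    # use_image_text_matching_head is the field renamed in this PR
    outputs = model(**inputs, use_image_text_matching_head=True)
```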

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 18:50:27 +01:00
Ali Salamatian
27903de7ec
Very small change to one of the function parameters (#32548)
Very small change to one of the parameters

The second argument of np.random.randint is exclusive (it is not included in the possible values), so the upper bound should be 2 to ensure the generated classification labels include some 1s as well, as illustrated in the sketch below.
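
A minimal illustration of the exclusive upper bound, assuming a binary-classification label array as in the example script:

```python
import numpy as np

# randint samples from [low, high): high is excluded, so high=1 only ever
# yields 0, while high=2 yields both 0s and 1s.
only_zeros = np.random.randint(0, 1, size=8)     # always an array of zeros
binary_labels = np.random.randint(0, 2, size=8)  # mix of 0s and 1s
print(only_zeros, binary_labels)
```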
2024-08-27 09:29:05 -07:00
Sae_Chan_Oh
6101d934a1
🌐 [i18n-KO] Translated conversations.md to Korean (#32468)
* docs: ko: conversations.md

* feat: hand-crafted translate docs

* fix: modify typo after Grammar Check

* Update docs/source/ko/conversations.md

Thank you

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>

* fix: accept suggestions about anchor and spacing

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/conversations.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* fix: remove the question mark from the 'what happened inside pipeline' anchor

* fix: translate the comments in the code block

---------

Co-authored-by: SeungAhSon <gongsoonyee@gmail.com>
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
2024-08-27 09:25:41 -07:00
Marc Sun
7ee4363d19
update torch req for 4-bit optimizer (#33144)
update req
2024-08-27 17:07:10 +02:00
Emin Orhan
d47a9e8ce5
fix redundant checkpointing in example training scripts (#33131)
* fix redundant checkpointing in example scripts

* Update examples/pytorch/image-classification/run_image_classification_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/translation/run_translation_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/token-classification/run_ner_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/text-classification/run_glue_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/summarization/run_summarization_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/language-modeling/run_mlm_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/language-modeling/run_fim_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/language-modeling/run_clm_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/image-pretraining/run_mim_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/multiple-choice/run_swag_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/question-answering/run_qa_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/object-detection/run_object_detection_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-08-27 15:50:00 +02:00
Joao Gante
c6b23fda65
Llama: make slow tests green 🟢 (#33138) 2024-08-27 14:44:42 +01:00
Matt
9956c2bc98
Add a fix for custom code tokenizers in pipelines (#32300)
* Add a fix for the case when tokenizers are passed as a string

* Support image processors and feature extractors as well

* Reverting load_feature_extractor and load_image_processor

* Add test

* Test is torch-only

* Add tests for preprocessors and feature extractors and move test

* Extremely experimental fix

* Revert that change, wrong branch!

* Typo!

* Split tests
2024-08-27 14:39:57 +01:00
Zizhao Chen
834ec7b1cc
fix Idefics2VisionConfig type annotation (#33103)
* fix Idefics2VisionConfig type annotation

* Update modeling_idefics2.py

* Update modeling_idefics2.py

add ignore copy

* Update modeling_idefics2.py

* Update modeling_idefics2.py
2024-08-27 14:43:28 +02:00
pedrobrs
d1f39c484d
Update stateful_callbacks state before saving checkpoint (#32115)
* update ExportableState callbacks state before saving trainer_state on save_checkpoint

* run make fixup and fix format

* manage multiple stateful callbacks of same class
2024-08-27 14:33:35 +02:00
Vaibhav Srivastav
6f0ecf1049
[docs] add quick usage snippet to Whisper. (#31289)
* [docs] add quick usage snippet to Whisper.

* Apply suggestions from review.

* 💉 Fix the device for pipeline.
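
A sketch of what such a quick-usage snippet might look like; the model id, audio file, and device handling are placeholders, not necessarily the exact snippet added to the docs.

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",  # placeholder checkpoint
    device=device,                # explicit device, per the fix above
)
print(asr("sample.flac")["text"])  # placeholder audio file
```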
2024-08-27 14:11:52 +02:00
Boris Feld
892d51caee
Log additional test metrics with the CometCallback (#33124)
* Log additional test metrics with the CometCallback.

Also follow the same metric naming convention as other callbacks

* Merge 2 subsequent if-statements

* Trigger Build

---------

Co-authored-by: Aliaksandr Kuzmik <alexander.kuzmik99@gmail.com>
2024-08-27 13:40:53 +02:00
dependabot[bot]
746e1148cf
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/jax-projects/hybrid_clip (#33137)
Bump torch in /examples/research_projects/jax-projects/hybrid_clip

Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-27 13:33:37 +02:00
Joao Gante
ab0ac3b98f
CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123)
* fix param not being passed in tests; add exceptions

* better source of model name

* Update utils/create_dummy_models.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 11:58:27 +01:00
Yih-Dar
3806faa171
disable scheduled daily CI temporarily (#33136)
disable scheduled daily CI temporarily

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-27 11:52:15 +02:00
Aya
7562366d4b
fix: multilingual model converted to TFLite gets wrong token (#32079)
* fix: multilingual model converted to TFLite gets wrong token

* fix: modify test_force_tokens_logits_processor to check against scores.dtype.min

---------

Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com>
Co-authored-by: Aya <kent831217@gmail.com>
2024-08-27 11:44:09 +02:00
Sai-Suraj-27
3bf6dd8aa1
fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32850)
* Fixed failing CodeGenTokenizationTest::test_truncation.

* [run_slow] Codegen

* [run_slow] codegen
2024-08-27 09:20:59 +02:00
Zach Mueller
9578c2597e
Fix up Python 3.8 type hints for the MPS-friendly helper (#33128)
Fix up Python 3.8 type hints
2024-08-26 12:27:39 -04:00
Pablo Montalvo
26f043bd4d
quickfix documentation (#32566)
* fix documentation

* update config
2024-08-26 17:49:44 +02:00
Sai-Suraj-27
3562772969
fix: Fixed pydantic required version in dockerfiles to make it compatible with DeepSpeed (#33105)
Fixed pydantic required version in dockerfiles.
2024-08-26 17:10:36 +02:00
Ritik Nandwal
a378a54a57
Add changes for uroman package to handle non-Roman characters (#32404)
* Add changes for uroman package to handle non-Roman characters

* Update docs for uroman changes

* Modifying error message to warning, for backward compatibility

* Update instruction for user to install uroman

* Update docs for uroman python version dependency and backward compatibility

* Update warning message for python version compatibility with uroman

* Refine docs
2024-08-26 17:07:01 +02:00
Joao Gante
72d4a3f9c1
mps: add isin_mps_friendly, a wrapper function for torch.isin (#33099) 2024-08-26 15:34:19 +01:00
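
A rough sketch of the idea behind such a wrapper, assuming the fallback is a broadcasted comparison; this is not the exact implementation in transformers.

```python
import torch

def isin_friendly(elements: torch.Tensor, test_elements: torch.Tensor) -> torch.Tensor:
    """Behave like torch.isin, but avoid it on backends lacking support."""
    if elements.device.type == "mps":
        # Fall back to a broadcasted equality check followed by a reduction.
        return (elements.unsqueeze(-1) == test_elements).any(dim=-1)
    return torch.isin(elements, test_elements)
```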
Joao Gante
894d421ee5
Test: add higher atol in test_forward_with_num_logits_to_keep (#33093) 2024-08-26 15:23:30 +01:00
Joao Gante
93e0e1a852
CI: add torchvision to the consistency image (#32941) 2024-08-26 15:17:45 +01:00
Shijie
19e6e80e10
support qwen2-vl (#32318)
* support-qwen2-vl

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* hyphen->underscore

* make style

* add-flash2-tipd

* delete-tokenize=False

* remove-image_processor-in-init-file

* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES

* format-doct

* support-Qwen2VLVisionConfig

* remove-standardize_cache_format

* fix-letter-variables

* remove-torch-in-image-processor

* remove-useless-docstring

* fix-one-letter-variable-name

* change-block-name

* default-quick-gelu-in-vision

* remove-useless-doc

* use-preimplemented-flash-forward

* fix-doc

* fix-image-processing-doc

* fix-apply-rotary-embed

* fix-flash-attn-sliding-window

* refactor

* remove-default_template

* remove-reorder_cache

* simple-get-rope_deltas

* update-prepare_inputs_for_generation

* update-attention-mask

* update-rotary_seq_len

* remove-state

* kv_seq_length

* remove-warning

* _supports_static_cache

* remove-legacy-cache

* refactor

* fix-replace

* mrope-section-doc

* code-quality

* code-quality

* polish-doc

* fix-image-processing-test

* update readme

* Update qwen2_vl.md

* fix-test

* Update qwen2_vl.md

* nit

* processor-kwargs

* hard-code-norm_layer

* code-quality

* discard-pixel-values-in-gen

* fix-inconsistent-error-msg

* unify-image-video

* hidden_act

* add-docstring

* vision-encode-as-PreTrainedModel

* pixel-to-target-dtype

* update doc and low-memory ViT

* format

* format

* channel-format

* fix vit_flashatt

* format

* inherit-Qwen2VLPreTrainedModel

* simplify

* format-test

* remove-one-line-func-in-image-processing

* avoid-one-line-reshape

* simplify-rotary_seq_len

* avoid-single-letter-variable

* no-for-loop-sdpa

* avoid-single-letter-variable

* remove-one-line-reshape

* remove-one-line-reshape

* remove-no-rope-in-vit-logic

* default-mrope

* add-copied-from

* more-docs-for-mrope

* polish-doc

* comment-and-link

* polish-doc

* single-letter-variables

* simplify-image-processing

* video->images

* kv_seq_len-update

* vision-rope-on-the-fly

* vision-eager-attention

* change-processor-order

---------

Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
S M Jishanul Islam
8defc95df3
Updated custom_models.md: changed cross_entropy code (#33118) 2024-08-26 13:15:43 +02:00
Matt
0a7af19f4d
Update Jinja docs with new functions and general cleanup (#33097) 2024-08-23 17:40:06 +01:00
Arun Prakash A
e3a5f35cd5
added docstring to SchedulerType class (#32898)
* added docstring to SchedulerType class

* Remove trailing whitespace in src/transformers/trainer_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fixup

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-23 09:15:25 -07:00
Donggeun Yu
1dbd9d3693
DeviceGuard added to use Deformable Attention more safely on multi-GPU (#32910)
* Update modeling_deformable_detr.py

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update ms_deform_attn_cuda.cu

* Update modeling_deformable_detr.py

* Update modeling_deformable_detr.py

* [empty] this is an empty commit

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 17:12:10 +01:00
Matt
371b9c1486
Enable some Jinja extensions and add datetime capabilities (#32684)
* Add new Jinja features:

- Do extension
- Break/continue in loops
- Call strftime to get current datetime in any format

* Fix strftime template

* Add template strip() just to be safe

* Remove the do extension to make porting easier, and also because it's the least useful

* Rename test

* strftime -> strftime_now

* Split test

* Update test to use strftime_now

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils

* Refactor everything out into chat_template_utils
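
A standalone sketch of the Jinja features mentioned above (loop break/continue and a `strftime_now` callable), wired up directly with jinja2 rather than through transformers' chat_template_utils; the helper name follows the strftime -> strftime_now rename noted above.

```python
from datetime import datetime
from jinja2 import Environment

# loopcontrols enables {% break %} / {% continue %} inside loops.
env = Environment(extensions=["jinja2.ext.loopcontrols"])
# Expose the current datetime to templates, formatted on demand.
env.globals["strftime_now"] = lambda fmt: datetime.now().strftime(fmt)

template = env.from_string("Today is {{ strftime_now('%d %B %Y') }}.")
print(template.render())
```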
2024-08-23 14:26:12 +01:00
Jason (Siyu) Zhu
adb91179b9
Integrate Liger (LinkedIn GPU Efficient Runtime) Kernel into Trainer (#32860)
* add liger integration

* fix syntax

* fix import issue

* add trainer.md

* Use _apply_liger_kernel()

* Fixed log message

* Update docs/source/en/trainer.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/trainer.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* Update docs/source/en/trainer.md

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* Fixed checkstyle and updated readme

* Added test

* Fixed checkstyle

* fix docstring

* rename use_liger to use_liger_kernel

* Trigger Build

* Added test

* add fix-copies

* Fixed copy inconsistencies
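
A minimal sketch of enabling the integration; the `use_liger_kernel` flag name comes from the commits above, while the other arguments are placeholders (and the liger-kernel package must be installed at training time).

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",               # placeholder
    per_device_train_batch_size=8,  # placeholder
    use_liger_kernel=True,          # patch supported models with Liger's fused kernels
)
```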

---------

Co-authored-by: shimizust <sshimizu@linkedin.com>
Co-authored-by: Steven Shimizu <shimizust@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-08-23 13:20:49 +02:00
Joao Gante
970a16ec7f
Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 (#32659)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 11:12:53 +01:00
Cyril Vallez
22e6f14525
Reducing memory usage: removing useless logits computation in generate() (#31292)
* Add .float() in all generation methods logit outputs

* Switch float-casting of logits to training only for main models

* Add `num_logits_to_keep` in Llama and add it by default in generate

* Apply style

* Add num_logits_to_keep as arg in prepare_inputs_for_generation

* Add support for Mistral

* Revert models except llama and mistral

* Fix default None value in _supports_num_logits_to_keep()

* Fix dimension of dummy input

* Add exception for prophetnet in _supports_num_logits_to_keep()

* Update _supports_num_logits_to_keep() to use inspect.signature()

* Add deprecation cycle + remove modification with pretraining_tp

* Apply style

* Add most used models

* Apply style

* Make `num_logits_to_keep` an int in all cases to remove if-else clause

* Add compile check for the warning

* Fix torch versions

* style

* Add gemma2

* Update warning version

* Add comment about .float operations in generation utils

* Add tests in GenerationTesterMixin and ModelTesterMixin

* Fix batch size for assisted decoding in tests

* fix small issues in test

* refactor test

* fix slicing removing dim issue

* Add nemotron support (should fix check-copy issue in CIs)

* Trigger new CIs

* Trigger new CIs

* Bump version

* Bump version in TODO

* Trigger CIs

* remove blank space

* Trigger CIs
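
A minimal sketch of the memory saving behind `num_logits_to_keep`: during decoding only the last position's logits are needed, so only that slice is projected through the LM head instead of materializing a full (batch, seq_len, vocab) tensor. Shapes are illustrative, not the transformers implementation.

```python
import torch
from torch import nn

batch, seq_len, hidden, vocab = 2, 1024, 512, 32000
hidden_states = torch.randn(batch, seq_len, hidden)
lm_head = nn.Linear(hidden, vocab, bias=False)

num_logits_to_keep = 1
logits = lm_head(hidden_states[:, -num_logits_to_keep:, :])  # (batch, 1, vocab)
print(logits.shape)
```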
2024-08-23 11:08:34 +01:00
Stefano Fiorucci
d806fa3e92
docs: fix outdated link to TF32 explanation (#32947)
fix outdated link
2024-08-22 13:28:00 -07:00
Joao Gante
a26de15139
Generate: Deprecate returning legacy cache by default; Handle use_cache=False (#32863) 2024-08-22 20:01:52 +01:00
Jinuk
09e6579d2d
🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md` to Korean (#32334)
* docs: ko: tasks/knowledge_distillation_for_image_classification.md

* feat: nmt draft

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

---------

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-08-22 10:42:39 -07:00
Franz Louis Cesista
273c0afc8f
Fix regression on Processor.save_pretrained caused by #31691 (#32921)
fix save_pretrained
2024-08-22 18:42:44 +02:00
Andrés Marafioti
18199b34e5
[run_slow] idefics2 (#32840) 2024-08-22 18:08:03 +02:00
Joao Gante
975b988bfe
Gemma2: eager attention by default (#32865) 2024-08-22 15:59:30 +01:00
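If the new Gemma2 default above is not desired, the attention backend can still be selected explicitly; a hedged sketch with a placeholder model id:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",          # placeholder checkpoint
    attn_implementation="eager",  # or "sdpa" / "flash_attention_2" where supported
)
```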
Shaopeng Fu
f1d822ba33
fix: (issue #32689) AttributeError raised when using Trainer with eval_on_start=True in Jupyter Notebook. (#32849)
fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.
2024-08-22 16:42:00 +02:00
Isotr0py
ee8c01f839
Add chat_template for tokenizer extracted from GGUF model (#32908)
* add chat_template to gguf tokenizer

* add template through tokenizer config
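
A hedged sketch of what this enables; the repository id and GGUF filename are placeholders.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "some-org/some-model-GGUF",          # placeholder repo id
    gguf_file="some-model.Q4_K_M.gguf",  # placeholder GGUF filename
)
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```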
2024-08-22 16:41:25 +02:00
regisss
99d67f1a09
Improve greedy search memory usage (#32895)
Do not call torch.repeat_interleave if expand_size is 1
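
A minimal sketch of the optimization, assuming an expansion helper like the one used to expand inputs during generation:

```python
import torch

def expand_inputs(input_ids: torch.Tensor, expand_size: int = 1) -> torch.Tensor:
    # repeat_interleave with expand_size=1 still allocates a copy,
    # so skip the call entirely in that case.
    if expand_size == 1:
        return input_ids
    return input_ids.repeat_interleave(expand_size, dim=0)

print(expand_inputs(torch.tensor([[1, 2, 3]]), expand_size=1).shape)  # torch.Size([1, 3])
```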
2024-08-22 15:37:44 +01:00
Yih-Dar
bf97d4aa6d
Fix benchmark script (#32635)
* fix

* >= 0.3.0

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-22 16:07:47 +02:00
Shubham Ugare
9282413611
Add SynCode to llm_tutorial (#32884) 2024-08-22 15:30:22 +02:00
Younes Belkada
eeea71209a
FIX / Hub: Also catch exceptions.ConnectionError (#31469)
* Update hub.py

* Update errors

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-08-22 15:29:21 +02:00
Joao Gante
8b94d28f97
CI: separate step to download nltk files (#32935)
* separate step to download nltk files

* duplicated

* rm comma
2024-08-22 14:17:24 +01:00
Marc Sun
c42d264549
FEAT / Trainer: Add adamw 4bit optimizer (#31865)
* add 4bit optimizer

* style

* fix msg

* style

* add qgalore

* Revert "add qgalore"

This reverts commit 25278e805f.

* style

* version check
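
A hedged sketch of selecting the new optimizer; the exact `optim` string is an assumption based on this PR, and it relies on torchao's low-bit optimizers plus the torch version check mentioned above.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",          # placeholder
    optim="adamw_torch_4bit",  # assumed name of the 4-bit AdamW option
)
```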
2024-08-22 15:07:09 +02:00
Gal Cohen (galco)
6baa6f276a
fix: no need to cast the dtype of A in Jamba (#32924)
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 15:03:22 +02:00
Sai-Suraj-27
af638c4afe
fix: Added missing huggingface_hub installation to workflows (#32891)
Added missing huggingface_hub installation to workflows.
2024-08-22 12:51:12 +01:00
Joao Gante
f6e2586a36
Jamba: update integration tests (#32250)
* try test updates

* a few more changes

* a few more changes

* a few more changes

* [run slow] jamba

* skip logits checks on older gpus

* [run slow] jamba

* oops

* [run slow] jamba

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-22 11:46:10 +01:00
Arthur
3bb7b05229
Update docker image building (#32918)
commit
2024-08-21 21:23:10 +02:00