transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-08-03 03:31:05 +06:00

Author	SHA1	Message	Date
Shiyu	342e800086	support 3D attention mask in bert (#32105 ) * support 3D/4D attention mask in bert * test cases * update doc * fix doc	2024-09-06 14:20:48 +02:00
GeLee	2b18354106	add self.head_dim for VisionAttention in Qwen2-VL (#33211 ) * add self.head_dim for VisionAttention in Qwen2-VL * add self.head_dim for VisionAttention in Qwen2-VL * fix ci * black the test_modeling_qwen2_vl.py * use ruff to format test_modeling_qwen2_vl.py * [run-slow] qwen2_vl * use tying for python3.8 * fix the import format * use ruff to fix the ci error I001 * [run-slow] qwen2_vl * remove unused import * commit for rebase * use ruff fix ci * [run-slow] qwen2_vl --------- Co-authored-by: root <liji>	2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi	3314fe1760	Add validation for maximum sequence length in modeling_whisper.py (#33196 ) * Add validation for maximum sequence length in modeling_whisper.py Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message. This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness. * Change exception message in src/transformers/models/whisper/modeling_whisper.py The exception message is for whisper's label's sequence max length. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py It's for whisper's config.max_target_positions. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change method's documentation in src/transformers/models/whisper/modeling_whisper.py * Add test for maximum label's sequence length in test_modeling_whisper.py * Add self to modeling_whisper.py * Update test_modeling_whisper.py with respect to automatic validations * Update modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Separate test_labels_sequence_max_length tests in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Remove assert from test_modeling_whisper.py * Add max_target_positions to WhisperModelTester in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py * Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py * Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py * Add new tests in test_modeling_whisper.py * Update test_modeling_whisper.py --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-09-06 14:09:49 +02:00
Ita Zaporozhets	363301f221	support loading model without config.json file (#32356 ) * support loading model without config.json file * fix condition * update tests * add test * ruff * ruff * ruff	2024-09-06 13:49:47 +02:00
Xuehai Pan	e1c2b69c34	Load dynamic module (remote code) only once if code isn't change (#33162 ) * Load remote code only once * Use hash as load indicator * Add a new option `force_reload` for old behavior (i.e. always reload) * Add test for dynamic module is cached * Add more type annotations to improve code readability * Address comments from code review	2024-09-06 12:49:35 +01:00
Sanchit Gandhi	51d15eb1c1	[whisper] alternative fix for long-form timestamps (#32131 ) * [whisper] alternative fix for long-form timestamps * update test	2024-09-06 12:57:08 +02:00
Raushan Turganbay	1759bb9126	Fix: StaticCache & `inputs_embeds` (#32932 ) squash commit	2024-09-06 12:56:59 +05:00
Shijie	21fac7abba	simple align qwen2vl kv_seq_len calculation with qwen2 (#33161 ) * qwen2vl_align_kv_seqlen_to_qwen2 * flash att test * [run-slow] qwen2_vl * [run-slow] qwen2_vl fix OOM * [run-slow] qwen2_vl * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * code quality --------- Co-authored-by: baishuai.bs <1051314669@qq.com> Co-authored-by: ShuaiBai623 <baishuai623@icloud.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2024-09-05 21:19:30 +05:00
Vladislav Bronzov	5d11de4a2f	Add Qwen2Moe GGUF loading support (#33264 ) * update gguf doc, config and tensor mapping * add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests * apply code style fixes * reformat files * assign GGUFQwen2Converter to qwen2_moe	2024-09-05 17:42:03 +02:00
Joshua Lochner	c6d2848a23	🚨 Fix `torch.jit.trace` for `interpolate_pos_encoding` in all vision models (#33226 ) * Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models * Apply formatting * Add missing `self.config = config` * Fix copies * Fix hiera interpolation unit test * Formatting * Update `_import_structure` * make style * Fix docstring * Use `# Copied from` instead of utils * DeiT variable renaming (`class_and_dist_pos_embed`) * Fix Hiera `interpolate_pos_encoding`	2024-09-05 16:17:34 +02:00
Younes Belkada	47b096412d	Fix: Fix `FalconMamba` training issues due to incompatible kernels (#33195 ) * fix FM training kernels * fix copies * fix copies * propagate to slow path * make it BC * add comment * fix test	2024-09-05 11:55:08 +02:00
Raushan Turganbay	43df47d8e7	Llava Onevision: add model (#32673 ) * working version * fix copies * update * tests * update docs * codestyle * add more tests * add returns for docs * clean up * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates * codestyle * style * shouldn't be reversed * [run-slow] llava_onevision * [run-slow] llava_onevision * add pooling in videos * [run-slow] llava_onevision * num-logits-to-keep * [run-slow] llava_onevision * [run-slow] llava_onevision * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * video matched orig impl * fix tests * chat template was modified * Update docs/source/en/model_doc/llava_onevision.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add morer info in the doc page --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-05 14:43:20 +05:00
Yoni Gozlan	9230d78e76	Add validate images and text inputs order util for processors and test_processing_utils (#33285 ) * Add validate images and test processing utils * Remove encoded text from possible inputs in tests * Removed encoded inputs as valid in processing_utils * change text input check to be recursive * change text check to all element of lists and not just the first one in recursive checks	2024-09-04 13:50:31 -04:00
Aymeric Roucher	2cb543db77	Multi agents with manager (#32687 ) * Add Multi agents with a hierarchical system	2024-09-04 17:30:54 +02:00
amyeroberts	d2dcff96f8	[InstructBLIP] qformer_tokenizer is required input (#33222 ) * [InstructBLIP] qformer_tokenizer is required input * Bit safer * Add to instructblipvideo processor * Fix up * Use video inputs * Update tests/models/instructblipvideo/test_processor_instructblipvideo.py	2024-09-04 16:18:06 +01:00
Alex Sherstinsky	122ded0a11	Bugfix/alexsherstinsky/fix none check for attention factor in rope scaling 2024 08 28 0 (#33188 ) * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities.	2024-09-04 17:01:12 +02:00
laurentd-lunit	d703477265	[fix] LlavaNextProcessor '_get_unpadded_features' method (#33263 ) * [fix] LlavaNextProcessor '_get_unpadded_features' method * [tests] add test_image_token_filling * [chore] style + comment * [minor] improve readability * [chore] run make fix-copies	2024-09-04 17:41:51 +05:00
Joao Gante	d750b509fc	Config: unified logic to retrieve text config (#33219 )	2024-09-04 12:03:30 +01:00
Raushan Turganbay	ebbe8d8014	Cache docs: update (#32929 ) * some changes * more updates * fix cache copy * nits * nits * add tests	2024-09-04 15:05:31 +05:00
Niklas Muennighoff	ecd61c6286	Add OLMoE (#32406 ) * Add OLMoE * Add OLMoE * Updates * Make norm optional; add keys * Add output * Add * Fix dtype * Fix eos config * Update * Add OLMoE * Fix OLMoE path * Format * Format * Rmv copy statement * Rmv copy statement * Format * Add copies * Cp rotary * Fix aming * Fix naming * Update RoPE integration; num_logits_to_keep; Add copy statements * Add eps to config * Format * Add aux loss * Adapt router_aux_loss_coef * Update md * Adapt * adapt tests	2024-09-03 18:43:12 +02:00
Zach Mueller	6b7d64ac1c	Only disallow DeepSpeed Zero-3 for auto bs finder (#31731 ) * Only disallow DeepSpeed * Clean * DeepSpeed! * Add a test for deepspeed	2024-09-03 09:16:28 -04:00
Isotr0py	edeca4387c	🚨 Support dequantization for most GGML types (#32625 ) * use gguf internal dequantize * add Q5_0 test * add iq1 test * add remained test * remove duplicated test * update docs * add gguf version limit * make style * update gguf import catch * revert vocab_size patch * make style * use GGUF_MIN_VERSION everywhere	2024-09-03 12:58:14 +02:00
Marc Sun	9ea1eacd11	remove to restriction for 4-bit model (#33122 ) * remove to restiction for 4-bit model * Update src/transformers/modeling_utils.py Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * bitsandbytes: prevent dtype casting while allowing device movement with .to or .cuda * quality fix * Improve warning message for .to() and .cuda() on bnb quantized models --------- Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>	2024-09-02 16:28:50 +02:00
Joao Gante	97c0f45b9c	Generate: fix assistant in different device (#33257 )	2024-09-02 14:37:49 +01:00
Matt	52a0213755	Add assistant prefill for chat templates and TextGenerationPipeline (#33198 ) * Add assistant prefill to chat templates * Add assistant prefill to pipeline * Add assistant prefill to pipeline * Tweak another test that ended in assistant message * Update tests that ended in assistant messages * Update tests that ended in assistant messages * Replace assistant_prefill with continue_final_message * Allow passing continue_final_message to pipeline * Small fixup * Add continue_final_message as a pipeline kwarg * Update docstrings * Move repos to hf-internal-testing! * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Add explanatory comment * make fixup * Update chat templating docs to explain continue_last_message --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-09-02 13:23:47 +01:00
Jeongseok Kang	963ed98bed	docs: Replace package abbreviations with full name(`bitsandbytes`) in docstrings (#33230 ) * docs: Provide fullname for `bitsandbytes` package * docs: Provide fullname for `bitsandbytes` package (2)	2024-09-02 13:40:34 +02:00
Aymeric Roucher	1ca9ff5c91	Add duckduckgo search tool (#32882 ) * Add duckduckgo search tool	2024-09-02 09:56:20 +02:00
Joao Gante	eb5b968c5d	Generate: throw warning when `return_dict_in_generate` is False but should be True (#33146 )	2024-08-31 10:47:08 +01:00
Arthur	b017a9eb11	Refactor CI: more explicit (#30674 ) * don't run custom when not needed? * update test fetcher filtering * fixup and updates * update * update * reduce burden * nit * nit * mising comma * this? * this? * more parallelism * more * nit for real parallelism on tf and torch examples * update * update * update * update * update * update * update * update * update * update * update * update * update to make it more custom * update to make it more custom * update to make it more custom * update to make it more custom * update * update * update * update * update * update * use correct path * fix path to test files and examples * filter-tests * filter? * filter? * filter? * nits * fix naming of the artifacts to be pushed * list vs files * list vs files * fixup * fix list of all tests * fix the install steps * fix the install steps * fix the config * fix the config * only split if needed * only split if needed * extend should fix it * extend should fix it * arg * arg * update * update * run tests * run tests * run tests * more nits * update * update * update * update * update * update * update * simpler way to show the test, reduces the complexity of the generated config * simpler way to show the test, reduces the complexity of the generated config * style * oups * oups * fix import errors * skip some tests for now * update doctestjob * more parallelism * fixup * test only the test in examples * test only the test in examples * nits * from Arthur * fix generated congi * update * update * show tests * oups * oups * fix torch job for now * use single upload setp * oups * fu*k fix * nit * update * nit * fix * fixes * [test-all] * add generate marker and generate job * oups * torch job runs not generate tests * let repo utils test all utils * UPdate * styling * fix repo utils test * more parallel please * don't test * update * bit more verbose sir * more * hub were skipped * split by classname * revert * maybe? * Amazing catch Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * fix * update * update * maybe non capturing * manual convert? * pass artifacts as parameters as otherwise the config is too long * artifact.json * store output * might not be safe? * my token * mmm? * use CI job IS * can't get a proper id? * ups * build num * update * echo url * this? * this! * fix * wget * ish * dang * udpdate * there we go * update * update * pass all * not .txt * update * fetcg * fix naming * fix * up * update * update * ?? * update * more updates * update * more * skip * oups * pr documentation tests are currently created differently * update * hmmmm * oups * curl -L * update * ???? * nit * mmmm * ish * ouf * update * ish * update * update * updatea * nit * nit * up * oups * documentation_test fix * test hub tests everything, just marker * update * fix * test_hub is the only annoying one now * tf threads? * oups * not sure what is happening? * fix? * just use folder for stating hub * I am getting fucking annoyed * fix the test? * update * uupdate * ? * fixes * add comment! * nit --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2024-08-30 18:17:25 +02:00
Matt	38d58a4427	Fix local repos with remote code not registering for pipelines (#33100 ) * Extremely experimental fix! * Try removing the clause entirely * Add test * make fixup * stash commit * Remove breakpoint * Add anti-regression test * make fixup * Move repos to hf-internal-testing!	2024-08-30 16:56:22 +01:00
Gerben van V	5129671290	Add a static cache that offloads to the CPU or other device (#32161 ) * Add a static cache that offloads to the CPU or other device * Fix PR comments, add unit-tests	2024-08-29 11:51:09 +02:00
rasmi	f9ed05dd03	Fix import paths for test_module (#32888 ) * Fix import path for test_feature_extraction_utils.py See https://github.com/huggingface/transformers/pull/32601 * Fix import path for test_image_processing_utils.py	2024-08-28 12:08:29 +01:00
JB (Don)	f1a385b1de	[RoBERTa-based] Add support for sdpa (#30510 ) * Adding SDPA support for RoBERTa-based models * add not is_cross_attention * fix copies * fix test * add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin * address some review comments * use copied from * style * consistency * fix lists --------- Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-28 10:26:00 +02:00
Anton Vlasjuk	3bfd3e4803	Fix: Jamba batched generation (#32914 ) * init fix * fix mask during cached forward, move mask related stuff to own function * adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp) * revert overwriting new integration tests * move some comments to docstring	2024-08-28 09:24:06 +02:00
Mayank Mishra	c35d2ccf5a	Granite language models (#31502 ) * first commit * drop tokenizer * drop tokenizer * drop tokenizer * drop convert * granite * drop tokenization test * mup * fix * reformat * reformat * reformat * fix docs * stop checking for checkpoint * update support * attention multiplier * update model * tiny drop * saibo drop * skip test * fix test * fix test * drop * drop useless imports * update docs * drop flash function * copied from * drop pretraining tp * drop pretraining tp * drop pretraining tp * drop unused import * drop code path * change name * softmax scale * head dim * drop legacy cache * rename params * cleanup * fix copies * comments * add back legacy cache * multipliers * multipliers * multipliers * text fix * fix copies * merge * multipliers * attention multiplier * drop unused imports * fix * fix * fix * move rope? * Update src/transformers/models/granite/configuration_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * Update src/transformers/models/granite/modeling_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix * fix * fix * fix-copies * torch rmsnorm * add authors * change model path * fix * test * drop static cache test * uupdate readme * drop non-causal * readme * drop useless imports * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-27 21:27:21 +02:00
Juan Pizarro	7591ca5bc5	🚨 Add Blip2ForImageTextRetrieval (#29261 ) * add Blip2ForImageTextRetrieval * use one line and remove unnecessary space in tests Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * use value from the config, rather than hardcoded * change order of params in Blip2QFormerModel.forward * update docstring * fix style * update test_inference_opt * move embeddings out of Blip2QFormerModel * remove from_vision_qformer_configs * remove autocast float16 in Blip2QFormerModel * rename fiels into vision_projection,text_projection,use_image_text_matching_head * use CLIPOutput for Blip2ImageTextMatchingModelOutput * remove past_key_values_length from Blip2TextEmbeddings * fix small typo in the CLIPOutput docstring * add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping * update docstring and add require_torch_fp16 * rollback test_inference_opt * use use_image_text_matching_head=True in convert * skip test_model_get_set_embeddings * fix create_rename_keys error on new itm fields * revert to do scale after dot product between "query" and "key" * fix ValueError on convert script for blip2-opt-2.7b * update org of paths to Salesforce * add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests * [run_slow] blip_2 * removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED * fix docstring of Blip2ImageTextMatchingModelOutput * [run_slow] blip_2 * fix multi-gpu tests * [run_slow] blip_2 * [run_slow] blip_2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 18:50:27 +01:00
Joao Gante	c6b23fda65	Llama: make slow tests green 🟢 (#33138 )	2024-08-27 14:44:42 +01:00
Matt	9956c2bc98	Add a fix for custom code tokenizers in pipelines (#32300 ) * Add a fix for the case when tokenizers are passed as a string * Support image processors and feature extractors as well * Reverting load_feature_extractor and load_image_processor * Add test * Test is torch-only * Add tests for preprocessors and feature extractors and move test * Extremely experimental fix * Revert that change, wrong branch! * Typo! * Split tests	2024-08-27 14:39:57 +01:00
Joao Gante	ab0ac3b98f	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123 ) * fix param not being passed in tested; add exceptions * better source of model name * Update utils/create_dummy_models.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 11:58:27 +01:00
Aya	7562366d4b	fix: multilingual midel convert to tflite get wrong token (#32079 ) * fix: multilingual midel convert to tflite get wrong token * fix: modify test_force_tokens_logits_processor the checking value as scores.dtype.min --------- Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com> Co-authored-by: Aya <[kent831217@gmail.com]>	2024-08-27 11:44:09 +02:00
Sai-Suraj-27	3bf6dd8aa1	fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32850 ) * Fixed failing CodeGenTokenizationTest::test_truncation. * [run_slow] Codegen * [run_slow] codegen	2024-08-27 09:20:59 +02:00
Joao Gante	72d4a3f9c1	mps: add `isin_mps_friendly`, a wrapper function for `torch.isin` (#33099 )	2024-08-26 15:34:19 +01:00
Joao Gante	894d421ee5	Test: add higher `atol` in `test_forward_with_num_logits_to_keep` (#33093 )	2024-08-26 15:23:30 +01:00
Shijie	19e6e80e10	support qwen2-vl (#32318 ) * support-qwen2-vl * tidy * tidy * tidy * tidy * tidy * tidy * tidy * hyphen->underscore * make style * add-flash2-tipd * delete-tokenize=False * remove-image_processor-in-init-file * add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES * format-doct * support-Qwen2VLVisionConfig * remove-standardize_cache_format * fix-letter-varaibles * remove-torch-in-image-processor * remove-useless-docstring * fix-one-letter-varaible-name * change-block-name * default-quick-gelu-in-vision * remove-useless-doc * use-preimplemented-flash-forward * fix-doc * fix-image-processing-doc * fix-apply-rotary-embed * fix-flash-attn-sliding-window * refactor * remove-default_template * remove-reorder_cache * simple-get-rope_deltas * update-prepare_inputs_for_generation * update-attention-mask * update-rotary_seq_len * remove-state * kv_seq_length * remove-warning * _supports_static_cache * remove-legacy-cache * refactor * fix-replace * mrope-section-doc * code-quality * code-quality * polish-doc * fix-image-processing-test * update readme * Update qwen2_vl.md * fix-test * Update qwen2_vl.md * nit * processor-kwargs * hard-code-norm_layer * code-quality * discard-pixel-values-in-gen * fix-inconsistent-error-msg * unify-image-video * hidden_act * add-docstring * vision-encode-as-PreTrainedModel * pixel-to-target-dtype * update doc and low memoryvit * format * format * channel-foramt * fix vit_flashatt * format * inherit-Qwen2VLPreTrainedModel * simplify * format-test * remove-one-line-func-in-image-processing * avoid-one-line-reshape * simplify-rotary_seq_len * avoid-single-letter-variable * no-for-loop-sdpa * avoid-single-letter-variable * remove-one-line-reshape * remove-one-line-reshape * remove-no-rope-in-vit-logic * default-mrope * add-copied-from * more-docs-for-mrope * polish-doc * comment-and-link * polish-doc * single-letter-variables * simplify-image-processing * video->images * kv_seq_len-update * vision-rope-on-the-fly * vision-eager-attention * change-processor-order --------- Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>	2024-08-26 15:16:44 +02:00
Matt	371b9c1486	Enable some Jinja extensions and add datetime capabilities (#32684 ) * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Fix strftime template * Add template strip() just to be safe * Remove the do extension to make porting easier, and also because it's the least useful * Rename test * strftime -> strftime_now * Split test * Update test to use strftime_now * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils	2024-08-23 14:26:12 +01:00
Jason (Siyu) Zhu	adb91179b9	Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer (#32860 ) * add liger integration * fix syntax * fix import issue * add trainer.md * Use _apply_liger_kernel() * Fixed log message * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update docs/source/en/trainer.md Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Fixed checkstyle and updated readme * Added test * Fixed checkstyle * fix docstring * rename use_liger to use_liger_kernel * Trigger Build * Added test * add fix-copies * Fixed copy inconsistencies --------- Co-authored-by: shimizust <sshimizu@linkedin.com> Co-authored-by: Steven Shimizu <shimizust@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-08-23 13:20:49 +02:00
Joao Gante	970a16ec7f	Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659 ) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-23 11:12:53 +01:00
Cyril Vallez	22e6f14525	Reducing memory usage: removing useless logits computation in generate() (#31292 ) * Add .float() in all generation methods logit outputs * Switch float-casting of logits to training only for main models * Add `num_logits_to_keep` in Llama and add it by default in generate * Apply style * Add num_logits_to_keep as arg in prepare_input_for_generation * Add support for Mistral * Revert models except llama and mistral * Fix default None value in _supports_num_logits_to_keep() * Fix dimension of dummy input * Add exception for prophetnet in _supports_num_logits_to_keep() * Update _supports_num_logits_to_keep() to use inspect.signature() * Add deprecation cycle + remove modification with pretraining_tp * Apply style * Add most used models * Apply style * Make `num_logits_to_keep` an int in all cases to remove if-else clause * Add compile check for the warning * Fix torch versions * style * Add gemma2 * Update warning version * Add comment about .float operations in generation utils * Add tests in GenerationTesterMixin and ModelTesterMixin * Fix batch size for assisted decoding in tests * fix small issues in test * refacor test * fix slicing removing dim issue * Add nemotron support (should fix check-copy issue in CIs) * Trigger new CIs * Trigger new CIs * Bump version * Bump version in TODO * Trigger CIs * remove blank space * Trigger CIs	2024-08-23 11:08:34 +01:00
Joao Gante	a26de15139	Generate: Deprecate returning legacy cache by default; Handle `use_cache=False` (#32863 )	2024-08-22 20:01:52 +01:00
Andrés Marafioti	18199b34e5	[run_slow] idefics2 (#32840 )	2024-08-22 18:08:03 +02:00

1 2 3 4 5 ...

4027 Commits