transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-28 16:52:24 +06:00

Author	SHA1	Message	Date
Matt	a43e84cb3b	Make ASR pipeline compliant with Hub spec + add tests (#33769 ) * Remove max_new_tokens arg * Add ASR pipeline to testing * make fixup * Factor the output test out into a util * Full error reporting * Full error reporting * Update src/transformers/pipelines/automatic_speech_recognition.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Small comment --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-10-01 18:15:04 +01:00
Nicola De Angeli	0256520794	fix: repair depth estimation multiprocessing (#33759 ) * fix: repair depth estimation multiprocessing * test: add test for multiprocess depth estimation	2024-10-01 17:59:59 +01:00
Guang Yang	808997a634	Fix passing str dtype to static cache (#33741 ) Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-01 09:50:17 +02:00
Adibvafa Fallahpour	c269c5c74d	Fix Mamba slow path bug with dtype mismatch. (#32691 ) * Fix Mamba slow path bug with dtype mismatch. * Update test_modeling_mamba.py * Improve style. * Fix issue with cache position of dtype mismatch test. * Change test for slow path. * Revert changes. * Switch to buggy code and add test to catch it. * Fix the dtype mismatch bug and add test code to verify it. * Fix minor bug with test. * Fix incorrect dtype of model output. * Fix incorrect dtype of cache. * Fix incorrect dtype of ssm cache. * Fix incorrect dtype of conv state. * Remove assertion for ssm state. * Add assertion for conv state dtype. * Fix all issues with dtype mismatch test.	2024-10-01 09:28:40 +02:00
Joshua Lochner	18c5b216f1	Fix ViT-MAE decoder interpolate (#33330 ) * Fix ViT-MAE decoder interpolate * Add unit test for `interpolate_pos_encoding` w/ custom sizes * [run_slow] vit_mae	2024-09-30 18:47:13 +02:00
mobicham	f5247aca01	Hqq serialization (#33141 ) * HQQ model serialization attempt * fix hqq dispatch and unexpected keys * style * remove check_old_param * revert to check HQQLinear in quantizer_hqq.py * revert to check HQQLinear in quantizer_hqq.py * update HqqConfig default params * make ci happy * make ci happy * revert to HQQLinear check in quantizer_hqq.py * check hqq_min version 0.2.0 * set axis=1 as default in quantization_config.py * validate_env with hqq>=0.2.0 version message * deprecated hqq kwargs message * make ci happy * remove run_expected_keys_check hack + bump to 0.2.1 min hqq version * fix unexpected_keys hqq update * add pre_quantized check * add update_expected_keys to base quantizerr * ci base.py fix? * ci base.py fix? * fix "quantization typo" src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix post merge --------- Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-30 14:47:18 +02:00
Matt	d3821c4aed	Make audio classification pipeline spec-compliant and add test (#33730 ) * Make audio classification pipeline spec-compliant and add test * Check that test actually running in CI * Try a different pipeline for the CI * Move the test so it gets triggered * Move it again, this time into task_tests! * make fixup * indentation fix * comment * Move everything from testing_utils to test_pipeline_mixin * Add output testing too * revert small diff with main * make fixup * Clarify comment * Update tests/pipelines/test_pipelines_audio_classification.py Co-authored-by: Lucain <lucainp@gmail.com> * Update tests/test_pipeline_mixin.py Co-authored-by: Lucain <lucainp@gmail.com> * Rename function and js_args -> hub_args * Cleanup the spec recursion * Check keys for all outputs --------- Co-authored-by: Lucain <lucainp@gmail.com>	2024-09-27 17:01:06 +01:00
Vladislav Bronzov	9d200cfbee	Add gguf support for bloom (#33473 ) * add bloom arch support for gguf * apply format * small refactoring, bug fix in GGUF_TENSOR_MAPPING naming * optimize bloom GGUF_TENSOR_MAPPING * implement reverse reshaping for bloom gguf * add qkv weights test * add q_8 test for bloom	2024-09-27 12:13:40 +02:00
Raushan Turganbay	3e039d3827	Paligemma support for multi-image (#33447 ) * upadte * Update src/transformers/models/paligemma/processing_paligemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update docs * better example in tests * support image tokens * read token * Update tests/models/paligemma/test_processing_paligemma.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * nit: naming * Update docs/source/en/model_doc/paligemma.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * conflicts after rebasing --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2024-09-27 11:23:14 +02:00
Ita Zaporozhets	6730485b02	clean_up_tokenization_spaces=False if unset (#31938 ) * clean_up_tokenization_spaces=False if unset * deprecate warning * updating param for old models * update models * make fix-copies * fix-copies and update bert models * warning msg * update prophet and clvp * updating test since space before is arbitrarily removed * remove warning for 4.45	2024-09-26 19:38:20 +02:00
Joao Gante	3557f9a14a	Generate: `can_generate()` recursive check (#33718 ) * add recursive check and test warnings * missing space * models without can_generate	2024-09-26 18:11:14 +01:00
Arthur	46841d3eb2	[`MllamaProcessor`] Update errors and API with multiple image (#33715 ) * update error * update and add a test * update * update	2024-09-26 16:33:25 +02:00
Franz Louis Cesista	0a21381ba3	Uniformize kwargs for chameleon processor (#32181 ) * uniformize kwargs of Chameleon * fix linter nit * rm stride default * add tests for chameleon processor * fix tests * add comment on get_component * rm Chameleon's slow tokenizer * add check order images text + nit * update docs and tests * Fix LlamaTokenizer tests * fix gated repo access * fix wrong import --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2024-09-26 10:18:07 -04:00
Andrés Marafioti	f2c388e3f9	Add Idefics 3! (#32473 ) * Add Idefics 3! * fixes to make both pipelines identical * fix for quantized models * First pass at the review * remove vocab size from the main config (it's still in the text_config) * hot fix for merve * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * re-add model_type for text_config * remove support for old_cache * remove hidden_size from main config * rename idefics3 HF repo * few changes suggested in the PR * fix to input_data_format computation * remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion * improve example * few improvements from amy's review * big change to enable processing input images as numpy arrays * Changes to the code to uniformize processor kwargs * image processing tests * image processing tests fixes and some bugs they discovered * addressed review comments from Yoni * fix modeling tests * remove special tokens that are not special * fixes tests * skip failing tests - they also fail for idefics2 * added paper and readded the tests with multi gpu, who knows * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * review amy until image_processing_idefics3 * last comments from Amy * review amy * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * doc improvement - amy review * fix runtime error during fine-tuning * amy's review * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * ruff * amy's comment on the order * ruff ruff * fix copies * square images when they are not splitted * ruff :( * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics3/test_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix small bug introduced in refactor * amy's image processing changes * fixes peft tests and ruff * modify to_pil_image from transformers. and review from emanuele. * add modified to_pil_image --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-25 21:28:49 +02:00
Manuel	a55adee890	adding positional encoder changes and tests (#32600 ) * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding right pretrained model * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_image_segmentation * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding ruff formatting * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding new interpolate_pos_encoding function * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing interpolate_POS funciton * adapting output tensor in teests * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * modifying output tensor * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding the correct tensor * [run_slow] clipseg * fixing spaces * [run_slow] clipseg * [run_slow] clipseg --------- Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>	2024-09-25 19:05:01 +01:00
Arthur	19d58d31f1	Add MLLama (#33703 ) * current changes * nit * Add cross_attenttion_mask to processor * multi-image fixed * Add cross_attenttion_mask to processor * cross attn works in all cases * WIP refactoring function for image processor * WIP refactoring image processor functions * Refactor preprocess to use global loops instead of list nested list comps * Docstrings * Add channels unification * fix dtype issues * Update docsrings and format * Consistent max_image_tiles * current script * updates * Add convert to rgb * Add image processor tests * updates! * update * god damn it I am dumb sometimes * Precompute aspect ratios * now this works, full match * fix 😉 * nits * style * fix model and conversion * nit * nit * kinda works * hack for sdpa non-contiguous bias * nits here and there * latest c hanges * merge? * run forward * Add aspect_ratio_mask * vision attention mask * update script and config variable names * nit * nits * be able to load * style * nits * there * nits * make forward run * small update * enable generation multi-turn * nit * nit * Clean up a bit for errors and typos * A bit more constant fixes * 90B keys and shapes match * Fix for 11B model * Fixup, remove debug part * Docs * Make max_aspect_ratio_id to be minimal * Update image processing code to match new implementation * Adjust conversion for final checkpoint state * Change dim in repeat_interleave (accordig to meta code) * tmp fix for num_tiles * Fix for conversion (gate<->up, q/k_proj rope permute) * nits * codestyle * Vision encoder fixes * pass cross attn mask further * Refactor aspect ratio mask * Disable text-only generation * Fix cross attention layers order, remove q/k norm rotation for cross atention layers * Refactor gated position embeddings * fix bugs but needs test with new weights * rope scaling should be llama3 * Fix rope scaling name * Remove debug for linear layer * fix copies * Make mask prepare private func * Remove linear patch embed * Make precomputed embeddings as nn.Embedding module * MllamaPrecomputedAspectRatioEmbedding with config init * Remove unused self.output_dim * nit, intermediate layers * Rename ln and pos_embed * vision_chunk_size -> image_size * return_intermediate -> intermediate_layers_indices * vision_input_dim -> hidden_size * Fix copied from statements * fix most tests * Fix more copied from * layer_id->layer_idx * Comment * Fix tests for processor * Copied from for _prepare_4d_causal_attention_mask_with_cache_position * Style fix * Add MllamaForCausalLM * WIP fixing tests * Remove duplicated layers * Remove dummy file * Fix style * Fix consistency * Fix some TODOs * fix language_model instantiation, add docstring * Move docstring, remove todos for precomputed embeds (we cannot init them properly) * Add initial docstrings * Fix * fix some tests * lets skip these * nits, remove print, style * Add one more copied from * Improve test message * Make validate func private * Fix dummy objects * Refactor `data_format` a bit + add comment * typos/nits Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix dummy objects and imports * Add chat template config json * remove num_kv_heads from vision attention * fix * move some commits and add more tests * fix test * Remove `update_key_name` from modeling utils * remove num-kv-heads again * some prelimiary docs * Update chat template + tests * nit, conversion script max_num_tiles from params * Fix warning for text-only generation * Update conversion script for instruct models * Update chat template in converstion + test * add tests for CausalLM model * model_max_length, avoid null chat_template * Refactor conversion script * Fix forward * Fix integration tests * Refactor vision config + docs * Fix default * Refactor text config * Doc fixes * Remove unused args, fix docs example * Squashed commit of the following: commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830 Author: qubvel <qubvel@gmail.com> Date: Wed Sep 18 13:39:15 2024 +0000 Move model + add output hidden states and output attentions * Fix num_channels * Add mllama text and mllama vision models * Fixing repo consistency * Style fix * Fixing repo consistency * Fixing unused config params * Fix failed tests after refactoring * hidden_activation -> hidden_act for text mlp * Remove from_pretrained from sub-configs * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Reuse lambda in conversion script * Remove run.py * Update docs/source/en/model_doc/mllama.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/mllama/processing_mllama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove unused LlamaTokenizerFast * Fix logging * Refactor gating * Remove cycle for collecting intermediate states * Refactor text-only check, add integration test for text-only * Revert from pretrained to configs * Fix example * Add auto `bos_token` adding in processor * Fix tips * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Enable supports_gradient_checkpointing model flag * add eager/sdpa options * don't skip attn tests and bring back GC skips (did i really remove those?) * Fix signature, but get error with None gradient * Fix output attention tests * Disable GC back * Change no split modules * Fix dropout * Style * Add Mllama to sdpa list * Add post init for vision model * Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model * if skipped, say it, don't pass * Clean vision tester config * Doc for args * Update tests/models/mllama/test_modeling_mllama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add cross_attention_mask to test * typehint * Remove todo * Enable gradient checkpointing * Docstring * Style * Fixing and skipping some tests for new cache * Mark flaky test * Skip `test_sdpa_can_compile_dynamic` test * Fixing some offload tests * Add direct GenerationMixin inheritance * Remove unused code * Add initializer_range to vision config * update the test to make sure we show if split * fix gc? * Fix repo consistency * Undo modeling utils debug changes * Fix link * mllama -> Mllama * [mllama] -> [Mllama] * Enable compile test for CausalLM model (text-only) * Fix TextModel prefix * Update doc * Docs for forward, type hints, and vision model prefix * make sure to reset * fix init * small script refactor and styling * nit * updates! * some nits * Interpolate embeddings for 560 size and update integration tests * nit * does not suppor static cache! * update * fix * nit2 * this? * Fix conversion * Style * 4x memory improvement with image cache AFAIK * Token decorator for tests * Skip failing tests * update processor errors * fix split issues * style * weird * style * fix failing tests * update * nit fixing the whisper tests * fix path * update --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal> Co-authored-by: qubvel <qubvel@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2024-09-25 19:56:25 +02:00
Yoni Gozlan	94f18cf23c	Add OmDet-Turbo (#31843 ) * Add template with add-new-model-like * Add rough OmDetTurboEncoder and OmDetTurboDecoder * Add working OmDetTurbo convert to hf * Change OmDetTurbo encoder to RT-DETR encoder * Add swin timm backbone as default, add always partition fix for swin timm * Add labels and tasks caching * Fix make fix-copies * Format omdet_turbo * fix Tokenizer tests * Fix style and quality * Reformat omdet_turbo * Fix quality, style, copies * Standardize processor kwargs * Fix style * Add output_hidden_states and ouput_attentions * Add personalize multi-head attention, improve docstrings * Add integrated test and fix copy, style, quality * Fix unprotected import * Cleanup comments and fix unprotected imports * Add fix different prompts in batch (key_padding_mask) * Add key_padding_mask to custom multi-head attention module * Replace attention_mask by key_padding_mask * Remove OmDetTurboModel and refactor * Refactor processing of classes and abstract use of timm backbone * Add testing, fix output attentions and hidden states, add cache for anchors generation * Fix copies, style, quality * Add documentation, conver key_padding_mask to attention_mask * revert changes to backbone_utils * Fic docstrings rst * Fix unused argument in config * Fix image link documentation * Reorder config and cleanup * Add tokenizer_init_kwargs in merge_kwargs of the processor * Change AutoTokenizer to CLIPTokenizer in convert * Fix init_weights * Add ProcessorMixin tests, Fix convert while waiting on uniform kwargs * change processor kwargs and make task input optional * Fix omdet docs * Remove unnecessary tests for processor kwargs * Replace nested BatchEncoding output of the processor by a flattened BatchFeature * Make modifications from Pavel review * Add changes Amy review * Remove unused param * Remove normalize_before param, Modify processor call docstring * Remove redundant decoder class, add gradient checkpointing for decoder * Remove commented out code * Fix inference in fp16 and add fp16 integrated test * update omdet md doc * Add OmdetTurboModel * fix caching and nit * add OmDetTurboModel to tests * nit change repeated key test * Improve inference speed in eager mode * fix copies * Fix nit * remove OmdetTurboModel * [run-slow] omdet_turbo * [run-slow] omdet_turbo * skip dataparallel test * [run-slow] omdet_turbo * update weights to new path * remove unnecessary config in class --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-91-248.ec2.internal>	2024-09-25 13:26:28 -04:00
Matthew Douglas	196d35ccfc	Add AdEMAMix optimizer (#33682 ) * Add AdEMAMix optimizer * Fix test * Update tests/trainer/test_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-09-25 18:07:21 +01:00
Benjamin Fineran	574a9e12bb	HFQuantizer implementation for compressed-tensors library (#31704 ) * Add compressed-tensors HFQuantizer implementation * flag serializable as False * run * revive lines deleted by ruff * fixes to load+save from sparseml, edit config to quantization_config, and load back * address satrat comment * compressed_tensors to compressed-tensors and revert back is_serializable * rename quant_method from sparseml to compressed-tensors * tests * edit tests * clean up tests * make style * cleanup * cleanup * add test skip for when compressed tensors is not installed * remove pydantic import + style * delay torch import in test * initial docs * update main init for compressed tensors config * make fix-copies * docstring * remove fill_docstring * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * review comments * review comments * comments - suppress warnings on state dict load, tests, fixes * bug-fix - remove unnecessary call to apply quant lifecycle * run_compressed compatability * revert changes not needed for compression * no longer need unexpected keys fn * unexpected keys not needed either * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * add to_diff_dict * update docs and expand testing * Update _toctree.yml with compressed-tensors * Update src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update doc * add note about saving a loaded model --------- Co-authored-by: George Ohashi <george@neuralmagic.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Sara Adkins <sara@neuralmagic.com> Co-authored-by: Sara Adkins <sara.adkins65@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu> Co-authored-by: Dipika <dipikasikka1@gmail.com>	2024-09-25 14:31:38 +02:00
NielsRogge	06e27e3dc0	[Pixtral] Improve docs, rename model (#33491 ) * Improve docs, rename model * Fix style * Update repo id	2024-09-25 13:53:12 +02:00
Dmitry Rogozhkin	5e2916bc14	tests: fix pytorch tensor placement errors (#33485 ) This commit fixes the following errors: * Fix "expected all tensors to be on the same device" error * Fix "can't convert device type tensor to numpy" According to pytorch documentation torch.Tensor.numpy(force=False) performs conversion only if tensor is on CPU (plus few other restrictions) which is not the case. For our case we need force=True since we just need a data and don't care about tensors coherency. Fixes: #33517 See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2024-09-25 12:21:53 +01:00
Jonathan Mamou	52daf4ec76	🚨🚨 Setting default behavior of assisted decoding (#33657 )	2024-09-25 09:39:09 +01:00
Yoni Gozlan	5f0c181f4e	Uniformize kwargs for image-text-to-text processors (#32544 ) * uniformize FUYU processor kwargs * Uniformize instructblip processor kwargs * Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2 * Uniformize llava_next processor * Fix save_load test for processor with chat_template only as extra init args * Fix import Unpack * Fix Fuyu Processor import * Fix FuyuProcessor import * Fix FuyuProcessor * Add defaults for specific kwargs kosmos2 * Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs * Add tests processor Udop * remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature * Fix overwrite tests kwargs processors * Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop * Fix processing test fuyu * remove unnecessary pad_token check in instructblip ProcessorTest * Fix BC tests and cleanup * FIx imports fuyu * Uniformize Pix2Struct * Fix wrong name for FuyuProcessorKwargs * Fix slow tests reversed inputs align fuyu llava-next, change udop warning * Fix wrong logging import udop * Add check images text input order * Fix copies * change text pair handling when positional arg * rebase on main, fix imports in test_processing_common * remove optional args and udop uniformization from this PR * fix failing tests * remove unnecessary test, fix processing utils and test processing common * cleanup Unpack * cleanup * fix conflict grounding dino	2024-09-24 21:28:19 -04:00
Joao Gante	a7734238ff	Generation tests: update imagegpt input name, remove unused functions (#33663 )	2024-09-24 16:40:48 +01:00
jiqing-feng	11c27dd331	Enable BNB multi-backend support (#31098 ) * enable cpu bnb path * fix style * fix code style * fix 4 bit path * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * add multi backend refactor tests * fix style * tweak 4bit quantizer + fix corresponding tests * tweak 8bit quantizer + try fixing corresponding tests * fix dequant bnb 8bit * account for Intel CPU in variability of expected outputs * enable cpu and xpu device map * further tweaks to account for Intel CPU * fix autocast to work with both cpu + cuda * fix comments * fix comments * switch to testing_utils.torch_device * allow for xpu in multi-gpu tests * fix tests 4bit for CPU NF4 * fix bug with is_torch_xpu_available needing to be called as func * avoid issue where test reports attr err due to other failure * fix formatting * fix typo from resolving of merge conflict * polish based on last PR review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix CI * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix error log * fix error msg * add \n in error log * make quality * rm bnb cuda restriction in doc * cpu model don't need dispatch * fix doc * fix style * check cuda avaliable in testing * fix tests * Update docs/source/en/model_doc/chameleon.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix doc * fix check multibackends * fix import sort * remove check torch in bnb * docs: update bitsandbytes references with multi-backend info * docs: fix small mistakes in bnb paragraph * run formatting * reveret bnb check * move bnb multi-backend check to import_utils * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix bnb check * minor fix for bnb * check lib first * fix code style * Revert "run formatting" This reverts commit `ac108c6d6b`. * fix format * give warning when bnb version is low and no cuda found] * fix device assignment check to be multi-device capable * address akx feedback on get_avlbl_dev fn * revert partially, as we don't want the function that public, as docs would be too much (enforced) --------- Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-24 03:40:56 -06:00
Joao Gante	e15687fffe	Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` (#33203 )	2024-09-23 18:28:36 +01:00
Yoni Gozlan	1456120929	Uniformize kwargs for Udop processor and update docs (#33628 ) * Add optional kwargs and uniformize udop * cleanup Unpack * nit Udop	2024-09-23 12:47:32 -04:00
Pablo Montalvo	9eb93854b9	Clean up Unpack imports (#33631 ) clean up Unpack imports	2024-09-23 10:21:17 +02:00
Avishai Elmakies	78b2929c05	Sdpa dino v2 (#33403 ) * add sdpa to dinov2 * fixup * add dinov2 to sdpa doc * update doc order * [run-slow] dinov2 * common to eager * [run-slow] dinov2 * update attn implementation in common * update test_modeling_dinov2 to have mask_ration, num_masks and mask_length similar to vit * [run-slow] dinov2 --------- Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>	2024-09-21 01:58:00 +01:00
Mayank Mishra	e472e077c2	Granitemoe (#33207 ) * first commit * drop tokenizer * drop tokenizer * drop tokenizer * drop convert * granite * drop tokenization test * mup * fix * reformat * reformat * reformat * fix docs * stop checking for checkpoint * update support * attention multiplier * update model * tiny drop * saibo drop * skip test * fix test * fix test * drop * drop useless imports * update docs * drop flash function * copied from * drop pretraining tp * drop pretraining tp * drop pretraining tp * drop unused import * drop code path * change name * softmax scale * head dim * drop legacy cache * rename params * cleanup * fix copies * comments * add back legacy cache * multipliers * multipliers * multipliers * text fix * fix copies * merge * multipliers * attention multiplier * drop unused imports * add granitemoe * add decoration * remove moe from sequenceclassification * fix test * fix * fix * fix * move rope? * merge * drop bias * drop bias * Update src/transformers/models/granite/configuration_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * Update src/transformers/models/granite/modeling_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix * fix * fix * drop * drop * fix * fix * cleanup * cleanup * fix * fix granite tests * fp32 test * fix * drop jitter * fix * rename * rename * fix config * add gen test --------- Co-authored-by: Yikang Shen <yikang.shn@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-21 01:43:50 +02:00
jiqing-feng	49a0bef4c1	enable low-precision pipeline (#31625 ) * enable low-precision pipeline * fix parameter for ASR * reformat * fix asr bug * fix bug for zero-shot * add dtype check * rm useless comments * add np.float16 check * Update src/transformers/pipelines/image_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/pipelines/token_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix comments * fix asr check * make fixup * No more need for is_torch_available() --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Matt <rocketknight1@gmail.com>	2024-09-20 16:43:30 -07:00
Yih-Dar	077b552f07	Fix some missing tests in circleci (#33559 ) * fix * fix * fix * fix * skip * skip more --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-09-20 20:58:51 +02:00
Duc-Viet Hoang	dc8b6eaeee	Fix contrastive search to correctly handle input with padding (#33507 ) * fix: handle padding in contrastive search for decoder-only models * fix: handle padding in contrastive search for encoder-decoder models * tests: move padding contrastive test to test_util, add t5 test * fix: handle if model_kwargs["decoder_attention_mask"] is None * refactor: improve padding input contrastive search generation tests * chore: _ranking_fast to use LongTensor for cosine_matrix_mask	2024-09-20 16:52:08 +01:00
Yoni Gozlan	c0c6815dc9	Add support for args to ProcessorMixin for backward compatibility (#33479 ) * add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin * change size and crop_size in processor kwargs tests to do_rescale and rescale_factor * remove unnecessary llava processor kwargs test overwrite * nit * change data_arg_name to input_name * Remove unnecessary test override * Remove unnecessary tests Paligemma * Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring	2024-09-20 11:40:59 -04:00
Yih-Dar	31caf0b95f	Fix missing test in `torch_job` (#33593 ) fix missing tests Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-09-20 17:16:44 +02:00
Joao Gante	2fdb5e74cc	VLM generate: tests can't generate image/video tokens (#33623 )	2024-09-20 15:43:27 +01:00
amyeroberts	f9b4409726	Remove unnecessary CPM model tests (#33621 ) Remove model tests	2024-09-20 14:20:57 +01:00
Joao Gante	266d0a6375	Generate: remove flakyness in `test_generate_from_inputs_embeds_decoder_only` (#33602 ) almost zero is not zero	2024-09-20 14:50:42 +02:00
Lake Lee	ec1424c6a3	Update modeling_mamba2.py, fix pad size (#32599 ) * Update modeling_mamba2.py Fix pad_size calculation to ensure it's less than self.chunk_size * [run_slow] mamba2 * [run-slow] mamba2 * [run-slow] Add @require_read_token decorator to failing tests for token propagation * [run_slow] mamba2	2024-09-20 11:40:57 +01:00
Fanli Lin	8bd1f2f338	[tests] make more tests device-agnostic (#33580 ) * enable * fix * add xpu skip * add marker * skip for xpu * add more * enable on accelerator * add more cases * add more tests * add more	2024-09-20 10:16:43 +01:00
Fanli Lin	4d8908df27	[tests] enable GemmaIntegrationTest on XPU (#33555 ) enable GemmaIntegrationTest	2024-09-19 19:39:19 +01:00
Fanli Lin	b87755aa6d	[tests] skip tests for xpu (#33553 ) * enable * fix * add xpu skip * add marker * skip for xpu * add more * add one more	2024-09-19 19:28:04 +01:00
Yoni Gozlan	f111d5b783	Uniformize kwargs for Paligemma processor and update docs (#33571 ) * Uniformize paligemma processor * nit	2024-09-19 14:14:06 -04:00
Joao Gante	52920b5dd5	Cache: don't throw warnings on `gemma2` when instantiating a new cache (#33595 )	2024-09-19 17:42:47 +01:00
Anton Vlasjuk	b50ff5993a	[`Mamba2`] Move dt calculations to kernel (#33520 ) * use kernel for dt calculations * add small test * [run-slow] mamba2	2024-09-19 17:41:17 +01:00
Vladislav Bronzov	162056a3f4	change sequence_bias type of SequenceBiasLogitsProcessor to list, add… (#33375 ) * change sequence_bias type of SequenceBiasLogitsProcessor tp list, add config tests for all processors * fix format * small fix for all_token_bias_pairs_are_valid internal func * small typo fix in description * improve test impl, some SequenceBiasLogitsProcessor refactoring	2024-09-19 17:35:44 +01:00
Pablo Montalvo	413008c580	add uniform processors for altclip + chinese_clip (#31198 ) * add initial design for uniform processors + align model * add uniform processors for altclip + chinese_clip * fix mutable default 👀 * add configuration test * handle structured kwargs w defaults + add test * protect torch-specific test * fix style * fix * rebase * update processor to generic kwargs + test * fix style * add sensible kwargs merge * update test * fix assertEqual * move kwargs merging to processing common * rework kwargs for type hinting * just get Unpack from extensions * run-slow[align] * handle kwargs passed as nested dict * add from_pretrained test for nested kwargs handling * [run-slow]align * update documentation + imports * update audio inputs * protect audio types, silly * try removing imports * make things simpler * simplerer * move out kwargs test to common mixin * [run-slow]align * skip tests for old processors * [run-slow]align, clip * !$#@!! protect imports, darn it * [run-slow]align, clip * [run-slow]align, clip * update common processor testing * add altclip * add chinese_clip * add pad_size * [run-slow]align, clip, chinese_clip, altclip * remove duplicated tests * fix * update doc * improve documentation for default values * add model_max_length testing This parameter depends on tokenizers received. * Raise if kwargs are specified in two places * fix * match defaults * force padding * fix tokenizer test * clean defaults * move tests to common * remove try/catch block * deprecate kwarg * format * add copyright + remove unused method * [run-slow]altclip, chinese_clip * clean imports * fix version * clean up deprecation * fix style * add corner case test on kwarg overlap * resume processing - add Unpack as importable * add tmpdirname * fix altclip * fix up * add back crop_size to specific tests * generalize tests to possible video_processor * add back crop_size arg * fixup overlapping kwargs test for qformer_tokenizer * remove copied from * fixup chinese_clip tests values * fixup tests - qformer tokenizers * [run-slow] altclip, chinese_clip * remove prepare_image_inputs	2024-09-19 17:21:54 +02:00
Pablo Montalvo	4f0246e535	fix tests with main revision and read token (#33560 ) * fix tests with main revision and read token * [run-slow]mamba2 * test previously skipped tests * [run-slow]mamba2 * skip some tests * [run-slow]mamba2 * finalize tests * [run-slow]mamba2	2024-09-19 17:10:22 +02:00
Joao Gante	f3b3810fe6	rag: fix CI (#33578 )	2024-09-19 11:55:26 +01:00
Raushan Turganbay	d7975a5874	VLMs: enable generation tests (#33533 ) * add tests * fix whisper * update * nit * add qwen2-vl * more updates! * better this way * fix this one * fix more tests * fix final tests, hope so * fix led * Update tests/generation/test_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * pr comments * not pass pixels and extra for low-mem tests, very flaky because of visio tower --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-19 12:04:24 +02:00
Raushan Turganbay	e40bb4845e	Load and save video-processor from separate folder (#33562 ) * load and save from video-processor folder * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-19 09:56:52 +02:00
Yoach Lacombe	5af7d41e49	Codec integration (#33565 ) * clean mimi commit * some nits suggestions from Arthur * make fixup * rename repo id + change readme * Update docs/source/en/model_doc/mimi.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add flaky flag to batching equivalence due to audio_codes failing sometimes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-18 19:23:44 +02:00
Joao Gante	7542fac2c7	Pipeline: no side-effects on `model.config` and `model.generation_config` 🔫 (#33480 )	2024-09-18 15:43:06 +01:00
Yoach Lacombe	f883827c0a	Fix tests in ASR pipeline (#33545 )	2024-09-18 16:25:45 +02:00
Raushan Turganbay	db72894b48	Chat template: save and load correctly for processors (#33462 ) * fix * add tests * fix tests * Update tests/models/llava/test_processor_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix * fix tests * update tests --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-18 13:00:44 +02:00
Duygu Altinok	52e22cbf67	Fix for slow the bug tokenizer adding spaces to single id decodes (#32564 ) * _decode signature change and quick return * added bunch of decoding tests * signature match and return * added tests for decoding * merged decoding test * more tests for special tokens * cosmetics * fixed param * ruffed the file * refinement for single special tokens * added test for single special tokens * slight change to test name Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * minor change test name for skip tokens Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * killed already defined var Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * minor update with vars Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * killed already defined var once more Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> --------- Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>	2024-09-18 12:32:02 +02:00
Aymeric Roucher	e6d9f39dd7	Decorator for easier tool building (#33439 ) * Decorator for tool building	2024-09-18 11:07:51 +02:00
Wang, Yi	454a0f2efd	fix patch_attention_mask incorrect setting which leads to the differe… (#33499 ) * fix patch_attention_mask incorrect setting which leads to the difference in the generated text if batch > 1 Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * fix format Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * [run_slow] idefics2 --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-09-17 22:24:42 +01:00
teamclouday	6c051b4e1e	Add revision to trainer push_to_hub (#33482 ) * add revision to trainer push_to_hub * apply suggestions * add test for revision * apply ruff format * reorganize imports * change test trainer path	2024-09-17 23:11:32 +02:00
Yoni Gozlan	d8500cd229	Uniformize kwargs for Pixtral processor (#33521 ) * add uniformized pixtral and kwargs * update doc * fix _validate_images_text_input_order * nit	2024-09-17 14:44:27 -04:00
Nikita Krasnytskyi	c29a8694b0	Fix missing `sequences_scores` in the Whisper beam search output (#32970 ) * added sequences_scores to the output * added beam_indices to output * added test to check for beam_indices, sequences_scores and their shape * removed redundant whitespaces * make fixup	2024-09-17 19:36:11 +01:00
ErezSC42	46c27577b3	fix to jamba config, asserting attention and expert offset (#33316 ) * fix to jamba config, asserting attention and expert offset * fix foramtting * fix foramtting * fix foramtting * changed to error raise instead of assertion, added unittests * fix * changed t_ to property_ * changed t_ to property_ * quickfix * ran code styler	2024-09-17 19:29:27 +01:00
Wang, Yi	74026b473e	idefics2 enable_input_require_grads not aligned with disable_input_re… (#33194 ) * idefics2 enable_input_require_grads not aligned with disable_input_require_grads make peft+idefics2 checkpoints disable fail Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * split test case Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * fix ci failure Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * refine test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-09-17 10:39:34 +01:00
Insu Jang	bcf8946f0a	Fix number of patch check for different vision feature select strategy (#32494 ) * Fix number of patch check for different vision feature select strategy * add test --------- Co-authored-by: raushan <raushan@huggingface.co>	2024-09-17 09:33:07 +02:00
Yoach Lacombe	18e1a9c719	Fix parametrization-based weight norm (#33275 ) * refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading * make style * fix sew * fix sew and sew_d tests	2024-09-17 08:05:21 +02:00
Steven Shimizu	ba1f1dc132	Updated Trainer's liger-kernel integration to call correct patching API (#33502 ) * Updated liger-kernel integration in Trainer to call correct patching API * Fixed styling	2024-09-17 02:40:24 +02:00
Yoach Lacombe	98adf24883	[Whisper test] Fix some failing tests (#33450 ) * Fix failing tensor placement in Whisper * fix long form generation tests * more return_timestamps=True * make fixup * [run_slow] whisper * [run_slow] whisper	2024-09-16 19:05:17 +02:00
Yoni Gozlan	2f62146f0e	Uniformize kwargs for LLaVa processor and update docs (#32858 ) * Uniformize kwargs for LlaVa and update docs * Change order of processor inputs in docstring * Improve BC support for reversed images and text inputs * cleanup llava processor call docstring * Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning * Put function check reversed images text outside base processor class * Refactor _validate_images_text_input_order * Add ProcessingUtilTester * fix processing and test_processing	2024-09-16 11:26:26 -04:00
Arthur	8bd2b1e8c2	Add support for Pixtral (#33449 ) * initial commit * gloups * updates * work * weights match * nits * nits * updates to support the tokenizer :) * updates * Pixtral processor (#33454) * rough outline * Add in image break and end tokens * Fix * Udo some formatting changes * Set patch_size default * Fix * Fix token expansion * nit in conversion script * Fix image token list creation * done * add expected results * Process list of list of images (#33465) * updates * working image and processor * this is the expected format * some fixes * push current updated * working mult images! * add a small integration test * Uodate configuration docstring * Formatting * Config docstring fix * simplify model test * fixup modeling and etests * Return BatchMixFeature in image processor * fix some copies * update * nits * Update model docstring * Apply suggestions from code review * Fix up * updates * revert modeling changes * update * update * fix load safe * addd liscence * update * use pixel_values as required by the model * skip some tests and refactor * Add pixtral image processing tests (#33476) * Image processing tests * Add processing tests * woops * defaults reflect pixtral image processor * fixup post merge * images -> pixel values * oups sorry Mr docbuilder * isort * fix * fix processor tests * small fixes * nit * update * last nits * oups this was really breaking! * nits * is composition needs to be true --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-14 12:28:39 +02:00
Marc Sun	6cc4dfe3f1	Fix the initialization of the cache when we have multi gpu (#33303 ) * init cache multi-gpu * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * switch to execution device map * naming more consistant * fix * mutually exclusive device * added an integration example * remove useless check * suggestion from joao + typing * fix couple of typo and add test * revert check --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-13 15:06:08 +02:00
Amit Garg	dfd31158ee	[Phi-3] Bug on stale kv cache (#33129 ) * fix long seq bug * fixed format * fixed fn copy inconsistency * fix long seq bug * fixed format * fixed fn copy inconsistency * Addressed comments * added a unit test * fixed cache position * Added a warning msg to the forward fn * fixed test case	2024-09-13 14:07:19 +02:00
Alvaro Moran	7a5659872a	Mitigate a conflict when using sentencepiece (#33327 ) * test(tokenizers): add a test showing conflict with sentencepiece This is due to the fact that protobuf C implementation uses a global pool for all added descriptors, so if two different files add descriptors, they will end up conflicting. * fix(tokenizers): mitigate sentencepiece/protobuf conflict When sentencepiece is available, use that protobuf instead of the internal one. * chore(style): fix with ruff	2024-09-13 13:19:06 +02:00
Raushan Turganbay	4b0418df11	Enable `padding_side` as call time kwargs (#33385 ) * fix * add padding-side kwarg * add padding side in all models & fix tests * fix copies * fix tests	2024-09-13 11:58:38 +01:00
Wing Lian	1027a532c5	add a callback hook right before the optimizer step (#33444 )	2024-09-13 10:43:45 +02:00
Raushan Turganbay	9c4639b622	Return image hidden states (#33426 ) * fix * return image hidden states * fix copies * fix test	2024-09-13 10:20:03 +02:00
benniekiss	5c6257d1fc	[whisper] Clarify error message when setting max_new_tokens (#33324 ) * clarify error message when setting max_new_tokens * sync error message in test_generate_with_prompt_ids_max_length * there is no self	2024-09-12 18:48:36 +02:00
Raushan Turganbay	2f611d30d9	Qwen2-VL: clean-up and add more tests (#33354 ) * clean-up on qwen2-vl and add generation tests * add video tests * Update tests/models/qwen2_vl/test_processing_qwen2_vl.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix and add better tests * Update src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update docs and address comments * Update docs/source/en/model_doc/qwen2_vl.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_vl.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update * remove size at all --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-12 18:24:04 +02:00
Hannan Komari	8ed635258c	Fix flax whisper tokenizer bug (#33151 ) * Update tokenization_whisper.py Fix issue with flax whisper model * Update tokenization_whisper_fast.py Fix issue with flax whisper model * Update tokenization_whisper.py just check len of token_ids * Update tokenization_whisper_fast.py just use len of token_ids * Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list * Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list * Update test_tokenization_whisper.py to add test for _convert_to_list method * Update test_tokenization_whisper.py to fix code style issues * Fix code style * Fix code check again * Update test_tokenization)whisper.py to Improve code style * Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available * Update tests/models/whisper/test_tokenization_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method * Revert the changes automatically applied by formatter and was unrelated to PR * Format for minimal changes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-12 12:21:59 +01:00
Jonathan Mamou	7a51cbc65f	Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258 ) * optimal Speculation Lookahead based on probability * update peer finished condition * add support to do_sample True * add stopping criteria * gitignore * add print * remove prints * minor * minor * git ignore * adding test to stopping ConfidenceCriteria * doc + format * add doc * Update .gitignore * update docstring and default value of assistant_confidence_threshold * add docstring * Update src/transformers/generation/configuration_utils.py implicit default value (None) Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * style fix --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-11 14:22:28 +02:00
Theia Vogel	e719b65c31	Fix `FbgemmFp8Linear` not preserving tensor shape (#33239 ) * add tests for linear shape behavior * fix linear shape behavior ended up adding the reshape at the end, after f8f8bf16_rowwise, because adding it directly after quantize_fp8_per_row caused f8f8bf16_rowwise to drop the seq_len dimension. (i.e., (17, 23, 1014) -> (17, 1024)) * save shape up front + comment	2024-09-11 13:26:44 +02:00
Ita Zaporozhets	781bbc4d98	use diff internal model in tests (#33387 ) * use diff internal model in tests * use diff internal model in tests	2024-09-11 11:27:00 +02:00
Guang Yang	f38590dade	Make StaticCache configurable at model construct time (#32830 ) * Make StaticCache configurable at model construct time * integrations import structure * add new doc file to toc --------- Co-authored-by: Guang Yang <guangyang@fb.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2024-09-10 16:35:57 +01:00
Alazar	96429e74a8	Add support for GGUF Phi-3 (#31844 ) * Update docs for GGUF supported models * Add tensor mappings and define class GGUFPhi3Converter * Fix tokenizer * Working version * Attempt to fix some CI failures * Run ruff format * Add vocab, merges, decoder methods like LlamaConverter * Resolve conflicts since Qwen2Moe was added to gguf - I missed one place when resolving conflict - I also made a mistake with tests_ggml.py and now has been fixed to reflect its master version.	2024-09-10 13:32:38 +02:00
Maciej Adamiak	8e8e7d8558	fixed Mask2Former image processor segmentation maps handling (#33364 ) * fixed mask2former image processor segmentation maps handling * introduced review suggestions * introduced review suggestions	2024-09-10 11:19:56 +01:00
Raushan Turganbay	7d2d6ce9cb	VLM: fixes after refactor (#32907 ) * leave only half of the changes * fix tests * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix tests, first try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix, second try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava	2024-09-10 12:02:37 +02:00
Lysandre Debut	f24f084329	Import structure & first three model refactors (#31329 ) * Import structure & first three model refactors * Register -> Export. Export all in __all__. Sensible defaults according to filename. * Apply most comments from Amy and some comments from Lucain Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com> * Style * Add comment * Clearer .py management * Raise if not in backend mapping * More specific type * More efficient listdir * Misc fixes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com>	2024-09-10 11:10:53 +02:00
amyeroberts	f745e7d3f9	Remove repeated prepare_images in processor tests (#33163 ) * Remove repeated prepare_images * Address comments - update docstring; explanatory comment	2024-09-09 13:20:27 +01:00
Raushan Turganbay	65bb284448	Compile compatibilty for decoder-only models (#32617 ) * squash into one commit * add qwen2-vl for rope standardization * fix mistral compile * fix qwen2-vl * fix-copies	2024-09-09 10:59:04 +02:00
Wing Lian	62aecd85ff	schedulefree optimizers (#30079 ) * schedulefree optimizers * fix train instead of eval for optimizer * fixes and update docs * chore: lint * add tests and drop overly-verbose _32bit suffix * chore: lint * fix for docs * fix code review issues * use duck-typing to avoid per-optimizer patches * fixup style * fixup style * warn if incorrect accelerate version with schedule free Co-authored-by: Aman Gupta Karmani <aman@tmm1.net> --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2024-09-09 09:51:39 +02:00
Ita Zaporozhets	e48e5f1f13	Support reading tiktoken tokenizer.model file (#31656 ) * use existing TikTokenConverter to read tiktoken tokenizer.model file * del test file * create titktoken integration file * adding tiktoken llama test * ALTNATIVE IMPLEMENTATION: supports llama 405B * fix one char * remove redundant line * small fix * rm unused import * flag for converting from tiktokeng * remove unneeded file * ruff * remove llamatiktokenconverter, stick to general converter * tiktoken support v2 * update test * remove stale changes * udpate doc * protect import * use is_protobuf_available * add templateprocessor in tiktokenconverter * reverting templateprocessor from tiktoken support * update test * add require_tiktoken * dev-ci * trigger build * trigger build again * dev-ci * [build-ci-image] tiktoken * dev-ci * dev-ci * dev-ci * dev-ci * change tiktoken file name * feedback review * feedback rev * applying feedback, removing tiktoken converters * conform test * adding docs for review * add doc file for review * add doc file for review * add doc file for review * support loading model without config.json file * Revert "support loading model without config.json file" This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb. * remove dev var * updating docs * safely import protobuf * fix protobuf import error * fix protobuf import error * trying isort to fix ruff error * fix ruff error * try to fix ruff again * try to fix ruff again * try to fix ruff again * doc table of contents * add fix for consistency.dockerfile torchaudio * ruff * applying feedback * minor typo * merging with push-ci-image * clean up imports * revert dockerfile consistency	2024-09-06 14:24:02 +02:00
Shiyu	342e800086	support 3D attention mask in bert (#32105 ) * support 3D/4D attention mask in bert * test cases * update doc * fix doc	2024-09-06 14:20:48 +02:00
GeLee	2b18354106	add self.head_dim for VisionAttention in Qwen2-VL (#33211 ) * add self.head_dim for VisionAttention in Qwen2-VL * add self.head_dim for VisionAttention in Qwen2-VL * fix ci * black the test_modeling_qwen2_vl.py * use ruff to format test_modeling_qwen2_vl.py * [run-slow] qwen2_vl * use tying for python3.8 * fix the import format * use ruff to fix the ci error I001 * [run-slow] qwen2_vl * remove unused import * commit for rebase * use ruff fix ci * [run-slow] qwen2_vl --------- Co-authored-by: root <liji>	2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi	3314fe1760	Add validation for maximum sequence length in modeling_whisper.py (#33196 ) * Add validation for maximum sequence length in modeling_whisper.py Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message. This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness. * Change exception message in src/transformers/models/whisper/modeling_whisper.py The exception message is for whisper's label's sequence max length. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py It's for whisper's config.max_target_positions. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change method's documentation in src/transformers/models/whisper/modeling_whisper.py * Add test for maximum label's sequence length in test_modeling_whisper.py * Add self to modeling_whisper.py * Update test_modeling_whisper.py with respect to automatic validations * Update modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Separate test_labels_sequence_max_length tests in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Remove assert from test_modeling_whisper.py * Add max_target_positions to WhisperModelTester in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py * Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py * Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py * Add new tests in test_modeling_whisper.py * Update test_modeling_whisper.py --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-09-06 14:09:49 +02:00
Ita Zaporozhets	363301f221	support loading model without config.json file (#32356 ) * support loading model without config.json file * fix condition * update tests * add test * ruff * ruff * ruff	2024-09-06 13:49:47 +02:00
Xuehai Pan	e1c2b69c34	Load dynamic module (remote code) only once if code isn't change (#33162 ) * Load remote code only once * Use hash as load indicator * Add a new option `force_reload` for old behavior (i.e. always reload) * Add test for dynamic module is cached * Add more type annotations to improve code readability * Address comments from code review	2024-09-06 12:49:35 +01:00
Sanchit Gandhi	51d15eb1c1	[whisper] alternative fix for long-form timestamps (#32131 ) * [whisper] alternative fix for long-form timestamps * update test	2024-09-06 12:57:08 +02:00
Raushan Turganbay	1759bb9126	Fix: StaticCache & `inputs_embeds` (#32932 ) squash commit	2024-09-06 12:56:59 +05:00
Shijie	21fac7abba	simple align qwen2vl kv_seq_len calculation with qwen2 (#33161 ) * qwen2vl_align_kv_seqlen_to_qwen2 * flash att test * [run-slow] qwen2_vl * [run-slow] qwen2_vl fix OOM * [run-slow] qwen2_vl * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * code quality --------- Co-authored-by: baishuai.bs <1051314669@qq.com> Co-authored-by: ShuaiBai623 <baishuai623@icloud.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2024-09-05 21:19:30 +05:00
Vladislav Bronzov	5d11de4a2f	Add Qwen2Moe GGUF loading support (#33264 ) * update gguf doc, config and tensor mapping * add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests * apply code style fixes * reformat files * assign GGUFQwen2Converter to qwen2_moe	2024-09-05 17:42:03 +02:00
Joshua Lochner	c6d2848a23	🚨 Fix `torch.jit.trace` for `interpolate_pos_encoding` in all vision models (#33226 ) * Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models * Apply formatting * Add missing `self.config = config` * Fix copies * Fix hiera interpolation unit test * Formatting * Update `_import_structure` * make style * Fix docstring * Use `# Copied from` instead of utils * DeiT variable renaming (`class_and_dist_pos_embed`) * Fix Hiera `interpolate_pos_encoding`	2024-09-05 16:17:34 +02:00
Younes Belkada	47b096412d	Fix: Fix `FalconMamba` training issues due to incompatible kernels (#33195 ) * fix FM training kernels * fix copies * fix copies * propagate to slow path * make it BC * add comment * fix test	2024-09-05 11:55:08 +02:00
Raushan Turganbay	43df47d8e7	Llava Onevision: add model (#32673 ) * working version * fix copies * update * tests * update docs * codestyle * add more tests * add returns for docs * clean up * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates * codestyle * style * shouldn't be reversed * [run-slow] llava_onevision * [run-slow] llava_onevision * add pooling in videos * [run-slow] llava_onevision * num-logits-to-keep * [run-slow] llava_onevision * [run-slow] llava_onevision * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * video matched orig impl * fix tests * chat template was modified * Update docs/source/en/model_doc/llava_onevision.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add morer info in the doc page --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-05 14:43:20 +05:00
Yoni Gozlan	9230d78e76	Add validate images and text inputs order util for processors and test_processing_utils (#33285 ) * Add validate images and test processing utils * Remove encoded text from possible inputs in tests * Removed encoded inputs as valid in processing_utils * change text input check to be recursive * change text check to all element of lists and not just the first one in recursive checks	2024-09-04 13:50:31 -04:00
Aymeric Roucher	2cb543db77	Multi agents with manager (#32687 ) * Add Multi agents with a hierarchical system	2024-09-04 17:30:54 +02:00
amyeroberts	d2dcff96f8	[InstructBLIP] qformer_tokenizer is required input (#33222 ) * [InstructBLIP] qformer_tokenizer is required input * Bit safer * Add to instructblipvideo processor * Fix up * Use video inputs * Update tests/models/instructblipvideo/test_processor_instructblipvideo.py	2024-09-04 16:18:06 +01:00
Alex Sherstinsky	122ded0a11	Bugfix/alexsherstinsky/fix none check for attention factor in rope scaling 2024 08 28 0 (#33188 ) * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities.	2024-09-04 17:01:12 +02:00
laurentd-lunit	d703477265	[fix] LlavaNextProcessor '_get_unpadded_features' method (#33263 ) * [fix] LlavaNextProcessor '_get_unpadded_features' method * [tests] add test_image_token_filling * [chore] style + comment * [minor] improve readability * [chore] run make fix-copies	2024-09-04 17:41:51 +05:00
Joao Gante	d750b509fc	Config: unified logic to retrieve text config (#33219 )	2024-09-04 12:03:30 +01:00
Raushan Turganbay	ebbe8d8014	Cache docs: update (#32929 ) * some changes * more updates * fix cache copy * nits * nits * add tests	2024-09-04 15:05:31 +05:00
Niklas Muennighoff	ecd61c6286	Add OLMoE (#32406 ) * Add OLMoE * Add OLMoE * Updates * Make norm optional; add keys * Add output * Add * Fix dtype * Fix eos config * Update * Add OLMoE * Fix OLMoE path * Format * Format * Rmv copy statement * Rmv copy statement * Format * Add copies * Cp rotary * Fix aming * Fix naming * Update RoPE integration; num_logits_to_keep; Add copy statements * Add eps to config * Format * Add aux loss * Adapt router_aux_loss_coef * Update md * Adapt * adapt tests	2024-09-03 18:43:12 +02:00
Zach Mueller	6b7d64ac1c	Only disallow DeepSpeed Zero-3 for auto bs finder (#31731 ) * Only disallow DeepSpeed * Clean * DeepSpeed! * Add a test for deepspeed	2024-09-03 09:16:28 -04:00
Isotr0py	edeca4387c	🚨 Support dequantization for most GGML types (#32625 ) * use gguf internal dequantize * add Q5_0 test * add iq1 test * add remained test * remove duplicated test * update docs * add gguf version limit * make style * update gguf import catch * revert vocab_size patch * make style * use GGUF_MIN_VERSION everywhere	2024-09-03 12:58:14 +02:00
Marc Sun	9ea1eacd11	remove to restriction for 4-bit model (#33122 ) * remove to restiction for 4-bit model * Update src/transformers/modeling_utils.py Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * bitsandbytes: prevent dtype casting while allowing device movement with .to or .cuda * quality fix * Improve warning message for .to() and .cuda() on bnb quantized models --------- Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>	2024-09-02 16:28:50 +02:00
Joao Gante	97c0f45b9c	Generate: fix assistant in different device (#33257 )	2024-09-02 14:37:49 +01:00
Matt	52a0213755	Add assistant prefill for chat templates and TextGenerationPipeline (#33198 ) * Add assistant prefill to chat templates * Add assistant prefill to pipeline * Add assistant prefill to pipeline * Tweak another test that ended in assistant message * Update tests that ended in assistant messages * Update tests that ended in assistant messages * Replace assistant_prefill with continue_final_message * Allow passing continue_final_message to pipeline * Small fixup * Add continue_final_message as a pipeline kwarg * Update docstrings * Move repos to hf-internal-testing! * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Add explanatory comment * make fixup * Update chat templating docs to explain continue_last_message --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-09-02 13:23:47 +01:00
Jeongseok Kang	963ed98bed	docs: Replace package abbreviations with full name(`bitsandbytes`) in docstrings (#33230 ) * docs: Provide fullname for `bitsandbytes` package * docs: Provide fullname for `bitsandbytes` package (2)	2024-09-02 13:40:34 +02:00
Aymeric Roucher	1ca9ff5c91	Add duckduckgo search tool (#32882 ) * Add duckduckgo search tool	2024-09-02 09:56:20 +02:00
Joao Gante	eb5b968c5d	Generate: throw warning when `return_dict_in_generate` is False but should be True (#33146 )	2024-08-31 10:47:08 +01:00
Arthur	b017a9eb11	Refactor CI: more explicit (#30674 ) * don't run custom when not needed? * update test fetcher filtering * fixup and updates * update * update * reduce burden * nit * nit * mising comma * this? * this? * more parallelism * more * nit for real parallelism on tf and torch examples * update * update * update * update * update * update * update * update * update * update * update * update * update to make it more custom * update to make it more custom * update to make it more custom * update to make it more custom * update * update * update * update * update * update * use correct path * fix path to test files and examples * filter-tests * filter? * filter? * filter? * nits * fix naming of the artifacts to be pushed * list vs files * list vs files * fixup * fix list of all tests * fix the install steps * fix the install steps * fix the config * fix the config * only split if needed * only split if needed * extend should fix it * extend should fix it * arg * arg * update * update * run tests * run tests * run tests * more nits * update * update * update * update * update * update * update * simpler way to show the test, reduces the complexity of the generated config * simpler way to show the test, reduces the complexity of the generated config * style * oups * oups * fix import errors * skip some tests for now * update doctestjob * more parallelism * fixup * test only the test in examples * test only the test in examples * nits * from Arthur * fix generated congi * update * update * show tests * oups * oups * fix torch job for now * use single upload setp * oups * fu*k fix * nit * update * nit * fix * fixes * [test-all] * add generate marker and generate job * oups * torch job runs not generate tests * let repo utils test all utils * UPdate * styling * fix repo utils test * more parallel please * don't test * update * bit more verbose sir * more * hub were skipped * split by classname * revert * maybe? 
* Amazing catch Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * fix * update * update * maybe non capturing * manual convert? * pass artifacts as parameters as otherwise the config is too long * artifact.json * store output * might not be safe? * my token * mmm? * use CI job IS * can't get a proper id? * ups * build num * update * echo url * this? * this! * fix * wget * ish * dang * udpdate * there we go * update * update * pass all * not .txt * update * fetcg * fix naming * fix * up * update * update * ?? * update * more updates * update * more * skip * oups * pr documentation tests are currently created differently * update * hmmmm * oups * curl -L * update * ???? * nit * mmmm * ish * ouf * update * ish * update * update * updatea * nit * nit * up * oups * documentation_test fix * test hub tests everything, just marker * update * fix * test_hub is the only annoying one now * tf threads? * oups * not sure what is happening? * fix? * just use folder for stating hub * I am getting fucking annoyed * fix the test? * update * uupdate * ? * fixes * add comment! * nit --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2024-08-30 18:17:25 +02:00
Matt	38d58a4427	Fix local repos with remote code not registering for pipelines (#33100 ) * Extremely experimental fix! * Try removing the clause entirely * Add test * make fixup * stash commit * Remove breakpoint * Add anti-regression test * make fixup * Move repos to hf-internal-testing!	2024-08-30 16:56:22 +01:00
Gerben van V	5129671290	Add a static cache that offloads to the CPU or other device (#32161 ) * Add a static cache that offloads to the CPU or other device * Fix PR comments, add unit-tests	2024-08-29 11:51:09 +02:00
rasmi	f9ed05dd03	Fix import paths for test_module (#32888 ) * Fix import path for test_feature_extraction_utils.py See https://github.com/huggingface/transformers/pull/32601 * Fix import path for test_image_processing_utils.py	2024-08-28 12:08:29 +01:00
JB (Don)	f1a385b1de	[RoBERTa-based] Add support for sdpa (#30510 ) * Adding SDPA support for RoBERTa-based models * add not is_cross_attention * fix copies * fix test * add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin * address some review comments * use copied from * style * consistency * fix lists --------- Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-28 10:26:00 +02:00
Anton Vlasjuk	3bfd3e4803	Fix: Jamba batched generation (#32914 ) * init fix * fix mask during cached forward, move mask related stuff to own function * adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp) * revert overwriting new integration tests * move some comments to docstring	2024-08-28 09:24:06 +02:00
Mayank Mishra	c35d2ccf5a	Granite language models (#31502 ) * first commit * drop tokenizer * drop tokenizer * drop tokenizer * drop convert * granite * drop tokenization test * mup * fix * reformat * reformat * reformat * fix docs * stop checking for checkpoint * update support * attention multiplier * update model * tiny drop * saibo drop * skip test * fix test * fix test * drop * drop useless imports * update docs * drop flash function * copied from * drop pretraining tp * drop pretraining tp * drop pretraining tp * drop unused import * drop code path * change name * softmax scale * head dim * drop legacy cache * rename params * cleanup * fix copies * comments * add back legacy cache * multipliers * multipliers * multipliers * text fix * fix copies * merge * multipliers * attention multiplier * drop unused imports * fix * fix * fix * move rope? * Update src/transformers/models/granite/configuration_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * Update src/transformers/models/granite/modeling_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix * fix * fix * fix-copies * torch rmsnorm * add authors * change model path * fix * test * drop static cache test * uupdate readme * drop non-causal * readme * drop useless imports * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-27 21:27:21 +02:00
Juan Pizarro	7591ca5bc5	🚨 Add Blip2ForImageTextRetrieval (#29261 ) * add Blip2ForImageTextRetrieval * use one line and remove unnecessary space in tests Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * use value from the config, rather than hardcoded * change order of params in Blip2QFormerModel.forward * update docstring * fix style * update test_inference_opt * move embeddings out of Blip2QFormerModel * remove from_vision_qformer_configs * remove autocast float16 in Blip2QFormerModel * rename fiels into vision_projection,text_projection,use_image_text_matching_head * use CLIPOutput for Blip2ImageTextMatchingModelOutput * remove past_key_values_length from Blip2TextEmbeddings * fix small typo in the CLIPOutput docstring * add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping * update docstring and add require_torch_fp16 * rollback test_inference_opt * use use_image_text_matching_head=True in convert * skip test_model_get_set_embeddings * fix create_rename_keys error on new itm fields * revert to do scale after dot product between "query" and "key" * fix ValueError on convert script for blip2-opt-2.7b * update org of paths to Salesforce * add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests * [run_slow] blip_2 * removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED * fix docstring of Blip2ImageTextMatchingModelOutput * [run_slow] blip_2 * fix multi-gpu tests * [run_slow] blip_2 * [run_slow] blip_2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 18:50:27 +01:00
Joao Gante	c6b23fda65	Llama: make slow tests green 🟢 (#33138 )	2024-08-27 14:44:42 +01:00
Matt	9956c2bc98	Add a fix for custom code tokenizers in pipelines (#32300 ) * Add a fix for the case when tokenizers are passed as a string * Support image processors and feature extractors as well * Reverting load_feature_extractor and load_image_processor * Add test * Test is torch-only * Add tests for preprocessors and feature extractors and move test * Extremely experimental fix * Revert that change, wrong branch! * Typo! * Split tests	2024-08-27 14:39:57 +01:00
Joao Gante	ab0ac3b98f	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123 ) * fix param not being passed in tested; add exceptions * better source of model name * Update utils/create_dummy_models.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 11:58:27 +01:00
Aya	7562366d4b	fix: multilingual midel convert to tflite get wrong token (#32079 ) * fix: multilingual midel convert to tflite get wrong token * fix: modify test_force_tokens_logits_processor the checking value as scores.dtype.min --------- Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com> Co-authored-by: Aya <[kent831217@gmail.com]>	2024-08-27 11:44:09 +02:00
Sai-Suraj-27	3bf6dd8aa1	fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32850 ) * Fixed failing CodeGenTokenizationTest::test_truncation. * [run_slow] Codegen * [run_slow] codegen	2024-08-27 09:20:59 +02:00
Joao Gante	72d4a3f9c1	mps: add `isin_mps_friendly`, a wrapper function for `torch.isin` (#33099 )	2024-08-26 15:34:19 +01:00
Joao Gante	894d421ee5	Test: add higher `atol` in `test_forward_with_num_logits_to_keep` (#33093 )	2024-08-26 15:23:30 +01:00
Shijie	19e6e80e10	support qwen2-vl (#32318 ) * support-qwen2-vl * tidy * tidy * tidy * tidy * tidy * tidy * tidy * hyphen->underscore * make style * add-flash2-tipd * delete-tokenize=False * remove-image_processor-in-init-file * add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES * format-doct * support-Qwen2VLVisionConfig * remove-standardize_cache_format * fix-letter-varaibles * remove-torch-in-image-processor * remove-useless-docstring * fix-one-letter-varaible-name * change-block-name * default-quick-gelu-in-vision * remove-useless-doc * use-preimplemented-flash-forward * fix-doc * fix-image-processing-doc * fix-apply-rotary-embed * fix-flash-attn-sliding-window * refactor * remove-default_template * remove-reorder_cache * simple-get-rope_deltas * update-prepare_inputs_for_generation * update-attention-mask * update-rotary_seq_len * remove-state * kv_seq_length * remove-warning * _supports_static_cache * remove-legacy-cache * refactor * fix-replace * mrope-section-doc * code-quality * code-quality * polish-doc * fix-image-processing-test * update readme * Update qwen2_vl.md * fix-test * Update qwen2_vl.md * nit * processor-kwargs * hard-code-norm_layer * code-quality * discard-pixel-values-in-gen * fix-inconsistent-error-msg * unify-image-video * hidden_act * add-docstring * vision-encode-as-PreTrainedModel * pixel-to-target-dtype * update doc and low memoryvit * format * format * channel-foramt * fix vit_flashatt * format * inherit-Qwen2VLPreTrainedModel * simplify * format-test * remove-one-line-func-in-image-processing * avoid-one-line-reshape * simplify-rotary_seq_len * avoid-single-letter-variable * no-for-loop-sdpa * avoid-single-letter-variable * remove-one-line-reshape * remove-one-line-reshape * remove-no-rope-in-vit-logic * default-mrope * add-copied-from * more-docs-for-mrope * polish-doc * comment-and-link * polish-doc * single-letter-variables * simplify-image-processing * video->images * kv_seq_len-update * vision-rope-on-the-fly * 
vision-eager-attention * change-processor-order --------- Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>	2024-08-26 15:16:44 +02:00
Matt	371b9c1486	Enable some Jinja extensions and add datetime capabilities (#32684 ) * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Fix strftime template * Add template strip() just to be safe * Remove the do extension to make porting easier, and also because it's the least useful * Rename test * strftime -> strftime_now * Split test * Update test to use strftime_now * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils	2024-08-23 14:26:12 +01:00
Jason (Siyu) Zhu	adb91179b9	Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer (#32860 ) * add liger integration * fix syntax * fix import issue * add trainer.md * Use _apply_liger_kernel() * Fixed log message * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update docs/source/en/trainer.md Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Fixed checkstyle and updated readme * Added test * Fixed checkstyle * fix docstring * rename use_liger to use_liger_kernel * Trigger Build * Added test * add fix-copies * Fixed copy inconsistencies --------- Co-authored-by: shimizust <sshimizu@linkedin.com> Co-authored-by: Steven Shimizu <shimizust@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-08-23 13:20:49 +02:00
Joao Gante	970a16ec7f	Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659 ) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-23 11:12:53 +01:00
Cyril Vallez	22e6f14525	Reducing memory usage: removing useless logits computation in generate() (#31292 ) * Add .float() in all generation methods logit outputs * Switch float-casting of logits to training only for main models * Add `num_logits_to_keep` in Llama and add it by default in generate * Apply style * Add num_logits_to_keep as arg in prepare_input_for_generation * Add support for Mistral * Revert models except llama and mistral * Fix default None value in _supports_num_logits_to_keep() * Fix dimension of dummy input * Add exception for prophetnet in _supports_num_logits_to_keep() * Update _supports_num_logits_to_keep() to use inspect.signature() * Add deprecation cycle + remove modification with pretraining_tp * Apply style * Add most used models * Apply style * Make `num_logits_to_keep` an int in all cases to remove if-else clause * Add compile check for the warning * Fix torch versions * style * Add gemma2 * Update warning version * Add comment about .float operations in generation utils * Add tests in GenerationTesterMixin and ModelTesterMixin * Fix batch size for assisted decoding in tests * fix small issues in test * refacor test * fix slicing removing dim issue * Add nemotron support (should fix check-copy issue in CIs) * Trigger new CIs * Trigger new CIs * Bump version * Bump version in TODO * Trigger CIs * remove blank space * Trigger CIs	2024-08-23 11:08:34 +01:00
Joao Gante	a26de15139	Generate: Deprecate returning legacy cache by default; Handle `use_cache=False` (#32863 )	2024-08-22 20:01:52 +01:00
Andrés Marafioti	18199b34e5	[run_slow] idefics2 (#32840 )	2024-08-22 18:08:03 +02:00
Joao Gante	975b988bfe	Gemma2: eager attention by default (#32865 )	2024-08-22 15:59:30 +01:00
Marc Sun	c42d264549	FEAT / Trainer: Add adamw 4bit optimizer (#31865 ) * add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit `25278e805f`. * style * version check	2024-08-22 15:07:09 +02:00
Joao Gante	f6e2586a36	Jamba: update integration tests (#32250 ) * try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-22 11:46:10 +01:00
Marc Sun	fd06ad5438	🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627 ) * Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci	2024-08-20 11:42:36 +02:00
Younes Belkada	93e538ae2e	Mamba / FalconMamba: Fix mamba left padding (#32677 ) * fix mamba left padding * Apply suggestions from code review Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix copies * test with `inputs_embeds` * Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copies * clairfy * fix last comments * remove --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-19 16:01:35 +02:00
Fanli Lin	e55b33ceb4	[tests] make `test_sdpa_can_compile_dynamic` device-agnostic (#32519 ) * enable * fix	2024-08-19 12:46:59 +01:00
Kamil Akesbi	8260cb311e	Add Descript-Audio-Codec model (#31494 ) * dac model * original dac works * add dac model * dac can be instatiated * add forward pass * load weights * all weights are used * convert checkpoint script ready * test * add feature extractor * up * make style * apply cookicutter * fix tests * iterate on FeatureExtractor * nit * update dac doc * replace nn.Sequential with nn.ModuleList * nit * apply review suggestions 1/2 * Update src/transformers/models/dac/modeling_dac.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * up * apply review suggestions 2/2 * update padding in FeatureExtractor * apply review suggestions * iterate on design and tests * add integration tests * feature extractor tests * make style * all tests pass * make style * fixup * apply review suggestions * fix-copies * apply review suggestions * apply review suggestions * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * anticipate transfer weights to descript * up * make style * apply review suggestions * update slow test values * update slow tests * update test values * update with CI values * update with vorace values * update test with slice * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-08-19 10:21:51 +01:00
MAHIR DAIYAN	843e5e20ca	Add Flax Dinov2 (#31960 ) * tfmsenv restored in main * installed flax * forward pass done and all tests passed * make fix-copies and cleaning the scripts * fixup attempt 1 * fixup attempt 2 * fixup third attempt * fixup attempt 4 * fixup attempt 5 * dinov2 doc fixed * FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE * external pos_encoding layer removed * fixup attempt 6 * fixed integration test values * fixup attempt 7 * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts 
<22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * comments removed * comment removed from the test * fixup * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * new fixes 1 * interpolate_pos_encoding function removed * droppath rng fixed, pretrained beit copied-from still not working * modeling_flax_dinov2.py reformatted * Update tests/models/dinov2/test_modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * added Copied from, to the tests * copied from statements removed from tests * fixed copied from statements in the tests * [run_slow] dinov2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-08-19 09:28:13 +01:00
Zach Mueller	8ec028aded	Reduce the error log when using core models that need their weights renamed, and provide a step forward (#32656 ) * Fin * Modify msg * Finish up nits	2024-08-16 13:05:57 -04:00
Zach Mueller	0b066bed14	Revert PR 32299, flag users when Zero-3 was missed (#32851 ) Revert PR 32299	2024-08-16 12:35:41 -04:00

4217 Commits