transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-23 22:38:58 +06:00

Author	SHA1	Message	Date
Matt	a7d1441d65	Correctly list the chat template file in the Tokenizer saved files list (#34974 ) * Correctly list the chat template file in the saved files list * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add save file checking to test * make fixup * better filename handling * make fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-07 19:11:02 +00:00
eustlb	7f7677307c	[Qwen2Audio] handle input ids expansion during processing (#35534 ) * add audio_token attribute to proc * expand input_ids * and legacy and expanded input_ids * test update * split lines * add possibility not to provide eos and bos audio tokens * raise errors * test incorrect number of audio tokens * add example * fmt * typo	2025-01-07 16:47:27 +01:00
Francesco Cariaggi	f408d55448	Fix bug when requesting input normalization with EnCodec (#34756 ) * EnCodec: unsqueeze padding mask * add test for normalization	2025-01-07 11:50:02 +01:00
松本和真	96bf3d6cc5	Add diffllama (#34083 ) * first adding diffllama * add Diff Attention and other but still with errors * complate make attention Diff-Attention * fix some bugs which may be caused by transformer-cli while adding model * fix a bug caused by forgetting KV cache... * Update src/transformers/models/diffllama/modeling_diffllama.py You don't need to divide by 2 if we use same number of attention heads as llama. instead you can just split in forward. Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py new codes are more meaningful than before Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py new codes are more meaningful than before Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fix 2times divide by sqrt(self.head_dim) Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fix 2times divide by sqrt(self.head_dim) Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place. and more visible Co-authored-by: Minho Ryu <ryumin93@gmail.com> * I found Attention missed implemented from paper still on `e072544a3b`. * re-implemented * adding groupnorm Co-authored-by: Minho Ryu <ryumin93@gmail.com> * align with transformers code style Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix typo Co-authored-by: Minho Ryu <ryumin93@gmail.com> * adding groupnorm Co-authored-by: Minho Ryu <ryumin93@gmail.com> * change SdpaAttention to DiffSdpaAttention Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix bug * Update src/transformers/models/diffllama/modeling_diffllama.py resolve "not same outputs" problem Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix bugs of places of "GroupNorm with scale" and etc * Revert "fix bugs of places of "GroupNorm with scale" and etc" This reverts commit `26307d92f6`. * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * remove missed type * add diffllama model_doc * apply make style/quality * apply review comment about model * apply review comment about test * place diffllama alphabetically on the src/transformers/__init__.py * fix forgot code * Supports parameters that are not initialized with standard deviation 0 in the conventional method * add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK on utils/check_config_docstrings.py * remove unused property of config * add to supported model list * add to spda supported model list * fix copyright, remove pretraining_tensor_parallel, and modify for initialization test * remove unused import and etc. * empty commit * empty commit * empty commit * apply modular transformers but with bugs * revert prev commit * create src/transformers/model/diffllama/modular_diffllama.py * run utils/modular_model_converter.py * empty commit * leaner modular diffllama * remove more and more in modular_diffllama.pt * remove more and more in modular_diffllama.pt * resolve missing docstring entries * force reset * convert modular --------- Co-authored-by: Minho Ryu <ryumin93@gmail.com>	2025-01-07 11:34:56 +01:00
Dmitry Rogozhkin	9fd123ac31	ci: mark model_parallel tests as cuda specific (#35269 ) `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"` and therefore is not accepting new features. At the same time `parallelize()` implementation is currently CUDA-specific. This commit marks respective ci tests with `@require_torch_gpu`. Fixes: #35252 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-01-07 10:16:34 +01:00
pglorio	bd442c6d3a	Zamba new attention standard (#35375 ) * updated zamba to new attention standard * make fixup fixes	2025-01-07 10:08:45 +01:00
Sarthak Karandikar	ca00950057	added logic for deleting adapters once loaded (#34650 ) * added logic for deleting adapters once loaded * updated to the latest version of transformers, merged utility function into the source * updated with missing check * added peft version check * Apply suggestions from code review Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * changes according to reviewer * added test for deleting adapter(s) * styling changes * styling changes in test * removed redundant code * formatted my contributions with ruff * optimized error handling * ruff formatted with correct config * resolved formatting issues --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-01-06 18:36:40 +00:00
Yijun Lee	e5fd865eba	Add Gemma2 GGUF support (#34002 ) * initial setup for ggml.py * initial setup of GGUFGemma2Converter class * Add gemma2 model to gguf.md doc * Partial work on GGUF_TENSOR_MAPPING * initial setup of GGUF_TENSOR_MAPPING for Gemma2 * refactor: rename GemmaConvert class to GemmaConverter for naming consistency * feat: complete gemma2 tensor mapping implementation * feat: add initial implementation of GGUFGemmaConverter * feat: complete GGUFGemmaConverter implementation * feat: add test code for gemma2 * refactor: minor code cleanup * refactor: minor code cleanup * fix: resolve suggestions * Update tests/quantization/ggml/test_ggml.py Co-authored-by: Isotr0py <2037008807@qq.com> --------- Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-03 14:50:07 +01:00
Jacky Lee	30a9971632	Use `sdpa_kernel` in tests (#35472 ) * update: use sdpa_kernel * update: rerun test	2025-01-03 14:39:52 +01:00
Blanchon	cba49cb2a6	Change `is_soundfile_availble` to `is_soundfile_available` (#35030 )	2025-01-03 14:37:42 +01:00
Matthew Douglas	6b1e86fd4d	Fix new BNB test failures (#35345 )	2025-01-02 11:24:52 +01:00
NielsRogge	6e0515e99c	Add DINOv2 with registers (#35348 ) * added changes from 32905 * fixed mistakes caused by select all paste * rename diff_dinov2... * ran tests * Fix modular * Fix tests * Use new init * Simplify drop path * Convert all checkpoints * Add figure and summary * Update paths * Update docs * Update docs * Update toctree * Update docs --------- Co-authored-by: BernardZach <bernardzach00@gmail.com> Co-authored-by: Zach Bernard <132859071+BernardZach@users.noreply.github.com>	2024-12-24 13:21:59 +01:00
Yoni Gozlan	93aafdc620	Add compile test for fast image processor (#35184 ) * add compile test for fast image processor * override pixtral test	2024-12-23 13:12:45 -05:00
Miquel Farré	a1780b7ba5	bugfix Idefics3 processor - handle gracefully cases with text and no images (#35363 ) * bugfix processing empty images * fix * fix * Update src/transformers/models/idefics3/processing_idefics3.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * adding tests * fix * fix * fix --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2024-12-23 16:59:01 +01:00
Andrei Panferov	64c05eecd6	HIGGS Quantization Support (#34997 ) * higgs init * working with crunches * per-model workspaces * style * style 2 * tests and style * higgs tests passing * protecting torch import * removed torch.Tensor type annotations * torch.nn.Module inheritance fix maybe * hide inputs inside quantizer calls * style structure something * Update src/transformers/quantizers/quantizer_higgs.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * reworked num_sms * Update src/transformers/integrations/higgs.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * revamped device checks * docstring upd * Update src/transformers/quantizers/quantizer_higgs.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * edited tests and device map assertions * minor edits * updated flute cuda version in docker * Added p=1 and 2,3bit HIGGS * flute version check update * incorporated `modules_to_not_convert` * less hardcoding * Fixed comment * Added docs * Fixed gemma support * example in docs * fixed torch_dtype for HIGGS * Update docs/source/en/quantization/higgs.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Collection link * dequantize interface * newer flute version, torch.compile support * unittest message fix * docs update compile * isort * ValueError instead of assert --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2024-12-23 16:54:49 +01:00
Mohamed Mekkouri	59178780a6	Fix : VPTQ test (#35394 ) fix_test	2024-12-23 16:27:46 +01:00
Tibor Reiss	e10be82b71	uniformize kwargs for SAM (#34578 ) * Make kwargs uniform for SAM * Remove unused attribute * Make point_pad_value part of image_kwargs * Update annotations * Code review - use existing methods * Use ProcessorTesterMixin * Do not add ProcessorTesterMixin everywhere	2024-12-23 13:54:57 +01:00
bastrob	8f38f58f3d	owlvit/2 dynamic input resolution (#34764 ) * owlvit/2 dynamic input resolution. * adapt box grid to patch_dim_h patch_dim_w * fix ci * clarify variable naming * clarify variable naming.. * compute box_bias dynamically inside box_predictor * change style part of code * [run-slow] owlvit, owlv2	2024-12-21 08:51:09 +00:00
Yih-Dar	504c4d3692	Make `test_generate_with_static_cache` even less flaky (#34995 ) * fix * fix * fix * fix * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-20 16:03:26 +01:00
Yih-Dar	05de764e9c	Aurevoir PyTorch 1 (#35358 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-20 14:36:31 +01:00
Sigbjørn Skjæret	eafbb0eca7	Implement AsyncTextIteratorStreamer for asynchronous streaming (#34931 ) * Add AsyncTextIteratorStreamer class * export AsyncTextIteratorStreamer * export AsyncTextIteratorStreamer * improve docs * missing import * missing import * doc example fix * doc example output fix * add pytest-asyncio * first attempt at tests * missing import * add pytest-asyncio * fallback to wait_for and raise TimeoutError on timeout * check for TimeoutError * autodoc * reorder imports * fix style --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-20 12:08:12 +01:00
wejoncy	4e27a4009d	FEAT : Adding VPTQ quantization method to HFQuantizer (#34770 ) * init vptq * add integration * add vptq support fix readme * add tests && format * format * address comments * format * format * address comments * format * address comments * remove debug code * Revert "remove debug code" This reverts commit `ed3b3eaaba`. * fix test --------- Co-authored-by: Yang Wang <wyatuestc@gmail.com>	2024-12-20 09:45:53 +01:00
Anton Vlasjuk	5a2aedca1e	[`Mamba2`] Fix caching, slow path, and multi-gpu (#35154 ) * fixup mamba2 - caching and several other small fixes * fixup cached forward * correct fix this time * fixup cache - we do not need to extend the attn mask it's handled by generate (gives total ids + mask at each step) * remove unnecessary (un)squeeze * fixup cache position * simplify a few things * [run-slow] mamba2 * multi gpu attempt two * [run-slow] mamba2 * [run-slow] mamba2 * [run-slow] mamba2 * [run-slow] mamba2 * add newer slow path fix * [run-slow] mamba2	2024-12-20 09:27:47 +01:00
Arthur	1fa807fa63	Fix some fa2 tests (#35340 ) * remove fa2 test * remove other failing tests * style	2024-12-19 17:05:25 +01:00
Benjamin Warner	667ed5635e	Add ModernBERT to Transformers (#35158 ) * initial cut of modernbert for transformers * small bug fixes * fixes * Update import * Use compiled mlp->mlp_norm to match research implementation * Propagate changes in modular to modeling * Replace duplicate attn_out_dropout in favor of attention_dropout cc @warner-benjamin let me know if the two should remain separate! * Update BOS to CLS and EOS to SEP Please confirm @warner-benjamin * Set default classifier bias to False, matching research repo * Update tie_word_embeddings description * Fix _init_weights for ForMaskedLM * Match base_model_prefix * Add compiled_head to match research repo outputs * Fix imports for ModernBertForMaskedLM * Just use "gelu" default outright for classifier * Fix config name typo: initalizer -> initializer * Remove some unused parameters in docstring. Still lots to edit there! * Compile the embeddings forward Not having this resulted in very slight differences - so small it wasn't even noticed for the base model, only for the large model. But the tiny difference for large propagated at the embedding layer through the rest of the model, leading to notable differences of ~0.0084 average per value, up to 0.2343 for the worst case. * Add drafts for ForSequenceClassification/ForTokenClassification * Add initial SDPA support (not exactly equivalent to FA2 yet!) During testing, FA2 and SDPA still differ by about 0.0098 per value in the token embeddings. It still predicts the correct mask fills, but I'd like to get it fully 1-1 if possible. * Only use attention dropout if training * Add initial eager attention support (also not equivalent to FA2 yet!) Frustratingly, I also can't get eager to be equivalent to FA2 (or sdpa), but it does get really close, i.e. avg ~0.010 difference per value. Especially if I use fp32 for both FA2&eager, avg ~0.0029 difference per value The fill-mask results are good with eager. * Add initial tests, output_attentions, output_hidden_states, prune_heads Tests are based on BERT, not all tests pass yet: 23 failed, 79 passed, 100 skipped * Remove kwargs from ModernBertForMaskedLM Disable sparse_prediction by default to match the normal HF, can be enabled via config * Remove/adjust/skip improper tests; warn if padding but no attn mask * Run formatting etc. * Run python utils/custom_init_isort.py * FlexAttention with unpadded sequences(matches FA2 within bf16 numerics) * Reformat init_weights based on review * self -> module in attention forwards * Remove if config.tie_word_embeddings * Reformat output projection on a different line * Remove pruning * Remove assert * Call contiguous() to simplify paths * Remove prune_qkv_linear_layer * Format code * Keep as kwargs, only use if needed * Remove unused codepaths & related config options * Remove 3d attn_mask test; fix token classification tuple output * Reorder: attention_mask above position_ids, fixes gradient checkpointing * Fix usage if no FA2 or torch v2.5+ * Make torch.compile/triton optional Should we rename 'compile'? It's a bit vague * Separate pooling options into separate functions (cls, mean) - cls as default * Simplify _pad_modernbert_output, remove unused labels path * Update tied weights to remove decoder.weight, simplify decoder loading * Adaptively set config.compile based on hf_device_map/device/resize, etc. * Update ModernBertConfig docstring * Satisfy some consistency checks, add unfinished docs * Only set compile to False if there's more than 1 device * Add docstrings for public ModernBert classes * Dont replace docstring returns - ends up being duplicate * Fix mistake in toctree * Reformat toctree * Patched FlexAttention, SDPA, Eager with Local Attention * Implement FA2 -> SDPA -> Eager attn_impl defaulting, crucial both to match the original performance, and to get the highest inference speed without requiring users to manually pick FA2 * Patch test edge case with Idefics3 not working with 'attn_implementation="sdpa"' * Repad all_hidden_states as well * rename config.compile to reference_compile * disable flex_attention since it crashes * Update modernbert.md * Using dtype min to mask in eager * Fully remove flex attention for now It's only compatible with the nightly torch 2.6, so we'll leave it be for now. It's also slower than eager/sdpa. Also, update compile -> reference_compile in one more case * Call contiguous to allow for .view() * Copyright 2020 -> 2024 Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update/simplify __init__ structure Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove "... if dropout_prob > 0 else identity" As dropout with 0.0 should be efficient like identity * re-use existing pad/unpad functions instead of creating new ones * remove flexattention method * Compute attention_mask and local_attention_mask once in modeling * Simplify sequence classification prediction heads, only CLS now Users can make custom heads if they feel like it Also removes the unnecessary pool parameter * Simplify module.training in eager attn * Also export ModernBertPreTrainedModel * Update the documentation with links to finetuning scripts * Explain local_attention_mask parameter in docstring * Simplify _autoset_attn_implementation, rely on super() * Keep "in" to initialize Prediction head Doublechecked with Benjamin that it's correct/what we used for pretraining * add back mean pooling * Use the pooling head in TokenClassification * update copyright * Reset config._attn_implementation_internal on failure * Allow optional attention_mask in ForMaskedLM head * fix failing run_slow tests * Add links to the paper * Remove unpad_no_grad, always pad/unpad without gradients * local_attention_mask -> sliding_window_mask * Revert "Use the pooling head in TokenClassification" This reverts commit `99c38badd1`. There was no real motivation, no info on whether having this bigger head does anything useful. * Simplify pooling, 2 options via if-else --------- Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com> Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com> Co-authored-by: Benjamin Clavié <ben@clavie.eu> Co-authored-by: Antoine Chaffin <ant54600@hotmail.fr> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-19 14:03:35 +01:00
Yu Chin Fabian Lim	9613933b02	Add the Bamba Model (#34982 ) * initial commit for PR Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com> * rename dynamic cache Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add more unit tests Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add integration test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add integration test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Add modular bamba file * Remove trainer changes from unrelated PR * Modify modular and cofig to get model running * Fix some CI errors and beam search * Fix a plethora of bugs from CI/docs/etc * Add bamba to models with special caches * Updat to newer mamba PR for mamba sublayer * fix test_left_padding_compatibility Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix remaining tests Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * missed this test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * ran make style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * move slow tag to integration obj Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * make style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * address comments Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix modular Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * left out one part of modular Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * change model Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Make Rotary modular as well * Update bamba.md Added overview, update Model inference card and added config * Update bamba.md * Update bamba.md * Update bamba.md Minor fixes * Add docs for config and model back Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Add warning when using fast kernels * replaced generate example Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Address comments from PR Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Propagate attention fixes Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Fix attention interfaces to the new API Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Fix API for decoder layer Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Remove extra weights Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> --------- Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com> Co-authored-by: Antoni Viros i Martin <aviros@ibm.com> Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com> Co-authored-by: Antoni Viros <ani300@gmail.com>	2024-12-18 20:18:17 +01:00
Arthur	2c47618c1a	🚨All attention refactor🚨 (#35235 ) * refactor LlamaAttention * minimal changes * fix llama * update * modular gemmas * modular nits * modular updates * nits * simplify * gpt2 * more modualr and fixes * granite * modular modular modular * nits * update * qwen2 + starcoder2 * mostly gemma2 * Update image_processing_auto.py * fix * Update modular_starcoder2.py * fix * remove all copied from attentions * remove gcv * make fix-copies * oups * oups2.0 * fix some modulars + all copied from * should be good now * revert unwanted changes * Update modeling_decision_transformer.py * finish cleanup * Update modeling_olmo.py * consistency * re-add gradient checkpointing attribute * fix * style * make config necessary * bis * bis * Update modeling_my_new_model2.py * is_causal attr * fix * remove past kv return from decoder layer * fix * default rope config * correctly fix rope config * fix bias * fix gpt2 attention output * fix test * fix inits * fix default sdpa * fix default sdpa implementation * harmonize classes * fix mistral * fix sliding window models * mixtral * be more explicit * style * fix * several fixes * Update modeling_dbrx.py * fix test * olmo + phi * rotary * syle * phi * phi again * again * kwargs * Update test_modeling_common.py * skip fx tracing tests * Update modeling_utils.py * gemma 2 * again * Update modeling_recurrent_gemma.py * gemma2 * granite * style * starcoder * Update sdpa_attention.py * switch args * Update modeling_mllama.py * fix * cache type tests * gpt2 * Update test_modeling_common.py * fix * consistency * fix shape with encoder * should be the last one * tests non model * most comments * small oupsi * be more explicit in modulars * more explicit modulars * CIs! it works locally * add kwargs to _flash_attention_forward --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2024-12-18 16:53:39 +01:00
jiqing-feng	69e31eb1bf	change bnb tests (#34713 ) * fix training tests * fix xpu check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * rm pdb Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 4bit logits check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 4bit logits check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add xpu check on int8 training * fix training tests * add llama test on bnb Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * only cpu and xpu disable autocast training Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>	2024-12-18 09:49:59 -05:00
eustlb	da334bcfa8	[Whisper] 🚨 Fix whisper decoding 🚨 (#34135 ) * do not remove decoder_input_ids for the first segment * do not remove eos token in generate_with_fallback * when removing padding tokens, do not remove eos token * remove eos token in generate (and not in generate_with_fallback!) * reconciliate short-from/ long-form behavior * correct avg_logprobs calculation * handle eos token in segments * handle decoder_input_ids and eos token in _prepare_decoder_input_ids * fix incorrect time precision * always remove eos token * always remove decoder_input_ids * no need to handle decoder_inputs_ids and eos token * no need to remove decoder_input_ids * no need to handle eos token * fix num_beams in _retrieve_logit_processors * remove todo unconsistency * no need to add eos token * last_timestamp_pos should indeed be timestamp token pos * patch generate to enable compatibility with GenerationTesterMixin tests * adapt test_generate_continue_from_past_key_values * adapt test_prompt_lookup_decoding_matches_greedy_search * adapt generic GenerationMixin tests to whisper's generate * fix speculative decoding * fix * [run-slow] whisper * change HF_HUB_TOKEN for require_read_token * [run-slow] whisper * prioritize kwargs over generation_config * remove unnecessary args * [run-slow] whisper * update tests * [run-slow] whisper * add comment * update test * [run-slow] whisper * update test + revert require_read_token * docstring updates * revert tokenizer decode args change * do not use a patch + docstring updates * [run-slow] whisper * make * [run-slow] whisper * add a flag to force unique call to generate * test update * [run-slow] whisper * add force_unique_generate_call arg * do not use a patch * correct the timestamps for the pad tokens * docstring update * docstring update * docstring update * upodate TF tests * add require_read_token * [run-slow] whisper * test reset dynamo * [run-slow] whisper * fix * [run-slow] whisper * avoid iterating twice on current_segments * [run-slow] whisper * [run-slow] whisper --------- Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-18 14:13:21 +01:00
Fanli Lin	c7e48053aa	[tests] make cuda-only tests device-agnostic (#35222 ) fix cuda-only tests	2024-12-18 10:14:22 +01:00
Marc Sun	1eee1cedfd	Fix loading with only state dict and low_cpu_mem_usage = True (#35217 ) * fix loading with only state dict and config * style * add tests --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2024-12-18 09:54:32 +01:00
Magnus	6eb00dd2f0	Support for SDPA for SAM models (#34110 ) * feat: add support for sdpa and gradient checkpointing * fix: ruff format * fix: config sdpa * fix: sdpa layer naming convention * fix: update test_eager_matches_sdpa_inference to handle vision_hidden_states * test: skip incompatible tests and fix loading issue with sdpa - Updated tests to skip cases flash and dynamic compile. - Minor adjustment to ensure correct loading of model with sdpa for dispatch test. * style: apply Ruff formatting * ruff fix again after rebase * [run-slow] sam * [run-slow] sam * refactor: Address review comments and improve sub-config handling in SAM model tests - Added attributes for sub_configs as per PR #34410. - Enabled tests for configs, ensuring the composite model (SAM) has several sub-configs in the main config. - Added class attribute _is_composite=True to the tester class - test_sdpa_can_dispatch_composite_models added * [run-slow] sam * style: ruff * [run-slow] sam * style: ruff again ... * [run-slow] sam	2024-12-17 14:46:05 +01:00
Omar Salman	747f361da1	Add sdpa for Beit (#34941 ) * Add sdpa for Beit * Updates * [run-slow] beit * Update inference benchmarks * Update * Fix - add missed to super().forward() * Updates * Fix missing import	2024-12-17 14:44:47 +01:00
Tony Wu	f33a0cebb3	Add ColPali to 🤗 transformers (#33736 ) * feat: run `add-new-model-like` * feat: add paligemma code with "copied from" * feat: add ColPaliProcessor * feat: add ColPaliModel * feat: add ColPaliConfig * feat: rename `ColPaliForConditionalGeneration` to `ColPaliModel` * fixup modeling colpali * fix: fix root import shortcuts * fix: fix `modeling_auto` dict * feat: comment out ColPali test file * fix: fix typos from `add-new-model-like` * feat: explicit the forward input args * feat: move everything to `modular_colpali.py` * fix: put back ColPaliProcesor * feat: add auto-generated files * fix: run `fix-copies` * fix: remove DOCStRING constants to make modular converter work * fix: fix typo + modular converter * fix: add missing imports * feat: no more errors when loading ColPaliModel * fix: remove unused args in forward + tweak doc * feat: rename `ColPaliModel` to `ColPaliForRetrieval` * fix: apply `fix-copies` * feat: add ColPaliProcessor to `modular_colpali` * fix: run make quality + make style * fix: remove duplicate line in configuration_auto * feat: make ColPaliModel inehrit from PaliGemmaForConditionalGeneration * fix: tweak and use ColPaliConfig * feat: rename `score` to `post_process_retrieval` * build: run modular formatter + make style * feat: convert colpali weights + fixes * feat: remove old weight converter file * feat: add and validate tests * feat: replace harcoded path to "vidore/colpali-v1.2-hf" in tests * fix: add bfloat16 conversion in weight converter * feat: replace pytest with unittest in modeling colpali test * feat: add sanity check for weight conversion (doesn't work yet) * feat: add shape sanity check in weigth converter * feat: make ColPaliProcessor args explicit * doc: add doc for ColPali * fix: trying to fix output mismatch * feat: tweaks * fix: ColPaliModelOutput inherits from ModelOutput instead of PaliGemmaCausalLMOutputWithPast * fix: address comments on PR * fix: adapt tests to the Hf norm * wip: try things * feat: add `__call__` method to `ColPaliProcessor` * feat: remove need for dummy image in `process_queries` * build: run new modular converter * fix: fix incorrect method override * Fix tests, processing, modular, convert * fix tokenization auto * hotfix: manually fix processor -> fixme once convert modular is fixed * fix: convert weights working * feat: rename and improve convert weight script * feat: tweaks * fest: remove `device` input for `post_process_retrieval` * refactor: remove unused `get_torch_device` * Fix all tests * docs: update ColPali model doc * wip: fix convert weights to hf * fix logging modular * docs: add acknowledgements in model doc * docs: add missing docstring to ColPaliProcessor * docs: tweak * docs: add doc for `ColPaliForRetrievalOutput.forward` * feat: add modifications from colpali-engine v0.3.2 in ColPaliProcessor * fix: fix and upload colapli hf weights * refactor: rename `post_process_retrieval` to `score_retrieval` * fix: fix wrong typing for `score_retrieval` * test: add integration test for ColPali * chore: rerun convert modular * build: fix root imports * Update docs/source/en/index.md Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * fix: address PR comments * wip: reduce the prediction gap in weight conversion * docs: add comment in weight conversion script * docs: add example for `ColPaliForRetrieval.forward` * tests: change dataset path to the new one in hf-internal * fix: colpali weight conversion works * test: add fine-grained check for ColPali integration test * fix: fix typos in convert weight script * docs: move input docstring in a variable * fix: remove hardcoded torch device in test * fix: run the new modular refactor * docs: fix python example for ColPali * feat: add option to choose `score_retrieval`'s output dtype and device * docs: update doc for `score_retrieval` * feat: add `patch_size` property in ColPali model * chore: run `make fix-copies` * docs: update description for ColPali cookbooks * fix: remove `ignore_index` methods * feat: remove non-transformers specific methods * feat: update `__init__.py` to new hf format * fix: fix root imports in transformers * feat: remove ColPali's inheritance from PaliGemma * Fix CI issues * nit remove prints * feat: remove ColPali config and model from `modular_colpali.py` * feat: add `ColPaliPreTrainedModel` and update modeling and configuration code * fix: fix auto-removed imports in root `__init__.py` * fix: various fixes * fix: fix `_init_weight` * temp: comment `AutoModel.from_config` for experiments * fix: add missing `output_attentions` arg in ColPali's forward * fix: fix `resize_token_embeddings` * fix: make `input_ids` optional in forward * feat: rename `projection_layer` to `embedding_proj_layer` * wip: fix convert colpali weight script * fix tests and convert weights from original repo * fix unprotected import * fix unprotected torch import * fix style * change vlm_backbone_config to vlm_config * fix unprotected import in modular this time * fix: load config from Hub + tweaks in convert weight script * docs: move example usage from model docstring to model markdown * docs: fix input docstring for ColPali's forward method * fix: use `sub_configs` for ColPaliConfig * fix: remove non-needed sanity checks in weight conversion script + tweaks * fix: fix issue with `replace_return_docstrings` in ColPali's `forward` * docs: update docstring for `ColPaliConfig` * test: change model path in ColPali test * fix: fix ColPaliConfig * fix: fix weight conversion script * test: fix expected weights for ColPali model * docs: update ColPali markdown * docs: fix minor typo in ColPaliProcessor * Fix tests and add _no_split_modules * add text_config to colpali config * [run slow] colpali * move inputs to torch_device in integration test * skip test_model_parallelism * docs: clarify quickstart snippet in ColPali's model card * docs: update ColPali's model card --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2024-12-17 11:26:43 +01:00
Mohamed Mekkouri	85eb339231	Fix : model used to test ggml conversion of Falcon-7b is incorrect (#35083 ) fixing test model	2024-12-16 13:21:44 +01:00
Yoni Gozlan	5615a39369	Fall back to slow image processor in ImageProcessingAuto when no fast processor available (#34785 ) * refactor image_processing_auto logic * fix fast image processor tests * Fix tests fast vit image processor * Add safeguard when use_fast True and torchvision not available * change default use_fast back to None, add warnings * remove debugging print * call get_image_processor_class_from_name once	2024-12-15 14:00:36 -05:00
Fanli Lin	bdd4201fdb	[tests] fix "Tester object has no attribute '_testMethodName'" (#34910 ) * add more cases * fix method not found in unittest Signed-off-by: Lin, Fanli <fanli.lin@intel.com> * fix more cases * add more models * add all * no unittest.case * remove for oneformer * fix style --------- Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2024-12-13 14:33:45 +01:00
nhamanasu	3d213b57fe	skip Fuyu from test_generate (#35246 ) * skip Fuyu from test_generate * make fixup, quality, repo-consistency	2024-12-13 10:12:49 +01:00
alexrs-cohere	64478c7631	Add Cohere2 model (#35224 )	2024-12-13 09:35:50 +01:00
George	e4e404fdd0	Run model as compressed/uncompressed mode (#34719 ) * draft, run model as compreszed/uncompressed mode * draft * run run_compressed=False * run_compressed as attr * set run_compressed=False using quantization_config * remove redundant line * make is_qat_trainable dependent on run_compressed status * add tests * lint * full in docstring * add decompress * comments * decompress if model is compresssed and not run_compressed * apply_quant_config logic fix -- populate statedict properly * comments * remove non compressed model * make is_compressed as property * cosmetic * run apply_quant_config for non-compressed models -- popualte scales and zeropoints * add pahtway for decompressing sparse models * typo on is_quantization_compressed * lint * fix typo	2024-12-13 08:23:31 +01:00
Nadav Timor	e3ee49fcfb	Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability (#35009 ) * move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file * refactor * NOTHING. add space to rerun github actions tests * remove it... * NOTHING. add space to rerun github actions tests * remove it... * replace: `self.prev_tokens` -> `self.prev_assistant_ids` * NOTHING. rerun CI tests * remove it * introduce `self.prev_target_ids_len` * fix style * fix style --------- Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com>	2024-12-12 15:47:05 +01:00
Yoach Lacombe	6181c6b095	Fix seamless TTS generate (#34968 ) * fix seamless tts generate * apply same fix for v2 * [run-slow] seamless_m4t, seamless_m4t_v2 * remove TODO * [run-slow] seamless_m4t, seamless_m4t_v2 * [run-slow] seamless_m4t, seamless_m4t_v2 * ignore failing test on multigpus * [run-slow] seamless_m4t, seamless_m4t_v2 * [run-slow] seamless_m4t, seamless_m4t_v2	2024-12-11 15:38:42 +01:00
Pavel Iakubovskii	5fcf6286bf	Add TimmWrapper (#34564 ) * Add files * Init * Add TimmWrapperModel * Fix up * Some fixes * Fix up * Remove old file * Sort out import orders * Fix some model loading * Compatible with pipeline and trainer * Fix up * Delete test_timm_model_1/config.json * Remove accidentally commited files * Delete src/transformers/models/modeling_timm_wrapper.py * Remove empty imports; fix transformations applied * Tidy up * Add image classifcation model to special cases * Create pretrained model; enable device_map='auto' * Enable most tests; fix init order * Sort imports * [run-slow] timm_wrapper * Pass num_classes into timm.create_model * Remove train transforms from image processor * Update timm creation with pretrained=False * Fix gamma/beta issue for timm models * Fixing gamma and beta renaming for timm models * Simplify config and model creation * Remove attn_implementation diff * Fixup * Docstrings * Fix warning msg text according to test case * Fix device_map auto * Set dtype and device for pixel_values in forward * Enable output hidden states * Enable tests for hidden_states and model parallel * Remove default scriptable arg * Refactor inner model * Update timm version * Fix _find_mismatched_keys function * Change inheritance for Classification model (fix weights loading with device_map) * Minor bugfix * Disable save pretrained for image processor * Rename hook method for loaded keys correction * Rename state dict keys on save, remove `timm_model` prefix, make checkpoint compatible with `timm` * Managing num_labels <-> num_classes attributes * Enable loading checkpoints in Trainer to resume training * Update error message for output_hidden_states * Add output hidden states test * Decouple base and classification models * Add more test cases * Add save-load-to-timm test * Fix test name * Fixup * Add do_pooling * Add test for do_pooling * Fix doc * Add tests for TimmWrapperModel * Add validation for `num_classes=0` in timm config + test for DINO checkpoint * Adjust atol for test * Fix docs * dev-ci * dev-ci * Add tests for image processor * Update docs * Update init to new format * Update docs in configuration * Fix some docs in image processor * Improve docs for modeling * fix for is_timm_checkpoint * Update code examples * Fix header * Fix typehint * Increase tolerance a bit * Fix Path * Fixing model parallel tests * Disable "parallel" tests * Add comment for metadata * Refactor AutoImageProcessor for timm wrapper loading * Remove custom test_model_outputs_equivalence * Add require_timm decorator * Fix comment * Make image processor work with older timm versions and tensor input * Save config instead of whole model in image processor tests * Add docstring for `image_processor_filename` * Sanitize kwargs for timm image processor * Fix doc style * Update check for tensor input * Update normalize * Remove _load_timm_model function --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>	2024-12-11 12:40:30 +00:00
Benjamin Bossan	bcc50cc7ce	[PEFT] Better Trainer error when prompt learning with loading best model at the end (#35087 ) Original issue: https://github.com/huggingface/peft/issues/2256 There is a potential error when using load_best_model_at_end=True with a prompt learning PEFT method. This is because Trainer uses load_adapter under the hood but with some prompt learning methods, there is an optimization on the saved model to remove parameters that are not required for inference, which in turn requires a change to the model architecture. This is why load_adapter will fail in such cases and users should instead set load_best_model_at_end=False and use PeftModel.from_pretrained. As this is not obvious, we now intercept the error and add a helpful error message.	2024-12-11 12:44:39 +01:00
Cyril Vallez	d363e71d0e	🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#34858 ) * update * style * fix missing args * remove last trace of old rope classes * remove deprecated copied from * fix copies * trigger CIs * post rebase clean-up * reverse mistral * cleanup after dropping commits * Add comment	2024-12-11 11:16:52 +01:00
Gallil Maimon	6acb4e43a7	Support BatchNorm in Hubert pos_conv_emb as in fairseq (#34389 ) * Support BatchNorm in Hubert pos_conv_emb as in fairseq * Correct the new defaults (#34377) * Correct the new defaults * CIs * add check * Update utils.py * Update utils.py * Add the max_length in generate test checking shape without passing length * style * CIs * fix fx CI issue * [auto. ping] Avoid sending empty info + add more team members (#34383) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Fix glm (#34388) * Fix duplicated * fix import * Use non nested images and batched text Idefics2/3 (#34222) * add support for non nested images and add tests * add tests error scenario * fix style * added single and no image to error tests * Fix onnx non-expotable inplace aten op (#34376) * fix onnx non-expotable inplace op * mistral, qwen2, qwen2_vl, starcoder2 * fixup copies * Fix right padding in LLaVA models (#34305) * fix right pad llavas * device mismatch * no filter (#34391) * no filter * no filter * no filter --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * SynthID: better example (#34372) * better example * Update src/transformers/generation/configuration_utils.py * Update src/transformers/generation/logits_process.py * nits * Tests: upgrade `test_eager_matches_sdpa_generate` (#34386) * Fix bnb training test failure (#34414) * Fix bnb training test: compatibility with OPTSdpaAttention * Avoid check expected exception when it is on CUDA (#34408) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Fix typos in agents_advanced.md (#34405) * [docs] Cache implementations (#34325) cache * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq Add conversion integration test, and make batchnorm explicit variable * Support BatchNorm in Hubert pos_conv_emb as in fairseq fix make fixup styling changes * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq Add conversion integration test, and make batchnorm explicit variable * Support BatchNorm in Hubert pos_conv_emb as in fairseq fix make fixup styling changes * [run-slow] hubert * [run-slow] hubert --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan@huggingface.co> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-12-10 14:18:23 +01:00
Matthew Douglas	34f4080ff5	[CI] Fix bnb quantization tests with accelerate>=1.2.0 (#35172 )	2024-12-09 13:55:16 -05:00
Mohamed Mekkouri	7238387f67	Fix typo in EETQ Tests (#35160 ) fix	2024-12-09 14:13:36 +01:00
kang sheng	1ccca8f48c	Fix GA loss bugs and add unit test (#35121 ) * fix GA bugs and add unit test * narrow down model loss unit test diff gap * format code to make ruff happy * send num_items_in_batch argument to decoder * fix GA loss bug in BertLMHeadModel * use TinyStories-33M to narrow down diff gap * fotmat code * missing .config * avoid add extra args --------- Co-authored-by: kangsheng <kangsheng@meituan.com>	2024-12-09 09:57:41 +01:00
Pavel Iakubovskii	c8c8dffbe4	Update I-JEPA checkpoints path (#35120 ) Update checkpoints path	2024-12-06 13:42:51 +00:00
Aymeric Roucher	9ad4c93536	Add Aria (#34157 ) * Add Aria --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-06 12:17:34 +01:00
Pablo Montalvo	a5bb528471	Fix signatures for processing kwargs (#35105 ) * add conversion script * remove pg2 refs * fixup style * small update * get correct scaling * add back missing bos * fix missing config keys * might revert this pos_embeddings * fixup 9b config * fix 9b * fixup 9b conversion for good + add back num_hidden_layers * add correct query scaling for 2b, 9b, 27b * fixup 27b conversion * Additional variant: 27b-896 * Use CPU for conversion to reduce GPU RAM requirements * fix causal mask generation + formatting * fix in-training causal mask generation edge case * trigger CI * update config * update config * update config * update config * update config * update config * update config * update config * update config * move conversion file to main model dir * handle multi-images + bos token * address comments for input ids * revert ci fixes * [run-slow] paligemma * fix * [run-slow] paligemma * skip end 2 end * [run-slow] paligemma --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-05 18:15:48 +01:00
Jonathan Mamou	e27465c801	Adaptive dynamic number of speculative tokens (#34156 ) * initial commit * update strategy * add tradeoff FPR TPR with cost * all probs * fix * fix * fix style * Update src/transformers/generation/configuration_utils.py shorter docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * import guard * fix style * add is_sklearn_available condition * vectorizing to flatten the for-loop * fix style * disable adaptation for UAG * update doc * add TestAssistedCandidateGeneratorUpdateStrategy * fix style * protect import * fix style --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-12-05 17:07:33 +01:00
Yih-Dar	b0a51e5cff	Fix flaky Hub CI (`test_trainer.py`) (#35062 ) * fix * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * fix * fix * fix * fix * fix * fix * fix * fix * check * check * check * check * check * check * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * check * check * check * Final space * Final adjustment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>	2024-12-05 17:02:27 +01:00
João Marcelo	50189e36a6	Add I-JEPA (#33125 ) * first draft * add IJepaEmbeddings class * fix copy-from for IJepa model * add weight conversion script * update attention class names in IJepa model * style changes * Add push_to_hub option to convert_ijepa_checkpoint function * add initial tests for I-JEPA * minor style changes to conversion script * make fixup related * rename conversion script * Add I-JEPA to sdpa docs * minor fixes * adjust conversion script * update conversion script * adjust sdpa docs * [run_slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * formatting issues * adjust modeling to modular code * add IJepaModel to objects to ignore in docstring checks * [run-slow] ijepa * fix formatting issues * add usage instruction snippet to docs * change pos encoding, add checkpoint for doc * add verify logits for all models * [run-slow] ijepa * update docs to include image feature extraction instructions * remove pooling layer from IJepaModel in image classification class * [run-slow] ijepa * remove pooling layer from IJepaModel constructor * update docs * [run-slow] ijepa * [run-slow] ijepa * small changes * [run-slow] ijepa * style adjustments * update copyright in init file * adjust modular ijepa * [run-slow] ijepa	2024-12-05 16:14:46 +01:00
eustlb	54aae121eb	[Whisper] Fix whisper tokenizer (#34537 ) * handle single timestamp ending * include last timestamp token * handle single timestamp ending * avoid floating points arithm limitations * ensure float64 operations * new test * make fixup * make copies * handle edge case double tokens ending with different tokens * handle single timestamp ending * make fixup * handle conditioning on prev segments * fix * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * [run-slow] whisper * don't call item() to avoid unnecessary sync * fix --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co>	2024-12-05 13:46:29 +01:00
Anton Vlasjuk	46df859975	[`GPTNeoX`] Flex Attention + Refactor (#34896 ) * gpt neox flex attention + refactor * some formatting * small fix on dropout * add assertion on flex attn test * flaky ci :( * add head mask support * style * handle dtype, replace torch where * fixup flex with output attns * code review and several other fixes * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style * remove unnecessary comment * remove incorrect comment * make flex attn check more agnostic tor versions and centralized * change peft input dtype check to value since q and k could be affected by other stuff like RoPE * i forgor * flaky * code review and small fixes * Update src/transformers/models/gpt_neox/modeling_gpt_neox.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-04 14:48:28 +01:00
Wang, Yi	125de41643	fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… (#34454 ) * fix speecht5 failure issue in test_peft_gradient_checkpointing_enable_disable Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * [run-slow] speecht5 --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com> Co-authored-by: Matt <rocketknight1@gmail.com>	2024-12-03 13:58:54 +00:00
Aymeric Roucher	901f504580	Add token cost + runtime monitoring to Agent and HfEngine children (#34548 ) * Add monitoring to Agent and HfEngine children	2024-12-03 13:14:52 +01:00
Dmitry Rogozhkin	31830474bf	Fix `test_eager_matches_sdpa_inference` for `XPU` backend (#34889 ) * Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> * Fix test_eager_matches_sdpa_inference for XPU backend As of PyTorch 2.5 XPU backend supports only torch.nn.attention.SDPBackend.MATH which is implemented on PyTorch level using aten operators and is device agnostic with respect to implementation of each aten operator. Thus, we can reuse CUDA (or CPU) MATH weights for XPU. Fixes: #34888 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> * Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> --------- Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2024-12-02 16:21:04 +01:00
Tibor Reiss	89d7bf584f	🚨🚨🚨 Uniformize kwargs for TrOCR Processor (#34587 ) * Make kwargs uniform for TrOCR * Add tests * Put back current_processor * Remove args * Add todo comment * Code review - breaking change	2024-11-29 11:58:11 +00:00
Michael Goin	9d6f0ddcec	Add optimized `PixtralImageProcessorFast` (#34836 ) * Add optimized PixtralImageProcessorFast * make style * Add dummy_vision_object * Review comments * Format * Fix dummy * Format * np.ceil for math.ceil	2024-11-28 16:04:05 +01:00
Raushan Turganbay	5e8c1d713d	Offloaded cache: fix generate (#34921 ) * fix cache impl * require_torch_gpu * fix mamba * fix copies	2024-11-28 15:05:56 +01:00
xinpengzz	44af935ec5	Refine the code of Universal Assisted Generation (#34823 ) * removed the useless attritbutes * add configs for window size * fixed the wrong kwargs * added docstring	2024-11-28 15:04:24 +01:00
Benjamin Bossan	f4b674f269	[PEFT] Set eval mode when loading PEFT adapter (#34509 ) * [PEFT] Set eval mode when loading PEFT adapter Resolves #34469 When calling model.load_adapter to load a PEFT adapter, by default the adapter should be set to eval mode. This is now correctly done. Users can still pass is_trainable=True to load the adapter in training mode. * Linter	2024-11-28 13:56:25 +01:00
Arthur	4c1388f48e	[`FlexAttention`] Update gemma2 (#34942 ) * update tests * now maybe this fixes the previous fialing tests! * nit default * Update src/transformers/models/gemma2/modular_gemma2.py Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * fix-copies --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2024-11-27 11:50:48 +01:00
Matt	d5cf91b346	Separate chat templates into a single file (#33957 ) * Initial draft * Add .jinja file loading for processors * Add processor saving of naked chat template files * make fixup * Add save-load test for tokenizers * Add save-load test for tokenizers * stash commit * Try popping the file * make fixup * Pop the arg correctly * Pop the arg correctly * Add processor test * Fix processor code * stash commit * Processor clobbers child tokenizer's chat template * Processor clobbers child tokenizer's chat template * make fixup * Split processor/tokenizer files to avoid interactions * fix test * Expand processor tests * Rename arg to "save_raw_chat_template" across all classes * Update processor warning * Move templates to single file * Move templates to single file * Improve testing for processor/tokenizer clashes * Improve testing for processor/tokenizer clashes * Extend saving test * Test file priority correctly * make fixup * Don't pop the chat template file before the slow tokenizer gets a look * Remove breakpoint * make fixup * Fix error	2024-11-26 14:18:04 +00:00
eustlb	4d1d0f29a4	[Whisper] Fix whisper integration tests (#34111 ) * fix test_tiny_timestamp_generation * fix test_large_timestamp_generation * fix test_whisper_shortform_single_batch_prev_cond * fix test_whisper_shortform_multi_batch_hard_prev_cond * return_timestamps necessary with long form * fix test_default_multilingual_transcription_long_form * fix test_tiny_token_timestamp_generation_longform * fix test_whisper_longform_multi_batch_hard * Update tests/models/whisper/test_modeling_whisper.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * fix typo * do not expect special tokens * fix test_whisper_longform_single_batch_beam * fix test_whisper_longform_multi_batch_hard_prev_cond * update test_whisper_longform_multi_batch_hard_prev_cond * update test_whisper_longform_multi_batch_hard_prev_cond * these tests does not make sense anymore * this test does not make sense anymore * make fixup * suggested nits * add test with forced_decoder_ids * this test does not make sense anymore * change assert for unittest test cases * make fixup * test with prompt_ids and task and language * fix unittest test case call * fix test_tiny_generation * fix test_tiny_en_generation * fix test_tiny_en_batched_generation * fix test_tiny_longform_timestamps_generation * fix test_tiny_timestamp_generation * fix test_large_generation * fix test_large_batched_generation * fix test_large_generation_multilingual * fix test_large_timestamp_generation * fix test_large_timestamp_generation * fix test_tiny_token_timestamp_generation_longform * fix test_tiny_en_batched_generation * make fixup * [run-slow] whisper --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-11-26 12:23:08 +01:00
Mohamed Mekkouri	0e805e6d1e	Skipping aqlm non working inference tests till fix merged (#34865 )	2024-11-26 11:09:30 +01:00
Mohamed Mekkouri	890ea7de93	Fix failling GGML test (#34871 ) fix_test	2024-11-25 18:04:52 +01:00
Yih-Dar	a830df2909	Fix `test_auto_backbone_timm_model_from_pretrained` (#34877 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-11-25 17:20:41 +01:00
jiqing-feng	a464afbe2a	fix static cache data type miss-match (#34799 ) * fix gptj data type missmatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add low precision static cache tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix low-precision static cache tests * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * avoid config change Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * change data type convert in cache copy Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix comment Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * cast key value after k v out Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2024-11-25 16:59:38 +01:00
Mohamed Mekkouri	4e6b19cd95	Fix : BitNet tests (#34895 ) * fix_tests_bitnet * fix format	2024-11-25 16:47:14 +01:00
Shane A	9121ab8fe8	Rename OLMo November to OLMo2 (#34864 ) * Rename/move OLMo Nov files to OLMo2 * Rename Olmo1124 and its variants to Olmo2	2024-11-25 16:31:22 +01:00
Jacky Lee	f4c04ba32b	Fix Qwen2 failing tests (#34819 ) * fix: qwen2 model ids * fix: line * fix: more format * update: reformat	2024-11-25 15:53:04 +01:00
VictorAtIfInsurance	a0f4f3174f	allow unused input parameters passthrough when chunking in asr pipelines (#33889 ) * allow unused parameter passthrough when chunking in asr pipelines * format code * format * run fixup * update tests * update parameters to pipline in test * updates parametrs in tests * change spelling in gitignore * revert .gitignore to main * add git ignore of devcontainer folder * assert asr output follows expected inference output type * run fixup * Remove .devcontainer from .gitignore * remove compliance check	2024-11-25 11:36:44 +01:00
Arthur	857d46ca0c	[`Deberta/Deberta-v2`] Refactor code base to support compile, export, and fix LLM (#22105 ) * some modification for roadmap * revert some changes * yups * weird * make it work * sttling * fix-copies * fixup * renaming * more fix-copies * move stuff around * remove torch script warnings * ignore copies * revert bad changes * woops * just styling * nit * revert * style fixup * nits configuration style * fixup * nits * will this fix the tf pt issue? * style * ??????? * update * eval? * update error message * updates * style * grumble grumble * update * style * nit * skip torch fx tests that were failing * style * skip the failing tests * skip another test and make style	2024-11-25 10:43:16 +01:00
Raushan Turganbay	098962dac2	BLIP: fix generation after hub update (#34876 ) * fix blip generation * dont remove it yet * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * modular --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-11-25 10:41:55 +01:00
Raushan Turganbay	c1a8520419	Cache: init empty cache when `use_cache` (#34274 ) * fix * fix tests * fix copies * add docs * Revert "add docs" This reverts commit `32d35634f1`. * qwen move deltas * mllama can potentiall fullgraph compile * enable mllama compile and fix tests * remove mllama fixes	2024-11-25 10:11:33 +01:00
Mohamed Mekkouri	54be2d7ae8	Bitnet test fix to avoid using gated model (#34863 ) small test fix	2024-11-22 17:18:49 +01:00
Nadav Timor	42b36d7395	Speculative decoding: Test the target distribution (to prevent issues like #32867 ) (#34553 ) * Update test_utils.py * formatting * Update test_utils.py * formatting * formatting * Update test_utils.py * formatting * Update test_utils.py * formatting * format * comments at standard positions	2024-11-22 16:02:37 +01:00
farrosalferro	c57eafdaa1	Add Nemotron GGUF Loading Support (#34725 ) * Add Nemotron GGUF Loading Support * fix the Nemotron architecture assignation --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-11-21 11:37:34 +01:00
Raushan Turganbay	28fb02fc05	VLMs: enable generation tests - last batch (#34484 ) * add tests for 3 more vlms * fix fuyu back * skip test	2024-11-21 11:00:22 +01:00
Marc Sun	3cb8676a91	Fix CI by tweaking torchao tests (#34832 )	2024-11-20 20:28:51 +01:00
Marc Sun	67890de3b8	Torchao weights only + prequantized compability (#34355 ) * weights only compability * better tests from code review * ping torch version * add weights_only check	2024-11-20 17:24:45 +01:00
Tibor Reiss	f297af55df	Fix: take into account meta device (#34134 ) * Do not load for meta device * Make some minor improvements * Add test * Update tests/utils/test_modeling_utils.py Update test parameters Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Make the test simpler --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-11-20 11:32:07 +01:00
Phillip Kuznetsov	8cadf76e1c	fix(DPT,Depth-Anything) `torch.export` (#34103 ) * Fix torch.export issue in dpt based models Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Simplify the if statements Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Move activation definitions of zoe_depth to init() Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Add test_export for dpt and zoedepth Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * add depth anything Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Remove zoedepth non-automated zoedepth changes and zoedepth test Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * [run_slow] dpt, depth_anything, zoedepth Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> --------- Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>	2024-11-20 11:31:21 +01:00
Raushan Turganbay	9470d65324	Fix low memory beam search (#34746 ) * fix * higher max positions in tests	2024-11-20 07:46:35 +01:00
Yih-Dar	469eddbe2d	Fix `check_training_gradient_checkpointing` (#34806 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-11-19 17:48:34 +01:00
Yih-Dar	05ebe8b9b0	Run `test_medium_seamless_m4t_pt` in `subprocess` to avoid many failures (#34812 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-11-19 17:32:10 +01:00
Yoni Gozlan	eedc113914	Add Image Processor Fast Deformable DETR (#34353 ) * add deformable detr image processor fast * add fast processor to doc * fix copies * nit docstring * Add tests gpu/cpu and fix docstrings * fix docstring * import changes from detr * fix imports * rebase and fix * fix input data format change in detr and rtdetr fast	2024-11-19 11:18:58 -05:00
Yoni Gozlan	b99ca4d28b	Add support for OpenAI api "image_url" input in chat for image-text-to-text pipeline (#34562 ) * add support for openai api image_url input * change continue to elif * Explicitely add support for OpenAI/TGI chat format * rewrite content to transformers chat format and add tests * Add support for typing of image type in chat templates * add base64 to possible image types * refactor nesting	2024-11-19 11:08:37 -05:00
Phillip Kuznetsov	5fa4f64605	🚨🚨🚨 fix(Mask2Former): torch export 🚨🚨🚨 (#34393 ) * fix(Mask2Former): torch export Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * revert level_start_index and create a level_start_index_list Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Add a comment to explain the level_start_index_list Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Address comment Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * add torch.export.export test Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * rename arg Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * remove spatial_shapes Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Use the version check from pytorch_utils Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * [run_slow] mask2former Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> --------- Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>	2024-11-19 16:44:53 +01:00
Arthur	4bff54f921	Gemma capping (#34282 ) * softcapping * soft cap before the mask * style * ... * super nit * update * fixes * update * small issue with modular * fix modular imports * update * fixup * simplify a hell lot * simplify cleaning imports * finish fixing * update our design * nits * use a deprecation cycle * updates * Fix modular (recursive deps need to always be computed after merges!) * push * fix * update * fix modular order * make fix-copies * updates * update * ? * don't compile for now * ? * fix some stuff * donc! * fix copies * update * fixup * ? * fix two tests * fix? * for now, don't use head info * eager when output attentoin and sdpa or flash as it's the simplest behaviour (for our tests as well :)) * fix-copies * revert sdpa check * Apply suggestions from code review Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> * rebase, fix-copies and push * add a slow integration test * update the test * fix left padding issue * fix test * remove duplicate scaling * quality * add a small test and make sure it works * 2b --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2024-11-19 13:52:38 +01:00
Arthur	54739a320e	Self-speculation (Layer-Skip Llama) (#34240 ) * 😅 * early exit (#34244) * mvp * docs and tests * a few fixes * no shared cache * Apply suggestions from code review Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org> * docs * make fix-copies * cohere fix * [test all] * [test all] consistent model code copies * [test all] make fix-copies :D * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org> * Update src/transformers/generation/candidate_generator.py * Update src/transformers/generation/configuration_utils.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * [test all] don't use a stand-alone attribute; fix test --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Joao Gante <joao@huggingface.co> Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2024-11-19 12:20:07 +00:00
Jiahao Li	0db91c3c8d	Support gradient checkpointing in Qwen2VL ViT (#34724 ) * Support gradient checkpointing in Qwen2VL ViT * Enable gradient checkpoint tests for Qwen2VL * [run-slow] qwen2_vl	2024-11-19 12:30:44 +01:00
Ke Wen	20142ab542	Simplify Tensor Parallel implementation with PyTorch TP (#34184 ) * Simplify Tensor Parallel implementation with PyTorch TP * Move tp_plan to config * Lint * Format and warning * Disable copy-from check * Conditionally get attr from config * make fix-copies * Move base_model_tp_plan to PretrainedConfig * Move TP into from_pretrained * Add device context for load * Do not serialize * Move _tp_plan setting to post_init * Add has_tp_plan * Add test_tp * Add 'Multi-gpu inference' doc * Add backward support for device type identification * Auto-detect accelerator * supports_tp_plan * copyright year * Fix copy	2024-11-18 19:51:49 +01:00
Dmitry Rogozhkin	1c471fc307	Fix skip of test_training_gradient_checkpointing (#34723 ) `19d58d31f` has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, test body was not moved under "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama this caused attribute errors. Fixes: #34722 Fixes: `19d58d31f` ("Add MLLama (#33703)") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2024-11-18 15:45:40 +01:00
Raushan Turganbay	1646ffb4d1	VLMs: `patch_size` -> `num_image_tokens` in processing (#33424 ) * use num additional tokens * fix copies + docs * another fix copies :) * add docs * move order for BC	2024-11-18 13:21:07 +01:00
Shane A	3ee24e2208	Add OLMo November 2024 (#34551 ) * Add model skeletion with transformers-cli add-new-model-like * Convert config to modular, add rms_norm_eps, delete clip_qkv * Convert model to modular, add RMSNorm * Add flash attention with qk norm and no qkv clipping * Add decoder layer with RMSNorm after attention/feedforward layers * Add base and causal model * Add converter improvements from OLMo repo * Update weight loading in OLMo to HF converter * Set correct default for rms_norm_eps * Set correct pipeline_model_mapping in test * Run make fixup * Fix model type * Re-run modular conversion * Manually set config docs to fix build errors * Convert olmo-1124 to olmo_1124 to fix flash attention docs errors * Start updating tests * Update tests * Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124 * Rename input_layernorm and post_attention_layernorm to reflect their ops better * Use correct tokenizer * Remove test unsupported by GPT2 tokenizer * Create GenerationConfig outside of from_pretrained call * Use simpler init file structure * Add explicit __all__ to support simplified init * Make safetensor serialization the default * Update OLMo November 2024 docs	2024-11-18 10:43:10 +01:00

1 2 3 4 5 ...

4407 Commits