Commit Graph

4520 Commits

Author SHA1 Message Date
Yih-Dar
dce9970884
Update test_flash_attn_2_can_dispatch_composite_models (#36050)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-06 12:09:49 +01:00
Yaswanth Gali
7aee036e54
Iterative generation using Input embeds and past_key_values (#35890)
* Iterative generation using input embeds

* ruff fix

* Added Testcase

* Updated comment

* ♻️ Refactored testcase

* Skip test for these models

* Continue generation using input embeds and cache

* Skip generate_continue_from_embeds test

* Refactor `prepare_inputs_for_generation` func

* Continue generation using input embeds and cache

* Modular changes fix

* Overwrite 'prepare_inputs_for_generation' function
2025-02-06 11:06:05 +01:00
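
A minimal sketch of the pattern this PR enables: generating from `inputs_embeds` and keeping the returned cache for a follow-up call. The checkpoint is a placeholder, and the continuation call's exact contract is hedged; see the PR for the tested models.

```python
# Sketch: generate from input embeddings; "gpt2" is a placeholder checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
# Embed the prompt ourselves instead of passing input_ids.
inputs_embeds = model.get_input_embeddings()(inputs.input_ids)

out = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs.attention_mask,
    max_new_tokens=8,
    return_dict_in_generate=True,
)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))

# With this PR, a second call can continue from the cache together with the
# embeddings of the new tokens (exact contract hedged; see the PR tests):
# next_embeds = model.get_input_embeddings()(next_ids)
# model.generate(inputs_embeds=next_embeds, past_key_values=out.past_key_values, ...)
```
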
Sambhav Dixit
0de15c988b
Fix Audio Classification Pipeline top_k Documentation Mismatch and Bug #35736 (#35771)
* added condition for top_k Doc mismatch fix

* initialization of test file for top_k changes

* added test for returning all labels

* added test for few labels

* tests/test_audio_classification_top_k.py

* final fix

* ruff fix

---------

Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
2025-02-05 16:25:08 +00:00
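
The fix aligns `top_k` behavior with the documentation. A hedged sketch (checkpoint and audio file are placeholders):

```python
from transformers import pipeline

# Any audio-classification checkpoint works; this one is a placeholder choice.
classifier = pipeline("audio-classification", model="superb/hubert-base-superb-er")

top5 = classifier("sample.wav", top_k=5)           # at most five labels
everything = classifier("sample.wav", top_k=None)  # all labels, per the fixed docs
print(len(top5), len(everything))
```
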
Stas Bekman
9dc1efa5d4
DeepSpeed github repo move sync (#36021)
deepspeed github repo move
2025-02-05 08:19:31 -08:00
Yoni Gozlan
fa56dcc2ab
Refactoring of ImageProcessorFast (#35069)
* add init and base image processing functions

* add add_fast_image_processor to transformers-cli

* add working fast image processor clip

* add fast image processor to doc, working tests

* remove "to be implemented" SigLip

* fix unprotected import

* fix unprotected vision import

* update ViTImageProcessorFast

* increase threshold for slow/fast equivalence

* add fast img blip

* add fast class in tests with cli

* improve cli

* add fast image processor convnext

* add LlavaPatchingMixin and fast image processor for llava_next and llava_onevision

* add device kwarg to ImagesKwargs for fast processing on cuda

* cleanup

* fix unprotected import

* group images by sizes and add batch processing

* Add batch equivalence tests, skip when center_crop is used

* cleanup

* update init and cli

* fix-copies

* refactor convnext, cleanup base

* fix

* remove patching mixins, add piped torchvision transforms for ViT

* fix unbatched processing

* fix f strings

* protect imports

* change llava onevision to class transforms (test)

* fix convnext

* improve formatting (following Pavel review)

* fix handling device arg

* improve cli

* fix

* fix inits

* Add distinction between preprocess and _preprocess, and support for arbitrary kwargs through valid_extra_kwargs

* uniformize qwen2_vl fast

* fix docstrings

* add add fast image processor llava

* remove min_pixels max_pixels from accepted size

* nit

* nit

* refactor fast image processors docstrings

* cleanup and remove fast class transforms

* update add fast image processor transformers cli

* cleanup docstring

* uniformize pixtral fast and  make _process_image explicit

* fix prepare image structure llava next/onevision

* Use typed kwargs instead of explicit args

* nit fix import Unpack

* clearly separate pops and gets in base preprocess. Use explicit typed kwargs

* make qwen2_vl preprocess arguments hashable
2025-02-04 17:52:31 -05:00
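
A hedged sketch of the refactored fast path: `use_fast=True` selects the torchvision-backed processor, and the new `device` kwarg lets preprocessing run on GPU. The checkpoint and image path are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor

# CLIP was among the first fast conversions; the checkpoint is a placeholder.
processor = AutoImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", use_fast=True
)
image = Image.open("cat.png")

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = processor(images=[image], return_tensors="pt", device=device)
print(type(processor).__name__, batch["pixel_values"].shape, batch["pixel_values"].device)
```
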
David
8d73a38606
Add DAB-DETR for object detection (#30803)
* initial commit

* encoder+decoder layer changes WIP

* architecture checks

* working version of detection + segmentation

* fix modeling outputs

* fix return dict + output att/hs

* found the position embedding masking bug

* pre-training version

* added image processors

* typo in init.py

* iterupdate set to false

* fixed num_labels in class_output linear layer bias init

* multihead attention shape fixes

* test improvements

* test update

* dab-detr model_doc update

* dab-detr model_doc update2

* test fix:test_retain_grad_hidden_states_attentions

* config file clean and renaming variables

* config file clean and renaming variables fix

* updated convert_to_hf file

* small fixes

* style and quality checks

* return_dict fix

* Merge branch main into add_dab_detr

* small comment fix

* skip test_inputs_embeds test

* image processor updates + image processor test updates

* check copies test fix update

* updates for check_copies.py test

* updates for check_copies.py test2

* tied weights fix

* fixed image processing tests and fixed shared weights issues

* added numpy ndarray option to get_expected_values method in test_image_processing_dab_detr.py

* delete prints from test file

* SafeTensor modification to solve HF Trainer issue

* removing the safetensor modifications

* make fix-copies and HF upload have been added.

* fixed index.md

* fixed repo consistency

* style fix and DAB-DETR image processor docstring update

* requested modifications after the first review

* Update src/transformers/models/dab_detr/image_processing_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* repo consistency has been fixed

* update copied NestedTensor function after main merge

* Update src/transformers/models/dab_detr/modeling_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* temp commit

* temp commit2

* temp commit 3

* unit tests are fixed

* fixed repo consistency

* updated expected_boxes variable values based on related notebook results in DABDETRIntegrationTests file.

* temporary config modifications and repo consistency fixes

* Put dilation parameter back to config

* pattern embeddings have been added to the rename_keys method

* add dilation comment to config + add as an exception in check_config_attributes SPECIAL CASES

* delete FeatureExtractor part from docs.md

* requested modifications in modeling_dab_detr.py

* [run_slow] dab_detr

* deleted last segmentation code part, updated conversion script and changed the hf path in test files

* temp commit of requested modifications

* temp commit of requested modifications 2

* updated config file, resolved codepaths and refactored conversion script

* updated decodelayer block types and refactored conversion script

* style and quality update

* small modifications based on the request

* attentions are refactored

* removed loss functions from modeling file, added loss function to lossutils, tried to move the MLP layer generation to config but it failed

* deleted imageprocessor

* fixed conversion script + quality and style

* fixed config_att

* [run_slow] dab_detr

* changing model path in conversion file and in test file

* fix Decoder variable naming

* testing the old loss function

* switched back to the new loss function and testing with the old attention functions

* switched back to the new last good result modeling file

* moved back to the version when I asked the review

* missing new line at the end of the file

* old version test

* turn back to newest model version but change image processor

* style fix

* style fix after merge main

* [run_slow] dab_detr

* [run_slow] dab_detr

* added device and type for head bias data part

* [run_slow] dab_detr

* fixed model head bias data fill

* changed test_inference_object_detection_head assertTrues to torch test assert_close

* fixes part 1

* quality update

* self.bbox_embed in decoder has been restored

* changed assertTrue torch.allclose methods to torch.testing.assert_close

* modelcard markdown file has been updated

* deleted intermediate list from decoder module

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-04 17:28:27 +00:00
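
DAB-DETR plugs into the standard DETR-style detection API. A hedged sketch — the checkpoint name follows the IDEA-Research path used in the conversion script and test files but is an assumption, and the image path is a placeholder:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

model_id = "IDEA-Research/dab-detr-resnet-50"  # assumed hub path
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForObjectDetection.from_pretrained(model_id)

image = Image.open("street.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# image.size is (w, h); post-processing expects (h, w).
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 2), box.tolist())
```
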
Yih-Dar
fe52679e74
Update tests regarding attention types after #35235 (#36024)
* update

* update

* update

* dev-ci

* more changes

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 18:04:47 +01:00
Marc Sun
9f486badd5
Display warning for unknown quants config instead of an error (#35963)
* add supports_quant_method check

* fix

* add test and fix suggestions

* change logic slightly

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-02-04 15:17:01 +01:00
Sumit Vij
bc9a6d8302
Fix device mismatch error in Whisper model during feature extraction (#35866)
* Fix device mismatch error in whisper feature extraction

* Set default device

* Address code review feedback

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-02-04 12:23:08 +01:00
Cyril Vallez
9afb904b15
Refactor (and fix) gpt_neox (#35610)
* start a nice modular

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* update

* Update modular_gpt_neox.py

* convert

* fix attribute

* fix attrs

* oups

* fix

* fix

* fix

* fix

* fix

* fix order to pass test (see with accelerate team)

* trigger CIs

* modular

* update

* up

* Update test_modeling_gpt_neox.py

* Update test_modeling_gpt_neox.py

* trigger CIs

* correctly pass arg

* simplify

* remove key warning

* update tp -> it's compatible since the view is before

* trigger CIs
2025-02-04 11:18:43 +01:00
Ryoo Kwangrok
b1954fd64a
layernorm_decay_fix (#35927)
* layernorm_decay_fix

* W293 fix

* ruff format fix

* black format

* ruff format

* erase last layer

* add test_get_parameter_names_rmsnorm

* rmsnorm fix
2025-02-04 11:01:49 +01:00
Dmitry Tarasov
2ba040a71f
apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True and return_tensors=True (#35582)
* apply_chat_template: consistent return_tensors behaviour with return_assistant_tokens_mask flag

* test_chat_template_return_assistant_tokens_mask: support tokenizers with no attention mask

* test_chat_template_return_assistant_tokens_mask: skip tokenizers with no padding token

* test_chat_template_return_assistant_tokens_mask: force tokenizer padding_side=right

---------

Co-authored-by: Eduard Allakhverdov <goncharova@airi.net>
Co-authored-by: d.tarasov <d.tarasov@airi.net>
2025-02-04 10:27:52 +01:00
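
A hedged sketch of the now-consistent combination. Note that `return_assistant_tokens_mask` requires a chat template with `{% generation %}` blocks, and the checkpoint below is hypothetical:

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint whose chat template marks assistant turns with
# {% generation %} ... {% endgeneration %}.
tokenizer = AutoTokenizer.from_pretrained("org/chat-model-with-generation-template")

messages = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello, how can I help?"},
]
out = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    return_assistant_tokens_mask=True,
)
# After the fix, the mask lines up with input_ids even when tensors are requested.
print(out["input_ids"].shape, out["assistant_masks"].shape)
```
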
Raushan Turganbay
5d75a25b03
Qwen2-VL: fix rope delta calculation (#36013)
* fix rope deltas calculation

* add test

* style
2025-02-04 09:48:29 +01:00
Alex Brooks
e284c7e954
Update Granite Vision Model Path / Tests (#35998)
* Update granite vision model path

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Enable granite vision test

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

---------

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-03 20:06:03 +01:00
Arthur
7eecdf2a86
Update-tp test (#35844)
* update test for now

* up

* cleanup

* update todo
2025-02-03 09:37:02 +01:00
Yoni Gozlan
2b46943195
Add GOT-OCR 2.0 to Transformers (#34721)
* init modular got_ocr2

* Get correct got_ocr architecture

* add processing

* run modular with processing

* add working inference

* apply modular

* Refactor and fix style

* Refactor, cleanup, fix style

* fix init order

* Fix docs

* add base modeling tests

* fix style and consistency

* rename doc file

* fix repo consistency

* fix inference with box

* add image processing and support for crop_to_multi_page

* Fix batch inference

* add tests

* fixup

* fix slow test

* fix docstrings

* Add model doc

* update to new init

* fix input autocast pixel_values dtype

* update doc

* move doc to multimodal

* Reformat crop_image_to_patches and add docstrings

* Fix example in forward docstring

* Address Pablo review

* [run slow] got_ocr2

* remove defaults defined twice

* apply modular

* add torch_device to integration tests

* update modular

* follow-up Pavel review

* add device variable in doc

* fix doc multi-page

* Force eager attention for vision encoder to avoid attn implementation conflict

* revert qwen2vl doc changes

* use Qwen2ForCausalLM instead of Qwen2Model

* make fixup

* refactor gotocr2 to llava style

* uniformize function names and reduce checks

* final nits

* fix pixel_values dtype error

* change checkpoint names

* fix modular
2025-01-31 11:28:13 -05:00
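
A hedged usage sketch for GOT-OCR 2.0; the checkpoint name reflects the "change checkpoint names" commit but is an assumption, and the image path is a placeholder:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "stepfun-ai/GOT-OCR-2.0-hf"  # assumed final checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

inputs = processor("document.png", return_tensors="pt")
generated = model.generate(**inputs, do_sample=False, max_new_tokens=256)
# Decode only the newly generated tokens, past the prompt.
text = processor.decode(
    generated[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(text)
```
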
Yoni Gozlan
d7188ba600
Add support for nested images to LLava and VipLLava (#35558)
* move make_flat_list_of_images and make_batched_videos to image_utils

* remove unnecessary is_vision_available

* move make_nested_list_of_images to image_utils

* fix fast pixtral image processor

* fix import mllama

* fix make_nested_list_of_images

* add tests

* convert 4d arrays/tensors to list

* add test_make_batched_videos

* add support nested batch of videos

* fix image processing qwen2vl
2025-01-30 16:49:20 -05:00
Marcel
e4227eb4d4
Handle empty change indices in SAM's mask to rle conversion (#35665)
* Handle empty change indices in RLE conversion for masks

* [test] Add unit tests for RLE encoding of masks in SamProcessor

* [test] Update RLE conversion tests to use TensorFlow implementation

* [test] Fix formatting in SamProcessorTest according to check_code_quality action

* [test] Fix formatting in SamProcessorTest according to check_code_quality

* [test] Refactored rle test cases into one test and used tf tensors in tf test cases

* [test] Fix: removed self parameter from refactored methods

* [test] Removed nested methods in run-length encoding tests for PyTorch and TensorFlow

* [test] Added descriptions to individual run-length encoding tests for PyTorch and TensorFlow.
2025-01-30 19:08:38 +00:00
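
For context, the bug lives in the run-length encoding step: an all-zero or all-one mask has no change indices at all. A simplified row-major sketch (the real SAM implementation follows the COCO column-major convention):

```python
import numpy as np

def mask_to_rle(mask):
    """Toy RLE: counts of alternating runs, starting with a run of zeros."""
    flat = mask.flatten()
    # Positions where the value changes; EMPTY for constant masks - the
    # edge case this commit handles.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    counts = np.diff(np.concatenate(([0], change, [flat.size]))).tolist()
    if flat[0] == 1:  # encoding starts with a (possibly empty) run of zeros
        counts = [0] + counts
    return counts

print(mask_to_rle(np.zeros((2, 3), dtype=np.uint8)))                  # [6]
print(mask_to_rle(np.array([[0, 1, 1], [0, 0, 1]], dtype=np.uint8)))  # [1, 2, 2, 1]
```
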
Yih-Dar
5757681837
Less flaky for TimmBackboneModelTest::test_batching_equivalence (#35971)
* fix

* remove is_flaky

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-30 16:56:26 +01:00
Raushan Turganbay
365fecb4d0
Whisper: fix static cache CI (#35852)
* fix

* remove overriden method

* small change
2025-01-30 12:43:00 +01:00
Raushan Turganbay
9725e5be2f
Pixtral: vectorize patch embeddings and enable tests (#35122)
* initial POC

* - batch mix feature

* fix tests

* fix tests

* make style

* do not skip and instead fix tests

* update

* return back the test

* correct text with the correct ckpt
2025-01-30 12:40:18 +01:00
Joao Gante
8bc4c89ee9
[bart] minor test fixes (#35965)
fix tests
2025-01-30 10:00:11 +00:00
Joao Gante
4d3b1076a1
[generate] move max time tests (#35962)
* move max time tests to their right place

* move test to the right place
2025-01-29 17:56:46 +00:00
Fanli Lin
f0ae65c198
[tests] further fix Tester object has no attribute '_testMethodName' (#35781)
* bug fix

* update with more cases

* more entries

* Fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 16:05:33 +01:00
Jonas Rohw
23d782ead2
Output dicts support in text generation pipeline (#35092)
* Support for generate argument return_dict_in_generate=True instead of returning an error

* fix: call test with return_dict_in_generate=True

* fix: Only import torch if it is present

* update: Encapsulate output_dict changes

* fix: added back original comments

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-01-29 14:44:46 +00:00
Yih-Dar
cf90404807
Fix flaky test_assisted_decoding_matches_greedy_search (#35951)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 14:50:07 +01:00
Yih-Dar
c600e89f5c
Update unwrap_and_save_reload_schedule to use weights_only=False (#35952)
* fix

* Fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 14:30:57 +01:00
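
For context: torch 2.6 flips `torch.load`'s default to `weights_only=True`, which rejects arbitrary pickled Python objects, so test helpers that round-trip scheduler state must opt out explicitly. A minimal illustration:

```python
import torch

class SchedulerState:  # an arbitrary object, not on the weights_only allowlist
    def __init__(self, step):
        self.step = step

torch.save(SchedulerState(3), "sched.pt")
# weights_only=True (the new default) would raise here; opt out for trusted files.
state = torch.load("sched.pt", weights_only=False)
print(state.step)
```
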
Nadav Timor
42c8ccfd4c
fix test_generated_length_assisted_generation (#34935)
fix test_generated_length_assisted_generation
2025-01-29 12:03:45 +00:00
Joao Gante
ece8c42488
Test: generate with torch.compile(model.forward) as a fast test (#34544)
2025-01-28 14:10:38 +00:00
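
The pattern under test, as a hedged sketch (placeholder checkpoint): compiling only the forward pass keeps the compile surface small, so generation can be smoke-tested quickly.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Compile just the forward pass rather than wrapping generate() itself.
model.forward = torch.compile(model.forward)

ids = tokenizer("Hello", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0]))
```
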
Cyril Vallez
f48ecd7608
Fix TP initialization (#35860)
* fix tp

* Update modeling_utils.py

* style

* style

* Update test_tp.py

* Update test_tp.py

* style

* Update test_tp.py

* Update test_tp.py

* Update test_tp.py

* Update test_tp.py
2025-01-28 15:07:37 +01:00
Raushan Turganbay
f85ba20449
Qwen-2-5-VL: fix CI (#35935)
fix
2025-01-28 14:51:57 +01:00
Cyril Vallez
3f860dba55
Fix mask slicing for models with HybridCache (#35681)
* correctly slice

* check mask

* Update modular_gemma2.py

* fix

* add tests

* fix typo

* finally fix mask slicing

* Finally correctly slice in all cases!!

* add test for all attention functions

* small fix in tests

* trick around dynamo tracing issue

* last update

* more robust

* kwargs propagation

* make it explicit for checkpointing

* apply modular
2025-01-28 14:35:00 +01:00
Raushan Turganbay
b764c20b09
Fix: loading DBRX back from saved path (#35728)
* fix dtype as dict for some models + add test

* add comment in tests
2025-01-28 11:38:45 +01:00
Isotr0py
e57b459997
Split and clean up GGUF quantization tests (#35502)
* clean up ggml test

Signed-off-by: Isotr0py <2037008807@qq.com>

* port remaining tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* further cleanup

Signed-off-by: Isotr0py <2037008807@qq.com>

* format

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix broken tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* update comment

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* reorganize tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* k-quants use qwen2.5-0.5B

Signed-off-by: Isotr0py <2037008807@qq.com>

* move ggml tokenization test

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove dead code

Signed-off-by: Isotr0py <2037008807@qq.com>

* add assert for serialization test

Signed-off-by: Isotr0py <2037008807@qq.com>

* use str for parameterize

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-27 15:46:57 +01:00
Mikhail Moskovchenko
5450e7c84a
🔴 🔴 🔴 Added segmentation maps support for DPT image processor (#34345)
* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements
2025-01-27 15:14:00 +01:00
pglorio
33cb1f7b61
Add Zamba2 (#34517)
* First commit

* Finish model implementation

* First commit

* Finish model implementation

* Register zamba2

* generated modeling and configuration

* generated modeling and configuration

* added hybrid cache

* fix attention_mask in mamba

* dropped unused loras

* fix flash2

* config docstrings

* fix config and fwd pass

* make fixup fixes

* text_modeling_zamba2

* small fixes

* make fixup fixes

* Fix modular model converter

* added inheritances in modular, renamed zamba cache

* modular rebase

* new modular conversion

* fix generated modeling file

* fixed import for Zamba2RMSNormGated

* modular file cleanup

* make fixup and model tests

* dropped inheritance for Zamba2PreTrainedModel

* make fixup and unit tests

* Add inheritance of rope from GemmaRotaryEmbedding

* moved rope to model init

* drop del self.self_attn and del self.feed_forward

* fix tests

* renamed lora -> adapter

* rewrote adapter implementation

* fixed tests

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Dropped adapter in-place sum

* removed rope from attention init

* updated rope

* created get_layers method

* make fixup fix

* make fixup fixes

* make fixup fixes

* update to new attention standard

* update to new attention standard

* make fixup fixes

* minor fixes

* cache_position

* removed cache_position postion_ids use_cache

* remove config from modular

* removed config from modular (2)

* import apply_rotary_pos_emb from llama

* fixed rope_kwargs

* Instantiate cache in Zamba2Model

* fix cache

* fix @slow decorator

* small fix in modular file

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* several minor fixes

* inherit mamba2decoder fwd and drop position_ids in mamba

* removed docstrings from modular

* reinstate zamba2 attention decoder fwd

* use regex for tied keys

* Revert "use regex for tied keys"

This reverts commit 9007a522b1.

* use regex for tied keys

* add cpu to slow forward tests

* dropped config.use_shared_mlp_adapter

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* re-convert from modular

---------

Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-27 10:51:23 +01:00
Arthur
b912f5ee43
use torch.testing.assert_close instead to get more details about errors in CIs (#35659)
* use torch.testing.assert_close instead to get more details about errors in CIs

* fix

* style

* test_all

* revert for I bert

* fixes and updates

* more image processing fixes

* more image processors

* fix mamba and co

* style

* less strict

* ok I won't be strict

* skip and be done

* up
2025-01-24 16:55:28 +01:00
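
The motivation in one snippet: `torch.testing.assert_close` reports which elements diverge and by how much, where `assertTrue(torch.allclose(...))` only says "False is not true".

```python
import torch

expected = torch.tensor([1.0, 2.0, 3.0])
actual = torch.tensor([1.0, 2.0, 3.1])

# Old pattern - a failure carries no diagnostics:
#     self.assertTrue(torch.allclose(actual, expected))
# New pattern - a failure prints the mismatched-element count and the
# greatest absolute/relative difference:
torch.testing.assert_close(actual, expected, rtol=0.0, atol=0.2)  # passes here
```
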
CalOmnie
b5aaf87509
Fix test_pipelines_video_classification that was always failing (#35842)
* Fix test_pipelines_video_classification that was always failing

* Update video pipeline docstring to reflect actual return type

---------

Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-23 19:22:32 +01:00
Alex Brooks
71cc8161b2
Granite Vision Support (#35579)
* Add multimodal granite support

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Support multiple image feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Remove failing validation for visual encoders with no cls

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Update llava based models / configs to support list of feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add tests for multiple feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use conditional instead of except for misaligned feature shapes

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* crop cls from each hidden state

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Fix formatting

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Support single vision feature int in vipllava

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix typo in vision feature selection strategy validation

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Add tentative integration test for granite vision models

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Add granite vision docs

Replace multimodal granite refs with granite vision

Add granite vision / llava next alias

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Use image url in granitevision example

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

---------

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-01-23 17:15:52 +01:00
Arthur
8f1509a96c
Fix more CI tests (#35661)
add tooslow for the fat ones
2025-01-23 14:45:42 +01:00
ShuaiBai623
f3f6c86582
add qwen2.5vl (#35569)
* add qwen2.5vl

* fix

* pass check table

* add modular file

* fix style

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* add copy check

* use modular

* fix

* fix

* fix

* update flashatt2&sdpa support_list

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update config

* update

* fix hf path

* rename Qwen2_5_VLVideosKwargs

* fix

* fix

* update

* executed modular

* rollback init

* fix

* formatted

* simpler init

* fix

* fix

* fix

* fix

* fix

* update docs

* fix

* fix

* update Qwen2VLRotaryEmbedding for yarn

* fix

---------

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: gewenbin0992 <gewenbin292@163.com>
Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com>
2025-01-23 11:23:00 +01:00
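
A hedged inference sketch for Qwen2.5-VL; the checkpoint name, image URL, and content key names are assumptions:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(out[0], skip_special_tokens=True))
```
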
Cyril Vallez
d3af76df58
[Backend support] Allow num_logits_to_keep as Tensor + add flag (#35757)
* support

* Update modeling_utils.py

* style

* most models

* Other models

* fix-copies

* tests + generation utils
2025-01-23 09:47:54 +01:00
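
What the kwarg does, as a hedged sketch (placeholder checkpoint; per-model support varies): computing logits only for the last position skips most of the `lm_head` matmul, and accepting a `Tensor` keeps traced or exported graphs happy.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tokenizer("Hello world", return_tensors="pt").input_ids

# Previously an int; this PR also accepts a 0-dim tensor.
out = model(ids, num_logits_to_keep=torch.tensor(1))
print(out.logits.shape)  # (1, 1, vocab_size) - only the last position
```
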
Arthur
8736e91ad6
[ tests] remove some flash attention class tests (#35817)
remove class from tests
2025-01-23 09:44:21 +01:00
Mohamed Mekkouri
a7738f5a89
Fix : Nemotron tokenizer for GGUF format (#35836)
fix nemotron gguf
2025-01-22 12:28:40 +01:00
Joao Gante
36c9181f5c
[gpt2] fix generation tests (#35822)
fix gpt2 generation tests
2025-01-22 09:41:04 +00:00
Raushan Turganbay
373e50e970
Init cache on meta device (#35164)
* init cache on meta device

* offloaded static + enable tests

* tests weren't running before  :(

* update

* fix mamba

* fix copies

* update

* address comments and fix tests

* fix copies

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update

* mamba fix

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-22 09:49:17 +01:00
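
One of the paths this touches, sketched: selecting a static (or offloaded-static) cache through `generate`, which now initializes on the right device. Placeholder checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tokenizer("Hello", return_tensors="pt").input_ids

# "static" preallocates the cache; "offloaded_static" keeps it on CPU and
# streams layers in - the variant whose tests this PR enables.
out = model.generate(ids, max_new_tokens=5, cache_implementation="static")
print(tokenizer.decode(out[0]))
```
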
Joao Gante
90b46e983f
Remove old benchmark code (#35730)
* remove traces of the old deprecated benchmarks

* also remove old tf benchmark example, which uses deleted code

* run doc builder
2025-01-21 17:56:43 +00:00
eustlb
870eb7b41b
[Mimi] update test expected values for t4 runners (#35696)
update values for t4
2025-01-21 18:23:36 +01:00
Yoni Gozlan
107f9f5127
add Qwen2-VL image processor fast (#35733)
* add qwen2_vl image processor fast

* add device to ImagesKwargs

* remove automatic fix copies

* fix fast_is_faster_than_slow

* remove unnecessary import
2025-01-21 11:49:05 -05:00
Mohamed Mekkouri
dbd8474125
Fix : BLOOM tie_word_embeddings in GGUF (#35812)
* fix bloom ggml

* fix falcon output

* make style
2025-01-21 15:35:54 +01:00
fzyzcjy
dc10f7906a
Support adamw_torch_8bit (#34993)
* var

* more

* test
2025-01-21 14:17:49 +01:00
Yih-Dar
fd8d61fdb2
Byebye test_batching_equivalence's flakiness (#35729)
* fix

* fix

* skip

* better error message

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-21 13:11:33 +01:00
NielsRogge
78f5ee0217
Add LlavaImageProcessor (#33191)
* First draft

* Add equivalence test

* Update docstrings

* Add tests

* Use numpy

* Fix tests

* Improve variable names

* Improve docstring

* Add link

* Remove script

* Add copied from

* Address comment

* Add note in docs

* Add docstring, data format

* Improve test

* Add test

* update

* Update src/transformers/models/llava/image_processing_llava.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/llava/image_processing_llava.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* loop once only

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-21 12:47:04 +01:00
Aymeric Roucher
44393df089
Tool calling: support more types (#35776)
* Tool calling: support NoneType for function return type
2025-01-20 19:15:34 +01:00
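
The case in question, sketched with the public schema helper:

```python
from transformers.utils import get_json_schema

def set_volume(level: int) -> None:
    """
    Set the speaker volume.

    Args:
        level: Volume from 0 to 10.
    """

# Before this change, a None return annotation tripped the schema generator;
# now it produces a valid tool schema.
print(get_json_schema(set_volume))
```
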
jiqing-feng
f19135afc7
fix low-precision audio classification pipeline (#35435)
* fix low-precision audio classification pipeline

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torch import

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torch import

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-20 16:20:51 +00:00
jiqing-feng
641238eb76
Fix vits low-precision dtype (#35418)
* fix vits dtype

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* use weight dtype

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-20 16:19:31 +00:00
jiqing-feng
729b569531
fix document qa bf16 pipeline (#35456)
* fix document qa bf16 pipeline

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-20 16:18:07 +00:00
eustlb
5f0f4b1b93
Patch moonshine (#35731)
* update expected logits for T4 runners

* update doc

* correct order of the args for better readability

* remove generate wrap

* convert modular
2025-01-20 16:19:29 +01:00
Mohamed Mekkouri
b80e334e71
Skip Falcon 7B GGML Test (#35783)
skip test
2025-01-20 15:00:34 +01:00
StevenBucaille
abe57b6f17
Add SuperGlue model (#29886)
* Initial commit with template code generated by transformers-cli

* Multiple additions to SuperGlue implementation :

- Added the SuperGlueConfig
- Added the SuperGlueModel and its implementation
- Added basic weight conversion script
- Added new ImageMatchingOutput dataclass

* Few changes for SuperGlue

* Multiple changes :
- Added keypoint detection config to SuperGlueConfig
- Completed convert_superglue_to_pytorch and successfully ran inference

* Reverted unintentional change

* Multiple changes :
 - Added SuperGlue to a bunch of places
 - Divided SuperGlue into SuperGlueForImageMatching and SuperGlueModel
 - Added testing images

* Moved things in init files

* Added docs (to be finished depending on the final implementation)

* Added necessary imports and some doc

* Removed unnecessary import

* Fixed make fix-copies bug and ran it

* Deleted SuperGlueModel
Fixed convert script

* Added SuperGlueImageProcessor

* Changed SuperGlue to support batching pairs of images and modified ImageMatchingOutput accordingly

* Changed convert_superglue_to_hf.py script to experiment with different ways of reading an image and assess their impact on performance

* Added initial tests for SuperGlueImageProcessor

* Added AutoModelForImageMatching in missing places and tests

* Fixed keypoint_detector_output instructions

* Fix style

* Adapted to latest main changes

* Added integration test

* Fixed bugs to pass tests

* Added keypoints returned by keypoint detector in the output of SuperGlue

* Added doc to SuperGlue

* SuperGlue returning all attention and hidden states for a fixed number of keypoints

* Make style

* Changed SuperGlueImageProcessor tests

* Revert "SuperGlue returning all attention and hidden states for a fixed number of keypoints"
Changed tests accordingly

This reverts commit 5b3b669c

* Added back hidden_states and attentions masked outputs with tests

* Renamed ImageMatching occurences into KeypointMatching

* Changed SuperGlueImageProcessor to raise error when batch_size is not even

* Added docs and clarity to hidden state and attention grouping function

* Fixed some code and done refactoring

* Fixed typo in SuperPoint output doc

* Fixed some of the formatting and variable naming problems

* Removed useless function call

* Removed AutoModelForKeypointMatching

* Fixed SuperGlueImageProcessor to only accept pairs of images

* Added more fixes to SuperGlueImageProcessor

* Simplified the batching of attention and hidden states

* Simplified stack functions

* Moved attention instructions into class

* Removed unused do_batch_norm argument

* Moved weight initialization to the proper place

* Replaced deepcopy for instantiation

* Fixed small bug

* Changed from stevenbucaille to magic-leap repo

* Renamed London Bridge images to Tower Bridge

* Fixed formatting

* Renamed remaining "london" to "tower"

* Apply suggestions from code review

Small changes in the docs

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added AutoModelForKeypointMatching

* Changed images used in example

* Several changes to image_processing_superglue and style

* Fixed resample type hint

* Changed SuperGlueImageProcessor and added test case for list of 2 images

* Changed list_of_tuples implementation

* Fix in dummy objects

* Added normalize_keypoint, log_sinkhorn_iterations and log_optimal_transport docstring

* Added missing docstring

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved forward block at bottom

* Added docstring to forward method

* Added docstring to match_image_pair method

* Changed test_model_common_attributes to test_model_get_set_embeddings test method signature

* Removed AutoModelForKeypointMatching

* Removed image fixtures and added load_dataset

* Added padding of images in SuperGlueImageProcessor

* Cleaned up convert_superglue_to_hf script

* Added missing docs and fixed unused argument

* Fixed SuperGlueImageProcessor tests

* Transposed all hidden states from SuperGlue to reflect the standard (..., seq_len, feature_dim) shape

* Added SuperGlueForKeypointMatching back to modeling_auto

* Fixed image processor padding test

* Changed SuperGlue docs

* changes:
 - Abstraction to batch, concat and stack of inconsistent tensors
 - Changed conv1d's to linears to match standard attention implementations
 - Renamed all tensors to be tensor0 and not tensor_0 and be consistent
 - Changed match image pair to run keypoint detection on all images first, create batched tensors, and then fill these tensors match by match
 - Various changes in docs, etc

* Changes to SuperGlueImageProcessor:
- Reworked the input image pairs checking function and added tests accordingly
- Added Copied from statements
- Added do_grayscale tag (also for SuperPointImageProcessor)
- Misc changes for better code

* Formatting changes

* Reverted conv1d to linear conversion because of numerical differences

* fix: changed some code to be more straightforward (e.g. filtering keypoints) and converted plot from opencv to matplotlib

* fix: removed unnecessary test

* chore: removed commented code and added back hidden states transpositions

* chore: changed from "inconsistent" to "ragged" function names as suggested

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* docs: applied suggestions

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* docs: updated to display matched output

* chore: applied suggestion for check_image_pairs_input function

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* chore: changed check_image_pairs_input function name to validate_and_format_image_pairs and used validate_preprocess_arguments function

* tests: simplified tests for image input format and shapes

* feat: converted SuperGlue's use of Conv1d with kernel_size of 1 with Linear layers. Changed tests and conversion script accordingly

* feat: several changes to address comments

Conversion script:
- Reverted fuse batchnorm to linear conversion
- Changed all 'nn.Module' to respective SuperGlue models
- Changed conversion script to use regex mapping and match other recent scripts

Modeling SuperGlue:
- Added batching with mask and padding to attention
- Removed unnecessary concat, stack and batch ragged pairs functions
- Reverted batchnorm layer
- Renamed query, key, value and merge layers into q, k, v, out proj
- Removed Union of different Module into nn.Module in _init_weights method typehint
- Changed several method's signature to combine image0 and image1 inputs with appropriate doc changes
- Updated SuperGlue's doc with torch.no_grad()

Updated test to reflect changes in SuperGlue model

* refactor: changed validate_and_format_image_pairs function with clarity

* refactor: changed from one SuperGlueMLP class to a list of SuperGlueMLP class

* fix: fixed forgotten init weight change from last commit

* fix: fixed rebase mistake

* fix: removed leftover commented code

* fix: added typehint and changed some of arguments default values

* fix: fixed attribute default values for SuperGlueConfig

* feat: added SuperGlueImageProcessor post process keypoint matching method with tests

* fix: fixed SuperGlue attention and hidden state tuples aggregation

* chore: fixed mask optionality and reordered tensor reshapes to be cleaner

* chore: fixed docs and error message returned in validate_and_format_image_pairs function

* fix: fixed returned keypoints to be the ones that SuperPoint returns

* fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue

* fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue (bis)

* fix: Changed SuperGlueMultiLayerPerceptron instantiation to avoid if statement

* fix: Changed convert_superglue_to_hf script to reflect latest SuperGlue changes and got rid of nn.Modules

* WIP: implement Attention from an existing class (like BERT)

* docs: Changed docs to include more appealing matching plot

* WIP: Implement Attention

* chore: minor typehint change

* chore: changed convert superglue script by removing all classes and apply conv to linear conversion in state dict + rearrange keys to comply with changes in model's layers organisation

* Revert "Fixed typo in SuperPoint output doc"

This reverts commit 2120390e82.

* chore: added comments in SuperGlueImageProcessor

* chore: changed SuperGlue organization HF repo to magic-leap-community

* [run-slow] refactor: small change in layer instantiation

* [run-slow] chore: replaced remaining stevenbucaille org to magic-leap-community

* [run-slow] chore: make style

* chore: update image matching fixture dataset HF repository

* [run-slow] superglue

* tests: overwriting test_batching_equivalence

* [run-slow] superglue

* tests: changed test to cope with value changing depending on cuda version

* [run-slow] superglue

* tests: changed matching_threshold value

* [run-slow] superglue

* [run-slow] superglue

* tests: changed tests for integration

* [run-slow] superglue

* fix: Changed tensor view and permutations to match original implementation results

* fix: updated convert script and integration test to include last change in model

* fix: increase tolerance for CUDA variances

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* [run-slow] superglue

* chore: removed blank whitespaces

* [run-slow] superglue

* Revert accidental SuperPoint image processor changes

* [run-slow] superglue

* refactor: reverted copy from BERT class

* tests: lower the tolerance in integration tests for SuperGlue

* [run-slow] superglue

* chore: set do_grayscale to False in SuperPoint and SuperGlue image processors

* [run-slow] superglue

* fix: fixed imports in SuperGlue files

* chore: changed do_grayscale SuperGlueImageProcessing default value to True

* docs: added typehint to post_process_keypoint_matching method in SuperGlueImageProcessor

* fix: set matching_threshold default value to 0.0 instead of 0.2

* feat: added matching_threshold to post_process_keypoint_matching method

* docs: update superglue.md to include matching_threshold parameter

* docs: updated SuperGlueConfig docstring for matching_threshold default value

* refactor: removed unnecessary parameters in SuperGlueConfig

* fix: changed from matching_threshold to threshold

* fix: re-revert changes to make SuperGlue attention classes copies of BERT

* [run-slow] superglue

* fix: added missing device argument in post_processing method

* [run-slow] superglue

* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching (and docstring)

* fix: add device to image_sizes tensor instantiation

* tests: added checks on do_grayscale test

* chore: reordered and added Optional typehint to KeypointMatchingOutput

* LightGluePR suggestions:
- use `post_process_keypoint_matching` as default docs example
- add `post_process_keypoint_matching` in autodoc
- add `SuperPointConfig` import under TYPE_CHECKING condition
- format SuperGlueConfig docstring
- add device in convert_superglue_to_hf
- Fix typo
- Fix KeypointMatchingOutput docstring
- Removed unnecessary line
- Added missing SuperGlueConfig in __init__ methods

* LightGluePR suggestions:
- use batching to get keypoint detection

* refactor: processing images done in 1 for loop instead of 4

* fix: use @ instead of torch.einsum for scores computation

* style: added `# fmt: skip` to long tensor values

* refactor: rolled back validate_and_format_image_pairs valid and invalid cases to simpler ones

* refactor: prepare_imgs

* refactor: simplified `validate_and_format_image_pairs`

* docs: fixed doc

---------

Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-20 10:32:39 +00:00
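
A hedged end-to-end sketch, finishing with the `post_process_keypoint_matching` call the commits settle on. The checkpoint variant, image paths, and output key names are assumptions:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "magic-leap-community/superglue_outdoor"  # assumed variant name
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # AutoModelForKeypointMatching was removed

# SuperGlue consumes image pairs; placeholder Tower Bridge images.
images = [Image.open("tower_bridge_1.jpg"), Image.open("tower_bridge_2.jpg")]
inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One (height, width) pair per image in each matched pair.
sizes = [[(im.height, im.width) for im in images]]
matches = processor.post_process_keypoint_matching(outputs, sizes, threshold=0.2)[0]
print(matches["keypoints0"].shape, matches["matching_scores"].shape)  # key names hedged
```
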
NielsRogge
872dfbdd46
[ViTPose] Convert more checkpoints (#35638)
* Convert more checkpoints

* Update docs, convert huge variant

* Update model name

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Remove print statements

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Link to collection

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-20 11:29:47 +01:00
Raushan Turganbay
8571bb145a
Fix CI for VLMs (#35690)
* fix some easy test

* more tests

* remove logit check here also

* add require_torch_large_gpu in Emu3
2025-01-20 11:15:39 +01:00
Dmitry Rogozhkin
7d4b3ddde4
ci: fix xpu skip condition for test_model_parallel_beam_search (#35742)
`return unittest.skip()` used in `test_model_parallel_beam_search` in the
skip condition for XPU did not actually mark the test as skipped when running
under pytest:
* 148 passed, 1 skipped

Other tests use `self.skipTest()`. Reusing that approach and moving the
condition outside the loop (since it does not depend on it) lets the test be
skipped correctly for XPU:
* 148 skipped

Secondly, `device_map="auto"` is now implemented for XPU for IPEX>=2.5 and
torch>=2.6, so we can now enable these tests for XPU for new IPEX/torch
versions.

Fixes: 1ea3ad1ae ("[tests] use `torch_device` instead of `auto` for model testing (#29531)")

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-01-17 16:47:27 +01:00
Pavel Iakubovskii
099d93d2e9
Grounding DINO Processor standardization (#34853)
* Add input ids to model output

* Add text preprocessing for processor

* Fix snippet

* Add test for equivalence

* Add type checking guard

* Fixing typehint

* Fix test for added `input_ids` in output

* Add deprecations and "text_labels" to output

* Adjust tests

* Fix test

* Update code examples

* Minor docs and code improvement

* Remove one-liner functions and rename class to CamelCase

* Update docstring

* Fixup
2025-01-17 14:18:16 +00:00
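
A hedged sketch of the standardized flow, which now returns `text_labels` alongside scores and boxes and lets the processor accept a list of label lists (image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoModelForZeroShotObjectDetection, AutoProcessor

model_id = "IDEA-Research/grounding-dino-tiny"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

image = Image.open("cats.jpg")
text_labels = [["a cat", "a remote control"]]  # one label list per image
inputs = processor(images=image, text=text_labels, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# input_ids now travel on the model output, so they need not be passed again.
results = processor.post_process_grounded_object_detection(
    outputs, threshold=0.4, target_sizes=[(image.height, image.width)]
)[0]
for label, score, box in zip(results["text_labels"], results["scores"], results["boxes"]):
    print(label, round(score.item(), 2), box.tolist())
```
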
Pavel Iakubovskii
42b2857b01
OmDet Turbo processor standardization (#34937)
* Fix docstring

* Fix docstring

* Add `classes_structure` to model output

* Update omdet postprocessing

* Adjust tests

* Update code example in docs

* Add deprecation to "classes" key in output

* Types, docs

* Fixing test

* Fix missed clip_boxes

* [run-slow] omdet_turbo

* Apply suggestions from code review

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Make CamelCase class

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-01-17 14:10:19 +00:00
Pavel Iakubovskii
94ae9a8da1
OwlViT/Owlv2 post processing standardization (#34929)
* Refactor owlvit post_process_object_detection + add text_labels

* Fix copies in grounding dino

* Sync with Owlv2 postprocessing

* Add post_process_grounded_object_detection method to processor, deprecate post_process_object_detection

* Add test cases

* Move text_labels to processors only

* [run-slow] owlvit owlv2

* [run-slow] owlvit, owlv2

* Update snippets

* Update docs structure

* Update deprecated objects for check_repo

* Update docstring for post processing of image guided object detection
2025-01-17 13:58:28 +00:00
Cyril Vallez
ab1afd56f5
Fix some tests (#35682)
* cohere tests

* glm tests

* cohere2 model name

* create decorator

* update

* fix cohere2 completions

* style

* style

* style

* add cuda in comments
2025-01-17 12:10:43 +00:00
Ross Wightman
8c1b5d3782
🚨🚨🚨 An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope, optimize string search. (#35615)
* An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope, reduce number of characters searched on every load considerably.

* Fix fix on load issue

* Fix gamma/beta warning test

* A style complaint

* Improve efficiency of weight norm key rename. Add better comments about weight norm and layer norm renaming.

* Habitual elif redundant with the return
2025-01-16 17:25:44 -08:00
Joao Gante
94af1c0aa2
[generate] return Cache object even if passed in a legacy format (#35673)
* generate returns a Cache object by default

* fix tests

* fix test for encoder-decoder models
2025-01-16 17:06:24 +00:00
Joao Gante
2818307e93
[generate] can instantiate GenerationConfig(cache_implementation="static") (#35679)
fix failing instantiation
2025-01-16 17:04:54 +00:00
Joao Gante
aeeceb9916
[cache] add a test to confirm we can use cache at train time (#35709)
* add test

* augment test as suggested

* Update tests/utils/test_modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* rerun tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-16 17:02:34 +00:00
kang sheng
2cbcc5877d
Fix condition when GA loss bug fix is not performed (#35651)
* fix condition when GA loss bug fix is not performed

* max loss diff is 2.29

* fix typo

* add an extra validation that loss should not vary too much
2025-01-16 13:59:53 +01:00
jiqing-feng
387663e571
Enable gptqmodel (#35012)
* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update readme

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel needs to use checkpoint_format (#1)

* gptqmodel needs to use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix requires gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format again

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gptqmodel version (#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attributes (#7)

* fix format and tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix result check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* review: update docs (#10)

* review: update docs (#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update document (#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-15 14:22:49 +01:00
Raushan Turganbay
09d5f76274
Clean-up composite configs (#34603)
* remove manual assignment tie-word-embeddings

* remove another unused attribute

* fix tests

* fix tests

* remove unnecessary overwrites

* fix

* decoder=True

* clean pix2struct

* run-all

* forgot `_tied_weights_keys` when adding Emu3

* also Aria + fix-copies

* and clean aria
2025-01-15 10:04:07 +01:00
Mahdi Baghbanzadeh
c61fcde910
Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities (#35251)
* DataCollatorForLanguageModeling class was updated with new parameters that provide more control over token masking and replacement

* DataCollatorForLanguageModeling class was updated with new parameters that provide more control over token masking and replacement

* Addressed review comments, modified the docstring and made a test for the DataCollatorForLanguageModeling
2025-01-14 17:01:10 +00:00
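
A hedged sketch of the new knobs; the parameter names `mask_replace_prob` and `random_replace_prob` follow the PR description but are assumptions about the final API:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(
    tokenizer=tok,
    mlm=True,
    mlm_probability=0.15,
    mask_replace_prob=0.8,    # assumed name: fraction of selected tokens -> [MASK]
    random_replace_prob=0.1,  # assumed name: fraction -> random token
    # remainder (here 0.1) left unchanged, as in the original BERT recipe
)
batch = collator([tok("hello world"), tok("masked language modeling")])
print(batch["input_ids"].shape, batch["labels"].shape)
```
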
Mohamed Mekkouri
a11041ffad
Fix : add require_read_token for gemma2 gated model (#35687)
fix gemma2 gated model test
2025-01-14 11:47:05 +01:00
Mohamed Mekkouri
df2a812e95
Fix expected output for ggml test (#35686)
fix expected output
2025-01-14 11:46:55 +01:00
Mohamed Mekkouri
050636518a
Fix : HQQ config when hqq not available (#35655)
* fix

* make style

* adding require_hqq

* make style
2025-01-14 11:37:37 +01:00
Arthur
c23a1c1932
Add-helium (#35669)
* Add the helium model.

* Add a missing helium.

* And add another missing helium.

* Use float for the rmsnorm mul.

* Add the Helium tokenizer converter.

* Add the pad token as suggested by Arthur.

* Update the RMSNorm + some other tweaks.

* Fix more rebase issues.

* fix copies and style

* fixes and add helium.md

* add missing tests

* update the backlink

* oups

* style

* update init, and expected results

* small fixes

* match test outputs

* style fixup, fix doc builder

* add dummies and we should be good to go!

* update sdpa and fa2 documentation

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2025-01-13 18:41:15 +01:00
Fanli Lin
2fa876d2d8
[tests] make cuda-only tests device-agnostic (#35607)
* intial commit

* remove unrelated files

* further remove

* Update test_trainer.py

* fix style
2025-01-13 14:48:39 +01:00
Arthur
e6f9b03464
[Compile] Only test compiling model forward pass (#35658)
* rename test to only compile forward!

* style emu
2025-01-13 13:43:29 +01:00
Raushan Turganbay
84a6789145
Enable different torch dtype in sub models (#34873)
* fix

* fix test

* add tests

* add more tests

* fix tests

* supposed to be a torch.dtype test

* handle BC and make fp32 default
2025-01-13 13:42:08 +01:00
Yih-Dar
1e3c6c1f7d
Skip MobileNetV1ModelTest::test_batching_equivalence for now (#35614)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 18:32:36 +01:00
Yih-Dar
04eae987f3
Fix flaky test_beam_search_low_memory (#35611)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 17:31:03 +01:00
Zach Mueller
b02828e4af
Let EarlyStoppingCallback not require load_best_model_at_end (#35101)
* Bookmark

* Add warning
2025-01-10 10:25:32 -05:00
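
After this change the callback warns rather than errors when `load_best_model_at_end` is unset. A hedged, config-only sketch:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="epoch",
    metric_for_best_model="eval_loss",
    # load_best_model_at_end is no longer required just to stop early;
    # the callback now warns instead of raising when it is unset.
)
callbacks = [EarlyStoppingCallback(early_stopping_patience=2)]
# Pass `args` and `callbacks` to Trainer(...) as usual.
```
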
Zach Mueller
1211e616a4
Use inherit tempdir makers for tests + fix failing DS tests (#35600)
* Use existing APIs to make tempdir folders

* Fixup deepspeed too

* output_dir -> tmp_dir
2025-01-10 10:01:58 -05:00
Yih-Dar
bbc00046b9
Fix flaky test_custom_4d_attention_mask (#35606)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 15:40:04 +01:00
Raushan Turganbay
52e1f87c7d
[WIP] Emu3: add model (#33770)
* model can convert to HF and be loaded back

* nit

* works in single batch generation but hallucinates

* use the image tokens

* add image generation

* now it works

* add tests

* update

* add modular but it doesn't work for porting docstrings :(

* skip some tests

* add slow tests

* modular removed the import?

* guess this works

* update

* update

* fix copies

* fix test

* fix copies

* update

* docs

* fix tests

* last fix tests?

* pls

* repo consistency

* more style

* style

* remove file

* address comments

* tiny bits

* update after the new modular

* fix tests

* add one more cond in check attributes

* decompose down/up/mid blocks

* allow static cache generation in VLMs

* nit

* fix copies

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix VAE upsampling

* Update src/transformers/models/emu3/modular_emu3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments

* state overwritten stuff explicitly

* fix copies

* add the flag for flex attn

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-10 12:23:00 +01:00
Cyril Vallez
ccc0381d36
Fix flex_attention in training mode (#35605)
* fix flex

* add test

* style
2025-01-10 11:49:12 +01:00
Raushan Turganbay
e0646f3dce
Chat template: return vectorized output in processors (#34275)
* update chat template

* style

* fix tests

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* typehints + docs

* fix tests

* remove unnecessary warnings

* forgot code style :(

* allow users to pass backend and num frames

* Update docs/source/en/chat_templating.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* typo fix

* style

* address comments

* align with "pipeline" template

* update docs

* update docs

* unpack for all kwargs?

* wrong conflict resolution while rebasing

* tmp

* update docs

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-10 11:05:29 +01:00
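A hedged sketch of the vectorized path #34275 enables: a processor's apply_chat_template returning ready-to-use tensors. The checkpoint id, the `"url"` content key, and the URL itself are illustrative assumptions:

```python
# Hedged sketch: tokenized (vectorized) output straight from the chat
# template. Checkpoint id, the "url" content key, and the URL are
# illustrative assumptions.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")  # assumed
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,      # token ids instead of a prompt string
    return_dict=True,   # a dict with input_ids, pixel_values, etc.
    return_tensors="pt",
)
```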
eustlb
5f087d1335
Add Moonshine (#34784)
* config draft

* full encoder forward

* full decoder forward

* fix sdpa and FA2

* fix sdpa and FA2

* moonshine model

* moonshine model forward

* fix attention with past_key_values

* add MoonshineForConditionalGeneration

* fix cache handling and causality for cross attention

* no causal attention mask for the encoder

* model addition (imports etc)

* small nit

* nits

* Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* add rope_theta

* nits

* model doc

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* imports

* add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES

* updates modular

* make

* make fix-copies

* ruff check examples fix

* fix check_modular_conversion

* nit

* nits

* nits

* copied from -> imports

* imports fix

* integrate attention refacto

* modular edge case

* remove encoder

* convolution params in config

* run modular_model_converter

* make

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Joshua Lochner <admin@xenova.com>

* MoonshineModelTest

* correct typo

* make style

* integration tests

* make

* modular convert

* name conversion update (up_proj -> fc1 etc)

* update config

* update MLP

* update attention

* update encoder layer

* update decoder layer

* update convolution parameters

* update encoder

* remove INPUTS_DOCSTRING

* update decoder

* update conditional generation

* update pretrained model

* imports

* modular converted

* update doc

* fix

* typo

* update doc

* update license

* update init

* split config in file

* two classes for MLP

* attention from GLM

* from GlmRotaryEmbedding

* split MLP

* apply Arthur's review suggestions

* apply Arthur's review suggestions

* apply Arthur's review suggestions

* auto feature extractor

* convert modular

* fix + make

* convert modular

* make

* unsplit config

* use correct checkpoint

* wrap generate

* update tests

* typos

* make

* typo

* update doc

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
2025-01-10 11:00:54 +01:00
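A hedged usage sketch for the new Moonshine speech-to-text model; the checkpoint id and 16 kHz input assumption are illustrative:

```python
# Hedged sketch: transcription with the new Moonshine model. The checkpoint
# id is an assumption; 16 kHz mono audio is assumed as the expected input.
import numpy as np
from transformers import AutoProcessor, MoonshineForConditionalGeneration

checkpoint = "UsefulSensors/moonshine-tiny"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = MoonshineForConditionalGeneration.from_pretrained(checkpoint)

audio = np.zeros(16000, dtype=np.float32)  # 1 s of silence as a stand-in
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```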
Yih-Dar
6f127d3f81
Skip torchscript tests if a cache object is in model's outputs (#35596)
* fix 1

* fix 1

* comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 10:46:03 +01:00
Tom Aarsen
6b73ee8905
ModernBert: reuse GemmaRotaryEmbedding via modular + Integration tests (#35459)
* Introduce 5 integration tests for the 4 model classes + torch export

* ModernBert: reuse GemmaRotaryEmbedding via modular

* Revert #35589, keep rope_kwargs; rely on them in modular_modernbert

* Revert "Revert #35589, keep rope_kwargs; rely on them in modular_modernbert"

This reverts commit 11b44b9ee8.

* Don't set rope_kwargs; override 'self.rope_init_fn' call instead
2025-01-10 10:25:10 +01:00
Cyril Vallez
3a4ae6eace
Refactor/fix Cohere2 (#35594)
* refactor/fix cohere2

* add kwargs

* tests

* remove func and import it
2025-01-09 17:54:57 +01:00
Tom Aarsen
32e0db8a69
[tokenizers] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593)
* Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer

in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this.

* Simplify setting self.add_prefix_space, ensure pre_tok exists

* Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized'

862d1a346a/bindings/python/src/pre_tokenizers.rs (L672) produces the exception. It is triggered by the roformer tests, as RoFormerTokenizerFast uses a custom PreTokenizer.

* Propagate add_prefix_space in T5TokenizerFast to superclass
2025-01-09 17:46:50 +01:00
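A hedged sketch of the propagation pattern described above, mirroring the long-standing GPT2TokenizerFast approach rather than the literal new code, including the try/except guard for custom pre-tokenizers that cannot be serialized:

```python
# Hedged sketch: rebuild the backend pre-tokenizer with the requested
# add_prefix_space, skipping pre-tokenizers that can't be serialized.
import json
from tokenizers import pre_tokenizers

def set_add_prefix_space(backend_tokenizer, add_prefix_space: bool) -> None:
    try:
        state = json.loads(backend_tokenizer.pre_tokenizer.__getstate__())
    except Exception:
        return  # e.g. "Custom PreTokenizer cannot be serialized"
    if state.get("add_prefix_space") != add_prefix_space:
        state["add_prefix_space"] = add_prefix_space
        pre_tok_class = getattr(pre_tokenizers, state.pop("type"))
        backend_tokenizer.pre_tokenizer = pre_tok_class(**state)
```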
Cyril Vallez
46276f9a7f
Fix modular edge case + modular sorting order (#35562)
* look-ahead negation

* re add examples by default

* Fix the bug in topological sort

* Update create_dependency_mapping.py

* start adding test

* finalize test

* more tests

* style

* style
2025-01-09 17:17:52 +01:00
Yih-Dar
82dd6c14bb
Fix flaky SwitchTransformersModelTest::test_training_gradient (#35587)
* fix

* Update tests/models/switch_transformers/test_modeling_switch_transformers.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-09 15:36:22 +01:00
Arthur
eb4579cf43
tokenizer train from iterator without pre_tokenizers (#35396)
* fix if else issues

* add a test

* fix the test

* style
2025-01-09 15:34:43 +01:00
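A hedged usage sketch for the path #35396 fixes; gpt2 is used for concreteness, while the fix specifically concerns fast tokenizers whose backend has no pre_tokenizer set:

```python
# Hedged sketch: training a new fast tokenizer from an iterator of text.
from transformers import AutoTokenizer

old_tokenizer = AutoTokenizer.from_pretrained("gpt2")
corpus = iter(["hello world", "tokenizers train from plain text lines"])
new_tokenizer = old_tokenizer.train_new_from_iterator(corpus, vocab_size=1000)
print(new_tokenizer.tokenize("hello world"))
```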
Jack Morris
832c6191ed
Add inputs_embeds param to ModernBertModel (#35373)
* update modular_modernbert -- add inputs_embeds param to ModernBertModel

* Fix implementation issues; extend to other classes; docstring

First of all, inputs_embeds shouldn't fully replace `self.embeddings(input_ids)`, because that call also does layer normalization and dropout. So now both input_ids and inputs_embeds are passed to ModernBertEmbeddings, much like how BertEmbeddings is implemented.

I also added `inputs_embeds` to the docstring, and propagated the changes to the other model classes.

I also introduced an error if input_ids and inputs_embeds are both or neither provided.

Lastly, I fixed an issue where the device was inferred solely from input_ids when building the attention_mask.

* Propagate inputs_embeds to ModernBertForMaskedLM correctly

Also reintroduce inputs_embeds test

---------

Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
2025-01-09 14:17:26 +01:00
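A hedged, simplified sketch of the pattern the PR body describes, not the actual ModernBERT module: norm and dropout run on both input paths, and exactly one of input_ids / inputs_embeds must be provided:

```python
# Hedged, simplified sketch of the embeddings pattern described above.
import torch
from torch import nn

class SketchEmbeddings(nn.Module):
    def __init__(self, vocab_size=100, hidden_size=16, dropout=0.1):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)
        self.drop = nn.Dropout(dropout)

    def forward(self, input_ids=None, inputs_embeds=None):
        if (input_ids is None) == (inputs_embeds is None):
            raise ValueError("Specify exactly one of input_ids or inputs_embeds")
        hidden = inputs_embeds if inputs_embeds is not None else self.tok_embeddings(input_ids)
        return self.drop(self.norm(hidden))  # norm + dropout on both paths

emb = SketchEmbeddings().eval()
ids = torch.tensor([[1, 2, 3]])
# Both paths agree when inputs_embeds are the raw token embeddings.
assert torch.allclose(emb(input_ids=ids), emb(inputs_embeds=emb.tok_embeddings(ids)))
```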
Yih-Dar
1b2f942af7
Fix flaky test_batching_equivalence (#35564)
* yes!

* oh no!!!

* oh no!!!

* style

* oh no!!!

* oh no!!!

* oh no!!!

* oh no!!!

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-09 14:00:08 +01:00