transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-07 23:00:08 +06:00

Author	SHA1	Message	Date
Arthur	c23a1c1932	Add-helium (#35669 ) * Add the helium model. * Add a missing helium. * And add another missing helium. * Use float for the rmsnorm mul. * Add the Helium tokenizer converter. * Add the pad token as suggested by Arthur. * Update the RMSNorm + some other tweaks. * Fix more rebase issues. * fix copies and style * fixes and add helium.md * add missing tests * udpate the backlink * oups * style * update init, and expected results * small fixes * match test outputs * style fixup, fix doc builder * add dummies and we should be good to go!z * update sdpa and fa2 documentation --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-01-13 18:41:15 +01:00
Fanli Lin	2fa876d2d8	[tests] make cuda-only tests device-agnostic (#35607 ) * intial commit * remove unrelated files * further remove * Update test_trainer.py * fix style	2025-01-13 14:48:39 +01:00
Arthur	e6f9b03464	[`Compile`] Only test compiling model forward pass (#35658 ) * rename test to only compile forward! * style emu	2025-01-13 13:43:29 +01:00
Raushan Turganbay	84a6789145	Enable different torch dtype in sub models (#34873 ) * fix * fix test * add tests * add more tests * fix tests * supposed to be a torch.dtype test * handle BC and make fp32 default	2025-01-13 13:42:08 +01:00
Yih-Dar	1e3c6c1f7d	Skip `MobileNetV1ModelTest::test_batching_equivalence` for now (#35614 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 18:32:36 +01:00
Yih-Dar	04eae987f3	Fix flaky `test_beam_search_low_memory` (#35611 ) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 17:31:03 +01:00
Zach Mueller	b02828e4af	Let `EarlyStoppingCallback` not require `load_best_model_at_end` (#35101 ) * Bookmark * Add warning	2025-01-10 10:25:32 -05:00
Zach Mueller	1211e616a4	Use inherit tempdir makers for tests + fix failing DS tests (#35600 ) * Use existing APIs to make tempdir folders * Fixup deepspeed too * output_dir -> tmp_dir	2025-01-10 10:01:58 -05:00
Yih-Dar	bbc00046b9	Fix flaky `test_custom_4d_attention_mask` (#35606 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 15:40:04 +01:00
Raushan Turganbay	52e1f87c7d	[WIP] Emu3: add model (#33770 ) * model can convert to HF and be loaded back * nit * works in single batch generation but hallucinates * use the image tokens * add image generation * now it works * add tests * update * add modulare but it doesn't work for porting docstring :( * skip some tests * add slow tests * modular removed the import? * guess this works * update * update * fix copies * fix test * fix copies * update * docs * fix tests * last fix tests? * pls * repo consistency * more style * style * remove file * address comments * tiny bits * update after the new modular * fix tests * add one more cond in check attributes * decompose down/up/mid blocks * allow static cache generation in VLMs * nit * fix copies * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix VAE upsampling * Update src/transformers/models/emu3/modular_emu3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * state overwritten stuff explicitly * fix copies * add the flag for flex attn --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-10 12:23:00 +01:00
Cyril Vallez	ccc0381d36	Fix flex_attention in training mode (#35605 ) * fix flex * add test * style	2025-01-10 11:49:12 +01:00
Raushan Turganbay	e0646f3dce	Chat template: return vectorized output in processors (#34275 ) * update chat template * style * fix tests * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typehints + docs * fix tests * remove unnecessary warnings * forgot code style :( * allow users to pass backend and num frames * Update docs/source/en/chat_templating.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/processing_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typo fix * style * address comments * align with "pipeline" template * update docs * update docs * unpack for all kwargs? * wrong conflict resolution while rebasing * tmp * update docs * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-10 11:05:29 +01:00
eustlb	5f087d1335	Add Moonshine (#34784 ) * config draft * full encoder forward * full decoder forward * fix sdpa and FA2 * fix sdpa and FA2 * moonshine model * moonshine model forward * fix attention with past_key_values * add MoonshineForConditionalGeneration * fix cache handling and causality for cross attention * no causal attention mask for the encoder * model addition (imports etc) * small nit * nits * Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py Co-authored-by: Joshua Lochner <admin@xenova.com> * add rope_theta * nits * model doc * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: Joshua Lochner <admin@xenova.com> * imports * add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES * updates modular * make * make fix-copies * ruff check examples fix * fix check_modular_conversion * nit * nits * nits * copied from -> imports * imports fix * integrate attention refacto * modular edge case * remove encoder * convolutions params in config * run modular_model_converter * make * Update docs/source/en/model_doc/moonshine.md Co-authored-by: Joshua Lochner <admin@xenova.com> * MoonshineModelTest * correct typo * make style * integration tests * make * modular convert * name conversion update (up_proj -> fc1 etc) * update config * update MLP * update attention * update encoder layer * update decoder layer * update convolutions parameters * update encoder * remove INPUTS_DOCSTRING * update decoder * update conditional generation * update pretrained model * imports * modular converted * update doc * fix * typo * update doc * update license * update init * split config in file * two classes for MLP * attention from GLM * from GlmRotaryEmbedding * split MLP * apply arthur's review suggestions * apply arthur's review suggestions * apply arthur's review suggestions * auto feature extractor * convert modular * fix + make * convert modular * make * unsplit config * use correct checkpoint * wrap generate * update tests * typos * make * typo * update doc --------- Co-authored-by: Joshua Lochner <admin@xenova.com>	2025-01-10 11:00:54 +01:00
Yih-Dar	6f127d3f81	Skip `torchscript` tests if a cache object is in model's outputs (#35596 ) * fix 1 * fix 1 * comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 10:46:03 +01:00
Tom Aarsen	6b73ee8905	ModernBert: reuse GemmaRotaryEmbedding via modular + Integration tests (#35459 ) * Introduce 5 integration tests for the 4 model classes + torch export * ModernBert: reuse GemmaRotaryEmbedding via modular * Revert #35589, keep rope_kwargs; rely on them in modular_modernbert * Revert "Revert #35589, keep rope_kwargs; rely on them in modular_modernbert" This reverts commit `11b44b9ee8`. * Don't set rope_kwargs; override 'self.rope_init_fn' call instead	2025-01-10 10:25:10 +01:00
Cyril Vallez	3a4ae6eace	Refactor/fix Cohere2 (#35594 ) * refactor/fix cohere2 * add kwargs * tests * remove func and import it	2025-01-09 17:54:57 +01:00
Tom Aarsen	32e0db8a69	[`tokenizers`] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593 ) * Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this. * Simplify setting self.add_prefix_space, ensure pre_tok exists * Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized' `862d1a346a/bindings/python/src/pre_tokenizers.rs (L672)` produces the Exception. They're triggered by the roformer tests, as the RoFormerTokenizerFast uses a custom PreTokenizer. * Propagate add_prefix_space in T5TokenizerFast to superclass	2025-01-09 17:46:50 +01:00
Cyril Vallez	46276f9a7f	Fix modular edge case + modular sorting order (#35562 ) * look-ahead negation * re add examples by default * Fix the bug in topological sort * Update create_dependency_mapping.py * start adding test * finalize test * more tests * style * style	2025-01-09 17:17:52 +01:00
Yih-Dar	82dd6c14bb	Fix flaky `SwitchTransformersModelTest::test_training_gradient` (#35587 ) * fix * Update tests/models/switch_transformers/test_modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-09 15:36:22 +01:00
Arthur	eb4579cf43	`tokenizer` train from iterator without pre_tokenizers (#35396 ) * fix if else issues * add a test * fix the test * style	2025-01-09 15:34:43 +01:00
Jack Morris	832c6191ed	Add inputs_embeds param to ModernBertModel (#35373 ) * update modular_modernbert -- add inputs_embeds param to ModernBertModel * Fix implementation issues; extend to other classes; docstring First of all, the inputs_embeds shouldn't fully replace `self.embeddings(input_ids)`, because this call also does layer normalization and dropout. So, now both input_ids and inputs_embeds is passed to the ModernBertEmbeddings, much like how BertEmbeddings is implemented. I also added `inputs_embeds` to the docstring, and propagated the changes to the other model classes. I also introduced an error if input_ids and input_embeds are both or neither provided. Lastly, I fixed an issue with device being based solely on input_ids with attention_mask. * Propagate inputs_embeds to ModernBertForMaskedLM correctly Also reintroduce inputs_embeds test --------- Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>	2025-01-09 14:17:26 +01:00
Yih-Dar	1b2f942af7	Fix flaky `test_batching_equivalence` (#35564 ) * yes! * oh no!!! * oh no!!! * style * oh no!!! * oh no!!! * oh no!!! * oh no!!! --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-09 14:00:08 +01:00
Cyril Vallez	965a2fb320	More model refactoring! (#35359 ) * cohere * style * phi3 * style * small fix * small fix * phi3 longrope * oups * Update rope (only for phi3 still) * Update test_modeling_rope_utils.py * Update modeling_phi3.py * fix * fix copies * style * Fix copied from bad renaming	2025-01-09 11:09:09 +01:00
nhamanasu	b32938aeee	Fix all output_dir in test_trainer.py to use tmp_dir (#35266 ) * update codecarbon * replace directly-specified-test-dirs with tmp_dir * pass tmp_dir to all get_regression_trainer * test_trainer.py: Use tmp_dir consistently for all output_dir arguments * fix some with...as tmp_dir blocks * reflect the comments to improve test_trainer.py * refresh .gitignore	2025-01-08 19:44:39 +01:00
Joao Gante	76da6ca034	Pipeline: simple API for assisted generation (#34504 ) Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-01-08 17:08:02 +00:00
Arthur	3f483beab9	[`PixtralLarge`] Update Pixtral conversion script to support large format! (#34801 ) * update conversion script * update for bias again * remove pdv * use my dir * Update how we initialize the tokenizer * Convert in bfloat16 * Undo that one again * fix config dump * .to() was broken for BatchMixFeature * quick debug breakpoint * put the breakpoint in the right place * Add a config flag for the multimodal projector bias * Add a config flag for the multimodal projector bias * Conversion script can load chat templates * Indent config for comparison * Stop clobbering the config * Re-enable the config clobber * Get rid of the config manual save - it has no effect! * Handle adapter bias correctly * Default vision transformer activation to silu * Remove legacy processing path * One commit with all the debug breakpoints before I delete them all, in case I need to revert * Update conversion * Remove vLLM debugging instrumentation * Drop xformers * Remove debug enumerates * make fixup * make fixup * Break copied from in pixtral * Propagate multimodal_projector_bias change * Propagate multimodal_projector_bias change * Remove debug device .to() * Restore attention weights output * Fix Pixtral test * Drop image_seq_length * Drop image_seq_length * Put the legacy processing code back * Add the bias option to the llava_next_video config * Add the bias option to the llava_next_video config * Make certain args required in converter * Make certain args required in converter * typo * make fixup * Reverting some dtype changes since it seems to work without them --------- Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-166-244.ec2.internal> Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-01-08 17:39:47 +01:00
NielsRogge	8490d3159c	Add ViTPose (#30530 ) * First draft * Make fixup * Make forward pass worké * Improve code * More improvements * More improvements * Make predictions match * More improvements * Improve image processor * Fix model tests * Add classic decoder * Convert classic decoder * Verify image processor * Fix classic decoder logits * Clean up * Add post_process_pose_estimation * Improve post_process_pose_estimation * Use AutoBackbone * Add support for MoE models * Fix tests, improve num_experts% * Improve variable names * Make fixup * More improvements * Improve post_process_pose_estimation * Compute centers and scales * Improve postprocessing * More improvements * Fix ViTPoseBackbone tests * Add docstrings, fix image processor tests * Update index * Use is_cv2_available * Add model to toctree * Add cv2 to doc tests * Remove script * Improve conversion script * Add coco_to_pascal_voc * Add box_to_center_and_scale to image_transforms * Update tests * Add integration test * Fix merge * Address comments * Replace numpy by pytorch, improve docstrings * Remove get_input_embeddings * Address comments * Move coco_to_pascal_voc * Address comment * Fix style * Address comments * Fix test * Address comment * Remove udp * Remove comment * [WIP] need to check if the numpy function is same as cv * add scipy affine_transform * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * refactor convert * add output_shape * add atol 5e-2 * Use hf_hub_download in conversion script * make box_to_center more applicable * skipt test_get_set_embedding * fix to accept array and fix CI * add co-contributor * make it to tensor type output * add torch * change to torch tensor * add more test * minor change * CI test change * import torch should be above ImageProcessor * make style * try not use torch in def * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/configuration_vitpose_backbone.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * fix * fix * add caution * make more detail about dataset_index * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * add docs * Update docs/source/en/model_doc/vitpose.md * Update src/transformers/models/vitpose/configuration_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Revert "Update src/transformers/__init__.py" This reverts commit `7ffa504450`. * change name * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vitpose/test_modeling_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/vitpose.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * move vitpose only function to image_processor * raise valueerror when using timm backbone * use out_indices * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove camel-case of def flip_back * rename vitposeEstimatorOutput * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix confused camelcase of MLP * remove in-place logic * clear scale description * make consistent batch format * docs update * formatting docstring * add batch tests * test docs change * Update src/transformers/models/vitpose/image_processing_vitpose.py * Update src/transformers/models/vitpose/configuration_vitpose.py * chagne ViT to Vit * change to enable MoE * make fix-copies * Update docs/source/en/model_doc/vitpose.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * extract udp * add more described docs * simple fix * change to accept target_size * make style * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/configuration_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change to `verify_backbone_config_arguments` * Update docs/source/en/model_doc/vitpose.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove unnecessary copy * make config immutable * enable gradient checkpointing * update inappropriate docstring * linting docs * split function for visibility * make style * check isinstances * change to acceptable use_pretrained_backbone * make style * remove copy in docs * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update docs/source/en/model_doc/vitpose.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * simple fix + make style * change input config of activation function to string * Update docs/source/en/model_doc/vitpose.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * tmp docs * delete index.md * make fix-copies * simple fix * change conversion to sam2/mllama style * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * refactor convert * add supervision * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * remove reduntant def * seperate code block for visualization * add validation for num_moe * final commit * add labels * [run-slow] vitpose, vitpose_backbone * Update src/transformers/models/vitpose/convert_vitpose_to_hf.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * enable all conversion * final commit * [run-slow] vitpose, vitpose_backbone * ruff check --fix * [run-slow] vitpose, vitpose_backbone * rename split module * [run-slow] vitpose, vitpose_backbone * fix pos_embed * Simplify init * Revert "fix pos_embed" This reverts commit `2c56a4806e`. * refactor single loop * allow flag to enable custom model * efficiency of MoE to not use unused experts * make style * Fix range -> arange to avoid warning * Revert MOE router, a new one does not work * Fix postprocessing a bit (labels) * Fix type hint * Fix docs snippets * Fix links to checkpoints * Fix checkpoints in tests * Fix test * Add image to docs --------- Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by: sangbumchoi <danielsejong55@gmail.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-01-08 16:02:14 +00:00
Minho Shim	4349a0e401	fix: Qwen2-VL generate with inputs_embeds (#35466 ) * fix: Qwen2-VL generate with inputs_embeds * change: optional input_ids in get_rope_index	2025-01-08 16:36:03 +01:00
Sean (Seok-Won) Yi	88e18b3c63	Update doc for `metric_for_best_model` when `save_strategy="best"`. (#35389 ) * Updated docstring for _determine_best_metric. * Updated docstring for metric_for_best_model. * Added test case for save strategy. * Updated incorrect test case. * Changed eval_strategy to match save_strategy. * Separated test cases for metric. * Allow load_best_model when save_strategy == "best". * Updated docstring for metric_for_best_model.	2025-01-08 16:32:35 +01:00
Pavel Iakubovskii	657bb14f98	Enable auto task for timm models in pipeline (#35531 ) * Enable auto task for timm models * Add pipeline test	2025-01-08 15:14:17 +00:00
Pavel Iakubovskii	59e5b3f01b	Timm wrapper label names (#35553 ) * Add timm wrapper label names mapping * Add index to classification pipeline * Revert adding index for pipelines * Add custom model check for loading timm labels * Add tests for labels * [run-slow] timm_wrapper * Add note regarding label2id mapping	2025-01-08 14:09:46 +00:00
Jacky Lee	3c1895aa65	Fix Qwen2VL processor to handle odd number of frames (#35431 ) * fix: processing odd number of frames * feat: add test case * update: test one frame * feat: support custom patch size * fix: test with videos * revert: change on patch repeat * fix: much wow * update: fixups * fixup pls * ruff fixup * fix typo at least	2025-01-08 13:49:00 +01:00
Quentin Lhoest	3fde88b19d	support chat generator as input of TextGenerationPipeline (#35551 ) * support chat generator as input of TextGenerationPipeline * missing import * fix tests * again * simpler * add test	2025-01-08 13:27:07 +01:00
Raushan Turganbay	d1681ec2b6	VLMs: major clean up 🧼 (#34502 ) only lllava models are modified	2025-01-08 10:35:23 +01:00
Jade Choghari	7176e06b52	Add TextNet (#34979 ) * WIP * Add config and modeling for Fast model * Refactor modeling and add tests * More changes * WIP * Add tests * Add conversion script * Add conversion scripts, integration tests, image processor * Fix style and copies * Add fast model to init * Add fast model in docs and other places * Fix import of cv2 * Rename image processing method * Fix build * Fix Build * fix style and fix copies * Fix build * Fix build * Fix Build * Clean up docstrings * Fix Build * Fix Build * Fix Build * Fix build * Add test for image_processing_fast and add documentation tests * some refactorings * Fix failing tests * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Introduce TextNet * Fix failures * Refactor textnet model * Fix failures * Add cv2 to setup * Fix failures * Fix failures * Add CV2 dependency * Fix bugs * Fix build issue * Fix failures * Remove textnet from modeling fast * Fix build and other things * Fix build * some cleanups * some cleanups * Some more cleanups * Fix build * Incorporate PR feedbacks * More cleanup * More cleanup * More cleanup * Fix build * Remove all the references of fast model * More cleanup * Fix build * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix Build * Fix build * Fix build * Fix build * Fix build * Fix build * Incorporate PR feedbacks * Fix style * Fix build * Incorporate PR feedbacks * Fix image processing mean and std * Incorporate PR feedbacks * fix build failure * Add assertion to image processor * Incorporate PR feedbacks * Incorporate PR feedbacks * fix style failures * fix build * Fix Imageclassification's linear layer, also introduce TextNetImageProcessor * Fix build * Fix build * Fix build * Fix build * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix build * Incorporate PR feedbacks * Remove some script * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix image processing in textnet * Incorporate PR Feedbacks * Fix CI failures * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Add textnet to readme * Improve readability * Incorporate PR feedbacks * fix code style * fix key error and convert working * tvlt shouldn't be here * fix test modeling test * Fix tests, make fixup * Make fixup * Make fixup * Remove TEXTNET_PRETRAINED_MODEL_ARCHIVE_LIST * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update tests/models/textnet/test_image_processing_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * space typo Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/configuration_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * make conv layer kernel sizes and strides default to None * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix keyword bug * add batch init and make fixup * Make fixup * Update integration test * Add figure * Update textnet.md * add testing and fix errors (classification, imgprocess) * fix error check * make fixup * make fixup * revert to original docstring * add make style * remove conflict for now * Update modeling_auto.py got a confusion in `timm_wrapper` - was giving some conflicts * Update tests/models/textnet/test_modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update tests/models/textnet/test_modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * add changes * Update textnet.md * add doc * add authors hf ckpt + rename * add feedback: classifier/docs --------- Co-authored-by: raghavanone <opensourcemaniacfreak@gmail.com> Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co> Co-authored-by: Niels <niels.rogge1@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-01-08 09:52:51 +01:00
Matt	a7d1441d65	Correctly list the chat template file in the Tokenizer saved files list (#34974 ) * Correctly list the chat template file in the saved files list * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add save file checking to test * make fixup * better filename handling * make fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-07 19:11:02 +00:00
eustlb	7f7677307c	[Qwen2Audio] handle input ids expansion during processing (#35534 ) * add audio_token attribute to proc * expand input_ids * and legacy and expanded input_ids * test update * split lines * add possibility not to provide eos and bos audio tokens * raise errors * test incorrect number of audio tokens * add example * fmt * typo	2025-01-07 16:47:27 +01:00
Francesco Cariaggi	f408d55448	Fix bug when requesting input normalization with EnCodec (#34756 ) * EnCodec: unsqueeze padding mask * add test for normalization	2025-01-07 11:50:02 +01:00
松本和真	96bf3d6cc5	Add diffllama (#34083 ) * first adding diffllama * add Diff Attention and other but still with errors * complate make attention Diff-Attention * fix some bugs which may be caused by transformer-cli while adding model * fix a bug caused by forgetting KV cache... * Update src/transformers/models/diffllama/modeling_diffllama.py You don't need to divide by 2 if we use same number of attention heads as llama. instead you can just split in forward. Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py new codes are more meaningful than before Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py new codes are more meaningful than before Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fix 2times divide by sqrt(self.head_dim) Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fix 2times divide by sqrt(self.head_dim) Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place. and more visible Co-authored-by: Minho Ryu <ryumin93@gmail.com> * I found Attention missed implemented from paper still on `e072544a3b`. * re-implemented * adding groupnorm Co-authored-by: Minho Ryu <ryumin93@gmail.com> * align with transformers code style Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix typo Co-authored-by: Minho Ryu <ryumin93@gmail.com> * adding groupnorm Co-authored-by: Minho Ryu <ryumin93@gmail.com> * change SdpaAttention to DiffSdpaAttention Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix bug * Update src/transformers/models/diffllama/modeling_diffllama.py resolve "not same outputs" problem Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix bugs of places of "GroupNorm with scale" and etc * Revert "fix bugs of places of "GroupNorm with scale" and etc" This reverts commit `26307d92f6`. * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * remove missed type * add diffllama model_doc * apply make style/quality * apply review comment about model * apply review comment about test * place diffllama alphabetically on the src/transformers/__init__.py * fix forgot code * Supports parameters that are not initialized with standard deviation 0 in the conventional method * add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK on utils/check_config_docstrings.py * remove unused property of config * add to supported model list * add to spda supported model list * fix copyright, remove pretraining_tensor_parallel, and modify for initialization test * remove unused import and etc. * empty commit * empty commit * empty commit * apply modular transformers but with bugs * revert prev commit * create src/transformers/model/diffllama/modular_diffllama.py * run utils/modular_model_converter.py * empty commit * leaner modular diffllama * remove more and more in modular_diffllama.pt * remove more and more in modular_diffllama.pt * resolve missing docstring entries * force reset * convert modular --------- Co-authored-by: Minho Ryu <ryumin93@gmail.com>	2025-01-07 11:34:56 +01:00
Dmitry Rogozhkin	9fd123ac31	ci: mark model_parallel tests as cuda specific (#35269 ) `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"` and therefore is not accepting new features. At the same time `parallelize()` implementation is currently CUDA-specific. This commit marks respective ci tests with `@require_torch_gpu`. Fixes: #35252 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-01-07 10:16:34 +01:00
pglorio	bd442c6d3a	Zamba new attention standard (#35375 ) * updated zamba to new attention standard * make fixup fixes	2025-01-07 10:08:45 +01:00
Sarthak Karandikar	ca00950057	added logic for deleting adapters once loaded (#34650 ) * added logic for deleting adapters once loaded * updated to the latest version of transformers, merged utility function into the source * updated with missing check * added peft version check * Apply suggestions from code review Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * changes according to reviewer * added test for deleting adapter(s) * styling changes * styling changes in test * removed redundant code * formatted my contributions with ruff * optimized error handling * ruff formatted with correct config * resolved formatting issues --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-01-06 18:36:40 +00:00
Yijun Lee	e5fd865eba	Add Gemma2 GGUF support (#34002 ) * initial setup for ggml.py * initial setup of GGUFGemma2Converter class * Add gemma2 model to gguf.md doc * Partial work on GGUF_TENSOR_MAPPING * initial setup of GGUF_TENSOR_MAPPING for Gemma2 * refactor: rename GemmaConvert class to GemmaConverter for naming consistency * feat: complete gemma2 tensor mapping implementation * feat: add initial implementation of GGUFGemmaConverter * feat: complete GGUFGemmaConverter implementation * feat: add test code for gemma2 * refactor: minor code cleanup * refactor: minor code cleanup * fix: resolve suggestions * Update tests/quantization/ggml/test_ggml.py Co-authored-by: Isotr0py <2037008807@qq.com> --------- Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-03 14:50:07 +01:00
Jacky Lee	30a9971632	Use `sdpa_kernel` in tests (#35472 ) * update: use sdpa_kernel * update: rerun test	2025-01-03 14:39:52 +01:00
Blanchon	cba49cb2a6	Change `is_soundfile_availble` to `is_soundfile_available` (#35030 )	2025-01-03 14:37:42 +01:00
Matthew Douglas	6b1e86fd4d	Fix new BNB test failures (#35345 )	2025-01-02 11:24:52 +01:00
NielsRogge	6e0515e99c	Add DINOv2 with registers (#35348 ) * added changes from 32905 * fixed mistakes caused by select all paste * rename diff_dinov2... * ran tests * Fix modular * Fix tests * Use new init * Simplify drop path * Convert all checkpoints * Add figure and summary * Update paths * Update docs * Update docs * Update toctree * Update docs --------- Co-authored-by: BernardZach <bernardzach00@gmail.com> Co-authored-by: Zach Bernard <132859071+BernardZach@users.noreply.github.com>	2024-12-24 13:21:59 +01:00
Yoni Gozlan	93aafdc620	Add compile test for fast image processor (#35184 ) * add compile test for fast image processor * override pixtral test	2024-12-23 13:12:45 -05:00
Miquel Farré	a1780b7ba5	bugfix Idefics3 processor - handle gracefully cases with text and no images (#35363 ) * bugfix processing empty images * fix * fix * Update src/transformers/models/idefics3/processing_idefics3.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * adding tests * fix * fix * fix --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2024-12-23 16:59:01 +01:00
Andrei Panferov	64c05eecd6	HIGGS Quantization Support (#34997 ) * higgs init * working with crunches * per-model workspaces * style * style 2 * tests and style * higgs tests passing * protecting torch import * removed torch.Tensor type annotations * torch.nn.Module inheritance fix maybe * hide inputs inside quantizer calls * style structure something * Update src/transformers/quantizers/quantizer_higgs.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * reworked num_sms * Update src/transformers/integrations/higgs.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * revamped device checks * docstring upd * Update src/transformers/quantizers/quantizer_higgs.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * edited tests and device map assertions * minor edits * updated flute cuda version in docker * Added p=1 and 2,3bit HIGGS * flute version check update * incorporated `modules_to_not_convert` * less hardcoding * Fixed comment * Added docs * Fixed gemma support * example in docs * fixed torch_dtype for HIGGS * Update docs/source/en/quantization/higgs.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Collection link * dequantize interface * newer flute version, torch.compile support * unittest message fix * docs update compile * isort * ValueError instead of assert --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2024-12-23 16:54:49 +01:00
Mohamed Mekkouri	59178780a6	Fix : VPTQ test (#35394 ) fix_test	2024-12-23 16:27:46 +01:00
Tibor Reiss	e10be82b71	uniformize kwargs for SAM (#34578 ) * Make kwargs uniform for SAM * Remove unused attribute * Make point_pad_value part of image_kwargs * Update annotations * Code review - use existing methods * Use ProcessorTesterMixin * Do not add ProcessorTesterMixin everywhere	2024-12-23 13:54:57 +01:00
bastrob	8f38f58f3d	owlvit/2 dynamic input resolution (#34764 ) * owlvit/2 dynamic input resolution. * adapt box grid to patch_dim_h patch_dim_w * fix ci * clarify variable naming * clarify variable naming.. * compute box_bias dynamically inside box_predictor * change style part of code * [run-slow] owlvit, owlv2	2024-12-21 08:51:09 +00:00
Yih-Dar	504c4d3692	Make `test_generate_with_static_cache` even less flaky (#34995 ) * fix * fix * fix * fix * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-20 16:03:26 +01:00
Yih-Dar	05de764e9c	Aurevoir PyTorch 1 (#35358 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-20 14:36:31 +01:00
Sigbjørn Skjæret	eafbb0eca7	Implement AsyncTextIteratorStreamer for asynchronous streaming (#34931 ) * Add AsyncTextIteratorStreamer class * export AsyncTextIteratorStreamer * export AsyncTextIteratorStreamer * improve docs * missing import * missing import * doc example fix * doc example output fix * add pytest-asyncio * first attempt at tests * missing import * add pytest-asyncio * fallback to wait_for and raise TimeoutError on timeout * check for TimeoutError * autodoc * reorder imports * fix style --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-20 12:08:12 +01:00
wejoncy	4e27a4009d	FEAT : Adding VPTQ quantization method to HFQuantizer (#34770 ) * init vptq * add integration * add vptq support fix readme * add tests && format * format * address comments * format * format * address comments * format * address comments * remove debug code * Revert "remove debug code" This reverts commit `ed3b3eaaba`. * fix test --------- Co-authored-by: Yang Wang <wyatuestc@gmail.com>	2024-12-20 09:45:53 +01:00
Anton Vlasjuk	5a2aedca1e	[`Mamba2`] Fix caching, slow path, and multi-gpu (#35154 ) * fixup mamba2 - caching and several other small fixes * fixup cached forward * correct fix this time * fixup cache - we do not need to extend the attn mask it's handled by generate (gives total ids + mask at each step) * remove unnecessary (un)squeeze * fixup cache position * simplify a few things * [run-slow] mamba2 * multi gpu attempt two * [run-slow] mamba2 * [run-slow] mamba2 * [run-slow] mamba2 * [run-slow] mamba2 * add newer slow path fix * [run-slow] mamba2	2024-12-20 09:27:47 +01:00
Arthur	1fa807fa63	Fix some fa2 tests (#35340 ) * remove fa2 test * remove other failing tests * style	2024-12-19 17:05:25 +01:00
Benjamin Warner	667ed5635e	Add ModernBERT to Transformers (#35158 ) * initial cut of modernbert for transformers * small bug fixes * fixes * Update import * Use compiled mlp->mlp_norm to match research implementation * Propagate changes in modular to modeling * Replace duplicate attn_out_dropout in favor of attention_dropout cc @warner-benjamin let me know if the two should remain separate! * Update BOS to CLS and EOS to SEP Please confirm @warner-benjamin * Set default classifier bias to False, matching research repo * Update tie_word_embeddings description * Fix _init_weights for ForMaskedLM * Match base_model_prefix * Add compiled_head to match research repo outputs * Fix imports for ModernBertForMaskedLM * Just use "gelu" default outright for classifier * Fix config name typo: initalizer -> initializer * Remove some unused parameters in docstring. Still lots to edit there! * Compile the embeddings forward Not having this resulted in very slight differences - so small it wasn't even noticed for the base model, only for the large model. But the tiny difference for large propagated at the embedding layer through the rest of the model, leading to notable differences of ~0.0084 average per value, up to 0.2343 for the worst case. * Add drafts for ForSequenceClassification/ForTokenClassification * Add initial SDPA support (not exactly equivalent to FA2 yet!) During testing, FA2 and SDPA still differ by about 0.0098 per value in the token embeddings. It still predicts the correct mask fills, but I'd like to get it fully 1-1 if possible. * Only use attention dropout if training * Add initial eager attention support (also not equivalent to FA2 yet!) Frustratingly, I also can't get eager to be equivalent to FA2 (or sdpa), but it does get really close, i.e. avg ~0.010 difference per value. Especially if I use fp32 for both FA2&eager, avg ~0.0029 difference per value The fill-mask results are good with eager. * Add initial tests, output_attentions, output_hidden_states, prune_heads Tests are based on BERT, not all tests pass yet: 23 failed, 79 passed, 100 skipped * Remove kwargs from ModernBertForMaskedLM Disable sparse_prediction by default to match the normal HF, can be enabled via config * Remove/adjust/skip improper tests; warn if padding but no attn mask * Run formatting etc. * Run python utils/custom_init_isort.py * FlexAttention with unpadded sequences(matches FA2 within bf16 numerics) * Reformat init_weights based on review * self -> module in attention forwards * Remove if config.tie_word_embeddings * Reformat output projection on a different line * Remove pruning * Remove assert * Call contiguous() to simplify paths * Remove prune_qkv_linear_layer * Format code * Keep as kwargs, only use if needed * Remove unused codepaths & related config options * Remove 3d attn_mask test; fix token classification tuple output * Reorder: attention_mask above position_ids, fixes gradient checkpointing * Fix usage if no FA2 or torch v2.5+ * Make torch.compile/triton optional Should we rename 'compile'? It's a bit vague * Separate pooling options into separate functions (cls, mean) - cls as default * Simplify _pad_modernbert_output, remove unused labels path * Update tied weights to remove decoder.weight, simplify decoder loading * Adaptively set config.compile based on hf_device_map/device/resize, etc. * Update ModernBertConfig docstring * Satisfy some consistency checks, add unfinished docs * Only set compile to False if there's more than 1 device * Add docstrings for public ModernBert classes * Dont replace docstring returns - ends up being duplicate * Fix mistake in toctree * Reformat toctree * Patched FlexAttention, SDPA, Eager with Local Attention * Implement FA2 -> SDPA -> Eager attn_impl defaulting, crucial both to match the original performance, and to get the highest inference speed without requiring users to manually pick FA2 * Patch test edge case with Idefics3 not working with 'attn_implementation="sdpa"' * Repad all_hidden_states as well * rename config.compile to reference_compile * disable flex_attention since it crashes * Update modernbert.md * Using dtype min to mask in eager * Fully remove flex attention for now It's only compatible with the nightly torch 2.6, so we'll leave it be for now. It's also slower than eager/sdpa. Also, update compile -> reference_compile in one more case * Call contiguous to allow for .view() * Copyright 2020 -> 2024 Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update/simplify __init__ structure Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove "... if dropout_prob > 0 else identity" As dropout with 0.0 should be efficient like identity * re-use existing pad/unpad functions instead of creating new ones * remove flexattention method * Compute attention_mask and local_attention_mask once in modeling * Simplify sequence classification prediction heads, only CLS now Users can make custom heads if they feel like it Also removes the unnecessary pool parameter * Simplify module.training in eager attn * Also export ModernBertPreTrainedModel * Update the documentation with links to finetuning scripts * Explain local_attention_mask parameter in docstring * Simplify _autoset_attn_implementation, rely on super() * Keep "in" to initialize Prediction head Doublechecked with Benjamin that it's correct/what we used for pretraining * add back mean pooling * Use the pooling head in TokenClassification * update copyright * Reset config._attn_implementation_internal on failure * Allow optional attention_mask in ForMaskedLM head * fix failing run_slow tests * Add links to the paper * Remove unpad_no_grad, always pad/unpad without gradients * local_attention_mask -> sliding_window_mask * Revert "Use the pooling head in TokenClassification" This reverts commit `99c38badd1`. There was no real motivation, no info on whether having this bigger head does anything useful. * Simplify pooling, 2 options via if-else --------- Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com> Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com> Co-authored-by: Benjamin Clavié <ben@clavie.eu> Co-authored-by: Antoine Chaffin <ant54600@hotmail.fr> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-19 14:03:35 +01:00
Yu Chin Fabian Lim	9613933b02	Add the Bamba Model (#34982 ) * initial commit for PR Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com> * rename dynamic cache Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add more unit tests Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add integration test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add integration test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Add modular bamba file * Remove trainer changes from unrelated PR * Modify modular and cofig to get model running * Fix some CI errors and beam search * Fix a plethora of bugs from CI/docs/etc * Add bamba to models with special caches * Updat to newer mamba PR for mamba sublayer * fix test_left_padding_compatibility Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix remaining tests Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * missed this test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * ran make style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * move slow tag to integration obj Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * make style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * address comments Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix modular Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * left out one part of modular Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * change model Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Make Rotary modular as well * Update bamba.md Added overview, update Model inference card and added config * Update bamba.md * Update bamba.md * Update bamba.md Minor fixes * Add docs for config and model back Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Add warning when using fast kernels * replaced generate example Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Address comments from PR Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Propagate attention fixes Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Fix attention interfaces to the new API Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Fix API for decoder layer Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Remove extra weights Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> --------- Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com> Co-authored-by: Antoni Viros i Martin <aviros@ibm.com> Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com> Co-authored-by: Antoni Viros <ani300@gmail.com>	2024-12-18 20:18:17 +01:00
Arthur	2c47618c1a	🚨All attention refactor🚨 (#35235 ) * refactor LlamaAttention * minimal changes * fix llama * update * modular gemmas * modular nits * modular updates * nits * simplify * gpt2 * more modualr and fixes * granite * modular modular modular * nits * update * qwen2 + starcoder2 * mostly gemma2 * Update image_processing_auto.py * fix * Update modular_starcoder2.py * fix * remove all copied from attentions * remove gcv * make fix-copies * oups * oups2.0 * fix some modulars + all copied from * should be good now * revert unwanted changes * Update modeling_decision_transformer.py * finish cleanup * Update modeling_olmo.py * consistency * re-add gradient checkpointing attribute * fix * style * make config necessary * bis * bis * Update modeling_my_new_model2.py * is_causal attr * fix * remove past kv return from decoder layer * fix * default rope config * correctly fix rope config * fix bias * fix gpt2 attention output * fix test * fix inits * fix default sdpa * fix default sdpa implementation * harmonize classes * fix mistral * fix sliding window models * mixtral * be more explicit * style * fix * several fixes * Update modeling_dbrx.py * fix test * olmo + phi * rotary * syle * phi * phi again * again * kwargs * Update test_modeling_common.py * skip fx tracing tests * Update modeling_utils.py * gemma 2 * again * Update modeling_recurrent_gemma.py * gemma2 * granite * style * starcoder * Update sdpa_attention.py * switch args * Update modeling_mllama.py * fix * cache type tests * gpt2 * Update test_modeling_common.py * fix * consistency * fix shape with encoder * should be the last one * tests non model * most comments * small oupsi * be more explicit in modulars * more explicit modulars * CIs! it works locally * add kwargs to _flash_attention_forward --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2024-12-18 16:53:39 +01:00
jiqing-feng	69e31eb1bf	change bnb tests (#34713 ) * fix training tests * fix xpu check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * rm pdb Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 4bit logits check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix 4bit logits check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add xpu check on int8 training * fix training tests * add llama test on bnb Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * only cpu and xpu disable autocast training Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>	2024-12-18 09:49:59 -05:00
eustlb	da334bcfa8	[Whisper] 🚨 Fix whisper decoding 🚨 (#34135 ) * do not remove decoder_input_ids for the first segment * do not remove eos token in generate_with_fallback * when removing padding tokens, do not remove eos token * remove eos token in generate (and not in generate_with_fallback!) * reconciliate short-from/ long-form behavior * correct avg_logprobs calculation * handle eos token in segments * handle decoder_input_ids and eos token in _prepare_decoder_input_ids * fix incorrect time precision * always remove eos token * always remove decoder_input_ids * no need to handle decoder_inputs_ids and eos token * no need to remove decoder_input_ids * no need to handle eos token * fix num_beams in _retrieve_logit_processors * remove todo unconsistency * no need to add eos token * last_timestamp_pos should indeed be timestamp token pos * patch generate to enable compatibility with GenerationTesterMixin tests * adapt test_generate_continue_from_past_key_values * adapt test_prompt_lookup_decoding_matches_greedy_search * adapt generic GenerationMixin tests to whisper's generate * fix speculative decoding * fix * [run-slow] whisper * change HF_HUB_TOKEN for require_read_token * [run-slow] whisper * prioritize kwargs over generation_config * remove unnecessary args * [run-slow] whisper * update tests * [run-slow] whisper * add comment * update test * [run-slow] whisper * update test + revert require_read_token * docstring updates * revert tokenizer decode args change * do not use a patch + docstring updates * [run-slow] whisper * make * [run-slow] whisper * add a flag to force unique call to generate * test update * [run-slow] whisper * add force_unique_generate_call arg * do not use a patch * correct the timestamps for the pad tokens * docstring update * docstring update * docstring update * upodate TF tests * add require_read_token * [run-slow] whisper * test reset dynamo * [run-slow] whisper * fix * [run-slow] whisper * avoid iterating twice on current_segments * [run-slow] whisper * [run-slow] whisper --------- Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-18 14:13:21 +01:00
Fanli Lin	c7e48053aa	[tests] make cuda-only tests device-agnostic (#35222 ) fix cuda-only tests	2024-12-18 10:14:22 +01:00
Marc Sun	1eee1cedfd	Fix loading with only state dict and low_cpu_mem_usage = True (#35217 ) * fix loading with only state dict and config * style * add tests --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>	2024-12-18 09:54:32 +01:00
Magnus	6eb00dd2f0	Support for SDPA for SAM models (#34110 ) * feat: add support for sdpa and gradient checkpointing * fix: ruff format * fix: config sdpa * fix: sdpa layer naming convention * fix: update test_eager_matches_sdpa_inference to handle vision_hidden_states * test: skip incompatible tests and fix loading issue with sdpa - Updated tests to skip cases flash and dynamic compile. - Minor adjustment to ensure correct loading of model with sdpa for dispatch test. * style: apply Ruff formatting * ruff fix again after rebase * [run-slow] sam * [run-slow] sam * refactor: Address review comments and improve sub-config handling in SAM model tests - Added attributes for sub_configs as per PR #34410. - Enabled tests for configs, ensuring the composite model (SAM) has several sub-configs in the main config. - Added class attribute _is_composite=True to the tester class - test_sdpa_can_dispatch_composite_models added * [run-slow] sam * style: ruff * [run-slow] sam * style: ruff again ... * [run-slow] sam	2024-12-17 14:46:05 +01:00
Omar Salman	747f361da1	Add sdpa for Beit (#34941 ) * Add sdpa for Beit * Updates * [run-slow] beit * Update inference benchmarks * Update * Fix - add missed to super().forward() * Updates * Fix missing import	2024-12-17 14:44:47 +01:00
Tony Wu	f33a0cebb3	Add ColPali to 🤗 transformers (#33736 ) * feat: run `add-new-model-like` * feat: add paligemma code with "copied from" * feat: add ColPaliProcessor * feat: add ColPaliModel * feat: add ColPaliConfig * feat: rename `ColPaliForConditionalGeneration` to `ColPaliModel` * fixup modeling colpali * fix: fix root import shortcuts * fix: fix `modeling_auto` dict * feat: comment out ColPali test file * fix: fix typos from `add-new-model-like` * feat: explicit the forward input args * feat: move everything to `modular_colpali.py` * fix: put back ColPaliProcesor * feat: add auto-generated files * fix: run `fix-copies` * fix: remove DOCStRING constants to make modular converter work * fix: fix typo + modular converter * fix: add missing imports * feat: no more errors when loading ColPaliModel * fix: remove unused args in forward + tweak doc * feat: rename `ColPaliModel` to `ColPaliForRetrieval` * fix: apply `fix-copies` * feat: add ColPaliProcessor to `modular_colpali` * fix: run make quality + make style * fix: remove duplicate line in configuration_auto * feat: make ColPaliModel inehrit from PaliGemmaForConditionalGeneration * fix: tweak and use ColPaliConfig * feat: rename `score` to `post_process_retrieval` * build: run modular formatter + make style * feat: convert colpali weights + fixes * feat: remove old weight converter file * feat: add and validate tests * feat: replace harcoded path to "vidore/colpali-v1.2-hf" in tests * fix: add bfloat16 conversion in weight converter * feat: replace pytest with unittest in modeling colpali test * feat: add sanity check for weight conversion (doesn't work yet) * feat: add shape sanity check in weigth converter * feat: make ColPaliProcessor args explicit * doc: add doc for ColPali * fix: trying to fix output mismatch * feat: tweaks * fix: ColPaliModelOutput inherits from ModelOutput instead of PaliGemmaCausalLMOutputWithPast * fix: address comments on PR * fix: adapt tests to the Hf norm * wip: try things * feat: add `__call__` method to `ColPaliProcessor` * feat: remove need for dummy image in `process_queries` * build: run new modular converter * fix: fix incorrect method override * Fix tests, processing, modular, convert * fix tokenization auto * hotfix: manually fix processor -> fixme once convert modular is fixed * fix: convert weights working * feat: rename and improve convert weight script * feat: tweaks * fest: remove `device` input for `post_process_retrieval` * refactor: remove unused `get_torch_device` * Fix all tests * docs: update ColPali model doc * wip: fix convert weights to hf * fix logging modular * docs: add acknowledgements in model doc * docs: add missing docstring to ColPaliProcessor * docs: tweak * docs: add doc for `ColPaliForRetrievalOutput.forward` * feat: add modifications from colpali-engine v0.3.2 in ColPaliProcessor * fix: fix and upload colapli hf weights * refactor: rename `post_process_retrieval` to `score_retrieval` * fix: fix wrong typing for `score_retrieval` * test: add integration test for ColPali * chore: rerun convert modular * build: fix root imports * Update docs/source/en/index.md Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * fix: address PR comments * wip: reduce the prediction gap in weight conversion * docs: add comment in weight conversion script * docs: add example for `ColPaliForRetrieval.forward` * tests: change dataset path to the new one in hf-internal * fix: colpali weight conversion works * test: add fine-grained check for ColPali integration test * fix: fix typos in convert weight script * docs: move input docstring in a variable * fix: remove hardcoded torch device in test * fix: run the new modular refactor * docs: fix python example for ColPali * feat: add option to choose `score_retrieval`'s output dtype and device * docs: update doc for `score_retrieval` * feat: add `patch_size` property in ColPali model * chore: run `make fix-copies` * docs: update description for ColPali cookbooks * fix: remove `ignore_index` methods * feat: remove non-transformers specific methods * feat: update `__init__.py` to new hf format * fix: fix root imports in transformers * feat: remove ColPali's inheritance from PaliGemma * Fix CI issues * nit remove prints * feat: remove ColPali config and model from `modular_colpali.py` * feat: add `ColPaliPreTrainedModel` and update modeling and configuration code * fix: fix auto-removed imports in root `__init__.py` * fix: various fixes * fix: fix `_init_weight` * temp: comment `AutoModel.from_config` for experiments * fix: add missing `output_attentions` arg in ColPali's forward * fix: fix `resize_token_embeddings` * fix: make `input_ids` optional in forward * feat: rename `projection_layer` to `embedding_proj_layer` * wip: fix convert colpali weight script * fix tests and convert weights from original repo * fix unprotected import * fix unprotected torch import * fix style * change vlm_backbone_config to vlm_config * fix unprotected import in modular this time * fix: load config from Hub + tweaks in convert weight script * docs: move example usage from model docstring to model markdown * docs: fix input docstring for ColPali's forward method * fix: use `sub_configs` for ColPaliConfig * fix: remove non-needed sanity checks in weight conversion script + tweaks * fix: fix issue with `replace_return_docstrings` in ColPali's `forward` * docs: update docstring for `ColPaliConfig` * test: change model path in ColPali test * fix: fix ColPaliConfig * fix: fix weight conversion script * test: fix expected weights for ColPali model * docs: update ColPali markdown * docs: fix minor typo in ColPaliProcessor * Fix tests and add _no_split_modules * add text_config to colpali config * [run slow] colpali * move inputs to torch_device in integration test * skip test_model_parallelism * docs: clarify quickstart snippet in ColPali's model card * docs: update ColPali's model card --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2024-12-17 11:26:43 +01:00
Mohamed Mekkouri	85eb339231	Fix : model used to test ggml conversion of Falcon-7b is incorrect (#35083 ) fixing test model	2024-12-16 13:21:44 +01:00
Yoni Gozlan	5615a39369	Fall back to slow image processor in ImageProcessingAuto when no fast processor available (#34785 ) * refactor image_processing_auto logic * fix fast image processor tests * Fix tests fast vit image processor * Add safeguard when use_fast True and torchvision not available * change default use_fast back to None, add warnings * remove debugging print * call get_image_processor_class_from_name once	2024-12-15 14:00:36 -05:00
Fanli Lin	bdd4201fdb	[tests] fix "Tester object has no attribute '_testMethodName'" (#34910 ) * add more cases * fix method not found in unittest Signed-off-by: Lin, Fanli <fanli.lin@intel.com> * fix more cases * add more models * add all * no unittest.case * remove for oneformer * fix style --------- Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2024-12-13 14:33:45 +01:00
nhamanasu	3d213b57fe	skip Fuyu from test_generate (#35246 ) * skip Fuyu from test_generate * make fixup, quality, repo-consistency	2024-12-13 10:12:49 +01:00
alexrs-cohere	64478c7631	Add Cohere2 model (#35224 )	2024-12-13 09:35:50 +01:00
George	e4e404fdd0	Run model as compressed/uncompressed mode (#34719 ) * draft, run model as compreszed/uncompressed mode * draft * run run_compressed=False * run_compressed as attr * set run_compressed=False using quantization_config * remove redundant line * make is_qat_trainable dependent on run_compressed status * add tests * lint * full in docstring * add decompress * comments * decompress if model is compresssed and not run_compressed * apply_quant_config logic fix -- populate statedict properly * comments * remove non compressed model * make is_compressed as property * cosmetic * run apply_quant_config for non-compressed models -- popualte scales and zeropoints * add pahtway for decompressing sparse models * typo on is_quantization_compressed * lint * fix typo	2024-12-13 08:23:31 +01:00
Nadav Timor	e3ee49fcfb	Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability (#35009 ) * move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file * refactor * NOTHING. add space to rerun github actions tests * remove it... * NOTHING. add space to rerun github actions tests * remove it... * replace: `self.prev_tokens` -> `self.prev_assistant_ids` * NOTHING. rerun CI tests * remove it * introduce `self.prev_target_ids_len` * fix style * fix style --------- Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com>	2024-12-12 15:47:05 +01:00
Yoach Lacombe	6181c6b095	Fix seamless TTS generate (#34968 ) * fix seamless tts generate * apply same fix for v2 * [run-slow] seamless_m4t, seamless_m4t_v2 * remove TODO * [run-slow] seamless_m4t, seamless_m4t_v2 * [run-slow] seamless_m4t, seamless_m4t_v2 * ignore failing test on multigpus * [run-slow] seamless_m4t, seamless_m4t_v2 * [run-slow] seamless_m4t, seamless_m4t_v2	2024-12-11 15:38:42 +01:00
Pavel Iakubovskii	5fcf6286bf	Add TimmWrapper (#34564 ) * Add files * Init * Add TimmWrapperModel * Fix up * Some fixes * Fix up * Remove old file * Sort out import orders * Fix some model loading * Compatible with pipeline and trainer * Fix up * Delete test_timm_model_1/config.json * Remove accidentally commited files * Delete src/transformers/models/modeling_timm_wrapper.py * Remove empty imports; fix transformations applied * Tidy up * Add image classifcation model to special cases * Create pretrained model; enable device_map='auto' * Enable most tests; fix init order * Sort imports * [run-slow] timm_wrapper * Pass num_classes into timm.create_model * Remove train transforms from image processor * Update timm creation with pretrained=False * Fix gamma/beta issue for timm models * Fixing gamma and beta renaming for timm models * Simplify config and model creation * Remove attn_implementation diff * Fixup * Docstrings * Fix warning msg text according to test case * Fix device_map auto * Set dtype and device for pixel_values in forward * Enable output hidden states * Enable tests for hidden_states and model parallel * Remove default scriptable arg * Refactor inner model * Update timm version * Fix _find_mismatched_keys function * Change inheritance for Classification model (fix weights loading with device_map) * Minor bugfix * Disable save pretrained for image processor * Rename hook method for loaded keys correction * Rename state dict keys on save, remove `timm_model` prefix, make checkpoint compatible with `timm` * Managing num_labels <-> num_classes attributes * Enable loading checkpoints in Trainer to resume training * Update error message for output_hidden_states * Add output hidden states test * Decouple base and classification models * Add more test cases * Add save-load-to-timm test * Fix test name * Fixup * Add do_pooling * Add test for do_pooling * Fix doc * Add tests for TimmWrapperModel * Add validation for `num_classes=0` in timm config + test for DINO checkpoint * Adjust atol for test * Fix docs * dev-ci * dev-ci * Add tests for image processor * Update docs * Update init to new format * Update docs in configuration * Fix some docs in image processor * Improve docs for modeling * fix for is_timm_checkpoint * Update code examples * Fix header * Fix typehint * Increase tolerance a bit * Fix Path * Fixing model parallel tests * Disable "parallel" tests * Add comment for metadata * Refactor AutoImageProcessor for timm wrapper loading * Remove custom test_model_outputs_equivalence * Add require_timm decorator * Fix comment * Make image processor work with older timm versions and tensor input * Save config instead of whole model in image processor tests * Add docstring for `image_processor_filename` * Sanitize kwargs for timm image processor * Fix doc style * Update check for tensor input * Update normalize * Remove _load_timm_model function --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>	2024-12-11 12:40:30 +00:00
Benjamin Bossan	bcc50cc7ce	[PEFT] Better Trainer error when prompt learning with loading best model at the end (#35087 ) Original issue: https://github.com/huggingface/peft/issues/2256 There is a potential error when using load_best_model_at_end=True with a prompt learning PEFT method. This is because Trainer uses load_adapter under the hood but with some prompt learning methods, there is an optimization on the saved model to remove parameters that are not required for inference, which in turn requires a change to the model architecture. This is why load_adapter will fail in such cases and users should instead set load_best_model_at_end=False and use PeftModel.from_pretrained. As this is not obvious, we now intercept the error and add a helpful error message.	2024-12-11 12:44:39 +01:00
Cyril Vallez	d363e71d0e	🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#34858 ) * update * style * fix missing args * remove last trace of old rope classes * remove deprecated copied from * fix copies * trigger CIs * post rebase clean-up * reverse mistral * cleanup after dropping commits * Add comment	2024-12-11 11:16:52 +01:00
Gallil Maimon	6acb4e43a7	Support BatchNorm in Hubert pos_conv_emb as in fairseq (#34389 ) * Support BatchNorm in Hubert pos_conv_emb as in fairseq * Correct the new defaults (#34377) * Correct the new defaults * CIs * add check * Update utils.py * Update utils.py * Add the max_length in generate test checking shape without passing length * style * CIs * fix fx CI issue * [auto. ping] Avoid sending empty info + add more team members (#34383) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Fix glm (#34388) * Fix duplicated * fix import * Use non nested images and batched text Idefics2/3 (#34222) * add support for non nested images and add tests * add tests error scenario * fix style * added single and no image to error tests * Fix onnx non-expotable inplace aten op (#34376) * fix onnx non-expotable inplace op * mistral, qwen2, qwen2_vl, starcoder2 * fixup copies * Fix right padding in LLaVA models (#34305) * fix right pad llavas * device mismatch * no filter (#34391) * no filter * no filter * no filter --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * SynthID: better example (#34372) * better example * Update src/transformers/generation/configuration_utils.py * Update src/transformers/generation/logits_process.py * nits * Tests: upgrade `test_eager_matches_sdpa_generate` (#34386) * Fix bnb training test failure (#34414) * Fix bnb training test: compatibility with OPTSdpaAttention * Avoid check expected exception when it is on CUDA (#34408) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Fix typos in agents_advanced.md (#34405) * [docs] Cache implementations (#34325) cache * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq Add conversion integration test, and make batchnorm explicit variable * Support BatchNorm in Hubert pos_conv_emb as in fairseq fix make fixup styling changes * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq Add conversion integration test, and make batchnorm explicit variable * Support BatchNorm in Hubert pos_conv_emb as in fairseq fix make fixup styling changes * [run-slow] hubert * [run-slow] hubert --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan@huggingface.co> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-12-10 14:18:23 +01:00
Matthew Douglas	34f4080ff5	[CI] Fix bnb quantization tests with accelerate>=1.2.0 (#35172 )	2024-12-09 13:55:16 -05:00
Mohamed Mekkouri	7238387f67	Fix typo in EETQ Tests (#35160 ) fix	2024-12-09 14:13:36 +01:00
kang sheng	1ccca8f48c	Fix GA loss bugs and add unit test (#35121 ) * fix GA bugs and add unit test * narrow down model loss unit test diff gap * format code to make ruff happy * send num_items_in_batch argument to decoder * fix GA loss bug in BertLMHeadModel * use TinyStories-33M to narrow down diff gap * fotmat code * missing .config * avoid add extra args --------- Co-authored-by: kangsheng <kangsheng@meituan.com>	2024-12-09 09:57:41 +01:00
Pavel Iakubovskii	c8c8dffbe4	Update I-JEPA checkpoints path (#35120 ) Update checkpoints path	2024-12-06 13:42:51 +00:00
Aymeric Roucher	9ad4c93536	Add Aria (#34157 ) * Add Aria --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-06 12:17:34 +01:00
Pablo Montalvo	a5bb528471	Fix signatures for processing kwargs (#35105 ) * add conversion script * remove pg2 refs * fixup style * small update * get correct scaling * add back missing bos * fix missing config keys * might revert this pos_embeddings * fixup 9b config * fix 9b * fixup 9b conversion for good + add back num_hidden_layers * add correct query scaling for 2b, 9b, 27b * fixup 27b conversion * Additional variant: 27b-896 * Use CPU for conversion to reduce GPU RAM requirements * fix causal mask generation + formatting * fix in-training causal mask generation edge case * trigger CI * update config * update config * update config * update config * update config * update config * update config * update config * update config * move conversion file to main model dir * handle multi-images + bos token * address comments for input ids * revert ci fixes * [run-slow] paligemma * fix * [run-slow] paligemma * skip end 2 end * [run-slow] paligemma --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-05 18:15:48 +01:00
Jonathan Mamou	e27465c801	Adaptive dynamic number of speculative tokens (#34156 ) * initial commit * update strategy * add tradeoff FPR TPR with cost * all probs * fix * fix * fix style * Update src/transformers/generation/configuration_utils.py shorter docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * import guard * fix style * add is_sklearn_available condition * vectorizing to flatten the for-loop * fix style * disable adaptation for UAG * update doc * add TestAssistedCandidateGeneratorUpdateStrategy * fix style * protect import * fix style --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-12-05 17:07:33 +01:00
Yih-Dar	b0a51e5cff	Fix flaky Hub CI (`test_trainer.py`) (#35062 ) * fix * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * fix * fix * fix * fix * fix * fix * fix * fix * check * check * check * check * check * check * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * check * check * check * Final space * Final adjustment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>	2024-12-05 17:02:27 +01:00
João Marcelo	50189e36a6	Add I-JEPA (#33125 ) * first draft * add IJepaEmbeddings class * fix copy-from for IJepa model * add weight conversion script * update attention class names in IJepa model * style changes * Add push_to_hub option to convert_ijepa_checkpoint function * add initial tests for I-JEPA * minor style changes to conversion script * make fixup related * rename conversion script * Add I-JEPA to sdpa docs * minor fixes * adjust conversion script * update conversion script * adjust sdpa docs * [run_slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * formatting issues * adjust modeling to modular code * add IJepaModel to objects to ignore in docstring checks * [run-slow] ijepa * fix formatting issues * add usage instruction snippet to docs * change pos encoding, add checkpoint for doc * add verify logits for all models * [run-slow] ijepa * update docs to include image feature extraction instructions * remove pooling layer from IJepaModel in image classification class * [run-slow] ijepa * remove pooling layer from IJepaModel constructor * update docs * [run-slow] ijepa * [run-slow] ijepa * small changes * [run-slow] ijepa * style adjustments * update copyright in init file * adjust modular ijepa * [run-slow] ijepa	2024-12-05 16:14:46 +01:00
eustlb	54aae121eb	[Whisper] Fix whisper tokenizer (#34537 ) * handle single timestamp ending * include last timestamp token * handle single timestamp ending * avoid floating points arithm limitations * ensure float64 operations * new test * make fixup * make copies * handle edge case double tokens ending with different tokens * handle single timestamp ending * make fixup * handle conditioning on prev segments * fix * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * [run-slow] whisper * don't call item() to avoid unnecessary sync * fix --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co>	2024-12-05 13:46:29 +01:00
Anton Vlasjuk	46df859975	[`GPTNeoX`] Flex Attention + Refactor (#34896 ) * gpt neox flex attention + refactor * some formatting * small fix on dropout * add assertion on flex attn test * flaky ci :( * add head mask support * style * handle dtype, replace torch where * fixup flex with output attns * code review and several other fixes * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style * remove unnecessary comment * remove incorrect comment * make flex attn check more agnostic tor versions and centralized * change peft input dtype check to value since q and k could be affected by other stuff like RoPE * i forgor * flaky * code review and small fixes * Update src/transformers/models/gpt_neox/modeling_gpt_neox.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-04 14:48:28 +01:00
Wang, Yi	125de41643	fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… (#34454 ) * fix speecht5 failure issue in test_peft_gradient_checkpointing_enable_disable Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * [run-slow] speecht5 --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com> Co-authored-by: Matt <rocketknight1@gmail.com>	2024-12-03 13:58:54 +00:00
Aymeric Roucher	901f504580	Add token cost + runtime monitoring to Agent and HfEngine children (#34548 ) * Add monitoring to Agent and HfEngine children	2024-12-03 13:14:52 +01:00
Dmitry Rogozhkin	31830474bf	Fix `test_eager_matches_sdpa_inference` for `XPU` backend (#34889 ) * Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> * Fix test_eager_matches_sdpa_inference for XPU backend As of PyTorch 2.5 XPU backend supports only torch.nn.attention.SDPBackend.MATH which is implemented on PyTorch level using aten operators and is device agnostic with respect to implementation of each aten operator. Thus, we can reuse CUDA (or CPU) MATH weights for XPU. Fixes: #34888 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> * Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> --------- Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2024-12-02 16:21:04 +01:00
Tibor Reiss	89d7bf584f	🚨🚨🚨 Uniformize kwargs for TrOCR Processor (#34587 ) * Make kwargs uniform for TrOCR * Add tests * Put back current_processor * Remove args * Add todo comment * Code review - breaking change	2024-11-29 11:58:11 +00:00
Michael Goin	9d6f0ddcec	Add optimized `PixtralImageProcessorFast` (#34836 ) * Add optimized PixtralImageProcessorFast * make style * Add dummy_vision_object * Review comments * Format * Fix dummy * Format * np.ceil for math.ceil	2024-11-28 16:04:05 +01:00
Raushan Turganbay	5e8c1d713d	Offloaded cache: fix generate (#34921 ) * fix cache impl * require_torch_gpu * fix mamba * fix copies	2024-11-28 15:05:56 +01:00
xinpengzz	44af935ec5	Refine the code of Universal Assisted Generation (#34823 ) * removed the useless attritbutes * add configs for window size * fixed the wrong kwargs * added docstring	2024-11-28 15:04:24 +01:00
Benjamin Bossan	f4b674f269	[PEFT] Set eval mode when loading PEFT adapter (#34509 ) * [PEFT] Set eval mode when loading PEFT adapter Resolves #34469 When calling model.load_adapter to load a PEFT adapter, by default the adapter should be set to eval mode. This is now correctly done. Users can still pass is_trainable=True to load the adapter in training mode. * Linter	2024-11-28 13:56:25 +01:00
Arthur	4c1388f48e	[`FlexAttention`] Update gemma2 (#34942 ) * update tests * now maybe this fixes the previous fialing tests! * nit default * Update src/transformers/models/gemma2/modular_gemma2.py Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * fix-copies --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2024-11-27 11:50:48 +01:00
Matt	d5cf91b346	Separate chat templates into a single file (#33957 ) * Initial draft * Add .jinja file loading for processors * Add processor saving of naked chat template files * make fixup * Add save-load test for tokenizers * Add save-load test for tokenizers * stash commit * Try popping the file * make fixup * Pop the arg correctly * Pop the arg correctly * Add processor test * Fix processor code * stash commit * Processor clobbers child tokenizer's chat template * Processor clobbers child tokenizer's chat template * make fixup * Split processor/tokenizer files to avoid interactions * fix test * Expand processor tests * Rename arg to "save_raw_chat_template" across all classes * Update processor warning * Move templates to single file * Move templates to single file * Improve testing for processor/tokenizer clashes * Improve testing for processor/tokenizer clashes * Extend saving test * Test file priority correctly * make fixup * Don't pop the chat template file before the slow tokenizer gets a look * Remove breakpoint * make fixup * Fix error	2024-11-26 14:18:04 +00:00
eustlb	4d1d0f29a4	[Whisper] Fix whisper integration tests (#34111 ) * fix test_tiny_timestamp_generation * fix test_large_timestamp_generation * fix test_whisper_shortform_single_batch_prev_cond * fix test_whisper_shortform_multi_batch_hard_prev_cond * return_timestamps necessary with long form * fix test_default_multilingual_transcription_long_form * fix test_tiny_token_timestamp_generation_longform * fix test_whisper_longform_multi_batch_hard * Update tests/models/whisper/test_modeling_whisper.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * fix typo * do not expect special tokens * fix test_whisper_longform_single_batch_beam * fix test_whisper_longform_multi_batch_hard_prev_cond * update test_whisper_longform_multi_batch_hard_prev_cond * update test_whisper_longform_multi_batch_hard_prev_cond * these tests does not make sense anymore * this test does not make sense anymore * make fixup * suggested nits * add test with forced_decoder_ids * this test does not make sense anymore * change assert for unittest test cases * make fixup * test with prompt_ids and task and language * fix unittest test case call * fix test_tiny_generation * fix test_tiny_en_generation * fix test_tiny_en_batched_generation * fix test_tiny_longform_timestamps_generation * fix test_tiny_timestamp_generation * fix test_large_generation * fix test_large_batched_generation * fix test_large_generation_multilingual * fix test_large_timestamp_generation * fix test_large_timestamp_generation * fix test_tiny_token_timestamp_generation_longform * fix test_tiny_en_batched_generation * make fixup * [run-slow] whisper --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-11-26 12:23:08 +01:00
Mohamed Mekkouri	0e805e6d1e	Skipping aqlm non working inference tests till fix merged (#34865 )	2024-11-26 11:09:30 +01:00
Mohamed Mekkouri	890ea7de93	Fix failling GGML test (#34871 ) fix_test	2024-11-25 18:04:52 +01:00
Yih-Dar	a830df2909	Fix `test_auto_backbone_timm_model_from_pretrained` (#34877 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-11-25 17:20:41 +01:00
jiqing-feng	a464afbe2a	fix static cache data type miss-match (#34799 ) * fix gptj data type missmatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add low precision static cache tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix low-precision static cache tests * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * avoid config change Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * change data type convert in cache copy Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix comment Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * cast key value after k v out Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2024-11-25 16:59:38 +01:00
Mohamed Mekkouri	4e6b19cd95	Fix : BitNet tests (#34895 ) * fix_tests_bitnet * fix format	2024-11-25 16:47:14 +01:00
Shane A	9121ab8fe8	Rename OLMo November to OLMo2 (#34864 ) * Rename/move OLMo Nov files to OLMo2 * Rename Olmo1124 and its variants to Olmo2	2024-11-25 16:31:22 +01:00
Jacky Lee	f4c04ba32b	Fix Qwen2 failing tests (#34819 ) * fix: qwen2 model ids * fix: line * fix: more format * update: reformat	2024-11-25 15:53:04 +01:00
VictorAtIfInsurance	a0f4f3174f	allow unused input parameters passthrough when chunking in asr pipelines (#33889 ) * allow unused parameter passthrough when chunking in asr pipelines * format code * format * run fixup * update tests * update parameters to pipline in test * updates parametrs in tests * change spelling in gitignore * revert .gitignore to main * add git ignore of devcontainer folder * assert asr output follows expected inference output type * run fixup * Remove .devcontainer from .gitignore * remove compliance check	2024-11-25 11:36:44 +01:00
Arthur	857d46ca0c	[`Deberta/Deberta-v2`] Refactor code base to support compile, export, and fix LLM (#22105 ) * some modification for roadmap * revert some changes * yups * weird * make it work * sttling * fix-copies * fixup * renaming * more fix-copies * move stuff around * remove torch script warnings * ignore copies * revert bad changes * woops * just styling * nit * revert * style fixup * nits configuration style * fixup * nits * will this fix the tf pt issue? * style * ??????? * update * eval? * update error message * updates * style * grumble grumble * update * style * nit * skip torch fx tests that were failing * style * skip the failing tests * skip another test and make style	2024-11-25 10:43:16 +01:00
Raushan Turganbay	098962dac2	BLIP: fix generation after hub update (#34876 ) * fix blip generation * dont remove it yet * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * modular --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-11-25 10:41:55 +01:00
Raushan Turganbay	c1a8520419	Cache: init empty cache when `use_cache` (#34274 ) * fix * fix tests * fix copies * add docs * Revert "add docs" This reverts commit `32d35634f1`. * qwen move deltas * mllama can potentiall fullgraph compile * enable mllama compile and fix tests * remove mllama fixes	2024-11-25 10:11:33 +01:00
Mohamed Mekkouri	54be2d7ae8	Bitnet test fix to avoid using gated model (#34863 ) small test fix	2024-11-22 17:18:49 +01:00
Nadav Timor	42b36d7395	Speculative decoding: Test the target distribution (to prevent issues like #32867 ) (#34553 ) * Update test_utils.py * formatting * Update test_utils.py * formatting * formatting * Update test_utils.py * formatting * Update test_utils.py * formatting * format * comments at standard positions	2024-11-22 16:02:37 +01:00
farrosalferro	c57eafdaa1	Add Nemotron GGUF Loading Support (#34725 ) * Add Nemotron GGUF Loading Support * fix the Nemotron architecture assignation --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-11-21 11:37:34 +01:00
Raushan Turganbay	28fb02fc05	VLMs: enable generation tests - last batch (#34484 ) * add tests for 3 more vlms * fix fuyu back * skip test	2024-11-21 11:00:22 +01:00
Marc Sun	3cb8676a91	Fix CI by tweaking torchao tests (#34832 )	2024-11-20 20:28:51 +01:00
Marc Sun	67890de3b8	Torchao weights only + prequantized compability (#34355 ) * weights only compability * better tests from code review * ping torch version * add weights_only check	2024-11-20 17:24:45 +01:00
Tibor Reiss	f297af55df	Fix: take into account meta device (#34134 ) * Do not load for meta device * Make some minor improvements * Add test * Update tests/utils/test_modeling_utils.py Update test parameters Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Make the test simpler --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-11-20 11:32:07 +01:00
Phillip Kuznetsov	8cadf76e1c	fix(DPT,Depth-Anything) `torch.export` (#34103 ) * Fix torch.export issue in dpt based models Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Simplify the if statements Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Move activation definitions of zoe_depth to init() Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Add test_export for dpt and zoedepth Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * add depth anything Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Remove zoedepth non-automated zoedepth changes and zoedepth test Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * [run_slow] dpt, depth_anything, zoedepth Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> --------- Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>	2024-11-20 11:31:21 +01:00
Raushan Turganbay	9470d65324	Fix low memory beam search (#34746 ) * fix * higher max positions in tests	2024-11-20 07:46:35 +01:00
Yih-Dar	469eddbe2d	Fix `check_training_gradient_checkpointing` (#34806 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-11-19 17:48:34 +01:00
Yih-Dar	05ebe8b9b0	Run `test_medium_seamless_m4t_pt` in `subprocess` to avoid many failures (#34812 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-11-19 17:32:10 +01:00
Yoni Gozlan	eedc113914	Add Image Processor Fast Deformable DETR (#34353 ) * add deformable detr image processor fast * add fast processor to doc * fix copies * nit docstring * Add tests gpu/cpu and fix docstrings * fix docstring * import changes from detr * fix imports * rebase and fix * fix input data format change in detr and rtdetr fast	2024-11-19 11:18:58 -05:00
Yoni Gozlan	b99ca4d28b	Add support for OpenAI api "image_url" input in chat for image-text-to-text pipeline (#34562 ) * add support for openai api image_url input * change continue to elif * Explicitely add support for OpenAI/TGI chat format * rewrite content to transformers chat format and add tests * Add support for typing of image type in chat templates * add base64 to possible image types * refactor nesting	2024-11-19 11:08:37 -05:00
Phillip Kuznetsov	5fa4f64605	🚨🚨🚨 fix(Mask2Former): torch export 🚨🚨🚨 (#34393 ) * fix(Mask2Former): torch export Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * revert level_start_index and create a level_start_index_list Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Add a comment to explain the level_start_index_list Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Address comment Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * add torch.export.export test Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * rename arg Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * remove spatial_shapes Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * Use the version check from pytorch_utils Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * [run_slow] mask2former Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> --------- Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>	2024-11-19 16:44:53 +01:00
Arthur	4bff54f921	Gemma capping (#34282 ) * softcapping * soft cap before the mask * style * ... * super nit * update * fixes * update * small issue with modular * fix modular imports * update * fixup * simplify a hell lot * simplify cleaning imports * finish fixing * update our design * nits * use a deprecation cycle * updates * Fix modular (recursive deps need to always be computed after merges!) * push * fix * update * fix modular order * make fix-copies * updates * update * ? * don't compile for now * ? * fix some stuff * donc! * fix copies * update * fixup * ? * fix two tests * fix? * for now, don't use head info * eager when output attentoin and sdpa or flash as it's the simplest behaviour (for our tests as well :)) * fix-copies * revert sdpa check * Apply suggestions from code review Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> * rebase, fix-copies and push * add a slow integration test * update the test * fix left padding issue * fix test * remove duplicate scaling * quality * add a small test and make sure it works * 2b --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2024-11-19 13:52:38 +01:00
Arthur	54739a320e	Self-speculation (Layer-Skip Llama) (#34240 ) * 😅 * early exit (#34244) * mvp * docs and tests * a few fixes * no shared cache * Apply suggestions from code review Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org> * docs * make fix-copies * cohere fix * [test all] * [test all] consistent model code copies * [test all] make fix-copies :D * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org> * Update src/transformers/generation/candidate_generator.py * Update src/transformers/generation/configuration_utils.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * [test all] don't use a stand-alone attribute; fix test --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Joao Gante <joao@huggingface.co> Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2024-11-19 12:20:07 +00:00
Jiahao Li	0db91c3c8d	Support gradient checkpointing in Qwen2VL ViT (#34724 ) * Support gradient checkpointing in Qwen2VL ViT * Enable gradient checkpoint tests for Qwen2VL * [run-slow] qwen2_vl	2024-11-19 12:30:44 +01:00
Ke Wen	20142ab542	Simplify Tensor Parallel implementation with PyTorch TP (#34184 ) * Simplify Tensor Parallel implementation with PyTorch TP * Move tp_plan to config * Lint * Format and warning * Disable copy-from check * Conditionally get attr from config * make fix-copies * Move base_model_tp_plan to PretrainedConfig * Move TP into from_pretrained * Add device context for load * Do not serialize * Move _tp_plan setting to post_init * Add has_tp_plan * Add test_tp * Add 'Multi-gpu inference' doc * Add backward support for device type identification * Auto-detect accelerator * supports_tp_plan * copyright year * Fix copy	2024-11-18 19:51:49 +01:00
Dmitry Rogozhkin	1c471fc307	Fix skip of test_training_gradient_checkpointing (#34723 ) `19d58d31f` has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, test body was not moved under "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama this caused attribute errors. Fixes: #34722 Fixes: `19d58d31f` ("Add MLLama (#33703)") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2024-11-18 15:45:40 +01:00
Raushan Turganbay	1646ffb4d1	VLMs: `patch_size` -> `num_image_tokens` in processing (#33424 ) * use num additional tokens * fix copies + docs * another fix copies :) * add docs * move order for BC	2024-11-18 13:21:07 +01:00
Shane A	3ee24e2208	Add OLMo November 2024 (#34551 ) * Add model skeletion with transformers-cli add-new-model-like * Convert config to modular, add rms_norm_eps, delete clip_qkv * Convert model to modular, add RMSNorm * Add flash attention with qk norm and no qkv clipping * Add decoder layer with RMSNorm after attention/feedforward layers * Add base and causal model * Add converter improvements from OLMo repo * Update weight loading in OLMo to HF converter * Set correct default for rms_norm_eps * Set correct pipeline_model_mapping in test * Run make fixup * Fix model type * Re-run modular conversion * Manually set config docs to fix build errors * Convert olmo-1124 to olmo_1124 to fix flash attention docs errors * Start updating tests * Update tests * Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124 * Rename input_layernorm and post_attention_layernorm to reflect their ops better * Use correct tokenizer * Remove test unsupported by GPT2 tokenizer * Create GenerationConfig outside of from_pretrained call * Use simpler init file structure * Add explicit __all__ to support simplified init * Make safetensor serialization the default * Update OLMo November 2024 docs	2024-11-18 10:43:10 +01:00
Joao Gante	13493215ab	🧼 remove v4.44 deprecations (#34245 ) * remove v4.44 deprecations * PR comments * deprecations scheduled for v4.50 * hub version update * make fiuxp --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-11-15 23:07:24 +01:00
AbdelKarim ELJANDOUBI	8d50fda644	Remove FSDP wrapping from sub-models. (#34452 ) * Remove FSDP wrapping from sub-models. * solve conflict trainer.py * make fixup * add unit test for fsdp_auto_wrap_policy when using auto_find_batch_size * put back extract_model_from_parallel * use transformers unwrap_model	2024-11-15 23:00:03 +01:00
Wing Lian	b0c0ba7b4d	FSDP grad accum fix (#34645 ) * add gradient accumulation steps tests for fsdp * invert no_sync context to fix training for fsdp	2024-11-15 22:28:06 +01:00
lewtun	8ba3e1505e	Retain newlines in chat template when `continue_final_message=True` (#34253 ) * Retain newlines in chat template when * Add try/except * Add regression test * Simplify test * Apply suggestions from code review Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2024-11-15 14:27:04 +00:00
Aymeric Roucher	33eef99250	Agents: Small fixes in streaming to gradio + add tests (#34549 ) * Better support transformers.agents in gradio: small fixes and additional tests	2024-11-11 20:52:09 +01:00
Isotr0py	e83aaaa86b	Fix `use_parallel_residual` and `qkv_bias` for StableLM GGUF config extraction (#34450 ) * fix stablelm qkv_bias * fix stablelm qkv_bias and use_parallel_residual * remove original_model.config for stablelm gguf test	2024-11-05 18:26:20 +01:00
Yih-Dar	f2d5dfbab2	Remove `@slow` for `test_eager_matches_sdpa_inference` (#34558 ) * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-11-05 16:10:42 +01:00
Yoni Gottesman	082e57e0d4	Fix #34494 assistant tokens when truncated (#34531 ) * Fix assistant tokens when truncated * fix test * fix test * step	2024-11-05 15:10:15 +00:00
Guang Yang	663c851239	DistilBERT is ExecuTorch compatible (#34475 ) * DistillBERT is ExecuTorch compatible * [run_slow] distilbert * [run_slow] distilbert --------- Co-authored-by: Guang Yang <guangyang@fb.com>	2024-11-05 13:41:48 +01:00
Raushan Turganbay	893ad04fad	Load sub-configs from composite configs (#34410 ) * save/load sub-configs * nit forgot these * fix copies * move test to common * use dict for sub-configs * add load-save-laod test * clean up modeling check * oops this are correct keys * fix some tests, missed some composite configs * this model was missed	2024-11-05 11:34:01 +01:00
Benjamin Bossan	5e1fd4e204	FIX: Broken repr of TorchAoConfig (#34560 ) FIX Broken repr of TorchAoConfig The __repr__ method references a non-existent self.kwargs. This is now fixed. There does not appear to be a uniform way of defining __repr__ for quantization configs. I copied the method as implemented for HQQ: `e2ac16b28a/src/transformers/utils/quantization_config.py (L285-L287)`	2024-11-05 10:26:13 +01:00
Joao Gante	34927b0f73	MPS: `isin_mps_friendly` can support 0D tensors (#34538 ) * apply fix * tested * make fixup	2024-11-04 16:18:50 +00:00
Raushan Turganbay	187439c3fa	VLM: special multimodal Tokenizer (#34461 ) * kinda works * update * add tests * update * use special tokens in processors * typo * fix copies * fix * fix moshi after rebase * update * fix tests * update * Update docs/source/en/main_classes/tokenizer.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update docs * test for load time adding tokens * fix some more tests which are now fetched better * one more fix --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-11-04 16:37:51 +01:00
Zach Mueller	ef976a7e18	Update trainer for easier handling of accumulate, compile fixes, and proper reporting (#34511 ) * Update trainer for easier handling of accumulate + proper reporting * test * Fixup tests * Full fix * Fix style * rm comment * Fix tests * Minimize test + remove py 311 check * Unused import * Forward contrib credits from discussions * Fix reported metrics * Refactor, good as it's going to get * rm pad tok id check * object detection and audio are being annoying * Fin * Fin x2 --------- Co-authored-by: Gyanateet Dutta <Ryukijano@users.noreply.github.com>	2024-11-04 07:47:34 -05:00
Raushan Turganbay	4cc0813e28	BLIP: enable generation tests (#34174 ) * blip2 tests * instructblips * copies * fix slow tests * fix * uncomment this * clean up after rebase * should be model main input * fix overwritten tests * oops len should be multiple of frame number * style * fix some tests	2024-11-01 08:54:48 +01:00
Raushan Turganbay	6beb3f1691	Blip: get/set input embeddings correctly (#34152 ) * set-get embeds * add tests * fix tests * remove * return dict True * fix tests * why did i remove this * enabel torchscript tests	2024-11-01 08:39:39 +01:00
NielsRogge	df8640cedb	[CLIPSeg] Make interpolate_pos_encoding default to True (#34419 ) * Remove interpolate_pos_encoding * Make fixup * Make interpolate_pos_encoding default to True * Reuse existing interpolation * Add integration test	2024-10-31 22:15:04 +01:00
Yoni Gozlan	203e27059b	Add image text to text pipeline (#34170 ) * Standardize image-text-to-text-models-output add post_process_image_text_to_text to chameleon and cleanup Fix legacy kwarg behavior and deprecation warning add post_process_image_text_to_text to qwen2_vl and llava_onevision Add post_process_image_text_to_text to idefics3, mllama, pixtral processor * nit var name post_process_image_text_to_text udop * nit fix deprecation warnings * Add image-text-to-text pipeline * add support for image url in chat template for pipeline * Reformat to be fully compatible with chat templates * Add tests chat template * Fix imports and tests * Add pipeline tag * change logic handling of single prompt ans multiple images * add pipeline mapping to models * fix batched inference * fix tests * Add manual batching for preprocessing * Fix outputs with nested images * Add support for all common processing kwargs * Add default padding when multiple text inputs (batch size>1) * nit change version deprecation warning * Add support for text only inference * add chat_template warnings * Add pipeline tests and add copied from post process function * Fix batched pipeline tests * nit * Fix pipeline tests blip2 * remove unnecessary max_new_tokens * revert processing kosmos2 and remove unnecessary max_new_tokens * fix pipeline tests idefics * Force try loading processor if pipeline supports it * revert load_processor change * hardcode loading only processor * remove unnecessary try except * skip imagetexttotext tests for kosmos2 as tiny model causes problems * Make code clearer * Address review comments * remove preprocessing logic from pipeline * fix fuyu * add BC resize fuyu * Move post_process_image_text_to_text to ProcessorMixin * add guard in post_process * fix zero shot object detection pipeline * add support for generator input in pipeline * nit * change default image-text-to-text model to llava onevision * fix owlv2 size dict * Change legacy deprecation warning to only show when True	2024-10-31 15:48:11 -04:00
Yih-Dar	114dd812dd	make `test_eager_matches_sdpa_inference` less flaky (#34512 ) * try * try * try * try * try * try * update * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-31 18:34:00 +01:00
Phillip Kuznetsov	b5919e12f7	fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests (#34518 ) * fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai> * [run_slow] dpt, depth_anything --------- Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>	2024-10-31 16:47:58 +01:00
Joao Gante	4ca004eac6	Qwen2VL: skip base `input_ids`-`inputs_embeds` equivalence check (#34535 ) it has complex inputs_embeds computation	2024-10-31 15:42:13 +00:00
Yih-Dar	ab98f0b0a1	avoid calling `gc.collect` and `cuda.empty_cache` (#34514 ) * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-31 16:36:13 +01:00
jiqing-feng	f38531619d	enable QA bf16 pipeline (#34483 ) * enable QA bf16 pipeline * add tests	2024-10-31 12:55:53 +00:00
Yoni Gozlan	48872fd6ae	Add Image Processor Fast RT-DETR (#34354 ) * add fast image processor rtdetr * add gpu/cpu test and fix docstring * remove prints * add to doc * nit docstring * avoid iterating over images/annotations several times * change torch typing * Add image processor fast documentation	2024-10-30 13:49:47 -04:00
Vladislav Bronzov	5251fe6271	Add GGUF for Mamba (#34200 ) * add mamba architecture for gguf * add logic for weights conversion, some fixes and refactoring * add lm_head layers, unit test refactoring * more fixes for tests * remove lm_head creation * remove unused comments	2024-10-30 16:52:17 +01:00
Pablo Montalvo	241d79026f	fix pixtral processor (#34486 ) * fix pixtral processor * test out full length batches + remove undue ValueError * fix up processing * fix tests * fix * last fixup * style * [run-slow] pixtral * [run-slow] pixtral * fix config key * skip torchscript tests * [run-slow] pixtral * add missing key * [run-slow] pixtral * fix docs * [run-slow] pixtral * fix wrong url for integration test * [run-slow] pixtral * pixtralVisionModel does not have a lm head * [run-slow] pixtral	2024-10-30 14:17:20 +01:00
Joao Gante	8a734ea2c3	Tests: move `generate` tests to the right mixin and delete redundant tests (#34464 ) * tmp commit * tmp commit * cull overwrites of deleted tests * typo * more specific docstring * make fixup * parameterize at the top? * correction * more deletions :D * tmp commit * for VLMs too * fix _check_outputs * test nit * make fixup * fix another flaky * test_generate_from_inputs_embeds -- handle missing attention mask	2024-10-30 10:59:08 +00:00
Raushan Turganbay	913330ca9f	VLMs: fix number of image tokens (#34332 ) * fix * fix tests * add tests * style * style * fix qwen after rebase * fix video llava	2024-10-30 10:21:37 +01:00
Guang Yang	cd277618d4	Roberta is ExecuTorch compatible (#34425 ) * Roberta is ExecuTorch compatible * [run_slow] roberta --------- Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-30 08:36:45 +00:00
Matt	9bee9ff5db	Un-deprecate timeout arg in pipelines (#34382 ) * Un-deprecate timeout * Put "timeout" on the allowed list * make fixup	2024-10-29 18:45:14 +00:00
Guang Yang	f339042b0b	Albert is ExecuTorch compatible (#34476 ) Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-29 16:22:13 +01:00
Guang Yang	34620e8f0a	MobileBERT is ExecuTorch compatible (#34473 ) Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-29 16:14:31 +01:00
Guang Yang	5392f12e16	Bert is ExecuTorch compatible (#34424 ) Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-29 14:30:02 +01:00
Marc Sun	004530aa05	Fix regression loading dtype (#34409 ) * fix regression * add test for torchao * expected output * better fix	2024-10-29 11:41:04 +01:00
Yih-Dar	439334c8fb	Simplify running tests in a subprocess (#34213 ) * check * check * check * check * add docstring --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-29 10:48:57 +01:00
StevenBucaille	a1835195d1	🚨🚨🚨 [SuperPoint] Fix keypoint coordinate output and add post processing (#33200 ) * feat: Added int conversion and unwrapping * test: added tests for post_process_keypoint_detection of SuperPointImageProcessor * docs: changed docs to include post_process_keypoint_detection method and switched from opencv to matplotlib * test: changed test to not depend on SuperPointModel forward * test: added missing require_torch decorator * docs: changed pyplot parameters for the keypoints to be more visible in the example * tests: changed import torch location to make test_flax and test_tf * Revert "tests: changed import torch location to make test_flax and test_tf" This reverts commit `39b32a2f69`. * tests: fixed import * chore: applied suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * tests: fixed import * tests: fixed import (bis) * tests: fixed import (ter) * feat: added choice of type for target_size and changed tests accordingly * docs: updated code snippet to reflect the addition of target size type choice in post process method * tests: fixed imports (...) * tests: fixed imports (...) * style: formatting file * docs: fixed typo from image[0] to image.size[0] * docs: added output image and fixed some tests * Update docs/source/en/model_doc/superpoint.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix: included SuperPointKeypointDescriptionOutput in TYPE_CHECKING if statement and changed tests results to reflect changes to SuperPoint from absolute keypoints coordinates to relative * docs: changed SuperPoint's docs to print output instead of just accessing * style: applied make style * docs: added missing output type and precision in docstring of post_process_keypoint_detection * perf: deleted loop to perform keypoint conversion in one statement * fix: moved keypoint conversion at the end of model forward * docs: changed SuperPointInterestPointDecoder to SuperPointKeypointDecoder class name and added relative (x, y) coordinates information to its method * fix: changed type hint * refactor: removed unnecessary brackets * revert: SuperPointKeypointDecoder to SuperPointInterestPointDecoder * Update docs/source/en/model_doc/superpoint.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2024-10-29 09:36:03 +00:00
kang sheng	655bec2da7	use a tinymodel to test generation config which aviod timeout (#34482 ) * use a tinymodel to test generation config which aviod timeout * remove tailing whitespace	2024-10-29 09:39:06 +01:00
Raushan Turganbay	63ca6d9771	Fix CI (#34458 ) * fix * fix mistral	2024-10-29 08:26:04 +01:00
Raushan Turganbay	808d6c50f8	Generation: fix test (#34369 ) * fix test * fix copies	2024-10-29 07:57:10 +01:00
Alexandros Benetatos	a769ed45e1	Add `post_process_depth_estimation` for GLPN (#34413 ) * add depth postprocessing for GLPN * remove previous temp fix for glpn tests * Style changes for GLPN's `post_process_depth_estimation` Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * additional style fix --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-10-28 19:44:20 +01:00
Sean (Seok-Won) Yi	c1753436db	New option called `"best"` for `args.save_strategy`. (#31817 ) * Add _determine_best_metric and new saving logic. 1. Logic to determine the best logic was separated out from `_save_checkpoint`. 2. In `_maybe_log_save_evaluate`, whether or not a new best metric was achieved is determined after each evaluation, and if the save strategy is "best' then the TrainerControl is updated accordingly. * Added SaveStrategy. Same as IntervalStrategy, but with a new attribute called BEST. * IntervalStrategy -> SaveStrategy * IntervalStratgy -> SaveStrategy for save_strat. * Interval -> Save in docstring. * Updated docstring for save_strategy. * Added SaveStrategy and made according changes. `save_strategy` previously followed `IntervalStrategy` but now follows `SaveStrategy`. Changes were made accordingly to the code and the docstring. * Changes from `make fixup`. * Removed redundant metrics argument. * Added new test_save_best_checkpoint test. 1. Checks for both cases where `metric_for_best_model` is explicitly provided and when it's not provided. 2. The first case should have two checkpoints saved, whereas the second should have three saved. * Changed should_training_end saving logic. The Trainer saves a checkpoints at the end of training by default as long as `save_strategy != SaveStrategy.NO`. This condition was modified to include `SaveStrategy.BEST` because it would be counterintuitive that we'd only want the best checkpoint to be saved but the last one is as well. * `args.metric_for_best_model` default to loss. * Undo metric_for_best_model update. * Remove checking metric_for_best_model. * Added test cases for loss and no metric. * Added error for metric and changed default best_metric. * Removed unused import. * `new_best_metric` -> `is_new_best_metric` Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Applied `is_new_best_metric` to all. Changes were made for consistency and also to fix a potential bug. --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-10-28 16:02:22 +01:00
AbdelKarim ELJANDOUBI	8b3b9b48fc	exclude fsdp from delay_optimizer_creation (#34140 ) * exclude fsdp from delay_optimizer_creation * add test case for trainer: FSDP mode and fp8 as mixed precision * rearrange imports * ruff formatted * adapt _init_fsdp to fp8 * use _init_fsdp only when resume_from_checkpoint * In case of FDP, self.layer will be CheckpointWrapper which has no len() method * delete _init_fsdp * solve conflict * fix conflict * make fixup	2024-10-28 13:50:16 +01:00
Ilyas Moutawwakil	fddbd3c13c	Fix pix2struct (#34374 ) * fix * fix and test use_cache test * style * remove atol	2024-10-28 11:24:56 +01:00
Yih-Dar	f73f5e62e2	Avoid check expected exception when it is on CUDA (#34408 ) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-25 17:14:07 +02:00
Matthew Douglas	e447185b1f	Fix bnb training test failure (#34414 ) * Fix bnb training test: compatibility with OPTSdpaAttention	2024-10-25 10:23:20 -04:00
Joao Gante	186b8dc190	Tests: upgrade `test_eager_matches_sdpa_generate` (#34386 )	2024-10-25 11:55:07 +01:00
Yoni Gozlan	940a6bd343	Use non nested images and batched text Idefics2/3 (#34222 ) * add support for non nested images and add tests * add tests error scenario * fix style * added single and no image to error tests	2024-10-24 20:00:13 -04:00
Cyril Vallez	4c6e0c9252	Correct the new defaults (#34377 ) * Correct the new defaults * CIs * add check * Update utils.py * Update utils.py * Add the max_length in generate test checking shape without passing length * style * CIs * fix fx CI issue	2024-10-24 18:42:03 +02:00
Michael Benayoun	1c5918d910	Fix `torch.fx` issue related to the new `loss_kwargs` keyword argument (#34380 ) * Fix FX * Unskip tests	2024-10-24 18:34:28 +02:00
Benjamin Bossan	d9989e0b9a	[PEFT] Add warning for missing key in LoRA adapter (#34068 ) When loading a LoRA adapter, so far, there was only a warning when there were unexpected keys in the checkpoint. Now, there is also a warning when there are missing keys. This change is consistent with https://github.com/huggingface/peft/pull/2118 in PEFT and the planned PR https://github.com/huggingface/diffusers/pull/9622 in diffusers. Apart from this change, the error message for unexpected keys was slightly altered for consistency (it should be more readable now). Also, besides adding a test for the missing keys warning, a test for unexpected keys warning was also added, as it was missing so far.	2024-10-24 17:56:40 +02:00
김준재	dd267fca72	Add T5 GGUF loading support (#33389 ) * add: GGUFT5Converter * add: tensormapping for t5 * add: test code for t5 * fix: Remove whitespace from blank line * add: t5 fp16 tests * fix: whitespace formatting * fix: minor formatting * fix: testing every weights	2024-10-24 15:10:59 +02:00
Raushan Turganbay	b29c24ff1e	CI: fix failures (#34371 ) fix	2024-10-24 13:44:53 +02:00
Joao Gante	b0f0c61899	Add SynthID (watermerking by Google DeepMind) (#34350 ) * Add SynthIDTextWatermarkLogitsProcessor * esolving comments. * Resolving comments. * esolving commits, * Improving SynthIDWatermark tests. * switch to PT version * detector as pretrained model + style * update training + style * rebase * Update logits_process.py * Improving SynthIDWatermark tests. * Shift detector training to wikitext negatives and stabilize with lower learning rate. * Clean up. * in for 7B * cleanup * upport python 3.8. * README and final cleanup. * HF Hub upload and initiaze. * Update requirements for synthid_text. * Adding SynthIDTextWatermarkDetector. * Detector testing. * Documentation changes. * Copyrights fix. * Fix detector api. * ironing out errors * ironing out errors * training checks * make fixup and make fix-copies * docstrings and add to docs * copyright * BC * test docstrings * move import * protect type hints * top level imports * watermarking example * direct imports * tpr fpr meaning * process_kwargs * SynthIDTextWatermarkingConfig docstring * assert -> exception * example updates * no immutable dict (cant be serialized) * pack fn * einsum equivalent * import order * fix test on gpu * add detector example --------- Co-authored-by: Sumedh Ghaisas <sumedhg@google.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: sumedhghaisas2 <138781311+sumedhghaisas2@users.noreply.github.com> Co-authored-by: raushan <raushan@huggingface.co>	2024-10-23 21:18:52 +01:00
Yih-Dar	c42b3223db	skip `test_pipeline_depth_estimation` temporarily (#34316 ) skip Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-23 17:27:51 +02:00
Zach Mueller	d9f733625c	Enable Gradient Accumulation fix across all models + trainer fully in forward() (#34283 ) * Enable grad accum fix across all models + trainer fully in forward() * handle peft case * Account for DDP: need to run scale tests * Use accelerator state * Quality * Guard * Experiment w/ only fairseq fix * Fairseq only * Revert multiply_grads fix * Mult by grad accum to fully bring back solution * Style * Good to go now * Skip fx tests for now * Bookmark * Working now	2024-10-23 11:24:57 -04:00
Yoni Gozlan	e7c3fa7f57	Fix continue_final_message for image-text-to-text chat templates (#34236 ) * fix continue_final_message for vlms * Add one test for vlms continue_final_message chat template	2024-10-22 11:57:44 -04:00
Michael Kamerath	eef6b0ba42	Add option for running ffmpeg_microphone_live as a background process (#32838 ) * Add option for running ffmpeg_microphone_live as a background process * Code quality checks for audio_utils * Code clean up for audio_utils * Fixing logic in ffmpeg_microphone calls in audio_utils * Allowing any arbitrary arguments to be passed to ffmpeg_microphone_live * Formatting * Fixing last problems with adding ffmpeg_additional_args * Fixing default arguments and formatting issues * Fixing comments for ffmpeg_additional_args * Adding two shorts tests for ffmpeg_microphone_live * Fixing test bug	2024-10-22 15:56:41 +02:00
Guang Yang	c14ccbcd64	Olmo is ExecuTorch Compatible (#34181 ) Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-22 15:53:01 +02:00
Guang Yang	7a08a772cc	Qwen2.5 is ExecuTorch Compatible (#34102 ) Qwen2 is ExecuTorch Compatible Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-22 15:52:23 +02:00
Alexandros Benetatos	c31a6ff474	Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550 ) * add colorize_depth and matplotlib availability check * add post_process_depth_estimation for zoedepth + tests * add post_process_depth_estimation for DPT + tests * add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth * run `make fixup` * fix import related error on tests * fix more import related errors on test * forgot some `torch` calls in declerations * remove `torch` call in zoedepth tests that caused error * updated docs for depth estimation * small fix for `colorize` input/output types * remove `colorize_depth`, fix various names, remove matplotlib dependency * fix formatting * run fixup * different images for test * update examples in `forward` functions * fixed broken links * fix output types for docs * possible format fix inside `<Tip>` * Readability related updates Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Readability related update * cleanup after merge * refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation` * rewrite dict merging to support python 3.8 --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2024-10-22 15:50:54 +02:00
Matt	681fc43713	Sync video classification pipeline with huggingface_hub spec (#34288 ) * Sync video classification pipeline * Add disclaimer	2024-10-22 13:33:49 +01:00
Raushan Turganbay	73d65e637b	T5 compile compatibilty (#34089 ) * this worked in normal generation, needs more tests * fix almost all tests in t5 * nit * longt5, umt5, mt5 * style * udop, pix2struct * more models * fix some tests * fix onnx tests * tracing tests fixed * compile enabled and tested for t5 models * fix small bug in slow tests * [run-slow] t5 * uncomment * style * update with new generation refactoring * nit * fix copies * this is the fix, had to change t5 to fix copies * update * [run-slow] t5 * [run-slow] t5 * update * add test for encoder only T5 * clean up after rebase * fix pop2piano * add comment * style * fix copies after rebase * fix copies missed this one	2024-10-22 08:23:53 +02:00
Raushan Turganbay	21d5025826	Attn implementation for composite models (#32238 ) * first try * codestyle * idefics2 is happy * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma * fix-copies * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo * blip-2 needs to init vision from config * when was this removed O_o * minor fix * tests * this way? * tests * model-agnostic code * codestyle * add tests for idefics * modify general test for VLMs * no generation test for vlm yet! * no generation test here also * wanr in VIT-SDPA if output attn * add more tests * user can pass dict as attn impl * repo consistency * update * muicgen * no prints * forgot speech enc-dec and clip * how many composite models we have? * musicgen meelody is same as mudicgen * +siglip * fix tests + add some more * remove idefics custom overriden code * make idefics2 automappable * nits * skip tests * doctests * Update src/transformers/models/idefics2/configuration_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/clip/test_modeling_clip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/configuration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * major update, no need for automap * clean up * add FA2 test * more tests * style * skip tests * why did these started failing now? * no attributes for FA2 needed * one tiny test * address comment about FA2 false warning * style * add new models and resolve conflicts * fix copies * let it be this way for now, come back tomorrow to review * some more fixes * update * more updates * update * fix copies * style and tests * another big update * fix tests * fix tests * update * another update * fix tests * fix copies * fix tests --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-10-22 06:54:44 +02:00
Yoni Gozlan	a4122813d1	Add DetrImageProcessorFast (#34063 ) * add fully functionning image_processing_detr_fast * Create tensors on the correct device * fix copies * fix doc * add tests equivalence cpu gpu * fix doc en * add relative imports and copied from * Fix copies and nit	2024-10-21 09:05:05 -04:00
Raushan Turganbay	ca541bd4f4	Generation tests: don't rely on main input name (#34228 ) * don't rely on main input name * update	2024-10-21 10:00:14 +02:00
Cyril Vallez	6604764007	add Glm (#33823 ) * Create modular_glm.py * Update modular_glm.py * Finalize architecture without all attentions * Add all attentions modules * Finalize modular * Update given last version * Last update * Finalize model * Finalize converter * Update convert_glm_weights_to_hf.py * style * style * Create __init__.py * Aff all inits * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Correct the rotary embeddings * Remove apply_residual_connection_post_layernorm (always false) * remove use_rms_norm (always true) * remove past_layer_norm (always true) * Update __init__.py * Update config and license * start adding tests and doc * Add doc + style * Update test_modeling_glm.py * Add dummies * Apply correct modeling * Refactor attention to follow llama * Update __init__.py * Update convert_glm_weights_to_hf.py * Correct bias * remove linear_bias and pdrop (never used) * apply modular * Simplify converter * remove dummies + style * add model_input_names * Add pretraining_tp to config for when eager attention is used * Update modular to remove all pretraining_tp * Update test_modeling_glm.py * Update the __all__ * Update __all__ * Update __init__.py * Update test_modeling_glm.py * add revisions * Add the correct repos and revisions * style * Update __init__.py * update exports * remove import of modular files * style * Apply Llama changes + refine converter * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * Update convert_glm_weights_to_hf.py * style * Use new modular converter * add pretrainedmodel to init * style * Update test_modeling_glm.py * Move config outside modular to please CI about docstrings * Add dummies to please CI * Update glm.md * Update glm.md	2024-10-18 17:41:12 +02:00
Arthur	b54109c746	Fix-red-ci (#34230 ) * fix copies, skip fx for llama * styke * re-fix copies * last? * style	2024-10-17 23:38:35 +02:00
Zach Mueller	6ba31a8a94	Enable users to use their own loss functions + deal with prefetching for grad accum (#34198 ) * bookmark * Bookmark * Bookmark * Actually implement * Pass in kwarg explicitly * Adjust for if we do or don't have labels * Bookmark fix for od * bookmark * Fin * closer * Negate accelerate grad accum div * Fixup not training long enough * Add in compute_loss to take full model output * Document * compute_loss -> compute_loss_fn * Add a test * Refactor * Refactor * Uncomment tests * Update tests/trainer/test_trainer.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-10-17 17:01:56 -04:00
Guang Yang	9470c00042	Llama3 and Llama2 are ExecuTorch compatible (#34101 ) Llama3_1b and Llama2_7b are ExecuTorch compatible Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-17 17:33:19 +02:00
Name	7f5088503f	removes decord (#33987 ) * removes decord dependency optimize np Revert "optimize" This reverts commit faa136b51ec4ec5858e5b0ae40eb7ef89a88b475. helpers as documentation pydoc missing keys * make fixup * require_av --------- Co-authored-by: ad <hi@arnaudiaz.com>	2024-10-17 17:27:34 +02:00
Marc Sun	3f06f95ebe	Revert "Fix FSDP resume Initialization issue" (#34193 ) Revert "Fix FSDP resume Initialization issue (#34032)" This reverts commit `4de1bdbf63`.	2024-10-16 15:25:18 -04:00
alpertunga-bile	98bad9c6d6	[fix] fix token healing tests and usage errors (#33931 ) * auto-gptq requirement is removed & model is changed & tokenizer pad token is assigned * values func is changed with extensions & sequence key value bug is fixed * map key value check is added in ExtensionsTree * empty trimmed_ids bug is fixed * tail_id IndexError is fixed * empty trimmed_ids bug fix is updated for failed test * too much specific case for specific tokenizer is removed * input_ids check is updated * require auto-gptq import is removed * key error check is changed with empty list check * empty input_ids check is added * empty trimmed_ids fix is checked with numel function * usage change comments are added * test changes are commented * comment style and quality bugs are fixed * test comment style and quality bug is fixed	2024-10-16 14:22:55 +02:00
Yoach Lacombe	9ba021ea75	Moshi integration (#33624 ) * clean mimi commit * some nits suggestions from Arthur * make fixup * first moshi WIP * converting weights working + configuration + generation configuration * finalize converting script - still missing tokenizer and FE and processor * fix saving model w/o default config * working generation * use GenerationMixin instead of inheriting * add delay pattern mask * fix right order: moshi codes then user codes * unconditional inputs + generation config * get rid of MoshiGenerationConfig * blank user inputs * update convert script:fix conversion, add tokenizer, feature extractor and bf16 * add and correct Auto classes * update modeling code, configuration and tests * make fixup * fix some copies * WIP: add integration tests * add dummy objects * propose better readiblity and code organisation * update tokenization tests * update docstrigns, eval and modeling * add .md * make fixup * add MoshiForConditionalGeneration to ignore Auto * revert mimi changes * re * further fix * Update moshi.md * correct md formating * move prepare causal mask to class * fix copies * fix depth decoder causal * fix and correct some tests * make style and update .md * correct config checkpoitn * Update tests/models/moshi/test_tokenization_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/models/moshi/test_tokenization_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * make style * Update src/transformers/models/moshi/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * change firm in copyrights * udpate config with nested dict * replace einsum * make style * change split to True * add back splt=False * remove tests in convert * Update tests/models/moshi/test_modeling_moshi.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add default config repo + add model to FA2 docstrings * remove logits float * fix some tokenization tests and ignore some others * make style tokenization tests * update modeling with sliding window + update modeling tests * [run-slow] moshi * remove prepare for generation frol CausalLM * isort * remove copied from * ignore offload tests * update causal mask and prepare 4D mask aligned with recent changes * further test refine + add back prepare_inputs_for_generation for depth decoder * correct conditional use of prepare mask * update slow integration tests * fix multi-device forward * remove previous solution to device_map * save_load is flaky * fix generate multi-devices * fix device * move tensor to int --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co>	2024-10-16 11:21:49 +02:00
Raushan Turganbay	d087165db0	IDEFICS: support inputs embeds (#34043 ) * support embeds * use cache from config * style... * fix tests after rebase	2024-10-16 09:25:26 +02:00
laurentd-lunit	0f49deacbf	[feat] LlavaNext add feature size check to avoid CUDA Runtime Error (#33608 ) * [feat] add feature size check to avoid CUDA Runtime Error * [minor] add error handling to all llava models * [minor] avoid nested if else * [minor] add error message to Qwen2-vl and chameleon * [fix] token dimension for check * [minor] add feature dim check for videos too * [fix] dimension check * [fix] test reference values --------- Co-authored-by: Raushan Turganbay <raushan@huggingface.co>	2024-10-15 16:19:18 +02:00
Subhalingam D	5ee9e786d1	Fix default behaviour in TextClassificationPipeline for regression problem type (#34066 ) * update code * update docstrings * update tests	2024-10-15 13:06:20 +01:00
Shikhar Mishra	4de1bdbf63	Fix FSDP resume Initialization issue (#34032 ) * Fix FSDP Initialization for resume training * Added init_fsdp function to work with dummy values * Fix FSDP initialization for resuming training * Added CUDA decorator for tests * Added torch_gpu decorator to FSDP tests * Fixup for failing code quality tests	2024-10-15 13:48:10 +02:00
Prakarsh Kaushik	293e6271c6	Add sdpa for Vivit (#33757 ) * chore:add sdpa to vivit * fix:failing slow test_inference_interpolate_pos_encoding(failing on main branch too) * chore:fix nits * ci:fix repo consistency failure * chore:add info and benchmark to model doc * [run_slow] vivit * chore:revert interpolation test fix for new issue * [run_slow] vivit * [run_slow] vivit * [run_slow] vivit * chore:add fallback for output_attentions being True * [run_slow] vivit * style:make fixup * [run_slow] vivit	2024-10-15 11:27:54 +02:00
Raushan Turganbay	23874f5948	Idefics: enable generation tests (#34062 ) * add idefics * conflicts after merging main * enable tests but need to fix some * fix tests * no print * fix/skip some slow tests * continue not skip * rebasing broken smth, this is the fix	2024-10-15 11:17:14 +02:00
Vladislav Bronzov	cb5ca3265f	Add GGUF for starcoder2 (#34094 ) * add starcoder2 arch support for gguf * fix q6 test	2024-10-14 10:22:49 +02:00
Anton Vlasjuk	7434c0ed21	Mistral-related models for QnA (#34045 ) * mistral qna start * mixtral qna * oops * qwen2 qna * qwen2moe qna * add missing input embed methods * add copied to all methods, can't directly from llama due to the prefix * make top level copied from	2024-10-14 08:53:32 +02:00
Yih-Dar	80bee7b114	Avoid many test failures for `LlavaNextVideoForConditionalGeneration` (#34070 ) * skip * [run-slow] llava_next_video * skip * [run-slow] video_llava, llava_next_video * skip * [run-slow] llava_next_video --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-11 17:41:50 +02:00
Joao Gante	37ac078535	Generate: move `prepare_inputs_for_generation` in encoder-decoder llms (#34048 )	2024-10-11 16:11:18 +01:00
Yih-Dar	7b06473b8f	avoid many failures for ImageGPT (#34071 ) * skip * [run-slow] imagegpt * skip * [run-slow] imagegpt * [run-slow] imagegpt,video_llava * skip * [run-slow] imagegpt,video_llava --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-10-11 15:24:01 +02:00
Lucain	1c66be8062	Fix PushToHubMixin when pusing to a PR revision (#34090 )	2024-10-11 15:06:15 +02:00
Lysandre Debut	409dd2d19c	Fix failing conversion (#34010 ) * Fix * Tests * Typo * Typo	2024-10-11 14:59:23 +02:00
Yoach Lacombe	9dca0c9116	Fix DAC slow tests (#34088 ) * Fix DAC slow tests and fix decode * [run-slow] dac	2024-10-11 14:43:03 +02:00
Joao Gante	e878eaa9fc	Tests: upcast `logits` to `float()` (#34042 ) upcast	2024-10-11 11:51:49 +01:00
Guang Yang	7d97cca8dd	Generate using exported model and enable gemma2-2b in ExecuTorch (#33707 ) * Generate using exported model and enable gemma2-2b in ExecuTorch * [run_slow] gemma, gemma2 * truncate expected output message * Bump required torch version to support gemma2 export * [run_slow] gemma, gemma2 --------- Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-11 10:16:31 +02:00
Matthew Hoffman	70b07d97cf	Default `synced_gpus` to `True` when using `FullyShardedDataParallel` (#33483 ) * Default synced_gpus to True when using FullyShardedDataParallel Fixes #30228 Related: * https://github.com/pytorch/pytorch/issues/100069 * https://github.com/pytorch/pytorch/issues/123962 Similar to DeepSpeed ZeRO Stage 3, when using FSDP with multiple GPUs and differently sized data per rank, the ranks reach different synchronization points at the same time, leading to deadlock To avoid this, we can automatically set synced_gpus to True if we detect that a PreTrainedModel is being managed by FSDP using _is_fsdp_managed_module, which was added in 2.0.0 for torch.compile: https://github.com/pytorch/pytorch/blob/v2.0.0/torch/distributed/fsdp/_dynamo_utils.py * Remove test file * ruff formatting * ruff format * Update copyright year Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add test for FSDP-wrapped model generation Before #33483, these tests would have hung for 10 minutes before crashing due to a timeout error * Ruff format * Move argparse import * Remove barrier I think this might cause more problems if one of the workers was killed * Move import into function to decrease load time https://github.com/huggingface/transformers/pull/33483#discussion_r1787972735 * Add test for accelerate and Trainer https://github.com/huggingface/transformers/pull/33483#discussion_r1790309675 * Refactor imports * Ruff format * Use nullcontext --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-10-10 14:09:04 -04:00
Pavel Iakubovskii	8363fd8346	Update Blip2 `is_pipeline_test_to_skip` method signature (#34067 ) Update method signature	2024-10-10 16:32:08 +01:00
Yoach Lacombe	e7dfb917f8	[TESTS] ASR pipeline (#33925 ) * fix whisper translation * correct slow_unfinished_sequence test * make fixup	2024-10-10 17:31:22 +02:00
Mohamed Abu El-Nasr	4a3f1a686f	check if eigenvalues of covariance matrix are complex. (#34037 ) check if eigenvalues of covariance complex for psd checking	2024-10-10 14:44:05 +02:00
Daniel Korat	fb0c6b521d	Universal Assisted Generation: Assisted generation with any assistant model (by Intel Labs) (#33383 ) * Update candidate_generator.py * Update utils.py * add lookbehind params to _get_candidate_generator * make fixup * add unit tests * fix failing tests * add docstrings * fix docstrings; remove non-optimized AnyTokenizer * added any tokenizer generation correctness test * make fixup * fix assertion syntax * PR review fixes * address additional PR comments * fix tests * remove stropping criteria arg * make fixup * add AssistantConfig * fix prev_tokens branching * pass tokenizers through `generate()`kwargs * fix lookbehind values; tokenizer params WIP * fixup * AssistantConfig * remove AssistantConfig; apply PR suggestions * restructure tests * fixup * fix assistant_tokenizer arg validation * fixup * fix tests in TestAssistedCandidateGeneratorDifferentTokenizers * fix class docstring * PR suggestions * doc * doc update and improvements to `_validate_assistant()` --------- Co-authored-by: mosheber <moshe.berchansky@intel.com>	2024-10-10 14:41:53 +02:00
Matt	f8a260e2a4	Sync QuestionAnsweringPipeline (#34039 ) * Sync QuestionAnsweringPipeline * typo fixes * Update deprecation warnings	2024-10-10 13:38:14 +01:00
Vladislav Bronzov	c9afee5392	Add gguf support for gpt2 (#34044 ) * add gpt2 gguf support * add doc change * small refactoring	2024-10-10 13:42:18 +02:00
Pavel Iakubovskii	66e08dba71	Fix pipelines tests (#34049 ) * Fix wrong skip annotation * Remove error raise	2024-10-10 12:04:06 +01:00
Dani Martí	a84c413773	HfArgumentParser: allow for hyhenated field names in long-options (#33990 ) Allow for hyphenated field names in long-options argparse converts hyphens into underscores before assignment (e.g., an option passed as `--long-option` will be stored under `long_option`), So there is no need to pass options as literal attributes, as in `--long_option` (with an underscore instead of a hyphen). This commit ensures that this behavior is respected by `parse_args_into_dataclasses` as well. Issue: #33933 Co-authored-by: Daniel Marti <mrtidm@amazon.com>	2024-10-10 11:58:26 +02:00
Raushan Turganbay	adea67541a	Phi3: fix attn for sliding window (#33586 ) * fix phi3 attn fir sliding window * fix tests * address most comment * style * update after rebase * add more models * fix tests	2024-10-10 11:50:39 +02:00
Avishai Elmakies	a265600c60	add sdpa to OPT (#33298 ) * add sdpa to OPT * chore: remove redundant whitespace in OPTDecoder class * fixup * bug fix * add sdpa and attention generate test * fixup * Refactor OPTAttention forward method for improved readability and maintainability * undo refactor for _shape and key,val states * add OPT to doc, fixup didn't find it for some reason * change order * change default attn_implemntation in testing to eager * [run-slow] opt * change test_eager_matches_sdpa_generate to the one llama * Update default attention implementation in testing common * [run-slow] opt * remove uneeded print * [run-slow] opt * refactor model testers to have attn_implementation="eager" * [run-slow] opt * convert test_eager_matches_sdpa_generate to opt-350M * bug fix when creating mask for opt * [run-slow] opt * if layer head mask default to eager * if head mask is not none fall to eager * [run-slow] opt * Update src/transformers/models/opt/modeling_opt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Clean up Unpack imports (#33631) clean up Unpack imports * Fix DPT /Dinov2 sdpa regression on main (#33660) * fallback to eager if output attentions. * fix copies * handle dependency errors in check_imports (#33622) * handle dependency errors in check_imports * change log level to warning * add back self.max_position_embeddings = config.max_position_embeddings (#33550) * add back self.max_position_embeddings = config.max_position_embeddings * fix-copies * Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower (#33613) fix llavaqwen2 model conversion * Uniformize kwargs for Udop processor and update docs (#33628) * Add optional kwargs and uniformize udop * cleanup Unpack * nit Udop * Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` (#33203) * Enable BNB multi-backend support (#31098) * enable cpu bnb path * fix style * fix code style * fix 4 bit path * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * add multi backend refactor tests * fix style * tweak 4bit quantizer + fix corresponding tests * tweak 8bit quantizer + try fixing corresponding tests * fix dequant bnb 8bit * account for Intel CPU in variability of expected outputs * enable cpu and xpu device map * further tweaks to account for Intel CPU * fix autocast to work with both cpu + cuda * fix comments * fix comments * switch to testing_utils.torch_device * allow for xpu in multi-gpu tests * fix tests 4bit for CPU NF4 * fix bug with is_torch_xpu_available needing to be called as func * avoid issue where test reports attr err due to other failure * fix formatting * fix typo from resolving of merge conflict * polish based on last PR review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix CI * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix error log * fix error msg * add \n in error log * make quality * rm bnb cuda restriction in doc * cpu model don't need dispatch * fix doc * fix style * check cuda avaliable in testing * fix tests * Update docs/source/en/model_doc/chameleon.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix doc * fix check multibackends * fix import sort * remove check torch in bnb * docs: update bitsandbytes references with multi-backend info * docs: fix small mistakes in bnb paragraph * run formatting * reveret bnb check * move bnb multi-backend check to import_utils * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix bnb check * minor fix for bnb * check lib first * fix code style * Revert "run formatting" This reverts commit `ac108c6d6b`. * fix format * give warning when bnb version is low and no cuda found] * fix device assignment check to be multi-device capable * address akx feedback on get_avlbl_dev fn * revert partially, as we don't want the function that public, as docs would be too much (enforced) --------- Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix error string after refactoring into get_chat_template (#33652) * Fix error string after refactoring into get_chat_template * Take suggestion from CR Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * uniformize git processor (#33668) * uniformize git processor * update doctring * Modular `transformers`: modularity and inheritance for new model additions (#33248) * update exampel * update * push the converted diff files for testing and ci * correct one example * fix class attributes and docstring * nits * oups * fixed config! * update * nitd * class attributes are not matched against the other, this is missing * fixed overwriting self.xxx now onto the attributes I think * partial fix, now order with docstring * fix docstring order? * more fixes * update * fix missing docstrings! * examples don't all work yet * fixup * nit * updated * hick * update * delete * update * update * update * fix * all default * no local import * fix more diff * some fix related to "safe imports" * push fixed * add helper! * style * add a check * all by default * add the * update * FINALLY! * nit * fix config dependencies * man that is it * fix fix * update diffs * fix the last issue * re-default to all * alll the fixes * nice * fix properties vs setter * fixup * updates * update dependencies * make sure to install what needs to be installed * fixup * quick fix for now * fix! * fixup * update * update * updates * whitespaces * nit * fix * simplify everything, and make it file agnostic (should work for image processors) * style * finish fixing all import issues * fixup * empty modeling should not be written! * Add logic to find who depends on what * update * cleanup * update * update gemma to support positions * some small nits * this is the correct docstring for gemma2 * fix merging of docstrings * update * fixup * update * take doc into account * styling * update * fix hidden activation * more fixes * final fixes! * fixup * fixup instruct blip video * update * fix bugs * align gemma2 with the rest as well * updats * revert * update * more reversiom * grind * more * arf * update * order will matter * finish del stuff * update * rename to modular * fixup * nits * update makefile * fixup * update order of the checks! * fix * fix docstring that has a call inside * fiix conversion check * style * add some initial documentation * update * update doc * some fixup * updates * yups * Mostly todo gimme a minut * update * fixup * revert some stuff * Review docs for the modular transformers (#33472) Docs * good update * fixup * mmm current updates lead to this code * okay, this fixes it * cool * fixes * update * nit * updates * nits * fix doc * update * revert bad changes * update * updates * proper update * update * update? * up * update * cool * nits * nits * bon bon * fix * ? * minimise changes * update * update * update * updates? * fixed gemma2 * kind of a hack * nits * update * remove `diffs` in favor of `modular` * fix make fix copies --------- Co-authored-by: Lysandre Debut <hi@lysand.re> * Fix CIs post merging modular transformers (#33681) update * Fixed docstring for cohere model regarding unavailability of prune_he… (#33253) * Fixed docstring for cohere model regarding unavailability of prune_head() methods The docstring mentions that cohere model supports prune_heads() methods. I have fixed the docstring by explicitly mentioning that it doesn't support that functionality. * Update src/transformers/models/cohere/modeling_cohere.py --------- Co-authored-by: Lysandre Debut <hi@lysand.re> * Generation tests: update imagegpt input name, remove unused functions (#33663) * Improve Error Messaging for Flash Attention 2 on CPU (#33655) Update flash-attn error message on CPU Rebased to latest branch * Gemma2: fix config initialization (`cache_implementation`) (#33684) * Fix ByteLevel alphabet missing when Sequence pretokenizer is used (#33556) * Fix ByteLevel alphabet missing when Sequence pretokenizer is used * Fixed formatting with `ruff`. * Uniformize kwargs for image-text-to-text processors (#32544) * uniformize FUYU processor kwargs * Uniformize instructblip processor kwargs * Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2 * Uniformize llava_next processor * Fix save_load test for processor with chat_template only as extra init args * Fix import Unpack * Fix Fuyu Processor import * Fix FuyuProcessor import * Fix FuyuProcessor * Add defaults for specific kwargs kosmos2 * Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs * Add tests processor Udop * remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature * Fix overwrite tests kwargs processors * Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop * Fix processing test fuyu * remove unnecessary pad_token check in instructblip ProcessorTest * Fix BC tests and cleanup * FIx imports fuyu * Uniformize Pix2Struct * Fix wrong name for FuyuProcessorKwargs * Fix slow tests reversed inputs align fuyu llava-next, change udop warning * Fix wrong logging import udop * Add check images text input order * Fix copies * change text pair handling when positional arg * rebase on main, fix imports in test_processing_common * remove optional args and udop uniformization from this PR * fix failing tests * remove unnecessary test, fix processing utils and test processing common * cleanup Unpack * cleanup * fix conflict grounding dino * 🚨🚨 Setting default behavior of assisted decoding (#33657) * tests: fix pytorch tensor placement errors (#33485) This commit fixes the following errors: * Fix "expected all tensors to be on the same device" error * Fix "can't convert device type tensor to numpy" According to pytorch documentation torch.Tensor.numpy(force=False) performs conversion only if tensor is on CPU (plus few other restrictions) which is not the case. For our case we need force=True since we just need a data and don't care about tensors coherency. Fixes: #33517 See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> * bump tokenizers, fix added tokens fast (#32535) * update based on tokenizers release * update * nits * update * revert re addition * don't break that yet * fmt * revert unwanted * update tokenizers version * update dep table * update * update in conversion script as well * some fix * revert * fully revert * fix training * remove set trace * fixup * update * update * [Pixtral] Improve docs, rename model (#33491) * Improve docs, rename model * Fix style * Update repo id * fix code quality after merge * HFQuantizer implementation for compressed-tensors library (#31704) * Add compressed-tensors HFQuantizer implementation * flag serializable as False * run * revive lines deleted by ruff * fixes to load+save from sparseml, edit config to quantization_config, and load back * address satrat comment * compressed_tensors to compressed-tensors and revert back is_serializable * rename quant_method from sparseml to compressed-tensors * tests * edit tests * clean up tests * make style * cleanup * cleanup * add test skip for when compressed tensors is not installed * remove pydantic import + style * delay torch import in test * initial docs * update main init for compressed tensors config * make fix-copies * docstring * remove fill_docstring * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * review comments * review comments * comments - suppress warnings on state dict load, tests, fixes * bug-fix - remove unnecessary call to apply quant lifecycle * run_compressed compatability * revert changes not needed for compression * no longer need unexpected keys fn * unexpected keys not needed either * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * add to_diff_dict * update docs and expand testing * Update _toctree.yml with compressed-tensors * Update src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update doc * add note about saving a loaded model --------- Co-authored-by: George Ohashi <george@neuralmagic.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Sara Adkins <sara@neuralmagic.com> Co-authored-by: Sara Adkins <sara.adkins65@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu> Co-authored-by: Dipika <dipikasikka1@gmail.com> * update model card for opt * add batch size to inference table * [slow-run] opt * [run-slow] opt --------- Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Tibor Reiss <75096465+tibor-reiss@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: Muhammad Naufil <m.naufil1@gmail.com> Co-authored-by: sizhky <yyeshr@gmail.com> Co-authored-by: Umar Butler <umar@umar.au> Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com> Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com> Co-authored-by: George Ohashi <george@neuralmagic.com> Co-authored-by: Sara Adkins <sara@neuralmagic.com> Co-authored-by: Sara Adkins <sara.adkins65@gmail.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu> Co-authored-by: Dipika <dipikasikka1@gmail.com>	2024-10-10 11:49:34 +02:00
Mohamed Mekkouri	36d410dab6	FEAT : Adding BitNet quantization method to HFQuantizer (#33410 ) * rebasing changes * fixing style * adding some doc to functions * remove bitblas * change dtype * fixing check_code_quality * fixing import order * adding doc to tree * Small update on BitLinear * adding some tests * sorting imports * small update * reformatting * reformatting * reformatting with ruff * adding assert * changes after review * update disk offloading * adapting after review * Update after review * add is_serializable back * fixing style * adding serialization test * make style * small updates after review	2024-10-09 17:51:41 +02:00
Pavel Iakubovskii	48461c0fe2	Make `pipeline` able to load `processor` (#32514 ) * Refactor get_test_pipeline * Fixup * Fixing tests * Add processor loading in tests * Restructure processors loading * Add processor to the pipeline * Move model loading on tom of the test * Update `get_test_pipeline` * Fixup * Add class-based flags for loading processors * Change `is_pipeline_test_to_skip` signature * Skip t5 failing test for slow tokenizer * Fixup * Fix copies for T5 * Fix typo * Add try/except for tokenizer loading (kosmos-2 case) * Fixup * Llama not fails for long generation * Revert processor pass in text-generation test * Fix docs * Switch back to json file for image processors and feature extractors * Add processor type check * Remove except for tokenizers * Fix docstring * Fix empty lists for tests * Fixup * Fix load check * Ensure we have non-empty test cases * Update src/transformers/pipelines/__init__.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update src/transformers/pipelines/base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Rework comment * Better docs, add note about pipeline components * Change warning to error raise * Fixup * Refine pipeline docs --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-10-09 16:46:11 +01:00
Zach Mueller	4fb28703ad	Fix PIL dep for tests (#34028 ) Fix PIL dep for tess	2024-10-09 10:45:06 -04:00
Raushan Turganbay	5ee52ae0bc	Mllama: fix tests (#34000 ) * fix tests * don't need this * style	2024-10-09 14:02:56 +02:00
Joao Gante	295a90cb40	Generate: remove most decoder-only LLMs `prepare_inputs_for_generation` (#33870 )	2024-10-09 12:15:48 +01:00
Mohamed Abu El-Nasr	cdee5285ca	Fix Failed tests with mobile bert resize tokens embedding (#33950 ) * Fix Failed tests with mobile bert * Cast to the correct dtype * Code fixup * Fix padding_idx larger that embedding_size * Reduce covariance more. use 1e-7 instead of 1e-5 * Comment fix * Reduce covariance more. use 1e-9 instead of 1e-7 * Copy new config * all but MRA fixed * fix mra * very flaky * skip instead * make fixup --------- Co-authored-by: Joao Gante <joao@huggingface.co>	2024-10-09 11:23:50 +01:00
Vladislav Bronzov	faa0f63b93	Add gguf support for StableLM (#33793 ) * add stablelm gguf architecture support * add additional quantization tests * resolve merge conflict, add weight conversion tests for fp16	2024-10-09 12:16:13 +02:00
Matt	3b44d2f042	Image pipelines spec compliance (#33899 ) * Update many similar visual pipelines * Add input tests * Add ImageToText as well * Add output tests * Add output tests * Add output tests * OutputElement -> Output * Correctly test elements * make fixup * fix typo in the task list * Fix VQA testing * Add copyright to image_classification.py * Revert changes to VQA pipeline because outputs have differences - will move to another PR * make fixup * Remove deprecation warnings	2024-10-08 13:34:28 +01:00
Yoni Gozlan	e2001c3413	Add auto model for image-text-to-text (#32472 ) * Add Auto model for image-text-to-text * Remove donut from processing auto, add chameleon ti image text to text models * add qwen2_vl and llava_onevision * add pixtral to auto model for image-text-to-text * add mllama and idefics3 * remove models in IGNORE_NON_AUTO_CONFIGURED * add AutoModelForImageTextToText to tests and doc	2024-10-08 14:26:43 +02:00
Arthur	736c7cde51	[`pytes collection`] Fix flax test collection (#34004 ) bit weird but to filter I had to use this	2024-10-07 18:11:13 +02:00
Arthur	9b4b0c07db	[`Red CIs`] Fix hub failures (#34001 ) maybe setup should work?	2024-10-07 10:56:24 +02:00
TomLim	1bd604d11c	[WIP] Add Tokenizer for MyT5 Model (#31286 ) * Initial commit for MyT5 model * custom implementation of MyT5 tokenizer, unused files deleted * unittest for myt5 tokenizer * upadate of import structure and style * removed remmanents of MyT5Config * fixed docstrings * Updates after review: filled documentaion file, new docstrings and tests added * Fixed code style issues * fixed copied from to refer to function * updated loading myt5 tokenizer in tests, added sample byte map file to fixtures * changes after review * removed redundant copied from * removed redundant copied from * optimalization and loading model from hf * [run_slow] myt5 * [run-slow] myt5 * Updated en documentation for myt5 Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-10-06 10:33:16 +02:00
Anton Vlasjuk	5ef432e474	[`TF`] Fix Tensorflow XLA Generation on limited seq_len models (#33903 ) * fix tf xla generation on limited seq_len models * [run-slow] opt * [run-slow] opt	2024-10-05 16:20:50 +02:00
Vladislav Bronzov	22e102ad98	Bug fix gguf qwen2moe (#33940 ) * fix qwen2moe tensors mapping, add unit tests * add expert tensor split logic, test refactoring * small params refactoring * add comment to tensor reshaping	2024-10-05 16:19:01 +02:00
Yehoshua Cohen	56be9f1925	add test for Jamba with new model jamba-tiny-dev (#33863 ) * add test for jamba with new model * ruff fix --------- Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>	2024-10-05 16:03:12 +02:00
Raushan Turganbay	612065efeb	Paligemma: fix static cache test (#33941 ) * fix * not flaky anymore + style	2024-10-05 09:47:37 +02:00
Joao Gante	38f9f10dd9	Cache: revert DynamicCache init for BC (#33861 ) * tmp commit * tmp commit * make fixup * missing removal * fix condition * fix end-to-end compilation * if -> elif * BC * BC * use @deprecate_kwarg("num_hidden_layers", version="4.47.0") * wups the import * 🥴 --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2024-10-04 22:47:08 +02:00
Arthur	f92d354823	fix red check-copies (#33964 )	2024-10-04 22:45:37 +02:00
pglorio	f319ba16fa	Add Zamba (#30950 ) * Update index.md * Rebase * Rebase * Updates from make fixup * Update zamba.md * Batched inference * Update * Fix tests * Fix tests * Fix tests * Fix tests * Update docs/source/en/model_doc/zamba.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/zamba.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update configuration_zamba.py * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update modeling_zamba.py * Update modeling_zamba.py * Update modeling_zamba.py * Update configuration_zamba.py * Update modeling_zamba.py * Update modeling_zamba.py * Merge branch 'main' of https://github.com/Zyphra/transformers_zamba * Update ZambaForCausalLM * Update ZambaForCausalLM * Describe diffs with original mamba layer * Moved mamba init into `_init_weights` * Update index.md * Rebase * Rebase * Updates from make fixup * Update zamba.md * Batched inference * Update * Fix tests * Fix tests * Fix tests * Fix tests * Update docs/source/en/model_doc/zamba.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/zamba.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update configuration_zamba.py * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update modeling_zamba.py * Update modeling_zamba.py * Update modeling_zamba.py * Update configuration_zamba.py * Update modeling_zamba.py * Update modeling_zamba.py * Merge branch 'main' of https://github.com/Zyphra/transformers_zamba * Update ZambaForCausalLM * Moved mamba init into `_init_weights` * Update ZambaForCausalLM * Describe diffs with original mamba layer * make fixup fixes * quality test fixes * Fix Zamba model path * circleci fixes * circleci fixes * circleci fixes * circleci fixes * circleci fixes * circleci fixes * circleci fixes * circleci fixes * circleci fixes * Update * circleci fixes * fix zamba test from merge * fix ValueError for disabling mamba kernels * add HF copyright Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * shared_transf --> shared_transformer * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fixes * Move attention head dim to config * Fix circle/ci tests * Update modeling_zamba.py * apply GenerationMixin inheritance change from upstream * apply import ordering * update needed transformers version for zamba Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add contribution author * add @slow to avoid CI * Update src/transformers/models/zamba/modeling_zamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Define attention_hidden_size * Added doc for attention_head_size * trigger CI * Fix doc of attention_hidden_size * [run-slow] zamba * Fixed shared layer logic, swapped up<->gate in mlp * shared_transformer -> shared_transf * reformat HybridLayer __init__ * fix docstrings in zamba config * added definition of _get_input_ids_and_config * fixed formatting of _get_input_ids_and_config --------- Co-authored-by: root <root@node-4.us-southcentral1-a.compute.internal> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: root <root@node-1.us-southcentral1-a.compute.internal> Co-authored-by: Quentin Anthony <qganthony@yahoo.com>	2024-10-04 22:28:05 +02:00
Amit Garg	e3775539c8	PhiMoE (#33363 ) * onboard phimoe model * removed debug code * added unit tests * updated docs * formatted * fixed unit tests * fixed test case * fixed format * refactored code * fixed expected outputs in the integration tests * Added a warning msg * Addressed comments * Addressed comments * fixed test cases * added paper link * Addressed comments * Refactored PhimoeForCausalLM forward fn * Refactored PhimoeRotaryEmbedding class * fixed test cases * fixed testcase * fixed test case * Addressed comments * fixed test cases * fixed testcases * Used cache position instead to get the seq len	2024-10-04 21:39:45 +02:00
Longjie Zheng	0d1692a49b	Fix attn mask ignore logic in training-time trace (#32613 ) * fix attn mask logic for training-time trace * add test * fix * fix * fix * fix * fix * format * [run-slow] llama * avoid accelearate * [run-slow] llama	2024-10-04 19:00:45 +02:00
Mohamed Abu El-Nasr	78ef58325c	🔴 🚨 Resizing tokens embeddings: initialize from old embeddings' normal distribution. (#33325 ) * intilize new embeddings from normal distrib * Fix typo in comments * Fix typo in comments * Fix style * Fix variables naming * Add tests * Fix style * code consistency nit * Add deepspeed support * Add deepspeed support * Conver embeddings weights to float32 before computations * Add deepspeed tests * Cover when vocab_size is smaller than embedding_size * Style fix * Add tests for vocab_size smaller than hiddin_size * Style fix * Nits in tests * Nits in tests * Check for deepspeed before importing it * Increase vocab_size for positive definite covariance matrix test * Add warning * Add multivariate_resizing flag and implement resizing for lm_heads * Fix typo * Fix wrong bias indexing * Fix bias is zero check * remove multivariate_resizing flag from tests * Intialize bias from old bias normal distribution * Fixup * Code usability * Use mean_resizing instead of multivariate_resizing * Fix up * Fix comments and docs	2024-10-04 16:29:55 +02:00
jiqing-feng	b916efcb3c	Enables CPU AWQ model with IPEX version. (#33460 ) * enable cpu awq ipex linear * add doc for cpu awq with ipex kernel * add tests for cpu awq * fix code style * fix doc and tests * Update docs/source/en/quantization/awq.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/autoawq/test_awq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix comments * fix log * fix log * fix style --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-10-04 16:25:10 +02:00
Raushan Turganbay	061c2c4c38	Ignore keys on `validate_rope` (#33753 ) * ignore keys on check rope * add tests * fix tests, so maybe better leave at logger lvl	2024-10-04 12:39:37 +02:00
Yoach Lacombe	124713c32b	Fix distil whisper segment computation (#33920 ) * Fix distil whisper segment computation * [run-slow] whisper	2024-10-04 11:18:01 +02:00
Yoni Gozlan	074aa3b3fd	Uniformize kwargs for Idefics/2 processors (#32568 ) * Add uniformize idefics processor kwargs and tests * Uniformize idefics2 processor kwargs * add image_processor tests idefics * add BC args order change idefics2 processor and update doc * Add support for multiple images per prompt in image-text-to-text mode idefics * Fix processor input args in idefics tests * improve test processing common, remove unnecessary tests, update process uniformization * fix doctrings idefics * fix tests processors idefics/2	2024-10-03 18:08:24 +02:00
Joao Gante	b0c5660e88	Config: lower `save_pretrained` exception to warning (#33906 ) * lower to warning * msg * make fixup * rm extra comma	2024-10-03 16:45:14 +01:00
Benjamin Bossan	6500f78c86	[PEFT] Support low_cpu_mem_usage option for PEFT loading adapters (#33725 ) * [PEFT] Support low_cpu_mem_usage for PEFT loading PEFT added support for low_cpu_mem_usage=True when loading adapters in https://github.com/huggingface/peft/pull/1961. This feature is now available when installing PEFT v0.13.0. With this PR, this option is also supported when loading PEFT adapters directly into transformers models. Additionally, with this PR, https://github.com/huggingface/diffusers/pull/9510 will be unblocked, which implements this option in diffusers. * Fix typo	2024-10-03 16:15:36 +02:00
Yoach Lacombe	bf0ffe3d29	[Tests] Diverse Whisper fixes (#33665 ) * fix beam indices in token_timestamps * fix attention_mask in FA2 * correct translation example with the right example * correct how somes tests are using outputs + correct num_frames * fix shortform batch prev cond tests * make fix-copies * make fix-copies * take care of shifting beam indices * [run-slow] whisper * [run-slow] whisper	2024-10-03 15:59:01 +02:00
Joao Gante	d29738f5b4	Generate tests: modality-agnostic input preparation (#33685 )	2024-10-03 14:01:24 +01:00
Arie Pratama Sutiono	f2bf4fcf3d	Add `SplinterTokenizer` unit test (#32652 ) * add unit tests for splinter_tokenizer * add unit test for splinter tokenizer, pass in the question_token to be saved on save_pretrained called * remove unused import * remove vocab_splinter.txt, add Copied from, use fmt:on and fmt:off to prevent autoformatting on long lines * remove all the spaces Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-10-03 14:49:56 +02:00
Joao Gante	6f0ce52760	VLM Generate: tag `test_static_cache_matches_dynamic` as flaky (#33630 ) flaky	2024-10-03 12:27:02 +01:00
Yoni Gozlan	d7950bff82	uniformize processor Mllama (#33876 ) * uniformize processor Mllama * nit syntax * nit	2024-10-02 16:50:15 +02:00
Yoni Gozlan	62e8c759c3	rename all test_processing_.py to test_processor_.py (#33878 ) * rename all test_processing_.py to test_processor_.py ans fix duplicate test processor paligemma * fix copies * fix broken tests * fix-copies * fix test processor bridgetower	2024-10-02 16:43:43 +02:00
Marc Sun	cac4a4876b	[Quantization] Switch to optimum-quanto (#31732 ) * switch to optimum-quanto rebase squach * fix import check * again * test try-except * style	2024-10-02 15:14:34 +02:00
amyeroberts	b7474f211d	Trainer - deprecate tokenizer for processing_class (#32385 ) * Trainer - deprecate tokenizer for processing_class * Extend chage across Seq2Seq trainer and docs * Add tests * Update to FutureWarning and add deprecation version	2024-10-02 14:08:46 +01:00
g-prz	fe484726aa	Add falcon gguf (#33437 ) * feat(gguf): add falcon q2 k * fix(gguf): remove useless renaming * feat(gguf): seperate falcon 7b and 40b * feat(gguf): apply fixup * fix(test): error rebase * feat(gguf): add fp16 weight comparison for falcon * feat(gguf): test weight of all layers * test(gguf): add falcon 40b under skip decorator * feat(gguf): quick example for extracting model size	2024-10-02 14:10:39 +02:00
Pablo Montalvo	50290cf7a0	Uniformize model processors (#31368 ) * add initial design for uniform processors + align model * add uniform processors for altclip + chinese_clip * add uniform processors for blip + blip2 * fix mutable default 👀 * add configuration test * handle structured kwargs w defaults + add test * protect torch-specific test * fix style * fix * rebase * update processor to generic kwargs + test * fix style * add sensible kwargs merge * update test * fix assertEqual * move kwargs merging to processing common * rework kwargs for type hinting * just get Unpack from extensions * run-slow[align] * handle kwargs passed as nested dict * add from_pretrained test for nested kwargs handling * [run-slow]align * update documentation + imports * update audio inputs * protect audio types, silly * try removing imports * make things simpler * simplerer * move out kwargs test to common mixin * [run-slow]align * skip tests for old processors * [run-slow]align, clip * !$#@!! protect imports, darn it * [run-slow]align, clip * [run-slow]align, clip * update common processor testing * add altclip * add chinese_clip * add pad_size * [run-slow]align, clip, chinese_clip, altclip * remove duplicated tests * fix * add blip, blip2, bridgetower Added tests for bridgetower which override common. Also modified common tests to force center cropping if existing * fix * update doc * improve documentation for default values * add model_max_length testing This parameter depends on tokenizers received. * Raise if kwargs are specified in two places * fix * removed copied from * match defaults * force padding * fix tokenizer test * clean defaults * move tests to common * add missing import * fix * adapt bridgetower tests to shortest edge * uniformize donut processor + tests * add wav2vec2 * extend common testing to audio processors * add testing + bert version * propagate common kwargs to different modalities * BC order of arguments * check py version * revert kwargs merging * add draft overlap test * update * fix blip2 and wav2vec due to updates * fix copies * ensure overlapping kwargs do not disappear * replace .pop by .get to handle duplicated kwargs * fix copies * fix missing import * add clearly wav2vec2_bert to uniformized models * fix copies * increase number of features * fix style * [run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert * [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert * fix concatenation * [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert * Update tests/test_processing_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * 🧹 * address comments * clean up + tests * [run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-10-02 10:41:08 +02:00
Yoni Gozlan	61ac161a9d	Add support for custom inputs and batched inputs in ProcessorTesterMixin (#33711 ) * add support for custom inputs and batched inputs in ProcessorTesterMixin * Fix batch_size behavior ProcessorTesterMixin * Change format prepare inputs batched * Remove override test pixtral processor * Remove unnecessary tests and cleanup after new prepare_inputs functions * Fix instructBlipVideo image processor	2024-10-01 23:52:03 +02:00
Prakarsh Kaushik	68a2b50069	[Fix] ViViT interpolate_pos_encoding (#33815 ) * fix:test_inference_interpolate_pos_encoding * style:make style;make fixup * test: add suggestion to test_modeling_vivit * chore:add suggestions * style:make style * [run_slow] vivit * ci:slow test fix * [run_slow] vivit	2024-10-01 20:14:35 +01:00
Matt	a43e84cb3b	Make ASR pipeline compliant with Hub spec + add tests (#33769 ) * Remove max_new_tokens arg * Add ASR pipeline to testing * make fixup * Factor the output test out into a util * Full error reporting * Full error reporting * Update src/transformers/pipelines/automatic_speech_recognition.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Small comment --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-10-01 18:15:04 +01:00
Nicola De Angeli	0256520794	fix: repair depth estimation multiprocessing (#33759 ) * fix: repair depth estimation multiprocessing * test: add test for multiprocess depth estimation	2024-10-01 17:59:59 +01:00
Guang Yang	808997a634	Fix passing str dtype to static cache (#33741 ) Co-authored-by: Guang Yang <guangyang@fb.com>	2024-10-01 09:50:17 +02:00
Adibvafa Fallahpour	c269c5c74d	Fix Mamba slow path bug with dtype mismatch. (#32691 ) * Fix Mamba slow path bug with dtype mismatch. * Update test_modeling_mamba.py * Improve style. * Fix issue with cache position of dtype mismatch test. * Change test for slow path. * Revert changes. * Switch to buggy code and add test to catch it. * Fix the dtype mismatch bug and add test code to verify it. * Fix minor bug with test. * Fix incorrect dtype of model output. * Fix incorrect dtype of cache. * Fix incorrect dtype of ssm cache. * Fix incorrect dtype of conv state. * Remove assertion for ssm state. * Add assertion for conv state dtype. * Fix all issues with dtype mismatch test.	2024-10-01 09:28:40 +02:00
Joshua Lochner	18c5b216f1	Fix ViT-MAE decoder interpolate (#33330 ) * Fix ViT-MAE decoder interpolate * Add unit test for `interpolate_pos_encoding` w/ custom sizes * [run_slow] vit_mae	2024-09-30 18:47:13 +02:00
mobicham	f5247aca01	Hqq serialization (#33141 ) * HQQ model serialization attempt * fix hqq dispatch and unexpected keys * style * remove check_old_param * revert to check HQQLinear in quantizer_hqq.py * revert to check HQQLinear in quantizer_hqq.py * update HqqConfig default params * make ci happy * make ci happy * revert to HQQLinear check in quantizer_hqq.py * check hqq_min version 0.2.0 * set axis=1 as default in quantization_config.py * validate_env with hqq>=0.2.0 version message * deprecated hqq kwargs message * make ci happy * remove run_expected_keys_check hack + bump to 0.2.1 min hqq version * fix unexpected_keys hqq update * add pre_quantized check * add update_expected_keys to base quantizerr * ci base.py fix? * ci base.py fix? * fix "quantization typo" src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix post merge --------- Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-30 14:47:18 +02:00
Matt	d3821c4aed	Make audio classification pipeline spec-compliant and add test (#33730 ) * Make audio classification pipeline spec-compliant and add test * Check that test actually running in CI * Try a different pipeline for the CI * Move the test so it gets triggered * Move it again, this time into task_tests! * make fixup * indentation fix * comment * Move everything from testing_utils to test_pipeline_mixin * Add output testing too * revert small diff with main * make fixup * Clarify comment * Update tests/pipelines/test_pipelines_audio_classification.py Co-authored-by: Lucain <lucainp@gmail.com> * Update tests/test_pipeline_mixin.py Co-authored-by: Lucain <lucainp@gmail.com> * Rename function and js_args -> hub_args * Cleanup the spec recursion * Check keys for all outputs --------- Co-authored-by: Lucain <lucainp@gmail.com>	2024-09-27 17:01:06 +01:00
Vladislav Bronzov	9d200cfbee	Add gguf support for bloom (#33473 ) * add bloom arch support for gguf * apply format * small refactoring, bug fix in GGUF_TENSOR_MAPPING naming * optimize bloom GGUF_TENSOR_MAPPING * implement reverse reshaping for bloom gguf * add qkv weights test * add q_8 test for bloom	2024-09-27 12:13:40 +02:00
Raushan Turganbay	3e039d3827	Paligemma support for multi-image (#33447 ) * upadte * Update src/transformers/models/paligemma/processing_paligemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update docs * better example in tests * support image tokens * read token * Update tests/models/paligemma/test_processing_paligemma.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * nit: naming * Update docs/source/en/model_doc/paligemma.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * conflicts after rebasing --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2024-09-27 11:23:14 +02:00
Ita Zaporozhets	6730485b02	clean_up_tokenization_spaces=False if unset (#31938 ) * clean_up_tokenization_spaces=False if unset * deprecate warning * updating param for old models * update models * make fix-copies * fix-copies and update bert models * warning msg * update prophet and clvp * updating test since space before is arbitrarily removed * remove warning for 4.45	2024-09-26 19:38:20 +02:00
Joao Gante	3557f9a14a	Generate: `can_generate()` recursive check (#33718 ) * add recursive check and test warnings * missing space * models without can_generate	2024-09-26 18:11:14 +01:00
Arthur	46841d3eb2	[`MllamaProcessor`] Update errors and API with multiple image (#33715 ) * update error * update and add a test * update * update	2024-09-26 16:33:25 +02:00
Franz Louis Cesista	0a21381ba3	Uniformize kwargs for chameleon processor (#32181 ) * uniformize kwargs of Chameleon * fix linter nit * rm stride default * add tests for chameleon processor * fix tests * add comment on get_component * rm Chameleon's slow tokenizer * add check order images text + nit * update docs and tests * Fix LlamaTokenizer tests * fix gated repo access * fix wrong import --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2024-09-26 10:18:07 -04:00
Andrés Marafioti	f2c388e3f9	Add Idefics 3! (#32473 ) * Add Idefics 3! * fixes to make both pipelines identical * fix for quantized models * First pass at the review * remove vocab size from the main config (it's still in the text_config) * hot fix for merve * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * re-add model_type for text_config * remove support for old_cache * remove hidden_size from main config * rename idefics3 HF repo * few changes suggested in the PR * fix to input_data_format computation * remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion * improve example * few improvements from amy's review * big change to enable processing input images as numpy arrays * Changes to the code to uniformize processor kwargs * image processing tests * image processing tests fixes and some bugs they discovered * addressed review comments from Yoni * fix modeling tests * remove special tokens that are not special * fixes tests * skip failing tests - they also fail for idefics2 * added paper and readded the tests with multi gpu, who knows * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * review amy until image_processing_idefics3 * last comments from Amy * review amy * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * doc improvement - amy review * fix runtime error during fine-tuning * amy's review * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * ruff * amy's comment on the order * ruff ruff * fix copies * square images when they are not splitted * ruff :( * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics3/test_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix small bug introduced in refactor * amy's image processing changes * fixes peft tests and ruff * modify to_pil_image from transformers. and review from emanuele. * add modified to_pil_image --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-25 21:28:49 +02:00
Manuel	a55adee890	adding positional encoder changes and tests (#32600 ) * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding right pretrained model * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_image_segmentation * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding ruff formatting * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding new interpolate_pos_encoding function * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing interpolate_POS funciton * adapting output tensor in teests * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * modifying output tensor * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding the correct tensor * [run_slow] clipseg * fixing spaces * [run_slow] clipseg * [run_slow] clipseg --------- Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>	2024-09-25 19:05:01 +01:00
Arthur	19d58d31f1	Add MLLama (#33703 ) * current changes * nit * Add cross_attenttion_mask to processor * multi-image fixed * Add cross_attenttion_mask to processor * cross attn works in all cases * WIP refactoring function for image processor * WIP refactoring image processor functions * Refactor preprocess to use global loops instead of list nested list comps * Docstrings * Add channels unification * fix dtype issues * Update docsrings and format * Consistent max_image_tiles * current script * updates * Add convert to rgb * Add image processor tests * updates! * update * god damn it I am dumb sometimes * Precompute aspect ratios * now this works, full match * fix 😉 * nits * style * fix model and conversion * nit * nit * kinda works * hack for sdpa non-contiguous bias * nits here and there * latest c hanges * merge? * run forward * Add aspect_ratio_mask * vision attention mask * update script and config variable names * nit * nits * be able to load * style * nits * there * nits * make forward run * small update * enable generation multi-turn * nit * nit * Clean up a bit for errors and typos * A bit more constant fixes * 90B keys and shapes match * Fix for 11B model * Fixup, remove debug part * Docs * Make max_aspect_ratio_id to be minimal * Update image processing code to match new implementation * Adjust conversion for final checkpoint state * Change dim in repeat_interleave (accordig to meta code) * tmp fix for num_tiles * Fix for conversion (gate<->up, q/k_proj rope permute) * nits * codestyle * Vision encoder fixes * pass cross attn mask further * Refactor aspect ratio mask * Disable text-only generation * Fix cross attention layers order, remove q/k norm rotation for cross atention layers * Refactor gated position embeddings * fix bugs but needs test with new weights * rope scaling should be llama3 * Fix rope scaling name * Remove debug for linear layer * fix copies * Make mask prepare private func * Remove linear patch embed * Make precomputed embeddings as nn.Embedding module * MllamaPrecomputedAspectRatioEmbedding with config init * Remove unused self.output_dim * nit, intermediate layers * Rename ln and pos_embed * vision_chunk_size -> image_size * return_intermediate -> intermediate_layers_indices * vision_input_dim -> hidden_size * Fix copied from statements * fix most tests * Fix more copied from * layer_id->layer_idx * Comment * Fix tests for processor * Copied from for _prepare_4d_causal_attention_mask_with_cache_position * Style fix * Add MllamaForCausalLM * WIP fixing tests * Remove duplicated layers * Remove dummy file * Fix style * Fix consistency * Fix some TODOs * fix language_model instantiation, add docstring * Move docstring, remove todos for precomputed embeds (we cannot init them properly) * Add initial docstrings * Fix * fix some tests * lets skip these * nits, remove print, style * Add one more copied from * Improve test message * Make validate func private * Fix dummy objects * Refactor `data_format` a bit + add comment * typos/nits Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix dummy objects and imports * Add chat template config json * remove num_kv_heads from vision attention * fix * move some commits and add more tests * fix test * Remove `update_key_name` from modeling utils * remove num-kv-heads again * some prelimiary docs * Update chat template + tests * nit, conversion script max_num_tiles from params * Fix warning for text-only generation * Update conversion script for instruct models * Update chat template in converstion + test * add tests for CausalLM model * model_max_length, avoid null chat_template * Refactor conversion script * Fix forward * Fix integration tests * Refactor vision config + docs * Fix default * Refactor text config * Doc fixes * Remove unused args, fix docs example * Squashed commit of the following: commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830 Author: qubvel <qubvel@gmail.com> Date: Wed Sep 18 13:39:15 2024 +0000 Move model + add output hidden states and output attentions * Fix num_channels * Add mllama text and mllama vision models * Fixing repo consistency * Style fix * Fixing repo consistency * Fixing unused config params * Fix failed tests after refactoring * hidden_activation -> hidden_act for text mlp * Remove from_pretrained from sub-configs * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Reuse lambda in conversion script * Remove run.py * Update docs/source/en/model_doc/mllama.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/mllama/processing_mllama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove unused LlamaTokenizerFast * Fix logging * Refactor gating * Remove cycle for collecting intermediate states * Refactor text-only check, add integration test for text-only * Revert from pretrained to configs * Fix example * Add auto `bos_token` adding in processor * Fix tips * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Enable supports_gradient_checkpointing model flag * add eager/sdpa options * don't skip attn tests and bring back GC skips (did i really remove those?) * Fix signature, but get error with None gradient * Fix output attention tests * Disable GC back * Change no split modules * Fix dropout * Style * Add Mllama to sdpa list * Add post init for vision model * Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model * if skipped, say it, don't pass * Clean vision tester config * Doc for args * Update tests/models/mllama/test_modeling_mllama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add cross_attention_mask to test * typehint * Remove todo * Enable gradient checkpointing * Docstring * Style * Fixing and skipping some tests for new cache * Mark flaky test * Skip `test_sdpa_can_compile_dynamic` test * Fixing some offload tests * Add direct GenerationMixin inheritance * Remove unused code * Add initializer_range to vision config * update the test to make sure we show if split * fix gc? * Fix repo consistency * Undo modeling utils debug changes * Fix link * mllama -> Mllama * [mllama] -> [Mllama] * Enable compile test for CausalLM model (text-only) * Fix TextModel prefix * Update doc * Docs for forward, type hints, and vision model prefix * make sure to reset * fix init * small script refactor and styling * nit * updates! * some nits * Interpolate embeddings for 560 size and update integration tests * nit * does not suppor static cache! * update * fix * nit2 * this? * Fix conversion * Style * 4x memory improvement with image cache AFAIK * Token decorator for tests * Skip failing tests * update processor errors * fix split issues * style * weird * style * fix failing tests * update * nit fixing the whisper tests * fix path * update --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal> Co-authored-by: qubvel <qubvel@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2024-09-25 19:56:25 +02:00
Yoni Gozlan	94f18cf23c	Add OmDet-Turbo (#31843 ) * Add template with add-new-model-like * Add rough OmDetTurboEncoder and OmDetTurboDecoder * Add working OmDetTurbo convert to hf * Change OmDetTurbo encoder to RT-DETR encoder * Add swin timm backbone as default, add always partition fix for swin timm * Add labels and tasks caching * Fix make fix-copies * Format omdet_turbo * fix Tokenizer tests * Fix style and quality * Reformat omdet_turbo * Fix quality, style, copies * Standardize processor kwargs * Fix style * Add output_hidden_states and ouput_attentions * Add personalize multi-head attention, improve docstrings * Add integrated test and fix copy, style, quality * Fix unprotected import * Cleanup comments and fix unprotected imports * Add fix different prompts in batch (key_padding_mask) * Add key_padding_mask to custom multi-head attention module * Replace attention_mask by key_padding_mask * Remove OmDetTurboModel and refactor * Refactor processing of classes and abstract use of timm backbone * Add testing, fix output attentions and hidden states, add cache for anchors generation * Fix copies, style, quality * Add documentation, conver key_padding_mask to attention_mask * revert changes to backbone_utils * Fic docstrings rst * Fix unused argument in config * Fix image link documentation * Reorder config and cleanup * Add tokenizer_init_kwargs in merge_kwargs of the processor * Change AutoTokenizer to CLIPTokenizer in convert * Fix init_weights * Add ProcessorMixin tests, Fix convert while waiting on uniform kwargs * change processor kwargs and make task input optional * Fix omdet docs * Remove unnecessary tests for processor kwargs * Replace nested BatchEncoding output of the processor by a flattened BatchFeature * Make modifications from Pavel review * Add changes Amy review * Remove unused param * Remove normalize_before param, Modify processor call docstring * Remove redundant decoder class, add gradient checkpointing for decoder * Remove commented out code * Fix inference in fp16 and add fp16 integrated test * update omdet md doc * Add OmdetTurboModel * fix caching and nit * add OmDetTurboModel to tests * nit change repeated key test * Improve inference speed in eager mode * fix copies * Fix nit * remove OmdetTurboModel * [run-slow] omdet_turbo * [run-slow] omdet_turbo * skip dataparallel test * [run-slow] omdet_turbo * update weights to new path * remove unnecessary config in class --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-91-248.ec2.internal>	2024-09-25 13:26:28 -04:00
Matthew Douglas	196d35ccfc	Add AdEMAMix optimizer (#33682 ) * Add AdEMAMix optimizer * Fix test * Update tests/trainer/test_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-09-25 18:07:21 +01:00
Benjamin Fineran	574a9e12bb	HFQuantizer implementation for compressed-tensors library (#31704 ) * Add compressed-tensors HFQuantizer implementation * flag serializable as False * run * revive lines deleted by ruff * fixes to load+save from sparseml, edit config to quantization_config, and load back * address satrat comment * compressed_tensors to compressed-tensors and revert back is_serializable * rename quant_method from sparseml to compressed-tensors * tests * edit tests * clean up tests * make style * cleanup * cleanup * add test skip for when compressed tensors is not installed * remove pydantic import + style * delay torch import in test * initial docs * update main init for compressed tensors config * make fix-copies * docstring * remove fill_docstring * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * review comments * review comments * comments - suppress warnings on state dict load, tests, fixes * bug-fix - remove unnecessary call to apply quant lifecycle * run_compressed compatability * revert changes not needed for compression * no longer need unexpected keys fn * unexpected keys not needed either * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * add to_diff_dict * update docs and expand testing * Update _toctree.yml with compressed-tensors * Update src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update doc * add note about saving a loaded model --------- Co-authored-by: George Ohashi <george@neuralmagic.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Sara Adkins <sara@neuralmagic.com> Co-authored-by: Sara Adkins <sara.adkins65@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu> Co-authored-by: Dipika <dipikasikka1@gmail.com>	2024-09-25 14:31:38 +02:00
NielsRogge	06e27e3dc0	[Pixtral] Improve docs, rename model (#33491 ) * Improve docs, rename model * Fix style * Update repo id	2024-09-25 13:53:12 +02:00
Dmitry Rogozhkin	5e2916bc14	tests: fix pytorch tensor placement errors (#33485 ) This commit fixes the following errors: * Fix "expected all tensors to be on the same device" error * Fix "can't convert device type tensor to numpy" According to pytorch documentation torch.Tensor.numpy(force=False) performs conversion only if tensor is on CPU (plus few other restrictions) which is not the case. For our case we need force=True since we just need a data and don't care about tensors coherency. Fixes: #33517 See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2024-09-25 12:21:53 +01:00
Jonathan Mamou	52daf4ec76	🚨🚨 Setting default behavior of assisted decoding (#33657 )	2024-09-25 09:39:09 +01:00
Yoni Gozlan	5f0c181f4e	Uniformize kwargs for image-text-to-text processors (#32544 ) * uniformize FUYU processor kwargs * Uniformize instructblip processor kwargs * Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2 * Uniformize llava_next processor * Fix save_load test for processor with chat_template only as extra init args * Fix import Unpack * Fix Fuyu Processor import * Fix FuyuProcessor import * Fix FuyuProcessor * Add defaults for specific kwargs kosmos2 * Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs * Add tests processor Udop * remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature * Fix overwrite tests kwargs processors * Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop * Fix processing test fuyu * remove unnecessary pad_token check in instructblip ProcessorTest * Fix BC tests and cleanup * FIx imports fuyu * Uniformize Pix2Struct * Fix wrong name for FuyuProcessorKwargs * Fix slow tests reversed inputs align fuyu llava-next, change udop warning * Fix wrong logging import udop * Add check images text input order * Fix copies * change text pair handling when positional arg * rebase on main, fix imports in test_processing_common * remove optional args and udop uniformization from this PR * fix failing tests * remove unnecessary test, fix processing utils and test processing common * cleanup Unpack * cleanup * fix conflict grounding dino	2024-09-24 21:28:19 -04:00
Joao Gante	a7734238ff	Generation tests: update imagegpt input name, remove unused functions (#33663 )	2024-09-24 16:40:48 +01:00
jiqing-feng	11c27dd331	Enable BNB multi-backend support (#31098 ) * enable cpu bnb path * fix style * fix code style * fix 4 bit path * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * add multi backend refactor tests * fix style * tweak 4bit quantizer + fix corresponding tests * tweak 8bit quantizer + try fixing corresponding tests * fix dequant bnb 8bit * account for Intel CPU in variability of expected outputs * enable cpu and xpu device map * further tweaks to account for Intel CPU * fix autocast to work with both cpu + cuda * fix comments * fix comments * switch to testing_utils.torch_device * allow for xpu in multi-gpu tests * fix tests 4bit for CPU NF4 * fix bug with is_torch_xpu_available needing to be called as func * avoid issue where test reports attr err due to other failure * fix formatting * fix typo from resolving of merge conflict * polish based on last PR review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix CI * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix error log * fix error msg * add \n in error log * make quality * rm bnb cuda restriction in doc * cpu model don't need dispatch * fix doc * fix style * check cuda avaliable in testing * fix tests * Update docs/source/en/model_doc/chameleon.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix doc * fix check multibackends * fix import sort * remove check torch in bnb * docs: update bitsandbytes references with multi-backend info * docs: fix small mistakes in bnb paragraph * run formatting * reveret bnb check * move bnb multi-backend check to import_utils * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix bnb check * minor fix for bnb * check lib first * fix code style * Revert "run formatting" This reverts commit `ac108c6d6b`. * fix format * give warning when bnb version is low and no cuda found] * fix device assignment check to be multi-device capable * address akx feedback on get_avlbl_dev fn * revert partially, as we don't want the function that public, as docs would be too much (enforced) --------- Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-24 03:40:56 -06:00
Joao Gante	e15687fffe	Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` (#33203 )	2024-09-23 18:28:36 +01:00
Yoni Gozlan	1456120929	Uniformize kwargs for Udop processor and update docs (#33628 ) * Add optional kwargs and uniformize udop * cleanup Unpack * nit Udop	2024-09-23 12:47:32 -04:00
Pablo Montalvo	9eb93854b9	Clean up Unpack imports (#33631 ) clean up Unpack imports	2024-09-23 10:21:17 +02:00
Avishai Elmakies	78b2929c05	Sdpa dino v2 (#33403 ) * add sdpa to dinov2 * fixup * add dinov2 to sdpa doc * update doc order * [run-slow] dinov2 * common to eager * [run-slow] dinov2 * update attn implementation in common * update test_modeling_dinov2 to have mask_ration, num_masks and mask_length similar to vit * [run-slow] dinov2 --------- Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>	2024-09-21 01:58:00 +01:00
Mayank Mishra	e472e077c2	Granitemoe (#33207 ) * first commit * drop tokenizer * drop tokenizer * drop tokenizer * drop convert * granite * drop tokenization test * mup * fix * reformat * reformat * reformat * fix docs * stop checking for checkpoint * update support * attention multiplier * update model * tiny drop * saibo drop * skip test * fix test * fix test * drop * drop useless imports * update docs * drop flash function * copied from * drop pretraining tp * drop pretraining tp * drop pretraining tp * drop unused import * drop code path * change name * softmax scale * head dim * drop legacy cache * rename params * cleanup * fix copies * comments * add back legacy cache * multipliers * multipliers * multipliers * text fix * fix copies * merge * multipliers * attention multiplier * drop unused imports * add granitemoe * add decoration * remove moe from sequenceclassification * fix test * fix * fix * fix * move rope? * merge * drop bias * drop bias * Update src/transformers/models/granite/configuration_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * Update src/transformers/models/granite/modeling_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix * fix * fix * drop * drop * fix * fix * cleanup * cleanup * fix * fix granite tests * fp32 test * fix * drop jitter * fix * rename * rename * fix config * add gen test --------- Co-authored-by: Yikang Shen <yikang.shn@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-21 01:43:50 +02:00
jiqing-feng	49a0bef4c1	enable low-precision pipeline (#31625 ) * enable low-precision pipeline * fix parameter for ASR * reformat * fix asr bug * fix bug for zero-shot * add dtype check * rm useless comments * add np.float16 check * Update src/transformers/pipelines/image_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/pipelines/token_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix comments * fix asr check * make fixup * No more need for is_torch_available() --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Matt <rocketknight1@gmail.com>	2024-09-20 16:43:30 -07:00
Yih-Dar	077b552f07	Fix some missing tests in circleci (#33559 ) * fix * fix * fix * fix * skip * skip more --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-09-20 20:58:51 +02:00
Duc-Viet Hoang	dc8b6eaeee	Fix contrastive search to correctly handle input with padding (#33507 ) * fix: handle padding in contrastive search for decoder-only models * fix: handle padding in contrastive search for encoder-decoder models * tests: move padding contrastive test to test_util, add t5 test * fix: handle if model_kwargs["decoder_attention_mask"] is None * refactor: improve padding input contrastive search generation tests * chore: _ranking_fast to use LongTensor for cosine_matrix_mask	2024-09-20 16:52:08 +01:00
Yoni Gozlan	c0c6815dc9	Add support for args to ProcessorMixin for backward compatibility (#33479 ) * add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin * change size and crop_size in processor kwargs tests to do_rescale and rescale_factor * remove unnecessary llava processor kwargs test overwrite * nit * change data_arg_name to input_name * Remove unnecessary test override * Remove unnecessary tests Paligemma * Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring	2024-09-20 11:40:59 -04:00
Yih-Dar	31caf0b95f	Fix missing test in `torch_job` (#33593 ) fix missing tests Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-09-20 17:16:44 +02:00
Joao Gante	2fdb5e74cc	VLM generate: tests can't generate image/video tokens (#33623 )	2024-09-20 15:43:27 +01:00
amyeroberts	f9b4409726	Remove unnecessary CPM model tests (#33621 ) Remove model tests	2024-09-20 14:20:57 +01:00
Joao Gante	266d0a6375	Generate: remove flakyness in `test_generate_from_inputs_embeds_decoder_only` (#33602 ) almost zero is not zero	2024-09-20 14:50:42 +02:00
Lake Lee	ec1424c6a3	Update modeling_mamba2.py, fix pad size (#32599 ) * Update modeling_mamba2.py Fix pad_size calculation to ensure it's less than self.chunk_size * [run_slow] mamba2 * [run-slow] mamba2 * [run-slow] Add @require_read_token decorator to failing tests for token propagation * [run_slow] mamba2	2024-09-20 11:40:57 +01:00
Fanli Lin	8bd1f2f338	[tests] make more tests device-agnostic (#33580 ) * enable * fix * add xpu skip * add marker * skip for xpu * add more * enable on accelerator * add more cases * add more tests * add more	2024-09-20 10:16:43 +01:00
Fanli Lin	4d8908df27	[tests] enable GemmaIntegrationTest on XPU (#33555 ) enable GemmaIntegrationTest	2024-09-19 19:39:19 +01:00
Fanli Lin	b87755aa6d	[tests] skip tests for xpu (#33553 ) * enable * fix * add xpu skip * add marker * skip for xpu * add more * add one more	2024-09-19 19:28:04 +01:00
Yoni Gozlan	f111d5b783	Uniformize kwargs for Paligemma processor and update docs (#33571 ) * Uniformize paligemma processor * nit	2024-09-19 14:14:06 -04:00
Joao Gante	52920b5dd5	Cache: don't throw warnings on `gemma2` when instantiating a new cache (#33595 )	2024-09-19 17:42:47 +01:00
Anton Vlasjuk	b50ff5993a	[`Mamba2`] Move dt calculations to kernel (#33520 ) * use kernel for dt calculations * add small test * [run-slow] mamba2	2024-09-19 17:41:17 +01:00
Vladislav Bronzov	162056a3f4	change sequence_bias type of SequenceBiasLogitsProcessor to list, add… (#33375 ) * change sequence_bias type of SequenceBiasLogitsProcessor tp list, add config tests for all processors * fix format * small fix for all_token_bias_pairs_are_valid internal func * small typo fix in description * improve test impl, some SequenceBiasLogitsProcessor refactoring	2024-09-19 17:35:44 +01:00
Pablo Montalvo	413008c580	add uniform processors for altclip + chinese_clip (#31198 ) * add initial design for uniform processors + align model * add uniform processors for altclip + chinese_clip * fix mutable default 👀 * add configuration test * handle structured kwargs w defaults + add test * protect torch-specific test * fix style * fix * rebase * update processor to generic kwargs + test * fix style * add sensible kwargs merge * update test * fix assertEqual * move kwargs merging to processing common * rework kwargs for type hinting * just get Unpack from extensions * run-slow[align] * handle kwargs passed as nested dict * add from_pretrained test for nested kwargs handling * [run-slow]align * update documentation + imports * update audio inputs * protect audio types, silly * try removing imports * make things simpler * simplerer * move out kwargs test to common mixin * [run-slow]align * skip tests for old processors * [run-slow]align, clip * !$#@!! protect imports, darn it * [run-slow]align, clip * [run-slow]align, clip * update common processor testing * add altclip * add chinese_clip * add pad_size * [run-slow]align, clip, chinese_clip, altclip * remove duplicated tests * fix * update doc * improve documentation for default values * add model_max_length testing This parameter depends on tokenizers received. * Raise if kwargs are specified in two places * fix * match defaults * force padding * fix tokenizer test * clean defaults * move tests to common * remove try/catch block * deprecate kwarg * format * add copyright + remove unused method * [run-slow]altclip, chinese_clip * clean imports * fix version * clean up deprecation * fix style * add corner case test on kwarg overlap * resume processing - add Unpack as importable * add tmpdirname * fix altclip * fix up * add back crop_size to specific tests * generalize tests to possible video_processor * add back crop_size arg * fixup overlapping kwargs test for qformer_tokenizer * remove copied from * fixup chinese_clip tests values * fixup tests - qformer tokenizers * [run-slow] altclip, chinese_clip * remove prepare_image_inputs	2024-09-19 17:21:54 +02:00
Pablo Montalvo	4f0246e535	fix tests with main revision and read token (#33560 ) * fix tests with main revision and read token * [run-slow]mamba2 * test previously skipped tests * [run-slow]mamba2 * skip some tests * [run-slow]mamba2 * finalize tests * [run-slow]mamba2	2024-09-19 17:10:22 +02:00
Joao Gante	f3b3810fe6	rag: fix CI (#33578 )	2024-09-19 11:55:26 +01:00
Raushan Turganbay	d7975a5874	VLMs: enable generation tests (#33533 ) * add tests * fix whisper * update * nit * add qwen2-vl * more updates! * better this way * fix this one * fix more tests * fix final tests, hope so * fix led * Update tests/generation/test_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * pr comments * not pass pixels and extra for low-mem tests, very flaky because of visio tower --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-19 12:04:24 +02:00
Raushan Turganbay	e40bb4845e	Load and save video-processor from separate folder (#33562 ) * load and save from video-processor folder * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-19 09:56:52 +02:00
Yoach Lacombe	5af7d41e49	Codec integration (#33565 ) * clean mimi commit * some nits suggestions from Arthur * make fixup * rename repo id + change readme * Update docs/source/en/model_doc/mimi.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add flaky flag to batching equivalence due to audio_codes failing sometimes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-18 19:23:44 +02:00
Joao Gante	7542fac2c7	Pipeline: no side-effects on `model.config` and `model.generation_config` 🔫 (#33480 )	2024-09-18 15:43:06 +01:00
Yoach Lacombe	f883827c0a	Fix tests in ASR pipeline (#33545 )	2024-09-18 16:25:45 +02:00
Raushan Turganbay	db72894b48	Chat template: save and load correctly for processors (#33462 ) * fix * add tests * fix tests * Update tests/models/llava/test_processor_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix * fix tests * update tests --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-18 13:00:44 +02:00
Duygu Altinok	52e22cbf67	Fix for slow the bug tokenizer adding spaces to single id decodes (#32564 ) * _decode signature change and quick return * added bunch of decoding tests * signature match and return * added tests for decoding * merged decoding test * more tests for special tokens * cosmetics * fixed param * ruffed the file * refinement for single special tokens * added test for single special tokens * slight change to test name Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * minor change test name for skip tokens Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * killed already defined var Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * minor update with vars Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> * killed already defined var once more Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> --------- Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>	2024-09-18 12:32:02 +02:00
Aymeric Roucher	e6d9f39dd7	Decorator for easier tool building (#33439 ) * Decorator for tool building	2024-09-18 11:07:51 +02:00
Wang, Yi	454a0f2efd	fix patch_attention_mask incorrect setting which leads to the differe… (#33499 ) * fix patch_attention_mask incorrect setting which leads to the difference in the generated text if batch > 1 Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * fix format Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * [run_slow] idefics2 --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-09-17 22:24:42 +01:00
teamclouday	6c051b4e1e	Add revision to trainer push_to_hub (#33482 ) * add revision to trainer push_to_hub * apply suggestions * add test for revision * apply ruff format * reorganize imports * change test trainer path	2024-09-17 23:11:32 +02:00
Yoni Gozlan	d8500cd229	Uniformize kwargs for Pixtral processor (#33521 ) * add uniformized pixtral and kwargs * update doc * fix _validate_images_text_input_order * nit	2024-09-17 14:44:27 -04:00
Nikita Krasnytskyi	c29a8694b0	Fix missing `sequences_scores` in the Whisper beam search output (#32970 ) * added sequences_scores to the output * added beam_indices to output * added test to check for beam_indices, sequences_scores and their shape * removed redundant whitespaces * make fixup	2024-09-17 19:36:11 +01:00
ErezSC42	46c27577b3	fix to jamba config, asserting attention and expert offset (#33316 ) * fix to jamba config, asserting attention and expert offset * fix foramtting * fix foramtting * fix foramtting * changed to error raise instead of assertion, added unittests * fix * changed t_ to property_ * changed t_ to property_ * quickfix * ran code styler	2024-09-17 19:29:27 +01:00
Wang, Yi	74026b473e	idefics2 enable_input_require_grads not aligned with disable_input_re… (#33194 ) * idefics2 enable_input_require_grads not aligned with disable_input_require_grads make peft+idefics2 checkpoints disable fail Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * split test case Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * fix ci failure Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * refine test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-09-17 10:39:34 +01:00
Insu Jang	bcf8946f0a	Fix number of patch check for different vision feature select strategy (#32494 ) * Fix number of patch check for different vision feature select strategy * add test --------- Co-authored-by: raushan <raushan@huggingface.co>	2024-09-17 09:33:07 +02:00
Yoach Lacombe	18e1a9c719	Fix parametrization-based weight norm (#33275 ) * refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading * make style * fix sew * fix sew and sew_d tests	2024-09-17 08:05:21 +02:00
Steven Shimizu	ba1f1dc132	Updated Trainer's liger-kernel integration to call correct patching API (#33502 ) * Updated liger-kernel integration in Trainer to call correct patching API * Fixed styling	2024-09-17 02:40:24 +02:00
Yoach Lacombe	98adf24883	[Whisper test] Fix some failing tests (#33450 ) * Fix failing tensor placement in Whisper * fix long form generation tests * more return_timestamps=True * make fixup * [run_slow] whisper * [run_slow] whisper	2024-09-16 19:05:17 +02:00
Yoni Gozlan	2f62146f0e	Uniformize kwargs for LLaVa processor and update docs (#32858 ) * Uniformize kwargs for LlaVa and update docs * Change order of processor inputs in docstring * Improve BC support for reversed images and text inputs * cleanup llava processor call docstring * Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning * Put function check reversed images text outside base processor class * Refactor _validate_images_text_input_order * Add ProcessingUtilTester * fix processing and test_processing	2024-09-16 11:26:26 -04:00
Arthur	8bd2b1e8c2	Add support for Pixtral (#33449 ) * initial commit * gloups * updates * work * weights match * nits * nits * updates to support the tokenizer :) * updates * Pixtral processor (#33454) * rough outline * Add in image break and end tokens * Fix * Udo some formatting changes * Set patch_size default * Fix * Fix token expansion * nit in conversion script * Fix image token list creation * done * add expected results * Process list of list of images (#33465) * updates * working image and processor * this is the expected format * some fixes * push current updated * working mult images! * add a small integration test * Uodate configuration docstring * Formatting * Config docstring fix * simplify model test * fixup modeling and etests * Return BatchMixFeature in image processor * fix some copies * update * nits * Update model docstring * Apply suggestions from code review * Fix up * updates * revert modeling changes * update * update * fix load safe * addd liscence * update * use pixel_values as required by the model * skip some tests and refactor * Add pixtral image processing tests (#33476) * Image processing tests * Add processing tests * woops * defaults reflect pixtral image processor * fixup post merge * images -> pixel values * oups sorry Mr docbuilder * isort * fix * fix processor tests * small fixes * nit * update * last nits * oups this was really breaking! * nits * is composition needs to be true --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-14 12:28:39 +02:00
Marc Sun	6cc4dfe3f1	Fix the initialization of the cache when we have multi gpu (#33303 ) * init cache multi-gpu * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * switch to execution device map * naming more consistant * fix * mutually exclusive device * added an integration example * remove useless check * suggestion from joao + typing * fix couple of typo and add test * revert check --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-13 15:06:08 +02:00
Amit Garg	dfd31158ee	[Phi-3] Bug on stale kv cache (#33129 ) * fix long seq bug * fixed format * fixed fn copy inconsistency * fix long seq bug * fixed format * fixed fn copy inconsistency * Addressed comments * added a unit test * fixed cache position * Added a warning msg to the forward fn * fixed test case	2024-09-13 14:07:19 +02:00
Alvaro Moran	7a5659872a	Mitigate a conflict when using sentencepiece (#33327 ) * test(tokenizers): add a test showing conflict with sentencepiece This is due to the fact that protobuf C implementation uses a global pool for all added descriptors, so if two different files add descriptors, they will end up conflicting. * fix(tokenizers): mitigate sentencepiece/protobuf conflict When sentencepiece is available, use that protobuf instead of the internal one. * chore(style): fix with ruff	2024-09-13 13:19:06 +02:00
Raushan Turganbay	4b0418df11	Enable `padding_side` as call time kwargs (#33385 ) * fix * add padding-side kwarg * add padding side in all models & fix tests * fix copies * fix tests	2024-09-13 11:58:38 +01:00
Wing Lian	1027a532c5	add a callback hook right before the optimizer step (#33444 )	2024-09-13 10:43:45 +02:00
Raushan Turganbay	9c4639b622	Return image hidden states (#33426 ) * fix * return image hidden states * fix copies * fix test	2024-09-13 10:20:03 +02:00
benniekiss	5c6257d1fc	[whisper] Clarify error message when setting max_new_tokens (#33324 ) * clarify error message when setting max_new_tokens * sync error message in test_generate_with_prompt_ids_max_length * there is no self	2024-09-12 18:48:36 +02:00
Raushan Turganbay	2f611d30d9	Qwen2-VL: clean-up and add more tests (#33354 ) * clean-up on qwen2-vl and add generation tests * add video tests * Update tests/models/qwen2_vl/test_processing_qwen2_vl.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix and add better tests * Update src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update docs and address comments * Update docs/source/en/model_doc/qwen2_vl.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_vl.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update * remove size at all --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-12 18:24:04 +02:00
Hannan Komari	8ed635258c	Fix flax whisper tokenizer bug (#33151 ) * Update tokenization_whisper.py Fix issue with flax whisper model * Update tokenization_whisper_fast.py Fix issue with flax whisper model * Update tokenization_whisper.py just check len of token_ids * Update tokenization_whisper_fast.py just use len of token_ids * Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list * Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list * Update test_tokenization_whisper.py to add test for _convert_to_list method * Update test_tokenization_whisper.py to fix code style issues * Fix code style * Fix code check again * Update test_tokenization)whisper.py to Improve code style * Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available * Update tests/models/whisper/test_tokenization_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method * Revert the changes automatically applied by formatter and was unrelated to PR * Format for minimal changes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-12 12:21:59 +01:00
Jonathan Mamou	7a51cbc65f	Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258 ) * optimal Speculation Lookahead based on probability * update peer finished condition * add support to do_sample True * add stopping criteria * gitignore * add print * remove prints * minor * minor * git ignore * adding test to stopping ConfidenceCriteria * doc + format * add doc * Update .gitignore * update docstring and default value of assistant_confidence_threshold * add docstring * Update src/transformers/generation/configuration_utils.py implicit default value (None) Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * style fix --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-11 14:22:28 +02:00
Theia Vogel	e719b65c31	Fix `FbgemmFp8Linear` not preserving tensor shape (#33239 ) * add tests for linear shape behavior * fix linear shape behavior ended up adding the reshape at the end, after f8f8bf16_rowwise, because adding it directly after quantize_fp8_per_row caused f8f8bf16_rowwise to drop the seq_len dimension. (i.e., (17, 23, 1014) -> (17, 1024)) * save shape up front + comment	2024-09-11 13:26:44 +02:00
Ita Zaporozhets	781bbc4d98	use diff internal model in tests (#33387 ) * use diff internal model in tests * use diff internal model in tests	2024-09-11 11:27:00 +02:00
Guang Yang	f38590dade	Make StaticCache configurable at model construct time (#32830 ) * Make StaticCache configurable at model construct time * integrations import structure * add new doc file to toc --------- Co-authored-by: Guang Yang <guangyang@fb.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2024-09-10 16:35:57 +01:00
Alazar	96429e74a8	Add support for GGUF Phi-3 (#31844 ) * Update docs for GGUF supported models * Add tensor mappings and define class GGUFPhi3Converter * Fix tokenizer * Working version * Attempt to fix some CI failures * Run ruff format * Add vocab, merges, decoder methods like LlamaConverter * Resolve conflicts since Qwen2Moe was added to gguf - I missed one place when resolving conflict - I also made a mistake with tests_ggml.py and now has been fixed to reflect its master version.	2024-09-10 13:32:38 +02:00
Maciej Adamiak	8e8e7d8558	fixed Mask2Former image processor segmentation maps handling (#33364 ) * fixed mask2former image processor segmentation maps handling * introduced review suggestions * introduced review suggestions	2024-09-10 11:19:56 +01:00
Raushan Turganbay	7d2d6ce9cb	VLM: fixes after refactor (#32907 ) * leave only half of the changes * fix tests * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix tests, first try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix, second try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava	2024-09-10 12:02:37 +02:00
Lysandre Debut	f24f084329	Import structure & first three model refactors (#31329 ) * Import structure & first three model refactors * Register -> Export. Export all in __all__. Sensible defaults according to filename. * Apply most comments from Amy and some comments from Lucain Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com> * Style * Add comment * Clearer .py management * Raise if not in backend mapping * More specific type * More efficient listdir * Misc fixes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com>	2024-09-10 11:10:53 +02:00
amyeroberts	f745e7d3f9	Remove repeated prepare_images in processor tests (#33163 ) * Remove repeated prepare_images * Address comments - update docstring; explanatory comment	2024-09-09 13:20:27 +01:00
Raushan Turganbay	65bb284448	Compile compatibilty for decoder-only models (#32617 ) * squash into one commit * add qwen2-vl for rope standardization * fix mistral compile * fix qwen2-vl * fix-copies	2024-09-09 10:59:04 +02:00
Wing Lian	62aecd85ff	schedulefree optimizers (#30079 ) * schedulefree optimizers * fix train instead of eval for optimizer * fixes and update docs * chore: lint * add tests and drop overly-verbose _32bit suffix * chore: lint * fix for docs * fix code review issues * use duck-typing to avoid per-optimizer patches * fixup style * fixup style * warn if incorrect accelerate version with schedule free Co-authored-by: Aman Gupta Karmani <aman@tmm1.net> --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2024-09-09 09:51:39 +02:00
Ita Zaporozhets	e48e5f1f13	Support reading tiktoken tokenizer.model file (#31656 ) * use existing TikTokenConverter to read tiktoken tokenizer.model file * del test file * create titktoken integration file * adding tiktoken llama test * ALTNATIVE IMPLEMENTATION: supports llama 405B * fix one char * remove redundant line * small fix * rm unused import * flag for converting from tiktokeng * remove unneeded file * ruff * remove llamatiktokenconverter, stick to general converter * tiktoken support v2 * update test * remove stale changes * udpate doc * protect import * use is_protobuf_available * add templateprocessor in tiktokenconverter * reverting templateprocessor from tiktoken support * update test * add require_tiktoken * dev-ci * trigger build * trigger build again * dev-ci * [build-ci-image] tiktoken * dev-ci * dev-ci * dev-ci * dev-ci * change tiktoken file name * feedback review * feedback rev * applying feedback, removing tiktoken converters * conform test * adding docs for review * add doc file for review * add doc file for review * add doc file for review * support loading model without config.json file * Revert "support loading model without config.json file" This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb. * remove dev var * updating docs * safely import protobuf * fix protobuf import error * fix protobuf import error * trying isort to fix ruff error * fix ruff error * try to fix ruff again * try to fix ruff again * try to fix ruff again * doc table of contents * add fix for consistency.dockerfile torchaudio * ruff * applying feedback * minor typo * merging with push-ci-image * clean up imports * revert dockerfile consistency	2024-09-06 14:24:02 +02:00
Shiyu	342e800086	support 3D attention mask in bert (#32105 ) * support 3D/4D attention mask in bert * test cases * update doc * fix doc	2024-09-06 14:20:48 +02:00
GeLee	2b18354106	add self.head_dim for VisionAttention in Qwen2-VL (#33211 ) * add self.head_dim for VisionAttention in Qwen2-VL * add self.head_dim for VisionAttention in Qwen2-VL * fix ci * black the test_modeling_qwen2_vl.py * use ruff to format test_modeling_qwen2_vl.py * [run-slow] qwen2_vl * use tying for python3.8 * fix the import format * use ruff to fix the ci error I001 * [run-slow] qwen2_vl * remove unused import * commit for rebase * use ruff fix ci * [run-slow] qwen2_vl --------- Co-authored-by: root <liji>	2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi	3314fe1760	Add validation for maximum sequence length in modeling_whisper.py (#33196 ) * Add validation for maximum sequence length in modeling_whisper.py Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message. This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness. * Change exception message in src/transformers/models/whisper/modeling_whisper.py The exception message is for whisper's label's sequence max length. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py It's for whisper's config.max_target_positions. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change method's documentation in src/transformers/models/whisper/modeling_whisper.py * Add test for maximum label's sequence length in test_modeling_whisper.py * Add self to modeling_whisper.py * Update test_modeling_whisper.py with respect to automatic validations * Update modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Separate test_labels_sequence_max_length tests in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Remove assert from test_modeling_whisper.py * Add max_target_positions to WhisperModelTester in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py * Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py * Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py * Add new tests in test_modeling_whisper.py * Update test_modeling_whisper.py --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-09-06 14:09:49 +02:00
Ita Zaporozhets	363301f221	support loading model without config.json file (#32356 ) * support loading model without config.json file * fix condition * update tests * add test * ruff * ruff * ruff	2024-09-06 13:49:47 +02:00
Xuehai Pan	e1c2b69c34	Load dynamic module (remote code) only once if code isn't change (#33162 ) * Load remote code only once * Use hash as load indicator * Add a new option `force_reload` for old behavior (i.e. always reload) * Add test for dynamic module is cached * Add more type annotations to improve code readability * Address comments from code review	2024-09-06 12:49:35 +01:00
Sanchit Gandhi	51d15eb1c1	[whisper] alternative fix for long-form timestamps (#32131 ) * [whisper] alternative fix for long-form timestamps * update test	2024-09-06 12:57:08 +02:00
Raushan Turganbay	1759bb9126	Fix: StaticCache & `inputs_embeds` (#32932 ) squash commit	2024-09-06 12:56:59 +05:00
Shijie	21fac7abba	simple align qwen2vl kv_seq_len calculation with qwen2 (#33161 ) * qwen2vl_align_kv_seqlen_to_qwen2 * flash att test * [run-slow] qwen2_vl * [run-slow] qwen2_vl fix OOM * [run-slow] qwen2_vl * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * code quality --------- Co-authored-by: baishuai.bs <1051314669@qq.com> Co-authored-by: ShuaiBai623 <baishuai623@icloud.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2024-09-05 21:19:30 +05:00
Vladislav Bronzov	5d11de4a2f	Add Qwen2Moe GGUF loading support (#33264 ) * update gguf doc, config and tensor mapping * add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests * apply code style fixes * reformat files * assign GGUFQwen2Converter to qwen2_moe	2024-09-05 17:42:03 +02:00
Joshua Lochner	c6d2848a23	🚨 Fix `torch.jit.trace` for `interpolate_pos_encoding` in all vision models (#33226 ) * Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models * Apply formatting * Add missing `self.config = config` * Fix copies * Fix hiera interpolation unit test * Formatting * Update `_import_structure` * make style * Fix docstring * Use `# Copied from` instead of utils * DeiT variable renaming (`class_and_dist_pos_embed`) * Fix Hiera `interpolate_pos_encoding`	2024-09-05 16:17:34 +02:00
Younes Belkada	47b096412d	Fix: Fix `FalconMamba` training issues due to incompatible kernels (#33195 ) * fix FM training kernels * fix copies * fix copies * propagate to slow path * make it BC * add comment * fix test	2024-09-05 11:55:08 +02:00
Raushan Turganbay	43df47d8e7	Llava Onevision: add model (#32673 ) * working version * fix copies * update * tests * update docs * codestyle * add more tests * add returns for docs * clean up * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates * codestyle * style * shouldn't be reversed * [run-slow] llava_onevision * [run-slow] llava_onevision * add pooling in videos * [run-slow] llava_onevision * num-logits-to-keep * [run-slow] llava_onevision * [run-slow] llava_onevision * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * video matched orig impl * fix tests * chat template was modified * Update docs/source/en/model_doc/llava_onevision.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add morer info in the doc page --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-05 14:43:20 +05:00
Yoni Gozlan	9230d78e76	Add validate images and text inputs order util for processors and test_processing_utils (#33285 ) * Add validate images and test processing utils * Remove encoded text from possible inputs in tests * Removed encoded inputs as valid in processing_utils * change text input check to be recursive * change text check to all element of lists and not just the first one in recursive checks	2024-09-04 13:50:31 -04:00
Aymeric Roucher	2cb543db77	Multi agents with manager (#32687 ) * Add Multi agents with a hierarchical system	2024-09-04 17:30:54 +02:00
amyeroberts	d2dcff96f8	[InstructBLIP] qformer_tokenizer is required input (#33222 ) * [InstructBLIP] qformer_tokenizer is required input * Bit safer * Add to instructblipvideo processor * Fix up * Use video inputs * Update tests/models/instructblipvideo/test_processor_instructblipvideo.py	2024-09-04 16:18:06 +01:00
Alex Sherstinsky	122ded0a11	Bugfix/alexsherstinsky/fix none check for attention factor in rope scaling 2024 08 28 0 (#33188 ) * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities. * Fixing a bug in the way "attention_factor" is validated in ROPE utilities.	2024-09-04 17:01:12 +02:00
laurentd-lunit	d703477265	[fix] LlavaNextProcessor '_get_unpadded_features' method (#33263 ) * [fix] LlavaNextProcessor '_get_unpadded_features' method * [tests] add test_image_token_filling * [chore] style + comment * [minor] improve readability * [chore] run make fix-copies	2024-09-04 17:41:51 +05:00
Joao Gante	d750b509fc	Config: unified logic to retrieve text config (#33219 )	2024-09-04 12:03:30 +01:00
Raushan Turganbay	ebbe8d8014	Cache docs: update (#32929 ) * some changes * more updates * fix cache copy * nits * nits * add tests	2024-09-04 15:05:31 +05:00
Niklas Muennighoff	ecd61c6286	Add OLMoE (#32406 ) * Add OLMoE * Add OLMoE * Updates * Make norm optional; add keys * Add output * Add * Fix dtype * Fix eos config * Update * Add OLMoE * Fix OLMoE path * Format * Format * Rmv copy statement * Rmv copy statement * Format * Add copies * Cp rotary * Fix aming * Fix naming * Update RoPE integration; num_logits_to_keep; Add copy statements * Add eps to config * Format * Add aux loss * Adapt router_aux_loss_coef * Update md * Adapt * adapt tests	2024-09-03 18:43:12 +02:00
Zach Mueller	6b7d64ac1c	Only disallow DeepSpeed Zero-3 for auto bs finder (#31731 ) * Only disallow DeepSpeed * Clean * DeepSpeed! * Add a test for deepspeed	2024-09-03 09:16:28 -04:00
Isotr0py	edeca4387c	🚨 Support dequantization for most GGML types (#32625 ) * use gguf internal dequantize * add Q5_0 test * add iq1 test * add remained test * remove duplicated test * update docs * add gguf version limit * make style * update gguf import catch * revert vocab_size patch * make style * use GGUF_MIN_VERSION everywhere	2024-09-03 12:58:14 +02:00
Marc Sun	9ea1eacd11	remove to restriction for 4-bit model (#33122 ) * remove to restiction for 4-bit model * Update src/transformers/modeling_utils.py Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * bitsandbytes: prevent dtype casting while allowing device movement with .to or .cuda * quality fix * Improve warning message for .to() and .cuda() on bnb quantized models --------- Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>	2024-09-02 16:28:50 +02:00
Joao Gante	97c0f45b9c	Generate: fix assistant in different device (#33257 )	2024-09-02 14:37:49 +01:00
Matt	52a0213755	Add assistant prefill for chat templates and TextGenerationPipeline (#33198 ) * Add assistant prefill to chat templates * Add assistant prefill to pipeline * Add assistant prefill to pipeline * Tweak another test that ended in assistant message * Update tests that ended in assistant messages * Update tests that ended in assistant messages * Replace assistant_prefill with continue_final_message * Allow passing continue_final_message to pipeline * Small fixup * Add continue_final_message as a pipeline kwarg * Update docstrings * Move repos to hf-internal-testing! * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Add explanatory comment * make fixup * Update chat templating docs to explain continue_last_message --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-09-02 13:23:47 +01:00
Jeongseok Kang	963ed98bed	docs: Replace package abbreviations with full name(`bitsandbytes`) in docstrings (#33230 ) * docs: Provide fullname for `bitsandbytes` package * docs: Provide fullname for `bitsandbytes` package (2)	2024-09-02 13:40:34 +02:00
Aymeric Roucher	1ca9ff5c91	Add duckduckgo search tool (#32882 ) * Add duckduckgo search tool	2024-09-02 09:56:20 +02:00
Joao Gante	eb5b968c5d	Generate: throw warning when `return_dict_in_generate` is False but should be True (#33146 )	2024-08-31 10:47:08 +01:00
Arthur	b017a9eb11	Refactor CI: more explicit (#30674 ) * don't run custom when not needed? * update test fetcher filtering * fixup and updates * update * update * reduce burden * nit * nit * mising comma * this? * this? * more parallelism * more * nit for real parallelism on tf and torch examples * update * update * update * update * update * update * update * update * update * update * update * update * update to make it more custom * update to make it more custom * update to make it more custom * update to make it more custom * update * update * update * update * update * update * use correct path * fix path to test files and examples * filter-tests * filter? * filter? * filter? * nits * fix naming of the artifacts to be pushed * list vs files * list vs files * fixup * fix list of all tests * fix the install steps * fix the install steps * fix the config * fix the config * only split if needed * only split if needed * extend should fix it * extend should fix it * arg * arg * update * update * run tests * run tests * run tests * more nits * update * update * update * update * update * update * update * simpler way to show the test, reduces the complexity of the generated config * simpler way to show the test, reduces the complexity of the generated config * style * oups * oups * fix import errors * skip some tests for now * update doctestjob * more parallelism * fixup * test only the test in examples * test only the test in examples * nits * from Arthur * fix generated congi * update * update * show tests * oups * oups * fix torch job for now * use single upload setp * oups * fu*k fix * nit * update * nit * fix * fixes * [test-all] * add generate marker and generate job * oups * torch job runs not generate tests * let repo utils test all utils * UPdate * styling * fix repo utils test * more parallel please * don't test * update * bit more verbose sir * more * hub were skipped * split by classname * revert * maybe? * Amazing catch Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * fix * update * update * maybe non capturing * manual convert? * pass artifacts as parameters as otherwise the config is too long * artifact.json * store output * might not be safe? * my token * mmm? * use CI job IS * can't get a proper id? * ups * build num * update * echo url * this? * this! * fix * wget * ish * dang * udpdate * there we go * update * update * pass all * not .txt * update * fetcg * fix naming * fix * up * update * update * ?? * update * more updates * update * more * skip * oups * pr documentation tests are currently created differently * update * hmmmm * oups * curl -L * update * ???? * nit * mmmm * ish * ouf * update * ish * update * update * updatea * nit * nit * up * oups * documentation_test fix * test hub tests everything, just marker * update * fix * test_hub is the only annoying one now * tf threads? * oups * not sure what is happening? * fix? * just use folder for stating hub * I am getting fucking annoyed * fix the test? * update * uupdate * ? * fixes * add comment! * nit --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2024-08-30 18:17:25 +02:00
Matt	38d58a4427	Fix local repos with remote code not registering for pipelines (#33100 ) * Extremely experimental fix! * Try removing the clause entirely * Add test * make fixup * stash commit * Remove breakpoint * Add anti-regression test * make fixup * Move repos to hf-internal-testing!	2024-08-30 16:56:22 +01:00
Gerben van V	5129671290	Add a static cache that offloads to the CPU or other device (#32161 ) * Add a static cache that offloads to the CPU or other device * Fix PR comments, add unit-tests	2024-08-29 11:51:09 +02:00
rasmi	f9ed05dd03	Fix import paths for test_module (#32888 ) * Fix import path for test_feature_extraction_utils.py See https://github.com/huggingface/transformers/pull/32601 * Fix import path for test_image_processing_utils.py	2024-08-28 12:08:29 +01:00
JB (Don)	f1a385b1de	[RoBERTa-based] Add support for sdpa (#30510 ) * Adding SDPA support for RoBERTa-based models * add not is_cross_attention * fix copies * fix test * add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin * address some review comments * use copied from * style * consistency * fix lists --------- Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-28 10:26:00 +02:00
Anton Vlasjuk	3bfd3e4803	Fix: Jamba batched generation (#32914 ) * init fix * fix mask during cached forward, move mask related stuff to own function * adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp) * revert overwriting new integration tests * move some comments to docstring	2024-08-28 09:24:06 +02:00
Mayank Mishra	c35d2ccf5a	Granite language models (#31502 ) * first commit * drop tokenizer * drop tokenizer * drop tokenizer * drop convert * granite * drop tokenization test * mup * fix * reformat * reformat * reformat * fix docs * stop checking for checkpoint * update support * attention multiplier * update model * tiny drop * saibo drop * skip test * fix test * fix test * drop * drop useless imports * update docs * drop flash function * copied from * drop pretraining tp * drop pretraining tp * drop pretraining tp * drop unused import * drop code path * change name * softmax scale * head dim * drop legacy cache * rename params * cleanup * fix copies * comments * add back legacy cache * multipliers * multipliers * multipliers * text fix * fix copies * merge * multipliers * attention multiplier * drop unused imports * fix * fix * fix * move rope? * Update src/transformers/models/granite/configuration_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * Update src/transformers/models/granite/modeling_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix * fix * fix * fix-copies * torch rmsnorm * add authors * change model path * fix * test * drop static cache test * uupdate readme * drop non-causal * readme * drop useless imports * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-27 21:27:21 +02:00
Juan Pizarro	7591ca5bc5	🚨 Add Blip2ForImageTextRetrieval (#29261 ) * add Blip2ForImageTextRetrieval * use one line and remove unnecessary space in tests Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * use value from the config, rather than hardcoded * change order of params in Blip2QFormerModel.forward * update docstring * fix style * update test_inference_opt * move embeddings out of Blip2QFormerModel * remove from_vision_qformer_configs * remove autocast float16 in Blip2QFormerModel * rename fiels into vision_projection,text_projection,use_image_text_matching_head * use CLIPOutput for Blip2ImageTextMatchingModelOutput * remove past_key_values_length from Blip2TextEmbeddings * fix small typo in the CLIPOutput docstring * add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping * update docstring and add require_torch_fp16 * rollback test_inference_opt * use use_image_text_matching_head=True in convert * skip test_model_get_set_embeddings * fix create_rename_keys error on new itm fields * revert to do scale after dot product between "query" and "key" * fix ValueError on convert script for blip2-opt-2.7b * update org of paths to Salesforce * add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests * [run_slow] blip_2 * removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED * fix docstring of Blip2ImageTextMatchingModelOutput * [run_slow] blip_2 * fix multi-gpu tests * [run_slow] blip_2 * [run_slow] blip_2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 18:50:27 +01:00
Joao Gante	c6b23fda65	Llama: make slow tests green 🟢 (#33138 )	2024-08-27 14:44:42 +01:00
Matt	9956c2bc98	Add a fix for custom code tokenizers in pipelines (#32300 ) * Add a fix for the case when tokenizers are passed as a string * Support image processors and feature extractors as well * Reverting load_feature_extractor and load_image_processor * Add test * Test is torch-only * Add tests for preprocessors and feature extractors and move test * Extremely experimental fix * Revert that change, wrong branch! * Typo! * Split tests	2024-08-27 14:39:57 +01:00
Joao Gante	ab0ac3b98f	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123 ) * fix param not being passed in tested; add exceptions * better source of model name * Update utils/create_dummy_models.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 11:58:27 +01:00
Aya	7562366d4b	fix: multilingual midel convert to tflite get wrong token (#32079 ) * fix: multilingual midel convert to tflite get wrong token * fix: modify test_force_tokens_logits_processor the checking value as scores.dtype.min --------- Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com> Co-authored-by: Aya <[kent831217@gmail.com]>	2024-08-27 11:44:09 +02:00
Sai-Suraj-27	3bf6dd8aa1	fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32850 ) * Fixed failing CodeGenTokenizationTest::test_truncation. * [run_slow] Codegen * [run_slow] codegen	2024-08-27 09:20:59 +02:00
Joao Gante	72d4a3f9c1	mps: add `isin_mps_friendly`, a wrapper function for `torch.isin` (#33099 )	2024-08-26 15:34:19 +01:00
Joao Gante	894d421ee5	Test: add higher `atol` in `test_forward_with_num_logits_to_keep` (#33093 )	2024-08-26 15:23:30 +01:00
Shijie	19e6e80e10	support qwen2-vl (#32318 ) * support-qwen2-vl * tidy * tidy * tidy * tidy * tidy * tidy * tidy * hyphen->underscore * make style * add-flash2-tipd * delete-tokenize=False * remove-image_processor-in-init-file * add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES * format-doct * support-Qwen2VLVisionConfig * remove-standardize_cache_format * fix-letter-varaibles * remove-torch-in-image-processor * remove-useless-docstring * fix-one-letter-varaible-name * change-block-name * default-quick-gelu-in-vision * remove-useless-doc * use-preimplemented-flash-forward * fix-doc * fix-image-processing-doc * fix-apply-rotary-embed * fix-flash-attn-sliding-window * refactor * remove-default_template * remove-reorder_cache * simple-get-rope_deltas * update-prepare_inputs_for_generation * update-attention-mask * update-rotary_seq_len * remove-state * kv_seq_length * remove-warning * _supports_static_cache * remove-legacy-cache * refactor * fix-replace * mrope-section-doc * code-quality * code-quality * polish-doc * fix-image-processing-test * update readme * Update qwen2_vl.md * fix-test * Update qwen2_vl.md * nit * processor-kwargs * hard-code-norm_layer * code-quality * discard-pixel-values-in-gen * fix-inconsistent-error-msg * unify-image-video * hidden_act * add-docstring * vision-encode-as-PreTrainedModel * pixel-to-target-dtype * update doc and low memoryvit * format * format * channel-foramt * fix vit_flashatt * format * inherit-Qwen2VLPreTrainedModel * simplify * format-test * remove-one-line-func-in-image-processing * avoid-one-line-reshape * simplify-rotary_seq_len * avoid-single-letter-variable * no-for-loop-sdpa * avoid-single-letter-variable * remove-one-line-reshape * remove-one-line-reshape * remove-no-rope-in-vit-logic * default-mrope * add-copied-from * more-docs-for-mrope * polish-doc * comment-and-link * polish-doc * single-letter-variables * simplify-image-processing * video->images * kv_seq_len-update * vision-rope-on-the-fly * vision-eager-attention * change-processor-order --------- Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>	2024-08-26 15:16:44 +02:00
Matt	371b9c1486	Enable some Jinja extensions and add datetime capabilities (#32684 ) * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Add new Jinja features: - Do extension - Break/continue in loops - Call strftime to get current datetime in any format * Fix strftime template * Add template strip() just to be safe * Remove the do extension to make porting easier, and also because it's the least useful * Rename test * strftime -> strftime_now * Split test * Update test to use strftime_now * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils * Refactor everything out into chat_template_utils	2024-08-23 14:26:12 +01:00
Jason (Siyu) Zhu	adb91179b9	Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer (#32860 ) * add liger integration * fix syntax * fix import issue * add trainer.md * Use _apply_liger_kernel() * Fixed log message * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update docs/source/en/trainer.md Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Fixed checkstyle and updated readme * Added test * Fixed checkstyle * fix docstring * rename use_liger to use_liger_kernel * Trigger Build * Added test * add fix-copies * Fixed copy inconsistencies --------- Co-authored-by: shimizust <sshimizu@linkedin.com> Co-authored-by: Steven Shimizu <shimizust@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-08-23 13:20:49 +02:00
Joao Gante	970a16ec7f	Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659 ) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-23 11:12:53 +01:00
Cyril Vallez	22e6f14525	Reducing memory usage: removing useless logits computation in generate() (#31292 ) * Add .float() in all generation methods logit outputs * Switch float-casting of logits to training only for main models * Add `num_logits_to_keep` in Llama and add it by default in generate * Apply style * Add num_logits_to_keep as arg in prepare_input_for_generation * Add support for Mistral * Revert models except llama and mistral * Fix default None value in _supports_num_logits_to_keep() * Fix dimension of dummy input * Add exception for prophetnet in _supports_num_logits_to_keep() * Update _supports_num_logits_to_keep() to use inspect.signature() * Add deprecation cycle + remove modification with pretraining_tp * Apply style * Add most used models * Apply style * Make `num_logits_to_keep` an int in all cases to remove if-else clause * Add compile check for the warning * Fix torch versions * style * Add gemma2 * Update warning version * Add comment about .float operations in generation utils * Add tests in GenerationTesterMixin and ModelTesterMixin * Fix batch size for assisted decoding in tests * fix small issues in test * refacor test * fix slicing removing dim issue * Add nemotron support (should fix check-copy issue in CIs) * Trigger new CIs * Trigger new CIs * Bump version * Bump version in TODO * Trigger CIs * remove blank space * Trigger CIs	2024-08-23 11:08:34 +01:00
Joao Gante	a26de15139	Generate: Deprecate returning legacy cache by default; Handle `use_cache=False` (#32863 )	2024-08-22 20:01:52 +01:00
Andrés Marafioti	18199b34e5	[run_slow] idefics2 (#32840 )	2024-08-22 18:08:03 +02:00
Joao Gante	975b988bfe	Gemma2: eager attention by default (#32865 )	2024-08-22 15:59:30 +01:00
Marc Sun	c42d264549	FEAT / Trainer: Add adamw 4bit optimizer (#31865 ) * add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit `25278e805f`. * style * version check	2024-08-22 15:07:09 +02:00
Joao Gante	f6e2586a36	Jamba: update integration tests (#32250 ) * try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-22 11:46:10 +01:00
Marc Sun	fd06ad5438	🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627 ) * Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci	2024-08-20 11:42:36 +02:00
Younes Belkada	93e538ae2e	Mamba / FalconMamba: Fix mamba left padding (#32677 ) * fix mamba left padding * Apply suggestions from code review Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix copies * test with `inputs_embeds` * Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copies * clairfy * fix last comments * remove --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-19 16:01:35 +02:00
Fanli Lin	e55b33ceb4	[tests] make `test_sdpa_can_compile_dynamic` device-agnostic (#32519 ) * enable * fix	2024-08-19 12:46:59 +01:00
Kamil Akesbi	8260cb311e	Add Descript-Audio-Codec model (#31494 ) * dac model * original dac works * add dac model * dac can be instatiated * add forward pass * load weights * all weights are used * convert checkpoint script ready * test * add feature extractor * up * make style * apply cookicutter * fix tests * iterate on FeatureExtractor * nit * update dac doc * replace nn.Sequential with nn.ModuleList * nit * apply review suggestions 1/2 * Update src/transformers/models/dac/modeling_dac.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * up * apply review suggestions 2/2 * update padding in FeatureExtractor * apply review suggestions * iterate on design and tests * add integration tests * feature extractor tests * make style * all tests pass * make style * fixup * apply review suggestions * fix-copies * apply review suggestions * apply review suggestions * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * anticipate transfer weights to descript * up * make style * apply review suggestions * update slow test values * update slow tests * update test values * update with CI values * update with vorace values * update test with slice * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-08-19 10:21:51 +01:00
MAHIR DAIYAN	843e5e20ca	Add Flax Dinov2 (#31960 ) * tfmsenv restored in main * installed flax * forward pass done and all tests passed * make fix-copies and cleaning the scripts * fixup attempt 1 * fixup attempt 2 * fixup third attempt * fixup attempt 4 * fixup attempt 5 * dinov2 doc fixed * FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE * external pos_encoding layer removed * fixup attempt 6 * fixed integration test values * fixup attempt 7 * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * comments removed * comment removed from the test * fixup * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * new fixes 1 * interpolate_pos_encoding function removed * droppath rng fixed, pretrained beit copied-from still not working * modeling_flax_dinov2.py reformatted * Update tests/models/dinov2/test_modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * added Copied from, to the tests * copied from statements removed from tests * fixed copied from statements in the tests * [run_slow] dinov2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-08-19 09:28:13 +01:00
Zach Mueller	8ec028aded	Reduce the error log when using core models that need their weights renamed, and provide a step forward (#32656 ) * Fin * Modify msg * Finish up nits	2024-08-16 13:05:57 -04:00
Zach Mueller	0b066bed14	Revert PR 32299, flag users when Zero-3 was missed (#32851 ) Revert PR 32299	2024-08-16 12:35:41 -04:00
Joao Gante	cf32ee1753	Cache: use `batch_size` instead of `max_batch_size` (#32657 ) * more precise name * better docstrings * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-16 11:48:45 +01:00
Fanli Lin	8f9fa3b081	[tests] make test_sdpa_equivalence device-agnostic (#32520 ) * fix on xpu * [run_all]	2024-08-16 11:34:13 +01:00
Joao Gante	70d5df6107	Generate: unify `LogitsWarper` and `LogitsProcessor` (#32626 )	2024-08-16 11:20:41 +01:00
jp	e840127370	reopen: llava-next fails to consider padding_side during Training (#32679 ) restore #32386	2024-08-15 11:44:19 +01:00
Yih-Dar	20a04497a8	Fix `JetMoeIntegrationTest` (#32332 ) JetMoeIntegrationTest Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-08-14 16:22:06 +02:00
Jerry Zhang	78d78cdf8a	Add TorchAOHfQuantizer (#32306 ) * Add TorchAOHfQuantizer Summary: Enable loading torchao quantized model in huggingface. Test Plan: local test Reviewers: Subscribers: Tasks: Tags: * Fix a few issues * style * Added tests and addressed some comments about dtype conversion * fix torch_dtype warning message * fix tests * style * TorchAOConfig -> TorchAoConfig * enable offload + fix memory with multi-gpu * update torchao version requirement to 0.4.0 * better comments * add torch.compile to torchao README, add perf number link --------- Co-authored-by: Marc Sun <marc@huggingface.co>	2024-08-14 16:14:24 +02:00
Sai-Suraj-27	df323476a3	fix: Fixed failing tests in `tests/utils/test_add_new_model_like.py` (#32678 ) * Fixed failing tests in tests/utils/test_add_new_model_like.py * Fixed formatting using ruff. * Small nit.	2024-08-14 12:06:17 +01:00
Pablo Montalvo	c1357834e8	Fix tests recurrent (#32651 ) * add fix for recurrentgemma * [no-filter] * trigger-ci * [no-filter] * [no-filter] * attempt to fix mysterious zip error * [no-filter] * fix lookup error * [no-filter] * remove summarization hack * [no-filter]	2024-08-13 23:40:50 +02:00
Yoni Gozlan	5bcbdff159	Modify ProcessorTesterMixin for better generalization (#32637 ) * Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs * remove crop_size argument in align processor tests to be coherent with base tests * Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino	2024-08-13 11:48:53 -04:00
Sai-Suraj-27	c3cd9d807e	Fix: Fixed directory path for utils folder in `test_tokenization_utils.py` (#32601 ) * Removed un-necessary expressions. * Fixed directory path for utils folder in test_tokenization_utils.py	2024-08-13 16:48:15 +01:00
Bertrand Thia	cc25757a44	Add Depth Anything V2 Metric models (#32126 ) * add checkpoint and repo names * adapt head to support metric depth estimation * add max_depth output scaling * add expected logits * improve docs * fix docstring * add checkpoint and repo names * adapt head to support metric depth estimation * add max_depth output scaling * add expected logits * improve docs * fix docstring * rename depth_estimation to depth_estimation_type * add integration test * Refactored tests to include metric depth model inference test * Integration test pass when the timm backbone lines are commented (L220-L227) * address feedback * replace model path to use organization path * formatting * delete deprecated TODO * address feedback * [run_slow] depth_anything	2024-08-13 16:16:30 +02:00
Eric Hartford	481e15604a	Add support for GrokAdamW optimizer (#32521 ) * add grokadamw * reformat * code review feedback, unit test * reformat * reformat	2024-08-13 13:20:28 +01:00
Fanli Lin	b5016d5de7	fix tensors on different devices in `WhisperGenerationMixin` (#32316 ) * fix * enable on xpu * no manual remove * move to device * remove to * add move to	2024-08-13 11:29:57 +01:00
Pablo Montalvo	a5a8291ad1	Fix tests (#32649 ) * skip failing tests * [no-filter] * [no-filter] * fix wording catch in FA2 test * [no-filter] * trigger normal CI without filtering	2024-08-13 09:46:21 +01:00
Lysandre Debut	29c3a0fa01	Automatically add `transformers` tag to the modelcard (#32623 ) * Automatically add `transformers` tag to the modelcard * Specify library_name and test	2024-08-13 07:59:01 +02:00
Raushan Turganbay	a29eabd0eb	Expand inputs in processors for VLMs (#30962 ) * let it be * draft * should not have changed * add warnings * fix & add tests * fix tests * ipnuts embeds cannot be passed with pixels * more updates * paligemma ready! * minor typos * update blip-2 * fix tests & raise error * docstring * add blip2 test * tmp * add image seq length to config * update docstring * delete * fix tests * fix blip * fix paligemma * out-of-place scatter * add llava-next-video * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * remove tmp * codestyle * nits * more nits * remove overriding in tests * comprehension when merging video * fix-copies * revert changes for embeds test * fix tests after making comprehension * Update src/transformers/models/blip_2/processing_blip_2.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * Update src/transformers/models/blip_2/processing_blip_2.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * more updates * fix tests --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2024-08-13 10:14:39 +05:00
Quentin Gallouédec	f1c8542ff7	"to be not" -> "not to be" (#32636 ) * "to be not" -> "not to be" * Update sam.md * Update trainer.py * Update modeling_utils.py * Update test_modeling_utils.py * Update test_modeling_utils.py	2024-08-12 20:20:17 +01:00
Sai-Suraj-27	ce4b28830a	fix: Fixed failing `test_find_base_model_checkpoint` (#32638 ) Fixed failing test_find_base_model_checkpoint.	2024-08-12 19:51:30 +01:00
Raushan Turganbay	8f2b6d5e3d	Fix: FA2 with packed training (#32487 ) * fix check * add tests * [run-slow] llama, gemma2 * oops, whisper actually runs but needed some special treatment	2024-08-12 13:40:07 +05:00
Younes Belkada	7c11491208	Add new model (#32615 ) * v1 - working version * fix * fix * fix * fix * rename to correct name * fix title * fixup * rename files * fix * add copied from on tests * rename to `FalconMamba` everywhere and fix bugs * fix quantization + accelerate * fix copies * add `torch.compile` support * fix tests * fix tests and add slow tests * copies on config * merge the latest changes * fix tests * add few lines about instruct * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix tests --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-12 08:22:47 +02:00
Arthur	e4522fe399	fix slow integration gemma2 test (#32534 ) no empty revision	2024-08-09 11:28:22 +02:00
Guang Yang	0164560353	Fixed test `test_static_cache_exportability` with torch 2.4.0 (#32516 ) Workaround the export issue in torch 2.4 Co-authored-by: Guang Yang <guangyang@fb.com>	2024-08-08 18:13:40 +01:00
Pablo Montalvo	044281605f	Fix generate with `inputs_embeds` as input (#32493 ) * I think inputs_embeds has ndim == 3 * fix sequence length catch * add generate test * [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama * skip whisper * fix bart test * more fixes	2024-08-08 18:44:53 +02:00
Yunfei Chu	16ed0640be	Add Qwen2-Audio (#32137 ) * add qwen2audio * Update check_repo.py * fix style * fix test * fix style * add model size * Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * switch the attention_mask and the feature_attention_mask * add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py * fix initialization * update chat_template * fix consistency issue after copy * add docstrings to _merge_input_ids_with_audio_features * add copied from to prepare_inputs_for_generation * add more details to docs * rm comment * add init_std * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * update * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update tests * rm ignore_index * update processor * rm ffmpeg_read * Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update * typo * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * fix quality * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * add official model --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-08 15:47:24 +02:00
Sangbum Daniel Choi	d3b3551750	Uniformize kwargs for processors - GroundingDINO (#31964 ) * fix typo * uniform kwargs * make style * add comments * remove return_tensors * remove common_kwargs from processor since it propagates * make style * return_token_type_ids to True * revert the default imagekwargs since does not accept any value in the image processro * revert processing_utils.py * make style * add molbap's commit * fix typo * fix common processor * remain * Revert "add molbap's commit" This reverts commit `a476c6ee88`. * add unsync PR * revert * make CI happy * nit * import annotationformat	2024-08-08 14:03:08 +01:00
Aymeric Roucher	e0d82534cc	Agents use grammar (#31735 ) * Allow optional use of grammars to constrain generation	2024-08-07 11:42:52 +02:00
Raushan Turganbay	a30c865f99	Cache: new Cache format in decoder-only models (#31421 ) * draft bart with new cache * add cache for decoder-only models * revert utils * modify docstring * revert bart * minor fixes * fix copies (not related) * revert tests * remove enc-dec related code * remove bloom * remove opt (enc-dec) * update docstring * git, codegen, gpt_neo, gpt_neox, gpj * clean up * copied from statements * revert * tmp * update warning msg * forgot git * add more flags * run-slow git,codegen,gpt_neo,gpt_neox,gpj * add cache flag to VLMs * remove files * style * video LLMs also need a flag * style * llava will go in another PR * style * [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics * Update src/transformers/models/gpt_neo/modeling_gpt_neo.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copy from * deprecate until v4.45 and warn if not training * nit * fix test * test static cache * add more tests and fix models * fix copies * return sliding window mask * run slow tests & fix + codestyle * one more falcon fix for alibi --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-07 10:02:16 +05:00
Pablo Montalvo	80b90e7b2f	Add codestral mamba2 (#32080 ) * add new model like * draft cuda forward - mismatched keys (sharding on conv1) * match keys successfully * fix split * get generation/forward running (wrong gens, norm?) * :update * some refactoring * fixes * works up until copy to cache * fix * update * NON WORKING VERSION * version that work? * nit * fix config * fix conversion script * working cuda forward * nit * update * simplifcation * make mamba slow simple work * no einops * todo * fix style * no einops * update fix no einsum * nit * remove einops * bug: scan_output differs strongly * add rms norm option * fix fast + slow generation with and w/o cache ✔️ * draft integration tests * remove a big chunk of the einsum * fix slow, fast generations, without any einsum * fix copies * fix structure * fix up modeling and tests * fix tests * clamping is indeed worse * recover mamba2 cache test * fix copies * no cache position (yet) * fix tf tests * fix matmul for generate * fixup * skip cache tests for now * [run-slow]mamba2 * tune out hidden states for padding * test batched generation * propagate attention mask changes * fix past length * fix integration test * style * address comments * update readme * add mamba2 version check * fix tests * [run-slow]mamba2 * skip edge tests * [run-slow]mamba2 * last fixup * [run-slow]mamba2 * update README --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2024-08-06 16:39:52 +02:00
Ao Tang	6a03942db7	Add Nemotron HF Support (#31699 ) * Add nemotron support * fix inference * add unit test * add layernorm1p as a class to avoid meta device mismatch * test fixed * Add copied_from statements * remove pretraining_tp args * remove nemotronlayernorm * force LN computation done in FP32 * remove nemotrontokenizer and use llamatokenizer * license update * add option for kv_channels for minitron8b * remove assert * o_proj fixed * o_proj reshape * add gated_proj option * typo * remove todos * fix broken test after merging latest main * remove nezha/nat after meging main * chnage default config to 15b model * add nemo conversion script * rename conversion script * remove gate_proj option * pr comment resolved * fix unit test * rename kv_channels to head_dim * resolve PR issue * add nemotron md * fix broken tests * refactor rope for nemotron * test fix * remove linearscaling * whitespace and import * fix some copied-from * code style fix * reformatted * add position_embedding to nemotronattention * rope refactor to only use config, copied-from fix * format * Run make fix-copies * nemotron md with autodoc * doc fix * fix order * pass check_config_docstrings.py * fix config_attributes * remove all llama BC related code * Use PreTrainedTokenizerFast * ruff check examples * conversion script update * add nemotron to toctree	2024-08-06 15:42:05 +02:00
Francisco Kurucz	438d06c95a	Fix get large model config for Switch Transformer encoder only tester (#32438 )	2024-08-06 11:48:32 +01:00
Pavel Iakubovskii	fb66ef8147	Update kwargs validation for `preprocess` with decorator (#32024 ) * BLIP preprocess * BIT preprocess * BRIDGETOWER preprocess * CHAMELEON preprocess * CHINESE_CLIP preprocess * CONVNEXT preprocess * DEIT preprocess * DONUT preprocess * DPT preprocess * FLAVA preprocess * EFFICIENTNET preprocess * FUYU preprocess * GLPN preprocess * IMAGEGPT preprocess * INTRUCTBLIPVIDEO preprocess * VIVIT preprocess * ZOEDEPTH preprocess * VITMATTE preprocess * VIT preprocess * VILT preprocess * VIDEOMAE preprocess * VIDEOLLAVA * TVP processing * TVP fixup * SWIN2SR preprocess * SIGLIP preprocess * SAM preprocess * RT-DETR preprocess * PVT preprocess * POOLFORMER preprocess * PERCEIVER preprocess * OWLVIT preprocess * OWLV2 preprocess * NOUGAT preprocess * MOBILEVIT preprocess * MOBILENETV2 preprocess * MOBILENETV1 preprocess * LEVIT preprocess * LAYOUTLMV2 preprocess * LAYOUTLMV3 preprocess * Add test * Update tests	2024-08-06 11:33:05 +01:00
Fanli Lin	e85d86398a	add the missing flash attention test marker (#32419 ) * add flash attention check * fix * fix * add the missing marker * bug fix * add one more * remove order * add one more	2024-08-06 11:18:58 +01:00
amyeroberts	7e5d46ded4	Respect the config's attn_implementation if set (#32383 ) * Respect the config's attn if set * Update test - can override in from_config * Fix	2024-08-05 16:33:19 +01:00
Sai-Suraj-27	458b0cd2c5	fix: Updated `test_embeded_special_tokens` for luke and mluke models (#32413 ) Fixed tokenizertests for luke, mluke models.	2024-08-05 15:19:42 +01:00
Abdi	baf7e5c927	Persist embedding type of BART and mBART models after resize (#32242 ) * fix: persist embedding type of MBartConditonalGeneration after resize * fix: persist embedding type of BartConditonalGeneration after resize	2024-08-05 14:15:36 +01:00
Ita Zaporozhets	3d7c2f9dea	#32184 save total_vocab_size (#32240 ) * save total_vocab_size = vocab_size + user added tokens to speed up operation * updating length when added_tokens_decoder is set * add test len(tokenizer)	2024-08-05 09:22:48 +02:00
Raushan Turganbay	3bb646a54f	Phi3 tests: fix typing for Python 3.8 (#32388 ) fix phi	2024-08-05 11:58:42 +05:00
TechInterMezzo	05ae3a300d	fix: SeamlessM4TFeatureExtractor stride remainder (#32088 ) * fix: SeamlessM4TFeatureExtractor stride remainder * Added attention mask size test * Reran ruff for style correction	2024-08-05 08:40:58 +02:00
Joao Gante	083e13b7c4	RoPE: Add numerical tests ✨ (#32380 ) tests! :D	2024-08-02 09:39:45 +01:00
Zach Mueller	82efc53513	Yell at the user if zero-3 init wasn't performed, but expected to have been done (#32299 ) * Test this zach * Test for improper init w/o zero3 * Move back * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Get rid of stars in warning * Make private * Make clear --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-01 15:18:43 -04:00
OsamaS99	51ab25e293	Fixed Hybrid Cache Shape Initialization. (#32163 ) * fixed hybrid cache init, added test * Fix Test Typo --------- Co-authored-by: Aaron Haag <aaron.haag@siemens.com>	2024-08-01 13:57:42 +01:00
Nikos Karampatziakis	ca59d6f77c	Offloaded KV Cache (#31325 ) * Initial implementation of OffloadedCache * enable usage via cache_implementation * Address feedback, add tests, remove legacy methods. * Remove flash-attn, discover synchronization bugs, fix bugs * Prevent usage in CPU only mode * Add a section about offloaded KV cache to the docs * Fix typos in docs * Clarifications and better explanation of streams	2024-08-01 14:42:07 +02:00
Omar Salman	b4727a1216	Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233 ) * Fix conflicting key in init kwargs in PreTrainedTokenizerBase * Update code to check for callable key in save_pretrained * Apply PR suggestions * Invoke CI * Updates based on PR suggestion	2024-08-01 14:32:13 +02:00
Ita Zaporozhets	2229ebe722	update clean_up_tokenization_spaces warning (#32371 )	2024-08-01 13:57:41 +02:00
Lunwen He	48ed24c50a	Remove size check between attn_weights and kv_seq_len for phi3 (#32339 ) * Remove size check between attn_weights and kv_seq_len * add unit tests	2024-08-01 13:49:00 +02:00
Sanchit Gandhi	e234061cdd	[whisper] compile compatibility with long-form decoding (#31772 ) * [whisper] compile compatibility with long-form decoding * clarify comment * fix after rebase * finalise * fix bsz * fix cache split * remove contiguous * style * finish * update doc * prevent cuda graph trace	2024-08-01 18:10:56 +08:00
fxmarty	92abe60334	>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227 ) * draft * apply changes to all relevant archs * rerun ci - check_docstrings.py failing? * fix docstring * move 2D->4D mask creation to modeling file * repo consistency * fix the batch size = 1 case - calling contiguous is not enough * nit * style * propagate to gemma/gemma-2 * prepare inputs for gemma generation * implement test and tiny fix in gemma2 * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix copies * ci pass * fix gemma's test_compile_static_cache tests * flacky * retrigger ci --------- Co-authored-by: sanchit-gandhi <sanchit@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-08-01 02:03:07 +08:00
amyeroberts	5f1fcc299c	[Idefics2] - Fix FA2 call for Perceiver layer (#32275 ) * Fix FA2 call for Perciever layer * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2 * Fix up * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2	2024-07-31 14:51:04 +01:00
Joao Gante	b75ad56620	Llama 3.1: Fix incorrect `inv_freq` assignment (#32330 ) fix 💩	2024-07-31 11:12:46 +01:00
Raushan Turganbay	7f552e28e0	Gemma2 and flash-attention (#32188 ) * enable flash-attn & static cache * this works, not the prev * fix for sliding window layers * not needed anymore	2024-07-31 10:33:38 +05:00
Joshua Lochner	6e2d04e429	Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191 ) * Remove user-defined tokens which can be obtained through merges * Remove debug line * formatting * Refactor spm slow -> fast converter * revert unnecessary refactor * set comprehension * remove test files * Use `vocab_scores` * Always replace spiece underline with space in decode * we no longer need token filtering * Add save fast load slow unit test * Remove tokenizers version check * Remove duplicate code * Make `<start_of_turn>` and `<end_of_turn>` special tokens * Bias merge priority with length if score is the same * Add unit test for merge priority * CI	2024-07-30 23:36:38 +02:00
Guang Yang	811a9caa21	Make static cache compatible with torch.export (#32168 )	2024-07-29 18:19:15 +01:00
Sanchit Gandhi	7f5d644e69	[pipeline] fix padding for 1-d tensors (#31776 ) * [pipeline] fix padding for 1-d tensors * add test * make style * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py --------- Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>	2024-07-29 21:24:42 +08:00
Kamil Akesbi	3fbaaaa64d	Whisper tokenizer word level timestamps (#32197 ) * fix _fix_key in PreTrainedModel * fix _find_longest_common_sequence * add test * remove result.json * nit * update test	2024-07-29 11:19:52 +01:00
Joao Gante	7ffe25f2b9	Generate: end-to-end compilation (#30788 ) * mvp * added test (a few models need fixes) * fix a few test cases * test nits * harder test 😈 * revert changes in stablelm * test with improved condition * add todo * tmp commit * merged with main * nits * add todo * final corrections * add docs for generation compilation * docs nits * add tip * PR suggestions * add more details to the compilation docs * fix cache positions * cache is now init in generate; update docs * tag test as flaky * docs * post rebase make fixup and other nits * remove unintended changes * whisper (encoder-decoder) not supported * move token default updates to ; add tests for token defaults * push changes * manual rebase * chameleon doesn't support this * fix test_static_cache_mha_mqa_gqa (broken in another PR) * docs: dynamic is better with end-to-end compilation	2024-07-29 10:52:13 +01:00
Raushan Turganbay	f739687684	🚨 Bloom support for cache class (#31445 ) * bloom dynamic cache * bloom follows standard cache format * no skips for bloom anymore * use cache position when possible * clean up * codestyle * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pr comments * isinstance fix * address comments * make musicgen test happy * [run-slow] bloom --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-29 10:58:59 +05:00
Raushan Turganbay	81233c069c	Flash-Attn: fix generation when no attention mask or no pading (#32241 ) * fix * fix prev test (half of failures) * [run-slow] llama, gemma2 * [run-slow] llama, gemma2	2024-07-26 14:45:55 +05:00
Fanli Lin	27c7f971c0	[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039 ) * add flash attention check * fix * fix	2024-07-26 11:41:27 +02:00
Sai-Suraj-27	b8e5cd5396	Refactor: Removed un-necessary `object` base class (#32230 ) * Refactored to remove un-necessary object base class. * small fix.	2024-07-26 10:33:02 +02:00
Raushan Turganbay	fad15fba78	Llava: generate without images (#32183 ) * llava w/o images * tests	2024-07-26 10:17:27 +05:00
Raushan Turganbay	4ab33c2d81	Generation: stop at `eos` for assisted decoding (#31301 ) * fix * move changes to prompt lookup * add test * set eos in assistant model * style * fix flakiness * changes for new `main` * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add comment to explain --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-26 10:16:06 +05:00
Yih-Dar	df6eee9201	Follow up for #31973 (#32025 ) * fix * [test_all] trigger full CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-25 16:12:23 +02:00
Kashif Rasul	de2318894e	[warnings] fix E721 warnings (#32223 ) fix E721 warnings	2024-07-25 15:12:23 +02:00
Sanchit Gandhi	5658e749ad	[whisper] fix short-form output type (#32178 ) * [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test	2024-07-25 16:58:02 +08:00
Sai-Suraj-27	85a1269e19	fix: Replaced deprecated `unittest method` with the correct one (#32198 ) Replaced deprecated unittest method with the correct one.	2024-07-24 18:00:21 +01:00
Matt	edd68f4ed8	🚨 No more default chat templates (#31733 ) * No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again	2024-07-24 17:36:32 +01:00
Penut Chen	1c122a46dc	Support dequantizing GGUF FP16 format (#31783 ) * support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16	2024-07-24 17:59:59 +02:00
Joao Gante	e0182f3bd7	RoPE: relaxed rope validation (#32182 ) * relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist	2024-07-24 15:00:48 +01:00
amyeroberts	165116bc14	Remove conversational pipeline tests (#32099 ) Remove conversation pipeline tests	2024-07-24 14:03:40 +01:00
Sai-Suraj-27	d2c687b3f1	Updated `ruff` to the latest version (#31926 ) * Updated ruff version and fixed the required code accorindg to the latest version. * Updated ruff version and fixed the required code accorindg to the latest version. * Added noqa directive to ignore 1 error shown by ruff	2024-07-23 17:07:31 +02:00
RhuiDih	9cf4f2aa9a	Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629 ) * add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formating for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py	2024-07-23 15:56:41 +02:00
Sanchit Gandhi	3263b34354	Revert "Incorrect Whisper long-form decoding timestamps " (#32148 ) Revert "Incorrect Whisper long-form decoding timestamps (#32003)" This reverts commit `cd48553fc8`.	2024-07-23 18:34:30 +08:00
Amit Garg	034b477847	Rename Phi-3 rope scaling type (#31436 ) * renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format	2024-07-23 12:33:22 +02:00
Merve Noyan	9ced33ca7f	Fix video batching to videollava (#32139 ) --------- Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>	2024-07-23 13:23:23 +03:00
Ita Zaporozhets	a1844a3209	gguf conversion add_prefix_space=None for llama3 (#31937 ) * gguf conversion forces add_prefix_space=False for llama3, this is not required and forces from_slow, which fails. changing to None + test * typo * clean test	2024-07-23 11:45:54 +02:00
Joao Gante	2e113422b3	Llama: RoPE refactor (#32135 ) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-07-23 10:42:55 +01:00
bayllama	5a4a76edb7	Modify resize_token_embeddings to ensure output type is same as input (#31979 ) * Change resize_token_embeddings to make it return same Class that is passed to it * Add explanatory comment as requested in review * Add explanatory comments for add resizing function in lxmert * Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining --------- Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MBP.attlocal.net> Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MacBook-Pro.local>	2024-07-23 10:28:44 +01:00
mig-mfreitas	34b43211d7	Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910 ) * Add YaRN and Dynamic-YaRN RoPE Scaling Methods YaRN (Yet another RoPE extension method) combines the NTK-By-Parts Interpolation and Attention Scaling methods, improving upon existing RoPE interpolation methods for longer context window sizes. Fine-tuned models maintain their original performance across benchmarks while enabling efficient extrapolation and transfer learning for quicker convergence, especially in compute-limited environments. We implement YaRN and Dynamic-YaRN for the following list of models: - LLaMA - Falcon - GPT-NeoX - Olmo - Persimmon - Phi - StableLM - OpenLLaMA New unit tests are added to assert YaRN's correct behavior on both short and long sequence inputs. For more details, please refer to https://arxiv.org/abs/2309.00071. Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> * Refactor YaRN implementation for LLaMA Iterate on YaRN implementation for LLaMA and remove diff from remaining models for increased PR modularity. This commit includes the following changes: - Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries - Remove unnecessary attributes ('extrapolation_factor' and 'finetuned') from YaRN classes - Inherit 'forward' method in YaRN classes from superclass - Rename 'yarn' method to 'compute_yarn_scaling' - Extend YaRN tests with further assertions - Fix style inconsistencies Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt> * Refactor Tensor Building Logic for YaRN - Comply with the the tensor building logic introduced in #30743 - Add referencing to the optimized Attention Factor equation - Remove Dynamic YaRN for a more agile deployment Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com> * remove unwanted file --------- Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2024-07-23 10:07:58 +01:00
Anton Vlasjuk	605f3245dc	Fix mask creations of `GPTNeoX` and `GPT2` (#31944 ) * fix mask creation of gpt2 and gpt_neox caused by me * forgot the reshape of masks when shape > 2 * add tests for gpt neox and gpt2 * nit on a comment	2024-07-23 10:11:12 +02:00
Sanchit Gandhi	f83c6f1d02	Remove `trust_remote_code` when loading Libri Dummy (#31748 ) * [whisper integration] use parquet dataset for testing * propagate to others * more propagation * last one	2024-07-23 14:54:38 +08:00
Raushan Turganbay	3aefb4ec7f	LLaVaNeXT: pad on right if training (#32134 ) * pad on right if training * docs * add tests	2024-07-23 10:23:55 +05:00
Marc Sun	96a074fa7e	Add new quant method (#32047 ) * Add new quant method * update * fix multi-device * add test * add offload * style * style * add simple example * initial doc * docstring * style again * works ? * better docs * switch to non persistant * remove print * fix init * code review	2024-07-22 20:21:59 +02:00
amyeroberts	817a676bd7	Don't default to other weights file when use_safetensors=True (#31874 ) * Don't default to other weights file when use_safetensors=True * Add tests * Update tests/utils/test_modeling_utils.py * Add clarifying comments to tests * Update tests/utils/test_modeling_utils.py * Update tests/utils/test_modeling_utils.py	2024-07-22 18:29:50 +01:00
Yoni Gottesman	74d0eb3fed	Return assistant generated tokens mask in apply_chat_template (#30650 ) return assistant generated tokens mask in apply_chat_template	2024-07-22 18:24:43 +01:00
Sai-Suraj-27	12b6880c81	fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111 ) * Raised TypeError instead of ValueError for invalid types. * Updated formatting using ruff. * Retrieved few changes. * Retrieved few changes. * Updated tests accordingly.	2024-07-22 17:46:17 +01:00
Matt	7ba028fccb	Fix failing test with race condition (#32140 ) * Fix failing test with race condition * make fixup * monotonic_ns instead of randint * uuid4 instead of monotonic_ns * Add a finally cleanup step	2024-07-22 16:07:29 +01:00
Lucain	f2a1e3ca68	Mention model_info.id instead of model_info.modelId (#32106 )	2024-07-22 14:14:47 +01:00
Sai-Suraj-27	0fcfc5ccc9	fix: Replaced deprecated `mktemp()` function (#32123 ) Replaced deprecated mktemp function.	2024-07-22 14:13:39 +01:00
Joao Gante	c38c55f4fb	Generate: store special token tensors under a unique variable name (#31980 ) * rename stuff * english; this one shouldn't be changed * add a _ to the new var names * musicgen * derp	2024-07-22 14:06:49 +01:00
Aymeric Roucher	b381880597	Agents planning (#31702 ) * Allow planning for agents	2024-07-22 10:49:57 +02:00
Lucain	0fdea8607d	Fix tests after `huggingface_hub` 0.24 (#32054 ) * adapt tests * style * comment	2024-07-19 19:32:39 +01:00
Kamil Akesbi	89575b567e	Support generating with fallback for short form audio in Whisper (#30984 ) * remove is_shortform * adapt _retrieve_max_frames_and_seek for short_form * return bos token in short and long form * add decoder_input_ids to short form audios * add eos token for short form * handle short form token_timestamps * no need to return scores * add is_shortform conditions * handle when max_new_tokens is None - short form * handle assistant decoding * fix * handle return_dict_in_generate * handle split_by_batch for encoder_attentions attribute * handle num_beams>1 * handle num_return_sequences>1 in generate_with_fallback * handle num_return_sequences>1 with return_dict_in_generate=True * raise error if max_new_tokens + decoder_inputs_ids > max_target_pos * fix * apply review suggestions * fix * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * fix * logits for both short form and long form * handle if logits_processor is None * test * apply review changes to num_return_sequences * add _expand_variables_for_generation * remove short form commented section * update comments * uncomment num_beams line in generate_with_fallback * update assistant decoding * handle return_segment with short form generation * up * fix output format is_shortform * overwrite beam_sample test * update _set_return_timestamps * apply review suggestions * apply review suggestions * remove seek_outputs_short_form * fix _stack_split_outputs * fix stack dim in _stack_split_outputs * update tests * fix past_key_values + beam tests * fix * clean _expand_variables_for_generation * make style * fix slow tests * make style * max_length condition * make style * add slow tests for shortform fallback * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * apply review changes * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * up * fix slow tests * apply review suggestions * update test * make style * small fix * fix * fix test_new_cache_format * fix past_key_values * fix * make style * fix slow tests * fix --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-07-19 13:42:22 +01:00
Kamil Akesbi	cd48553fc8	Incorrect Whisper long-form decoding timestamps (#32003 ) * fix lo form timestamps in decode_batch * Update src/transformers/models/whisper/tokenization_whisper.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/whisper/tokenization_whisper.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * add test * make style * fix copies * Update src/transformers/models/whisper/tokenization_whisper_fast.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/whisper/tokenization_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/whisper/processing_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/whisper/tokenization_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * apply review suggestions * fix * fix copies * fix * Update src/transformers/models/whisper/tokenization_whisper_fast.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix-copies --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-19 09:26:38 +01:00
Raushan Turganbay	b873234cb6	Llava: add default chat templates (#31691 ) * add default chat templates * Update src/transformers/models/llava/processing_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/processing_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * more clear docstring and docs * Update docs/source/en/model_doc/llava.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/vipllava.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * add tests * remove default templates (see #31733) * load chat template from another file * Update docs/source/en/model_doc/llava_next.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * revert some changes in docs * forgot vipllava * chat template file is not temporary hack * warn if loading from processor * not that file * similarly modify `save_pretrained` * Update tests/models/llava_next/test_processor_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vipllava/test_processor_vipllava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/vipllava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/processing_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/processing_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/vipllava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/processing_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2024-07-19 10:08:56 +05:00
Longjie Zheng	c75969ee28	Add torch.compile Support For Mamba (#31247 ) * modify mamba cache * set up cache * add test * [run-slow] mamba * [run-slow] mamba * address comments * [run-slow] mamba * use_cache_position * [run-slow] mamba * [run-slow] mamba * [run-slow] mamba * [run-slow] mamba * fix * cache in generate * [run-slow] mamba * address comments * [run-slow] mamba * [run-slow] mamba * address comments * [run-slow] mamba * fix * [run-slow] mamba * fix * [run-slow] mamba * fix cache name * [run-slow] mamba	2024-07-18 11:54:54 -04:00
Raushan Turganbay	673d30b826	Chameleon: minor fixes after shipping (#32037 ) * fix merging * make chameleon conditional	2024-07-18 16:54:07 +05:00
Pavel Iakubovskii	1c37e8c1a6	Add `sdpa` and FA2 for CLIP (#31940 ) * Squashed commit of the following: commit 102842cd477219b9f9bcb23a0bca3a8b92bd732f Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Fri Jul 12 18:23:52 2024 +0000 Add model-specific sdpa tests commit 60e4c88581abf89ec098da84ed8e92aa904c997d Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Fri Jul 12 18:20:53 2024 +0000 Add fallback to eager (expensive operation) commit c29033d30e7ffde4327e8a15cbbc6bee37546f80 Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Thu Jul 11 17:09:55 2024 +0000 Fix attn_implementation propagation commit 783aed05f0f38cb2f99e758f81db6838ac55b9f8 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 09:05:27 2024 +0530 style commit e77e703ca75d00447cda277eca6b886cd32bddc0 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 09:04:57 2024 +0530 add comment to explain why I had to touch forbidden codebase. commit ab9d8849758e7773a31778ccba71588d18552623 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 09:03:02 2024 +0530 fix: flax attribute access. commit c570fc0abf9d1bd58c291aae3c7e384f995996d2 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 08:23:54 2024 +0530 fix tensorflow attribute name. commit 32c812871cfdb268d8a6e3e2c61c5c925c8ed47e Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 07:57:10 2024 +0530 fix attribute access. commit 4f41a0138b6c417aed9c9332278f8bcd979cb7c2 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 07:44:02 2024 +0530 _from_config. commit 35aed64ff602422adcf41d7f677a0a24bd9eccae Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 18:46:52 2024 +0530 propagation of attn_implementation. commit 4c25c19845438b1dc1d35a5adf9436151c8c5940 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 09:24:36 2024 +0530 style again commit 5f7dc5c5015c0f8116408f737e8c318d1802c80c Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 09:19:05 2024 +0530 use from_config. commit b70c409956d0359fa6ae5372275d2a20ba7e3389 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 09:13:43 2024 +0530 quality commit a7b63beff53d0fc754c6564e2a7b51731ddee49d Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 14:35:10 2024 +0200 add benchmark numbers commit 455b0eaea50862b8458c8f422b60fe60ae40fdcb Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:50:16 2024 +0200 Revert "reflect feedback more" This reverts commit `dc123e71ef`. commit ca674829d28787349c2a9593a14e0f1d41f04ea4 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:50:05 2024 +0200 Revert "fix" This reverts commit `37a1cb35b8`. commit fab2dd8576c099eb1a3464958cb206a664d28247 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:47:46 2024 +0200 fix commit fbc6ae50fd6f2d36294d31e191761631b701d696 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:38:30 2024 +0200 reflect feedback more commit 87245bb020b2d60a89afe318a951df0159404fc9 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 08:54:34 2024 +0530 fixes commit 1057cc26390ee839251e7f8b3326c4207595fb23 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:49:03 2024 +0530 don't explicit set attn_implementation in tests commit e33f75916fc8a99f516b1cf449dbbe9d3aabda81 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:43:54 2024 +0530 explicitly override attn_implementation in the towers. commit 4cf41cb1bc885c39df7cb8f2a0694ebf23299235 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:38:42 2024 +0530 import in one-line. commit f2cc447ae9e74ccfacb448140cdf88259d4afc8c Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:34:58 2024 +0530 move sdpa mention to usage tips. commit 92884766c64dbb456926a3a84dd427be1349fa95 Author: sayakpaul <spsayakpaul@gmail.com> Date: Mon Apr 29 10:58:26 2024 +0530 fix: memory allocation problem. commit d7ffbbfe12f7750b7d0a361420f35c13e0ea787d Author: sayakpaul <spsayakpaul@gmail.com> Date: Mon Apr 29 09:56:59 2024 +0530 fix-copies commit 8dfc3731cedd02e36acd3fe56bb2e6d61efd25d8 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri Apr 26 20:16:12 2024 +0530 address arthur's comments. commit d2ed7b4ce4ff15ae9aa4d3d0500f1544e3dcd9e9 Author: Sayak Paul <spsayakpaul@gmail.com> Date: Fri Apr 26 20:08:15 2024 +0530 Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> commit 46e04361f37ded5c522ff05e9f725b9f82dce40e Author: sayakpaul <spsayakpaul@gmail.com> Date: Wed Apr 24 09:55:27 2024 +0530 add to docs. commit 831629158ad40d34d8983f209afb2740ba041af2 Author: sayakpaul <spsayakpaul@gmail.com> Date: Wed Apr 24 09:33:10 2024 +0530 styling.g commit d263a119c77314250f4b4c8469caf42559197f22 Author: sayakpaul <spsayakpaul@gmail.com> Date: Wed Apr 24 09:15:20 2024 +0530 up commit d44f9d3d7633d4c241a737a1bc317f791f6aedb3 Author: sayakpaul <spsayakpaul@gmail.com> Date: Tue Apr 23 18:40:42 2024 +0530 handle causal and attention mask commit 122f1d60153df6666b634a94e38d073f3f260926 Author: sayakpaul <spsayakpaul@gmail.com> Date: Tue Apr 23 15:18:21 2024 +0530 test fixes. commit 4382d8cff6fa1dee5dbcf0d06b3e2841231e36f5 Author: sayakpaul <spsayakpaul@gmail.com> Date: Tue Apr 23 09:39:25 2024 +0530 fix: scaling inside sdpa. commit 0f629989efc48b7315cf19405a81e02955efe7e5 Author: Sayak Paul <spsayakpaul@gmail.com> Date: Tue Apr 23 08:14:58 2024 +0530 Update src/transformers/models/clip/modeling_clip.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> commit 14367316877dc27ea40f767ad1aee38bbc97e4ce Author: sayakpaul <spsayakpaul@gmail.com> Date: Mon Apr 22 16:21:36 2024 +0530 add: sdpa support to clip. * Remove fallback for empty attention mask (expensive operation) * Fix typing in copies * Add flash attention * Add flash attention tests * List CLIP in FA docs * Fix embeddings attributes and tf * [run-slow] clip * Update clip documentation * Remove commented code, skip compile dynamic for CLIPModel * Fix doc * Fix doc 2 * Remove double transpose * Add torch version check for contiguous() * Add comment to test mixin * Fix copies * Add comment for mask * Update docs * [run-slow] clip	2024-07-18 10:30:37 +05:30
Robin Bakker	b31d595040	Add language to word timestamps for Whisper (#31572 ) * add language to words _collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information * ran style checks added missing comma * add new language test test that the pipeline can return both the language and timestamp * remove model configuration in test Removed model configurations that do not influence test results * remove model configuration in test Removed model configurations that do not influence test results	2024-07-17 21:32:53 +01:00
Sai-Suraj-27	72fb02c47d	Fixed `log messages` that are resulting in TypeError due to too many arguments (#32017 ) * Fixed log messages that are resulting in TypeErrors due to too many arguments. * Removed un-necessary imports.	2024-07-17 10:56:44 +01:00
Pavel Iakubovskii	691586b0dc	Fix tests skip (#32012 ) * [run-slow] clip * [run-slow] clip * Fix skip -> skipTest * [run-slow] clip	2024-07-17 08:37:43 +01:00
Raushan Turganbay	24cfcc2114	Chameleon: add model (#31534 ) * Chameleon model integration Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com> Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com> * fix 7B, again. mask away image tokens * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove pretrained_config_map * make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file * remove tokenizer (use llama's); remove codechameleon tests * a few copied from statements and minor changes * copied from in ChameleonModel * some copies in ChameleonForCausalLM * a few more copies * VQModel moved to ChameleonModel (as opposed to being in the processor) * ChameleonProcessor ready * Fix chameleon weights convert * update conversion script * clean-up processing * update modeling a bit * update * update (throws error...) * correct conversion ready * fix tests * fix docs * docs * ve swin norm * fix device for vocab map * add normalization * update * update script with rope rotations * final fix on model conversion * add slow tests * more info in docs * fix repo consistency tests * fix repo tests * fix-copies * hope this will make CI happy * fix for 30b model * Update docs/source/en/index.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/processing_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address comments * remove assertion in conversion script * add image processor test * not copied * port changes for qk layernorm * fix-copies * read token decorator for tests * [run-slow] chameleon * one more read-token * address some comments * qk norm changes * tests and repo check * moved rope permutations to conversion, YAY! * fix past kv check * docs * layernorm done! * let's be consistent in naming * fix slow tests * weird thing with slow CI, but let's see * once more try * remove past-kv as tuple following llama * ignore * style --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: jacobkahn <jacobkahn1@gmail.com> Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com> Co-authored-by: Leonid Shamis <lshamis@meta.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Joao Gante <joao@huggingface.co> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-17 10:41:43 +05:00
Joao Gante	999981daf4	Tests: remove cuda versions when the result is the same 🧹🧹 (#31955 ) remove cuda versions when the result is the same	2024-07-16 16:49:54 +01:00
Zach Mueller	693cb828ff	Fix bad test about slower init (#32002 ) Bronked main	2024-07-16 10:33:05 -04:00
Fanli Lin	25e5e3fa56	[tests] fix deepspeed zero3 config for `test_stage3_nvme_offload` (#31881 ) fix config	2024-07-16 16:11:37 +02:00
Zach Mueller	e0dfd7bcaf	Speedup model init on CPU (by 10x+ for llama-3-8B as one example) (#31771 ) * 1,100%! * Clean * Don't touch DS * Experiment with dtype allocation * skip test_load_save_without_tied_weights test * A little faster * Include proper upscaling? * Fixup tests * Potentially skip? * Let's see if this fixes git history * Maintain new dtype * Fin * Rm hook idea for now * New approach, see what breaks * stage * Clean * Stash * Should be fin now, just need to mark failing models * Clean up * Simplify * Deal with weird models * Enc/Dec * Skip w/ reason * Adjust test * Fix test * one more test * Keep experimenting * Fix ref * TO REMOVE: testing feedback CI * Right push * Update tests/utils/test_modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * disable * Add new func * Test nits from Amy * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Adjust comment * Adjust comment on skip * make private * Fin * Should be a not flag * Clarify and rename test --------- Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-16 09:32:01 -04:00
Penut Chen	ac946aac25	Fix the incorrect permutation of gguf (#31788 ) * Fix the incorrect permutation of gguf * rename num_kv_heads Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * add typing to num_kv_heads Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * rename variables * refactor permute function name * update the expected text of the llama3 q4 test --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-07-16 08:20:34 +02:00
Joao Gante	e4682de635	Masking: remove flakiness from test (#31939 )	2024-07-15 18:49:37 +01:00
Yih-Dar	a1a34657d4	Avoid race condition (#31973 ) * [test_all] hub * remove delete * remove delete * remove delete * remove delete * remove delete * remove delete * [test_all] * [test_all] * [test_all] * [test_all] * [test_all] * [test_all] --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-15 17:56:24 +02:00
Joao Gante	739a63166d	Generate: remove deprecated code due to `Cache` and `cache_position` being default (#31898 ) * tmp commit * shorter * nit * explicit kwargs * propagate changes * mass propagation with a few manual touches (let's see how CI behaves) * fix cacheless case * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * make fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-07-14 15:16:58 +01:00
Aviv Shamsian	7f79a97399	fix prompt strip to support tensors and np arrays (#27818 ) * fix prompt strip to support tensors and np arrays * framework agnostic * change logic check before converting prompt into list Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * adding _convert_to_list to tokenization_whisper_fast * adding tests for prompt decoding * adding comment Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * adding comment Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * revert minor * make style formatting * style formatting after update * Update src/transformers/models/whisper/tokenization_whisper_fast.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * fixing _strip_prompt to handle _decode_with_timestamps * fix copies --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-07-12 20:07:10 +01:00
Naman Garg	c1e139c2b0	Adding hiera (#30356 ) * initialized Structure * Updated variable names * Added Config class, basic HF setup, convert_to_hf * Fixed Convert function, added hiera to HF files, Initilized test files * better naming for x in forward pass * Moved utils to hiera * Change hiera -> hiera_model * Fixed integration into tranformers * Fix: Convert Checkpoint * added documentation for hiera * added documentation for hiera * added Docstings to models, Transformers based changes * make style and quality * make style and quality * Integration & Block tests running * Fixed bugs * initialized Structure * Updated variable names * Added Config class, basic HF setup, convert_to_hf * Fixed Convert function, added hiera to HF files, Initilized test files * better naming for x in forward pass * Moved utils to hiera * Change hiera -> hiera_model * Fixed integration into tranformers * Fix: Convert Checkpoint * added documentation for hiera * added documentation for hiera * added Docstings to models, Transformers based changes * make style and quality * make style and quality * Integration & Block tests running * Fixed bugs * Removed tim dependency * added HieraBlock * fixed: Model name * added tests for HieraModel, HieraBlock * fixed imports * fixed quality & copies * Fixes * Update docs/source/en/model_doc/hiera.md Fix name Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/hiera.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/hiera.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/configuration_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/configuration_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fixed formatting * Code quality & Import differences * quality and repo-consistency fix * fixed no torch error * Docstring fix * Docstring fix * doc string fix * fixed example usage * Resolved issues in modeling_hiera * Removed Hiera MAE * Added test and resolved bug * fixed doc string * First commit * Finished conversion script and model forward working * Resolved all issues * nits * Improving tests * Nits * More nits * Improving HieraForMaskedImageModeling * More improvements and nits * Fixed docstrings of outputs * More fixes * More imrpovments * Updated conversion script * Fixed docstrings * Improved tests * Fixed attentou outputs test * All tests green * Removed unnecessary file * contribution attribution * Resolved a few issues * Resolved Comments * Updated model repo id and fixed bugs * Removed loss print * Make tests green * Updated docstrings * Fix style * Fixed num_heads in config * Removed unnecessary video checkpoint related code in the conversion script * Fix style * Changed atol in conversion script * HieraConfig * Fix copies * Fixed typo * Resolved few issues * make * converted conv_nd -> nn.Module * Removed video complexities * Removed video complexities * fix style * Addressing comments * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/hiera/modeling_hiera.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix style * Fixed tests * Fixed typo * Fixed interpolate test * Made torch fx compatible * Made sure imageprocesor is correct * Addressed comments * Noise directly as torch * Remove unnecesary attr * Added return_dit * Update src/transformers/models/hiera/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated checkpoints * [run_slow] hiera * Fixed device mismatch * [run_slow] hiera * Fixed GPU tests * [run_slow] hiera --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com> Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-11 22:13:56 +01:00
fxmarty	ad4ef3a290	Fix fx tests with inputs_embeds (#31862 ) * fix tests * [test_all] check * address review comments	2024-07-11 20:14:03 +08:00
Omar Salman	1499a55008	Add warning message for beta and gamma parameters (#31654 ) * Add warning message for and parameters * Fix when the warning is raised * Formatting changes * Improve testing and remove duplicated warning from _fix_key	2024-07-11 13:01:47 +01:00
Sai-Suraj-27	2e48b3e872	fix: Fixed the `1st argument` name in classmethods (#31907 ) Fixed the first argument name in few classmethods.	2024-07-11 12:11:50 +01:00
Yih-Dar	080e14b24c	Modify `warnings` in a `with` block to avoid flaky tests (#31893 ) * fix * [test_all] check before merge --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-10 17:56:12 +02:00
Sai-Suraj-27	da79b18087	fix: Removed `duplicate` field definitions in some classes (#31888 ) Removed duplicate field definitions in classes.	2024-07-10 13:46:31 +01:00
Yih-Dar	9d98706b3f	Fix failed tests in #31851 (#31879 ) * Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868)" This reverts commit `b45dd5de9c`. * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-10 14:25:24 +02:00
Yih-Dar	b45dd5de9c	Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868 ) Revert "Fix `_init_weights` for `ResNetPreTrainedModel` (#31851)" This reverts commit `4c8149d643`.	2024-07-09 23:00:56 +02:00
Yih-Dar	4c8149d643	Fix `_init_weights` for `ResNetPreTrainedModel` (#31851 ) * init * test --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-09 20:09:08 +02:00
Yung-Sung Chuang	d094d8d9ec	Generate: Add new decoding strategy "DoLa" in `.generate()` (#29619 ) Co-authored-by: Joao Gante <joao@huggingface.co>	2024-07-09 17:37:38 +01:00
Joao Gante	4c2538b863	Test loading generation config with safetensor weights (#31550 ) fix test	2024-07-09 16:22:43 +02:00
fxmarty	0abf5e8eae	FX symbolic_trace: do not test decoder_inputs_embeds (#31840 ) only test input_embeds, not decoder_input_embeds	2024-07-09 08:07:46 +02:00
Yih-Dar	4879ac2b33	Avoid failure `TFBlipModelTest::test_pipeline_image_to_text` (#31827 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-08 13:49:21 +02:00
fxmarty	ba743700f4	transformers.fx.symbolic_trace supports inputs_embeds (#31574 ) * symbolic trace supports inputs_embeds * fix test? * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-08 19:17:28 +08:00
Pavel Iakubovskii	a177821b24	Add FA2 and `sdpa` support for SigLIP (#31499 ) * Rebase to main * Fix attention implementation autoset for tex and vision configs * Fixup * Minor fixes * Fix copies * Fix attention_mask for FA2 * Add eqvivalence tests for siglip * Remove right padding test * Uncomment flaky * Fix import * Add to docs * Fix test message * Add sdpa * Add sdpa equivalence test * Add siglip sdpa to docs * Fix typing for attention output * Add sdpa tests * Fix signature of FA2 * Autoset attn_implementation in config * Rename bsz -> batch_size * Move back autoset attn method * Mark as flaky * Correct attention mask padding * [run-slow] siglip * Add FA2 and sdpa docs * Style fix * Remove flaky for FA2 test * Change attention implementation set * Change attn_implementaiton propogation * Fix typos * Add modality to assert message * Add more sdpa backends in test * [run slow] siglip * Add math sdpa backend for all options * [run slow] siglip	2024-07-08 11:10:02 +01:00
NielsRogge	06fd7972ac	Add ZoeDepth (#30136 ) * First draft * Add docs * Clean up code * Convert model * Add image processor * Convert Zoe_K * More improvements * Improve variable names and docstrings * Improve variable names * Improve variable names * Replace nn.sequential * More improvements * Convert ZoeD_NK * Fix most tests * Verify pixel values * Verify pixel values * Add squeeze * Update beit to support arbitrary window sizes * Improve image processor * Improve docstring * Improve beit * Improve model outputs * Add figure * Fix beit * Update checkpoint * Fix repo id * Add _keys_to_ignore_on_load_unexpected * More improvements * Address comments * Address comments * Address comments * Address comments * Rename variable name * Add backbone_hidden_size * Vectorize * Vectorize more * Address comments * Clarify docstring * Remove backbone_hidden_size * Fix image processor * Remove print statements * Remove print statement * Add integration test * Address comments * Address comments * Address comments * Address comments * Add requires_backends * Clean up * Simplify conversion script * Simplify more * Simplify more * Simplify more * Clean up * Make sure beit is loaded correctly * Address comment * Address bin_configurations * Use bin_configurations * Convert models, add integration tests * Fix doc test * Address comments * Unify regressor classes * Clarify arguments * Improve resize_image * Add num_relative_features * Address comment * [run-slow]beit,data2vec,zoedepth * [run-slow]beit,data2vec,zoedepth * Address comments * Address comment * Address comment * Replace nn.TransformerEncoderLayer and nn.TransformerEncoder * Replace nn.MultiheadAttention * Add attributes for patch transformer to config * Add tests for ensure_multiple_of * Update organization * Add tests * [run-slow] beit data2vec * Update ruff * [run-slow] beit data2vec * Add comment * Improve docstrings, add test * Fix interpolate_pos_encoding * Fix slow tests * Add docstring * Update src/transformers/models/zoedepth/image_processing_zoedepth.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/zoedepth/image_processing_zoedepth.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Improve tests and docstrings * Use run_common_tests * Improve docstrings * Improve docstrings * Improve tests * Improve tests * Remove print statements --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-08 11:43:33 +02:00
Anton Vlasjuk	a01b033cb4	Fix galore lr display with schedulers (#31710 ) * fix galore lr display with lr schedulers * style * add some tests to check for displayed lrs * copy-paste err for warmup steps * standardize the default lr to be only in the optimizer * trying out my luck with the reads	2024-07-05 18:59:09 +01:00
Billy Cao	ac26260436	Allow FP16 or other precision inference for Pipelines (#31342 ) * cast image features to model.dtype where needed to support FP16 or other precision in pipelines * Update src/transformers/pipelines/image_feature_extraction.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use .to instead * Add FP16 pipeline support for zeroshot audio classification * Remove unused torch imports * Add docs on FP16 pipeline * Remove unused import * Add FP16 tests to pipeline mixin * Add fp16 placeholder for mask_generation pipeline test * Add FP16 tests for all pipelines * Fix formatting * Remove torch_dtype arg from is_pipeline_test_to_skip* * Fix format * trigger ci --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-05 17:21:50 +01:00
Billy Cao	1d3eaa6f7e	Add training support for SigLIP (#31495 ) * Add siglip loss function * Update docs * Enable training tests [experimental] enable GC training tests as it has worked for my own data * Remove test_training* overrides to enable training tests [run_slow] siglip * Skip training tests for Siglip text model and ImageClassificationModel [run_slow] siglip * Skip GC training tests for SiglipForImageClassification * Explicitly skip training tests for SiglipVisionModel Add skip reason for training tests for SiglipTextModel * Remove copied from to fix CI	2024-07-05 14:50:39 +01:00
Aymeric Roucher	1556025271	Code agent: allow function persistence between steps (#31769 ) * Code agent: allow function persistence between steps	2024-07-05 11:09:11 +02:00
Yih-Dar	eef0507f3d	Fix gemma tests (#31794 ) * skip 3 7b tests * fix * fix * fix * [run-slow] gemma --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-05 10:17:59 +02:00
Marc Sun	8c5c180de0	Fix serialization for offloaded model (#31727 ) * Fix serialization * style * add test	2024-07-05 08:07:07 +02:00
Pavel Iakubovskii	048f599f35	Fix RT-DETR weights initialization (#31724 ) * Fix init for rt-detr heads * Fixup * Add separate prior_prob value to config for initialization * Add bbox init * Change to 1 / num_labels init * Adjust weights init test * Fix style for test	2024-07-03 14:29:02 +01:00
Pavel Iakubovskii	b97521614a	Fix RT-DETR cache for generate_anchors (#31671 ) * Fix cache and type conversion * Add test * Fixup * nit * [run slow] rt_detr * Fix test * Fixup * [run slow] rt_detr * Update src/transformers/models/rt_detr/modeling_rt_detr.py	2024-07-03 14:19:57 +01:00
Joao Gante	ddfaf11926	Gemma 2: Update slow tests (#31759 ) gemma 2 slow tests	2024-07-03 11:43:44 +02:00
jiqing-feng	7f91f168a1	fix assisted decoding (#31401 ) * fix assisted decoding * check None * fix typo * fix _prepare_special_tokens * fix style * fix lint * add tests for assisted decoding * fix style * fix tests check	2024-07-03 09:22:56 +01:00
Matt	cd0935dd55	Make tool JSON schemas consistent (#31756 ) Make the order of array items consistent using sorted()	2024-07-02 20:00:42 +01:00
Joao Gante	82486e5995	🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs (#31747 ) * rely on the tokenizer default kwargs * fix a few tests	2024-07-02 16:17:42 +02:00
Sanchit Gandhi	a9701953ff	[whisper] static kv cache (#31166 ) * make work with cache abstraction * correct for static cache * hacks for compile * make fast * fix * fix pos ids * generate * fix sdpa * fix sdpa cache pos * fix fa2 * clean fa2 * integrate cache into generate * make style * copies * more copies * update eager * update sdpa * update fa2 * simplify * use cache pos * always compute cross-cache for debug * avoid recompiles Co-authored-by: Arthur Zucker <arthur@huggingface.co> * fix fix * fix fix fix * more fix * try encoder-decoder cache (too messy) * revert encoder-decoder cache * check cross-attn cache * use enc-dec dataclass * use richer enc-dec dataclass * clean-up * revert static cache changes * small fixes * revert to cpu flag * fix copies * add static slow test * past k/v docstring * more docstrings * cache_position docstrings * add to docs * add enc-dec cache to docs * make style * fix after rebase * fix beam * style * fix generation strategies * fix most decoder-only tests * style * skip test * more clean up * small docstrings * Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add todo * only crop self-attn * check cache in mixin * style * fix re-compile after rebase * move `is_updated` logic to enc-dec wrapper * revert back * revert cache back * finalise design * fix * fix fix * style * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * deprecate * updates * final updates * style * style --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-07-02 13:24:15 +01:00
Yih-Dar	93cd94b79d	Move some test files (`tets/test_xxx_utils.py`) to `tests/utils` (#31730 ) * move * move * move * move * Update tests/utils/test_image_processing_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-02 13:46:03 +02:00
Sangbum Daniel Choi	cb298978ad	add gather_use_object arguments (#31514 ) * add gather_use_object arguments * fix name and pass the CI test for Seq2SeqTrainer * make style * make it to functools * fix typo * add accelerate version: * adding warning * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * make style * Update src/transformers/training_args.py * check function move to initial part * add test for eval_use_gather_object --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-06-28 13:50:27 +01:00
Jacky Lee	82a1fc7256	Fix return_dict in encodec (#31646 ) * fix: use return_dict parameter * fix: type checks * fix: unused imports * update: one-line if else * remove: recursive check	2024-06-28 12:18:01 +01:00
Arthur	75a6319864	Fix post gemma merge (#31660 ) * nit * toctree issue * protect gemma2 tests as well * sdpa supported	2024-06-27 17:51:42 +02:00
Arthur	0cf60f13ab	Add gemma 2 (#31659 ) * inital commit * Add doc * protect? * fixup stuffs * update tests * fix build documentation * mmmmmmm config attributes * style * nit * uodate * nit * Fix docs * protect some stuff --------- Co-authored-by: Lysandre <lysandre@huggingface.co>	2024-06-27 17:36:19 +02:00
Sangbum Daniel Choi	be50a0338b	change anchor_image_size None for compatibility (#31640 ) * change anchor_image_size None for compatibility * make fix-copies	2024-06-27 12:36:55 +01:00
Billy Cao	3a028101e9	[QoL] Allow dtype str for torch_dtype arg of from_pretrained (#31590 ) * Allow dtype str for torch_dtype in from_pretrained * Update docstring * Add tests for str torch_dtype	2024-06-27 12:41:49 +02:00
amyeroberts	1de7dc7403	Skip tests properly (#31308 ) * Skip tests properly * [test_all] * Add 'reason' as kwarg for skipTest * [test_all] Fix up * [test_all]	2024-06-26 21:59:08 +01:00
Billy Cao	1f9f57ab4c	Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference (#31589 ) * Fix dtype casting in modeling_swin2sr to allow non-FP32 inference * Fix formattting * Fix for swinv2 too * Update src/transformers/models/swin2sr/modeling_swin2sr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/swinv2/modeling_swinv2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Add FP16 tests for swin2sr and swinv2 * [run_slow] swin2sr, swinv2 * [run_slow] swin2sr, swinv2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-26 18:46:48 +01:00
Pablo Montalvo	492ee17ec3	Fix paligemma detection inference (#31587 ) * fix extended attention mask * add slow test for detection instance * [run-slow]paligemma	2024-06-26 19:17:09 +02:00
Raushan Turganbay	e71f2863d7	Add LLaVa NeXT Video (#31252 ) * squash into single commit * run diff once more * docstring * tests * minor chnages and ready to go * Update src/transformers/models/llava_next_video/processing_llava_next_video.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vipllava/test_modeling_vipllava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * [run-slow] llava-next-video * [run-slow] llava-next-video * [run-slow] llava_next_video * fix two tests * fix slow tests * remove logit checks due to numeric errors * run test once more * [run-slow] llava_next_video * final try to pass the test * [run-slow] llava_next_video * [run-slow] llava_next_video * [run-slow] llava_next_video * style * fix * style --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-26 21:52:28 +05:00
Pavel Iakubovskii	b1ec745475	Fix RT-DETR inference with float16 and bfloat16 (#31639 ) * [run_slow] rt_detr * Fix positional embeddings and anchors dtypes * [run slow] rt_detr * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixup --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-26 17:50:10 +01:00
Younes Belkada	3f93fd0694	Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP (#31161 ) * fix llama fsdp * fixup * adding FSDP tests for CPU offloading * fixes * fix tests * fix tests * add it for mixtral * propagate the changes on other models * Update src/transformers/models/phi/modeling_phi.py * Delete utils/testing_scripts/fsdp_cpu_offloading.py Remove script - FSDP + CPU offloading it tested in the test suite * Delete utils/testing_scripts/dummy_fsdp_config.yml * Update + add cache_positions docstring --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-26 14:50:08 +01:00
Anton Vlasjuk	b07770c5eb	[`GPT-NeoX`] Add SDPA support (#31031 ) * starting support for sdpa in `gptneox` models * small comment on tests * fix dropout * documentation and style * clarify concrete paths for reference * generalise attn projections and rope application added head mask check to sdpa mask creation handle sdpa memory backend bug via own version flag * update docs and style * move dtype casting outside of general attn_projection_and_rope function fix flash_attn_2 stuff * more generic attn warning if output_attns or head_mask * simplify head mask check by moving head mask creation to a later point * remove copied llama artifact * remove padding_mask from attention function signature * removing unnecessary comments, only "save" attn implementation once * [run_slow] gpt_neox	2024-06-26 13:56:36 +01:00
amyeroberts	0f67ba1d74	Add ViTImageProcessorFast to tests (#31424 ) * Add ViTImageProcessor to tests * Correct data format * Review comments	2024-06-25 13:36:58 +01:00
Raushan Turganbay	fc689d75a0	Add video modality for InstrucBLIP (#30182 ) * squash in single commit * add docs * dummy obj * more changes in diff converter * tiny fix * make docs happy * skip test * repo consistency tests * update docstring * style * fix tests * change diff imports * [run-slow] instructblipvideo * [run-slow] instructblipvideo * fix tests and remove logit check * [run-slow] instructblipvideo	2024-06-25 15:45:39 +05:00
jiqing-feng	a958c4a801	fix output data type of image classification (#31444 ) * fix output data type of image classification * add tests for low-precision pipeline * add bf16 pipeline tests * fix bf16 tests * Update tests/pipelines/test_pipelines_image_classification.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix import * fix import torch * fix style --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-25 11:14:39 +01:00
Raushan Turganbay	7e86cb6c6f	Siglip: add `_no_split_module` (#31566 ) * device-map siglip * move split modules to PretrainedSigLip	2024-06-25 09:49:55 +05:00
Hiroshi Matsuda	0e23e60a5a	Fix bug about add_special_tokens and so on (#31496 ) * fix bug about add_special_tokens and so on * improve add_special_tokens and padding behavior * add a test case for add_special_tokens and padding	2024-06-24 14:05:16 +01:00
Zhiyong Wang	dce253f645	Add implementation of `spectrogram_batch` (#27159 ) * Add initial implementation of `spectrogram_batch` * Format the initial implementation * Add test suite for the `spectrogram_batch` * Update `spectrogram_batch` to ensure compatibility with test suite * Update `spectrogram_batch` to include pre and post-processing * Add `amplitude_to_db_batch` function and associated tests * Add `power_to_db_batch` function and associated tests * Reimplement the test suite for `spectrogram_batch` * Fix errors in `spectrogram_batch` * Add the function annotation for `spectrogram_batch` * Address code quality * Re-add `test_chroma_equivalence` function * Update src/transformers/audio_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/audio_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-24 09:19:12 +02:00
Pavel Iakubovskii	3c2d4d60d7	Correct @is_flaky test decoration (#31480 ) * Correct @is_flaky decorator	2024-06-24 08:09:21 +01:00
Sangbum Daniel Choi	74a207404e	New model support RTDETR (#29077 ) * fill out docs string in configuration `75dcd3a0e8 (r1506391856)` * reduce the input image size for the tests * remove the unappropriate tests * only 5 failes exists * make style * fill up missed architecture for object detection in docs * fix auto modeling * simple fix in missing import * major change including backbone refactor and objectdetectionoutput refactor * minor fix only 4 fails left * intermediate fix * revert __init__.py * revert __init__.py * make style * fixes in pr_docs * intermediate fix * make style * two fixes * pass doctest * only one fix left * intermediate commit * all fixed * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/rt_detr/convert_rt_detr_original_pytorch_checkpoint_to_pytorch.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/rt_detr/test_modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * function class above the model definition in dice_loss * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * simple fix * layernorm add config.layer_norm_eps * fix inputs_docstring * make style * simple fix * add custom coco loading test in image_processor * fix error in BaseModelOutput https://github.com/huggingface/transformers/pull/29077#discussion_r1516657790 * simple typo * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * intermediate fix * fix with load_backbone format * remove unused configuration * 3 fix test left * make style * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: Sounak Dey <dey.sounak@gmail.com> * change last_hidden_state to first index * all pass fix TO DO: minor update in comments * make fix-copies * remove deepcopy * pr_document fix * revert deepcopy due to the issue of unexpceted behavior in decoderlayer * add atol in final * add no_split_module * _no_split_modules = None * device transfer for model parallelism * minor fix * make fix-copies * fix typo * add test_image_processor with post_processing * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add config in RTDETRPredictionHead * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set lru_cache with max_size 32 * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add lru_cache import and configuration change * change the order of definition * make fix-copies * add docs and change config error * revert strange make-fix * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * test pass * fix get_clones related and remove deepcopy * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * nit for paper section * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * rename denoising related parameters * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * check the image transformation logic * make style * make style * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * pe_encoding -> positional_encoding_temperature * remove TODO * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * remove eval_idx since transformer DETR is giving all decoder output * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * change variable name * make style and docs import update * Revert "Update src/transformers/models/rt_detr/image_processing_rt_detr.py" This reverts commit `74aa3e1de0`. * fix typo * add postprocessing in docs * move import scipy to top * change varaible name * make fix-copies * remove eval_idx in test * move to after first sentence * update image_processor since box loss requires normalized one * change appropriate name to auxiliary_outputs * Update src/transformers/models/rt_detr/__init__.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/rt_detr/__init__.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/rt_detr.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/rt_detr.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * make style * remove panoptic related comments * make style * revert valid_processor_keys * fix aux related test * make style * change origination from config to backbone API * enable the dn_loss * fix test and conversion * renewal weight initialization * change initializer_range * make fix-up * fix the loss issue in the auxiliary output and denoising part * change weight loss to original RTDETR * fix in initialization * sync shape format of dn and aux * make style * stable fine-tuning and compatible conversion for resnet101 * make style * skip input_embed * change encoder related variable * enable converting rtdetr_r101 * add r101 related conversion code * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/rt_detr.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/rt_detr/image_processing_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change name _shape to _reshape * Update src/transformers/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * maket style * make fix-copies * remove deprecated import * more fix * remove last_hidden_state for task-specific model * Revert "remove last_hidden_state for task-specific model" This reverts commit `ccb7a34051`. * minore change in convert * remove print * make style and fix-copies * add custom rtdetr backbone for r18, r34 * remove print * change copied * add pad_size * make style * change layertype to optional to pass the CI * make style * add test in modeling_resnet_rt_detr * make fix-copies * skip tmp file test * fix comment * add docs * change to modeling_resnet file format * enabling resnet50 above * Update src/transformers/models/rt_detr/modeling_rt_detr.py Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com> * enable all the rtdetr model :) * finish except CI * add RTDetrResNetBackbone * make fix-copies * fix TO DO: CI enable * make style * rename test * add docs * add special fix * revert resnet * Update src/transformers/models/rt_detr/modeling_rt_detr_resnet.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * add more comment * remove swin comment * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * rename convert and add verify backbone * Update docs/source/en/_toctree.yml Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/rt_detr.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/rt_detr.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * make style * requests for docs * more general test docs * general script docs * make fix-copies * final commit * Revert "Update src/transformers/models/rt_detr/configuration_rt_detr.py" This reverts commit `d136225cd3`. * skip test_model_get_set_embeddings * remove target * add changes * make fix-copies * remove decoder_attention_mask * add load_backbone function for auto_backbone * remove comment * fix repo name * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * final commit * remove unused downsample_in_bottleneck * new test for autobackbone * change to appropriate indices * test fix * fix dict in test_image_processor * fix test * [run-slow] rt_detr, rt_detr_resnet * change the slow test * [run-slow] rt_detr * [run-slow] rt_detr, rt_detr_resnet * make in to same cuda in CSPRepLayer * [run-slow] rt_detr, rt_detr_resnet --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sounak Dey <dey.sounak@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com> Co-authored-by: ChoiSangBum <choisangbum@ChoiSangBumui-MacBookPro.local>	2024-06-21 17:50:08 +01:00
Ita Zaporozhets	1e79eade41	SPLIT PR: add user defined symbols and control symbols (#31305 ) * PR SPLIT: moving origina changes for adding user defined symbols * adding gemma test and generalizing gemma converter * ruff * update common test * update serialization test * deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added * removing commented lines * applying feedback - user only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols * add comment referencing sentencepiece	2024-06-21 01:48:10 -07:00
Yih-Dar	ec905f3a76	unskip 2 tests in cohere (#31517 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-20 17:21:08 +02:00
Joao Gante	1fd60fec75	RWKV: enable generation tests (#31490 ) * add rwkv tests * has_attentions set in individual tests	2024-06-20 14:15:01 +01:00
Younes Belkada	6d4306160a	GGUF: Fix llama 3 GGUF (#31358 ) * Create push-important-models.yml * llama3 support for GGUF * fixup * Update src/transformers/integrations/ggml.py * fix pre-tokenizer * fix * fix * fix * fix * fix * fix * address final comment * handle special tokens + add tests	2024-06-20 14:29:58 +02:00
Joao Gante	83259e406d	Mamba: add generative tests (#31478 )	2024-06-19 10:27:23 +01:00
Fanli Lin	077c139f57	[tests] rename `test_config_object` to `test_ds_config_object` (#31403 ) fix name	2024-06-19 11:19:15 +02:00
amyeroberts	609e662243	Use self.config_tester.run_common_tests() (#31431 ) * First testing updating config tests * Use run_common_tests	2024-06-19 10:18:08 +01:00
Anton Vlasjuk	b275a41005	[`GPT2`] Add SDPA support (#31172 ) * `gpt2` sdpa support * fix (at least) one test, style, repo consistency * fix sdpa mask in forward --> fixes generation * test * test2 * test3 * test4 * simplify shapes for attn mask creation and small comments * hub fail test * benchmarks * flash attn 2 mask should not be inverted on enc-dec setup * fix comment * apply some suggestion from code review - only save _attn_implentation once - remove unnecessary comment * change elif logic * [run-slow] gpt2 * modify `test_gpt2_sample_max_time` to follow previous assertion patterns	2024-06-19 09:40:57 +02:00
Matt	28316d0e8b	Fix single letter stop strings (#31448 ) * Fix single letter stop strings * Change the 0 to a 1 to avoid potential empty vector headaches later * Restructure for clarity * Update tests/generation/test_stopping_criteria.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Add the unsqueeze --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-18 14:07:16 +01:00
Aymeric Roucher	b38612d312	Agents: Improve python interpreter (#31409 ) * Improve Python interpreter * Add with and assert statements * Prevent overwriting existing tools * Check interpreter errors are well logged in code agent * Add lazy evaluation for and and or * Improve variable assignment * Fix early return statements in functions * Add small import fix on interpreter tool	2024-06-18 11:55:36 +02:00
Ella Charlaix	02300273e2	🚨 Remove dataset with restrictive license (#31452 ) remove dataset with restrictive license	2024-06-17 17:56:51 +01:00
Albert Villanova del Moral	a14b055b65	Pass datasets trust_remote_code (#31406 ) * Pass datasets trust_remote_code * Pass trust_remote_code in more tests * Add trust_remote_dataset_code arg to some tests * Revert "Temporarily pin datasets upper version to fix CI" This reverts commit `b7672826ca`. * Pass trust_remote_code in librispeech_asr_dummy docstrings * Revert "Pin datasets<2.20.0 for examples" This reverts commit `833fc17a3e`. * Pass trust_remote_code to all examples * Revert "Add trust_remote_dataset_code arg to some tests" to research_projects * Pass trust_remote_code to tests * Pass trust_remote_code to docstrings * Fix flax examples tests requirements * Pass trust_remote_dataset_code arg to tests * Replace trust_remote_dataset_code with trust_remote_code in one example * Fix duplicate trust_remote_code * Replace args.trust_remote_dataset_code with args.trust_remote_code * Replace trust_remote_dataset_code with trust_remote_code in parser * Replace trust_remote_dataset_code with trust_remote_code in dataclasses * Replace trust_remote_dataset_code with trust_remote_code arg	2024-06-17 17:29:13 +01:00
Bastien Le Chenadec	485fd81471	Support multiple validation datasets when `dataloader_persistent_workers=True` (#30627 ) * Support multiple validation datasets when dataloader_persistent_workers=True * Test support of multiple validation datasets	2024-06-17 16:58:39 +01:00
Fanli Lin	9454f437b0	[tests] make `TestDeepSpeedModelZoo` device-agnostic (#31402 ) * fix * use accelerator device count * ci fix	2024-06-17 16:42:57 +02:00
amyeroberts	02c525d226	Rename misnamed image processor test files (#31430 )	2024-06-17 10:21:28 +01:00
amyeroberts	20812237ce	Remove empty create_and_test_config_common_properties tests (#31359 ) Remove empty tests	2024-06-14 20:15:48 +01:00
Yoach Lacombe	7e1c7dc8b6	Fix SpeechT5 `decoder_attention_mask` shape (#28071 ) * Fix SpeechT5 * add test foward with labels and attention mask * make style	2024-06-14 15:20:11 +02:00
Yoach Lacombe	d9daeff297	Set seed for M4T retain grad test (#31419 )	2024-06-14 14:48:04 +02:00
Pablo Montalvo	c624d5ba0b	add initial design for uniform processors + align model (#31197 ) * add initial design for uniform processors + align model * fix mutable default 👀 * add configuration test * handle structured kwargs w defaults + add test * protect torch-specific test * fix style * fix * fix assertEqual * move kwargs merging to processing common * rework kwargs for type hinting * just get Unpack from extensions * run-slow[align] * handle kwargs passed as nested dict * add from_pretrained test for nested kwargs handling * [run-slow]align * update documentation + imports * update audio inputs * protect audio types, silly * try removing imports * make things simpler * simplerer * move out kwargs test to common mixin * [run-slow]align * skip tests for old processors * [run-slow]align, clip * !$#@!! protect imports, darn it * [run-slow]align, clip * [run-slow]align, clip * update doc * improve documentation for default values * add model_max_length testing This parameter depends on tokenizers received. * Raise if kwargs are specified in two places * fix * expand VideoInput * fix * fix style * remove defaults values * add comment to indicate documentation on adding kwargs * protect imports * [run-slow]align * fix * remove set() that breaks ordering * test more * removed unused func * [run-slow]align	2024-06-13 16:27:16 +02:00
Marc Sun	254b25abd9	Use huggingface_hub helper function to split state dict (#31091 ) * shard saving from hf hub * index = None * fix tests * indent	2024-06-12 14:10:32 +02:00
Jason (Siyu) Zhu	a2ede66674	Add support to declare imports for code agent (#31355 ) * Support import declaration in Code Agent	2024-06-12 09:32:28 +02:00
amyeroberts	f53fe35b29	Fast image processor (#28847 ) * Draft fast image processors * Draft working fast version * py3.8 compatible cache * Enable loading fast image processors through auto * Tidy up; rescale behaviour based on input type * Enable tests for fast image processors * Smarter rescaling * Don't default to Fast * Safer imports * Add necessary Pillow requirement * Woops * Add AutoImageProcessor test * Fix up * Fix test for imagegpt * Fix test * Review comments * Add warning for TF and JAX input types * Rearrange * Return transforms * NumpyToTensor transformation * Rebase - include changes from upstream in ImageProcessingMixin * Safe typing * Fix up * convert mean/std to tesnor to rescale * Don't store transforms in state * Fix up * Update src/transformers/image_processing_utils_fast.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/auto/image_processing_auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/auto/image_processing_auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/auto/image_processing_auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Warn if fast image processor available * Update src/transformers/models/vit/image_processing_vit_fast.py * Transpose incoming numpy images to be in CHW format * Update mapping names based on packages, auto set fast to None * Fix up * Fix * Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test * Update src/transformers/models/vit/image_processing_vit_fast.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Add equivalence and speed tests * Fix up --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2024-06-11 15:47:38 +01:00
Matt	edc1dffd00	Chat Template support for function calling and RAG (#30621 ) * First draft, still missing automatic function conversion * First draft of the automatic schema generator * Lots of small fixes * the walrus has betrayed me * please stop committing your debug breakpoints * Lots of cleanup and edge cases, looking better now * Comments and bugfixes for the type hint parser * More cleanup * Add tests, update schema generator * Update tests, proper handling of return values * Small docstring change * More doc updates * More doc updates * Add json_schema decorator * Clean up the TODOs and finish the docs * self.maxDiff = None to see the whole diff for the nested list test * add import for add_json_schema * Quick test fix * Fix something that was bugging me in the chat template docstring * Less "anyOf" when unnecessary * Support return types for the templates that need them * Proper return type tests * Switch to Google format docstrings * Update chat templating docs to match new format * Stop putting the return type in with the other parameters * Add Tuple support * No more decorator - we just do it implicitly! * Add enum support to get_json_schema * Update docstring * Add copyright header * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/chat_template_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/chat_template_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Add copyright header * make fixup * Fix indentation * Reformat chat_template_utils * Correct return value * Make regexes module-level * Support more complex, multi-line arg docstrings * Update error message for ... * Update ruff * Add document type validation * Refactor docs * Refactor docs * Refactor docs * Clean up Tuple error * Add an extra test for very complex defs and docstrings and clean everything up for it * Document enum block * Quick test fixes * Stop supporting type hints in docstring to fix bugs and simplify the regex * Update docs for the regex change * Clean up enum regex * Wrap functions in {"type": "function", "function": ...} * Update src/transformers/utils/chat_template_utils.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * Temporary tool calling commit * Add type hints to chat template utils, partially update docs (incomplete!) * Code cleanup based on @molbap's suggestion * Add comments to explain regexes * Fix up type parsing for unions and lists * Add custom exception types and adjust tests to look for them * Update docs with a demo! * Docs cleanup * Pass content as string * Update tool call formatting * Update docs with new function format * Update docs * Update docs with a second tool to show the model choosing correctly --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2024-06-11 15:46:38 +01:00
amyeroberts	a4e1a1d028	🚨 FLAVA: Remove double softmax (#31322 ) Remove double softmax	2024-06-10 15:01:27 +01:00
Yih-Dar	8fff07ded0	Fix Cohere CI (#31263 ) * [run-slow] cohere * [run-slow] cohere * [run-slow] cohere --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-10 15:16:58 +02:00
Pavel Iakubovskii	517df566f5	Decorators for deprecation and named arguments validation (#30799 ) * Fix do_reduce_labels for maskformer image processor * Deprecate reduce_labels in favor to do_reduce_labels * Deprecate reduce_labels in favor to do_reduce_labels (segformer) * Deprecate reduce_labels in favor to do_reduce_labels (oneformer) * Deprecate reduce_labels in favor to do_reduce_labels (maskformer) * Deprecate reduce_labels in favor to do_reduce_labels (mask2former) * Fix typo * Update mask2former test * fixup * Update segmentation examples * Update docs * Fixup * Imports fixup * Add deprecation decorator draft * Add deprecation decorator * Fixup * Add deprecate_kwarg decorator * Validate kwargs decorator * Kwargs validation (beit) * fixup * Kwargs validation (mask2former) * Kwargs validation (maskformer) * Kwargs validation (oneformer) * Kwargs validation (segformer) * Better message * Fix oneformer processor save-load test * Update src/transformers/utils/deprecation.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/deprecation.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/deprecation.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * Update src/transformers/utils/deprecation.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * Better handle classmethod warning * Fix typo, remove warn * Add header * Docs and `additional_message` * Move to filter decorator ot generic * Proper deprecation for semantic segm scripts * Add to __init__ and update import * Basic tests for filter decorator * Fix doc * Override `to_dict()` to pop depracated `_max_size` * Pop unused parameters * Fix trailing whitespace * Add test for deprecation * Add deprecation warning control parameter * Update generic test * Fixup deprecation tests * Introduce init service kwargs * Revert popping unused params * Revert oneformer test * Allow "metadata" to pass * Better docs * Fix test * Add notion in docstring * Fix notification for both names * Add func name to warning message * Fixup --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2024-06-10 12:35:10 +01:00
Pablo Montalvo	6b11f89c6b	Fix paligemma inverted mask (#31207 ) * pass inverted causal mask * add sanity check for paligemma finetuning * [run-slow]paligemma	2024-06-10 11:22:39 +02:00
amyeroberts	25245ec26d	Rename test_model_common_attributes -> test_model_get_set_embeddings (#31321 ) * Rename to test_model_common_attributes The method name is misleading - it is testing being able to get and set embeddings, not common attributes to all models * Explicitly skip	2024-06-07 19:40:26 +01:00
BHUVAN M	3b9174f248	interpolation added for TVP. (#30863 ) * Update TVP model to interpolate pre-trained image pad prompter encodings * feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding * added required comments * Update TVP model to interpolate pre-trained image pad prompter encodings * feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding * added required comments * docstring and argument fix * doc fixes and test case fix suggested in review. * varibale typo fix * styling and name fixes for padding interpolation flag.	2024-06-07 18:44:16 +01:00
Matt	065729a692	Remove ConversationalPipeline and Conversation object (#31165 ) * Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal * Update not-doctested.txt * Fix JA and ZH docs * Fix JA and ZH docs some more * Fix JA and ZH docs some more	2024-06-07 17:50:18 +01:00
조준래	60861fe1fd	Implement JSON dump conversion for torch_dtype in TrainingArguments (#31224 ) * Implement JSON dump conversion for torch_dtype in TrainingArguments * Add unit test for converting torch_dtype in TrainingArguments to JSON * move unit test for converting torch_dtype into TrainerIntegrationTest class * reformating using ruff * convert dict_torch_dtype_to_str to private method _dict_torch_dtype_to_str --------- Co-authored-by: jun.4 <jun.4@kakaobrain.com>	2024-06-07 15:43:34 +01:00
Benjamin Badger	ff689f57aa	Extend save_pretrained to offloaded models (#27412 ) * added hidden subset * debugged hidden subset contrastive search * added contrastive search compression * debugged compressed contrastive search * memory reduction for contrastive search * debugged mem red * added low memory option feature * debugged mem optmimization output stack * debugged mem optmimization output stack * debugged low mem * added low mem cache * fixed 2047 tensor view * debugged 2042 past key val inputs * reformatted tensors * changed low mem output * final clean * removed subset hidden csearch * fixed hidden device * fixed hidden device * changed compressor dtype * removed hstate compression * integrated csearch in generate * test csearch integration into generation exit() * fixed csearch kwarg integration with generation * final wrap and added doc * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * added debug print * direct hstate cat * direct hstate cat * direct hstate cat debug * direct hstate cat debug * expanded full hidden state stack * expanded full hidden state stack * matched dims for hstates * matched dims for hstates * logits fix * equality test * equality hidden debug * debug * added prints for debug * added prints for debug * equality check * switched squeeze dim * input format debug * tracing top_k_ids * removed trace * added test context * added jitter * added jitter * added jitter * returned state * rebuilt past key value reconstruction * debugged * cleaned traces * added selection for pkv * changed output to dict * cleaned * cleaned * cleaned up contrastive search test * moved low_memory kwarg * debugged * changed low mem test batch size to 1 * removed output * debugged test input shape * reformatted csearch test * added trace * removed unsqueeze on final forward pass * replaced unsqueeze with view * removed traces * cleaned * debugged model kwargs * removed special models from test * ran make quality * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * refactored * refactored * refactored * make fixup * renamed flag sequential * renamed flag sequential * iterative onloading * black style and test utils * added traces for integrated test * debugged * added traces * make style * removed traces, make style * included suggestions and added test * debugged test * added offload module check and make style * is_accelerate_available and make style * added test decorator * changed test model and config spec * added offload condition * added lazy loading for each shard * debugged * modified sharding * debugged * added traces * removed safe serialization * no index overload; * trace on safe save ptrs * added ptr condition * debugged * debugged ptr * moved module map init * remake shard only for offloaded modules * refactored * debugged * refactored * debugged * cleaned and make style * cleaned and make style * added trace * sparse module map * debugged * removed module map conditional * refactored * debug * debugged * added traces * added shard mem trace * added shard mem trace * removed underlying storage check * refactored * memory leak removal and make style * cleaned * swapped test decs and make style * added mem checks and make style * added free mem warning * implemented some suggestions * moved onloading to accelerate * refactored for accelerate integration * cleaned test * make style * debugged offload map name * cleaned and make style * replaced meta device check for sharding * cleaned and make style * implemented some suggestions * more suggestions * update warning Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * more suggestions * make style * new make style * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-07 07:50:35 -04:00
Cyril Vallez	8bcf9c8dd4	Fix jetmoe model (#31279 ) * Fix jetmoe model * Remove skip-tests	2024-06-07 11:51:41 +02:00
amyeroberts	bdf36dcd48	Enable HF pretrained backbones (#31145 ) * Enable load HF or tim backbone checkpoints * Fix up * Fix test - pass in proper out_indices * Update docs * Fix tvp tests * Fix doc examples * Fix doc examples * Try to resolve DPT backbone param init * Don't conditionally set to None * Add condition based on whether backbone is defined * Address review comments	2024-06-06 22:02:38 +01:00
Vu Huy Nguyen	f9296249a3	Pipeline VQA: Add support for list of images and questions as pipeline input (#31217 ) * Add list check for image and question * Handle passing two lists and update docstring * Add tests * Add support for dataset * Add test for dataset as input * fixup * fix unprotected import * fix unprotected import * fix import again * fix param type	2024-06-06 14:50:45 +01:00
amyeroberts	c53fcd8381	Mark MobileNetV1ModelTest::test_batching_equivalence as flaky (#31258 ) * Mark MobileNetV1ModelTest::test_batching_equivalence as flaky * Add link to issue * woops	2024-06-06 14:47:58 +01:00
Omar Salman	681183974a	Enable dynamic resolution input for Beit (#31053 ) * Initial attempt * Updates: PR suggestions * Interpolate the relative position bias when interpolate_pos_encoding is True * Add slow tag for the added tests * Add in DATA2VEC_VISION_INPUTS_DOCSTRING	2024-06-06 14:47:41 +01:00
Marc Sun	99895ae5e2	fix accelerate tests for roberta xl (#31288 ) * fix accelerate tests for roberta xl * style	2024-06-06 14:44:35 +01:00
Raushan Turganbay	5fabd1e83b	Generation: fix handling of special tokens (#31254 ) * fix special tokens in generatioon * fix test * add warning * fix the check * warn once * fix	2024-06-06 15:21:32 +05:00
Raushan Turganbay	7729b77478	Make mamba use cache (#31116 ) * make mamba use cache * uss cache naming as in mamba * fix musicgen	2024-06-06 13:37:29 +05:00
amyeroberts	940fde8daf	Skip failing JetMOE generation tests (#31266 ) Skip failing tests for now	2024-06-05 19:06:46 +01:00
bastrob	464d986b6c	Add missing Flaubert tokenizer tests (#30492 ) * add flaubert tokenization test, enrich inheritance in FlaubertTokenizer. * fix quality code ci * ensure parameter consistency * fix ci * fix copyright year and flatten vocab list. * fix style	2024-06-05 13:52:16 +02:00
Yih-Dar	fd3238b4b0	Fix `MistralIntegrationTest` (#31231 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-04 18:04:08 +02:00
amyeroberts	4ba66fdb4c	Fix pipeline tests - torch imports (#31227 ) * Fix pipeline tests - torch imports * Frameowrk dependant float conversion	2024-06-04 12:30:23 +01:00
Chujie Zheng	6b22a8f2d8	fix bf16 issue in text classification pipeline (#30996 ) * fix logits dtype * Add bf16/fp16 tests for text_classification pipeline * Update test_pipelines_text_classification.py * fix * fix	2024-06-04 11:20:48 +01:00
Kristen Pereira	de460e28e1	Add dynamic resolution input/interpolate position embedding to deit (#31131 ) * Added interpolate pos encoding feature and test to deit * Added interpolate pos encoding feature and test for deit TF model * readded accidentally delted test for multi_gpu * storing only patch_size instead of entire config and removed commented code * Update modeling_tf_deit.py to remove extra line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-04 10:29:01 +01:00
Raushan Turganbay	d64e4da713	Video-LLaVa: handle any number of frames (#31221 ) video-llava can handle more frames	2024-06-04 14:20:03 +05:00
DomHudson	e83cf58145	Fix sentence fragment within test comments (#31218 )	2024-06-04 10:09:24 +01:00
Raushan Turganbay	83238eeebc	Pass device in Logits Processor's init (#29804 ) * add device in logits processor * remove device when not needed * codestyle * tests * forgot `melody` version * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * codestyle * updates --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-06-04 10:19:19 +05:00
Yih-Dar	8a1a23ae4d	Fix GPU OOM for `mistral.py::Mask4DTestHard` (#31212 ) * build * build * build * build --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-03 19:25:15 +02:00
Sangbum Daniel Choi	874ac129bb	fix the get_size_with_aspect_ratio in max_size situation (#30902 ) * fix the get_size_with_aspect_ratio in max_size situation * make fix-up * add more general solution * consider when max_size is not defined * fix typo * fix typo * simple fix * fix error * fix if else error * fix error of size overwrite * fix yolos image processing * fix detr image processing * make * add longest related test script * Update src/transformers/models/yolos/image_processing_yolos.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add more test * add test script about longest size * remove deprecated --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-03 16:12:08 +01:00
Isotr0py	e4628434d8	Add Qwen2 GGUF loading support (#31175 ) * add qwen2 gguf support * Update docs * fix qwen2 tokenizer * add qwen2 gguf test * fix typo in qwen2 gguf test * format code * Remove mistral, clarify the error message * format code * add typing and update docstring	2024-06-03 14:55:10 +01:00
Yih-Dar	df848acc5d	Fix `test_compile_static_cache` (#30991 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-03 15:16:28 +02:00
fxmarty	221aaec6ec	Ignore non-causal mask in more cases with SDPA (#30138 ) * update non-causal mask for sdpa * add test * update docstrings * add one more test * fix cross attention bug * gentler atol/rtol	2024-06-03 19:08:41 +08:00
Ahmed Moubtahij	39b2ff69d6	Token healing (#30081 ) * token healing impl + trie with extensions * make fixup * prefix-robust space tokenization * examples readme and requirements * make fixup * allow input prompt and model * redundant defaults * Specialized Trie * make fixup * updated tests with new inherited Tree * input ids to auto device_map * rm unused import * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * naming convention * Revert "naming convention" This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0. * naming convention * last -hopefully- changes --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-06-03 10:53:15 +02:00
Aymeric Roucher	9837a25481	Add streaming, various fixes (#30838 ) * Implement streaming run in ReAct agents * Allow additional imports in code agents * Python interpreter: support classes and exceptions, fixes	2024-05-31 14:16:23 +02:00
Marc Sun	48cada87c3	Fix quantized cache output (#31143 )	2024-05-31 12:08:55 +02:00
zspo	cda9c82a63	fix get_scheduler when name is warmup_stable_decay (#31128 ) fix get_scheduler args	2024-05-30 15:25:43 +01:00
Younes Belkada	5e5c4d629d	FIX / Quantization: Add extra validation for bnb config (#31135 ) add validation for bnb config	2024-05-30 11:45:03 +02:00
Dhruv Pai	5c88253556	Add on_optimizer_step to callback options (#31095 ) * Modified test * Added on_optimizer_step to callbacks * Move callback after step is called * Added on optimizer step callback	2024-05-29 16:20:59 +02:00
Lucain	c3044ec2f3	Use `HF_HUB_OFFLINE` + fix has_file in offline mode (#31016 ) * Fix has_file in offline mode * harmonize env variable for offline mode * Switch to HF_HUB_OFFLINE * fix test * revert test_offline to test TRANSFORMERS_OFFLINE * Add new offline test * merge conflicts * docs	2024-05-29 11:55:43 +01:00
amyeroberts	a564d10afe	Deprecate low use models (#30781 ) * Deprecate models - graphormer - time_series_transformer - xlm_prophetnet - qdqbert - nat - ernie_m - tvlt - nezha - mega - jukebox - vit_hybrid - x_clip - deta - speech_to_text_2 - efficientformer - realm - gptsan_japanese * Fix up * Fix speech2text2 imports * Make sure message isn't indented * Fix docstrings * Correctly map for deprecated models from model_type * Uncomment out * Add back time series transformer and x-clip * Import fix and fix-up * Fix up with updated ruff	2024-05-28 18:07:07 +01:00
Younes Belkada	3264be4114	TST: Fix instruct-blip tests (#31088 ) * fix flan t5 tests * better format	2024-05-28 18:29:11 +02:00
Yih-Dar	3af7bf30ad	skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` (#31086 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-28 17:44:52 +02:00
Raushan Turganbay	779bc360ff	Watermark: fix tests (#30961 ) * fix tests * style * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-28 17:07:42 +05:00
Lysandre Debut	a3c7b59e31	Fix failing tokenizer tests (#31083 ) * Fix failing tokenizer tests * Use small tokenizer * Fix remaining reference	2024-05-28 13:34:23 +02:00
Pavel Iakubovskii	98e2d48e9a	Fix OWLv2 post_process_object_detection for multiple images (#31082 ) * Add test for multiple images * [run slow] owlv2 * Fix box rescaling * [run slow] owlv2	2024-05-28 12:06:06 +01:00
oOraph	936ab7bae5	fix from_pretrained in offline mode when model is preloaded in cache (#31010 ) * Unit test to verify fix Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> * fix from_pretrained in offline mode when model is preloaded in cache Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> * minor: fmt Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> --------- Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>	2024-05-28 11:56:05 +02:00
Yih-Dar	8e3b1fef97	Remove `ninja` from docker image build (#31080 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-28 11:36:26 +02:00
Yih-Dar	9d35edbb30	skip `test_model_parallelism` for 2 model test classes (#31067 ) skip Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-27 18:36:39 +02:00
Yoach Lacombe	d355741eca	Fix pad_to_max_length Whisper (#30787 ) * fix pad_to_max_length Whisper * add tests * make style	2024-05-27 16:09:05 +02:00
Marc Sun	b84cd67526	Fix quanto tests (#31062 ) fix quanto tests	2024-05-27 15:53:45 +02:00
Ita Zaporozhets	deba7655e6	Add split special tokens (#30772 ) * seems like `split_special_tokens` is used here * split special token * add new line at end of file * moving split special token test to common tests * added assertions * test * fixup * add co-author * passing rest of args to gptsan_japanese, fixing tests * removing direct comparison of fast and slow models * adding test support for UDOP and LayoutXLM * ruff fix * readd check if slow tokenizer * modify test to handle bos tokens * removing commented function * trigger build * applying review feedback - updated docstrings, var names, and simplified tests * ruff fixes * Update tests/test_tokenization_common.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * applying feedback, comments * shutil temp directory fix --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain> Co-authored-by: itazap <itazap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>	2024-05-24 08:38:58 -07:00
BHUVAN M	e5103a76cc	added interpolation for vitmae model in pytorch as well as tf. (#30732 ) * added interpolation for vitmae model in pytorch as well as tf. * Update modeling_vit_mae.py irreugalr import fixed * small changes and proper formatting * changes suggested in review. * modified decoder interpolate_func * arguments and docstring fix * Apply suggestions from code review doc fixes Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-24 16:20:09 +01:00
Younes Belkada	658b849aeb	Quantization / TST: Fix remaining quantization tests (#31000 ) * Fix remaining quant tests * Update test_quanto.py	2024-05-24 14:35:59 +02:00
Lucain	fd3c128040	Fix resume_download future warning (#31007 ) * Fix resume_download future warning * better like this * Add regression test	2024-05-24 14:35:40 +02:00
Marc Sun	ae87f9797b	FIX / TST: Fix expected results on Mistral AWQ test (#30971 ) fix awq mistral test	2024-05-24 14:06:31 +02:00
Fanli Lin	04c7c176d7	[tests] make `test_model_parallelism` device-agnostic (#30844 ) * enable on xpu * fix style * add comment and mps	2024-05-24 11:51:51 +01:00
Yixiang Gao	42d8dd8716	Perceiver interpolate position embedding (#30979 ) * add test that currently fails * test passed * all perceiver passed * fixup, style, quality, repo-consistency, all passed * Apply suggestions from code review: default to False + compute sqrt once only Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix a minor bracket * replace dim with self._num_channels * add arguments to the rest preprocessors --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-24 11:13:58 +01:00
Ita Zaporozhets	7f6e87413f	add prefix space ignored in llama #29625 (#30964 ) * add prefix space ignored in llama #29625 * adding test with add_prefix_space=False * ruff --------- Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>	2024-05-24 01:03:00 -07:00
Yasmin Moslem	6d3d5b1039	Remove deprecated properties in tokenization_nllb.py and tokenization_nllb_fast.py (#29834 ) * Fix typo in tokenization_nllb.py Change `adder_tokens_decoder` into `added_tokens_decoder` and improve the warning's readability. * Fix typo in tokenization_nllb_fast.py Change `adder_tokens_decoder` into `added_tokens_decoder` and improve the warning's readability. * Remove deprecated attributes in tokenization_nllb.py Remove deprecated attributes: `lang_code_to_id`, `fairseq_tokens_to_ids`, `id_to_lang_code`, and `fairseq_ids_to_tokens` * Remove deprecated attribute in tokenization_nllb_fast.py Remove deprecated attribute `lang_code_to_id` * Remove deprecated properties in tokenization_nllb.py Remove deprecated properties - fix format * Remove deprecated properties in tokenization_nllb_fast.py Remove deprecated properties - fix format * Update test_tokenization_nllb.py * update test_tokenization_nllb.py * Update tokenization_nllb.py * Update test_tokenization_seamless_m4t.py * Update test_tokenization_seamless_m4t.py	2024-05-23 18:53:26 +02:00
Aritra Roy Gosthipaty	965e98dc54	[Port] TensorFlow implementation of Mistral (#29708 ) * chore: initial commit * chore: adding imports and inits * chore: adding the causal and classification code * chore: adding names to the layers * chore: using single self attn layer * chore: built the model and layers * chore: start with testing * chore: docstring change, transpose fix * fix: rotary embedding * chore: adding cache implementation * remove unused torch * chore: fixing the indexing issue * make fix-copies * Use modeling_tf_utils.keras * make fixup * chore: fixing tests * chore: adding past key value logic * chore: adding multi label classfication test * fix: switching on the built parameters in the layers * fixing repo consistency * ruff formats * style changes * fix: tf and pt equivalence * removing returns from docstrings * fix docstrings * fix docstrings * removing todos * fix copies * fix docstring * fix docstring * chore: using easier rotate_half * adding integration tests * chore: addressing review related to rotary embedding layer * review changes * [run-slow] mistral * skip: test save load after resize token embedding * style --------- Co-authored-by: Matt <rocketknight1@gmail.com>	2024-05-23 17:48:49 +01:00
Yih-Dar	2a89673fe5	Update 4 `MptIntegrationTests` expected outputs (#30989 ) * fix * fix * fix * fix * fix * [run-slow] mpt --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-23 18:27:54 +02:00
Fanli Lin	21339a5213	[tests] add `torch.use_deterministic_algorithms` for XPU (#30774 ) * add xpu check * add marker * add documentation * update doc * fix ci * remove from global init * fix	2024-05-23 16:53:07 +01:00
Marc Sun	8366b57241	Fix accelerate failing tests (#30836 ) * Fix accelerate tests * fix clip * skip dbrx tests * fix GPTSan * fix M2M100Model * same fix as jamba * fix mt5 * Fix T5Model * Fix umt5 model * fix switch_transformers * fix whisper * fix gptsan again * fix siglip recent test * skip siglip tests * wrong place fixed	2024-05-23 17:18:58 +02:00
Poedator	6739e1d261	test_custom_4d_attention_mask skip with sliding window attn (#30833 )	2024-05-23 15:22:10 +02:00
Raushan Turganbay	d583f1317b	Quantized KV Cache (#30483 ) * clean-up * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * more suggestions * mapping if torch available * run tests & add 'support_quantized' flag * fix jamba test * revert, will be fixed by another PR * codestyle * HQQ and versatile cache classes * final update * typo * make tests happy --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-05-23 17:25:20 +05:00
Kamil Akesbi	eb1a77bbb0	Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size (#30637 ) * fiw input to generate in pipeline * fixup * pass input_features to generate with assistant * error if model and assistant with different enc size * fix * apply review suggestions * use self.config.is_encoder_decoder * pass inputs to generate directly * add slow tests * Update src/transformers/generation/utils.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * apply review * Update src/transformers/generation/utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * apply code review * update attributes encoder_xyz to check * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add slow test * solve conflicts --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-05-23 09:59:38 +01:00
Pablo Montalvo	a25f7d3c12	Paligemma causal attention mask (#30967 ) * PaliGemma working causal attention * Formatting * Style * Docstrings + remove commented code * Update docstring for PaliGemma Config * PaliGemma - add separator ind to model/labels * Refactor + docstring paligemma processor method * Style * return token type ids when tokenizing labels * use token type ids when building causal mask * add token type ids to tester * remove separator from config * fix style * don't ignore separator * add processor documentation * simplify tokenization * fix causal mask * style * fix label propagation, revert suffix naming * fix style * fix labels tokenization * [run-slow]paligemma * add eos if suffixes are present * [run-slow]paligemma * [run-slow]paligemma * add misssing tokens to fast version * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix style * [run-slow]paligemma --------- Co-authored-by: Peter Robicheaux <peter@roboflow.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-05-22 19:37:15 +02:00
Sanchit Gandhi	0948c827de	[Whisper] Strip prompt before finding common subsequence (#27836 )	2024-05-22 17:25:47 +01:00
Raushan Turganbay	b1065aa08a	Generation: get special tokens from model config (#30899 ) * fix * let's do this way? * codestyle * update * add tests	2024-05-22 18:15:41 +02:00
amyeroberts	dff54ad2d9	🚨 out_indices always a list (#30941 ) * out_indices always a list * Update src/transformers/utils/backbone_utils.py * Update src/transformers/utils/backbone_utils.py * Move type casting * nit	2024-05-22 15:23:04 +01:00
Pablo Montalvo	250ae9f746	Paligemma - fix slow tests, add bf16 and f16 slow tests (#30851 ) * fix slow tests, add bf16 and f16 slow tests * few fixes * [run-slow]paligemma * add gate decorator * [run-slow]paligemma * add missing gating * [run-slow]paligemma * [run-slow]paligemma	2024-05-22 16:20:07 +02:00
Jonatan Kłosko	1518508467	Avoid extra chunk in speech recognition (#29539 )	2024-05-22 14:07:51 +01:00
Marc Sun	5c186003b8	Fix low cpu mem usage tests (#30808 ) * Fix tests * fix udop failing test * remove skip * style	2024-05-22 14:09:01 +02:00
Arthur	673440d073	update ruff version (#30932 ) * update ruff version * fix research projects * Empty * Fix errors --------- Co-authored-by: Lysandre <lysandre@huggingface.co>	2024-05-22 06:40:15 +02:00
Matthew Beckers	3b09d3f05f	fix: center_crop occasionally outputs off-by-one dimension matrix (#30934 ) If required padding for a crop larger than input image is odd-numbered, the padding would be rounded down instead of rounded up, causing the output dimension to be one smaller than it should be.	2024-05-21 13:56:52 +01:00
Zach Mueller	daf281f44f	Enforce saving at end of training if saving option chosen (#30160 ) * Enforce saving at end of training * Fix test * Rework test * Fixup tests' * Update comment based on sourab feedback * Clean	2024-05-21 07:50:11 -04:00
Mohit Sharma	7a4792e6b3	CI: AMD MI300 tests fix (#30797 ) * add fix * update import * updated dicts and comments * remove prints * Update testing_utils.py	2024-05-21 12:46:07 +01:00
Younes Belkada	8871b26150	FEAT / Trainer: LOMO optimizer support (#30178 ) * add V1 - adalomo not working yet * add todo docs + refactor from comments * adjust LR * add docs * add more elaborated test * Apply suggestions from code review Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix * push * add accelerate check * fix DDP case * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix * init kwargs * safely add attribute * revert to enum logic * Update src/transformers/trainer.py --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-21 10:16:37 +02:00
Younes Belkada	c876d12127	FIX / TST: Fix expected results on Mistral slow test (A10) (#30909 ) Update test_modeling_mistral.py	2024-05-21 09:14:14 +02:00
Longjie Zheng	616bb11d48	Add torch.compile for Mistral (#30642 ) * first version * fix sliding window * fix style * add sliding window cache * fix style * address comments * fix test * fix style * move sliding window check inside cache init * revert changes on irrelevant files & add comment on SlidingWindowCache * address comments & fix style fix style * update causal mask * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] llama * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * revert CI from a10 to t4 * wrap up	2024-05-20 16:27:24 +02:00
Zach Mueller	92d1d97c05	Introduce configured_state arg for accelerator_config (#29781 ) * Introduce configured_state * Include note on tuning * Allow for users to have defined a state already * Include tests * Add note on hpam tune * Guard a bit better * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Finish rebase * Finish rebase * Guard carefully * Fixup test * Refactor * Fin refactor * Comment * Update wrt feedback --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-20 09:21:40 -04:00
Yoach Lacombe	e6708709cb	Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM (#28706 ) * Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM * update with a type filter * add raises error test * fix added test	2024-05-20 13:40:42 +02:00
Hafedh	c11ac7857b	fix for custom pipeline configuration (#29004 ) * fix for custom pipeline configuration * fix for custom pipelines * remove extra exception * added test for custom pipelines extra tag * format with ruff * limit extra tag for first time only * format with ruff * improve tests for custom pipelines	2024-05-20 11:38:32 +02:00
Kamil Akesbi	1c2bb3ac54	add return_token_timestamps to WhisperProcessor (#30812 ) * compute num_frames in WhisperFeatureExtractor * add return_num_frames in WhisperFeatureProcessor + adapt pipeline * return_timestamps renaming + pipeline fix * fix * fix * fix * add tests * Update src/transformers/models/whisper/feature_extraction_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * apply review changes * fix * Update src/transformers/models/whisper/feature_extraction_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/models/whisper/test_modeling_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * apply review * fix * review changes * Update src/transformers/models/whisper/feature_extraction_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make style quality * EXPECTED_OUTPUT in single line * small numpy->torch fix * fix --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-20 09:53:58 +01:00
Raushan Turganbay	5d0bf59b4d	LLaVa-Next: Update docs with batched inference (#30857 ) * update docs with batch ex * Update docs/source/en/model_doc/llava_next.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * accept nested list of img --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2024-05-20 13:45:56 +05:00
Benjamin Warner	cd6bd0af34	Add support for torch.compile dynamic shapes (#30560 ) * add torch.compile dynamic support * Add SDPA dynamic shapes compile test & improve SDPA comment * comment consistency	2024-05-20 10:36:57 +02:00
Joseph Enguehard	07bf2dff78	Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878 ) * Add MistralForTokenClassification * Add tests and docs * Add token classification for Mixtral and Qwen2 * Save llma for token classification draft * Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2 * Formatting * Add token classification support for Qwen2Moe model * Add dropout layer to each ForTokenClassification model * Add copied from in tests * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Propagate suggested changes * Style --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-05-20 10:06:57 +02:00
Abhiroop Tejomay	481a957814	Enable dynamic resolution input for Swin Transformer and variants (#30656 ) * add interpolation of positional encoding support to swin * add style changes * use default image processor and make size a dictionary Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove logits testing Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Refactor image size validation logic when interpolation is disabled Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove asserts in modeling Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add dynamic resolution input support to swinv2 * change size to ensure interpolation encoding path is triggered * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False * add dynamic resolution input to donut swin * add dynamic resolution input to maskformer swin --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-17 18:38:46 +01:00
Pavel Iakubovskii	bf646fbf2d	Add fixed resize and pad strategy for object detection (#30742 ) * Add resize and pad strategy * Merge get_size functions * Add pad_size + tests to object detection models * Fixup * Update docstrings * Fixup	2024-05-17 16:21:26 +01:00
Arthur	0a9300f474	Support arbitrary processor (#30875 ) * Support arbitrary processor * fix * nit * update * nit * nit * fix and revert * add a small test * better check * fixup * bug so let's just use class for now * oups * .	2024-05-17 16:51:31 +02:00
Younes Belkada	3d7d3a87a0	TEST: Add llama logits tests (#30835 ) * add llama logits test * fix * fix tests " " * fix for a10 * format * format * fix * [run-slow] remove fmt: skip * Your commit message * test commit * Revert "test commit" This reverts commit `b66e01e55f`. * [run-slow]llama * Update tests/models/llama/test_modeling_llama.py * [run-slow]llama * empty commit	2024-05-17 12:23:00 +02:00
Yih-Dar	1b3dba9417	Make `Gemma` work with `torch.compile` (#30775 ) * fix * [run-slow] gemma * add test * add `test_compile_static_cache` * fix * style * remove subprocess * use attribute * fix * style * update * [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-16 13:41:33 +02:00

... 12 13 14 15 16 ...

5042 Commits