transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-08-02 19:21:31 +06:00

Author	SHA1	Message	Date
Mahdi Baghbanzadeh	c61fcde910	Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities (#35251 ) * DataCollatorForLanguageModeling class was updated with new parameters that provide more control over token masking and replacing * Addressed review comments, modified the docstring and made a test for the DataCollatorForLanguageModeling	2025-01-14 17:01:10 +00:00
Mohamed Mekkouri	a11041ffad	Fix : add require_read_token for gemma2 gated model (#35687 ) fix gemma2 gated model test	2025-01-14 11:47:05 +01:00
Mohamed Mekkouri	df2a812e95	Fix expected output for ggml test (#35686 ) fix expected output	2025-01-14 11:46:55 +01:00
Mohamed Mekkouri	050636518a	Fix : HQQ config when hqq not available (#35655 ) * fix * make style * adding require_hqq * make style	2025-01-14 11:37:37 +01:00
Arthur	c23a1c1932	Add-helium (#35669 ) * Add the helium model. * Add a missing helium. * And add another missing helium. * Use float for the rmsnorm mul. * Add the Helium tokenizer converter. * Add the pad token as suggested by Arthur. * Update the RMSNorm + some other tweaks. * Fix more rebase issues. * fix copies and style * fixes and add helium.md * add missing tests * update the backlink * oops * style * update init, and expected results * small fixes * match test outputs * style fixup, fix doc builder * add dummies and we should be good to go! * update sdpa and fa2 documentation --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-01-13 18:41:15 +01:00
Fanli Lin	2fa876d2d8	[tests] make cuda-only tests device-agnostic (#35607 ) * initial commit * remove unrelated files * further remove * Update test_trainer.py * fix style	2025-01-13 14:48:39 +01:00
Arthur	e6f9b03464	[`Compile`] Only test compiling model forward pass (#35658 ) * rename test to only compile forward! * style emu	2025-01-13 13:43:29 +01:00
Raushan Turganbay	84a6789145	Enable different torch dtype in sub models (#34873 ) * fix * fix test * add tests * add more tests * fix tests * supposed to be a torch.dtype test * handle BC and make fp32 default	2025-01-13 13:42:08 +01:00
Yih-Dar	1e3c6c1f7d	Skip `MobileNetV1ModelTest::test_batching_equivalence` for now (#35614 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 18:32:36 +01:00
Yih-Dar	04eae987f3	Fix flaky `test_beam_search_low_memory` (#35611 ) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 17:31:03 +01:00
Zach Mueller	b02828e4af	Let `EarlyStoppingCallback` not require `load_best_model_at_end` (#35101 ) * Bookmark * Add warning	2025-01-10 10:25:32 -05:00
Zach Mueller	1211e616a4	Use inherit tempdir makers for tests + fix failing DS tests (#35600 ) * Use existing APIs to make tempdir folders * Fixup deepspeed too * output_dir -> tmp_dir	2025-01-10 10:01:58 -05:00
Yih-Dar	bbc00046b9	Fix flaky `test_custom_4d_attention_mask` (#35606 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 15:40:04 +01:00
Raushan Turganbay	52e1f87c7d	[WIP] Emu3: add model (#33770 ) * model can convert to HF and be loaded back * nit * works in single batch generation but hallucinates * use the image tokens * add image generation * now it works * add tests * update * add modulare but it doesn't work for porting docstring :( * skip some tests * add slow tests * modular removed the import? * guess this works * update * update * fix copies * fix test * fix copies * update * docs * fix tests * last fix tests? * pls * repo consistency * more style * style * remove file * address comments * tiny bits * update after the new modular * fix tests * add one more cond in check attributes * decompose down/up/mid blocks * allow static cache generation in VLMs * nit * fix copies * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix VAE upsampling * Update src/transformers/models/emu3/modular_emu3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * state overwritten stuff explicitly * fix copies * add the flag for flex attn --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> 
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-10 12:23:00 +01:00
Cyril Vallez	ccc0381d36	Fix flex_attention in training mode (#35605 ) * fix flex * add test * style	2025-01-10 11:49:12 +01:00
Raushan Turganbay	e0646f3dce	Chat template: return vectorized output in processors (#34275 ) * update chat template * style * fix tests * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typehints + docs * fix tests * remove unnecessary warnings * forgot code style :( * allow users to pass backend and num frames * Update docs/source/en/chat_templating.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/processing_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typo fix * style * address comments * align with "pipeline" template * update docs * update docs * unpack for all kwargs? * wrong conflict resolution while rebasing * tmp * update docs * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-10 11:05:29 +01:00
eustlb	5f087d1335	Add Moonshine (#34784 ) * config draft * full encoder forward * full decoder forward * fix sdpa and FA2 * fix sdpa and FA2 * moonshine model * moonshine model forward * fix attention with past_key_values * add MoonshineForConditionalGeneration * fix cache handling and causality for cross attention * no causal attention mask for the encoder * model addition (imports etc) * small nit * nits * Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py Co-authored-by: Joshua Lochner <admin@xenova.com> * add rope_theta * nits * model doc * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: Joshua Lochner <admin@xenova.com> * imports * add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES * updates modular * make * make fix-copies * ruff check examples fix * fix check_modular_conversion * nit * nits * nits * copied from -> imports * imports fix * integrate attention refacto * modular edge case * remove encoder * convolutions params in config * run modular_model_converter * make * Update docs/source/en/model_doc/moonshine.md Co-authored-by: Joshua Lochner <admin@xenova.com> * MoonshineModelTest * correct typo * make style * integration tests * make * modular convert * name conversion update (up_proj -> fc1 etc) * update config * update MLP * update attention * update encoder layer * update decoder layer * update convolutions parameters * update encoder * remove INPUTS_DOCSTRING * update decoder * update conditional generation * update pretrained model * imports * modular converted * update doc * fix * typo * update doc * update license * update init * split config in file * two classes for MLP * attention from GLM * from GlmRotaryEmbedding * split MLP * apply arthur's review suggestions * apply arthur's review suggestions * apply arthur's review suggestions * auto feature extractor * convert modular * fix + make * convert modular * make * unsplit config * use correct checkpoint * wrap generate * update tests * typos * make * 
typo * update doc --------- Co-authored-by: Joshua Lochner <admin@xenova.com>	2025-01-10 11:00:54 +01:00
Yih-Dar	6f127d3f81	Skip `torchscript` tests if a cache object is in model's outputs (#35596 ) * fix 1 * fix 1 * comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-10 10:46:03 +01:00
Tom Aarsen	6b73ee8905	ModernBert: reuse GemmaRotaryEmbedding via modular + Integration tests (#35459 ) * Introduce 5 integration tests for the 4 model classes + torch export * ModernBert: reuse GemmaRotaryEmbedding via modular * Revert #35589, keep rope_kwargs; rely on them in modular_modernbert * Revert "Revert #35589, keep rope_kwargs; rely on them in modular_modernbert" This reverts commit `11b44b9ee8`. * Don't set rope_kwargs; override 'self.rope_init_fn' call instead	2025-01-10 10:25:10 +01:00
Cyril Vallez	3a4ae6eace	Refactor/fix Cohere2 (#35594 ) * refactor/fix cohere2 * add kwargs * tests * remove func and import it	2025-01-09 17:54:57 +01:00
Tom Aarsen	32e0db8a69	[`tokenizers`] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593 ) * Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this. * Simplify setting self.add_prefix_space, ensure pre_tok exists * Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized' `862d1a346a/bindings/python/src/pre_tokenizers.rs (L672)` produces the Exception. They're triggered by the roformer tests, as the RoFormerTokenizerFast uses a custom PreTokenizer. * Propagate add_prefix_space in T5TokenizerFast to superclass	2025-01-09 17:46:50 +01:00
Cyril Vallez	46276f9a7f	Fix modular edge case + modular sorting order (#35562 ) * look-ahead negation * re add examples by default * Fix the bug in topological sort * Update create_dependency_mapping.py * start adding test * finalize test * more tests * style * style	2025-01-09 17:17:52 +01:00
Yih-Dar	82dd6c14bb	Fix flaky `SwitchTransformersModelTest::test_training_gradient` (#35587 ) * fix * Update tests/models/switch_transformers/test_modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-09 15:36:22 +01:00
Arthur	eb4579cf43	`tokenizer` train from iterator without pre_tokenizers (#35396 ) * fix if else issues * add a test * fix the test * style	2025-01-09 15:34:43 +01:00
Jack Morris	832c6191ed	Add inputs_embeds param to ModernBertModel (#35373 ) * update modular_modernbert -- add inputs_embeds param to ModernBertModel * Fix implementation issues; extend to other classes; docstring First of all, the inputs_embeds shouldn't fully replace `self.embeddings(input_ids)`, because this call also does layer normalization and dropout. So, now both input_ids and inputs_embeds are passed to the ModernBertEmbeddings, much like how BertEmbeddings is implemented. I also added `inputs_embeds` to the docstring, and propagated the changes to the other model classes. I also introduced an error if input_ids and inputs_embeds are both or neither provided. Lastly, I fixed an issue with device being based solely on input_ids with attention_mask. * Propagate inputs_embeds to ModernBertForMaskedLM correctly Also reintroduce inputs_embeds test --------- Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>	2025-01-09 14:17:26 +01:00
Yih-Dar	1b2f942af7	Fix flaky `test_batching_equivalence` (#35564 ) * yes! * oh no!!! * oh no!!! * style * oh no!!! * oh no!!! * oh no!!! * oh no!!! --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-01-09 14:00:08 +01:00
Cyril Vallez	965a2fb320	More model refactoring! (#35359 ) * cohere * style * phi3 * style * small fix * small fix * phi3 longrope * oups * Update rope (only for phi3 still) * Update test_modeling_rope_utils.py * Update modeling_phi3.py * fix * fix copies * style * Fix copied from bad renaming	2025-01-09 11:09:09 +01:00
nhamanasu	b32938aeee	Fix all output_dir in test_trainer.py to use tmp_dir (#35266 ) * update codecarbon * replace directly-specified-test-dirs with tmp_dir * pass tmp_dir to all get_regression_trainer * test_trainer.py: Use tmp_dir consistently for all output_dir arguments * fix some with...as tmp_dir blocks * reflect the comments to improve test_trainer.py * refresh .gitignore	2025-01-08 19:44:39 +01:00
Joao Gante	76da6ca034	Pipeline: simple API for assisted generation (#34504 ) Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-01-08 17:08:02 +00:00
Arthur	3f483beab9	[`PixtralLarge`] Update Pixtral conversion script to support large format! (#34801 ) * update conversion script * update for bias again * remove pdv * use my dir * Update how we initialize the tokenizer * Convert in bfloat16 * Undo that one again * fix config dump * .to() was broken for BatchMixFeature * quick debug breakpoint * put the breakpoint in the right place * Add a config flag for the multimodal projector bias * Add a config flag for the multimodal projector bias * Conversion script can load chat templates * Indent config for comparison * Stop clobbering the config * Re-enable the config clobber * Get rid of the config manual save - it has no effect! * Handle adapter bias correctly * Default vision transformer activation to silu * Remove legacy processing path * One commit with all the debug breakpoints before I delete them all, in case I need to revert * Update conversion * Remove vLLM debugging instrumentation * Drop xformers * Remove debug enumerates * make fixup * make fixup * Break copied from in pixtral * Propagate multimodal_projector_bias change * Propagate multimodal_projector_bias change * Remove debug device .to() * Restore attention weights output * Fix Pixtral test * Drop image_seq_length * Drop image_seq_length * Put the legacy processing code back * Add the bias option to the llava_next_video config * Add the bias option to the llava_next_video config * Make certain args required in converter * Make certain args required in converter * typo * make fixup * Reverting some dtype changes since it seems to work without them --------- Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-166-244.ec2.internal> Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-01-08 17:39:47 +01:00
NielsRogge	8490d3159c	Add ViTPose (#30530 ) * First draft * Make fixup * Make forward pass worké * Improve code * More improvements * More improvements * Make predictions match * More improvements * Improve image processor * Fix model tests * Add classic decoder * Convert classic decoder * Verify image processor * Fix classic decoder logits * Clean up * Add post_process_pose_estimation * Improve post_process_pose_estimation * Use AutoBackbone * Add support for MoE models * Fix tests, improve num_experts% * Improve variable names * Make fixup * More improvements * Improve post_process_pose_estimation * Compute centers and scales * Improve postprocessing * More improvements * Fix ViTPoseBackbone tests * Add docstrings, fix image processor tests * Update index * Use is_cv2_available * Add model to toctree * Add cv2 to doc tests * Remove script * Improve conversion script * Add coco_to_pascal_voc * Add box_to_center_and_scale to image_transforms * Update tests * Add integration test * Fix merge * Address comments * Replace numpy by pytorch, improve docstrings * Remove get_input_embeddings * Address comments * Move coco_to_pascal_voc * Address comment * Fix style * Address comments * Fix test * Address comment * Remove udp * Remove comment * [WIP] need to check if the numpy function is same as cv * add scipy affine_transform * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * refactor convert * add output_shape * add atol 5e-2 * Use hf_hub_download in conversion script * make box_to_center more applicable * skipt test_get_set_embedding * fix to accept array and fix CI * add co-contributor * make it to tensor type output * add torch * change to torch tensor * add more test * minor change * CI test change * import torch should be above ImageProcessor * make style * try not use torch in def * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: 
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/configuration_vitpose_backbone.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * fix * fix * add caution * make more detail about dataset_index * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * add docs * Update docs/source/en/model_doc/vitpose.md * Update src/transformers/models/vitpose/configuration_vitpose.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Revert "Update src/transformers/__init__.py" This reverts commit `7ffa504450`. 
* change name * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vitpose/test_modeling_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/vitpose.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * move vitpose only function to image_processor * raise valueerror when using timm backbone * use out_indices * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove camel-case of def flip_back * rename vitposeEstimatorOutput * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix confused camelcase of MLP * remove in-place logic * clear scale description * make consistent batch format * docs update * formatting docstring * add batch tests * test docs change * Update src/transformers/models/vitpose/image_processing_vitpose.py * Update src/transformers/models/vitpose/configuration_vitpose.py * chagne ViT to Vit * change to enable MoE * make fix-copies * Update docs/source/en/model_doc/vitpose.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * extract udp * add more described docs * simple fix * change to accept target_size * make style * Update src/transformers/models/vitpose/image_processing_vitpose.py 
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vitpose/configuration_vitpose.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change to `verify_backbone_config_arguments` * Update docs/source/en/model_doc/vitpose.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove unnecessary copy * make config immutable * enable gradient checkpointing * update inappropriate docstring * linting docs * split function for visibility * make style * check isinstances * change to acceptable use_pretrained_backbone * make style * remove copy in docs * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update docs/source/en/model_doc/vitpose.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * simple fix + make style * change input config of activation function to string * Update docs/source/en/model_doc/vitpose.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * tmp docs * delete index.md * make fix-copies * simple fix * change conversion to sam2/mllama style * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/vitpose/image_processing_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * refactor convert * add supervision * Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * remove reduntant def * seperate code block for visualization * add validation for num_moe * final commit * add labels * [run-slow] vitpose, vitpose_backbone * Update src/transformers/models/vitpose/convert_vitpose_to_hf.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * enable all 
conversion * final commit * [run-slow] vitpose, vitpose_backbone * ruff check --fix * [run-slow] vitpose, vitpose_backbone * rename split module * [run-slow] vitpose, vitpose_backbone * fix pos_embed * Simplify init * Revert "fix pos_embed" This reverts commit `2c56a4806e`. * refactor single loop * allow flag to enable custom model * efficiency of MoE to not use unused experts * make style * Fix range -> arange to avoid warning * Revert MOE router, a new one does not work * Fix postprocessing a bit (labels) * Fix type hint * Fix docs snippets * Fix links to checkpoints * Fix checkpoints in tests * Fix test * Add image to docs --------- Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by: sangbumchoi <danielsejong55@gmail.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-01-08 16:02:14 +00:00
Minho Shim	4349a0e401	fix: Qwen2-VL generate with inputs_embeds (#35466 ) * fix: Qwen2-VL generate with inputs_embeds * change: optional input_ids in get_rope_index	2025-01-08 16:36:03 +01:00
Sean (Seok-Won) Yi	88e18b3c63	Update doc for `metric_for_best_model` when `save_strategy="best"`. (#35389 ) * Updated docstring for _determine_best_metric. * Updated docstring for metric_for_best_model. * Added test case for save strategy. * Updated incorrect test case. * Changed eval_strategy to match save_strategy. * Separated test cases for metric. * Allow load_best_model when save_strategy == "best". * Updated docstring for metric_for_best_model.	2025-01-08 16:32:35 +01:00
Pavel Iakubovskii	657bb14f98	Enable auto task for timm models in pipeline (#35531 ) * Enable auto task for timm models * Add pipeline test	2025-01-08 15:14:17 +00:00
Pavel Iakubovskii	59e5b3f01b	Timm wrapper label names (#35553 ) * Add timm wrapper label names mapping * Add index to classification pipeline * Revert adding index for pipelines * Add custom model check for loading timm labels * Add tests for labels * [run-slow] timm_wrapper * Add note regarding label2id mapping	2025-01-08 14:09:46 +00:00
Jacky Lee	3c1895aa65	Fix Qwen2VL processor to handle odd number of frames (#35431 ) * fix: processing odd number of frames * feat: add test case * update: test one frame * feat: support custom patch size * fix: test with videos * revert: change on patch repeat * fix: much wow * update: fixups * fixup pls * ruff fixup * fix typo at least	2025-01-08 13:49:00 +01:00
Quentin Lhoest	3fde88b19d	support chat generator as input of TextGenerationPipeline (#35551 ) * support chat generator as input of TextGenerationPipeline * missing import * fix tests * again * simpler * add test	2025-01-08 13:27:07 +01:00
Raushan Turganbay	d1681ec2b6	VLMs: major clean up 🧼 (#34502 ) only llava models are modified	2025-01-08 10:35:23 +01:00
Jade Choghari	7176e06b52	Add TextNet (#34979 ) * WIP * Add config and modeling for Fast model * Refactor modeling and add tests * More changes * WIP * Add tests * Add conversion script * Add conversion scripts, integration tests, image processor * Fix style and copies * Add fast model to init * Add fast model in docs and other places * Fix import of cv2 * Rename image processing method * Fix build * Fix Build * fix style and fix copies * Fix build * Fix build * Fix Build * Clean up docstrings * Fix Build * Fix Build * Fix Build * Fix build * Add test for image_processing_fast and add documentation tests * some refactorings * Fix failing tests * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Introduce TextNet * Fix failures * Refactor textnet model * Fix failures * Add cv2 to setup * Fix failures * Fix failures * Add CV2 dependency * Fix bugs * Fix build issue * Fix failures * Remove textnet from modeling fast * Fix build and other things * Fix build * some cleanups * some cleanups * Some more cleanups * Fix build * Incorporate PR feedbacks * More cleanup * More cleanup * More cleanup * Fix build * Remove all the references of fast model * More cleanup * Fix build * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix Build * Fix build * Fix build * Fix build * Fix build * Fix build * Incorporate PR feedbacks * Fix style * Fix build * Incorporate PR feedbacks * Fix image processing mean and std * Incorporate PR feedbacks * fix build failure * Add assertion to image processor * Incorporate PR feedbacks * Incorporate PR feedbacks * fix style failures * fix build * Fix Imageclassification's linear layer, also introduce TextNetImageProcessor * Fix build * Fix build * 
Fix build * Fix build * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix build * Incorporate PR feedbacks * Remove some script * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix image processing in textnet * Incorporate PR Feedbacks * Fix CI failures * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Fix failing test * Add textnet to readme * Improve readability * Incorporate PR feedbacks * fix code style * fix key error and convert working * tvlt shouldn't be here * fix test modeling test * Fix tests, make fixup * Make fixup * Make fixup * Remove TEXTNET_PRETRAINED_MODEL_ARCHIVE_LIST * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update tests/models/textnet/test_image_processing_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * space typo Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * improve type annotation Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/configuration_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * make conv layer kernel sizes and strides default to None * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix keyword bug * add batch init and make fixup * Make fixup * Update integration test * Add figure * Update textnet.md * add testing and fix errors (classification, imgprocess) * fix error check * make fixup * make fixup * revert to original docstring * add make style * remove conflict for now * Update modeling_auto.py got a confusion in `timm_wrapper` - was giving some conflicts * Update tests/models/textnet/test_modeling_textnet.py Co-authored-by: Pavel 
Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update tests/models/textnet/test_modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/textnet/modeling_textnet.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * add changes * Update textnet.md * add doc * add authors hf ckpt + rename * add feedback: classifier/docs --------- Co-authored-by: raghavanone <opensourcemaniacfreak@gmail.com> Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co> Co-authored-by: Niels <niels.rogge1@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-01-08 09:52:51 +01:00
Matt	a7d1441d65	Correctly list the chat template file in the Tokenizer saved files list (#34974 ) * Correctly list the chat template file in the saved files list * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add save file checking to test * make fixup * better filename handling * make fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-07 19:11:02 +00:00
eustlb	7f7677307c	[Qwen2Audio] handle input ids expansion during processing (#35534 ) * add audio_token attribute to proc * expand input_ids * and legacy and expanded input_ids * test update * split lines * add possibility not to provide eos and bos audio tokens * raise errors * test incorrect number of audio tokens * add example * fmt * typo	2025-01-07 16:47:27 +01:00
Francesco Cariaggi	f408d55448	Fix bug when requesting input normalization with EnCodec (#34756 ) * EnCodec: unsqueeze padding mask * add test for normalization	2025-01-07 11:50:02 +01:00
松本和真	96bf3d6cc5	Add diffllama (#34083 ) * first adding diffllama * add Diff Attention and other but still with errors * complate make attention Diff-Attention * fix some bugs which may be caused by transformer-cli while adding model * fix a bug caused by forgetting KV cache... * Update src/transformers/models/diffllama/modeling_diffllama.py You don't need to divide by 2 if we use same number of attention heads as llama. instead you can just split in forward. Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py new codes are more meaningful than before Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py new codes are more meaningful than before Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fix 2times divide by sqrt(self.head_dim) Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fix 2times divide by sqrt(self.head_dim) Co-authored-by: Minho Ryu <ryumin93@gmail.com> * Update src/transformers/models/diffllama/modeling_diffllama.py fit to changeing "num_heads // 2" place. and more visible Co-authored-by: Minho Ryu <ryumin93@gmail.com> * I found Attention missed implemented from paper still on `e072544a3b`. 
* re-implemented * adding groupnorm Co-authored-by: Minho Ryu <ryumin93@gmail.com> * align with transformers code style Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix typo Co-authored-by: Minho Ryu <ryumin93@gmail.com> * adding groupnorm Co-authored-by: Minho Ryu <ryumin93@gmail.com> * change SdpaAttention to DiffSdpaAttention Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix bug * Update src/transformers/models/diffllama/modeling_diffllama.py resolve "not same outputs" problem Co-authored-by: Minho Ryu <ryumin93@gmail.com> * fix bugs of places of "GroupNorm with scale" and etc * Revert "fix bugs of places of "GroupNorm with scale" and etc" This reverts commit `26307d92f6`. * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * simplify multiple of attention (matmul) operations into one by repeating value_states Co-authored-by: Minho Ryu <ryumin93@gmail.com> * remove missed type * add diffllama model_doc * apply make style/quality * apply review comment about model * apply review comment about test * place diffllama alphabetically on the src/transformers/__init__.py * fix forgot code * Supports parameters that are not initialized with standard deviation 0 in the conventional method * add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK on utils/check_config_docstrings.py * remove unused property of config * add to supported model list * add to spda supported model list * fix copyright, remove pretraining_tensor_parallel, and modify for initialization test * remove unused import and etc. 
* empty commit * empty commit * empty commit * apply modular transformers but with bugs * revert prev commit * create src/transformers/model/diffllama/modular_diffllama.py * run utils/modular_model_converter.py * empty commit * leaner modular diffllama * remove more and more in modular_diffllama.pt * remove more and more in modular_diffllama.pt * resolve missing docstring entries * force reset * convert modular --------- Co-authored-by: Minho Ryu <ryumin93@gmail.com>	2025-01-07 11:34:56 +01:00
Dmitry Rogozhkin	9fd123ac31	ci: mark model_parallel tests as cuda specific (#35269 ) `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"` and therefore is not accepting new features. At the same time `parallelize()` implementation is currently CUDA-specific. This commit marks respective ci tests with `@require_torch_gpu`. Fixes: #35252 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-01-07 10:16:34 +01:00
pglorio	bd442c6d3a	Zamba new attention standard (#35375 ) * updated zamba to new attention standard * make fixup fixes	2025-01-07 10:08:45 +01:00
Sarthak Karandikar	ca00950057	added logic for deleting adapters once loaded (#34650 ) * added logic for deleting adapters once loaded * updated to the latest version of transformers, merged utility function into the source * updated with missing check * added peft version check * Apply suggestions from code review Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * changes according to reviewer * added test for deleting adapter(s) * styling changes * styling changes in test * removed redundant code * formatted my contributions with ruff * optimized error handling * ruff formatted with correct config * resolved formatting issues --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-01-06 18:36:40 +00:00
Yijun Lee	e5fd865eba	Add Gemma2 GGUF support (#34002 ) * initial setup for ggml.py * initial setup of GGUFGemma2Converter class * Add gemma2 model to gguf.md doc * Partial work on GGUF_TENSOR_MAPPING * initial setup of GGUF_TENSOR_MAPPING for Gemma2 * refactor: rename GemmaConvert class to GemmaConverter for naming consistency * feat: complete gemma2 tensor mapping implementation * feat: add initial implementation of GGUFGemmaConverter * feat: complete GGUFGemmaConverter implementation * feat: add test code for gemma2 * refactor: minor code cleanup * refactor: minor code cleanup * fix: resolve suggestions * Update tests/quantization/ggml/test_ggml.py Co-authored-by: Isotr0py <2037008807@qq.com> --------- Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-03 14:50:07 +01:00
Jacky Lee	30a9971632	Use `sdpa_kernel` in tests (#35472 ) * update: use sdpa_kernel * update: rerun test	2025-01-03 14:39:52 +01:00
Blanchon	cba49cb2a6	Change `is_soundfile_availble` to `is_soundfile_available` (#35030 )	2025-01-03 14:37:42 +01:00
Matthew Douglas	6b1e86fd4d	Fix new BNB test failures (#35345 )	2025-01-02 11:24:52 +01:00

4396 Commits