transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-03 12:50:06 +06:00

Author	SHA1	Message	Date
Xiaojian Ma	e1f379bb09	Fixing the example in generation strategy doc (#37598 ) Update generation_strategies.md The prompt text shown in the example does not match what is inside the generated output. As the generated output always include the prompt, the correct prompt should be "Hugging Face is an open-source company".	2025-04-18 12:50:17 -07:00
Pavel Iakubovskii	4f58fc9c82	Deprecate modeling_utils.py classes (#37298 ) * Move utils classes into models * Add deprecation warnings * Remove from docs * Update config attributes check	2025-04-18 18:47:34 +01:00
Yoni Gozlan	a245011252	Add InternVL (2.5 MPO) (#35968 ) * initial commit * add convert internvl * add first end-to-end working internvl * nit prompt and image proc * add working chat template * add conversion llama-based models * add tests * pass all tests * fix isort * fix modular after main merge * add video processing for internvl * add support for interlaced images and videos * Remove processing and config from modular, add more tests * add llama model tests * Modify processor for compatibility with refactored got ocr image processor * add comments in processor * Add docs and nits * change video processing to use custom sample_indices_fn * rebase and fix tests * add processor tests * Add changes Raushan review * Use the new attention interface for the vision model * nits * add support for custom video_load_backend * remove mention to InternVLTokenizer * refactor vision model to simplify logic * refactor processor for better readibility * fix copies * fix require av processor test * refactor internVL vision * Update processor and fix processing tests * fix docstring * update convert_weights for internvl3 * change image processor to fast by default * remove do_center_crop=True in convert_weights * force use_cache to True * push_to_hub before reloading * fix internVLVision for larger models * update convert weight for qk norm * fix convert_weights * fix eos_token_id in convert * update docs and integration tests * make modifs after review * fix wrong k_norm and reduce modular * change image_token_index to image_token_id * change checkpoint to OpenGVLab org * last nits * explicitely del self.num_key_value_groups * add extra special tokens	2025-04-18 18:57:33 +02:00
Pablo Montalvo	4afd3f4820	Model debugger upgrades (#37391 ) * debugging improvements * add debugging details * add more debugging details * debug more * clean up layers + output * add summary json file * cleanup * copies 👀 * remove hooks + add documentation * draft a small test, why not * respect the format (respect it) * fixup imports * nit * add tests and configurable pruning of layers	2025-04-18 16:45:54 +02:00
Cyril Vallez	4acf692ace	Update Phi4 converter (#37594 ) * fix converter * Update phi4_multimodal.md	2025-04-17 23:08:24 +02:00
Anthony Song	346f1eebbd	docs: fix typo (#37567 ) Co-authored-by: Anthony <anthony.song@capitalone.com>	2025-04-17 14:54:44 +01:00
Raushan Turganbay	3bc44eaaee	[qwen-vl] Standardize config (#37268 ) * update * fix tests * fixup * update * skip this one * fixup * fix	2025-04-17 09:38:12 +02:00
Yaswanth Gali	a2ef3cf537	Add Janus model (#36053 ) * Iterative generation using input embeds * Add Janus model * discard changes * Janus imports * Refactor config and processor * Added Vision tower of Janus * Import Janus Image processor * Vision tower fixes * Refactor code * Added VQ Model * Complete model integration * temp conversion script * processor refactor * Adding files to facilitate pulling * Fixes after debugging * Skip test for these models * Add Janus Model * discard changes * Janus imports * Refactor config and processor * Added Vision tower of Janus * Import Janus Image processor * Vision tower fixes * Refactor code * Added VQ Model * Complete model integration * temp conversion script * processor refactor * Adding files to facilitate pulling * Fixes after debugging * Refactor to Text config * ✨ Added generate function * Saving intermediate convert file. Still need to read configs from the hub and convert them to our format. * Adding version that reads from the JSON files. Still have to tweak some parameters manually. * relative imports * Initial tests * Refactor image processor * Seemingly working version of the conversion script, will need to test further. * Adding command message * Fixing conflicting JanusTextConfig class * Incorporating some of the discussed changes. * Small fix to create dir. * Removing system from JINJA template * Adding draft processor tests * style fixes * Minor fixes and enhancement * added generation config * Initial tests * Small modifications, tests are now passing. * Small changes I noticed while reading code. * more fixes * Added JanusModel class * Small merge adaptations * Small merge adaptations * Image processing tests passing * More tests and fixes * Convert script updated and refactored * Tests and cleanup * make style * Postprocessing for image generation * generate refactor * fixes * - Passing tests that write a part of the model to cpu (e.g. test_cpu_offload) - Passing tests of dispatching SDPA - Only gradient checkpointing tests are left. * Removing temporary code * Changes * Writing change to modular * Added JanusVisionModel. SDPA dispatch tests pass more robustly. Gradient checkpoint tests are next * Gradient checkpoint tests passing * Removing debug code * Major generate refactor 😮‍💨 * Temp changes for testing * Green quality CI * 2 out of 4 integration tests passing * breadcrumbs * Usage Examples * Regenerate modeling after merge * dirty code * JanusIntegrationTest are passing * breadcrumbs * happy CI * fixes * Changing template * nits * Text generation logits matching original codebase at 100% precision * Remove ./tmp from git tracking * Remove ./tmp from git tracking * Checkpointing changes after reviewing * Fixing code in docstrings * CHanging comments and small bug in convert file * Fixing bug in image_token_id for 7B version * Removing line that was added by both of us * Pushing changes after discussion. Only one left is to change the key mapping for convert file. * Updating module file * New convert file using dict. Tested that it is equivalent to the old one by: - comparing keys in a script - comparing checksums of the output files between version generated with the current convert script and those generated with the old script. This is a more reliable test. * revert changes * mistake * consistency change for CI * make style * doc fixes * more fixes * experimenting with masking out pad token * checkpoint * Batched generation with multi-images working for 1B models. Will test 7B next. * Device fix. * Writing changes to modular, previous ones were written to modeling just for quick testing. * Using passed processor attention mask (only in modeling for now) * Matching performance done in the non-standard way * Working version of batched generation. Will change how some args are passed to make it more similar to language case * More compliant version of the code * Removed duplicated `_prepare_4d_causal_attention_mask_with_cache_position` * Updating modular file, making masked filling with paddings more efficient * Slightly more efficient version * Modifying JanusVisionModel to be a wrapper * Fixing test to comply with new names * Modular overhaul * More refactoring * - Changing JanusVisionModel back - Changing forward pass - Adding boi token to the comparison * - Removing whole context model_ids - Using inherited implementation of prepare_inputs_for_generation * Moving the way boi token is passed to the model * Fixing sdpa test * Minor changes * testing changes * Minor fix * - Adding postprocessing test - checking values of generated image on integration test * changes * Removing pooled attention vision module, fixing convert script as a consequence * More changes * Fixes * Draft after merge * Bug fixes * More bug fix * Fixing docs * Nits * Refactor return dict * Moving image post processing test to main processor post process * Passing guidance_scale as kwarg * make style * 🔥 refactor * make style * Update and green CI * Nits and tests update * up * Added MID block * fix * Dead code * update testcase * update * model_id change * init_weight changes --------- Co-authored-by: hsilva664 <metallic-silver@hotmail.com>	2025-04-17 09:18:51 +02:00
Vinh H. Pham	0a83588c51	Bridgetower fast image processor (#37373 ) * add support for fast tokenizer * make style * fix according to reviews * make style * relax slow_fast_equivalence mean diff --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-04-16 22:39:18 +02:00
Zeeshan Khan Suri	a7d2bbaaa8	Add EfficientNet Image PreProcessor (#37055 ) * added efficientnet image preprocessor but tests fail * ruff checks pass * ruff formatted * properly pass rescale_offset through the functions * - corrected indentation, ordering of methods - reshape test passes when casted to float64 - equivalence test doesn't pass * all tests now pass - changes order of rescale, normalize acc to slow - rescale_offset defaults to False acc to slow - resample was causing difference in fast and slow. Changing test to bilinear resolves this difference * ruff reformat * F.InterpolationMode.NEAREST_EXACT gives TypeError: Object of type InterpolationMode is not JSON serializable * fixes offset not being applied when do_rescale and do_normalization are both true * - using nearest_exact sampling - added tests for rescale + normalize * resolving reviews --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-16 21:59:24 +02:00
DerekLiu35	9ddcf5fce5	Update quantization docs (#37439 )	2025-04-16 15:44:53 +02:00
Jinan Zhou	a91020aed0	Add TimesFM Time Series Forecasting Model (#34082 ) * initial documentation * rename mask to attention_mask * smaller tests * fixup * fix copies * move to time series section * sort docs * isort fix * batch_size is not a configuration * rename to TimesFMModelForPrediction * initial script * add check_outputs * remove dropout_rate * works with torch.Tensor inputs * rename script * fix docstrings * fix freq when window_size is given * add loss * fix _quantile_loss * formatting * fix isort * add weight init * add support for sdpa and flash_attention_2 * fixes for flash_attention * formatting * remove flash_attention * fix tests * fix file name * fix quantile loss * added initial TimesFMModelIntegrationTests * fix formatting * fix import order * fix _quantile_loss * add doc for SDPA * use timesfm 2.0 * bug fix in timesfm decode function. * compare mean forecasts * refactor type hints, use CamelCase * consolidate decode func * more readable code for weight conversion * fix-copies * simpler init * renaem TimesFmMLP * use T5LayerNorm * fix tests * use initializer_range * TimesFmModel instead of TimesFmDecoder * TimesFmPositionalEmbedding takes config for its init * 2.0-500m-pytorch default configs * use TimesFmModel * fix formatting * ignore TimesFmModel for testing * fix docstring * override generate as its not needed * add doc strings * fix logging * add docstrings to output data classes * initial copy from t5 * added config and attention layers * add TimesFMPositionalEmbedding * calcuate scale_factor once * add more configs and TimesFMResidualBlock * fix input_dims * standardize code format with black * remove unneeded modules * TimesFM Model * order of imports * copy from Google official implementation * remove covariate forecasting * Adapting TimesFM to HF format * restructing in progress * adapted to HF convention * timesfm test * the model runs * fixing unit tests * fixing unit tests in progress * add post_init * do not change TimesFMOutput * fixing unit tests * all unit tests passed * remove timesfm_layers * add intermediate_size and initialize with config * initial documentation * rename mask to attention_mask * smaller tests * fixup * fix copies * move to time series section * sort docs * isort fix * batch_size is not a configuration * rename to TimesFMModelForPrediction * initial script * add check_outputs * remove dropout_rate * works with torch.Tensor inputs * rename script * fix docstrings * fix freq when window_size is given * add loss * fix _quantile_loss * formatting * fix isort * add weight init * add support for sdpa and flash_attention_2 * fixes for flash_attention * formatting * remove flash_attention * fix tests * fix file name * fix quantile loss * added initial TimesFMModelIntegrationTests * fix formatting * fix import order * fix _quantile_loss * add doc for SDPA * use timesfm 2.0 * bug fix in timesfm decode function. * compare mean forecasts * refactor type hints, use CamelCase * consolidate decode func * more readable code for weight conversion * fix-copies * simpler init * renaem TimesFmMLP * use T5LayerNorm * fix tests * use initializer_range * TimesFmModel instead of TimesFmDecoder * TimesFmPositionalEmbedding takes config for its init * 2.0-500m-pytorch default configs * use TimesFmModel * fix formatting * ignore TimesFmModel for testing * fix docstring * override generate as its not needed * add doc strings * fix logging * add docstrings to output data classes * add _CHECKPOINT_FOR_DOC * fix comments * Revert "fix comments" This reverts commit `8deeb3e191`. * add _prepare_4d_attention_mask * we do not have generative model classes * use Cache * return past_key_values * modules initialized with config only * update year * Update docs/source/en/model_doc/timesfm.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add layer_idx to cache * modular timesfm * fix test * unwrap sequential class * fix toctree * remove TimesFmOnnxConfig * fix modular * remove TimesFmStackedDecoder * split qkv layer into individual layers * rename projection layers * use ALL_ATTENTION_FUNCTIONS * is_causal is True * rename config * does not support flash_attn_2 * formatting * fix typo in docsstring * rename inputs * add time series mapping * Update src/transformers/models/olmo2/modeling_olmo2.py * Update src/transformers/models/moonshine/modeling_moonshine.py * use updated arguments * fix class name * add MODEL_FOR_TIME_SERIES_PREDICTION_MAPPING * isort * consolidate _preprocess into forward * fix a typo * fix a typo * fix toc * fix modular * remove aaserts * use self.config._attn_implementation * move to _postprocess_output * remove timesfm_get_large_negative_number * use view unstead of multiple unsqueeze * make helpers static methods of the Model * use to_tuple * use to_tuple if not return_dict * remove unused intitialization block as its incorporated in nn.Linear * remove unused num_key_value_groups * use the same convention as the masking method * update modular * do not use unsqueeze * use view instead of unsqueeze * use buffer for inv_timescales * formatting * modular conversion * remove unneeded intialization * add missing docstrings * remove cache * use simple_eager_attention_forward * support tp_plan * support for flex and flash attention masks * Revert "support for flex and flash attention masks" This reverts commit `def36c4fcf`. * fix device * fix tests on gpu * remove unsued large model test * removed unneeded comments * add example usage * fix style * add import * Update docs/source/en/model_doc/timesfm.md Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * inherit from LlamaRMSNorm * use can_return_tuple decorator * remvoe return_dict * fix year * Update docs/source/en/model_doc/timesfm.md Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * pretrained does not inherit from GenerationMixin * use model for integration test --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: Rajat Sen <rsen91@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-04-16 15:00:53 +02:00
Mohamed Mekkouri	8669c016d2	Refactor torchao docs (#37490 ) * refactor docs * add serialization * Update docs/source/en/quantization/torchao.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * reorder * add link * change automatic to autoquant Co-authored-by: DerekLiu35 <91234588+DerekLiu35@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/torchao.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * nits * refactor * add colab * update --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: DerekLiu35 <91234588+DerekLiu35@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-16 14:56:48 +02:00
Parteek	6fd87d1172	Add Fast Grounding-Dino Processor (#37108 ) * Add Fast Grounding-Dino Processor * Added modular file --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-16 12:26:08 +02:00
Carceller--Meunier Pierre	3165eb7c28	Refactor ColPali model documentation (#37309 ) * Refactor ColPali model documentation * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Include quantisation exemple + real images * simpler image loading --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-15 13:52:11 -07:00
汪志鹏	33c6fdb2cf	Update VITS model card (#37335 ) * Update VITS model card * Update docs/source/en/model_doc/vits.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vits.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vits.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vits.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update vits.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-15 13:16:05 -07:00
Parteek	51f544a4d4	Add Fast Conditional-DETR Processor (#37071 ) * Add Fast Conditional-DETR Processor * Update image_processing_conditional_detr_fast.py * Add modular_conditional_detr.py * Update image_processing_conditional_detr_fast.py * Update tests * make fix --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-15 18:33:34 +02:00
Parteek	4f1dbe8152	Add Fast Chinese-CLIP Processor (#37012 ) * Add Fast Chinese-CLIP Processor * Update dummy_torchvision_objects.py * Fix tests	2025-04-15 18:31:20 +02:00
Merve Noyan	c08997c52e	VDR task guide (#37485 ) * VDR task guide * Add to toctree * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/tasks/visual_document_retrieval.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-15 08:55:13 -07:00
Yao Matrix	57da364d8e	fix and enhance pipeline_webserver.md (#36992 ) * fix and enhance pipeline_webserver.md Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update docs/source/en/pipeline_webserver.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/pipeline_webserver.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * use pipe Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-15 08:35:05 -07:00
Parteek	f6c79f767c	Add Fast Yolos Processor (#37292 ) * Add Fast Yolos Processor * Update modular file * Fix copies --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-15 14:23:08 +02:00
Huajie Tan	6f7ea1cf00	Add MLCD model (#36182 ) * Add MLCD model * Update codes for auto-mapping * Add test scripts for MLCD * Update doc for MLCD model * Fix import error * Fix import error * Fix CI error for attention_outputs * Fix code style for CI * Fix code style for CI * Fix code style for CI * Fix code style for CI * Fix code style for CI * Fix CI error for initialization * Fix code style for CI * Fix code style for CI * Reformat codes and docs for CI test * Reformat codes and docs for CI test * Remove unused attributes for CI test * Fix style for CI test * List MLCD in flash_attn doc * Fix: typos, modulars, refactors from suggestions * Refactoring convert_mlcd_weights_to_hf.py from suggestions * Fix: docs conflicts * Fix error for CI test * Fix style for CI test * Add integration test for MLCD * Refactoring by class inheritance * Fix: refactor attention interface, adjust codes * Fix: merging conflicts * Fix: merging conflicts * Fix: style for CI test * Fix: style for CI test * Fix: set test_resize_embeddings to be False * Fix: initializer for CI test * Fix: conflicts, CI test, warning and refactoring * Fix: merging conflicts * Refactor * Update docs * Fix mistakes * Remove unused args and fix multi-gpu error * Revert position_embeddings * Solve conflicts * Solve conflicts * Remove dummy * Update _init_weights * Update _init_weights * Update _init_weights for CI test	2025-04-15 11:33:09 +01:00
Parteek	20ceaca228	Add Fast owlvit Processor (#37164 ) * Add Fast Owlvit Processor * Update image_processing_owlvit_fast.py * Update image_processing_owlvit_fast.py --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 17:58:09 +02:00
Parteek	a53a63c9c2	Add Fast Mobilenet-V2 Processor (#37113 ) Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 17:08:47 +02:00
Yann Chéné	4774a39d05	Add ImageProcessorFast to BiT processor (#37180 ) * Add ImageProcessorFast to BiT processor * propose a fast processor and add tests * all tests pass except one * run make * remove useless print * use same test as clip * apply make * Update src/transformers/models/bit/image_processing_bit_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Update setup.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Update src/transformers/models/bit/image_processing_bit_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * apply review comment --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 17:07:48 +02:00
Parteek	e43f168eb3	Add Fast LeViT Processor (#37154 ) * Add Fast LeViT Processor * Update levit.md * Update src/transformers/models/levit/image_processing_levit_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * ruff check --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 17:07:36 +02:00
Vinh H. Pham	7cc9e61a3a	Add Fast Image Processor for Donut (#37081 ) * add donut fast image processor support * run make style * Update src/transformers/models/donut/image_processing_donut_fast.py Co-authored-by: Parteek <parteekkamboj112@gmail.com> * update test, remove none default values * add do_align_axis = True test, fix bug in slow image processor * run make style * remove np usage * make style * Apply suggestions from code review * Update src/transformers/models/donut/image_processing_donut_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * add size revert in preprocess * make style * fix copies * add test for preprocess with kwargs * make style * handle None input_data_format in align_long_axis --------- Co-authored-by: Parteek <parteekkamboj112@gmail.com> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 16:24:01 +02:00
Vinh H. Pham	1897a02d83	Add Fast Image Processor for LayoutLMv3 (#37201 ) * support fast image processor layoutlmv3 * make style * add warning and update test * make style * Update src/transformers/models/layoutlmv3/image_processing_layoutlmv3_fast.py * Update image_processing_auto.py --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 15:42:11 +02:00
Cypher Pepe	7bff4bdcf6	Fixed broken links (#37466 ) * Update broken link * Update broken link	2025-04-14 14:16:07 +01:00
Vinh H. Pham	e16775d103	Add Fast Image Processor for LayoutLMv2 (#37203 ) * add support layoutlmv2 * make style * Apply suggestions from code review Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * add warning and clean up * make style * Update src/transformers/models/layoutlmv2/image_processing_layoutlmv2_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 15:06:41 +02:00
Vinh H. Pham	49b9a69a36	Add Fast Image Processor for Flava (#37135 ) * support flava fast image processor * run style and quality * update test * update according to reviews * make style * update comment on BICUBIC * make style --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 15:05:31 +02:00
Vinh H. Pham	e7f5724efd	Add Fast Image Processor for Perceiver (#37176 ) * add test and fast image processor * make style * Update src/transformers/models/perceiver/image_processing_perceiver_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * make style --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-14 13:49:13 +02:00
BakerBunker	4b8c6d4cf8	Add Qwen2.5-Omni (#36752 ) * Add qwen2.5-omni * Remove einops dependency * Add torchdiffeq dependency * Sort init * Add torchdiffeq to extras['diffeq'] * Fix repo consistency * use cached_file * del odeint * renew pytest * format * Remove torchdiffeq * format * fixed batch infer bug * Change positional_embedding to parameter * Change default speaker * Config revision * Use modular & code clean * code clean * decouple padding with model & code cleaning * sort init * fix * fix * Second code review * fix * fix * rename vars to full name + some comments * update pytest * Code clean & fix * fix * style * more clean up * fixup * smaller vision model in tests * fix processor test * deflake a bit the tests (still flaky though) * de-flake tests finally + add generation mixin * final nits i hope * make sure processor tests are complete * replace with Qwen2_5OmniForConditionalGeneration * fix tests after updating ckpt * fix typos when cleaning, also we can't change ckpt * fixup * images and videos kwargs for processor * thinker and talker loadable from hub ckpt * address comments and update tests after rebase * fixup * skip for now * fixup * fixup * remove torch dependency in processors --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.con> Co-authored-by: feizi.wx <feizi.wx@alibaba-inc.com> Co-authored-by: raushan <raushan@huggingface.co>	2025-04-14 12:36:41 +02:00
Joao Gante	aaf129cdae	[agents] remove agents 🧹 (#37368 )	2025-04-11 18:42:37 +01:00
Alex Brooks	623d395aff	Add Granite Speech Support (#36801 ) * First pass at speech granite Add encoder / projector, rename things * Combine into one model file with causal lm outputs for forward * Add loss calc * Fix config loading Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * Split new / old loading logic * Use transformers integration for loading peft adapters * Add generation wrapper for selective lora enablement * Add note for qformer encoder automodel * Guard torch/audio imports in feature extractor * Handle granite speech autoclasses * Handle optional deps in package structure for granite speech * Add granite pretrained model def for init * Add dummy objects for torch/torchaudio * Add tests for granite speech processor * Minor formatting fixes and refactoring * Add options for falling back to config in forward * Tentative model docstrings for granite speech * Fix config type * Remove legacy load * Allow non-lora variants for granite speech * Override weight tying for llm * Use text config instead of llm config * Add output embeddings getter to fix weight tying * Fix relative imports * computing the number of audio features, based on the raw audio sequence. * collating audio inputs, and keeping the original lengths. * asserted we have text. otherwise we can't specify the audio special token. * assering the number of audio-symbols/audios match correctly. running get validated_audios only when audio is present * indentation bugfix + supporting different feature lengths when expanding audio. * redundant, done in _get_validated_text * adapting the tests: - we must have text (not either audio or text) - _get_num_audio_features takes a list of raw lengths, provided it insetad. * Minor cleanup, remove unused import * Add more tests for batch feature processing * Allow setting offset in rel position embeddings * Add config option for warning if peft is not installed w/ lora * Port blip2 qformer code into granite speech * Add sad test for numpy arr processing * Allow numpy arrays / tuples in granite speech processor * Fix config type for projector * - pad instead of creating a zeros tensor, to keep the original dtype/device (support bfloat16) - cast input_features to the model dtype (support bfloat16) * merge Blip2QFormerConfig to GraniteSpeechProjectorConfig * prevent a crash when re-saving/loading the model (line 109) * consider additional edge cases during preprocessing. * consider additional edge cases during preprocessing. * add features mask for batched inference (bugfix) * Minor refactor, remove multiaudio processor tests * Add set input/output embeddings for granite speech * Fix feature dim check in processor test * Pop input features in embed test for granite speech * Small fixes for test edge cases Add granite speech to seq2seq causal lm mapping names * Add small tests for granite speech model * Fix data parallelism test * Standardize model class names * Fix check for copies * Fix misaligned init check * Skip granite speech in checkpoint check * Use default for tie_word_embeddings in granite speech * Fix non documentation granite speech repo issues * Fix comments and docstring checks * Add placeholder docs for granite speech * Fix test naming collision * Code formatting * Rerun torch dummy obj regen * Fix save pretrained for granite speech * Import sorting * Fix tests typo * Remove offset hack * Pass args through encoder config * Remove unused prune heads from blip2 * removing einsum. replaced with explicit multiplication (relative positional encodings) and sdpa attention. * remove Sequential from ConformerFeedForward and ConformerConvModule. + fix for sdpa attention * remove GraniteSpeechConformerScale * rename to hidden_states * rename conformer layers to self.layers, remove the first linear from the list to keep the list homogenous. * move pre-norm to the attention/feedforward blocks (avoid complex module wrapping) * adding pre_norm into forward * feature extractor refactoring to resemble how it's done in phi4multimodal. * rename feature_extractor to audio_processor * bugfix: input_feature_mask fix to get the exact number tokens. * Fix pytest decorator in processor test * Add (disabled) integration tests for granite speech * Fix handling of optional feature masking * Loosen validation in processing for vLLM compatability * Formatting fixes * Update init structure to mirror llama * Make granite speech projector generic * Update test config to reflect generic projector * Formatting fixes * Fix typos, add license * Fix undefined var in input processing * Cleanup and expose ctc encoder * Add missing config docstrings * Better var names, type hints, etc * Set attn context size in init * Add max pos emb to encoder config * Cleanup feature extractor * Add granite speech architecture details * Remove granite speech qformer ref * Add paper link, explicit calc for qkv * Calculate padding directly in depthwise conv1d init * Raise value error instead of asserting * Reorder class defs (classes used at top) * Precompute relpos distances * Run formatting * Pass attention distances through forward * Apply suggestions from code review Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com> * Add todo for using common batch feature extraction * Rename audios/features * Ensure chat template may be provided to processor * Move granite speech docs to audio models * Add todos for input proc refactoring * Fix import order * Guard torch import * Use relative imports * Require torch backend for processor in granite speech * Add backend guards in feature extractor --------- Signed-off-by: Alex-Brooks
<Alex.brooks@ibm.com> Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com> Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>	2025-04-11 18:52:00 +02:00
Lysandre Debut	54a123f068	Simplify soft dependencies and update the dummy-creation process (#36827 ) * Reverse dependency map shouldn't be created when test_all is set * [test_all] Remove dummies * Modular fixes * Update utils/check_repo.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * [test_all] Better docs * [test_all] Update src/transformers/commands/chat.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * [test_all] Remove deprecated AdaptiveEmbeddings from the tests * [test_all] Doc builder * [test_all] is_dummy * [test_all] Import utils * [test_all] Doc building should not require all deps --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-04-11 11:08:36 +02:00
Mehant Kammakomati	7d76876498	(Part 2) feat: allow for tp_size attr for tplizing the model (#37054 ) * feat: custom tp_size, new transformers tp interface Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * fix: review cmt - error when tp_plan not set for tp_size Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * fix: nit in docs Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> --------- Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>	2025-04-10 17:44:09 +02:00
AbdelKarim ELJANDOUBI	7ecc5b88c0	Add image classifier donut & update loss calculation for all swins (#37224 ) * add classifier head to donut * add to transformers __init__ * add to auto model * fix typo * add loss for image classification * add checkpoint * remove no needed import * reoder import * format * consistency * add test of classifier * add doc * try ignore * update loss for all swin models	2025-04-10 15:00:42 +02:00
Raushan Turganbay	1ae8d54b04	[chat-template] Unify tests and clean up 🧼 (#37275 ) * fix tests and some clean up * make one general test for each modality * remove redundant merging of kwargs * edge cases * dont enforce slow when reloading * fix gemma3 tests * has to adapt llama 4 after rebase * remove also from overriden tests * should be green now	2025-04-10 14:42:32 +02:00
DerekLiu35	2527f71a47	Add "selecting a quantization method" doc (#37159 ) * initial draft * make documentation simpler * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/selecting.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * turn pros and cons into tables * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add links to each quant method page * separate calibration vs no calibration methods * add calibration time estimates --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-09 15:51:37 +02:00
Arthur	e3eda6d188	Add glm4 (#37388 ) * add changed * Revert "add changed" This reverts commit `0a0166a1fe`. * update with NEW MODEL class called GLM4 * update * Update glm4.md * Name * style * fix copies * fixup test --------- Co-authored-by: Yuxuan Zhang <2448370773@qq.com>	2025-04-09 14:02:04 +02:00
logesh R	31a62c2eb8	Updated Model-card for donut (#37290 ) * Updated documentation for Donut model * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Updated code suggestions * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Updated code suggestion to Align with the AutoModel example * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Updated notes section included code examples * close hfoption block and indent --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 11:54:47 -07:00
Parag Ekbote	e2b0224d94	Update Model Card for Jamba (#37152 ) * Update model card for jamba * Apply the suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review-2 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update model page. * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update as per code review. * Update docs/source/en/model_doc/jamba.md as per code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/jamba.md as per code review ` Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update as per code review. * fixes --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 11:02:59 -07:00
Devesh Rahatekar	6cc109c354	Improvements in Gemma2 model card (#37076 ) * Improved Model card for Gemma2 * Made changes in gemma2 as suggested * Made more changes in the doc (adding image, notes, closing hfoptions) * minor fixes --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 10:51:26 -07:00
Ashvanth.S	3a826a45ca	Update Model card for GPT2 (#37101 ) * Update Model card for gpt2 * Update link for gpt2 space * fixes docs based on suggestions * Add transformers-cli and quantization example for GPT-2 * Remove resources and flash attention docs and fix typos	2025-04-07 10:15:28 -07:00
Ricardo Alanis	5e855095a2	Update falcon mamba card (#37253 ) * feat: edit falcon mamba card * fix: edit statement on falconmamba arch * Update docs/source/en/model_doc/falcon_mamba.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon_mamba.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon_mamba.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: add right indent for tags * fix: remove notas --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 10:12:44 -07:00
Shubham Panchal	416b5a875d	Update model-card for DINOv2 (#37104 ) [docs] Update model-card for DINOv2	2025-04-07 10:11:08 -07:00
Nahieli	f8a16805c5	updated model card for Mistral (#37156 ) * model card for Mistral * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * apply suggestions * fix typo * updated with comments * updated with comments * updated with comments * remove hfoption block --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 10:05:36 -07:00
Yih-Dar	e7ad077012	byebye torch 2.0 (#37277 ) * bump Torch 2.1 with broken compatibility `torch.compile` * dep table * remove usage of is_torch_greater_or_equal_than_2_1 * remove usage of is_torch_greater_or_equal_than_2_1 * remove if is_torch_greater_or_equal("2.1.0") * remove torch >= "2.1.0" * deal with 2.0.0 * PyTorch 2.0+ --> PyTorch 2.1+ * ruff 1 * difficult ruff * address comment * address comment --------- Co-authored-by: Jirka B <j.borovec+github@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-07 15:19:47 +02:00
Arthur	25b7f27234	Add llama4 (#37307 ) * remove one of the last deps * update fast image processor after refactor * styling * more quality of life improvements * nit * update * cleanups * some cleanups * vllm updates * update fake image token * [convert] Fix typo * [convert] Strip extraneous bytes from shards * [convert] Minor fixes * [convert] Use num_experts * multi-image fixes in modeling + processor * fixup size * 128 experts * Use default rope * Unfuse mlp * simplify a lot inputs embeds merging * remove .item() 👀 * fix from review * Address feedback * Use None "default" for rope_scaling. Add eot. * set seed * return aspect ratios and bug fixes * Moe 128 rebased (#8) * 128 experts * Use default rope * Unfuse mlp * Address feedback * Use None "default" for rope_scaling. Add eot. * Meta/llama quant compat (#7) * add quant compatible model & conversion code for llama4 * fix a few issues * fix a few issues * minor type mapping fix --------- Co-authored-by: Lu Fang <fanglu@fb.com> * use a new config parameter to determine which model definition to use for MoE --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Lu Fang <fanglu@fb.com> * un-comment write_tokenizer from converting script * remove un-used imports * [llama4] Pop aspect_ratios from image processor output in Llama4Processor Signed-off-by: Jon Swenson <jmswen@gmail.com> * Fix parameter_count name * Update src/transformers/models/llama4/configuration_llama4.py * nit * Add changes for no_rope, moe_layers, chunked attention. 
Just need to test all * Update src/transformers/models/llama4/image_processing_llama4_fast.py * nit * fix post merge with main * support flex attention * fixes * fix * add layer * small updates * rebase and delete llm_compressor * nit * [llama4/mm] Add back <|image|> token that delimits global tile * [llama4/mm] Fix Llama 4 image processing unit tests * add explicit dtype Signed-off-by: Jon Swenson <jmswen@gmail.com> * sdpa works * comment todo small * fix model loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * revert * nits * small fix for TP on 1 node * Read new params from config * Add <|eom|> * lol don't know how this got here * adding fp8 * Save processor, fix chat template * style * Add boi/eoi tokens We don't use them. * fixes for now flex seems to work :) * updates * nits * updates * missking keys * add context parallel * update * update * fix * nits * add worldsize and make eager attn work for vision * Ignore new key present in base models * add tp_plan * fix nope Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * minor fix Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * Clean up Llama4 vision model * current updates * add support for `attn_temperature_tuning` * add floor scale * add missing attn scales * push what works, dirty trick for the device synch * oups * Fix pad_token_id See https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files Confirmed in the original codebase. * fix causallml loading * rm * fix tied-weights * fix sdpa * push current version * should work with both short and long * add compressed_tensos & fix fbgemm tp * Fix flex impl * style * chunking * try to revert the potentially breaking change * fix auto factory * fix shapes in general * rm processing * commit cache utils cleanup * Fix context length * fix * allocate * update tp_plan * fix SDPA! 
* Add support for sparse `Llama4TextMoe` layer from the kernel hub * cleanup * better merge * update * still broken fixing now * nits * revert print * Write max_position_embeddings and max_model_length * Update modeling_llama4.py * Save attention_chunk_size * Sync eos terminators * Read initializer_range * style * remove `dict` * fix * eager should use `chunked_attention_mask` * revert * fixup * fix config * Revert "Merge pull request #36 from huggingface/sparse-llama4-moe" This reverts commit `ccda19f050`, reversing changes made to `a515579aed`. * Fix typo and remove warning with compiled flex and chunked prefill * Fix MoE vs FF (#41) * fix * Use correct no_rope_layers if provided one is empty list * update tests * fix * skipping some tests * fix fp8 loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * fix text geneartion pipeline Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * eager needs 4D mask * fix * Some cleanup * fix * update * fix * replace correctly module * patch * modulelist * update * update * clean up * Don't move to `cuda:0` in distributed mode * restrict to compressed tensors for now * rm print * Docs! 
* Fixes * Update docs/source/en/model_doc/llama4.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fixes * cuda graph fix * revert some stuff * fixup * styling * Update src/transformers/models/llama4/modeling_llama4.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * commit licence, cleanup here and there and style * more styling changes * fix dummies * fix and clean docstrings * remove comment * remove warning * Only fast image processor is supported * nit * trigger CI * fix issue with flex encoder * fix dynamic cache * Code quality * Code quality * fix more tests for now * Code quality * Code quality * Nuke bunch of failing stuff * Code quality * Code quality * cleanup removal of slow image processor * ruff fix fast image processor * fix * fix styling * Docs * Repo consistency * Repo consistency * fix sliding window issue * separate llama cache * styling * Repo consistency * Repo consistency * push waht works * L4 Repo consistency * Docs * fix last last alst alst alst alstsaltlsltlaslt --------- Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: Jon Swenson <jmswen@gmail.com> Co-authored-by: jmswen <jmswen@users.noreply.github.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com> Co-authored-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Marc Sun <marc@huggingface.co> 
Co-authored-by: drisspg <drisspguessous@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Daniël de Kok <me@danieldk.eu> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-05 22:02:22 +02:00
Linnet Cosmos Tuscano	0ef339ff1b	Update OpenAI GPT model card (#37255 ) * Update OpenAI GPT model card * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update OpenAI GPT model card: add usage examples and notes section * Add API autodoc tags after Notes section for OpenAI GPT model * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu 
<59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Added missing badges --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-04 15:25:16 -07:00
Sharareh Younesian	46d73910d5	Updated T5 model card with standardized format (#37261 ) * Updated T5 model card with standardized format * Updated T5 model card with standardized format, fixed typo * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply reviewer suggestions * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-04 15:23:09 -07:00
Chathumina Vimukthi	579135a2f6	Updated model card for distilbert (#37157 ) * Updated model card for distilbert * Updated the distilbert model card * Updated model card for distilbert * Updated the distilbert model card * Addressed code review comments * Addressed review comments * fix pipeline --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-04 15:22:46 -07:00
Reshan Gomis	8cd57eb731	mobilebert model card update (#37256 ) * mobilebert model card update * Updates to model card mobilebert --------- Co-authored-by: Reshan Gomis <reshang@verdentra.com>	2025-04-04 14:28:35 -07:00
Shubham Panchal	531e4fcf0e	Update model card for Depth Anything (#37065 ) [docs] Update model card for Depth Anything	2025-04-04 11:36:05 -07:00
Joao Gante	ad3d157188	[RoPE] abstract dynamic RoPE update under a decorator ✨ (#37249 ) * dynamic rope decorator * longrope; shorter fwd pass * propper docstring * make fixup	2025-04-04 14:27:28 +01:00
Surya Garikipati	8dd0a2b89c	Update model card for electra (#37063 ) * Update ELECTRA model card with new format * Update ELECTRA model card with new format * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/electra.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * close hfoption block --------- Co-authored-by: Wun0 <f20191221@hyderabad.bits-pilani.ac.in> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-03 10:45:35 -07:00
Parag Ekbote	15ac2b6ac5	Update Model Card for ModernBERT (#37052 ) * Modify Model Card for ModernBERT. * Update as per code review. Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update model card. * Update model card. --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-03 10:14:02 -07:00
Abhishek Ranjan	b552708694	chore: Update model doc for code_llama (#37115 ) * Update code_llama.md aims to handle https://github.com/huggingface/transformers/issues/36979#issuecomment-2758560598 sub part of https://github.com/huggingface/transformers/issues/36979 * Update docs/source/en/model_doc/code_llama.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/code_llama.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/code_llama.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * make changes as per code review * chore: make the function smaller for attention mask visualizer * chore[docs]: update code_llama.md with some more suggested changes * Update docs/source/en/model_doc/code_llama.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * chore[docs] : Update code_llama.md with indentation changes --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-03 10:09:41 -07:00
Bimal Gajera	2b84831a93	Update model card for Cohere (#37056 ) * Update Cohere model card to follow standard template * Update docs/source/en/model_doc/cohere.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/cohere.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/cohere.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/cohere.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/cohere.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/cohere.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update cohere.md Update code snippet for AutoModel, quantization, and transformers-cli * Update cohere.md * Update docs/source/en/model_doc/cohere.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-03 09:51:40 -07:00
Avigyan Sinha	1b29409d89	feat: updated model card for qwen_2.5_vl (#37099 ) * feat: updated model card for qwen_2.5_vl * applied suggested change 1 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * applied suggested change 2 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * applied suggested change 3 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: made requested changes for quantization and notes * suggeested model card change 4 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * updated model card wiht suggested change 5 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * updated model card wiht suggested change 6 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * updated model card wiht suggested change 7 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * feat: applied requested changes --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-03 09:13:26 -07:00
Ryan Mullins	3f6af96732	Adding links to ShieldGemma 2 technical report (#37247 )	2025-04-03 16:26:29 +01:00
Joao Gante	9a1c1fe7ed	[CI] green llama tests (#37244 ) * green llama tests * use cleanup instead * better test comment; cleanup upgrade * better test comment; cleanup upgrade	2025-04-03 14:15:53 +01:00
Raushan Turganbay	98601cc818	[Phi4] add multimodal chat template (#36996 ) * phi4 chat template * remove from valid kwargs	2025-04-03 09:52:09 +02:00
ARAVINDHAN T	2056287940	Updated model card for Qwen2 (#37192 ) * Update qwen2.md * Update qwen2.md * Update qwen2.md * Update qwen2.md * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update qwen2.md * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * 
Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-02 18:10:41 -07:00
Ricardo Alanis	3e96a0c32b	Update falcon model card (#37184 ) * feat: updated model card for falcon * fix:rewrite model description * fix: add link to conversion script * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: Add suggested changes * fix: typo in link for quantization * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: fix indent and close ticks * fix: add indent --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-02 17:30:37 -07:00
Purusharth Malik	199d7adf10	Updated the model card for CLIP (#37040 ) * Update clip.md * Update docs/source/en/model_doc/clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Incorporated suggested changes * Update docs/source/en/model_doc/clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/clip.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-02 14:57:38 -07:00
Bowen Bao	800510c67b	[doc] Fix link for Quark quantization page (#37179 ) Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-04-01 20:57:38 +02:00
Armaghan Shakir	0710e9b1e8	Create and Expose SamVisionModel as public for better accessibility (#36493 ) * move encoder below * auto modeling * write SamVisionTester * fix vision attention shape * fix SamVisionTest * minor changes to SamVisionTest * Revert "fix vision attention shape" This reverts commit `d2a4083ae5`. * fix attention output shape in new tests * remove encoder examples * run modular on got_ocr2 * code formatting * fix got_ocr2 * ruff fixes * code quality * add sam_vision in auto modeling and auto configuration * remove composite test * updated index.md * add TFSamVisionEncoder to __init__ * fix public TFSamVisionEncoder * remove outdated todo comment * set test_torch_exportable Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * rename: VisionEncoder -> VisionModel * bring back original SamVisionEncoder * rename back: VisionEncoderOutput -> VisionModelOutput * undo changes in SamModelTester * reuse SamVisionEncoder in SamVisionModel --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-03-31 11:45:07 +02:00
jiqing-feng	286393fbb1	enable tp on CPU (#36299 ) * enable tp on CPU Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * get rank from cpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable TP tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix comment Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * em print Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix model id Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix conflict Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix index and add doc Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-03-31 10:55:47 +02:00
Bo Zheng	6acd5aecb3	Adding Qwen3 and Qwen3MoE (#36878 ) * Initial commit for Qwen3 * fix and add tests for qwen3 & qwen3_moe * rename models for tests. * fix * fix * fix and add docs. * fix model name in docs. * simplify modular and fix configuration issues * Fix the red CI: ruff was updated * revert ruff, version was wrong * fix qwen3moe. * fix * make sure MOE can load * fix copies --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2025-03-31 09:50:49 +02:00
MinJu-Ha	0d6a60fe55	🌐 [i18n-KO] Translated `qwen2_vl.md` to Korean (#36750 ) * fix: manual edits * fix: resolve suggestions * Update toctree.yml	2025-03-30 15:00:27 -07:00
Cyril Vallez	2bea6bf24e	Fix AttentionInterface following feedback (#37010 ) * up * typo * update doc * Update attention_interface.md	2025-03-28 18:00:35 +01:00
Minho Ryu	eca74d1367	[WIP] add deepseek-v3 (#35926 ) * init commit * style * take comments into account * add deepseekv3 modeling * remove redundant code * apply make style * apply fix-copies * make format * add init files * rename deepseekv3 into deepseek_v3 based on its model_type * rename deepseekv3 into deepseek_v3 based on its model_type * deepseek-v3 not deepseek_v3 * set model_type as deepseek_v3 * use default docs * apply make * fill type and docstring * add rope_config_validation * use custom DeepseekV3MLP * hold code only for checkpoints congifuration; remove redundant * revise rope yarn for DeepSeek variation * rename DeepSeek-V3 * some refactoring * revise load_hook to work properly; make moe func trainable; use llama instead of mixtral * fix attention forward * use -1 for not-changing dim when to use exapnd * refactor DeepseekV3TopkRouter * use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim * register pre_hook and hook both * make style * use n_shared_experts * Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add test file * update modeling_file according to modular file * make style * add mapping for DeepseekV3ForSequenceClassification * remove aux_loss_alpha * add deepseek_v3 for perf * add deepseek_v3 * rename test as deepseekv3 * use tiny-deepseek-v3 * remove DeepseekV3ForSequenceClassification * cache before padding * remote output_router_logits * Revert "remote output_router_logits" This reverts commit `f264f800d0`. 
* remove output_router_logits * make e_score_correction_bias as buffer * skip tests not compatible * make style * make e_score_correction_bias as buffer * use rope_interleave instead of load_hook * skip tests not compatible with MLA * add doc for rope_interleave * fix typo * remove torch.no_grad for selecting topk * fix post merge issue * mrege with main and simplify * nits * final * small fixes * fix * support TP better * stash * changes currently requires * remove synch * more fixes for TP * temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used * updates to have generation work! * push most of the changes * reorder functions + call for contributions! * update readme * nits * update * ruff was updated on main * merge with main and fix copies * revert unrelated changes * route all tokens to all experts when testing to avoid no gradient iddues * finish fixing all tests * fixup * nit * clean config * last readme changes * nit * do cnit * typo * last nit * one more one more --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>	2025-03-28 15:56:59 +01:00
Abu Bakr Soliman	49b5ab6a27	Support QuestionAnswering Module for ModernBert based models. (#35566 ) * push ModernBertForQuestionAnswering * update ModernBertForQuestionAnswering * update __init__ loading * set imports for ModernBertForQuestionAnswering * update ModernBertForQuestionAnswering * remove debugging logs * update init_weights method * remove custom initialization for ModernBertForQuestionAnswering * apply make fix-copies * apply make style * apply make fix-copies * append ModernBertForQuestionAnswering to the pipeline supported models * remove unused file * remove invalid autoload value * update en/model_doc/modernbert.md * apply make fixup command * make fixup * Update dummies * update usage tips for ModernBertForQuestionAnswering * update usage tips for ModernBertForQuestionAnswering * add init * add lint * add consistency * update init test * change text to trigger stuck text * use self.loss_function instead of custom loss By @Cyrilvallez Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update modeling_modernbert.py make comparable commit to even it out * Match whitespace * whitespace --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Orion Weller <wellerorion@gmail.com> Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-03-26 21:24:18 +01:00
Steven Liu	3a8ec8c467	[docs] Attention mask image (#36970 ) add image	2025-03-26 10:11:34 -07:00
Cyril Vallez	788e1092e9	Allow easy registration of custom attention functions (#36889 ) * Update modeling_utils.py * style * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * add to init * Update modeling_utils.py * style * update * Update modeling_utils.py * Update modeling_utils.py * style * Add some doc * Update _toctree.yml * readd it for tgi/vllm compat * CIs * CIs	2025-03-26 16:15:06 +01:00
Steven Liu	a844297088	[docs] Fix image link (#36869 ) * fix image link * fix * update * fix	2025-03-25 11:34:21 -07:00
Cyril Vallez	4303d88c09	Add Phi4 multimodal (#36939 ) * raw start * update * update * add to imports * update * up * simplify configs * clean configs * style * typos * Update convert_phi4_multimodal_weights_to_hf.py * Update convert_phi4_multimodal_weights_to_hf.py * fix * up * up * up * Update convert_phi4_multimodal_weights_to_hf.py * Update convert_phi4_multimodal_weights_to_hf.py * up * up * up * Update feature_extraction_phi4_multimodal.py * up * up * up * up * up * simplify configs * typo * cut code * typo * typo * typo * re * typo * up * up * up * add tests * fix * fix * Update test_modeling_phi4_multimodal.py * up * Update test_modeling_phi4_multimodal.py * doc * fix * up * up * up * up * up * up * simplify * up * simplify * config docstrings * cleanup * clean * typo * typo * fix * Update phi4_multimodal.md * fix * fix * Update test_modeling_phi4_multimodal.py * update * simplify reshapes and permutes * up * simplify special tokens * simplify processor a lot * Update processing_phi4_multimodal.py * Update processing_phi4_multimodal.py * switch to fast processor * image processor * Update image_processing_phi4_multimodal_fast.py * add lora extraction to converter * Update convert_phi4_multimodal_weights_to_hf.py * Update __init__.py * add AudioInput type in audio_utils * rewrite feature_extraction: support torch batched FFT * input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values * test update * not mono channel warning update * remove auto maps from processor * kargs dispatch in processor * simplify kwargs dispatch * simplify merging * remove default sampling rate * style * Update test_modeling_phi4_multimodal.py * update doc * doc * torch only feature extractor * make fake tokens adjustable * Update feature_extraction_phi4_multimodal.py * fix * Update processing_phi4_multimodal.py * simplify mask * last touch * fix copies * style * Update audio_utils.py * style * Update feature_extraction_phi4_multimodal.py * Update 
__init__.py * docstrings * copies * fix all checks * back to fix-copies * trigger CIs * Update feature_extraction_phi4_multimodal.py * improve tests with multimodal inputs * trigger CIs --------- Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>	2025-03-25 09:55:21 +01:00
Raushan Turganbay	47e5432805	Deprecate #36741 and map Causal to Conditional (#36917 ) * deprecate the prev fix * reword warning and update docs * reword warning * tests * dont bloat `get_text_config()`	2025-03-25 09:13:56 +01:00
omahs	cbf924b76c	Fix typos (#36910 ) * fix typos * fix typos * fix typos * fix typos	2025-03-24 14:08:29 +00:00
Aritra Roy Gosthipaty	c9d1e5238a	Update installation.md (#36826 ) * Update installation.md * Update README.md	2025-03-21 16:32:02 -07:00
Steven Liu	d253de6d58	[docs] Model docs (#36469 ) * initial * fix * fix * update * fix * fixes * quantization * attention mask visualizer * multimodal * small changes * fix code samples	2025-03-21 15:35:22 -07:00
Joao Gante	949cca4061	[CI] doc builder without custom image (#36862 ) * no image * test * revert jax version updates * make fixup * update autodoc path for model_addition_debugger * shieldgemma2 * add missing pages to toctree	2025-03-21 09:10:27 +00:00
Pablo Montalvo	1d3f35f30a	Add model visual debugger (#36798 ) * draft of model tracer visualiser * add context manager in addition to decorator * add debug utils to init * move model debugging utils to dedicated file * add documentation * protect some imports * format * move and protect imports * format * doc: improve errors in case of broken dummy imports. * format * use automatic torch backend * update doc * fix backend * (TEMP) move to dummies while backend wait * update documentation * doc	2025-03-20 17:37:29 +01:00
Haotong LIN	6515c25953	Add Prompt Depth Anything Model (#35401 ) * add prompt depth anything model by modular transformer * add prompt depth anything docs and imports * update code style according transformers doc * update code style: import order issue is fixed by custom_init_isort * fix depth shape from B,1,H,W to B,H,W which is as the same as Depth Anything * move prompt depth anything to vision models in _toctree.yml * update backbone test; there is no need for resnet18 backbone test * update init file & pass RUN_SLOW tests * update len(prompt_depth) to prompt_depth.shape[0] Co-authored-by: Joshua Lochner <admin@xenova.com> * fix torch_int/model_doc * fix typo * update PromptDepthAnythingImageProcessor * fix typo * fix typo for prompt depth anything doc * update promptda overview image link of huggingface repo * fix some typos in promptda doc * Update image processing to include pad_image, prompt depth position, and related explanations for better clarity and functionality. * add copy disclaimer for prompt depth anything image processing * fix some format typos in image processing and conversion scripts * fix nn.ReLU(False) to nn.ReLU() * rename residual layer as it's a sequential layer * move size compute to a separate line/variable for easier debug in modular prompt depth anything * fix modular format for prompt depth anything * update modular prompt depth anything * fix scale to meter and some internal funcs warp * fix code style in image_processing_prompt_depth_anything.py * fix issues in image_processing_prompt_depth_anything.py * fix issues in image_processing_prompt_depth_anything.py * fix issues in prompt depth anything * update converting script similar to mllamma * update testing for modeling prompt depth anything * update testing for image_processing_prompt_depth_anything * fix assertion in image_processing_prompt_depth_anything * Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py Co-authored-by: Pavel Iakubovskii 
<qubvel@gmail.com> * Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update docs/source/en/model_doc/prompt_depth_anything.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update docs/source/en/model_doc/prompt_depth_anything.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * update some testing * fix testing * fix * add return doc for forward of prompt depth anything * Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update tests/models/prompt_depth_anything/test_modeling_prompt_depth_anything.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix prompt depth order * fix format for testing prompt depth anything * fix minor issues in prompt depth anything doc * fix format for modular prompt depth anything * revert format for modular prompt depth anything * revert format for modular prompt depth anything * update format for modular prompt depth anything * fix parallel testing errors * fix doc for prompt depth anything * Add header * Fix imports * Licence header --------- Co-authored-by: Joshua Lochner <admin@xenova.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-03-20 16:12:44 +00:00
Pavel Iakubovskii	66291778dd	Refactor Attention implementation for ViT-based models (#36545 ) * Refactor vit attention * Refactor ViT-based models * 🚨🚨🚨 Fix prefix for DPT * Update params order * trigger tests * Fix Dinov2 attention * Fix DPT attention impl propagation for backbone config * Common test fix: config is modif. inplace - avoid it * view->reshape * Fixup * Fixup * Enable IJepa FA2 * Add FA2 in corresponding model docs	2025-03-20 15:15:01 +00:00
fxmarty-amd	1a374799ce	Support loading Quark quantized models in Transformers (#36372 ) * add quark quantizer * add quark doc * clean up doc * fix tests * make style * more style fixes * cleanup imports * cleaning * precise install * Update docs/source/en/quantization/quark.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/quark_integration/test_quark.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * remove import guard as suggested * update copyright headers * add quark to transformers-quantization-latest-gpu Dockerfile * make tests pass on transformers main + quark==0.7 * add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Bowen Bao <bowenbao@amd.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-03-20 15:40:51 +01:00
Ryan Mullins	487dab1b2b	Shieldgemma2 (#36678 ) * single commit * correct config * fixup * dummy pt * Use ShieldGemma2Config in conversion script * Update src/transformers/models/shieldgemma2/configuration_shieldgemma2.py * Adding shieldgemma2 to models.__init__.py * Adding ShieldGemma2 to main __init__.py * Update shieldgemma2.md * Update shieldgemma2.md * Adding tests. Addressing review feedback. * Minor docs update * Fixing code quality feedback from CI * Fixing empty messages bug reported by ghunkins --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Ren Pang <ain-soph@live.com>	2025-03-20 15:14:38 +01:00
Joao Gante	957b05b413	[qwen2 audio] remove redundant code and update docs (#36282 )	2025-03-20 10:54:51 +00:00
HDCharles	94555437e2	Disable inductor config setter by default (#36608 ) * Disable inductor config setter by default This is hard to debug and should be off by default * remove default settings in autoquant too * Add info to torchao.md about recommended settings * satisfying Ruff format Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-03-20 11:23:14 +01:00
Matt	9be4728af8	Just import torch AdamW instead (#36177 ) * Just import torch AdamW instead * Update docs too * Make AdamW undocumented * make fixup * Add a basic wrapper class * Add it back to the docs * Just remove AdamW entirely * Remove some AdamW references * Drop AdamW from the public init * make fix-copies * Cleanup some references * make fixup * Delete lots of transformers.AdamW references * Remove extra references to adamw_hf	2025-03-19 18:29:40 +00:00
Mohamed Mekkouri	258dd9cc69	Add Space to Bitsandbytes doc (#36834 ) * add space * address review	2025-03-19 18:56:07 +01:00
Driss Guessous	e8d960329e	Add option for ao base configs (#36526 )	2025-03-19 14:59:47 +01:00
Yoni Gozlan	12f2ebef63	Support custom dosctrings in modular (#36726 ) * Override docstrings in modular if not none * Update doc	2025-03-18 14:00:54 -04:00
Yoni Gozlan	30580f035b	Fix Mistral3 tests (#36797 ) * fix processor tests * fix modeling tests * fix test processor chat template * revert modeling test changes	2025-03-18 13:08:12 -04:00
Cyril Vallez	e959530b8f	Add Mistral3 (#36790 ) * initial start * style and dummies * Create convert_mistral3_weights_to_hf.py * update * typo * typo * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * up * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * update * update * Update image_processing_mistral3.py * Update convert_mistral3_weights_to_hf.py * fix patch merger * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * up * update modular to fit * style * Update convert_mistral3_weights_to_hf.py * typo * Update modular_mistral3.py * simplify a lot all shape shenanigans * simplify * add working test processor * Add partially working common modeling tests * All tests working and remove mistral3 image processors * add docs and fixup * fix inference with image size >1540 * 🚨fix test image proc pixtral * Remove vision_feature_select_strategy * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * clean * fix test checkpoints * Update test_modeling_mistral3.py * Update test_modeling_mistral3.py * style * Use Pixtral processor * up * finish cleaning processor to use pixtral directly * Update __init__.py * Update processing_pixtral.py * doc * Update __init__.py * Update mistral3.md * Update _toctree.yml --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>	2025-03-18 12:04:42 +01:00
Steven Liu	ac1a1b66b9	[docs] Update README (#36265 ) * update * feedback * feedback * update versions	2025-03-17 09:37:19 -07:00
Christopher Akiki	e3af4fec91	[MINOR:TYPO] Update hubert.md (#36733 ) * [MINOR:TYPO] Update hubert.md - typo fix (wave2vec instead of hubert) - make code snippet copiable and runnable * Run tests	2025-03-17 09:07:51 -07:00
MaCAT	25992b493c	🌐 [i18n-KO] Translated codegen.md to Korean (#36698 ) * Initial translation * Add _toctree.yml	2025-03-14 09:31:18 -07:00
Yoni Gozlan	69bc848480	Add support for fast image processors in add-new-model-like CLI (#36313 ) * add support for fast image processors in add-new-model-like * fix header not found add-fast-image-processor-cli * Encourage adding fast image processor * nit * start improve doc * update docs * make requested modifs	2025-03-13 14:16:37 -04:00
Arthur	2829013d2d	fix block mask typing (#36661 ) * fix block mask typing * updated Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * gemma * fix --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-03-12 11:29:11 +01:00
Ryan Mullins	50d3530aa0	Gemma3 (#36658 ) * Fix converter * [Broken] Adds Gemma 3 to Hugging Face Transformers * Consolidating Config and Processor params across impls * Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right. * Additional plumbing for CausalLM and ConditionalGeneration variants * incomplete draft of Orbax conversion script * More complete checkpoint conversion * Supporting Gemma 3 1B checkpoints * Updating RoPE for multiple frequencies * Adjustments to rotary embedder * Proof of life for text-only operation * Updating the conversion script to handle multimodal projection weights * Fixing tet-only conversions * Cleaner conversion script with multimodal support and a simpler processor * Additional refatcors to the Gemma3Processor * Simplified Processor to work over text representations * Updated conversion script to join text and vision embeddings at converion time * Logging for debugging * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Joshua Lochner <admin@xenova.com> * Removed extraneous Config params * Switching to fast tokenizer for checkpoint conversions * isolating siglip for performance tetsing * Minor changes for debugging tests against baselines * Adding average pooling for soft tokens * Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts * Updating conversion script for ShieldGemma 2 conversion compatibility * Allow disable_compile to be provided as a kwarg * Refresh from modular * Updated conversion script and corrected sliding window * Fix type mismatch in cache_position (#4) * Fix dtype (#5) * Fix type mismatch in cache_position * Actually fix in the modular file Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> --------- Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> * fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor * Adding 2D pooling for 
image embeddings * Revert "Adding 2D pooling for image embeddings" This reverts commit `65350cf531`. * Gemma3 average pooling changed from 1D to 2D * Major refactor to Gemma3MultimodalInputProjection * Updating Gemm 3 Auto* registrations * Add option to save Gemma 3 chat template with tokenizer during weights conversion * Removing unused imports * Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration * Removing duplicate config property * Removing final logit softcapping and 1-indexing of position ids * Fixing image processor config and none --> None typo * Fixing sliding window size for 1B * Updating image_mean and image_std in Image Processor * Attention masking changed to lower triangular * Moving image special tokens to conversion script * Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs * Remove special token variables from symbol space * Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration * tie lm_head and embedding weights Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * Correct tied weights in Gemma3CausalLM * iterative bidirectional attention * resolving merge conflicts * Reverting to Gemma 2 HybridCache with sldiing window support and a sliding_window_pattern of 6 * Correcting RoPE scaling * clean up first pass, dummy model geenration works * final clean up before fixing tests * causal lm test works, so fine * Fix conversion * Update src/transformers/models/gemma3/processing_gemma3.py * model tests are happy * processor tests are happy * image processing tests added * fixup * Fix pre-processing in conversion * Inputs merging * Do not normalize vision embeddings * Apply Ryan's (and team) changes to attention * token type ids + mask * template * move embed scale, add rope scale, fix tests * Add chat template to tokenizer * Use prefix for causal model loading * use existing code for sliding mask from gemma2 * 
self.embed_tokens already normalizes * Correcting Gemma3TextConfig parameters in conversion script * typo, modular overwrites my fixes * enable device map for text model * Conversion updates * ultra nit: no einsums * update image token * copy deepcopy config + some docs * add some test, still WIP * Refactoring --include_chat_tempalte logic in converter * Update src/transformers/models/gemma3/modular_gemma3.py Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Add eos tokens for instruct models * dump so i can work on dgx * Removing add_bos by default * dump * add fast im proc * docs for PaS + fixup * another fixup * one more fixup * fix tests * Inverting prior BOS change * ultra nit * Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS * resize embeds, remove sqrt, add slow test outputs * FA2 but quality is meh * nit * skip FA2, no idea what happened * last bit for green CI * please, green CI for docs * T_T * Fix for Gemma3 logits * Support both options for system prompt * Update src/transformers/models/gemma3/image_processing_gemma3_fast.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Docs updates now that assets are live * Style fixes --------- Co-authored-by: Joshua Lochner <admin@xenova.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Matthew Douglas 
<38992547+matthewdouglas@users.noreply.github.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: Lysandre <hi@lysand.re>	2025-03-12 09:06:17 +01:00
Afanti	81aa9b2e07	fix typos in the docs directory (#36639 ) * chore: fix typos in the docs directory * chore: fix typos in the docs directory * chore: fix typos in the docs directory	2025-03-11 09:41:41 -07:00
Marc Sun	cb384dcd7a	Fix gguf docs (#36601 ) * update * doc * update * Update docs/source/en/gguf.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-03-11 15:29:14 +01:00
Matt	1e4286fd59	Remove research projects (#36645 ) * Remove research projects * Add new README to explain where the projects went * Trigger tests * Cleanup all references to research_projects	2025-03-11 13:47:38 +00:00
Steven Liu	e9756cdbc7	[docs] Serving LLMs (#36522 ) * initial * fix * model-impl	2025-03-10 13:14:19 -07:00
Krishnakumar Kannan	1b9978c360	Update chat_extras.md with content correction (#36599 ) Update chat_extras.md - content Fixed a typo in the content, that may confuse the readers.	2025-03-07 13:09:02 +00:00
Nouamane Tazi	51ed61e2f0	Mention UltraScale Playbook 🌌 in docs (#36589 )	2025-03-06 14:48:11 -08:00
Aritra Roy Gosthipaty	159445d044	fix: argument (#36558 ) `752ef3fd4e/utils/modular_model_converter.py (L1729)`	2025-03-06 13:11:19 -08:00
Shaohon Chen	0440dbc0e1	Integrate SwanLab for offline/online experiment tracking and local visualization (#36433 ) * add swanlab integration * feat(integrate): add SwanLab as an optional experiment tracking tool in transformers - Integrated SwanLab into the transformers library as an alternative for experiment tracking. - Users can now log training metrics, hyperparameters, and other experiment details to SwanLab by setting `report_to="swanlab"` in the `TrainingArguments`. - Added necessary dependencies and documentation for SwanLab integration. * Fix the spelling error of SwanLabCallback in callback.md * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Fix typo in comment * Fix typo in comment * Fix typos and update comments * fix annotation * chore: opt some comments --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: AAssets <20010618@qq.com> Co-authored-by: ZeYi Lin <944270057@qq.com> Co-authored-by: KAAANG <79990647+SAKURA-CAT@users.noreply.github.com>	2025-03-06 17:35:30 +01:00
Mohamed Mekkouri	89d27fa6ff	Fix links in quantization doc (#36528 ) fix quantization doc	2025-03-04 16:43:03 +01:00
co63oc	37508816d6	chore: Fix typos in docs and examples (#36524 ) Fix typos in docs and examples Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-03-04 13:47:41 +00:00
Arthur	84f0186e89	Add aya (#36521 ) * initial commit * small fix * move stuff to image processing file * remove stuff in validate turn and fix return tensor * remove liquid stuff * in the process of addressing comments * changes to get the right tokenization * new __init__ works * fixing defulat std and mean * works * small testing scipt -- to be deleted before merge * remove redundant code * addressing comments * fix inits, add docs templates * refactor processor, switch to gotocr image processor * remove image proc from init * refactor to working llava-style architecture * Change AyaVisionModel to AyaVisionForConditionalGeneration * add tests * fixups * update doc * Adding logits_to_keep explicitly in ayavision forward to enable compatibility with cohere model * better variable names + remove code paths * Updates to aya_vision.md * address comments * adding copied from * make style and remove unused projector_hidden_act from config * sort init * include usage of fast image proc and proc on cuda in doc * update checkpoint iin test processor * update checkpoint in test processor 2 * remove test_model and update docstring * skip failing tests --------- Co-authored-by: Saurabh Dash <saurabh@cohere.com> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-03-04 12:24:33 +01:00
Steven Liu	c0f8d055ce	[docs] Redesign (#31757 ) * toctree * not-doctested.txt * collapse sections * feedback * update * rewrite get started sections * fixes * fix * loading models * fix * customize models * share * fix link * contribute part 1 * contribute pt 2 * fix toctree * tokenization pt 1 * Add new model (#32615) * v1 - working version * fix * fix * fix * fix * rename to correct name * fix title * fixup * rename files * fix * add copied from on tests * rename to `FalconMamba` everywhere and fix bugs * fix quantization + accelerate * fix copies * add `torch.compile` support * fix tests * fix tests and add slow tests * copies on config * merge the latest changes * fix tests * add few lines about instruct * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix tests --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * "to be not" -> "not to be" (#32636) * "to be not" -> "not to be" * Update sam.md * Update trainer.py * Update modeling_utils.py * Update test_modeling_utils.py * Update test_modeling_utils.py * fix hfoption tag * tokenization pt. 2 * image processor * fix toctree * backbones * feature extractor * fix file name * processor * update not-doctested * update * make style * fix toctree * revision * make fixup * fix toctree * fix * make style * fix hfoption tag * pipeline * pipeline gradio * pipeline web server * add pipeline * fix toctree * not-doctested * prompting * llm optims * fix toctree * fixes * cache * text generation * fix * chat pipeline * chat stuff * xla * torch.compile * cpu inference * toctree * gpu inference * agents and tools * gguf/tiktoken * finetune * toctree * trainer * trainer pt 2 * optims * optimizers * accelerate * parallelism * fsdp * update * distributed cpu * hardware training * gpu training * gpu training 2 * peft * distrib debug * deepspeed 1 * deepspeed 2 * chat toctree * quant pt 1 * quant pt 2 * fix toctree * fix * fix * quant pt 3 * quant pt 4 * serialization * torchscript * scripts * tpu * review * model addition timeline * modular * more reviews * reviews * fix toctree * reviews reviews * continue reviews * more reviews * modular transformers * more review * zamba2 * fix * all frameworks * pytorch * supported model frameworks * flashattention * rm check_table * not-doctested.txt * rm check_support_list.py * feedback * updates/feedback * review * feedback * fix * update * feedback * updates * update --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-03-03 10:33:46 -08:00
co63oc	acb8586dd9	Fix some typos in docs (#36502 ) Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-03-03 17:53:53 +00:00
Yoni Gozlan	2c5d038f92	Add Got-OCR 2 Fast image processor and refactor slow one (#36185 ) * refactor image processor slow got ocr * add working image processor fast * fix fast image processor, update doc * use one big loop for processing patches	2025-03-01 00:56:00 -05:00
Fanli Lin	51083d1bac	[docs] fix bug in deepspeed config (#36081 ) bug fix	2025-02-28 07:09:54 -08:00
Nicolas Patry	b4965cecc5	Fixing the docs corresponding to the breaking change in torch 2.6. (#36420 )	2025-02-26 14:11:52 +01:00
Aymeric Roucher	9a217fc327	Deprecate transformers.agents (#36415 )	2025-02-26 11:38:47 +01:00
jiqing-feng	9d6abf9778	enable torchao quantization on CPU (#36146 ) * enable torchao quantization on CPU Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix int4 Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable CPU torchao tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix cuda tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix cpu tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix style Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix cuda tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix torchao available Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix torchao available Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix torchao config cannot convert to json * fix docs Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * rm to_dict to rebase Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * limited torchao version for CPU Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix skip Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/testing_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix cpu test Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-02-25 11:06:52 +01:00
Jerry Zhang	2af272c101	Add autoquant support for torchao quantizer (#35503 ) * Add autoquant support for torchao quantizer Summary: att, also verified that autoquantized model can be saved and loaded: save: https://gist.github.com/jerryzh168/01d367aaf44dbbbfd4068a4a10a00061 load: https://gist.github.com/jerryzh168/d5c6c401b2abdf18e0b6771341f1525c Test Plan: tested locally with above script model uploaded to https://huggingface.co/jerryzh168/llama3-8b-autoquant Reviewers: Subscribers: Tasks: Tags: * add test * ruff fix * ruff reformat * add docs and min_sqnr support * format * format * fix test * update doc * format * remove disable_compile * format	2025-02-24 15:54:16 +01:00
Pavel Iakubovskii	a957b7911a	Add SigLIP 2 (#36323 ) * Docs * Inits * Auto classes * Add siglip base * Add base tests * Fix Siglip V1 for fix res version * Add image processor * Update conversion * Experimenting with vectorized embeddings * Fixup * Add modular Siglip2Processor * Add modular configuration * Rename num patches * Correct image and text features merging * Working conversion script * Refactoring conversion script * Remove unused code in conversion script * Shorten dict a bit * Refactoring conversion * Done conversion refactoring * Fixup * Modular siglip2 * Make model exportable and compilable without graph breaks * Remove position_ids from image_processor * REmove position ids from modeling file * Update modular * Type hint * Fixup * Set defaults to processor * Add integration test * Revert spatial shapes back to tensor * Change order * Fix most of the tests * Fix docstring * Remove interpolate_pos_encoding arg (not needed) * Update docs * Standardize processing * Fix attention_mask in vision head * Siglip v1: remove double transpose in FA2 * Update modular file * Update FA2 test * Update expected logits * Fix interpolation for siglip2 image processor * Skip init test * Skip dispatch on flash test * Fix modeling tests * Fixup * Add dummy objects * Fix some docstrings * Add siglip2 in index.md * Fix consistency * Add docs * Remove size and data format * Add image processor tests * Fix * Add fast image processor * Fix style * Fix * Docs * Set lowercase for tokenizer * Adjust head size for Siglip v1 * Update siglip2 for consistency with siglip1 * Update siglip2 conversion * Update pipeline * Update checkpoints in tests * Update checkpoint name * Fix pooling for image classification model * Fix FA2 test * Update processor * Fix check repo * Update docs * Fix typos * Fix docstring for fast image processor * Add siglip2 to FA2 docs * Fix fast ip tests * Fix constitency * Fix tokenizer class for siglip v1 * Fix missing header * Refactor scaling for clip, siglip, siglip2 * Remove unused imports * Make fast IP default for siglip2 * Update docs * Update checkpoints * Update modular * Update paper link * Fixup * Fix name in toctree * Fix test	2025-02-21 09:04:19 +00:00
Joao Gante	27d1707586	[smolvlm] make CI green (#36306 ) * add smolvlm to toctree * add requirements * dev-ci * no docker changes * dev-ci * update torch-light.dockerfile * derp * dev-ci	2025-02-20 18:56:11 +01:00
12v	5412ff1a13	Fix typo in Pixtral example (#36302 ) Fix typo	2025-02-20 14:13:48 +00:00
Orr Zohar	4397dfcb71	SmolVLM2 (#36126 ) * smolvlm init * updates * fixing bugs * minimal run, no checks * minimal run, no checks * passing first check + adding url support * updating video dataloading logic * fixing image logic * trying modular, but fails * modular is working, changing processor to match PR comments and general transformers logic * fixing kwargs * offloading video loading logic to image_util * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * fixing circleci code formatting errors * update * add idefics3-based tests * add keyword to all * add PreTrainedModel * updateing video loading logic * working inference * updates for PR comments * updates for PR comments * moving SmolVLMPretrainedModel higher to fix import error * CI test pass * CI test pass * removing lambda * CI test pass * CI test pass * CI test pass * CI test pass * CI test pass * CI test pass * processor tests * add example in docs * typo * fix copies * skip compile tests - sdpa for VisionTransformer * fix init * raise import error for num2words * update doc for FA2 * more doc fix * CI * updates for PR comments * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Joshua Lochner <admin@xenova.com> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * fixing processor -- tokenizer not defined properly, (gpt2 tokenizer), and does not have the attributes of fake image token, etc * adding smolvlm to VQA models * removing vqa auto class * Update src/transformers/models/smolvlm/processing_smolvlm.py Co-authored-by: Joshua Lochner <admin@xenova.com> * removing smolvlmvisiontransformer from index.md * my bad, video processing had typos * fixing docs * renaming params in SmolVLMModel.inputs_merger * removing un-needed dtype/device in model forward * ruff for CI * update docs * Update docs/source/en/model_doc/smolvlm.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * return cache position * return cache position * return cache also in modular * needed to run modular again * fix training tests * push vectorized inputs merger * format * format * reduce number of mappings * addressing PR comments * happy CI, happy me :) * skip non-nested images * adjust integration test for smaller GPUs * format * fix kwargs in chat template apply * skip this for now --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Joshua Lochner <admin@xenova.com>	2025-02-20 15:00:26 +01:00
Joao Gante	99adc74462	[tests] remove flax-pt equivalence and cross tests (#36283 )	2025-02-19 15:13:27 +00:00
Joao Gante	0863eef248	[tests] remove `pt_tf` equivalence tests (#36253 )	2025-02-19 11:55:11 +00:00
Mehant Kammakomati	c3ba53303b	feat: add support for tensor parallel training workflow with accelerate (#34194 ) * feat: add support for tensor parallel flow using accelerate Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * fix: add tp degree to env variable Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * fix: add version check for accelerate to allow TP Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * docs: tensor parallelism Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * nit: rename plugin name Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * fix: guard accelerate version before allow tp Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * docs: add more docs and updates related to TP Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> --------- Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-02-18 14:05:46 +01:00
Mayank Mishra	a570e2ba87	add shared experts for upcoming Granite 4.0 language models (#35894 ) * Modular GraniteMoE with shared Experts. Signed-off-by: Shawn Tan <shawntan@ibm.com> * Modified * Import order. * Modified for style * Fix space. * Test * Remove extra granitemoe file. * New converted file and tests * Modified __init__ files. * Formatting. * Dummy PT objects * register granitemoe shared model Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix linting of a file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix import in modeling file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update generated modeling file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * add documentation Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update docstrings Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update generated modeling file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix docstrings in config class Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * merge main Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> --------- Signed-off-by: Shawn Tan <shawntan@ibm.com> Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> Co-authored-by: Shawn Tan <shawntan@ibm.com> Co-authored-by: Shawn Tan <shawn@wtf.sg> Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> Co-authored-by: Sukriti Sharma <Ssukriti@users.noreply.github.com>	2025-02-14 16:55:28 +01:00
Isotr0py	33d1d715b0	Add ImageProcessorFast to Qwen2.5-VL processor (#36164 ) * add qwen2 fast image processor to modular file Signed-off-by: isotr0py <2037008807@qq.com> * fix modular Signed-off-by: isotr0py <2037008807@qq.com> * fix circle import Signed-off-by: isotr0py <2037008807@qq.com> * add docs Signed-off-by: isotr0py <2037008807@qq.com> * fix typo Signed-off-by: isotr0py <2037008807@qq.com> * add modular generated files Signed-off-by: isotr0py <2037008807@qq.com> * revert qwen2vl fast image processor Signed-off-by: isotr0py <2037008807@qq.com> * remove qwen2.5-vl image processor from modular Signed-off-by: isotr0py <2037008807@qq.com> * re-generate qwen2.5-vl files Signed-off-by: isotr0py <2037008807@qq.com> * remove unnecessary test Signed-off-by: isotr0py <2037008807@qq.com> * fix auto map Signed-off-by: isotr0py <2037008807@qq.com> * cleanup Signed-off-by: isotr0py <2037008807@qq.com> * fix model_input_names Signed-off-by: isotr0py <2037008807@qq.com> * remove import Signed-off-by: isotr0py <2037008807@qq.com> * make fix-copies Signed-off-by: isotr0py <2037008807@qq.com> --------- Signed-off-by: isotr0py <2037008807@qq.com>	2025-02-14 17:34:55 +08:00
Raushan Turganbay	1931a35140	Chat template docs (#36163 ) * decompose chat template docs * add docs * update model docs * qwen2-5 * pixtral * remove old chat template * also video as list frames supported * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_template_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * remove audio for now --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-02-14 10:32:14 +01:00
Elvir Crnčević	845b0a2616	Efficient Inference Kernel for SpQR (#34976 ) * Resolve vptq conflict * Rename spqr package to spqr_quant * Get rid of aqlm mention * Start working on tests * Resolve ruff code checks * Ruff format * Isort * Test updates * Add gpu tag * Rename to modules_to_not_convert * Config update * Docs and config update * Docs and config update * Update to update_torch_dtype * spqr config parameter validation * Ruff update * Apply ruff fixes * Test fixes * Ruff update * Mark tests as @slow again; Ruff; Docstring update * Ruff * Remove absolute path * Resolve typo * Remove redundandt log * Check accelerate/spqr availability * Ruff fix * Check if the config contains proper shapes * Ruff test * Documentation update * overview update * Ruff checks * Ruff code quality * Make style * Update docs/source/en/quantization/spqr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update spqr.md * Enable gptqmodel (#35012) * gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass *kwargs limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass *kwargs add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version (#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (#10) * review: update docs (#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review * doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix : Nemotron Processor in GGUF conversion (#35708) * fixing nemotron processor * make style * Update docs/source/en/quantization/spqr.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add missing TOC to doc --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-02-13 16:22:58 +01:00
Mohamed Mekkouri	b41591d847	Fix : fix doc fp8 (#36173 ) * fix * fix	2025-02-13 15:29:59 +01:00
Mohamed Mekkouri	efe72fe21f	Adding FP8 Quantization to transformers (#36026 ) * first commit * adding kernels * fix create_quantized_param * fix quantization logic * end2end * fix style * fix imports * fix consistency * update * fix style * update * udpate after review * make style * update * update * fix * update * fix docstring * update * update after review * update * fix scheme * update * update * fix * update * fix docstring * add source * fix test --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-02-13 13:01:19 +01:00
Lysandre Debut	c82319b493	Helium documentation fixes (#36170 ) * Helium documentation fixes * Update helium.md * Update helium.md * Update helium.md	2025-02-13 12:20:53 +01:00
Thomas Bauwens	8f137b2427	Move `DataCollatorForMultipleChoice` from the docs to the package (#34763 ) * Add implementation for DataCollatorForMultipleChoice based on docs. * Add DataCollatorForMultipleChoice to import structure. * Remove custom DataCollatorForMultipleChoice implementations from example scripts. * Remove custom implementations of DataCollatorForMultipleChoice from docs in English, Spanish, Japanese and Korean. * Refactor torch version of DataCollatorForMultipleChoice to be more easily understandable. * Apply suggested changes and run make fixup. * fix copies, style and fixup * add missing documentation * nits * fix docstring * style * nits * isort --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2025-02-13 12:01:28 +01:00
Ke Wen	f869d486d3	Update doc re list of models supporting TP (#35864 ) Update doc about models' TP support	2025-02-12 15:53:27 +01:00
zhuHQ	08c4959a23	Optim: APOLLO optimizer integration (#36062 ) * Added APOLLO optimizer integration * fix comment * Remove redundancy: Modularize low-rank optimizer construction * Remove redundancy: Remove useless comment * Fix comment: Add typing * Fix comment: Rewrite apollo desc	2025-02-12 15:33:43 +01:00
Sambhav Dixit	d6897b46bd	Add utility for Reload Transformers imports cache for development workflow #35508 (#35858 ) * Reload transformers fix form cache * add imports * add test fn for clearing import cache * ruff fix to core import logic * ruff fix to test file * fixup for imports * fixup for test * lru restore * test check * fix style changes * added documentation for usecase * fixing --------- Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>	2025-02-12 12:45:11 +01:00
nhamanasu	377d8e2b9c	add RAdamScheduleFree optimizer (#35313 ) * add RAdamScheduleFree optimizer * revert schedulefree version to the minimum requirement * refine is_schedulefree_available so that it can take min_version * refine documents --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-02-12 11:31:51 +01:00
Fanli Lin	11afab19c0	[docs] update awq doc (#36079 ) * update awq doc * Update docs/source/en/quantization/awq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/awq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/awq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/awq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add note for inference --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-02-11 10:35:28 -08:00
Armaghan Shakir	9a6be63fdb	Add Apple's Depth-Pro for depth estimation (#34583 ) * implement config and model building blocks * refactor model architechture * update model outputs * update init param to include use_fov_model * update param name in config * fix hidden_states and attentions outputs for fov * sort config * complete minor todos * update patching * update config for encoder * fix config * use correct defaults in config * update merge for compatibility with different image size * restructure encoder for custom configuration * make fov model compatible with custom config * replace word "decoder" with "fusion" * weight conversion script * fix fov squeeze * update conversion script (without test) * upload ruff image processing * create fast image processing * use torch interpolation for image processing * complete post_process_depth_estimation * config: fix imports and sort args * apply inference in weight conversion * use mllama script instead for weight conversion * clean weight conversion script * add depth-pro status in other files * fill docstring in config * formatting * more formatting * formatting with ruff * formatting with style * fix copied classes * add examples; update weight convert script * fix using check_table.py and isort * fix config docstring * add depth pro to sdpa docs * undo unintentional changes in configuration_gemma.py * minor fixes * test image processing * fixes and tests * more fixes * use output states from image_encoder instead * Revert "use output states from image_encoder instead" This reverts commit `2408ec54e4`. * make embeddings dynamic * reshape output hidden states and attentions as part of computation graph * fix ruff formating * fix docstring failure * use num_fov_head_layers in tests * update doc * check consistency with config * ruff formatting * update test case * fix ruff formatting * add tests for fov * use interpolation in postprocess * run and fix slow tests locally * use scaled_images_features for image and fov encoder * return fused_hidden_states in fusion stage * fix example * fix ruff * fix copyright license for all files * add __all__ for each file * minor fixes - fix download spell - add push_to_hub option - fix Optional type hinting - apply single loop for DepthProImageProcessor.preprocess * return list in post_process_depth_estimation * minor fixes - capitalize start of docstring - use ignore copy - fix examples - move docstring templates and custom output classes to top - remove "-> None" typehinting from __init__ - type hinting for forward passes - fix docstrings for custom output classes * fix "ruff check" * update upsample and projection * major changes: (image size and merge optimization) - add support for images of any size - optimize merge operation - remove image_size from config - use full names instead of B, C, H, W - remove interpolation from fusion stage - add interpolation after merge - move validations to config - update integration test - add type hints for functions * fix push_to_hub option in weights conversion * remove image_size in weights conversion * major changes in the architecture - remove all DepthProViT modules and support different backbones using the AutoModel API - set default use_fov_model to False - validate parameters in configuration - update interpolate function: use "nearest" for faster computation - update reshape_feature function: remove all special tokens, possible from different backbones - update merge function: use padding from config instead of merge_out_size - remove patch_to_batch and batch_to_patch conversions for now - calculate out_size dynamically in the encoder - leave head_mask calculation to the backbone - fix bugs with merge - add more comments - update tests * placeholder for unused config attributes * improve docs amid review * minor change in docs * further optimize merge * fix formatting * remove unused patch/batch convertion functions * use original F.interpolate * improve function naming * minor chages - use torch_int instead of int - use proper for newly initialized tensors - use user provided return_dict for patch_encoder - use if-else block instead in self.use_fov_model * rearchitect upsample block for improved modularity * update upsample keys in weight conversion * improve padding in merge_patches * use double-loop for merge * update comments * create feature_extractor, reduce some forward code * introduce config.use_mask_token in dinov2 * minor fixes * minor fixes for onnx * update __init__ to latest format * remove DepthProConfig.to_dict() * major changes in backbone * update config in weight conversion * formatting * converted model is fp32 * improve naming and docs for feature_extractor->reconstruct_feature_maps * minor fixes; amid review * create intermediate vars in func call * use torch.testing.assert_close * use ModuleList instead of Sequential and ModuleDict * update docs * include fov in integraiton tests * update docs * improve initialization of convolution layers * fix unused fov keys * update tests * ruff format * fix test, amid kaimming initialization * add depthpro to toctree * add residual layer to _no_split_modules * architecture rework * Update src/transformers/models/depth_pro/image_processing_depth_pro.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/depth_pro/image_processing_depth_pro_fast.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * update docs * improve merge_patches * use flatten with fov_output * ruff formatting * update resources section in docs Co-authored-by: 
Pavel Iakubovskii <qubvel@gmail.com> * fix typo "final_kernal_size" Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix output typehint for DepthProDepthEstimator Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * residual operation in 2 steps Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * use image_size instead of global patch_size in interpolation * replace all Sequential with ModuleList * update fov * update heads * fix and update conversion script for heads * ruff formatting * remove float32 conversion * use "Fov" instead of "FOV" in class names * use "Fov" instead of "FOV" in config docs * remove prune_heads * update fusion stage * use device in examples * update processor * ruff fixes * add do_rescale in image_processor_dict * skip test: test_fast_is_faster_than_slow * ruff formatting * DepthProImageProcessorFast in other files * revert antialias removal * add antialias in BaseImageProcessorFast * Revert "revert antialias removal" This reverts commit `5caa0bd8f9`. * Revert "add antialias in BaseImageProcessorFast" This reverts commit `3ae1134780`. * update processor for grouping and antialias * try test_fast_is_faster_than_slow without "skip" or "flanky" * update checkpoint * update checkpoint * use @is_flanky for processor test * update checkpoint to "apple/DepthPro-hf" --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-02-10 11:32:45 +00:00
Fanli Lin	6b55046213	[docs] fix not-working example code in `perf_infer_gpu_one.md` (#36087 ) * bug fix * update memory limit	2025-02-07 12:42:22 -08:00
Fanli Lin	14ca7f1452	[docs] fix typo (#36080 ) typo fix	2025-02-07 12:42:09 -08:00
Fanli Lin	c361b1e3d9	[docs] fix model checkpoint name (#36075 ) update model name	2025-02-07 12:41:52 -08:00
Jade Choghari	006d9249ec	Adding RT-DETRv2 for object detection (#34773 ) * cookiecutter add rtdetrv2 * make modular working * working modelgit add . * working modelgit add . * finalize moduar inheritence * finalize moduar inheritence * Update src/transformers/models/rtdetrv2/modular_rtdetrv2.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * update modular and add rename * remove output ckpt * define loss_kwargs * fix CamelCase naming * fix naming + files * fix modular and convert file * additional changes * fix modular * fix import error (switch to lazy) * fix autobackbone * make style * add * update testing * fix loss * remove old folder * fix testing for v2 * update docstring * fix docstring * add resnetv2 (with modular bug to fix) * remove resnetv2 backbone * fix changes * small fixes * remove rtdetrv2resnetconfig * add rtdetrv2 name to convert * make style * Update docs/source/en/model_doc/rt_detr_v2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix modular typo after review * add reviewed changes * add final review changes * Update docs/source/en/model_doc/rt_detr_v2.md Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/rt_detr_v2/__init__.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/rt_detr_v2/convert_rt_detr_v2_weights_to_hf.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * add review changes * remove rtdetrv2 resnet * removing this weird project change * change ckpt name from jadechoghari to author * implement review and update testing * update naming and remove wrong ckpt * name * make fix-copies * Fix RT-DETR loss * Add resources, fix name * Fix repo in 
docs * Fix table name --------- Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: qubvel <qubvel@gmail.com>	2025-02-06 19:28:45 +00:00
Fanli Lin	6246c03260	[docs] fix outdated example code in `trainer.md` (#36066 ) fix bugs	2025-02-06 10:54:22 -08:00
Fanli Lin	531d1511f5	[docs] no hard-coding cuda (#36043 ) make device-agnostic	2025-02-05 08:22:33 -08:00
Fanli Lin	7399f8021e	[docs] fix bugs in the bitsandbytes documentation (#35868 ) * fix doc * update model	2025-02-05 08:21:20 -08:00
Fanli Lin	0a1a8e3c7e	[docs] no hard coding cuda as bnb has multi-backend support (#35867 ) * change cuda to DEVICE * Update docs/source/en/llm_tutorial.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-02-05 08:20:02 -08:00
Stas Bekman	9dc1efa5d4	DeepSpeed github repo move sync (#36021 ) deepspeed github repo move	2025-02-05 08:19:31 -08:00
Yoni Gozlan	fa56dcc2ab	Refactoring of ImageProcessorFast (#35069 ) * add init and base image processing functions * add add_fast_image_processor to transformers-cli * add working fast image processor clip * add fast image processor to doc, working tests * remove "to be implemented" SigLip * fix unprotected import * fix unprotected vision import * update ViTImageProcessorFast * increase threshold slow fast ewuivalence * add fast img blip * add fast class in tests with cli * improve cli * add fast image processor convnext * add LlavaPatchingMixin and fast image processor for llava_next and llava_onevision * add device kwarg to ImagesKwargs for fast processing on cuda * cleanup * fix unprotected import * group images by sizes and add batch processing * Add batch equivalence tests, skip when center_crop is used * cleanup * update init and cli * fix-copies * refactor convnext, cleanup base * fix * remove patching mixins, add piped torchvision transforms for ViT * fix unbatched processing * fix f strings * protect imports * change llava onevision to class transforms (test) * fix convnext * improve formatting (following Pavel review) * fix handling device arg * improve cli * fix * fix inits * Add distinction between preprocess and _preprocess, and support for arbitrary kwargs through valid_extra_kwargs * uniformize qwen2_vl fast * fix docstrings * add add fast image processor llava * remove min_pixels max_pixels from accepted size * nit * nit * refactor fast image processors docstrings * cleanup and remove fast class transforms * update add fast image processor transformers cli * cleanup docstring * uniformize pixtral fast and make _process_image explicit * fix prepare image structure llava next/onevision * Use typed kwargs instead of explicit args * nit fix import Unpack * clearly separate pops and gets in base preprocess. Use explicit typed kwargs * make qwen2_vl preprocess arguments hashable	2025-02-04 17:52:31 -05:00
David	8d73a38606	Add DAB-DETR for object detection (#30803 ) * initial commit * encoder+decoder layer changes WIP * architecture checks * working version of detection + segmentation * fix modeling outputs * fix return dict + output att/hs * found the position embedding masking bug * pre-training version * added iamge processors * typo in init.py * iterupdate set to false * fixed num_labels in class_output linear layer bias init * multihead attention shape fixes * test improvements * test update * dab-detr model_doc update * dab-detr model_doc update2 * test fix:test_retain_grad_hidden_states_attentions * config file clean and renaming variables * config file clean and renaming variables fix * updated convert_to_hf file * small fixes * style and qulity checks * return_dict fix * Merge branch main into add_dab_detr * small comment fix * skip test_inputs_embeds test * image processor updates + image processor test updates * check copies test fix update * updates for check_copies.py test * updates for check_copies.py test2 * tied weights fix * fixed image processing tests and fixed shared weights issues * added numpy nd array option to get_Expected_values method in test_image_processing_dab_detr.py * delete prints from test file * SafeTensor modification to solve HF Trainer issue * removing the safetensor modifications * make fix copies and hf uplaod has been added. 
* fixed index.md * fixed repo consistency * styel fix and dabdetrimageprocessor docstring update * requested modifications after the first review * Update src/transformers/models/dab_detr/image_processing_dab_detr.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * repo consistency has been fixed * update copied NestedTensor function after main merge * Update src/transformers/models/dab_detr/modeling_dab_detr.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * temp commit * temp commit2 * temp commit 3 * unit tests are fixed * fixed repo consistency * updated expected_boxes varible values based on related notebook results in DABDETRIntegrationTests file. * temporarialy config modifications and repo consistency fixes * Put dilation parameter back to config * pattern embeddings have been added to the rename_keys method * add dilation comment to config + add as an exception in check_config_attributes SPECIAL CASES * delete FeatureExtractor part from docs.md * requested modifications in modeling_dab_detr.py * [run_slow] dab_detr * deleted last segmentation code part, updated conversion script and changed the hf path in test files * temp commit of requested modifications * temp commit of requested modifications 2 * updated config file, resolved codepaths and refactored conversion script * updated decodelayer block types and refactored conversion script * style and quality update * small modifications based on the request * attentions are refactored * removed loss functions from modeling file, added loss function to lossutils, tried to move the MLP layer generation to config but it failed * deleted imageprocessor * fixed conversion script + quality and style * fixed config_att * [run_slow] dab_detr * changing model path in conversion file and in test file * fix Decoder variable naming * testing the old loss function * switched back to the new loss function and testing with the odl attention functions * switched back to the new last good result modeling file * 
moved back to the version when I asked the review * missing new line at the end of the file * old version test * turn back to newest mdoel versino but change image processor * style fix * style fix after merge main * [run_slow] dab_detr * [run_slow] dab_detr * added device and type for head bias data part * [run_slow] dab_detr * fixed model head bias data fill * changed test_inference_object_detection_head assertTrues to torch test assert_close * fixes part 1 * quality update * self.bbox_embed in decoder has been restored * changed Assert true torch closeall methods to torch testing assertclose * modelcard markdown file has been updated * deleted intemediate list from decoder module --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-02-04 17:28:27 +00:00
Ryoo Kwangrok	b1954fd64a	layernorm_decay_fix (#35927 ) * layernorm_decay_fix * W293 fix * ruff format fix * black format * ruff format * erase last layer * add test_get_parameter_names_rmsnorm * rmsnorm fix	2025-02-04 11:01:49 +01:00
Alex Brooks	e284c7e954	Update Granite Vision Model Path / Tests (#35998 ) * Update granite vision model path Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * Enable granite vision test Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> --------- Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>	2025-02-03 20:06:03 +01:00
Yoni Gozlan	2b46943195	Add GOT-OCR 2.0 to Transformers (#34721 ) * init modular got_ocr2 * Get correct got_ocr architecture * add processing * run modular with processing * add working inference * apply modular * Refactor and fix style * Refactor, cleanup, fix style * fix init order * Fix docs * add base modeling tests * fix style and consistency * rename doc file * fix repo consistency * fix inference with box * add image processing and support for crop_to_multi_page * Fix batch inference * add tests * fixup * fix slow test * fix docstrings * Add model doc * update to new init * fix input autocast pixel_values dtype * update doc * move doc to multimodal * Reformat crop_image_to_patches and add docstrings * Fix example in forward docstring * Address Pablo review * [run slow] got_ocr2 * remove defaults defined twice * apply modular * add torch_device to integration tests * update modular * follow-up Pavel review * add device variable in doc * fix doc multi-page * Force eager attention for vision encoder to avoid attn implementation conflict * revert qwen2vl doc changes * use Qwen2ForCausalLM instead of Qwen2Model * make fixup * refactor gotocr2 to llava style * uniformize function names and reduce checks * final nits * fix pixel_values dtype error * change checkpoint names * fix modular	2025-01-31 11:28:13 -05:00
Ella Charlaix	61cbb723fc	Remove INC notebook reference in documentation (#35936 ) remove INC notebook in documentation	2025-01-28 17:10:02 +01:00
Joao Gante	ece8c42488	Test: generate with `torch.compile(model.forward)` as a fast test (#34544 )	2025-01-28 14:10:38 +00:00
Steven Liu	86d7564611	[docs] Fix Zamba2 (#35916 ) fix code block	2025-01-27 11:44:10 -08:00
Matt	414658f94f	Close Zamba2Config code block (#35914 ) * close zamba2 code block * Add Zamba2 to toctree	2025-01-27 19:09:42 +00:00
Steven Liu	c550a1c640	[docs] uv install (#35821 ) uv install	2025-01-27 08:49:28 -08:00
pglorio	33cb1f7b61	Add Zamba2 (#34517 ) * First commit * Finish model implementation * First commit * Finish model implementation * Register zamba2 * generated modeling and configuration * generated modeling and configuration * added hybrid cache * fix attention_mask in mamba * dropped unused loras * fix flash2 * config docstrings * fix config and fwd pass * make fixup fixes * text_modeling_zamba2 * small fixes * make fixup fixes * Fix modular model converter * added inheritances in modular, renamed zamba cache * modular rebase * new modular conversion * fix generated modeling file * fixed import for Zamba2RMSNormGated * modular file cleanup * make fixup and model tests * dropped inheritance for Zamba2PreTrainedModel * make fixup and unit tests * Add inheritance of rope from GemmaRotaryEmbedding * moved rope to model init * drop del self.self_attn and del self.feed_forward * fix tests * renamed lora -> adapter * rewrote adapter implementation * fixed tests * Fix torch_forward in mamba2 layer * Fix torch_forward in mamba2 layer * Fix torch_forward in mamba2 layer * Dropped adapter in-place sum * removed rope from attention init * updated rope * created get_layers method * make fixup fix * make fixup fixes * make fixup fixes * update to new attention standard * update to new attention standard * make fixup fixes * minor fixes * cache_position * removed cache_position postion_ids use_cache * remove config from modular * removed config from modular (2) * import apply_rotary_pos_emb from llama * fixed rope_kwargs * Instantiate cache in Zamba2Model * fix cache * fix @slow decorator * small fix in modular file * Update docs/source/en/model_doc/zamba2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * several minor fixes * inherit mamba2decoder fwd and drop position_ids in mamba * removed docstrings from modular * reinstate zamba2 attention decoder fwd * use regex for tied keys * Revert "use regex for tied keys" This reverts commit `9007a522b1`. 
* use regex for tied keys * add cpu to slow forward tests * dropped config.use_shared_mlp_adapter * Update docs/source/en/model_doc/zamba2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * re-convert from modular --------- Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-27 10:51:23 +01:00
Steven Liu	f11f57c925	[doctest] Fixes (#35863 ) doctest fixes	2025-01-26 15:26:38 -08:00
Yosshi999	045c02f209	[DOC] Fix contamination and missing paragraph in translation (#35851 ) Fix contamination and missing paragraph in translation	2025-01-23 08:33:44 -08:00
Alex Brooks	71cc8161b2	Granite Vision Support (#35579 ) * Add multimodal granite support Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Support multiple image feature layres Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Remove failing validation for visual encoders with no cls Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update llava based models / configs to support list of feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add tests for multiple feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use conditional instead of except for misaligned feature shapes Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * crop cls from each hidden state Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * Fix formatting Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Support single vision feature int in vipllava Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix typo in vision feature selection strategy validation Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * Add tentative integration test for granite vision models Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * Add granite vision docs Replace multimodal granite refs with granite vision Add granite vision / llava next alias Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * Use image url in granitevision example Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> --------- Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>	2025-01-23 17:15:52 +01:00
ShuaiBai623	f3f6c86582	add qwen2.5vl (#35569 ) * add qwen2.5vl * fix * pass check table * add modular file * fix style * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * padd copy check * use modular * fix * fix * fix * update flashatt2&sdpa support_list * Update docs/source/en/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update config * update * fix hf path * rename Qwen2_5_VLVideosKwargs * fix * fix * update * excuted modular * rollback init * fix * formated * simpler init * fix * fix * fix * fix * fix * update docs * fix * fix * update Qwen2VLRotaryEmbedding for yarn * fix --------- Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: gewenbin0992 <gewenbin292@163.com> Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com>	2025-01-23 11:23:00 +01:00
Joao Gante	62bd83947a	[chat] docs fix (#35840 ) docs fix	2025-01-22 14:32:27 +00:00
Joao Gante	b3d6722469	[Chat] Add Chat from TRL 🐈 (#35714 ) * tmp commit * add working chat * add docts * docs 2 * use auto dtype by default	2025-01-22 13:30:12 +00:00
Joao Gante	90b46e983f	Remove old `benchmark` code (#35730 ) * remove traces of the old deprecated benchmarks * also remove old tf benchmark example, which uses deleted code * run doc builder	2025-01-21 17:56:43 +00:00
Cyril Vallez	8ac851b0b3	Improve modular documentation (#35737 ) * start a nice doc * keep improving the doc * Finalize doc * Update modular_transformers.md * apply suggestion	2025-01-21 17:53:30 +01:00
Yoni Gozlan	107f9f5127	add Qwen2-VL image processor fast (#35733 ) * add qwen2_vl image processor fast * add device to ImagesKwargs * remove automatic fix copies * fix fast_is_faster_than_slow * remove unnecessary import	2025-01-21 11:49:05 -05:00
eustlb	3df90103b8	move fastspeech to audio models (#35788 )	2025-01-21 08:32:09 -08:00
Ahmed Almaghz	741d55237a	[i18n-ar] Translated file: `docs/source/ar/tasks/masked_language_modeling.md` into Arabic (#35198 ) * Add Arabic translation: masked_language_modeling.md * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/masked_language_modeling.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update _toctree.yml * Update _toctree.yml * Add language_modeling.md * Add 
Sequence_classifiation.md * Update _toctree.yml --------- Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>	2025-01-21 08:29:58 -08:00
Aritra Roy Gosthipaty	edbabf6b82	[Doc] Adding blog post to model doc for `TimmWrapper` (#35744 ) * adding blog post to model doc * Update docs/source/en/model_doc/timm_wrapper.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * review suggestions * review suggestions --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-21 12:32:39 +00:00
NielsRogge	78f5ee0217	Add LlavaImageProcessor (#33191 ) * First draft * Add equivalence test * Update docstrings * Add tests * Use numpy * Fix tests * Improve variable names * Improve docstring * Add link * Remove script * Add copied from * Address comment * Add note in docs * Add docstring, data format * Improve test * Add test * update * Update src/transformers/models/llava/image_processing_llava.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/llava/image_processing_llava.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * loop once only --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-01-21 12:47:04 +01:00
eustlb	5f0f4b1b93	Patch moonshine (#35731 ) * udpate expected logits for T4 runners * update doc * correct order of the args for better readability * remove generate wrap * convert modular	2025-01-20 16:19:29 +01:00
StevenBucaille	abe57b6f17	Add SuperGlue model (#29886 ) * Initial commit with template code generated by transformers-cli * Multiple additions to SuperGlue implementation : - Added the SuperGlueConfig - Added the SuperGlueModel and its implementation - Added basic weight conversion script - Added new ImageMatchingOutput dataclass * Few changes for SuperGlue * Multiple changes : - Added keypoint detection config to SuperGlueConfig - Completed convert_superglue_to_pytorch and succesfully run inference * Reverted unintentional change * Multiple changes : - Added SuperGlue to a bunch of places - Divided SuperGlue into SuperGlueForImageMatching and SuperGlueModel - Added testing images * Moved things in init files * Added docs (to be finished depending on the final implementation) * Added necessary imports and some doc * Removed unnecessary import * Fixed make fix-copies bug and ran it * Deleted SuperGlueModel Fixed convert script * Added SuperGlueImageProcessor * Changed SuperGlue to support batching pairs of images and modified ImageMatchingOutput in consequences * Changed convert_superglue_to_hf.py script to experiment different ways of reading an image and seeing its impact on performances * Added initial tests for SuperGlueImageProcessor * Added AutoModelForImageMatching in missing places and tests * Fixed keypoint_detector_output instructions * Fix style * Adapted to latest main changes * Added integration test * Fixed bugs to pass tests * Added keypoints returned by keypoint detector in the output of SuperGlue * Added doc to SuperGlue * SuperGlue returning all attention and hidden states for a fixed number of keypoints * Make style * Changed SuperGlueImageProcessor tests * Revert "SuperGlue returning all attention and hidden states for a fixed number of keypoints" Changed tests accordingly This reverts commit 5b3b669c * Added back hidden_states and attentions masked outputs with tests * Renamed ImageMatching occurences into KeypointMatching * Changed 
SuperGlueImageProcessor to raise error when batch_size is not even * Added docs and clarity to hidden state and attention grouping function * Fixed some code and done refactoring * Fixed typo in SuperPoint output doc * Fixed some of the formatting and variable naming problems * Removed useless function call * Removed AutoModelForKeypointMatching * Fixed SuperGlueImageProcessor to only accept paris of images * Added more fixes to SuperGlueImageProcessor * Simplified the batching of attention and hidden states * Simplified stack functions * Moved attention instructions into class * Removed unused do_batch_norm argument * Moved weight initialization to the proper place * Replaced deepcopy for instantiation * Fixed small bug * Changed from stevenbucaille to magic-leap repo * Renamed London Bridge images to Tower Bridge * Fixed formatting * Renamed remaining "london" to "tower" * Apply suggestions from code review Small changes in the docs Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Added AutoModelForKeypointMatching * Changed images used in example * Several changes to image_processing_superglue and style * Fixed resample type hint * Changed SuperGlueImageProcessor and added test case for list of 2 images * Changed list_of_tuples implementation * Fix in dummy objects * Added normalize_keypoint, log_sinkhorn_iterations and log_optimal_transport docstring * Added missing docstring * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Moved forward block at bottom * Added docstring to forward method * Added docstring to match_image_pair method * Changed test_model_common_attributes to test_model_get_set_embeddings test method signature * Removed AutoModelForKeypointMatching * Removed image fixtures and added load_dataset * Added padding of images in 
SuperGlueImageProcessor * Cleaned up convert_superglue_to_hf script * Added missing docs and fixed unused argument * Fixed SuperGlueImageProcessor tests * Transposed all hidden states from SuperGlue to reflect the standard (..., seq_len, feature_dim) shape * Added SuperGlueForKeypointMatching back to modeling_auto * Fixed image processor padding test * Changed SuperGlue docs * changes: - Abstraction to batch, concat and stack of inconsistent tensors - Changed conv1d's to linears to match standard attention implementations - Renamed all tensors to be tensor0 and not tensor_0 and be consistent - Changed match image pair to run keypoint detection on all image first, create batching tensors and then filling these tensors matches after matches - Various changes in docs, etc * Changes to SuperGlueImageProcessor: - Reworked the input image pairs checking function and added tests accordingly - Added Copied from statements - Added do_grayscale tag (also for SuperPointImageProcessor) - Misc changes for better code * Formatting changes * Reverted conv1d to linear conversion because of numerical differences * fix: changed some code to be more straightforward (e.g. 
filtering keypoints) and converted plot from opencv to matplotlib * fix: removed unnecessary test * chore: removed commented code and added back hidden states transpositions * chore: changed from "inconsistent" to "ragged" function names as suggested Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * docs: applied suggestions Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * docs: updated to display matched output * chore: applied suggestion for check_image_pairs_input function Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * chore: changed check_image_pairs_input function name to validate_and_format_image_pairs and used validate_preprocess_arguments function * tests: simplified tests for image input format and shapes * feat: converted SuperGlue's use of Conv1d with kernel_size of 1 with Linear layers. Changed tests and conversion script accordingly * feat: several changes to address comments Conversion script: - Reverted fuse batchnorm to linear conversion - Changed all 'nn.Module' to respective SuperGlue models - Changed conversion script to use regex mapping and match other recent scripts Modeling SuperGlue: - Added batching with mask and padding to attention - Removed unnecessary concat, stack and batch ragged pairs functions - Reverted batchnorm layer - Renamed query, key, value and merge layers into q, k, v, out proj - Removed Union of different Module into nn.Module in _init_weights method typehint - Changed several method's signature to combine image0 and image1 inputs with appropriate doc changes - Updated SuperGlue's doc with torch.no_grad() Updated test to reflect changes in SuperGlue model * refactor: changed validate_and_format_image_pairs function with clarity * refactor: changed from one SuperGlueMLP class to a list of SuperGlueMLP class * fix: fixed forgotten init weight change from last commit * fix: fixed rebase mistake * fix: removed leftover commented code * fix: 
added typehint and changed some of arguments default values * fix: fixed attribute default values for SuperGlueConfig * feat: added SuperGlueImageProcessor post process keypoint matching method with tests * fix: fixed SuperGlue attention and hidden state tuples aggregation * chore: fixed mask optionality and reordered tensor reshapes to be cleaner * chore: fixed docs and error message returned in validate_and_format_image_pairs function * fix: fixed returned keypoints to be the ones that SuperPoint returns * fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue * fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue (bis) * fix: Changed SuperGlueMultiLayerPerceptron instantiation to avoid if statement * fix: Changed convert_superglue_to_hf script to reflect latest SuperGlue changes and got rid of nn.Modules * WIP: implement Attention from an existing class (like BERT) * docs: Changed docs to include more appealing matching plot * WIP: Implement Attention * chore: minor typehint change * chore: changed convert superglue script by removing all classes and apply conv to linear conversion in state dict + rearrange keys to comply with changes in model's layers organisation * Revert "Fixed typo in SuperPoint output doc" This reverts commit `2120390e82`. 
* chore: added comments in SuperGlueImageProcessor * chore: changed SuperGlue organization HF repo to magic-leap-community * [run-slow] refactor: small change in layer instantiation * [run-slow] chore: replaced remaining stevenbucaille org to magic-leap-community * [run-slow] chore: make style * chore: update image matching fixture dataset HF repository * [run-slow] superglue * tests: overwriting test_batching_equivalence * [run-slow] superglue * tests: changed test to cope with value changing depending on cuda version * [run-slow] superglue * tests: changed matching_threshold value * [run-slow] superglue * [run-slow] superglue * tests: changed tests for integration * [run-slow] superglue * fix: Changed tensor view and permutations to match original implementation results * fix: updated convert script and integration test to include last change in model * fix: increase tolerance for CUDA variances * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * [run-slow] superglue * chore: removed blank whitespaces * [run-slow] superglue * Revert SuperPoint image processor accident changes * [run-slow] superglue * refactor: reverted copy from BERT class * tests: lower the tolerance in integration tests for SuperGlue * [run-slow] superglue * chore: set do_grayscale to False in SuperPoint and SuperGlue image processors * [run-slow] superglue * fix: fixed imports in SuperGlue files * chore: changed do_grayscale SuperGlueImageProcessing default value to True * docs: added typehint to post_process_keypoint_matching method in SuperGlueImageProcessor * fix: set matching_threshold default value to 0.0 instead of 0.2 * feat: added matching_threshold to post_process_keypoint_matching method * docs: update superglue.md to include matching_threshold parameter * docs: updated SuperGlueConfig docstring for matching_threshold default value * refactor: removed unnecessary parameters in SuperGlueConfig * fix: changed from matching_threshold to threshold 
* fix: re-revert changes to make SuperGlue attention classes copies of BERT * [run-slow] superglue * fix: added missing device argument in post_processing method * [run-slow] superglue * fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching (and docstring) * fix: add device to image_sizes tensor instantiation * tests: added checks on do_grayscale test * chore: reordered and added Optional typehint to KeypointMatchingOutput * LightGluePR suggestions: - use `post_process_keypoint_matching` as default docs example - add `post_process_keypoint_matching` in autodoc - add `SuperPointConfig` import under TYPE_CHECKING condition - format SuperGlueConfig docstring - add device in convert_superglue_to_hf - Fix typo - Fix KeypointMatchingOutput docstring - Removed unnecessary line - Added missing SuperGlueConfig in __init__ methods * LightGluePR suggestions: - use batching to get keypoint detection * refactor: processing images done in 1 for loop instead of 4 * fix: use @ instead of torch.einsum for scores computation * style: added #fmt skip to long tensor values * refactor: rollbacked validate_and_format_image_pairs valid and invalid case to more simple ones * refactor: prepare_imgs * refactor: simplified `validate_and_format_image_pairs` * docs: fixed doc --------- Co-authored-by: steven <steven.bucaillle@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-01-20 10:32:39 +00:00
NielsRogge	872dfbdd46	[ViTPose] Convert more checkpoints (#35638 ) * Convert more checkpoints * Update docs, convert huge variant * Update model name * Update src/transformers/models/vitpose/modeling_vitpose.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Remove print statements * Update docs/source/en/model_doc/vitpose.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Link to collection --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-20 11:29:47 +01:00
Raushan Turganbay	8571bb145a	Fix CI for VLMs (#35690 ) * fix some easy test * more tests * remove logit check here also * add require_torch_large_gpu in Emu3	2025-01-20 11:15:39 +01:00
Pavel Iakubovskii	099d93d2e9	Grounding DINO Processor standardization (#34853 ) * Add input ids to model output * Add text preprocessing for processor * Fix snippet * Add test for equivalence * Add type checking guard * Fixing typehint * Fix test for added `input_ids` in output * Add deprecations and "text_labels" to output * Adjust tests * Fix test * Update code examples * Minor docs and code improvement * Remove one-liner functions and rename class to CamelCase * Update docstring * Fixup	2025-01-17 14:18:16 +00:00
Pavel Iakubovskii	42b2857b01	OmDet Turbo processor standardization (#34937 ) * Fix docstring * Fix docstring * Add `classes_structure` to model output * Update omdet postprocessing * Adjust tests * Update code example in docs * Add deprecation to "classes" key in output * Types, docs * Fixing test * Fix missed clip_boxes * [run-slow] omdet_turbo * Apply suggestions from code review Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Make CamelCase class --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-01-17 14:10:19 +00:00
Pavel Iakubovskii	94ae9a8da1	OwlViT/Owlv2 post processing standardization (#34929 ) * Refactor owlvit post_process_object_detection + add text_labels * Fix copies in grounding dino * Sync with Owlv2 postprocessing * Add post_process_grounded_object_detection method to processor, deprecate post_process_object_detection * Add test cases * Move text_labels to processors only * [run-slow] owlvit owlv2 * [run-slow] owlvit, owlv2 * Update snippets * Update docs structure * Update deprecated objects for check_repo * Update docstring for post processing of image guided object detection	2025-01-17 13:58:28 +00:00
hiroaki222	99e0ab6ed8	Fix typo in /docs/source/ja/model_doc/decision_transformer.md URL (#35705 ) doc: Update original code repository URL	2025-01-15 07:36:50 -08:00
jiqing-feng	387663e571	Enable gptqmodel (#35012 ) * gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass *kwargs limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass *kwargs add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version 
(#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (#10) * review: update docs (#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review * doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update 
docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-15 14:22:49 +01:00
Ego Joseph Oborakpororo	b0cdbd9119	Enhanced Installation Section in README.md (#35094 ) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md Enhanced installation section with troubleshooting, GPU setup, and OS-specific details. * Update README.md Enhanced installation section with troubleshooting, GPU setup, and OS-specific details. * Update installation.md Updated installation.md to include virtual environment and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment, troubleshooting and GPU setup instructions. * Update installation.md * Update installation.md * Update installation.md * Update installation.md Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions. * Update installation.md Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions. * Update README.md Removed numbering from README.md. * Update README.md Removed unnecessary "a)" formatting as per maintainer feedback. * Update README.md Added blank lines around code snippets for better readability. * Update README.md Removed the line "b) Install a backend framework:" from README.md as per feedback. * Update README.md Simplified "For Windows:" to "Windows" in README.md as per feedback as well as "For macOS/Linux:" to "macOS/Linux" * Update README.md Removed unnecessary heading and retained valid code snippet. 
* Update README.md Removed unnecessary heading "d) Optional: Install from source for the latest updates" as per feedback. * Update README.md Removed "GPU Setup (Optional)" section to align with minimal design feedback. * Update installation.md Removed "Create and Activate a Virtual Environment" section from installation.md as per feedback. * Update installation.md Adjusted "Troubleshooting" to a second-level heading and added an introductory line as per feedback. * Update installation.md Updated troubleshooting section with simplified headings and formatted code blocks as per feedback. * Update installation.md Integrated GPU setup instructions into the "Install with pip" section for better content flow. * Update README.md Removed Troubleshooting section from README.md for minimalism as per maintainer feedback.	2025-01-14 08:05:08 -08:00
Martin	715fdd6459	Update torchao.md: use auto-compilation (#35490 ) * Update torchao.md: use auto-compilation * Update torchao.md: indicate updating transformers to the latest --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-01-14 11:33:48 +01:00
RTrace	34f76bb62b	Fix `zero_shot_image_classification` documentation guide link in SigLIP (#35671 )	2025-01-13 11:08:17 -08:00
Arthur	c23a1c1932	Add-helium (#35669 ) * Add the helium model. * Add a missing helium. * And add another missing helium. * Use float for the rmsnorm mul. * Add the Helium tokenizer converter. * Add the pad token as suggested by Arthur. * Update the RMSNorm + some other tweaks. * Fix more rebase issues. * fix copies and style * fixes and add helium.md * add missing tests * udpate the backlink * oups * style * update init, and expected results * small fixes * match test outputs * style fixup, fix doc builder * add dummies and we should be good to go!z * update sdpa and fa2 documentation --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-01-13 18:41:15 +01:00
Ahmed Almaghz	a3f82328ed	[i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic (#35193 ) * Create token_classification.md * Update token_classification.md * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed 
<554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/token_classification.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update _toctree.yml --------- Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>	2025-01-13 09:32:15 -08:00
Raushan Turganbay	52e1f87c7d	[WIP] Emu3: add model (#33770 ) * model can convert to HF and be loaded back * nit * works in single batch generation but hallucinates * use the image tokens * add image generation * now it works * add tests * update * add modulare but it doesn't work for porting docstring :( * skip some tests * add slow tests * modular removed the import? * guess this works * update * update * fix copies * fix test * fix copies * update * docs * fix tests * last fix tests? * pls * repo consistency * more style * style * remove file * address comments * tiny bits * update after the new modular * fix tests * add one more cond in check attributes * decompose down/up/mid blocks * allow static cache generation in VLMs * nit * fix copies * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/emu3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix VAE upsampling * Update src/transformers/models/emu3/modular_emu3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments * state overwritten stuff explicitly * fix copies * add the flag for flex attn --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> 
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-01-10 12:23:00 +01:00
Raushan Turganbay	e0646f3dce	Chat template: return vectorized output in processors (#34275 ) * update chat template * style * fix tests * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typehints + docs * fix tests * remove unnecessary warnings * forgot code style :( * allow users to pass backend and num frames * Update docs/source/en/chat_templating.md Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/image_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/processing_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * typo fix * style * address comments * align with "pipeline" template * update docs * update docs * unpack for all kwargs? * wrong conflict resolution while rebasing * tmp * update docs * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-10 11:05:29 +01:00
eustlb	5f087d1335	Add Moonshine (#34784 ) * config draft * full encoder forward * full decoder forward * fix sdpa and FA2 * fix sdpa and FA2 * moonshine model * moonshine model forward * fix attention with past_key_values * add MoonshineForConditionalGeneration * fix cache handling and causality for cross attention * no causal attention mask for the encoder * model addition (imports etc) * small nit * nits * Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py Co-authored-by: Joshua Lochner <admin@xenova.com> * add rope_theta * nits * model doc * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: Joshua Lochner <admin@xenova.com> * imports * add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES * updates modular * make * make fix-copies * ruff check examples fix * fix check_modular_conversion * nit * nits * nits * copied from -> imports * imports fix * integrate attention refacto * modular edge case * remove encoder * convolutions params in config * run modular_model_converter * make * Update docs/source/en/model_doc/moonshine.md Co-authored-by: Joshua Lochner <admin@xenova.com> * MoonshineModelTest * correct typo * make style * integration tests * make * modular convert * name conversion update (up_proj -> fc1 etc) * update config * update MLP * update attention * update encoder layer * update decoder layer * update convolutions parameters * update encoder * remove INPUTS_DOCSTRING * update decoder * update conditional generation * update pretrained model * imports * modular converted * update doc * fix * typo * update doc * update license * update init * split config in file * two classes for MLP * attention from GLM * from GlmRotaryEmbedding * split MLP * apply arthur's review suggestions * apply arthur's review suggestions * apply arthur's review suggestions * auto feature extractor * convert modular * fix + make * convert modular * make * unsplit config * use correct checkpoint * wrap generate * update tests * typos * make * 
typo * update doc --------- Co-authored-by: Joshua Lochner <admin@xenova.com>	2025-01-10 11:00:54 +01:00
Benjamin Warner	1e3ddcb2d0	ModernBERT bug fixes (#35404 ) * bug fixes * organize imports * wrap cpu warning in reference_compile * Avoid needing repad_logits_with_grad, always repad with grads when training I'm not 100% that the conditional with "or labels is None" makes sense though - not sure what the intention is there. Perhaps we can remove that? * Revert "Avoid needing repad_logits_with_grad, always repad with grads when training" This reverts commit `cedcb4e89b`. * Fix grammar: keep -> keeps * Propagate grammar fix with modular_model_converter --------- Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>	2025-01-09 20:15:38 +01:00
胡译文	c9c682d19c	[doc] deepspeed universal checkpoint (#35015 ) * universal checkpoint * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/deepspeed.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-01-09 09:50:51 -08:00
Pablo Montalvo	395b114bd1	Small fix rope kwargs (#35589 ) * don't know why this keeps popping up? * remove unused rope_kwargs	2025-01-09 15:40:36 +01:00
Merve Noyan	487c31a21f	Minor fix in video text 2 text docs (#35546 ) minor fix in docs	2025-01-09 11:20:36 +01:00
Ahmed Almaghz	a6256ec098	[i18n-ar] Translated file: `docs/source/ar/tasks/multiple_choice.md` into Arabic (#35199 ) * إضافة الترجمة العربية: multiple_choice.md * Update multiple_choice.md * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md 
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update docs/source/ar/tasks/multiple_choice.md Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com> * Update _toctree.yml * Add files via upload * Update _toctree.yml --------- Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>	2025-01-08 14:17:58 -08:00
Joao Gante	76da6ca034	Pipeline: simple API for assisted generation (#34504 ) Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-01-08 17:08:02 +00:00
DaNing An	4c2c12b3de	[docs] Remove Hiera from AUDIO MODELS in docs (#35544 ) Remove Hiera from AUDIO MODELS Hiera is a visual model and should not appear in audio model...	2025-01-08 16:33:21 +00:00

3389 Commits