Commit Graph

983 Commits

Author SHA1 Message Date
Afanti
d84569387f
chore: fix typos in utils module (#36668)
* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module
2025-03-13 15:12:44 +00:00
Yoni Gozlan
ea219ed164
Remove differences between init and preprocess kwargs for fast image processors (#36186)
* Remove differences between init and preprocess kwargs in fast image processors

* make modifs got_ocr2

* update gemma3
2025-03-12 19:44:05 -04:00
Ryan Mullins
50d3530aa0
Gemma3 (#36658)
* Fix converter

* [Broken] Adds Gemma 3 to Hugging Face Transformers

* Consolidating Config and Processor params across impls

* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.

* Additional plumbing for CausalLM and ConditionalGeneration variants

* incomplete draft of Orbax conversion script

* More complete checkpoint conversion

* Supporting Gemma 3 1B checkpoints

* Updating RoPE for multiple frequencies

* Adjustments to rotary embedder

* Proof of life for text-only operation

* Updating the conversion script to handle multimodal projection weights

* Fixing tet-only conversions

* Cleaner conversion script with multimodal support and a simpler processor

* Additional refatcors to the Gemma3Processor

* Simplified Processor to work over text representations

* Updated conversion script to join text and vision embeddings at converion time

* Logging for debugging

* Update src/transformers/models/gemma2/modeling_gemma2.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Removed extraneous Config params

* Switching to fast tokenizer for checkpoint conversions

* isolating siglip for performance tetsing

* Minor changes for debugging tests against baselines

* Adding average pooling for soft tokens

* Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts

* Updating conversion script for ShieldGemma 2 conversion compatibility

* Allow disable_compile to be provided as a kwarg

* Refresh from modular

* Updated conversion script and corrected sliding window

* Fix type mismatch in cache_position (#4)

* Fix dtype (#5)

* Fix type mismatch in cache_position

* Actually fix in the modular file

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

---------

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor

* Adding 2D pooling for image embeddings

* Revert "Adding 2D pooling for image embeddings"

This reverts commit 65350cf531.

* Gemma3 average pooling changed from 1D to 2D

* Major refactor to Gemma3MultimodalInputProjection

* Updating Gemm 3 Auto* registrations

* Add option to save Gemma 3 chat template with tokenizer during weights conversion

* Removing unused imports

* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration

* Removing duplicate config property

* Removing final logit softcapping and 1-indexing of position ids

* Fixing image processor config and none --> None typo

* Fixing sliding window size for 1B

* Updating image_mean and image_std in Image Processor

* Attention masking changed to lower triangular

* Moving image special tokens to conversion script

* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs

* Remove special token variables from symbol space

* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration

* tie lm_head and embedding weights

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Correct tied weights in Gemma3CausalLM

* iterative bidirectional attention

* resolving merge conflicts

* Reverting to Gemma 2 HybridCache with sldiing window support and a sliding_window_pattern of 6

* Correcting RoPE scaling

* clean up first pass, dummy model geenration works

* final clean up before fixing tests

* causal lm test works, so fine

* Fix conversion

* Update src/transformers/models/gemma3/processing_gemma3.py

* model tests are happy

* processor tests are happy

* image processing tests added

* fixup

* Fix pre-processing in conversion

* Inputs merging

* Do not normalize vision embeddings

* Apply Ryan's (and team) changes to attention

* token type ids + mask

* template

* move embed scale, add rope scale, fix tests

* Add chat template to tokenizer

* Use prefix for causal model loading

* use existing code for sliding mask from gemma2

* self.embed_tokens already normalizes

* Correcting Gemma3TextConfig parameters in conversion script

* typo, modular overwrites my fixes

* enable device map for text model

* Conversion updates

* ultra nit: no einsums

* update image token

* copy deepcopy config + some docs

* add some test, still WIP

* Refactoring --include_chat_tempalte logic in converter

* Update src/transformers/models/gemma3/modular_gemma3.py

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Add eos tokens for instruct models

* dump so i can work on dgx

* Removing add_bos by default

* dump

* add fast im proc

* docs for PaS + fixup

* another fixup

* one more fixup

* fix tests

* Inverting prior BOS change

* ultra nit

* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS

* resize embeds, remove sqrt, add slow test outputs

* FA2 but quality is meh

* nit

* skip FA2, no idea what happened

* last bit for green CI

* please, green CI for docs

* T_T

* Fix for Gemma3 logits

* Support both options for system prompt

* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Docs updates now that assets are live

* Style fixes

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Lysandre <hi@lysand.re>
2025-03-12 09:06:17 +01:00
Matt
1e4286fd59
Remove research projects (#36645)
* Remove research projects

* Add new README to explain where the projects went

* Trigger tests

* Cleanup all references to research_projects
2025-03-11 13:47:38 +00:00
hlky
bc30dd1efb
Modular Conversion --fix_and_overwrite on Windows (#36583)
* Modular Conversion --fix_and_overwrite on Windows

* -newline on read
2025-03-06 13:12:30 +00:00
Steven Liu
c0f8d055ce
[docs] Redesign (#31757)
* toctree

* not-doctested.txt

* collapse sections

* feedback

* update

* rewrite get started sections

* fixes

* fix

* loading models

* fix

* customize models

* share

* fix link

* contribute part 1

* contribute pt 2

* fix toctree

* tokenization pt 1

* Add new model (#32615)

* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* "to be not" -> "not to be" (#32636)

* "to be not" -> "not to be"

* Update sam.md

* Update trainer.py

* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* fix hfoption tag

* tokenization pt. 2

* image processor

* fix toctree

* backbones

* feature extractor

* fix file name

* processor

* update not-doctested

* update

* make style

* fix toctree

* revision

* make fixup

* fix toctree

* fix

* make style

* fix hfoption tag

* pipeline

* pipeline gradio

* pipeline web server

* add pipeline

* fix toctree

* not-doctested

* prompting

* llm optims

* fix toctree

* fixes

* cache

* text generation

* fix

* chat pipeline

* chat stuff

* xla

* torch.compile

* cpu inference

* toctree

* gpu inference

* agents and tools

* gguf/tiktoken

* finetune

* toctree

* trainer

* trainer pt 2

* optims

* optimizers

* accelerate

* parallelism

* fsdp

* update

* distributed cpu

* hardware training

* gpu training

* gpu training 2

* peft

* distrib debug

* deepspeed 1

* deepspeed 2

* chat toctree

* quant pt 1

* quant pt 2

* fix toctree

* fix

* fix

* quant pt 3

* quant pt 4

* serialization

* torchscript

* scripts

* tpu

* review

* model addition timeline

* modular

* more reviews

* reviews

* fix toctree

* reviews reviews

* continue reviews

* more reviews

* modular transformers

* more review

* zamba2

* fix

* all frameworks

* pytorch

* supported model frameworks

* flashattention

* rm check_table

* not-doctested.txt

* rm check_support_list.py

* feedback

* updates/feedback

* review

* feedback

* fix

* update

* feedback

* updates

* update

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-03-03 10:33:46 -08:00
Cyril Vallez
da4ab2a1b6
Fix doc formatting in forward passes & modular (#36243)
* fix indentation issues + modular without magic keyword

* style

* Update doc.py

* style

* Fix all decorators indentation

* all models

* style

* style

* Update doc.py

* fix

* general fix

* style
2025-02-25 11:09:01 +01:00
Cyril Vallez
bc65f3fc1c
[modular] Do not track imports in functions (#36279)
* Add check

* just check for function

* Update examples
2025-02-25 10:29:47 +01:00
Yih-Dar
2ab7bdc403
notify new model merged to main (#36375)
notify new model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-24 17:53:18 +01:00
Pavel Iakubovskii
a957b7911a
Add SigLIP 2 (#36323)
* Docs

* Inits

* Auto classes

* Add siglip base

* Add base tests

* Fix Siglip V1 for fix res version

* Add image processor

* Update conversion

* Experimenting with vectorized embeddings

* Fixup

* Add modular Siglip2Processor

* Add modular configuration

* Rename num patches

* Correct image and text features merging

* Working conversion script

* Refactoring conversion script

* Remove unused code in conversion script

* Shorten dict a bit

* Refactoring conversion

* Done conversion refactoring

* Fixup

* Modular siglip2

* Make model exportable and compilable without graph breaks

* Remove position_ids from image_processor

* REmove position ids from modeling file

* Update modular

* Type hint

* Fixup

* Set defaults to processor

* Add integration test

* Revert spatial shapes back to tensor

* Change order

* Fix most of the tests

* Fix docstring

* Remove interpolate_pos_encoding arg (not needed)

* Update docs

* Standardize processing

* Fix attention_mask in vision head

* Siglip v1: remove double transpose in FA2

* Update modular file

* Update FA2 test

* Update expected logits

* Fix interpolation for siglip2 image processor

* Skip init test

* Skip dispatch on flash test

* Fix modeling tests

* Fixup

* Add dummy objects

* Fix some docstrings

* Add siglip2 in index.md

* Fix consistency

* Add docs

* Remove size and data format

* Add image processor tests

* Fix

* Add fast image processor

* Fix style

* Fix

* Docs

* Set lowercase for tokenizer

* Adjust head size for Siglip v1

* Update siglip2 for consistency with siglip1

* Update siglip2 conversion

* Update pipeline

* Update checkpoints in tests

* Update checkpoint name

* Fix pooling for image classification model

* Fix FA2 test

* Update processor

* Fix check repo

* Update docs

* Fix typos

* Fix docstring for fast image processor

* Add siglip2 to FA2 docs

* Fix fast ip tests

* Fix constitency

* Fix tokenizer class for siglip v1

* Fix missing header

* Refactor scaling for clip, siglip, siglip2

* Remove unused imports

* Make fast IP default for siglip2

* Update docs

* Update checkpoints

* Update modular

* Update paper link

* Fixup

* Fix name in toctree

* Fix test
2025-02-21 09:04:19 +00:00
Orr Zohar
4397dfcb71
SmolVLM2 (#36126)
* smolvlm init

* updates

* fixing bugs

* minimal run, no checks

* minimal run, no checks

* passing first check + adding url support

* updating video dataloading logic

* fixing image logic

* trying modular, but fails

* modular is working, changing processor to match PR comments and general transformers logic

* fixing kwargs

* offloading video loading logic to  image_util

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* update

* add idefics3-based tests

* add keyword to all

* add PreTrainedModel

* updateing video loading logic

* working inference

* updates for PR comments

* updates for PR comments

* moving SmolVLMPretrainedModel higher to fix import error

* CI test pass

* CI test pass

* removing lambda

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* processor tests

* add example in docs

* typo

* fix copies

* skip compile tests - sdpa for VisionTransformer

* fix init

* raise import error for num2words

* update doc for FA2

* more doc fix

* CI

* updates for PR comments

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* fixing processor -- tokenizer not defined properly, (gpt2 tokenizer), and does not have the attributes of fake image token, etc

* adding smolvlm to VQA models

* removing vqa auto class

* Update src/transformers/models/smolvlm/processing_smolvlm.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* removing smolvlmvisiontransformer from index.md

* my bad, video processing had typos

* fixing docs

* renaming params in SmolVLMModel.inputs_merger

* removing un-needed dtype/device in model forward

* ruff for CI

* update docs

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* return cache position

* return cache position

* return cache also in modular

* needed to run modular again

* fix training tests

* push vectorized inputs merger

* format

* format

* reduce number of mappings

* addressing PR comments

* happy CI, happy me :)

* skip non-nested images

* adjust integration test for smaller GPUs

* format

* fix kwargs in chat template apply

* skip this for now

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Joshua Lochner <admin@xenova.com>
2025-02-20 15:00:26 +01:00
Yih-Dar
f2ab182dca
Ignore conversion files in test fetcher (#36251)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-20 13:32:02 +01:00
Joao Gante
99adc74462
[tests] remove flax-pt equivalence and cross tests (#36283) 2025-02-19 15:13:27 +00:00
Joao Gante
0863eef248
[tests] remove pt_tf equivalence tests (#36253) 2025-02-19 11:55:11 +00:00
Yih-Dar
0a9923a609
Use args.num_workers in check_modular_conversion.py (#36200)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-14 17:31:03 +01:00
Yih-Dar
8fd4bc7d1d
Fix a mistake in #36175 (#36179)
fix my bad

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-13 18:33:02 +01:00
Yih-Dar
bfe46c98b5
Make check_repository_consistency run faster by MP (#36175)
* speeddddd

* speeddddd

* speeddddd

* speeddddd

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-13 17:25:17 +01:00
Joao Gante
d114a6f78e
[Modular] skip modular checks based on diff (#36130)
skip modular checks based on diff
2025-02-13 12:53:21 +00:00
Yih-Dar
4a5a7b991a
Fix test fetcher (#36129)
* fix

* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-12 17:35:41 +01:00
Joao Gante
8a2f062eac
[commands] remove deprecated/inoperational commands (#35718)
rm deprecated/inoperational commands
2025-02-12 12:23:58 +00:00
kkscilife
09261ccf12
[Bugfix] fix file name of docstring in utils/check_table.py (#36108)
fix file name

Co-authored-by: kkscilife <qa-caif-cicd@pjlab.org.cn>
2025-02-10 15:48:02 +00:00
Jade Choghari
006d9249ec
Adding RT-DETRv2 for object detection (#34773)
* cookiecutter add rtdetrv2

* make modular working

* working modelgit add .

* working modelgit add .

* finalize moduar inheritence

* finalize moduar inheritence

* Update src/transformers/models/rtdetrv2/modular_rtdetrv2.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* update modular and add rename

* remove output ckpt

* define loss_kwargs

* fix CamelCase naming

* fix naming + files

* fix modular and convert file

* additional changes

* fix modular

* fix import error (switch to lazy)

* fix autobackbone

* make style

* add

* update testing

* fix loss

* remove old folder

* fix testing for v2

* update docstring

* fix docstring

* add resnetv2 (with modular bug to fix)

* remove resnetv2 backbone

* fix changes

* small fixes

* remove rtdetrv2resnetconfig

* add rtdetrv2 name to convert

* make style

* Update docs/source/en/model_doc/rt_detr_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix modular typo after review

* add reviewed changes

* add final review changes

* Update docs/source/en/model_doc/rt_detr_v2.md

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/rt_detr_v2/__init__.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/rt_detr_v2/convert_rt_detr_v2_weights_to_hf.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* add review changes

* remove rtdetrv2 resnet

* removing this weird project change

* change ckpt name from jadechoghari to author

* implement review and update testing

* update naming and remove wrong ckpt

* name

* make fix-copies

* Fix RT-DETR loss

* Add resources, fix name

* Fix repo in docs

* Fix table name

---------

Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: qubvel <qubvel@gmail.com>
2025-02-06 19:28:45 +00:00
Yoni Gozlan
fa56dcc2ab
Refactoring of ImageProcessorFast (#35069)
* add init and base image processing functions

* add add_fast_image_processor to transformers-cli

* add working fast image processor clip

* add fast image processor to doc, working tests

* remove "to be implemented" SigLip

* fix unprotected import

* fix unprotected vision import

* update ViTImageProcessorFast

* increase threshold slow fast ewuivalence

* add fast img blip

* add fast class in tests with cli

* improve cli

* add fast image processor convnext

* add LlavaPatchingMixin and fast image processor for llava_next and llava_onevision

* add device kwarg to ImagesKwargs for fast processing on cuda

* cleanup

* fix unprotected import

* group images by sizes and add batch processing

* Add batch equivalence tests, skip when center_crop is used

* cleanup

* update init and cli

* fix-copies

* refactor convnext, cleanup base

* fix

* remove patching mixins, add piped torchvision transforms for ViT

* fix unbatched processing

* fix f strings

* protect imports

* change llava onevision to class transforms (test)

* fix convnext

* improve formatting (following Pavel review)

* fix handling device arg

* improve cli

* fix

* fix inits

* Add distinction between preprocess and _preprocess, and support for arbitrary kwargs through valid_extra_kwargs

* uniformize qwen2_vl fast

* fix docstrings

* add add fast image processor llava

* remove min_pixels max_pixels from accepted size

* nit

* nit

* refactor fast image processors docstrings

* cleanup and remove fast class transforms

* update add fast image processor transformers cli

* cleanup docstring

* uniformize pixtral fast and  make _process_image explicit

* fix prepare image structure llava next/onevision

* Use typed kwargs instead of explicit args

* nit fix import Unpack

* clearly separate pops and gets in base preprocess. Use explicit typed kwargs

* make qwen2_vl preprocess arguments hashable
2025-02-04 17:52:31 -05:00
David
8d73a38606
Add DAB-DETR for object detection (#30803)
* initial commit

* encoder+decoder layer changes WIP

* architecture checks

* working version of detection + segmentation

* fix modeling outputs

* fix return dict + output att/hs

* found the position embedding masking bug

* pre-training version

* added iamge processors

* typo in init.py

* iterupdate set to false

* fixed num_labels in class_output linear layer bias init

* multihead attention shape fixes

* test improvements

* test update

* dab-detr model_doc update

* dab-detr model_doc update2

* test fix:test_retain_grad_hidden_states_attentions

* config file clean and renaming variables

* config file clean and renaming variables fix

* updated convert_to_hf file

* small fixes

* style and qulity checks

* return_dict fix

* Merge branch main into add_dab_detr

* small comment fix

* skip test_inputs_embeds test

* image processor updates + image processor test updates

* check copies test fix update

* updates for check_copies.py test

* updates for check_copies.py test2

* tied weights fix

* fixed image processing tests and fixed shared weights issues

* added numpy nd array option to get_Expected_values method in test_image_processing_dab_detr.py

* delete prints from test file

* SafeTensor modification to solve HF Trainer issue

* removing the safetensor modifications

* make fix copies and hf uplaod has been added.

* fixed index.md

* fixed repo consistency

* styel fix and dabdetrimageprocessor docstring update

* requested modifications after the first review

* Update src/transformers/models/dab_detr/image_processing_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* repo consistency has been fixed

* update copied NestedTensor function after main merge

* Update src/transformers/models/dab_detr/modeling_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* temp commit

* temp commit2

* temp commit 3

* unit tests are fixed

* fixed repo consistency

* updated expected_boxes varible values based on related notebook results in DABDETRIntegrationTests file.

* temporarialy config modifications and repo consistency fixes

* Put dilation parameter back to config

* pattern embeddings have been added to the rename_keys method

* add dilation comment to config + add as an exception in check_config_attributes SPECIAL CASES

* delete FeatureExtractor part from docs.md

* requested modifications in modeling_dab_detr.py

* [run_slow] dab_detr

* deleted last segmentation code part, updated conversion script and changed the hf path in test files

* temp commit of requested modifications

* temp commit of requested modifications 2

* updated config file, resolved codepaths and refactored conversion script

* updated decodelayer block types and refactored conversion script

* style and quality update

* small modifications based on the request

* attentions are refactored

* removed loss functions from modeling file, added loss function to lossutils, tried to move the MLP layer generation to config but it failed

* deleted imageprocessor

* fixed conversion script + quality and style

* fixed config_att

* [run_slow] dab_detr

* changing model path in conversion file and in test file

* fix Decoder variable naming

* testing the old loss function

* switched back to the new loss function and testing with the odl attention functions

* switched back to the new last good result modeling file

* moved back to the version when I asked the review

* missing new line at the end of the file

* old version test

* turn back to newest mdoel versino but change image processor

* style fix

* style fix after merge main

* [run_slow] dab_detr

* [run_slow] dab_detr

* added device and type for head bias data part

* [run_slow] dab_detr

* fixed model head bias data fill

* changed test_inference_object_detection_head assertTrues to torch test assert_close

* fixes part 1

* quality update

* self.bbox_embed in decoder has been restored

* changed Assert true torch closeall methods to torch testing assertclose

* modelcard markdown file has been updated

* deleted intemediate list from decoder module

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-04 17:28:27 +00:00
Yih-Dar
014a1fa2c8
CircleCI with python 3.9 (#36027)
update docker files

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 17:40:20 +01:00
Yih-Dar
f19bfa50e7
Commont bot CI for other jobs (generation / quantization) (#35341)
* quantization CI on PRs

* fix

* fix

* add 2 members

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 14:42:51 +01:00
Cyril Vallez
9afb904b15
Refactor (and fix) gpt_neox (#35610)
* start a nice modular

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* update

* Update modular_gpt_neox.py

* convert

* fix attribute

* fix attrs

* oups

* fix

* fix

* fix

* fix

* fix

* fix order to pass test (see with accelerate team)

* trigger CIs

* modular

* update

* up

* Update test_modeling_gpt_neox.py

* Update test_modeling_gpt_neox.py

* trigger CIs

* correctly pass arg

* simplify

* remove key warning

* update tp -> it's compatible since the view is before

* trigger CIs
2025-02-04 11:18:43 +01:00
ShuaiBai623
f3f6c86582
add qwen2.5vl (#35569)
* add qwen2.5vl

* fix

* pass check table

* add modular file

* fix style

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* padd copy check

* use modular

* fix

* fix

* fix

* update flashatt2&sdpa support_list

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update config

* update

* fix hf path

* rename Qwen2_5_VLVideosKwargs

* fix

* fix

* update

* excuted modular

* rollback init

* fix

* formated

* simpler init

* fix

* fix

* fix

* fix

* fix

* update docs

* fix

* fix

* update Qwen2VLRotaryEmbedding for yarn

* fix

---------

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: gewenbin0992 <gewenbin292@163.com>
Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com>
2025-01-23 11:23:00 +01:00
Joao Gante
90b46e983f
Remove old benchmark code (#35730)
* remove traces of the old deprecated benchmarks

* also remove old tf benchmark example, which uses deleted code

* run doc builder
2025-01-21 17:56:43 +00:00
Cyril Vallez
e867b97443
Deterministic sorting in modular converter when adding new functions (#35795)
deterministic sort
2025-01-21 09:38:48 +01:00
Nikos Antoniou
920f34a772
modular_model_converter bugfix on assignments (#35642)
* added bugfix in modular converter to keep modular assignments for docstrings, expected outputs etc.

* revert stracoder2 docstring copying, add forward in EMU3 to enable docstring assingment, remove verbatim assignments in modular converter

* added _FOR_DOC in assignments to keep, corrected wrong checkpoint name in ijepa's configuration
2025-01-21 08:06:44 +01:00
Pavel Iakubovskii
94ae9a8da1
OwlViT/Owlv2 post processing standardization (#34929)
* Refactor owlvit post_process_object_detection + add text_labels

* Fix copies in grounding dino

* Sync with Owlv2 postprocessing

* Add post_process_grounded_object_detection method to processor, deprecate post_process_object_detection

* Add test cases

* Move text_labels to processors only

* [run-slow] owlvit owlv2

* [run-slow] owlvit, owlv2

* Update snippets

* Update docs structure

* Update deprecated objects for check_repo

* Update docstring for post processing of image guided object detection
2025-01-17 13:58:28 +00:00
Joao Gante
aaa969e97d
Remove pt_to_tf (#35672)
* rm command

* remove exception
2025-01-16 17:03:37 +00:00
Joao Gante
80dbbd103c
🧹 remove generate-related objects and methods scheduled for removal in v4.48 (#35677)
* remove things scheduled for removal

* make fixup
2025-01-16 17:03:20 +00:00
Cyril Vallez
91be6a5eb2
Modular: support for importing functions from any file (#35692)
* fix function imports

* improve comment

* Update modeling_switch_function.py

* make checks more robust

* improvement

* rename

* final test update
2025-01-16 16:37:53 +00:00
Raushan Turganbay
52e1f87c7d
[WIP] Emu3: add model (#33770)
* model can convert to HF and be loaded back

* nit

* works in single batch generation but hallucinates

* use the image tokens

* add image generation

* now it works

* add tests

* update

* add modulare but it doesn't work for porting docstring :(

* skip some tests

* add slow tests

* modular removed the import?

* guess this works

* update

* update

* fix copies

* fix test

* fix copies

* update

* docs

* fix tests

* last fix tests?

* pls

* repo consistency

* more style

* style

* remove file

* address comments

* tiny bits

* update after the new modular

* fix tests

* add one more cond in check attributes

* decompose down/up/mid blocks

* allow static cache generation in VLMs

* nit

* fix copies

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix VAE upsampling

* Update src/transformers/models/emu3/modular_emu3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments

* state overwritten stuff explicitly

* fix copies

* add the flag for flex attn

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-10 12:23:00 +01:00
Cyril Vallez
46276f9a7f
Fix modular edge case + modular sorting order (#35562)
* look-ahead negation

* re add examples by default

* Fix the bug in topological sort

* Update create_dependency_mapping.py

* start adding test

* finalize test

* more tests

* style

* style
2025-01-09 17:17:52 +01:00
Cyril Vallez
965a2fb320
More model refactoring! (#35359)
* cohere

* style

* phi3

* style

* small fix

* small fix

* phi3 longrope

* oups

* Update rope (only for phi3 still)

* Update test_modeling_rope_utils.py

* Update modeling_phi3.py

* fix

* fix copies

* style

* Fix copied from bad renaming
2025-01-09 11:09:09 +01:00
NielsRogge
8490d3159c
Add ViTPose (#30530)
* First draft

* Make fixup

* Make forward pass worké

* Improve code

* More improvements

* More improvements

* Make predictions match

* More improvements

* Improve image processor

* Fix model tests

* Add classic decoder

* Convert classic decoder

* Verify image processor

* Fix classic decoder logits

* Clean up

* Add post_process_pose_estimation

* Improve post_process_pose_estimation

* Use AutoBackbone

* Add support for MoE models

* Fix tests, improve num_experts%

* Improve variable names

* Make fixup

* More improvements

* Improve post_process_pose_estimation

* Compute centers and scales

* Improve postprocessing

* More improvements

* Fix ViTPoseBackbone tests

* Add docstrings, fix image processor tests

* Update index

* Use is_cv2_available

* Add model to toctree

* Add cv2 to doc tests

* Remove script

* Improve conversion script

* Add coco_to_pascal_voc

* Add box_to_center_and_scale to image_transforms

* Update tests

* Add integration test

* Fix merge

* Address comments

* Replace numpy by pytorch, improve docstrings

* Remove get_input_embeddings

* Address comments

* Move coco_to_pascal_voc

* Address comment

* Fix style

* Address comments

* Fix test

* Address comment

* Remove udp

* Remove comment

* [WIP] need to check if the numpy function is same as cv

* add scipy affine_transform

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* refactor convert

* add output_shape

* add atol 5e-2

* Use hf_hub_download in conversion script

* make box_to_center more applicable

* skipt test_get_set_embedding

* fix to accept array and fix CI

* add co-contributor

* make it to tensor type output

* add torch

* change to torch tensor

* add more test

* minor change

* CI test change

* import torch should be above ImageProcessor

* make style

* try not use torch in def

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/configuration_vitpose_backbone.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix

* fix

* add caution

* make more detail about dataset_index

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* add docs

* Update docs/source/en/model_doc/vitpose.md

* Update src/transformers/models/vitpose/configuration_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Revert "Update src/transformers/__init__.py"

This reverts commit 7ffa504450.

* change name

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vitpose/test_modeling_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* move vitpose only function to image_processor

* raise valueerror when using timm backbone

* use out_indices

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove camel-case of def flip_back

* rename vitposeEstimatorOutput

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix confused camelcase of MLP

* remove in-place logic

* clear scale description

* make consistent batch format

* docs update

* formatting docstring

* add batch tests

* test docs change

* Update src/transformers/models/vitpose/image_processing_vitpose.py

* Update src/transformers/models/vitpose/configuration_vitpose.py

* chagne ViT to Vit

* change to enable MoE

* make fix-copies

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* extract udp

* add more described docs

* simple fix

* change to accept target_size

* make style

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/configuration_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change to `verify_backbone_config_arguments`

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove unnecessary copy

* make config immutable

* enable gradient checkpointing

* update inappropriate docstring

* linting docs

* split function for visibility

* make style

* check isinstances

* change to acceptable use_pretrained_backbone

* make style

* remove copy in docs

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* simple fix + make style

* change input config of activation function to string

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* tmp docs

* delete index.md

* make fix-copies

* simple fix

* change conversion to sam2/mllama style

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* refactor convert

* add supervision

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* remove reduntant def

* seperate code block for visualization

* add validation for num_moe

* final commit

* add labels

* [run-slow] vitpose, vitpose_backbone

* Update src/transformers/models/vitpose/convert_vitpose_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* enable all conversion

* final commit

* [run-slow] vitpose, vitpose_backbone

* ruff check --fix

* [run-slow] vitpose, vitpose_backbone

* rename split module

* [run-slow] vitpose, vitpose_backbone

* fix pos_embed

* Simplify init

* Revert "fix pos_embed"

This reverts commit 2c56a4806e.

* refactor single loop

* allow flag to enable custom model

* efficiency of MoE to not use unused experts

* make style

* Fix range -> arange to avoid warning

* Revert MOE router, a new one does not work

* Fix postprocessing a bit (labels)

* Fix type hint

* Fix docs snippets

* Fix links to checkpoints

* Fix checkpoints in tests

* Fix test

* Add image to docs

---------

Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-08 16:02:14 +00:00
Yoni Gozlan
651cfb400f
Add support for modular with fast image processors (#35379)
* Add support for modular with fast image processors

* fix order and remove copied from

* add comment for "image_processing*_fast"
2025-01-08 08:37:57 -05:00
Raushan Turganbay
d1681ec2b6
VLMs: major clean up 🧼 (#34502)
only lllava models are modified
2025-01-08 10:35:23 +01:00
Jade Choghari
7176e06b52
Add TextNet (#34979)
* WIP

* Add config and modeling for Fast model

* Refactor modeling and add tests

* More changes

* WIP

* Add tests

* Add conversion script

* Add conversion scripts, integration tests, image processor

* Fix style and copies

* Add fast model to init

* Add fast model in docs and other places

* Fix import of cv2

* Rename image processing method

* Fix build

* Fix Build

* fix style and fix copies

* Fix build

* Fix build

* Fix Build

* Clean up docstrings

* Fix Build

* Fix Build

* Fix Build

* Fix build

* Add test for image_processing_fast and add documentation tests

* some refactorings

* Fix failing tests

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Introduce TextNet

* Fix failures

* Refactor textnet model

* Fix failures

* Add cv2 to setup

* Fix failures

* Fix failures

* Add CV2 dependency

* Fix bugs

* Fix build issue

* Fix failures

* Remove textnet from modeling fast

* Fix build and other things

* Fix build

* some cleanups

* some cleanups

* Some more cleanups

* Fix build

* Incorporate PR feedbacks

* More cleanup

* More cleanup

* More cleanup

* Fix build

* Remove all the references of fast model

* More cleanup

* Fix build

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix Build

* Fix build

* Fix build

* Fix build

* Fix build

* Fix build

* Incorporate PR feedbacks

* Fix style

* Fix build

* Incorporate PR feedbacks

* Fix image processing mean and std

* Incorporate PR feedbacks

* fix build failure

* Add assertion to image processor

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* fix style failures

* fix build

* Fix Imageclassification's linear layer, also introduce TextNetImageProcessor

* Fix build

* Fix build

* Fix build

* Fix build

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix build

* Incorporate PR feedbacks

* Remove some script

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix image processing in textnet

* Incorporate PR Feedbacks

* Fix CI failures

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Add textnet to readme

* Improve readability

* Incorporate PR feedbacks

* fix code style

* fix key error and convert working

* tvlt shouldn't be here

* fix test modeling test

* Fix tests, make fixup

* Make fixup

* Make fixup

* Remove TEXTNET_PRETRAINED_MODEL_ARCHIVE_LIST

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update tests/models/textnet/test_image_processing_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* space typo

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/configuration_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* make conv layer kernel sizes and strides default to None

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix keyword bug

* add batch init and make fixup

* Make fixup

* Update integration test

* Add figure

* Update textnet.md

* add testing and fix errors (classification, imgprocess)

* fix error check

* make fixup

* make fixup

* revert to original docstring

* add make style

* remove conflict for now

* Update modeling_auto.py

got a confusion in `timm_wrapper` - was giving some conflicts

* Update tests/models/textnet/test_modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update tests/models/textnet/test_modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* add changes

* Update textnet.md

* add doc

* add authors hf ckpt + rename

* add feedback: classifier/docs

---------

Co-authored-by: raghavanone <opensourcemaniacfreak@gmail.com>
Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co>
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-08 09:52:51 +01:00
Lysandre Debut
b2f2977533
Applies the rest of the init refactor except to modular files (#35238)
* [test_all] Applies the rest of the init refactor except to modular files

* Revert modular that doesn't work

* [test_all] TFGPT2Tokenizer
2025-01-05 18:30:08 +01:00
NielsRogge
6e0515e99c
Add DINOv2 with registers (#35348)
* added changes from 32905

* fixed mistakes caused by select all paste

* rename diff_dinov2...

* ran tests

* Fix modular

* Fix tests

* Use new init

* Simplify drop path

* Convert all checkpoints

* Add figure and summary

* Update paths

* Update docs

* Update docs

* Update toctree

* Update docs

---------

Co-authored-by: BernardZach <bernardzach00@gmail.com>
Co-authored-by: Zach Bernard <132859071+BernardZach@users.noreply.github.com>
2024-12-24 13:21:59 +01:00
Arthur
6fae2a84ae
Update test fetcher when we want to test all (#35364)
* [test-all]

* style

* [test-all]

* [test_all]

* [test_all]

* style
2024-12-20 15:10:43 +01:00
Yu Chin Fabian Lim
9613933b02
Add the Bamba Model (#34982)
* initial commit for PR

Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>

* rename dynamic cache

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* add more unit tests

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* add integration test

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* add integration test

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* Add modular bamba file

* Remove trainer changes from unrelated PR

* Modify modular and cofig to get model running

* Fix some CI errors and beam search

* Fix a plethora of bugs from CI/docs/etc

* Add bamba to models with special caches

* Updat to newer mamba PR for mamba sublayer

* fix test_left_padding_compatibility

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* fix style

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* fix remaining tests

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* missed this test

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* ran make style

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* move slow tag to integration obj

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* make style

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* address comments

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* fix modular

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* left out one part of modular

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* change model

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* Make Rotary modular as well

* Update bamba.md

Added overview, update Model inference card and added config

* Update bamba.md

* Update bamba.md

* Update bamba.md

Minor fixes

* Add docs for config and model back

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Add warning when using fast kernels

* replaced generate example

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* Address comments from PR

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Propagate attention fixes

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Fix attention interfaces to the new API

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Fix API for decoder layer

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Remove extra weights

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

---------

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Co-authored-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com>
Co-authored-by: Antoni Viros <ani300@gmail.com>
2024-12-18 20:18:17 +01:00
Arthur
2c47618c1a
🚨All attention refactor🚨 (#35235)
* refactor LlamaAttention

* minimal changes

* fix llama

* update

* modular gemmas

* modular nits

* modular updates

* nits

* simplify

* gpt2

* more modualr and fixes

* granite

* modular modular modular

* nits

* update

* qwen2 + starcoder2

* mostly gemma2

* Update image_processing_auto.py

* fix

* Update modular_starcoder2.py

* fix

* remove all copied from attentions

* remove gcv

* make fix-copies

* oups

* oups2.0

* fix some modulars + all copied from

* should be good now

* revert unwanted changes

* Update modeling_decision_transformer.py

* finish cleanup

* Update modeling_olmo.py

* consistency

* re-add gradient checkpointing attribute

* fix

* style

* make config necessary

* bis

* bis

* Update modeling_my_new_model2.py

* is_causal attr

* fix

* remove past kv return from decoder layer

* fix

* default rope config

* correctly fix rope config

* fix bias

* fix gpt2 attention output

* fix test

* fix inits

* fix default sdpa

* fix default sdpa implementation

* harmonize classes

* fix mistral

* fix sliding window models

* mixtral

* be more explicit

* style

* fix

* several fixes

* Update modeling_dbrx.py

* fix test

* olmo + phi

* rotary

* syle

* phi

* phi again

* again

* kwargs

* Update test_modeling_common.py

* skip fx tracing tests

* Update modeling_utils.py

* gemma 2

* again

* Update modeling_recurrent_gemma.py

* gemma2

* granite

* style

* starcoder

* Update sdpa_attention.py

* switch args

* Update modeling_mllama.py

* fix

* cache type tests

* gpt2

* Update test_modeling_common.py

* fix

* consistency

* fix shape with encoder

* should be the last one

* tests non model

* most comments

* small oupsi

* be more explicit in modulars

* more explicit modulars

* CIs! it works locally

* add kwargs to _flash_attention_forward

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2024-12-18 16:53:39 +01:00
Yih-Dar
f1b7634fc8
Trigger GitHub CI with a comment on PR (#35211)
* fix

* fix

* comment

* final

* final

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-18 13:56:49 +01:00
Matt
e0ae9b5974
🚨🚨🚨 Delete conversion scripts when making release wheels (#35296)
* Delete conversion scripts when making release wheels

* make fixup

* Update docstring
2024-12-17 14:18:42 +00:00
Billel Mokeddem
6c08b3b6e5
Add Falcon3 documentation (#35307)
* Add Falcon3 documentation

* Update Falcon3 documentation

* Change Falcon to Falcon3

* Update docs and run make fix-copies

* Add blog post and huggingface models links
2024-12-17 14:23:13 +01:00