Commit Graph

272 Commits

Author SHA1 Message Date
Joao Gante
beaed8ce01
[generate] move SinkCache to a custom_generate repo (#38399)
remove sink cache
2025-06-02 12:13:30 +02:00
Raushan Turganbay
17742bd9c8
🔴 [VLM] Add base model without head (#37033)
* i guessreverted all CdGen classes

* style

* llava onevision

* fix copies

* fix some tests

* some more tests

* dump

* skip these

* nevermind, i am dumb

* revert fix not needed

* fixup

* fixup

* another fixup

* more fixup to make ci finally happy

* fixup after rebasing

* fix qwen tests

* add internVL + typos here and there

* image token index -> id

* style

* fix init weights

* revert blip-2 not supported

* address comments

* fix copies

* revert blip2 test file as well

* as discussed internally, revert back CdGen models

* fix some tests

* fix more tests for compile

* CI red

* fix copies

* enumerate explicitly allowed models

* address comments

* fix tests

* fixup

* style again

* add tests for new model class

* another fixup ( x _ x )

* [fixup] unused attributes can be removed post-deprecation
2025-05-07 17:47:51 +02:00
eustlb
798f948e88
Add CSM model (#36719)
* draft structure

* depth decoder with forward pre hook

* full model forward draft

* draft update

* depth decoder update

* ConversationalSpeechModelForCausalLM udpates

* add generate

* max length criteria small fix

* udpate

* updates

* generation update

* update in loss compute

* conversion script

* update for correct input embeddings

* handle interleaved rope

* update

* update

* update

* support compile

* update training

* add doc

* update doc

* correct inits

* ConversationalSpeechModel -> Csm

* conf update

* name update

* tests CsmForCausalLMTest

* convert use cached_file

* conf + modeling updates

* generate utils handle third dim shape

* integration test

* modeling + conf updates

* common test handle more than 2 dims

* add nested audio list utils

* processing handle nested audio list

* csm processing draft

* mimi util

* init updates

* modular update

* convert modular

* processing update

* csm tests update

* generate tests handle third dim

* generate utils handle third dim

* propagate _get_initial_cache_position update

* tied_weight_keys update + convert correctly

* fix inputs_embeds

* revert audio nested list

* batch inference update + return audio

* audio_utils update

* processor update

* some more integration tests

* remove old test

* porcessing output labels

* improve

* fix

* update rope values with equivalent ones

* conversion update

* udpate tests

* handle depth decoder generation config

* remove default eos_token_id

* make style

* revert modeling_mimi

* add default generation_config

* remove sdpa since handled by default

* make

* fix conflict

* fix conflicts

* correct naming

* correct imports

* make

* causal -> conditional naming

* causal -> conditional naming

* auto update

* make

* make

* add doc

* test update

* fix weight init

* audio tokens offsets as buffer

* 4d mask in conditional class

* make

* doc update

* fix causal mask

* fix causal mask

* doc update

* doc update

* add processor doc

* update doc

* fix 4d causal mask

* update make_list_of_audio

* do not default to mutable

* remove duplicates

* remove useless reset_parameters

* use GradientCheckpointingLayer

* use can_return_tuple

* formatting

* prepend placeholder in _sample

* torch compile fix

* some more fixies

* convert modular

* fix

* default max_length in convert

* handle depth decoder generation config correctly

* clearer formulation

* handle output_loading_info

* handle softmax warning

* add doc

* propagate _get_initial_cache_position changes

* generation in its own module

* add processor tests

* fix compile witu cuda graphs

* fix compile with cuda graphs

* add csm.md

* include CSM loss

* doc nit

* doc nit

* doc nit

* Update docs/source/en/model_doc/csm.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add save_audio to processor

* Update src/transformers/models/csm/modular_csm.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* doc update

* simplify audio_codes_mask computation

* doc update

* simplify loss computation

* fix static cache test

* fix

* remove comment

* simplify encoded length computation

* use hf-internal-testing

* doc update

* cast to float before numpy

* nit

* mem efficient codebook head

* nit

* cat input values with cutoffs

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-07 10:20:13 -04:00
sushmanth reddy
65e940208c
Samhq model addition (#35147)
* added the configuartion for sam_hq

* added the modeelling for sam_hq

* added the sam hq mask decoder with hq features

* added the code for the samhq

* added the code for the samhq

* added the code for the samhq

* Delete src/transformers/models/sam_hq/modelling_sam_hq.py

* added the code for the samhq

* added the code for the samhq

* added the chnages for the modeelling

* added the code for sam hq for image processing

* added code for the sam hq model

* added the required changes

* added the changes

* added the key mappings for the sam hq

* adding the working code of samhq

* added the required files

* adding the pt object

* added the push to hub account

* added the args for the sam maks  decoder

* added the args for the sam hq vision config

* aded the some more documentation

* removed the unecessary spaces

* all required chnages

* removed the image processor

* added the required file

* added the changes for the checkcopies

* added the code for modular file

* added the changes for the __init file

* added the code for the interm embeds

* added the code for sam hq

* added the changes for modular file

* added the test file

* added the changes required

* added the changes required

* added the code for the

* added the cl errors

* added the changes

* added the required changes

* added the some code

* added the code for the removing image processor

* added the test dimensins

* added the code for the removing extra used variables

* added the code for modeluar file hf_mlp for a better name

* removed abbrevaation in core functionality

* removed abbrevaation in core functionality

* .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory

* added the code which is after make fixup

* added some test for the intermediate embeddings test

* added the code for the torch support in sam hq

* added the code for the updated modular file

* added the changes for documentations as mentioned

* removed the heading

* add the changes for the code

* first mentioned issue resolved

* added the changes code to processor

* added the easy loading to init file

* added the changes to code

* added the code to changes

* added the code to work

* added the code for sam hq

* added the code for sam hq

* added the code for the point pad value

* added the small test for the image embeddings and intermediate embedding

* added the code

* added the code

* added the code for the tests

* added the code

* added ythe code for the processor file

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code for tests and some checks

* added some code

* added the code

* added the code

* added some code

* added some code

* added the changes for required

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added some changes

* added some changes

* removed spaces and quality checks

* added some code

* added some code

* added some code

* added code quality checks

* added the checks for quality checks

* addded some code which fixes test_inference_mask_generation_no_point

* added code for the test_inference_mask_generation_one_point_one_bb

* added code for the test_inference_mask_generation_one_point_one_bb_zero

* added code for the test_inference_mask_generation_one_box

* added some code in modelling for testing

* added some code which sort maks with high score

* added some code

* added some code

* added some code for the move KEYS_TO_MODIFY_MAPPING

* added some code for the  unsqueeze removal

* added some code for the  unsqueeze removal

* added some code

* added some code

* add some code

* added some code

* added some code

* added some testign values changed

* added changes to code in sam hq for readbility purpose

* added pre commit checks

* added the fix samvisionmodel for compatibilty

* added the changes made on sam by cyyever

* fixed the tests for samhq

* added some the code

* added some code related to init file issue during merge conflicts

* remobved the merge conflicts

* added changes mentioned by aruther and mobap

* added changes mentioned by aruther and mobap

* solving quality checks

* added the changes for input clearly

* added the changes

* added changes in mask generation file rgearding model inputs and  sam hq quargs  in processor file

* added changes in processor file

* added the  Setup -> setupclass conversion

* added the code mentioned for processor

* added changes for the code

* added some code

* added some code

* added some code

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-04-28 19:07:09 +02:00
Yoni Gozlan
a245011252
Add InternVL (2.5 MPO) (#35968)
* initial commit

* add convert internvl

* add first end-to-end working internvl

* nit prompt and image proc

* add working chat template

* add conversion llama-based models

* add tests

* pass all tests

* fix isort

* fix modular after main merge

* add video processing for internvl

* add support for interlaced images and videos

* Remove processing and config from modular, add more tests

* add llama model tests

* Modify processor for compatibility with refactored got ocr image processor

* add comments in processor

* Add docs and nits

* change video processing to use custom sample_indices_fn

* rebase and fix tests

* add processor tests

* Add changes Raushan review

* Use the new attention interface for the vision model

* nits

* add support for custom video_load_backend

* remove mention to InternVLTokenizer

* refactor vision model to simplify logic

* refactor processor for better readibility

* fix copies

* fix require av processor test

* refactor internVL vision

* Update processor and fix processing tests

* fix docstring

* update convert_weights for internvl3

* change image processor to fast by default

* remove do_center_crop=True in convert_weights

* force use_cache to True

* push_to_hub before reloading

* fix internVLVision for larger models

* update convert weight for qk norm

* fix convert_weights

* fix eos_token_id in convert

* update docs and integration tests

* make modifs after review

* fix wrong k_norm and reduce modular

* change image_token_index to image_token_id

* change checkpoint to OpenGVLab org

* last nits

* explicitely del self.num_key_value_groups

* add extra special tokens
2025-04-18 18:57:33 +02:00
Yaswanth Gali
a2ef3cf537
Add Janus model (#36053)
* Iterative generation using input embeds

* Add Janus model

* discard changes

* Janus imports

* Refactor config and processor

* Added Vision tower of Janus

* Import Janus Image processor

* Vision tower fixes

* Refactor code

* Added VQ Model

* Complete model integration

* temp conversion script

* processor refactor

* Adding files to facilitate pulling

* Fixes after debugging

* Skip test for these models

* Add Janus Model

* discard changes

* Janus imports

* Refactor config and processor

* Added Vision tower of Janus

* Import Janus Image processor

* Vision tower fixes

* Refactor code

* Added VQ Model

* Complete model integration

* temp conversion script

* processor refactor

* Adding files to facilitate pulling

* Fixes after debugging

* Refactor to Text config

*  Added generate function

* Saving intermediate convert file. Still need to read configs from the hub and convert them to our format.

* Adding version that reads from the JSON files. Still have to tweak some parameters manually.

* relative imports

* Initial tests

* Refactor image processor

* Seemingly working version of the conversion script, will need to test further.

* Adding command message

* Fixing conflicting JanusTextConfig class

* Incorporating some of the discussed changes.

* Small fix to create dir.

* Removing system from JINJA template

* Adding draft processor tests

* style fixes

* Minor fixes and enhancement

* added generation config

* Initial tests

* Small modifications, tests are now passing.

* Small changes I noticed while reading code.

* more fixes

* Added JanusModel class

* Small merge adaptations

* Small merge adaptations

* Image processing tests passing

* More tests and fixes

* Convert script updated and refactored

* Tests and cleanup

* make style

* Postprocessing for image generation

* generate refactor

* fixes

* - Passing tests that write a part of the model to cpu (e.g. test_cpu_offload)
- Passing tests of dispatching SDPA
- Only gradient checkpointing tests are left.

* Removing temporary code

* Changes

* Writing change to modular

* Added JanusVisionModel. SDPA dispatch tests pass more robustly. Gradient checkpoint tests are next

* Gradient checkpoint tests passing

* Removing debug code

* Major generate refactor 😮‍💨

* Temp changes for testing

* Green quality CI

* 2 out of 4 integration tests passing

* breadcrumbs

* Usage Examples

* Regenerate modeling after merge

* dirty code

* JanusIntegrationTest are passing

* breadcrumbs

* happy CI

* fixes

* Changing template

* nits

* Text generation logits matching original codebase at 100% precision

* Remove ./tmp from git tracking

* Remove ./tmp from git tracking

* Checkpointing changes after reviewing

* Fixing code in docstrings

* CHanging comments and small bug in convert file

* Fixing bug in image_token_id for 7B version

* Removing line that was added by both of us

* Pushing changes after discussion. Only one left is to change the key mapping for convert file.

* Updating module file

* New convert file using dict. Tested that it is equivalent to the old one by:
- comparing keys in a script
- comparing checksums of the output files between version generated with the current convert script and those generated with the old script. This is a more reliable test.

* revert changes

* mistake

* consistency change for CI

* make style

* doc fixes

* more fixes

* experimenting with masking out pad token

* checkpoint

* Batched generation with multi-images working for 1B models. Will test 7B next.

* Device fix.

* Writing changes to modular, previous ones were written to modeling just for quick testing.

* Using passed processor attention mask (only in modeling for now)

* Matching performance done in the non-standard way

* Working version of batched generation. Will change how some args are passed to make it more similar to language case

* More compliant version of the code

* Removed duplicated `_prepare_4d_causal_attention_mask_with_cache_position`

* Updating modular file, making masked filling with paddings more efficient

* Slightly more efficient version

* Modifying JanusVisionModel to be a wrapper

* Fixing test to comply with new names

* Modular overhaul

* More refactoring

* - Changing JanusVisionModel back
- Changing forward pass
- Adding boi token to the comparison

* - Removing whole context model_ids
- Using inherited implementation of prepare_inputs_for_generation

* Moving the way boi token is passed to the model

* Fixing sdpa test

* Minor changes

* testing changes

* Minor fix

* - Adding postprocessing test
- checking values of generated image on integration test

* changes

* Removing pooled attention vision module, fixing convert script as a consequence

* More changes

* Fixes

* Draft after merge

* Bug fixes

* More bug fix

* Fixing docs

* Nits

* Refactor return dict

* Moving image post processing test to main processor post process

* Passing guidance_scale as kwarg

* make style

* 🔥 refactor

* make style

* Update and green CI

* Nits and tests update

* up

* Added MID block

* fix

* Dead code

* update testcase

* update

* model_id change

* init_weight changes

---------

Co-authored-by: hsilva664 <metallic-silver@hotmail.com>
2025-04-17 09:18:51 +02:00
Jinan Zhou
a91020aed0
Add TimesFM Time Series Forecasting Model (#34082)
* initial documentation

* rename mask to attention_mask

* smaller tests

* fixup

* fix copies

* move to time series section

* sort docs

* isort fix

* batch_size is not a configuration

* rename to TimesFMModelForPrediction

* initial script

* add check_outputs

* remove dropout_rate

* works with torch.Tensor inputs

* rename script

* fix docstrings

* fix freq when window_size is given

* add loss

* fix _quantile_loss

* formatting

* fix isort

* add weight init

* add support for sdpa and flash_attention_2

* fixes for flash_attention

* formatting

* remove flash_attention

* fix tests

* fix file name

* fix quantile loss

* added initial TimesFMModelIntegrationTests

* fix formatting

* fix import order

* fix _quantile_loss

* add doc for SDPA

* use timesfm 2.0

* bug fix in timesfm decode function.

* compare mean forecasts

* refactor type hints, use CamelCase

* consolidate decode func

* more readable code for weight conversion

* fix-copies

* simpler init

* renaem TimesFmMLP

* use T5LayerNorm

* fix tests

* use initializer_range

* TimesFmModel instead of TimesFmDecoder

* TimesFmPositionalEmbedding takes config for its init

* 2.0-500m-pytorch default configs

* use TimesFmModel

* fix formatting

* ignore TimesFmModel for testing

* fix docstring

* override generate as its not needed

* add doc strings

* fix logging

* add docstrings to output data classes

* initial copy from t5

* added config and attention layers

* add TimesFMPositionalEmbedding

* calcuate scale_factor once

* add more configs and TimesFMResidualBlock

* fix input_dims

* standardize code format with black

* remove unneeded modules

* TimesFM Model

* order of imports

* copy from Google official implementation

* remove covariate forecasting

* Adapting TimesFM to HF format

* restructing in progress

* adapted to HF convention

* timesfm test

* the model runs

* fixing unit tests

* fixing unit tests in progress

* add post_init

* do not change TimesFMOutput

* fixing unit tests

* all unit tests passed

* remove timesfm_layers

* add intermediate_size and initialize with config

* initial documentation

* rename mask to attention_mask

* smaller tests

* fixup

* fix copies

* move to time series section

* sort docs

* isort fix

* batch_size is not a configuration

* rename to TimesFMModelForPrediction

* initial script

* add check_outputs

* remove dropout_rate

* works with torch.Tensor inputs

* rename script

* fix docstrings

* fix freq when window_size is given

* add loss

* fix _quantile_loss

* formatting

* fix isort

* add weight init

* add support for sdpa and flash_attention_2

* fixes for flash_attention

* formatting

* remove flash_attention

* fix tests

* fix file name

* fix quantile loss

* added initial TimesFMModelIntegrationTests

* fix formatting

* fix import order

* fix _quantile_loss

* add doc for SDPA

* use timesfm 2.0

* bug fix in timesfm decode function.

* compare mean forecasts

* refactor type hints, use CamelCase

* consolidate decode func

* more readable code for weight conversion

* fix-copies

* simpler init

* renaem TimesFmMLP

* use T5LayerNorm

* fix tests

* use initializer_range

* TimesFmModel instead of TimesFmDecoder

* TimesFmPositionalEmbedding takes config for its init

* 2.0-500m-pytorch default configs

* use TimesFmModel

* fix formatting

* ignore TimesFmModel for testing

* fix docstring

* override generate as its not needed

* add doc strings

* fix logging

* add docstrings to output data classes

* add _CHECKPOINT_FOR_DOC

* fix comments

* Revert "fix comments"

This reverts commit 8deeb3e191.

* add _prepare_4d_attention_mask

* we do not have generative model classes

* use Cache

* return past_key_values

* modules initialized with config only

* update year

* Update docs/source/en/model_doc/timesfm.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add layer_idx to cache

* modular timesfm

* fix test

* unwrap sequential class

* fix toctree

* remove TimesFmOnnxConfig

* fix modular

* remove TimesFmStackedDecoder

* split qkv layer into individual layers

* rename projection layers

* use ALL_ATTENTION_FUNCTIONS

* is_causal is True

* rename config

* does not support flash_attn_2

* formatting

* fix typo in docsstring

* rename inputs

* add time series mapping

* Update src/transformers/models/olmo2/modeling_olmo2.py

* Update src/transformers/models/moonshine/modeling_moonshine.py

* use updated arguments

* fix class name

* add MODEL_FOR_TIME_SERIES_PREDICTION_MAPPING

* isort

* consolidate _preprocess into forward

* fix a typo

* fix a typo

* fix toc

* fix modular

* remove aaserts

* use self.config._attn_implementation

* move to _postprocess_output

* remove timesfm_get_large_negative_number

* use view unstead of multiple unsqueeze

* make helpers static methods of the Model

* use to_tuple

* use to_tuple if not return_dict

* remove unused intitialization block as its incorporated in nn.Linear

* remove unused num_key_value_groups

* use the same convention as the masking method

* update modular

* do not use unsqueeze

* use view instead of unsqueeze

* use buffer for inv_timescales

* formatting

* modular conversion

* remove unneeded intialization

* add missing docstrings

* remove cache

* use simple_eager_attention_forward

* support tp_plan

* support for flex and flash attention masks

* Revert "support for flex and flash attention masks"

This reverts commit def36c4fcf.

* fix device

* fix tests on gpu

* remove unsued large model test

* removed unneeded comments

* add example usage

* fix style

* add import

* Update docs/source/en/model_doc/timesfm.md

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* inherit from LlamaRMSNorm

* use can_return_tuple decorator

* remvoe return_dict

* fix year

* Update docs/source/en/model_doc/timesfm.md

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* pretrained does not inherit from GenerationMixin

* use model for integration test

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Rajat Sen <rsen91@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-16 15:00:53 +02:00
BakerBunker
4b8c6d4cf8
Add Qwen2.5-Omni (#36752)
* Add qwen2.5-omni

* Remove einops dependency

* Add torchdiffeq dependency

* Sort init

* Add torchdiffeq to extras['diffeq']

* Fix repo consistency

* use cached_file

* del odeint

* renew pytest

* format

* Remove torchdiffeq

* format

* fixed batch infer bug

* Change positional_embedding to parameter

* Change default speaker

* Config revision

* Use modular & code clean

* code clean

* decouple padding with model & code cleaning

* sort init

* fix

* fix

* Second code review

* fix

* fix

* rename vars to full name + some comments

* update pytest

* Code clean & fix

* fix

* style

* more clean up

* fixup

* smaller vision model in tests

* fix processor test

* deflake a bit the tests (still flaky though)

* de-flake tests finally + add generation mixin

* final nits i hope

* make sure processor tests are complete

* replace with Qwen2_5OmniForConditionalGeneration

* fix tests after updating ckpt

* fix typos when cleaning, also we can't change ckpt

* fixup

* images and videos kwargs for processor

* thinker and talker loadable from hub ckpt

* address comments and update tests after rebase

* fixup

* skip for now

* fixup

* fixup

* remove torch dependency in processors

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.con>
Co-authored-by: feizi.wx <feizi.wx@alibaba-inc.com>
Co-authored-by: raushan <raushan@huggingface.co>
2025-04-14 12:36:41 +02:00
Lysandre Debut
54a123f068
Simplify soft dependencies and update the dummy-creation process (#36827)
* Reverse dependency map shouldn't be created when test_all is set

* [test_all] Remove dummies

* Modular fixes

* Update utils/check_repo.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* [test_all] Better docs

* [test_all] Update src/transformers/commands/chat.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* [test_all] Remove deprecated AdaptiveEmbeddings from the tests

* [test_all] Doc builder

* [test_all] is_dummy

* [test_all] Import utils

* [test_all] Doc building should not require all deps

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-11 11:08:36 +02:00
Alex Brooks
2515a5a290
Expose blip2qformer (#37254)
* Expose blip2qformer

* Add missing args to blip2 config
2025-04-08 12:04:33 +02:00
Arthur
25b7f27234
Add llama4 (#37307)
* remove one of the last deps

* update fast image processor after refactor

* styling

* more quality of life improvements

* nit

* update

* cleanups

* some cleanups

* vllm updates

* update fake image token

* [convert] Fix typo

* [convert] Strip extraneous bytes from shards

* [convert] Minor fixes

* [convert] Use num_experts

* multi-image fixes in modeling + processor

* fixup size

* 128 experts

* Use default rope

* Unfuse mlp

* simplify a lot inputs embeds merging

* remove .item() 👀

* fix from review

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* set seed

* return aspect ratios and bug fixes

* Moe 128 rebased (#8)

* 128 experts

* Use default rope

* Unfuse mlp

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* Meta/llama quant compat (#7)

* add quant compatible model & conversion code for llama4

* fix a few issues

* fix a few issues

* minor type mapping fix

---------

Co-authored-by: Lu Fang <fanglu@fb.com>

* use a new config parameter to determine which model definition to use for MoE

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>

* un-comment write_tokenizer from converting script

* remove un-used imports

* [llama4] Pop aspect_ratios from image processor output in Llama4Processor

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* Fix parameter_count name

* Update src/transformers/models/llama4/configuration_llama4.py

* nit

* Add changes for no_rope, moe_layers, chunked attention. Just need to test all

* Update src/transformers/models/llama4/image_processing_llama4_fast.py

* nit

* fix post merge with main

* support flex attention

* fixes

* fix

* add layer

* small updates

* rebase and delete llm_compressor

* nit

* [llama4/mm] Add back <|image|> token that delimits global tile

* [llama4/mm] Fix Llama 4 image processing unit tests

* add explicit dtype

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* sdpa works

* comment todo small

* fix model loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* revert

* nits

* small fix for TP on 1 node

* Read new params from config

* Add <|eom|>

* lol don't know how this got here

* adding fp8

* Save processor, fix chat template

* style

* Add boi/eoi tokens

We don't use them.

* fixes for now flex seems to work :)

* updates

* nits

* updates

* missking keys

* add context parallel

* update

* update

* fix

* nits

* add worldsize and make eager attn work for vision

* Ignore new key present in base models

* add tp_plan

* fix nope

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* minor fix

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* Clean up Llama4 vision model

* current updates

* add support for `attn_temperature_tuning`

* add floor scale

* add missing attn scales

* push what works, dirty trick for the device synch

* oups

* Fix pad_token_id

See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.

* fix causallml loading

* rm

* fix tied-weights

* fix sdpa

* push current version

* should work with both short and long

* add compressed_tensos & fix fbgemm tp

* Fix flex impl

* style

* chunking

* try to revert the potentially breaking change

* fix auto factory

* fix shapes in general

* rm processing

* commit cache utils cleanup

* Fix context length

* fix

* allocate

* update tp_plan

* fix SDPA!

* Add support for sparse `Llama4TextMoe` layer from the kernel hub

* cleanup

* better merge

* update

* still broken fixing now

* nits

* revert print

* Write max_position_embeddings and max_model_length

* Update modeling_llama4.py

* Save attention_chunk_size

* Sync eos terminators

* Read initializer_range

* style

* remove `dict`

* fix

* eager should use `chunked_attention_mask`

* revert

* fixup

* fix config

* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"

This reverts commit ccda19f050, reversing
changes made to a515579aed.

* Fix typo and remove warning with compiled flex and chunked prefill

* Fix MoE vs FF (#41)

* fix

* Use correct no_rope_layers if provided one is empty list

* update tests

* fix

* skipping some tests

* fix fp8 loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* fix text geneartion pipeline

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* eager needs 4D mask

* fix

* Some cleanup

* fix

* update

* fix

* replace correctly module

* patch

* modulelist

* update

* update

* clean up

* Don't move to `cuda:0` in distributed mode

* restrict to compressed tensors for now

* rm print

* Docs!

* Fixes

* Update docs/source/en/model_doc/llama4.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fixes

* cuda graph fix

* revert some stuff

* fixup

* styling

* Update src/transformers/models/llama4/modeling_llama4.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* commit licence, cleanup here and there and style

* more styling changes

* fix dummies

* fix and clean docstrings

* remove comment

* remove warning

* Only fast image processor is supported

* nit

* trigger CI

* fix issue with flex encoder

* fix dynamic cache

* Code quality

* Code quality

* fix more tests for now

* Code quality

* Code quality

* Nuke bunch of failing stuff

* Code quality

* Code quality

* cleanup removal of slow image processor

* ruff fix fast image processor

* fix

* fix styling

* Docs

* Repo consistency

* Repo consistency

* fix sliding window issue

* separate llama cache

* styling

* Repo consistency

* Repo consistency

* push waht works

* L4 Repo consistency

* Docs

* fix last last alst alst alst alstsaltlsltlaslt

---------

Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: Jon Swenson <jmswen@gmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-05 22:02:22 +02:00
Joao Gante
ad3d157188
[RoPE] abstract dynamic RoPE update under a decorator (#37249)
* dynamic rope decorator

* longrope; shorter fwd pass

* propper docstring

* make fixup
2025-04-04 14:27:28 +01:00
Cyril Vallez
4303d88c09
Add Phi4 multimodal (#36939)
* raw start

* update

* update

* add to imports

* update

* up

* simplify configs

* clean configs

* style

* typos

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* fix

* up

* up

* up

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* up

* up

* up

* Update feature_extraction_phi4_multimodal.py

* up

* up

* up

* up

* up

* simplify configs

* typo

* cut code

* typo

* typo

* typo

* re

* typo

* up

* up

* up

* add tests

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* up

* Update test_modeling_phi4_multimodal.py

* doc

* fix

* up

* up

* up

* up

* up

* up

* simplify

* up

* simplify

* config docstrings

* cleanup

* clean

* typo

* typo

* fix

* Update phi4_multimodal.md

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* update

* simplify reshapes and permutes

* up

* simplify special tokens

* simplify processor a lot

* Update processing_phi4_multimodal.py

* Update processing_phi4_multimodal.py

* switch to fast processor

* image processor

* Update image_processing_phi4_multimodal_fast.py

* add lora extraction to converter

* Update convert_phi4_multimodal_weights_to_hf.py

* Update __init__.py

* add AudioInput type in audio_utils

* rewrite feature_extraction: support torch batched FFT

* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values

* test update

* not mono channel warning update

* remove auto maps from processor

* kargs dispatch in processor

* simplify kwargs dispatch

* simplify merging

* remove default sampling rate

* style

* Update test_modeling_phi4_multimodal.py

* update doc

* doc

* torch only feature extractor

* make fake tokens adjustable

* Update feature_extraction_phi4_multimodal.py

* fix

* Update processing_phi4_multimodal.py

* simplify mask

* last touch

* fix copies

* style

* Update audio_utils.py

* style

* Update feature_extraction_phi4_multimodal.py

* Update __init__.py

* docstrings

* copies

* fix all checks

* back to fix-copies

* trigger CIs

* Update feature_extraction_phi4_multimodal.py

* improve tests with multimodal inputs

* trigger CIs

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00
Ryan Mullins
487dab1b2b
Shieldgemma2 (#36678)
* single commit

* correct config

* fixup

* dummy pt

* Use ShieldGemma2Config in conversion script

* Update src/transformers/models/shieldgemma2/configuration_shieldgemma2.py

* Adding shieldgemma2 to models.__init__.py

* Adding ShieldGemma2 to main __init__.py

* Update shieldgemma2.md

* Update shieldgemma2.md

* Adding tests. Addressing review feedback.

* Minor docs update

* Fixing code quality feedback from CI

* Fixing empty messages bug reported by ghunkins

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ren Pang <ain-soph@live.com>
2025-03-20 15:14:38 +01:00
Pavel Iakubovskii
a957b7911a
Add SigLIP 2 (#36323)
* Docs

* Inits

* Auto classes

* Add siglip base

* Add base tests

* Fix Siglip V1 for fix res version

* Add image processor

* Update conversion

* Experimenting with vectorized embeddings

* Fixup

* Add modular Siglip2Processor

* Add modular configuration

* Rename num patches

* Correct image and text features merging

* Working conversion script

* Refactoring conversion script

* Remove unused code in conversion script

* Shorten dict a bit

* Refactoring conversion

* Done conversion refactoring

* Fixup

* Modular siglip2

* Make model exportable and compilable without graph breaks

* Remove position_ids from image_processor

* REmove position ids from modeling file

* Update modular

* Type hint

* Fixup

* Set defaults to processor

* Add integration test

* Revert spatial shapes back to tensor

* Change order

* Fix most of the tests

* Fix docstring

* Remove interpolate_pos_encoding arg (not needed)

* Update docs

* Standardize processing

* Fix attention_mask in vision head

* Siglip v1: remove double transpose in FA2

* Update modular file

* Update FA2 test

* Update expected logits

* Fix interpolation for siglip2 image processor

* Skip init test

* Skip dispatch on flash test

* Fix modeling tests

* Fixup

* Add dummy objects

* Fix some docstrings

* Add siglip2 in index.md

* Fix consistency

* Add docs

* Remove size and data format

* Add image processor tests

* Fix

* Add fast image processor

* Fix style

* Fix

* Docs

* Set lowercase for tokenizer

* Adjust head size for Siglip v1

* Update siglip2 for consistency with siglip1

* Update siglip2 conversion

* Update pipeline

* Update checkpoints in tests

* Update checkpoint name

* Fix pooling for image classification model

* Fix FA2 test

* Update processor

* Fix check repo

* Update docs

* Fix typos

* Fix docstring for fast image processor

* Add siglip2 to FA2 docs

* Fix fast ip tests

* Fix constitency

* Fix tokenizer class for siglip v1

* Fix missing header

* Refactor scaling for clip, siglip, siglip2

* Remove unused imports

* Make fast IP default for siglip2

* Update docs

* Update checkpoints

* Update modular

* Update paper link

* Fixup

* Fix name in toctree

* Fix test
2025-02-21 09:04:19 +00:00
Orr Zohar
4397dfcb71
SmolVLM2 (#36126)
* smolvlm init

* updates

* fixing bugs

* minimal run, no checks

* minimal run, no checks

* passing first check + adding url support

* updating video dataloading logic

* fixing image logic

* trying modular, but fails

* modular is working, changing processor to match PR comments and general transformers logic

* fixing kwargs

* offloading video loading logic to  image_util

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* update

* add idefics3-based tests

* add keyword to all

* add PreTrainedModel

* updateing video loading logic

* working inference

* updates for PR comments

* updates for PR comments

* moving SmolVLMPretrainedModel higher to fix import error

* CI test pass

* CI test pass

* removing lambda

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* processor tests

* add example in docs

* typo

* fix copies

* skip compile tests - sdpa for VisionTransformer

* fix init

* raise import error for num2words

* update doc for FA2

* more doc fix

* CI

* updates for PR comments

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* fixing processor -- tokenizer not defined properly, (gpt2 tokenizer), and does not have the attributes of fake image token, etc

* adding smolvlm to VQA models

* removing vqa auto class

* Update src/transformers/models/smolvlm/processing_smolvlm.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* removing smolvlmvisiontransformer from index.md

* my bad, video processing had typos

* fixing docs

* renaming params in SmolVLMModel.inputs_merger

* removing un-needed dtype/device in model forward

* ruff for CI

* update docs

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* return cache position

* return cache position

* return cache also in modular

* needed to run modular again

* fix training tests

* push vectorized inputs merger

* format

* format

* reduce number of mappings

* addressing PR comments

* happy CI, happy me :)

* skip non-nested images

* adjust integration test for smaller GPUs

* format

* fix kwargs in chat template apply

* skip this for now

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Joshua Lochner <admin@xenova.com>
2025-02-20 15:00:26 +01:00
ShuaiBai623
f3f6c86582
add qwen2.5vl (#35569)
* add qwen2.5vl

* fix

* pass check table

* add modular file

* fix style

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* padd copy check

* use modular

* fix

* fix

* fix

* update flashatt2&sdpa support_list

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update config

* update

* fix hf path

* rename Qwen2_5_VLVideosKwargs

* fix

* fix

* update

* excuted modular

* rollback init

* fix

* formated

* simpler init

* fix

* fix

* fix

* fix

* fix

* update docs

* fix

* fix

* update Qwen2VLRotaryEmbedding for yarn

* fix

---------

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: gewenbin0992 <gewenbin292@163.com>
Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com>
2025-01-23 11:23:00 +01:00
Joao Gante
90b46e983f
Remove old benchmark code (#35730)
* remove traces of the old deprecated benchmarks

* also remove old tf benchmark example, which uses deleted code

* run doc builder
2025-01-21 17:56:43 +00:00
Pavel Iakubovskii
94ae9a8da1
OwlViT/Owlv2 post processing standardization (#34929)
* Refactor owlvit post_process_object_detection + add text_labels

* Fix copies in grounding dino

* Sync with Owlv2 postprocessing

* Add post_process_grounded_object_detection method to processor, deprecate post_process_object_detection

* Add test cases

* Move text_labels to processors only

* [run-slow] owlvit owlv2

* [run-slow] owlvit, owlv2

* Update snippets

* Update docs structure

* Update deprecated objects for check_repo

* Update docstring for post processing of image guided object detection
2025-01-17 13:58:28 +00:00
Joao Gante
80dbbd103c
🧹 remove generate-related objects and methods scheduled for removal in v4.48 (#35677)
* remove things scheduled for removal

* make fixup
2025-01-16 17:03:20 +00:00
Raushan Turganbay
52e1f87c7d
[WIP] Emu3: add model (#33770)
* model can convert to HF and be loaded back

* nit

* works in single batch generation but hallucinates

* use the image tokens

* add image generation

* now it works

* add tests

* update

* add modulare but it doesn't work for porting docstring :(

* skip some tests

* add slow tests

* modular removed the import?

* guess this works

* update

* update

* fix copies

* fix test

* fix copies

* update

* docs

* fix tests

* last fix tests?

* pls

* repo consistency

* more style

* style

* remove file

* address comments

* tiny bits

* update after the new modular

* fix tests

* add one more cond in check attributes

* decompose down/up/mid blocks

* allow static cache generation in VLMs

* nit

* fix copies

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix VAE upsampling

* Update src/transformers/models/emu3/modular_emu3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments

* state overwritten stuff explicitly

* fix copies

* add the flag for flex attn

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-10 12:23:00 +01:00
NielsRogge
8490d3159c
Add ViTPose (#30530)
* First draft

* Make fixup

* Make forward pass worké

* Improve code

* More improvements

* More improvements

* Make predictions match

* More improvements

* Improve image processor

* Fix model tests

* Add classic decoder

* Convert classic decoder

* Verify image processor

* Fix classic decoder logits

* Clean up

* Add post_process_pose_estimation

* Improve post_process_pose_estimation

* Use AutoBackbone

* Add support for MoE models

* Fix tests, improve num_experts%

* Improve variable names

* Make fixup

* More improvements

* Improve post_process_pose_estimation

* Compute centers and scales

* Improve postprocessing

* More improvements

* Fix ViTPoseBackbone tests

* Add docstrings, fix image processor tests

* Update index

* Use is_cv2_available

* Add model to toctree

* Add cv2 to doc tests

* Remove script

* Improve conversion script

* Add coco_to_pascal_voc

* Add box_to_center_and_scale to image_transforms

* Update tests

* Add integration test

* Fix merge

* Address comments

* Replace numpy by pytorch, improve docstrings

* Remove get_input_embeddings

* Address comments

* Move coco_to_pascal_voc

* Address comment

* Fix style

* Address comments

* Fix test

* Address comment

* Remove udp

* Remove comment

* [WIP] need to check if the numpy function is same as cv

* add scipy affine_transform

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* refactor convert

* add output_shape

* add atol 5e-2

* Use hf_hub_download in conversion script

* make box_to_center more applicable

* skipt test_get_set_embedding

* fix to accept array and fix CI

* add co-contributor

* make it to tensor type output

* add torch

* change to torch tensor

* add more test

* minor change

* CI test change

* import torch should be above ImageProcessor

* make style

* try not use torch in def

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/configuration_vitpose_backbone.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix

* fix

* add caution

* make more detail about dataset_index

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* add docs

* Update docs/source/en/model_doc/vitpose.md

* Update src/transformers/models/vitpose/configuration_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Revert "Update src/transformers/__init__.py"

This reverts commit 7ffa504450.

* change name

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vitpose/test_modeling_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* move vitpose only function to image_processor

* raise valueerror when using timm backbone

* use out_indices

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove camel-case of def flip_back

* rename vitposeEstimatorOutput

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix confused camelcase of MLP

* remove in-place logic

* clear scale description

* make consistent batch format

* docs update

* formatting docstring

* add batch tests

* test docs change

* Update src/transformers/models/vitpose/image_processing_vitpose.py

* Update src/transformers/models/vitpose/configuration_vitpose.py

* chagne ViT to Vit

* change to enable MoE

* make fix-copies

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* extract udp

* add more described docs

* simple fix

* change to accept target_size

* make style

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/configuration_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change to `verify_backbone_config_arguments`

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove unnecessary copy

* make config immutable

* enable gradient checkpointing

* update inappropriate docstring

* linting docs

* split function for visibility

* make style

* check isinstances

* change to acceptable use_pretrained_backbone

* make style

* remove copy in docs

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* simple fix + make style

* change input config of activation function to string

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* tmp docs

* delete index.md

* make fix-copies

* simple fix

* change conversion to sam2/mllama style

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* refactor convert

* add supervision

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* remove reduntant def

* seperate code block for visualization

* add validation for num_moe

* final commit

* add labels

* [run-slow] vitpose, vitpose_backbone

* Update src/transformers/models/vitpose/convert_vitpose_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* enable all conversion

* final commit

* [run-slow] vitpose, vitpose_backbone

* ruff check --fix

* [run-slow] vitpose, vitpose_backbone

* rename split module

* [run-slow] vitpose, vitpose_backbone

* fix pos_embed

* Simplify init

* Revert "fix pos_embed"

This reverts commit 2c56a4806e.

* refactor single loop

* allow flag to enable custom model

* efficiency of MoE to not use unused experts

* make style

* Fix range -> arange to avoid warning

* Revert MOE router, a new one does not work

* Fix postprocessing a bit (labels)

* Fix type hint

* Fix docs snippets

* Fix links to checkpoints

* Fix checkpoints in tests

* Fix test

* Add image to docs

---------

Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-08 16:02:14 +00:00
Jade Choghari
7176e06b52
Add TextNet (#34979)
* WIP

* Add config and modeling for Fast model

* Refactor modeling and add tests

* More changes

* WIP

* Add tests

* Add conversion script

* Add conversion scripts, integration tests, image processor

* Fix style and copies

* Add fast model to init

* Add fast model in docs and other places

* Fix import of cv2

* Rename image processing method

* Fix build

* Fix Build

* fix style and fix copies

* Fix build

* Fix build

* Fix Build

* Clean up docstrings

* Fix Build

* Fix Build

* Fix Build

* Fix build

* Add test for image_processing_fast and add documentation tests

* some refactorings

* Fix failing tests

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Introduce TextNet

* Fix failures

* Refactor textnet model

* Fix failures

* Add cv2 to setup

* Fix failures

* Fix failures

* Add CV2 dependency

* Fix bugs

* Fix build issue

* Fix failures

* Remove textnet from modeling fast

* Fix build and other things

* Fix build

* some cleanups

* some cleanups

* Some more cleanups

* Fix build

* Incorporate PR feedbacks

* More cleanup

* More cleanup

* More cleanup

* Fix build

* Remove all the references of fast model

* More cleanup

* Fix build

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix Build

* Fix build

* Fix build

* Fix build

* Fix build

* Fix build

* Incorporate PR feedbacks

* Fix style

* Fix build

* Incorporate PR feedbacks

* Fix image processing mean and std

* Incorporate PR feedbacks

* fix build failure

* Add assertion to image processor

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* fix style failures

* fix build

* Fix Imageclassification's linear layer, also introduce TextNetImageProcessor

* Fix build

* Fix build

* Fix build

* Fix build

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix build

* Incorporate PR feedbacks

* Remove some script

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix image processing in textnet

* Incorporate PR Feedbacks

* Fix CI failures

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Add textnet to readme

* Improve readability

* Incorporate PR feedbacks

* fix code style

* fix key error and convert working

* tvlt shouldn't be here

* fix test modeling test

* Fix tests, make fixup

* Make fixup

* Make fixup

* Remove TEXTNET_PRETRAINED_MODEL_ARCHIVE_LIST

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update tests/models/textnet/test_image_processing_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* space typo

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/configuration_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* make conv layer kernel sizes and strides default to None

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix keyword bug

* add batch init and make fixup

* Make fixup

* Update integration test

* Add figure

* Update textnet.md

* add testing and fix errors (classification, imgprocess)

* fix error check

* make fixup

* make fixup

* revert to original docstring

* add make style

* remove conflict for now

* Update modeling_auto.py

got a confusion in `timm_wrapper` - was giving some conflicts

* Update tests/models/textnet/test_modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update tests/models/textnet/test_modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* add changes

* Update textnet.md

* add doc

* add authors hf ckpt + rename

* add feedback: classifier/docs

---------

Co-authored-by: raghavanone <opensourcemaniacfreak@gmail.com>
Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co>
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-08 09:52:51 +01:00
Lysandre Debut
b2f2977533
Applies the rest of the init refactor except to modular files (#35238)
* [test_all] Applies the rest of the init refactor except to modular files

* Revert modular that doesn't work

* [test_all] TFGPT2Tokenizer
2025-01-05 18:30:08 +01:00
NielsRogge
6e0515e99c
Add DINOv2 with registers (#35348)
* added changes from 32905

* fixed mistakes caused by select all paste

* rename diff_dinov2...

* ran tests

* Fix modular

* Fix tests

* Use new init

* Simplify drop path

* Convert all checkpoints

* Add figure and summary

* Update paths

* Update docs

* Update docs

* Update toctree

* Update docs

---------

Co-authored-by: BernardZach <bernardzach00@gmail.com>
Co-authored-by: Zach Bernard <132859071+BernardZach@users.noreply.github.com>
2024-12-24 13:21:59 +01:00
Aymeric Roucher
9ad4c93536
Add Aria (#34157)
* Add Aria
---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-06 12:17:34 +01:00
Raushan Turganbay
21d5025826
Attn implementation for composite models (#32238)
* first try

* codestyle

* idefics2 is happy

* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma

* fix-copies

* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo

* blip-2 needs to init vision from config

* when was this removed O_o

* minor fix

* tests

* this way?

* tests

* model-agnostic code

* codestyle

* add tests for idefics

* modify general test for VLMs

* no generation test for vlm yet!

* no generation test here also

* wanr in VIT-SDPA if output attn

* add more tests

* user can pass dict as attn impl

* repo consistency

* update

* muicgen

* no prints

* forgot speech enc-dec and clip

* how many composite models we have?

* musicgen meelody is same as mudicgen

* +siglip

* fix tests + add some more

* remove idefics custom overriden code

* make idefics2 automappable

* nits

* skip tests

* doctests

* Update src/transformers/models/idefics2/configuration_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/clip/test_modeling_clip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/idefics2/test_modeling_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/idefics2/test_modeling_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* major update, no need for automap

* clean up

* add FA2 test

* more tests

* style

* skip tests

* why did these started failing now?

* no attributes for FA2 needed

* one tiny test

* address comment about FA2 false warning

* style

* add new models and resolve conflicts

* fix copies

* let it be this way for now, come back tomorrow to review

* some more fixes

* update

* more updates

* update

* fix copies

* style and tests

* another big update

* fix tests

* fix tests

* update

* another update

* fix tests

* fix copies

* fix tests

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-10-22 06:54:44 +02:00
Yoach Lacombe
9ba021ea75
Moshi integration (#33624)
* clean mimi commit

* some nits suggestions from Arthur

* make fixup

* first moshi WIP

* converting weights working + configuration + generation configuration

* finalize converting script - still missing tokenizer and FE and processor

* fix saving model w/o default config

* working generation

* use GenerationMixin instead of inheriting

* add delay pattern mask

* fix right order: moshi codes then user codes

* unconditional inputs + generation config

* get rid of MoshiGenerationConfig

* blank user inputs

* update convert script:fix conversion, add  tokenizer, feature extractor and bf16

* add and correct Auto classes

* update modeling code, configuration and tests

* make fixup

* fix some copies

* WIP: add integration tests

* add dummy objects

* propose better readiblity and code organisation

* update tokenization tests

* update docstrigns, eval and modeling

* add .md

* make fixup

* add MoshiForConditionalGeneration to ignore Auto

* revert mimi changes

* re

* further fix

* Update moshi.md

* correct md formating

* move prepare causal mask to class

* fix copies

* fix depth decoder causal

* fix and correct some tests

* make style and update .md

* correct config checkpoitn

* Update tests/models/moshi/test_tokenization_moshi.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/moshi/test_tokenization_moshi.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make style

* Update src/transformers/models/moshi/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* change firm in copyrights

* udpate config with nested dict

* replace einsum

* make style

* change split to True

* add back splt=False

* remove tests in convert

* Update tests/models/moshi/test_modeling_moshi.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add default config repo + add model to FA2 docstrings

* remove logits float

* fix some tokenization tests and ignore some others

* make style tokenization tests

* update modeling with sliding window + update modeling tests

* [run-slow] moshi

* remove prepare for generation frol CausalLM

* isort

* remove copied from

* ignore offload tests

* update causal mask and prepare 4D mask aligned with recent changes

* further test refine + add back prepare_inputs_for_generation for depth decoder

* correct conditional use of prepare mask

* update slow integration tests

* fix multi-device forward

* remove previous solution to device_map

* save_load is flaky

* fix generate multi-devices

* fix device

* move tensor to int

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
2024-10-16 11:21:49 +02:00
Lysandre Debut
f052e94bcc
Fix flax failures (#33912)
* Few fixes here and there

* Remove typos

* Remove typos
2024-10-11 14:38:35 +02:00
Yoni Gozlan
e2001c3413
Add auto model for image-text-to-text (#32472)
* Add Auto model for image-text-to-text

* Remove donut from processing auto, add chameleon ti image text to text models

* add qwen2_vl and llava_onevision

* add pixtral to auto model for image-text-to-text

* add mllama and idefics3

* remove models in IGNORE_NON_AUTO_CONFIGURED

* add AutoModelForImageTextToText to tests and doc
2024-10-08 14:26:43 +02:00
Andrés Marafioti
f2c388e3f9
Add Idefics 3! (#32473)
* Add Idefics 3!

* fixes to make both pipelines identical

* fix for quantized models

* First pass at the review

* remove vocab size from the main config (it's still in the text_config)

* hot fix for merve

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* re-add model_type for text_config

* remove support for old_cache

* remove hidden_size from main config

* rename idefics3 HF repo

* few changes suggested in the PR

* fix to input_data_format computation

* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion

* improve example

* few improvements from amy's review

* big change to enable processing input images as numpy arrays

* Changes to the code to uniformize processor kwargs

* image processing tests

* image processing tests fixes and some bugs they discovered

* addressed review comments from Yoni

* fix modeling tests

* remove special tokens that are not special

* fixes tests

* skip failing tests - they also fail for idefics2

* added paper and readded the tests with multi gpu, who knows

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* review amy until image_processing_idefics3

* last comments from Amy

* review amy

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* doc improvement - amy review

* fix runtime error during fine-tuning

* amy's review

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* ruff

* amy's comment on the order

* ruff ruff

* fix copies

* square images when they are not splitted

* ruff :(

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/idefics3/test_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix small bug introduced in refactor

* amy's image processing changes

* fixes peft tests and ruff

* modify to_pil_image from transformers. and review from emanuele.

* add modified to_pil_image

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-25 21:28:49 +02:00
Arthur
19d58d31f1
Add MLLama (#33703)
* current changes

* nit

* Add cross_attenttion_mask to processor

* multi-image fixed

* Add cross_attenttion_mask to processor

* cross attn works in all cases

* WIP refactoring function for image processor

* WIP refactoring image processor functions

* Refactor preprocess to use global loops instead of list nested list comps

* Docstrings

* Add channels unification

* fix dtype issues

* Update docsrings and format

* Consistent max_image_tiles

* current script

* updates

* Add convert to rgb

* Add image processor tests

* updates!

* update

* god damn it I am dumb sometimes

* Precompute aspect ratios

* now this works, full match

* fix 😉

* nits

* style

* fix model and conversion

* nit

* nit

* kinda works

* hack for sdpa non-contiguous bias

* nits here and there

* latest c hanges

* merge?

* run forward

* Add aspect_ratio_mask

* vision attention mask

* update script and config variable names

* nit

* nits

* be able to load

* style

* nits

* there

* nits

* make forward run

* small update

* enable generation multi-turn

* nit

* nit

* Clean up a bit for errors and typos

* A bit more constant fixes

* 90B keys and shapes match

* Fix for 11B model

* Fixup, remove debug part

* Docs

* Make max_aspect_ratio_id to be minimal

* Update image processing code to match new implementation

* Adjust conversion for final checkpoint state

* Change dim in repeat_interleave (accordig to meta code)

* tmp fix for num_tiles

* Fix for conversion (gate<->up, q/k_proj rope permute)

* nits

* codestyle

* Vision encoder fixes

* pass cross attn mask further

* Refactor aspect ratio mask

* Disable text-only generation

* Fix cross attention layers order, remove q/k norm rotation for cross atention layers

* Refactor gated position embeddings

* fix bugs but needs test with new weights

* rope scaling should be llama3

* Fix rope scaling name

* Remove debug for linear layer

* fix copies

* Make mask prepare private func

* Remove linear patch embed

* Make precomputed embeddings as nn.Embedding module

* MllamaPrecomputedAspectRatioEmbedding with config init

* Remove unused self.output_dim

* nit, intermediate layers

* Rename ln and pos_embed

* vision_chunk_size -> image_size

* return_intermediate -> intermediate_layers_indices

* vision_input_dim -> hidden_size

* Fix copied from statements

* fix most tests

* Fix more copied from

* layer_id->layer_idx

* Comment

* Fix tests for processor

* Copied from for _prepare_4d_causal_attention_mask_with_cache_position

* Style fix

* Add MllamaForCausalLM

* WIP fixing tests

* Remove duplicated layers

* Remove dummy file

* Fix style

* Fix consistency

* Fix some TODOs

* fix language_model instantiation, add docstring

* Move docstring, remove todos for precomputed embeds (we cannot init them properly)

* Add initial docstrings

* Fix

* fix some tests

* lets skip these

* nits, remove print, style

* Add one more copied from

* Improve test message

* Make validate func private

* Fix dummy objects

* Refactor `data_format` a bit + add comment

* typos/nits

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* fix dummy objects and imports

* Add chat template config json

* remove num_kv_heads from vision attention

* fix

* move some commits and add more tests

* fix test

* Remove `update_key_name` from modeling utils

* remove num-kv-heads again

* some prelimiary docs

* Update chat template + tests

* nit, conversion script max_num_tiles from params

* Fix warning for text-only generation

* Update conversion script for instruct models

* Update chat template in converstion + test

* add tests for CausalLM model

* model_max_length, avoid null chat_template

* Refactor conversion script

* Fix forward

* Fix integration tests

* Refactor vision config + docs

* Fix default

* Refactor text config

* Doc fixes

* Remove unused args, fix docs example

* Squashed commit of the following:

commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date:   Wed Sep 18 13:39:15 2024 +0000

    Move model + add output hidden states and output attentions

* Fix num_channels

* Add mllama text and mllama vision models

* Fixing repo consistency

* Style fix

* Fixing repo consistency

* Fixing unused config params

* Fix failed tests after refactoring

* hidden_activation -> hidden_act  for text mlp

* Remove from_pretrained from sub-configs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Reuse lambda in conversion script

* Remove run.py

* Update docs/source/en/model_doc/mllama.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/processing_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused LlamaTokenizerFast

* Fix logging

* Refactor gating

* Remove cycle for collecting intermediate states

* Refactor text-only check, add integration test for text-only

* Revert from pretrained to configs

* Fix example

* Add auto `bos_token` adding in processor

* Fix tips

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Enable supports_gradient_checkpointing model flag

* add eager/sdpa options

* don't skip attn tests and bring back GC skips (did i really remove those?)

* Fix signature, but get error with None gradient

* Fix output attention tests

* Disable GC back

* Change no split modules

* Fix dropout

* Style

* Add Mllama to sdpa list

* Add post init for vision model

* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model

* if skipped, say it, don't pass

* Clean vision tester config

* Doc for args

* Update tests/models/mllama/test_modeling_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add cross_attention_mask to test

* typehint

* Remove todo

* Enable gradient checkpointing

* Docstring

* Style

* Fixing and skipping some tests for new cache

* Mark flaky test

* Skip `test_sdpa_can_compile_dynamic` test

* Fixing some offload tests

* Add direct GenerationMixin inheritance

* Remove unused code

* Add initializer_range to vision config

* update the test to make sure we show if split

* fix gc?

* Fix repo consistency

* Undo modeling utils debug changes

* Fix link

* mllama -> Mllama

* [mllama] -> [Mllama]

* Enable compile test for CausalLM model (text-only)

* Fix TextModel prefix

* Update doc

* Docs for forward, type hints, and vision model prefix

* make sure to reset

* fix init

* small script refactor and styling

* nit

* updates!

* some nits

* Interpolate embeddings for 560 size and update integration tests

* nit

* does not suppor static cache!

* update

* fix

* nit2

* this?

* Fix conversion

* Style

* 4x memory improvement with image cache AFAIK

* Token decorator for tests

* Skip failing tests

* update processor errors

* fix split issues

* style

* weird

* style

* fix failing tests

* update

* nit fixing the whisper tests

* fix path

* update

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
Joao Gante
d6534f996b
Repo checks: check documented methods exist (#32320) 2024-09-03 17:40:27 +01:00
Juan Pizarro
7591ca5bc5
🚨 Add Blip2ForImageTextRetrieval (#29261)
* add Blip2ForImageTextRetrieval

* use one line and remove unnecessary space in tests

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use  value from the config, rather than hardcoded

* change order of params in Blip2QFormerModel.forward

* update docstring

* fix style

* update test_inference_opt

* move embeddings out of Blip2QFormerModel

* remove from_vision_qformer_configs

* remove autocast float16 in Blip2QFormerModel

* rename fiels into vision_projection,text_projection,use_image_text_matching_head

* use CLIPOutput for  Blip2ImageTextMatchingModelOutput

* remove past_key_values_length from Blip2TextEmbeddings

* fix small typo in the CLIPOutput docstring

* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping

* update docstring and add require_torch_fp16

* rollback test_inference_opt

* use use_image_text_matching_head=True in convert

* skip test_model_get_set_embeddings

* fix create_rename_keys error on new itm fields

* revert to do  scale after dot product between "query" and "key"

* fix ValueError on convert script for blip2-opt-2.7b

* update org of paths to Salesforce

* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests

* [run_slow] blip_2

* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED

* fix docstring of Blip2ImageTextMatchingModelOutput

* [run_slow] blip_2

* fix multi-gpu tests

* [run_slow] blip_2

* [run_slow] blip_2

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 18:50:27 +01:00
Shijie
19e6e80e10
support qwen2-vl (#32318)
* support-qwen2-vl

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* hyphen->underscore

* make style

* add-flash2-tipd

* delete-tokenize=False

* remove-image_processor-in-init-file

* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES

* format-doct

* support-Qwen2VLVisionConfig

* remove-standardize_cache_format

* fix-letter-varaibles

* remove-torch-in-image-processor

* remove-useless-docstring

* fix-one-letter-varaible-name

* change-block-name

* default-quick-gelu-in-vision

* remove-useless-doc

* use-preimplemented-flash-forward

* fix-doc

* fix-image-processing-doc

* fix-apply-rotary-embed

* fix-flash-attn-sliding-window

* refactor

* remove-default_template

* remove-reorder_cache

* simple-get-rope_deltas

* update-prepare_inputs_for_generation

* update-attention-mask

* update-rotary_seq_len

* remove-state

* kv_seq_length

* remove-warning

* _supports_static_cache

* remove-legacy-cache

* refactor

* fix-replace

* mrope-section-doc

* code-quality

* code-quality

* polish-doc

* fix-image-processing-test

* update readme

* Update qwen2_vl.md

* fix-test

* Update qwen2_vl.md

* nit

* processor-kwargs

* hard-code-norm_layer

* code-quality

* discard-pixel-values-in-gen

* fix-inconsistent-error-msg

* unify-image-video

* hidden_act

* add-docstring

* vision-encode-as-PreTrainedModel

* pixel-to-target-dtype

* update doc and low memoryvit

* format

* format

* channel-foramt

* fix vit_flashatt

* format

* inherit-Qwen2VLPreTrainedModel

* simplify

* format-test

* remove-one-line-func-in-image-processing

* avoid-one-line-reshape

* simplify-rotary_seq_len

* avoid-single-letter-variable

* no-for-loop-sdpa

* avoid-single-letter-variable

* remove-one-line-reshape

* remove-one-line-reshape

* remove-no-rope-in-vit-logic

* default-mrope

* add-copied-from

* more-docs-for-mrope

* polish-doc

* comment-and-link

* polish-doc

* single-letter-variables

* simplify-image-processing

* video->images

* kv_seq_len-update

* vision-rope-on-the-fly

* vision-eager-attention

* change-processor-order

---------

Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
Joao Gante
70d5df6107
Generate: unify LogitsWarper and LogitsProcessor (#32626) 2024-08-16 11:20:41 +01:00
Yunfei Chu
16ed0640be
Add Qwen2-Audio (#32137)
* add qwen2audio

* Update check_repo.py

* fix style

* fix test

* fix style

* add model size

* Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* switch the attention_mask and the feature_attention_mask

* add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py

* fix initialization

* update chat_template

* fix consistency issue after copy

* add docstrings to _merge_input_ids_with_audio_features

* add copied from to prepare_inputs_for_generation

* add more details to docs

* rm comment

* add init_std

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* update

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update tests

* rm ignore_index

* update processor

* rm ffmpeg_read

* Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update

* typo

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* fix quality

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* add official model

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-08 15:47:24 +02:00
Raushan Turganbay
24cfcc2114
Chameleon: add model (#31534)
* Chameleon model integration

Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>

* fix 7B, again. mask away image tokens

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove pretrained_config_map

* make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file

* remove tokenizer (use llama's); remove codechameleon tests

* a few copied from statements and minor changes

* copied from in ChameleonModel

* some copies in ChameleonForCausalLM

* a few more copies

* VQModel moved to ChameleonModel (as opposed to being in the processor)

* ChameleonProcessor ready

* Fix chameleon weights convert

* update conversion script

* clean-up processing

* update modeling a bit

* update

* update (throws error...)

* correct conversion ready

* fix tests

* fix docs

* docs

* ve swin norm

* fix device for vocab map

* add normalization

* update

* update script with rope rotations

* final fix on model conversion

* add slow tests

* more info in docs

* fix repo consistency tests

* fix repo tests

* fix-copies

* hope this will make CI happy

* fix for 30b model

* Update docs/source/en/index.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/chameleon/test_modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/chameleon/test_modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/chameleon/test_modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address comments

* remove assertion in conversion script

* add image processor test

* not copied

* port changes for qk layernorm

* fix-copies

* read token decorator for tests

* [run-slow] chameleon

* one more read-token

* address some comments

* qk norm changes

* tests and repo check

* moved rope permutations to conversion, YAY!

* fix past kv check

* docs

* layernorm done!

* let's be consistent in naming

* fix slow tests

* weird thing with slow CI, but let's see

* once more try

* remove past-kv as tuple following llama

* ignore

* style

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: jacobkahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Leonid Shamis <lshamis@meta.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-17 10:41:43 +05:00
Naman Garg
c1e139c2b0
Adding hiera (#30356)
* initialized Structure

* Updated variable names

* Added Config class, basic HF setup, convert_to_hf

* Fixed Convert function, added hiera to HF files, Initilized test files

* better naming for x in forward pass

* Moved utils to hiera

* Change hiera -> hiera_model

* Fixed integration into tranformers

* Fix: Convert Checkpoint

* added documentation for hiera

* added documentation for hiera

* added Docstings to models, Transformers based changes

* make style and quality

* make style and quality

* Integration & Block tests running

* Fixed bugs

* initialized Structure

* Updated variable names

* Added Config class, basic HF setup, convert_to_hf

* Fixed Convert function, added hiera to HF files, Initilized test files

* better naming for x in forward pass

* Moved utils to hiera

* Change hiera -> hiera_model

* Fixed integration into tranformers

* Fix: Convert Checkpoint

* added documentation for hiera

* added documentation for hiera

* added Docstings to models, Transformers based changes

* make style and quality

* make style and quality

* Integration & Block tests running

* Fixed bugs

* Removed tim dependency

* added HieraBlock

* fixed: Model name

* added tests for HieraModel, HieraBlock

* fixed imports

* fixed quality & copies

* Fixes

* Update docs/source/en/model_doc/hiera.md

Fix name

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hiera.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hiera.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/configuration_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/configuration_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fixed formatting

* Code quality & Import differences

* quality and repo-consistency fix

* fixed no torch error

* Docstring fix

* Docstring fix

* doc string fix

* fixed example usage

* Resolved issues in modeling_hiera

* Removed Hiera MAE

* Added test and resolved bug

* fixed doc string

* First commit

* Finished conversion script and model forward working

* Resolved all issues

* nits

* Improving tests

* Nits

* More nits

* Improving HieraForMaskedImageModeling

* More improvements and nits

* Fixed docstrings of outputs

* More fixes

* More imrpovments

* Updated conversion script

* Fixed docstrings

* Improved tests

* Fixed attentou outputs test

* All tests green

* Removed unnecessary file

* contribution attribution

* Resolved a few issues

* Resolved Comments

* Updated model repo id and fixed bugs

* Removed loss print

* Make tests green

* Updated docstrings

* Fix style

* Fixed num_heads in config

* Removed unnecessary video checkpoint related code in the conversion script

* Fix style

* Changed atol in conversion script

* HieraConfig

* Fix copies

* Fixed typo

* Resolved few issues

* make

* converted conv_nd -> nn.Module

* Removed video complexities

* Removed video complexities

* fix style

* Addressing comments

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix style

* Fixed tests

* Fixed typo

* Fixed interpolate test

* Made torch fx compatible

* Made sure imageprocesor is correct

* Addressed comments

* Noise directly as torch

* Remove unnecesary attr

* Added return_dit

* Update src/transformers/models/hiera/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated checkpoints

* [run_slow] hiera

* Fixed device mismatch

* [run_slow] hiera

* Fixed GPU tests

* [run_slow] hiera

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-11 22:13:56 +01:00
Raushan Turganbay
fc689d75a0
Add video modality for InstrucBLIP (#30182)
* squash in single commit

* add docs

* dummy obj

* more changes in diff converter

* tiny fix

* make docs happy

* skip test

* repo consistency tests

* update docstring

* style

* fix tests

* change diff imports

* [run-slow] instructblipvideo

* [run-slow] instructblipvideo

* fix tests and remove logit check

* [run-slow] instructblipvideo
2024-06-25 15:45:39 +05:00
Arthur
673440d073
update ruff version (#30932)
* update ruff version

* fix research projects

* Empty

* Fix errors

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-05-22 06:40:15 +02:00
Arthur
0fe44059ae
Add recurrent gemma (#30143)
* Fork.

* RecurrentGemma initial commit.

* Updating __init__.py.

* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.

* Reformat code to 4 spaces.
Fixed a few typos.

* Fixed the forward pass.
Still unclear on the cache?

* Fixed the RecurrentGemmaForCausalLM

* Minor comment that we might not need attention_mask and output_attention arguments.

* Now cache should work as well.

* Adding a temporary example to check whether the model generation works.

* Adding the tests and updating imports.

* Adding the example file missing in the previous commit.

* First working example.

* Removing .gitignore and reverting parts of __init__.

* Re-add .gitignore.

* Addressing comments for configuration.

* Move mask creation to `_prepare_inputs_for_generation`.

* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed

* Transfoering between machines.

* Running normal tests.

* Minor fix.

* More fixes.

* Addressing more comments.

* Minor fixes.

* first stab at cleanup

* more refactoring

* fix copies and else

* renaming and get init to work

* fix causal mask creation

* update

* nit

* fix a hell lot of things

* updates

* update conversion script

* make all keys importable

* nits

* add auto mappings

* properly convert ffw_up and down

* add scaling

* fix generations

* for recurrent dtype

* update

* fix going beyong window

* fixup

* add missing files

* current updates to remove last einops

* finish modeling refactor

* TADA

* fix compile

* fix most failing testt ? ?

* update tests

* refactor and update

* update

* nits, fixup and update tests

* more fixup

* nits

* fix imports

* test format

* fixups

* nits

* tuple typing

* fix code quality

* add model card

* fix doc

* skip most generation tests

* nits

* style

* doc fixes

* fix pr and check_copies?

* last nit

* oupsy

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update

* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update based on review

* doc nit

* fix quality

* quality

* fix slow test model path

* update default dype

* ignore attributes that can be safely ignored in check config attributes

* 0lallalala come on

* save nit

* style

* remove to dict update

* make sure we can also run in float16

* style

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-10 16:59:13 +02:00
Yoach Lacombe
c43b380e70
Add MusicGen Melody (#28819)
* first modeling code

* make repository

* still WIP

* update model

* add tests

* add latest change

* clean docstrings and copied from

* update docstrings md and readme

* correct chroma function

* correct copied from and remove unreleated test

* add doc to toctree

* correct imports

* add convert script to notdoctested

* Add suggestion from Sanchit

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct get_uncoditional_inputs docstrings

* modify README according to SANCHIT feedback

* add chroma to audio utils

* clean librosa and torchaudio hard dependencies

* fix FE

* refactor audio decoder -> audio encoder for consistency with previous musicgen

* refactor conditional -> encoder

* modify sampling rate logics

* modify license at the beginning

* refactor all_self_attns->all_attentions

* remove ignore copy from causallm generate

* add copied from for from_sub_models

* fix make copies

* add warning if audio is truncated

* add copied from where relevant

* remove artefact

* fix convert script

* fix torchaudio and FE

* modify chroma method according to feedback-> better naming

* refactor input_values->input_features

* refactor input_values->input_features and fix import fe

* add input_features to docstrigs

* correct inputs_embeds logics

* remove dtype conversion

* refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation

* change warning for chroma length

* Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* change way to save wav, using soundfile

* correct docs and change to soundfile

* fix import

* fix init proj layers

* remove line breaks from md

* fix issue with docstrings

* add FE suggestions

* improve is in logics and remove useless imports

* remove custom from_pretrained

* simplify docstring code

* add suggestions for modeling tests

* make style

* update converting script with sanity check

* remove encoder attention mask from conditional generation

* replace musicgen melody checkpoints with official orga

* rename ylacombe->facebook in checkpoints

* fix copies

* remove unecessary warning

* add shape in code docstrings

* add files to slow doc tests

* fix md bug and add md to not_tested

* make fix-copies

* fix hidden states test and batching

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-03-18 13:06:12 +00:00
Nate Cibik
1fc505b816
Add PvT-v2 Model (#26812)
* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Updated index.md

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Fixed config docstring. Added channels property

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Fixed config backbone compat

* Ran fix-copies

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Fixed issue from rebase

* Fixed issue from rebase

* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported

* Fixed issue from rebase

* Fixed issue from rebase

* Changed model name in docs

* Removed duplicate PvtV2Backbone

* Work around type switching issue in tests

* Fix model name in config comments

* Update docs/source/en/model_doc/pvt_v2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed old code

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Fixed Class names to be more descriptive

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed outdated code

* Moved paper abstract to single line in pvt_v2.md

* Added usage tips to pvt_v2.md

* Simplified module inits by passing layer_idx

* Fixed typing for hidden_act in PvtV2Config

* Removed unusued import

* Add pvt_v2 to docs/source/en/_toctree.yml

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Move function parameters to single line

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Update year of copyright to 2024

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Make code more explicit

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated sr_ratio to be more explicit spatial_reduction_ratio

* Removed excess type hints in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Move params to single line in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed needless comment in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update copyright date in pvt_v2.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved params to single line in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated copyright date in configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Cleaned comments in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Renamed spatial_reduction Conv2D operation

* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
"

This reverts commit c4a04416dd.

* Updated conversion script to reflect module name change

* Deprecated reshape_last_stage option in config

* Removed unused imports

* Code formatting

* Fixed outdated decorators on test_inference_fp16

* Added "Copied from" comments in test_modeling_pvt_v2.py

* Fixed import listing

* Updated model name

* Force empty commit for PR refresh

* Fixed linting issue

* Removed # Copied from comments

* Added PVTv2 to README_fr.md

* Ran make fix-copies

* Replace all FoamoftheSea hub references with OpenGVLab

* Fixed out_indices and out_features logic in configuration_pvt_v2.py

* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py

* Ran code fixup

* Fixed order of parent classes in PvtV2Config to fix the to_dict method override

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-13 19:05:20 +00:00
NielsRogge
836921fdeb
Add UDOP (#22940)
* First draft

* More improvements

* More improvements

* More fixes

* Fix copies

* More improvements

* More fixes

* More improvements

* Convert checkpoint

* More improvements, set up tests

* Fix more tests

* Add UdopModel

* More improvements

* Fix equivalence test

* More fixes

* Redesign model

* Extend conversion script

* Use real inputs for conversion script

* Add image processor

* Improve conversion script

* Add UdopTokenizer

* Add fast tokenizer

* Add converter

* Update README's

* Add processor

* Add fully fledged tokenizer

* Add fast tokenizer

* Use processor in conversion script

* Add tokenizer tests

* Fix one more test

* Fix more tests

* Fix tokenizer tests

* Enable fast tokenizer tests

* Fix more tests

* Fix additional_special_tokens of fast tokenizer

* Fix tokenizer tests

* Fix more tests

* Fix equivalence test

* Rename image to pixel_values

* Rename seg_data to bbox

* More renamings

* Remove vis_special_token

* More improvements

* Add docs

* Fix copied from

* Update slow tokenizer

* Update fast tokenizer design

* Make text input optional

* Add first draft of processor tests

* Fix more processor tests

* Fix decoder_start_token_id

* Fix test_initialization

* Add integration test

* More improvements

* Improve processor, add test

* Add more copied from

* Add more copied from

* Add more copied from

* Add more copied from

* Remove print statement

* Update README and auto mapping

* Delete files

* Delete another file

* Remove code

* Fix test

* Fix docs

* Remove asserts

* Add doc tests

* Include UDOP in exotic model tests

* Add expected tesseract decodings

* Add sentencepiece

* Use same design as T5

* Add UdopEncoderModel

* Add UdopEncoderModel to tests

* More fixes

* Fix fast tokenizer

* Fix one more test

* Remove parallelisable attribute

* Fix copies

* Remove legacy file

* Copy from T5Tokenizer

* Fix rebase

* More fixes, copy from T5

* More fixes

* Fix init

* Use ArthurZ/udop for tests

* Make all model tests pass

* Remove UdopForConditionalGeneration from auto mapping

* Fix more tests

* fixups

* more fixups

* fix the tokenizers

* remove un-necessary changes

* nits

* nits

* replace truncate_sequences_boxes with truncate_sequences for fix-copies

* nit current path

* add a test for input ids

* ids that we should get taken from c9f7a32f57

* nits converting

* nits

* apply ruff

* nits

* nits

* style

* fix slow order of addition

* fix udop fast range as well

* fixup

* nits

* Add docstrings

* Fix gradient checkpointing

* Update code examples

* Skip tests

* Update integration test

* Address comment

* Make fixup

* Remove extra ids from tokenizer

* Skip test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update year

* Address comment

* Address more comments

* Address comments

* Add copied from

* Update CI

* Rename script

* Update model id

* Add AddedToken, skip tests

* Update CI

* Fix doc tests

* Do not use Tesseract for the doc tests

* Remove kwargs

* Add original inputs

* Update casting

* Fix doc test

* Update question

* Update question

* Use LayoutLMv3ImageProcessor

* Update organization

* Improve docs

* Update forward signature

* Make images optional

* Remove deprecated device argument

* Add comment, add add_prefix_space

* More improvements

* Remove kwargs

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-04 18:49:02 +01:00
Eduardo Pacheco
3fcfbe7549
Adding SegGPT (#27735)
* First commit

* Improvements

* More improvements

* Converted original checkpoint to HF checkpoint

* Fix style

* Fixed forward

* More improvements

* More improvements

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Remove asserts

* Remove unnecessary attributes

* Changed model name to camel case

* Improve forward doc

* Improve tests

* More improvements

* Fix copies

* Fix doc

* Make SegGptImageProcessor more flexible

* Added few-shot test

* Fix style

* Update READMEs and docs

* Update READMEs

* Make inputs required

* Add SegGptForImageSegmentation

* Make tests pass

* Rename to out_indicies

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fixed naming convention

* Copying SegGptMlp from modeling_sam.py

* Some minor improvements

* Remove mlp_ratio

* Fix docstrings

* Fixed docstring match

* Objects defined before use

* Storing only patch_size and beta for SegGptLoss

* removed _prepare_inputs method

* Removed modified from headers

* Renamed to output_indicies

* Removed unnecessary einsums

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixing issues

* Raise error as soon as possible

* More fixes

* Fix merge

* Added palette to SegGptImageProcessor

* Fixed typo

* Fixed shape typo

* Added permute before doing palette to class mapping

* Fixed style

* Fixed and added tests

* Fixed docstrings

* Matching SegFormer API for post_processing_semantic_segmentation

* Fixed copies

* Fixed SegGptImageProcessor to handle both binary and RGB masks

* Updated docstrings of SegGptImageProcessor

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/seggpt.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Object definitions above & fix style

* Renamed output_indices to intermediate_feature_indices

* Removed unnecessary check on bool_masked_pos

* Loss first in the outputs

* Added validation for do_normalize

* Improved SegGptImageProcessor and added new tests

* Added comment

* Added docstrings to SegGptLoss

* Reimplemented ensemble condition logic in SegGptEncoder

* Update src/transformers/models/seggpt/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Updated docstrings to use post_process_semantic_segmentation

* Fixed typo on docstrings

* moved pixel values test to test_image_processing_seggpt

* Addressed comments

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated docstrings for SegGptLoss

* Address comments

* Added SegGpt example to model docs

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* moved patchify and unpatchify

* Rename checkpoint

* Renamed intermediate_features to intermediate_hidden_states for consistency

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Replaced post_process_masks for post_process_semantic_segmentation in the docs

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-26 18:17:19 +00:00
Merve Noyan
7c4995f93d
Add feature extraction mapping for automatic metadata update (#28944)
* add feature extraction mapping

* added prefix

* ruff check

* minor fix

* Update modeling_auto.py

* fix typo

* remove prefix to make variable public/importable

* Update src/transformers/models/auto/modeling_auto.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixes

* addressed comments

* nit

* fix-copies

* remove from tests

* this should fix

* Update tests/models/convnextv2/test_modeling_convnextv2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* nits

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-26 10:35:37 +00:00
Timothy Cronin
ff86bc364d
improve dev setup comments and hints (#28495)
* improve dev setup comments and hints

* fix tests for new dev setup hints
2024-01-15 18:36:40 +00:00
Joao Gante
4fb3d3a0f6
TF: purge TFTrainer (#28483) 2024-01-12 16:56:34 +00:00
NielsRogge
3b742ea84c
Add SigLIP (#26522)
* Add first draft

* Use appropriate gelu function

* More improvements

* More improvements

* More improvements

* Convert checkpoint

* More improvements

* Improve docs, remove print statements

* More improvements

* Add link

* remove unused masking function

* begin tokenizer

* do_lower_case

* debug

* set split_special_tokens=True

* Remove script

* Fix style

* Fix rebase

* Use same design as CLIP

* Add fast tokenizer

* Add SiglipTokenizer to init, remove extra_ids

* Improve conversion script

* Use smaller inputs in conversion script

* Update conversion script

* More improvements

* Add processor to conversion script

* Add tests

* Remove print statements

* Add tokenizer tests

* Fix more tests

* More improvements related to weight initialization

* More improvements

* Make more tests pass

* More improvements

* More improvements

* Add copied from

* Add canonicalize_text

* Enable fast tokenizer tests

* More improvements

* Fix most slow tokenizer tests

* Address comments

* Fix style

* Remove script

* Address some comments

* Add copied from to tests

* Add more copied from

* Add more copied from

* Add more copied from

* Remove is_flax_available

* More updates

* Address comment

* Remove SiglipTokenizerFast for now

* Add caching

* Remove umt5 test

* Add canonicalize_text inside _tokenize, thanks Arthur

* Fix image processor tests

* Skip tests which are not applicable

* Skip test_initialization

* More improvements

* Compare pixel values

* Fix doc tests, add integration test

* Add do_normalize

* Remove causal mask and leverage ignore copy

* Fix attention_mask

* Fix remaining tests

* Fix dummies

* Rename temperature and bias

* Address comments

* Add copied from to tokenizer tests

* Add SiglipVisionModel to auto mapping

* Add copied from to image processor tests

* Improve doc

* Remove SiglipVisionModel from index

* Address comments

* Improve docs

* Simplify config

* Add first draft

* Make it like mistral

* More improvements

* Fix attention_mask

* Fix output_attentions

* Add note in docs

* Convert multilingual model

* Convert large checkpoint

* Convert more checkpoints

* Add pipeline support, correct image_mean and image_std

* Use padding=max_length by default

* Make processor like llava

* Add code snippet

* Convert more checkpoints

* Set keep_punctuation_string=None as in OpenCLIP

* Set normalized=False for special tokens

* Fix doc test

* Update integration test

* Add figure

* Update organization

* Happy new year

* Use AutoModel everywhere

---------

Co-authored-by: patil-suraj <surajp815@gmail.com>
2024-01-08 18:17:16 +01:00