Commit Graph

1008 Commits

Author SHA1 Message Date
BakerBunker
4b8c6d4cf8
Add Qwen2.5-Omni (#36752)
* Add qwen2.5-omni

* Remove einops dependency

* Add torchdiffeq dependency

* Sort init

* Add torchdiffeq to extras['diffeq']

* Fix repo consistency

* use cached_file

* del odeint

* renew pytest

* format

* Remove torchdiffeq

* format

* fixed batch infer bug

* Change positional_embedding to parameter

* Change default speaker

* Config revision

* Use modular & code clean

* code clean

* decouple padding with model & code cleaning

* sort init

* fix

* fix

* Second code review

* fix

* fix

* rename vars to full name + some comments

* update pytest

* Code clean & fix

* fix

* style

* more clean up

* fixup

* smaller vision model in tests

* fix processor test

* deflake a bit the tests (still flaky though)

* de-flake tests finally + add generation mixin

* final nits i hope

* make sure processor tests are complete

* replace with Qwen2_5OmniForConditionalGeneration

* fix tests after updating ckpt

* fix typos when cleaning, also we can't change ckpt

* fixup

* images and videos kwargs for processor

* thinker and talker loadable from hub ckpt

* address comments and update tests after rebase

* fixup

* skip for now

* fixup

* fixup

* remove torch dependency in processors

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.con>
Co-authored-by: feizi.wx <feizi.wx@alibaba-inc.com>
Co-authored-by: raushan <raushan@huggingface.co>
2025-04-14 12:36:41 +02:00
Joao Gante
aaf129cdae
[agents] remove agents 🧹 (#37368) 2025-04-11 18:42:37 +01:00
Alex Brooks
623d395aff
Add Granite Speech Support (#36801)
* First pass at speech granite

Add encoder / projector, rename things

* Combine into one model file with causal lm outputs for forward

* Add loss calc

* Fix config loading

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Split new / old loading logic

* Use transformers integration for loading peft adapters

* Add generation wrapper for selective lora enablement

* Add note for qformer encoder automodel

* Guard torch/audio imports in feature extractor

* Handle granite speech autoclasses

* Handle optional deps in package structure for granite speech

* Add granite pretrained model def for init

* Add dummy objects for torch/torchaudio

* Add tests for granite speech processor

* Minor formatting fixes and refactoring

* Add options for falling back to config in forward

* Tentative model docstrings for granite speech

* Fix config type

* Remove legacy load

* Allow non-lora variants for granite speech

* Override weight tying for llm

* Use text config instead of llm config

* Add output embeddings getter to fix weight tying

* Fix relative imports

* computing the number of audio features, based on the raw audio sequence.

* collating audio inputs, and keeping the original lengths.

* asserted we have text. otherwise we can't specify the audio special token.

* assering the number of audio-symbols/audios match correctly.
running get validated_audios only when audio is present

* indentation bugfix + supporting different feature lengths when expanding audio.

* redundant, done in _get_validated_text

* adapting the tests:
- we must have text (not either audio or text)
- _get_num_audio_features takes a list of raw lengths, provided it insetad.

* Minor cleanup, remove unused import

* Add more tests for batch feature processing

* Allow setting offset in rel position embeddings

* Add config option for warning if peft is not installed w/ lora

* Port blip2 qformer code into granite speech

* Add sad test for numpy arr processing

* Allow numpy arrays / tuples in granite speech processor

* Fix config type for projector

* - pad instead of creating a zeros tensor, to keep the original dtype/device (support bfloat16)
- cast input_features to the model dtype (support bfloat16)

* merge Blip2QFormerConfig to GraniteSpeechProjectorConfig

* prevent a crash when re-saving/loading the model (line 109)

* consider additional edge cases during preprocessing.

* consider additional edge cases during preprocessing.

* add features mask for batched inference (bugfix)

* Minor refactor, remove multiaudio processor tests

* Add set input/output embeddings for granite speech

* Fix feature dim check in processor test

* Pop input features in embed test for granite speech

* Small fixes for test edge cases

Add granite speech to seq2seq causal lm mapping names

* Add small tests for granite speech model

* Fix data parallelism test

* Standardize model class names

* Fix check for copies

* Fix misaligned init check

* Skip granite speech in checkpoint check

* Use default for tie_word_embeddings in granite speech

* Fix non documentation granite speech repo issues

* Fix comments and docstring checks

* Add placeholder docs for granite speech

* Fix test naming collision

* Code formatting

* Rerun torch dummy obj regen

* Fix save pretrained for granite speech

* Import sorting

* Fix tests typo

* Remove offset hack

* Pass args through encoder config

* Remove unused prune heads from blip2

* removing einsum. replaced with explicit multiplication (relative positional encodings) and sdpa attention.

* remove Sequential from ConformerFeedForward and ConformerConvModule. + fix for sdpa attention

* remove GraniteSpeechConformerScale

* rename to hidden_states

* rename conformer layers to self.layers, remove the first linear from the list to keep the list homogenous.

* move pre-norm to the attention/feedforward blocks (avoid complex module wrapping)

* adding pre_norm into forward

* feature extractor refactoring to resemble how it's done in phi4multimodal.

* rename feature_extractor to audio_processor

* bugfix: input_feature_mask fix to get the exact number tokens.

* Fix pytest decorator in processor test

* Add (disabled) integration tests for granite speech

* Fix handling of optional feature masking

* Loosen validation in processing for vLLM compatability

* Formatting fixes

* Update init structure to mirror llama

* Make granite speech projector generic

* Update test config to reflect generic projector

* Formatting fixes

* Fix typos, add license

* Fix undefined var in input processing

* Cleanup and expose ctc encoder

* Add missing config docstrings

* Better var names, type hints, etc

* Set attn context size in init

* Add max pos emb to encoder config

* Cleanup feature extractor

* Add granite speech architecture details

* Remove granite speech qformer ref

* Add paper link, explicit calc for qkv

* Calculate padding directly in depthwise conv1d init

* Raise value error instead of asserting

* Reorder class defs (classes used at top)

* Precompute relpos distances

* Run formatting

* Pass attention distances through forward

* Apply suggestions from code review

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

* Add todo for using common batch feature extraction

* Rename audios/features

* Ensure chat template may be provided to processor

* Move granite speech docs to audio models

* Add todos for input proc refactoring

* Fix import order

* Guard torch import

* Use relative imports

* Require torch backend for processor in granite speech

* Add backend guards in feature extractor

---------

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-04-11 18:52:00 +02:00
Lysandre Debut
f797e3d98a
Fix the test fetcher (#37452)
Test fetcher
2025-04-11 12:19:27 +02:00
Lysandre Debut
54a123f068
Simplify soft dependencies and update the dummy-creation process (#36827)
* Reverse dependency map shouldn't be created when test_all is set

* [test_all] Remove dummies

* Modular fixes

* Update utils/check_repo.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* [test_all] Better docs

* [test_all] Update src/transformers/commands/chat.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* [test_all] Remove deprecated AdaptiveEmbeddings from the tests

* [test_all] Doc builder

* [test_all] is_dummy

* [test_all] Import utils

* [test_all] Doc building should not require all deps

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-11 11:08:36 +02:00
Yih-Dar
d9e76656ae
Fix new failure reports not including anything other than tests/models/ (#37415)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 14:47:23 +02:00
Yih-Dar
4f139f5a50
Send trainer/fsdp/deepspeed CI job reports to a single channel (#37411)
* send trainer/fsdd/deepspeed channel

* update

* change name

* no .

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 13:17:31 +02:00
Alex Brooks
2515a5a290
Expose blip2qformer (#37254)
* Expose blip2qformer

* Add missing args to blip2 config
2025-04-08 12:04:33 +02:00
Arthur
2da82e432d
Multiple llama4 fixe (#37353)
* update for fixes

* more fixes

* fuxix dynamic cache?

* style

* fix both traiining and generating. Eager seems alright

* dynamic does not work

* fix most cases, use_cache or not, eager or not, no default cache (ex: not training but you want to get cache states)

* should be final fixes

* fix more stuff no cat

* style

* fix

* style

* final sytle

* qualityeioiwhjfaopsejdpofqsdjkfjha;wesdhgfkjlqsw.denghjkaswednkgs

* fix

* revert
2025-04-08 11:14:49 +02:00
Arthur
25b7f27234
Add llama4 (#37307)
* remove one of the last deps

* update fast image processor after refactor

* styling

* more quality of life improvements

* nit

* update

* cleanups

* some cleanups

* vllm updates

* update fake image token

* [convert] Fix typo

* [convert] Strip extraneous bytes from shards

* [convert] Minor fixes

* [convert] Use num_experts

* multi-image fixes in modeling + processor

* fixup size

* 128 experts

* Use default rope

* Unfuse mlp

* simplify a lot inputs embeds merging

* remove .item() 👀

* fix from review

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* set seed

* return aspect ratios and bug fixes

* Moe 128 rebased (#8)

* 128 experts

* Use default rope

* Unfuse mlp

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* Meta/llama quant compat (#7)

* add quant compatible model & conversion code for llama4

* fix a few issues

* fix a few issues

* minor type mapping fix

---------

Co-authored-by: Lu Fang <fanglu@fb.com>

* use a new config parameter to determine which model definition to use for MoE

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>

* un-comment write_tokenizer from converting script

* remove un-used imports

* [llama4] Pop aspect_ratios from image processor output in Llama4Processor

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* Fix parameter_count name

* Update src/transformers/models/llama4/configuration_llama4.py

* nit

* Add changes for no_rope, moe_layers, chunked attention. Just need to test all

* Update src/transformers/models/llama4/image_processing_llama4_fast.py

* nit

* fix post merge with main

* support flex attention

* fixes

* fix

* add layer

* small updates

* rebase and delete llm_compressor

* nit

* [llama4/mm] Add back <|image|> token that delimits global tile

* [llama4/mm] Fix Llama 4 image processing unit tests

* add explicit dtype

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* sdpa works

* comment todo small

* fix model loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* revert

* nits

* small fix for TP on 1 node

* Read new params from config

* Add <|eom|>

* lol don't know how this got here

* adding fp8

* Save processor, fix chat template

* style

* Add boi/eoi tokens

We don't use them.

* fixes for now flex seems to work :)

* updates

* nits

* updates

* missking keys

* add context parallel

* update

* update

* fix

* nits

* add worldsize and make eager attn work for vision

* Ignore new key present in base models

* add tp_plan

* fix nope

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* minor fix

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* Clean up Llama4 vision model

* current updates

* add support for `attn_temperature_tuning`

* add floor scale

* add missing attn scales

* push what works, dirty trick for the device synch

* oups

* Fix pad_token_id

See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.

* fix causallml loading

* rm

* fix tied-weights

* fix sdpa

* push current version

* should work with both short and long

* add compressed_tensos & fix fbgemm tp

* Fix flex impl

* style

* chunking

* try to revert the potentially breaking change

* fix auto factory

* fix shapes in general

* rm processing

* commit cache utils cleanup

* Fix context length

* fix

* allocate

* update tp_plan

* fix SDPA!

* Add support for sparse `Llama4TextMoe` layer from the kernel hub

* cleanup

* better merge

* update

* still broken fixing now

* nits

* revert print

* Write max_position_embeddings and max_model_length

* Update modeling_llama4.py

* Save attention_chunk_size

* Sync eos terminators

* Read initializer_range

* style

* remove `dict`

* fix

* eager should use `chunked_attention_mask`

* revert

* fixup

* fix config

* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"

This reverts commit ccda19f050, reversing
changes made to a515579aed.

* Fix typo and remove warning with compiled flex and chunked prefill

* Fix MoE vs FF (#41)

* fix

* Use correct no_rope_layers if provided one is empty list

* update tests

* fix

* skipping some tests

* fix fp8 loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* fix text geneartion pipeline

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* eager needs 4D mask

* fix

* Some cleanup

* fix

* update

* fix

* replace correctly module

* patch

* modulelist

* update

* update

* clean up

* Don't move to `cuda:0` in distributed mode

* restrict to compressed tensors for now

* rm print

* Docs!

* Fixes

* Update docs/source/en/model_doc/llama4.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fixes

* cuda graph fix

* revert some stuff

* fixup

* styling

* Update src/transformers/models/llama4/modeling_llama4.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* commit licence, cleanup here and there and style

* more styling changes

* fix dummies

* fix and clean docstrings

* remove comment

* remove warning

* Only fast image processor is supported

* nit

* trigger CI

* fix issue with flex encoder

* fix dynamic cache

* Code quality

* Code quality

* fix more tests for now

* Code quality

* Code quality

* Nuke bunch of failing stuff

* Code quality

* Code quality

* cleanup removal of slow image processor

* ruff fix fast image processor

* fix

* fix styling

* Docs

* Repo consistency

* Repo consistency

* fix sliding window issue

* separate llama cache

* styling

* Repo consistency

* Repo consistency

* push waht works

* L4 Repo consistency

* Docs

* fix last last alst alst alst alstsaltlsltlaslt

---------

Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: Jon Swenson <jmswen@gmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-05 22:02:22 +02:00
Joao Gante
ad3d157188
[RoPE] abstract dynamic RoPE update under a decorator (#37249)
* dynamic rope decorator

* longrope; shorter fwd pass

* propper docstring

* make fixup
2025-04-04 14:27:28 +01:00
Yih-Dar
b016de1ae4
Fix utils/check_bad_commit.py (#37272)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-04 12:18:20 +02:00
Nikos Antoniou
f74d7da836
Introduce modular files for speech models (#35902)
* WAV_2_VEC_2 to WAV2VEC2

* added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio

* remove unnessary definitions in modulars

* added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer

* docstring fix for UniSpeechForCTC

* removed unneccessary re-definition of modular classes

* reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput

* top-level import of deepspeed in seamless_m4t, speecht5

* avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations

* convert modular

* tiny modular typing fixes

* some more modular fixes

* make style

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-04-04 11:46:27 +02:00
Yih-Dar
adfc91cd46
Try to avoid/reduce some remaining CI job failures (#37202)
* try

* try

* Update tests/pipelines/test_pipelines_video_classification.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-02 14:39:57 +02:00
Yih-Dar
35253076f4
Avoid pipeline test failing related to Hub call (#37170)
* cls

* cls

* cls

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-01 18:22:45 +02:00
Bo Zheng
6acd5aecb3
Adding Qwen3 and Qwen3MoE (#36878)
* Initial commit for Qwen3

* fix and add tests for qwen3 & qwen3_moe

* rename models for tests.

* fix

* fix

* fix and add docs.

* fix model name in docs.

* simplify modular and fix configuration issues

* Fix the red CI: ruff was updated

* revert ruff, version was wrong

* fix qwen3moe.

* fix

* make sure MOE can load

* fix copies

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-03-31 09:50:49 +02:00
cyyever
6cc9c8d7d1
Remove deprecated batch_size parameter (#37007) 2025-03-27 15:01:56 +00:00
Yih-Dar
181d453069
byebye CircleCI TF jobs (#36998)
* byebye tf jobs

* byebye tf jobs

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 12:49:50 +01:00
Yih-Dar
c6814b4ee8
Update ruff to 0.11.2 (#36962)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
Cyril Vallez
4303d88c09
Add Phi4 multimodal (#36939)
* raw start

* update

* update

* add to imports

* update

* up

* simplify configs

* clean configs

* style

* typos

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* fix

* up

* up

* up

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* up

* up

* up

* Update feature_extraction_phi4_multimodal.py

* up

* up

* up

* up

* up

* simplify configs

* typo

* cut code

* typo

* typo

* typo

* re

* typo

* up

* up

* up

* add tests

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* up

* Update test_modeling_phi4_multimodal.py

* doc

* fix

* up

* up

* up

* up

* up

* up

* simplify

* up

* simplify

* config docstrings

* cleanup

* clean

* typo

* typo

* fix

* Update phi4_multimodal.md

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* update

* simplify reshapes and permutes

* up

* simplify special tokens

* simplify processor a lot

* Update processing_phi4_multimodal.py

* Update processing_phi4_multimodal.py

* switch to fast processor

* image processor

* Update image_processing_phi4_multimodal_fast.py

* add lora extraction to converter

* Update convert_phi4_multimodal_weights_to_hf.py

* Update __init__.py

* add AudioInput type in audio_utils

* rewrite feature_extraction: support torch batched FFT

* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values

* test update

* not mono channel warning update

* remove auto maps from processor

* kargs dispatch in processor

* simplify kwargs dispatch

* simplify merging

* remove default sampling rate

* style

* Update test_modeling_phi4_multimodal.py

* update doc

* doc

* torch only feature extractor

* make fake tokens adjustable

* Update feature_extraction_phi4_multimodal.py

* fix

* Update processing_phi4_multimodal.py

* simplify mask

* last touch

* fix copies

* style

* Update audio_utils.py

* style

* Update feature_extraction_phi4_multimodal.py

* Update __init__.py

* docstrings

* copies

* fix all checks

* back to fix-copies

* trigger CIs

* Update feature_extraction_phi4_multimodal.py

* improve tests with multimodal inputs

* trigger CIs

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00
Daniël de Kok
f94b0c59f2
Use deformable_detr kernel from the Hub (#36853)
* Use `deformable_detr` kernel from the Hub

Remove the `deformable_detr` kernel from `kernels/` and use the
pre-built kernel from the Hub instead.

* Add license header

* Add `kernels` as an extra `hub-kernels`

Also add it to `testing`, so that the kernel replacement gets tested
when using CUDA in CI.
2025-03-21 13:08:47 +01:00
Pablo Montalvo
1d3f35f30a
Add model visual debugger (#36798)
* draft of model tracer visualiser

* add context manager in addition to decorator

* add debug utils to init

* move model debugging utils to dedicated file

* add documentation

* protect some imports

* format

* move and protect imports

* format

* doc: improve errors in case of broken dummy imports.

* format

* use automatic torch backend

* update doc

* fix backend

* (TEMP) move to dummies while backend wait

* update documentation

* doc
2025-03-20 17:37:29 +01:00
Ryan Mullins
487dab1b2b
Shieldgemma2 (#36678)
* single commit

* correct config

* fixup

* dummy pt

* Use ShieldGemma2Config in conversion script

* Update src/transformers/models/shieldgemma2/configuration_shieldgemma2.py

* Adding shieldgemma2 to models.__init__.py

* Adding ShieldGemma2 to main __init__.py

* Update shieldgemma2.md

* Update shieldgemma2.md

* Adding tests. Addressing review feedback.

* Minor docs update

* Fixing code quality feedback from CI

* Fixing empty messages bug reported by ghunkins

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ren Pang <ain-soph@live.com>
2025-03-20 15:14:38 +01:00
Joao Gante
8e97b44087
[modular] Sort modular skips (#36304) 2025-03-20 10:55:12 +00:00
Yoni Gozlan
12f2ebef63
Support custom dosctrings in modular (#36726)
* Override docstrings in modular if not none

* Update doc
2025-03-18 14:00:54 -04:00
Afanti
d84569387f
chore: fix typos in utils module (#36668)
* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module
2025-03-13 15:12:44 +00:00
Yoni Gozlan
ea219ed164
Remove differences between init and preprocess kwargs for fast image processors (#36186)
* Remove differences between init and preprocess kwargs in fast image processors

* make modifs got_ocr2

* update gemma3
2025-03-12 19:44:05 -04:00
Ryan Mullins
50d3530aa0
Gemma3 (#36658)
* Fix converter

* [Broken] Adds Gemma 3 to Hugging Face Transformers

* Consolidating Config and Processor params across impls

* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.

* Additional plumbing for CausalLM and ConditionalGeneration variants

* incomplete draft of Orbax conversion script

* More complete checkpoint conversion

* Supporting Gemma 3 1B checkpoints

* Updating RoPE for multiple frequencies

* Adjustments to rotary embedder

* Proof of life for text-only operation

* Updating the conversion script to handle multimodal projection weights

* Fixing tet-only conversions

* Cleaner conversion script with multimodal support and a simpler processor

* Additional refatcors to the Gemma3Processor

* Simplified Processor to work over text representations

* Updated conversion script to join text and vision embeddings at converion time

* Logging for debugging

* Update src/transformers/models/gemma2/modeling_gemma2.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Removed extraneous Config params

* Switching to fast tokenizer for checkpoint conversions

* isolating siglip for performance tetsing

* Minor changes for debugging tests against baselines

* Adding average pooling for soft tokens

* Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts

* Updating conversion script for ShieldGemma 2 conversion compatibility

* Allow disable_compile to be provided as a kwarg

* Refresh from modular

* Updated conversion script and corrected sliding window

* Fix type mismatch in cache_position (#4)

* Fix dtype (#5)

* Fix type mismatch in cache_position

* Actually fix in the modular file

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

---------

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor

* Adding 2D pooling for image embeddings

* Revert "Adding 2D pooling for image embeddings"

This reverts commit 65350cf531.

* Gemma3 average pooling changed from 1D to 2D

* Major refactor to Gemma3MultimodalInputProjection

* Updating Gemm 3 Auto* registrations

* Add option to save Gemma 3 chat template with tokenizer during weights conversion

* Removing unused imports

* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration

* Removing duplicate config property

* Removing final logit softcapping and 1-indexing of position ids

* Fixing image processor config and none --> None typo

* Fixing sliding window size for 1B

* Updating image_mean and image_std in Image Processor

* Attention masking changed to lower triangular

* Moving image special tokens to conversion script

* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs

* Remove special token variables from symbol space

* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration

* tie lm_head and embedding weights

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Correct tied weights in Gemma3CausalLM

* iterative bidirectional attention

* resolving merge conflicts

* Reverting to Gemma 2 HybridCache with sldiing window support and a sliding_window_pattern of 6

* Correcting RoPE scaling

* clean up first pass, dummy model geenration works

* final clean up before fixing tests

* causal lm test works, so fine

* Fix conversion

* Update src/transformers/models/gemma3/processing_gemma3.py

* model tests are happy

* processor tests are happy

* image processing tests added

* fixup

* Fix pre-processing in conversion

* Inputs merging

* Do not normalize vision embeddings

* Apply Ryan's (and team) changes to attention

* token type ids + mask

* template

* move embed scale, add rope scale, fix tests

* Add chat template to tokenizer

* Use prefix for causal model loading

* use existing code for sliding mask from gemma2

* self.embed_tokens already normalizes

* Correcting Gemma3TextConfig parameters in conversion script

* typo, modular overwrites my fixes

* enable device map for text model

* Conversion updates

* ultra nit: no einsums

* update image token

* copy deepcopy config + some docs

* add some test, still WIP

* Refactoring --include_chat_tempalte logic in converter

* Update src/transformers/models/gemma3/modular_gemma3.py

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Add eos tokens for instruct models

* dump so i can work on dgx

* Removing add_bos by default

* dump

* add fast im proc

* docs for PaS + fixup

* another fixup

* one more fixup

* fix tests

* Inverting prior BOS change

* ultra nit

* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS

* resize embeds, remove sqrt, add slow test outputs

* FA2 but quality is meh

* nit

* skip FA2, no idea what happened

* last bit for green CI

* please, green CI for docs

* T_T

* Fix for Gemma3 logits

* Support both options for system prompt

* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Docs updates now that assets are live

* Style fixes

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Lysandre <hi@lysand.re>
2025-03-12 09:06:17 +01:00
Matt
1e4286fd59
Remove research projects (#36645)
* Remove research projects

* Add new README to explain where the projects went

* Trigger tests

* Cleanup all references to research_projects
2025-03-11 13:47:38 +00:00
hlky
bc30dd1efb
Modular Conversion --fix_and_overwrite on Windows (#36583)
* Modular Conversion --fix_and_overwrite on Windows

* -newline on read
2025-03-06 13:12:30 +00:00
Steven Liu
c0f8d055ce
[docs] Redesign (#31757)
* toctree

* not-doctested.txt

* collapse sections

* feedback

* update

* rewrite get started sections

* fixes

* fix

* loading models

* fix

* customize models

* share

* fix link

* contribute part 1

* contribute pt 2

* fix toctree

* tokenization pt 1

* Add new model (#32615)

* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* "to be not" -> "not to be" (#32636)

* "to be not" -> "not to be"

* Update sam.md

* Update trainer.py

* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* fix hfoption tag

* tokenization pt. 2

* image processor

* fix toctree

* backbones

* feature extractor

* fix file name

* processor

* update not-doctested

* update

* make style

* fix toctree

* revision

* make fixup

* fix toctree

* fix

* make style

* fix hfoption tag

* pipeline

* pipeline gradio

* pipeline web server

* add pipeline

* fix toctree

* not-doctested

* prompting

* llm optims

* fix toctree

* fixes

* cache

* text generation

* fix

* chat pipeline

* chat stuff

* xla

* torch.compile

* cpu inference

* toctree

* gpu inference

* agents and tools

* gguf/tiktoken

* finetune

* toctree

* trainer

* trainer pt 2

* optims

* optimizers

* accelerate

* parallelism

* fsdp

* update

* distributed cpu

* hardware training

* gpu training

* gpu training 2

* peft

* distrib debug

* deepspeed 1

* deepspeed 2

* chat toctree

* quant pt 1

* quant pt 2

* fix toctree

* fix

* fix

* quant pt 3

* quant pt 4

* serialization

* torchscript

* scripts

* tpu

* review

* model addition timeline

* modular

* more reviews

* reviews

* fix toctree

* reviews reviews

* continue reviews

* more reviews

* modular transformers

* more review

* zamba2

* fix

* all frameworks

* pytorch

* supported model frameworks

* flashattention

* rm check_table

* not-doctested.txt

* rm check_support_list.py

* feedback

* updates/feedback

* review

* feedback

* fix

* update

* feedback

* updates

* update

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-03-03 10:33:46 -08:00
Cyril Vallez
da4ab2a1b6
Fix doc formatting in forward passes & modular (#36243)
* fix indentation issues + modular without magic keyword

* style

* Update doc.py

* style

* Fix all decorators indentation

* all models

* style

* style

* Update doc.py

* fix

* general fix

* style
2025-02-25 11:09:01 +01:00
Cyril Vallez
bc65f3fc1c
[modular] Do not track imports in functions (#36279)
* Add check

* just check for function

* Update examples
2025-02-25 10:29:47 +01:00
Yih-Dar
2ab7bdc403
notify new model merged to main (#36375)
notify new model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-24 17:53:18 +01:00
Pavel Iakubovskii
a957b7911a
Add SigLIP 2 (#36323)
* Docs

* Inits

* Auto classes

* Add siglip base

* Add base tests

* Fix Siglip V1 for fix res version

* Add image processor

* Update conversion

* Experimenting with vectorized embeddings

* Fixup

* Add modular Siglip2Processor

* Add modular configuration

* Rename num patches

* Correct image and text features merging

* Working conversion script

* Refactoring conversion script

* Remove unused code in conversion script

* Shorten dict a bit

* Refactoring conversion

* Done conversion refactoring

* Fixup

* Modular siglip2

* Make model exportable and compilable without graph breaks

* Remove position_ids from image_processor

* REmove position ids from modeling file

* Update modular

* Type hint

* Fixup

* Set defaults to processor

* Add integration test

* Revert spatial shapes back to tensor

* Change order

* Fix most of the tests

* Fix docstring

* Remove interpolate_pos_encoding arg (not needed)

* Update docs

* Standardize processing

* Fix attention_mask in vision head

* Siglip v1: remove double transpose in FA2

* Update modular file

* Update FA2 test

* Update expected logits

* Fix interpolation for siglip2 image processor

* Skip init test

* Skip dispatch on flash test

* Fix modeling tests

* Fixup

* Add dummy objects

* Fix some docstrings

* Add siglip2 in index.md

* Fix consistency

* Add docs

* Remove size and data format

* Add image processor tests

* Fix

* Add fast image processor

* Fix style

* Fix

* Docs

* Set lowercase for tokenizer

* Adjust head size for Siglip v1

* Update siglip2 for consistency with siglip1

* Update siglip2 conversion

* Update pipeline

* Update checkpoints in tests

* Update checkpoint name

* Fix pooling for image classification model

* Fix FA2 test

* Update processor

* Fix check repo

* Update docs

* Fix typos

* Fix docstring for fast image processor

* Add siglip2 to FA2 docs

* Fix fast ip tests

* Fix constitency

* Fix tokenizer class for siglip v1

* Fix missing header

* Refactor scaling for clip, siglip, siglip2

* Remove unused imports

* Make fast IP default for siglip2

* Update docs

* Update checkpoints

* Update modular

* Update paper link

* Fixup

* Fix name in toctree

* Fix test
2025-02-21 09:04:19 +00:00
Orr Zohar
4397dfcb71
SmolVLM2 (#36126)
* smolvlm init

* updates

* fixing bugs

* minimal run, no checks

* minimal run, no checks

* passing first check + adding url support

* updating video dataloading logic

* fixing image logic

* trying modular, but fails

* modular is working, changing processor to match PR comments and general transformers logic

* fixing kwargs

* offloading video loading logic to  image_util

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* update

* add idefics3-based tests

* add keyword to all

* add PreTrainedModel

* updateing video loading logic

* working inference

* updates for PR comments

* updates for PR comments

* moving SmolVLMPretrainedModel higher to fix import error

* CI test pass

* CI test pass

* removing lambda

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* processor tests

* add example in docs

* typo

* fix copies

* skip compile tests - sdpa for VisionTransformer

* fix init

* raise import error for num2words

* update doc for FA2

* more doc fix

* CI

* updates for PR comments

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* fixing processor -- tokenizer not defined properly, (gpt2 tokenizer), and does not have the attributes of fake image token, etc

* adding smolvlm to VQA models

* removing vqa auto class

* Update src/transformers/models/smolvlm/processing_smolvlm.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* removing smolvlmvisiontransformer from index.md

* my bad, video processing had typos

* fixing docs

* renaming params in SmolVLMModel.inputs_merger

* removing un-needed dtype/device in model forward

* ruff for CI

* update docs

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* return cache position

* return cache position

* return cache also in modular

* needed to run modular again

* fix training tests

* push vectorized inputs merger

* format

* format

* reduce number of mappings

* addressing PR comments

* happy CI, happy me :)

* skip non-nested images

* adjust integration test for smaller GPUs

* format

* fix kwargs in chat template apply

* skip this for now

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Joshua Lochner <admin@xenova.com>
2025-02-20 15:00:26 +01:00
Yih-Dar
f2ab182dca
Ignore conversion files in test fetcher (#36251)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-20 13:32:02 +01:00
Joao Gante
99adc74462
[tests] remove flax-pt equivalence and cross tests (#36283) 2025-02-19 15:13:27 +00:00
Joao Gante
0863eef248
[tests] remove pt_tf equivalence tests (#36253) 2025-02-19 11:55:11 +00:00
Yih-Dar
0a9923a609
Use args.num_workers in check_modular_conversion.py (#36200)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-14 17:31:03 +01:00
Yih-Dar
8fd4bc7d1d
Fix a mistake in #36175 (#36179)
fix my bad

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-13 18:33:02 +01:00
Yih-Dar
bfe46c98b5
Make check_repository_consistency run faster by MP (#36175)
* speeddddd

* speeddddd

* speeddddd

* speeddddd

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-13 17:25:17 +01:00
Joao Gante
d114a6f78e
[Modular] skip modular checks based on diff (#36130)
skip modular checks based on diff
2025-02-13 12:53:21 +00:00
Yih-Dar
4a5a7b991a
Fix test fetcher (#36129)
* fix

* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-12 17:35:41 +01:00
Joao Gante
8a2f062eac
[commands] remove deprecated/inoperational commands (#35718)
rm deprecated/inoperational commands
2025-02-12 12:23:58 +00:00
kkscilife
09261ccf12
[Bugfix] fix file name of docstring in utils/check_table.py (#36108)
fix file name

Co-authored-by: kkscilife <qa-caif-cicd@pjlab.org.cn>
2025-02-10 15:48:02 +00:00
Jade Choghari
006d9249ec
Adding RT-DETRv2 for object detection (#34773)
* cookiecutter add rtdetrv2

* make modular working

* working modelgit add .

* working modelgit add .

* finalize moduar inheritence

* finalize moduar inheritence

* Update src/transformers/models/rtdetrv2/modular_rtdetrv2.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* update modular and add rename

* remove output ckpt

* define loss_kwargs

* fix CamelCase naming

* fix naming + files

* fix modular and convert file

* additional changes

* fix modular

* fix import error (switch to lazy)

* fix autobackbone

* make style

* add

* update testing

* fix loss

* remove old folder

* fix testing for v2

* update docstring

* fix docstring

* add resnetv2 (with modular bug to fix)

* remove resnetv2 backbone

* fix changes

* small fixes

* remove rtdetrv2resnetconfig

* add rtdetrv2 name to convert

* make style

* Update docs/source/en/model_doc/rt_detr_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix modular typo after review

* add reviewed changes

* add final review changes

* Update docs/source/en/model_doc/rt_detr_v2.md

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/rt_detr_v2/__init__.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/rt_detr_v2/convert_rt_detr_v2_weights_to_hf.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* add review changes

* remove rtdetrv2 resnet

* removing this weird project change

* change ckpt name from jadechoghari to author

* implement review and update testing

* update naming and remove wrong ckpt

* name

* make fix-copies

* Fix RT-DETR loss

* Add resources, fix name

* Fix repo in docs

* Fix table name

---------

Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: qubvel <qubvel@gmail.com>
2025-02-06 19:28:45 +00:00
Yoni Gozlan
fa56dcc2ab
Refactoring of ImageProcessorFast (#35069)
* add init and base image processing functions

* add add_fast_image_processor to transformers-cli

* add working fast image processor clip

* add fast image processor to doc, working tests

* remove "to be implemented" SigLip

* fix unprotected import

* fix unprotected vision import

* update ViTImageProcessorFast

* increase threshold slow fast ewuivalence

* add fast img blip

* add fast class in tests with cli

* improve cli

* add fast image processor convnext

* add LlavaPatchingMixin and fast image processor for llava_next and llava_onevision

* add device kwarg to ImagesKwargs for fast processing on cuda

* cleanup

* fix unprotected import

* group images by sizes and add batch processing

* Add batch equivalence tests, skip when center_crop is used

* cleanup

* update init and cli

* fix-copies

* refactor convnext, cleanup base

* fix

* remove patching mixins, add piped torchvision transforms for ViT

* fix unbatched processing

* fix f strings

* protect imports

* change llava onevision to class transforms (test)

* fix convnext

* improve formatting (following Pavel review)

* fix handling device arg

* improve cli

* fix

* fix inits

* Add distinction between preprocess and _preprocess, and support for arbitrary kwargs through valid_extra_kwargs

* uniformize qwen2_vl fast

* fix docstrings

* add add fast image processor llava

* remove min_pixels max_pixels from accepted size

* nit

* nit

* refactor fast image processors docstrings

* cleanup and remove fast class transforms

* update add fast image processor transformers cli

* cleanup docstring

* uniformize pixtral fast and  make _process_image explicit

* fix prepare image structure llava next/onevision

* Use typed kwargs instead of explicit args

* nit fix import Unpack

* clearly separate pops and gets in base preprocess. Use explicit typed kwargs

* make qwen2_vl preprocess arguments hashable
2025-02-04 17:52:31 -05:00
David
8d73a38606
Add DAB-DETR for object detection (#30803)
* initial commit

* encoder+decoder layer changes WIP

* architecture checks

* working version of detection + segmentation

* fix modeling outputs

* fix return dict + output att/hs

* found the position embedding masking bug

* pre-training version

* added iamge processors

* typo in init.py

* iterupdate set to false

* fixed num_labels in class_output linear layer bias init

* multihead attention shape fixes

* test improvements

* test update

* dab-detr model_doc update

* dab-detr model_doc update2

* test fix:test_retain_grad_hidden_states_attentions

* config file clean and renaming variables

* config file clean and renaming variables fix

* updated convert_to_hf file

* small fixes

* style and qulity checks

* return_dict fix

* Merge branch main into add_dab_detr

* small comment fix

* skip test_inputs_embeds test

* image processor updates + image processor test updates

* check copies test fix update

* updates for check_copies.py test

* updates for check_copies.py test2

* tied weights fix

* fixed image processing tests and fixed shared weights issues

* added numpy nd array option to get_Expected_values method in test_image_processing_dab_detr.py

* delete prints from test file

* SafeTensor modification to solve HF Trainer issue

* removing the safetensor modifications

* make fix copies and hf uplaod has been added.

* fixed index.md

* fixed repo consistency

* styel fix and dabdetrimageprocessor docstring update

* requested modifications after the first review

* Update src/transformers/models/dab_detr/image_processing_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* repo consistency has been fixed

* update copied NestedTensor function after main merge

* Update src/transformers/models/dab_detr/modeling_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* temp commit

* temp commit2

* temp commit 3

* unit tests are fixed

* fixed repo consistency

* updated expected_boxes varible values based on related notebook results in DABDETRIntegrationTests file.

* temporarialy config modifications and repo consistency fixes

* Put dilation parameter back to config

* pattern embeddings have been added to the rename_keys method

* add dilation comment to config + add as an exception in check_config_attributes SPECIAL CASES

* delete FeatureExtractor part from docs.md

* requested modifications in modeling_dab_detr.py

* [run_slow] dab_detr

* deleted last segmentation code part, updated conversion script and changed the hf path in test files

* temp commit of requested modifications

* temp commit of requested modifications 2

* updated config file, resolved codepaths and refactored conversion script

* updated decodelayer block types and refactored conversion script

* style and quality update

* small modifications based on the request

* attentions are refactored

* removed loss functions from modeling file, added loss function to lossutils, tried to move the MLP layer generation to config but it failed

* deleted imageprocessor

* fixed conversion script + quality and style

* fixed config_att

* [run_slow] dab_detr

* changing model path in conversion file and in test file

* fix Decoder variable naming

* testing the old loss function

* switched back to the new loss function and testing with the odl attention functions

* switched back to the new last good result modeling file

* moved back to the version when I asked the review

* missing new line at the end of the file

* old version test

* turn back to newest mdoel versino but change image processor

* style fix

* style fix after merge main

* [run_slow] dab_detr

* [run_slow] dab_detr

* added device and type for head bias data part

* [run_slow] dab_detr

* fixed model head bias data fill

* changed test_inference_object_detection_head assertTrues to torch test assert_close

* fixes part 1

* quality update

* self.bbox_embed in decoder has been restored

* changed Assert true torch closeall methods to torch testing assertclose

* modelcard markdown file has been updated

* deleted intemediate list from decoder module

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-04 17:28:27 +00:00
Yih-Dar
014a1fa2c8
CircleCI with python 3.9 (#36027)
update docker files

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 17:40:20 +01:00