Commit Graph

19117 Commits

Author SHA1 Message Date
Mohamed Mekkouri
096f25ae1f
Fix Bitnet tokenizer in pipeline (#37861)
add tokenizer
2025-04-29 15:35:02 +02:00
Chris
da7ae467c4
Fix cache get item return type hints (#37847)
F: Fix cache return hints

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-29 14:23:52 +01:00
Hicham Tala
aa6b79db43
Fix check of unnecessary packages (issue #37626) (#37825)
* Fix check of unnecessary packages (issue #37626)

* Reformat using ruff

* Add a condition to avoid the risk of matching a random object in `import_utils`

* Reformat
2025-04-29 14:21:05 +01:00
Matt
517367fe9a
Revert change that breaks on Torch 2.1 (#37531)
* Revert change that breaks on Torch 2.1

* Add TODO

* Trigger tests

* Trigger tests
2025-04-29 13:27:09 +01:00
Joao Gante
755b0fa2fe
[tests] reorganize cache tests and clean memory between tests (#37684) 2025-04-29 12:21:14 +01:00
Joao Gante
3a1acc36ed
[tests] fix flaky pattern in test_generate_continue_from_past_key_values (#37724) 2025-04-29 12:20:42 +01:00
Vladislav Bronzov
4abeb50f6e
Add D-FINE Model into Transformers (#36261)
* copy the last changes from broken PR

* small format

* some fixes and refactoring after review

* format

* add config attr for loss

* some fixes and refactoring

* fix copies

* fix style

* add test for d-fine resnet

* fix decoder layer prop

* fix dummies

* format init

* remove extra print

* refactor modeling, move resnet into separate folder

* fix resnet config

* change resnet to hgnet_v2, add clamp into decoder

* fix init

* fix config doc

* fix init

* fix dummies

* fix config docs

* fix hgnet_v2 config typo

* format modular

* add image classification for hgnet, some refactoring

* format tests

* fix dummies

* fix init

* fix style

* fix init for hgnet v2

* fix index.md, add init range for hgnet

* fix conversion

* add missing attr to encoder

* add loss for d-fine, add additional output for rt-detr decoder

* tests and docs fixes

* fix rt_detr v2 conversion

* some fixes for loss and decoder output

* some fixes for loss

* small fix for converted modeling

* add n model config, some todo comments for modular

* convert script adjustments and fixes, small refactor

* remove extra output for rt_detr

* make some outputs optional, fix conversion

* some post-merge fixes

* small fix

* last field fix

* fix not split for hgnet_v2

* disable parallelism test for hgnet_v2 image classification

* skip multi gpu for d-fine

* adjust after merge init

* remove extra comment

* fix repo name references

* small fixes for tests

* Fix checkpoint path

* Fix consistency

* Fixing docs

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-29 12:17:55 +01:00
Cyril Vallez
4602059aae
[modular] Fix the prefix-based renaming if the old and new model share a common name suffix (#37829)
* first try

* Fix and set examples

* style

* fix

* Update modular_test_detr.py

* Update image_processing_new_imgproc_model.py

* Update modular_model_converter.py
2025-04-29 10:43:23 +02:00
Henrik Matthiesen
a847d4aa6b
Fast image processor for VitMatte added and bug in slow version fixed (#37616)
* added a fast image processor for VitMatte, including updated and new tests. Fixed a bug in the slow image processor that processed images incorrectly for the input format ChannelDimension.FIRST, in which case the trimaps were not added in the correct dimension; this bug was also reflected in the tests through incorrectly shaped trimaps being passed

* final edits for fast vitmatte image processor and tests

* final edits for fast vitmatte image processor and tests

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-28 14:51:50 -04:00
sushmanth reddy
65e940208c
Samhq model addition (#35147)
* added the configuration for sam_hq

* added the modelling for sam_hq

* added the sam hq mask decoder with hq features

* added the code for the samhq

* added the code for the samhq

* added the code for the samhq

* Delete src/transformers/models/sam_hq/modelling_sam_hq.py

* added the code for the samhq

* added the code for the samhq

* added the changes for the modelling

* added the code for sam hq for image processing

* added code for the sam hq model

* added the required changes

* added the changes

* added the key mappings for the sam hq

* adding the working code of samhq

* added the required files

* adding the pt object

* added the push to hub account

* added the args for the sam mask decoder

* added the args for the sam hq vision config

* added some more documentation

* removed the unnecessary spaces

* all required changes

* removed the image processor

* added the required file

* added the changes for the checkcopies

* added the code for modular file

* added the changes for the __init__ file

* added the code for the interm embeds

* added the code for sam hq

* added the changes for modular file

* added the test file

* added the changes required

* added the changes required

* added the code for the

* added the cl errors

* added the changes

* added the required changes

* added the some code

* added the code for the removing image processor

* added the test dimensions

* added the code for the removing extra used variables

* added the code for modular file hf_mlp for a better name

* removed abbreviation in core functionality

* removed abbreviation in core functionality

* the .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory

* added the code which is after make fixup

* added some test for the intermediate embeddings test

* added the code for the torch support in sam hq

* added the code for the updated modular file

* added the changes for documentations as mentioned

* removed the heading

* add the changes for the code

* first mentioned issue resolved

* added the changes code to processor

* added the easy loading to init file

* added the changes to code

* added the code changes

* added the code to work

* added the code for sam hq

* added the code for sam hq

* added the code for the point pad value

* added the small test for the image embeddings and intermediate embedding

* added the code

* added the code

* added the code for the tests

* added the code

* added the code for the processor file

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code for tests and some checks

* added some code

* added the code

* added the code

* added some code

* added some code

* added the required changes

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added some changes

* added some changes

* removed spaces and quality checks

* added some code

* added some code

* added some code

* added code quality checks

* added the checks for quality checks

* added some code which fixes test_inference_mask_generation_no_point

* added code for the test_inference_mask_generation_one_point_one_bb

* added code for the test_inference_mask_generation_one_point_one_bb_zero

* added code for the test_inference_mask_generation_one_box

* added some code in modelling for testing

* added some code which sorts masks with high score

* added some code

* added some code

* added some code to move KEYS_TO_MODIFY_MAPPING

* added some code for the unsqueeze removal

* added some code for the unsqueeze removal

* added some code

* added some code

* add some code

* added some code

* added some code

* changed some testing values

* added changes to code in sam hq for readability purposes

* added pre commit checks

* added the samvisionmodel fix for compatibility

* added the changes made on sam by cyyever

* fixed the tests for samhq

* added some code

* added some code related to init file issue during merge conflicts

* removed the merge conflicts

* added changes mentioned by Arthur and molbap

* added changes mentioned by Arthur and molbap

* solving quality checks

* added the changes for input clarity

* added the changes

* added changes in mask generation file regarding model inputs and sam hq kwargs in processor file

* added changes in processor file

* added the setUp -> setUpClass conversion

* added the code mentioned for processor

* added changes for the code

* added some code

* added some code

* added some code

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-04-28 19:07:09 +02:00
Raushan Turganbay
9c5b1319d0
[config] revert #37603 (#37821)
revert
2025-04-28 16:28:30 +02:00
Marc Sun
9e730689c3
change deprecated XLA api (#37741)
* deprecated api

* fix
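
A hedged guess at the shape of the migration (torch_xla has been deprecating xm.* entry points; the exact calls touched here may differ):

    # Newer top-level torch_xla API:
    import torch_xla

    device = torch_xla.device()

    # Deprecated form it replaces:
    # import torch_xla.core.xla_model as xm
    # device = xm.xla_device()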
2025-04-28 16:27:41 +02:00
Yuan Wu
2933894985
Fix HPU TP error (#37782)
* Fix HPU TP error

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add the distributed init for hpu

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix `make style` error

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-04-28 15:47:16 +02:00
Yuanyuan Chen
da4ff2a5f5
Add Optional to remaining types (#37808)
More Optional typing
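
A minimal sketch of the pattern this commit applies (the function below is illustrative, not from the diff): parameters defaulting to None get an explicit Optional annotation.

    from typing import Optional

    import torch

    # Hypothetical signature showing the change: `mask: torch.Tensor = None`
    # becomes Optional so the None default is visible to type checkers.
    def forward(mask: Optional[torch.Tensor] = None) -> Optional[torch.Tensor]:
        return mask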

Signed-off-by: cyy <cyyever@outlook.com>
2025-04-28 14:20:45 +01:00
Benjamin Bossan
1a9188a54e
FIX: Faulty PEFT tests (#37757)
Two PEFT tests are actually failing:

tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_delete_adapter
tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_pipeline_no_warning

This must have been going on for some time but was apparently never
noticed. The cause is that the tests themselves are faulty; the PEFT
integration is correct in these cases.

test_delete_adapter

The first faulty test was introduced by #34650. AFAICT, it should never
have passed in the first place, as the PEFT integration logic was not
changed in the meantime. At this point, the logs for the PR CI are gone,
so I'm not sure if the test passed back then or not.

test_peft_pipeline_no_warning

This test was introduced in #36783 and should also never have passed, as
the self.assertNoLogs context manager only returns None, thus the assert
should never have worked (mea culpa for suggesting this code snippet).
Here too, the CI logs are deleted by now, so I can't check if the test
already failed back then.
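
A sketch of the failure mode described above, assuming plain unittest rather than the actual PEFT test code (test and logger names are hypothetical): assertNoLogs must be used purely as a context manager, because its __enter__ returns None and any assert on the bound value fails.

    import logging
    import unittest

    logger = logging.getLogger("transformers")

    class Example(unittest.TestCase):  # requires Python >= 3.10 for assertNoLogs
        def test_faulty_pattern(self):
            # Faulty: `cm` is None because assertNoLogs binds nothing on enter,
            # so this assert can never succeed.
            with self.assertNoLogs(logger, logging.WARNING) as cm:
                pass
            assert cm

        def test_fixed_pattern(self):
            # Correct: let the context manager itself raise if anything is logged.
            with self.assertNoLogs(logger, logging.WARNING):
                pass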
2025-04-28 15:10:46 +02:00
Mohamed Mekkouri
b262680af4
Add Bitnet model (#37742)
* Adding BitNet b1.58 Model

* Add testing code for BitNet

* Fix format issues

* Fix docstring format issues

* Fix docstring

* Fix docstring

* Fix: weight back to uint8

* Fix

* Fix format issues

* Remove copy comments

* Add model link to the docstring

* Fix: set tie_word_embeddings default to false

* Update

* Generate modeling file

* Change config name for automatically generating modeling file.

* Generate modeling file

* Fix class name

* Change testing branch

* Remove unused param

* Fix config docstring

* Add docstring for BitNetQuantConfig.

* Fix docstring

* Update docs/source/en/model_doc/bitnet.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update docs/source/en/model_doc/bitnet.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update bitnet config

* Update explanation between online and offline mode

* Remove space

* revert changes

* more revert

* spaces

* update

* fix-copies

* doc fix

* fix minor nits

* empty

* small nit

* empty
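
A hedged usage sketch for the new model (the checkpoint id is an assumption for illustration, not taken from this commit):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/bitnet-b1.58-2B-4T"  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("1.58-bit models are", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(out[0]))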

---------

Co-authored-by: Shuming Ma <shumingma@pku.edu.cn>
Co-authored-by: shumingma <shmingm@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 15:08:46 +02:00
NielsRogge
82862ce443
[RT-DETR] Improve docs (#37814)
Fix docs
2025-04-28 13:19:24 +02:00
Li Haoru
97e57b2545
Fix: Correct tensor shape comment in Mamba modeling (#37801)
* Fix: Correct tensor shape comment in Mamba modeling

* Update src/transformers/models/mamba/modeling_mamba.py

* Update src/transformers/models/mamba/modeling_mamba.py

---------

Co-authored-by: ShadyPi <11342288+shadypi@user.noreply.gitee.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-04-28 11:56:42 +01:00
Ken J
33493542aa
[doc] fix the code examples in qwen doc (#37803) 2025-04-28 11:56:32 +01:00
co63oc
d5fa7d2d19
Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
Mohamed Mekkouri
f466603963
Define warmup allocator for torchao quantization (#37764)
* torchao allocator

* add comment

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 10:45:55 +02:00
Yuan Wu
a41b6d9b5c
Fix the issue where the fsdp config cannot work (#37549)
* Fix the issue where the fsdp config cannot work.

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Check the fsdp_config type

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add the accelerate_fsdp_config test

Signed-off-by: yuanwu <yuan.wu@intel.com>

* fix `make style` error

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add key check

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 10:44:51 +02:00
Guang Yang
816b37010c
Gemma3 is Torch Exportable (#37728)
* Gemma3 is Torch Exportable

* Expand the support to other models using HybridCache
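
A hedged sketch of what exportability enables, assuming the executorch integration helper present in recent transformers releases (exact entry point and cache requirements may differ from this PR's tests; the checkpoint id is illustrative):

    from transformers import AutoModelForCausalLM
    from transformers.integrations.executorch import convert_and_export_with_cache

    model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
    model.eval()
    # Returns a torch.export ExportedProgram wrapping the model plus its cache.
    exported_program = convert_and_export_with_cache(model)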

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
2025-04-28 09:36:46 +02:00
SR
397a5ede33
Fix error message in hub.py (#37796)
Fix error message
2025-04-25 14:03:06 -07:00
martin-harmonic
6ce675ee81
fix performance issue in convert_ids_to_tokens (#37773) 2025-04-25 22:00:50 +02:00
saswatmeher
57c620bf8a
chore: update SigLIP2 model card (#37624)
* update siglip2 model card

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* address comments

* separate naflex and fixres variant

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-25 12:46:17 -07:00
Minki Kim
eb4afdd1fb
[i18n-KO] Translated keypoint_detection.md to Korean (#36649)
* fix: manual edits

* fix: manual edits

* fix: manual edits

* Update docs/source/ko/tasks/keypoint_detection.md

Anchor lower modify

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

connect letter

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to usual words

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify extension word

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to usual words

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to usual words

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to usual representation

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-25 12:24:12 -07:00
jiqing-feng
555693fbfa
fix mpt test for different outputs from cuda (#37691)
* fix mpt test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix mpt tests with Expectations

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix output

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-25 18:04:56 +02:00
Cyril Vallez
0cfbf9c95b
Force torch>=2.6 with torch.load to avoid vulnerability issue (#37785)
* fix all main files

* fix test files

* oups forgot modular

* add link

* update message
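
A sketch of the general guard this change describes (illustrative helper, not the repo's exact code or message):

    from packaging import version

    import torch

    def safe_torch_load(path):
        # Older torch.load has a known vulnerability; refuse to load on < 2.6.
        if version.parse(torch.__version__) < version.parse("2.6.0"):
            raise ValueError(
                "Loading this file requires torch>=2.6 due to a vulnerability in "
                "older versions of torch.load."
            )
        return torch.load(path, weights_only=True)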
2025-04-25 16:57:09 +02:00
Cyril Vallez
eefc86aa31
Fix tensor parallel with non-floating dtypes (#37790)
fix
2025-04-25 15:48:16 +02:00
co63oc
214062201e
Fix typos in strings and comments (#37784)
* Fix typos in strings and comments

* Fix
2025-04-25 13:47:25 +01:00
Cyril Vallez
ba3bd37253
Align gpt2 mask preparation to #37612 (#37787)
Update modeling_gpt2.py
2025-04-25 12:50:30 +02:00
Yih-Dar
50d231a806
unpin pytest<8 (#37768)
* pytest 8

* pytest 8

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-25 12:34:33 +02:00
Raushan Turganbay
79d4bc761d
[causal mask] fix preparation with multi-gpu (#37612)
* fix multi-gpu

* forgot non-copied models

* fixup
2025-04-25 09:34:18 +02:00
김가영
7bb619d710
🌐 [i18n-KO] Translated roberta.md to Korean (#37069)
* docs: ko: roberta.md

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-04-24 10:00:24 -07:00
AfafEL
cfe666919e
Update model card for Gemma (#37674)
* Update Gemma model card

* Updated after review

* Update following review
2025-04-24 09:58:46 -07:00
Mohamed Mekkouri
b2d70e9c49
Fix auto-round hfoption (#37759)
fix
2025-04-24 18:19:38 +02:00
lewtun
acdbe627e3
Guard DeepSpeed imports (#37755)
* Guard DeepSpeed imports

* Fix import

* Import deepspeed consistently
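
A minimal sketch of the guard pattern, using the is_deepspeed_available() helper that transformers exposes (the surrounding function is hypothetical, not the PR's diff):

    from transformers.integrations import is_deepspeed_available

    if is_deepspeed_available():
        import deepspeed  # imported only when the optional dependency is installed

    def require_deepspeed():
        # Hypothetical helper: fail with a clear message instead of an
        # ImportError at module import time.
        if not is_deepspeed_available():
            raise ImportError("This feature requires DeepSpeed: pip install deepspeed")
        return deepspeed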

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 18:16:34 +02:00
Joao Gante
af6d2756d9
[deps] pin max torch version (#37760)
pin max pt version :(
2025-04-24 16:18:25 +01:00
co63oc
0302aa1c6e
Fix typos in comments (#37694)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-04-24 15:59:56 +01:00
Wing Lian
af000ceb92
Fix load of rng state for resuming training from checkpoint (#37162)
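
A hedged sketch of the underlying technique (not the Trainer's exact code): checkpointing saves every RNG stream and resuming restores it, so data order and dropout masks continue deterministically.

    import random

    import numpy as np
    import torch

    def save_rng_state(path):
        torch.save(
            {
                "python": random.getstate(),
                "numpy": np.random.get_state(),
                "torch": torch.get_rng_state(),
            },
            path,
        )

    def load_rng_state(path):
        # weights_only=False because RNG states are arbitrary pickled objects.
        state = torch.load(path, weights_only=False)
        random.setstate(state["python"])
        np.random.set_state(state["numpy"])
        torch.set_rng_state(state["torch"])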
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 16:55:34 +02:00
Cyril Vallez
0af0a5f969
Fix tied weight loading with TP and loading sub state_dicts (#37758)
Update modeling_utils.py
2025-04-24 16:47:40 +02:00
flashJd
3af24f7e27
Refine parameter type annotations (#37666) 2025-04-24 15:37:13 +01:00
Kaiwen
22e3da92b7
Fix wrong input shapes in doc-string of models (#37729)
* Fix wrong position_ids shape in doc

Supported by ClvpDecoder.forward, line 1212--1215:

src/transformers/models/clvp/modeling_clvp.py:
  1212	        if inputs_embeds is None:
  1213	            inputs_embeds = self.input_embeds_layer(input_ids)
  1214	        position_embeds = self.position_embeds_layer(position_ids)
  1215	        inputs_embeds = inputs_embeds + position_embeds

* Fix possibly wrong input_ids shape in doc

Since 'input_ids_length' was mentioned immediately after the shape `(batch_size, sequence_length)`, it doesn't make sense to me for `input_ids` to have such shape---IMO it ought to have shape `(batch_size, input_ids_length)` instead.

* Fix possibly wrong inputs_embeds shape in doc

Supported by CTRLModel.forward, line 448--449:

src/transformers/models/ctrl/modeling_ctrl.py:
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix possibly wrong token_type_ids shape in doc

Supported by CTRLModel.forward, line 441--460:

src/transformers/models/ctrl/modeling_ctrl.py:
   441	        if token_type_ids is not None:
   442	            token_type_ids = token_type_ids.view(-1, input_shape[-1])
   443	            token_type_embeds = self.w(token_type_ids)
   444	            token_type_embeds *= np.sqrt(self.d_model_size)
   445	        else:
   446	            token_type_embeds = 0
   447
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)
   450	        # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
   451	        seq_len = input_shape[-1]
   452	        mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
   453
   454	        inputs_embeds *= np.sqrt(self.d_model_size)
   455
   456	        # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
   457	        self.pos_encoding = self.pos_encoding.to(device)
   458	        pos_embeds = self.pos_encoding[position_ids, :]
   459
   460	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix possibly wrong position_ids shape in doc

Supported by CTRLModel.forward, line 448--460:

src/transformers/models/ctrl/modeling_ctrl.py:
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)
   450	        # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
   451	        seq_len = input_shape[-1]
   452	        mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
   453
   454	        inputs_embeds *= np.sqrt(self.d_model_size)
   455
   456	        # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
   457	        self.pos_encoding = self.pos_encoding.to(device)
   458	        pos_embeds = self.pos_encoding[position_ids, :]
   459
   460	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix wrong token_type_ids shape in doc

Supported by TFCTRLMainLayer.call, line 376--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   376	        if token_type_ids is not None:
   377	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   378	            token_type_embeds = self.w(token_type_ids)
   379	            token_type_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, dtype=token_type_embeds.dtype))
   380	        else:
   381	            token_type_embeds = tf.constant(0.0)
   382	        position_ids = tf.reshape(position_ids, [-1, shape_list(position_ids)[-1]])
   383
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFCTRLMainLayer.call, line 384--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by TFCTRLMainLayer.call, line 384--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by ClvpDecoder.forward, line 1212--1213:

src/transformers/models/clvp/modeling_clvp.py:
  1212	        if inputs_embeds is None:
  1213	            inputs_embeds = self.input_embeds_layer(input_ids)

* Fix wrong position_ids shape in doc

Supported by FlaxGemmaPreTrainedModel.__call__, line 502--508:

src/transformers/models/gemma/modeling_flax_gemma.py:
   502	        batch_size, sequence_length = input_ids.shape
   503
   504	        if position_ids is None:
   505	            if past_key_values is not None:
   506	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   507
   508	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by FlaxGPT2PreTrainedModel.__call__, line 482--488:

src/transformers/models/gpt2/modeling_flax_gpt2.py:
   482	        batch_size, sequence_length = input_ids.shape
   483
   484	        if position_ids is None:
   485	            if past_key_values is not None:
   486	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   487
   488	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by GPT2Model.forward, line 918--921:

src/transformers/models/gpt2/modeling_gpt2.py:
   918	        if inputs_embeds is None:
   919	            inputs_embeds = self.wte(input_ids)
   920	        position_embeds = self.wpe(position_ids)
   921	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

* Fix wrong inputs_embeds shape in doc

Supported by GPT2Model.forward, line 918--919:

src/transformers/models/gpt2/modeling_gpt2.py:
   918	        if inputs_embeds is None:
   919	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPT2LMHeadModel.forward, line 1156--1157:

src/transformers/models/gpt2/modeling_gpt2.py:
  1156	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1157	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong labels shape in doc

Supported by GPT2DoubleHeadsModel.forward, line 1314--1315:

src/transformers/models/gpt2/modeling_gpt2.py:
  1314	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1315	            `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size - 1]`. All labels set to

* Fix wrong token_type_ids shape in doc

Supported by TFGPT2MainLayer.call, line 486--500:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)
   489
   490	        position_embeds = self.wpe(position_ids)
   491
   492	        if token_type_ids is not None:
   493	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   494	            token_type_embeds = self.wte(token_type_ids)
   495	        else:
   496	            token_type_embeds = tf.constant(0.0)
   497
   498	        position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
   499	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   500	        hidden_states = inputs_embeds + position_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFGPT2MainLayer.call, line 486--500:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)
   489
   490	        position_embeds = self.wpe(position_ids)
   491
   492	        if token_type_ids is not None:
   493	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   494	            token_type_embeds = self.wte(token_type_ids)
   495	        else:
   496	            token_type_embeds = tf.constant(0.0)
   497
   498	        position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
   499	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   500	        hidden_states = inputs_embeds + position_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by TFGPT2MainLayer.call, line 486--488:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)

* Fix wrong position_ids shape in doc

Supported by GPTBigCodeModel.forward, line 962--965:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
   962	        if inputs_embeds is None:
   963	            inputs_embeds = self.wte(input_ids)
   964	        position_embeds = self.wpe(position_ids)
   965	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

* Fix wrong inputs_embeds shape in doc

Supported by GPTBigCodeModel.forward, line 962--963:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
   962	        if inputs_embeds is None:
   963	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPTBigCodeForCausalLM.forward, line 1158--1159:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
  1158	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1159	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong position_ids shape in doc

Supported by FlaxGPTNeoModule.__call__, line 549--552:

src/transformers/models/gpt_neo/modeling_flax_gpt_neo.py:
   549	        input_embeds = self.wte(input_ids.astype("i4"))
   550	        position_embeds = self.wpe(position_ids.astype("i4"))
   551
   552	        hidden_states = input_embeds + position_embeds

* Fix wrong position_ids shape in doc

Supported by GPTNeoModel.forward, line 685--720:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   685	        if inputs_embeds is None:
   686	            inputs_embeds = self.wte(input_ids)
   687
   688	        # kept for BC (non `Cache` `past_key_values` inputs)
   689	        return_legacy_cache = False
   690	        if use_cache and not isinstance(past_key_values, Cache):
   691	            return_legacy_cache = True
   692	            if past_key_values is None:
   693	                past_key_values = DynamicCache()
   694	            else:
   695	                past_key_values = DynamicCache.from_legacy_cache(past_key_values)
   696	                logger.warning_once(
   697	                    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
   698	                    "will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class "
   699	                    "(https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)"
   700	                )
   701
   702	        seq_length = inputs_embeds.shape[1]
   703	        if cache_position is None:
   704	            past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
   705	            cache_position = torch.arange(past_seen_tokens, past_seen_tokens + seq_length, device=inputs_embeds.device)
   706
   707	        if position_ids is None:
   708	            position_ids = cache_position.unsqueeze(0)
   709
   710	        causal_mask = self._update_causal_mask(
   711	            attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
   712	        )
   713
   714	        # Prepare head mask if needed
   715	        # 1.0 in head_mask indicate we keep the head
   716	        # attention_probs has shape bsz x num_heads x N x N
   717	        # head_mask has shape n_layer x batch x num_heads x N x N
   718	        head_mask = self.get_head_mask(head_mask, self.config.num_layers)
   719	        position_embeds = self.wpe(position_ids)
   720	        hidden_states = inputs_embeds + position_embeds

* Fix wrong inputs_embeds shape in doc

Supported by GPTNeoModel.forward, line 685--686:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   685	        if inputs_embeds is None:
   686	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPTNeoForCausalLM.forward, line 968--969:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   968	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   969	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong position_ids shape in doc

Supported by FlaxGPTJPreTrainedModel.__call__, line 455--461:

src/transformers/models/gptj/modeling_flax_gptj.py:
   455	        batch_size, sequence_length = input_ids.shape
   456
   457	        if position_ids is None:
   458	            if past_key_values is not None:
   459	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   460
   461	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong token_type_ids shape in doc

Supported by TFGPTJMainLayer.call, line 482--493:

src/transformers/models/gptj/modeling_tf_gptj.py:
   482	        if inputs_embeds is None:
   483	            check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
   484	            inputs_embeds = self.wte(input_ids, mode="embedding")
   485
   486	        if token_type_ids is not None:
   487	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   488	            token_type_embeds = self.wte(token_type_ids, mode="embedding")
   489	        else:
   490	            token_type_embeds = tf.constant(0.0)
   491
   492	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   493	        hidden_states = inputs_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFGPTJMainLayer.call, line 434--449:

src/transformers/models/gptj/modeling_tf_gptj.py:
   434	        elif input_ids is not None:
   435	            input_shape = shape_list(input_ids)
   436	            input_ids = tf.reshape(input_ids, [-1, input_shape[-1]])
   437	        elif inputs_embeds is not None:
   438	            input_shape = shape_list(inputs_embeds)[:-1]
   439	        else:
   440	            raise ValueError("You have to specify either input_ids or inputs_embeds")
   441
   442	        if past_key_values is None:
   443	            past_length = 0
   444	            past_key_values = [None] * len(self.h)
   445	        else:
   446	            past_length = shape_list(past_key_values[0][0])[-2]
   447
   448	        if position_ids is None:
   449	            position_ids = tf.expand_dims(tf.range(past_length, input_shape[-1] + past_length), axis=0)

* Fix wrong inputs_embeds shape in doc

Supported by TFGPTJMainLayer.call, line 482--484:

src/transformers/models/gptj/modeling_tf_gptj.py:
   482	        if inputs_embeds is None:
   483	            check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
   484	            inputs_embeds = self.wte(input_ids, mode="embedding")

* Fix wrong labels shape in doc

Supported by TFGPTJForCausalLM.call, line 812--813:

src/transformers/models/gptj/modeling_tf_gptj.py:
   812	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   813	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix possibly wrong input_ids shape in doc

Since 'input_ids_length' was mentioned immediately after the shape `(batch_size, sequence_length)`, it doesn't make sense to me for `input_ids` to have such shape---IMO it ought to have shape `(batch_size, input_ids_length)` instead.

* Fix possibly wrong token_type_ids shape in doc

Supported by ImageGPTModel.forward, line 773--780:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)
   775	        position_embeds = self.wpe(position_ids)
   776	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)
   777
   778	        if token_type_ids is not None:
   779	            token_type_embeds = self.wte(token_type_ids)
   780	            hidden_states = hidden_states + token_type_embeds

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong position_ids shape in doc

Supported by ImageGPTModel.forward, line 773--776:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)
   775	        position_embeds = self.wpe(position_ids)
   776	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong inputs_embeds shape in doc

Supported by ImageGPTModel.forward, line 773--774:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong labels shape in doc

Supported by ImageGPTForCausalImageModeling.forward, line 923--924:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   923	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   924	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong labels shape in doc

Supported by ImageGPTModel.forward, line 665--666:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   665	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   666	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix wrong position_ids shape in doc

Supported by FlaxLlamaPreTrainedModel.__call__, line 484--490:

src/transformers/models/llama/modeling_flax_llama.py:
   484	        batch_size, sequence_length = input_ids.shape
   485
   486	        if position_ids is None:
   487	            if past_key_values is not None:
   488	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   489
   490	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by FlaxMistralPreTrainedModel.__call__, line 478--484:

src/transformers/models/mistral/modeling_flax_mistral.py:
   478	        batch_size, sequence_length = input_ids.shape
   479
   480	        if position_ids is None:
   481	            if past_key_values is not None:
   482	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   483
   484	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
2025-04-24 15:36:03 +01:00
Joao Gante
4d64c38593
[generate] fix default autocompile case on gpu (#37756) 2025-04-24 15:08:38 +01:00
robert
43bb4c0456
Fix qwen2_5 get_rope_index tensor device locations (#37597)
* Fix qwen2_5 get_rope_index tensor device locations

* simpler fix

* edit right file for modular model

* add a test

* try normalizing type to fix non-video

* fix some imports

* add a video forward test with dummy input
2025-04-24 16:04:38 +02:00
Prem Kumar M
dd2649fa98
updated hidden_features for FlaxDinov2SwiGLUFFN in Dinov2 (#37747)
Flax Dinov2: updated hidden_features in FlaxDinov2SwiGLUFFN

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-24 14:30:31 +01:00
Joao Gante
8bdd4f2acd
[generate] skip compilation on cpu offload (#37709)
* skip compilation on cpu offload

* add test

* better logic

* docstring

* boolean logic

* add disk offload check

* warn users if compilation options are set but compilation doesn't happen

* fix test
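
A hedged sketch of the decision logic (the helper name is hypothetical, not the PR's diff): compilation is skipped whenever the accelerate device map offloads part of the model to CPU or disk.

    def can_compile_forward(model) -> bool:
        # Models dispatched by accelerate carry an `hf_device_map` attribute
        # mapping module names to devices; "cpu" and "disk" mean offloaded.
        device_map = getattr(model, "hf_device_map", None) or {}
        offloaded = any(device in ("cpu", "disk") for device in device_map.values())
        return not offloaded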

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 14:08:17 +01:00
Poedator
7c62e69326
GPT2Model StaticCache support (#35761)
* initial GPT2 changes

* causal_mask support

* return_legacy_cache

* cleanup

* fix1

* outputs shape fixes

* gpt2 return fix

* pkv, attn fixes

* fix dual_head

* is_causal arg fix

* decision transformer updated

* style fix

* batch_size from inputs_embeds

* DecisionTransformerModel fixes

* cross-attn support + cache warning

* x-attn @decision

* EDCache proper init

* simplified logic in `if use_cache:` for GPT2Model

* @deprecate_kwarg for DecisionTr attn fwd

* @deprecate_kwarg in gpt2

* deprecation version updated to 4.51

* kwargs in gradient_checkpointing_fn

* rename next_cache to past_key_values

* attention_mask prep

* +cache_position in GPT2DoubleHeadsModel

* undo kwargs in gradient checkpointing

* moved up `if self.gradient_checkpointing`

* consistency in decision_transformer

* pastkv, cache_pos in grad_checkpt args

* rm _reorder_cache

* output_attentions streamlined

* decision_transformer consistency

* return_legacy_cache improved

* ClvpForCausalLM used for legacy cache test now

* is_causal fixed

* attn_output cleanup

* consistency @ decision_transformer

* Updated deprecation notice version to 4.52

* upd deprecation

* consistent legacy cache code in decision transformers

* next_cache -> past_kv in decision_tr

* cache support flags in decision_transf

* rm legacy cache warning

* consistency in cache init for decision transf

* no Static Cache for Decision Transformer
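
A minimal usage sketch of the new support through the standard generate API (illustrative, not this PR's test code):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("Static caches enable", return_tensors="pt")
    # cache_implementation="static" asks generate to allocate a StaticCache,
    # which this PR makes possible for GPT-2.
    out = model.generate(**inputs, max_new_tokens=16, cache_implementation="static")
    print(tokenizer.decode(out[0]))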

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-24 14:46:35 +02:00
Joao Gante
9f927c8250
[cache] fix HybridCache init when device is passed (#37718)
fix device init
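
A hedged construction sketch (argument names have shifted across versions, so treat them as illustrative rather than the exact signature):

    import torch
    from transformers import AutoConfig, HybridCache

    config = AutoConfig.from_pretrained("google/gemma-2-2b")  # a HybridCache model family
    cache = HybridCache(
        config=config,
        max_batch_size=1,
        max_cache_len=256,
        device="cpu",  # the device argument this fix concerns
        dtype=torch.float16,
    )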
2025-04-24 13:36:52 +01:00