Commit Graph

87 Commits

Author SHA1 Message Date
root
acf1484b86 fix review comments
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>
2025-03-17 19:52:19 -07:00
Gar
9d2056f12b
Add mean_resizing for every VLMs' resizing_token_embeddings() (#35717)
* refine all resize_token_embedding()

* ruff format

* hotfix
2025-02-03 15:03:49 +01:00
Lysandre Debut
b2f2977533
Applies the rest of the init refactor except to modular files (#35238)
* [test_all] Applies the rest of the init refactor except to modular files

* Revert modular that doesn't work

* [test_all] TFGPT2Tokenizer
2025-01-05 18:30:08 +01:00
Joao Gante
37ac078535
Generate: move prepare_inputs_for_generation in encoder-decoder llms (#34048) 2024-10-11 16:11:18 +01:00
Joao Gante
e15687fffe
Generation: deprecate PreTrainedModel inheriting from GenerationMixin (#33203) 2024-09-23 18:28:36 +01:00
Arthur
673440d073
update ruff version (#30932)
* update ruff version

* fix research projects

* Empty

* Fix errors

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-05-22 06:40:15 +02:00
Lysandre Debut
297b732bdf
Removal of deprecated maps (#30576)
* [test_all] Remove all imports

Remove remaining ARCHIVE MAPS

Remove remaining PRETRAINED maps

* review comments

* [test_all] empty commit to trigger tests
2024-05-09 14:15:56 +02:00
Lysandre Debut
39114c0383
Remove static pretrained maps from the library's internals (#29112)
* [test_all] Remove static pretrained maps from the library's internals

* Deprecate archive maps instead of removing them

* Revert init changes

* [test_all] Deprecate instead of removing

* [test_all] PVT v2 support

* [test_all] Tests should all pass

* [test_all] Style

* Address review comments

* Update src/transformers/models/deprecated/_archive_maps.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deprecated/_archive_maps.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [test_all] trigger tests

* [test_all] LLAVA

* [test_all] Bad rebase

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-25 10:33:38 +01:00
nakranivaibhav
5d29530ea2
Improved type hinting for all attention parameters (#28479)
* Changed type hinting for all attention inputs to 'Optional[Tuple[torch.FloatTensor,...]] = None'

* Fixed the ruff formatting issue

* fixed type hinting for all hidden_states to 'Optional[Tuple[torch.FloatTensor, ...]] = None'

* Changed type hinting in these 12 scripts modeling_dpr.py,modeling_nat.py,idefics/vision.py,modeling_tf_dpr.py,modeling_luke.py,modeling_swin.py,modeling_tf_swin.py,modeling_blip.py,modeling_tf_blip.py,modeling_donut_swin.py,modeling_dinat.py,modeling_swinv2.py

* test fail update

* fixed type hinting for these 15 scripts modeling_xlnet.py,modeling_tf_xlnet.py,modeling_led.py,modeling_tf_led.py,modleing_rwkv.py,modeling_dpt.py,modeling_tf_cvt.py,modeling_clip.py,modeling_flax_clip.py,modeling_tf_clip.py,modeling_longformer.py,modeling_tf_longformer.py,modeling_siglip.py,modeling_clap.py,modeling_git.py

* Changed type hinting in these 12 scripts modeling_dpr.py,modeling_nat.py,idefics/vision.py,modeling_tf_dpr.py,modeling_luke.py,modeling_swin.py,modeling_tf_swin.py,modeling_blip.py,modeling_tf_blip.py,modeling_donut_swin.py,modeling_dinat.py,modeling_swinv2.py

* test fail update

* Removed the myvenv file

* Fixed type hinting for these 8 scripts modeling_tvlt.py,modeling_sam.py,modeling_tf_sam.py,modeling_tvp.py,modeling_rag.py,modeling_tf_rag.py,modeling_tf_xlm.py,modeling_xlm.py
2024-01-24 16:47:34 +00:00
Joao Gante
7e93ce40c5
Fix input_embeds docstring in encoder-decoder architectures (#28168) 2023-12-21 11:01:54 +00:00
YQ
e49c385266
use logger.warning_once to avoid massive outputs (#27428)
* use logger.warning_once to avoid massive outputs when training/finetuning longformer

* update more
2023-12-11 11:59:29 +00:00
Patrick von Platen
ac5893756b
[Attention Mask] Refactor all encoder-decoder attention mask (#27086)
* [FA2 Bart] Add FA2 to all Bart-like

* better

* Refactor attention mask

* remove all customized atteniton logic

* format

* mass rename

* replace _expand_mask

* replace _expand_mask

* mass rename

* add pt files

* mass replace & rename

* mass replace & rename

* mass replace & rename

* mass replace & rename

* Update src/transformers/models/idefics/modeling_idefics.py

* fix more

* clean more

* fix more

* make style

* fix again

* finish

* finish

* finish

* finish

* finish

* finish

* finish

* finish

* finish

* finish

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* small fix mistral

* finish

* finish

* finish

* finish

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-10-27 16:42:01 +02:00
Younes Belkada
ffff9e70ab
[core/ gradient_checkpointing] Refactor GC - part 2 (#27073)
* fix

* more fixes

* fix other models

* fix long t5

* use `gradient_checkpointing_func` instead

* fix copies

* set `gradient_checkpointing_func` as a private attribute and retrieve previous behaviour

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* replace it with `is_gradient_checkpointing_set`

* remove default

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-27 16:15:22 +02:00
Younes Belkada
06e782da4e
[core] Refactor of gradient_checkpointing (#27020)
* v1

* fix

* remove `create_custom_forward`

* fixup

* fixup

* add test and fix all failing GC tests

* remove all remaining `create_custom_forward` methods

* fix idefics bug

* fixup

* replace with `__call__`

* add comment

* quality
2023-10-25 12:16:15 +02:00
Dong-Yong Lee
8881f38a4f
Fix beam search when using model parallel (#24969)
* Fix GPTNeoX beam search when using parallelize

* Fix beam search idx device when using model parallel

* remove onnx related stuff

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix: move test_beam_search_on_multi_gpu to GenerationTesterMixin

* fix: add right item to _no_split_modules of MegaPreTrainedModel

* fix: add num_beams within parallelized beam_search test

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-14 11:00:52 -04:00
Arthur
d6bf08f7f6
[resize_embedding] Introduce pad_to_multiple_of and guidance (#25088)
* fix

* revert cahnges and update resizing of embedding layer

* use wraning

* fixup

* more styling nits

* fix all tests that overload the embedding tests

* 👀👀 remove breakpoint

* remove useless overload + overload correctly where needed

* resize lm head with new vocab size

* reverse not necessary changes

* style

* fix CIs!

* fix last CI tests, adapt bark and Marian

* fixup
2023-08-17 17:00:32 +02:00
Liyang90
1f6f32c243
Removing unnecessary device=device in modeling_llama.py (#24696)
* Update modeling_llama.py

Removing unnecessary `device=device`

* fix in all occurrences of _make_causal_mask
2023-07-13 10:30:22 +01:00
MS Kim(tony9402)
232c898f9f
Fix annotations (#24582)
* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations
2023-06-29 14:17:35 -04:00
Sylvain Gugger
8e5d1619b3
Clean load keys (#24505)
* Preliminary work on some models

* Fix test load missing and make sure nonpersistent buffers are tested

* Always ignore nonpersistent buffers if in state_dict

* Treat models

* More models

* Treat remaining models

* Fix quality

* Fix tests

* Remove draft

* This test is not needed anymore

* Fix copies

* Fix last test

* Newly added models

* Fix last tests

* Address review comments
2023-06-27 14:45:40 -04:00
Yih-Dar
850cf4af0c
Compute dropout_probability only in training mode (#24486)
* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-26 18:36:47 +02:00
Bowen Bao
a28325e25e
Replace python random with torch.rand to enable dynamo.export (#24434)
* Replace python random with torch.rand to enable dynamo.export

* revert changes to flax model code

* Remove unused random import

* Fix torch template

* Move torch.manual_seed(0) to right location
2023-06-23 08:17:21 -04:00
Younes Belkada
3ce3385c47
Revert "Fix gradient checkpointing + fp16 autocast for most models" (#24420)
Revert "Fix gradient checkpointing + fp16 autocast for most models (#24247)"

This reverts commit 285a48011d.
2023-06-22 16:11:27 +02:00
Younes Belkada
285a48011d
Fix gradient checkpointing + fp16 autocast for most models (#24247)
* fix gc bug

* continue PoC on OPT

* fixes

* 🤯

* fix tests

* remove pytest.mark

* fixup

* forward contrib credits from discussions

* forward contrib credits from discussions

* reverting changes on untouched files.

---------

Co-authored-by: zhaoqf123 <zhaoqf123@users.noreply.github.com>
Co-authored-by: 7eu7d7 <7eu7d7@users.noreply.github.com>
2023-06-21 17:04:59 +02:00
Sylvain Gugger
695928e1e5
Tied params cleanup (#24211)
* First test

* Add info for all models

* style

* Repo consistency

* Fix last model and cleanup prints

* Repo consistency

* Use consistent function for detecting tied weights
2023-06-13 11:38:39 -04:00
Karim Foda
7e6dd664e8
Fix gradient checkpointing bug LED (#21840)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-02 15:40:35 +00:00
Stas Bekman
c7f3abc257
introduce logger.warning_once and use it for grad checkpointing code (#21804)
* logger.warning_once

* style
2023-02-27 13:25:06 -08:00
Sylvain Gugger
04b2f13c37
🚨🚨🚨 Enforce single model initialization (#21431)
* Enforce single model initialization

* Add OneFormer example for problem 3

* Do it the Stas way

* Actually rename the uses...

* Rewrite test

* Try to change the test this way

* Fix all init slow/fast tests

* Break connection

* Fix more tests

* Fix test for initialization

* Remove custom test

* Quality

* Fix last failing tests

* The end?
2023-02-09 15:46:26 -05:00
Arthur
12eb528b5a
[CI ] Remove past in favor of pat_key_values (#21443)
* fix past renamed to past_key_value

* update more `past`that were ski^êd

* fixup

* remove changes made to rag

* refactor `_reorder_cache` to use `past_key_values`

* fix git `prepare_inputs_for_generation` to pass tests when false is needed in use_cache
2023-02-07 09:51:35 +01:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting (#21480)
* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies
2023-02-06 18:10:56 -05:00
Yih-Dar
f726d53ea3
Remove more unused attributes in config classes (#21392)
* * Remove unused type_vocab_size

* Remove unused initializer_factor

* Remove unused n_embd

* Remove unused scale_embedding

* Remove unused scale_attn_weights

* fix

* fix

* Remove unused head_hidden_scale

* Remove unused activation_dropout

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-02-03 13:41:15 +01:00
Sylvain Gugger
fd5cdaeea6
Models docstring (#21225)
* Clean all models

* Style

* Last to remove

* address review comments

* Address review comments
2023-01-23 14:33:18 -05:00
Arthur
f0577df6de
Replace past with past_key_values (#20944)
* start cleanup

* more updates

* more models are affected

* more updates

* update generation utils

* style

* revert change that removed reorder cachce

* update generation utils

* style

* style

* remove reorder cache
2023-01-08 10:21:40 +01:00
Yih-Dar
e3cc4487fe
Fix CIs for PyTorch 1.13 (#20686)
* fix 1

* fix 2

* fix 3

* fix 4

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-12-08 18:51:54 +01:00
fxmarty
993a187c6f
fix device in longformer onnx path (#20419) 2022-11-23 15:07:01 -05:00
fxmarty
3d0c0ae437
Fix longformer onnx broken export (#20292)
* fix controlflow for onnx export

* fix warning

* fix the case padding_len = 0, explicit the recorded control flows

* style

* style

* fix bug

* fix copy

* nits
2022-11-22 11:07:19 -05:00
Nicolas Patry
bac2d29a80
Attempting to test automatically the _keys_to_ignore. (#20042)
* Attempting to test automatically the `_keys_to_ignore`.

* Style.

* First fix pass.

* Moving test on its own.

* Another batch.

* Second round removing BatchNorm

* Fixing layoutlmv{2,3} + support older Python.

* Disable miss missing warning.

* Removing dodgy additions.

* Big pass.

* mbart.

* More corrections.

* Fixup.

* Updating test_correct_missing_keys

* Add escape hatch for when the head has no extra params so doesn't need

the missing keys check.

* Fixing test.

* Greener.

* Green ! (except for weird splinter bug).

* Adding a test about `named_parameters` usage.

* Shorten message.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* After rebase modifications.

* More explicit condition checking.

* Fixing slow tests issues.

* Remove extra pdb.

* Remove print.

* Attempt to make failure consistent + fixing roc_bert.

* Removing the seed  (all tests passing with it).

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-09 16:03:36 +01:00
Arthur
737bff6a36
[FuturWarning] Add futur warning for LEDForSequenceClassification (#19066)
* fix led eos_mask

* add Futur Warning

* revert uselesss cahnges

* Update src/transformers/models/led/modeling_led.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-03 15:26:09 +01:00
Arthur
3e07196f89
check decoder_inputs_embeds is None before shifting labels (#19671) 2022-10-18 09:14:12 +02:00
Patrick Deutschmann
3223d49354
Add ONNX support for Longformer (#17176)
* Implement ONNX support for Longformer

Fix repo consistency check complaints

Fix value mismatches

Add pooler output for default model

Increase validation atol to accommodate multiple-choice error

Fix copies

Fix chunking for longer sequence lengths

Add future comment

* Fix issue in mask_invalid_locations

* Remove torch imports in configuration_longformer

* Change config access to fix LED

* Push opset version to support tril

* Work in review comments (mostly style)

* Add Longformer to ONNX tests
2022-08-25 08:34:42 +02:00
Yih-Dar
bd6d1b4300
Add a check regarding the number of occurrences of ``` (#18389)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-01 14:23:02 +02:00
Yih-Dar
d3cb28886a
Not use -1e4 as attn mask (#17306)
* Use torch.finfo(self.dtype).min

* for GPTNeoX

* for Albert

* For Splinter

* Update src/transformers/models/data2vec/modeling_data2vec_audio.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix -inf used in Bart-like models

* Fix a few remaining -inf

* more fix

* clean up

* For CLIP

* For FSMT

* clean up

* fix test

* Add dtype argument and use it for LayoutLMv3

* update FlaxLongT5Attention

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-06-20 16:16:16 +02:00
Stas Bekman
66f893320c
normalize keys_to_ignore (#17722) 2022-06-15 11:59:11 -07:00
Cesare Campagnano
d9050dc768
[LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing (#17112)
* [LED] fixed global_attention_mask not passed for generation + docs clarification for gradient checkpointing

* LED docs clarification

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [LED] gradient_checkpointing=True should be passed to TrainingArguments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [LED] docs: remove wrong word

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [LED] docs fix typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-05-17 23:44:37 +02:00
Sylvain Gugger
afe5d42d8d
Black preview (#17217)
* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black
2022-05-12 16:25:55 -04:00
Manuel R. Ciosici
c76afa511c
Fix LED documentation (#17181)
* Fix markdown code block

* Use consistent spelling for self-attention

* Fix typos and phrasing

* Fix code style
2022-05-11 13:17:50 -04:00
Anmol Joshi
cc034f72eb
Replace assertion with exception (#16720)
* Updated assertions to exceptions

* updated assertions to exceptions

* bug fixes

* fix-copies

* Update modeling_ctrl.py

* Update src/transformers/models/ctrl/modeling_tf_ctrl.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_tf_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update modeling_led.py

* Update modeling_led.py

* Update modeling_led.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-12 11:47:01 -04:00
Anmol Joshi
8ac9b82724
Added Annotations for PyTorch models (#16619)
* Update modeling_mpnet.py

* Update modeling_ctrl.py

* formatting

* Formatting

* Formatting

* annotated FSMT

* Added annotations for LED

* Added Annotations for M2M

* Added annotations for nystromformer

* Added annotations for OpenAI

* Added annotations for RAG

* Removed unused imports

* fix isort errors

* Removed inputs_embeds docstring, corrected original

* flake8 fixes

* doc-builder fixes
2022-04-06 14:12:01 -04:00
Sylvain Gugger
088c1880b7
Big file_utils cleanup (#16396)
* Big file_utils cleanup

* This one still needs to be treated separately
2022-03-25 07:25:20 -04:00
Sylvain Gugger
4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Lysandre Debut
7732d0fe7a
Upgrade black to version ~=22.0 (#15565)
* Upgrade black to version ~=22.0

* Check copies

* Fix code
2022-02-09 09:28:57 -05:00