Commit Graph

1582 Commits

Author SHA1 Message Date
Raushan Turganbay
612065efeb
Paligemma: fix static cache test (#33941)
* fix

* not flaky anymore + style
2024-10-05 09:47:37 +02:00
Joao Gante
38f9f10dd9
Cache: revert DynamicCache init for BC (#33861)
* tmp commit

* tmp commit

* make fixup

* missing removal

* fix condition

* fix end-to-end compilation

* if -> elif

* BC

* BC

* use @deprecate_kwarg("num_hidden_layers", version="4.47.0")

* wups the import

* 🥴

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-04 22:47:08 +02:00
Arthur
f92d354823
fix red check-copies (#33964) 2024-10-04 22:45:37 +02:00
pglorio
f319ba16fa
Add Zamba (#30950)
* Update index.md

* Rebase

* Rebase

* Updates from make fixup

* Update zamba.md

* Batched inference

* Update

* Fix tests

* Fix tests

* Fix tests

* Fix tests

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update configuration_zamba.py

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update configuration_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba

* Update ZambaForCausalLM

* Update ZambaForCausalLM

* Describe diffs with original mamba layer

* Moved mamba init into `_init_weights`

* Update index.md

* Rebase

* Rebase

* Updates from make fixup

* Update zamba.md

* Batched inference

* Update

* Fix tests

* Fix tests

* Fix tests

* Fix tests

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update configuration_zamba.py

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update configuration_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba

* Update ZambaForCausalLM

* Moved mamba init into `_init_weights`

* Update ZambaForCausalLM

* Describe diffs with original mamba layer

* make fixup fixes

* quality test fixes

* Fix Zamba model path

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* Update

* circleci fixes

* fix zamba test from merge

* fix ValueError for disabling mamba kernels

* add HF copyright

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* shared_transf --> shared_transformer

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fixes

* Move attention head dim to config

* Fix circle/ci tests

* Update modeling_zamba.py

* apply GenerationMixin inheritance change from upstream

* apply import ordering

* update needed transformers version for zamba

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add contribution author

* add @slow to avoid CI

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Define attention_hidden_size

* Added doc for attention_head_size

* trigger CI

* Fix doc of attention_hidden_size

* [run-slow] zamba

* Fixed shared layer logic, swapped up<->gate in mlp

* shared_transformer -> shared_transf

* reformat HybridLayer __init__

* fix docstrings in zamba config

* added definition of _get_input_ids_and_config

* fixed formatting of _get_input_ids_and_config

---------

Co-authored-by: root <root@node-4.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: root <root@node-1.us-southcentral1-a.compute.internal>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
2024-10-04 22:28:05 +02:00
Amit Garg
e3775539c8
PhiMoE (#33363)
* onboard phimoe model

* removed debug code

* added unit tests

* updated docs

* formatted

* fixed unit tests

* fixed test case

* fixed format

* refactored code

* fixed expected outputs in the integration tests

* Added a warning msg

* Addressed comments

* Addressed comments

* fixed test cases

* added paper link

* Addressed comments

* Refactored PhimoeForCausalLM forward fn

* Refactored PhimoeRotaryEmbedding class

* fixed test cases

* fixed testcase

* fixed test case

* Addressed comments

* fixed test cases

* fixed testcases

* Used cache position instead to get the seq len
2024-10-04 21:39:45 +02:00
Longjie Zheng
0d1692a49b
Fix attn mask ignore logic in training-time trace (#32613)
* fix attn mask logic for training-time trace

* add test

* fix

* fix

* fix

* fix

* fix

* format

* [run-slow] llama

* avoid accelearate

* [run-slow] llama
2024-10-04 19:00:45 +02:00
Yoach Lacombe
124713c32b
Fix distil whisper segment computation (#33920)
* Fix distil whisper segment computation

* [run-slow] whisper
2024-10-04 11:18:01 +02:00
Yoni Gozlan
074aa3b3fd
Uniformize kwargs for Idefics/2 processors (#32568)
* Add uniformize idefics processor kwargs and tests

* Uniformize idefics2 processor kwargs

* add image_processor tests idefics

* add BC args order change idefics2 processor and update doc

* Add support for multiple images per prompt in image-text-to-text mode idefics

* Fix processor input args in idefics tests

* improve test processing common, remove unnecessary tests, update process uniformization

* fix doctrings idefics

* fix tests processors idefics/2
2024-10-03 18:08:24 +02:00
Yoach Lacombe
bf0ffe3d29
[Tests] Diverse Whisper fixes (#33665)
* fix beam indices in token_timestamps

* fix attention_mask in FA2

* correct translation example with the right example

* correct how somes tests are using outputs + correct num_frames

* fix shortform batch prev cond tests

* make fix-copies

* make fix-copies

* take care of shifting beam indices

* [run-slow] whisper

* [run-slow] whisper
2024-10-03 15:59:01 +02:00
Joao Gante
d29738f5b4
Generate tests: modality-agnostic input preparation (#33685) 2024-10-03 14:01:24 +01:00
Arie Pratama Sutiono
f2bf4fcf3d
Add SplinterTokenizer unit test (#32652)
* add unit tests for splinter_tokenizer

* add unit test for splinter tokenizer, pass in the question_token to be saved on save_pretrained called

* remove unused import

* remove vocab_splinter.txt, add Copied from, use fmt:on and fmt:off to prevent autoformatting on long lines

* remove all the spaces

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-03 14:49:56 +02:00
Yoni Gozlan
d7950bff82
uniformize processor Mllama (#33876)
* uniformize processor Mllama

* nit syntax

* nit
2024-10-02 16:50:15 +02:00
Yoni Gozlan
62e8c759c3
rename all test_processing_*.py to test_processor_*.py (#33878)
* rename all test_processing_*.py to test_processor_*.py ans fix duplicate test processor paligemma

* fix copies

* fix broken tests

* fix-copies

* fix test processor bridgetower
2024-10-02 16:43:43 +02:00
Pablo Montalvo
50290cf7a0
Uniformize model processors (#31368)
* add initial design for uniform processors + align model

* add uniform processors for altclip + chinese_clip

* add uniform processors for blip + blip2

* fix mutable default 👀

* add configuration test

* handle structured kwargs w defaults + add test

* protect torch-specific test

* fix style

* fix

* rebase

* update processor to generic kwargs + test

* fix style

* add sensible kwargs merge

* update test

* fix assertEqual

* move kwargs merging to processing common

* rework kwargs for type hinting

* just get Unpack from extensions

* run-slow[align]

* handle kwargs passed as nested dict

* add from_pretrained test for nested kwargs handling

* [run-slow]align

* update documentation + imports

* update audio inputs

* protect audio types, silly

* try removing imports

* make things simpler

* simplerer

* move out kwargs test to common mixin

* [run-slow]align

* skip tests for old processors

* [run-slow]align, clip

* !$#@!! protect imports, darn it

* [run-slow]align, clip

* [run-slow]align, clip

* update common processor testing

* add altclip

* add chinese_clip

* add pad_size

* [run-slow]align, clip, chinese_clip, altclip

* remove duplicated tests

* fix

* add blip, blip2, bridgetower

Added tests for bridgetower which override common. Also modified common
tests to force center cropping if existing

* fix

* update doc

* improve documentation for default values

* add model_max_length testing

This parameter depends on tokenizers received.

* Raise if kwargs are specified in two places

* fix

* removed copied from

* match defaults

* force padding

* fix tokenizer test

* clean defaults

* move tests to common

* add missing import

* fix

* adapt bridgetower tests to shortest edge

* uniformize donut processor + tests

* add wav2vec2

* extend common testing to audio processors

* add testing + bert version

* propagate common kwargs to different modalities

* BC order of arguments

* check py version

* revert kwargs merging

* add draft overlap test

* update

* fix blip2 and wav2vec due to updates

* fix copies

* ensure overlapping kwargs do not disappear

* replace .pop by .get to handle duplicated kwargs

* fix copies

* fix missing import

* add clearly wav2vec2_bert to uniformized models

* fix copies

* increase number of features

* fix style

* [run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert

* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

* fix concatenation

* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

* Update tests/test_processing_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* 🧹

* address comments

* clean up + tests

* [run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-10-02 10:41:08 +02:00
Yoni Gozlan
61ac161a9d
Add support for custom inputs and batched inputs in ProcessorTesterMixin (#33711)
* add support for custom inputs and batched inputs in ProcessorTesterMixin

* Fix batch_size behavior ProcessorTesterMixin

* Change format prepare inputs batched

* Remove override test pixtral processor

* Remove unnecessary tests and cleanup after new prepare_inputs functions

* Fix instructBlipVideo image processor
2024-10-01 23:52:03 +02:00
Prakarsh Kaushik
68a2b50069
[Fix] ViViT interpolate_pos_encoding (#33815)
* fix:test_inference_interpolate_pos_encoding

* style:make style;make fixup

* test: add suggestion to test_modeling_vivit

* chore:add suggestions

* style:make style

* [run_slow] vivit

* ci:slow test fix

* [run_slow] vivit
2024-10-01 20:14:35 +01:00
Adibvafa Fallahpour
c269c5c74d
Fix Mamba slow path bug with dtype mismatch. (#32691)
* Fix Mamba slow path bug with dtype mismatch.

* Update test_modeling_mamba.py

* Improve style.

* Fix issue with cache position of dtype mismatch test.

* Change test for slow path.

* Revert changes.

* Switch to buggy code and add test to catch it.

* Fix the dtype mismatch bug and add test code to verify it.

* Fix minor bug with test.

* Fix incorrect dtype of model output.

* Fix incorrect dtype of cache.

* Fix incorrect dtype of ssm cache.

* Fix incorrect dtype of conv state.

* Remove assertion for ssm state.

* Add assertion for conv state dtype.

* Fix all issues with dtype mismatch test.
2024-10-01 09:28:40 +02:00
Joshua Lochner
18c5b216f1
Fix ViT-MAE decoder interpolate (#33330)
* Fix ViT-MAE decoder interpolate

* Add unit test for `interpolate_pos_encoding` w/ custom sizes

* [run_slow] vit_mae
2024-09-30 18:47:13 +02:00
Raushan Turganbay
3e039d3827
Paligemma support for multi-image (#33447)
* upadte

* Update src/transformers/models/paligemma/processing_paligemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update docs

* better example in tests

* support image tokens

* read token

* Update tests/models/paligemma/test_processing_paligemma.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* nit: naming

* Update docs/source/en/model_doc/paligemma.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* conflicts after rebasing

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-09-27 11:23:14 +02:00
Ita Zaporozhets
6730485b02
clean_up_tokenization_spaces=False if unset (#31938)
* clean_up_tokenization_spaces=False if unset

* deprecate warning

* updating param for old models

* update models

* make fix-copies

* fix-copies and update bert models

* warning msg

* update prophet and clvp

* updating test since space before is arbitrarily removed

* remove warning for 4.45
2024-09-26 19:38:20 +02:00
Arthur
46841d3eb2
[MllamaProcessor] Update errors and API with multiple image (#33715)
* update error

* update and add a test

* update

* update
2024-09-26 16:33:25 +02:00
Franz Louis Cesista
0a21381ba3
Uniformize kwargs for chameleon processor (#32181)
* uniformize kwargs of Chameleon

* fix linter nit

* rm stride default

* add tests for chameleon processor

* fix tests

* add comment on get_component

* rm Chameleon's slow tokenizer

* add check order images text + nit

* update docs and tests

* Fix LlamaTokenizer tests

* fix gated repo access

* fix wrong import

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2024-09-26 10:18:07 -04:00
Andrés Marafioti
f2c388e3f9
Add Idefics 3! (#32473)
* Add Idefics 3!

* fixes to make both pipelines identical

* fix for quantized models

* First pass at the review

* remove vocab size from the main config (it's still in the text_config)

* hot fix for merve

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* re-add model_type for text_config

* remove support for old_cache

* remove hidden_size from main config

* rename idefics3 HF repo

* few changes suggested in the PR

* fix to input_data_format computation

* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion

* improve example

* few improvements from amy's review

* big change to enable processing input images as numpy arrays

* Changes to the code to uniformize processor kwargs

* image processing tests

* image processing tests fixes and some bugs they discovered

* addressed review comments from Yoni

* fix modeling tests

* remove special tokens that are not special

* fixes tests

* skip failing tests - they also fail for idefics2

* added paper and readded the tests with multi gpu, who knows

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* review amy until image_processing_idefics3

* last comments from Amy

* review amy

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* doc improvement - amy review

* fix runtime error during fine-tuning

* amy's review

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* ruff

* amy's comment on the order

* ruff ruff

* fix copies

* square images when they are not splitted

* ruff :(

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/idefics3/test_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix small bug introduced in refactor

* amy's image processing changes

* fixes peft tests and ruff

* modify to_pil_image from transformers. and review from emanuele.

* add modified to_pil_image

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-25 21:28:49 +02:00
Manuel
a55adee890
adding positional encoder changes and tests (#32600)
* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding right pretrained model

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_image_segmentation

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding ruff formatting

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding new interpolate_pos_encoding function

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing interpolate_POS funciton

* adapting output tensor in teests

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* modifying output tensor

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding the correct tensor

* [run_slow]  clipseg

* fixing spaces

* [run_slow]  clipseg

* [run_slow]  clipseg

---------

Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
2024-09-25 19:05:01 +01:00
Arthur
19d58d31f1
Add MLLama (#33703)
* current changes

* nit

* Add cross_attenttion_mask to processor

* multi-image fixed

* Add cross_attenttion_mask to processor

* cross attn works in all cases

* WIP refactoring function for image processor

* WIP refactoring image processor functions

* Refactor preprocess to use global loops instead of list nested list comps

* Docstrings

* Add channels unification

* fix dtype issues

* Update docsrings and format

* Consistent max_image_tiles

* current script

* updates

* Add convert to rgb

* Add image processor tests

* updates!

* update

* god damn it I am dumb sometimes

* Precompute aspect ratios

* now this works, full match

* fix 😉

* nits

* style

* fix model and conversion

* nit

* nit

* kinda works

* hack for sdpa non-contiguous bias

* nits here and there

* latest c hanges

* merge?

* run forward

* Add aspect_ratio_mask

* vision attention mask

* update script and config variable names

* nit

* nits

* be able to load

* style

* nits

* there

* nits

* make forward run

* small update

* enable generation multi-turn

* nit

* nit

* Clean up a bit for errors and typos

* A bit more constant fixes

* 90B keys and shapes match

* Fix for 11B model

* Fixup, remove debug part

* Docs

* Make max_aspect_ratio_id to be minimal

* Update image processing code to match new implementation

* Adjust conversion for final checkpoint state

* Change dim in repeat_interleave (accordig to meta code)

* tmp fix for num_tiles

* Fix for conversion (gate<->up, q/k_proj rope permute)

* nits

* codestyle

* Vision encoder fixes

* pass cross attn mask further

* Refactor aspect ratio mask

* Disable text-only generation

* Fix cross attention layers order, remove q/k norm rotation for cross atention layers

* Refactor gated position embeddings

* fix bugs but needs test with new weights

* rope scaling should be llama3

* Fix rope scaling name

* Remove debug for linear layer

* fix copies

* Make mask prepare private func

* Remove linear patch embed

* Make precomputed embeddings as nn.Embedding module

* MllamaPrecomputedAspectRatioEmbedding with config init

* Remove unused self.output_dim

* nit, intermediate layers

* Rename ln and pos_embed

* vision_chunk_size -> image_size

* return_intermediate -> intermediate_layers_indices

* vision_input_dim -> hidden_size

* Fix copied from statements

* fix most tests

* Fix more copied from

* layer_id->layer_idx

* Comment

* Fix tests for processor

* Copied from for _prepare_4d_causal_attention_mask_with_cache_position

* Style fix

* Add MllamaForCausalLM

* WIP fixing tests

* Remove duplicated layers

* Remove dummy file

* Fix style

* Fix consistency

* Fix some TODOs

* fix language_model instantiation, add docstring

* Move docstring, remove todos for precomputed embeds (we cannot init them properly)

* Add initial docstrings

* Fix

* fix some tests

* lets skip these

* nits, remove print, style

* Add one more copied from

* Improve test message

* Make validate func private

* Fix dummy objects

* Refactor `data_format` a bit + add comment

* typos/nits

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* fix dummy objects and imports

* Add chat template config json

* remove num_kv_heads from vision attention

* fix

* move some commits and add more tests

* fix test

* Remove `update_key_name` from modeling utils

* remove num-kv-heads again

* some prelimiary docs

* Update chat template + tests

* nit, conversion script max_num_tiles from params

* Fix warning for text-only generation

* Update conversion script for instruct models

* Update chat template in converstion + test

* add tests for CausalLM model

* model_max_length, avoid null chat_template

* Refactor conversion script

* Fix forward

* Fix integration tests

* Refactor vision config + docs

* Fix default

* Refactor text config

* Doc fixes

* Remove unused args, fix docs example

* Squashed commit of the following:

commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date:   Wed Sep 18 13:39:15 2024 +0000

    Move model + add output hidden states and output attentions

* Fix num_channels

* Add mllama text and mllama vision models

* Fixing repo consistency

* Style fix

* Fixing repo consistency

* Fixing unused config params

* Fix failed tests after refactoring

* hidden_activation -> hidden_act  for text mlp

* Remove from_pretrained from sub-configs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Reuse lambda in conversion script

* Remove run.py

* Update docs/source/en/model_doc/mllama.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/processing_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused LlamaTokenizerFast

* Fix logging

* Refactor gating

* Remove cycle for collecting intermediate states

* Refactor text-only check, add integration test for text-only

* Revert from pretrained to configs

* Fix example

* Add auto `bos_token` adding in processor

* Fix tips

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Enable supports_gradient_checkpointing model flag

* add eager/sdpa options

* don't skip attn tests and bring back GC skips (did i really remove those?)

* Fix signature, but get error with None gradient

* Fix output attention tests

* Disable GC back

* Change no split modules

* Fix dropout

* Style

* Add Mllama to sdpa list

* Add post init for vision model

* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model

* if skipped, say it, don't pass

* Clean vision tester config

* Doc for args

* Update tests/models/mllama/test_modeling_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add cross_attention_mask to test

* typehint

* Remove todo

* Enable gradient checkpointing

* Docstring

* Style

* Fixing and skipping some tests for new cache

* Mark flaky test

* Skip `test_sdpa_can_compile_dynamic` test

* Fixing some offload tests

* Add direct GenerationMixin inheritance

* Remove unused code

* Add initializer_range to vision config

* update the test to make sure we show if split

* fix gc?

* Fix repo consistency

* Undo modeling utils debug changes

* Fix link

* mllama -> Mllama

* [mllama] -> [Mllama]

* Enable compile test for CausalLM model (text-only)

* Fix TextModel prefix

* Update doc

* Docs for forward, type hints, and vision model prefix

* make sure to reset

* fix init

* small script refactor and styling

* nit

* updates!

* some nits

* Interpolate embeddings for 560 size and update integration tests

* nit

* does not suppor static cache!

* update

* fix

* nit2

* this?

* Fix conversion

* Style

* 4x memory improvement with image cache AFAIK

* Token decorator for tests

* Skip failing tests

* update processor errors

* fix split issues

* style

* weird

* style

* fix failing tests

* update

* nit fixing the whisper tests

* fix path

* update

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
Yoni Gozlan
94f18cf23c
Add OmDet-Turbo (#31843)
* Add template with add-new-model-like

* Add rough OmDetTurboEncoder and OmDetTurboDecoder

* Add working OmDetTurbo convert to hf

* Change OmDetTurbo encoder to RT-DETR encoder

* Add swin timm backbone as default, add always partition fix for swin timm

* Add labels and tasks caching

* Fix make fix-copies

* Format omdet_turbo

* fix Tokenizer tests

* Fix style and quality

* Reformat omdet_turbo

* Fix quality, style, copies

* Standardize processor kwargs

* Fix style

* Add output_hidden_states and ouput_attentions

* Add personalize multi-head attention, improve docstrings

* Add integrated test and fix copy, style, quality

* Fix unprotected import

* Cleanup comments and fix unprotected imports

* Add fix different prompts in batch (key_padding_mask)

* Add key_padding_mask to custom multi-head attention module

* Replace attention_mask by key_padding_mask

* Remove OmDetTurboModel and refactor

* Refactor processing of classes and abstract use of timm backbone

* Add testing, fix output attentions and hidden states, add cache for anchors generation

* Fix copies, style, quality

* Add documentation, conver key_padding_mask to attention_mask

* revert changes to backbone_utils

* Fic docstrings rst

* Fix unused argument in config

* Fix image link documentation

* Reorder config and cleanup

* Add tokenizer_init_kwargs in merge_kwargs of the processor

* Change AutoTokenizer to CLIPTokenizer in convert

* Fix init_weights

* Add ProcessorMixin tests, Fix convert while waiting on uniform kwargs

* change processor kwargs and make task input optional

* Fix omdet docs

* Remove unnecessary tests for processor kwargs

* Replace nested BatchEncoding output of the processor by a flattened BatchFeature

* Make modifications from Pavel review

* Add changes Amy review

* Remove unused param

* Remove normalize_before param, Modify processor call docstring

* Remove redundant decoder class, add gradient checkpointing for decoder

* Remove commented out code

* Fix inference in fp16 and add fp16 integrated test

* update omdet md doc

* Add OmdetTurboModel

* fix caching and nit

* add OmDetTurboModel to tests

* nit change repeated key test

* Improve inference speed in eager mode

* fix copies

* Fix nit

* remove OmdetTurboModel

* [run-slow] omdet_turbo

* [run-slow] omdet_turbo

* skip dataparallel test

* [run-slow] omdet_turbo

* update weights to new path

* remove unnecessary config in class

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-91-248.ec2.internal>
2024-09-25 13:26:28 -04:00
NielsRogge
06e27e3dc0
[Pixtral] Improve docs, rename model (#33491)
* Improve docs, rename model

* Fix style

* Update repo id
2024-09-25 13:53:12 +02:00
Dmitry Rogozhkin
5e2916bc14
tests: fix pytorch tensor placement errors (#33485)
This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"

According to pytorch documentation torch.Tensor.numpy(force=False)
performs conversion only if tensor is on CPU (plus few other restrictions)
which is not the case. For our case we need force=True since we just
need a data and don't care about tensors coherency.

Fixes: #33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-09-25 12:21:53 +01:00
Yoni Gozlan
5f0c181f4e
Uniformize kwargs for image-text-to-text processors (#32544)
* uniformize FUYU processor kwargs

* Uniformize instructblip processor kwargs

* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2

* Uniformize llava_next processor

* Fix save_load test for processor with chat_template only as extra init args

* Fix import Unpack

* Fix Fuyu Processor import

* Fix FuyuProcessor import

* Fix FuyuProcessor

* Add defaults for specific kwargs kosmos2

* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs

* Add tests processor Udop

* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature

* Fix overwrite tests kwargs processors

* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop

* Fix processing test fuyu

* remove unnecessary pad_token check in instructblip ProcessorTest

* Fix BC tests and cleanup

* FIx imports fuyu

* Uniformize Pix2Struct

* Fix wrong name for FuyuProcessorKwargs

* Fix slow tests reversed inputs align fuyu llava-next, change udop warning

* Fix wrong logging import udop

* Add check images text input order

* Fix copies

* change text pair handling when positional arg

* rebase on main, fix imports in test_processing_common

* remove optional args and udop uniformization from this PR

* fix failing tests

* remove unnecessary test, fix processing utils and test processing common

* cleanup Unpack

* cleanup

* fix conflict grounding dino
2024-09-24 21:28:19 -04:00
Joao Gante
a7734238ff
Generation tests: update imagegpt input name, remove unused functions (#33663) 2024-09-24 16:40:48 +01:00
Joao Gante
e15687fffe
Generation: deprecate PreTrainedModel inheriting from GenerationMixin (#33203) 2024-09-23 18:28:36 +01:00
Yoni Gozlan
1456120929
Uniformize kwargs for Udop processor and update docs (#33628)
* Add optional kwargs and uniformize udop

* cleanup Unpack

* nit Udop
2024-09-23 12:47:32 -04:00
Avishai Elmakies
78b2929c05
Sdpa dino v2 (#33403)
* add sdpa to dinov2

* fixup

* add dinov2 to sdpa doc

* update doc order

* [run-slow] dinov2

* common to eager

* [run-slow] dinov2

* update attn implementation in common

* update test_modeling_dinov2 to have mask_ration, num_masks and mask_length similar to vit

* [run-slow] dinov2

---------

Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
2024-09-21 01:58:00 +01:00
Mayank Mishra
e472e077c2
Granitemoe (#33207)
* first commit

* drop tokenizer

* drop tokenizer

* drop tokenizer

* drop convert

* granite

* drop tokenization test

* mup

* fix

* reformat

* reformat

* reformat

* fix docs

* stop checking for checkpoint

* update support

* attention multiplier

* update model

* tiny drop

* saibo drop

* skip test

* fix test

* fix test

* drop

* drop useless imports

* update docs

* drop flash function

* copied from

* drop pretraining tp

* drop pretraining tp

* drop pretraining tp

* drop unused import

* drop code path

* change name

* softmax scale

* head dim

* drop legacy cache

* rename params

* cleanup

* fix copies

* comments

* add back legacy cache

* multipliers

* multipliers

* multipliers

* text fix

* fix copies

* merge

* multipliers

* attention multiplier

* drop unused imports

* add granitemoe

* add decoration

* remove moe from sequenceclassification

* fix test

* fix

* fix

* fix

* move rope?

* merge

* drop bias

* drop bias

* Update src/transformers/models/granite/configuration_granite.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* Update src/transformers/models/granite/modeling_granite.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix

* fix

* fix

* drop

* drop

* fix

* fix

* cleanup

* cleanup

* fix

* fix granite tests

* fp32 test

* fix

* drop jitter

* fix

* rename

* rename

* fix config

* add gen test

---------

Co-authored-by: Yikang Shen <yikang.shn@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-21 01:43:50 +02:00
Yoni Gozlan
c0c6815dc9
Add support for args to ProcessorMixin for backward compatibility (#33479)
* add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin

* change size and crop_size in processor kwargs tests to do_rescale and rescale_factor

* remove unnecessary llava processor kwargs test overwrite

* nit

* change data_arg_name to input_name

* Remove unnecessary test override

* Remove unnecessary tests Paligemma

* Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring
2024-09-20 11:40:59 -04:00
Joao Gante
2fdb5e74cc
VLM generate: tests can't generate image/video tokens (#33623) 2024-09-20 15:43:27 +01:00
amyeroberts
f9b4409726
Remove unnecessary CPM model tests (#33621)
Remove model tests
2024-09-20 14:20:57 +01:00
Lake Lee
ec1424c6a3
Update modeling_mamba2.py, fix pad size (#32599)
* Update modeling_mamba2.py

Fix pad_size calculation to ensure it's less than self.chunk_size

* [run_slow] mamba2

* [run-slow] mamba2

* [run-slow] Add @require_read_token decorator to failing tests for token propagation

* [run_slow] mamba2
2024-09-20 11:40:57 +01:00
Fanli Lin
8bd1f2f338
[tests] make more tests device-agnostic (#33580)
* enable

* fix

* add xpu skip

* add marker

* skip for xpu

* add more

* enable on accelerator

* add more cases

* add more tests

* add more
2024-09-20 10:16:43 +01:00
Fanli Lin
4d8908df27
[tests] enable GemmaIntegrationTest on XPU (#33555)
enable GemmaIntegrationTest
2024-09-19 19:39:19 +01:00
Fanli Lin
b87755aa6d
[tests] skip tests for xpu (#33553)
* enable

* fix

* add xpu skip

* add marker

* skip for xpu

* add more

* add one more
2024-09-19 19:28:04 +01:00
Yoni Gozlan
f111d5b783
Uniformize kwargs for Paligemma processor and update docs (#33571)
* Uniformize paligemma processor

* nit
2024-09-19 14:14:06 -04:00
Joao Gante
52920b5dd5
Cache: don't throw warnings on gemma2 when instantiating a new cache (#33595) 2024-09-19 17:42:47 +01:00
Anton Vlasjuk
b50ff5993a
[Mamba2] Move dt calculations to kernel (#33520)
* use kernel for dt calculations

* add small test

* [run-slow] mamba2
2024-09-19 17:41:17 +01:00
Pablo Montalvo
413008c580
add uniform processors for altclip + chinese_clip (#31198)
* add initial design for uniform processors + align model

* add uniform processors for altclip + chinese_clip

* fix mutable default 👀

* add configuration test

* handle structured kwargs w defaults + add test

* protect torch-specific test

* fix style

* fix

* rebase

* update processor to generic kwargs + test

* fix style

* add sensible kwargs merge

* update test

* fix assertEqual

* move kwargs merging to processing common

* rework kwargs for type hinting

* just get Unpack from extensions

* run-slow[align]

* handle kwargs passed as nested dict

* add from_pretrained test for nested kwargs handling

* [run-slow]align

* update documentation + imports

* update audio inputs

* protect audio types, silly

* try removing imports

* make things simpler

* simplerer

* move out kwargs test to common mixin

* [run-slow]align

* skip tests for old processors

* [run-slow]align, clip

* !$#@!! protect imports, darn it

* [run-slow]align, clip

* [run-slow]align, clip

* update common processor testing

* add altclip

* add chinese_clip

* add pad_size

* [run-slow]align, clip, chinese_clip, altclip

* remove duplicated tests

* fix

* update doc

* improve documentation for default values

* add model_max_length testing

This parameter depends on tokenizers received.

* Raise if kwargs are specified in two places

* fix

* match defaults

* force padding

* fix tokenizer test

* clean defaults

* move tests to common

* remove try/catch block

* deprecate kwarg

* format

* add copyright + remove unused method

* [run-slow]altclip, chinese_clip

* clean imports

* fix version

* clean up deprecation

* fix style

* add corner case test on kwarg overlap

* resume processing - add Unpack as importable

* add tmpdirname

* fix altclip

* fix up

* add back crop_size to specific tests

* generalize tests to possible video_processor

* add back crop_size arg

* fixup overlapping kwargs test for qformer_tokenizer

* remove copied from

* fixup chinese_clip tests values

* fixup tests - qformer tokenizers

* [run-slow] altclip, chinese_clip

* remove prepare_image_inputs
2024-09-19 17:21:54 +02:00
Pablo Montalvo
4f0246e535
fix tests with main revision and read token (#33560)
* fix tests with main revision and read token

* [run-slow]mamba2

* test previously skipped tests

* [run-slow]mamba2

* skip some tests

* [run-slow]mamba2

* finalize tests

* [run-slow]mamba2
2024-09-19 17:10:22 +02:00
Joao Gante
f3b3810fe6
rag: fix CI (#33578) 2024-09-19 11:55:26 +01:00
Raushan Turganbay
d7975a5874
VLMs: enable generation tests (#33533)
* add tests

* fix whisper

* update

* nit

* add qwen2-vl

* more updates!

* better this way

* fix this one

* fix more tests

* fix final tests, hope so

* fix led

* Update tests/generation/test_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* pr comments

* not pass pixels and extra for low-mem tests, very flaky because of visio tower

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-09-19 12:04:24 +02:00
Raushan Turganbay
e40bb4845e
Load and save video-processor from separate folder (#33562)
* load and save from video-processor folder

* Update src/transformers/models/llava_onevision/processing_llava_onevision.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-19 09:56:52 +02:00
Yoach Lacombe
5af7d41e49
Codec integration (#33565)
* clean mimi commit

* some nits suggestions from Arthur

* make fixup

* rename repo id + change readme

* Update docs/source/en/model_doc/mimi.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add flaky flag to batching equivalence due to audio_codes failing sometimes

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-18 19:23:44 +02:00
Raushan Turganbay
db72894b48
Chat template: save and load correctly for processors (#33462)
* fix

* add tests

* fix tests

* Update tests/models/llava/test_processor_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix

* fix tests

* update tests

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-18 13:00:44 +02:00
Wang, Yi
454a0f2efd
fix patch_attention_mask incorrect setting which leads to the differe… (#33499)
* fix patch_attention_mask incorrect setting which leads to the difference in the generated text if batch > 1

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* fix format

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* [run_slow] idefics2

---------

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2024-09-17 22:24:42 +01:00
Yoni Gozlan
d8500cd229
Uniformize kwargs for Pixtral processor (#33521)
* add uniformized pixtral and kwargs

* update doc

* fix _validate_images_text_input_order

* nit
2024-09-17 14:44:27 -04:00
Nikita Krasnytskyi
c29a8694b0
Fix missing sequences_scores in the Whisper beam search output (#32970)
* added sequences_scores to the output

* added beam_indices to output

* added test to check for beam_indices, sequences_scores and their shape

* removed redundant whitespaces

* make fixup
2024-09-17 19:36:11 +01:00
ErezSC42
46c27577b3
fix to jamba config, asserting attention and expert offset (#33316)
* fix to jamba config, asserting attention and expert offset

* fix foramtting

* fix foramtting

* fix foramtting

* changed to error raise instead of assertion, added unittests

* fix

* changed t_ to property_

* changed t_ to property_

* quickfix

* ran code styler
2024-09-17 19:29:27 +01:00
Wang, Yi
74026b473e
idefics2 enable_input_require_grads not aligned with disable_input_re… (#33194)
* idefics2 enable_input_require_grads not aligned with disable_input_require_grads
make peft+idefics2 checkpoints disable fail

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* split test case

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* refine test

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2024-09-17 10:39:34 +01:00
Insu Jang
bcf8946f0a
Fix number of patch check for different vision feature select strategy (#32494)
* Fix number of patch check for different vision feature select strategy

* add test

---------

Co-authored-by: raushan <raushan@huggingface.co>
2024-09-17 09:33:07 +02:00
Yoach Lacombe
18e1a9c719
Fix parametrization-based weight norm (#33275)
* refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading

* make style

* fix sew

* fix sew and sew_d tests
2024-09-17 08:05:21 +02:00
Yoach Lacombe
98adf24883
[Whisper test] Fix some failing tests (#33450)
* Fix failing tensor placement in Whisper

* fix long form generation tests

* more return_timestamps=True

* make fixup

* [run_slow] whisper

* [run_slow] whisper
2024-09-16 19:05:17 +02:00
Yoni Gozlan
2f62146f0e
Uniformize kwargs for LLaVa processor and update docs (#32858)
* Uniformize kwargs for LlaVa and update docs

* Change order of processor inputs in docstring

* Improve BC support for reversed images and text inputs

* cleanup llava processor call docstring

* Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning

* Put function check reversed images text outside base processor class

* Refactor _validate_images_text_input_order

* Add ProcessingUtilTester

* fix processing and test_processing
2024-09-16 11:26:26 -04:00
Arthur
8bd2b1e8c2
Add support for Pixtral (#33449)
* initial commit

* gloups

* updates

* work

* weights match

* nits

* nits

* updates to support the tokenizer :)

* updates

* Pixtral processor (#33454)

* rough outline

* Add in image break and end tokens

* Fix

* Udo some formatting changes

* Set patch_size default

* Fix

* Fix token expansion

* nit in conversion script

* Fix image token list creation

* done

* add expected results

* Process list of list of images (#33465)

* updates

* working image and processor

* this is the expected format

* some fixes

* push current updated

* working mult images!

* add a small integration test

* Uodate configuration docstring

* Formatting

* Config docstring fix

* simplify model test

* fixup modeling and etests

* Return BatchMixFeature in image processor

* fix some copies

* update

* nits

* Update model docstring

* Apply suggestions from code review

* Fix up

* updates

* revert modeling changes

* update

* update

* fix load safe

* addd liscence

* update

* use pixel_values as required by the model

* skip some tests and refactor

* Add pixtral image processing tests (#33476)

* Image processing tests

* Add processing tests

* woops

* defaults reflect pixtral image processor

* fixup post merge

* images -> pixel values

* oups sorry Mr docbuilder

* isort

* fix

* fix processor tests

* small fixes

* nit

* update

* last nits

* oups this was really breaking!

* nits

* is composition needs to be true

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-14 12:28:39 +02:00
Amit Garg
dfd31158ee
[Phi-3] Bug on stale kv cache (#33129)
* fix long seq bug

* fixed format

* fixed fn copy inconsistency

* fix long seq bug

* fixed format

* fixed fn copy inconsistency

* Addressed comments

* added a unit test

* fixed cache position

* Added a warning msg to the forward fn

* fixed test case
2024-09-13 14:07:19 +02:00
Raushan Turganbay
4b0418df11
Enable padding_side as call time kwargs (#33385)
* fix

* add padding-side kwarg

* add padding side in all models & fix tests

* fix copies

* fix tests
2024-09-13 11:58:38 +01:00
Raushan Turganbay
9c4639b622
Return image hidden states (#33426)
* fix

* return image hidden states

* fix copies

* fix test
2024-09-13 10:20:03 +02:00
benniekiss
5c6257d1fc
[whisper] Clarify error message when setting max_new_tokens (#33324)
* clarify error message when setting max_new_tokens

* sync error message in test_generate_with_prompt_ids_max_length

* there is no self
2024-09-12 18:48:36 +02:00
Raushan Turganbay
2f611d30d9
Qwen2-VL: clean-up and add more tests (#33354)
* clean-up on qwen2-vl and add generation tests

* add video tests

* Update tests/models/qwen2_vl/test_processing_qwen2_vl.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix and add better tests

* Update src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update docs and address comments

* Update docs/source/en/model_doc/qwen2_vl.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_vl.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update

* remove size at all

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-12 18:24:04 +02:00
Hannan Komari
8ed635258c
Fix flax whisper tokenizer bug (#33151)
* Update tokenization_whisper.py

Fix issue with flax whisper model

* Update tokenization_whisper_fast.py

Fix issue with flax whisper model

* Update tokenization_whisper.py

just check len of token_ids

* Update tokenization_whisper_fast.py

just use len of token_ids

* Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list

* Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list

* Update test_tokenization_whisper.py to add test for _convert_to_list method

* Update test_tokenization_whisper.py to fix code style issues

* Fix code style

* Fix code check again

* Update test_tokenization)whisper.py to Improve code style

* Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available

* Update tests/models/whisper/test_tokenization_whisper.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method

* Revert the changes automatically applied by formatter and was unrelated to PR

* Format for minimal changes

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-12 12:21:59 +01:00
Ita Zaporozhets
781bbc4d98
use diff internal model in tests (#33387)
* use diff internal model in tests

* use diff internal model in tests
2024-09-11 11:27:00 +02:00
Maciej Adamiak
8e8e7d8558
fixed Mask2Former image processor segmentation maps handling (#33364)
* fixed mask2former image processor segmentation maps handling

* introduced review suggestions

* introduced review suggestions
2024-09-10 11:19:56 +01:00
Raushan Turganbay
7d2d6ce9cb
VLM: fixes after refactor (#32907)
* leave only half of the changes

* fix tests

* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava

* fix tests, first try

* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava

* fix, second try

* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava

* fix

* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava
2024-09-10 12:02:37 +02:00
Lysandre Debut
f24f084329
Import structure & first three model refactors (#31329)
* Import structure & first three model refactors

* Register -> Export. Export all in __all__. Sensible defaults according to filename.

* Apply most comments from Amy and some comments from Lucain

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>

* Style

* Add comment

* Clearer .py management

* Raise if not in backend mapping

* More specific type

* More efficient listdir

* Misc fixes

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
2024-09-10 11:10:53 +02:00
amyeroberts
f745e7d3f9
Remove repeated prepare_images in processor tests (#33163)
* Remove repeated prepare_images

* Address comments - update docstring; explanatory comment
2024-09-09 13:20:27 +01:00
Raushan Turganbay
65bb284448
Compile compatibilty for decoder-only models (#32617)
* squash into one commit

* add qwen2-vl for rope standardization

* fix mistral compile

* fix qwen2-vl

* fix-copies
2024-09-09 10:59:04 +02:00
Ita Zaporozhets
e48e5f1f13
Support reading tiktoken tokenizer.model file (#31656)
* use existing TikTokenConverter to read tiktoken tokenizer.model file

* del test file

* create titktoken integration file

* adding tiktoken llama test

* ALTNATIVE IMPLEMENTATION: supports llama 405B

* fix one char

* remove redundant line

* small fix

* rm unused import

* flag for converting from tiktokeng

* remove unneeded file

* ruff

* remove llamatiktokenconverter, stick to general converter

* tiktoken support v2

* update test

* remove stale changes

* udpate doc

* protect import

* use is_protobuf_available

* add templateprocessor in tiktokenconverter

* reverting templateprocessor from tiktoken support

* update test

* add require_tiktoken

* dev-ci

* trigger build

* trigger build again

* dev-ci

* [build-ci-image] tiktoken

* dev-ci

* dev-ci

* dev-ci

* dev-ci

* change tiktoken file name

* feedback review

* feedback rev

* applying feedback, removing tiktoken converters

* conform test

* adding docs for review

* add doc file for review

* add doc file for review

* add doc file for review

* support loading model without config.json file

* Revert "support loading model without config.json file"

This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb.

* remove dev var

* updating docs

* safely import protobuf

* fix protobuf import error

* fix protobuf import error

* trying isort to fix ruff error

* fix ruff error

* try to fix ruff again

* try to fix ruff again

* try to fix ruff again

* doc table of contents

* add fix for consistency.dockerfile torchaudio

* ruff

* applying feedback

* minor typo

* merging with push-ci-image

* clean up imports

* revert dockerfile consistency
2024-09-06 14:24:02 +02:00
Shiyu
342e800086
support 3D attention mask in bert (#32105)
* support 3D/4D attention mask in bert

* test cases

* update doc

* fix doc
2024-09-06 14:20:48 +02:00
GeLee
2b18354106
add self.head_dim for VisionAttention in Qwen2-VL (#33211)
* add self.head_dim for VisionAttention in Qwen2-VL

* add self.head_dim for VisionAttention in Qwen2-VL

* fix ci

* black the test_modeling_qwen2_vl.py

* use ruff to format test_modeling_qwen2_vl.py

* [run-slow] qwen2_vl

* use tying for python3.8

* fix the import format

* use ruff to fix the ci error I001

* [run-slow] qwen2_vl

* remove unused import

* commit for rebase

* use ruff fix ci

* [run-slow] qwen2_vl

---------

Co-authored-by: root <liji>
2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi
3314fe1760
Add validation for maximum sequence length in modeling_whisper.py (#33196)
* Add validation for maximum sequence length in modeling_whisper.py

Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message.

This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness.

* Change exception message in src/transformers/models/whisper/modeling_whisper.py

The exception message is for whisper's label's sequence max length.

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py

It's for whisper's config.max_target_positions.

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Change method's documentation in src/transformers/models/whisper/modeling_whisper.py

* Add test for maximum label's sequence length in test_modeling_whisper.py

* Add self to modeling_whisper.py

* Update test_modeling_whisper.py with respect to automatic validations

* Update modeling_whisper.py with respect to ci/circleci: check_code_quality

* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality

* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate

* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate

* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality

* Separate test_labels_sequence_max_length tests in test_modeling_whisper.py

* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality

* Remove assert from test_modeling_whisper.py

* Add max_target_positions to WhisperModelTester in test_modeling_whisper.py

* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality

* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate

* Update test_modeling_whisper.py

* Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py

* Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py

* Add new tests in test_modeling_whisper.py

* Update test_modeling_whisper.py

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-09-06 14:09:49 +02:00
Ita Zaporozhets
363301f221
support loading model without config.json file (#32356)
* support loading model without config.json file

* fix condition

* update tests

* add test

* ruff

* ruff

* ruff
2024-09-06 13:49:47 +02:00
Xuehai Pan
e1c2b69c34
Load dynamic module (remote code) only once if code isn't change (#33162)
* Load remote code only once

* Use hash as load indicator

* Add a new option `force_reload` for old behavior (i.e. always reload)

* Add test for dynamic module is cached

* Add more type annotations to improve code readability

* Address comments from code review
2024-09-06 12:49:35 +01:00
Sanchit Gandhi
51d15eb1c1
[whisper] alternative fix for long-form timestamps (#32131)
* [whisper] alternative fix for long-form timestamps

* update test
2024-09-06 12:57:08 +02:00
Raushan Turganbay
1759bb9126
Fix: StaticCache & inputs_embeds (#32932)
squash commit
2024-09-06 12:56:59 +05:00
Shijie
21fac7abba
simple align qwen2vl kv_seq_len calculation with qwen2 (#33161)
* qwen2vl_align_kv_seqlen_to_qwen2

* flash att test

* [run-slow] qwen2_vl

* [run-slow] qwen2_vl fix OOM

* [run-slow] qwen2_vl

* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* code quality

---------

Co-authored-by: baishuai.bs <1051314669@qq.com>
Co-authored-by: ShuaiBai623 <baishuai623@icloud.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2024-09-05 21:19:30 +05:00
Joshua Lochner
c6d2848a23
🚨 Fix torch.jit.trace for interpolate_pos_encoding in all vision models (#33226)
* Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models

* Apply formatting

* Add missing `self.config = config`

* Fix copies

* Fix hiera interpolation unit test

* Formatting

* Update `_import_structure`

* make style

* Fix docstring

* Use `# Copied from` instead of utils

* DeiT variable renaming (`class_and_dist_pos_embed`)

* Fix Hiera `interpolate_pos_encoding`
2024-09-05 16:17:34 +02:00
Younes Belkada
47b096412d
Fix: Fix FalconMamba training issues due to incompatible kernels (#33195)
* fix FM training kernels

* fix copies

* fix copies

* propagate to slow path

* make it BC

* add comment

* fix test
2024-09-05 11:55:08 +02:00
Raushan Turganbay
43df47d8e7
Llava Onevision: add model (#32673)
* working version

* fix copies

* update

* tests

* update docs

* codestyle

* add more tests

* add returns for docs

* clean up

* Update src/transformers/models/llava_onevision/processing_llava_onevision.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* updates

* codestyle

* style

* shouldn't be reversed

* [run-slow] llava_onevision

* [run-slow] llava_onevision

* add pooling in videos

* [run-slow] llava_onevision

* num-logits-to-keep

* [run-slow] llava_onevision

* [run-slow] llava_onevision

* Update tests/test_modeling_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* video matched orig impl

* fix tests

* chat template was modified

* Update docs/source/en/model_doc/llava_onevision.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add morer info in the doc page

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-05 14:43:20 +05:00
amyeroberts
d2dcff96f8
[InstructBLIP] qformer_tokenizer is required input (#33222)
* [InstructBLIP] qformer_tokenizer is required input

* Bit safer

* Add to instructblipvideo processor

* Fix up

* Use video inputs

* Update tests/models/instructblipvideo/test_processor_instructblipvideo.py
2024-09-04 16:18:06 +01:00
laurentd-lunit
d703477265
[fix] LlavaNextProcessor '_get_unpadded_features' method (#33263)
* [fix] LlavaNextProcessor '_get_unpadded_features' method

* [tests] add test_image_token_filling

* [chore] style + comment

* [minor] improve readability

* [chore] run make fix-copies
2024-09-04 17:41:51 +05:00
Niklas Muennighoff
ecd61c6286
Add OLMoE (#32406)
* Add OLMoE

* Add OLMoE

* Updates

* Make norm optional; add keys

* Add output

* Add

* Fix dtype

* Fix eos config

* Update

* Add OLMoE

* Fix OLMoE path

* Format

* Format

* Rmv copy statement

* Rmv copy statement

* Format

* Add copies

* Cp rotary

* Fix aming

* Fix naming

* Update RoPE integration; num_logits_to_keep; Add copy statements

* Add eps to config

* Format

* Add aux loss

* Adapt router_aux_loss_coef

* Update md

* Adapt

* adapt tests
2024-09-03 18:43:12 +02:00
Arthur
b017a9eb11
Refactor CI: more explicit (#30674)
* don't run custom when not needed?

* update test fetcher filtering

* fixup and updates

* update

* update

* reduce burden

* nit

* nit

* mising comma

* this?

* this?

* more parallelism

* more

* nit for real parallelism on tf and torch examples

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update to make it more custom

* update to make it more custom

* update to make it more custom

* update to make it more custom

* update

* update

* update

* update

* update

* update

* use correct path

* fix path to test files and examples

* filter-tests

* filter?

* filter?

* filter?

* nits

* fix naming of the artifacts to be pushed

* list vs files

* list vs files

* fixup

* fix list of all tests

* fix the install steps

* fix the install steps

* fix the config

* fix the config

* only split if needed

* only split if needed

* extend should fix it

* extend should fix it

* arg

* arg

* update

* update

* run tests

* run tests

* run tests

* more nits

* update

* update

* update

* update

* update

* update

* update

* simpler way to show the test, reduces the complexity of the generated config

* simpler way to show the test, reduces the complexity of the generated config

* style

* oups

* oups

* fix import errors

* skip some tests for now

* update doctestjob

* more parallelism

* fixup

* test only the test in examples

* test only the test in examples

* nits

* from Arthur

* fix generated congi

* update

* update

* show tests

* oups

* oups

* fix torch job for now

* use single upload setp

* oups

* fu**k

* fix

* nit

* update

* nit

* fix

* fixes

* [test-all]

* add generate marker and generate job

* oups

* torch job runs not generate tests

* let repo utils test all utils

* UPdate

* styling

* fix repo utils test

* more parallel please

* don't test

* update

* bit more verbose sir

* more

* hub were skipped

* split by classname

* revert

* maybe?

* Amazing catch

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* fix

* update

* update

* maybe non capturing

* manual convert?

* pass artifacts as parameters as otherwise the config is too long

* artifact.json

* store output

* might not be safe?

* my token

* mmm?

* use CI job IS

* can't get a proper id?

* ups

* build num

* update

* echo url

* this?

* this!

* fix

* wget

* ish

* dang

* udpdate

* there we go

* update

* update

* pass all

* not .txt

* update

* fetcg

* fix naming

* fix

* up

* update

* update

* ??

* update

* more updates

* update

* more

* skip

* oups

* pr documentation tests are currently created differently

* update

* hmmmm

* oups

* curl -L

* update

* ????

* nit

* mmmm

* ish

* ouf

* update

* ish

* update

* update

* updatea

* nit

* nit

* up

* oups

* documentation_test fix

* test hub tests everything, just marker

* update

* fix

* test_hub is the only annoying one now

* tf threads?

* oups

* not sure what is happening?

* fix?

* just use folder for stating hub

* I am getting fucking annoyed

* fix the test?

* update

* uupdate

* ?

* fixes

* add comment!

* nit

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-08-30 18:17:25 +02:00
Matt
38d58a4427
Fix local repos with remote code not registering for pipelines (#33100)
* Extremely experimental fix!

* Try removing the clause entirely

* Add test

* make fixup

* stash commit

* Remove breakpoint

* Add anti-regression test

* make fixup

* Move repos to hf-internal-testing!
2024-08-30 16:56:22 +01:00
JB (Don)
f1a385b1de
[RoBERTa-based] Add support for sdpa (#30510)
* Adding SDPA support for RoBERTa-based models

* add not is_cross_attention

* fix copies

* fix test

* add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin

* address some review comments

* use copied from

* style

* consistency

* fix lists

---------

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-28 10:26:00 +02:00
Anton Vlasjuk
3bfd3e4803
Fix: Jamba batched generation (#32914)
* init fix

* fix mask during cached forward, move mask related stuff to own function

* adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp)

* revert overwriting new integration tests

* move some comments to docstring
2024-08-28 09:24:06 +02:00
Mayank Mishra
c35d2ccf5a
Granite language models (#31502)
* first commit

* drop tokenizer

* drop tokenizer

* drop tokenizer

* drop convert

* granite

* drop tokenization test

* mup

* fix

* reformat

* reformat

* reformat

* fix docs

* stop checking for checkpoint

* update support

* attention multiplier

* update model

* tiny drop

* saibo drop

* skip test

* fix test

* fix test

* drop

* drop useless imports

* update docs

* drop flash function

* copied from

* drop pretraining tp

* drop pretraining tp

* drop pretraining tp

* drop unused import

* drop code path

* change name

* softmax scale

* head dim

* drop legacy cache

* rename params

* cleanup

* fix copies

* comments

* add back legacy cache

* multipliers

* multipliers

* multipliers

* text fix

* fix copies

* merge

* multipliers

* attention multiplier

* drop unused imports

* fix

* fix

* fix

* move rope?

* Update src/transformers/models/granite/configuration_granite.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* Update src/transformers/models/granite/modeling_granite.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix

* fix

* fix

* fix-copies

* torch rmsnorm

* add authors

* change model path

* fix

* test

* drop static cache test

* uupdate readme

* drop non-causal

* readme

* drop useless imports

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-27 21:27:21 +02:00
Juan Pizarro
7591ca5bc5
🚨 Add Blip2ForImageTextRetrieval (#29261)
* add Blip2ForImageTextRetrieval

* use one line and remove unnecessary space in tests

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use  value from the config, rather than hardcoded

* change order of params in Blip2QFormerModel.forward

* update docstring

* fix style

* update test_inference_opt

* move embeddings out of Blip2QFormerModel

* remove from_vision_qformer_configs

* remove autocast float16 in Blip2QFormerModel

* rename fiels into vision_projection,text_projection,use_image_text_matching_head

* use CLIPOutput for  Blip2ImageTextMatchingModelOutput

* remove past_key_values_length from Blip2TextEmbeddings

* fix small typo in the CLIPOutput docstring

* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping

* update docstring and add require_torch_fp16

* rollback test_inference_opt

* use use_image_text_matching_head=True in convert

* skip test_model_get_set_embeddings

* fix create_rename_keys error on new itm fields

* revert to do  scale after dot product between "query" and "key"

* fix ValueError on convert script for blip2-opt-2.7b

* update org of paths to Salesforce

* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests

* [run_slow] blip_2

* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED

* fix docstring of Blip2ImageTextMatchingModelOutput

* [run_slow] blip_2

* fix multi-gpu tests

* [run_slow] blip_2

* [run_slow] blip_2

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 18:50:27 +01:00
Joao Gante
c6b23fda65
Llama: make slow tests green 🟢 (#33138) 2024-08-27 14:44:42 +01:00
Joao Gante
ab0ac3b98f
CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123)
* fix param not being passed in tested; add exceptions

* better source of model name

* Update utils/create_dummy_models.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 11:58:27 +01:00
Sai-Suraj-27
3bf6dd8aa1
fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32850)
* Fixed failing CodeGenTokenizationTest::test_truncation.

* [run_slow] Codegen

* [run_slow] codegen
2024-08-27 09:20:59 +02:00
Shijie
19e6e80e10
support qwen2-vl (#32318)
* support-qwen2-vl

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* hyphen->underscore

* make style

* add-flash2-tipd

* delete-tokenize=False

* remove-image_processor-in-init-file

* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES

* format-doct

* support-Qwen2VLVisionConfig

* remove-standardize_cache_format

* fix-letter-varaibles

* remove-torch-in-image-processor

* remove-useless-docstring

* fix-one-letter-varaible-name

* change-block-name

* default-quick-gelu-in-vision

* remove-useless-doc

* use-preimplemented-flash-forward

* fix-doc

* fix-image-processing-doc

* fix-apply-rotary-embed

* fix-flash-attn-sliding-window

* refactor

* remove-default_template

* remove-reorder_cache

* simple-get-rope_deltas

* update-prepare_inputs_for_generation

* update-attention-mask

* update-rotary_seq_len

* remove-state

* kv_seq_length

* remove-warning

* _supports_static_cache

* remove-legacy-cache

* refactor

* fix-replace

* mrope-section-doc

* code-quality

* code-quality

* polish-doc

* fix-image-processing-test

* update readme

* Update qwen2_vl.md

* fix-test

* Update qwen2_vl.md

* nit

* processor-kwargs

* hard-code-norm_layer

* code-quality

* discard-pixel-values-in-gen

* fix-inconsistent-error-msg

* unify-image-video

* hidden_act

* add-docstring

* vision-encode-as-PreTrainedModel

* pixel-to-target-dtype

* update doc and low memoryvit

* format

* format

* channel-foramt

* fix vit_flashatt

* format

* inherit-Qwen2VLPreTrainedModel

* simplify

* format-test

* remove-one-line-func-in-image-processing

* avoid-one-line-reshape

* simplify-rotary_seq_len

* avoid-single-letter-variable

* no-for-loop-sdpa

* avoid-single-letter-variable

* remove-one-line-reshape

* remove-one-line-reshape

* remove-no-rope-in-vit-logic

* default-mrope

* add-copied-from

* more-docs-for-mrope

* polish-doc

* comment-and-link

* polish-doc

* single-letter-variables

* simplify-image-processing

* video->images

* kv_seq_len-update

* vision-rope-on-the-fly

* vision-eager-attention

* change-processor-order

---------

Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
Joao Gante
970a16ec7f
Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 (#32659)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 11:12:53 +01:00
Andrés Marafioti
18199b34e5
[run_slow] idefics2 (#32840) 2024-08-22 18:08:03 +02:00
Joao Gante
975b988bfe
Gemma2: eager attention by default (#32865) 2024-08-22 15:59:30 +01:00
Joao Gante
f6e2586a36
Jamba: update integration tests (#32250)
* try test updates

* a few more changes

* a few more changes

* a few more changes

* [run slow] jamba

* skip logits checks on older gpus

* [run slow] jamba

* oops

* [run slow] jamba

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-22 11:46:10 +01:00
Younes Belkada
93e538ae2e
Mamba / FalconMamba: Fix mamba left padding (#32677)
* fix mamba left padding

* Apply suggestions from code review

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* fix copies

* test with `inputs_embeds`

* Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* copies

* clairfy

* fix last comments

* remove

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-19 16:01:35 +02:00
Kamil Akesbi
8260cb311e
Add Descript-Audio-Codec model (#31494)
* dac model

* original dac works

* add dac model

* dac can be instatiated

* add forward pass

* load weights

* all weights are used

* convert checkpoint script ready

* test

* add feature extractor

* up

* make style

* apply cookicutter

* fix tests

* iterate on FeatureExtractor

* nit

* update dac doc

* replace nn.Sequential with nn.ModuleList

* nit

* apply review suggestions 1/2

* Update src/transformers/models/dac/modeling_dac.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* up

* apply review suggestions 2/2

* update padding in FeatureExtractor

* apply review suggestions

* iterate on design and tests

* add integration tests

* feature extractor tests

* make style

* all tests pass

* make style

* fixup

* apply review suggestions

* fix-copies

* apply review suggestions

* apply review suggestions

* Update docs/source/en/model_doc/dac.md

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update docs/source/en/model_doc/dac.md

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* anticipate transfer weights to descript

* up

* make style

* apply review suggestions

* update slow test values

* update slow tests

* update test values

* update with CI values

* update with vorace values

* update test with slice

* make style

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-08-19 10:21:51 +01:00
MAHIR DAIYAN
843e5e20ca
Add Flax Dinov2 (#31960)
* tfmsenv restored in main

* installed flax

* forward pass done and all tests passed

* make fix-copies and cleaning the scripts

* fixup attempt 1

* fixup attempt 2

* fixup third attempt

* fixup attempt 4

* fixup attempt 5

* dinov2 doc fixed

* FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE

* external pos_encoding layer removed

* fixup attempt 6

* fixed integration test values

* fixup attempt 7

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* comments removed

* comment removed from the test

* fixup

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* new fixes 1

* interpolate_pos_encoding function removed

* droppath rng fixed, pretrained beit copied-from still not working

* modeling_flax_dinov2.py reformatted

* Update tests/models/dinov2/test_modeling_flax_dinov2.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* added Copied from, to the tests

* copied from statements removed from tests

* fixed copied from statements in the tests

* [run_slow] dinov2

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-08-19 09:28:13 +01:00
Joao Gante
cf32ee1753
Cache: use batch_size instead of max_batch_size (#32657)
* more precise name

* better docstrings

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-16 11:48:45 +01:00
Fanli Lin
8f9fa3b081
[tests] make test_sdpa_equivalence device-agnostic (#32520)
* fix on xpu

* [run_all]
2024-08-16 11:34:13 +01:00
Joao Gante
70d5df6107
Generate: unify LogitsWarper and LogitsProcessor (#32626) 2024-08-16 11:20:41 +01:00
jp
e840127370
reopen: llava-next fails to consider padding_side during Training (#32679)
restore #32386
2024-08-15 11:44:19 +01:00
Yih-Dar
20a04497a8
Fix JetMoeIntegrationTest (#32332)
JetMoeIntegrationTest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-14 16:22:06 +02:00
Yoni Gozlan
5bcbdff159
Modify ProcessorTesterMixin for better generalization (#32637)
* Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs

* remove crop_size argument in align processor tests to be coherent with base tests

* Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
2024-08-13 11:48:53 -04:00
Bertrand Thia
cc25757a44
Add Depth Anything V2 Metric models (#32126)
* add checkpoint and repo names

* adapt head to support metric depth estimation

* add max_depth output scaling

* add expected logits

* improve docs

* fix docstring

* add checkpoint and repo names

* adapt head to support metric depth estimation

* add max_depth output scaling

* add expected logits

* improve docs

* fix docstring

* rename depth_estimation to depth_estimation_type

* add integration test

* Refactored tests to include metric depth model inference test
* Integration test pass when the timm backbone lines are commented (L220-L227)

* address feedback

* replace model path to use organization path

* formatting

* delete deprecated TODO

* address feedback

* [run_slow] depth_anything
2024-08-13 16:16:30 +02:00
Raushan Turganbay
a29eabd0eb
Expand inputs in processors for VLMs (#30962)
* let it be

* draft

* should not have changed

* add warnings

* fix & add tests

* fix tests

* ipnuts embeds cannot be passed with pixels

* more updates

* paligemma ready!

* minor typos

* update blip-2

* fix tests & raise error

* docstring

* add blip2 test

* tmp

* add image seq length to config

* update docstring

* delete

* fix tests

* fix blip

* fix paligemma

* out-of-place scatter

* add llava-next-video

* Update src/transformers/models/blip_2/modeling_blip_2.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* remove tmp

* codestyle

* nits

* more nits

* remove overriding in tests

* comprehension when merging video

* fix-copies

* revert changes for embeds test

* fix tests after making comprehension

* Update src/transformers/models/blip_2/processing_blip_2.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/models/blip_2/processing_blip_2.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* more updates

* fix tests

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-08-13 10:14:39 +05:00
Quentin Gallouédec
f1c8542ff7
"to be not" -> "not to be" (#32636)
* "to be not" -> "not to be"

* Update sam.md

* Update trainer.py

* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py
2024-08-12 20:20:17 +01:00
Raushan Turganbay
8f2b6d5e3d
Fix: FA2 with packed training (#32487)
* fix check

* add tests

* [run-slow] llama, gemma2

* oops, whisper actually runs but needed some special treatment
2024-08-12 13:40:07 +05:00
Younes Belkada
7c11491208
Add new model (#32615)
* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-12 08:22:47 +02:00
Arthur
e4522fe399
fix slow integration gemma2 test (#32534)
no empty revision
2024-08-09 11:28:22 +02:00
Pablo Montalvo
044281605f
Fix generate with inputs_embeds as input (#32493)
* I think inputs_embeds has ndim == 3

* fix sequence length catch

* add generate test

* [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama

* skip whisper

* fix bart test

* more fixes
2024-08-08 18:44:53 +02:00
Yunfei Chu
16ed0640be
Add Qwen2-Audio (#32137)
* add qwen2audio

* Update check_repo.py

* fix style

* fix test

* fix style

* add model size

* Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* switch the attention_mask and the feature_attention_mask

* add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py

* fix initialization

* update chat_template

* fix consistency issue after copy

* add docstrings to _merge_input_ids_with_audio_features

* add copied from to prepare_inputs_for_generation

* add more details to docs

* rm comment

* add init_std

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* update

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update tests

* rm ignore_index

* update processor

* rm ffmpeg_read

* Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update

* typo

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* fix quality

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* add official model

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-08 15:47:24 +02:00
Sangbum Daniel Choi
d3b3551750
Uniformize kwargs for processors - GroundingDINO (#31964)
* fix typo

* uniform kwargs

* make style

* add comments

* remove return_tensors

* remove common_kwargs from processor since it propagates

* make style

* return_token_type_ids to True

* revert the default imagekwargs since does not accept any value in the image processro

* revert processing_utils.py

* make style

* add molbap's commit

* fix typo

* fix common processor

* remain

* Revert "add molbap's commit"

This reverts commit a476c6ee88.

* add unsync PR

* revert

* make CI happy

* nit

* import annotationformat
2024-08-08 14:03:08 +01:00
Pablo Montalvo
80b90e7b2f
Add codestral mamba2 (#32080)
* add new model like

* draft cuda forward - mismatched keys (sharding on conv1)

* match keys successfully

* fix split

* get generation/forward running (wrong gens, norm?)

* :update

* some refactoring

* fixes

* works up until copy to cache

* fix

* update

* NON WORKING VERSION

* version that work?

* nit

* fix config

* fix conversion script

* working cuda forward

* nit

* update

* simplifcation

* make mamba slow simple work

* no einops

* todo

* fix style

* no einops

* update fix no einsum

* nit

* remove einops

* bug: scan_output differs strongly

* add rms norm option

* fix fast + slow generation with and w/o cache ✔️

* draft integration tests

* remove a big chunk of the einsum

* fix slow, fast generations, without any einsum

* fix copies

* fix structure

* fix up modeling and tests

* fix tests

* clamping is indeed worse

* recover mamba2 cache test

* fix copies

* no cache position (yet)

* fix tf tests

* fix matmul for generate

* fixup

* skip cache tests for now

* [run-slow]mamba2

* tune out hidden states for padding

* test batched generation

* propagate attention mask changes

* fix past length

* fix integration test

* style

* address comments

* update readme

* add mamba2 version check

* fix tests

* [run-slow]mamba2

* skip edge tests

* [run-slow]mamba2

* last fixup

* [run-slow]mamba2

* update README

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-08-06 16:39:52 +02:00
Ao Tang
6a03942db7
Add Nemotron HF Support (#31699)
* Add nemotron support

* fix inference

* add unit test

* add layernorm1p as a class to avoid meta device mismatch

* test fixed

* Add copied_from statements

* remove pretraining_tp args

* remove nemotronlayernorm

* force LN computation done in FP32

* remove nemotrontokenizer and use llamatokenizer

* license update

* add option for kv_channels for minitron8b

* remove assert

* o_proj fixed

* o_proj reshape

* add gated_proj option

* typo

* remove todos

* fix broken test after merging latest main

* remove nezha/nat after meging main

* chnage default config to 15b model

* add nemo conversion script

* rename conversion script

* remove gate_proj option

* pr comment resolved

* fix unit test

* rename kv_channels to head_dim

* resolve PR issue

* add nemotron md

* fix broken tests

* refactor rope for nemotron

* test fix

* remove linearscaling

* whitespace and import

* fix some copied-from

* code style fix

* reformatted

* add position_embedding to nemotronattention

* rope refactor to only use config, copied-from fix

* format

* Run make fix-copies

* nemotron md with autodoc

* doc  fix

* fix order

* pass check_config_docstrings.py

* fix config_attributes

* remove all llama BC related code

* Use PreTrainedTokenizerFast

* ruff check examples

* conversion script update

* add nemotron to toctree
2024-08-06 15:42:05 +02:00
Francisco Kurucz
438d06c95a
Fix get large model config for Switch Transformer encoder only tester (#32438) 2024-08-06 11:48:32 +01:00
Pavel Iakubovskii
fb66ef8147
Update kwargs validation for preprocess with decorator (#32024)
* BLIP preprocess

* BIT preprocess

* BRIDGETOWER preprocess

* CHAMELEON preprocess

* CHINESE_CLIP preprocess

* CONVNEXT preprocess

* DEIT preprocess

* DONUT preprocess

* DPT preprocess

* FLAVA preprocess

* EFFICIENTNET preprocess

* FUYU preprocess

* GLPN preprocess

* IMAGEGPT preprocess

* INTRUCTBLIPVIDEO preprocess

* VIVIT preprocess

* ZOEDEPTH preprocess

* VITMATTE preprocess

* VIT preprocess

* VILT preprocess

* VIDEOMAE preprocess

* VIDEOLLAVA

* TVP processing

* TVP fixup

* SWIN2SR preprocess

* SIGLIP preprocess

* SAM preprocess

* RT-DETR preprocess

* PVT preprocess

* POOLFORMER preprocess

* PERCEIVER preprocess

* OWLVIT preprocess

* OWLV2 preprocess

* NOUGAT preprocess

* MOBILEVIT preprocess

* MOBILENETV2 preprocess

* MOBILENETV1 preprocess

* LEVIT preprocess

* LAYOUTLMV2 preprocess

* LAYOUTLMV3 preprocess

* Add test

* Update tests
2024-08-06 11:33:05 +01:00
Fanli Lin
e85d86398a
add the missing flash attention test marker (#32419)
* add flash attention check

* fix

* fix

* add the missing marker

* bug fix

* add one more

* remove order

* add one more
2024-08-06 11:18:58 +01:00
Sai-Suraj-27
458b0cd2c5
fix: Updated test_embeded_special_tokens for luke and mluke models (#32413)
Fixed tokenizertests for luke, mluke models.
2024-08-05 15:19:42 +01:00
Abdi
baf7e5c927
Persist embedding type of BART and mBART models after resize (#32242)
* fix: persist embedding type of MBartConditonalGeneration after resize

* fix: persist embedding type of BartConditonalGeneration after resize
2024-08-05 14:15:36 +01:00
Raushan Turganbay
3bb646a54f
Phi3 tests: fix typing for Python 3.8 (#32388)
fix phi
2024-08-05 11:58:42 +05:00
TechInterMezzo
05ae3a300d
fix: SeamlessM4TFeatureExtractor stride remainder (#32088)
* fix: SeamlessM4TFeatureExtractor stride remainder

* Added attention mask size test

* Reran ruff for style correction
2024-08-05 08:40:58 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 (#32339)
* Remove size check between attn_weights and kv_seq_len

* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding (#31772)
* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)
* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flacky

* retrigger ci

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer (#32275)
* Fix FA2 call for Perciever layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect inv_freq assignment (#32330)
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention (#32188)
* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore
2024-07-31 10:33:38 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)
* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI
2024-07-30 23:36:38 +02:00
Kamil Akesbi
3fbaaaa64d
Whisper tokenizer word level timestamps (#32197)
* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test
2024-07-29 11:19:52 +01:00
Joao Gante
7ffe25f2b9
Generate: end-to-end compilation (#30788)
* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add  tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation
2024-07-29 10:52:13 +01:00
Sai-Suraj-27
b8e5cd5396
Refactor: Removed un-necessary object base class (#32230)
* Refactored to remove un-necessary object base class.

* small fix.
2024-07-26 10:33:02 +02:00
Raushan Turganbay
fad15fba78
Llava: generate without images (#32183)
* llava w/o images

* tests
2024-07-26 10:17:27 +05:00
Yih-Dar
df6eee9201
Follow up for #31973 (#32025)
* fix

* [test_all] trigger full CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-25 16:12:23 +02:00
Kashif Rasul
de2318894e
[warnings] fix E721 warnings (#32223)
fix E721 warnings
2024-07-25 15:12:23 +02:00
Sanchit Gandhi
5658e749ad
[whisper] fix short-form output type (#32178)
* [whisper] fix short-form output type

* add test

* make style

* update long-form tests

* fixes

* last fix

* finalise test
2024-07-25 16:58:02 +08:00
Sai-Suraj-27
85a1269e19
fix: Replaced deprecated unittest method with the correct one (#32198)
Replaced deprecated unittest method with the correct one.
2024-07-24 18:00:21 +01:00
Matt
edd68f4ed8
🚨 No more default chat templates (#31733)
* No more default chat templates

* Add the template to the GPT-SW3 tests since it's not available by default now

* Fix GPT2 test

* Fix Bloom test

* Fix Bloom test

* Remove default templates again
2024-07-24 17:36:32 +01:00
Joao Gante
e0182f3bd7
RoPE: relaxed rope validation (#32182)
* relaxed rope check

* lets also accept rope_type=None, defaulting to the original implementation

* type and rope_type can coexist
2024-07-24 15:00:48 +01:00
Sai-Suraj-27
d2c687b3f1
Updated ruff to the latest version (#31926)
* Updated ruff version and fixed the required code accorindg to the latest version.

* Updated ruff version and fixed the required code accorindg to the latest version.

* Added noqa directive to ignore 1 error shown by ruff
2024-07-23 17:07:31 +02:00
Sanchit Gandhi
3263b34354
Revert "Incorrect Whisper long-form decoding timestamps " (#32148)
Revert "Incorrect Whisper long-form decoding timestamps  (#32003)"

This reverts commit cd48553fc8.
2024-07-23 18:34:30 +08:00
Amit Garg
034b477847
Rename Phi-3 rope scaling type (#31436)
* renamed phi3 rope_scaling type

* fixed trailing whitespaces

* fixed test

* added warning

* fixed format
2024-07-23 12:33:22 +02:00
Merve Noyan
9ced33ca7f
Fix video batching to videollava (#32139)
---------

Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
2024-07-23 13:23:23 +03:00
Joao Gante
2e113422b3
Llama: RoPE refactor (#32135)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-23 10:42:55 +01:00
mig-mfreitas
34b43211d7
Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910)
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods

YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.

Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.

We implement YaRN and Dynamic-YaRN for the following list of models:

 - LLaMA
 - Falcon
 - GPT-NeoX
 - Olmo
 - Persimmon
 - Phi
 - StableLM
 - OpenLLaMA

New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.

For more details, please refer to https://arxiv.org/abs/2309.00071.

Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>

* Refactor YaRN implementation for LLaMA

Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.

This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
  from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies

Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>

* Refactor Tensor Building Logic for YaRN

- Comply with the the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment

Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>

* remove unwanted file

---------

Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-23 10:07:58 +01:00
Anton Vlasjuk
605f3245dc
Fix mask creations of GPTNeoX and GPT2 (#31944)
* fix mask creation of gpt2 and gpt_neox caused by me

* forgot the reshape of masks when shape > 2

* add tests for gpt neox and gpt2

* nit on a comment
2024-07-23 10:11:12 +02:00
Sanchit Gandhi
f83c6f1d02
Remove trust_remote_code when loading Libri Dummy (#31748)
* [whisper integration] use parquet dataset for testing

* propagate to others

* more propagation

* last one
2024-07-23 14:54:38 +08:00
Raushan Turganbay
3aefb4ec7f
LLaVaNeXT: pad on right if training (#32134)
* pad on right if training

* docs

* add tests
2024-07-23 10:23:55 +05:00
Yoni Gottesman
74d0eb3fed
Return assistant generated tokens mask in apply_chat_template (#30650)
return assistant generated tokens mask in apply_chat_template
2024-07-22 18:24:43 +01:00
Sai-Suraj-27
12b6880c81
fix: Fixed raising TypeError instead of ValueError for invalid type (#32111)
* Raised TypeError instead of ValueError for invalid types.

* Updated formatting using ruff.

* Retrieved few changes.

* Retrieved few changes.

* Updated tests accordingly.
2024-07-22 17:46:17 +01:00
Matt
7ba028fccb
Fix failing test with race condition (#32140)
* Fix failing test with race condition

* make fixup

* monotonic_ns instead of randint

* uuid4 instead of monotonic_ns

* Add a finally cleanup step
2024-07-22 16:07:29 +01:00
Lucain
f2a1e3ca68
Mention model_info.id instead of model_info.modelId (#32106) 2024-07-22 14:14:47 +01:00
Kamil Akesbi
89575b567e
Support generating with fallback for short form audio in Whisper (#30984)
* remove is_shortform

* adapt _retrieve_max_frames_and_seek for short_form

* return bos token in short and long form

* add decoder_input_ids to short form audios

* add eos token for  short form

* handle short form token_timestamps

* no need to return scores

* add is_shortform conditions

* handle when max_new_tokens is None - short form

* handle assistant decoding

* fix

* handle return_dict_in_generate

* handle split_by_batch for encoder_attentions attribute

* handle num_beams>1

* handle num_return_sequences>1 in generate_with_fallback

* handle num_return_sequences>1 with return_dict_in_generate=True

* raise error if max_new_tokens + decoder_inputs_ids > max_target_pos

* fix

* apply review suggestions

* fix

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix

* logits for both short form and long form

* handle if logits_processor is None

* test

* apply review changes to num_return_sequences

* add _expand_variables_for_generation

* remove short form commented section

* update comments

* uncomment num_beams line in generate_with_fallback

* update assistant decoding

* handle return_segment with short form generation

* up

* fix output format is_shortform

* overwrite beam_sample test

* update _set_return_timestamps

* apply review suggestions

* apply review suggestions

* remove seek_outputs_short_form

* fix _stack_split_outputs

* fix stack dim in _stack_split_outputs

* update tests

* fix past_key_values + beam tests

* fix

* clean _expand_variables_for_generation

* make style

* fix slow tests

* make style

* max_length condition

* make style

* add slow tests for shortform fallback

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* apply review changes

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* up

* fix slow tests

* apply review suggestions

* update test

* make style

* small fix

* fix

* fix test_new_cache_format

* fix past_key_values

* fix

* make style

* fix slow tests

* fix

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-07-19 13:42:22 +01:00
Kamil Akesbi
cd48553fc8
Incorrect Whisper long-form decoding timestamps (#32003)
* fix lo form timestamps in decode_batch

* Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* add test

* make style

* fix copies

* Update src/transformers/models/whisper/tokenization_whisper_fast.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/processing_whisper.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply review suggestions

* fix

* fix copies

* fix

* Update src/transformers/models/whisper/tokenization_whisper_fast.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix-copies

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-19 09:26:38 +01:00
Raushan Turganbay
b873234cb6
Llava: add default chat templates (#31691)
* add default chat templates

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/llava_next/processing_llava_next.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more clear docstring and docs

* Update docs/source/en/model_doc/llava.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/vipllava.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* add tests

* remove default templates (see #31733)

* load chat template from another file

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* revert some changes in docs

* forgot vipllava

* chat template file is not temporary hack

* warn if loading from processor

* not that file

* similarly modify `save_pretrained`

* Update tests/models/llava_next/test_processor_llava_next.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vipllava/test_processor_vipllava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/vipllava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/processing_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/processing_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/vipllava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/processing_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2024-07-19 10:08:56 +05:00
Longjie Zheng
c75969ee28
Add torch.compile Support For Mamba (#31247)
* modify mamba cache

* set up cache

* add test

* [run-slow] mamba

* [run-slow] mamba

* address comments

* [run-slow] mamba

* use_cache_position

* [run-slow] mamba

* [run-slow] mamba

* [run-slow] mamba

* [run-slow] mamba

* fix

* cache in generate

* [run-slow] mamba

* address comments

* [run-slow] mamba

* [run-slow] mamba

* address comments

* [run-slow] mamba

* fix

* [run-slow] mamba

* fix

* [run-slow] mamba

* fix cache name

* [run-slow] mamba
2024-07-18 11:54:54 -04:00
Raushan Turganbay
673d30b826
Chameleon: minor fixes after shipping (#32037)
* fix merging

* make chameleon conditional
2024-07-18 16:54:07 +05:00
Pavel Iakubovskii
1c37e8c1a6
Add sdpa and FA2 for CLIP (#31940)
* Squashed commit of the following:

commit 102842cd477219b9f9bcb23a0bca3a8b92bd732f
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Fri Jul 12 18:23:52 2024 +0000

    Add model-specific sdpa tests

commit 60e4c88581abf89ec098da84ed8e92aa904c997d
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Fri Jul 12 18:20:53 2024 +0000

    Add fallback to eager (expensive operation)

commit c29033d30e7ffde4327e8a15cbbc6bee37546f80
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Thu Jul 11 17:09:55 2024 +0000

    Fix attn_implementation propagation

commit 783aed05f0f38cb2f99e758f81db6838ac55b9f8
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Sat May 25 09:05:27 2024 +0530

    style

commit e77e703ca75d00447cda277eca6b886cd32bddc0
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Sat May 25 09:04:57 2024 +0530

    add comment to explain why I had to touch forbidden codebase.

commit ab9d8849758e7773a31778ccba71588d18552623
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Sat May 25 09:03:02 2024 +0530

    fix: flax attribute access.

commit c570fc0abf9d1bd58c291aae3c7e384f995996d2
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Sat May 25 08:23:54 2024 +0530

    fix tensorflow attribute name.

commit 32c812871cfdb268d8a6e3e2c61c5c925c8ed47e
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Sat May 25 07:57:10 2024 +0530

    fix attribute access.

commit 4f41a0138b6c417aed9c9332278f8bcd979cb7c2
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Sat May 25 07:44:02 2024 +0530

    _from_config.

commit 35aed64ff602422adcf41d7f677a0a24bd9eccae
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 24 18:46:52 2024 +0530

    propagation of attn_implementation.

commit 4c25c19845438b1dc1d35a5adf9436151c8c5940
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 24 09:24:36 2024 +0530

    style again

commit 5f7dc5c5015c0f8116408f737e8c318d1802c80c
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 24 09:19:05 2024 +0530

    use from_config.

commit b70c409956d0359fa6ae5372275d2a20ba7e3389
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 24 09:13:43 2024 +0530

    quality

commit a7b63beff53d0fc754c6564e2a7b51731ddee49d
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 10 14:35:10 2024 +0200

    add benchmark numbers

commit 455b0eaea50862b8458c8f422b60fe60ae40fdcb
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 10 13:50:16 2024 +0200

    Revert "reflect feedback more"

    This reverts commit dc123e71ef.

commit ca674829d28787349c2a9593a14e0f1d41f04ea4
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 10 13:50:05 2024 +0200

    Revert "fix"

    This reverts commit 37a1cb35b8.

commit fab2dd8576c099eb1a3464958cb206a664d28247
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 10 13:47:46 2024 +0200

    fix

commit fbc6ae50fd6f2d36294d31e191761631b701d696
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 10 13:38:30 2024 +0200

    reflect feedback more

commit 87245bb020b2d60a89afe318a951df0159404fc9
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 3 08:54:34 2024 +0530

    fixes

commit 1057cc26390ee839251e7f8b3326c4207595fb23
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 3 07:49:03 2024 +0530

    don't explicit set attn_implementation in tests

commit e33f75916fc8a99f516b1cf449dbbe9d3aabda81
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 3 07:43:54 2024 +0530

    explicitly override attn_implementation in the towers.

commit 4cf41cb1bc885c39df7cb8f2a0694ebf23299235
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 3 07:38:42 2024 +0530

    import in one-line.

commit f2cc447ae9e74ccfacb448140cdf88259d4afc8c
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri May 3 07:34:58 2024 +0530

    move sdpa mention to usage tips.

commit 92884766c64dbb456926a3a84dd427be1349fa95
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Mon Apr 29 10:58:26 2024 +0530

    fix: memory allocation problem.

commit d7ffbbfe12f7750b7d0a361420f35c13e0ea787d
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Mon Apr 29 09:56:59 2024 +0530

    fix-copies

commit 8dfc3731cedd02e36acd3fe56bb2e6d61efd25d8
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Fri Apr 26 20:16:12 2024 +0530

    address arthur's comments.

commit d2ed7b4ce4ff15ae9aa4d3d0500f1544e3dcd9e9
Author: Sayak Paul <spsayakpaul@gmail.com>
Date:   Fri Apr 26 20:08:15 2024 +0530

    Apply suggestions from code review

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

commit 46e04361f37ded5c522ff05e9f725b9f82dce40e
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Wed Apr 24 09:55:27 2024 +0530

    add to docs.

commit 831629158ad40d34d8983f209afb2740ba041af2
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Wed Apr 24 09:33:10 2024 +0530

    styling.g

commit d263a119c77314250f4b4c8469caf42559197f22
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Wed Apr 24 09:15:20 2024 +0530

    up

commit d44f9d3d7633d4c241a737a1bc317f791f6aedb3
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Tue Apr 23 18:40:42 2024 +0530

    handle causal and attention mask

commit 122f1d60153df6666b634a94e38d073f3f260926
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Tue Apr 23 15:18:21 2024 +0530

    test fixes.

commit 4382d8cff6fa1dee5dbcf0d06b3e2841231e36f5
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Tue Apr 23 09:39:25 2024 +0530

    fix: scaling inside sdpa.

commit 0f629989efc48b7315cf19405a81e02955efe7e5
Author: Sayak Paul <spsayakpaul@gmail.com>
Date:   Tue Apr 23 08:14:58 2024 +0530

    Update src/transformers/models/clip/modeling_clip.py

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

commit 14367316877dc27ea40f767ad1aee38bbc97e4ce
Author: sayakpaul <spsayakpaul@gmail.com>
Date:   Mon Apr 22 16:21:36 2024 +0530

    add: sdpa support to clip.

* Remove fallback for empty attention mask (expensive operation)

* Fix typing in copies

* Add flash attention

* Add flash attention tests

* List CLIP in FA docs

* Fix embeddings attributes and tf

* [run-slow] clip

* Update clip documentation

* Remove commented code, skip compile dynamic for CLIPModel

* Fix doc

* Fix doc 2

* Remove double transpose

* Add torch version check for contiguous()

* Add comment to test mixin

* Fix copies

* Add comment for mask

* Update docs

* [run-slow] clip
2024-07-18 10:30:37 +05:30
Pavel Iakubovskii
691586b0dc
Fix tests skip (#32012)
* [run-slow] clip

* [run-slow] clip

* Fix skip -> skipTest

* [run-slow] clip
2024-07-17 08:37:43 +01:00
Raushan Turganbay
24cfcc2114
Chameleon: add model (#31534)
* Chameleon model integration

Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>

* fix 7B, again. mask away image tokens

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove pretrained_config_map

* make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file

* remove tokenizer (use llama's); remove codechameleon tests

* a few copied from statements and minor changes

* copied from in ChameleonModel

* some copies in ChameleonForCausalLM

* a few more copies

* VQModel moved to ChameleonModel (as opposed to being in the processor)

* ChameleonProcessor ready

* Fix chameleon weights convert

* update conversion script

* clean-up processing

* update modeling a bit

* update

* update (throws error...)

* correct conversion ready

* fix tests

* fix docs

* docs

* ve swin norm

* fix device for vocab map

* add normalization

* update

* update script with rope rotations

* final fix on model conversion

* add slow tests

* more info in docs

* fix repo consistency tests

* fix repo tests

* fix-copies

* hope this will make CI happy

* fix for 30b model

* Update docs/source/en/index.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/image_processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/chameleon/processing_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/chameleon/test_modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/chameleon/test_modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/chameleon/test_modeling_chameleon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address comments

* remove assertion in conversion script

* add image processor test

* not copied

* port changes for qk layernorm

* fix-copies

* read token decorator for tests

* [run-slow] chameleon

* one more read-token

* address some comments

* qk norm changes

* tests and repo check

* moved rope permutations to conversion, YAY!

* fix past kv check

* docs

* layernorm done!

* let's be consistent in naming

* fix slow tests

* weird thing with slow CI, but let's see

* once more try

* remove past-kv as tuple following llama

* ignore

* style

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: jacobkahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Leonid Shamis <lshamis@meta.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-17 10:41:43 +05:00
Joao Gante
999981daf4
Tests: remove cuda versions when the result is the same 🧹🧹 (#31955)
remove cuda versions when the result is the same
2024-07-16 16:49:54 +01:00
Zach Mueller
e0dfd7bcaf
Speedup model init on CPU (by 10x+ for llama-3-8B as one example) (#31771)
* 1,100%!

* Clean

* Don't touch DS

* Experiment with dtype allocation

* skip test_load_save_without_tied_weights test

* A little faster

* Include proper upscaling?

* Fixup tests

* Potentially skip?

* Let's see if this fixes git history

* Maintain new dtype

* Fin

* Rm hook idea for now

* New approach, see what breaks

* stage

* Clean

* Stash

* Should be fin now, just need to mark failing models

* Clean up

* Simplify

* Deal with weird models

* Enc/Dec

* Skip w/ reason

* Adjust test

* Fix test

* one more test

* Keep experimenting

* Fix ref

* TO REMOVE: testing feedback CI

* Right push

* Update tests/utils/test_modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* disable

* Add new func

* Test nits from Amy

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Adjust comment

* Adjust comment on skip

* make private

* Fin

* Should be a not flag

* Clarify and rename test

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-16 09:32:01 -04:00
Joao Gante
e4682de635
Masking: remove flakiness from test (#31939) 2024-07-15 18:49:37 +01:00
Aviv Shamsian
7f79a97399
fix prompt strip to support tensors and np arrays (#27818)
* fix prompt strip to support tensors and np arrays

* framework agnostic

* change logic check before converting prompt into list

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adding _convert_to_list to tokenization_whisper_fast

* adding tests for prompt decoding

* adding comment

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adding comment

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* revert minor

* make style formatting

* style formatting after update

* Update src/transformers/models/whisper/tokenization_whisper_fast.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixing _strip_prompt to handle _decode_with_timestamps

* fix copies

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-07-12 20:07:10 +01:00
Naman Garg
c1e139c2b0
Adding hiera (#30356)
* initialized Structure

* Updated variable names

* Added Config class, basic HF setup, convert_to_hf

* Fixed Convert function, added hiera to HF files, Initilized test files

* better naming for x in forward pass

* Moved utils to hiera

* Change hiera -> hiera_model

* Fixed integration into tranformers

* Fix: Convert Checkpoint

* added documentation for hiera

* added documentation for hiera

* added Docstings to models, Transformers based changes

* make style and quality

* make style and quality

* Integration & Block tests running

* Fixed bugs

* initialized Structure

* Updated variable names

* Added Config class, basic HF setup, convert_to_hf

* Fixed Convert function, added hiera to HF files, Initilized test files

* better naming for x in forward pass

* Moved utils to hiera

* Change hiera -> hiera_model

* Fixed integration into tranformers

* Fix: Convert Checkpoint

* added documentation for hiera

* added documentation for hiera

* added Docstings to models, Transformers based changes

* make style and quality

* make style and quality

* Integration & Block tests running

* Fixed bugs

* Removed tim dependency

* added HieraBlock

* fixed: Model name

* added tests for HieraModel, HieraBlock

* fixed imports

* fixed quality & copies

* Fixes

* Update docs/source/en/model_doc/hiera.md

Fix name

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hiera.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hiera.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/configuration_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/configuration_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fixed formatting

* Code quality & Import differences

* quality and repo-consistency fix

* fixed no torch error

* Docstring fix

* Docstring fix

* doc string fix

* fixed example usage

* Resolved issues in modeling_hiera

* Removed Hiera MAE

* Added test and resolved bug

* fixed doc string

* First commit

* Finished conversion script and model forward working

* Resolved all issues

* nits

* Improving tests

* Nits

* More nits

* Improving HieraForMaskedImageModeling

* More improvements and nits

* Fixed docstrings of outputs

* More fixes

* More imrpovments

* Updated conversion script

* Fixed docstrings

* Improved tests

* Fixed attentou outputs test

* All tests green

* Removed unnecessary file

* contribution attribution

* Resolved a few issues

* Resolved Comments

* Updated model repo id and fixed bugs

* Removed loss print

* Make tests green

* Updated docstrings

* Fix style

* Fixed num_heads in config

* Removed unnecessary video checkpoint related code in the conversion script

* Fix style

* Changed atol in conversion script

* HieraConfig

* Fix copies

* Fixed typo

* Resolved few issues

* make

* converted conv_nd -> nn.Module

* Removed video complexities

* Removed video complexities

* fix style

* Addressing comments

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix style

* Fixed tests

* Fixed typo

* Fixed interpolate test

* Made torch fx compatible

* Made sure imageprocesor is correct

* Addressed comments

* Noise directly as torch

* Remove unnecesary attr

* Added return_dit

* Update src/transformers/models/hiera/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated checkpoints

* [run_slow] hiera

* Fixed device mismatch

* [run_slow] hiera

* Fixed GPU tests

* [run_slow] hiera

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-11 22:13:56 +01:00
Sai-Suraj-27
da79b18087
fix: Removed duplicate field definitions in some classes (#31888)
Removed duplicate field definitions in classes.
2024-07-10 13:46:31 +01:00
Yung-Sung Chuang
d094d8d9ec
Generate: Add new decoding strategy "DoLa" in .generate() (#29619)
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-09 17:37:38 +01:00
Yih-Dar
4879ac2b33
Avoid failure TFBlipModelTest::test_pipeline_image_to_text (#31827)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-08 13:49:21 +02:00
Pavel Iakubovskii
a177821b24
Add FA2 and sdpa support for SigLIP (#31499)
* Rebase to main

* Fix attention implementation autoset for tex and vision configs

* Fixup

* Minor fixes

* Fix copies

* Fix attention_mask for FA2

* Add eqvivalence tests for siglip

* Remove right padding test

* Uncomment flaky

* Fix import

* Add to docs

* Fix test message

* Add sdpa

* Add sdpa equivalence test

* Add siglip sdpa to docs

* Fix typing for attention output

* Add sdpa tests

* Fix signature of FA2

* Autoset attn_implementation in config

* Rename bsz -> batch_size

* Move back autoset attn method

* Mark as flaky

* Correct attention mask padding

* [run-slow] siglip

* Add FA2 and sdpa docs

* Style fix

* Remove flaky for FA2 test

* Change attention implementation set

* Change attn_implementaiton propogation

* Fix typos

* Add modality to assert message

* Add more sdpa backends in test

* [run slow] siglip

* Add math sdpa backend for all options

* [run slow] siglip
2024-07-08 11:10:02 +01:00
NielsRogge
06fd7972ac
Add ZoeDepth (#30136)
* First draft

* Add docs

* Clean up code

* Convert model

* Add image processor

* Convert Zoe_K

* More improvements

* Improve variable names and docstrings

* Improve variable names

* Improve variable names

* Replace nn.sequential

* More improvements

* Convert ZoeD_NK

* Fix most tests

* Verify pixel values

* Verify pixel values

* Add squeeze

* Update beit to support arbitrary window sizes

* Improve image processor

* Improve docstring

* Improve beit

* Improve model outputs

* Add figure

* Fix beit

* Update checkpoint

* Fix repo id

* Add _keys_to_ignore_on_load_unexpected

* More improvements

* Address comments

* Address comments

* Address comments

* Address comments

* Rename variable name

* Add backbone_hidden_size

* Vectorize

* Vectorize more

* Address comments

* Clarify docstring

* Remove backbone_hidden_size

* Fix image processor

* Remove print statements

* Remove print statement

* Add integration test

* Address comments

* Address comments

* Address comments

* Address comments

* Add requires_backends

* Clean up

* Simplify conversion script

* Simplify more

* Simplify more

* Simplify more

* Clean up

* Make sure beit is loaded correctly

* Address comment

* Address bin_configurations

* Use bin_configurations

* Convert models, add integration tests

* Fix doc test

* Address comments

* Unify regressor classes

* Clarify arguments

* Improve resize_image

* Add num_relative_features

* Address comment

* [run-slow]beit,data2vec,zoedepth

* [run-slow]beit,data2vec,zoedepth

* Address comments

* Address comment

* Address comment

* Replace nn.TransformerEncoderLayer and nn.TransformerEncoder

* Replace nn.MultiheadAttention

* Add attributes for patch transformer to config

* Add tests for ensure_multiple_of

* Update organization

* Add tests

* [run-slow] beit data2vec

* Update ruff

* [run-slow] beit data2vec

* Add comment

* Improve docstrings, add test

* Fix interpolate_pos_encoding

* Fix slow tests

* Add docstring

* Update src/transformers/models/zoedepth/image_processing_zoedepth.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/zoedepth/image_processing_zoedepth.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Improve tests and docstrings

* Use run_common_tests

* Improve docstrings

* Improve docstrings

* Improve tests

* Improve tests

* Remove print statements

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-08 11:43:33 +02:00
Billy Cao
1d3eaa6f7e
Add training support for SigLIP (#31495)
* Add siglip loss function

* Update docs

* Enable training tests
[experimental] enable GC training tests as it has worked for my own data

* Remove test_training* overrides to enable training tests
[run_slow] siglip

* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip

* Skip GC training tests for SiglipForImageClassification

* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel

* Remove copied from to fix CI
2024-07-05 14:50:39 +01:00
Yih-Dar
eef0507f3d
Fix gemma tests (#31794)
* skip 3 7b tests

* fix

* fix

* fix

* [run-slow] gemma

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-05 10:17:59 +02:00
Pavel Iakubovskii
048f599f35
Fix RT-DETR weights initialization (#31724)
* Fix init for rt-detr heads

* Fixup

* Add separate prior_prob value to config for initialization

* Add bbox init

* Change to 1 / num_labels init

* Adjust weights init test

* Fix style for test
2024-07-03 14:29:02 +01:00
Pavel Iakubovskii
b97521614a
Fix RT-DETR cache for generate_anchors (#31671)
* Fix cache and type conversion

* Add test

* Fixup

* nit

* [run slow] rt_detr

* Fix test

* Fixup

* [run slow] rt_detr

* Update src/transformers/models/rt_detr/modeling_rt_detr.py
2024-07-03 14:19:57 +01:00
Joao Gante
ddfaf11926
Gemma 2: Update slow tests (#31759)
gemma 2 slow tests
2024-07-03 11:43:44 +02:00
Sanchit Gandhi
a9701953ff
[whisper] static kv cache (#31166)
* make work with cache abstraction

* correct for static cache

* hacks for compile

* make fast

* fix

* fix pos ids

* generate

* fix sdpa

* fix sdpa cache pos

* fix fa2

* clean fa2

* integrate cache into generate

* make style

* copies

* more copies

* update eager

* update sdpa

* update fa2

* simplify

* use cache pos

* always compute cross-cache for debug

* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co>

* fix fix

* fix fix fix

* more fix

* try encoder-decoder cache (too messy)

* revert encoder-decoder cache

* check cross-attn cache

* use enc-dec dataclass

* use richer enc-dec dataclass

* clean-up

* revert static cache changes

* small fixes

* revert to cpu flag

* fix copies

* add static slow test

* past k/v docstring

* more docstrings

* cache_position docstrings

* add to docs

* add enc-dec cache to docs

* make style

* fix after rebase

* fix beam

* style

* fix generation strategies

* fix most decoder-only tests

* style

* skip test

* more clean up

* small docstrings

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add todo

* only crop self-attn

* check cache in mixin

* style

* fix re-compile after rebase

* move `is_updated` logic to enc-dec wrapper

* revert back

* revert cache back

* finalise design

* fix

* fix fix

* style

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* deprecate

* updates

* final updates

* style

* style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-02 13:24:15 +01:00
Jacky Lee
82a1fc7256
Fix return_dict in encodec (#31646)
* fix: use return_dict parameter

* fix: type checks

* fix: unused imports

* update: one-line if else

* remove: recursive check
2024-06-28 12:18:01 +01:00
Arthur
75a6319864
Fix post gemma merge (#31660)
* nit

* toctree issue

* protect gemma2 tests as well

* sdpa supported
2024-06-27 17:51:42 +02:00
Arthur
0cf60f13ab
Add gemma 2 (#31659)
* inital commit

* Add doc

* protect?

* fixup stuffs

* update tests

* fix build documentation

* mmmmmmm config attributes

* style

* nit

* uodate

* nit

* Fix docs

* protect some stuff

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-06-27 17:36:19 +02:00
Sangbum Daniel Choi
be50a0338b
change anchor_image_size None for compatibility (#31640)
* change anchor_image_size None for compatibility

* make fix-copies
2024-06-27 12:36:55 +01:00
amyeroberts
1de7dc7403
Skip tests properly (#31308)
* Skip tests properly

* [test_all]

* Add 'reason' as kwarg for skipTest

* [test_all] Fix up

* [test_all]
2024-06-26 21:59:08 +01:00
Billy Cao
1f9f57ab4c
Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference (#31589)
* Fix dtype casting in modeling_swin2sr to allow non-FP32 inference

* Fix formattting

* Fix for swinv2 too

* Update src/transformers/models/swin2sr/modeling_swin2sr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/swinv2/modeling_swinv2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add FP16 tests for swin2sr and swinv2

* [run_slow] swin2sr, swinv2

* [run_slow] swin2sr, swinv2

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 18:46:48 +01:00
Pablo Montalvo
492ee17ec3
Fix paligemma detection inference (#31587)
* fix extended attention mask

* add slow test for detection instance

* [run-slow]paligemma
2024-06-26 19:17:09 +02:00
Raushan Turganbay
e71f2863d7
Add LLaVa NeXT Video (#31252)
* squash into single commit

* run diff once more

* docstring

* tests

* minor chnages and ready to go

* Update src/transformers/models/llava_next_video/processing_llava_next_video.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vipllava/test_modeling_vipllava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* [run-slow] llava-next-video

* [run-slow] llava-next-video

* [run-slow] llava_next_video

* fix two tests

* fix slow tests

* remove logit checks due to numeric errors

* run test once more

* [run-slow] llava_next_video

* final try to pass the test

* [run-slow] llava_next_video

* [run-slow] llava_next_video

* [run-slow] llava_next_video

* style

* fix

* style

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-26 21:52:28 +05:00
Pavel Iakubovskii
b1ec745475
Fix RT-DETR inference with float16 and bfloat16 (#31639)
* [run_slow] rt_detr

* Fix positional embeddings and anchors dtypes

* [run slow] rt_detr

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 17:50:10 +01:00
Anton Vlasjuk
b07770c5eb
[GPT-NeoX] Add SDPA support (#31031)
* starting support for sdpa in `gptneox` models

* small comment on tests

* fix dropout

* documentation and style

* clarify concrete paths for reference

* generalise attn projections and rope application

added head mask check to sdpa mask creation

handle sdpa memory backend bug via own version flag

* update docs and style

* move dtype casting outside of general attn_projection_and_rope function

fix flash_attn_2 stuff

* more generic attn warning if output_attns or head_mask

* simplify head mask check by moving head mask creation to a later point

* remove copied llama artifact

* remove padding_mask from attention function signature

* removing unnecessary comments, only "save" attn implementation once

* [run_slow] gpt_neox
2024-06-26 13:56:36 +01:00
amyeroberts
0f67ba1d74
Add ViTImageProcessorFast to tests (#31424)
* Add ViTImageProcessor to tests

* Correct data format

* Review comments
2024-06-25 13:36:58 +01:00
Raushan Turganbay
fc689d75a0
Add video modality for InstrucBLIP (#30182)
* squash in single commit

* add docs

* dummy obj

* more changes in diff converter

* tiny fix

* make docs happy

* skip test

* repo consistency tests

* update docstring

* style

* fix tests

* change diff imports

* [run-slow] instructblipvideo

* [run-slow] instructblipvideo

* fix tests and remove logit check

* [run-slow] instructblipvideo
2024-06-25 15:45:39 +05:00
Raushan Turganbay
7e86cb6c6f
Siglip: add _no_split_module (#31566)
* device-map siglip

* move split modules to PretrainedSigLip
2024-06-25 09:49:55 +05:00
Pavel Iakubovskii
3c2d4d60d7
Correct @is_flaky test decoration (#31480)
* Correct @is_flaky decorator
2024-06-24 08:09:21 +01:00
Sangbum Daniel Choi
74a207404e
New model support RTDETR (#29077)
* fill out docs string in configuration
75dcd3a0e8 (r1506391856)

* reduce the input image size for the tests

* remove the unappropriate tests

* only 5 failes exists

* make style

* fill up missed architecture for object detection in docs

* fix auto modeling

* simple fix in missing import

* major change including backbone refactor and objectdetectionoutput refactor

* minor fix only 4 fails left

* intermediate fix

* revert __init__.py

* revert __init__.py

* make style

* fixes in pr_docs

* intermediate fix

* make style

* two fixes

* pass doctest

* only one fix left

* intermediate commit

* all fixed

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/rt_detr/convert_rt_detr_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/rt_detr/test_modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* function class above the model definition in dice_loss

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* simple fix

* layernorm add config.layer_norm_eps

* fix inputs_docstring

* make style

* simple fix

* add custom coco loading test in image_processor

* fix error in BaseModelOutput
https://github.com/huggingface/transformers/pull/29077#discussion_r1516657790

* simple typo

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* intermediate fix

* fix with load_backbone format

* remove unused configuration

* 3 fix test left

* make style

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: Sounak Dey <dey.sounak@gmail.com>

* change last_hidden_state to first index

* all pass fix
TO DO: minor update in comments

* make fix-copies

* remove deepcopy

* pr_document fix

* revert deepcopy due to the issue of unexpceted behavior in decoderlayer

* add atol in final

* add no_split_module

* _no_split_modules = None

* device transfer for model parallelism

* minor fix

* make fix-copies

* fix typo

* add test_image_processor with post_processing

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add config in RTDETRPredictionHead

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* set lru_cache with max_size 32

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add lru_cache import and configuration change

* change the order of definition

* make fix-copies

* add docs and change config error

* revert strange make-fix

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* test pass

* fix get_clones related and remove deepcopy

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* nit for paper section

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* rename denoising related parameters

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* check the image transformation logic

* make style

* make style

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* pe_encoding -> positional_encoding_temperature

* remove TODO

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* remove eval_idx since transformer DETR is giving all decoder output

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* change variable name

* make style and docs import update

* Revert "Update src/transformers/models/rt_detr/image_processing_rt_detr.py"

This reverts commit 74aa3e1de0.

* fix typo

* add postprocessing in docs

* move import scipy to top

* change varaible name

* make fix-copies

* remove eval_idx in test

* move to after first sentence

* update image_processor since box loss requires normalized one

* change appropriate name to auxiliary_outputs

* Update src/transformers/models/rt_detr/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/rt_detr/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/rt_detr.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/rt_detr.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* make style

* remove panoptic related comments

* make style

* revert valid_processor_keys

* fix aux related test

* make style

* change origination from config to backbone API

* enable the dn_loss

* fix test and conversion

* renewal weight initialization

* change initializer_range

* make fix-up

* fix the loss issue in the auxiliary output and denoising part

* change weight loss to original RTDETR

* fix in initialization

* sync shape format of dn and aux

* make style

* stable fine-tuning and compatible conversion for resnet101

* make style

* skip input_embed

* change encoder related variable

* enable converting rtdetr_r101

* add r101 related conversion code

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/rt_detr.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/rt_detr/image_processing_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change name _shape to _reshape

* Update src/transformers/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* maket style

* make fix-copies

* remove deprecated import

* more fix

* remove last_hidden_state for task-specific model

* Revert "remove last_hidden_state for task-specific model"

This reverts commit ccb7a34051.

* minore change in convert

* remove print

* make style and fix-copies

* add custom rtdetr backbone for r18, r34

* remove print

* change copied

* add pad_size

* make style

* change layertype to optional to pass the CI

* make style

* add test in modeling_resnet_rt_detr

* make fix-copies

* skip tmp file test

* fix comment

* add docs

* change to modeling_resnet file format

* enabling resnet50 above

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com>

* enable all the rtdetr model :)

* finish except CI

* add RTDetrResNetBackbone

* make fix-copies

* fix
TO DO: CI enable

* make style

* rename test

* add docs

* add special fix

* revert resnet

* Update src/transformers/models/rt_detr/modeling_rt_detr_resnet.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* add more comment

* remove swin comment

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* rename convert and add verify backbone

* Update docs/source/en/_toctree.yml

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/rt_detr.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/rt_detr.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* make style

* requests for docs

* more general test docs

* general script docs

* make fix-copies

* final commit

* Revert "Update src/transformers/models/rt_detr/configuration_rt_detr.py"

This reverts commit d136225cd3.

* skip test_model_get_set_embeddings

* remove target

* add changes

* make fix-copies

* remove decoder_attention_mask

* add load_backbone function for auto_backbone

* remove comment

* fix repo name

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* final commit

* remove unused downsample_in_bottleneck

* new test for autobackbone

* change to appropriate indices

* test fix

* fix dict in test_image_processor

* fix test

* [run-slow] rt_detr, rt_detr_resnet

* change the slow test

* [run-slow] rt_detr

* [run-slow] rt_detr, rt_detr_resnet

* make in to same cuda in CSPRepLayer

* [run-slow] rt_detr, rt_detr_resnet

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sounak Dey <dey.sounak@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com>
Co-authored-by: ChoiSangBum <choisangbum@ChoiSangBumui-MacBookPro.local>
2024-06-21 17:50:08 +01:00
Ita Zaporozhets
1e79eade41
SPLIT PR: add user defined symbols and control symbols (#31305)
* PR SPLIT: moving origina changes for adding user defined symbols

* adding gemma test and generalizing gemma converter

* ruff

* update common test

* update serialization test

* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added

* removing commented lines

* applying feedback - user only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols

* add comment referencing sentencepiece
2024-06-21 01:48:10 -07:00
Yih-Dar
ec905f3a76
unskip 2 tests in cohere (#31517)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-20 17:21:08 +02:00