Commit Graph

451 Commits

Author SHA1 Message Date
Klaus Hipp
721ee783ca
[Docs] Fix spelling and grammar mistakes (#28825)
* Fix typos and grammar mistakes in docs and examples

* Fix typos in docstrings and comments

* Fix spelling of `tokenizer` in model tests

* Remove erroneous spaces in decorators

* Remove extra spaces in Markdown link texts
2024-02-02 08:45:00 +01:00
JB (Don)
0d26abdd3a
Adding [T5/MT5/UMT5]ForTokenClassification (#28443)
* Adding [T5/MT5/UMT5]ForTokenClassification

* Add auto mappings for T5ForTokenClassification and variants

* Adding ForTokenClassification to the list of models

* Adding attention_mask param to the T5ForTokenClassification test

* Remove outdated comment in test

* Adding EncoderOnly and Token Classification tests for MT5 and UMT5

* Fix typo in umt5 string

* Add tests for all the existing MT5 models

* Fix wrong comment in dependency_versions_table

* Reverting change to common test for _keys_to_ignore_on_load_missing

The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.

* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model

* Add fix-copies to MT5ModelTest
2024-02-01 03:53:49 +01:00
Kian Sierra McGettigan
f7076cd346
Flax mistral (#26943)
* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf22.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187a.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion considers bfloat16

* added backend

* added docstrings

* added cache

* fixed sliding window causal mask

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids

* updated checkpoint since from_pt not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed rf after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed Nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialization

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to align # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key value groups

* updated copyright year

* split casual_mask

* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat

* applied make style

* empty for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00
Sanchit Gandhi
da3c79b245
[Whisper] Make tokenizer normalization public (#28136)
* [Whisper] Make tokenizer normalization public

* add to docs
2024-01-29 16:07:35 +00:00
Julien Chaumond
26aa03a252
small doc update for CamemBERT (#28644)
2024-01-29 15:46:32 +01:00
Vinyzu
3a08cc485f
[Docs] Fix Typo in English & Japanese CLIP Model Documentation (TMBD -> TMDB) (#28751)
* [Docs] Fix Typo in English CLIP model_doc

* [Docs] Fix Typo in Japanese CLIP model_doc
2024-01-29 10:06:51 +00:00
NielsRogge
963db81a5a
Add Depth Anything (#28654)
* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* Add docs

* Remove file

* Add copied from

* Address comments

* Address comments

* Address comments

* Fix style

* Update docs

* Convert all checkpoints, add integration test

* Rename checkpoints

* Add pretrained backbone attributes

* Fix default config

* Address comment

* Add figure to docs

* Fix bug thanks to @xenova

* Update conversion script

* Fix integration test
2024-01-25 09:34:50 +01:00
amyeroberts
e547458c43
Fix phi model doc checkpoint (#28581)
Co-authored-by: Pashmina Cameron <11311835+pashminacameron@users.noreply.github.com>
2024-01-22 17:15:07 +00:00
NielsRogge
faf03541e2
[SigLIP] Don't pad by default (#28578)
First draft
2024-01-19 13:30:00 +01:00
Yoach Lacombe
d2cdefb9ec
Add new meta w2v2-conformer BERT-like model (#28165)
* first commit

* correct default value non causal

* update config and modeling code

* update converting checkpoint

* clean modeling and fix tests

* make style

* add new config parameters to docstring

* fix copied from statements

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* make position_embeddings_type docstrings clearer

* clean converting script

* remove function not used

* clean modeling file

* apply suggestion for test file + add convert script to not_doctested

* modify tests according to review - cleaner logic and more tests

* Apply nit suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add checker of valid position embeddings type

* instantiate new layer norm layer with the right eps

* fix freeze_feature_encoder since it can be None in some cases

* add test same output in convert script

* restore wav2vec2conformer and add new model

* create processor and FE + clean

* add new model code

* fix convert script and set default config parameters

* correct model id paths

* make style

* make fix-copies and cleaning files

* fix copied from statements

* complete .md and fix copies

* clean convert script argument defaults

* fix config parameters docstrings

* fix config docstring

* add copied from and enrich FE tests

* fix copied from and repo-consistency

* add autotokenizer

* make test input length shorter and change docstring code

* fix docstrings and copied from

* add add_adapter to ASR training example

* make testing of adapters more robust

* adapt to multi adapter layers

* refactor input_values->input_features and remove w2v2-bert feature extractor

* remove pretraining model

* remove deprecated features and useless lines

* add copied from and ignore statements to modeling tests

* remove pretraining model #2

* change import in convert script

* change default in convert script

* update readme and remove useless line

* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* refactor BERT to Bert for consistency

* remove useless ignore copy statement

* add persistent to buffer in rotary

* add eps in LayerNorm init and remove copied from

* add adapter activation parameters and add copied from statements

* Fix copied statements and add unittest.skip reasons

* add copied statement in test_processor

* refactor processor

* make style

* replace numpy random by torch rand

* remove expected output CTC

* improve converting script with processor class

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove gumbel class

* remove tests related to previously deleted class

* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* correct typos

* remove unused parameters

* update processor to take both text and audio

* update checkpoints

* update expected output and add ctc expected output

* add label_attention_mask

* replace pt with np in processor tests

* fix typo

* revert to behaviour with labels_attention_mask

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-18 13:37:34 +00:00
Junyang Lin
d6ffe74dfa
Add qwen2 (#28436)
* add config, modeling, and tokenization

* add auto and init

* update readme

* update readme

* update team name

* fixup

* fixup

* update config

* update code style

* update for fixup

* update for fixup

* update for fixup

* update for testing

* update for testing

* fix bug for config and tokenization

* fix bug for bos token

* not doctest

* debug tokenizer

* not doctest

* debug tokenization

* debug init for tokenizer

* fix style

* update init

* delete if in token auto

* add tokenizer doc

* add tokenizer in init

* Update dummy_tokenizers_objects.py

* update

* update

* debug

* Update tokenization_qwen2.py

* debug

* Update convert_slow_tokenizer.py

* add copies

* add copied from and make style

* update files map

* update test

* fix style

* fix merge reading and update tests

* fix tests

* fix tests

* fix style

* debug a variable in readme

* Update src/transformers/models/qwen2/configuration_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update test and copied from

* fix style

* update qwen2 tokenization and tests

* Update tokenization_qwen2.py

* delete the copied from after property

* fix style

* update tests

* update tests

* add copied from

* fix bugs

* update doc

* add warning for sliding window attention

* update qwen2 tokenization

* fix style

* Update src/transformers/models/qwen2/modeling_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer fast

---------

Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-17 16:02:22 +01:00
Gustavo de Rosa
d93ef7d751
Fixes default value of softmax_scale in PhiFlashAttention2. (#28537)
* fix(phi): Phi does not use softmax_scale in Flash-Attention.

* chore(docs): Update Phi docs.
2024-01-17 14:22:44 +01:00
Francisco Kurucz
121641cab1
Fix paths to AI Sweden Models reference and model loading (#28423)
Fix URL to Ai Sweden Models reference and model loading
2024-01-15 09:09:22 +01:00
Francisco Kurucz
3724156b4d
Fix load correct tokenizer in Mixtral model documentation (#28437)
2024-01-10 18:09:06 +01:00
Susnato Dhar
fff8ca8e59
update docs to add the phi-2 example (#28392)
* update docs

* added Tip
2024-01-10 16:07:47 +01:00
NielsRogge
3b742ea84c
Add SigLIP (#26522)
* Add first draft

* Use appropriate gelu function

* More improvements

* More improvements

* More improvements

* Convert checkpoint

* More improvements

* Improve docs, remove print statements

* More improvements

* Add link

* remove unused masking function

* begin tokenizer

* do_lower_case

* debug

* set split_special_tokens=True

* Remove script

* Fix style

* Fix rebase

* Use same design as CLIP

* Add fast tokenizer

* Add SiglipTokenizer to init, remove extra_ids

* Improve conversion script

* Use smaller inputs in conversion script

* Update conversion script

* More improvements

* Add processor to conversion script

* Add tests

* Remove print statements

* Add tokenizer tests

* Fix more tests

* More improvements related to weight initialization

* More improvements

* Make more tests pass

* More improvements

* More improvements

* Add copied from

* Add canonicalize_text

* Enable fast tokenizer tests

* More improvements

* Fix most slow tokenizer tests

* Address comments

* Fix style

* Remove script

* Address some comments

* Add copied from to tests

* Add more copied from

* Add more copied from

* Add more copied from

* Remove is_flax_available

* More updates

* Address comment

* Remove SiglipTokenizerFast for now

* Add caching

* Remove umt5 test

* Add canonicalize_text inside _tokenize, thanks Arthur

* Fix image processor tests

* Skip tests which are not applicable

* Skip test_initialization

* More improvements

* Compare pixel values

* Fix doc tests, add integration test

* Add do_normalize

* Remove causal mask and leverage ignore copy

* Fix attention_mask

* Fix remaining tests

* Fix dummies

* Rename temperature and bias

* Address comments

* Add copied from to tokenizer tests

* Add SiglipVisionModel to auto mapping

* Add copied from to image processor tests

* Improve doc

* Remove SiglipVisionModel from index

* Address comments

* Improve docs

* Simplify config

* Add first draft

* Make it like mistral

* More improvements

* Fix attention_mask

* Fix output_attentions

* Add note in docs

* Convert multilingual model

* Convert large checkpoint

* Convert more checkpoints

* Add pipeline support, correct image_mean and image_std

* Use padding=max_length by default

* Make processor like llava

* Add code snippet

* Convert more checkpoints

* Set keep_punctuation_string=None as in OpenCLIP

* Set normalized=False for special tokens

* Fix doc test

* Update integration test

* Add figure

* Update organization

* Happy new year

* Use AutoModel everywhere

---------

Co-authored-by: patil-suraj <surajp815@gmail.com>
2024-01-08 18:17:16 +01:00
Rosie Wood
73c88012b7
Add segmentation map processing to SAM Image Processor (#27463)
* add segmentation map processing to sam image processor

* fixup

* add tests

* reshaped_input_size is shape before padding

* update tests for size/shape outputs

* fixup

* add code snippet to docs

* Update docs/source/en/model_doc/sam.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add missing backticks

* add `segmentation_maps` as arg for SamProcessor.__call__()

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-08 16:40:36 +00:00
Connor Henderson
d83ff5eeff
Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename

* add relevant code from FastSpeech2 draft, have tests pass

* make it an actual conformer, demo ex.

* matching inference with original repo, includes debug code

* refactor nn.Sequentials, start more desc. var names

* more renaming

* more renaming

* vocoder scratchwork

* matching vocoder outputs

* hifigan vocoder conversion script

* convert model script, rename some config vars

* replace postnet with speecht5's implementation

* passing common tests, file cleanup

* expand testing, add output hidden states and attention

* tokenizer + passing tokenizer tests

* variety of updates and tests

* g2p_en pckg setup

* import structure edits

* docstrings and cleanup

* repo consistency

* deps

* small cleanup

* forward signature param order

* address comments except for masks and labels

* address comments on attention_mask and labels

* address second round of comments

* remove old unneeded line

* address comments part 1

* address comments pt 2

* rename auto mapping

* fixes for failing tests

* address comments part 3 (bart-like, train loss)

* make style

* pass config where possible

* add forward method + tests to WithHifiGan model

* make style

* address arg passing and generate_speech comments

* address Arthur comments

* address Arthur comments pt2

* lint changes

* Sanchit comment

* add g2p-en to doctest deps

* move up self.encoder

* onnx compatible tensor method

* fix is symbolic

* fix paper url

* move models to espnet org

* make style

* make fix-copies

* update docstring

* Arthur comments

* update docstring w/ new updates

* add model architecture images

* header size

* md wording update

* make style
2024-01-03 18:01:06 +00:00
Sourab Mangrulkar
def581ef51
Fix FA2 integration (#28142)
* fix fa2

* fix FA2 for popular models

* improve warning and add Younes as co-author

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix the warning

* Add Tip

* typo fix

* nit

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-20 14:25:07 +05:30
Aaron Jimenez
38611086d2
[docs] Fix mistral link in mixtral.md (#28143)
Fix mistral link in mixtral.md
2023-12-19 10:34:14 -08:00
Aeneas Stankowski
7f2a8f92e4
Spelling correction (#28110)
Update mixtral.md

correct minor typo in overview
2023-12-18 14:04:05 +00:00
Younes Belkada
1faeff85ce
Fix Vip-llava docs (#28085)
* Update vipllava.md

* Update modeling_vipllava.py
2023-12-15 20:16:47 +01:00
Sanchit Gandhi
52c37882fb
[Seamless] Fix links in docs (#27905)
* [Seamless] Fix links in docs

* apply suggestions from code review
2023-12-14 15:14:13 +00:00
Younes Belkada
c7f076a00e
Adds VIP-llava to transformers (#27932)
* v1

* add-new-model-like

* revert

* fix forward and conversion script

* revert

* fix copies

* fixup

* fix

* Update docs/source/en/index.md

* Apply suggestions from code review

* push

* fix

* fixes here and there

* up

* fixup and fix tests

* Apply suggestions from code review

* add docs

* fixup

* fixes

* docstring

* add docstring

* fixup

* docstring

* fixup

* nit

* docs

* more copies

* fix copies

* nit

* update test
2023-12-13 10:42:24 +01:00
NielsRogge
67b1335cb9
Update bounding box format everywhere (#27944)
Update formats
2023-12-11 18:03:42 +00:00
Timon Käch
5cec306cdc
Fix parameter count in readme for mixtral 45b (#27945)
fix parameter count in readme
2023-12-11 14:58:48 +00:00
Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nits

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's add more verbose

* add correct outputs

* support router outputs

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied from

* fix weird diff

* skip doctest fast on the config and modeling

* mark that it supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing assert close

* fixup

* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00
NielsRogge
7ea21f1f03
[LLaVa] Some improvements (#27895)
* More improvements

* Improve variable names

* Update READMEs, improve docs
2023-12-11 10:22:26 +01:00
fxmarty
80377eb018
F.scaled_dot_product_attention support (#26572)
* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey to config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use if XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurrences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurrence

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurrences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flaky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-09 05:38:14 +09:00
Younes Belkada
44b5506d29
[Llava] Add Llava to transformers (#27662)
* add model like

* logits match

* minor fixes

* fixes

* up

* up

* add todo

* llava processor

* keep the processor simple

* add conversion script

* fixup

* fix copies

* up

* add to index

* fix config + logits

* fix

* refactor

* more refactor

* more refactor

* fix copies

* add authors

* v1 tests

* add `LlavaProcessor` in init

* remove unneeded import

* up

* up

* docs

* up

* fix CI

* fix CI

* add attention mask in test

* make fixup

* remove the vision model

* that's the dirty way to do it

* nits

* nits

* updates

* add more tests

* add input tests

* fixup

* more styling

* nits

* updates and cleanup

* fixup the generation expected results

* fix the testing script

* some cleanup and simplification which does not work yet but almost there!

* make correct dispatch operations

* vectorize works for batch of images and text

* last todos

* nits

* update test and modeling code

* remove useless function for now

* fix few issues

* fix generation

* some nits

* add bakllava

* nits

* remove duplicated code

* finish merge

* cleanup

* missed this line

* fill the todos

* add left padding offset

* add left and right padding logic

* bool to properly index

* make sure

* more cleanups

* batch is fixed 😉

* add correct device for tensor creation

* fix some dtype mismatch

* ruff

* update conversion script

* Update src/transformers/__init__.py

* fa 2 support + fix conversion script

* more

* correct reshaping

* fix test dict

* fix copies by ignoring

* fix nit

* skip clip vision model

* fixup

* fixup

* LlavaForVisionText2Text -> LlavaForCausalLM

* update

* fix

* raise correct errors

* fix

* docs

* nuke for now

* nits here and there

* fixup

* fix remaining tests

* update LlavaForConditionalGeneration instead of CausalLM

* fixups

* pipeline support

* slow and pipeline tests

* supports batch

* nits

* cleanup

* fix first integration tests

* add pad token where needed

* correct tests

* fixups

* update pipeline test

* fix quality

* nits

* revert unneeded change

* nit

* use BatchFeature

* from ...feature_extraction_utils import BatchFeature

* nits

* nits

* properly update

* more f*** nits

* fix copies

* comment

* keep slow test slow

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add pipeline example

* add pixel values in docstring

* update pr doctest

* fix

* fix slow tests

* remove hack

* fixup

* small note

* forward contrib credits from PR25789

* forward contrib credits from original implementation and work

* add arthur

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update docstring

* nit

* move to not doctested because of timeout issues

* fixup

* add description

* more

* fix-copies

* fix docs

* add beam search

* add more comments

* add typehints on processor

* add speedup plot

* update slow tests and docs

* push test

* push batched test

* fix batched generation with different number of images

* remove benchmark due to a bug

* fix test

* fix copies

* add gcolab demo

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-07 09:30:47 +01:00
Susnato Dhar
f84d85ba67
[FA-2] Add Flash Attention to Phi (#27661)
* add FA and modify doc file

* test_flash_attn_2_generate_padding_right test overwritten

* comment

* modify persimmon modeling file

* added speedup graph

* more changes
2023-12-07 07:57:48 +01:00
Alex McKinney
75336c1794
Add Llama Flax Implementation (#24587)
* Copies `modeling_flax_gpt_neo.py` to start

* MLP Block. WIP Attention and Block

* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue

* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.

* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Notice numerically different, but not by a huge amount. Needs
investigating

* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..

* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?

* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now

* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now

* implements `FlaxLlamaBlockCollection`
tolerance must be higher than expected, kinda disconcerting

* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗

* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later

* start porting pretrained wrappers
realised it probably needs return dict as a prereq

* cleanup, quality, style

* readds `return_dict` and model output named tuples

* (tentatively) pretrained wrappers work 🔥

* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.

* [WIP] debugging numerics

* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.

* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict

* adds missing TYPE_CHECKING import and `make fixup`

* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO

* commenting out equivalence test as can just use common

* debugging

* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥

* cleanup of modeling file

* cleanup of test file

* Resolving simpler review comments

* addresses more minor review comments

* fixing introduced pytest errors from review

* wip additional slow tests

* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay

* `make quality`, `make style`

* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs

* `make fix-copies`

* fix mangled function following `make fix-copies`

* adds missing type checking imports

* fixes missing parameter checkpoint warning

* more finegrained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`

* swaps import guards
??? how did these get swapped initially?

* removing `inv_freq` again as pytorch version has now removed

* attempting to get CI to pass

* adds doc entries for llama flax models

* fixes typo in __init__.py imports

* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version

* overrides tests with dummy to see if CI passes
need to fill in these tests later

* adds my contribution to docs

* `make style; make quality`

* replaces random masking with fixed to work with flax version

* `make quality; make style`

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* updates `x`->`tensor` in `rotate_half`

* addresses smaller review comments

* Update docs/source/en/model_doc/llama.md

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adds integration test class

* adds `dtype` to rotary embedding to cast outputs

* adds type to flax llama rotary layer

* `make style`

* `make fix-copies`

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* applies suggestions from review

* Update modeling_flax_llama.py

* `make fix-copies`

* Update tests/models/llama/test_modeling_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixes shape mismatch in FlaxLlamaMLP

* applies some suggestions from reviews

* casts attn output logits to f32 regardless of dtype

* adds attn bias using `LlamaConfig.attention_bias`

* adds Copied From comments to Flax Llama test

* mistral and persimmon test change -copy from llama

* updates docs index

* removes Copied from in tests

it was preventing `make fix-copies` from succeeding

* quality and style

* ignores FlaxLlama input docstring

* adds revision to `_CHECKPOINT_FOR_DOC`

* repo consistency and quality

* removes unused import

* removes copied from from Phi test

now diverges from llama tests following FlaxLlama changes

* adds `_REAL_CHECKPOINT_FOR_DOC`

* removes refs from pr tests

* reformat to make ruff happy

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-12-07 07:05:00 +01:00
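The Flax Llama commit above notes that GPT-J uses an interleaved rotary layout whilst Llama uses a contiguous one. A minimal sketch of the difference, in plain Python on a single vector; the function names `rotate_half` and `rotate_every_two` are chosen for illustration here and are not claimed to match the code in `modeling_flax_llama.py`:

```python
def rotate_half(x):
    """Contiguous layout: split the vector into a first and second half."""
    half = len(x) // 2
    first, second = x[:half], x[half:]
    # The negated second half comes first, followed by the first half.
    return [-v for v in second] + first


def rotate_every_two(x):
    """Interleaved layout: rotate each neighbouring (even, odd) pair."""
    out = []
    for even, odd in zip(x[0::2], x[1::2]):
        out.extend([-odd, even])
    return out


x = [1.0, 2.0, 3.0, 4.0]
print(rotate_half(x))       # [-3.0, -4.0, 1.0, 2.0]
print(rotate_every_two(x))  # [-2.0, 1.0, -4.0, 3.0]
```

The two helpers pair up different elements of the same input, so copying one model's rotary code onto weights from the other gives numerically different attention outputs unless the layout is converted.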
Rockerz
9660e27cd0
Translating en/model_doc folder docs to Japanese(from blip to clap) 🇯🇵 (#27673)
* Add models

* Add models and update `_toctree.yml`

* Update docs/source/ja/model_doc/chinese_clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/camembert.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/bros.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/bros.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/blip-2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/camembert.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* solve merge conflicts and update paper titles

* Update docs/source/ja/model_doc/bridgetower.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/chinese_clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update the authors name in bros..md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-12-06 10:38:21 -08:00
Younes Belkada
9270ab0827
[Flash Attention 2] Add flash attention 2 for GPT-Neo-X (#26463)
* add flash-attn-2 support for GPT-neo-x

* fixup

* add comment

* revert

* fixes

* update docs

* comment

* again

* fix copies

* add plot + fix copies

* Update docs/source/en/model_doc/gpt_neox.md
2023-12-06 17:22:32 +01:00
Arindam Jati
b242d0f297
[Time series] Add PatchTSMixer (#26247)
* patchtsmixer initial commit

* x,y->context_values,target_values, unittest added

* cleanup code

* minor

* return hidden states

* model tests, partial integration tests

* ettm notebook temporary

* minor

* config mask bug fix, tests updated

* final ETT notebooks

* add selfattn

* init

* added docstrings

* PatchTSMixerForPretraining -> PatchTSMixerForMaskPretraining

* functionality tests added

* add start and input docstrings

* docstring edits

* testcase edits

* minor changes

* docstring error fixed

* ran make fixup

* finalize integration tests and docs

* minor

* cleaned gitignore

* added dataclass decorator, ran black formatter

* ran ruff

* formatting

* add slow decorator

* renamed in_Channel to input_size and default to 1

* shorten dataclass names

* use smaller model for testing

* moved the 3 heads to the modeling file

* use scalers instead of revin

* support forecast_channel_indices

* fix regression scaling

* undo reg. scaling

* removed unneeded classes

* forgot missing

* add more layers

* add copied positional_encoding

* use patchmask from patchtst

* removed dependency on layers directory

* formatting

* set seed

* removed unused imports

* fixed forward signature test

* adding distributional head for PatchTSMixerForecasting

* add generate to forecast

* testcases for generate

* add generate and distributional head for regression

* raise Exception for negative values for negative binomial distribution

* formatting changes

* remove copied from patchtst and add TODO for test passing

* make copies

* doc edits

* minor changes

* format issues

* minor changes

* minor changes

* format docstring

* change some class names to PatchTSMixer + class name

Transpose to PatchTSMixerTranspose
GatedAttention to PatchTSMixerGatedAttention

* change NormLayer to PatchTSMixerNormLayer

* change MLP to PatchTSMixerMLP

* change PatchMixer to PatchMixerBlock, FeatureMixer to FeatureMixerBlock

* change ChannelFeatureMixer to ChannelFeatureMixerBlock

* change PatchMasking to PatchTSMixerMasking

* change Patchify to PatchTSMixerPatchify

* list to `list`

* fix docstrings

* formatting

* change bs to batch_size, edit forecast_masking

* edit random_masking

* change variable name and update docstring in PatchTSMixerMasking

* change variable name and update docstring in InjectScalerStatistics4D

* update forward call in PatchTSMixerTranspose

* change variable name and update docstring in PatchTSMixerNormLayer

* change variable name and update docstring in PatchTSMixerMLP

* change variable name and update docstring in ChannelFeatureMixerBlock

* formatting

* formatting issues

* docstring issue

* fixed observed_mask type in docstrings

* use FloatTensor type

* formatting

* fix rescaling issue in forecasting, fixed integration tests

* add docstring from decorator

* fix docstring

* Update README.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PatchTSMixerChannelFeatureMixerBlock

* formatting

* ForPretraining

* use num_labels instead of n_classes

* remove commented out code

* docstring fixed

* nn.functional used instead of one letter F

* x_tmp renamed

* one letter variable x removed from forward calls

* one letter variable y removed

* remove commented code

* rename patch_size, in_channels, PatchTSMixerBackbone

* add config to heads

* add config to heads tests

* code refactoring to use config instead of passing individual params

* docstring fixes part 1

* docstring fixes part 2

* removed logger.debug

* context_values -> past_values

* formatting changes

* pe -> positional_encoding

* removed unused target variable

* self.mode logic fixed

* formatting change

* edit docstring and var name

* change n_targets to num_targets

* rename input_size to num_input_channels

* add head names with prefix PatchTSMixer

* edit docstring in PatchTSMixerForRegression

* fix var name change in testcases

* add PatchTSMixerAttention

* return dict for all exposed classes, test cases added

* format

* move loss function to forward call

* make style

* adding return dict/tuple

* make repo-consistency

* remove flatten mode

* code refactoring

* rename data

* remove PatchTSMixer and keep only PatchTSMixerEncoder

* docstring fixes

* removed unused code

* format

* format

* remove contiguous and formatting changes

* remove model description from config

* replace asserts with ValueError

* remove nn.Sequential from PatchTSMixerNormLayer

* replace if-else with map

* remove all nn.Sequential

* format

* formatting

* fix gradient_checkpointing error after merge, and formatting

* make fix-copies

* remove comments

* reshape

* doesn't support gradient checkpointing

* correct Patchify

* masking updates

* batchnorm copy from

* format checks

* scaler edits

* remove comments

* format changes

* remove self.config

* correct class PatchTSMixerMLP(nn.Module):

* makr fix

* doc updates

* fix-copies

* scaler class correction

* doc edits

* scaler edits

* update readme with links

* injectstatistics add

* fix-copies

* add norm_eps option to LayerNorm

* format changes

* fix copies

* correct make copies

* use parametrize

* fix doc string

* add docs to toctree

* make style

* doc segmenting

* docstring edit

* change forecast to prediction

* edit doc

* doc edits

* remove PatchTSMixerTranspose

* add PatchTSMixerPositionalEncoding and init position_enc

* remove positional_encoding

* edit forecast_masking, remove forecast_mask_ratios

* fix broken code

* var rename target_values -> future_values

* num_features -> d_model

* fix broken code after master merge

* repo consistency

* use positional embedding

* prediction_logits -> prediction_outputs, make fix-copies

* uncommented @slow

* minor changes

* loss first in tuple

* tuple and dict same ordering

* style edits

* minor changes

* dict/tuple consistent enablement

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix formatting

* formatting

* usage tip

* test on cpu only

* add sample usage

* change PatchTSMixerForClassification to PatchTSMixerForTimeSeriesClassification

* push changes

* fix copies

* std scaling set to default True case

* minor changes

* style changes

---------

Co-authored-by: Arindam Jati <arindam.jati@ibm.com>
Co-authored-by: vijaye12 <vijaye12@in.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: vijaye12 <vijaykr.e@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-05 15:31:35 +01:00
Yih-Dar
1d63b0ec36
Disallow pickle.load unless TRUST_REMOTE_CODE=True (#27776)
* fix

* fix

* Use TRUST_REMOTE_CODE

* fix doc

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-04 16:48:37 +01:00
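The commit above gates `pickle.load` behind a `TRUST_REMOTE_CODE=True` opt-in. A minimal sketch of that kind of guard using only the standard library; the helper name `guarded_pickle_load`, the accepted truthy spellings, and the error text are assumptions for illustration and are not taken from the transformers source:

```python
import io
import os
import pickle

# Assumed set of spellings treated as "true"; the real check may differ.
_TRUTHY = {"1", "true", "t", "yes", "y", "on"}


def guarded_pickle_load(fileobj):
    """Hypothetical helper: unpickle only if the user explicitly opted in."""
    flag = os.environ.get("TRUST_REMOTE_CODE", "False").strip().lower()
    if flag not in _TRUTHY:
        raise ValueError(
            "Refusing to call pickle.load: unpickling can execute arbitrary code. "
            "Set the environment variable TRUST_REMOTE_CODE=True to allow it."
        )
    return pickle.load(fileobj)


# Usage: opt in, then load a small in-memory payload.
os.environ["TRUST_REMOTE_CODE"] = "True"
payload = io.BytesIO(pickle.dumps({"a": 1}))
print(guarded_pickle_load(payload))  # {'a': 1}
```

Reading the flag at call time rather than import time means a user can opt in for a single script run without touching library code.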
fxmarty
1da1302ec8
Flash Attention 2 support for RoCm (#27611)
* support FA2

* fix typo

* fix broken tests

* fix more test errors

* left/right

* fix bug

* more test

* typo

* fix layout flash attention falcon

* do not support this case

* use allclose instead of equal

* fix various bugs with flash attention

* bump

* fix test

* fix mistral

* use skiptest instead of return that may be misleading

* add fix causal arg flash attention

* fix copies

* more explicit comment

* still use self.is_causal

* fix causal argument

* comment

* fixes

* update documentation

* add link

* wrong test

* simplify FA2 RoCm requirements

* update opt

* make flash_attn_uses_top_left_mask attribute private and precise comment

* better error handling

* fix copy & mistral

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/import_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use is_flash_attn_greater_or_equal_2_10 instead of is_flash_attn_greater_or_equal_210

* fix merge

* simplify

* inline args

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-04 21:52:17 +09:00
Sanchit Gandhi
ede09d671d
[Seamless v1] Link to v2 docs (#27827)
2023-12-04 11:47:54 +00:00
Yoach Lacombe
29f1aee3b6
Add SeamlessM4T v2 (#27779)
* add working conversion script

* first non-working version of modeling code

* update modeling code (working)

* make style

* make fix-copies

* add config docstrings

* add config to ignore docstrings formatting due to unconventional markdown

* fix copies

* fix generation num_return_sequences

* enrich docs

* add and fix tests beside integration tests

* update integration tests

* update repo id

* add tie weights and make style

* correct naming in .md

* fix imports and so on

* correct docstrings

* fix fp16 speech forward

* fix speechencoder attention

* make style

* fix copied from

* rename SeamlessM4Tv2-v2 to SeamlessM4Tv2

* Apply suggestions on configuration

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove useless public models

* fix private models + better naming for T2U models

* clean speech encoder relative position embeddings

* refactor chunk attention

* add docstrings to chunk attention method

* improve naming and docstrings

* rename some attention variables + add temperature sampling in T2U model

* rename DOCSTRINGS variable names

* make style + remove 2 useless config parameters

* enrich model card

* remove any attention_head reference + fix temperature in T2U

* new fmt and make style

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* rename spkr_id->speaker_id and change docstrings of get_char_input_ids

* simplify v2attention

* make style

* Update seamless_m4t_v2.md

* update code and tests with last update

* update repo ids

* fill article name, abstract and authors

* update not_doctested and slow_doc tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-30 20:24:43 +01:00
Kashif Rasul
af8acc4760
[Time series] Add patchtst (#27581)
* add distribution head to forecasting

* formatting

* Add generate function for forecasting

* Add generate function to prediction task

* formatting

* use argsort

* add past_observed_mask ordering

* fix arguments

* docs

* add back test_model_outputs_equivalence test

* formatting

* cleanup

* formatting

* use ACT2CLS

* formatting

* fix add_start_docstrings decorator

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.

* fix typos

* add forecast_masking

* fixed tests

* use set_seed

* fix doc test

* formatting

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* better var names

* rename PatchTSTTranspose

* fix argument names and docs string

* remove compute_num_patches and unused class

* remove assert

* renamed to PatchTSTMasking

* use num_labels for classification

* use num_labels

* use default num_labels from super class

* move model_type after docstring

* renamed PatchTSTForMaskPretraining

* bs -> batch_size

* more review fixes

* use hidden_state

* rename encoder layer and block class

* remove commented seed_number

* edit docstring

* Add docstring

* formatting

* use past_observed_mask

* doc suggestion

* make fix-copies

* use Args:

* add docstring

* add docstring

* change some variable names and add PatchTST before some class names

* formatting

* fix argument types

* fix tests

* change x variable to patch_input

* format

* formatting

* fix-copies

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move loss to forward

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* formatting

* fix a bug when pre_norm is set to True

* output_hidden_states is set to False as default

* set pre_norm=True as default

* format docstring

* format

* output_hidden_states is None by default

* add missing docs

* better var names

* docstring: remove default to False in output_hidden_states

* change labels name to target_values in regression task

* format

* fix tests

* change to forecast_mask_ratios and random_mask_ratio

* change mask names

* change future_values to target_values param in the prediction class

* remove nn.Sequential and make PatchTSTBatchNorm class

* black

* fix argument name for prediction

* add output_attentions option

* add output_attentions to PatchTSTEncoder

* formatting

* Add attention output option to all classes

* Remove PatchTSTEncoderBlock

* create PatchTSTEmbedding class

* use config in PatchTSTPatchify

* Use config in PatchTSTMasking class

* add channel_attn_weights

* Add PatchTSTScaler class

* add output_attentions arg to test function

* format

* Update doc with image patchtst.md

* fix-copies

* rename Forecast <-> Prediction

* change name of a few parameters to match with PatchTSMixer.

* Remove *ForForecasting class to match with other time series models.

* make style

* Remove PatchTSTForForecasting in the test

* remove PatchTSTForForecastingOutput class

* change test_forecast_head to test_prediction_head

* style

* fix docs

* fix tests

* change num_labels to num_targets

* Remove PatchTSTTranspose

* remove arguments in PatchTSTMeanScaler

* remove arguments in PatchTSTStdScaler

* add config as an argument to all the scaler classes

* reformat

* Add norm_eps for batchnorm and layernorm

* reformat.

* reformat

* edit docstring

* update docstring

* change variable name pooling to pooling_type

* fix output_hidden_states as tuple

* fix bug when calling PatchTSTBatchNorm

* change stride to patch_stride

* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder

* formatting

* initialize scalers with configs

* edit output_hidden_states

* style

* fix forecast_mask_patches doc string

* doc improvements

* move summary to the start

* typo

* fix docstring

* turn off masking when using prediction, regression, classification

* return scaled output

* adjust output when using distribution head

* remove _num_patches function in the config

* get config.num_patches from patchifier init

* add output_attentions docstring, remove tuple in output_hidden_states

* change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput

* remove print("model_class: ", model_class)

* change encoder_attention_heads to num_attention_heads

* change norm to norm_layer

* change encoder_layers to num_hidden_layers

* change shared_embedding to share_embedding, shared_projection to share_projection

* add output_attentions

* more robust check of norm_type

* change dropout_path to path_dropout

* edit docstring

* remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding

* edit shape of cls_token and initialize it

* add a check on the num_input_channels.

* edit head_dim in the Prediction class to allow the use of cls_token

* remove some positional_encoding_type options, remove learn_pe arg, initialize pe

* change Exception to ValueError

* format

* norm_type is "batchnorm"

* make style

* change cls_token shape

* Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios.

* Bring PatchTSTClassificationHead on top of PatchTSTForClassification

* change encoder_ffn_dim to ffn_dim and edit the docstring.

* update variable names to match with the config

* add generation tests

* change num_mask_patches to num_forecast_mask_patches

* Add examples explaining the use of these models

* make style

* Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)"

This reverts commit 78f6ed6c70.

* make style

* fix default std scaler's minimum_scale

* fix docstring

* close code blocks

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/configuration_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix tests

* add add_start_docstrings

* move examples to the forward's docstrings

* update prepare_batch

* update test

* fix test_prediction_head

* fix generation test

* use seed to create generator

* add output_hidden_states and config.num_patches

* add loc and scale args in PatchTSTForPredictionOutput

* edit outputs if not return_dict

* use self.share_embedding to check instead of checking type.

* remove seed

* make style

* seed is an optional int

* fix test

* generator device

* Fix assertTrue test

* swap order of items in outputs when return_dict=False.

* add mask_type and random_mask_ratio to unittest

* Update modeling_patchtst.py

* add add_start_docstrings for regression model

* make style

* update model path

* Edit the ValueError comment in forecast_masking

* update examples

* make style

* fix commented code

* update examples: remove config from from_pretrained call

* Edit example outputs

* Set default target_values to None

* remove config setting in regression example

* Update configuration_patchtst.py

* Update configuration_patchtst.py

* remove config from examples

* change default d_model and ffn_dim

* norm_eps default

* set has_attentions to True and define self.seq_length = self.num_patche

* update docstring

* change variable mask_input to do_mask_input

* fix blank space.

* change logger.debug to logger.warning.

* remove unused PATCHTST_INPUTS_DOCSTRING

* remove all_generative_model_classes

* set test_missing_keys=True

* remove undefined params in the docstring.

---------

Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-29 13:36:38 +01:00
Tom Aarsen
f2ad4b537b
Docs: Fix broken cross-references, i.e. ~transformer. -> ~transformers. (#27740)
~transformer. -> ~transformers.
2023-11-28 08:40:44 -08:00
Juarez Bochi
fdd86eed3b
Add madlad-400 MT models (#27471)
* Add madlad-400 models

* Add madlad-400 to the doc table

* Update docs/source/en/model_doc/madlad-400.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fill missing details in documentation

* Update docs/source/en/model_doc/madlad-400.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Do not doctest madlad-400

Tests are timing out.

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-28 13:19:50 +00:00
fxmarty
c13a43aaf2
Reflect RoCm support in the documentation (#27636)
* reflect RoCm support in the documentation

* Update docs/source/en/main_classes/trainer.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* fix review comments

* use ROCm instead of RoCm

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-11-25 00:59:17 +09:00
yoinked
181f85da24
Docs/Add conversion code to the musicgen docs (#27665)
* Update musicgen.md

please make it less hidden

* Add cleaner formatting
2023-11-24 12:34:24 +01:00
Yih-Dar
7293fdc5b9
Deprecate TransfoXL (#27607)
* fix

* fix

* trigger

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* tic

* revert

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-11-24 11:48:02 +01:00
NielsRogge
fe1c16e95a
[DPT, Dinov2] Add resources (#27655)
* Add resources

* Remove script

* Update docs/source/en/model_doc/dinov2.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-23 17:44:08 +00:00
amyeroberts
b406c4d261
Update TVP arxiv link (#27672)
Update arxiv link
2023-11-23 17:02:16 +00:00
Susnato Dhar
3bc50d81e6
[FA2] Add flash attention for opt (#26414)
* added flash attention for opt

* added to list

* fix use cache (#3)

* style fix

* fix text

* test fix2

* reverted until 689f599

* torch fx tests are working now!

* small fix

* added TODO docstring

* changes

* comments and .md file modification

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-11-23 10:16:51 +00:00
dg845
7f6a804d30
Add UnivNet Vocoder Model for Tortoise TTS Diffusers Integration (#24799)
* initial commit

* Add initial testing files and modify __init__ files to add UnivNet imports.

* Fix some bugs

* Add checkpoint conversion script and add references to transformers pre-trained model.

* Add UnivNet entries for auto.

* Add initial docs for UnivNet.

* Handle input and output shapes in UnivNetGan.forward and add initial docstrings.

* Write tests and make them pass.

* Write docs.

* Add UnivNet doc to _toctree.yml and improve docs.

* fix typo

* make fixup

* make fix-copies

* Add upsample_rates parameter to config and improve config documentation.

* make fixup

* make fix-copies

* Remove unused upsample_rates config parameter.

* apply suggestions from review

* make style

* Verify and add reason for skipped tests inherited from ModelTesterMixin.

* Add initial UnivNetGan integration tests

* make style

* Remove noise_length input to UnivNetGan and improve integration tests.

* Fix bug and make style

* Make UnivNet integration tests pass

* Add initial code for UnivNetFeatureExtractor.

* make style

* Add initial tests for UnivNetFeatureExtractor.

* make style

* Properly initialize weights for UnivNetGan

* Get feature extractor fast tests passing

* make style

* Get feature extractor integration tests passing

* Get UnivNet integration tests passing

* make style

* Add UnivNetGan usage example

* make style and use feature extractor from hub in integration tests

* Update tips in docs

* apply suggestions from review

* make style

* Calculate padding directly instead of using get_padding methods.

* Update UnivNetFeatureExtractor.to_dict to be UnivNet-specific.

* Update feature extractor to support using model(**inputs) and add the ability to generate noise and pad the end of the spectrogram in __call__.

* Perform padding before generating noise to ensure the shapes are correct.

* Rename UnivNetGan.forward's noise_waveform argument to noise_sequence.

* make style

* Add tests to test generating noise and padding the end for UnivNetFeatureExtractor.__call__.

* Add tests for checking batched vs unbatched inputs for UnivNet feature extractor and model.

* Add expected mean and stddev checks to the integration tests and make them pass.

* make style

* Make it possible to use model(**inputs), where inputs is the output of the feature extractor.

* fix typo in UnivNetGanConfig example

* Calculate spectrogram_zero from other config values.

* apply suggestions from review

* make style

* Refactor UnivNet conversion script to use load_state_dict (following persimmon).

* Rename UnivNetFeatureExtractor to UnivNetGanFeatureExtractor.

* make style

* Switch to using torch.tensor and torch.testing.assert_close for testing expected values/slices.

* make style

* Use config in UnivNetGan modeling blocks.

* make style

* Rename the spectrogram argument of UnivNetGan.forward to input_features, following Whisper.

* make style

* Improving padding documentation.

* Add UnivNet usage example to the docs.

* apply suggestions from review

* Move dynamic_range_compression computation into the mel_spectrogram method of the feature extractor.

* Improve UnivNetGan.forward return docstring.

* Update table in docs/source/en/index.md.

* make fix-copies

* Rename UnivNet components to have pattern UnivNet*.

* make style

* make fix-copies

* Update docs

* make style

* Increase tolerance on flaky unbatched integration test.

* Remove torch.no_grad decorators from UnivNet integration tests to try to avoid Flax/TensorFlow test errors.

* Add padding_mask argument to UnivNetModel.forward and add batch_decode feature extractor method to remove padding.

* Update documentation and clean up padding code.

* make style

* make style

* Remove torch dependency from UnivNetFeatureExtractor.

* make style

* Fix UnivNetModel usage example

* Clean up feature extractor code/docstrings.

* apply suggestions from review

* make style

* Add comments for tests skipped via ModelTesterMixin flags.

* Add comment for model parallel tests skipped via the test_model_parallel ModelTesterMixin flag.

* Add # Copied from statements to copied UnivNetFeatureExtractionTest tests.

* Simplify UnivNetFeatureExtractorTest.test_batch_decode.

* Add support for unbatched padding_masks in UnivNetModel.forward.

* Refactor unbatched padding_mask support.

* make style
2023-11-22 17:21:36 +01:00
jiqing-feng
c770600fde
TVP model (#25856)
* tvp model for video grounding

add tokenizer auto

fix param in TVPProcessor

add docs

clear comments and enable different torch dtype

add image processor test and model test and fix code style

* fix conflict

* fix model doc

* fix image processing tests

* fix tvp tests

* remove torch in processor

* fix grammar error

* add more details on tvp.md

* fix model arch for loss, grammar, and processor

* add docstring and do not regard TvpTransformer, TvpVisionModel as individual models

* use pad_image

* update copyright

* control first downsample stride

* reduce first only works for ResNetBottleNeckLayer

* fix param name

* fix style

* add testing

* fix style

* rm init_weight

* fix style

* add post init

* fix comments

* do not test TvpTransformer

* fix warning

* fix style

* fix example

* fix config map

* add link in config

* fix comments

* fix style

* rm useless param

* change attention

* change test

* add notes

* fix comments

* fix tvp

* import checkpointing

* fix gradient checkpointing

* Use a more accurate example in readme

* update

* fix copy

* fix style

* update readme

* delete print

* remove tvp test_forward_signature

* remove TvpTransformer

* fix test init model

* merge main and make style

* fix tests and others

* fix image processor

* fix style and model_input_names

* fix tests
2023-11-21 16:41:55 +00:00