Commit Graph

513 Commits

Author SHA1 Message Date
Younes Belkada
b844f8a9ab
[Pix2Struct] Fix slow test (#22448)
fix slow test
2023-03-29 17:40:45 +02:00
Yih-Dar
8894b81742
Use real tokenizers if tiny version(s) creation has issue(s) (#22428)
Fix some tiny model creation issues

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-29 16:16:23 +02:00
Arthur
19ade2426a
[WIP]NLLB-MoE Adds the moe model (#22024)
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer_norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 19:42:00 +02:00
NielsRogge
0e708178ed
[Pix2Struct] Add support to resize embeddings (#22394)
* First draft

* Fix integration test

* Remove script

* Fix test and typos

* Fix one more test

* Skip tied embeddings test

* Remove line

* Address comments
2023-03-27 11:38:07 -04:00
Joao Gante
7dcd8703ef
Generate: support for left-padding on GPTNeoX and Llama (#22382) 2023-03-27 15:48:23 +01:00
Shubhamai
a0cbbba31f
Resnet flax (#21472)
* [WIP] flax resnet

* added pretrained flax models, results reproducible

* Added pretrained flax models, results reproducible

* working on tests

* no real code change, just some comments

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* removing loss from modeling flax output class

* fixing classifier tests

* fixing comments, model output

* cleaning comments

* review changes

* review changes

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* renaming Flax to PyTorch

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-24 19:45:57 +00:00
Mitch Naylor
57f25f4b7f
Add Mega: Moving Average Equipped Gated Attention (#21766)
* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that throughout

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACT2FN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-24 08:17:27 -04:00
Joao Gante
0fa46524ac
Generate: Add GPTNeoX integration test (#22346) 2023-03-24 11:33:16 +00:00
Yih-Dar
e8cc02555e
Automatically create/update tiny models (#22275)
* Automatically create or update tiny models

* Skip failed tests

* update workflow file

* use revision

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-23 19:14:17 +01:00
Joao Gante
502fec779b
Generate: add test for left-padding support (#22322) 2023-03-23 17:00:22 +00:00
Sylvain
ef28df0572
Fix quality due to ruff release 2023-03-22 20:45:08 -04:00
Yih-Dar
8b05ace014
Fix PipelineTests skip conditions (#22320)
* check what tests fail

* Skip failing tests

* Skip failing tests

* Skip failing tests

* Skip failing tests

* clean up

* clean up

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-22 20:02:24 +01:00
Younes Belkada
0f68a7f408
Add Pix2Struct (#21400)
* v1 all keys match

* clean up

* forward pass ok

* add correct image transform

* generate works, logits matching

* clean up

* more refactor

* revert

* revert

* clean up

* clean ups

* clean up

* refactor

* refactor

* fix doc

* fix tokenizer test

* fix toctree

* revert toctree

* oops

* few fixes

* replace to `pixel_embeds`

* make fixup

* test processing & feat extractor

* fix some tests

* more fixes

* make fixup

* clean up

* more clean up

* add a single slow test

* fix test

* make fixup

* fix

* fix authors

* fix toctree

* update docs

* add docstring

* revert change

* Update src/transformers/models/pix2struct/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer

* fix processor test

* fix test

* make fixup

* refactor

* fix config

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* format

* fix

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make fixup

* add docstring

* fix issues

* fix

* fix

* fix

* add slow test

* fix

* fix

* fix batched issue

* fix training issues

* fix ci test

* fix slow test

* fix conversion script

* remove unneeded classes

* fix slow test

* fix require backends

* fix masked fill

* revert

* fix softmax

* add large models support

* fix conditional generation

* few fixes

* add instructions

* rm unneeded file

* Update src/transformers/models/pix2struct/convert_pix2struct_original_pytorch_to_hf.py

* fix ci test

* fix ci test really

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix nit

* fix nits

* fix image processors nits

* docstring

* clean up

* fix nit

* fix tests

* docstring nit

* fix reshape

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix nit

* fix repetition

* refactor processor

* make patch size consistent

* refactor forward

* fix docstring

* fix max_patches issue

* update docstring

* update docstring

* fix copied from

* add skip reasons

* few fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* format

* fix doctests

* refactor and fix

* fix doc build issue

* fix processor test

* small fix conversion script

* replace correct weights

* make fixup

* fix some issues

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* revert config and fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more details

* fixes

* fix processor

* fix processor test

* fix

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* fix processor

* Update src/transformers/models/pix2struct/modeling_pix2struct.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add copied

* make fixup

* fix copies

* update docstring

* refactor

* fix docstring

* fix conversion script

* fix vqa issue

* replace to `flattened_patches`

* nit

* fix numpy issue

* fix image processors

* add batched vqa support

* fix vqa conversion

* make fixup

* fix conversion script

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add correct docstring

* update docstring

* fix module level + channel dim

* use `make_list_of_images`

* refactor

* correct docstring

* fix authors

* remove `data_format`

* add header text test

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add checkpoints

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-22 16:53:52 +01:00
Joao Gante
fd3eb3e3cd
Beef up Llama tests (#22314)
* tmp commit

* beef up llama tests
2023-03-22 15:20:48 +00:00
silentghoul-spec
48bef3a734
Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer (#22302)
Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer. Earlier, xpath_sub_list was the same as xpath_tags_list

Co-authored-by: dusejat <dusejat@amazon.com>
2023-03-22 12:07:49 +00:00
Alara Dirik
0558914dff
Add MaskedImageModelingOutput (#22212)
* Add MaskedImageModelingOutput
2023-03-22 07:35:47 +03:00
Yih-Dar
67c2dbdb54
Time to Say Goodbye, torch 1.7 and 1.8 (#22291)
* time to say goodbye, torch 1.7 and 1.8

* clean up torch_int_div

* clean up is_torch_less_than_1_8-9

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 19:22:01 +01:00
Gerald Cuder
5a2b77a6c1
Fix error in mixed precision training of TFCvtModel (#22267)
* Make sure CVT can be trained using mixed precision

* Add test for keras-fit with mixed-precision

* Update tests/models/cvt/test_modeling_tf_cvt.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2023-03-21 12:12:57 +00:00
lewtun
f251441387
Add LlamaForSequenceClassification (#22209)
* Add LlamaForSequenceClassification

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add docstring

* Add test

* Add input embedding getter and setter

* Remove dead code

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-17 14:39:26 +01:00
Jason Phang
0041be5b3d
LLaMA Implementation (#21955)
* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:00:53 -04:00
Yih-Dar
52a57f7c7c
Update expected values in MgpstrModelIntegrationTest (#22195)
Update values

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 11:48:52 +00:00
Anahita Bhiwandiwalla
16121bae5c
Update BridgeTowerForContrastiveLearning (#22145)
* Use return_loss for BridgeTowerForContrastiveLearning, add example

* fix tests

* Update example in BridgeTowerForContrastiveLearning

* Update test_modeling_bridgetower.py

* update model output format

* minor update

* Update src/transformers/models/bridgetower/modeling_bridgetower.py

* make style

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-15 20:54:38 +01:00
amyeroberts
737681477c
Revert 22152 MaskedImageCompletionOutput changes (#22187)
Revert changes
2023-03-15 18:37:23 +01:00
Alara Dirik
3b22bfbc6a
Create MaskedImageCompletionOutput and fix ViT docs (#22152)
* create MaskedImageCompletionOutput

* fix bugs

* fix bugs
2023-03-14 13:55:18 +00:00
Alara Dirik
cdddfbffa1
Add ConvNeXT V2 (#21679)
* Add ConvNeXt V2 to transformers
* TF model is separated from the PR to fix issues
2023-03-14 12:08:14 +03:00
Yih-Dar
6c2ad00c46
Move is_pipeline_test_to_skip to specific model test classes (#21999)
* Move `is_pipeline_test_to_skip` to specific model test classes

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-14 10:03:02 +01:00
Younes Belkada
d979cf6efd
[Whisper] add get_input_embeddings to WhisperForAudioClassification (#22133)
* add `get_input_embeddings` to `WhisperForAudioClassification`

* add common tests

* fix another common test

* Update tests/models/whisper/test_modeling_whisper.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-03-13 19:46:01 +01:00
Younes Belkada
6652e7da0d
[Blip2] skip accelerate test (#22124)
skip accelerate test
2023-03-13 15:03:21 +01:00
wangpeng
102b5ff4a8
add new model of MGP-STR (#21418)
* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* remove representation_size from MGPSTRConfig

* reformat configuration_mgp_str.py

* format test_processor_mgp_str.py

* add test for tokenizer and complete model/processor test and model file

* rm unnecessary tuple in modeling_mgp_str

* reduce hidden_size/layers/label_size in test_model

* add integration tests and change MGPSTR to Mgpstr

* add test for logit values

* reformat test model file

---------

Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-13 10:11:31 +00:00
Yih-Dar
2f320661f3
Revert "[GPT2] Propose fix for #21080" (#22093)
Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure

This reverts commit a3fef89b26.
2023-03-10 22:08:21 +01:00
Arthur
a3fef89b26
[GPT2] Propose fix for #21080 (#21853)
* Make sure position ids are masked

* test that padded inputs produce the same results

* fix failing tests

* fixup

* fix batch test
2023-03-10 07:15:25 -05:00
Yih-Dar
ab81d31d20
Skip 3 tests for WhisperEncoderModelTest (#22060)
* skip 3 tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-09 19:09:23 +01:00
Lucain
923110b74f
Remove set_access_token usage + fail tests if FutureWarning (#22051)
* Remove set_access_token usage + fail tests if FutureWarning

* do not fail on FutureWarning in CI

---------

Co-authored-by: testbot <lucainp@hf.co>
2023-03-09 09:23:48 -05:00
Yih-Dar
1cbac6867b
Mark all BridgeTower tests slow for now (#22039)
* slow me

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-08 21:48:29 +01:00
Anahita Bhiwandiwalla
de81adf978
[WIP] Add BridgeTowerForContrastiveLearning (#21964)
* Add BridgeTower for ITC

* Fix review feedback

* Rename BridgeTowerForITC, cleanup

* Fix style and quality

* implement tests

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
2023-03-08 09:00:54 -05:00
Yih-Dar
b338414e61
Update tiny model creation script and some others files (#22006)
* Update 1

* Update 2

* Update 3

* Update 4

* Update 5

* Update 6

* Update 7

* Update 8

* Update 9

* Update 10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-07 22:31:14 +01:00
Eli Simhayev
8abe4930d3
[Time-Series] informer model (#21099)
* added informer to gitignore

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* moved enc-dec init to InformerEncoder/Decoder init

* added 'init_std' to config, now model init works!

* WIP conversion script, and added code sources

* WIP conversion script: loading original informer pth works

* WIP conversion script: change defaults in the config

* WIP conversion script: supporting Informer input embedding

* WIP conversion script: added parameters for the informer embed

* WIP conversion script: change dim_feedforward=2048

* WIP conversion script: remove unused args for loading checkpoint

* just cleaning up

* DataEmbedding removed, after thinking with Kashif

* working on forward pass

* WIP forward pass: trying to establish working batch for forward pass

* cleaning and finalizing

* adding HF names and docs

* init after cleaning works

* WIP in tests

* added docs for the informer specific args

* fix style

* undo change

* cleaning informer, now need to work only enc-dec

* initial enc-dec classes

* added encoder and decoder

* added todo

* add todos for conv_layers

* added decoder docs from vanilla

* added encoder docs from vanilla

* remove encoder decoder from the original informer

* removed AttentionLayer from the original paper

* removed TriangularCausalMask, same as decoder_attention_mask

* initial sparse attention

* use conv_layers

* fixed test_config test

* fix parenthesis when iterating zip(layers, conv_layers)

* error found in prob attention, added sizes as comments

* fix sizes

* added proposal for q_reduce indexing, and remove unused

* WIP ProbMask, and changed factor=2 for testing

* remove unused libs for this PR for creating the env

* fix checking the attn_weights.size() after bmm

* Q_reduce: changed from torch.gather to simple slicing

* WIP calculate final attn_output

* finish adding v_aggregated, attn_output ready

* changed tgt_len to u in attention_mask, need to fix the size error

* comment attention_mask for encoder, and fix if cond for v_agg

* added ProbMask support (wip), removed old original code

* finished ProbMask 😃

* Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e.

* fixes

* make style

* fix initial tests

* fix more tests

* dry

* make style

* remove unused files

* style

* added integration tests

* fix num_static_real_features

* fix header

* remove unused function

* fix example

* fix docs

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/modeling_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fixes for reviewer

* use prediction_length from model

* fix style

* fixed informer.mdx

* added to index

* updated readme

* undo

* make fix-copies

* typo

* fix copy

* added Informer to toctree

* in order

* fixed comments

* remove unneeded new lines in docs

* make static real and cat optional

* fix use of distil conv layers

* fixed integration test

* added checkpoint for convlayer

* make fix-copies

* updated from time series model

* make fix-copies

* copy decoder

* fix unit tests

* updated scaling config

* fix integration tests

* IGNORE_NON_TESTED

* IGNORE_NON_AUTO_CONFIGURED

* IGNORE_NON_AUTO_CONFIGURED

* updated check configs

* fix formatting

* undo change from time series

* prediction_length should not be None

* align with the blog: prettify ProbSparse and change attention_factor to sampling_factor

* make style

* make fix-copies

* niels CR: update contributed by

* niels CR: update configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: update kashif -> huggingface

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: `sampling_factor` only relevant when `attention_type`=prob

* make style

* fixed U_part: added multiplication by `L_Q`

* fixed bug: remove `is not None` from `if config.distil`

* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check

* fix integration tests

* updated model hub

* do not shift as in training

* undo

* fix make-copies

* make fix-copies

* added `if prediction_length is None`

* changed `ProbSparseAttention` to `InformerProbSparseAttention`

* changed `V_sum` -> `v_mean_dim_time`

* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`

* TimeSeriesTransformer->Informer in decoder's Copied from

* more descriptive in ProbSparse

* make style

* fix copied from

* Revert "added `if prediction_length is None`"

This reverts commit b4cbddfa05.

* fixed indent

* use InformerSinusoidalPositionalEmbedding

* make fix-style

* fix from #21860

* fix name

* make fix-copies

* use time series utils

* fix dec num_heads

* docstring

* added time series util doc

* _import_structure

* formatting

* changes from review

* make style

* fix docs

* fix doc

* removed NegativeLogLikelihood

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-07 21:36:38 +01:00
NielsRogge
dde718e7a6
[DETR and friends] Remove is_timm_available (#21814)
* First draft

* Fix to_dict

* Improve conversion script

* Update config

* Remove timm dependency

* Fix dummies

* Fix typo, add integration test

* Upload 101 model as well

* Remove timm dummies

* Fix style

---------

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2023-03-07 15:19:39 -05:00
Sanchit Gandhi
7c39318136
[Whisper] Add model for audio classification (#21754)
* [Whisper] Add model for audio classification

* make fix-copies

* add to docs

* add docstring

* empty returns

* add code example

* switch to fleurs

* stick everything on one line
2023-03-07 16:20:21 +01:00
Yih-Dar
9402788b34
Skip test_multi_gpu_data_parallel_forward for some model tests (#21991)
skip test_multi_gpu_data_parallel_forward for some model tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-07 14:23:36 +01:00
NielsRogge
95408e9953
[DETR, YOLOS] Fix device bug (#21974)
* Fix integration test

* Add test

* Add test
2023-03-07 07:34:04 -05:00
Yih-Dar
5b28b78332
Update Jukebox tests (#21984)
* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-07 04:20:14 +01:00
Yih-Dar
f2a2616b74
Update expected values for test_xglm_sample (#21975)
update expected values for xglm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-06 18:07:31 +01:00
Yih-Dar
fcf813417a
Update expected values in XLMProphetNetModelIntegrationTest (#21957)
update values

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-06 09:15:44 +01:00
Arthur
718e9d777f
[CLAP] Support batched inputs for CLAP. Fixes pipeline issues (#21931)
* fix pipeline

* fix feature_extraction clap

* you can now batch the `is_longer` attribute

* add tests

* fixup

* add expected scores

* comment on is_longer
2023-03-03 18:42:18 +01:00
Yih-Dar
d4306daea1
Fix AlignModelTest tests (#21923)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-03 14:47:09 +01:00
Yih-Dar
fa9d2ad7ec
Update model_split_percents for WhisperModelTest (#21922)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-03 14:35:08 +01:00
Yih-Dar
9f5bfe1b99
Avoid modeling tests run in pipeline CI jobs (#21911)
* rework is_pipeline_test

* bring back 3 tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-02 21:23:06 +01:00
Kashif Rasul
db979f7588
[time series] Add Time series inputs tests (#21846)
* initial test of inputs

* added test for generation

* remove asserts

* fixed test

* Update tests/models/time_series_transformer/test_modeling_time_series_transformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-02 20:43:35 +01:00
Yih-Dar
88e5c51a15
Temporarily skip 3 tests in BridgeTowerModelTest (#21908)
skip for now

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-02 19:16:03 +01:00