Commit Graph

15053 Commits

Nicola Procopio
dd3a0580a6
Added big_models.mdx Italian translation #17600 (#22115)
* updated toctree

* Italian translation big_model.mdx

* Italian translation big_models
2023-03-13 10:02:03 -04:00
Karim Foda
0768c5e274
Fix gradient checkpointing bug in xlm_roberta_xl (#22128) 2023-03-13 13:52:34 +00:00
Karim Foda
4c14c1f47b
Fix gradient checkpointing bug in Trajectory Transformer (#22125) 2023-03-13 13:50:02 +00:00
Karim Foda
d0876a095f
Fix gradient checkpointing bug in xglm (#22127) 2023-03-13 13:49:23 +00:00
Alex Calabrese
0c883766bd
Add pr_checks.mdx Italian translation (#17459) (#22116)
* Add pr_checks.mdx Italian translation (#17459)

* Updated pr_checks.mdx Italian translation (#17459)
2023-03-13 09:24:34 -04:00
wangpeng
102b5ff4a8
add new model of MGP-STR (#21418)
* add new model of MGP-STR

* fix the failing checks

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* remove representation_size from MGPSTRConfig

* reformat configuration_mgp_str.py

* format test_processor_mgp_str.py

* add test for tokenizer and complete model/processor test and model file

* rm unnecessary tuple in modeling_mgp_str

* reduce hidden_size/layers/label_size in test_model

* add integration tests and change MGPSTR to Mgpstr

* add test for logit values

* reformat test model file

---------

Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-13 10:11:31 +00:00
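For context, a minimal usage sketch of the new MGP-STR classes. The class names come from this PR; the `alibaba-damo/mgp-str-base` checkpoint name is our assumption, and the "softmax outs" mentioned above are what `batch_decode` consumes.
```python
import torch
from PIL import Image
from transformers import MgpstrProcessor, MgpstrForSceneTextRecognition

# Checkpoint name is an assumption based on the PR discussion.
processor = MgpstrProcessor.from_pretrained("alibaba-damo/mgp-str-base")
model = MgpstrForSceneTextRecognition.from_pretrained("alibaba-damo/mgp-str-base")

# A blank dummy crop stands in for a real scene-text image.
image = Image.new("RGB", (128, 32), color="white")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    outputs = model(pixel_values)

# batch_decode turns the per-character logits into strings.
print(processor.batch_decode(outputs.logits)["generated_text"])
```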
Alara Dirik
32e3466d38
Add AutoModelForZeroShotImageClassification (#22087)
Adds AutoModelForZeroShotImageClassification to transformers
2023-03-13 12:46:14 +03:00
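A hedged sketch of what the new auto class enables, using CLIP as the canonical zero-shot image classifier; the checkpoint choice is ours, not the PR's.
```python
import torch
from PIL import Image
from transformers import AutoModelForZeroShotImageClassification, AutoProcessor

ckpt = "openai/clip-vit-base-patch32"  # any CLIP-like checkpoint should map to this auto class
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForZeroShotImageClassification.from_pretrained(ckpt)

image = Image.new("RGB", (224, 224))  # dummy image; use a real photo in practice
candidate_labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=candidate_labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape (1, num_labels)
print(logits.softmax(dim=-1))
```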
Sanchit Gandhi
b90fbc7e0b
[Whisper] Remove embed_tokens from encoder docstring (#21996)
* [Whisper] Remove embed_tokens from encoder docstring

* new line to retrigger CI

* remove new line
2023-03-11 14:03:36 +01:00
Yih-Dar
2f320661f3
Revert "[GPT2] Propose fix for #21080" (#22093)
Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure

This reverts commit a3fef89b26.
2023-03-10 22:08:21 +01:00
Sylvain Gugger
499770c088
Fix imports of TF MobileViT (#22065)
* Fix imports of TF MobileViT

* Fix copies
2023-03-10 14:46:34 -05:00
Maria Khalusova
bdec2768bd
GPT-J specific half precision on CPU note (#22086)
* re: #21989

* update re: #21989

* removed cpu option

* make style
2023-03-10 14:03:43 -05:00
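The note this commit adds concerns loading GPT-J in half precision; a minimal sketch of the GPU-side usage the docs point to (the CPU option was dropped because fp16 inference is not supported on CPU):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Half precision assumes a CUDA device; many fp16 ops are unavailable on CPU.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```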
Dean Wyatte
2f4cdd97f5
handle numpy inputs in whole word mask data collator (#22032) 2023-03-10 10:50:29 -05:00
J-shang
a70da86b84
Fix hint in src/transformers/modeling_utils.py (#22074)
fix hint
2023-03-10 08:56:42 -05:00
Karim Foda
419d979f7f
Fix gradient checkpointing bug in Speecht5 (#22080)
* Fix gradient checkpointing bug in Speecht5

* Update modeling_speech_to_text.py

* Update src/transformers/models/speech_to_text/modeling_speech_to_text.py

* Fix change errors

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-03-10 13:36:09 +00:00
Joao Gante
7014fc360d
Generate - Fix broken documentation links (#22078)
fix broken links
2023-03-10 13:28:30 +00:00
Kevin Jiang
ade26bf991
Fix small typo in flan-ul2.mdx (#22068)
* Update flan-ul2.mdx

* Update flan-ul2.mdx
2023-03-10 07:44:45 -05:00
Arthur
a3fef89b26
[GPT2] Propose fix for #21080 (#21853)
* Make sure position ids are masked

* test that padded input produce the same results

* fix failing tests

* fixup

* fix batch test
2023-03-10 07:15:25 -05:00
Karim Foda
eee195b3aa
Fix gradient checkpointing bug in switch transformer (#22081) 2023-03-10 11:31:08 +00:00
Karim Foda
b9273353dc
Fix gradient checkpointing bug in Speech2Text (#22079)
* Fix gradient checkpointing bug in Speech2Text

* Update modeling_speech_to_text.py

* Update modeling_speech_to_text_2.py
2023-03-10 11:30:42 +00:00
Sylvain Gugger
a9bd5df16a
Add a progress bar for the total download of shards (#22062)
* Add a progress bar for the total download of shards

* Check for no cache at all

* Fix check
2023-03-09 16:58:03 -05:00
aws-sangeetha
1a5fc300f4
Fix case when using --gradient_accumulation_steps with DDP disabled. (#22007)
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-42-72.us-west-2.compute.internal>
2023-03-09 14:31:58 -05:00
Yih-Dar
6d9031f285
Update tiny model creation script (#22058)
Update the script

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-09 19:53:54 +01:00
Sylvain Gugger
7a2b915e92
Add setters by type of args to TrainingArguments (#21570)
* Add setters by type of args to TrainingArguments

* Define more setters
2023-03-09 13:13:23 -05:00
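The PR title doesn't show the API shape, so here is a hedged sketch of the grouped setters it describes; the method names and parameters below are assumptions drawn from the PR, not verified signatures.
```python
from transformers import TrainingArguments

args = TrainingArguments("working_dir")
# Grouped setters added by this PR (names assumed); each returns the
# mutated TrainingArguments, so calls can be chained or reassigned.
args = args.set_training(learning_rate=1e-4, batch_size=32)
args = args.set_evaluate(strategy="steps", steps=200)
args = args.set_optimizer(name="adamw_torch", weight_decay=0.01)
```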
Yih-Dar
ab81d31d20
Skip 3 tests for WhisperEncoderModelTest (#22060)
* skip 3 tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-09 19:09:23 +01:00
Jiali Mei
8434cb878e
Edit the docstring of image_processing_donut to match code (#22033)
* Edit the docstring of `image_processing_donut` to match code

* improve style

* more style improvement after installing quality
2023-03-09 17:35:43 +00:00
Stas Bekman
ec24132b6c
[deepspeed] offload + non-cpuadam optimizer exception (#22043)
* [deepspeed] offload + non-cpuadam optimizer exception

* flip

* revert min version
2023-03-09 08:12:57 -08:00
Kamal Raj Kanakarajan
d0c19b3303
rm $ symbol from code block from contributing.md (#22057)
rm $ symbol from code block 

Removed the $ symbol from the code block to make copy-pasting easier.
2023-03-09 11:09:46 -05:00
Matt
fdf8409656
pt-to-tf model architecture override (#22055)
* Add an argument to pt-to-tf to allow overriding the model class

* make fixup

* Minor fix to error message

* Remove unused extra conversion from the script
2023-03-09 15:36:29 +00:00
anruijian
04bfac83b7
Return analysis for hyperparameter_search with Ray backend (#22040)
* return analysis for hyperparameter_search with ray backend

* Revert "return analysis for hyperparameter_search with ray backend"

This reverts commit cd51790709.

* add run_summary attribute to BestRun and return analysis for ray backend

* fix typo

* add doc for run_summary for ray backend
2023-03-09 09:44:17 -05:00
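What the new `run_summary` attribute gives you, sketched under the assumption of an already-configured `Trainer` built with `model_init` (setup elided):
```python
# Sketch: `trainer` is a Trainer constructed with `model_init=...` so each
# trial can re-instantiate the model.
best_run = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=4,
)
print(best_run.run_id, best_run.objective, best_run.hyperparameters)

# New in this PR: for the Ray backend, run_summary holds the Ray Tune
# analysis object for deeper inspection of all trials.
analysis = best_run.run_summary
```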
Yih-Dar
90a7c95496
Show the number of huggingface_hub warnings in CI report (#22054)
* show hfh warnings

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-09 15:39:05 +01:00
Lucain
923110b74f
Remove set_access_token usage + fail tests if FutureWarning (#22051)
* Remove set_access_token usage + fail tests if FutureWarning

* do not fail on FutureWarning in CI

---------

Co-authored-by: testbot <lucainp@hf.co>
2023-03-09 09:23:48 -05:00
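For downstream code still calling the removed helper, the replacement in `huggingface_hub` is `login()`; a one-line sketch (the token value is obviously a placeholder):
```python
from huggingface_hub import login

# set_access_token(token) is deprecated and removed here;
# login() both validates and stores the token.
login(token="hf_xxx")  # placeholder token
```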
Shaun VanWeelden
684774306d
Can't install tf2 on M1 Chip by default (#22046) 2023-03-09 07:44:58 -05:00
Shaun VanWeelden
81cd655cab
Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it (#22045)
In zsh, not quoting the extras in pip install fails

Running
```
pip install transformers[torch]
```
in the default zsh shell fails with the error `zsh: no matches found: transformers[torch]`, because zsh interprets the square brackets as a glob pattern.

The fix is to quote the package specification:
```
pip install 'transformers[torch]'
```

Relevant StackOverflow: https://stackoverflow.com/questions/30539798/zsh-no-matches-found-requestssecurity
2023-03-09 07:43:49 -05:00
Nipun Jindal
1a77a1a86f
[21737][T5]: Fix gradient checkpoint bug (#22036)
* [21737][T5]: Fix gradient checkpoint bug

* Update src/transformers/models/mt5/modeling_mt5.py

* Update src/transformers/models/t5/modeling_t5.py

---------

Co-authored-by: njindal <njindal@adobe.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-03-09 12:17:44 +00:00
Alara Dirik
2055d737ad
Update ALIGN docs (#22025)
* Fix typos and add code examples, resources
2023-03-09 14:12:17 +03:00
Ceyda Cinarel
3ec8171bed
Bug fix: token classification pipeline while passing offset_mapping (#22034)
fix slow tokenizers when passing offset_mapping
2023-03-08 16:21:46 -05:00
Yih-Dar
1cbac6867b
Mark all BridgeTower tests slow for now (#22039)
* slow me

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-08 21:48:29 +01:00
Yih-Dar
bcc8d30aff
Avoid text_config_dict and vision_config_dict being saved for CLIP-like models (#22035)
* Avoid text_config_dict and vision_config_dict being saved

* for other CLIP-like models

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-08 20:27:30 +01:00
Somasree Majumder
998395061b
fixes the gradient checkpointing of whisper (#22019)
* fixing

* Update modeling_whisper.py

* Update modeling_whisper.py

* Update src/transformers/models/whisper/modeling_whisper.py

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-03-08 14:21:38 -05:00
bofeng huang
6192549c1f
[examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py (#21942)
* Add specaugment to run_speech_recognition_seq2seq.py

* Remove useless argument: text_column

* Fix quality

* Update return_attention_mask condition

* Update specaugment arguments only for whisper models

* Remove SpecAugment arguments from ModelArguments, only leave default values for simplicity

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update apply_spec_augment only for whisper models

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Rename return_attention_mask to forward_attention_mask to avoid confusion with wav2vec2 models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-03-08 17:59:31 +01:00
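The script toggles SpecAugment through the Whisper config; a minimal sketch of the equivalent direct usage (the hyperparameter values below are illustrative, not the script's defaults):
```python
from transformers import WhisperConfig, WhisperForConditionalGeneration

# apply_spec_augment and the mask_* knobs live on WhisperConfig.
config = WhisperConfig.from_pretrained(
    "openai/whisper-tiny",
    apply_spec_augment=True,
    mask_time_prob=0.05,    # SpecAugment time masking
    mask_feature_prob=0.0,  # feature (frequency) masking disabled
)
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-tiny", config=config
)
```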
anruijian
b427b263e2
Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline (#22031)
add tokenize_kwargs doc in the FeatureExtractionPipeline
2023-03-08 11:43:31 -05:00
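A short sketch of the parameter this commit documents; `tokenize_kwargs` is forwarded to the underlying tokenizer call:
```python
from transformers import pipeline

extractor = pipeline("feature-extraction", model="distilbert-base-uncased")
# The keys of tokenize_kwargs are standard tokenizer arguments.
features = extractor(
    "This pipeline returns hidden states.",
    tokenize_kwargs={"truncation": True, "max_length": 128},
)
```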
Sylvain Gugger
a5392ee747
Fix test for torchneuroncore in Trainer (#22028) 2023-03-08 09:12:43 -05:00
Anahita Bhiwandiwalla
de81adf978
[WIP] Add BridgeTowerForContrastiveLearning (#21964)
* Add BridgeTower for ITC

* Fix review feedback

* Rename BridgeTowerForITC, cleanup

* Fix style and quality

* implement tests

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
2023-03-08 09:00:54 -05:00
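A hedged sketch of the new contrastive head; the checkpoint name and output fields are our assumptions based on the BridgeTower ITC work, not the PR text.
```python
import torch
from PIL import Image
from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

# Checkpoint name is an assumption; the ITC head needs *-itc weights.
ckpt = "BridgeTower/bridgetower-large-itm-mlm-itc"
processor = BridgeTowerProcessor.from_pretrained(ckpt)
model = BridgeTowerForContrastiveLearning.from_pretrained(ckpt)

image = Image.new("RGB", (224, 224))  # dummy image
inputs = processor(image, "a caption to score", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
# Contrastive embeddings for the text and image views.
print(outputs.text_embeds.shape, outputs.image_embeds.shape)
```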
Younes Belkada
edea08a6b0
[bnb] Fix bnb error message (#22026)
* fix error message

* make style
2023-03-08 14:51:44 +01:00
Yih-Dar
dfe9a31973
Update AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 (#22023)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-08 13:56:47 +01:00
Qiushi
bbd949970d
update: bertology paper (#22012) 2023-03-08 07:54:30 -05:00
amyeroberts
4130e70367
VideoMAE doctest - use valid dummy pixel values (#22022)
Use valid dummy pixel values
2023-03-08 11:54:42 +00:00
jim
c1f85598eb
Generate - add 1 to cur_len to make up the new beam length (#21993)
* add 1 to cur_len to make up the new beam length

cur_len is 1 token shorter than the length of the sequence whose best_sum_logprobs is the numerator.

* cur_len+=1 before check if beam hyp is done

* format code

* reformat with black

---------

Co-authored-by: Chiming <chiming@biomap.com>
2023-03-08 11:47:55 +00:00
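To make the off-by-one concrete: finished beam hypotheses are ranked by a length-penalized score, and the denominator must use the length of the same sequence whose log-probabilities form the numerator. A toy sketch (the function name is ours):
```python
def beam_score(sum_logprobs: float, cur_len: int, length_penalty: float = 1.0) -> float:
    """Length-penalized beam score. cur_len counts tokens *before* the
    newly generated one, so 1 is added to match the numerator's sequence
    length - the off-by-one this commit fixes."""
    return sum_logprobs / ((cur_len + 1) ** length_penalty)
```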
Yih-Dar
b338414e61
Update tiny model creation script and some others files (#22006)
* Update 1

* Update 2

* Update 3

* Update 4

* Update 5

* Update 6

* Update 7

* Update 8

* Update 9

* Update 10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-07 22:31:14 +01:00
Eli Simhayev
8abe4930d3
[Time-Series] informer model (#21099)
* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* moved enc-dec init to InformerEncoder/Decoder init

* added 'init_std' to config, now model init works!

* WIP conversion script, and added code sources

* WIP conversion script: loading original informer pth works

* WIP conversion script: change defaults in the config

* WIP conversion script: supporting Informer input embedding

* WIP conversion script: added parameters for the informer embed

* WIP conversion script: change dim_feedforward=2048

* WIP conversion script: remove unused args for loading checkpoint

* just cleaning up

* DataEmbedding removed, after thinking with Kashif

* working on forward pass

* WIP forward pass: trying to establish working batch for forward pass

* cleaning and finalizing

* adding HF names and docs

* init after cleaning works

* WIP in tests

* added docs for the informer specific args

* fix style

* undo change

* cleaning informer, now need to work only enc-dec

* initial enc-dec classes

* added encoder and decoder

* added todo

* add todos for conv_layers

* added decoder docs from vanilla

* added encoder docs from vanilla

* remove encoder decoder from the original informer

* removed AttentionLayer from the original paper

* removed TriangularCausalMask, same as decoder_attention_mask

* initial sparse attention

* use conv_layers

* fixed test_config test

* fix parentheses when iterating zip(layers, conv_layers)

* error found in prob attention, added sizes as comments

* fix sizes

* added proposal for q_reduce indexing, and remove unused

* WIP ProbMask, and changed factor=2 for testing

* remove unused libs for this PR for creating the env

* fix checking the attn_weights.size() after bmm

* Q_reduce: changed from torch.gather to simple slicing

* WIP calculate final attn_output

* finish adding v_aggregated, attn_output ready

* changed tgt_len to u in attention_mask, need to fix the size error

* comment attention_mask for encoder, and fix if cond for v_agg

* added ProbMask support (wip), removed old original code

* finished ProbMask 😃

* Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e.

* fixes

* make style

* fix initial tests

* fix more tests

* dry

* make style

* remove unused files

* style

* added integration tests

* fix num_static_real_features

* fix header

* remove unused function

* fix example

* fix docs

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/modeling_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fixes for reviewer

* use prediction_length from model

* fix style

* fixed informer.mdx

* added to index

* updated readme

* undo

* make fix-copies

* typo

* fix copy

* added Informer to toctree

* in order

* fixed comments

* remove unneeded new lines in docs

* make static real and cat optional

* fix use of distil conv layers

* fixed integration test

* added checkpoint for convlayer

* make fix-copies

* updated from time series model

* make fix-copies

* copy decoder

* fix unit tests

* updated scaling config

* fix integration tests

* IGNORE_NON_TESTED

* IGNORE_NON_AUTO_CONFIGURED

* IGNORE_NON_AUTO_CONFIGURED

* updated check configs

* fix formatting

* undo change from time series

* prediction_length should not be None

* align with the blog: prettify ProbSparse and change attention_factor to sampling_factor

* make style

* make fix-copies

* niels CR: update contributed by

* niels CR: update configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: update kashif -> huggingface

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: `sampling_factor` only relevant when `attention_type`=prob

* make style

* fixed U_part: added multiplication by `L_Q`

* fixed bug: remove `is not None` from `if config.distil`

* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check

* fix integration tests

* updated model hub

* do not shift as in training

* undo

* fix make-copies

* make fix-copies

* added `if prediction_length is None`

* changed `ProbSparseAttention` to `InformerProbSparseAttention`

* changed `V_sum` -> `v_mean_dim_time`

* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`

* TimeSeriesTransformer->Informer in decoder's Copied from

* more descriptive in ProbSparse

* make style

* fix copied from

* Revert "added `if prediction_length is None`"

This reverts commit b4cbddfa05.

* fixed indent

* use InformerSinusoidalPositionalEmbedding

* make fix-style

* fix from #21860

* fix name

* make fix-copies

* use time series utils

* fix dec num_heads

* docstring

* added time series util doc

* _import_structure

* formatting

* changes from review

* make style

* fix docs

* fix doc

* removed NegativeLogLikelihood

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-07 21:36:38 +01:00
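A minimal instantiation sketch for the new model. The config names below (prediction_length, context_length, attention_type, sampling_factor) come straight from the PR bullets above; the concrete values are illustrative.
```python
from transformers import InformerConfig, InformerForPrediction

config = InformerConfig(
    prediction_length=12,   # must be set; see "prediction_length should not be None"
    context_length=24,
    attention_type="prob",  # ProbSparse attention; "full" for vanilla attention
    sampling_factor=5,      # only relevant when attention_type="prob"
)
model = InformerForPrediction(config)
```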