Commit Graph

9566 Commits

Author SHA1 Message Date
Stas Bekman
5da33f8729
[modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests (#16657)
* add low_cpu_mem_usage tests

* wip: revamping

* wip

* install /usr/bin/time

* wip

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* fix assert

* put the wrapper back

* cleanup; switch to bert-base-cased

* Trigger CI

* Trigger CI
2022-04-14 18:10:05 -07:00
Stas Bekman
ce2fef2ad2
[trainer / deepspeed] fix hyperparameter_search (#16740)
* [trainer / deepspeed] fix hyperparameter_search

* require optuna

* style

* oops

* add dep in the right place

* create deepspeed-testing dep group

* Trigger CI
2022-04-14 17:24:38 -07:00
code-review-doctor
1b7de41a07
Fix issue avoid-missing-comma found at https://codereview.doctor (#16768) 2022-04-14 16:42:27 -04:00
Sanchit Gandhi
de8b06f9bf
[SpeechEncoderDecoderModel] Fix bug in reshaping labels (#16748) 2022-04-14 19:02:40 +01:00
NielsRogge
048443db86
Improve image classification example (#16585)
* Improve README

* Make dataset_name argument optional

* Improve local data

* Fix bug

* Improve README some more

* Apply suggestions from code review

* Improve README

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-04-14 18:10:52 +02:00
Sylvain Gugger
3e4eec47f5
Kill async pushes when calling push_to_hub with blocking=True (#16755) 2022-04-14 10:02:29 -04:00
Stas Bekman
c21e1071a7
[deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop (#16717)
* [deepspeed / m2m_100] make deepspeed 3 work with layerdrop

* fix

* revert last
2022-04-14 06:51:55 -07:00
Zachary Mueller
89293a0f6b
Make nightly install dev accelerate (#16783) 2022-04-14 09:41:02 -04:00
Sylvain Gugger
b151ddb9b9
Fix batch size in evaluation loop (#16763)
* Fix batch size in evaluation loop

* remove debug statement
2022-04-14 09:22:54 -04:00
Sanchit Gandhi
d8269eb4d5
[Flax .from_pretrained] Raise a warning if model weights are not in float32 (#16762)
* [Flax] Raise a warning if model weights are not in float32

* apply suggestions and few small changes

* reorder wording for better readability
2022-04-14 11:52:15 +02:00
Nicolas Patry
195fbbb6cf
Enabling Tapex in table question answering pipeline. (#16663)
* Enabling `Tapex` in table question answering pipeline.

* Questions are independant for Tapex, making the test respect that.

* Missing extra space.
2022-04-14 09:06:14 +02:00
Bhadresh Savani
442dc45645
[Doctest] added doctest changes for electra (#16675)
* added doctest changes for electra

* fixed doctest tests

* updated changes
2022-04-13 22:39:00 +02:00
Zachary Mueller
be752d12f8
Fixup no_trainer examples scripts and add more tests (#16765)
* Change tracking to store_true

* Remove step param and use it in the log dictionary directly

* use vars(args) when passing args to init_trackers

* Include tracking tests since tensorboard is already a dep
2022-04-13 14:40:48 -04:00
Stas Bekman
3a16ab25c8
[self-scheduled ci] explain where dependencies are (#16757) 2022-04-13 12:28:02 -04:00
Tu Vu
34ef029dc0
Add self training code for text classification (#16738)
* Add self-training code for text-classification

* Add self-training code for text-classification

* Add self-training code for text-classification

* Add self-training code for text-classification

* Add self-training code for text-classification

* Delete strata
2022-04-13 12:03:24 -04:00
Sylvain Gugger
8e0d3b427f
Add defensive check for config num_labels and id2label (#16709)
* Add defensive check for config num_labels and id2label

* Actually check value...

* Only warning inside init plus better error message
2022-04-13 11:28:19 -04:00
Yih-Dar
6bed0647fe
Reduce Funnel PT/TF diff (#16744)
* Make Funnel Test less flaky

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-13 17:19:52 +02:00
Joao Gante
0b8f697219
CI: setup-dependent pip cache (#16751)
* Setup-dependent pip cache

* Do not restore from old versions
2022-04-13 16:19:14 +01:00
Stas Bekman
ac43a40e6a
[modeling_utils] better explanation of ignore keys (#16741) 2022-04-13 08:03:20 -07:00
Jeremy Fisher
0235bc57ab
Fix and improve CTRL doctests (#16573)
* Improve CTRL doctests

* Fix `CTRLForSequenceClassification` flakiness with inconsistent losses

* Remove unused

* Fixup

* Add CTRL to documentation_tests.txt

* Fix control code not being first

* Add output assertions

* Change from sshleifer/tiny-ctrl -> ctrl

* Run `make fixup`

* apply `list` to output logits shape for clarity

* Reduce output loss precision to make assertion more robust

* Add assertion of control code being first

* Fix docstyle

* upper case sentence following control code

* Weird bug fixes

* Add a better generation example

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2022-04-13 15:44:31 +02:00
Michael Chung
06b4aac9eb
Add Doc Test for GPT-J (#16507)
* Required the values GPTJ unfortunately cannot run the model =)

* Added the file to the doc tests

* Run Fixup and Style

* Fixed with the test versions of gptj. Ran Style and Fixup.

* Trigger ci

* A Minor Change to License

* Fixed spacing added to the benchmark_utils. Then refactored tests to const variables.

* Removed strings that were included as default parameters anyways.

Co-authored-by: ArEnSc <xx.mike.chung.xx@gmail.com>
2022-04-13 15:04:47 +02:00
Stas Bekman
12bfa97a43
[from_pretrained] refactor find_mismatched_keys (#16706) 2022-04-13 07:50:15 -04:00
davidleonfdez
9f8bfe703c
Fix #16660 (tokenizers setters of ids of special tokens) (#16661)
* Fix setters of *_token_id properties of SpecialTokensMixin

* Test setters of common tokens ids

* Move to a separate test checks of setters of tokens ids

* Add independent test for ByT5

* Add Canine test

* Test speech to text
2022-04-13 07:49:06 -04:00
Patrick von Platen
b24201fa44
[Doctests] Fix all T5 doc tests (#16646)
* [Doctests] Fix all T5 doc tests

* make style

* Update docs/source/en/model_doc/t5.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply Sylvains comments

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-13 11:36:54 +02:00
Santiago Castro
f7196f2e63
Fix decoding score comparison when using logits processors or warpers (#10638)
* Normalize using a logits warper

* Add a flag in `generate` to support the logit renormalization

* Add in RAG
2022-04-13 09:37:33 +01:00
Joao Gante
eb5bdcdfa5
TF generate: handle case without cache in beam search (#16704) 2022-04-12 20:46:10 +01:00
Minh Chien Vu
9c9db751e2
add Bigbird ONNX config (#16427)
* add Bigbird ONNX config
2022-04-12 20:46:06 +02:00
Sanchit Gandhi
a960406722
[FlaxWav2Vec2Model] Fix bug in attention mask (#16725)
* [FlaxWav2Vec2Model] Fix bug in attention mask

* more fixes

* add (Flax)SpeechEncoderDecoderModel PT-FX cross-test
2022-04-12 19:48:24 +02:00
Sanchit Gandhi
6adefba3f0
[FlaxSpeechEncoderDecoder] Fix input shape bug in weights init (#16728)
* [FlaxSpeechEncoderDecoder] Fix input shape bug in weights init

* make style
2022-04-12 19:33:57 +02:00
hiromu
1bac40db8a
Add Doc Tests for Reformer PyTorch (#16565)
* start working

* fix: ReformerForQA doctest

* fix: ReformerModelWithLMHead doctest

* fix: ReformerModelForSC doctest

* fix: ReformerModelForMLM doctest

* add: documentation_tests.txt

* make fixup

* change: ReformerModelForSC doctest

* change: checkpoint
2022-04-12 18:52:31 +02:00
Joao Gante
d7f7f29f29
TF: remove set_tensor_by_indices_to_value (#16729) 2022-04-12 17:51:47 +01:00
Anmol Joshi
a315988bae
Moved functions to pytorch_utils.py (#16625)
* Moved functions to pytorch_utils.py

* isort formatting

* Reverted tf changes

* isort, make fix-copies

* documentation fix

* Fixed Conv1D import

* Reverted research examples file

* backward compatibility for pytorch_utils

* missing import

* isort fix
2022-04-12 12:38:50 -04:00
Sylvain Gugger
0711c45eae
Remove duplicate header (#16732) 2022-04-12 12:37:13 -04:00
Nicolas Patry
a192f61e08
Change the chunk_iter function to handle (#16730)
* Change the chunk_iter function to handle

the subtle cases where the last chunk gets ignored since all the
data is in the `left_strided` data.

We need to remove the right striding on the previous item.

* Remove commented line.
2022-04-12 18:25:02 +02:00
Anmol Joshi
cc034f72eb
Replace assertion with exception (#16720)
* Updated assertions to exceptions

* updated assertions to exceptions

* bug fixes

* fix-copies

* Update modeling_ctrl.py

* Update src/transformers/models/ctrl/modeling_tf_ctrl.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_tf_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update modeling_led.py

* Update modeling_led.py

* Update modeling_led.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-12 11:47:01 -04:00
Shang Zhang
14daa6102a
Qdqbert example add benchmark script with ORT-TRT (#16592)
* add ort-trt benchmark script

* Update README.md

* ort version can be newer

* formatting

* specify ORT version
2022-04-12 11:13:59 -04:00
Heerak Son
db3edd050b
Update run_translation_no_trainer.py (#16652)
args.model_name_or_path -> args.config_name
fix it
2022-04-12 08:55:12 -04:00
smelm
b9f12bedd3
Only call get_output_embeddings when tie_word_embeddings is set (#16667)
This avoids an unnecessary call and avoids problems during
initialization of class hierarchies.

Co-authored-by: Samuel Melm <samuel.melm@stud.uni-heidelberg.de>
2022-04-12 07:55:44 -04:00
Michael Chung
924484ee4a
Add Doc Test GPT-2 (#16439)
* First Pass All Tests Pass

* WIP

* Adding file to documentation tests

* Change the base model for the example in the doc test.

* Fix Code Styling by running
make fixup

* Called Style

* Reverted to gpt2 model rather than distill gpt2
Then used a token classification model over a sequence model for an example.

* Fix Styling Issue

* Hopefully ignores the formatting issue.

Co-authored-by: ArEnSc <xx.mike.chung.xx@gmail.com>
2022-04-12 12:11:03 +02:00
Patrick von Platen
70851a6bf0
[Bart] correct doc test (#16722) 2022-04-12 10:19:49 +02:00
Zachary Mueller
69233cf03b
Fix example logs repeating themselves (#16669)
Move declaration of log streams to before tests, so that results won't get compounded on top of each other
2022-04-11 16:25:16 -04:00
Yih-Dar
dce33f2150
Improve PT/TF equivalence test (#16557)
* add error message

* Use names in the error message

* allow ModelOutput

* rename to check_pt_tf_outputs and move outside

* fix style

* skip past_key_values in a better way

* Add comments

* improve code for label/loss

* make the logic clear by moving the ignore keys out

* fix _postprocessing_to_ignore

* fix _postprocessing_to_ignore: create new outputs from the remaining fields

* ignore past_key_values in TFGPT2 models for now

* make check_pt_tf_outputs better regarding names

* move check_pt_tf_models outside

* rename methods

* remove test_pt_tf_model_equivalence in TFCLIPModelTest

* Reduce TFViTMAEModelTest.test_pt_tf_model_equivalence

* move prepare_pt_inputs_from_tf_inputs outside check_pt_tf_models

* Fix quality

* Clean-up TFLxmertModelTester.test_pt_tf_model_equivalence

* Fix quality

* fix

* fix style

* Clean-up TFLEDModelTest.test_pt_tf_model_equivalence

* Fix quality

* add docstring

* improve comment

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 22:19:12 +02:00
Yih-Dar
7f7300856d
Handle image_embeds in ViltModel (#16696)
* update

* batch_size -> text_batch_size

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 22:16:20 +02:00
Nicholas Broad
161c0a2eec
Private repo TrainingArgument (#16707)
* private repo argument to trainer

* format

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
2022-04-11 13:37:16 -04:00
Zachary Mueller
d4b3e359aa
Don't push checkpoints to hub in no_trainer scripts (#16703)
Adds checkpoint prefixes to the gitignore if `push_to_hub` is used along with `checkpointint_steps`
2022-04-11 12:42:45 -04:00
Yih-Dar
c04619ecf3
Enable more test_torchscript (#16679)
* update _create_and_check_torchscript

* Enable test_torchscript

* clear_class_registry

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:23:35 +02:00
Yih-Dar
3918d6a9d6
Reduce memory leak in _create_and_check_torchscript (#16691)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:22:28 +02:00
Yih-Dar
2109afae71
Rename the method test_torchscript (#16693)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:21:45 +02:00
Yih-Dar
40618ec29e
Fix TF_MASKED_LM_SAMPLE (#16698)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:19:28 +02:00
Suraj Patil
1471857f13
update decoder_vocab_size when resizing embeds (#16700) 2022-04-11 18:02:10 +02:00