Stas Bekman
5da33f8729
[modeling utils] revamp `from_pretrained(..., low_cpu_mem_usage=True)` + tests ( #16657 )
...
* add low_cpu_mem_usage tests
* wip: revamping
* wip
* install /usr/bin/time
* wip
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* fix assert
* put the wrapper back
* cleanup; switch to bert-base-cased
* Trigger CI
* Trigger CI
2022-04-14 18:10:05 -07:00
Stas Bekman
ce2fef2ad2
[trainer / deepspeed] fix hyperparameter_search ( #16740 )
...
* [trainer / deepspeed] fix hyperparameter_search
* require optuna
* style
* oops
* add dep in the right place
* create deepspeed-testing dep group
* Trigger CI
2022-04-14 17:24:38 -07:00
code-review-doctor
1b7de41a07
Fix issue avoid-missing-comma found at https://codereview.doctor ( #16768 )
2022-04-14 16:42:27 -04:00
Sanchit Gandhi
de8b06f9bf
[SpeechEncoderDecoderModel] Fix bug in reshaping labels ( #16748 )
2022-04-14 19:02:40 +01:00
NielsRogge
048443db86
Improve image classification example ( #16585 )
...
* Improve README
* Make dataset_name argument optional
* Improve local data
* Fix bug
* Improve README some more
* Apply suggestions from code review
* Improve README
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-04-14 18:10:52 +02:00
Sylvain Gugger
3e4eec47f5
Kill async pushes when calling push_to_hub with blocking=True ( #16755 )
2022-04-14 10:02:29 -04:00
Stas Bekman
c21e1071a7
[deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop ( #16717 )
...
* [deepspeed / m2m_100] make deepspeed 3 work with layerdrop
* fix
* revert last
2022-04-14 06:51:55 -07:00
Zachary Mueller
89293a0f6b
Make nightly install dev accelerate ( #16783 )
2022-04-14 09:41:02 -04:00
Sylvain Gugger
b151ddb9b9
Fix batch size in evaluation loop ( #16763 )
...
* Fix batch size in evaluation loop
* remove debug statement
2022-04-14 09:22:54 -04:00
Sanchit Gandhi
d8269eb4d5
[Flax `.from_pretrained`] Raise a warning if model weights are not in float32 ( #16762 )
...
* [Flax] Raise a warning if model weights are not in float32
* apply suggestions and few small changes
* reorder wording for better readability
2022-04-14 11:52:15 +02:00
Nicolas Patry
195fbbb6cf
Enabling `Tapex` in table question answering pipeline. ( #16663 )
...
* Enabling `Tapex` in table question answering pipeline.
* Questions are independent for Tapex, making the test respect that.
* Missing extra space.
2022-04-14 09:06:14 +02:00
Bhadresh Savani
442dc45645
[Doctest] added doctest changes for electra ( #16675 )
...
* added doctest changes for electra
* fixed doctest tests
* updated changes
2022-04-13 22:39:00 +02:00
Zachary Mueller
be752d12f8
Fixup no_trainer examples scripts and add more tests ( #16765 )
...
* Change tracking to store_true
* Remove step param and use it in the log dictionary directly
* use vars(args) when passing args to init_trackers
* Include tracking tests since tensorboard is already a dep
2022-04-13 14:40:48 -04:00
Stas Bekman
3a16ab25c8
[self-scheduled ci] explain where dependencies are ( #16757 )
2022-04-13 12:28:02 -04:00
Tu Vu
34ef029dc0
Add self training code for text classification ( #16738 )
...
* Add self-training code for text-classification
* Add self-training code for text-classification
* Add self-training code for text-classification
* Add self-training code for text-classification
* Add self-training code for text-classification
* Delete strata
2022-04-13 12:03:24 -04:00
Sylvain Gugger
8e0d3b427f
Add defensive check for config num_labels and id2label ( #16709 )
...
* Add defensive check for config num_labels and id2label
* Actually check value...
* Only warning inside init plus better error message
2022-04-13 11:28:19 -04:00
Yih-Dar
6bed0647fe
Reduce Funnel PT/TF diff ( #16744 )
...
* Make Funnel Test less flaky
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-13 17:19:52 +02:00
Joao Gante
0b8f697219
CI: setup-dependent pip cache ( #16751 )
...
* Setup-dependent pip cache
* Do not restore from old versions
2022-04-13 16:19:14 +01:00
Stas Bekman
ac43a40e6a
[modeling_utils] better explanation of ignore keys ( #16741 )
2022-04-13 08:03:20 -07:00
Jeremy Fisher
0235bc57ab
Fix and improve CTRL doctests ( #16573 )
...
* Improve CTRL doctests
* Fix `CTRLForSequenceClassification` flakiness with inconsistent losses
* Remove unused
* Fixup
* Add CTRL to documentation_tests.txt
* Fix control code not being first
* Add output assertions
* Change from sshleifer/tiny-ctrl -> ctrl
* Run `make fixup`
* apply `list` to output logits shape for clarity
* Reduce output loss precision to make assertion more robust
* Add assertion of control code being first
* Fix docstyle
* upper case sentence following control code
* Weird bug fixes
* Add a better generation example
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2022-04-13 15:44:31 +02:00
Michael Chung
06b4aac9eb
Add Doc Test for GPT-J ( #16507 )
...
* Required the values GPTJ unfortunately cannot run the model =)
* Added the file to the doc tests
* Run Fixup and Style
* Fixed with the test versions of gptj. Ran Style and Fixup.
* Trigger ci
* A Minor Change to License
* Fixed spacing added to the benchmark_utils. Then refactored tests to const variables.
* Removed strings that were included as default parameters anyways.
Co-authored-by: ArEnSc <xx.mike.chung.xx@gmail.com>
2022-04-13 15:04:47 +02:00
Stas Bekman
12bfa97a43
[from_pretrained] refactor find_mismatched_keys ( #16706 )
2022-04-13 07:50:15 -04:00
davidleonfdez
9f8bfe703c
Fix #16660 (tokenizers setters of ids of special tokens) ( #16661 )
...
* Fix setters of *_token_id properties of SpecialTokensMixin
* Test setters of common tokens ids
* Move to a separate test checks of setters of tokens ids
* Add independent test for ByT5
* Add Canine test
* Test speech to text
2022-04-13 07:49:06 -04:00
Patrick von Platen
b24201fa44
[Doctests] Fix all T5 doc tests ( #16646 )
...
* [Doctests] Fix all T5 doc tests
* make style
* Update docs/source/en/model_doc/t5.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply Sylvain's comments
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-13 11:36:54 +02:00
Santiago Castro
f7196f2e63
Fix decoding score comparison when using logits processors or warpers ( #10638 )
...
* Normalize using a logits warper
* Add a flag in `generate` to support the logit renormalization
* Add in RAG
2022-04-13 09:37:33 +01:00
Joao Gante
eb5bdcdfa5
TF generate: handle case without cache in beam search ( #16704 )
2022-04-12 20:46:10 +01:00
Minh Chien Vu
9c9db751e2
add Bigbird ONNX config ( #16427 )
...
* add Bigbird ONNX config
2022-04-12 20:46:06 +02:00
Sanchit Gandhi
a960406722
[FlaxWav2Vec2Model] Fix bug in attention mask ( #16725 )
...
* [FlaxWav2Vec2Model] Fix bug in attention mask
* more fixes
* add (Flax)SpeechEncoderDecoderModel PT-FX cross-test
2022-04-12 19:48:24 +02:00
Sanchit Gandhi
6adefba3f0
[FlaxSpeechEncoderDecoder] Fix input shape bug in weights init ( #16728 )
...
* [FlaxSpeechEncoderDecoder] Fix input shape bug in weights init
* make style
2022-04-12 19:33:57 +02:00
hiromu
1bac40db8a
Add Doc Tests for Reformer PyTorch ( #16565 )
...
* start working
* fix: ReformerForQA doctest
* fix: ReformerModelWithLMHead doctest
* fix: ReformerModelForSC doctest
* fix: ReformerModelForMLM doctest
* add: documentation_tests.txt
* make fixup
* change: ReformerModelForSC doctest
* change: checkpoint
2022-04-12 18:52:31 +02:00
Joao Gante
d7f7f29f29
TF: remove set_tensor_by_indices_to_value ( #16729 )
2022-04-12 17:51:47 +01:00
Anmol Joshi
a315988bae
Moved functions to pytorch_utils.py ( #16625 )
...
* Moved functions to pytorch_utils.py
* isort formatting
* Reverted tf changes
* isort, make fix-copies
* documentation fix
* Fixed Conv1D import
* Reverted research examples file
* backward compatibility for pytorch_utils
* missing import
* isort fix
2022-04-12 12:38:50 -04:00
Sylvain Gugger
0711c45eae
Remove duplicate header ( #16732 )
2022-04-12 12:37:13 -04:00
Nicolas Patry
a192f61e08
Change the chunk_iter function to handle ( #16730 )
...
* Change the chunk_iter function to handle
the subtle cases where the last chunk gets ignored since all the
data is in the `left_strided` data.
We need to remove the right striding on the previous item.
* Remove commented line.
2022-04-12 18:25:02 +02:00
Anmol Joshi
cc034f72eb
Replace assertion with exception ( #16720 )
...
* Updated assertions to exceptions
* updated assertions to exceptions
* bug fixes
* fix-copies
* Update modeling_ctrl.py
* Update src/transformers/models/ctrl/modeling_tf_ctrl.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_tf_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update modeling_led.py
* Update modeling_led.py
* Update modeling_led.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-12 11:47:01 -04:00
Shang Zhang
14daa6102a
Qdqbert example add benchmark script with ORT-TRT ( #16592 )
...
* add ort-trt benchmark script
* Update README.md
* ort version can be newer
* formatting
* specify ORT version
2022-04-12 11:13:59 -04:00
Heerak Son
db3edd050b
Update run_translation_no_trainer.py ( #16652 )
...
args.model_name_or_path -> args.config_name
fix it
2022-04-12 08:55:12 -04:00
smelm
b9f12bedd3
Only call get_output_embeddings when tie_word_embeddings is set ( #16667 )
...
This avoids an unnecessary call and avoids problems during
initialization of class hierarchies.
Co-authored-by: Samuel Melm <samuel.melm@stud.uni-heidelberg.de>
2022-04-12 07:55:44 -04:00
Michael Chung
924484ee4a
Add Doc Test GPT-2 ( #16439 )
...
* First Pass All Tests Pass
* WIP
* Adding file to documentation tests
* Change the base model for the example in the doc test.
* Fix Code Styling by running `make fixup`
* Called Style
* Reverted to gpt2 model rather than distill gpt2
Then used a token classification model over a sequence model for an example.
* Fix Styling Issue
* Hopefully ignores the formatting issue.
Co-authored-by: ArEnSc <xx.mike.chung.xx@gmail.com>
2022-04-12 12:11:03 +02:00
Patrick von Platen
70851a6bf0
[Bart] correct doc test ( #16722 )
2022-04-12 10:19:49 +02:00
Zachary Mueller
69233cf03b
Fix example logs repeating themselves ( #16669 )
...
Move declaration of log streams to before tests, so that results won't get compounded on top of each other
2022-04-11 16:25:16 -04:00
Yih-Dar
dce33f2150
Improve PT/TF equivalence test ( #16557 )
...
* add error message
* Use names in the error message
* allow ModelOutput
* rename to check_pt_tf_outputs and move outside
* fix style
* skip past_key_values in a better way
* Add comments
* improve code for label/loss
* make the logic clear by moving the ignore keys out
* fix _postprocessing_to_ignore
* fix _postprocessing_to_ignore: create new outputs from the remaining fields
* ignore past_key_values in TFGPT2 models for now
* make check_pt_tf_outputs better regarding names
* move check_pt_tf_models outside
* rename methods
* remove test_pt_tf_model_equivalence in TFCLIPModelTest
* Reduce TFViTMAEModelTest.test_pt_tf_model_equivalence
* move prepare_pt_inputs_from_tf_inputs outside check_pt_tf_models
* Fix quality
* Clean-up TFLxmertModelTester.test_pt_tf_model_equivalence
* Fix quality
* fix
* fix style
* Clean-up TFLEDModelTest.test_pt_tf_model_equivalence
* Fix quality
* add docstring
* improve comment
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 22:19:12 +02:00
Yih-Dar
7f7300856d
Handle image_embeds in ViltModel ( #16696 )
...
* update
* batch_size -> text_batch_size
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 22:16:20 +02:00
Nicholas Broad
161c0a2eec
Private repo TrainingArgument ( #16707 )
...
* private repo argument to trainer
* format
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
2022-04-11 13:37:16 -04:00
Zachary Mueller
d4b3e359aa
Don't push checkpoints to hub in `no_trainer` scripts ( #16703 )
...
Adds checkpoint prefixes to the gitignore if `push_to_hub` is used along with `checkpointing_steps`
2022-04-11 12:42:45 -04:00
Yih-Dar
c04619ecf3
Enable more test_torchscript ( #16679 )
...
* update _create_and_check_torchscript
* Enable test_torchscript
* clear_class_registry
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:23:35 +02:00
Yih-Dar
3918d6a9d6
Reduce memory leak in _create_and_check_torchscript ( #16691 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:22:28 +02:00
Yih-Dar
2109afae71
Rename the method test_torchscript ( #16693 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:21:45 +02:00
Yih-Dar
40618ec29e
Fix TF_MASKED_LM_SAMPLE ( #16698 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:19:28 +02:00
Suraj Patil
1471857f13
update decoder_vocab_size when resizing embeds ( #16700 )
2022-04-11 18:02:10 +02:00