Commit Graph

8805 Commits

Author SHA1 Message Date
Sylvain Gugger
6292532fd1
Update doc writing guide (#15350) 2022-01-26 12:54:11 -05:00
François REMY
19732cc07a
Fix 'eval_split_name' described as defaulting to 'train' (#15348)
The default is correct (`test`) but the description is not.
2022-01-26 10:19:38 -05:00
Ngo Quang Huy
5d8b98608c
Fix deepspeed docs (#15346) 2022-01-26 07:24:33 -05:00
Jacob Deppen
96161ac408
make table into valid Markdown table syntax (#15337) 2022-01-26 07:10:00 -05:00
Yih-Dar
24e2fa1590
Fix encoder-decoder models when labels is passed (#15172)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-26 10:14:46 +01:00
Maciej Pawłowski
e79a0faeae
Added missing code in exemplary notebook - custom datasets fine-tuning (#15300)
* Added missing code in exemplary notebook - custom datasets fine-tuning

Added missing code in tokenize_and_align_labels function in the exemplary notebook on custom datasets - token classification.
The missing code concerns adding labels for all but first token in a single word.
The added code was taken directly from huggingface official example - this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible
2022-01-25 17:26:17 -05:00
Steven Liu
0501beb846
Add 🤗 Accelerate tutorial (#15263)
* add accelerate tutorial

* 🖍 apply feedback from review

* 📝 make edits
2022-01-25 13:46:11 -06:00
NielsRogge
637e81752a
[Tests] Fix test (#15324)
* Fix Swin device

* Remove print statement
2022-01-25 15:48:25 +01:00
Sylvain Gugger
e695470794
Avoid using get_list_of_files (#15287)
* Avoid using get_list_of_files in config

* Wip, change tokenizer file getter

* Remove call in tokenizer files

* Remove last call to get_list_model_files

* Better tests

* Unit tests for new function

* Document bad API
2022-01-25 09:41:21 -05:00
Sylvain Gugger
e65bfc0971 Try without bad instruction 2022-01-24 15:55:29 -05:00
Sylvain Gugger
81156d20cd
Add model like (#14992)
* Add new model like command

* Bad doc-styler

* black and doc-styler, stop fighting!

* black and doc-styler, stop fighting!

* At last

* Clean up

* Typo

* Bad doc-styler

* Bad doc-styler

* All good maybe?

* Use constants

* Add doc and type hints

* More cleaning

* Add doc

* Fix Copied from

* Doc template

* Use typing.Pattern instead

* Framework-specific files

* Fixes

* Select frameworks clean model init

* Deal with frameworks in main init

* fixes

* Last fix

* Prompt user for info

* Delete exemple config

* Last fixes

* Add test config

* Fix bug with model_type included in each other

* Fixes

* More fixes

* More fixes

* Adapt config

* Remove print statements

* Will fix tokenization later, leave it broken for now

* Add test

* Quality

* Try this way

* Debug

* Maybe by setting the path?

* Let's try another way

* It should go better when actually passing the arg...

* Remove debug statements and style

* Fix config

* Add tests

* Test require the three backends

* intermediate commit

* Revamp pattern replacements and start work on feature extractors

* Adapt model info

* Finalize code for processors

* Fix in main init additions

* Finish questionnaire for processing classes

* Fix file name

* Fix for real

* Fix patterns

* Style

* Remove needless warnings

* Copied from should work now.

* Include Copied form in blocks

* Add test

* More fixes and tests

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comment

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-01-24 15:25:10 -05:00
Patrick von Platen
457dd4392b
[Examples] Correct run ner label2id for fine-tuned models (#15017)
* up

* up

* make style

* apply sylvains suggestions

* apply changes to accelerate as well

* more changes

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-24 21:18:04 +01:00
Patrick von Platen
8d6acc6c29
[Beam Search] Correct returned beam scores (#14654)
* better

* save intermediate

* finish code

* up

* docs

* Apply suggestions from code review

* up

* add compute transition  beam scores function to model and make sure scores are correct with eos

* apply nicos comments

* Apply suggestions from code review

* another fix
2022-01-24 21:13:21 +01:00
novice
e239fc3b0b
Replace NystromformerTokenizer with AutoTokenizer (#15312) 2022-01-24 16:33:43 +01:00
Patrick von Platen
dcaa5100c9
[LayoutLMV2 Tests] Make sure input is on GPU (#15314)
* [LayoutLMV2 Tests] Make sure input is on GPU

* correct empty line
2022-01-24 15:54:47 +01:00
Yih-Dar
c15bb3fe19
[Fix doc example] fix missing import jnp (#15291)
* fix missing import jnp

* Fix missing jax and k=1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-24 14:54:23 +01:00
Nicolas Patry
eac4aecc3d
Remove old debug code leftover. (#15306) 2022-01-24 07:27:45 -05:00
Sylvain Gugger
2390b2cf65
Fix a typo in tag addition (#15286)
* Fix a typo in tag addition

* Put it back again
2022-01-24 07:21:42 -05:00
Kamal Raj
c972433a85
Update CONTRIBUTING.md (#15290)
Fix typo in doc
2022-01-24 07:21:31 -05:00
Patrick von Platen
4bf97415a4
Update eval.py (#15310) 2022-01-24 11:46:38 +01:00
Patrick von Platen
b7cb126ccc
[PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests (#15272)
* [PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests

* Update .github/workflows/self-nightly-scheduled.yml

* change lines

* Apply suggestions from code review
2022-01-24 10:53:53 +01:00
Sylvain Gugger
6ac77534bf
Refine errors for pretrained objects (#15261)
* Refine errors for pretrained objects

* PoC to avoid using get_list_of_files

* Adapt tests to use new errors

* Quality + Fix PoC

* Revert "PoC to avoid using get_list_of_files"

This reverts commit cb93b7cae8.

* Revert "Quality + Fix PoC"

This reverts commit 3ba6d0d4ca.

* Fix doc

* Revert PoC

* Add feature extractors

* More tests and PT model

* Adapt error message

* Feature extractor tests

* TF model

* Flax model and test

* Merge flax auto tests

* Add tokenization

* Fix test
2022-01-21 15:00:09 -05:00
Patrick von Platen
80af1048cf
[Wav2Vec2ProcessorWithLM] improve multi processing (#15247)
* [Wav2Vec2ProcessorWithLM] improve multi processing

* close pool
2022-01-21 18:30:10 +01:00
Sylvain Gugger
4cff3fae11 Second failing test 2022-01-21 12:19:28 -05:00
Sylvain Gugger
f6253147df Skip failing test 2022-01-21 12:03:21 -05:00
Yih-Dar
7799b6128f
[Fix doc example] TFLayoutLMForTokenClassification: missing import tf (#15268)
* fix import

* remove import torch

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-21 11:18:11 -05:00
Patrick von Platen
11afb709ec
[Robust Speech Challenge] Add timeline (#15274) 2022-01-21 17:12:09 +01:00
Evandros
3c3cf17a49
fix link (#15278) 2022-01-21 09:52:13 -05:00
Ye Wang
95a75a715f
Specify providers explicitly in ORT session initialization (#15235)
* Specify providers explicitly in ORT session initialization

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-01-21 15:49:29 +01:00
lewtun
833635e259
Move BART + ONNX example to research_projects (#15271)
* Move BART + ONNX example to research_projects

* Add author information
2022-01-21 14:47:34 +01:00
novice
183ce067e0
Fix (#15276)
* Fix

* make style

* Remove trailing commas

* make style
2022-01-21 08:46:15 -05:00
lewtun
b4ce313e6c
Prepare ONNX export for torch v1.11 (#15270)
* Prepare ONNX export for torch v1.11
2022-01-21 14:28:19 +01:00
Sylvain Gugger
126bddd1ba Add module_spec to new model 2022-01-21 08:12:44 -05:00
Jonas Kuball
c962c2adbf
Adds missing module_specs for usages of _LazyModule (#15230)
* Add missing __spec__ for transformers.models.auto

* Moves the __spec__-test to the UnitTest class

* Adds module_spec to all instances of _LazyModule

* Refactors an old test from pytest to unittest
2022-01-21 07:30:12 -05:00
NielsRogge
6c7b68d414
[ViTMAE] Add image pretraining script (#15242)
* Add script

* Improve script

* Fix data collator

* Update README

* Add label_names argument

* Apply suggestions from code review

* Add config parameters

* Update script

* Fix bug

* Improve README

* Improve README and add test

* Fix import

* Add image_column_name
2022-01-21 12:11:08 +01:00
novice
d43e308e7f
Add Swin Transformer (#15085)
* Add all files

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Updates

* Apply suggestions from review

* Fix failing tests

* Update __init__.py

* Update configuration_swin.py

* Update auto_factory.py

* Fix pytests

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix tests and default checkpoint

* Fix Recursion error

* Code quality

* Remove copied from

* Update modeling_swin.py

* Code quality

* Update modeling_swin.py

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

* Fix feature extractor

* Fix code quality

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

* Update configuration_swin.py

* Update default checkpoint

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/model_doc/swin.mdx

Co-authored-by: Mishig Davaadorj <mishig.davaadorj@coloradocollege.edu>

* Update conversion script

* Reformat conversion script

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Mishig Davaadorj <mishig.davaadorj@coloradocollege.edu>
2022-01-21 12:10:41 +01:00
NielsRogge
515ed3ad2a
Fix doc examples (#15257) 2022-01-20 21:51:51 +01:00
Lysandre Debut
ad7390636d
Tentative workflow improvement (#15255) 2022-01-20 13:51:19 -05:00
Matt
57820456bd
Fix crash when logs are empty because Keras has wiped them out of spite (#15258) 2022-01-20 18:40:48 +00:00
kumapo
1fc0fa4617
Make sure to raise NotImplementedError with correct method name (#15253) 2022-01-20 10:37:35 -05:00
Matt
f00f22a3e2
Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels (#15234)
* Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels

* Add test for numpy scalar inputs
2022-01-20 14:26:51 +00:00
Yih-Dar
4a6a35bc65
[Fix doc example] missing import (#15240)
* fix import

* fix style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-20 08:47:24 -05:00
Kamal Raj
08b41b413a
Update pipelines.mdx (#15243)
fix few spelling mistakes
2022-01-20 08:46:48 -05:00
Anton Lozhkov
85ea462c08
Update README.md (#15246)
Clarify OVH instruction
2022-01-20 13:40:26 +03:00
Anton Lozhkov
e57468b8a8
Update README.md (#15239)
Add an OVHcloud tutorial URL for the Robust Speech Challenge
2022-01-20 11:46:50 +03:00
jsnfly
baf1ebe9f0
Fix usage of additional kwargs in from_encoder_decoder_pretrained in encoder-decoder models (#15056)
* [EncoderDecoder] Add test for usage of extra kwargs

* [EncoderDecoder] Fix usage of extra kwargs in from pretrained

* [EncoderDecoder] apply suggested changes (passing **kwargs_encoder)

* [EncoderDecoder] create new test function and make sure it passes

Co-authored-by: jonas <jsnfly@gmx.de>
2022-01-19 23:00:33 +01:00
Nicolas Patry
3fefee9910
Make chuking smartly (long files) work on asr ctc_with_lm. (#15219)
* [WIP] Make chuking smartly (long files) work on asr ctc_with_lm.

* Slow test with functionality.

* Fixing regular test.

* fix for batch size 1

* Handling batch outside `rescale_Stride`.

- Renamed to `rescale_stride`.

* Disable equality in the test.

* Remove print.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-19 21:04:26 +01:00
NielsRogge
80f7296091
Update Trainer code example (#15070)
* Update code example

* Fix code quality

* Add comment
2022-01-19 20:15:12 +01:00
NielsRogge
ac227093e4
Add ViLT (#14895)
* First commit

* Add conversion script

* Make conversion script work for base model

* More improvements

* Update conversion script, works for vqa

* Add indexing argument to meshgrid

* Make conversion script work for ViltForPreTraining

* Add ViltForPreTraining to docs

* Fix device issue

* Add processor

* Add MinMaxResize to feature extractor

* Implement call method of ViltProcessor

* Fix tests

* Add integration test

* Add loss calculation for VQA

* Improve tests

* Improve some more tests

* Debug tests

* Small improvements

* Add support for attention_mask

* Remove mask_it

* Add pixel_mask

* Add tests for ViltFeatureExtractor

* Improve tests

* Add ViltForNaturalLanguageVisualReasoning

* Add ViltForNaturalLanguageVisualReasoning to conversion script

* Minor fixes

* Add support for image_embeds, update docstrings to markdown

* Update docs to markdown

* Improve conversion script

* Rename ViltForPreTraining to ViltForMaskedLM

* Improve conversion script

* Convert docstrings to markdown

* Fix code example of retrieval model

* Properly convert masked language model

* Add integration test for nlvr

* Fix code quality

* Apply suggestions from code review

* Add copied from statements

* Fix pretrained_config_archive_map

* Fix docs

* Add model to README

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply more suggestions from code review

* Make code more readable

* Add ViltForNaturalLanguageVisualReasoning to the tests

* Rename ViltForVisualQuestionAnswering to ViltForQuestionAnswering

* Replace pixel_values_2 by single tensor

* Add hidden_states and attentions

* Fix one more test

* Fix all tests

* Update year

* Fix rebase issues

* Fix another rebase issue

* Remove ViltForPreTraining from auto mapping

* Rename ViltForImageRetrievalTextRetrieval to ViltForImageAndTextRetrieval

* Make it possible to use BertTokenizerFast in the processor

* Use BertTokenizerFast by default

* Rename ViltForNaturalLanguageVisualReasoning, define custom model output

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-19 19:51:59 +01:00
Patrick von Platen
691878ee2f
Update README.md (#15233) 2022-01-19 18:03:17 +01:00