Commit Graph

7165 Commits

Author SHA1 Message Date
Sylvain Gugger
700229f8a4
Fixes in the templates (#10951)
* Fixes in the templates

* Define in all cases

* Dimensionality -> Dimension

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-29 17:36:13 -04:00
Stas Bekman
05c966f24b
[vulnerability] dep fix (#10954)
Fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/Pygments/open

@LysandreJik
2021-03-29 17:25:47 -04:00
Stas Bekman
fb7fca718a
[trainer metrics] fix cpu mem metrics; reformat runtime metric (#10937)
* fix cpu mem metrics; reformat runtime metric

* adjust dependency

* extend docs

* soft dependency

* cleanup

* fix the runtime metric issue

* restore

* move docs, cross reference from 2 places, improve

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-29 13:47:02 -07:00
Daniel Stancl
5057213bcc
Add examples/multiple-choice/run_swag_no_trainer.py (#10934)
* Initial commit

* Another bunch of updates

* make style quliaty + delete debug arg from bash script

* Use compue_metrics func

* Do a few fixes

* Add copyright

* Fix typos
2021-03-29 16:41:09 -04:00
pcuenca
ae6b6963ad
Allow use of pre-computed lengths when grouping by length. (#10953)
A new argument `length_column_name` has been added to
`TrainingArguments`, with default value `"length"`. If this column
exists and `group_by_length` is `True`, the train sampler will use
it for grouping rather than computing it before training starts.

This is an optimization that allows the user to prepare data for fast
processing, preventing sequential access to the dataset as described in
issue #10909.
2021-03-29 15:44:19 -04:00
Sylvain Gugger
4002f95eb6 Remove duplicate code 2021-03-29 15:27:12 -04:00
Daniel Stancl
d7b50ce469
Add examples/run_ner_no_trainer.py (#10902)
* Add NER example with accelerate library

* This commit contains the first (yet really unfinished)
version of a script for showing how to train HuggingFace model
with their new accelerate library.

* Fix metric calculation

* make style quality

* mv ner_no_trainer to token-classification dir

* Delete --debug flag from running script

* hf_datasets -> raw_datasets

* Make a few slight adjustments

* Add an informative comment + rewrite a help comment

* Change header

* Fix a few things

* Enforce to use fast tokenizers only

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Change bash script: python3 -> accelerate launch

* make style

* Add a few missing things (see below)

* Add a max-lenghth padding to predictions and labels to
enable accelerate gather functionality

* Add PyTorch no trainer example to the example README.md

* Remove --do-train from args as being redundant for now

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Remove some obsolete args.do_train conditions from the script

* Delete --do_train from bash running script

* Delete use_slow_tokenizer from args

* Add unintentionally removed flag --label_all_tokens

* Delete --debug flag from running script
2021-03-29 15:11:23 -04:00
Sylvain Gugger
06a6fea782
Instantiate model only once in pipeline (#10888)
* Instantiate model only once in pipeline

* Remove documentation of deprecated method

* Add FutureWarning

* Update src/transformers/pipelines/base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-29 10:39:14 -04:00
Masatoshi Suzuki
cc2366bbb9
Ignore not initialized NO_CONFIG_TOKENIZERs (#10936) 2021-03-29 10:26:15 -04:00
WybeKoper
ddea8771c6
Updated colab links in readme of examples (#10932)
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-29 08:47:09 -04:00
Guillaume Filion
b3544e4cc5
Return global attentions (see #7514) (#10906) 2021-03-29 15:00:23 +03:00
Bhadresh Savani
4f21e1ddd6
fixed finename (#10939) 2021-03-28 09:48:12 -07:00
Sylvain Gugger
b0595d33c1
Add ImageFeatureExtractionMixin (#10905)
* Add ImageFeatureExtractionMixin

* Add dummy vision objects

* Add require_vision

* Add tests

* Fix test
2021-03-26 11:23:56 -04:00
Stas Bekman
3c27d246e5
[vulnerability] fix dependency (#10914)
this PR fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/PyYAML/open
2021-03-26 09:06:11 -04:00
Tomy Hsieh
4b2b50aa7b
Rename NLP library to Datasets library (#10920)
* Rename NLP library to Datasets library

* Update github template

* Fix styling
2021-03-26 08:07:59 -04:00
lexhuismans
86c6f8a8b1
Fix comment (#10886) 2021-03-25 21:23:56 +03:00
Sylvain Gugger
9856c9213d Reorder init imports 2021-03-25 12:51:43 -04:00
Sylvain Gugger
e70068a719 Fix typo 2021-03-25 12:40:25 -04:00
Sylvain Gugger
f183a7a3c3 Sort init imports 2021-03-25 12:38:54 -04:00
Amir Tahmasbi
4684bfc757
Layout lm tf 2 (#10636)
* Added embeddings layer

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Added model to doc README

* Added tests

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Fixed a typo in embeddings layer

* Removed imports

* Fixed formatting issues, imports, tests

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Removed imports

* Fixed small formatting issues

* Removed duplicates import from main __init__.py

* Chnaged deafult arg to true for adding  pooling layer to tf layoutlm

* Fixed formatting issues

* Style

* Added copied from to classes copied from bert

* Fixed doc strings examples to work with layoutlm inputs

* Removed PyTorch reference in doc strings example

* Added integration tests

* Cleaned up initialization file

* Updated model checkpoint identifiers

* Fixed imports

Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-25 12:32:38 -04:00
Philipp Schmid
1a3e0c4fe6
make local setup more clearer and added missing links (#10899) 2021-03-25 09:01:31 -04:00
Jethro Kuan
5f1491d3b3
run_glue_no_trainer: datasets -> raw_datasets (#10898)
Use the correct variable (raw_datasets) instead of the module (datasets)
where appropriate.
2021-03-25 08:28:17 -04:00
Sidd Karamcheti
1c06240e1b
Update training args ignore_skip_data -> ignore_data_skip (#10891) 2021-03-24 16:44:51 -04:00
Sylvain Gugger
3b20e910b4
Remove version warning in pretrained BART models (#10890)
* Remove version warning in pretrained BART models

* Put it at the base model
2021-03-24 15:21:40 -04:00
Lysandre Debut
3c12e3c1c4
Fix overflowing bad word ids (#10889)
* Removes overflowing bad word IDs

* Raise warning
2021-03-24 15:13:56 -04:00
Eliza Szczechla
1f5ea9e04a
Add notebook on fine-tuning Bart (#10883)
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
2021-03-24 11:03:37 -04:00
imzhengzx
f81077fcf3
error type of tokenizer in __init__ definition (#10879)
the orignal code in line 246 is
```
tokenizer: Optional["PreTrainedTokenizerBase"] = None,
```

it should be
```
tokenizer: Optional[PreTrainedTokenizerBase] = None,
```
2021-03-24 11:00:14 -04:00
Sylvain Gugger
1aed2b908e
Add new notebook links in the docs (#10876) 2021-03-24 09:45:08 -04:00
Sylvain Gugger
a735f727cc
Fix test_trainer_distributed (#10875) 2021-03-23 19:03:06 -04:00
Philipp Schmid
8c297cdb30
Sm trainer smp init fix (#10870)
* rewrote is_sagemaker_model_parallel_available

* added is_sagemaker_model_parallel_available to SageMakerTrainer

* removed unnecessary mp_parameters as TrainingArguments

* make style happy

* added mp_parameters again to parse mp-specific args.
2021-03-23 20:07:55 +01:00
RafaelWO
d4d4447d53
fixed prefix_allowed_tokens_fn docstring in generate() (#10862) 2021-03-23 13:48:22 -04:00
Bhadresh Savani
7ef40120a0
[Examples] Added predict stage and Updated Example Template (#10868)
* added predict stage

* added test keyword in exception message

* removed example specific saving predictions

* fixed f-string error

* removed extra line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-23 10:37:59 -07:00
Stas Bekman
fb2b89840b
[file_utils] import refactor (#10859)
* import refactor

* fix the fallback
2021-03-23 09:41:41 -07:00
Lysandre
3f48b2bc3e Update stable docs 2021-03-23 11:01:16 -04:00
Philipp Schmid
77ffd5edd5
Amazon SageMaker Documentation (#10867)
* added finished documentation

* changed version from 1.6 to 1.6.0 for distributed

* updated versions

* updated urls
2021-03-23 10:56:44 -04:00
Sylvain Gugger
bf1f43fbd7
Update the example template for a no Trainer option (#10865) 2021-03-23 10:02:39 -04:00
Marta Maślankowska
2eb596f085
Fix p_mask cls token masking in qa pipeline (#10863) 2021-03-23 09:08:39 -04:00
Bhadresh Savani
eb330e8904
fixed typo (#10861) 2021-03-23 08:15:28 -04:00
Stas Bekman
e21f89f64c
fix nan in full-fp16 label_smoothing eval (#10815) 2021-03-22 19:23:24 -07:00
Sylvain Gugger
b5b957a65c
Make convert_to_onnx runable as script again (#10857) 2021-03-22 22:16:39 -04:00
Patrick von Platen
77bf3fe787
[Generate] Add save mode logits processor to remove nans and infs if necessary (#10769)
* push

* finish

* finish

* make fix copies

* change name
2021-03-23 01:00:05 +03:00
Eliza Szczechla
9f8fa4e973
Use DataCollatorForSeq2Seq in run_summarization in all cases (#10856)
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
2021-03-22 15:05:39 -04:00
Ruan Chaves
a8d4d6776d
Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases (#10823)
* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.

* Reformat single quotes as double quotes.
2021-03-22 14:04:51 -04:00
Boris Dayma
125ccead71
feat(wandb): logging and configuration improvements (#10826)
* feat: ensure unique artifact id

* feat: allow manual init

* fix: simplify reinit logic

* fix: no dropped value + immediate commits

* fix: wandb use in sagemaker

* docs: improve documenation and formatting

* fix: typos

* docs: improve formatting
2021-03-22 10:45:17 -04:00
Sidd Karamcheti
b230181d41
Add simple one character fix so that on_step_begin and on_step_end are called at the right times (#10839) 2021-03-22 09:15:39 -04:00
Stas Bekman
24ab5b08a3
[makefile] autogenerate target (#10814)
* autogenerate target

* clarify comment
2021-03-22 09:14:22 -04:00
Sebastian Olsson
2c6684239f
Correct AutoConfig call docstrings (#10822) 2021-03-22 09:12:44 -04:00
Stas Bekman
8fb4671811
[vulnerability] in example deps fix (#10817)
Takes care of:
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/jinja2/open

@LysandreJik

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-22 09:05:24 -04:00
dependabot[bot]
dbfe379514
Bump jinja2 from 2.11.2 to 2.11.3 in /examples/research_projects/lxmert (#10818)
Bumps [jinja2](https://github.com/pallets/jinja) from 2.11.2 to 2.11.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/2.11.2...2.11.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-22 08:54:50 -04:00
Qiushi Pan
29904a967b
Update FINE_TUNE_XLSR_WAV2VEC2.md (#10849)
Fix typo.
2021-03-22 07:58:59 -04:00