* move the helper code into testing_utils
* port test_trainer_distributed to work with pytest
* improve docs
* simplify notes
* doc
* doc
* style
* doc
* further improvements
* torch might not be available
* real fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* New run_clm script
* Formatting
* More comments
* Remove unused imports
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Address review comments
* Change link to the hub
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* better reports
* a whole bunch of reports in their own files
* clean up
* improvements
* github artifacts experiment
* style
* complete the report generator with multiple improvements/fixes
* fix
* save all reports under one dir to easy upload
* can remove temp failing tests
* doc fix
* some cleanup
* Make Seq2Seq Trainer more similar to Trainer
* fix typo
* fix seq2seq trainer
* remove from tests
* remove lock
* remove train files
* delete test files
* correct typo
* check at init
* make sure trainer is not slowed down on TPU
* correct isort
* remove use cache
* fix use cache
* add last use chache = false
Updating the run_squad training script to handle the "longformer" `model_type`. The longformer is trained in the same was as RoBERTa, so I've added the "longformer" `model_type` (that's the right hugginface name for the LongFormer model, right?) everywhere there was a "roberta" `model_type` reference. The longformer (like RoBERTa) doesn't use `token_type_ids` (as I understand from looking at the [longformer notebook](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb), which is what gets updated after this change.
This fix might be related to [this issue](https://github.com/huggingface/transformers/issues/7249) with SQuAD training when using run_squad.py
* Start simplification
* More progress
* Finished script
* Address comments and update tests instructions
* Wrong test
* Accept files as inputs and fix test
* Update src/transformers/trainer_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Fix labels and add combined score
* Add special labels
* Update TPU command
* Revert to old label strategy
* Use model labels
* Fix for STT-B
* Styling
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Code styling
* Fix review comments
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* add CustomHFIndex
* typo in config
* update tests
* add custom dataset example
* clean script
* update test data
* minor in test
* docs
* docs
* style
* fix imports
* allow to pass the indexed dataset directly
* update tests
* use multiset DPR
* address thom and patrick's comments
* style
* update dpr tokenizer
* add output_dir flag in use_own_knowledge_dataset.py
* allow custom datasets in examples/rag/finetune.py
* add test for custom dataset in distributed rag retriever
* splitting fast and slow tokenizers [WIP]
* [WIP] splitting sentencepiece and tokenizers dependencies
* update dummy objects
* add name_or_path to models and tokenizers
* prefix added to file names
* prefix
* styling + quality
* spliting all the tokenizer files - sorting sentencepiece based ones
* update tokenizer version up to 0.9.0
* remove hard dependency on sentencepiece 🎉
* and removed hard dependency on tokenizers 🎉
* update conversion script
* update missing models
* fixing tests
* move test_tokenization_fast to main tokenization tests - fix bugs
* bump up tokenizers
* fix bert_generation
* update ad fix several tokenizers
* keep sentencepiece in deps for now
* fix funnel and deberta tests
* fix fsmt
* fix marian tests
* fix layoutlm
* fix squeezebert and gpt2
* fix T5 tokenization
* fix xlnet tests
* style
* fix mbart
* bump up tokenizers to 0.9.2
* fix model tests
* fix tf models
* fix seq2seq examples
* fix tests without sentencepiece
* fix slow => fast conversion without sentencepiece
* update auto and bert generation tests
* fix mbart tests
* fix auto and common test without tokenizers
* fix tests without tokenizers
* clean up tests lighten up when tokenizers + sentencepiece are both off
* style quality and tests fixing
* add sentencepiece to doc/examples reqs
* leave sentencepiece on for now
* style quality split hebert and fix pegasus
* WIP Herbert fast
* add sample_text_no_unicode and fix hebert tokenization
* skip FSMT example test for now
* fix style
* fix fsmt in example tests
* update following Lysandre and Sylvain's comments
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial callback proposal
* Finish various callbacks
* Post-rebase conflicts
* Fix tests
* Don't use something that's not set
* Documentation
* Remove unwanted print.
* Document all models can work
* Add tests + small fixes
* Update docs/source/internal/trainer_utils.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Fix TF tests
* Real fix this time
* This one should work
* Fix typo
* Really fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Changed name to all no_... arguments and all references to them, inverting the boolean condition
* Change benchmark tests to use new Benchmark Args
* Update src/transformers/benchmark/benchmark_args_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/benchmark/benchmark.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix Style. Add --no options in help
* fix some part of tests
* Update src/transformers/benchmark/benchmark_args_utils.py
* Update src/transformers/benchmark/benchmark_args_utils.py
* Update src/transformers/benchmark/benchmark_args_utils.py
* fix all tests
* make style
* add backwards compability
* make backwards compatible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* Formatting / renaming prior to actual work
* First commit
* improve comments
* Retrieval evaluation scripts
* refactor to include modeling outputs + MPI retriever
* Fix rag-token model + refactor
* Various fixes + finetuning logic
* use_bos fix
* Retrieval refactor
* Finetuning refactoring and cleanup
* Add documentation and cleanup
* Remove set_up_rag_env.sh file
* Fix retrieval wit HF index
* Fix import errors
* Fix quality errors
* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
* fix quality
* Fix RAG Sequence generation
* minor cleanup plus initial tests
* fix test
* fix tests 2
* Comments fix
* post-merge fixes
* Improve readme + post-rebase refactor
* Extra dependencied for tests
* Fix tests
* Fix tests 2
* Refactor test requirements
* Fix tests 3
* Post-rebase refactor
* rename nlp->datasets
* RAG integration tests
* add tokenizer to slow integration test and allow retriever to run on cpu
* add tests; fix position ids warning
* change structure
* change structure
* add from encoder generator
* save working solution
* make all integration tests pass
* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained
* don't save paths
* delete unnecessary imports
* pass config to AutoTokenizer.from_pretrained for Rag tokenizers
* init wiki_dpr only once
* hardcode legacy index and passages paths (todo: add the right urls)
* finalize config
* finalize retriver api and config api
* LegacyIndex index download refactor
* add dpr to autotokenizer
* make from pretrained more flexible
* fix ragfortokengeneration
* small name changes in tokenizer
* add labels to models
* change default index name
* add retrieval tests
* finish token generate
* align test with previous version and make all tests pass
* add tests
* finalize tests
* implement thoms suggestions
* add first version of test
* make first tests work
* make retriever platform agnostic
* naming
* style
* add legacy index URL
* docstrings + simple retrieval test for distributed
* clean model api
* add doc_ids to retriever's outputs
* fix retrieval tests
* finish model outputs
* finalize model api
* fix generate problem for rag
* fix generate for other modles
* fix some tests
* save intermediate
* set generate to default
* big refactor generate
* delete rag_api
* correct pip faiss install
* fix auto tokenization test
* fix faiss install
* fix test
* move the distributed logic to examples
* model page
* docs
* finish tests
* fix dependencies
* fix import in __init__
* Refactor eval_rag and finetune scripts
* start docstring
* add psutil to test
* fix tf test
* move require torch to top
* fix retrieval test
* align naming
* finish automodel
* fix repo consistency
* test ragtokenizer save/load
* add rag model output docs
* fix ragtokenizer save/load from pretrained
* fix tokenizer dir
* remove torch in retrieval
* fix docs
* fixe finetune scripts
* finish model docs
* finish docs
* remove auto model for now
* add require torch
* remove solved todos
* integrate sylvains suggestions
* sams comments
* correct mistake on purpose
* improve README
* Add generation test cases
* fix rag token
* clean token generate
* fix test
* add note to test
* fix attention mask
* add t5 test for rag
* Fix handling prefix in finetune.py
* don't overwrite index_name
Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
* ready for PR
* cleanup
* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST
* fix
* perfectionism
* revert change from another PR
* odd, already committed this one
* non-interactive upload workaround
* backup the failed experiment
* store langs in config
* workaround for localizing model path
* doc clean up as in https://github.com/huggingface/transformers/pull/6956
* style
* back out debug mode
* document: run_eval.py --num_beams 10
* remove unneeded constant
* typo
* re-use bart's Attention
* re-use EncoderLayer, DecoderLayer from bart
* refactor
* send to cuda and fp16
* cleanup
* revert (moved to another PR)
* better error message
* document run_eval --num_beams
* solve the problem of tokenizer finding the right files when model is local
* polish, remove hardcoded config
* add a note that the file is autogenerated to avoid losing changes
* prep for org change, remove unneeded code
* switch to model4.pt, update scores
* s/python/bash/
* missing init (but doesn't impact the finetuned model)
* cleanup
* major refactor (reuse-bart)
* new model, new expected weights
* cleanup
* cleanup
* full link
* fix model type
* merge porting notes
* style
* cleanup
* have to create a DecoderConfig object to handle vocab_size properly
* doc fix
* add note (not a public class)
* parametrize
* - add bleu scores integration tests
* skip test if sacrebleu is not installed
* cache heavy models/tokenizers
* some tweaks
* remove tokens that aren't used
* more purging
* simplify code
* switch to using decoder_start_token_id
* add doc
* Revert "major refactor (reuse-bart)"
This reverts commit 226dad15ca.
* decouple from bart
* remove unused code #1
* remove unused code #2
* remove unused code #3
* update instructions
* clean up
* move bleu eval to examples
* check import only once
* move data+gen script into files
* reuse via import
* take less space
* add prepare_seq2seq_batch (auto-tested)
* cleanup
* recode test to use json instead of yaml
* ignore keys not needed
* use the new -y in transformers-cli upload -y
* [xlm tok] config dict: fix str into int to match definition (#7034)
* [s2s] --eval_max_generate_length (#7018)
* Fix CI with change of name of nlp (#7054)
* nlp -> datasets
* More nlp -> datasets
* Woopsie
* More nlp -> datasets
* One last
* extending to support allen_nlp wmt models
- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models
* sync with changes
* s/fsmt-wmt/wmt/ in model names
* s/fsmt-wmt/wmt/ in model names (p2)
* s/fsmt-wmt/wmt/ in model names (p3)
* switch to a better checkpoint
* typo
* make non-optional args such - adjust tests where possible or skip when there is no other choice
* consistency
* style
* adjust header
* cards moved (model rename)
* use best custom hparams
* update info
* remove old cards
* cleanup
* s/stas/facebook/
* update scores
* s/allen_nlp/allenai/
* url maps aren't needed
* typo
* move all the doc / build /eval generators to their own scripts
* cleanup
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix indent
* duplicated line
* style
* use the correct add_start_docstrings
* oops
* resizing can't be done with the core approach, due to 2 dicts
* check that the arg is a list
* style
* style
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL
* Some changes are still to be done
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL (2)
* Removed comments
* Fixed quality
* Changed warning to info
* adding demo
* Update examples/lxmert/requirements.txt
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update examples/lxmert/checkpoint.sh
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* added user input for .py demo
* updated model loading, data extrtaction, checkpoints, and lots of other automation
* adding normalizing for bounding boxes
* Update requirements.txt
* some optimizations for extracting data
* added data extracting file
* added data extraction file
* minor fixes to reqs and readme
* Style
* remove options
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add cache_dir to save features TextDataset
This is in case the dataset is in a RO filesystem, for which is the case
in tests (GKE TPU tests).
* style
* Allow tests in examples to use cuda or fp16,if they are available
The tests in examples didn't use the cuda or fp16 even if they where available.
- The text classification example (`run_glue.py`) didn't use the fp16 even if it was available but
the device was take based on the availablity(cuda/cpu).
- The language-modeling example (`run_language_modeling.py`) was having `--no_cuda` argument
which made the test to work without cuda. This example is having issue when running with fp16
thus it not enabled (got an assertion error for perplexity due to it higher value).
- The cuda and fp16 is not enabled for question-answering example (`run_squad.py`) as it is having a
difference in the f1 score.
- The text-generation example (`run_generation.py`) will take the cuda or fp16 whenever it is available.
Resolves some of: #5057
* Unwanted import of is_apex_available was removed
* Made changes to test examples file to have the pass --fp16 only if cuda and apex is avaliable
- run_glue.py: Removed the check for cuda and fp16.
- run_generation.py: Removed the check for cuda and fp16 also removed unwanted flag creation.
* Incorrectly sorted imports fixed
* The model needs to be converted to half precision
* Formatted single line if condition statement to multiline
* The torch_device also needed to be checked before running the test on examples
- The tests in examples which uses cuda should also depend from the USE_CUDA flag,
similarly to the rest of the test suite. Even if we decide to set USE_CUDA to
True by default, setting USE_CUDA to False should result in the examples not using CUDA
* Format some of the code in test_examples file
* The improper import of is_apex_available was sorted
* Formatted the code to keep the style standards
* The comma at the end of list giving a flake8 issue was fixed
* Import sort was fixed
* Removed the clean_test_dir function as its not used right now
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs
* respect after=True for tempfile, simplify code
* comments
* comment fix
* put `before` last in args, so can make debug even faster
* Add more token classification examples
* POS tagging example
* Phrase chunking example
* PR review fixes
* Add conllu to third party list (used in token classification examples)
* replace capsys with the more refined CaptureStderr/CaptureStdout
* Update examples/seq2seq/test_seq2seq_examples.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner
* Add metadata
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Temporarily de-activate TPU CI
* Update modeling_tf_utils.py (#6372)
fix typo: ckeckpoint->checkpoint
* the test now works again (#6371)
* correct pl link in readme (#6364)
* refactor almost identical tests (#6339)
* refactor almost identical tests
* important to add a clear assert error message
* make the assert error even more descriptive than the original bt
* Small docfile fixes (#6328)
* Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}
* Patch models
* BERT and TF BERT info
s
* Update check_repo
* Ci GitHub caching (#6382)
* Cache Github Actions CI
* Remove useless file
* Colab button (#6389)
* Add colab button
* Add colab link for tutorials
* Fix links for open in colab (#6391)
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove dup (leftover from merge)
* convert the test into the new refactored format
* stick to using the current_step as is, without ++
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* add pl_glue example test
* for now just test that it runs, next validate results of eval or predict?
* complete the run_pl_glue test to validate the actual outcome
* worked on my machine, CI gets less accuracy - trying higher epochs
* match run_pl.sh hparms
* more epochs?
* trying higher lr
* for now just test that the script runs to a completion
* correct the comment
* if cuda is available, add --fp16 --gpus=1 to cover more bases
* style
as discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e. the results are meaningless).
* Support for Comet.ml
* Need to import comet first
* Log this model, not the one in the backprop step
* Log args as hyperparameters; use framework to allow fine control
* Log hyperparameters with context
* Apply black formatting
* isort fix integrations
* isort fix __init__
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer_tf.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address review comments
* Style + Quality, remove Tensorboard import test
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* support --lr_scheduler with multiple possibilities
* correct the error message
* add a note about supported schedulers
* cleanup
* cleanup2
* needs the argument default
* style
* add another assert in the test
* implement requested changes
* cleanups
* fix relative import
* cleanup
* enable easy checkout switch
allow having multiple repository checkouts and not needing to remember to rerun 'pip install -e .[dev]' when switching between checkouts and running tests.
* make isort happy
* examples needs one too
* Switch from return_tuple to return_dict
* Fix test
* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests
* AutoModels
Tiny tweaks
* Style
* Final changes before merge
* Re-order for simpler review
* Final fixes
* Addressing @sgugger's comments
* Test MultipleChoice
* Rework TF trainer (#6038)
* Fully rework training/prediction loops
* fix method name
* Fix variable name
* Fix property name
* Fix scope
* Fix method name
* Fix tuple index
* Fix tuple index
* Fix indentation
* Fix variable name
* fix eval before log
* Add drop remainder for test dataset
* Fix step number + fix logging datetime
* fix eval loss value
* use global step instead of step + fix logging at step 0
* Fix logging datetime
* Fix global_step usage
* Fix breaking loop + logging datetime
* Fix step in prediction loop
* Fix step breaking
* Fix train/test loops
* Force TF at least 2.2 for the trainer
* Use assert_cardinality to facilitate the dataset size computation
* Log steps per epoch
* Make tfds compliant with TPU
* Make tfds compliant with TPU
* Use TF dataset enumerate instead of the Python one
* revert previous commit
* Fix data_dir
* Apply style
* rebase on master
* Address Sylvain's comments
* Address Sylvain's and Lysandre comments
* Trigger CI
* Remove unused import
* Switch from return_tuple to return_dict
* Fix test
* Add recent model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
* Fully rework training/prediction loops
* fix method name
* Fix variable name
* Fix property name
* Fix scope
* Fix method name
* Fix tuple index
* Fix tuple index
* Fix indentation
* Fix variable name
* fix eval before log
* Add drop remainder for test dataset
* Fix step number + fix logging datetime
* fix eval loss value
* use global step instead of step + fix logging at step 0
* Fix logging datetime
* Fix global_step usage
* Fix breaking loop + logging datetime
* Fix step in prediction loop
* Fix step breaking
* Fix train/test loops
* Force TF at least 2.2 for the trainer
* Use assert_cardinality to facilitate the dataset size computation
* Log steps per epoch
* Make tfds compliant with TPU
* Make tfds compliant with TPU
* Use TF dataset enumerate instead of the Python one
* revert previous commit
* Fix data_dir
* Apply style
* rebase on master
* Address Sylvain's comments
* Address Sylvain's and Lysandre comments
* Trigger CI
* Remove unused import