* Fixed resize_token_embeddings for transfo_xl model
* Fixed resize_token_embeddings for transfo_xl.
Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.
* Updated docstring
* Fixed resizinhg cutoffs; added check for new size of embedding layer.
* Added test for resize_token_embeddings
* Fixed code quality
* Fixed unchanged cutoffs in model.config
Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
* ElectraForQuestionAnswering
* udate __init__
* add test for electra qa model
* add ElectraForQuestionAnswering in auto models
* add ElectraForQuestionAnswering in all_model_classes
* fix outputs, input_ids defaults to None
* add ElectraForQuestionAnswering in docs
* remove commented line
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* Fix further regressions in tests relating to `output_attentions`
Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses
* Fix more regressions in `test_output_attentions`
* Fix issues with BertEncoder
* Rename related variables to `output_attentions`
* fix pytorch tests
* fix bert and gpt2 tf
* Fix most TF tests for `test_output_attentions`
* Fix linter errors and more TF tests
* fix conflicts
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* fix pytorch tests
* fix conflicts
* fix conflicts
* Fix linter errors and more TF tests
* fix tf tests
* make style
* fix isort
* improve output_attentions
* improve tensorflow
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add tpu and torchscipt for benchmark
* fix name in tests
* "fix email"
* make style
* better log message for tpu
* add more print and info for tpu
* allow possibility to print tpu metrics
* correct cpu usage
* fix test for non-install
* remove bugus file
* include psutil in testing
* run a couple of times before tracing in torchscript
* do not allow tpu memory tracing for now
* make style
* add torchscript to env
* better name for torch tpu
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Better None gradients handling
* Apply Style
* Apply Style
* Create a loss class per task to compute its respective loss
* Add loss classes to the ALBERT TF models
* Add loss classes to the BERT TF models
* Add question answering and multiple choice to TF Camembert
* Remove prints
* Add multiple choice model to TF DistilBERT + loss computation
* Add question answering model to TF Electra + loss computation
* Add token classification, question answering and multiple choice models to TF Flaubert
* Add multiple choice model to TF Roberta + loss computation
* Add multiple choice model to TF XLM + loss computation
* Add multiple choice and question answering models to TF XLM-Roberta
* Add multiple choice model to TF XLNet + loss computation
* Remove unused parameters
* Add task loss classes
* Reorder TF imports + add new model classes
* Add new model classes
* Bugfix in TF T5 model
* Bugfix for TF T5 tests
* Bugfix in TF T5 model
* Fix TF T5 model tests
* Fix T5 tests + some renaming
* Fix inheritance issue in the AutoX tests
* Add tests for TF Flaubert and TF XLM Roberta
* Add tests for TF Flaubert and TF XLM Roberta
* Remove unused piece of code in the TF trainer
* bugfix and remove unused code
* Bugfix for TF 2.2
* Apply Style
* Divide TFSequenceClassificationAndMultipleChoiceLoss into their two respective name
* Apply style
* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameter and better dataset handling
* Fix TF optimizations tests and apply style
* Remove useless parameter
* Bugfix and apply style
* Fix TF Trainer prediction
* Now the TF models return the loss such as their PyTorch couterparts
* Apply Style
* Ignore some tests output
* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.
* Fix names for SQuAD data
* Apply Style
* Fix conflicts with 2.11 release
* Fix conflicts with 2.11
* Fix wrongname
* Add better documentation on the new create_optimizer function
* Fix isort
* logging_dir: use same default as PyTorch
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Refactor tensor creation in tokenizers.
* Make sure to convert string to TensorType
* Refactor convert_to_tensors_
* Introduce numpy tensor creation
* Format
* Add unittest for TensorType creation from str
* sorting imports
* Added unittests for numpy tensor conversion.
* Do not use in-place version for squeeze as numpy doesn't provide such feature.
* Added extra parameter prepend_batch_axis: bool on prepare_for_model.
* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.
* style.
* numpy tests require_torch for now while flax not merged.
* Hopefully will make flake8 happy.
* One more time 🎶
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
* pass on tokenizer to pipeline
* order input names when convert to onnx
* update style
* remove unused imports
* make ordered inputs list needs to be mutable
* add test custom bert model
* remove unused imports
* better api
* improve automatic setting of global attention mask
* fix longformer bug
* fix global attention mask in test
* fix global attn mask flatten
* fix slow tests
* update docstring
* update docs and make more robust
* improve attention mask
* add multiple choice for longformer
* add models to docs
* adapt docstring
* add test to longformer
* add longformer for mc in init and modeling auto
* fix tests
* added LongformerForQuestionAnswering
* add LongformerForQuestionAnswering
* fix import for LongformerForMaskedLM
* add LongformerForQuestionAnswering
* hardcoded sep_token_id
* compute attention_mask if not provided
* combine global_attention_mask with attention_mask when provided
* update example in docstring
* add assert error messages, better attention combine
* add test for longformerForQuestionAnswering
* typo
* cast gloabl_attention_mask to long
* make style
* Update src/transformers/configuration_longformer.py
* Update src/transformers/configuration_longformer.py
* fix the code quality
* Merge branch 'longformer-for-question-answering' of https://github.com/patil-suraj/transformers into longformer-for-question-answering
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adds predict stage for glue tasks, and generate result files which could be submitted to gluebenchmark.com website.
* Use Split enum + always output the label name
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* first commit
* bug fixes
* better examples
* undo padding
* remove wrong VOCAB_FILES_NAMES
* License
* make style
* make isort happy
* unit tests
* integration test
* make `black` happy by undoing `isort` changes!!
* lint
* no need for the padding value
* batch_size not bsz
* remove unused type casting
* seqlen not seq_len
* staticmethod
* `bert` selfattention instead of `n2`
* uint8 instead of bool + lints
* pad inputs_embeds using embeddings not a constant
* black
* unit test with padding
* fix unit tests
* remove redundant unit test
* upload model weights
* resolve todo
* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_
* increase unittest coverage
* Distributed eval: SequentialDistributedSampler + gather all results
* For consistency only write to disk from world_master
Close https://github.com/huggingface/transformers/issues/4272
* Working distributed eval
* Hook into scripts
* Fix#3721 again
* TPU.mesh_reduce: stay in tensor space
Thanks @jysohn23
* Just a small comment
* whitespace
* torch.hub: pip install packaging
* Add test scenarii
* Add index to be returned by NerPipeline to allow for the creation of
* Add entity groups
* Convert entity list to dict
* Add entity to entity_group_disagg atfter updating entity gorups
* Change 'group' parameter to 'grouped_entities'
* Add unit tests for grouped NER pipeline case
* Correct variable name typo for NER_FINETUNED_MODELS
* Sync grouped tests to recent test updates
* Added generic ONNX conversion script for PyTorch model.
* WIP initial TF support.
* TensorFlow/Keras ONNX export working.
* Print framework version info
* Add possibility to check the model is correctly loading on ONNX runtime.
* Remove quantization option.
* Specify ONNX opset version when exporting.
* Formatting.
* Remove unused imports.
* Make functions more generally reusable from other part of the code.
* isort happy.
* flake happy
* Export only feature-extraction for now
* Correctly check inputs order / filter before export.
* Removed task variable
* Fix invalid args call in load_graph_from_args.
* Fix invalid args call in convert.
* Fix invalid args call in infer_shapes.
* Raise exception and catch in caller function instead of exit.
* Add 04-onnx-export.ipynb notebook
* More WIP on the notebook
* Remove unused imports
* Simplify & remove unused constants.
* Export with constant_folding in PyTorch
* Let's try to put function args in the right order this time ...
* Disable external_data_format temporary
* ONNX notebook draft ready.
* Updated notebooks charts + wording
* Correct error while exporting last chart in notebook.
* Adressing @LysandreJik comment.
* Set ONNX opset to 11 as default value.
* Set opset param mandatory
* Added ONNX export unittests
* Quality.
* flake8 happy
* Add keras2onnx dependency on extras["tf"]
* Pin keras2onnx on github master to v1.6.5
* Second attempt.
* Third attempt.
* Use the right repo URL this time ...
* Do the same for onnxconverter-common
* Added keras2onnx and onnxconveter-common to 1.7.0 to supports TF2.2
* Correct commit hash.
* Addressing PR review: Optimization are enabled by default.
* Addressing PR review: small changes in the notebook
* setup.py comment about keras2onnx versioning.
* Improvements to the wandb integration
* small reorg + no global necessary
* feat(trainer): log epoch and final metrics
* Simplify logging a bit
* Fixup
* Fix crash when just running eval
Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Boris Dayma <boris.dayma@gmail.com>
* Created using Colaboratory
* [examples] reorganize files
* remove run_tpu_glue.py as superseded by TPU support in Trainer
* Bugfix: int, not tuple
* move files around
* Rewritten batch support in pipelines.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix imports sorting 🔧
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Set pad_to_max_length=True by default on Pipeline.
* Set pad_to_max_length=False for generation pipelines.
Most of generation models doesn't have padding token.
* Address @joeddav review comment: Uniformized *args.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Address @joeddav review comment: Uniformized *args (second).
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* first copy & past commit from Bert and morgans LSH code
* add easy way to compare to trax original code
* translate most of function
* make trax lsh self attention deterministic with numpy seed + copy paste code
* add same config
* add same config
* make layer init work
* implemented hash_vectors function for lsh attention
* continue reformer translation
* hf LSHSelfAttentionLayer gives same output as trax layer
* refactor code
* refactor code
* refactor code
* refactor
* refactor + add reformer config
* delete bogus file
* split reformer attention layer into two layers
* save intermediate step
* save intermediate step
* make test work
* add complete reformer block layer
* finish reformer layer
* implement causal and self mask
* clean reformer test and refactor code
* fix merge conflicts
* fix merge conflicts
* update init
* fix device for GPU
* fix chunk length init for tests
* include morgans optimization
* improve memory a bit
* improve comment
* factorize num_buckets
* better testing parameters
* make whole model work
* make lm model work
* add t5 copy paste tokenizer
* add chunking feed forward
* clean config
* add improved assert statements
* make tokenizer work
* improve test
* correct typo
* extend config
* add complexer test
* add new axial position embeddings
* add local block attention layer
* clean tests
* refactor
* better testing
* save intermediate progress
* clean test file
* make shorter input length work for model
* allow variable input length
* refactor
* make forward pass for pretrained model work
* add generation possibility
* finish dropout and init
* make style
* refactor
* add first version of RevNet Layers
* make forward pass work and add convert file
* make uploaded model forward pass work
* make uploaded model forward pass work
* refactor code
* add namedtuples and cache buckets
* correct head masks
* refactor
* made reformer more flexible
* make style
* remove set max length
* add attention masks
* fix up tests
* fix lsh attention mask
* make random seed optional for the moment
* improve memory in reformer
* add tests
* make style
* make sure masks work correctly
* detach gradients
* save intermediate
* correct backprob through gather
* make style
* change back num hashes
* rename to labels
* fix rotation shape
* fix detach
* update
* fix trainer
* fix backward dropout
* make reformer more flexible
* fix conflict
* fix
* fix
* add tests for fixed seed in reformer layer
* fix trainer typo
* fix typo in activations
* add fp16 tests
* add fp16 training
* support fp16
* correct gradient bug in reformer
* add fast gelu
* re-add dropout for embedding dropout
* better naming
* better naming
* renaming
* finalize test branch
* finalize tests
* add more tests
* finish tests
* fix
* fix type trainer
* fix fp16 tests
* fix tests
* fix tests
* fix tests
* fix issue with dropout
* fix dropout seeds
* correct random seed on gpu
* finalize random seed for dropout
* finalize random seed for dropout
* remove duplicate line
* correct half precision bug
* make style
* refactor
* refactor
* docstring
* remove sinusoidal position encodings for reformer
* move chunking to modeling_utils
* make style
* clean config
* make style
* fix tests
* fix auto tests
* pretrained models
* fix docstring
* update conversion file
* Update pretrained_models.rst
* fix rst
* fix rst
* update copyright
* fix test path
* fix test path
* fix small issue in test
* include reformer in generation tests
* add docs for axial position encoding
* finish docs
* Update convert_reformer_trax_checkpoint_to_pytorch.py
* remove isort
* include sams comments
* remove wrong comment in utils
* correct typos
* fix typo
* Update reformer.rst
* applied morgans optimization
* make style
* make gpu compatible
* remove bogus file
* big test refactor
* add example for chunking
* fix typo
* add to README
* First commit to add a TF version of the trainer.
* Make the TF trainer closer to what looks the PT trainer
* Refactoring common code between the PT and TF trainer into an util file.
* Some bugfix + better similarity with the PT trainer
* Add missing class in transformers init
* Bugfix over prediction + use classification report instead of simple metrics
* Fix name error
* Fix optimization tests + style
* Apply style
* Several bugfix for multi-gpu training
* Apply style
* Apply style
* Add glue example for the TF trainer
* Several bugix + address the reviews
* Fix on the TF training args file
* Add a debug mode
* Bugfix in utils_ner.py when segment_ids is None
* Apply style
* Apply style
* Add TPU strategy
* Fix selection strategy
There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models
When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.
I'd rather always use the default cache
* Add GenerationPipeline
* Fix parameter names
* Correct parameter __call__ parameters
* Add model type attribute and correct function calls for prepare_input
* Take out trailing commas from init attributes
* Remove unnecessary tokenization line
* Implement support for multiple text inputs
* Apply generation support for multiple input text prompts
* Take out tensor coersion
* Take out batch index
* Add text prompt to return sequence
* Squeeze token tensore before decoding
* Return only a single list of sequences if only one prompt was used
* Correct results variable name
* Add GenerationPipeline to SUPPORTED_TASKS with the alias , initalized w GPT2
* Registedred AutoModelWithLMHead for both pt and t
* Update docstring for GenerationPipeline
* Add kwargs parameter to mode.generate
* Take out kwargs parameter after all
* Add generation pipeline example in pipeline docstring
* Fix max length by squeezing tokens tensor
* Apply ensure_tensor_on_device to pytorch tensor
* Include generation step in torch.no_grad
* Take out input from prepare_xlm_input and set 'en' as default xlm_language
* Apply framework specific encoding during prepare_input
* Format w make style
* Move GenerationPipeline import to follow proper import sorting
* Take out training comma from generation dict
* Apply requested changes
* Change name to TextGenerationPipeline
* Apply TextGenerationPipeline rename to __init___
* Changing alias to
* Set input mapping as input to ensure_tensor_on_device
* Fix assertion placement
* Add test_text_generation
* Add TextGenerationPipeline to PipelineCommonTests
* Take out whitespace
* Format __init__ w black
* Fix __init__ style
* Forman __init___
* Add line to end of __init__
* Correct model tokenizer set for test_text_generation
* Ensure to return list of list, not list of string (to pass test)
* Limit test models to only 3 to limit runtime to address circleCI timeout error
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict
* Fix blank result list
* Add TextGenerationPipeline to pipelines.rst
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix typos from adding PADDING_TEXT_TOKEN_LENGTH
* Fix incorrectly moved result list
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Add back generation line and make style
* Take out blank whitespace
* Apply new alis, text-generation, to test_pipelines
* Fix text generation alias in test
* Update src/transformers/pipelines.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* doc
* [tests] Add sample files for a regression task
* [HUGE] Trainer
* Feedback from @sshleifer
* Feedback from @thomwolf + logging tweak
* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
* [glue] Use default max_seq_length of 128 like before
* [glue] move DataTrainingArguments around
* [ner] Change interface of InputExample, and align run_{tf,pl}
* Re-align the pl scripts a little bit
* ner
* [ner] Add integration test
* Fix language_modeling with API tweak
* [ci] Tweak loss target
* Don't break console output
* amp.initialize: model must be on right device before
* [multiple-choice] update for Trainer
* Re-align to 827d6d6ef0
* First pass on utility classes and python tokenizers
* finishing cleanup pass
* style and quality
* Fix tests
* Updating following @mfuntowicz comment
* style and quality
* Fix Roberta
* fix batch_size/seq_length inBatchEncoding
* add alignement methods + tests
* Fix OpenAI and Transfo-XL tokenizers
* adding trim_offsets=True default for GPT2 et RoBERTa
* style and quality
* fix tests
* add_prefix_space in roberta
* bump up tokenizers to rc7
* style
* unfortunately tensorfow does like these - removing shape/seq_len for now
* Update src/transformers/tokenization_utils.py
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
* Adding doc and docstrings
* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* remove output_past from pt
* make style
* add optional input length for gpt2
* add use cache to prepare input
* save memory in gpt2
* correct gpt2 test inputs
* make past input optional for gpt2
* finish use_cache for all models
* make style
* delete modeling_gpt2 change in test file
* correct docstring
* correct is true statements for gpt2
* [examples] Generate argparsers from type hints on dataclasses
* [HfArgumentParser] way simpler API
* Restore run_language_modeling.py for easier diff
* [HfArgumentParser] final tweaks from code review