* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helepr function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepapre model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add another models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet'
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart souce code
* Add some neccessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_mebeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patricks suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* adding vit for flax
* added test for Flax-vit and some bug-fixes
* overrided methods where variable changes were necessary for flax_vit test
* added FlaxViTForImageClassification for test
* Update src/transformers/models/vit/modeling_flax_vit.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* made changes suggested in PR
* Adding jax-vit models for autoimport
* swapping num_channels and height,width dimension
* fixing the docstring for torch-like inputs for VIT
* add model to main init
* add docs
* doc, fix-copies
* docstrings
* small test fixes
* fix docs
* fix docstr
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Added Big Bird Fast Tokenizer initial file
* style fixes
* flake fixes
* Added big bird fast tokenizer to init files
* Added big bird fast to Auto tokenization
* fix styles
* minor quality fixes
* Added initial test code
* Fix SpmConverter when precompiled_charsmap doesn't exist
* fixed post processor
* minor style fix
* minor fix input names
* Actually fix identity normalization
* style
* Added token type ids to fast tokenizer
* style
* flake fix
* fix copies
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* Add the ImageClassificationPipeline
* Code review
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Have `load_image` at the module level
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* add electra model to flax
* Remove Electra Next Sentence Prediction model added by mistake
* fix parameter sharing and loosen equality threshold
* fix styling issues
* add mistaken removen imports
* fix electra table
* Add FlaxElectra to automodels and fixe docs
* fix issues pointed out the PR
* fix flax electra to comply with latest changes
* remove stale class
* add copied from
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add flax roberta
* make style
* correct initialiazation
* modify model to save weights
* fix copied from
* fix copied from
* correct some more code
* add more roberta models
* Apply suggestions from code review
* merge from master
* finish
* finish docs
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Rebase with master
* Minor bug fix in docs
* Copy files from adding_luke_v2 and improve docs
* change the default value of use_entity_aware_attention to True
* remove word_hidden_states
* fix head models
* fix tests
* fix the conversion script
* add integration tests for the pretrained large model
* improve docstring
* Improve docs, make style
* fix _init_weights for pytorch 1.8
* improve docs
* fix tokenizer to construct entity sequence with [MASK] entity when entities=None
* Make fix-copies
* Make style & quality
* Bug fixes
* Add LukeTokenizer to init
* Address most comments by @patil-suraj and @LysandreJik
* rename _compute_extended_attention_mask to get_extended_attention_mask
* add comments to LukeSelfAttention
* fix the documentation of the tokenizer
* address comments by @patil-suraj, @LysandreJik, and @sgugger
* improve docs
* Make style, quality and fix-copies
* Improve docs
* fix docs
* add "entity_span_classification" task
* update example code for LukeForEntitySpanClassification
* improve docs
* improve docs
* improve the code example in luke.rst
* rename the classification layer in LukeForEntityClassification from typing to classifier
* add bias to the classifier in LukeForEntitySpanClassification
* update docs to use fine-tuned hub models in code examples of the head models
* update the example sentences
* Make style & quality
* Add require_torch to tokenizer tests
* Add require_torch to tokenizer tests
* Address comments by @sgugger and add community notebooks
* Make fix-copies
Co-authored-by: Ikuya Yamada <ikuya@ikuya.net>
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Start writing BERT-Japanese doc
* Fix typo, Update toctree
* Modify model file to use comment for document, Add examples
* Clean bert_japanese by make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Split a big code block into two
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add prefix >>> to all lines in code blocks
* Clean bert_japanese by make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft of deit
* More improvements
* Remove DeiTTokenizerFast from init
* Conversion script works
* Add DeiT to ViT conversion script
* Add tests, add head model, add support for deit in vit conversion script
* Update model checkpoint names
* Update image_mean and image_std, set resample to bicubic
* Improve docs
* Docs improvements
* Add DeiTForImageClassificationWithTeacher to init
* Address comments by @sgugger
* Improve feature extractors
* Make fix-copies
* Minor fixes
* Address comments by @patil-suraj
* All models uploaded
* Fix tests
* Remove labels argument from DeiTForImageClassificationWithTeacher
* Fix-copies, style and quality
* Fix tests
* Fix typo
* Multiple docs improvements
* More docs fixes
* Add a special tokenizer for CPM model
* make style
* fix
* Add docs
* styles
* cpm doc
* fix ci
* fix the overview
* add test
* make style
* typo
* Custom tokenizer flag
* Add REAMDE.md
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* AutoFeatureExtractor
* Init and first tests
* Tests
* Damn you gitignore
* Quality
* Defensive test for when not all backends are here
* Use pattern for Speech2Text models
* Squash all commits into one
* Update ViTFeatureExtractor to use image_utils instead of torchvision
* Remove torchvision and add Pillow
* Small docs improvement
* Address most comments by @sgugger
* Fix tests
* Clean up conversion script
* Pooler first draft
* Fix quality
* Improve conversion script
* Make style and quality
* Make fix-copies
* Minor docs improvements
* Should use fix-copies instead of manual handling
* Revert "Should use fix-copies instead of manual handling"
This reverts commit fd4e591bce.
* Place ViT in alphabetical order
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added embeddings layer
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Added model to doc README
* Added tests
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Fixed a typo in embeddings layer
* Removed imports
* Fixed formatting issues, imports, tests
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Removed imports
* Fixed small formatting issues
* Removed duplicates import from main __init__.py
* Chnaged deafult arg to true for adding pooling layer to tf layoutlm
* Fixed formatting issues
* Style
* Added copied from to classes copied from bert
* Fixed doc strings examples to work with layoutlm inputs
* Removed PyTorch reference in doc strings example
* Added integration tests
* Cleaned up initialization file
* Updated model checkpoint identifiers
* Fixed imports
Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I recover them back
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version is already passed all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They shoud be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
* create modeling_tf_rag
* add tests for tf
* add tf tests
* revert wrong pt commit
* further refactor
* further refactor
* refactor
* Update modeling_tf_rag.py
- input_processing
- fix prepare_input_for_generation (mostly fix generate bug)
- bring back from_pretrained hack in order to test generate
* delete colab pieces of code
* Show case of greedy "generate"
Temporarily change from beam_search test to greedy_search test to show case that TF and PT do get equivalent output.
* cosmetic update
* correct typos
* update
* push some progress
* make easy check
* fix rag save from pretrained
* Update src/transformers/modeling_tf_utils.py
* remove commented out lines
* delete unnecessary lines
* add simple test case for nq_checkpoint
Add nq_checkpoint test to show that current version without hack still fails
* temporarily put ugly hack back again
* Add TFRagSequenceForGeneration!!
* __init__.py , import TFRagSequenceForGeneration
* Add TFRagSequence tests!
* rag init.py - add TFRagSequenceForGeneration
* fix from_pretrained
* fix prepare_inputs_for_generation
* Beam search for RagToken!
* minor clean up
* add tf.cast in TFRagModel
* More tf.cast
* Add all remaining tests (still have issues)
* delete all T5 related
* make style
* fix load weight prefix
* fix bart
* fix return_dict for tf_rag
make all tests pass .. Hooray
* fix some tests
* fix code quality
* fix qualtiy check
* finish tests tf rag
* add tf rag to docs
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Delete outdated comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* improve doc strings
* add generative model classes
* fix adjust token logic
* refactor generate for TFRag
* using shape_list, not _get_shape
Co-authored-by: Julien Plu <plu.julien@gmail.com>
* axis=[1]->axis=1
* delete NEED_HELP comment
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Indicating model is in a developing state in docstrings
As suggested by Julien
* small last changes
* apply sylvains suggestions
* finish tf rag
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add LayoutLMForSequenceClassification and integration tests
Improve docs
Add LayoutLM notebook to list of community notebooks
* Make style & quality
* Address comments by @sgugger, @patrickvonplaten and @LysandreJik
* Fix rebase with master
* Reformat in one line
* Improve code examples as requested by @patrickvonplaten
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* first commit
* change phobert to phoBERT as per author in overview
* v3 and v4 both runs on same code hence there is no need to differentiate them
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* create model
* add integration
* save current state
* make integration tests pass
* add one more test
* add explanation to tests
* remove from bart
* add padding
* remove unnecessary test
* make all tests pass
* re-add cookie cutter tests
* finish PyTorch
* fix attention test
* Update tests/test_modeling_common.py
* revert change
* remove unused file
* add string to doc
* save intermediate
* make tf integration tests pass
* finish tf
* fix doc
* fix docs again
* add led to doctree
* add to auto tokenizer
* added tips for led
* make style
* apply jplus statements
* correct tf longformer
* apply lysandres suggestions
* apply sylvains suggestions
* Apply suggestions from code review
* Use extlinks to point hyperlink with the version of code
* Point to version on release and master until then
* Apply style
* Correct links
* Add missing backtick
* Simple missing backtick after all.
Co-authored-by: Raghavendra Sugeeth P S <raghav-5305@raghav-5305.csez.zohocorpin.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* First commit: adding all files from tapas_v3
* Fix multiple bugs including soft dependency and new structure of the library
* Improve testing by adding torch_device to inputs and adding dependency on scatter
* Use Python 3 inheritance rather than Python 2
* First draft model cards of base sized models
* Remove model cards as they are already on the hub
* Fix multiple bugs with integration tests
* All model integration tests pass
* Remove print statement
* Add test for convert_logits_to_predictions method of TapasTokenizer
* Incorporate suggestions by Google authors
* Fix remaining tests
* Change position embeddings sizes to 512 instead of 1024
* Comment out positional embedding sizes
* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
* Added more model names
* Fix truncation when no max length is specified
* Disable torchscript test
* Make style & make quality
* Quality
* Address CI needs
* Test the Masked LM model
* Fix the masked LM model
* Truncate when overflowing
* More much needed docs improvements
* Fix some URLs
* Some more docs improvements
* Test PyTorch scatter
* Set to slow + minify
* Calm flake8 down
* First commit: adding all files from tapas_v3
* Fix multiple bugs including soft dependency and new structure of the library
* Improve testing by adding torch_device to inputs and adding dependency on scatter
* Use Python 3 inheritance rather than Python 2
* First draft model cards of base sized models
* Remove model cards as they are already on the hub
* Fix multiple bugs with integration tests
* All model integration tests pass
* Remove print statement
* Add test for convert_logits_to_predictions method of TapasTokenizer
* Incorporate suggestions by Google authors
* Fix remaining tests
* Change position embeddings sizes to 512 instead of 1024
* Comment out positional embedding sizes
* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
* Added more model names
* Fix truncation when no max length is specified
* Disable torchscript test
* Make style & make quality
* Quality
* Address CI needs
* Test the Masked LM model
* Fix the masked LM model
* Truncate when overflowing
* More much needed docs improvements
* Fix some URLs
* Some more docs improvements
* Add add_pooling_layer argument to TapasModel
Fix comments by @sgugger and @patrickvonplaten
* Fix issue in docs + fix style and quality
* Clean up conversion script and add task parameter to TapasConfig
* Revert the task parameter of TapasConfig
Some minor fixes
* Improve conversion script and add test for absolute position embeddings
* Improve conversion script and add test for absolute position embeddings
* Fix bug with reset_position_index_per_cell arg of the conversion cli
* Add notebooks to the examples directory and fix style and quality
* Apply suggestions from code review
* Move from `nielsr/` to `google/` namespace
* Apply Sylvain's comments
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* add model parallelism to T5EncoderModel
add model parallelism to T5EncoderModel
* remove decoder from T5EncoderModel parallelize
* uodate T5EncoderModel docs
* Extend T5ModelTest for T5EncoderModel
* fix T5Stask using range for get_device_map
* fix style
Co-authored-by: Ahmed Elnaggar <elnaggar@rostlab.informatik.tu-muenchen.de>
* remove make on the fly linear embedding
* start refactor
* big first refactor
* save intermediate
* save intermediat
* correct mask issue
* save tests
* refactor padding masks
* make all tests pass
* further refactor
* make pegasus test pass
* fix bool if
* fix leftover tests
* continue
* bart renaming
* delete torchscript test hack
* fix imports in tests
* correct shift
* fix docs and repo cons
* re-add fix for FSTM
* typo in test
* fix typo
* fix another typo
* continue
* hot fix 2 for tf
* small fixes
* refactor types linting
* continue
* finish refactor
* fix import in tests
* better bart names
* further refactor and add test
* delete hack
* apply sylvains and lysandres commens
* small perf improv
* further perf improv
* improv perf
* fix typo
* make style
* small perf improv
* Add TFGPT2ForSequenceClassification based on DialogRPT
* Add TFGPT2ForSequenceClassification based on DialogRPT
* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing
* Add TFGPT2ForSequenceClassification based on DialogRPT
* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing
* code refactor for latest other TF PR
* code refactor
* code refactor
* Update modeling_tf_gpt2.py
* fix mems in xlnet
* fix use_mems
* fix use_mem_len
* fix use mems
* clean docs
* fix tf typo
* make xlnet tf for generation work
* fix tf test
* refactor use cache
* add use cache for missing models
* correct use_cache in generate
* correct use cache in tf generate
* fix tf
* correct getattr typo
* make sylvain happy
* change in docs as well
* do not apply to cookie cutter statements
* fix tf test
* make pytorch model fully backward compatible
* working on LongformerForSequenceClassification
* add TFLongformerForMultipleChoice
* add TFLongformerForTokenClassification
* use add_start_docstrings_to_model_forward
* test TFLongformerForSequenceClassification
* test TFLongformerForMultipleChoice
* test TFLongformerForTokenClassification
* remove test from repo
* add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice
* add requested classes to modeling_tf_auto.py
update dummy_tf_objects
fix tests
fix bugs in requested classes
* pass all tests except test_inputs_embeds
* sync with master
* pass all tests except test_inputs_embeds
* pass all tests
* pass all tests
* work on test_inputs_embeds
* fix style and quality
* make multi choice work
* fix TFLongformerForTokenClassification signature
* fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature
* fix mult choice
* fix mc hint
* fix input embeds
* fix input embeds
* refactor input embeds
* fix copy issue
* apply sylvains changes and clean more
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Tokenizers should be framework agnostic
* Run the slow tests
* Not testing
* Fix documentation
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Put models in subfolders
* Styling
* Fix imports in tests
* More fixes in test imports
* Sneaky hidden imports
* Fix imports in doc files
* More sneaky imports
* Finish fixing tests
* Fix examples
* Fix path for copies
* More fixes for examples
* Fix dummy files
* More fixes for example
* More model import fixes
* Is this why you're unhappy GitHub?
* Fix imports in conver command
* Use the CI to identify failing tests
* Remove from all examples and tests
* More default switch
* Fixes
* More test fixes
* More fixes
* Last fixes hopefully
* Use the CI to identify failing tests
* Remove from all examples and tests
* More default switch
* Fixes
* More test fixes
* More fixes
* Last fixes hopefully
* Run on the real suite
* Fix slow tests
* First addition of Flax/Jax documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* make style
* Ensure input order match between Bert & Roberta
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Install dependencies "all" when building doc
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* wraps build_doc deps with ""
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing @sgugger comments.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use list to highlight JAX features.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Let's not look to much into the future for now.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I recover them back
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version is already passed all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They shoud be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
* add training tests
* correct longformer
* fix docs
* fix some tests
* fix some more train tests
* remove ipdb
* fix multiple edge case model training
* fix funnel and prophetnet
* clean gpt models
* undo renaming of albert
* Output global_attentions in Longformer models
* make style
* small refactoring
* fix tests
* make fix-copies
* add for tf as well
* remove comments in test
* make fix-copies
* make style
* add docs
* make docstring pretty
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Important files
* Styling them all
* Revert "Styling them all"
This reverts commit 7d029395fd.
* Syling them for realsies
* Fix syntax error
* Fix benchmark_utils
* More fixes
* Fix modeling auto and script
* Remove new line
* Fixes
* More fixes
* Fix more files
* Style
* Add FSMT
* More fixes
* More fixes
* More fixes
* More fixes
* Fixes
* More fixes
* More fixes
* Last fixes
* Make sphinx happy
* add CustomHFIndex
* typo in config
* update tests
* add custom dataset example
* clean script
* update test data
* minor in test
* docs
* docs
* style
* fix imports
* allow to pass the indexed dataset directly
* update tests
* use multiset DPR
* address thom and patrick's comments
* style
* update dpr tokenizer
* add output_dir flag in use_own_knowledge_dataset.py
* allow custom datasets in examples/rag/finetune.py
* add test for custom dataset in distributed rag retriever
* Add Documentation for GPT-1 Classification
* Add GPT-1 with Classification head
* Add tests for GPT-1 Classification
* Add GPT-1 For Classification to auto models
* Remove authorized missing keys, change checkpoint to openai-gpt
* [WIP] SP tokenizers
* fixing tests for T5
* WIP tokenizers
* serialization
* update T5
* WIP T5 tokenization
* slow to fast conversion script
* Refactoring to move tokenzier implementations inside transformers
* Adding gpt - refactoring - quality
* WIP adding several tokenizers to the fast world
* WIP Roberta - moving implementations
* update to dev4 switch file loading to in-memory loading
* Updating and fixing
* advancing on the tokenizers - updating do_lower_case
* style and quality
* moving forward with tokenizers conversion and tests
* MBart, T5
* dumping the fast version of transformer XL
* Adding to autotokenizers + style/quality
* update init and space_between_special_tokens
* style and quality
* bump up tokenizers version
* add protobuf
* fix pickle Bert JP with Mecab
* fix newly added tokenizers
* style and quality
* fix bert japanese
* fix funnel
* limite tokenizer warning to one occurence
* clean up file
* fix new tokenizers
* fast tokenizers deep tests
* WIP adding all the special fast tests on the new fast tokenizers
* quick fix
* adding more fast tokenizers in the fast tests
* all tokenizers in fast version tested
* Adding BertGenerationFast
* bump up setup.py for CI
* remove BertGenerationFast (too early)
* bump up tokenizers version
* Clean old docstrings
* Typo
* Update following Lysandre comments
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* configuration_squeezebert.py
thin wrapper around bert tokenizer
fix typos
wip sb model code
wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
set up squeezebert to use BertModelOutput when returning results.
squeezebert documentation
formatting
allow head mask that is an array of [None, ..., None]
docs
docs cont'd
path to vocab
docs and pointers to cloud files (WIP)
line length and indentation
squeezebert model cards
formatting of model cards
untrack modeling_squeezebert_scratchpad.py
update aws paths to vocab and config files
get rid of stub of NSP code, and advise users to pretrain with mlm only
fix rebase issues
redo rebase of modeling_auto.py
fix issues with code formatting
more code format auto-fixes
move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
tests for squeezebert modeling and tokenization
fix typo
move squeezebert before bert in modeling_auto.py to fix inheritance problem
disable test_head_masking, since squeezebert doesn't yet implement head masking
fix issues exposed by the test_modeling_squeezebert.py
fix an issue exposed by test_tokenization_squeezebert.py
fix issue exposed by test_modeling_squeezebert.py
auto generated code style improvement
issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()
update copyright
resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
docs
add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
autogenerated formatting tweaks
integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings
* tiny change to order of imports
* Clean up model documentation
* Formatting
* Preparation work
* Long lines
* Main work on rst files
* Cleanup all config files
* Syntax fix
* Clean all tokenizers
* Work on first models
* Models beginning
* FaluBERT
* All PyTorch models
* All models
* Long lines again
* Fixes
* More fixes
* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* Formatting / renaming prior to actual work
* First commit
* improve comments
* Retrieval evaluation scripts
* refactor to include modeling outputs + MPI retriever
* Fix rag-token model + refactor
* Various fixes + finetuning logic
* use_bos fix
* Retrieval refactor
* Finetuning refactoring and cleanup
* Add documentation and cleanup
* Remove set_up_rag_env.sh file
* Fix retrieval wit HF index
* Fix import errors
* Fix quality errors
* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
* fix quality
* Fix RAG Sequence generation
* minor cleanup plus initial tests
* fix test
* fix tests 2
* Comments fix
* post-merge fixes
* Improve readme + post-rebase refactor
* Extra dependencied for tests
* Fix tests
* Fix tests 2
* Refactor test requirements
* Fix tests 3
* Post-rebase refactor
* rename nlp->datasets
* RAG integration tests
* add tokenizer to slow integration test and allow retriever to run on cpu
* add tests; fix position ids warning
* change structure
* change structure
* add from encoder generator
* save working solution
* make all integration tests pass
* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained
* don't save paths
* delete unnecessary imports
* pass config to AutoTokenizer.from_pretrained for Rag tokenizers
* init wiki_dpr only once
* hardcode legacy index and passages paths (todo: add the right urls)
* finalize config
* finalize retriver api and config api
* LegacyIndex index download refactor
* add dpr to autotokenizer
* make from pretrained more flexible
* fix ragfortokengeneration
* small name changes in tokenizer
* add labels to models
* change default index name
* add retrieval tests
* finish token generate
* align test with previous version and make all tests pass
* add tests
* finalize tests
* implement thoms suggestions
* add first version of test
* make first tests work
* make retriever platform agnostic
* naming
* style
* add legacy index URL
* docstrings + simple retrieval test for distributed
* clean model api
* add doc_ids to retriever's outputs
* fix retrieval tests
* finish model outputs
* finalize model api
* fix generate problem for rag
* fix generate for other modles
* fix some tests
* save intermediate
* set generate to default
* big refactor generate
* delete rag_api
* correct pip faiss install
* fix auto tokenization test
* fix faiss install
* fix test
* move the distributed logic to examples
* model page
* docs
* finish tests
* fix dependencies
* fix import in __init__
* Refactor eval_rag and finetune scripts
* start docstring
* add psutil to test
* fix tf test
* move require torch to top
* fix retrieval test
* align naming
* finish automodel
* fix repo consistency
* test ragtokenizer save/load
* add rag model output docs
* fix ragtokenizer save/load from pretrained
* fix tokenizer dir
* remove torch in retrieval
* fix docs
* fixe finetune scripts
* finish model docs
* finish docs
* remove auto model for now
* add require torch
* remove solved todos
* integrate sylvains suggestions
* sams comments
* correct mistake on purpose
* improve README
* Add generation test cases
* fix rag token
* clean token generate
* fix test
* add note to test
* fix attention mask
* add t5 test for rag
* Fix handling prefix in finetune.py
* don't overwrite index_name
Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
* ready for PR
* cleanup
* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST
* fix
* perfectionism
* revert change from another PR
* odd, already committed this one
* non-interactive upload workaround
* backup the failed experiment
* store langs in config
* workaround for localizing model path
* doc clean up as in https://github.com/huggingface/transformers/pull/6956
* style
* back out debug mode
* document: run_eval.py --num_beams 10
* remove unneeded constant
* typo
* re-use bart's Attention
* re-use EncoderLayer, DecoderLayer from bart
* refactor
* send to cuda and fp16
* cleanup
* revert (moved to another PR)
* better error message
* document run_eval --num_beams
* solve the problem of tokenizer finding the right files when model is local
* polish, remove hardcoded config
* add a note that the file is autogenerated to avoid losing changes
* prep for org change, remove unneeded code
* switch to model4.pt, update scores
* s/python/bash/
* missing init (but doesn't impact the finetuned model)
* cleanup
* major refactor (reuse-bart)
* new model, new expected weights
* cleanup
* cleanup
* full link
* fix model type
* merge porting notes
* style
* cleanup
* have to create a DecoderConfig object to handle vocab_size properly
* doc fix
* add note (not a public class)
* parametrize
* - add bleu scores integration tests
* skip test if sacrebleu is not installed
* cache heavy models/tokenizers
* some tweaks
* remove tokens that aren't used
* more purging
* simplify code
* switch to using decoder_start_token_id
* add doc
* Revert "major refactor (reuse-bart)"
This reverts commit 226dad15ca.
* decouple from bart
* remove unused code #1
* remove unused code #2
* remove unused code #3
* update instructions
* clean up
* move bleu eval to examples
* check import only once
* move data+gen script into files
* reuse via import
* take less space
* add prepare_seq2seq_batch (auto-tested)
* cleanup
* recode test to use json instead of yaml
* ignore keys not needed
* use the new -y in transformers-cli upload -y
* [xlm tok] config dict: fix str into int to match definition (#7034)
* [s2s] --eval_max_generate_length (#7018)
* Fix CI with change of name of nlp (#7054)
* nlp -> datasets
* More nlp -> datasets
* Woopsie
* More nlp -> datasets
* One last
* extending to support allen_nlp wmt models
- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models
* sync with changes
* s/fsmt-wmt/wmt/ in model names
* s/fsmt-wmt/wmt/ in model names (p2)
* s/fsmt-wmt/wmt/ in model names (p3)
* switch to a better checkpoint
* typo
* make non-optional args such - adjust tests where possible or skip when there is no other choice
* consistency
* style
* adjust header
* cards moved (model rename)
* use best custom hparams
* update info
* remove old cards
* cleanup
* s/stas/facebook/
* update scores
* s/allen_nlp/allenai/
* url maps aren't needed
* typo
* move all the doc / build /eval generators to their own scripts
* cleanup
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix indent
* duplicated line
* style
* use the correct add_start_docstrings
* oops
* resizing can't be done with the core approach, due to 2 dicts
* check that the arg is a list
* style
* style
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and fist FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and fist FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Fix copyright
* Forgot some layers can be repeated
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/modeling_funnel.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Slow integration test
* Make small integration test
* Formatting
* Add checkpoint and separate classification head
* Formatting
* Expand list, fix link and add in pretrained models
* Styling
* Add the model in all summaries
* Typo fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* improve names and tests longformer
* more and better tests for longformer
* add first tf test
* finalize tf basic op functions
* fix merge
* tf shape test passes
* narrow down discrepancies
* make longformer local attn tf work
* correct tf longformer
* add first global attn function
* add more global longformer func
* advance tf longformer
* finish global attn
* upload big model
* finish all tests
* correct false any statement
* fix common tests
* make all tests pass except keras save load
* fix some tests
* fix torch test import
* finish tests
* fix test
* fix torch tf tests
* add docs
* finish docs
* Update src/transformers/modeling_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply Lysandres suggestions
* reverse to assert statement because function will fail otherwise
* applying sylvains recommendations
* Update src/transformers/modeling_longformer.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Add a script to check all models are tested and documented
* Apply suggestions from code review
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Address comments
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* Add new models and fix issues
* Quality improvements
* Add T5
* A bit of cleanup
* Fix for slow tests
* Style
* ElectraForQuestionAnswering
* udate __init__
* add test for electra qa model
* add ElectraForQuestionAnswering in auto models
* add ElectraForQuestionAnswering in all_model_classes
* fix outputs, input_ids defaults to None
* add ElectraForQuestionAnswering in docs
* remove commented line
* better api
* improve automatic setting of global attention mask
* fix longformer bug
* fix global attention mask in test
* fix global attn mask flatten
* fix slow tests
* update docstring
* update docs and make more robust
* improve attention mask
* add multiple choice for longformer
* add models to docs
* adapt docstring
* add test to longformer
* add longformer for mc in init and modeling auto
* fix tests