* Important files
* Styling them all
* Revert "Styling them all"
This reverts commit 7d029395fd.
* Styling them for realsies
* Fix syntax error
* Fix benchmark_utils
* More fixes
* Fix modeling auto and script
* Remove new line
* Fixes
* More fixes
* Fix more files
* Style
* Add FSMT
* More fixes
* More fixes
* More fixes
* More fixes
* Fixes
* More fixes
* More fixes
* Last fixes
* Make sphinx happy
* configuration_squeezebert.py
thin wrapper around bert tokenizer
fix typos
wip sb model code
wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
set up squeezebert to use BertModelOutput when returning results.
squeezebert documentation
formatting
allow head mask that is an array of [None, ..., None]
docs
docs cont'd
path to vocab
docs and pointers to cloud files (WIP)
line length and indentation
squeezebert model cards
formatting of model cards
untrack modeling_squeezebert_scratchpad.py
update aws paths to vocab and config files
get rid of stub of NSP code, and advise users to pretrain with mlm only
fix rebase issues
redo rebase of modeling_auto.py
fix issues with code formatting
more code format auto-fixes
move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
tests for squeezebert modeling and tokenization
fix typo
move squeezebert before bert in modeling_auto.py to fix inheritance problem
disable test_head_masking, since squeezebert doesn't yet implement head masking
fix issues exposed by the test_modeling_squeezebert.py
fix an issue exposed by test_tokenization_squeezebert.py
fix issue exposed by test_modeling_squeezebert.py
auto generated code style improvement
issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()
update copyright
resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
docs
add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
autogenerated formatting tweaks
integrate feedback from patrickvonplaten and sgugger on programming style and documentation strings
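The reordering of squeezebert before bert in tokenization_auto.py and modeling_auto.py can be illustrated with a minimal sketch. It assumes the auto classes pick an entry by walking an ordered mapping and taking the first `isinstance` match; the classes and the `pick_model_class` helper below are toy stand-ins invented for illustration, not the library's code.

```python
from collections import OrderedDict


class BertConfig:
    """Toy stand-in for a parent config class (hypothetical)."""


class SqueezeBertConfig(BertConfig):
    """Toy stand-in for a config that inherits from the Bert one (hypothetical)."""


class BertModel:
    """Toy stand-in for the parent model class (hypothetical)."""


class SqueezeBertModel:
    """Toy stand-in for the SqueezeBERT model class (hypothetical)."""


def pick_model_class(config, mapping):
    """Return the model class of the first entry whose config type matches."""
    for config_class, model_class in mapping.items():
        # isinstance() is also True for subclasses, so a parent entry that
        # comes first would shadow every entry registered for its children.
        if isinstance(config, config_class):
            return model_class
    raise ValueError(f"Unrecognized configuration class {type(config).__name__}")


# Subclass entry first: SqueezeBertConfig resolves to SqueezeBertModel.
GOOD_ORDER = OrderedDict([(SqueezeBertConfig, SqueezeBertModel), (BertConfig, BertModel)])
# Parent entry first: SqueezeBertConfig is caught by the BertConfig entry.
BAD_ORDER = OrderedDict([(BertConfig, BertModel), (SqueezeBertConfig, SqueezeBertModel)])

config = SqueezeBertConfig()
print(pick_model_class(config, GOOD_ORDER).__name__)
print(pick_model_class(config, BAD_ORDER).__name__)
```

With the subclass listed first the lookup returns the SqueezeBERT entry; with the parent listed first it silently falls through to the Bert entry, which is the inheritance problem the reordering avoids.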
* tiny change to order of imports
* Clean up model documentation
* Formatting
* Preparation work
* Long lines
* Main work on rst files
* Cleanup all config files
* Syntax fix
* Clean all tokenizers
* Work on first models
* Models beginning
* FlauBERT
* All PyTorch models
* All models
* Long lines again
* Fixes
* More fixes
* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and first FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Fix copyright
* Forgot some layers can be repeated
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/modeling_funnel.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Slow integration test
* Make small integration test
* Formatting
* Add checkpoint and separate classification head
* Formatting
* Expand list, fix link and add in pretrained models
* Styling
* Add the model in all summaries
* Typo fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Replace mecab-python3 with fugashi
This replaces mecab-python3 with fugashi for Japanese tokenization. I am
the maintainer of both projects.
Both projects are MeCab wrappers, so the underlying C++ code is the
same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
use of the MeCab API it's easier to use.
This code ensures the use of a version of ipadic installed via pip,
which should make versioning and tracking down issues easier.
fugashi has wheels for Windows, OSX, and Linux, which will help with
issues with installing old versions of mecab-python3 on Windows.
Because fugashi doesn't use SWIG, it also doesn't require a C++ runtime
to be installed on Windows, unlike mecab-python3.
In adding this change I removed some code dealing with `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything; it is unclear to me why they were there.
I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
TensorFlow.
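A minimal sketch of the tokenization path described above, assuming fugashi and the pip-installed ipadic package are available. `GenericTagger` and `ipadic.MECAB_ARGS` come from those packages; the helper names `build_ipadic_tagger` and `surfaces` are hypothetical and are not the names used in this change.

```python
from typing import Callable, Iterable, List


def build_ipadic_tagger():
    """Build a fugashi tagger bound to the pip-installed ipadic dictionary."""
    # Imported lazily so the rest of the sketch works without the packages.
    import fugashi  # pip install fugashi
    import ipadic  # pip install ipadic

    # MECAB_ARGS points MeCab at the dictionary shipped inside the ipadic wheel.
    return fugashi.GenericTagger(ipadic.MECAB_ARGS)


def surfaces(text: str, tagger: Callable[[str], Iterable]) -> List[str]:
    """Return the surface string of every token the tagger yields for `text`."""
    # No cursor / token_start / token_end bookkeeping is needed here:
    # each node already carries its own surface form.
    return [word.surface for word in tagger(text)]
```

With both packages installed, `surfaces(text, build_ipadic_tagger())` returns the token strings for `text`. Passing the tagger in as an argument is only a convenience of this sketch, so the helper can be exercised with a stand-in tagger.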
* Style fix
* Remove unused variable
Forgot to delete this...
* Adapt doc with install instructions
* Fix typo
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
* first commit
* bug fixes
* better examples
* undo padding
* remove wrong VOCAB_FILES_NAMES
* License
* make style
* make isort happy
* unit tests
* integration test
* make `black` happy by undoing `isort` changes!!
* lint
* no need for the padding value
* batch_size not bsz
* remove unused type casting
* seqlen not seq_len
* staticmethod
* `bert` selfattention instead of `n2`
* uint8 instead of bool + lints
* pad inputs_embeds using embeddings not a constant
* black
* unit test with padding
* fix unit tests
* remove redundant unit test
* upload model weights
* resolve todo
* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_
* increase unittest coverage
* first copy & paste commit from Bert and Morgan's LSH code
* add easy way to compare to trax original code
* translate most of function
* make trax lsh self attention deterministic with numpy seed + copy paste code
* add same config
* add same config
* make layer init work
* implemented hash_vectors function for lsh attention
* continue reformer translation
* hf LSHSelfAttentionLayer gives same output as trax layer
* refactor code
* refactor code
* refactor code
* refactor
* refactor + add reformer config
* delete bogus file
* split reformer attention layer into two layers
* save intermediate step
* save intermediate step
* make test work
* add complete reformer block layer
* finish reformer layer
* implement causal and self mask
* clean reformer test and refactor code
* fix merge conflicts
* fix merge conflicts
* update init
* fix device for GPU
* fix chunk length init for tests
* include Morgan's optimization
* improve memory a bit
* improve comment
* factorize num_buckets
* better testing parameters
* make whole model work
* make lm model work
* add t5 copy paste tokenizer
* add chunking feed forward
* clean config
* add improved assert statements
* make tokenizer work
* improve test
* correct typo
* extend config
* add more complex test
* add new axial position embeddings
* add local block attention layer
* clean tests
* refactor
* better testing
* save intermediate progress
* clean test file
* make shorter input length work for model
* allow variable input length
* refactor
* make forward pass for pretrained model work
* add generation possibility
* finish dropout and init
* make style
* refactor
* add first version of RevNet Layers
* make forward pass work and add convert file
* make uploaded model forward pass work
* make uploaded model forward pass work
* refactor code
* add namedtuples and cache buckets
* correct head masks
* refactor
* made reformer more flexible
* make style
* remove set max length
* add attention masks
* fix up tests
* fix lsh attention mask
* make random seed optional for the moment
* improve memory in reformer
* add tests
* make style
* make sure masks work correctly
* detach gradients
* save intermediate
* correct backprop through gather
* make style
* change back num hashes
* rename to labels
* fix rotation shape
* fix detach
* update
* fix trainer
* fix backward dropout
* make reformer more flexible
* fix conflict
* fix
* fix
* add tests for fixed seed in reformer layer
* fix trainer typo
* fix typo in activations
* add fp16 tests
* add fp16 training
* support fp16
* correct gradient bug in reformer
* add fast gelu
* re-add dropout for embedding dropout
* better naming
* better naming
* renaming
* finalize test branch
* finalize tests
* add more tests
* finish tests
* fix
* fix type trainer
* fix fp16 tests
* fix tests
* fix tests
* fix tests
* fix issue with dropout
* fix dropout seeds
* correct random seed on gpu
* finalize random seed for dropout
* finalize random seed for dropout
* remove duplicate line
* correct half precision bug
* make style
* refactor
* refactor
* docstring
* remove sinusoidal position encodings for reformer
* move chunking to modeling_utils
* make style
* clean config
* make style
* fix tests
* fix auto tests
* pretrained models
* fix docstring
* update conversion file
* Update pretrained_models.rst
* fix rst
* fix rst
* update copyright
* fix test path
* fix test path
* fix small issue in test
* include reformer in generation tests
* add docs for axial position encoding
* finish docs
* Update convert_reformer_trax_checkpoint_to_pytorch.py
* remove isort
* include Sam's comments
* remove wrong comment in utils
* correct typos
* fix typo
* Update reformer.rst
* applied Morgan's optimization
* make style
* make gpu compatible
* remove bogus file
* big test refactor
* add example for chunking
* fix typo
* add to README
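The feed-forward chunking mentioned in the Reformer commits above can be sketched as follows. This is a pure-Python illustration under the assumption that the layer is position-wise (each row is transformed independently), so processing the sequence in slices gives the same result while only one slice is in flight at a time. The names `feed_forward` and `apply_chunked` are hypothetical; the helper that was moved to modeling_utils operates on tensors and its signature is not reproduced here.

```python
from typing import Callable, List

Rows = List[List[float]]


def feed_forward(rows: Rows) -> Rows:
    """Toy position-wise layer: every row is transformed on its own."""
    return [[2.0 * x + 1.0 for x in row] for row in rows]


def apply_chunked(forward_fn: Callable[[Rows], Rows], rows: Rows, chunk_size: int) -> Rows:
    """Apply `forward_fn` to `rows` in slices of `chunk_size` rows.

    A `chunk_size` of 0 disables chunking (an assumption of this sketch).
    """
    if chunk_size <= 0:
        return forward_fn(rows)
    if len(rows) % chunk_size != 0:
        raise ValueError(
            f"Sequence length {len(rows)} is not a multiple of chunk size {chunk_size}"
        )
    output: Rows = []
    for start in range(0, len(rows), chunk_size):
        # Only this slice's intermediate results exist at any one time.
        output.extend(forward_fn(rows[start:start + chunk_size]))
    return output


sequence = [[float(i), float(i + 1)] for i in range(6)]
chunked = apply_chunked(feed_forward, sequence, chunk_size=2)
print(chunked == feed_forward(sequence))
```

The trade-off is more, smaller calls in exchange for a lower peak memory footprint; the equivalence only holds for layers that do not mix information across positions.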