* [Model card] SinhalaBERTo model.
This is the model card for the keshan/SinhalaBERTo model.
* Update model_cards/keshan/SinhalaBERTo/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Create README.md
Model description for all LEGAL-BERT models, published as part of "LEGAL-BERT: The Muppets straight out of Law School" (Chalkidis et al., 2020, Findings of EMNLP).
* Update model_cards/nlpaueb/legal-bert-base-uncased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
'The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.'
I don't know how to change the 'How to use this model directly from the 🤗/transformers library:' part, since it is not part of the model paper
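The deprecation notice above follows a common pattern: the old class becomes a thin alias that emits a `FutureWarning` and delegates to its replacement. A minimal pure-Python sketch of that pattern (the stub classes below are hypothetical stand-ins, not the actual transformers implementation):

```python
import warnings


class AutoModelForMaskedLM:
    """Hypothetical stand-in for the replacement task-specific class."""

    @classmethod
    def from_pretrained(cls, model_name):
        # A real implementation would load weights; this stub just records the call.
        return {"task": "masked-lm", "model": model_name}


class AutoModelWithLMHead(AutoModelForMaskedLM):
    """Deprecated alias: warn, then delegate to the replacement class."""

    @classmethod
    def from_pretrained(cls, model_name):
        warnings.warn(
            "The class `AutoModelWithLMHead` is deprecated and will be removed "
            "in a future version. Please use `AutoModelForMaskedLM` instead.",
            FutureWarning,
        )
        return super().from_pretrained(model_name)
```

Callers that switch to the new class get the same behavior without the warning.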
* configuration_squeezebert.py
thin wrapper around bert tokenizer
fix typos
wip SqueezeBERT model code
wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
set up squeezebert to use BertModelOutput when returning results.
squeezebert documentation
formatting
allow head mask that is an array of [None, ..., None]
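A head mask of `[None, ..., None]` means "no masking for any layer". A minimal sketch of a normalizer that accepts that shape (a hypothetical helper, assuming one mask entry per layer; not the actual transformers code):

```python
def normalize_head_mask(head_mask, num_layers):
    """Return a per-layer list of attention-head masks.

    Accepts either None (no masking at all) or a sequence with one entry
    per layer, where individual entries may themselves be None to mean
    "do not mask this layer's attention heads".
    """
    if head_mask is None:
        return [None] * num_layers
    if len(head_mask) != num_layers:
        raise ValueError(
            f"expected {num_layers} head-mask entries, got {len(head_mask)}"
        )
    return list(head_mask)
```

With this shape, per-layer code can uniformly check `if mask is not None` before applying it.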
docs
docs cont'd
path to vocab
docs and pointers to cloud files (WIP)
line length and indentation
squeezebert model cards
formatting of model cards
untrack modeling_squeezebert_scratchpad.py
update aws paths to vocab and config files
remove the stub of NSP code, and advise users to pretrain with MLM only
fix rebase issues
redo rebase of modeling_auto.py
fix issues with code formatting
more code format auto-fixes
move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
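Ordering matters here because the auto classes resolve a model by scanning an ordered mapping and taking the first match: since SqueezeBERT's classes inherit from BERT's, an isinstance-style check against the BERT entry would also accept a SqueezeBERT config, shadowing the SqueezeBERT entry. A pure-Python sketch of the failure mode (stub config classes, not the real transformers mappings):

```python
from collections import OrderedDict


class BertConfig:
    """Hypothetical stub of the parent config class."""


class SqueezeBertConfig(BertConfig):
    """Hypothetical stub; inherits from BertConfig, as in the real code."""


# First isinstance match wins, so the subclass entry must precede its
# parent; if BertConfig came first, a SqueezeBertConfig instance would
# match it and the SqueezeBERT entry would be unreachable.
MODEL_MAPPING = OrderedDict([
    (SqueezeBertConfig, "SqueezeBertModel"),
    (BertConfig, "BertModel"),
])


def model_class_for(config):
    for config_class, model_class in MODEL_MAPPING.items():
        if isinstance(config, config_class):
            return model_class
    raise ValueError(f"unrecognized config: {type(config).__name__}")
```

Swapping the two entries makes `model_class_for(SqueezeBertConfig())` return `"BertModel"`, which is exactly the inheritance problem the commit fixes.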
tests for squeezebert modeling and tokenization
fix typo
move squeezebert before bert in modeling_auto.py to fix inheritance problem
disable test_head_masking, since squeezebert doesn't yet implement head masking
fix issues exposed by the test_modeling_squeezebert.py
fix an issue exposed by test_tokenization_squeezebert.py
fix issue exposed by test_modeling_squeezebert.py
auto generated code style improvement
fix an issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() called self.cls(), but there is no self.cls attribute; the intent was presumably to call self.lm_head()
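The fix amounts to routing the encoder output through the attribute that actually exists on the module. A minimal sketch (a hypothetical stand-in for the prediction head, not the real PyTorch module):

```python
class SqueezeBertForMaskedLM:
    """Sketch of the fixed forward path; `lm_head` stands in for the MLM head."""

    def __init__(self, lm_head):
        # The module defines `lm_head`, not `cls`, so the template's
        # `self.cls(...)` call raised AttributeError at runtime.
        self.lm_head = lm_head

    def forward(self, sequence_output):
        # Fixed: call the head that is actually defined on the module.
        return self.lm_head(sequence_output)
```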
update copyright
resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
docs
add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
autogenerated formatting tweaks
integrate feedback from patrickvonplaten and sgugger into programming style and documentation strings
* tiny change to order of imports
Two new pre-trained models, "vinai/bertweet-covid19-base-cased" and "vinai/bertweet-covid19-base-uncased", result from further pre-training the pre-trained model "vinai/bertweet-base" on a corpus of 23M COVID-19 English Tweets for 40 epochs.
* Add BERTweet and PhoBERT models
* Update modeling_auto.py
Re-add `bart` to LM_MAPPING
* Update tokenization_auto.py
Re-add `from .configuration_mobilebert import MobileBertConfig`
not sure why it was replaced by `from transformers.configuration_mobilebert import MobileBertConfig`
* Add BERTweet and PhoBERT to pretrained_models.rst
* Update tokenization_auto.py
Remove BertweetTokenizer and PhobertTokenizer from tokenization_auto.py (they are currently not supported by AutoTokenizer).
* Update BertweetTokenizer - without nltk
* Update model card for BERTweet
* PhoBERT - with Auto mode - without import fastBPE
* PhoBERT - with Auto mode - without import fastBPE
* BERTweet - with Auto mode - without import fastBPE
* Add PhoBERT and BERTweet to TF modeling auto
* Improve Docstrings for PhobertTokenizer and BertweetTokenizer
* Update PhoBERT and BERTweet model cards
* Fixed a merge conflict in tokenization_auto
* Used black to reformat BERTweet- and PhoBERT-related files
* Used isort to reformat BERTweet- and PhoBERT-related files
* Reformatted BERTweet- and PhoBERT-related files based on flake8
* Updated test files
* Updated test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Update commits from huggingface
* Delete unnecessary files
* Add tokenizers to auto and init files
* Add test files for tokenizers
* Revised model cards
* Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files
* Revised test files
* Update orders of Phobert and Bertweet tokenizers in auto tokenization file