Suraj Patil
18177a1a60
lm_labels => labels ( #5080 )
2020-06-18 09:16:29 +02:00
Lysandre
efeb75b805
Remove misleading comment
...
closes #4958
2020-06-17 18:24:35 -04:00
Saurabh Misra
bb154ac50c
Fixing TPU training by disabling wandb.watch gradients logging for TPU ( #4926 )
2020-06-17 18:04:11 -04:00
Suraj Patil
fb6cccb863
fix qa example ( #4929 )
2020-06-17 17:54:16 -04:00
Karthikeyan Singaravelan
38bba9cdd5
Fix deprecation warnings due to invalid escape sequences. ( #4924 )
2020-06-17 17:46:58 -04:00
Sam Shleifer
f1a3d03741
add pandas to setup.cfg ( #5093 )
2020-06-17 16:39:17 -04:00
Sam Shleifer
90c833870c
[MarianTokenizer] Switch to sacremoses for punc normalization ( #5092 )
2020-06-17 16:31:05 -04:00
Pranav Dayanand Pawar
049e14f0e3
very minor spelling correction in script command ( #5090 )
...
actual script name - counts_parameters.py
2020-06-17 16:08:43 -04:00
Sylvain Gugger
20fa828984
Make default_data_collator more flexible and deprecate old behavior ( #5060 )
...
* Make default_data_collator more flexible
* Accept tensors for all features
* Document code
* Refactor
* Formatting
2020-06-17 15:24:51 -04:00
Yacine Jernite
5e06963394
Some changes to simplify the generation function ( #5031 )
...
* moving logits post-processing out of beam search
* moving logits post-processing out of beam search
* first step cache
* fix_Encoder_Decoder
* patrick_version_postprocess
* add_keyword_arg
2020-06-17 14:48:06 -04:00
Sylvain Gugger
204ebc25e6
Update installation page and add contributing to the doc ( #5084 )
...
* Update installation page and add contributing to the doc
* Remove mention of symlinks
2020-06-17 14:01:10 -04:00
Sam Shleifer
043f9f51f9
[examples] SummarizationModule improvements ( #4951 )
2020-06-17 13:51:34 -04:00
Sylvain Gugger
cd40f6564e
Add header and fix command ( #5082 )
2020-06-17 11:45:05 -04:00
Julien Chaumond
70bc3ead4f
[TextClassificationPipeline] Hotfix: make json serializable
2020-06-17 15:09:27 +00:00
Sylvain Gugger
7291ea0bff
Reorganize documentation ( #5064 )
...
* Reorganize topics and add all models
2020-06-17 07:55:20 -04:00
Sylvain Gugger
e4aaa45805
Update pipeline examples to doctest syntax ( #5030 )
2020-06-16 18:14:58 -04:00
Sylvain Gugger
011cc0be51
Fix all Sphinx warnings ( #5068 )
2020-06-16 16:50:02 -04:00
flozi00
af497b5672
Typo ( #5069 )
2020-06-16 16:46:20 -04:00
Yacine Jernite
49c5202522
Eli5 examples ( #4968 )
...
* add eli5 examples
* add dense query script
* query_di
* merging
* merging
* add_utils
* adds nearest neighbor wikipedia
* batch queries
* training_retriever
* new notebooks
* moved retriever training script
* finished wiki40b
* max_len_fix
* train_s2s
* retriever_batch_checkpointing
* cleanup
* merge
* dim_fix
* fix_indexer
* fix_wiki40b_snippets
* fix_embed_for_r
* fp32 index
* fix_sparse_q
* joint_training
* remove obsolete datasets
* add_passage_nn_results
* add_passage_nn_results
* add_batch_nn
* add_batch_nn
* add_data_scripts
* notebook
* notebook
* notebook
* fix_multi_gpu
* add_app
* full_caching
* full_caching
* notebook
* sparse_done
* images
* notebook
* add_image_gif
* with_Gif
* add_contr_image
* notebook
* notebook
* notebook
* train_functions
* notebook
* min_retrieval_length
* pandas_option
* notebook
* min_retrieval_length
* notebook
* notebook
* eval_Retriever
* notebook
* images
* notebook
* add_example
* add_example
* notebook
* fireworks
* notebook
* notebook
* joe's notebook comments
* app_update
* notebook
* notebook_link
* captions
* notebook
* adding RetriBert model
* add RetriBert to Auto
* change AutoLMHead to AutoSeq2Seq
* notebook downloads from hf models
* style_black
* style_black
* app_update
* app_update
* fix_app_update
* style
* style
* isort
* Delete WikiELI5training.ipynb
* Delete evaluate_eli5.py
* Delete WikiELI5explore.ipynb
* Delete ExploreWikiELI5Support.html
* Delete explainlikeimfive.py
* Delete wiki_snippets.py
* children before parent
* children before parent
* style_black
* style_black_only
* isort
* isort_new
* Update src/transformers/modeling_retribert.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* typo fixes
* app_without_asset
* cleanup
* Delete ELI5animation.gif
* Delete ELI5contrastive.svg
* Delete ELI5wiki_index.svg
* Delete choco_bis.svg
* Delete fireworks.gif
* Delete huggingface_logo.jpg
* Delete huggingface_logo.svg
* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb
* Delete eli5_app.py
* Delete eli5_utils.py
* readme
* Update README.md
* unused imports
* moved_info
* default_beam
* ftuned model
* disclaimer
* Update src/transformers/modeling_retribert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* black
* add_doc
* names
* isort_Examples
* isort_Examples
* Add doc to index
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-16 16:36:58 -04:00
Sam Shleifer
c3e607496c
[cleanup] examples test_run_squad uses tiny model ( #5059 )
2020-06-16 14:06:45 -04:00
Sylvain Gugger
439aa1d6e9
Remove old section + caching in install ( #5027 )
2020-06-16 13:03:41 -04:00
Sam Shleifer
3d495c61ef
Fix marian tokenizer save pretrained ( #5043 )
2020-06-16 09:48:19 -04:00
Sylvain Gugger
d5477baf7d
Convert hans to Trainer ( #5025 )
...
* Convert hans to Trainer
* Tick box
2020-06-16 08:06:31 -04:00
Amil Khare
c852036b4a
[cleanup] Hoist ModelTester objects to top level ( #4939 )
...
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-06-16 08:03:43 -04:00
Manuel Romero
0c55a384f8
Add reference to NLP dataset ( #5028 )
...
* Add reference to NLP dataset
* Update README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:19:09 -04:00
Manuel Romero
0946d1209d
Add reference to NLP (package) dataset ( #5029 )
...
* Add reference to NLP (package) dataset
* Update README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:17:46 -04:00
Boris Dayma
edcb3ac59a
refactor(wandb): consolidate import ( #5044 )
2020-06-16 03:40:43 -04:00
Funtowicz Morgan
9e03364999
Ability to pickle/unpickle BatchEncoding pickle (reimport) ( #5039 )
...
* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.
* Added __getstate__() & __setstate__() to be picklable.
* Correct tokens() return type from List[int] to List[str]
* Added unittest for BatchEncoding pickle/unpickle
* Added unittest for BatchEncoding is_fast
* More careful checking on BatchEncoding unpickle tests.
* Formatting.
* is_fast should assertTrue on Rust tokenizers.
* Ensure tensorflow has correct way of checking array_equal
* More formatting.
2020-06-16 09:25:25 +02:00
Sylvain Gugger
f9f8a5312e
Add DistilBertForMultipleChoice ( #5032 )
...
* Add `DistilBertForMultipleChoice`
2020-06-15 18:31:41 -04:00
Anthony MOI
36434220fc
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests ( #4510 )
...
* Use tokenizers pre-tokenized pipeline
* failing pretokenized test
* Fix is_pretokenized in python
* add pretokenized tests
* style and quality
* better tests for batched pretokenized inputs
* tokenizers clean up - new padding_strategy - split the files
* [HUGE] refactoring tokenizers - padding - truncation - tests
* style and quality
* bump up required tokenizers version to 0.8.0-rc1
* switched padding/truncation API - simpler better backward compat
* updating tests for custom tokenizers
* style and quality - tests on pad
* fix QA pipeline
* fix backward compatibility for max_length only
* style and quality
* Various cleanups - add verbose
* fix tests
* update docstrings
* Fix tests
* Docs reformatted
* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-15 17:12:51 -04:00
Patrick von Platen
ebba39e4e1
[Bart] Question Answering Model is added to tests ( #5024 )
...
* fix test
* Update tests/test_modeling_common.py
* Update tests/test_modeling_common.py
2020-06-15 22:50:09 +02:00
Sylvain Gugger
bbad4c6989
Add position_ids ( #5021 )
2020-06-15 15:50:17 -04:00
Boris Dayma
1bf4098e03
feat(TFTrainer): improve logging ( #4946 )
...
* feat(tftrainer): improve logging
* fix(trainer): consider case with evaluation only
* refactor(tftrainer): address comments
* refactor(tftrainer): move self.epoch_logging to __init__
2020-06-15 14:06:17 -04:00
Funtowicz Morgan
7b5a1e7d51
Fix importing transformers on Windows ( #4997 )
2020-06-15 19:36:57 +02:00
Sam Shleifer
a9f1fc6c94
Add bart-base ( #5014 )
2020-06-15 13:29:26 -04:00
Funtowicz Morgan
7b685f5229
Increase pipeline support for ONNX export. ( #5005 )
...
* Increase pipeline support for ONNX export.
* Style.
2020-06-15 19:13:58 +02:00
Sylvain Gugger
1affde2f10
Make DataCollator a callable ( #5015 )
...
* Make DataCollator a callable
* Update src/transformers/data/data_collator.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 11:58:33 -04:00
Bram Vanroy
f7c93b3cee
Possible fix to make AMP work with DDP in the trainer ( #4728 )
...
* manually set device in trainer args
* check if current device is cuda before set_device
* Explicitly set GPU ID when using single GPU
This addresses https://github.com/huggingface/transformers/issues/4657#issuecomment-642228099
2020-06-15 10:10:26 -04:00
ipuneetrathore
66bcfbb130
Create README.md ( #4975 )
...
* Create README.md
* Update model_cards/ipuneetrathore/bert-base-cased-finetuned-finBERT/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 08:43:50 -04:00
Stefan Schweter
d812e6d76e
NER: fix construction of input examples for RoBERTa ( #4943 )
...
* utils_ner: do not add extra sep token for RoBERTa model
* run_pl_ner: do not add extra sep token for RoBERTa model
2020-06-15 08:30:40 -04:00
Suraj Patil
ebab096e86
[model card] model card for bart-large-finetuned-squadv1 ( #4977 )
...
* [model card] model card for bart-large-finetuned-squadv1
* add metadata link to the dataset
2020-06-15 05:39:41 -04:00
Funtowicz Morgan
9ad36ad57f
Improve ONNX logging ( #4999 )
...
* Improve ONNX export logging to give more information about the generated graph.
* Correctly handle input and output in the logging.
2020-06-15 11:04:51 +02:00
ZhuBaohe
9931f817b7
fix ( #4976 )
2020-06-14 21:36:14 +02:00
Suraj Patil
9208f57b16
BartTokenizerFast ( #4878 )
2020-06-14 13:04:49 -04:00
Sylvain Gugger
403d309857
Hans data ( #4854 )
...
* Update hans data to be able to use Trainer
* Fixes
* Deal with tokenizers that don't have token_ids
* Clean up things
* Simplify data use
* Fix the input dict
* Formatting + proper path in README
2020-06-13 09:35:13 -04:00
Julien Chaumond
ca5e1cdf8e
model_cards: we can now tag datasets
...
see corresponding model pages to see how it's rendered
2020-06-12 23:19:07 +02:00
Suraj Patil
e93ccb3290
BartForQuestionAnswering ( #4908 )
2020-06-12 15:47:57 -04:00
Sylvain Gugger
538531cde5
Add AlbertForMultipleChoice ( #4959 )
...
* Add AlbertForMultipleChoice
* Make up to date and add all models to common tests
2020-06-12 14:20:19 -04:00
Manuel Romero
fe24139702
Create README.md ( #4865 )
2020-06-12 09:03:43 -04:00
Yannis Papanikolaou
9aa219a1fe
Create README.md ( #4872 )
2020-06-12 09:03:13 -04:00