Commit Graph

8821 Commits

Author SHA1 Message Date
Stefan Schweter
09549aa18c
examples: minor fixes in flax example readme (#13502) 2021-09-10 11:45:57 +05:30
Nicolas Patry
aacd2123ee
Fixing #13381 (#13400)
* Fixing #13381

* Enabling automatic LED models.
2021-09-09 14:23:52 -04:00
Nicolas Patry
db514a75d0
Fixing backward compatiblity for non prefixed tokens (B-, I-). (#13493) 2021-09-09 13:36:09 -04:00
Sylvain Gugger
e59d4d0147
Refactor internals for Trainer push_to_hub (#13486) 2021-09-09 13:04:37 -04:00
Nicolas Patry
3dd538c4d3
[Tentative] Moving slow tokenizer to the Trie world. (#13220)
* Moving slow tokenizer to the Trie world.

* Adding more docstrings to the Trie.

* Fixing doctest (incompatible wiht our format? )

* Update src/transformers/tokenization_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding a lot more comment into the internals of this algorithm.

* Cleaner doc.

* Fixing the namings.

* Update src/transformers/tokenization_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* quality.

* Fixing longest first match.

* Small improvements to cuts + more test + canine resistant test.

* Fixing fast test.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-09-09 17:26:16 +02:00
Matt
b8385d8a11
TF Seq2Seq int dtype fix (#13496)
Fixes problems with passing int64 input to TF Seq2Seq models.
2021-09-09 15:54:08 +01:00
Aleksander Smywiński-Pohl
008c2d0b7a
Fix typo in documentation (#13494)
* Fix typo in deepspeed documentation

* Add missing import in deepspeed configuration

* Fix path in translation examples
2021-09-09 08:00:05 -04:00
Kamal Raj
1c191efc3a
flax ner example (#13365)
* flax ner example

* added task to README

* updated readme

* 1. ArgumentParser -> HfArgumentParser
2. step-wise logging,eval and save

* added requirements.txt

* added progress bar

* updated README

* added check_min_version

* updated training data permuattion with JAX

* added metric lib to requirements

* updated readme table

* fixed imports
2021-09-09 10:12:57 +05:30
Aleksander Smywiński-Pohl
c37573806a
Fix typo in deepspeed documentation (#13482)
* Fix typo in deepspeed documentation

* Add missing import in deepspeed configuration
2021-09-08 11:24:10 -07:00
Anton Lozhkov
e1f6e4903a
Fix integration tests for TFWav2Vec2 and TFHubert 2021-09-08 19:51:51 +03:00
Mohan Zhang
41cd52a768
fixed document (#13414) 2021-09-08 11:48:00 -04:00
Koichi Yasuoka
330d83fdbd
Typo in "end_of_word_suffix" (#13477)
But does it really work?
2021-09-08 11:26:07 -04:00
Mishig Davaadorj
2a15e8ccfb
Object detection pipeline (#12886)
* Implement object-detection pipeline

* Define threshold const

* Add `threshold` argument

* Refactor

* Uncomment test inputs

* `rm

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix typo

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix typo

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Chore better doc

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Rm unnecessary lines

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Chore better naming

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix typo

* Add `detr-tiny` for tests

* Add `ObjectDetectionPipeline` to `trnsfrmrs/init`

* Implement new bbox format

* Update detr post_process

* Update `load_img` method obj det pipeline

* make style

* Implement new testing format for obj det pipeln

* Add guard pytorch specific code in pipeline

* Add doc

* Make pipeline_obj_tet tests deterministic

* Revert some changes to `post_process` COCO api

* Chore

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Rm timm requirement

* make fixup

* Add timm requirement to test

* Make fixup

* Guard torch.Tensor

* Chore

* Delete unnecessary comment

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2021-09-08 17:17:32 +02:00
Matt
707105290b
Fix Tensorflow T5 with int64 input (#13479)
* Fix Tensorflow T5 with int64 input

* Style pass
2021-09-08 15:06:04 +01:00
Kevin Canwen Xu
361b6df36a
Throw ValueError for mirror downloads (#13478) 2021-09-08 09:09:22 -04:00
Lysandre Debut
99029ab6b0
Better error raised when cloned without lfs (#13401)
* Better error raised when cloned without lfs

* add from e
2021-09-08 08:28:22 -04:00
Li-Huai (Allan) Lin
18447c206d
Enable automated model list copying for localized READMEs (#13465)
* Complete basic mechanism

* Save

* Complete everything

* Style & Quality

* Update READMEs

* Add testing

* Fix README.md format

* Apply suggestions

* Fix format

* Update utils/check_copies.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-09-08 08:03:35 -04:00
Sylvain Gugger
cd66539662
Don't modify labels inplace in LabelSmoother (#13464) 2021-09-08 07:45:36 -04:00
Suraj Patil
c164c651dc
[CLIP] fix logit_scale init (#13436)
* fix logit_scale init

* add logit_scale_init_value as config param
2021-09-08 14:21:13 +05:30
Kevin Canwen Xu
f667d5b260
Deprecate Mirror for Downloading (#13470)
* Deprecated Mirror

* revert

* revert

* revert

* fix
2021-09-08 16:09:44 +08:00
Suraj Patil
f5d3bb1dd2
fix CLIP conversion script (#13474) 2021-09-08 12:57:18 +05:30
shabie
4be082ce39
[docs] update dead quickstart link on resuing past for GPT2 (#13455)
* [docs] update dead quickstart link on resuing past for GPT2

Thed dead link have been replaced by two links of forward and call methods of the GPT2 class for torch and tensorflow respectively.

* [docs] fix formatting for gpt2 page update
2021-09-07 16:57:58 -04:00
Anton Lozhkov
2146833767
Add unit_divisor to downloads (#13468) 2021-09-07 13:47:52 -07:00
guillaume-be
63b90a51aa
Optimized bad word ids (#13433)
* Optimized bad word ids generation

* Fixed optimized bad token ids

* Updated style
2021-09-07 16:51:04 +02:00
Nicolas Patry
5c7789d416
Fixing by correctly raising UnicodeDecodeError. (#13449) 2021-09-07 16:45:45 +02:00
Nathan Raw
79815090ea
Fix img classification tests (#13456)
*  Update image-classification example's tests

* 🔥 remove cats_and_dogs test samples

* 💄 fix flake8
2021-09-07 05:58:45 -04:00
Anurag Kumar
92d4ef9ab0
Update setup.py (#13421) 2021-09-06 17:32:24 -04:00
Shiv Dhar
75858ca156
Update version of packaging package (#13454) 2021-09-06 17:19:02 -04:00
Anton Lozhkov
f8363e49f9
Install libsndfile (#13403) 2021-09-06 17:12:43 -04:00
NielsRogge
5642a555ae
Add TAPAS MLM-only models (#13408)
* Add conversion of TapasForMaskedLM

* Add copied from statements
2021-09-06 19:19:30 +02:00
Suraj Patil
2dd975b235
skip image classification test (#13451) 2021-09-06 21:46:25 +05:30
Nils Reimers
c8be8a9adb
Update model configs - Allow setters for common properties (#13026)
* refactor GPT Config to allow dyn. properties

* make attribute_map a class attribute

* remove old code

* update unit test to test config: Add test for common properties setter

* update unit test to test config: Add test for common properties passed as parameters to __init__

* update to black code format

* Allow that setters are not defined for certain config classes

* update config classes to implement attribute_map

* bugfix lxmert config - id2labels was not defined when num_labels was set

* update broken configs - add attribute_maps

* update bart config

* update black codestyle

* update documentation on common config attributes

* update GPTJ config to new attribute map

* update docs on common attributes

* gptj config: add max_position_embeddings

* gptj config: format with black

* update speech to text 2 config

* format doc file to max_len 119

* update config template
2021-09-06 16:30:13 +02:00
Nicolas Patry
cf4eb8b3f9
Adding a test for multibytes unicode. (#13447)
* Adding a test for multibytes unicode.

* Adding some accents.

* Making sure decoding works.

* Make tests passing by being cheesy.
2021-09-06 16:11:23 +02:00
Patrick von Platen
607611f240
up (#13448) 2021-09-06 16:09:24 +02:00
Suraj Patil
6b29bff852
add torchvision in example test requirements (#13438) 2021-09-06 15:17:54 +02:00
Anton Lozhkov
26700a9516
Fix scheduled tests for SpeechEncoderDecoderModel (#13422)
* Add inputs to pretrained tests

* Make style
2021-09-06 14:55:13 +02:00
Yih-Dar
73ad258806
Fix tests without any real effect (#13406)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2021-09-06 14:51:45 +02:00
Nathan Raw
76c4d8bf26
Add PyTorch image classification example (#13134)
*  add pytorch image classification example

* 🔥 remove utils.py

* 💄 fix flake8 style issues

* 🔥 remove unnecessary line

*  limit dataset sizes

* 📌 update reqs

* 🎨 restructure - use datasets lib

* 🎨 import transforms directly

* 📝 add comments

* 💄 style

* 🔥 remove flag

* 📌 update requirement warning

* 📝 add vision README.md

* 📝 update README.md

* 📝 update README.md

* 🎨 add image-classification tag to model card

* 🚚 rename vision ➡️ image-classification

* 📝 update image-classification README.md
2021-09-02 13:29:42 -06:00
Patrick von Platen
9bd5d97cdd
up (#13396) 2021-09-02 18:47:09 +02:00
Patrick von Platen
efa4f5f0ea
fix (#13395) 2021-09-02 18:11:26 +02:00
Aman Madaan
596bb85f2f
[docs] Update perplexity.rst to use negative log likelihood (#13386)
* [docs] Update perplexity.rst to use negative log likelihood

Model `forward` returns the negative log likelihood. The document correctly defines and calculates perplexity, but the description and variable names are inconsistent, which might cause confusion.

* [docs] restyle perplexity.rst
2021-09-02 07:49:12 -04:00
Apoorv Garg
b91e65afe0
Correct order of overflowing_tokens for slow tokenizer (#13179)
* correct order of overflowing_tokens for slow tokenizer (issue fix #13148)

* python 3.9 requires sentencepiece version 0.1.94 or above

* slicing of ids fixed in truncated_sequence()

* Update setup.py

* Correct order of overflowing tokens for pair of sentences

* code reformatted

* Update tokenization_utils_base.py

* reformatting file

* test to check single_input added

* missing function restored

* test to check pair_input overflowing tokens order

* test to check pair_input overflowing tokens order

* test to check pair_input overflowing tokens order

* added an error message for pair of seq and longest_first strategy

* test for pair_input modified

* variable name corrected

* fixed a typo in error message

* requested changes implemented

* required test added

* Corrected the message to match test message

* added error message for Luke Tokenizer

* lost test recovered

* docstring for truncate_sequences and prepare_for_model updated

* docstring for luke tokenizer updated

* updated ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING

* aligned text and fixed puncuatations

* improved style and quality of code

* fixed error_msg in truncate_sequences

* replaced encode_plus method with regular call method

* clean up

* rephrased the docstring
2021-09-02 05:58:23 -04:00
Nicolas Patry
c9184a2e03
Enabling automatic loading of tokenizer with pipeline for (#13376)
`audio-classification`.
2021-09-02 05:37:42 -04:00
Suraj Patil
e92140c567
fix example (#13387) 2021-09-02 11:32:18 +02:00
NielsRogge
4114c9a75b
Add tokenizer docs (#13373) 2021-09-02 09:46:05 +02:00
Sachin Abeywardana
872e6be03d
Update clip loss calculation (#13217)
* Update clip loss calculation

Hello, I'm the author of the blog you took the snippet from. I think this way of calculating is possibly slightly more accurate for calculation.

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-09-02 12:15:56 +05:30
Eduardo Gonzalez Ponferrada
0a22335e66
[Flax/run_hybrid_clip] Fix duplicating images when captions_per_image exceeds the number of captions, enable truncation 2021-09-02 11:19:49 +05:30
Sylvain Gugger
c1c2d68d37
Fix name and get_class method in AutoFeatureExtractor (#13385) 2021-09-01 20:54:49 -04:00
Patrick von Platen
a105c9b776
fix (#13383) 2021-09-01 23:12:01 +02:00
Patrick von Platen
4475f1dc2a
[Flax] Fix BigBird (#13380)
* finish

* finish
2021-09-01 18:33:54 +02:00