Commit Graph

8821 Commits

Author SHA1 Message Date
Kamal Raj
98e409abb3
albert flax (#13294)
* albert flax

* year -> 2021

* docstring updated for flax

* removed head_mask

* removed from_pt

* removed passing attention_mask to embedding layer
2021-08-30 17:29:27 +02:00
Ben Nimmo
ee5b24573b
the use_auth_token has not been set up early enough in the model_kwargs. Fixes #12941 (#13205) 2021-08-30 11:19:50 -04:00
Maxwell Forbes
0305673098
Fall back to observed_batch_size when the dataloader does not know the batch_size. (#13188) 2021-08-30 11:12:35 -04:00
Nathan Raw
ce6add8ecc
🐛 fix small model card bugs (#13310)
* 🐛 fix small model card bugs

* 💄 style
2021-08-30 08:45:57 -06:00
Sylvain Gugger
139e830158
Update label2id in the model config for run_glue (#13334) 2021-08-30 10:35:09 -04:00
fcakyon
6f3c99acca
add ability to connect a neptune.ai run (#13319)
when `NEPTUNE_RUN_ID` environmetnt variable is set, neptune will log into the previous run with id `NEPTUNE_RUN_ID`
2021-08-30 09:59:17 -04:00
Sylvain Gugger
f4f4e6b2d3
Use existing functionality for #13251 (#13333) 2021-08-30 09:43:23 -04:00
Li-Huai (Allan) Lin
d50649531f
Check None before going through iteration (#13250)
* Check None before going through iteration

* Format
2021-08-30 08:18:51 -04:00
Kamal Raj
774760e6f3
distilbert-flax (#13324)
* distilbert-flax

* added missing self

* docs fix

* removed tied kernal extra init

* updated docs

* x -> hidden states

* removed head_mask

* removed from_pt, +FLAX

* updated year
2021-08-30 14:16:18 +02:00
arfy slowy
01977466f4
fix: typo spelling grammar (#13212)
* fix: typo spelling grammar

* fix: make fixup
2021-08-30 08:09:14 -04:00
Navjot
ef83dc4f0c
Improve documentation of pooler_output in ModelOutput (#13228)
* update documentation of pooler_output in modeling_outputs, making it more clear and available for generic usage

* Update src/transformers/modeling_outputs.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_outputs.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* run make style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-08-30 08:08:16 -04:00
Falk Puschner
7828194ebe
add citation file (#13214) 2021-08-30 07:46:55 -04:00
NielsRogge
b6ddb08a66
Add LayoutLMv2 + LayoutXLM (#12604)
* First commit

* Make style

* Fix dummy objects

* Add Detectron2 config

* Add LayoutLMv2 pooler

* More improvements, add documentation

* More improvements

* Add model tests

* Add clarification regarding image input

* Improve integration test

* Fix bug

* Fix another bug

* Fix another bug

* Fix another bug

* More improvements

* Make more tests pass

* Make more tests pass

* Improve integration test

* Remove gradient checkpointing and add head masking

* Add integration test

* Add LayoutLMv2ForSequenceClassification to the tests

* Add LayoutLMv2ForQuestionAnswering

* More improvements

* More improvements

* Small improvements

* Fix _LazyModule

* Fix fast tokenizer

* Move sync_batch_norm to a separate method

* Replace dummies by requires_backends

* Move calculation of visual bounding boxes to separate method + update README

* Add models to main init

* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* More improvements

* Remove is_split_into_words

* More improvements

* Simply tesseract - no use of pandas anymore

* Add LayoutLMv2Processor

* Update is_pytesseract_available

* Fix bugs

* Improve feature extractor

* Fix bug

* Add print statement

* Add truncation of bounding boxes

* Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer

* Improve tokenizer tests

* Make more tokenizer tests pass

* Make more tests pass, add integration tests

* Finish integration tests

* More improvements

* More improvements - update API of the tokenizer

* More improvements

* Remove support for VQA training

* Remove some files

* Improve feature extractor

* Improve documentation and one more tokenizer test

* Make quality and small docs improvements

* Add batched tests for LayoutLMv2Processor, remove fast tokenizer

* Add truncation of labels

* Apply suggestions from code review

* Improve processor tests

* Fix failing tests and add suggestion from code review

* Fix tokenizer test

* Add detectron2 CI job

* Simplify CI job

* Comment out non-detectron2 jobs and specify number of processes

* Add pip install torchvision

* Add durations to see which tests are slow

* Fix tokenizer test and make model tests smaller

* Frist draft

* Use setattr

* Possible fix

* Proposal with configuration

* First draft of fast tokenizer

* More improvements

* Enable fast tokenizer tests

* Make more tests pass

* Make more tests pass

* More improvements

* Addd padding to fast tokenizer

* Mkae more tests pass

* Make more tests pass

* Make all tests pass for fast tokenizer

* Make fast tokenizer support overflowing boxes and labels

* Add support for overflowing_labels to slow tokenizer

* Add support for fast tokenizer to the processor

* Update processor tests for both slow and fast tokenizers

* Add head models to model mappings

* Make style & quality

* Remove Detectron2 config file

* Add configurable option to label all subwords

* Fix test

* Skip visual segment embeddings in test

* Use ResNet-18 backbone in tests instead of ResNet-101

* Proposal

* Re-enable all jobs on CI

* Fix installation of tesseract

* Fix failing test

* Fix index table

* Add LayoutXLM doc page, first draft of code examples

* Improve documentation a lot

* Update expected boxes for Tesseract 4.0.0 beta

* Use offsets to create labels instead of checking if they start with ##

* Update expected boxes for Tesseract 4.1.1

* Fix conflict

* Make variable names cleaner, add docstring, add link to notebooks

* Revert "Fix conflict"

This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.

* Revert to make integration test pass

* Apply suggestions from @LysandreJik's review

* Address @patrickvonplaten's comments

* Remove fixtures DocVQA in favor of dataset on the hub

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-08-30 12:35:42 +02:00
Hwijeen Ahn
439e7abd2d
use float 16 in causal mask and masked bias (#13194) 2021-08-30 06:09:24 -04:00
Nicolas Patry
8be921f9de
Announcing the default model used by the pipeline (with a link). (#13276) 2021-08-30 06:04:30 -04:00
Patrick von Platen
a75db353c4
[Slow tests] Disable Wav2Vec2 pretraining test for now (#13303)
* fix_torch_device_generate_test

* remove @

* wav2vec2 pretraining

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-08-30 06:03:02 -04:00
Patrick von Platen
4362ee298a
correct (#13304) 2021-08-30 06:02:08 -04:00
Stefan Schweter
4046e66e40
examples: only use keep_linebreaks when reading TXT files (#13320)
* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples
2021-08-28 16:22:29 +02:00
Anton Lozhkov
b6f332ecaf
Add Wav2Vec2 & Hubert ForSequenceClassification (#13153)
* Add hubert classifier + tests

* Add hubert classifier + tests

* Dummies for all classification tests

* Wav2Vec2 classifier + ER test

* Fix hubert integration tests

* Add hubert IC

* Pass tests for all classification tasks on Hubert

* Pass all tests + copies

* Move models to the SUPERB org
2021-08-27 20:52:51 +03:00
Patrick von Platen
2bef3433e5
[Flax] Correct all return tensors to numpy (#13307)
* fix_torch_device_generate_test

* remove @

* finish find and replace
2021-08-27 17:38:34 +02:00
Nicolas Patry
8aa67fc192
Fixing mbart50 with return_tensors argument too. (#13301)
* Fixing mbart50 with `return_tensors` argument too.

* Adding mbart50 tokenization tests.
2021-08-27 17:22:06 +02:00
Nicolas Patry
b89a964d3f
Moving zero-shot-classification pipeline to new testing. (#13299)
* Moving `zero-shot-classification` pipeline to new testing.

* Cleaning up old mixins.

* Fixing tests
`sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english` is
corrupted in PT.

* Adding warning.
2021-08-27 15:46:11 +02:00
NielsRogge
cc27ac1a87
Fix BeitForMaskedImageModeling (#13275)
* First pass

* Fix docs of bool_masked_pos

* Add integration script

* Fix docstring

* Add integration test for BeitForMaskedImageModeling

* Remove file

* Fix docs
2021-08-27 09:09:57 -04:00
Nicolas Patry
a3f96f366a
Moving translation pipeline to new testing scheme. (#13297)
* Moving `translation` pipeline to new testing scheme.

* Update tokenization mbart tests.
2021-08-27 12:26:17 +02:00
Stefan Schweter
319d840b46
examples: add keep_linebreaks option to CLM examples (#13150)
* examples: add keep_linebreaks option to text dataset loader for all CLM examples

* examples: introduce new keep_linebreaks option as data argument in CLM examples
2021-08-27 11:35:45 +02:00
Nicolas Patry
45a8eb66bb
Moving token-classification pipeline to new testing. (#13286)
* Moving `token-classification` pipeline to new testing.

* Fix tests.
2021-08-27 11:24:56 +02:00
Nicolas Patry
a6e36558ef
Moving text-generation pipeline to new testing framework. (#13285)
* Moving `text-generation` pipeline to new testing framework.

* Keep check_model_type but log instead of raise Exception.

* warning -> error.
2021-08-26 17:30:03 +02:00
NielsRogge
0759f2510c
Add DINO conversion script (#13265)
* First commit

* Add interpolation of patch embeddings

* Comment out code

* Fix bug

* Fix another bug

* Fix bug

* Fix another bug

* Remove print statements

* Update conversion script

* Use the official vit implementation

* Add support for converting dino_vits8

* Add DINO to docs of ViT

* Remove assertion

* Add interpolation of position encodings

* Fix bug

* Add align_corners

* Add interpolate_pos_encoding option to forward pass of ViTModel

* Improve interpolate_pos_encoding method

* Add docstring
2021-08-26 17:25:20 +02:00
Nicolas Patry
14e52783f6
Moving text2text-generation to new pipeline testing mecanism. (#13283) 2021-08-26 16:26:58 +02:00
Nicolas Patry
662b143b71
Hotfixing master tests. (#13282) 2021-08-26 10:09:53 -04:00
Nicolas Patry
59c378d069
Moving text2text-generation to new pipeline testing mecanism. (#13281) 2021-08-26 16:09:48 +02:00
Nicolas Patry
0ebda5382b
Moving table-question-answering pipeline to new testing. (#13280) 2021-08-26 09:09:57 -04:00
Nicolas Patry
879fe8fa75
Moving summarization pipeline to new testing format. (#13279)
* Moving `summarization` pipeline to new testing format.

* Remove generate_kwargs from __init__ args.
2021-08-26 14:47:11 +02:00
Nicolas Patry
55fb88d369
Moving question_answering tests to the new testing scheme. Had to tweak a little some ModelTesterConfig for pipelines. (#13277)
* Moving question_answering tests to the new testing scheme. Had to tweak
a little some ModelTesterConfig for pipelines.

* Removing commented code.
2021-08-26 12:37:55 +02:00
Nicolas Patry
4fa1cd995c
Fixing the test (warnings was incorrect.) (#13278) 2021-08-26 06:13:48 -04:00
Nicolas Patry
6b586ed18c
Move image-classification pipeline to new testing (#13272)
- Enforce `test_small_models_{tf,pt}` methods to exist (enforce checking
actual values in small tests)
- Add support for non RGB image for the pipeline.
2021-08-26 05:52:49 -04:00
Bram Vanroy
401377e679
Add error message concerning revision (#13266)
* add error message concerning revision

* Update src/transformers/configuration_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* re-add double line endings

* is not None instead of implicit bool casting

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-08-26 04:32:57 -04:00
Stas Bekman
40d60e1536
fix tokenizer_class_from_name for models with - in the name (#13251)
* fix tokenizer_class_from_name

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* add test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-08-26 04:29:14 -04:00
Nicolas Patry
83bfdbdd75
Migrating conversational pipeline tests to new testing format (#13114)
* New test format for conversational.

* Putting back old mixin.

* Re-enabling auto tests with LazyLoading.

* Feature extraction tests.

* Remove feature-extraction.

* Feature extraction with feature_extractor (No pun intended).

* Update check_model_type for fill-mask.
2021-08-26 03:50:43 -04:00
Lysandre Debut
72eefb34a9
Add require flax to test (#13260) 2021-08-25 12:56:25 -04:00
Lysandre Debut
5af8df5afb
Some model_types cannot be in the mapping (#13259)
* Some tokenizers cannot be in the mapping

* Style
2021-08-25 12:56:16 -04:00
Lysandre Debut
68b6907290
Add CLIP tokenizer to AutoTokenizer (#13258) 2021-08-25 12:56:07 -04:00
Lysandre Debut
3bbe68f837
Hubert test fix (#13261) 2021-08-25 18:41:26 +02:00
Lysandre Debut
3bb4466260
Better notification service (#13267) 2021-08-25 12:14:44 -04:00
Nishant Prabhu
225de5ccbb
Replace assert statement with if condition and ValueError (#13263) 2021-08-25 12:14:03 -04:00
Lysandre
46554fc12f Grad enabled typo 2021-08-25 11:39:45 +02:00
Lysandre Debut
0e4f727069
Remove side effects of disabling gradient computaiton (#13257) 2021-08-25 05:32:51 -04:00
Will Frey
b1198a8440
Update generation_logits_process.py (#12671)
If you're using type hints, then passing an `int` where a `float` is annotated is acceptable as per [PEP 484](https://www.python.org/dev/peps/pep-0484/#the-numeric-tower).

This makes life a little nicer.
2021-08-25 02:34:05 +08:00
dependabot[bot]
0245cee469
Bump notebook from 6.1.5 to 6.4.1 in /examples/research_projects/lxmert (#13226)
Bumps [notebook](http://jupyter.org) from 6.1.5 to 6.4.1.

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-24 09:52:39 -04:00
Ambesh Shekhar
0512bfe79e
Custom errors and BatchSizeError (#13184)
* Adding custom errors and BatchSizeError for GPT2

* Adding custom errors and BatchSizeError for GPT2

* Changing Exception to BaseException

* Exception

* Adding args to Custom Exception

* Adding args to Custom Exception

* Changing from BaseException to Exception

* Changing Conditional loop syntax

* Adding Copyright info

* Handling check_code_quality

* Handling check_code_quality pt2

* Handling check_code_quality pt3

* Handling check_code_quality pt4

* Handling check_code_quality pt5

* Handling check_code_quality pt6

* Handling check_code_quality pt6

* Using black for check_code_quality

* sorting import style

* Changing

* Changing

* verified through style_doc.py

* verified through style_doc.py

* applying isort

* Removing indentation

* Changing

* Changing

* Changing

* Used ValueError

* Using ValueError

* Reformatted Style doc

* Using style doc on modeling_gp2.py

* Adding indentation

* Changing
2021-08-24 09:01:01 -04:00