cody-moveworks
a54961c5f7
Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x ( #15019 )
...
* Make OpenAIGPTTokenizer work with SpaCy 3.x
SpaCy 3.x introduced an API change to creating the tokenizer that
breaks OpenAIGPTTokenizer. The old API for creating the tokenizer in
SpaCy 2.x no longer works under SpaCy 3.x, but the new API for creating
the tokenizer in SpaCy 3.x DOES work under SpaCy 2.x. Switching to the
new API should allow OpenAIGPTTokenizer to work under both SpaCy 2.x and
SpaCy 3.x versions.
* Add is_spacy_available and is_ftfy_available methods to file utils
* Add spacy and ftfy unittest decorator to testing utils
* Add tests for OpenAIGPTTokenizer that require spacy and ftfy
* Modify CircleCI config to run tests that require spacy and ftfy
* Remove unneeded unittest decorators and reuse test code
* Run make fixup
2022-01-10 07:53:20 -05:00
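
A minimal sketch of the ideas in the commit above, not the code from the PR. It assumes the "new API" is the `tokenizer` attribute of the language object, and that the availability helpers only need to look a package up without importing it. `is_package_available`, `require_spacy` as written here, and `make_english_tokenizer` are illustrative names.

```python
import importlib.util
import unittest


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def is_spacy_available() -> bool:
    return is_package_available("spacy")


def is_ftfy_available() -> bool:
    return is_package_available("ftfy")


def require_spacy(test_case):
    """Skip the decorated test when spaCy is not installed (illustrative decorator)."""
    return unittest.skipUnless(is_spacy_available(), "test requires spacy")(test_case)


def make_english_tokenizer():
    """Build a spaCy tokenizer in a way that should work on both 2.x and 3.x.

    Returns None when spaCy is not installed.
    """
    if not is_spacy_available():
        return None
    from spacy.lang.en import English

    nlp = English()
    # SpaCy 2.x-only form (assumed to be the "old API"):
    #     nlp.Defaults.create_tokenizer(nlp)
    # The attribute below exists in both major versions.
    return nlp.tokenizer
```

Checking with `find_spec` keeps the check cheap, since the package is only imported when a tokenizer is actually built.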
Kamal Raj
9fbf7c87c3
Update check_repo.py ( #15014 )
...
added new line
2022-01-10 06:55:43 -05:00
Yih-Dar
0a03a86813
fix model table cell text alignment ( #14999 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-10 06:44:11 -05:00
Patrick von Platen
d72343d2b8
[Wav2Vec2 Speech Event] Add speech event v2 ( #15083 )
...
* up
* up
* up
* up
* up
* up
* improve
* up
* up
* Update src/transformers/trainer.py
* up
* up
* up
2022-01-10 10:46:21 +01:00
yoquankara
768e6c1449
Fix convert for newer megatron-lm bert model ( #14082 )
...
* Fix convert for newer megatron-lm models
* Save megatron-bert config in a proper way
* Fix code style
2022-01-08 11:33:55 -08:00
Yih-Dar
623b4f7c63
[VisionTextDualEncoder] Add token_type_ids param ( #15073 )
...
* fix doc example - TypeError: get_text_features() got an unexpected keyword argument 'token_type_ids'
* add token_type_ids param
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-07 20:02:49 +01:00
Yih-Dar
ac224bb079
[Fix doc examples] Add missing from_pretrained ( #15044 )
...
* fix doc example - ValueError: Parameter config should be an instance of class `PretrainedConfig`
* Update src/transformers/models/segformer/modeling_segformer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-01-07 16:55:59 +01:00
K.C. Tung
f18c6fa94c
Resubmit changes after rebase to master ( #14982 )
2022-01-07 08:34:12 +01:00
Yih-Dar
cc406da4de
[VisionTextDualEncoder] Fix doc example
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-06 17:59:06 +01:00
flozi00
b67f345d00
Update run_speech_recognition_seq2seq.py ( #14967 )
2022-01-06 19:26:45 +03:00
Tavin Turner
f71fb5c36e
Add 'with torch.no_grad()' to BertGeneration integration test forward passes ( #14963 )
2022-01-06 10:39:13 -05:00
Nicolas Patry
d2183a46fb
Remove old asserts. ( #15012 )
2022-01-06 09:45:41 -05:00
NielsRogge
83c552d390
Add detectron2 to Github actions ( #15053 )
2022-01-06 08:53:58 -05:00
Matt Churgin
5ab87cd4da
wrapped forward passes in torch.no_grad() ( #15037 )
2022-01-06 08:48:49 -05:00
Nicolas Patry
5a06118b39
Enabling TF on image-classification pipeline. ( #15030 )
2022-01-06 14:16:00 +01:00
Yih-Dar
9f89fa02ed
Add Flax image captioning example ( #14864 )
...
* add image captioning example
* update README
* fix style & quality
* simplify
* apply review suggestions
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply review suggestions
* add comments about using np instead of jax array
* remove unused lines
* add model creation script
* only support from_pretrained
* fix style
* fix
* not use cache_dir when creating model
* fix tokenizer creation
* update README
* fix quality
* apply suggestion
* simplify some blocks
* Update examples/flax/image-captioning/README.md
* Update examples/flax/image-captioning/run_image_captioning_flax.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* apply suggestion
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-01-06 14:00:54 +01:00
Suraj Patil
2e9af29494
[CLIP] Fix TF test ( #15042 )
2022-01-05 16:58:42 +01:00
Patrick von Platen
443fdaf29f
[SpeechEncoderDecoder] Fix from pretrained ( #15043 )
2022-01-05 16:54:39 +01:00
Patrick von Platen
ae929dcbbd
[CLIP] Fix PT test ( #15041 )
2022-01-05 14:21:04 +01:00
Nicolas Patry
65cb94ff77
Adding QoL for `batch_size` arg (like others enabled everywhere). ( #15027 )
...
* Adding QoL for `batch_size` arg (like others enabled everywhere).
* Typo.
2022-01-05 12:16:23 +01:00
Yih-Dar
e34dd055e9
Fix doc example: mask_time_indices (numpy) has no attribute 'to' ( #15033 )
...
* fix doc example - AttributeError: 'numpy.ndarray' object has no attribute 'to'
* fix more
* Apply suggestions from code review
* Update src/transformers/models/unispeech/modeling_unispeech.py
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-05 11:34:08 +01:00
Stas Bekman
927f654427
[megatron convert] PYTHONPATH requirements ( #14956 )
...
* [megatron convert] PYTHONPATH requirements
* more info
2022-01-05 04:09:52 -05:00
Kevin Ko
857ab55c01
[doc] Update parallelism.mdx ( #15018 )
...
* Update parallelism.mdx
* Update parallelism.mdx
2022-01-04 09:58:27 -08:00
Nicolas Patry
19d37c2dd3
Hotfix `chunk_length_s` instead of `_ms`. ( #15029 )
...
* Hotfix `chunk_length_s` instead of `_ms`.
* Adding fix of `pad_token` which should be last/previous token for CTC
proper decoding
* Fixing ChunkPipeline unwrapping.
* Adding a PackIterator specific test.
2022-01-04 14:07:44 +01:00
Daniel Stancl
21aecc0971
Add Flax RoFormer ( #15005 )
...
* Add FlaxRoFormer
* Clean code + make quality
* Fix output pooling for FlaxRoFormerForMultipleChoiceModule
* Apply suggestions from code review
* add flax model to repos
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-04 13:23:10 +01:00
milyiyo
9e1775dd23
Fix a little typo ( #15002 )
2022-01-04 12:59:47 +01:00
flozi00
774ed4a027
Fix Code block ( #14983 )
2022-01-04 12:59:20 +01:00
Kevin Ko
f2ab21833f
Update parallelism.mdx ( #15013 )
...
* Update parallelism.mdx
* Update parallelism.mdx
* Update parallelism.mdx
* Update parallelism.mdx
* Update parallelism.mdx
* Update parallelism.mdx
* Update parallelism.mdx
* Update parallelism.mdx
2022-01-03 11:49:27 -08:00
Patrick von Platen
dbac8899fe
[Tests] Correct Wav2Vec2 & WavLM tests ( #15015 )
...
* up
* up
* up
2022-01-03 20:19:04 +01:00
Yih-Dar
0b4c3a1a53
fix missing import ( #15016 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-03 19:11:47 +01:00
Anton Lozhkov
38f95d1846
Large audio chunking for the existing ASR pipeline ( #14896 )
...
* Naive ASR chunking
* Fixing batching for ASR.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-01-03 16:54:17 +01:00
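
For readers unfamiliar with the idea, naive chunking can be sketched as below. This is an illustration under stated assumptions, not the pipeline's implementation: chunks are non-overlapping, the length is given in seconds (as in the `chunk_length_s` hotfix above), and `chunk_audio` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence


def chunk_audio(
    samples: Sequence[float], sampling_rate: int, chunk_length_s: float
) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping chunks of roughly chunk_length_s seconds."""
    # Convert the duration in seconds to a number of samples.
    chunk_len = int(round(chunk_length_s * sampling_rate))
    if chunk_len <= 0:
        raise ValueError("chunk_length_s must cover at least one sample")
    for start in range(0, len(samples), chunk_len):
        # The last chunk may be shorter than chunk_len.
        yield list(samples[start:start + chunk_len])
```

Each chunk can then be transcribed separately and the texts joined. Cutting at arbitrary sample positions can split a word in two, which is why this approach is called naive.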
Nicolas Patry
d33dc7966a
Improve truncation_side ( #14947 )
...
* Enabling `truncation_side` for Slow and Fast tokenizer.
Co-Authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>
* Disable failing tests.
* Layout xlm.
* assert -> assertEqual.
Co-authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>
2022-01-03 16:18:39 +01:00
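
What a truncation side means can be shown on a plain list of token ids. This is a sketch only; `truncate` is a hypothetical helper and does not reproduce the tokenizer code, which also has to handle pairs, special tokens and overflow.

```python
from typing import List, Sequence


def truncate(ids: Sequence[int], max_length: int, truncation_side: str = "right") -> List[int]:
    """Keep at most max_length ids, dropping from the chosen side."""
    if truncation_side not in ("left", "right"):
        raise ValueError(f"truncation_side must be 'left' or 'right', got {truncation_side!r}")
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if len(ids) <= max_length:
        return list(ids)
    if truncation_side == "right":
        # Drop the end of the sequence, keep the beginning.
        return list(ids[:max_length])
    # Drop the beginning of the sequence, keep the end.
    return list(ids[len(ids) - max_length:])
```

Left truncation is useful when the end of the input matters most, for example the latest turns of a dialogue.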
Nicolas Patry
8c2618e6aa
Fixing t2t pipelines lists outputs. ( #15008 )
...
Backward compatibility broken in
https://github.com/huggingface/transformers/pull/14988
2022-01-03 14:49:58 +01:00
Sylvain Gugger
8f6373c61c
Map model_type and doc pages names ( #14944 )
...
* Map model_type and doc pages names
* Add script
* Fix typo
* Quality
* Manual check for Auto
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2022-01-03 05:08:55 -05:00
Sylvain Gugger
e68c3756fe
Allow training to resume even if RNG states are not properly loaded ( #14994 )
...
* Allow training to resume even if RNG states are not properly loaded
* Proper f-string
2021-12-30 17:03:20 -05:00
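
The behaviour described above (warn and continue rather than abort) can be sketched with Python's own `random` module. This is an assumption-laden illustration: the real Trainer restores several RNG states, and `load_rng_state` plus the pickle file layout are hypothetical.

```python
import pickle
import random
import warnings


def load_rng_state(path: str) -> bool:
    """Try to restore Python's RNG state from path.

    Returns True on success. On any failure, warns and returns False so the
    caller can keep resuming training, at the cost of exact reproducibility.
    """
    try:
        with open(path, "rb") as f:
            state = pickle.load(f)
        random.setstate(state)
    except Exception as exc:
        warnings.warn(
            f"Could not load RNG state from {path}: {exc}. "
            "Resuming without it, so results may not be reproducible."
        )
        return False
    return True
```

The message is built with an f-string so the failing path and the underlying error both show up in the warning.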
Nicolas Patry
08cb5718ec
Enabling `tokenizers` upgrade. ( #14941 )
...
* Enabling `tokenizers` upgrade.
* Moved ugly comment.
* Tokenizers==0.11.1 needs an update to keep borrow checker
happy in highly contiguous calls.
* Support both 0.11.1 and 0.11.0
2021-12-30 17:30:58 +01:00
Nicolas Patry
f8a989cfb2
Adding `num_return_sequences` support for text2text generation. ( #14988 )
...
* Adding `num_return_sequences` support for text2text generation.
Co-Authored-By: Enze <pu.miao@foxmail.com>
* Update tests/test_pipelines_text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_pipelines_text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Enze <pu.miao@foxmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-30 16:17:15 +01:00
Patrick von Platen
c043ce6cfd
[Generate] correct encoder_outputs are passed without attention_mask ( #14980 )
...
* [Generate] correct encoder_outputs are passed without attention_mask
* Apply suggestions from code review
* up
2021-12-30 10:16:03 +01:00
Patrick von Platen
a1392883ce
[AutoProcessor] Correct AutoProcessor and automatically add processor… ( #14881 )
...
* [AutoProcessor] Correct AutoProcessor and automatically add processor class
* up
* up
* up
* up
* up
* up
* up
* up
* continue tomorrow
* up
* up
* up
* make processor class private
* fix loop
2021-12-30 09:56:43 +01:00
Nicolas Patry
d7d60df0ec
Fixing a pathological case for slow tokenizers ( #14981 )
...
* Fixing a pathological case for slow tokenizers
* Update src/transformers/tokenization_utils.py
2021-12-30 09:10:34 +01:00
Stas Bekman
d1ba56d8d8
remove absl workaround as it's no longer needed ( #14909 )
...
the absl workaround hasn't been needed since 2019-04 https://github.com/abseil/abseil-py/issues/99 so it should be safe to remove it.
2021-12-29 17:18:03 -05:00
Jake Tae
04cddaf402
refactor: replace `assert` with `ValueError` ( #14970 )
2021-12-29 10:09:54 -05:00
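
The kind of change this refactor describes looks roughly like the sketch below. The function and its parameters are made up for illustration and are not taken from the PR.

```python
def head_dim(hidden_size: int, num_heads: int) -> int:
    """Return the per-head dimension, validating the inputs explicitly."""
    # Before (illustrative):
    #     assert hidden_size % num_heads == 0, "hidden_size must be divisible by num_heads"
    # Assertions are stripped when Python runs with -O, so the check would vanish.
    if num_heads <= 0 or hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by a positive num_heads ({num_heads})"
        )
    return hidden_size // num_heads
```

Raising `ValueError` also gives callers a specific exception type to catch, instead of a bare `AssertionError`.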
Patrick von Platen
600496fa50
[Wav2Vec2] Rename model's feature extractor to feature encoder ( #14959 )
...
* rename classes
* clean up more namings
* remove bogus file
* Apply suggestions from code review
* Apply suggestions from code review
* replace more names
* more regex replace
* make style
* correct
* correct more
* make style
* finish
* correct more in wav2vec2
* make style
* improve freeze_extractor
* add aliases
* add tf aliases
2021-12-28 20:33:23 +01:00
Patrick von Platen
1bfa347707
[Tests] Speed up tokenizer tests ( #14964 )
...
* speed up canine and mluke
* speed up mbart and mbart50 toks
* upload files
2021-12-28 17:02:50 +01:00
Patrick von Platen
f80775df2b
Update README.md ( #14965 )
2021-12-28 13:41:27 +01:00
Patrick von Platen
1e847b40c0
[WavLM] give model more precision ( #14958 )
2021-12-28 11:07:05 +01:00
Patrick von Platen
1c121916f3
Add Speech Seq2Seq Training script ( #14792 )
...
* start
* add gradient checkpointing and feature extractor freezing
* Apply suggestions from code review
* up
* up
* up
* correct
* up
* more changes
* up
* up
* up
* remove rst
2021-12-28 10:20:51 +01:00
Stas Bekman
10fd4fa1a6
[doc] :class: hunt ( #14955 )
...
* [doc] :class: hunt
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix the fix + style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-27 17:17:38 -08:00
Sylvain Gugger
2c5597f6c7
Style
2021-12-27 19:18:08 -05:00
Sylvain Gugger
b5e2b183af
Doc styler examples ( #14953 )
...
* Fix bad examples
* Add black formatting to style_doc
* Use first nonempty line
* Put it at the right place
* Don't add spaces to empty lines
* Better templates
* Deal with triple quotes in docstrings
* Result of style_doc
* Enable mdx treatment and fix code examples in MDXs
* Result of doc styler on doc source files
* Last fixes
* Break copy from
2021-12-27 19:07:46 -05:00