* Adding `AutomaticSpeechRecognitionPipeline`.
- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not done to support streaming audio right now.
Future work:
- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.
* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adding docs + main import + type checking + LICENSE.
* Doc style !.
* Fixing TYPE_HINT.
* Specifying waveform shape in the docs.
* Adding asserts + specify in the documentation the shape of the input
np.ndarray.
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding require to tests + move the `feature_extractor` doc.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* removed max_len
* removed max_length from BeamSearchScorer
* correct max length
* finish
* del vim
* finish & add test
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add cross_attn_head_mask to BART
* Fix cross_attentions in TFBart-like models
* This commit enables returning of `cross_attentions`
for TFBart-like models
* It also fixes attention head masking in cross-attenion module
* Update TF model templates
* Fix missing , in TF model templates
* Fix typo: congig -> config
* removes the creation of separate config objects and uses the existing ones instead+overwrite resize_token_embeddings from parent class because it is not working for the EncoderDecoderModel
* rollback to current version of the huggingface master branch
* reworked version that ties the encoder and decoder config of the parent encoderdecoder instance
* overwrite of resize_token_embeddings throws an error now
* review comment suggestion
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* implemented warning in case encoderdecoder is created with differing configs of encoderdecoderconfig and decoderconfig or encoderconfig
* added test to avoid diverging configs of wrapper class and wrapped classes
* Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py
* make style
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add head_mask & decoder_head_mask + some corrections
* Fix head masking for N-grams
* Enable test_headmasking for encoder and decod
* Fix one typo regarding in modeling_propgetnet.py
* Enable test_headmasking for ProphetNetStandaloneDecoderModelTest
and ProphetNetStandaloneEncoderModelTest in test_modeling_prophetnet.py
* make style
* Fix cross_head_mask
* Fix attention head mask naming
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Still need to merge #10605 to master to pass the tests
* Fix cross-attention head mask for Torch BART models
* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus
* Enable test_headmasking for M2M_100 model
* Fix cross_head_mask for FSMT, LED and T5
* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5
* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`
* Update template
* Fix template for BartForCausalLM
* Fix cross_head_mask for Speech2Text models
* Fix cross_head_mask in templates
* Fix args order in BartForCausalLM template
* Fix doc in BART templates
* Make more explicit naming
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Fix doc
* make style quality
* Fix speech2text docstring
* Initial support for upload to hub
* push -> upload
* Fixes + examples
* Fix torchhub test
* Torchhub test I hate you
* push_model_to_hub -> push_to_hub
* Apply mixin to other pretrained models
* Remove ABC inheritance
* Add tests
* Typo
* Run tests
* Install git-lfs
* Change approach
* Add push_to_hub to all
* Staging test suite
* Typo
* Maybe like this?
* More deps
* Cache
* Adapt name
* Quality
* MOAR tests
* Put it in testing_utils
* Docs + torchhub last hope
* Styling
* Wrong method
* Typos
* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address review comments
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed `max_length` from being mandatory within `generate`.
- Moving on to fully using `StoppingCriteria` for `greedy` and `sample`
modes.
- `max_length` still used for `beam_search` and `group_beam_search`
(Follow up PR)
- Fixes a bug with MaxLengthStoppingCriteria (we should stop as soon a
we hit the max_length, the comparison needs to be or equal, that affects
the tests).
- Added options to use `logits_processor` and `stopping_criteria`
directly within `generate` function (so some users can define their own
`logits_processor` and `stopping_criteria`).
- Modified the backward compat tests to make sure we issue a warning.
* Fix `max_length` argument in `generate`.
* Moving validate to being functional.
- Renamed `smax_length` to `stoppping_max_length`.
* Removing `logits_processor` and `stopping_criteria` from `generate`
arguments.
* Deepcopy.
* Fix global variable name.
* Bulk of the work
* Polish and tests
* Update QA Trainer
* Avoid breaking the predict method
* Deprecation warnings
* Store real eval dataloder
* Get eval dataset reference before wrap
* [WIP] Enabling multilingual models for translation pipelines.
* decoder_input_ids -> forced_bos_token_id
* Improve docstring.
* Rebase
* Fixing 2 bugs
- Type token_ids coming from `_parse_and_tokenize`
- Wrong index from tgt_lang.
* Fixing black version.
* Adding tests for _build_translation_inputs and add them for all
tokenizers.
* Mbart actually puts the lang code at the end.
* Fixing m2m100.
* Adding TF support to `deep_round`.
* Update src/transformers/pipelines/text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding one line comment.
* Fixing M2M100 `_build_translation_input_ids`, and fix the call site.
* Fixing tests + deep_round -> nested_simplify
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft of deit
* More improvements
* Remove DeiTTokenizerFast from init
* Conversion script works
* Add DeiT to ViT conversion script
* Add tests, add head model, add support for deit in vit conversion script
* Update model checkpoint names
* Update image_mean and image_std, set resample to bicubic
* Improve docs
* Docs improvements
* Add DeiTForImageClassificationWithTeacher to init
* Address comments by @sgugger
* Improve feature extractors
* Make fix-copies
* Minor fixes
* Address comments by @patil-suraj
* All models uploaded
* Fix tests
* Remove labels argument from DeiTForImageClassificationWithTeacher
* Fix-copies, style and quality
* Fix tests
* Fix typo
* Multiple docs improvements
* More docs fixes
* Add a special tokenizer for CPM model
* make style
* fix
* Add docs
* styles
* cpm doc
* fix ci
* fix the overview
* add test
* make style
* typo
* Custom tokenizer flag
* Add REAMDE.md
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* AutoFeatureExtractor
* Init and first tests
* Tests
* Damn you gitignore
* Quality
* Defensive test for when not all backends are here
* Use pattern for Speech2Text models
* better names
* add attention mixin
* all slow tests in one class
* make helper methods static so we can test
* add local attention tests
* better names
* doc
* apply review suggestions