Fixing a regression with `return_all_scores` introduced in #17606
- The legacy test actually tested `return_all_scores=False` (the actual
default) instead of `return_all_scores=True` (the actual weird case).
This commit adds the correct legacy test and fixes it.
Tmp legacy tests.
Actually fix the regression (also contains lists)
Less diffed code.
* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and flax local attn
* Add encoder_attention_type to config
* .
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + Update tests
* Fix repo consistency
* Fix model consistency by removing flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* .
* [WIP] Add flax tglobal model
* [WIP] Update flax model to use the right attention type in the encoder
* Fix flax tglobal model forward pass
* Make the use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix pt-flax equivalence though not convinced with correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script accoring to #16853 + handle torch device placement
* Minor updates to conversion script.
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* gpu support + dtype fix
* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* * Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"
This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
* Stash caching logic in Flax model
* Make side relative bias used always
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* fix _make_global_fixed_block_ids and masked neg value
* update flax model
* style and quality
* fix imports
* remove load_tf_weights_in_longt5 from init and fix copies
* add slow test for TGlobal model
* typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* fix pipeline test
* Fix some device placements
* [wip]: Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* rename checkpoitns
* finish
* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
When we're preparing the tensors for CPU for postprocessing, we need
to upgrade the `float16` to `float32` since CPUs don't have instructions
for `[b]float16`.
* Adding `top_k` and `sort` arguments to `text-classification` pipeline.
- Deprecate `return_all_scores` as `top_k` is more uniform with other
pipelines, and a superset of what `return_all_scores` can do.
BC is maintained though.
`return_all_scores=True` -> `top_k=None`
`return_all_scores=False` -> `top_k=1`
- Using `top_k` will imply sorting the results, but using no argument
will keep the results unsorted for backward compatibility.
* Remove `sort`.
* Fixing the test.
* Remove bad doc.
* [BC] Fixing usage of text pairs
The BC is actually preventing users from misusing the pipeline since
users could have been willing to send text pairs and the pipeline would
instead understand the thing as a batch returning bogus results.
The correct usage of text pairs is preserved in this PR even when that
makes the code clunky.
Adds support for {"text":..,, "text_pair": ...} inputs for both dataset
iteration and more explicit usage to pairs.
* Updating the doc.
* Update src/transformers/pipelines/text_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines/text_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_text_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* quality.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Change the chunk_iter function to handle
the subtle cases where the last chunk gets ignored since all the
data is in the `left_strided` data.
We need to remove the right striding on the previous item.
* Remove commented line.
* Attention mask is important in the case of batching...
* Improve the fix.
* Making the sentence different enough that they exhibit different
predictions.
* Fixing the timestamps with chunking.
* The changes modified (and fixed) the striding tests.
* Adding a tokenizer test.
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Defense -> comment.
* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding the option to return_timestamps on pure CTC ASR models.
* Remove `math.prod` which was introduced in Python 3.8
* int are not floats.
* Reworking the PR to support "char" vs "word" output.
* Fixup!
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>