transformers/tests/pipelines
Luc CAILLIAU d62e7d8842
Chunkable token classification pipeline (#21771)
* Chunkable classification pipeline 

The TokenClassificationPipeline can now process sequences longer than 512 tokens, regardless of the framework, model, or tokenizer. Just pass process_all=True and, optionally, a stride value. The behavior is unchanged if these optional parameters are omitted. When stride is above 0, chunks overlap, and for each overlapped token only the maximum score across all chunks containing that token is kept.
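The chunking and overlap rule described above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code. The helper names `chunk_with_stride` and `merge_chunk_scores` are hypothetical. It assumes `stride` is the number of tokens shared by consecutive chunks. It also reads "max scores for each overlapped token" as "keep the score vector from the chunk that is most confident about that token", which is an interpretation.

```python
from typing import List, Optional, Tuple


def chunk_with_stride(n_tokens: int, max_len: int, stride: int) -> List[Tuple[int, int]]:
    """Split n_tokens into [start, end) windows of at most max_len tokens.

    Assumption: consecutive windows share `stride` tokens, so each new window
    starts (max_len - stride) tokens after the previous one.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    windows = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        windows.append((start, end))
        if end == n_tokens:  # the last window reaches the end of the text
            break
        start += step
    return windows


def merge_chunk_scores(
    n_tokens: int,
    windows: List[Tuple[int, int]],
    chunk_scores: List[List[List[float]]],
) -> List[Optional[List[float]]]:
    """chunk_scores[i][j] is the label-score vector for token windows[i][0] + j.

    For a token covered by several chunks, keep the vector whose top score is
    highest (hypothetical reading of the "max scores" rule).
    """
    merged: List[Optional[List[float]]] = [None] * n_tokens
    for (start, end), scores in zip(windows, chunk_scores):
        for offset, vec in enumerate(scores[: end - start]):
            pos = start + offset
            if merged[pos] is None or max(vec) > max(merged[pos]):
                merged[pos] = vec
    return merged
```

For example, `chunk_with_stride(10, 4, 2)` yields four windows, each starting two tokens after the previous one. Only tokens in the shared regions go through the max comparison. Every other token keeps the scores from the single chunk that saw it.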

* Update token_classification.py

* update with latest black format

* update black format

* Update token_classification.py

* format correction

* Update token_classification.py

* Update comments

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* Update token_classification.py

Correct spaces, remove process_all and keep only stride. If stride is provided, the pipeline is applied to the whole text.

* Update token_classification.py

* Update chunk aggregation

Update the chunk aggregation strategy based on entities aggregation.
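Entity-based chunk aggregation can be illustrated with a short sketch. Entities are first aggregated within each chunk, and then overlapping spans coming from different chunks are reconciled. The rule used here is an assumption, not necessarily what the pipeline does: sort by start offset and, when an entity starts inside the current one, keep the longer span, breaking ties by the higher score. The function name `resolve_overlapping_entities` and the dict keys (`start`, `end`, `score`) are hypothetical.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def resolve_overlapping_entities(entities: List[Entity]) -> List[Entity]:
    """Merge entities from overlapping chunks (hypothetical rule).

    When an entity starts inside the currently kept one, keep whichever span
    is longer; for equal lengths keep the higher score.
    """
    if not entities:
        return []
    ordered = sorted(entities, key=lambda e: e["start"])
    kept: List[Entity] = []
    current = ordered[0]
    for entity in ordered[1:]:
        if current["start"] <= entity["start"] < current["end"]:
            cur_len = current["end"] - current["start"]
            new_len = entity["end"] - entity["start"]
            if new_len > cur_len or (new_len == cur_len and entity["score"] > current["score"]):
                current = entity
        else:
            kept.append(current)  # no overlap: the current entity is final
            current = entity
    kept.append(current)
    return kept
```

Preferring the longer span favors the chunk that saw more of an entity. This helps when a chunk boundary cuts a multi-token entity in half.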

* Update token_classification.py

Remove unnecessary pop from outputs dict

* Update token_classification.py

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add chunking tests

* correct formatting

* correct formatting

* correct model id for test chunking

* update scores with nested simplify

* Update test_pipelines_token_classification.py

* update model to a tiny one

* Update test_pipelines_token_classification.py

* Adding smaller test for chunking.

* Fixup

* Update token_classification.py

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-22 14:13:20 -04:00
__init__.py [Test refactor 1/5] Per-folder tests reorganization (#15725) 2022-02-23 15:46:28 -05:00
test_pipelines_audio_classification.py Update AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 (#22023) 2023-03-08 13:56:47 +01:00
test_pipelines_automatic_speech_recognition.py Refactor whisper asr pipeline to include language too. (#21427) 2023-03-02 18:12:19 +01:00
test_pipelines_common.py Regression pipeline device (#22190) 2023-03-15 14:13:38 -04:00
test_pipelines_conversational.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_depth_estimation.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_document_question_answering.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_feature_extraction.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_fill_mask.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_image_classification.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_image_segmentation.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_image_to_text.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_object_detection.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_question_answering.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_summarization.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_table_question_answering.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_text_classification.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_text_generation.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_text2text_generation.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_token_classification.py Chunkable token classification pipeline (#21771) 2023-03-22 14:13:20 -04:00
test_pipelines_translation.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_video_classification.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_visual_question_answering.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_zero_shot_audio_classification.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_zero_shot_image_classification.py 🔥py38 + torch 2 🔥🔥🔥🚀 (#22204) 2023-03-16 22:59:23 +01:00
test_pipelines_zero_shot_object_detection.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00
test_pipelines_zero_shot.py Mark pipeline tests to skip them easily (#21887) 2023-03-02 10:55:36 -05:00