<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Pipelines

The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
[task summary](../task_summary) for examples of use.

There are two categories of pipeline abstractions to be aware of:

- The [`pipeline`], which is the most powerful object encapsulating all other pipelines.
- Task-specific pipelines, available for [audio](#audio), [computer vision](#computer-vision), [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks; these can also be instantiated directly, as in the sketch below.
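
For example, a task-specific pipeline can be built straight from a model and tokenizer. A minimal sketch, assuming a
standard sequence-classification checkpoint (the name below is only an example):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# The checkpoint name is only an example; any sequence-classification checkpoint works.
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Task-specific pipelines take the model and tokenizer directly.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
print(classifier("This restaurant is awesome"))
# [{'label': 'POSITIVE', 'score': ...}]
```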

## The pipeline abstraction

The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated like any other
pipeline but can provide additional quality-of-life features.

Simple call on one item:

```python
>>> from transformers import pipeline

>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
```

If you want to use a specific model from the [hub](https://huggingface.co), you can omit the task if the model on
the hub already defines it:

```python
>>> pipe = pipeline(model="FacebookAI/roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
```

To call a pipeline on many items, you can call it with a *list*:

```python
>>> pipe = pipeline("text-classification")
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
```

To iterate over full datasets, it is recommended to use a `dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item
# as we're not interested in the *target* part of the dataset. For sentence pairs use KeyPairDataset.
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
```

For ease of use, a generator is also possible:

```python
from transformers import pipeline

pipe = pipeline("text-classification")


def data():
    while True:
        # This could come from a dataset, a database, a queue or an HTTP request
        # in a server.
        # Caveat: because this is iterative, you cannot use `num_workers > 1`
        # to preprocess data on multiple threads. You can still have one thread
        # doing the preprocessing while the main thread runs the big inference.
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # {'label': 'POSITIVE', 'score': 0.9998743534088135}
    # {'label': ..., 'score': ...}
    # ....
```

[[autodoc]] pipeline

## Pipeline batching

All pipelines can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists or a `Dataset` or a `generator`).

```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
    # [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
    # Exactly the same output as before, but the contents are passed
    # as batches to the model
```

<Tip warning={true}>

However, this is not automatically a win for performance. It can be either a 10x speedup or a 5x slowdown depending
on hardware, data and the actual model being used.

Example where it's mostly a speedup:

</Tip>

```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```

```
# On GTX 970
------------------------------
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```

Example where it's mostly a slowdown:

```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```

This dataset yields an occasional very long sentence compared to the others. In that case, the **whole** batch needs
to be 400 tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to the high slowdown. Even
worse, on bigger batches, the program simply crashes.

```
------------------------------
Streaming no batching
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
------------------------------
Streaming batch_size=64
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
------------------------------
Streaming batch_size=256
  0%|                                                                                 | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)  # (bs, n_heads, q_length, dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
```
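
The slowdown (and the crash) comes from padding: one long outlier forces every row of its batch to be padded to the
same length. A minimal sketch of this effect, assuming a standard tokenizer (the checkpoint name is only an example):

```python
from transformers import AutoTokenizer

# The checkpoint is only an example; any tokenizer shows the same effect.
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

# 63 tiny sentences and one long outlier, like the dataset above.
batch = ["This is a test"] * 63 + ["This is a test" * 100]
encoded = tokenizer(batch, padding=True, return_tensors="pt")

# Every row is padded up to the longest sequence in the batch.
print(encoded["input_ids"].shape)  # roughly [64, ~400] instead of [64, ~6]
```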

There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. For
users, a rule of thumb is:

- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data) on GPU, then:

  - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch; measure and
    try tentatively to add it, and add OOM checks to recover when it fails (and it will fail at some point if you
    don't control the sequence_length), as in the sketch below.
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely.
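
One way to "handle OOMs nicely" is to catch the error and retry with a smaller batch size. A minimal sketch, assuming
a recent PyTorch (which exposes `torch.cuda.OutOfMemoryError`); the halving strategy and the helper itself are
hypothetical, not a `transformers` API:

```python
import torch


def run_with_oom_fallback(pipe, data, batch_size=64):
    # `pipe` is any transformers pipeline; `data` is a list or Dataset it accepts.
    # Halve the batch size on CUDA OOM until inference fits in memory.
    while batch_size >= 1:
        try:
            return list(pipe(data, batch_size=batch_size))
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            print(f"OOM at batch_size={batch_size}, halving")
            batch_size //= 2
    raise RuntimeError("out of memory even at batch_size=1")
```

Note that retrying re-runs the whole stream, so this only works with re-iterable inputs (a list or a `Dataset`), not
with a one-shot generator.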

## Pipeline chunk batching

`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might yield
multiple forward passes of a model. Under normal circumstances, this would cause issues with the `batch_size` argument.

In order to circumvent this issue, both of these pipelines are a bit specific: they are `ChunkPipeline` instead of
regular `Pipeline`. In short:

```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
```

now becomes:

```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
```

This should be very transparent to your code because the pipelines are used in the same way.

This is a simplified view, since the pipeline can handle the batching automatically! Meaning you don't have to care
about how many forward passes your inputs are actually going to trigger; you can optimize the `batch_size`
independently of the inputs. The caveats from the previous section still apply.
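
As a concrete example, a `question-answering` pipeline chunks a long context into several forward passes on its own,
while `batch_size` only controls how those chunks are grouped. A minimal sketch (the checkpoint name is only an
example):

```python
from transformers import pipeline

# The checkpoint is only an example.
qa = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad")

question = "Where does Wolfgang live?"
# A context long enough to be split into several chunks internally.
context = "My name is Wolfgang and I live in Berlin. " * 200

# One logical input, several forward passes under the hood;
# `batch_size` groups those chunks per forward pass.
print(qa(question=question, context=context, batch_size=8))
```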

## Pipeline FP16 inference

Models can be run in FP16, which can be significantly faster on GPU while saving memory. Most models will not suffer noticeable performance loss from this. The larger the model, the less likely it is to.

To enable FP16 inference, you can simply pass `torch_dtype=torch.float16` or `torch_dtype='float16'` to the pipeline constructor. Note that this only works for models with a PyTorch backend. Your inputs will be converted to FP16 internally.
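
For example, a minimal sketch (the checkpoint name is only an example):

```python
import torch
from transformers import pipeline

# The checkpoint is only an example; any PyTorch-backed model works.
pipe = pipeline(
    "text-classification",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    torch_dtype=torch.float16,
    device=0,
)
print(pipe("This restaurant is awesome"))
```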

## Pipeline custom code

If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand: the goal of the
pipelines is to be easy to use and support most cases, so `transformers` could maybe support your use case.

If you want to try it simply, you can:

- Subclass your pipeline of choice

```python
from transformers import TextClassificationPipeline, pipeline


class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here, e.g. reuse the parent postprocessing and tweak its result
        outputs = super().postprocess(model_outputs, **kwargs)
        # And here
        return outputs


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use the *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
```

That should enable you to do all the custom code you want.

## Implementing a pipeline

[Implementing a new pipeline](../add_new_pipeline)

## Audio

Pipelines available for audio tasks include the following.

### AudioClassificationPipeline

[[autodoc]] AudioClassificationPipeline
    - __call__
    - all

### AutomaticSpeechRecognitionPipeline

[[autodoc]] AutomaticSpeechRecognitionPipeline
    - __call__
    - all

### TextToAudioPipeline

[[autodoc]] TextToAudioPipeline
    - __call__
    - all

### ZeroShotAudioClassificationPipeline

[[autodoc]] ZeroShotAudioClassificationPipeline
    - __call__
    - all

## Computer vision

Pipelines available for computer vision tasks include the following.

### DepthEstimationPipeline

[[autodoc]] DepthEstimationPipeline
    - __call__
    - all

### ImageClassificationPipeline

[[autodoc]] ImageClassificationPipeline
    - __call__
    - all

### ImageSegmentationPipeline

[[autodoc]] ImageSegmentationPipeline
    - __call__
    - all

### ImageToImagePipeline

[[autodoc]] ImageToImagePipeline
    - __call__
    - all

### ObjectDetectionPipeline

[[autodoc]] ObjectDetectionPipeline
    - __call__
    - all

### VideoClassificationPipeline

[[autodoc]] VideoClassificationPipeline
    - __call__
    - all

### ZeroShotImageClassificationPipeline

[[autodoc]] ZeroShotImageClassificationPipeline
    - __call__
    - all

### ZeroShotObjectDetectionPipeline

[[autodoc]] ZeroShotObjectDetectionPipeline
    - __call__
    - all

## Natural Language Processing

Pipelines available for natural language processing tasks include the following.

### FillMaskPipeline

[[autodoc]] FillMaskPipeline
    - __call__
    - all

### QuestionAnsweringPipeline

[[autodoc]] QuestionAnsweringPipeline
    - __call__
    - all

### SummarizationPipeline

[[autodoc]] SummarizationPipeline
    - __call__
    - all

### TableQuestionAnsweringPipeline

[[autodoc]] TableQuestionAnsweringPipeline
    - __call__

### TextClassificationPipeline

[[autodoc]] TextClassificationPipeline
    - __call__
    - all

### TextGenerationPipeline

[[autodoc]] TextGenerationPipeline
    - __call__
    - all

### Text2TextGenerationPipeline

[[autodoc]] Text2TextGenerationPipeline
    - __call__
    - all

### TokenClassificationPipeline

[[autodoc]] TokenClassificationPipeline
    - __call__
    - all

### TranslationPipeline

[[autodoc]] TranslationPipeline
    - __call__
    - all

### ZeroShotClassificationPipeline

[[autodoc]] ZeroShotClassificationPipeline
    - __call__
    - all

## Multimodal

Pipelines available for multimodal tasks include the following.

### DocumentQuestionAnsweringPipeline

[[autodoc]] DocumentQuestionAnsweringPipeline
    - __call__
    - all

### FeatureExtractionPipeline

[[autodoc]] FeatureExtractionPipeline
    - __call__
    - all

### ImageFeatureExtractionPipeline

[[autodoc]] ImageFeatureExtractionPipeline
    - __call__
    - all

### ImageToTextPipeline

[[autodoc]] ImageToTextPipeline
    - __call__
    - all

### ImageTextToTextPipeline

[[autodoc]] ImageTextToTextPipeline
    - __call__
    - all

### MaskGenerationPipeline

[[autodoc]] MaskGenerationPipeline
    - __call__
    - all

### VisualQuestionAnsweringPipeline

[[autodoc]] VisualQuestionAnsweringPipeline
    - __call__
    - all

## Parent class: `Pipeline`

[[autodoc]] Pipeline