# Pipelines for inference The [`pipeline`] makes it simple to use any model from the [Model Hub](https://huggingface.co/models) for inference on a variety of tasks such as text generation, image segmentation and audio classification. Even if you don't have experience with a specific modality or understand the code powering the models, you can still use them with the [`pipeline`]! This tutorial will teach you to: * Use a [`pipeline`] for inference. * Use a specific tokenizer or model. * Use a [`pipeline`] for audio and vision tasks. Take a look at the [`pipeline`] documentation for a complete list of supported taska. ## Pipeline usage While each task has an associated [`pipeline`], it is simpler to use the general [`pipeline`] abstraction which contains all the specific task pipelines. The [`pipeline`] automatically loads a default model and tokenizer capable of inference for your task. 1. Start by creating a [`pipeline`] and specify an inference task: ```py >>> from transformers import pipeline >>> generator = pipeline(task="text-generation") ``` 2. Pass your input text to the [`pipeline`]: ```py >>> generator("Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone") [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Iron-priests at the door to the east, and thirteen for the Lord Kings at the end of the mountain'}] ``` If you have more than one input, pass your input as a list: ```py >>> generator( ... [ ... "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone", ... "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne", ... ] ... ) ``` Any additional parameters for your task can also be included in the [`pipeline`]. The `text-generation` task has a [`~generation_utils.GenerationMixin.generate`] method with several parameters for controlling the output. For example, if you want to generate more than one output, set the `num_return_sequences` parameter: ```py >>> generator( ... "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone", ... num_return_sequences=2, ... ) ``` ### Choose a model and tokenizer The [`pipeline`] accepts any model from the [Model Hub](https://huggingface.co/models). There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task. Once you've picked an appropriate model, load it with the corresponding `AutoModelFor` and [`AutoTokenizer'] class. For example, load the [`AutoModelForCausalLM`] class for a causal language modeling task: ```py >>> from transformers import AutoTokenizer, AutoModelForCausalLM >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2") >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2") ``` Create a [`pipeline`] for your task, and specify the model and tokenizer you've loaded: ```py >>> from transformers import pipeline >>> generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer) ``` Pass your input text to the [`pipeline`] to generate some text: ```py >>> generator("Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone") [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Dragon-lords (for them to rule in a world ruled by their rulers, and all who live within the realm'}] ``` ## Audio pipeline The flexibility of the [`pipeline`] means it can also be extended to audio tasks. For example, let's classify the emotion from a short clip of John F. Kennedy's famous ["We choose to go to the Moon"](https://en.wikipedia.org/wiki/We_choose_to_go_to_the_Moon) speech. Find an [audio classification](https://huggingface.co/models?pipeline_tag=audio-classification) model on the Model Hub for emotion recognition and load it in the [`pipeline`]: ```py >>> from transformers import pipeline >>> audio_classifier = pipeline( ... task="audio-classification", model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition" ... ) ``` Pass the audio file to the [`pipeline`]: ```py >>> audio_classifier("jfk_moon_speech.wav") [{'label': 'calm', 'score': 0.13856211304664612}, {'label': 'disgust', 'score': 0.13148026168346405}, {'label': 'happy', 'score': 0.12635163962841034}, {'label': 'angry', 'score': 0.12439591437578201}, {'label': 'fearful', 'score': 0.12404385954141617}] ``` ## Vision pipeline Finally, using a [`pipeline`] for vision tasks is practically identical. Specify your vision task and pass your image to the classifier. The imaage can be a link or a local path to the image. For example, what species of cat is shown below? ![pipeline-cat-chonk](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg) ```py >>> from transformers import pipeline >>> vision_classifier = pipeline(task="image-classification") >>> vision_classifier( ... images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg" ... ) [{'label': 'lynx, catamount', 'score': 0.4403027892112732}, {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor', 'score': 0.03433405980467796}, {'label': 'snow leopard, ounce, Panthera uncia', 'score': 0.032148055732250214}, {'label': 'Egyptian cat', 'score': 0.02353910356760025}, {'label': 'tiger cat', 'score': 0.023034192621707916}] ```