## How can I leverage State-of-the-Art Natural Language Models with only one line of code?

Newly introduced in transformers v2.3.0, **pipelines** provide a high-level, easy-to-use
API for running inference on a variety of downstream tasks, including:

- Sentence Classification (Sentiment Analysis): Indicate whether the overall sentence is positive or negative. _(Binary Classification task or Logistic Regression task)_
- Token Classification (Named Entity Recognition, Part-of-Speech tagging): For each sub-entity _(**token**)_ in the input, assign a label _(Classification task)_.
- Question-Answering: Given a tuple (question, context), the model should find the span of text in the **context** that answers the **question**.
- Mask-Filling: Suggest possible word(s) to fill the masked position in the input, given the surrounding **context**.
- Feature Extraction: Map the input to a higher, multi-dimensional space learned from the data.

Pipelines encapsulate the overall process of every NLP task:
 
 1. Tokenization: Split the initial input into multiple sub-entities with ... properties (i.e. tokens).
 2. Inference: Map every token to a more meaningful representation.
 3. Decoding: Use the above representation to generate and/or extract the final output for the underlying task.
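
The three stages above can be sketched in plain Python. This is a toy illustration only, not how transformers implements them: the whitespace tokenizer, the hand-written `TOY_SCORES` lexicon and every function name are hypothetical stand-ins for a real tokenizer and a trained model.

```python
# Toy stand-ins for the three pipeline stages (hypothetical names, not the transformers API)

# Hypothetical per-token scores standing in for a trained model
TOY_SCORES = {"nice": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}


def tokenize(text):
    """Stage 1: split the raw input into lowercase tokens."""
    return text.lower().split()


def infer(tokens):
    """Stage 2: map every token to a numeric representation (here, a single score)."""
    return [TOY_SCORES.get(token, 0.0) for token in tokens]


def decode(scores):
    """Stage 3: turn the representation into the final output for the task."""
    total = sum(scores)
    return {"label": "POSITIVE" if total >= 0 else "NEGATIVE", "total": total}


def toy_pipeline(text):
    """Chain the three stages behind a single call."""
    return decode(infer(tokenize(text)))


print(toy_pipeline("Such a nice weather outside !"))
```

A real pipeline follows the same shape, with a learned subword tokenizer in stage 1 and a neural network in stage 2.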

The overall API is exposed to the end-user through the `pipeline()` function, with the following
structure:

```python
from transformers import pipeline

# Using default model and tokenizer for the task
pipeline("<task-name>")

# Using a user-specified model
pipeline("<task-name>", model="<model_name>")

# Using a custom model and tokenizer, both given as str
pipeline("<task-name>", model="<model_name>", tokenizer="<tokenizer_name>")
```

In [None]:
!pip install transformers

In [6]:
from __future__ import print_function
import ipywidgets as widgets
from transformers import pipeline

## 1. Sentence Classification - Sentiment Analysis

In [8]:
nlp_sentence_classif = pipeline('sentiment-analysis')
nlp_sentence_classif('Such a nice weather outside !')

[{'label': 'POSITIVE', 'score': 0.9997656}]
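
The result is a list with one dictionary per input. As a small sketch of how it might be consumed, the helper below folds label and score into a single signed number. The `signed_score` name and the `'NEGATIVE'` label are assumptions (only `'POSITIVE'` appears in the recorded output), and the result is hard-coded so the snippet runs without downloading a model.

```python
# Recorded output from the cell above, copied by hand
result = [{'label': 'POSITIVE', 'score': 0.9997656}]


def signed_score(prediction):
    """Return +score for a POSITIVE prediction and -score otherwise.

    Assumes the only other label is 'NEGATIVE' (not shown in the output above).
    """
    sign = 1.0 if prediction['label'] == 'POSITIVE' else -1.0
    return sign * prediction['score']


for prediction in result:
    print(prediction['label'], signed_score(prediction))
```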

## 2. Token Classification - Named Entity Recognition

In [9]:
nlp_token_class = pipeline('ner')
nlp_token_class('Hugging Face is a French company based in New-York.')

[{'word': 'Hu', 'score': 0.9970937967300415, 'entity': 'I-ORG'},
 {'word': '##gging', 'score': 0.9345750212669373, 'entity': 'I-ORG'},
 {'word': 'Face', 'score': 0.9787060022354126, 'entity': 'I-ORG'},
 {'word': 'French', 'score': 0.9981995820999146, 'entity': 'I-MISC'},
 {'word': 'New', 'score': 0.9983047246932983, 'entity': 'I-LOC'},
 {'word': '-', 'score': 0.8913455009460449, 'entity': 'I-LOC'},
 {'word': 'York', 'score': 0.9979523420333862, 'entity': 'I-LOC'}]
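
The output is reported per sub-word token, so "Hugging" arrives as `Hu` and `##gging`; the `##` prefix marks a piece that continues the previous one. A minimal sketch for gluing such pieces back together follows. The `merge_word_pieces` helper is hypothetical, and averaging the scores and keeping the entity of the first piece are choices made for this sketch, not something the library prescribes.

```python
# Recorded output from the cell above, copied by hand
ner_output = [
    {'word': 'Hu', 'score': 0.9970937967300415, 'entity': 'I-ORG'},
    {'word': '##gging', 'score': 0.9345750212669373, 'entity': 'I-ORG'},
    {'word': 'Face', 'score': 0.9787060022354126, 'entity': 'I-ORG'},
    {'word': 'French', 'score': 0.9981995820999146, 'entity': 'I-MISC'},
    {'word': 'New', 'score': 0.9983047246932983, 'entity': 'I-LOC'},
    {'word': '-', 'score': 0.8913455009460449, 'entity': 'I-LOC'},
    {'word': 'York', 'score': 0.9979523420333862, 'entity': 'I-LOC'},
]


def merge_word_pieces(tokens):
    """Glue '##' continuation pieces onto the preceding piece.

    A merged word keeps the entity of its first piece and gets the mean
    of its pieces' scores (both are assumptions of this sketch).
    """
    words = []
    for token in tokens:
        if token['word'].startswith('##') and words:
            # Continuation piece: append its text (without '##') to the last word
            words[-1]['word'] += token['word'][2:]
            words[-1]['scores'].append(token['score'])
        else:
            words.append({
                'word': token['word'],
                'entity': token['entity'],
                'scores': [token['score']],
            })
    return [
        {'word': w['word'], 'entity': w['entity'], 'score': sum(w['scores']) / len(w['scores'])}
        for w in words
    ]


for word in merge_word_pieces(ner_output):
    print(word['word'], word['entity'])
```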

## 3. Question Answering

In [10]:
nlp_qa = pipeline('question-answering')
nlp_qa(context='Hugging Face is a French company based in New-York.', question='Where is based Hugging Face ?')

convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 225.51it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 2158.67it/s]


{'score': 0.9632966867654424, 'start': 42, 'end': 50, 'answer': 'New-York.'}
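
The `start` and `end` values are character offsets into the context. The sketch below checks them against the recorded output, which is hard-coded so no model is needed. It only describes this particular result; other versions of the library may report offsets differently, so it is worth running the same check on your own output.

```python
context = 'Hugging Face is a French company based in New-York.'
# Recorded output from the cell above, copied by hand
result = {'score': 0.9632966867654424, 'start': 42, 'end': 50, 'answer': 'New-York.'}

start, end = result['start'], result['end']

# A plain Python slice stops just before `end`, which drops the final period here
print(repr(context[start:end]))

# Extending the slice by one character reproduces the recorded answer
print(repr(context[start:end + 1]))
```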

## 4. Text Generation - Mask Filling

In [11]:
nlp_fill = pipeline('fill-mask')
nlp_fill('Hugging Face is a French company based in <mask>')

[{'sequence': '<s> Hugging Face is a French company based in Paris</s>',
  'score': 0.23106691241264343,
  'token': 2201},
 {'sequence': '<s> Hugging Face is a French company based in Lyon</s>',
  'score': 0.0819825753569603,
  'token': 12790},
 {'sequence': '<s> Hugging Face is a French company based in Geneva</s>',
  'score': 0.04769463092088699,
  'token': 11559},
 {'sequence': '<s> Hugging Face is a French company based in Brussels</s>',
  'score': 0.047622501850128174,
  'token': 6497},
 {'sequence': '<s> Hugging Face is a French company based in France</s>',
  'score': 0.04130595177412033,
  'token': 1470}]
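
Each candidate comes back as a full sequence wrapped in `<s>` and `</s>` markers. A short sketch for recovering just the proposed word is shown below. The `clean_sequence` and `filled_word` helpers are hypothetical, and the approach assumes the prompt contains exactly one `<mask>` and that the cleaned sequence reproduces the rest of the prompt verbatim, as it does in the recorded output.

```python
# First three entries of the recorded output above, copied by hand
fill_output = [
    {'sequence': '<s> Hugging Face is a French company based in Paris</s>',
     'score': 0.23106691241264343, 'token': 2201},
    {'sequence': '<s> Hugging Face is a French company based in Lyon</s>',
     'score': 0.0819825753569603, 'token': 12790},
    {'sequence': '<s> Hugging Face is a French company based in Geneva</s>',
     'score': 0.04769463092088699, 'token': 11559},
]

prompt = 'Hugging Face is a French company based in <mask>'


def clean_sequence(sequence):
    """Remove the '<s>' and '</s>' markers that wrap each returned sequence."""
    return sequence.replace('<s>', '').replace('</s>', '').strip()


def filled_word(sequence, prompt):
    """Return the text that replaced '<mask>' in the prompt.

    Assumes the prompt contains exactly one '<mask>'.
    """
    prefix, suffix = prompt.split('<mask>')
    cleaned = clean_sequence(sequence)
    return cleaned[len(prefix):len(cleaned) - len(suffix)]


for candidate in fill_output:
    print(filled_word(candidate['sequence'], prompt), candidate['score'])
```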

## 5. Projection - Feature Extraction

In [12]:
import numpy as np
nlp_features = pipeline('feature-extraction')
output = nlp_features('Hugging Face is a French company based in Paris')
np.array(output).shape   # (Samples, Tokens, Vector Size)


(1, 12, 768)
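
The pipeline returns one vector per token. When a single vector per sentence is needed, one common option is to average over the token axis (mean pooling). This is a sketch of that idea, not something the pipeline does for you: the `mean_pool` helper is hypothetical, and a tiny hand-written nested list stands in for the real output.

```python
# Stand-in for the nested-list output of the feature-extraction pipeline:
# 1 sample x 3 tokens x 4 values (the real output above is 1 x 12 x 768)
output = [[
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]]


def mean_pool(sample):
    """Average the token vectors of one sample into a single sentence vector."""
    n_tokens = len(sample)
    # zip(*sample) walks the vectors dimension by dimension
    return [sum(values) / n_tokens for values in zip(*sample)]


sentence_vectors = [mean_pool(sample) for sample in output]
print(len(sentence_vectors), len(sentence_vectors[0]))  # (Samples, Vector Size)
```

With NumPy, `np.array(output).mean(axis=1)` computes the same average.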

Alright! You now have a good picture of what is possible with transformers' pipelines, and there is more
to come in future releases.

In the meantime, you can try the different pipelines with your own inputs:

In [13]:
task = widgets.Dropdown(
    options=['sentiment-analysis', 'ner', 'fill_mask'],
    value='ner',
    description='Task:',
    disabled=False
)

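# Named `text_input` rather than `input` to avoid shadowing the Python builtin
text_input = widgets.Text(
    value='',
    placeholder='Enter something',
    description='Your input:',
    disabled=False
)

def forward(_):
    # Run the pipeline matching the selected task on the submitted text
    if len(text_input.value) > 0:
        if task.value == 'ner':
            output = nlp_token_class(text_input.value)
        elif task.value == 'sentiment-analysis':
            output = nlp_sentence_classif(text_input.value)
        else:
            # Mask filling needs a <mask> token; append one if the user did not provide it
            if '<mask>' not in text_input.value:
                output = nlp_fill(text_input.value + ' <mask>')
            else:
                output = nlp_fill(text_input.value)
        print(output)

text_input.on_submit(forward)
display(task, text_input)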


In [14]:
context = widgets.Textarea(
    value='Einstein is famous for the general theory of relativity',
    placeholder='Enter something',
    description='Context:',
    disabled=False
)

query = widgets.Text(
    value='What is Einstein famous for?',
    placeholder='Enter something',
    description='Question:',
    disabled=False
)

def forward(_):
    if len(context.value) > 0 and len(query.value) > 0: 
        output = nlp_qa(question=query.value, context=context.value)            
        print(output)

query.on_submit(forward)
display(context, query)
