{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## How can I leverage State-of-the-Art Natural Language Models with only one line of code?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Newly introduced in transformers v2.3.0, **pipelines** provide a high-level, easy-to-use\n",
    "API for running inference over a variety of downstream tasks, including:\n",
    "\n",
    "- Sentence Classification (Sentiment Analysis): Indicate whether the overall sentence is positive or negative. _(Binary Classification / Logistic Regression task)_\n",
    "- Token Classification (Named Entity Recognition, Part-of-Speech tagging): For each sub-entity _(**token**)_ in the input, assign a label. _(Classification task)_\n",
    "- Question Answering: Given a tuple (question, context), the model finds the span of text in the **context** that answers the **question**.\n",
    "- Mask-Filling: Suggests possible word(s) to fill the masked input with respect to the provided **context**.\n",
    "- Feature Extraction: Maps the input to a multi-dimensional vector space learned from the data.\n",
    "\n",
    "Pipelines encapsulate the overall process of every NLP task:\n",
    "\n",
    " 1. Tokenization: Split the initial input into multiple sub-entities (i.e. tokens).\n",
    " 2. Inference: Map every token to a more meaningful representation.\n",
    " 3. Decoding: Use the above representation to generate and/or extract the final output for the underlying task.\n",
    "\n",
    "The overall API is exposed to the end-user through the `pipeline()` function with the following\n",
    "structure:\n",
    "\n",
    "```python\n",
    "from transformers import pipeline\n",
    "\n",
    "# Using the default model and tokenizer for the task\n",
    "pipeline(\"<task-name>\")\n",
    "\n",
    "# Using a user-specified model\n",
    "pipeline(\"<task-name>\", model=\"<model_name>\")\n",
    "\n",
    "# Using a custom model/tokenizer as str\n",
    "pipeline(\"<task-name>\", model=\"<model_name>\", tokenizer=\"<tokenizer_name>\")\n",
    "```"
   ]
},
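  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration of the structure above, the cell below instantiates a sentiment-analysis pipeline with an explicitly named model and tokenizer instead of the task defaults. The checkpoint `distilbert-base-uncased-finetuned-sst-2-english` is only an example of an identifier hosted on the model hub; any compatible identifier or local path can be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: same task as the default, but with an explicit model and tokenizer.\n",
    "# The checkpoint name below is an example hosted on the model hub, not a requirement.\n",
    "from transformers import pipeline\n",
    "\n",
    "custom_classifier = pipeline(\n",
    "    'sentiment-analysis',\n",
    "    model='distilbert-base-uncased-finetuned-sst-2-english',\n",
    "    tokenizer='distilbert-base-uncased-finetuned-sst-2-english'\n",
    ")\n",
    "custom_classifier('Transformers pipelines are easy to use')"
   ]
  },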
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% code\n"
    }
   },
   "outputs": [],
   "source": [
    "!pip install transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import ipywidgets as widgets\n",
    "from transformers import pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## 1. Sentence Classification - Sentiment Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "c9db53f30b9446c0af03268633a966c0"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "text": [
      "\n"
     ],
     "output_type": "stream"
    },
    {
     "data": {
      "text/plain": "[{'label': 'POSITIVE', 'score': 0.9997656}]"
     },
     "metadata": {},
     "output_type": "execute_result",
     "execution_count": 8
    }
   ],
   "source": [
    "nlp_sentence_classif = pipeline('sentiment-analysis')\n",
    "nlp_sentence_classif('Such nice weather outside!')"
   ]
},
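  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pipeline is not limited to a single sentence. As a small sketch (assuming a transformers version whose pipelines accept a list of strings, as recent releases do), the cell below scores several sentences in one call and prints one label per input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: scoring several inputs in one call.\n",
    "# Assumes the installed transformers version accepts a list of strings (recent releases do).\n",
    "sentences = [\n",
    "    'Such nice weather outside!',\n",
    "    'This movie was a complete waste of time.'\n",
    "]\n",
    "for sentence, prediction in zip(sentences, nlp_sentence_classif(sentences)):\n",
    "    print(sentence, '->', prediction['label'], round(prediction['score'], 4))"
   ]
  },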
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## 2. Token Classification - Named Entity Recognition"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "1e300789e22644f1aed66a5ed60e75c4"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "text": [
      "\n"
     ],
     "output_type": "stream"
    },
    {
     "data": {
      "text/plain": "[{'word': 'Hu', 'score': 0.9970937967300415, 'entity': 'I-ORG'},\n {'word': '##gging', 'score': 0.9345750212669373, 'entity': 'I-ORG'},\n {'word': 'Face', 'score': 0.9787060022354126, 'entity': 'I-ORG'},\n {'word': 'French', 'score': 0.9981995820999146, 'entity': 'I-MISC'},\n {'word': 'New', 'score': 0.9983047246932983, 'entity': 'I-LOC'},\n {'word': '-', 'score': 0.8913455009460449, 'entity': 'I-LOC'},\n {'word': 'York', 'score': 0.9979523420333862, 'entity': 'I-LOC'}]"
     },
     "metadata": {},
     "output_type": "execute_result",
     "execution_count": 9
    }
   ],
   "source": [
    "nlp_token_class = pipeline('ner')\n",
    "nlp_token_class('Hugging Face is a French company based in New-York.')"
   ]
},
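  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The default NER pipeline returns one prediction per word piece (note the `##gging` continuation token above). The cell below is a minimal post-processing sketch, written only against the output format shown above ('word', 'score', 'entity'), that merges consecutive word pieces and adjacent tokens sharing the same tag into readable entities. Newer transformers releases ship built-in grouping options, so treat this helper purely as an illustration of the idea."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: merge the token-level predictions above into whole entities.\n",
    "# Relies only on the output format shown in the previous cell; not the library API.\n",
    "\n",
    "def group_entities(token_predictions):\n",
    "    groups = []\n",
    "    for prediction in token_predictions:\n",
    "        word, tag = prediction['word'], prediction['entity']\n",
    "        if word.startswith('##') and groups:\n",
    "            # Continuation word piece: glue it onto the previous word.\n",
    "            groups[-1]['word'] += word[2:]\n",
    "        elif groups and groups[-1]['entity'] == tag:\n",
    "            # Same tag as the previous token: extend the current entity.\n",
    "            groups[-1]['word'] += ' ' + word\n",
    "        else:\n",
    "            groups.append({'word': word, 'entity': tag})\n",
    "    return groups\n",
    "\n",
    "group_entities(nlp_token_class('Hugging Face is a French company based in New-York.'))"
   ]
  },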
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Question Answering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "82aca58f1ea24b4cb37f16402e8a5923"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "text": [
      "\n"
     ],
     "output_type": "stream"
    },
    {
     "name": "stderr",
     "text": [
      "convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 225.51it/s]\n",
      "add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 2158.67it/s]\n"
     ],
     "output_type": "stream"
    },
    {
     "data": {
      "text/plain": "{'score': 0.9632966867654424, 'start': 42, 'end': 50, 'answer': 'New-York.'}"
     },
     "metadata": {},
     "output_type": "execute_result",
     "execution_count": 10
    }
   ],
   "source": [
    "nlp_qa = pipeline('question-answering')\n",
    "nlp_qa(context='Hugging Face is a French company based in New-York.', question='Where is Hugging Face based?')"
   ]
},
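  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The QA pipeline returns character offsets alongside the answer, which makes it easy to highlight the span inside the original context. The cell below is a small sketch that re-runs the question and slices the context with the returned `start`/`end` offsets (how the end offset is rounded may vary slightly across transformers versions)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: 'start'/'end' are character offsets into the context, so the answer\n",
    "# can be recovered (and highlighted) directly from the input string.\n",
    "# Depending on the transformers version, the end offset may be inclusive or exclusive.\n",
    "context = 'Hugging Face is a French company based in New-York.'\n",
    "result = nlp_qa(context=context, question='Where is Hugging Face based?')\n",
    "\n",
    "print('reported answer:', result['answer'])\n",
    "print('sliced span    :', context[result['start']:result['end']])"
   ]
  },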
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Text Generation - Mask Filling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "49df2227b4fa4eb28dcdcfc3d9261d0f"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "text": [
      "\n"
     ],
     "output_type": "stream"
    },
    {
     "data": {
      "text/plain": "[{'sequence': '<s> Hugging Face is a French company based in Paris</s>',\n 'score': 0.23106691241264343,\n 'token': 2201},\n {'sequence': '<s> Hugging Face is a French company based in Lyon</s>',\n 'score': 0.0819825753569603,\n 'token': 12790},\n {'sequence': '<s> Hugging Face is a French company based in Geneva</s>',\n 'score': 0.04769463092088699,\n 'token': 11559},\n {'sequence': '<s> Hugging Face is a French company based in Brussels</s>',\n 'score': 0.047622501850128174,\n 'token': 6497},\n {'sequence': '<s> Hugging Face is a French company based in France</s>',\n 'score': 0.04130595177412033,\n 'token': 1470}]"
     },
     "metadata": {},
     "output_type": "execute_result",
     "execution_count": 11
    }
   ],
   "source": [
    "nlp_fill = pipeline('fill-mask')\n",
    "nlp_fill('Hugging Face is a French company based in <mask>')"
   ]
},
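  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `<mask>` placeholder above is specific to the underlying model's tokenizer (RoBERTa-style models use `<mask>`, BERT-style models use `[MASK]`). A small sketch, assuming the pipeline exposes its tokenizer through the `tokenizer` attribute as current transformers versions do, builds the masked sentence programmatically so the same code works regardless of the checkpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: build the masked input from the tokenizer's own mask token.\n",
    "# Assumes the pipeline exposes `tokenizer` and the tokenizer defines `mask_token`,\n",
    "# which is the case in current transformers releases.\n",
    "mask_token = nlp_fill.tokenizer.mask_token\n",
    "nlp_fill('Hugging Face is a French company based in ' + mask_token)"
   ]
  },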
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Projection - Feature Extraction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "2af4cfb19e3243dda014d0f56b48f4b2"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "text": [
      "\n"
     ],
     "output_type": "stream"
    },
    {
     "data": {
      "text/plain": "(1, 12, 768)"
     },
     "metadata": {},
     "output_type": "execute_result",
     "execution_count": 12
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "nlp_features = pipeline('feature-extraction')\n",
    "output = nlp_features('Hugging Face is a French company based in Paris')\n",
    "np.array(output).shape  # (Samples, Tokens, Vector Size)"
   ]
},
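  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each token is mapped to a 768-dimensional vector here (the hidden size of the default model). A common way to turn these token vectors into a single fixed-size sentence representation is to average them; the cell below is a minimal numpy sketch of that mean pooling, reusing the `output` from the previous cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: mean-pool the per-token vectors into one sentence embedding.\n",
    "# `output` has shape (samples, tokens, hidden_size); averaging over the\n",
    "# token axis yields one vector per input sentence.\n",
    "embeddings = np.array(output)\n",
    "sentence_vector = embeddings.mean(axis=1)[0]\n",
    "sentence_vector.shape  # e.g. (768,) for the default model"
   ]
  },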
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Alright! Now you have a good picture of what is possible through transformers' pipelines, and there is more\n",
    "to come in future releases.\n",
    "\n",
    "In the meantime, you can try the different pipelines with your own inputs below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "Dropdown(description='Task:', index=1, options=('sentiment-analysis', 'ner', 'fill_mask'), value='ner')",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "10bac065d46f4e4d9a8498dcc8104ecd"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": "Text(value='', description='Your input:', placeholder='Enter something')",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "2c5f1411f7a94714bc00f01b0e3b27b2"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "task = widgets.Dropdown(\n",
    "    options=['sentiment-analysis', 'ner', 'fill_mask'],\n",
    "    value='ner',\n",
    "    description='Task:',\n",
    "    disabled=False\n",
    ")\n",
    "\n",
    "input = widgets.Text(\n",
    "    value='',\n",
    "    placeholder='Enter something',\n",
    "    description='Your input:',\n",
    "    disabled=False\n",
    ")\n",
    "\n",
    "def forward(_):\n",
    "    if len(input.value) > 0:\n",
    "        if task.value == 'ner':\n",
    "            output = nlp_token_class(input.value)\n",
    "        elif task.value == 'sentiment-analysis':\n",
    "            output = nlp_sentence_classif(input.value)\n",
    "        else:\n",
    "            if input.value.find('<mask>') == -1:\n",
    "                output = nlp_fill(input.value + ' <mask>')\n",
    "            else:\n",
    "                output = nlp_fill(input.value)\n",
    "        print(output)\n",
    "\n",
    "input.on_submit(forward)\n",
    "display(task, input)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% Question Answering\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "Textarea(value='Einstein is famous for the general theory of relativity', description='Context:', placeholder=…",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "019fde2343634e94b6f32d04f6350ec1"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "context = widgets.Textarea(\n",
    "    value='Einstein is famous for the general theory of relativity',\n",
    "    placeholder='Enter something',\n",
    "    description='Context:',\n",
    "    disabled=False\n",
    ")\n",
    "\n",
    "query = widgets.Text(\n",
    "    value='What is Einstein famous for?',\n",
    "    placeholder='Enter something',\n",
    "    description='Question:',\n",
    "    disabled=False\n",
    ")\n",
    "\n",
    "def forward(_):\n",
    "    if len(context.value) > 0 and len(query.value) > 0:\n",
    "        output = nlp_qa(question=query.value, context=context.value)\n",
    "        print(output)\n",
    "\n",
    "query.on_submit(forward)\n",
    "display(context, query)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}