updates to readme and doc
This commit is contained in:
parent f31154cb9d
commit 43e0e8fa04
35 README.md
@@ -1,19 +1,19 @@
|
||||
# 👾 PyTorch-Transformers
|
||||
|
||||
[](https://circleci.com/gh/huggingface/pytorch-pretrained-BERT)
|
||||
[](https://circleci.com/gh/huggingface/pytorch-transformers)
|
||||
|
||||
PyTorch-Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
|
||||
|
||||
The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
|
||||
|
||||
- **[Google's BERT model](https://github.com/google-research/bert)** released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
|
||||
- **[OpenAI's GPT model](https://github.com/openai/finetune-transformer-lm)** released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
|
||||
- **[OpenAI's GPT-2 model](https://blog.openai.com/better-language-models/)** released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
|
||||
- **[Google/CMU's Transformer-XL model](https://github.com/kimiyoung/transformer-xl)** released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
- **[Google/CMU's XLNet model](https://github.com/zihangdai/xlnet/)** released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
- **[Facebook's XLM model](https://github.com/facebookresearch/XLM/)** released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
||||
1. **[BERT](https://github.com/google-research/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
|
||||
2. **[GPT](https://github.com/openai/finetune-transformer-lm)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
|
||||
3. **[GPT-2](https://blog.openai.com/better-language-models/)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
|
||||
4. **[Transformer-XL](https://github.com/kimiyoung/transformer-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
5. **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
6. **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
||||
|
||||
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](#documentation).
|
||||
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/pytorch-transformers/examples.html).
|
||||
|
||||
| Section | Description |
|
||||
|-|-|
|
||||
@@ -21,7 +21,7 @@ These implementations have been tested on several datasets (see the example scri
|
||||
| [Quick tour: Usage](#quick-tour-usage) | Tokenizers & models usage: Bert and GPT-2 |
|
||||
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
|
||||
| [Migrating from pytorch-pretrained-bert to pytorch-transformers](#Migrating-from-pytorch-pretrained-bert-to-pytorch-transformers) | Migrating your code from pytorch-pretrained-bert to pytorch-transformers |
|
||||
| [Documentation](#documentation) | Full API documentation and more |
|
||||
| [Documentation](https://huggingface.co/pytorch-transformers/) | Full API documentation and more |
|
||||
|
||||
## Installation
|
||||
|
||||
@@ -202,13 +202,14 @@ Examples for each model class of each model architecture (Bert, GPT, GPT-2, Tran
|
||||
|
||||
The library comprises several example scripts with SOTA performances for NLU and NLG tasks:
|
||||
|
||||
- fine-tuning Bert/XLNet/XLM with a *sequence-level classifier* on nine different GLUE tasks,
|
||||
- fine-tuning Bert/XLNet/XLM with a *token-level classifier* on the question answering dataset SQuAD 2.0, and
|
||||
- using GPT/GPT-2/Transformer-XL and XLNet for conditional language generation.
|
||||
- `run_glue.py`: an example fine-tuning Bert, XLNet and XLM on nine different GLUE tasks (*sequence-level classification*)
|
||||
- `run_squad.py`: an example fine-tuning Bert, XLNet and XLM on the question answering dataset SQuAD 2.0 (*token-level classification*)
|
||||
- `run_generation.py`: an example using GPT, GPT-2, Transformer-XL and XLNet for conditional language generation
|
||||
- other model-specific examples (see the documentation).
|
||||
|
||||
Here are three quick usage examples for these scripts:
|
||||
|
||||
### Fine-tuning for sequence classification: GLUE tasks examples
|
||||
### `run_glue.py`: Fine-tuning on GLUE tasks for sequence classification
|
||||
|
||||
The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.
|
||||
|
||||
@@ -302,7 +303,7 @@ Training with these hyper-parameters gave us the following results:
|
||||
loss = 0.07231863956341798
|
||||
```
|
||||
|
||||
### Fine-tuning for question-answering: SQuAD example
|
||||
### `run_squad.py`: Fine-tuning on SQuAD for question-answering
|
||||
|
||||
This example code fine-tunes BERT on the SQuAD dataset using distributed training on 8 V100 GPUs and the BERT Whole-Word-Masking uncased model to reach an F1 > 93 on SQuAD:
|
||||
|
||||
@@ -333,7 +334,7 @@ python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncase
|
||||
|
||||
This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
|
||||
|
||||
### Conditional generation: Text generation with GPT, GPT-2, Transformer-XL and XLNet
|
||||
### `run_generation.py`: Text generation with GPT, GPT-2, Transformer-XL and XLNet
|
||||
|
||||
A conditional generation script is also included to generate text from a prompt.
|
||||
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (a predefined text is prepended to make short inputs longer).
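
Below is a minimal sketch of that padding trick (the padding passage, the prompt and the `xlnet-large-cased` shortcut are illustrative only; the actual text and options used by the script live in `run_generation.py`):

```python
from pytorch_transformers import XLNetTokenizer

# Illustrative padding passage: any long, fluent piece of text works.
PADDING_TEXT = ("In 1991, the remains of Russian Tsar Nicholas II and his family "
                "were discovered. The voice of his young son narrates the rest "
                "of the story.")

prompt = "Who was Jim Henson? Jim Henson was"

tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
# Memory models such as XLNet and Transformer-XL generate better continuations
# when a short prompt is prepended with a longer context before encoding.
input_ids = tokenizer.encode(PADDING_TEXT + " " + prompt)
```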
|
||||
@@ -347,10 +348,6 @@ python ./examples/run_glue.py \
|
||||
--model_name_or_path=gpt2 \
|
||||
```
|
||||
|
||||
## Documentation
|
||||
|
||||
The full documentation is available at https://huggingface.co/pytorch-transformers/.
|
||||
|
||||
## Migrating from pytorch-pretrained-bert to pytorch-transformers
|
||||
|
||||
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `pytorch-transformers`
|
||||
|
@@ -1,4 +1,4 @@
|
||||
Converting Tensorflow Models
|
||||
Converting Tensorflow Checkpoints
|
||||
================================================
|
||||
|
||||
A command-line interface is provided to convert a TensorFlow checkpoint into a PyTorch dump of the ``BertForPreTraining`` class (for BERT) or a NumPy checkpoint into a PyTorch dump of the ``OpenAIGPTModel`` class (for OpenAI GPT).
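
Once converted, the resulting dump can be reloaded through the standard ``from_pretrained`` API. A minimal sketch (the directory name is hypothetical and is assumed to contain both the converted ``pytorch_model.bin`` and a matching ``config.json``):

.. code-block:: python

    from pytorch_transformers import BertForPreTraining

    # Hypothetical output directory of the conversion command
    model = BertForPreTraining.from_pretrained('./converted_bert_base_uncased/')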
|
||||
|
@@ -1,14 +1,24 @@
|
||||
Pytorch-Transformers
|
||||
================================================================================================================================================
|
||||
|
||||
PyTorch-Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
|
||||
|
||||
The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
|
||||
|
||||
1. `BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
|
||||
2. `GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
|
||||
3. `GPT-2 <https://blog.openai.com/better-language-models>`_ (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners <https://blog.openai.com/better-language-models>`_ by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
|
||||
4. `Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Notes
|
||||
|
||||
installation
|
||||
philosophy
|
||||
usage
|
||||
quickstart
|
||||
pretrained_models
|
||||
examples
|
||||
notebooks
|
||||
converting_tensorflow_models
|
||||
@@ -28,263 +38,3 @@ Pytorch-Transformers
|
||||
model_doc/gpt2
|
||||
model_doc/xlm
|
||||
model_doc/xlnet
|
||||
|
||||
|
||||
.. image:: https://circleci.com/gh/huggingface/pytorch-pretrained-BERT.svg?style=svg
|
||||
:target: https://circleci.com/gh/huggingface/pytorch-pretrained-BERT
|
||||
:alt: CircleCI
|
||||
|
||||
|
||||
This repository contains op-for-op PyTorch reimplementations, pre-trained models and fine-tuning examples for:
|
||||
|
||||
|
||||
* `Google's BERT model <https://github.com/google-research/bert>`__\ ,
|
||||
* `OpenAI's GPT model <https://github.com/openai/finetune-transformer-lm>`__\ ,
|
||||
* `Google/CMU's Transformer-XL model <https://github.com/kimiyoung/transformer-xl>`__\ , and
|
||||
* `OpenAI's GPT-2 model <https://blog.openai.com/better-language-models/>`__.
|
||||
|
||||
These implementations have been tested on several datasets (see the examples) and should match the performances of the associated TensorFlow implementations (e.g. ~91 F1 on SQuAD for BERT, ~88 F1 on RocStories for OpenAI GPT and ~18.3 perplexity on WikiText 103 for the Transformer-XL). You can find more details in the `Examples <./examples.html>`__ section.
|
||||
|
||||
Here is some information on these models:
|
||||
|
||||
**BERT** was released together with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
|
||||
This PyTorch implementation of BERT is provided with `Google's pre-trained models <https://github.com/google-research/bert>`__\ , examples, notebooks and a command-line interface to load any pre-trained TensorFlow checkpoint for BERT is also provided.
|
||||
|
||||
**OpenAI GPT** was released together with the paper `Improving Language Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
|
||||
This PyTorch implementation of OpenAI GPT is an adaptation of the `PyTorch implementation by HuggingFace <https://github.com/huggingface/pytorch-openai-transformer-lm>`__ and is provided with `OpenAI's pre-trained model <https://github.com/openai/finetune-transformer-lm>`__ and a command-line interface that was used to convert the pre-trained NumPy checkpoint in PyTorch.
|
||||
|
||||
**Google/CMU's Transformer-XL** was released together with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <http://arxiv.org/abs/1901.02860>`__ by Zihang Dai\*, Zhilin Yang\* , Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
This PyTorch implementation of Transformer-XL is an adaptation of the original `PyTorch implementation <https://github.com/kimiyoung/transformer-xl>`__ which has been slightly modified to match the performance of the TensorFlow implementation and allow re-using the pretrained weights. A command-line interface is provided to convert TensorFlow checkpoints into PyTorch models.
|
||||
|
||||
**OpenAI GPT-2** was released together with the paper `Language Models are Unsupervised Multitask Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford\*, Jeffrey Wu\* , Rewon Child, David Luan, Dario Amodei\*\* and Ilya Sutskever\*\*.
|
||||
This PyTorch implementation of OpenAI GPT-2 is an adaptation of the `OpenAI's implementation <https://github.com/openai/gpt-2>`__ and is provided with `OpenAI's pre-trained model <https://github.com/openai/gpt-2>`__ and a command-line interface that was used to convert the TensorFlow checkpoint in PyTorch.
|
||||
|
||||
**Facebook Research's XLM** was released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
|
||||
This PyTorch implementation of XLM is an adaptation of the original `PyTorch implementation <https://github.com/facebookresearch/XLM>`__.
|
||||
|
||||
**Google's XLNet** was released together with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang\*, Zihang Dai\*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov and Quoc V. Le.
|
||||
This PyTorch implementation of XLNet is an adaptation of the `TensorFlow implementation <https://github.com/zihangdai/xlnet>`__.
|
||||
|
||||
|
||||
Content
|
||||
-------
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
* - Section
|
||||
- Description
|
||||
* - `Installation <./installation.html>`__
|
||||
- How to install the package
|
||||
* - `Philosophy <./philosophy.html>`__
|
||||
- The philosophy behind this package
|
||||
* - `Usage <./usage.html>`__
|
||||
- Quickstart examples
|
||||
* - `Examples <./examples.html>`__
|
||||
- Detailed examples on how to fine-tune Bert
|
||||
* - `Notebooks <./notebooks.html>`__
|
||||
- Introduction on the provided Jupyter Notebooks
|
||||
* - `TPU <./tpu.html>`__
|
||||
- Notes on TPU support and pretraining scripts
|
||||
* - `Command-line interface <./cli.html>`__
|
||||
- Convert a TensorFlow checkpoint in a PyTorch dump
|
||||
* - `Migration <./migration.html>`__
|
||||
- Migrating from ``pytorch_pretrained_BERT`` (v0.6) to ``pytorch_transformers`` (v1.0)
|
||||
* - `Bertology <./bertology.html>`__
|
||||
- Exploring the internals of the pretrained models.
|
||||
* - `TorchScript <./torchscript.html>`__
|
||||
- Convert a model to TorchScript for use in other programming languages
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
* - Section
|
||||
- Description
|
||||
* - `Overview <./model_doc/overview.html>`__
|
||||
- Overview of the package
|
||||
* - `BERT <./model_doc/bert.html>`__
|
||||
- BERT Models, Tokenizers and optimizers
|
||||
* - `OpenAI GPT <./model_doc/gpt.html>`__
|
||||
- GPT Models, Tokenizers and optimizers
|
||||
* - `TransformerXL <./model_doc/transformerxl.html>`__
|
||||
- TransformerXL Models, Tokenizers and optimizers
|
||||
* - `OpenAI GPT2 <./model_doc/gpt2.html>`__
|
||||
- GPT2 Models, Tokenizers and optimizers
|
||||
* - `XLM <./model_doc/xlm.html>`__
|
||||
- XLM Models, Tokenizers and optimizers
|
||||
* - `XLNet <./model_doc/xlnet.html>`__
|
||||
- XLNet Models, Tokenizers and optimizers
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
This package comprises the following classes that can be imported in Python and are detailed in the `documentation <./model_doc/overview.html>`__ section of this package:
|
||||
|
||||
|
||||
*
|
||||
Eight **Bert** PyTorch models (\ ``torch.nn.Module``\ ) with pre-trained weights (in the `modeling_bert.py <./_modules/pytorch_transformers/modeling_bert.html>`__ file; a short usage sketch follows this list):
|
||||
|
||||
|
||||
* `BertModel <./model_doc/bert.html#pytorch_transformers.BertModel>`__ - raw BERT Transformer model (\ **fully pre-trained**\ ),
|
||||
* `BertForMaskedLM <./model_doc/bert.html#pytorch_transformers.BertForMaskedLM>`__ - BERT Transformer with the pre-trained masked language modeling head on top (\ **fully pre-trained**\ ),
|
||||
* `BertForNextSentencePrediction <./model_doc/bert.html#pytorch_transformers.BertForNextSentencePrediction>`__ - BERT Transformer with the pre-trained next sentence prediction classifier on top (\ **fully pre-trained**\ ),
|
||||
* `BertForPreTraining <./model_doc/bert.html#pytorch_transformers.BertForPreTraining>`__ - BERT Transformer with masked language modeling head and next sentence prediction classifier on top (\ **fully pre-trained**\ ),
|
||||
* `BertForSequenceClassification <./model_doc/bert.html#pytorch_transformers.BertForSequenceClassification>`__ - BERT Transformer with a sequence classification head on top (BERT Transformer is **pre-trained**\ , the sequence classification head **is only initialized and has to be trained**\ ),
|
||||
* `BertForMultipleChoice <./model_doc/bert.html#pytorch_transformers.BertForMultipleChoice>`__ - BERT Transformer with a multiple choice head on top (used for task like Swag) (BERT Transformer is **pre-trained**\ , the multiple choice classification head **is only initialized and has to be trained**\ ),
|
||||
* `BertForTokenClassification <./model_doc/bert.html#pytorch_transformers.BertForTokenClassification>`__ - BERT Transformer with a token classification head on top (BERT Transformer is **pre-trained**\ , the token classification head **is only initialized and has to be trained**\ ),
|
||||
* `BertForQuestionAnswering <./model_doc/bert.html#pytorch_transformers.BertForQuestionAnswering>`__ - BERT Transformer with a token classification head on top (BERT Transformer is **pre-trained**\ , the token classification head **is only initialized and has to be trained**\ ).
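
A minimal sketch of how these classes are typically instantiated (the ``bert-base-uncased`` shortcut is just an example):

.. code-block:: python

    from pytorch_transformers import BertModel, BertForSequenceClassification

    # Fully pre-trained backbone
    model = BertModel.from_pretrained('bert-base-uncased')

    # Pre-trained backbone with a classification head that still has to be trained
    classifier = BertForSequenceClassification.from_pretrained('bert-base-uncased')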
|
||||
|
||||
*
|
||||
Three **OpenAI GPT** PyTorch models (\ ``torch.nn.Module``\ ) with pre-trained weights (in the `modeling_openai.py <./_modules/pytorch_transformers/modeling_openai.html>`__ file):
|
||||
|
||||
|
||||
* `OpenAIGPTModel <./model_doc/gpt.html#pytorch_transformers.OpenAIGPTModel>`__ - raw OpenAI GPT Transformer model (\ **fully pre-trained**\ ),
|
||||
* `OpenAIGPTLMHeadModel <./model_doc/gpt.html#pytorch_transformers.OpenAIGPTLMHeadModel>`__ - OpenAI GPT Transformer with the tied language modeling head on top (\ **fully pre-trained**\ ),
|
||||
* `OpenAIGPTDoubleHeadsModel <./model_doc/gpt.html#pytorch_transformers.OpenAIGPTDoubleHeadsModel>`__ - OpenAI GPT Transformer with the tied language modeling head and a multiple choice classification head on top (OpenAI GPT Transformer is **pre-trained**\ , the multiple choice classification head **is only initialized and has to be trained**\ ),
|
||||
|
||||
*
|
||||
Two **Transformer-XL** PyTorch models (\ ``torch.nn.Module``\ ) with pre-trained weights (in the `modeling_transfo_xl.py <./_modules/pytorch_transformers/modeling_transfo_xl.html>`__ file):
|
||||
|
||||
|
||||
* `TransfoXLModel <./model_doc/transformerxl.html#pytorch_transformers.TransfoXLModel>`__ - Transformer-XL model which outputs the last hidden state and memory cells (\ **fully pre-trained**\ ),
|
||||
* `TransfoXLLMHeadModel <./model_doc/transformerxl.html#pytorch_transformers.TransfoXLLMHeadModel>`__ - Transformer-XL with the tied adaptive softmax head on top for language modeling which outputs the logits/loss and memory cells (\ **fully pre-trained**\ ),
|
||||
|
||||
*
|
||||
Three **OpenAI GPT-2** PyTorch models (\ ``torch.nn.Module``\ ) with pre-trained weights (in the `modeling_gpt2.py <./_modules/pytorch_transformers/modeling_gpt2.html>`__ file):
|
||||
|
||||
|
||||
* `GPT2Model <./model_doc/gpt2.html#pytorch_transformers.GPT2Model>`__ - raw OpenAI GPT-2 Transformer model (\ **fully pre-trained**\ ),
|
||||
* `GPT2LMHeadModel <./model_doc/gpt2.html#pytorch_transformers.GPT2LMHeadModel>`__ - OpenAI GPT-2 Transformer with the tied language modeling head on top (\ **fully pre-trained**\ ),
|
||||
* `GPT2DoubleHeadsModel <./model_doc/gpt2.html#pytorch_transformers.GPT2DoubleHeadsModel>`__ - OpenAI GPT-2 Transformer with the tied language modeling head and a multiple choice classification head on top (OpenAI GPT-2 Transformer is **pre-trained**\ , the multiple choice classification head **is only initialized and has to be trained**\ ),
|
||||
|
||||
*
|
||||
Four **XLM** PyTorch models (\ ``torch.nn.Module``\ ) with pre-trained weights (in the `modeling_xlm.py <./_modules/pytorch_transformers/modeling_xlm.html>`__ file):
|
||||
|
||||
|
||||
* `XLMModel <./model_doc/xlm.html#pytorch_transformers.XLMModel>`__ - raw XLM Transformer model (\ **fully pre-trained**\ ),
|
||||
* `XLMWithLMHeadModel <./model_doc/xlm.html#pytorch_transformers.XLMWithLMHeadModel>`__ - XLM Transformer with the tied language modeling head on top (\ **fully pre-trained**\ ),
|
||||
* `XLMForSequenceClassification <./model_doc/xlm.html#pytorch_transformers.XLMForSequenceClassification>`__ - XLM Transformer with a sequence classification head on top (XLM Transformer is **pre-trained**\ , the sequence classification head **is only initialized and has to be trained**\ ),
|
||||
* `XLMForQuestionAnswering <./model_doc/xlm.html#pytorch_transformers.XLMForQuestionAnswering>`__ - XLM Transformer with a token classification head on top (XLM Transformer is **pre-trained**\ , the token classification head **is only initialized and has to be trained**\ )
|
||||
|
||||
*
|
||||
Four **XLNet** PyTorch models (\ ``torch.nn.Module``\ ) with pre-trained weights (in the `modeling_xlnet.py <./_modules/pytorch_transformers/modeling_xlnet.html>`__ file):
|
||||
|
||||
|
||||
* `XLNetModel <./model_doc/xlnet.html#pytorch_transformers.XLNetModel>`__ - raw XLNet Transformer model (\ **fully pre-trained**\ ),
|
||||
* `XLNetLMHeadModel <./model_doc/xlnet.html#pytorch_transformers.XLNetLMHeadModel>`__ - XLNet Transformer with the tied language modeling head on top (\ **fully pre-trained**\ ),
|
||||
* `XLNetForSequenceClassification <./model_doc/xlnet.html#pytorch_transformers.XLNetForSequenceClassification>`__ - XLNet Transformer with a sequence classification head on top (XLM Transformer is **pre-trained**\ , the sequence classification head **is only initialized and has to be trained**\ ),
|
||||
* `XLNetForQuestionAnswering <./model_doc/xlnet.html#pytorch_transformers.XLNetForQuestionAnswering>`__ - XLNet Transformer with a token classification head on top (XLNet Transformer is **pre-trained**\ , the token classification head **is only initialized and has to be trained**\ )
|
||||
|
||||
|
||||
TODO Lysandre filled: I filled in XLM and XLNet. I didn't do the Tokenizers because I don't know the current philosophy behind them.
|
||||
|
||||
*
|
||||
Tokenizers for **BERT** (using word-piece) (in the `tokenization_bert.py <./_modules/pytorch_transformers/tokenization_bert.html>`__ file; see the short sketch after this list):
|
||||
|
||||
* ``BasicTokenizer`` - basic tokenization (punctuation splitting, lower casing, etc.),
|
||||
* ``WordpieceTokenizer`` - WordPiece tokenization,
|
||||
* ``BertTokenizer`` - perform end-to-end tokenization, i.e. basic tokenization followed by WordPiece tokenization.
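
A minimal sketch of the end-to-end ``BertTokenizer`` (the input text and shortcut name are just examples):

.. code-block:: python

    from pytorch_transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    tokens = tokenizer.tokenize("Who was Jim Henson ?")
    # ['who', 'was', 'jim', 'henson', '?']
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokens)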
|
||||
|
||||
|
||||
*
|
||||
Tokenizer for **OpenAI GPT** (using Byte-Pair-Encoding) (in the `tokenization_openai.py <./_modules/pytorch_transformers/tokenization_openai.html>`__ file):
|
||||
|
||||
* ``OpenAIGPTTokenizer`` - perform Byte-Pair-Encoding (BPE) tokenization.
|
||||
|
||||
|
||||
*
|
||||
Tokenizer for **OpenAI GPT-2** (using byte-level Byte-Pair-Encoding) (in the `tokenization_gpt2.py <./_modules/pytorch_transformers/tokenization_gpt2.html>`__ file):
|
||||
|
||||
* ``GPT2Tokenizer`` - perform byte-level Byte-Pair-Encoding (BPE) tokenization.
|
||||
|
||||
|
||||
*
|
||||
Tokenizer for **Transformer-XL** (word tokens ordered by frequency for adaptive softmax) (in the `tokenization_transfo_xl.py <./_modules/pytorch_transformers/tokenization_transfo_xl.html>`__ file):
|
||||
|
||||
* ``TransfoXLTokenizer`` - perform word tokenization and can order words by frequency in a corpus for use in an adaptive softmax.
|
||||
|
||||
|
||||
*
|
||||
Tokenizer for **XLNet** (SentencePiece based tokenizer) (in the `tokenization_xlnet.py <./_modules/pytorch_transformers/tokenization_xlnet.html>`__ file):
|
||||
|
||||
* ``XLNetTokenizer`` - perform SentencePiece tokenization.
|
||||
|
||||
|
||||
*
|
||||
Tokenizer for **XLM** (using Byte-Pair-Encoding) (in the `tokenization_xlm.py <./_modules/pytorch_transformers/tokenization_xlm.html>`__ file):
|
||||
|
||||
* ``XLMTokenizer`` - perform Byte-Pair-Encoding (BPE) tokenization.
|
||||
|
||||
|
||||
*
|
||||
Optimizer (in the `optimization.py <./_modules/pytorch_transformers/optimization.html>`__ file):
|
||||
|
||||
|
||||
* ``AdamW`` - Version of Adam algorithm with weight decay fix, warmup and linear decay of the learning rate.
|
||||
|
||||
|
||||
*
|
||||
Configuration classes for BERT, OpenAI GPT, Transformer-XL, XLM and XLNet (in the respective \
|
||||
`modeling_bert.py <./_modules/pytorch_transformers/modeling_bert.html>`__\ , \
|
||||
`modeling_openai.py <./_modules/pytorch_transformers/modeling_openai.html>`__\ , \
|
||||
`modeling_transfo_xl.py <./_modules/pytorch_transformers/modeling_transfo_xl.html>`__, \
|
||||
`modeling_xlm.py <./_modules/pytorch_transformers/modeling_xlm.html>`__, \
|
||||
`modeling_xlnet.py <./_modules/pytorch_transformers/modeling_xlnet.html>`__ \
|
||||
files; a short configuration sketch follows this list):
|
||||
|
||||
|
||||
* ``BertConfig`` - Configuration class to store the configuration of a ``BertModel`` with utilities to read and write from JSON configuration files.
|
||||
* ``OpenAIGPTConfig`` - Configuration class to store the configuration of a ``OpenAIGPTModel`` with utilities to read and write from JSON configuration files.
|
||||
* ``GPT2Config`` - Configuration class to store the configuration of a ``GPT2Model`` with utilities to read and write from JSON configuration files.
|
||||
* ``TransfoXLConfig`` - Configuration class to store the configuration of a ``TransfoXLModel`` with utilities to read and write from JSON configuration files.
|
||||
* ``XLMConfig`` - Configuration class to store the configuration of a ``XLMModel`` with utilities to read and write from JSON configuration files.
|
||||
* ``XLNetConfig`` - Configuration class to store the configuration of a ``XLNetModel`` with utilities to read and write from JSON configuration files.
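
A minimal sketch of these configuration classes (the JSON helper names follow the read/write utilities mentioned above; the file name is illustrative):

.. code-block:: python

    from pytorch_transformers import BertConfig

    # Build a default BERT-base-like configuration and inspect one of its fields
    config = BertConfig()
    print(config.hidden_size)

    # Write the configuration to a JSON file and read it back
    config.to_json_file('bert_config.json')
    config = BertConfig.from_json_file('bert_config.json')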
|
||||
|
||||
The repository further comprises:
|
||||
|
||||
|
||||
*
|
||||
Five examples on how to use **BERT** (in the `examples folder <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples>`__\ ):
|
||||
|
||||
|
||||
* `run_bert_extract_features.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_bert_extract_features.py>`__ - Show how to extract hidden states from an instance of ``BertModel``\ ,
|
||||
* `run_bert_classifier.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_bert_classifier.py>`__ - Show how to fine-tune an instance of ``BertForSequenceClassification`` on GLUE's MRPC task,
|
||||
* `run_bert_squad.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_bert_squad.py>`__ - Show how to fine-tune an instance of ``BertForQuestionAnswering`` on SQuAD v1.0 and SQuAD v2.0 tasks.
|
||||
* `run_swag.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_swag.py>`__ - Show how to fine-tune an instance of ``BertForMultipleChoice`` on Swag task.
|
||||
* `simple_lm_finetuning.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/lm_finetuning/simple_lm_finetuning.py>`__ - Show how to fine-tune an instance of ``BertForPreTraining`` on a target text corpus.
|
||||
|
||||
*
|
||||
One example on how to use **OpenAI GPT** (in the `examples folder <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples>`__\ ):
|
||||
|
||||
|
||||
* `run_openai_gpt.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_openai_gpt.py>`__ - Show how to fine-tune an instance of ``OpenAIGPTDoubleHeadsModel`` on the RocStories task.
|
||||
|
||||
*
|
||||
One example on how to use **Transformer-XL** (in the `examples folder <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples>`__\ ):
|
||||
|
||||
|
||||
* `run_transfo_xl.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_transfo_xl.py>`__ - Show how to load and evaluate a pre-trained model of ``TransfoXLLMHeadModel`` on WikiText 103.
|
||||
|
||||
*
|
||||
One example on how to use **OpenAI GPT-2** in the unconditional and interactive mode (in the `examples folder <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples>`__\ ):
|
||||
|
||||
|
||||
* `run_gpt2.py <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_gpt2.py>`__ - Show how to use an instance of ``GPT2LMHeadModel`` (OpenAI GPT-2) to generate text (same as the original OpenAI GPT-2 examples).
|
||||
|
||||
These examples are detailed in the `Examples <#examples>`__ section of this readme.
|
||||
|
||||
*
|
||||
Three notebooks that were used to check that the TensorFlow and PyTorch models behave identically (in the `notebooks folder <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/notebooks>`__\ ):
|
||||
|
||||
|
||||
* `Comparing-TF-and-PT-models.ipynb <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/notebooks/Comparing-TF-and-PT-models.ipynb>`__ - Compare the hidden states predicted by ``BertModel``\ ,
|
||||
* `Comparing-TF-and-PT-models-SQuAD.ipynb <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb>`__ - Compare the spans predicted by ``BertForQuestionAnswering`` instances,
|
||||
* `Comparing-TF-and-PT-models-MLM-NSP.ipynb <https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb>`__ - Compare the predictions of the ``BertForPreTraining`` instances.
|
||||
|
||||
These notebooks are detailed in the `Notebooks <#notebooks>`__ section of this readme.
|
||||
|
||||
|
||||
*
|
||||
A command-line interface to convert TensorFlow checkpoints (BERT, Transformer-XL) or NumPy checkpoints (OpenAI GPT) into a PyTorch save of the associated PyTorch model:
|
||||
|
||||
This CLI is detailed in the `Command-line interface <#Command-line-interface>`__ section of this readme.
|
||||
|
@@ -6,11 +6,41 @@ This repo was tested on Python 2.7 and 3.5+ (examples are tested only on python
|
||||
With pip
|
||||
^^^^^^^^
|
||||
|
||||
PyTorch pretrained bert can be installed by pip as follows:
|
||||
PyTorch-Transformers can be installed with pip as follows:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install pytorch-pretrained-bert
|
||||
pip install pytorch-transformers
|
||||
|
||||
From source
|
||||
^^^^^^^^^^^
|
||||
|
||||
Clone the repository and install locally:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
git clone https://github.com/huggingface/pytorch-transformers.git
|
||||
cd pytorch-transformers
|
||||
pip install [--editable] .
|
||||
|
||||
|
||||
Tests
|
||||
^^^^^
|
||||
|
||||
An extensive test suite is included for the library and the example scripts. Library tests can be found in the `tests folder <https://github.com/huggingface/pytorch-transformers/tree/master/pytorch_transformers/tests>`_ and examples tests in the `examples folder <https://github.com/huggingface/pytorch-transformers/tree/master/examples>`_.
|
||||
|
||||
These tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
|
||||
|
||||
You can run the tests from the root of the cloned repository with the commands:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python -m pytest -sv ./pytorch_transformers/tests/
|
||||
python -m pytest -sv ./examples/
|
||||
|
||||
|
||||
OpenAI GPT original tokenization workflow
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If you want to reproduce the original tokenization process of the ``OpenAI GPT`` paper, you will need to install ``ftfy`` (limit to version 4.4.3 if you are using Python 2) and ``SpaCy`` :
|
||||
|
||||
@@ -20,29 +50,3 @@ If you want to reproduce the original tokenization process of the ``OpenAI GPT``
|
||||
python -m spacy download en
|
||||
|
||||
If you don't install ``ftfy`` and ``SpaCy``\ , the ``OpenAI GPT`` tokenizer will default to tokenize using BERT's ``BasicTokenizer`` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
|
||||
|
||||
From source
|
||||
^^^^^^^^^^^
|
||||
|
||||
Clone the repository and run:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install [--editable] .
|
||||
|
||||
Here also, if you want to reproduce the original tokenization process of the ``OpenAI GPT`` model, you will need to install ``ftfy`` (limit to version 4.4.3 if you are using Python 2) and ``SpaCy`` :
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install spacy ftfy==4.4.3
|
||||
python -m spacy download en
|
||||
|
||||
Again, if you don't install ``ftfy`` and ``SpaCy``\ , the ``OpenAI GPT`` tokenizer will default to tokenize using BERT's ``BasicTokenizer`` followed by Byte-Pair Encoding (which should be fine for most usage).
|
||||
|
||||
A series of tests is included in the `tests folder <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/tests>`_ and can be run using ``pytest`` (install pytest if needed: ``pip install pytest``\ ).
|
||||
|
||||
You can run the tests with the command:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python -m pytest -sv tests/
|
||||
|
@@ -1 +1,96 @@
|
||||
# Migration
|
||||
# Migrating from pytorch-pretrained-bert
|
||||
|
||||
|
||||
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `pytorch-transformers`
|
||||
|
||||
### Models always output `tuples`
|
||||
|
||||
The main breaking change when migrating from `pytorch-pretrained-bert` to `pytorch-transformers` is that the models forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
|
||||
|
||||
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/pytorch-transformers/).
|
||||
|
||||
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
|
||||
|
||||
Here is a `pytorch-pretrained-bert` to `pytorch-transformers` conversion example for a `BertForSequenceClassification` classification model:
|
||||
|
||||
```python
|
||||
# Let's load our model
|
||||
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
|
||||
|
||||
# If you used to have this line in pytorch-pretrained-bert:
|
||||
loss = model(input_ids, labels=labels)
|
||||
|
||||
# Now just use this line in pytorch-transformers to extract the loss from the output tuple:
|
||||
outputs = model(input_ids, labels=labels)
|
||||
loss = outputs[0]
|
||||
|
||||
# In pytorch-transformers you can also have access to the logits:
|
||||
loss, logits = outputs[:2]
|
||||
|
||||
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
|
||||
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
|
||||
outputs = model(input_ids, labels=labels)
|
||||
loss, logits, attentions = outputs
|
||||
```
|
||||
|
||||
### Serialization
|
||||
|
||||
While not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
|
||||
|
||||
Here is an example:
|
||||
|
||||
```python
|
||||
### Let's load a model and tokenizer
|
||||
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
|
||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||
|
||||
### Do some stuff to our model and tokenizer
|
||||
# Ex: add new tokens to the vocabulary and embeddings of our model
|
||||
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
|
||||
model.resize_token_embeddings(len(tokenizer))
|
||||
# Train our model
|
||||
train(model)
|
||||
|
||||
### Now let's save our model and tokenizer to a directory
|
||||
model.save_pretrained('./my_saved_model_directory/')
|
||||
tokenizer.save_pretrained('./my_saved_model_directory/')
|
||||
|
||||
### Reload the model and the tokenizer
|
||||
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
|
||||
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
|
||||
```
|
||||
|
||||
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
|
||||
|
||||
The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer.
|
||||
The new `AdamW` optimizer matches the PyTorch `Adam` optimizer API.
|
||||
|
||||
The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
|
||||
|
||||
Here is a conversion example from `BertAdam` with a linear warmup and decay schedule to `AdamW` with the same schedule:
|
||||
|
||||
```python
|
||||
# Parameters:
|
||||
lr = 1e-3
|
||||
num_total_steps = 1000
|
||||
num_warmup_steps = 100
|
||||
warmup_proportion = float(num_warmup_steps) / float(num_total_steps) # 0.1
|
||||
|
||||
### Previously BertAdam optimizer was instantiated like this:
|
||||
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_total_steps)
|
||||
### and used like this:
|
||||
for batch in train_data:
|
||||
loss = model(batch)
|
||||
loss.backward()
|
||||
optimizer.step()
|
||||
|
||||
### In PyTorch-Transformers, the optimizer and schedule are split and instantiated like this:
|
||||
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
|
||||
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps) # PyTorch scheduler
|
||||
### and used like this:
|
||||
for batch in train_data:
|
||||
loss = model(batch)
|
||||
loss.backward()
|
||||
scheduler.step()
|
||||
optimizer.step()
|
||||
```
|
||||
|
@@ -1 +0,0 @@
|
||||
# Philosophy
|
59 docs/source/pretrained_models.rst Normal file
@@ -0,0 +1,59 @@
|
||||
Pretrained models
|
||||
================================================
|
||||
|
||||
Here is the full list of the currently provided pretrained models together with a short presentation of each model.
|
||||
|
||||
.. list-table::
   :header-rows: 1

   * - Architecture
     - Shortcut name
     - Details of the model
   * - BERT
     - ``bert-base-uncased``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on lower-cased English text
   * - BERT
     - ``bert-large-uncased``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on lower-cased English text
   * - BERT
     - ``bert-base-cased``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased English text
   * - BERT
     - ``bert-large-cased``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on cased English text
   * - BERT
     - ``bert-base-multilingual-uncased``
     - | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on lower-cased text in the top 102 languages with the largest Wikipedias
       | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`_)
   * - BERT
     - ``bert-base-multilingual-cased``
     - | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased text in the top 104 languages with the largest Wikipedias
       | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`_)
   * - BERT
     - ``bert-base-chinese``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased Chinese Simplified and Traditional text
   * - BERT
     - ``bert-base-german-cased``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | Trained on cased German text by Deepset.ai
       | (see `details on deepset.ai website <https://deepset.ai/german-bert>`_)
   * - BERT
     - ``bert-large-uncased-whole-word-masking``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on lower-cased English text using Whole-Word-Masking
       | (see `details <https://github.com/google-research/bert/#bert>`_)
   * - BERT
     - ``bert-large-cased-whole-word-masking``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | Trained on cased English text using Whole-Word-Masking
       | (see `details <https://github.com/google-research/bert/#bert>`_)
   * - BERT
     - ``bert-large-uncased-whole-word-masking-finetuned-squad``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD
       | (see details of fine-tuning in the `example section`_)
   * - BERT
     - ``bert-large-cased-whole-word-masking-finetuned-squad``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters
       | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD
       | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`_)
   * - BERT
     - ``bert-base-cased-finetuned-mrpc``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters
       | The ``bert-base-cased`` model fine-tuned on MRPC
       | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`_)
   * - GPT
     - Cells may span columns.
     -

.. _example section: https://huggingface.co/pytorch-transformers/examples.html
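
Any of these shortcut names can be passed to ``from_pretrained``. A minimal sketch using one of the BERT entries above:

.. code-block:: python

    from pytorch_transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
    model = BertModel.from_pretrained('bert-base-multilingual-cased')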
|
146 docs/source/quickstart.md Normal file
@@ -0,0 +1,146 @@
|
||||
# Quickstart
|
||||
|
||||
## Main concepts
|
||||
|
||||
|
||||
## Quick tour: Usage
|
||||
|
||||
Here are two quick-start examples showcasing a few `Bert` and `GPT2` classes and pre-trained models.
|
||||
|
||||
See the package reference for examples for each model class.
|
||||
|
||||
### BERT example
|
||||
|
||||
First let's prepare a tokenized input from a text string using `BertTokenizer`
|
||||
|
||||
```python
|
||||
import torch
|
||||
from pytorch_transformers import BertTokenizer, BertModel, BertForMaskedLM
|
||||
|
||||
# OPTIONAL: if you want to have more information on what's happening under the hood, activate the logger as follows
|
||||
import logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
|
||||
# Load pre-trained model tokenizer (vocabulary)
|
||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||
|
||||
# Tokenize input
|
||||
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
|
||||
tokenized_text = tokenizer.tokenize(text)
|
||||
|
||||
# Mask a token that we will try to predict back with `BertForMaskedLM`
|
||||
masked_index = 8
|
||||
tokenized_text[masked_index] = '[MASK]'
|
||||
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']
|
||||
|
||||
# Convert token to vocabulary indices
|
||||
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
|
||||
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
|
||||
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
|
||||
|
||||
# Convert inputs to PyTorch tensors
|
||||
tokens_tensor = torch.tensor([indexed_tokens])
|
||||
segments_tensors = torch.tensor([segments_ids])
|
||||
```
|
||||
|
||||
Let's see how we can use `BertModel` to encode our inputs in hidden-states:
|
||||
|
||||
```python
|
||||
# Load pre-trained model (weights)
|
||||
model = BertModel.from_pretrained('bert-base-uncased')
|
||||
|
||||
# Set the model in evaluation mode to deactivate the dropout modules
|
||||
# This is IMPORTANT to have reproducible results during evaluation!
|
||||
model.eval()
|
||||
|
||||
# If you have a GPU, put everything on cuda
|
||||
tokens_tensor = tokens_tensor.to('cuda')
|
||||
segments_tensors = segments_tensors.to('cuda')
|
||||
model.to('cuda')
|
||||
|
||||
# Predict hidden states features for each layer
|
||||
with torch.no_grad():
|
||||
# See the models docstrings for the detail of the inputs
|
||||
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
|
||||
# PyTorch-Transformers models always output tuples.
|
||||
# See the models docstrings for the detail of all the outputs
|
||||
# In our case, the first element is the hidden state of the last layer of the Bert model
|
||||
encoded_layers = outputs[0]
|
||||
# We have encoded our input sequence in a FloatTensor of shape (batch size, sequence length, model hidden dimension)
|
||||
assert tuple(encoded_layers.shape) == (1, len(indexed_tokens), model.config.hidden_size)
|
||||
```
|
||||
|
||||
And how to use `BertForMaskedLM` to predict a masked token:
|
||||
|
||||
```python
|
||||
# Load pre-trained model (weights)
|
||||
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
|
||||
model.eval()
|
||||
|
||||
# If you have a GPU, put everything on cuda
|
||||
tokens_tensor = tokens_tensor.to('cuda')
|
||||
segments_tensors = segments_tensors.to('cuda')
|
||||
model.to('cuda')
|
||||
|
||||
# Predict all tokens
|
||||
with torch.no_grad():
|
||||
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
|
||||
predictions = outputs[0]
|
||||
|
||||
# confirm we were able to predict 'henson'
|
||||
predicted_index = torch.argmax(predictions[0, masked_index]).item()
|
||||
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
|
||||
assert predicted_token == 'henson'
|
||||
```
|
||||
|
||||
### OpenAI GPT-2
|
||||
|
||||
Here is a quick-start example using `GPT2Tokenizer` and `GPT2LMHeadModel` class with OpenAI's pre-trained model to predict the next token from a text prompt.
|
||||
|
||||
First let's prepare a tokenized input from our text string using `GPT2Tokenizer`
|
||||
|
||||
```python
|
||||
import torch
|
||||
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel
|
||||
|
||||
# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
|
||||
import logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
|
||||
# Load pre-trained model tokenizer (vocabulary)
|
||||
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
|
||||
|
||||
# Encode a text input
|
||||
text = "Who was Jim Henson ? Jim Henson was a"
|
||||
indexed_tokens = tokenizer.encode(text)
|
||||
|
||||
# Convert indexed tokens in a PyTorch tensor
|
||||
tokens_tensor = torch.tensor([indexed_tokens])
|
||||
```
|
||||
|
||||
Let's see how to use `GPT2LMHeadModel` to generate the next token following our text:
|
||||
|
||||
```python
|
||||
# Load pre-trained model (weights)
|
||||
model = GPT2LMHeadModel.from_pretrained('gpt2')
|
||||
|
||||
# Set the model in evaluation mode to deactivate the dropout modules
|
||||
# This is IMPORTANT to have reproducible results during evaluation!
|
||||
model.eval()
|
||||
|
||||
# If you have a GPU, put everything on cuda
|
||||
tokens_tensor = tokens_tensor.to('cuda')
|
||||
model.to('cuda')
|
||||
|
||||
# Predict all tokens
|
||||
with torch.no_grad():
|
||||
outputs = model(tokens_tensor)
|
||||
predictions = outputs[0]
|
||||
|
||||
# get the predicted next sub-word (in our case, the word 'man')
|
||||
predicted_index = torch.argmax(predictions[0, -1, :]).item()
|
||||
predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])
|
||||
assert predicted_text == 'Who was Jim Henson? Jim Henson was a man'
|
||||
```
|
||||
|
||||
Examples for each model class of each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [documentation](#documentation).
|
@@ -1,339 +0,0 @@
|
||||
Usage
|
||||
================================================
|
||||
|
||||
BERT
|
||||
^^^^
|
||||
|
||||
Here is a quick-start example using ``BertTokenizer``\ , ``BertModel`` and ``BertForMaskedLM`` class with Google AI's pre-trained ``Bert base uncased`` model. See the `doc section <./model_doc/overview.html>`_ below for all the details on these classes.
|
||||
|
||||
First let's prepare a tokenized input with ``BertTokenizer``
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import torch
|
||||
from pytorch_transformers import BertTokenizer, BertModel, BertForMaskedLM
|
||||
|
||||
# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
|
||||
import logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
|
||||
# Load pre-trained model tokenizer (vocabulary)
|
||||
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
|
||||
|
||||
# Tokenized input
|
||||
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
|
||||
tokenized_text = tokenizer.tokenize(text)
|
||||
|
||||
# Mask a token that we will try to predict back with `BertForMaskedLM`
|
||||
masked_index = 8
|
||||
tokenized_text[masked_index] = '[MASK]'
|
||||
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']
|
||||
|
||||
# Convert token to vocabulary indices
|
||||
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
|
||||
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
|
||||
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
|
||||
|
||||
# Convert inputs to PyTorch tensors
|
||||
tokens_tensor = torch.tensor([indexed_tokens])
|
||||
segments_tensors = torch.tensor([segments_ids])
|
||||
|
||||
Let's see how to use ``BertModel`` to get hidden states
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Load pre-trained model (weights)
|
||||
model = BertModel.from_pretrained('bert-base-uncased')
|
||||
model.eval()
|
||||
|
||||
# If you have a GPU, put everything on cuda
|
||||
tokens_tensor = tokens_tensor.to('cuda')
|
||||
segments_tensors = segments_tensors.to('cuda')
|
||||
model.to('cuda')
|
||||
|
||||
# Predict hidden states features for each layer
|
||||
with torch.no_grad():
|
||||
encoded_layers, _ = model(tokens_tensor, segments_tensors)
|
||||
# We have a hidden states for each of the 12 layers in model bert-base-uncased
|
||||
assert len(encoded_layers) == 12
|
||||
|
||||
And how to use ``BertForMaskedLM``
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Load pre-trained model (weights)
|
||||
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
|
||||
model.eval()
|
||||
|
||||
# If you have a GPU, put everything on cuda
|
||||
tokens_tensor = tokens_tensor.to('cuda')
|
||||
segments_tensors = segments_tensors.to('cuda')
|
||||
model.to('cuda')
|
||||
|
||||
# Predict all tokens
|
||||
with torch.no_grad():
|
||||
predictions = model(tokens_tensor, segments_tensors)
|
||||
|
||||
# confirm we were able to predict 'henson'
|
||||
predicted_index = torch.argmax(predictions[0, masked_index]).item()
|
||||
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
|
||||
assert predicted_token == 'henson'
|
||||
|
||||
OpenAI GPT
|
||||
^^^^^^^^^^
|
||||
|
||||
Here is a quick-start example using ``OpenAIGPTTokenizer``\ , ``OpenAIGPTModel`` and ``OpenAIGPTLMHeadModel`` class with OpenAI's pre-trained model. See the `doc section <./model_doc/overview.html>`_ for all the details on these classes.
|
||||
|
||||
First let's prepare a tokenized input with ``OpenAIGPTTokenizer``
|
||||
|
||||
.. code-block:: python

    import torch
    from pytorch_transformers import OpenAIGPTTokenizer, OpenAIGPTModel, OpenAIGPTLMHeadModel, OpenAIGPTDoubleHeadsModel

    # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
    import logging
    logging.basicConfig(level=logging.INFO)

    # Load pre-trained model tokenizer (vocabulary)
    tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')

    # Tokenized input
    text = "Who was Jim Henson ? Jim Henson was a puppeteer"
    tokenized_text = tokenizer.tokenize(text)

    # Convert token to vocabulary indices
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

    # Convert inputs to PyTorch tensors
    tokens_tensor = torch.tensor([indexed_tokens])

Let's see how to use ``OpenAIGPTModel`` to get hidden states

.. code-block:: python

    # Load pre-trained model (weights)
    model = OpenAIGPTModel.from_pretrained('openai-gpt')
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor = tokens_tensor.to('cuda')
    model.to('cuda')

    # Predict hidden states features for each layer
    with torch.no_grad():
        hidden_states = model(tokens_tensor)

And how to use ``OpenAIGPTLMHeadModel``

.. code-block:: python

    # Load pre-trained model (weights)
    model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor = tokens_tensor.to('cuda')
    model.to('cuda')

    # Predict all tokens
    with torch.no_grad():
        predictions = model(tokens_tensor)

    # get the predicted last token
    predicted_index = torch.argmax(predictions[0, -1, :]).item()
    predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
    assert predicted_token == '.</w>'

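Picking the most likely next token can also be repeated to generate a short continuation greedily. This is only a minimal sketch built on the same calls as above (no sampling, no beam search):

.. code-block:: python

    # Illustrative sketch: greedy generation of a few extra tokens
    generated = tokens_tensor
    with torch.no_grad():
        for _ in range(5):
            predictions = model(generated)
            next_index = torch.argmax(predictions[0, -1, :]).item()
            next_token_tensor = torch.tensor([[next_index]]).to(generated.device)
            generated = torch.cat([generated, next_token_tensor], dim=1)
    print(tokenizer.convert_ids_to_tokens(generated[0].tolist()))
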
And how to use ``OpenAIGPTDoubleHeadsModel``

.. code-block:: python

    # Load pre-trained model (weights)
    model = OpenAIGPTDoubleHeadsModel.from_pretrained('openai-gpt')
    model.eval()

    # Prepare tokenized input: two candidate sequences for the multiple-choice head
    text1 = "Who was Jim Henson ? Jim Henson was a puppeteer"
    text2 = "Who was Jim Henson ? Jim Henson was a mysterious young man"
    tokenized_text1 = tokenizer.tokenize(text1)
    tokenized_text2 = tokenizer.tokenize(text2)
    indexed_tokens1 = tokenizer.convert_tokens_to_ids(tokenized_text1)
    indexed_tokens2 = tokenizer.convert_tokens_to_ids(tokenized_text2)

    # The two choices can tokenize to different lengths, so pad the shorter one with zeros
    # before stacking them into a single tensor of shape (batch size, number of choices, sequence length)
    max_length = max(len(indexed_tokens1), len(indexed_tokens2))
    padded_tokens1 = indexed_tokens1 + [0] * (max_length - len(indexed_tokens1))
    padded_tokens2 = indexed_tokens2 + [0] * (max_length - len(indexed_tokens2))
    tokens_tensor = torch.tensor([[padded_tokens1, padded_tokens2]])

    # mc_token_ids gives, for each choice, the position whose hidden state feeds the multiple-choice head
    # (here the last real token of each sequence)
    mc_token_ids = torch.LongTensor([[len(tokenized_text1) - 1, len(tokenized_text2) - 1]])

    # Predict language modeling logits and multiple-choice logits
    with torch.no_grad():
        lm_logits, multiple_choice_logits = model(tokens_tensor, mc_token_ids)

Transformer-XL
^^^^^^^^^^^^^^

Here is a quick-start example using the ``TransfoXLTokenizer``, ``TransfoXLModel`` and ``TransfoXLLMHeadModel`` classes with the Transformer-XL model pre-trained on WikiText-103. See the `doc section <./model_doc/overview.html>`_ for all the details on these classes.

First let's prepare a tokenized input with ``TransfoXLTokenizer``

.. code-block:: python

    import torch
    from pytorch_transformers import TransfoXLTokenizer, TransfoXLModel, TransfoXLLMHeadModel

    # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
    import logging
    logging.basicConfig(level=logging.INFO)

    # Load pre-trained model tokenizer (vocabulary from wikitext 103)
    tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')

    # Tokenized input
    text_1 = "Who was Jim Henson ?"
    text_2 = "Jim Henson was a puppeteer"
    tokenized_text_1 = tokenizer.tokenize(text_1)
    tokenized_text_2 = tokenizer.tokenize(text_2)

    # Convert token to vocabulary indices
    indexed_tokens_1 = tokenizer.convert_tokens_to_ids(tokenized_text_1)
    indexed_tokens_2 = tokenizer.convert_tokens_to_ids(tokenized_text_2)

    # Convert inputs to PyTorch tensors
    tokens_tensor_1 = torch.tensor([indexed_tokens_1])
    tokens_tensor_2 = torch.tensor([indexed_tokens_2])

Let's see how to use ``TransfoXLModel`` to get hidden states

.. code-block:: python

    # Load pre-trained model (weights)
    model = TransfoXLModel.from_pretrained('transfo-xl-wt103')
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor_1 = tokens_tensor_1.to('cuda')
    tokens_tensor_2 = tokens_tensor_2.to('cuda')
    model.to('cuda')

    with torch.no_grad():
        # Predict hidden states features for each layer
        hidden_states_1, mems_1 = model(tokens_tensor_1)
        # We can re-use the memory cells in a subsequent call to attend a longer context
        hidden_states_2, mems_2 = model(tokens_tensor_2, mems=mems_1)

And how to use ``TransfoXLLMHeadModel``

.. code-block:: python

    # Load pre-trained model (weights)
    model = TransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor_1 = tokens_tensor_1.to('cuda')
    tokens_tensor_2 = tokens_tensor_2.to('cuda')
    model.to('cuda')

    with torch.no_grad():
        # Predict all tokens
        predictions_1, mems_1 = model(tokens_tensor_1)
        # We can re-use the memory cells in a subsequent call to attend a longer context
        predictions_2, mems_2 = model(tokens_tensor_2, mems=mems_1)

    # get the predicted last token
    predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
    predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
    assert predicted_token == 'who'

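The memory returned by each call can keep being passed forward, so an arbitrarily long context can be processed chunk by chunk. A minimal sketch, where ``text_3`` is an assumed example sentence that is not part of the original quickstart:

.. code-block:: python

    # Illustrative sketch: process a third chunk while attending to the whole previous context
    text_3 = "He created the Muppets"  # hypothetical example sentence
    indexed_tokens_3 = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text_3))
    tokens_tensor_3 = torch.tensor([indexed_tokens_3]).to('cuda')
    with torch.no_grad():
        predictions_3, mems_3 = model(tokens_tensor_3, mems=mems_2)
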
OpenAI GPT-2
^^^^^^^^^^^^

Here is a quick-start example using the ``GPT2Tokenizer``, ``GPT2Model``, ``GPT2LMHeadModel`` and ``GPT2DoubleHeadsModel`` classes with OpenAI's pre-trained model. See the `doc section <./model_doc/overview.html>`_ for all the details on these classes.

First let's prepare a tokenized input with ``GPT2Tokenizer``

.. code-block:: python

    import torch
    from pytorch_transformers import GPT2Tokenizer, GPT2Model, GPT2LMHeadModel, GPT2DoubleHeadsModel

    # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
    import logging
    logging.basicConfig(level=logging.INFO)

    # Load pre-trained model tokenizer (vocabulary)
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

    # Encode some inputs
    text_1 = "Who was Jim Henson ?"
    text_2 = "Jim Henson was a puppeteer"
    indexed_tokens_1 = tokenizer.encode(text_1)
    indexed_tokens_2 = tokenizer.encode(text_2)

    # Convert inputs to PyTorch tensors
    tokens_tensor_1 = torch.tensor([indexed_tokens_1])
    tokens_tensor_2 = torch.tensor([indexed_tokens_2])

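``encode`` is used here as a one-step shortcut; under the assumption that it simply chains ``tokenize`` and ``convert_tokens_to_ids``, the following sketch should print ``True``:

.. code-block:: python

    # Illustrative check, assuming encode(text) == convert_tokens_to_ids(tokenize(text))
    ids_1 = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text_1))
    print(ids_1 == indexed_tokens_1)
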
Let's see how to use ``GPT2Model`` to get hidden states

.. code-block:: python

    # Load pre-trained model (weights)
    model = GPT2Model.from_pretrained('gpt2')
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor_1 = tokens_tensor_1.to('cuda')
    tokens_tensor_2 = tokens_tensor_2.to('cuda')
    model.to('cuda')

    # Predict hidden states features for each layer
    with torch.no_grad():
        hidden_states_1, past = model(tokens_tensor_1)
        # past can be used to reuse precomputed hidden states in subsequent predictions
        # (see the beam-search examples in the run_gpt2.py example script)
        hidden_states_2, past = model(tokens_tensor_2, past=past)

And how to use ``GPT2LMHeadModel``

.. code-block:: python

    # Load pre-trained model (weights)
    model = GPT2LMHeadModel.from_pretrained('gpt2')
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor_1 = tokens_tensor_1.to('cuda')
    tokens_tensor_2 = tokens_tensor_2.to('cuda')
    model.to('cuda')

    # Predict all tokens
    with torch.no_grad():
        predictions_1, past = model(tokens_tensor_1)
        # past can be used to reuse precomputed hidden states in subsequent predictions
        # (see the beam-search examples in the run_gpt2.py example script)
        predictions_2, past = model(tokens_tensor_2, past=past)

    # get the predicted last token
    predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
    predicted_token = tokenizer.decode([predicted_index])

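Because the model also returns ``past``, only the newly generated token needs to be fed at each subsequent step. A minimal greedy-decoding sketch built on the calls above (illustrative only):

.. code-block:: python

    # Illustrative sketch: continue generating a few tokens, reusing `past` at each step
    generated = [predicted_index]
    context = torch.tensor([[predicted_index]]).to('cuda')
    with torch.no_grad():
        for _ in range(5):
            predictions, past = model(context, past=past)
            predicted_index = torch.argmax(predictions[0, -1, :]).item()
            generated.append(predicted_index)
            context = torch.tensor([[predicted_index]]).to('cuda')
    print(tokenizer.decode(generated))
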
And how to use ``GPT2DoubleHeadsModel``

.. code-block:: python

    # Load pre-trained model (weights)
    model = GPT2DoubleHeadsModel.from_pretrained('gpt2')
    model.eval()

    # Prepare tokenized input: two candidate sequences for the multiple-choice head
    text1 = "Who was Jim Henson ? Jim Henson was a puppeteer"
    text2 = "Who was Jim Henson ? Jim Henson was a mysterious young man"
    tokenized_text1 = tokenizer.tokenize(text1)
    tokenized_text2 = tokenizer.tokenize(text2)
    indexed_tokens1 = tokenizer.convert_tokens_to_ids(tokenized_text1)
    indexed_tokens2 = tokenizer.convert_tokens_to_ids(tokenized_text2)

    # The two choices can tokenize to different lengths, so pad the shorter one with zeros
    # before stacking them into a single tensor of shape (batch size, number of choices, sequence length)
    max_length = max(len(indexed_tokens1), len(indexed_tokens2))
    padded_tokens1 = indexed_tokens1 + [0] * (max_length - len(indexed_tokens1))
    padded_tokens2 = indexed_tokens2 + [0] * (max_length - len(indexed_tokens2))
    tokens_tensor = torch.tensor([[padded_tokens1, padded_tokens2]])

    # mc_token_ids gives, for each choice, the position whose hidden state feeds the multiple-choice head
    # (here the last real token of each sequence)
    mc_token_ids = torch.LongTensor([[len(tokenized_text1) - 1, len(tokenized_text2) - 1]])

    # Predict language modeling logits and multiple-choice logits
    with torch.no_grad():
        lm_logits, multiple_choice_logits, past = model(tokens_tensor, mc_token_ids)
