@@ -42,12 +42,12 @@ CTRL モデルは、Nitish Shirish Keskar*、Bryan McCann*、Lav R. Varshney、C
モデルベースのソース帰属を介して。*
このモデルは、[keskarnitishr](https://huggingface.co/keskarnitishr) によって提供されました。元のコードが見つかる
-[こちら](https://github.com/salesforce/ctrl)。
+[こちら](https://github.com/salesforce/ctrl)。
## Usage tips
- CTRL は制御コードを利用してテキストを生成します。生成を特定の単語や文で開始する必要があります。
- またはリンクして一貫したテキストを生成します。 [元の実装](https://github.com/salesforce/ctrl) を参照してください。
+ またはリンクして一貫したテキストを生成します。 [元の実装](https://github.com/salesforce/ctrl) を参照してください。
詳しくは。
- CTRL は絶対位置埋め込みを備えたモデルであるため、通常は入力を右側にパディングすることをお勧めします。
左。
diff --git a/docs/source/ja/model_doc/dialogpt.md b/docs/source/ja/model_doc/dialogpt.md
index 82d6f8481af..22ce0c9a099 100644
--- a/docs/source/ja/model_doc/dialogpt.md
+++ b/docs/source/ja/model_doc/dialogpt.md
@@ -52,6 +52,6 @@ OpenAI GPT-2に従って、マルチターン対話セッションを長いテ
-DialoGPT のアーキテクチャは GPT2 モデルに基づいています。API リファレンスと例については、[GPT2 のドキュメント ページ](gpt2) を参照してください。
+DialoGPT のアーキテクチャは GPT2 モデルに基づいています。API リファレンスと例については、[GPT2 のドキュメント ページ](gpt2) を参照してください。
diff --git a/docs/source/ja/model_memory_anatomy.md b/docs/source/ja/model_memory_anatomy.md
index 52374d58f98..5f09489b7f7 100644
--- a/docs/source/ja/model_memory_anatomy.md
+++ b/docs/source/ja/model_memory_anatomy.md
@@ -88,14 +88,14 @@ GPU memory occupied: 1343 MB.
## Load Model
-まず、`bert-large-uncased` モデルを読み込みます。モデルの重みを直接GPUに読み込むことで、重みだけがどれだけのスペースを使用しているかを確認できます。
+まず、`google-bert/bert-large-uncased` モデルを読み込みます。モデルの重みを直接GPUに読み込むことで、重みだけがどれだけのスペースを使用しているかを確認できます。
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased").to("cuda")
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-large-uncased").to("cuda")
>>> print_gpu_utilization()
GPU memory occupied: 2631 MB.
```
diff --git a/docs/source/ja/model_sharing.md b/docs/source/ja/model_sharing.md
index 14e0c2e7a85..aa8f7a3d1e3 100644
--- a/docs/source/ja/model_sharing.md
+++ b/docs/source/ja/model_sharing.md
@@ -254,7 +254,7 @@ Hugging Faceプロフィールに移動すると、新しく作成したモデ
* 手動で`README.md`ファイルを作成およびアップロードする。
* モデルリポジトリ内の**Edit model card**ボタンをクリックする。
-モデルカードに含めるべき情報の例については、DistilBert [モデルカード](https://huggingface.co/distilbert-base-uncased)をご覧ください。`README.md`ファイルで制御できる他のオプション、例えばモデルの炭素フットプリントやウィジェットの例などについての詳細は、[こちらのドキュメンテーション](https://huggingface.co/docs/hub/models-cards)を参照してください。
+モデルカードに含めるべき情報の例については、DistilBert [モデルカード](https://huggingface.co/distilbert/distilbert-base-uncased)をご覧ください。`README.md`ファイルで制御できる他のオプション、例えばモデルの炭素フットプリントやウィジェットの例などについての詳細は、[こちらのドキュメンテーション](https://huggingface.co/docs/hub/models-cards)を参照してください。
diff --git a/docs/source/ja/multilingual.md b/docs/source/ja/multilingual.md
index 86dabb94633..39524195f88 100644
--- a/docs/source/ja/multilingual.md
+++ b/docs/source/ja/multilingual.md
@@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.
[[open-in-colab]]
-🤗 Transformers にはいくつかの多言語モデルがあり、それらの推論の使用方法は単一言語モデルとは異なります。ただし、多言語モデルの使用方法がすべて異なるわけではありません。 [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) などの一部のモデルは、単一言語モデルと同様に使用できます。 このガイドでは、推論のために使用方法が異なる多言語モデルをどのように使うかを示します。
+🤗 Transformers にはいくつかの多言語モデルがあり、それらの推論の使用方法は単一言語モデルとは異なります。ただし、多言語モデルの使用方法がすべて異なるわけではありません。 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased) などの一部のモデルは、単一言語モデルと同様に使用できます。 このガイドでは、推論のために使用方法が異なる多言語モデルをどのように使うかを示します。
## XLM
@@ -28,24 +28,24 @@ XLM には10の異なるチェックポイントがあり、そのうちの1つ
次の XLM モデルは、言語の埋め込みを使用して、推論で使用される言語を指定します。
-- `xlm-mlm-ende-1024` (マスク化された言語モデリング、英語-ドイツ語)
-- `xlm-mlm-enfr-1024` (マスク化された言語モデリング、英語-フランス語)
-- `xlm-mlm-enro-1024` (マスク化された言語モデリング、英語-ルーマニア語)
-- `xlm-mlm-xnli15-1024` (マスク化された言語モデリング、XNLI 言語)
-- `xlm-mlm-tlm-xnli15-1024` (マスク化された言語モデリング + 翻訳 + XNLI 言語)
-- `xlm-clm-enfr-1024` (因果言語モデリング、英語-フランス語)
-- `xlm-clm-ende-1024` (因果言語モデリング、英語-ドイツ語)
+- `FacebookAI/xlm-mlm-ende-1024` (マスク化された言語モデリング、英語-ドイツ語)
+- `FacebookAI/xlm-mlm-enfr-1024` (マスク化された言語モデリング、英語-フランス語)
+- `FacebookAI/xlm-mlm-enro-1024` (マスク化された言語モデリング、英語-ルーマニア語)
+- `FacebookAI/xlm-mlm-xnli15-1024` (マスク化された言語モデリング、XNLI 言語)
+- `FacebookAI/xlm-mlm-tlm-xnli15-1024` (マスク化された言語モデリング + 翻訳 + XNLI 言語)
+- `FacebookAI/xlm-clm-enfr-1024` (因果言語モデリング、英語-フランス語)
+- `FacebookAI/xlm-clm-ende-1024` (因果言語モデリング、英語-ドイツ語)
言語の埋め込みは、モデルに渡される `input_ids` と同じ形状のテンソルとして表されます。 これらのテンソルの値は、使用される言語に依存し、トークナイザーの `lang2id` および `id2lang` 属性によって識別されます。
-この例では、`xlm-clm-enfr-1024` チェックポイントをロードします (因果言語モデリング、英語-フランス語)。
+この例では、`FacebookAI/xlm-clm-enfr-1024` チェックポイントをロードします (因果言語モデリング、英語-フランス語)。
```py
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel
->>> tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
->>> model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")
+>>> tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
+>>> model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
```
トークナイザーの `lang2id` 属性は、このモデルの言語とその ID を表示します。
@@ -83,8 +83,8 @@ XLM には10の異なるチェックポイントがあり、そのうちの1つ
次の XLM モデルは、推論中に言語の埋め込みを必要としません。
-- `xlm-mlm-17-1280` (マスク化された言語モデリング、17の言語)
-- `xlm-mlm-100-1280` (マスク化された言語モデリング、100の言語)
+- `FacebookAI/xlm-mlm-17-1280` (マスク化された言語モデリング、17の言語)
+- `FacebookAI/xlm-mlm-100-1280` (マスク化された言語モデリング、100の言語)
これらのモデルは、以前の XLM チェックポイントとは異なり、一般的な文の表現に使用されます。
@@ -92,8 +92,8 @@ XLM には10の異なるチェックポイントがあり、そのうちの1つ
以下の BERT モデルは、多言語タスクに使用できます。
-- `bert-base-multilingual-uncased` (マスク化された言語モデリング + 次の文の予測、102の言語)
-- `bert-base-multilingual-cased` (マスク化された言語モデリング + 次の文の予測、104の言語)
+- `google-bert/bert-base-multilingual-uncased` (マスク化された言語モデリング + 次の文の予測、102の言語)
+- `google-bert/bert-base-multilingual-cased` (マスク化された言語モデリング + 次の文の予測、104の言語)
これらのモデルは、推論中に言語の埋め込みを必要としません。 文脈から言語を識別し、それに応じて推測する必要があります。
@@ -101,8 +101,8 @@ XLM には10の異なるチェックポイントがあり、そのうちの1つ
次の XLM-RoBERTa モデルは、多言語タスクに使用できます。
-- `xlm-roberta-base` (マスク化された言語モデリング、100の言語)
-- `xlm-roberta-large` (マスク化された言語モデリング、100の言語)
+- `FacebookAI/xlm-roberta-base` (マスク化された言語モデリング、100の言語)
+- `FacebookAI/xlm-roberta-large` (マスク化された言語モデリング、100の言語)
XLM-RoBERTa は、100の言語で新しく作成およびクリーニングされた2.5 TB の CommonCrawl データでトレーニングされました。 これは、分類、シーケンスのラベル付け、質問応答などのダウンストリームタスクで、mBERT や XLM などの以前にリリースされた多言語モデルを大幅に改善します。
diff --git a/docs/source/ja/perf_hardware.md b/docs/source/ja/perf_hardware.md
index 2ebc0eef9b6..0d104ed3ddb 100644
--- a/docs/source/ja/perf_hardware.md
+++ b/docs/source/ja/perf_hardware.md
@@ -140,7 +140,7 @@ NVLinkを使用すると、トレーニングが約23%速く完了すること
# DDP w/ NVLink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
---nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
+--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
@@ -149,7 +149,7 @@ rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
# DDP w/o NVLink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 torchrun \
---nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
+--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
diff --git a/docs/source/ja/perf_train_cpu.md b/docs/source/ja/perf_train_cpu.md
index b22d7b96aa1..bf623d13136 100644
--- a/docs/source/ja/perf_train_cpu.md
+++ b/docs/source/ja/perf_train_cpu.md
@@ -49,7 +49,7 @@ TrainerでIPEXの自動混合精度を有効にするには、ユーザーはト
- CPU上でBF16自動混合精度を使用してIPEXでトレーニングを行う場合:
python run_qa.py \
---model_name_or_path bert-base-uncased \
+--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
diff --git a/docs/source/ja/perf_train_cpu_many.md b/docs/source/ja/perf_train_cpu_many.md
index a15cb5d4900..26da32f5772 100644
--- a/docs/source/ja/perf_train_cpu_many.md
+++ b/docs/source/ja/perf_train_cpu_many.md
@@ -100,7 +100,7 @@ IPEXは、Float32およびBFloat16の両方でCPUトレーニングのパフォ
export MASTER_ADDR=127.0.0.1
mpirun -n 2 -genv OMP_NUM_THREADS=23 \
python3 run_qa.py \
- --model_name_or_path bert-large-uncased \
+ --model_name_or_path google-bert/bert-large-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -134,7 +134,7 @@ node0では、各ノードのIPアドレスを含む構成ファイルを作成
mpirun -f hostfile -n 4 -ppn 2 \
-genv OMP_NUM_THREADS=23 \
python3 run_qa.py \
- --model_name_or_path bert-large-uncased \
+ --model_name_or_path google-bert/bert-large-uncased \
--dataset_name squad \
--do_train \
--do_eval \
diff --git a/docs/source/ja/perf_train_gpu_many.md b/docs/source/ja/perf_train_gpu_many.md
index 44186bba796..d85165d0c54 100644
--- a/docs/source/ja/perf_train_gpu_many.md
+++ b/docs/source/ja/perf_train_gpu_many.md
@@ -136,7 +136,7 @@ DPとDDPの他にも違いがありますが、この議論には関係ありま
# DP
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
python examples/pytorch/language-modeling/run_clm.py \
---model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
+--model_name_or_path openai-community/gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{'train_runtime': 110.5948, 'train_samples_per_second': 1.808, 'epoch': 0.69}
@@ -144,7 +144,7 @@ python examples/pytorch/language-modeling/run_clm.py \
# DDP w/ NVlink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
---model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
+--model_name_or_path openai-community/gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{'train_runtime': 101.9003, 'train_samples_per_second': 1.963, 'epoch': 0.69}
@@ -152,7 +152,7 @@ torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
# DDP w/o NVlink
rm -r /tmp/test-clm; NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
---model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
+--model_name_or_path openai-community/gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{'train_runtime': 131.4367, 'train_samples_per_second': 1.522, 'epoch': 0.69}
diff --git a/docs/source/ja/perf_train_gpu_one.md b/docs/source/ja/perf_train_gpu_one.md
index 215c0914d1f..2c2bc540e48 100644
--- a/docs/source/ja/perf_train_gpu_one.md
+++ b/docs/source/ja/perf_train_gpu_one.md
@@ -193,7 +193,7 @@ AdamWオプティマイザの代替手段について詳しく見てみましょ
1. [`Trainer`]で使用可能な`adafactor`
2. Trainerで使用可能な`adamw_bnb_8bit`は、デモンストレーション用に以下でサードパーティの統合が提供されています。
-比較のため、3Bパラメータモデル(例:「t5-3b」)の場合:
+比較のため、3Bパラメータモデル(例:「google-t5/t5-3b」)の場合:
* 標準のAdamWオプティマイザは、各パラメータに8バイトを使用するため、24GBのGPUメモリが必要です(8 * 3 => 24GB)。
* Adafactorオプティマイザは12GB以上必要です。各パラメータにわずか4バイト以上を使用するため、4 * 3と少し余分になります。
* 8ビットのBNB量子化オプティマイザは、すべてのオプティマイザの状態が量子化されている場合、わずか6GBしか使用しません。
diff --git a/docs/source/ja/perplexity.md b/docs/source/ja/perplexity.md
index aa88a7a212f..368a301ec3a 100644
--- a/docs/source/ja/perplexity.md
+++ b/docs/source/ja/perplexity.md
@@ -56,7 +56,7 @@ GPT-2を使用してこのプロセスをデモンストレーションしてみ
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
device = "cuda"
-model_id = "gpt2-large"
+model_id = "openai-community/gpt2-large"
model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
```
diff --git a/docs/source/ja/pipeline_tutorial.md b/docs/source/ja/pipeline_tutorial.md
index 8892a7c4b87..354e2a2be38 100644
--- a/docs/source/ja/pipeline_tutorial.md
+++ b/docs/source/ja/pipeline_tutorial.md
@@ -165,7 +165,7 @@ def data():
yield f"My example {i}"
-pipe = pipeline(model="gpt2", device=0)
+pipe = pipeline(model="openai-community/gpt2", device=0)
generated_characters = 0
for out in pipe(data()):
generated_characters += len(out[0]["generated_text"])
diff --git a/docs/source/ja/pipeline_webserver.md b/docs/source/ja/pipeline_webserver.md
index c7dd3363748..3b35a01490d 100644
--- a/docs/source/ja/pipeline_webserver.md
+++ b/docs/source/ja/pipeline_webserver.md
@@ -36,7 +36,7 @@ async def homepage(request):
async def server_loop(q):
- pipe = pipeline(model="bert-base-uncased")
+ pipe = pipeline(model="google-bert/bert-base-uncased")
while True:
(string, response_q) = await q.get()
out = pipe(string)
diff --git a/docs/source/ja/preprocessing.md b/docs/source/ja/preprocessing.md
index b8fad2a0d21..ea0b98df028 100644
--- a/docs/source/ja/preprocessing.md
+++ b/docs/source/ja/preprocessing.md
@@ -59,7 +59,7 @@ pip install datasets
```python
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
```
次に、テキストをトークナイザに渡します:
diff --git a/docs/source/ja/quicktour.md b/docs/source/ja/quicktour.md
index e16b2272c26..3bec2f827a4 100644
--- a/docs/source/ja/quicktour.md
+++ b/docs/source/ja/quicktour.md
@@ -83,7 +83,7 @@ pip install tensorflow
>>> classifier = pipeline("sentiment-analysis")
```
-[`pipeline`]は、感情分析のためのデフォルトの[事前学習済みモデル](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)とトークナイザをダウンロードしてキャッシュし、使用できるようになります。
+[`pipeline`]は、感情分析のためのデフォルトの[事前学習済みモデル](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english)とトークナイザをダウンロードしてキャッシュし、使用できるようになります。
これで、`classifier`を対象のテキストに使用できます:
```python
@@ -411,7 +411,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```python
>>> from transformers import AutoConfig
->>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
+>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
@@ -452,7 +452,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoModelForSequenceClassification
- >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. [`TrainingArguments`]には、変更できるモデルのハイパーパラメータが含まれており、学習率、バッチサイズ、トレーニングエポック数などが変更できます。指定しない場合、デフォルト値が使用されます:
@@ -474,7 +474,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
4. データセットをロードする:
@@ -547,7 +547,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import TFAutoModelForSequenceClassification
- >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. トークナイザ、画像プロセッサ、特徴量抽出器、またはプロセッサのような前処理クラスをロードします:
@@ -555,7 +555,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
3. データセットをトークナイズするための関数を作成します:
diff --git a/docs/source/ja/run_scripts.md b/docs/source/ja/run_scripts.md
index a7cc89d1348..af99d1c6da9 100644
--- a/docs/source/ja/run_scripts.md
+++ b/docs/source/ja/run_scripts.md
@@ -92,12 +92,12 @@ pip install -r requirements.txt
-この例のスクリプトは、🤗 [Datasets](https://huggingface.co/docs/datasets/) ライブラリからデータセットをダウンロードし、前処理を行います。次に、[Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) を使用して要約をサポートするアーキテクチャ上でデータセットをファインチューニングします。以下の例では、[CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) データセット上で [T5-small](https://huggingface.co/t5-small) をファインチューニングする方法が示されています。T5 モデルは、そのトレーニング方法に起因して追加の `source_prefix` 引数が必要です。このプロンプトにより、T5 はこれが要約タスクであることを知ることができます。
+この例のスクリプトは、🤗 [Datasets](https://huggingface.co/docs/datasets/) ライブラリからデータセットをダウンロードし、前処理を行います。次に、[Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) を使用して要約をサポートするアーキテクチャ上でデータセットをファインチューニングします。以下の例では、[CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) データセット上で [T5-small](https://huggingface.co/google-t5/t5-small) をファインチューニングする方法が示されています。T5 モデルは、そのトレーニング方法に起因して追加の `source_prefix` 引数が必要です。このプロンプトにより、T5 はこれが要約タスクであることを知ることができます。
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -112,12 +112,12 @@ python examples/pytorch/summarization/run_summarization.py \
-この例のスクリプトは、🤗 [Datasets](https://huggingface.co/docs/datasets/) ライブラリからデータセットをダウンロードして前処理します。その後、スクリプトは要約をサポートするアーキテクチャ上で Keras を使用してデータセットをファインチューニングします。以下の例では、[T5-small](https://huggingface.co/t5-small) を [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) データセットでファインチューニングする方法を示しています。T5 モデルは、そのトレーニング方法に起因して追加の `source_prefix` 引数が必要です。このプロンプトは、T5 にこれが要約タスクであることを知らせます。
+この例のスクリプトは、🤗 [Datasets](https://huggingface.co/docs/datasets/) ライブラリからデータセットをダウンロードして前処理します。その後、スクリプトは要約をサポートするアーキテクチャ上で Keras を使用してデータセットをファインチューニングします。以下の例では、[T5-small](https://huggingface.co/google-t5/t5-small) を [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) データセットでファインチューニングする方法を示しています。T5 モデルは、そのトレーニング方法に起因して追加の `source_prefix` 引数が必要です。このプロンプトは、T5 にこれが要約タスクであることを知らせます。
```bash
python examples/tensorflow/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -143,7 +143,7 @@ python examples/tensorflow/summarization/run_summarization.py \
torchrun \
--nproc_per_node 8 pytorch/summarization/run_summarization.py \
--fp16 \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -167,7 +167,7 @@ Tensor Processing Units (TPUs)は、パフォーマンスを加速させるた
```bash
python xla_spawn.py --num_cores 8 \
summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -186,7 +186,7 @@ python xla_spawn.py --num_cores 8 \
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -226,7 +226,7 @@ Now you are ready to launch the training:
```bash
accelerate launch run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
@@ -245,7 +245,7 @@ accelerate launch run_summarization_no_trainer.py \
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--train_file path_to_csv_or_jsonlines_file \
@@ -270,7 +270,7 @@ python examples/pytorch/summarization/run_summarization.py \
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--max_train_samples 50 \
--max_eval_samples 50 \
--max_predict_samples 50 \
@@ -300,7 +300,7 @@ examples/pytorch/summarization/run_summarization.py -h
```bash
python examples/pytorch/summarization/run_summarization.py
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -318,7 +318,7 @@ python examples/pytorch/summarization/run_summarization.py
```bash
python examples/pytorch/summarization/run_summarization.py
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -350,7 +350,7 @@ huggingface-cli login
```bash
python examples/pytorch/summarization/run_summarization.py
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
diff --git a/docs/source/ja/serialization.md b/docs/source/ja/serialization.md
index da23b63e652..3e9d81180de 100644
--- a/docs/source/ja/serialization.md
+++ b/docs/source/ja/serialization.md
@@ -57,10 +57,10 @@ pip install optimum[exporters]
optimum-cli export onnx --help
```
-🤗 Hubからモデルのチェックポイントをエクスポートするには、例えば `distilbert-base-uncased-distilled-squad` を使いたい場合、以下のコマンドを実行してください:
+🤗 Hubからモデルのチェックポイントをエクスポートするには、例えば `distilbert/distilbert-base-uncased-distilled-squad` を使いたい場合、以下のコマンドを実行してください:
```bash
-optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
+optimum-cli export onnx --model distilbert/distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
```
進行状況を示し、結果の `model.onnx` が保存される場所を表示するログは、以下のように表示されるはずです:
@@ -147,7 +147,7 @@ pip install transformers[onnx]
`transformers.onnx`パッケージをPythonモジュールとして使用して、事前に用意された設定を使用してチェックポイントをエクスポートする方法は以下の通りです:
```bash
-python -m transformers.onnx --model=distilbert-base-uncased onnx/
+python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
この方法は、`--model`引数で定義されたチェックポイントのONNXグラフをエクスポートします。🤗 Hubのいずれかのチェックポイントまたはローカルに保存されたチェックポイントを渡すことができます。エクスポートされた`model.onnx`ファイルは、ONNX標準をサポートする多くのアクセラレータで実行できます。例えば、ONNX Runtimeを使用してモデルを読み込んで実行する方法は以下の通りです:
@@ -157,7 +157,7 @@ python -m transformers.onnx --model=distilbert-base-uncased onnx/
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
diff --git a/docs/source/ja/task_summary.md b/docs/source/ja/task_summary.md
index 74c3f143641..0069f6afaf3 100644
--- a/docs/source/ja/task_summary.md
+++ b/docs/source/ja/task_summary.md
@@ -281,7 +281,7 @@ score: 0.9327, start: 30, end: 54, answer: huggingface/transformers
>>> from transformers import pipeline
>>> text = "translate English to French: Hugging Face is a community-based open-source platform for machine learning."
->>> translator = pipeline(task="translation", model="t5-small")
+>>> translator = pipeline(task="translation", model="google-t5/t5-small")
>>> translator(text)
[{'translation_text': "Hugging Face est une tribune communautaire de l'apprentissage des machines."}]
```
diff --git a/docs/source/ja/tasks/language_modeling.md b/docs/source/ja/tasks/language_modeling.md
index b7ad65c6c4a..835a0d54ea4 100644
--- a/docs/source/ja/tasks/language_modeling.md
+++ b/docs/source/ja/tasks/language_modeling.md
@@ -32,7 +32,7 @@ rendered properly in your Markdown viewer.
このガイドでは、次の方法を説明します。
-1. [ELI5](https:/) の [r/askscience](https://www.reddit.com/r/askscience/) サブセットで [DistilGPT2](https://huggingface.co/distilgpt2) を微調整します。 /huggingface.co/datasets/eli5) データセット。
+1. [ELI5](https://huggingface.co/datasets/eli5) データセットの [r/askscience](https://www.reddit.com/r/askscience/) サブセットで [DistilGPT2](https://huggingface.co/distilbert/distilgpt2) を微調整します。
2. 微調整したモデルを推論に使用します。
@@ -112,7 +112,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
```
上の例からわかるように、`text`フィールドは実際には`answers`内にネストされています。つまり、次のことが必要になります。
@@ -234,7 +234,7 @@ Apply the `group_texts` function over the entire dataset:
```py
>>> from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
->>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
この時点で残っている手順は次の 3 つだけです。
@@ -298,7 +298,7 @@ TensorFlow でモデルを微調整するには、オプティマイザー関数
```py
>>> from transformers import TFAutoModelForCausalLM
->>> model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> model = TFAutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`] を使用して、データセットを `tf.data.Dataset` 形式に変換します。
diff --git a/docs/source/ja/tasks/masked_language_modeling.md b/docs/source/ja/tasks/masked_language_modeling.md
index 3cf6db70f2e..b0fff72f9b0 100644
--- a/docs/source/ja/tasks/masked_language_modeling.md
+++ b/docs/source/ja/tasks/masked_language_modeling.md
@@ -26,7 +26,7 @@ rendered properly in your Markdown viewer.
このガイドでは、次の方法を説明します。
-1. [ELI5](https://huggingface.co/distilroberta-base) の [r/askscience](https://www.reddit.com/r/askscience/) サブセットで [DistilRoBERTa](https://huggingface.co/distilroberta-base) を微調整します。 ://huggingface.co/datasets/eli5) データセット。
+1. [ELI5](https://huggingface.co/datasets/eli5) データセットの [r/askscience](https://www.reddit.com/r/askscience/) サブセットで [DistilRoBERTa](https://huggingface.co/distilbert/distilroberta-base) を微調整します。
2. 微調整したモデルを推論に使用します。
@@ -101,7 +101,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilroberta-base")
```
上の例からわかるように、`text`フィールドは実際には`answers`内にネストされています。これは、次のことを行う必要があることを意味します
@@ -219,7 +219,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoModelForMaskedLM
->>> model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
+>>> model = AutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
この時点で残っている手順は次の 3 つだけです。
@@ -287,7 +287,7 @@ TensorFlow でモデルを微調整するには、オプティマイザー関数
```py
>>> from transformers import TFAutoModelForMaskedLM
->>> model = TFAutoModelForMaskedLM.from_pretrained("distilroberta-base")
+>>> model = TFAutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`] を使用して、データセットを `tf.data.Dataset` 形式に変換します。
diff --git a/docs/source/ja/tasks/multiple_choice.md b/docs/source/ja/tasks/multiple_choice.md
index 6b634710550..bfe5f388cb4 100644
--- a/docs/source/ja/tasks/multiple_choice.md
+++ b/docs/source/ja/tasks/multiple_choice.md
@@ -22,7 +22,7 @@ rendered properly in your Markdown viewer.
このガイドでは、次の方法を説明します。
-1. [SWAG](https://huggingface.co/datasets/swag) データセットの「通常」構成で [BERT](https://huggingface.co/bert-base-uncased) を微調整して、最適なデータセットを選択します複数の選択肢と何らかのコンテキストを考慮して回答します。
+1. [SWAG](https://huggingface.co/datasets/swag) データセットの「通常」構成で [BERT](https://huggingface.co/google-bert/bert-base-uncased) を微調整して、複数の選択肢と何らかのコンテキストが与えられたときに最適な回答を選択します。
2. 微調整したモデルを推論に使用します。
@@ -90,7 +90,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
```
作成する前処理関数は次のことを行う必要があります。
@@ -254,7 +254,7 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
```py
>>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer
->>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
この時点で残っている手順は次の 3 つだけです。
@@ -318,7 +318,7 @@ TensorFlow でモデルを微調整するには、オプティマイザー関数
```py
>>> from transformers import TFAutoModelForMultipleChoice
->>> model = TFAutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`] を使用して、データセットを `tf.data.Dataset` 形式に変換します。
diff --git a/docs/source/ja/tasks/prompting.md b/docs/source/ja/tasks/prompting.md
index 1c85bd7a20a..bd66e751ee6 100644
--- a/docs/source/ja/tasks/prompting.md
+++ b/docs/source/ja/tasks/prompting.md
@@ -76,7 +76,7 @@ Falcon、LLaMA などの大規模言語モデルは、事前にトレーニン
>>> torch.manual_seed(0) # doctest: +IGNORE_RESULT
->>> generator = pipeline('text-generation', model = 'gpt2')
+>>> generator = pipeline('text-generation', model = 'openai-community/gpt2')
>>> prompt = "Hello, I'm a language model"
>>> generator(prompt, max_length = 30)
diff --git a/docs/source/ja/tasks/question_answering.md b/docs/source/ja/tasks/question_answering.md
index 9c2ca869ffc..54df687c2f0 100644
--- a/docs/source/ja/tasks/question_answering.md
+++ b/docs/source/ja/tasks/question_answering.md
@@ -27,7 +27,7 @@ rendered properly in your Markdown viewer.
このガイドでは、次の方法を説明します。
-1. 抽出的質問応答用に [SQuAD](https://huggingface.co/datasets/squad) データセット上の [DistilBERT](https://huggingface.co/distilbert-base-uncased) を微調整します。
+1. 抽出的質問応答用に [SQuAD](https://huggingface.co/datasets/squad) データセット上の [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) を微調整します。
2. 微調整したモデルを推論に使用します。
@@ -102,7 +102,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
質問応答タスクに特有の、注意すべき前処理手順がいくつかあります。
@@ -208,7 +208,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
->>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
この時点で残っている手順は次の 3 つだけです。
@@ -276,7 +276,7 @@ TensorFlow でモデルを微調整するには、オプティマイザー関数
```py
>>> from transformers import TFAutoModelForQuestionAnswering
->>> model = TFAutoModelForQuestionAnswering("distilbert-base-uncased")
+>>> model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`] を使用して、データセットを `tf.data.Dataset` 形式に変換します。
diff --git a/docs/source/ja/tasks/summarization.md b/docs/source/ja/tasks/summarization.md
index 47b04888d48..a4b012d712f 100644
--- a/docs/source/ja/tasks/summarization.md
+++ b/docs/source/ja/tasks/summarization.md
@@ -27,7 +27,7 @@ rendered properly in your Markdown viewer.
このガイドでは、次の方法を説明します。
-1. 抽象的な要約のために、[BillSum](https://huggingface.co/datasets/billsum) データセットのカリフォルニア州請求書サブセットで [T5](https://huggingface.co/t5-small) を微調整します。
+1. 抽象的な要約のために、[BillSum](https://huggingface.co/datasets/billsum) データセットのカリフォルニア州請求書サブセットで [T5](https://huggingface.co/google-t5/t5-small) を微調整します。
2. 微調整したモデルを推論に使用します。
@@ -92,7 +92,7 @@ pip install transformers datasets evaluate rouge_score
```py
>>> from transformers import AutoTokenizer
->>> checkpoint = "t5-small"
+>>> checkpoint = "google-t5/t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```
diff --git a/docs/source/ja/tasks/token_classification.md b/docs/source/ja/tasks/token_classification.md
index a4b759d6b5b..2b650c4a844 100644
--- a/docs/source/ja/tasks/token_classification.md
+++ b/docs/source/ja/tasks/token_classification.md
@@ -24,7 +24,7 @@ rendered properly in your Markdown viewer.
このガイドでは、次の方法を説明します。
-1. [WNUT 17](https://huggingface.co/datasets/wnut_17) データセットで [DistilBERT](https://huggingface.co/distilbert-base-uncased) を微調整して、新しいエンティティを検出します。
+1. [WNUT 17](https://huggingface.co/datasets/wnut_17) データセットで [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) を微調整して、新しいエンティティを検出します。
2. 微調整されたモデルを推論に使用します。
@@ -107,7 +107,7 @@ pip install transformers datasets evaluate seqeval
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
上の `tokens`フィールドの例で見たように、入力はすでにトークン化されているようです。しかし、実際には入力はまだトークン化されていないため、単語をサブワードにトークン化するには`is_split_into_words=True` を設定する必要があります。例えば:
@@ -270,7 +270,7 @@ pip install transformers datasets evaluate seqeval
>>> from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
>>> model = AutoModelForTokenClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```
@@ -340,7 +340,7 @@ TensorFlow でモデルを微調整するには、オプティマイザー関数
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`] を使用して、データセットを `tf.data.Dataset` 形式に変換します。
diff --git a/docs/source/ja/tasks/translation.md b/docs/source/ja/tasks/translation.md
index 9004a87fcbf..fb2c89f3856 100644
--- a/docs/source/ja/tasks/translation.md
+++ b/docs/source/ja/tasks/translation.md
@@ -24,7 +24,7 @@ rendered properly in your Markdown viewer.
このガイドでは、次の方法を説明します。
-1. [OPUS Books](https://huggingface.co/datasets/opus_books) データセットの英語-フランス語サブセットの [T5](https://huggingface.co/t5-small) を微調整して、英語のテキストを次の形式に翻訳します。フランス語。
+1. [OPUS Books](https://huggingface.co/datasets/opus_books) データセットの英語-フランス語サブセットで [T5](https://huggingface.co/google-t5/t5-small) を微調整して、英語のテキストをフランス語に翻訳します。
2. 微調整されたモデルを推論に使用します。
@@ -88,7 +88,7 @@ pip install transformers datasets evaluate sacrebleu
```py
>>> from transformers import AutoTokenizer
->>> checkpoint = "t5-small"
+>>> checkpoint = "google-t5/t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```
diff --git a/docs/source/ja/tf_xla.md b/docs/source/ja/tf_xla.md
index d5d83725372..1f5a2af1a5a 100644
--- a/docs/source/ja/tf_xla.md
+++ b/docs/source/ja/tf_xla.md
@@ -88,8 +88,8 @@ from transformers.utils import check_min_version
check_min_version("4.21.0")
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="</s>")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
input_string = ["TensorFlow is"]
# One line to create an XLA generation function
@@ -118,8 +118,8 @@ XLAを有効にした関数(上記の`xla_generate()`など)を初めて実
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="</s>")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
input_string = ["TensorFlow is"]
xla_generate = tf.function(model.generate, jit_compile=True)
@@ -139,8 +139,8 @@ import time
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="</s>")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
xla_generate = tf.function(model.generate, jit_compile=True)
diff --git a/docs/source/ja/tflite.md b/docs/source/ja/tflite.md
index 8ef20a27beb..ad3e9a3f484 100644
--- a/docs/source/ja/tflite.md
+++ b/docs/source/ja/tflite.md
@@ -34,10 +34,10 @@ pip install optimum[exporters-tf]
optimum-cli export tflite --help
```
-🤗 Hubからモデルのチェックポイントをエクスポートするには、例えば `bert-base-uncased` を使用する場合、次のコマンドを実行します:
+🤗 Hubからモデルのチェックポイントをエクスポートするには、例えば `google-bert/bert-base-uncased` を使用する場合、次のコマンドを実行します:
```bash
-optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/
+optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```
進行状況を示すログが表示され、生成された `model.tflite` が保存された場所も表示されるはずです:
diff --git a/docs/source/ja/tokenizer_summary.md b/docs/source/ja/tokenizer_summary.md
index e17201d7972..448ad9c871a 100644
--- a/docs/source/ja/tokenizer_summary.md
+++ b/docs/source/ja/tokenizer_summary.md
@@ -76,7 +76,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import BertTokenizer
->>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> tokenizer.tokenize("I have a new GPU!")
["i", "have", "a", "new", "gp", "##u", "!"]
```
@@ -88,7 +88,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import XLNetTokenizer
->>> tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
+>>> tokenizer = XLNetTokenizer.from_pretrained("xlnet/xlnet-base-cased")
>>> tokenizer.tokenize("Don't you love 🤗 Transformers? We sure do.")
["▁Don", "'", "t", "▁you", "▁love", "▁", "🤗", "▁", "Transform", "ers", "?", "▁We", "▁sure", "▁do", "."]
```
diff --git a/docs/source/ja/torchscript.md b/docs/source/ja/torchscript.md
index 99926a0dae8..27d64a625c8 100644
--- a/docs/source/ja/torchscript.md
+++ b/docs/source/ja/torchscript.md
@@ -71,7 +71,7 @@ TorchScriptで`BertModel`をエクスポートするには、`BertConfig`クラ
from transformers import BertModel, BertTokenizer, BertConfig
import torch
-enc = BertTokenizer.from_pretrained("bert-base-uncased")
+enc = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
@@ -106,7 +106,7 @@ model = BertModel(config)
model.eval()
# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
-model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
+model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)
# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
diff --git a/docs/source/ja/training.md b/docs/source/ja/training.md
index 4e5dbaa77ae..79fbb1b7fb2 100644
--- a/docs/source/ja/training.md
+++ b/docs/source/ja/training.md
@@ -55,7 +55,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> def tokenize_function(examples):
... return tokenizer(examples["text"], padding="max_length", truncation=True)
@@ -91,7 +91,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
@@ -194,7 +194,7 @@ dataset = dataset["train"] # 今のところトレーニング分割のみを
```python
from transformers import AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# トークナイザはBatchEncodingを返しますが、それをKeras用に辞書に変換します
tokenized_data = dict(tokenized_data)
@@ -210,7 +210,7 @@ from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# モデルをロードしてコンパイルする
-model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
+model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
# ファインチューニングには通常、学習率を下げると良いです
model.compile(optimizer=Adam(3e-5)) # 損失関数の指定は不要です!
@@ -332,7 +332,7 @@ torch.cuda.empty_cache()
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
### Optimizer and learning rate scheduler
diff --git a/docs/source/ja/troubleshooting.md b/docs/source/ja/troubleshooting.md
index ece688d46a7..b13b5993171 100644
--- a/docs/source/ja/troubleshooting.md
+++ b/docs/source/ja/troubleshooting.md
@@ -132,7 +132,7 @@ GPUからより良いトレースバックを取得する別のオプション
>>> from transformers import AutoModelForSequenceClassification
>>> import torch
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
>>> model.config.pad_token_id
0
```
@@ -188,8 +188,8 @@ tensor([[ 0.0082, -0.2307],
```py
>>> from transformers import AutoProcessor, AutoModelForQuestionAnswering
->>> processor = AutoProcessor.from_pretrained("gpt2-medium")
->>> model = AutoModelForQuestionAnswering.from_pretrained("gpt2-medium")
+>>> processor = AutoProcessor.from_pretrained("openai-community/gpt2-medium")
+>>> model = AutoModelForQuestionAnswering.from_pretrained("openai-community/gpt2-medium")
ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModelForQuestionAnswering.
Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, ...
```
diff --git a/docs/source/ko/add_tensorflow_model.md b/docs/source/ko/add_tensorflow_model.md
index 378f2163b5d..22980b1320c 100644
--- a/docs/source/ko/add_tensorflow_model.md
+++ b/docs/source/ko/add_tensorflow_model.md
@@ -33,7 +33,7 @@ rendered properly in your Markdown viewer.
사용하려는 모델이 이미 해당하는 TensorFlow 아키텍처가 있는지 확실하지 않나요?
-선택한 모델([예](https://huggingface.co/bert-base-uncased/blob/main/config.json#L14))의 `config.json`의 `model_type` 필드를 확인해보세요. 🤗 Transformers의 해당 모델 폴더에는 "modeling_tf"로 시작하는 파일이 있는 경우, 해당 모델에는 해당 TensorFlow 아키텍처([예](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert))가 있다는 의미입니다.
+선택한 모델([예](https://huggingface.co/google-bert/bert-base-uncased/blob/main/config.json#L14))의 `config.json`의 `model_type` 필드를 확인해보세요. 🤗 Transformers의 해당 모델 폴더에는 "modeling_tf"로 시작하는 파일이 있는 경우, 해당 모델에는 해당 TensorFlow 아키텍처([예](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert))가 있다는 의미입니다.
diff --git a/docs/source/ko/autoclass_tutorial.md b/docs/source/ko/autoclass_tutorial.md
index 9ecfd9c2015..e41a2acc7b4 100644
--- a/docs/source/ko/autoclass_tutorial.md
+++ b/docs/source/ko/autoclass_tutorial.md
@@ -21,7 +21,7 @@ rendered properly in your Markdown viewer.
-아키텍처는 모델의 골격을 의미하며 체크포인트는 주어진 아키텍처에 대한 가중치입니다. 예를 들어, [BERT](https://huggingface.co/bert-base-uncased)는 아키텍처이고, `bert-base-uncased`는 체크포인트입니다. 모델은 아키텍처 또는 체크포인트를 의미할 수 있는 일반적인 용어입니다.
+아키텍처는 모델의 골격을 의미하며 체크포인트는 주어진 아키텍처에 대한 가중치입니다. 예를 들어, [BERT](https://huggingface.co/google-bert/bert-base-uncased)는 아키텍처이고, `google-bert/bert-base-uncased`는 체크포인트입니다. 모델은 아키텍처 또는 체크포인트를 의미할 수 있는 일반적인 용어입니다.
@@ -41,7 +41,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
```
그리고 아래와 같이 입력을 토큰화합니다:
@@ -100,7 +100,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
동일한 체크포인트를 쉽게 재사용하여 다른 작업에 아키텍처를 로드할 수 있습니다:
@@ -108,7 +108,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoModelForTokenClassification
->>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
+>>> model = AutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -128,7 +128,7 @@ PyTorch모델의 경우 `from_pretrained()` 메서드는 내부적으로 피클
```py
>>> from transformers import TFAutoModelForSequenceClassification
->>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
쉽게 동일한 체크포인트를 재사용하여 다른 작업에 아키텍처를 로드할 수 있습니다:
@@ -136,7 +136,7 @@ PyTorch모델의 경우 `from_pretrained()` 메서드는 내부적으로 피클
```py
>>> from transformers import TFAutoModelForTokenClassification
->>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
+>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
일반적으로, `AutoTokenizer`클래스와 `TFAutoModelFor` 클래스를 사용하여 미리 학습된 모델 인스턴스를 로드하는 것이 좋습니다. 이렇게 하면 매번 올바른 아키텍처를 로드할 수 있습니다. 다음 [튜토리얼](preprocessing)에서는 새롭게 로드한 토크나이저, 이미지 프로세서, 특징 추출기를 사용하여 미세 튜닝용 데이터 세트를 전처리하는 방법에 대해 알아봅니다.
diff --git a/docs/source/ko/big_models.md b/docs/source/ko/big_models.md
index 17b3d8db61e..3180b51117a 100644
--- a/docs/source/ko/big_models.md
+++ b/docs/source/ko/big_models.md
@@ -41,7 +41,7 @@ rendered properly in your Markdown viewer.
```py
from transformers import AutoModel
-model = AutoModel.from_pretrained("bert-base-cased")
+model = AutoModel.from_pretrained("google-bert/bert-base-cased")
```
[`~PreTrainedModel.save_pretrained`]을 사용하여 모델을 저장하면, 모델의 구성과 가중치가 들어있는 두 개의 파일이 있는 새 폴더가 생성됩니다:
diff --git a/docs/source/ko/community.md b/docs/source/ko/community.md
index 2d12e9de4a2..d50168d7548 100644
--- a/docs/source/ko/community.md
+++ b/docs/source/ko/community.md
@@ -43,8 +43,8 @@ rendered properly in your Markdown viewer.
|[감정 분석을 위해 Roberta 미세 조정하기](https://github.com/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb) | 감정 분석을 위해 Roberta 모델을 미세 조정하는 방법 | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb)|
|[질문 생성 모델 평가하기](https://github.com/flexudy-pipe/qugeev) | seq2seq 트랜스포머 모델이 생성한 질문과 이에 대한 답변이 얼마나 정확한가요? | [Pascal Zoleko](https://github.com/zolekode) | [](https://colab.research.google.com/drive/1bpsSqCQU-iw_5nNoRm_crPq6FRuJthq_?usp=sharing)|
|[DistilBERT와 Tensorflow로 텍스트 분류하기](https://github.com/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb) | 텍스트 분류를 위해 TensorFlow로 DistilBERT를 미세 조정하는 방법 | [Peter Bayerle](https://github.com/peterbayerle) | [](https://colab.research.google.com/github/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb)|
-|[CNN/Dailail 요약을 위해 인코더-디코더 모델에 BERT 활용하기](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | CNN/Dailail 요약을 위해 *bert-base-uncased* 체크포인트를 활용하여 *EncoderDecoderModel*을 워밍업하는 방법 | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)|
-|[BBC XSum 요약을 위해 인코더-디코더 모델에 RoBERTa 활용하기](https://github.com/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb) | BBC/XSum 요약을 위해 *roberta-base* 체크포인트를 활용하여 공유 *EncoderDecoderModel*을 워밍업하는 방법 | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb)|
+|[CNN/Dailymail 요약을 위해 인코더-디코더 모델에 BERT 활용하기](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | CNN/Dailymail 요약을 위해 *google-bert/bert-base-uncased* 체크포인트를 활용하여 *EncoderDecoderModel*을 워밍업하는 방법 | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)|
+|[BBC XSum 요약을 위해 인코더-디코더 모델에 RoBERTa 활용하기](https://github.com/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb) | BBC/XSum 요약을 위해 *FacebookAI/roberta-base* 체크포인트를 활용하여 공유 *EncoderDecoderModel*을 워밍업하는 방법 | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb)|
|[순차적 질문 답변(SQA)을 위해 TAPAS 미세 조정하기](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb) | *tapas-base* 체크포인트를 활용하여 순차적 질문 답변(SQA) 데이터 세트로 *TapasForQuestionAnswering*을 미세 조정하는 방법 | [Niels Rogge](https://github.com/nielsrogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb)|
|[표 사실 검사(TabFact)로 TAPAS 평가하기](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb) | 🤗 Datasets와 🤗 Transformer 라이브러리를 함께 사용하여 *tapas-base-finetuned-tabfact* 체크포인트로 미세 조정된 *TapasForSequenceClassification*을 평가하는 방법 | [Niels Rogge](https://github.com/nielsrogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb)|
|[번역을 위해 mBART 미세 조정하기](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb) | 힌디어에서 영어로 번역하기 위해 Seq2SeqTrainer를 사용하여 mBART를 미세 조정하는 방법 | [Vasudev Gupta](https://github.com/vasudevgupta7) | [](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb)|
diff --git a/docs/source/ko/create_a_model.md b/docs/source/ko/create_a_model.md
index 62a118563f1..b911669bb17 100644
--- a/docs/source/ko/create_a_model.md
+++ b/docs/source/ko/create_a_model.md
@@ -87,7 +87,7 @@ DistilBertConfig {
사전 학습된 모델 속성은 [`~PretrainedConfig.from_pretrained`] 함수에서 수정할 수 있습니다:
```py
->>> my_config = DistilBertConfig.from_pretrained("distilbert-base-uncased", activation="relu", attention_dropout=0.4)
+>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```
모델 구성이 만족스러우면 [`~PretrainedConfig.save_pretrained`]로 저장할 수 있습니다. 설정 파일은 지정된 작업 경로에 JSON 파일로 저장됩니다:
@@ -128,13 +128,13 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
사전 학습된 모델을 [`~PreTrainedModel.from_pretrained`]로 생성합니다:
```py
->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
🤗 Transformers에서 제공한 모델의 사전 학습된 가중치를 사용하는 경우 기본 모델 configuration을 자동으로 불러옵니다. 그러나 원하는 경우 기본 모델 configuration 속성의 일부 또는 전부를 사용자 지정으로 바꿀 수 있습니다:
```py
->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
+>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
@@ -152,13 +152,13 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
사전 학습된 모델을 [`~TFPreTrainedModel.from_pretrained`]로 생성합니다:
```py
->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
🤗 Transformers에서 제공한 모델의 사전 학습된 가중치를 사용하는 경우 기본 모델 configuration을 자동으로 불러옵니다. 그러나 원하는 경우 기본 모델 configuration 속성의 일부 또는 전부를 사용자 지정으로 바꿀 수 있습니다:
```py
->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
+>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
@@ -174,7 +174,7 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
```py
>>> from transformers import DistilBertForSequenceClassification
->>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
다른 모델 헤드로 전환하여 이 체크포인트를 다른 작업에 쉽게 재사용할 수 있습니다. 질의응답 작업의 경우, [`DistilBertForQuestionAnswering`] 모델 헤드를 사용할 수 있습니다. 질의응답 헤드는 숨겨진 상태 출력 위에 선형 레이어가 있다는 점을 제외하면 시퀀스 분류 헤드와 유사합니다.
@@ -182,7 +182,7 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
```py
>>> from transformers import DistilBertForQuestionAnswering
->>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -191,7 +191,7 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
```py
>>> from transformers import TFDistilBertForSequenceClassification
->>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
다른 모델 헤드로 전환하여 이 체크포인트를 다른 작업에 쉽게 재사용할 수 있습니다. 질의응답 작업의 경우, [`TFDistilBertForQuestionAnswering`] 모델 헤드를 사용할 수 있습니다. 질의응답 헤드는 숨겨진 상태 출력 위에 선형 레이어가 있다는 점을 제외하면 시퀀스 분류 헤드와 유사합니다.
@@ -199,7 +199,7 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
```py
>>> from transformers import TFDistilBertForQuestionAnswering
->>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -231,7 +231,7 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
```py
>>> from transformers import DistilBertTokenizer
->>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
+>>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
[`DistilBertTokenizerFast`] 클래스로 빠른 토크나이저를 생성합니다:
@@ -239,7 +239,7 @@ configuration 파일을 딕셔너리로 저장하거나 사용자 정의 configu
```py
>>> from transformers import DistilBertTokenizerFast
->>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
+>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")
```
diff --git a/docs/source/ko/custom_tools.md b/docs/source/ko/custom_tools.md
index 6e07ccf86c5..853d69187f6 100644
--- a/docs/source/ko/custom_tools.md
+++ b/docs/source/ko/custom_tools.md
@@ -548,7 +548,7 @@ task = "text-classification"
model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
print(model.id)
```
-`text-classification`(텍스트 분류) 작업의 경우 `'facebook/bart-large-mnli'`를 반환하고, `translation`(번역) 작업의 경우 `'t5-base'`를 반환합니다.
+`text-classification`(텍스트 분류) 작업의 경우 `'facebook/bart-large-mnli'`를 반환하고, `translation`(번역) 작업의 경우 `'google-t5/t5-base'`를 반환합니다.
이를 에이전트가 활용할 수 있는 도구로 변환하려면 어떻게 해야 할까요?
모든 도구는 필요한 주요 속성을 보유하는 슈퍼클래스 `Tool`에 의존합니다. 이를 상속하는 클래스를 만들어 보겠습니다:
diff --git a/docs/source/ko/installation.md b/docs/source/ko/installation.md
index f7995aa487d..062184e5b3b 100644
--- a/docs/source/ko/installation.md
+++ b/docs/source/ko/installation.md
@@ -168,14 +168,14 @@ conda install conda-forge::transformers
예를 들어 외부 기기 사이에 방화벽을 둔 일반 네트워크에서 평소처럼 프로그램을 다음과 같이 실행할 수 있습니다.
```bash
-python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
오프라인 기기에서 동일한 프로그램을 다음과 같이 실행할 수 있습니다.
```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
-python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
이제 스크립트는 로컬 파일에 한해서만 검색할 것이므로, 스크립트가 중단되거나 시간이 초과될 때까지 멈춰있지 않고 잘 실행될 것입니다.
diff --git a/docs/source/ko/model_memory_anatomy.md b/docs/source/ko/model_memory_anatomy.md
index 351cbebe028..5701e19aaa0 100644
--- a/docs/source/ko/model_memory_anatomy.md
+++ b/docs/source/ko/model_memory_anatomy.md
@@ -85,14 +85,14 @@ GPU memory occupied: 1343 MB.
## 모델 로드 [[load-model]]
-우선, `bert-large-uncased` 모델을 로드합니다. 모델의 가중치를 직접 GPU에 로드해서 가중치만이 얼마나 많은 공간을 차지하는지 확인할 수 있습니다.
+우선, `google-bert/bert-large-uncased` 모델을 로드합니다. 모델의 가중치를 직접 GPU에 로드해서 가중치만이 얼마나 많은 공간을 차지하는지 확인할 수 있습니다.
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased").to("cuda")
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-large-uncased").to("cuda")
>>> print_gpu_utilization()
GPU memory occupied: 2631 MB.
```
diff --git a/docs/source/ko/model_sharing.md b/docs/source/ko/model_sharing.md
index ed6836e8de5..868cc3b231d 100644
--- a/docs/source/ko/model_sharing.md
+++ b/docs/source/ko/model_sharing.md
@@ -229,4 +229,4 @@ Flax에서 모델을 사용하는 경우, PyTorch에서 Flax로 체크포인트
* `README.md` 파일을 수동으로 생성하여 업로드합니다.
* 모델 저장소에서 **Edit model card** 버튼을 클릭합니다.
-모델 카드에 포함할 정보 유형에 대한 좋은 예는 DistilBert [모델 카드](https://huggingface.co/distilbert-base-uncased)를 참조하세요. 모델의 탄소 발자국이나 위젯 예시 등 `README.md` 파일에서 제어할 수 있는 다른 옵션에 대한 자세한 내용은 [여기](https://huggingface.co/docs/hub/models-cards) 문서를 참조하세요.
+모델 카드에 포함할 정보 유형에 대한 좋은 예는 DistilBert [모델 카드](https://huggingface.co/distilbert/distilbert-base-uncased)를 참조하세요. 모델의 탄소 발자국이나 위젯 예시 등 `README.md` 파일에서 제어할 수 있는 다른 옵션에 대한 자세한 내용은 [여기](https://huggingface.co/docs/hub/models-cards) 문서를 참조하세요.
diff --git a/docs/source/ko/multilingual.md b/docs/source/ko/multilingual.md
index 2862bd98388..c0eee024358 100644
--- a/docs/source/ko/multilingual.md
+++ b/docs/source/ko/multilingual.md
@@ -21,7 +21,7 @@ rendered properly in your Markdown viewer.
🤗 Transformers에는 여러 종류의 다국어(multilingual) 모델이 있으며, 단일 언어(monolingual) 모델과 추론 시 사용법이 다릅니다.
그렇다고 해서 *모든* 다국어 모델의 사용법이 다른 것은 아닙니다.
-[bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased)와 같은 몇몇 모델은 단일 언어 모델처럼 사용할 수 있습니다.
+[google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)와 같은 몇몇 모델은 단일 언어 모델처럼 사용할 수 있습니다.
이번 가이드에서 다국어 모델의 추론 시 사용 방법을 알아볼 것입니다.
## XLM[[xlm]]
@@ -33,25 +33,25 @@ XLM에는 10가지 체크포인트(checkpoint)가 있는데, 이 중 하나만
다음 XLM 모델은 추론 시에 언어 임베딩을 사용합니다:
-- `xlm-mlm-ende-1024` (마스킹된 언어 모델링, 영어-독일어)
-- `xlm-mlm-enfr-1024` (마스킹된 언어 모델링, 영어-프랑스어)
-- `xlm-mlm-enro-1024` (마스킹된 언어 모델링, 영어-루마니아어)
-- `xlm-mlm-xnli15-1024` (마스킹된 언어 모델링, XNLI 데이터 세트에서 제공하는 15개 국어)
-- `xlm-mlm-tlm-xnli15-1024` (마스킹된 언어 모델링 + 번역, XNLI 데이터 세트에서 제공하는 15개 국어)
-- `xlm-clm-enfr-1024` (Causal language modeling, 영어-프랑스어)
-- `xlm-clm-ende-1024` (Causal language modeling, 영어-독일어)
+- `FacebookAI/xlm-mlm-ende-1024` (마스킹된 언어 모델링, 영어-독일어)
+- `FacebookAI/xlm-mlm-enfr-1024` (마스킹된 언어 모델링, 영어-프랑스어)
+- `FacebookAI/xlm-mlm-enro-1024` (마스킹된 언어 모델링, 영어-루마니아어)
+- `FacebookAI/xlm-mlm-xnli15-1024` (마스킹된 언어 모델링, XNLI 데이터 세트에서 제공하는 15개 국어)
+- `FacebookAI/xlm-mlm-tlm-xnli15-1024` (마스킹된 언어 모델링 + 번역, XNLI 데이터 세트에서 제공하는 15개 국어)
+- `FacebookAI/xlm-clm-enfr-1024` (Causal language modeling, 영어-프랑스어)
+- `FacebookAI/xlm-clm-ende-1024` (Causal language modeling, 영어-독일어)
언어 임베딩은 모델에 전달된 `input_ids`와 동일한 shape의 텐서로 표현됩니다.
이러한 텐서의 값은 사용된 언어에 따라 다르며 토크나이저의 `lang2id` 및 `id2lang` 속성에 의해 식별됩니다.
-다음 예제에서는 `xlm-clm-enfr-1024` 체크포인트(코잘 언어 모델링(causal language modeling), 영어-프랑스어)를 가져옵니다:
+다음 예제에서는 `FacebookAI/xlm-clm-enfr-1024` 체크포인트(코잘 언어 모델링(causal language modeling), 영어-프랑스어)를 가져옵니다:
```py
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel
->>> tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
->>> model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")
+>>> tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
+>>> model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
```
토크나이저의 `lang2id` 속성은 모델의 언어와 해당 ID를 표시합니다:
@@ -91,8 +91,8 @@ XLM에는 10가지 체크포인트(checkpoint)가 있는데, 이 중 하나만
다음 XLM 모델은 추론 시에 언어 임베딩이 필요하지 않습니다:
-- `xlm-mlm-17-1280` (마스킹된 언어 모델링, 17개 국어)
-- `xlm-mlm-100-1280` (마스킹된 언어 모델링, 100개 국어)
+- `FacebookAI/xlm-mlm-17-1280` (마스킹된 언어 모델링, 17개 국어)
+- `FacebookAI/xlm-mlm-100-1280` (마스킹된 언어 모델링, 100개 국어)
이전의 XLM 체크포인트와 달리 이 모델은 일반 문장 표현에 사용됩니다.
@@ -100,8 +100,8 @@ XLM에는 10가지 체크포인트(checkpoint)가 있는데, 이 중 하나만
다음 BERT 모델은 다국어 태스크에 사용할 수 있습니다:
-- `bert-base-multilingual-uncased` (마스킹된 언어 모델링 + 다음 문장 예측, 102개 국어)
-- `bert-base-multilingual-cased` (마스킹된 언어 모델링 + 다음 문장 예측, 104개 국어)
+- `google-bert/bert-base-multilingual-uncased` (마스킹된 언어 모델링 + 다음 문장 예측, 102개 국어)
+- `google-bert/bert-base-multilingual-cased` (마스킹된 언어 모델링 + 다음 문장 예측, 104개 국어)
이러한 모델은 추론 시에 언어 임베딩이 필요하지 않습니다.
문맥에서 언어를 식별하고, 식별된 언어로 추론합니다.
@@ -110,8 +110,8 @@ XLM에는 10가지 체크포인트(checkpoint)가 있는데, 이 중 하나만
-다음 XLM-RoBERTa 또한 다국어 다국어 태스크에 사용할 수 있습니다:
+다음 XLM-RoBERTa 또한 다국어 태스크에 사용할 수 있습니다:
-- `xlm-roberta-base` (마스킹된 언어 모델링, 100개 국어)
-- `xlm-roberta-large` (마스킹된 언어 모델링, 100개 국어)
+- `FacebookAI/xlm-roberta-base` (마스킹된 언어 모델링, 100개 국어)
+- `FacebookAI/xlm-roberta-large` (마스킹된 언어 모델링, 100개 국어)
XLM-RoBERTa는 100개 국어에 대해 새로 생성되고 정제된 2.5TB 규모의 CommonCrawl 데이터로 학습되었습니다.
이전에 공개된 mBERT나 XLM과 같은 다국어 모델에 비해 분류, 시퀀스 라벨링, 질의 응답과 같은 다운스트림(downstream) 작업에서 이점이 있습니다.
diff --git a/docs/source/ko/perf_hardware.md b/docs/source/ko/perf_hardware.md
index dedb9a60ed1..01282a0c711 100644
--- a/docs/source/ko/perf_hardware.md
+++ b/docs/source/ko/perf_hardware.md
@@ -117,7 +117,7 @@ GPU1 PHB X 0-11 N/A
따라서 `nvidia-smi topo -m`의 결과에서 `NVX`의 값이 높을수록 더 좋습니다. 세대는 GPU 아키텍처에 따라 다를 수 있습니다.
-그렇다면, gpt2를 작은 wikitext 샘플로 학습시키는 예제를 통해, NVLink가 훈련에 어떤 영향을 미치는지 살펴보겠습니다.
+그렇다면, openai-community/gpt2를 작은 wikitext 샘플로 학습시키는 예제를 통해, NVLink가 훈련에 어떤 영향을 미치는지 살펴보겠습니다.
결과는 다음과 같습니다:
@@ -136,7 +136,7 @@ NVLink 사용 시 훈련이 약 23% 더 빠르게 완료됨을 확인할 수 있
# DDP w/ NVLink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
---nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
+--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
@@ -145,7 +145,7 @@ rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
# DDP w/o NVLink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 torchrun \
---nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
+--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
---dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train
+--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
diff --git a/docs/source/ko/perf_train_cpu.md b/docs/source/ko/perf_train_cpu.md
index f0398aaa262..1a6c58b25af 100644
--- a/docs/source/ko/perf_train_cpu.md
+++ b/docs/source/ko/perf_train_cpu.md
@@ -49,7 +49,7 @@ Trainer에서 IPEX의 자동 혼합 정밀도를 활성화하려면 사용자는
- CPU에서 BF16 자동 혼합 정밀도를 사용하여 IPEX로 훈련하기:
python run_qa.py \
---model_name_or_path bert-base-uncased \
+--model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
diff --git a/docs/source/ko/perf_train_cpu_many.md b/docs/source/ko/perf_train_cpu_many.md
index 9ff4cfbfa6e..e7a68971a7d 100644
--- a/docs/source/ko/perf_train_cpu_many.md
+++ b/docs/source/ko/perf_train_cpu_many.md
@@ -88,7 +88,7 @@ Trainer에서 ccl 백엔드를 사용하여 멀티 CPU 분산 훈련을 활성
export MASTER_ADDR=127.0.0.1
mpirun -n 2 -genv OMP_NUM_THREADS=23 \
python3 run_qa.py \
- --model_name_or_path bert-large-uncased \
+ --model_name_or_path google-bert/bert-large-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -117,7 +117,7 @@ Trainer에서 ccl 백엔드를 사용하여 멀티 CPU 분산 훈련을 활성
mpirun -f hostfile -n 4 -ppn 2 \
-genv OMP_NUM_THREADS=23 \
python3 run_qa.py \
- --model_name_or_path bert-large-uncased \
+ --model_name_or_path google-bert/bert-large-uncased \
--dataset_name squad \
--do_train \
--do_eval \
diff --git a/docs/source/ko/perf_train_gpu_many.md b/docs/source/ko/perf_train_gpu_many.md
index 1fc6ce8e1cc..c2a80505ef7 100644
--- a/docs/source/ko/perf_train_gpu_many.md
+++ b/docs/source/ko/perf_train_gpu_many.md
@@ -138,7 +138,7 @@ DP와 DDP 사이에는 다른 차이점이 있지만, 이 토론과는 관련이
# DP
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
python examples/pytorch/language-modeling/run_clm.py \
---model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
+--model_name_or_path openai-community/gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{'train_runtime': 110.5948, 'train_samples_per_second': 1.808, 'epoch': 0.69}
@@ -146,7 +146,7 @@ python examples/pytorch/language-modeling/run_clm.py \
# DDP w/ NVlink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
---model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
+--model_name_or_path openai-community/gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{'train_runtime': 101.9003, 'train_samples_per_second': 1.963, 'epoch': 0.69}
@@ -154,7 +154,7 @@ torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
# DDP w/o NVlink
rm -r /tmp/test-clm; NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
---model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
+--model_name_or_path openai-community/gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
{'train_runtime': 131.4367, 'train_samples_per_second': 1.522, 'epoch': 0.69}
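The three `train_runtime` values in the hunks above can be compared directly. The snippet below is a small sketch in plain Python and is not part of the documented scripts; the dictionary keys and the `relative_to` helper are names chosen here for illustration, while the numbers are copied from the logged output.

```python
# Runtimes in seconds, copied from the 'train_runtime' fields shown above.
train_runtime = {
    "DP": 110.5948,
    "DDP w/ NVLink": 101.9003,
    "DDP w/o NVLink": 131.4367,
}


def relative_to(baseline: str, runtimes: dict[str, float]) -> dict[str, float]:
    """Divide every runtime by the baseline runtime (hypothetical helper)."""
    base = runtimes[baseline]
    return {name: seconds / base for name, seconds in runtimes.items()}


ratios = relative_to("DDP w/ NVLink", train_runtime)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the DDP w/ NVLink runtime")
```

With these numbers, DP takes about 1.09x and DDP without NVLink about 1.29x the runtime of DDP with NVLink.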
diff --git a/docs/source/ko/perplexity.md b/docs/source/ko/perplexity.md
index 72eee0643c3..9de84a5f289 100644
--- a/docs/source/ko/perplexity.md
+++ b/docs/source/ko/perplexity.md
@@ -72,7 +72,7 @@ $$\text{PPL}(X) = \exp \left\{ {-\frac{1}{t}\sum_i^t \log p_\theta (x_i|x_{<i}) } \right\}$$
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
```
그 다음으로 텍스트를 토크나이저에 넣어주세요:
diff --git a/docs/source/ko/quicktour.md b/docs/source/ko/quicktour.md
index a456c4e0017..c92279fa916 100644
--- a/docs/source/ko/quicktour.md
+++ b/docs/source/ko/quicktour.md
@@ -81,7 +81,7 @@ pip install tensorflow
>>> classifier = pipeline("sentiment-analysis")
```
-[`pipeline`]은 감정 분석을 위한 [사전 훈련된 모델](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)과 토크나이저를 자동으로 다운로드하고 캐시합니다. 이제 `classifier`를 대상 텍스트에 사용할 수 있습니다:
+[`pipeline`]은 감정 분석을 위한 [사전 훈련된 모델](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english)과 토크나이저를 자동으로 다운로드하고 캐시합니다. 이제 `classifier`를 대상 텍스트에 사용할 수 있습니다:
```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
@@ -385,7 +385,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoConfig
->>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
+>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
@@ -422,7 +422,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoModelForSequenceClassification
- >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. [`TrainingArguments`]는 학습률, 배치 크기, 훈련할 에포크 수와 같은 모델 하이퍼파라미터를 포함합니다. 훈련 인자를 지정하지 않으면 기본값이 사용됩니다:
@@ -444,7 +444,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
4. 데이터셋을 로드하세요:
@@ -516,7 +516,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import TFAutoModelForSequenceClassification
- >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. 토크나이저, 이미지 프로세서, 특징 추출기(feature extractor) 또는 프로세서와 같은 전처리 클래스를 로드하세요:
@@ -524,7 +524,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
3. 데이터셋을 토큰화하는 함수를 생성하세요:
diff --git a/docs/source/ko/run_scripts.md b/docs/source/ko/run_scripts.md
index f88e8e8252f..715a949dde4 100644
--- a/docs/source/ko/run_scripts.md
+++ b/docs/source/ko/run_scripts.md
@@ -94,12 +94,12 @@ pip install -r requirements.txt
예제 스크립트는 🤗 [Datasets](https://huggingface.co/docs/datasets/) 라이브러리에서 데이터 세트를 다운로드하고 전처리합니다.
그런 다음 스크립트는 요약 기능을 지원하는 아키텍처에서 [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer)를 사용하여 데이터 세트를 미세 조정합니다.
-다음 예는 [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) 데이터 세트에서 [T5-small](https://huggingface.co/t5-small)을 미세 조정합니다.
+다음 예는 [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) 데이터 세트에서 [T5-small](https://huggingface.co/google-t5/t5-small)을 미세 조정합니다.
T5 모델은 훈련 방식에 따라 추가 `source_prefix` 인수가 필요하며, 이 프롬프트는 요약 작업임을 T5에 알려줍니다.
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -115,11 +115,11 @@ python examples/pytorch/summarization/run_summarization.py \
예제 스크립트는 🤗 [Datasets](https://huggingface.co/docs/datasets/) 라이브러리에서 데이터 세트를 다운로드하고 전처리합니다.
그런 다음 스크립트는 요약 기능을 지원하는 아키텍처에서 Keras를 사용하여 데이터 세트를 미세 조정합니다.
-다음 예는 [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) 데이터 세트에서 [T5-small](https://huggingface.co/t5-small)을 미세 조정합니다.
+다음 예는 [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) 데이터 세트에서 [T5-small](https://huggingface.co/google-t5/t5-small)을 미세 조정합니다.
T5 모델은 훈련 방식에 따라 추가 `source_prefix` 인수가 필요하며, 이 프롬프트는 요약 작업임을 T5에 알려줍니다.
```bash
python examples/tensorflow/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -144,7 +144,7 @@ python examples/tensorflow/summarization/run_summarization.py \
torchrun \
--nproc_per_node 8 pytorch/summarization/run_summarization.py \
--fp16 \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -171,7 +171,7 @@ TPU를 사용하려면 `xla_spawn.py` 스크립트를 실행하고 `num_cores`
```bash
python xla_spawn.py --num_cores 8 \
summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -192,7 +192,7 @@ TPU를 사용하려면 TPU 리소스의 이름을 `tpu` 인수에 전달합니
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -232,7 +232,7 @@ accelerate test
```bash
accelerate launch run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
@@ -252,7 +252,7 @@ accelerate launch run_summarization_no_trainer.py \
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--train_file path_to_csv_or_jsonlines_file \
@@ -278,7 +278,7 @@ python examples/pytorch/summarization/run_summarization.py \
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--max_train_samples 50 \
--max_eval_samples 50 \
--max_predict_samples 50 \
@@ -311,7 +311,7 @@ examples/pytorch/summarization/run_summarization.py -h
이 경우 `overwrite_output_dir`을 제거해야 합니다:
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -328,7 +328,7 @@ python examples/pytorch/summarization/run_summarization.py
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -359,7 +359,7 @@ huggingface-cli login
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
diff --git a/docs/source/ko/serialization.md b/docs/source/ko/serialization.md
index 0cbcf005e3a..2e521e2b7b4 100644
--- a/docs/source/ko/serialization.md
+++ b/docs/source/ko/serialization.md
@@ -56,10 +56,10 @@ pip install optimum[exporters]
optimum-cli export onnx --help
```
-예를 들어, 🤗 Hub에서 `distilbert-base-uncased-distilled-squad`와 같은 모델의 체크포인트를 내보내려면 다음 명령을 실행하세요:
+예를 들어, 🤗 Hub에서 `distilbert/distilbert-base-uncased-distilled-squad`와 같은 모델의 체크포인트를 내보내려면 다음 명령을 실행하세요:
```bash
-optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
+optimum-cli export onnx --model distilbert/distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
```
위와 같이 진행 상황을 나타내는 로그가 표시되고 결과인 `model.onnx`가 저장된 위치가 표시됩니다.
@@ -141,7 +141,7 @@ pip install transformers[onnx]
`transformers.onnx` 패키지를 Python 모듈로 사용하여 준비된 구성을 사용하여 체크포인트를 내보냅니다:
```bash
-python -m transformers.onnx --model=distilbert-base-uncased onnx/
+python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
이렇게 하면 `--model` 인수에 정의된 체크포인트의 ONNX 그래프가 내보내집니다. 🤗 Hub에서 제공하는 체크포인트나 로컬에 저장된 체크포인트를 전달할 수 있습니다. 결과로 생성된 `model.onnx` 파일은 ONNX 표준을 지원하는 많은 가속기 중 하나에서 실행할 수 있습니다. 예를 들어, 다음과 같이 ONNX Runtime을 사용하여 모델을 로드하고 실행할 수 있습니다:
@@ -150,7 +150,7 @@ python -m transformers.onnx --model=distilbert-base-uncased onnx/
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
diff --git a/docs/source/ko/task_summary.md b/docs/source/ko/task_summary.md
index dbebf38760a..a0e60c60924 100644
--- a/docs/source/ko/task_summary.md
+++ b/docs/source/ko/task_summary.md
@@ -296,7 +296,7 @@ score: 0.9327, start: 30, end: 54, answer: huggingface/transformers
>>> from transformers import pipeline
>>> text = "translate English to French: Hugging Face is a community-based open-source platform for machine learning."
->>> translator = pipeline(task="translation", model="t5-small")
+>>> translator = pipeline(task="translation", model="google-t5/t5-small")
>>> translator(text)
[{'translation_text': "Hugging Face est une tribune communautaire de l'apprentissage des machines."}]
```
diff --git a/docs/source/ko/tasks/language_modeling.md b/docs/source/ko/tasks/language_modeling.md
index bf10660c61c..ee1d11c1d09 100644
--- a/docs/source/ko/tasks/language_modeling.md
+++ b/docs/source/ko/tasks/language_modeling.md
@@ -29,7 +29,7 @@ rendered properly in your Markdown viewer.
이 가이드에서는 다음 작업을 수행하는 방법을 안내합니다:
-1. [DistilGPT2](https://huggingface.co/distilgpt2) 모델을 [ELI5](https://huggingface.co/datasets/eli5) 데이터 세트의 [r/askscience](https://www.reddit.com/r/askscience/) 하위 집합으로 미세 조정
+1. [DistilGPT2](https://huggingface.co/distilbert/distilgpt2) 모델을 [ELI5](https://huggingface.co/datasets/eli5) 데이터 세트의 [r/askscience](https://www.reddit.com/r/askscience/) 하위 집합으로 미세 조정
2. 미세 조정된 모델을 추론에 사용
@@ -104,7 +104,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
```
위의 예제에서 알 수 있듯이, `text` 필드는 `answers` 아래에 중첩되어 있습니다. 따라서 [`flatten`](https://huggingface.co/docs/datasets/process#flatten) 메소드를 사용하여 중첩 구조에서 `text` 하위 필드를 추출해야 합니다.
@@ -221,7 +221,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
->>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
여기까지 진행하면 세 단계만 남았습니다:
@@ -285,7 +285,7 @@ TensorFlow에서 모델을 미세 조정하려면, 먼저 옵티마이저 함수
```py
>>> from transformers import TFAutoModelForCausalLM
->>> model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> model = TFAutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`]을 사용하여 데이터 세트를 `tf.data.Dataset` 형식으로 변환하세요:
diff --git a/docs/source/ko/tasks/masked_language_modeling.md b/docs/source/ko/tasks/masked_language_modeling.md
index ee835d13ebc..3aafdf1cb9e 100644
--- a/docs/source/ko/tasks/masked_language_modeling.md
+++ b/docs/source/ko/tasks/masked_language_modeling.md
@@ -26,7 +26,7 @@ rendered properly in your Markdown viewer.
이번 가이드에서 다룰 내용은 다음과 같습니다:
-1. [ELI5](https://huggingface.co/datasets/eli5) 데이터 세트에서 [r/askscience](https://www.reddit.com/r/askscience/) 부분을 사용해 [DistilRoBERTa](https://huggingface.co/distilroberta-base) 모델을 미세 조정합니다.
+1. [ELI5](https://huggingface.co/datasets/eli5) 데이터 세트에서 [r/askscience](https://www.reddit.com/r/askscience/) 부분을 사용해 [DistilRoBERTa](https://huggingface.co/distilbert/distilroberta-base) 모델을 미세 조정합니다.
2. 추론 시에 직접 미세 조정한 모델을 사용합니다.
@@ -103,7 +103,7 @@ Hugging Face 계정에 로그인하여 모델을 업로드하고 커뮤니티와
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilroberta-base")
```
위의 예제에서와 마찬가지로, `text` 필드는 `answers` 안에 중첩되어 있습니다.
@@ -224,7 +224,7 @@ Hugging Face 계정에 로그인하여 모델을 업로드하고 커뮤니티와
```py
>>> from transformers import AutoModelForMaskedLM
->>> model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
+>>> model = AutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
이제 세 단계가 남았습니다:
@@ -289,7 +289,7 @@ TensorFlow로 모델을 미세 조정하기 위해서는 옵티마이저(optimiz
```py
>>> from transformers import TFAutoModelForMaskedLM
->>> model = TFAutoModelForMaskedLM.from_pretrained("distilroberta-base")
+>>> model = TFAutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`] 메소드를 사용해 데이터 세트를 `tf.data.Dataset` 형식으로 변환하세요:
diff --git a/docs/source/ko/tasks/multiple_choice.md b/docs/source/ko/tasks/multiple_choice.md
index c174ca632f6..4e02f7fabe5 100644
--- a/docs/source/ko/tasks/multiple_choice.md
+++ b/docs/source/ko/tasks/multiple_choice.md
@@ -22,7 +22,7 @@ rendered properly in your Markdown viewer.
진행하는 방법은 아래와 같습니다:
-1. [SWAG](https://huggingface.co/datasets/swag) 데이터 세트의 'regular' 구성으로 [BERT](https://huggingface.co/bert-base-uncased)를 미세 조정하여 여러 옵션과 일부 컨텍스트가 주어졌을 때 가장 적합한 답을 선택합니다.
+1. [SWAG](https://huggingface.co/datasets/swag) 데이터 세트의 'regular' 구성으로 [BERT](https://huggingface.co/google-bert/bert-base-uncased)를 미세 조정하여 여러 옵션과 일부 컨텍스트가 주어졌을 때 가장 적합한 답을 선택합니다.
2. 추론에 미세 조정된 모델을 사용합니다.
@@ -90,7 +90,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
```
생성하려는 전처리 함수는 다음과 같아야 합니다:
@@ -253,7 +253,7 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
```py
>>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer
->>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
이제 세 단계만 남았습니다:
@@ -317,7 +317,7 @@ TensorFlow에서 모델을 미세 조정하려면 최적화 함수, 학습률
```py
>>> from transformers import TFAutoModelForMultipleChoice
->>> model = TFAutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`]을 사용하여 데이터 세트를 `tf.data.Dataset` 형식으로 변환합니다:
diff --git a/docs/source/ko/tasks/question_answering.md b/docs/source/ko/tasks/question_answering.md
index 4b218ccce21..9539b9a4030 100644
--- a/docs/source/ko/tasks/question_answering.md
+++ b/docs/source/ko/tasks/question_answering.md
@@ -27,7 +27,7 @@ rendered properly in your Markdown viewer.
이 가이드는 다음과 같은 방법들을 보여줍니다.
-1. 추출적 질의 응답을 하기 위해 [SQuAD](https://huggingface.co/datasets/squad) 데이터 세트에서 [DistilBERT](https://huggingface.co/distilbert-base-uncased) 미세 조정하기
+1. 추출적 질의 응답을 하기 위해 [SQuAD](https://huggingface.co/datasets/squad) 데이터 세트에서 [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) 미세 조정하기
2. 추론에 미세 조정된 모델 사용하기
@@ -99,7 +99,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
질의 응답 태스크와 관련해서 특히 유의해야할 몇 가지 전처리 단계가 있습니다:
@@ -203,7 +203,7 @@ pip install transformers datasets evaluate
```py
>>> from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
->>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
이제 세 단계만 남았습니다:
@@ -268,7 +268,7 @@ TensorFlow를 이용한 모델을 미세 조정하려면 옵티마이저 함수,
```py
>>> from transformers import TFAutoModelForQuestionAnswering
->>> model = TFAutoModelForQuestionAnswering("distilbert-base-uncased")
+>>> model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
[`~transformers.TFPreTrainedModel.prepare_tf_dataset`]을 사용해서 데이터 세트를 `tf.data.Dataset` 형식으로 변환합니다:
diff --git a/docs/source/ko/tasks/sequence_classification.md b/docs/source/ko/tasks/sequence_classification.md
index bc364d3199e..a1a5da50e9f 100644
--- a/docs/source/ko/tasks/sequence_classification.md
+++ b/docs/source/ko/tasks/sequence_classification.md
@@ -24,7 +24,7 @@ rendered properly in your Markdown viewer.
이 가이드에서 학습할 내용은:
-1. [IMDb](https://huggingface.co/datasets/imdb) 데이터셋에서 [DistilBERT](https://huggingface.co/distilbert-base-uncased)를 파인 튜닝하여 영화 리뷰가 긍정적인지 부정적인지 판단합니다.
+1. [IMDb](https://huggingface.co/datasets/imdb) 데이터셋에서 [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased)를 파인 튜닝하여 영화 리뷰가 긍정적인지 부정적인지 판단합니다.
2. 추론을 위해 파인 튜닝 모델을 사용합니다.
@@ -85,7 +85,7 @@ Hugging Face 계정에 로그인하여 모델을 업로드하고 커뮤니티에
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
`text`를 토큰화하고 시퀀스가 DistilBERT의 최대 입력 길이보다 길지 않도록 자르기 위한 전처리 함수를 생성하세요:
@@ -167,7 +167,7 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
>>> model = AutoModelForSequenceClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
... )
```
@@ -241,7 +241,7 @@ TensorFlow에서 모델을 파인 튜닝하려면, 먼저 옵티마이저 함수
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
... )
```
diff --git a/docs/source/ko/tasks/summarization.md b/docs/source/ko/tasks/summarization.md
index 5ca5f63a27c..43eae25d79f 100644
--- a/docs/source/ko/tasks/summarization.md
+++ b/docs/source/ko/tasks/summarization.md
@@ -29,7 +29,7 @@ rendered properly in your Markdown viewer.
이 가이드에서 소개할 내용은 아래와 같습니다:
-1. 생성 요약을 위한 [BillSum](https://huggingface.co/datasets/billsum) 데이터셋 중 캘리포니아 주 법안 하위 집합으로 [T5](https://huggingface.co/t5-small)를 파인튜닝합니다.
+1. 생성 요약을 위한 [BillSum](https://huggingface.co/datasets/billsum) 데이터셋 중 캘리포니아 주 법안 하위 집합으로 [T5](https://huggingface.co/google-t5/t5-small)를 파인튜닝합니다.
2. 파인튜닝된 모델을 사용하여 추론합니다.
@@ -95,7 +95,7 @@ Hugging Face 계정에 로그인하면 모델을 업로드하고 커뮤니티에
```py
>>> from transformers import AutoTokenizer
->>> checkpoint = "t5-small"
+>>> checkpoint = "google-t5/t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```
diff --git a/docs/source/ko/tasks/token_classification.md b/docs/source/ko/tasks/token_classification.md
index b09c2c8078a..1e49d79a0d7 100644
--- a/docs/source/ko/tasks/token_classification.md
+++ b/docs/source/ko/tasks/token_classification.md
@@ -24,7 +24,7 @@ rendered properly in your Markdown viewer.
이 가이드에서 학습할 내용은:
-1. [WNUT 17](https://huggingface.co/datasets/wnut_17) 데이터 세트에서 [DistilBERT](https://huggingface.co/distilbert-base-uncased)를 파인 튜닝하여 새로운 개체를 탐지합니다.
+1. [WNUT 17](https://huggingface.co/datasets/wnut_17) 데이터 세트에서 [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased)를 파인 튜닝하여 새로운 개체를 탐지합니다.
2. 추론을 위해 파인 튜닝 모델을 사용합니다.
@@ -109,7 +109,7 @@ Hugging Face 계정에 로그인하여 모델을 업로드하고 커뮤니티에
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
위의 예제 `tokens` 필드를 보면 입력이 이미 토큰화된 것처럼 보입니다. 그러나 실제로 입력은 아직 토큰화되지 않았으므로 단어를 하위 단어로 토큰화하기 위해 `is_split_into_words=True`를 설정해야 합니다. 예제로 확인합니다:
@@ -270,7 +270,7 @@ Hugging Face 계정에 로그인하여 모델을 업로드하고 커뮤니티에
>>> from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
>>> model = AutoModelForTokenClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```
@@ -341,7 +341,7 @@ TensorFlow에서 모델을 파인 튜닝하려면, 먼저 옵티마이저 함수
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```
diff --git a/docs/source/ko/tasks/translation.md b/docs/source/ko/tasks/translation.md
index fa7dc348fce..6de275f7d04 100644
--- a/docs/source/ko/tasks/translation.md
+++ b/docs/source/ko/tasks/translation.md
@@ -24,7 +24,7 @@ rendered properly in your Markdown viewer.
이 가이드에서 학습할 내용은:
-1. 영어 텍스트를 프랑스어로 번역하기 위해 [T5](https://huggingface.co/t5-small) 모델을 OPUS Books 데이터세트의 영어-프랑스어 하위 집합으로 파인튜닝하는 방법과
+1. 영어 텍스트를 프랑스어로 번역하기 위해 [T5](https://huggingface.co/google-t5/t5-small) 모델을 OPUS Books 데이터세트의 영어-프랑스어 하위 집합으로 파인튜닝하는 방법과
2. 파인튜닝된 모델을 추론에 사용하는 방법입니다.
@@ -88,7 +88,7 @@ pip install transformers datasets evaluate sacrebleu
```py
>>> from transformers import AutoTokenizer
->>> checkpoint = "t5-small"
+>>> checkpoint = "google-t5/t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```
diff --git a/docs/source/ko/tf_xla.md b/docs/source/ko/tf_xla.md
index 66d30abb2e9..0b47d6fbad8 100644
--- a/docs/source/ko/tf_xla.md
+++ b/docs/source/ko/tf_xla.md
@@ -85,8 +85,8 @@ from transformers.utils import check_min_version
check_min_version("4.21.0")
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="</s>")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
input_string = ["TensorFlow is"]
# XLA 생성 함수를 만들기 위한 한 줄
@@ -114,8 +114,8 @@ XLA 활성화 함수(`xla_generate()`와 같은)를 처음 실행할 때 내부
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="</s>")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
input_string = ["TensorFlow is"]
xla_generate = tf.function(model.generate, jit_compile=True)
@@ -135,8 +135,8 @@ import time
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="</s>")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
xla_generate = tf.function(model.generate, jit_compile=True)
diff --git a/docs/source/ko/tflite.md b/docs/source/ko/tflite.md
index 5d08ea40785..464106a6b7c 100644
--- a/docs/source/ko/tflite.md
+++ b/docs/source/ko/tflite.md
@@ -38,10 +38,10 @@ pip install optimum[exporters-tf]
optimum-cli export tflite --help
```
-예를 들어 🤗 Hub에서의 `bert-base-uncased` 모델 체크포인트를 내보내려면, 다음 명령을 실행하세요:
+예를 들어 🤗 Hub에서의 `google-bert/bert-base-uncased` 모델 체크포인트를 내보내려면, 다음 명령을 실행하세요:
```bash
-optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/
+optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```
다음과 같이 진행 상황을 나타내는 로그와 결과물인 `model.tflite`가 저장된 위치를 보여주는 로그가 표시됩니다:
diff --git a/docs/source/ko/tokenizer_summary.md b/docs/source/ko/tokenizer_summary.md
index 5c6b9a6b73c..0a4ece29a47 100644
--- a/docs/source/ko/tokenizer_summary.md
+++ b/docs/source/ko/tokenizer_summary.md
@@ -97,7 +97,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import BertTokenizer
->>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> tokenizer.tokenize("I have a new GPU!")
["i", "have", "a", "new", "gp", "##u", "!"]
```
@@ -111,7 +111,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import XLNetTokenizer
->>> tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
+>>> tokenizer = XLNetTokenizer.from_pretrained("xlnet/xlnet-base-cased")
>>> tokenizer.tokenize("Don't you love 🤗 Transformers? We sure do.")
["▁Don", "'", "t", "▁you", "▁love", "▁", "🤗", "▁", "Transform", "ers", "?", "▁We", "▁sure", "▁do", "."]
```
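The two tokenizer outputs in the hunks above use different sub-word markers: `##` marks a piece that continues the previous word, and `▁` marks the start of a new word. As a rough illustration of how those markers can be read, the sketch below merges the pieces back together in plain Python. The helper names `merge_wordpiece` and `merge_sentencepiece` are made up for this example and are not part of 🤗 Transformers; real tokenizers expose their own decoding methods.

```python
def merge_wordpiece(tokens: list[str]) -> list[str]:
    """Glue '##' continuation pieces onto the preceding token (illustrative helper)."""
    words: list[str] = []
    for token in tokens:
        if token.startswith("##") and words:
            # Continuation piece: drop the marker and extend the last word.
            words[-1] += token[2:]
        else:
            words.append(token)
    return words


def merge_sentencepiece(tokens: list[str]) -> str:
    """Join pieces and turn each '▁' word-start marker back into a space (illustrative helper)."""
    return "".join(tokens).replace("▁", " ").strip()


wordpiece_tokens = ["i", "have", "a", "new", "gp", "##u", "!"]
sentencepiece_tokens = [
    "▁Don", "'", "t", "▁you", "▁love", "▁", "🤗", "▁",
    "Transform", "ers", "?", "▁We", "▁sure", "▁do", ".",
]

print(merge_wordpiece(wordpiece_tokens))
print(merge_sentencepiece(sentencepiece_tokens))
```

The token lists are copied from the outputs shown above, so the merged results can be checked against the input sentences in those examples.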
diff --git a/docs/source/ko/torchscript.md b/docs/source/ko/torchscript.md
index 297479caf2c..28e198c5ec9 100644
--- a/docs/source/ko/torchscript.md
+++ b/docs/source/ko/torchscript.md
@@ -82,7 +82,7 @@ TorchScript는 묶인 가중치를 가진 모델을 내보낼 수 없으므로,
from transformers import BertModel, BertTokenizer, BertConfig
import torch
-enc = BertTokenizer.from_pretrained("bert-base-uncased")
+enc = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# 입력 텍스트 토큰화하기
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
@@ -117,7 +117,7 @@ model = BertModel(config)
model.eval()
# 만약 *from_pretrained*를 사용하여 모델을 인스턴스화하는 경우, TorchScript 플래그를 쉽게 설정할 수 있습니다
-model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
+model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)
# 추적 생성하기
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
diff --git a/docs/source/ko/training.md b/docs/source/ko/training.md
index f4ab1332294..fa6d56bdc36 100644
--- a/docs/source/ko/training.md
+++ b/docs/source/ko/training.md
@@ -48,7 +48,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> def tokenize_function(examples):
@@ -84,7 +84,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
@@ -187,7 +187,7 @@ dataset = dataset["train"] # Just take the training split for now
```py
from transformers import AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
@@ -202,7 +202,7 @@ from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# Load and compile our model
-model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
+model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
model.compile(optimizer=Adam(3e-5))
@@ -329,7 +329,7 @@ torch.cuda.empty_cache()
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
### 옵티마이저 및 학습 속도 스케줄러[[optimizer-and-learning-rate-scheduler]]
diff --git a/docs/source/ko/troubleshooting.md b/docs/source/ko/troubleshooting.md
index 5eef788e099..263d693c23d 100644
--- a/docs/source/ko/troubleshooting.md
+++ b/docs/source/ko/troubleshooting.md
@@ -134,7 +134,7 @@ RuntimeError: CUDA error: device-side assert triggered
>>> from transformers import AutoModelForSequenceClassification
>>> import torch
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
>>> model.config.pad_token_id
0
```
@@ -191,8 +191,8 @@ tensor([[ 0.0082, -0.2307],
```py
>>> from transformers import AutoProcessor, AutoModelForQuestionAnswering
->>> processor = AutoProcessor.from_pretrained("gpt2-medium")
->>> model = AutoModelForQuestionAnswering.from_pretrained("gpt2-medium")
+>>> processor = AutoProcessor.from_pretrained("openai-community/gpt2-medium")
+>>> model = AutoModelForQuestionAnswering.from_pretrained("openai-community/gpt2-medium")
ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModelForQuestionAnswering.
Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, ...
```
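Every hunk in this diff applies the same kind of change: a bare checkpoint id is replaced by its namespaced form on the Hub. The sketch below shows one way such a rewrite could be restricted to places where a string really is a checkpoint id, so that architecture names (for example a `--model_type gpt2` flag) and filesystem paths that merely contain `gpt2` are left untouched. It is not the script used to produce this diff. The table lists only renames visible here, and `CANONICAL_IDS`, `_ID_CONTEXT` and `namespace_checkpoint_ids` are hypothetical names, as is the choice of which surrounding markers count as "checkpoint context".

```python
import re

# Legacy-to-namespaced checkpoint ids, limited to renames visible in this diff.
CANONICAL_IDS = {
    "gpt2": "openai-community/gpt2",
    "gpt2-medium": "openai-community/gpt2-medium",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-base-cased": "google-bert/bert-base-cased",
    "bert-large-uncased": "google-bert/bert-large-uncased",
    "distilbert-base-uncased": "distilbert/distilbert-base-uncased",
    "distilgpt2": "distilbert/distilgpt2",
    "distilroberta-base": "distilbert/distilroberta-base",
    "t5-small": "google-t5/t5-small",
    "xlnet-base-cased": "xlnet/xlnet-base-cased",
}

# Assumed rule: an id is rewritten only when it directly follows one of these
# markers and is terminated by a quote, whitespace, or end of line.
_ID_CONTEXT = re.compile(
    r'(?P<prefix>from_pretrained\("|--model_name_or_path |--model[ =]|model=")'
    r'(?P<id>[A-Za-z0-9_.-]+)'
    r'(?P<suffix>["\s]|$)'
)


def namespace_checkpoint_ids(line: str) -> str:
    """Rewrite known legacy ids in checkpoint positions; leave everything else alone."""

    def _swap(match: re.Match) -> str:
        new_id = CANONICAL_IDS.get(match["id"], match["id"])
        return f'{match["prefix"]}{new_id}{match["suffix"]}'

    return _ID_CONTEXT.sub(_swap, line)


print(namespace_checkpoint_ids('tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")'))
print(namespace_checkpoint_ids("transformers-cli convert --model_type gpt2 \\"))
```

Because an id that already contains a `/` never matches the pattern, running the helper twice on the same line gives the same result as running it once.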
diff --git a/docs/source/pt/converting_tensorflow_models.md b/docs/source/pt/converting_tensorflow_models.md
index 97767b2ad42..190c1aec5b2 100644
--- a/docs/source/pt/converting_tensorflow_models.md
+++ b/docs/source/pt/converting_tensorflow_models.md
@@ -100,9 +100,9 @@ transformers-cli convert --model_type gpt \
Aqui está um exemplo do processo de conversão para um modelo OpenAI GPT-2 pré-treinado (consulte [aqui](https://github.com/openai/gpt-2))
```bash
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
diff --git a/docs/source/pt/create_a_model.md b/docs/source/pt/create_a_model.md
index fd1e9c8f39a..dd71963236f 100644
--- a/docs/source/pt/create_a_model.md
+++ b/docs/source/pt/create_a_model.md
@@ -86,7 +86,7 @@ DistilBertConfig {
Atributos de um modelo pré-treinado podem ser modificados na função [`~PretrainedConfig.from_pretrained`]:
```py
->>> my_config = DistilBertConfig.from_pretrained("distilbert-base-uncased", activation="relu", attention_dropout=0.4)
+>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```
Uma vez que você está satisfeito com as configurações do seu modelo, você consegue salvar elas com [`~PretrainedConfig.save_pretrained`]. Seu arquivo de configurações está salvo como um arquivo JSON no diretório especificado:
@@ -127,13 +127,13 @@ Isso cria um modelo com valores aleatórios ao invés de pré-treinar os pesos.
Criar um modelo pré-treinado com [`~PreTrainedModel.from_pretrained`]:
```py
->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
Quando você carregar os pesos pré-treinados, a configuração padrão do modelo é automaticamente carregada se o modelo é provido pelo 🤗 Transformers. No entanto, você ainda consegue mudar - alguns ou todos - os atributos padrões de configuração do modelo com os seus próprio atributos, se você preferir:
```py
->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
+>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
@@ -151,13 +151,13 @@ Isso cria um modelo com valores aleatórios ao invés de pré-treinar os pesos.
Criar um modelo pré-treinado com [`~TFPreTrainedModel.from_pretrained`]:
```py
->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
Quando você carregar os pesos pré-treinados, a configuração padrão do modelo é automaticamente carregada se o modelo é provido pelo 🤗 Transformers. No entanto, você ainda consegue mudar - alguns ou todos - os atributos padrões de configuração do modelo com os seus próprio atributos, se você preferir:
```py
->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
+>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
@@ -173,7 +173,7 @@ Por exemplo, [`DistilBertForSequenceClassification`] é um modelo DistilBERT bas
```py
>>> from transformers import DistilBertForSequenceClassification
->>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
Reutilize facilmente esse ponto de parada para outra tarefe mudando para uma head de modelo diferente. Para uma tarefe de responder questões, você usaria a head do modelo [`DistilBertForQuestionAnswering`]. A head de responder questões é similar com a de classificação de sequências exceto o fato de que ela é uma camada no topo dos estados das saídas ocultas.
@@ -181,7 +181,7 @@ Reutilize facilmente esse ponto de parada para outra tarefe mudando para uma hea
```py
>>> from transformers import DistilBertForQuestionAnswering
->>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -190,7 +190,7 @@ Por exemplo, [`TFDistilBertForSequenceClassification`] é um modelo DistilBERT b
```py
>>> from transformers import TFDistilBertForSequenceClassification
->>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
Reutilize facilmente esse ponto de parada para outra tarefe mudando para uma head de modelo diferente. Para uma tarefe de responder questões, você usaria a head do modelo [`TFDistilBertForQuestionAnswering`]. A head de responder questões é similar com a de classificação de sequências exceto o fato de que ela é uma camada no topo dos estados das saídas ocultas.
@@ -198,7 +198,7 @@ Reutilize facilmente esse ponto de parada para outra tarefe mudando para uma hea
```py
>>> from transformers import TFDistilBertForQuestionAnswering
->>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -231,7 +231,7 @@ Se você treinou seu prórpio tokenizer, você pode criar um a partir do seu arq
```py
>>> from transformers import DistilBertTokenizer
->>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
+>>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
Criando um 'fast tokenizer' com a classe [`DistilBertTokenizerFast`]:
@@ -239,7 +239,7 @@ Criando um 'fast tokenizer' com a classe [`DistilBertTokenizerFast`]:
```py
>>> from transformers import DistilBertTokenizerFast
->>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
+>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")
```
diff --git a/docs/source/pt/installation.md b/docs/source/pt/installation.md
index 574d34ee560..7eeefd883d6 100644
--- a/docs/source/pt/installation.md
+++ b/docs/source/pt/installation.md
@@ -185,14 +185,14 @@ Você pode adicionar o [🤗 Datasets](https://huggingface.co/docs/datasets/) ao
Segue um exemplo de execução do programa numa rede padrão com firewall para instâncias externas, usando o seguinte comando:
```bash
-python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
Execute esse mesmo programa numa instância offline com o seguinte comando:
```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
-python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
O script agora deve ser executado sem travar ou expirar, pois procurará apenas por arquivos locais.
diff --git a/docs/source/pt/multilingual.md b/docs/source/pt/multilingual.md
index b6366b8c228..5515c6a922a 100644
--- a/docs/source/pt/multilingual.md
+++ b/docs/source/pt/multilingual.md
@@ -20,7 +20,7 @@ rendered properly in your Markdown viewer.
Existem vários modelos multilinguísticos no 🤗 Transformers e seus usos para inferência diferem dos modelos monolíngues.
No entanto, nem *todos* os usos dos modelos multilíngues são tão diferentes.
-Alguns modelos, como o [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased),
+Alguns modelos, como o [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased),
podem ser usados como se fossem monolíngues. Este guia irá te ajudar a usar modelos multilíngues cujo uso difere
para o propósito de inferência.
@@ -34,25 +34,25 @@ checkpoints que usam de language embeddings e os que não.
Os seguintes modelos de XLM usam language embeddings para especificar a linguagem utilizada para a inferência.
-- `xlm-mlm-ende-1024` (Masked language modeling, English-German)
-- `xlm-mlm-enfr-1024` (Masked language modeling, English-French)
-- `xlm-mlm-enro-1024` (Masked language modeling, English-Romanian)
-- `xlm-mlm-xnli15-1024` (Masked language modeling, XNLI languages)
-- `xlm-mlm-tlm-xnli15-1024` (Masked language modeling + translation, XNLI languages)
-- `xlm-clm-enfr-1024` (Causal language modeling, English-French)
-- `xlm-clm-ende-1024` (Causal language modeling, English-German)
+- `FacebookAI/xlm-mlm-ende-1024` (Masked language modeling, English-German)
+- `FacebookAI/xlm-mlm-enfr-1024` (Masked language modeling, English-French)
+- `FacebookAI/xlm-mlm-enro-1024` (Masked language modeling, English-Romanian)
+- `FacebookAI/xlm-mlm-xnli15-1024` (Masked language modeling, XNLI languages)
+- `FacebookAI/xlm-mlm-tlm-xnli15-1024` (Masked language modeling + translation, XNLI languages)
+- `FacebookAI/xlm-clm-enfr-1024` (Causal language modeling, English-French)
+- `FacebookAI/xlm-clm-ende-1024` (Causal language modeling, English-German)
Os language embeddings são representados por um tensor de mesma dimensão que os `input_ids` passados ao modelo.
Os valores destes tensores dependem do idioma utilizado e se identificam pelos atributos `lang2id` e `id2lang` do tokenizador.
-Neste exemplo, carregamos o checkpoint `xlm-clm-enfr-1024`(Causal language modeling, English-French):
+Neste exemplo, carregamos o checkpoint `FacebookAI/xlm-clm-enfr-1024` (Causal language modeling, English-French):
```py
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel
->>> tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
->>> model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")
+>>> tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
+>>> model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
```
O atributo `lang2id` do tokenizador mostra os idiomas deste modelo e seus ids:
@@ -92,8 +92,8 @@ O script [run_generation.py](https://github.com/huggingface/transformers/tree/ma
Os seguintes modelos XLM não requerem o uso de language embeddings durante a inferência:
-- `xlm-mlm-17-1280` (Modelagem de linguagem com máscara, 17 idiomas)
-- `xlm-mlm-100-1280` (Modelagem de linguagem com máscara, 100 idiomas)
+- `FacebookAI/xlm-mlm-17-1280` (Modelagem de linguagem com máscara, 17 idiomas)
+- `FacebookAI/xlm-mlm-100-1280` (Modelagem de linguagem com máscara, 100 idiomas)
Estes modelos são utilizados para representações genéricas de frase diferentemente dos checkpoints XLM anteriores.
@@ -101,8 +101,8 @@ Estes modelos são utilizados para representações genéricas de frase diferent
Os seguintes modelos do BERT podem ser utilizados para tarefas multilinguísticas:
-- `bert-base-multilingual-uncased` (Modelagem de linguagem com máscara + Previsão de frases, 102 idiomas)
-- `bert-base-multilingual-cased` (Modelagem de linguagem com máscara + Previsão de frases, 104 idiomas)
+- `google-bert/bert-base-multilingual-uncased` (Modelagem de linguagem com máscara + Previsão de frases, 102 idiomas)
+- `google-bert/bert-base-multilingual-cased` (Modelagem de linguagem com máscara + Previsão de frases, 104 idiomas)
Estes modelos não requerem language embeddings durante a inferência. Devem identificar a linguagem a partir
do contexto e realizar a inferência em sequência.
@@ -111,8 +111,8 @@ do contexto e realizar a inferência em sequência.
Os seguintes modelos do XLM-RoBERTa podem ser utilizados para tarefas multilinguísticas:
-- `xlm-roberta-base` (Modelagem de linguagem com máscara, 100 idiomas)
-- `xlm-roberta-large` Modelagem de linguagem com máscara, 100 idiomas)
+- `FacebookAI/xlm-roberta-base` (Modelagem de linguagem com máscara, 100 idiomas)
+- `FacebookAI/xlm-roberta-large` (Modelagem de linguagem com máscara, 100 idiomas)
O XLM-RoBERTa foi treinado com 2,5 TB de dados do CommonCrawl recém-criados e testados em 100 idiomas.
Proporciona fortes vantagens sobre os modelos multilinguísticos publicados anteriormente como o mBERT e o XLM em tarefas
diff --git a/docs/source/pt/pipeline_tutorial.md b/docs/source/pt/pipeline_tutorial.md
index b2294863013..9c0cb3567e7 100644
--- a/docs/source/pt/pipeline_tutorial.md
+++ b/docs/source/pt/pipeline_tutorial.md
@@ -85,8 +85,8 @@ para uma tarefa de modelagem de linguagem causal:
```py
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
->>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
->>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
+>>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
Crie uma [`pipeline`] para a sua tarefa e especifíque o modelo e o tokenizador que foram carregados:
diff --git a/docs/source/pt/quicktour.md b/docs/source/pt/quicktour.md
index 67c511169e3..d34480ee23a 100644
--- a/docs/source/pt/quicktour.md
+++ b/docs/source/pt/quicktour.md
@@ -87,7 +87,7 @@ Importe [`pipeline`] e especifique a tarefa que deseja completar:
>>> classifier = pipeline("sentiment-analysis")
```
-A pipeline baixa and armazena um [modelo pré-treinado](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) padrão e tokenizer para análise sentimental. Agora você pode usar `classifier` no texto alvo:
+A pipeline baixa e armazena um [modelo pré-treinado](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) padrão e tokenizer para análise de sentimento. Agora você pode usar `classifier` no texto alvo:
```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
diff --git a/docs/source/pt/run_scripts.md b/docs/source/pt/run_scripts.md
index ff3110817e8..a64ad72f1db 100644
--- a/docs/source/pt/run_scripts.md
+++ b/docs/source/pt/run_scripts.md
@@ -88,11 +88,11 @@ pip install -r requirements.txt
-O script de exemplo baixa e pré-processa um conjunto de dados da biblioteca 🤗 [Datasets](https://huggingface.co/docs/datasets/). Em seguida, o script ajusta um conjunto de dados com o [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) em uma arquitetura que oferece suporte à sumarização. O exemplo a seguir mostra como ajustar [T5-small](https://huggingface.co/t5-small) no conjunto de dados [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail). O modelo T5 requer um argumento `source_prefix` adicional devido à forma como foi treinado. Este prompt informa ao T5 que esta é uma tarefa de sumarização.
+O script de exemplo baixa e pré-processa um conjunto de dados da biblioteca 🤗 [Datasets](https://huggingface.co/docs/datasets/). Em seguida, o script ajusta um conjunto de dados com o [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) em uma arquitetura que oferece suporte à sumarização. O exemplo a seguir mostra como ajustar [T5-small](https://huggingface.co/google-t5/t5-small) no conjunto de dados [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail). O modelo T5 requer um argumento `source_prefix` adicional devido à forma como foi treinado. Este prompt informa ao T5 que esta é uma tarefa de sumarização.
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -106,11 +106,11 @@ python examples/pytorch/summarization/run_summarization.py \
```
-Este outro script de exemplo baixa e pré-processa um conjunto de dados da biblioteca 🤗 [Datasets](https://huggingface.co/docs/datasets/). Em seguida, o script ajusta um conjunto de dados usando Keras em uma arquitetura que oferece suporte à sumarização. O exemplo a seguir mostra como ajustar [T5-small](https://huggingface.co/t5-small) no conjunto de dados [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail). O modelo T5 requer um argumento `source_prefix` adicional devido à forma como foi treinado. Este prompt informa ao T5 que esta é uma tarefa de sumarização.
+Este outro script de exemplo baixa e pré-processa um conjunto de dados da biblioteca 🤗 [Datasets](https://huggingface.co/docs/datasets/). Em seguida, o script ajusta um conjunto de dados usando Keras em uma arquitetura que oferece suporte à sumarização. O exemplo a seguir mostra como ajustar [T5-small](https://huggingface.co/google-t5/t5-small) no conjunto de dados [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail). O modelo T5 requer um argumento `source_prefix` adicional devido à forma como foi treinado. Este prompt informa ao T5 que esta é uma tarefa de sumarização.
```bash
python examples/tensorflow/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -134,7 +134,7 @@ O [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) ofere
torchrun \
--nproc_per_node 8 pytorch/summarization/run_summarization.py \
--fp16 \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -158,7 +158,7 @@ As Unidades de Processamento de Tensor (TPUs) são projetadas especificamente pa
```bash
python xla_spawn.py --num_cores 8 \
summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -178,7 +178,7 @@ As Unidades de Processamento de Tensor (TPUs) são projetadas especificamente pa
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -217,7 +217,7 @@ Agora você está pronto para iniciar o treinamento:
```bash
accelerate launch run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
@@ -236,7 +236,7 @@ Um script para sumarização usando um conjunto de dados customizado ficaria ass
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--train_file path_to_csv_or_jsonlines_file \
@@ -261,7 +261,7 @@ Geralmente, é uma boa ideia executar seu script em um número menor de exemplos
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--max_train_samples 50 \
--max_eval_samples 50 \
--max_predict_samples 50 \
@@ -291,7 +291,7 @@ O primeiro método usa o argumento `output_dir previous_output_dir` para retomar
```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -308,7 +308,7 @@ O segundo método usa o argumento `resume_from_checkpoint path_to_specific_check
```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -338,7 +338,7 @@ O exemplo a seguir mostra como fazer upload de um modelo com um nome de reposit
```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
diff --git a/docs/source/pt/serialization.md b/docs/source/pt/serialization.md
index d5a21c7f890..9e390f07bde 100644
--- a/docs/source/pt/serialization.md
+++ b/docs/source/pt/serialization.md
@@ -146,7 +146,7 @@ optional arguments:
A exportação de um checkpoint usando uma configuração pronta pode ser feita da seguinte forma:
```bash
-python -m transformers.onnx --model=distilbert-base-uncased onnx/
+python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
Você deve ver os seguintes logs:
@@ -161,7 +161,7 @@ All good, model saved at: onnx/model.onnx
```
Isso exporta um grafo ONNX do ponto de verificação definido pelo argumento `--model`. Nisso
-Por exemplo, é `distilbert-base-uncased`, mas pode ser qualquer checkpoint no Hugging
+Por exemplo, é `distilbert/distilbert-base-uncased`, mas pode ser qualquer checkpoint no Hugging
Face Hub ou um armazenado localmente.
O arquivo `model.onnx` resultante pode ser executado em um dos [muitos
@@ -173,7 +173,7 @@ Tempo de execução](https://onnxruntime.ai/) da seguinte forma:
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
@@ -207,8 +207,8 @@ arquivos tokenizer armazenados em um diretório. Por exemplo, podemos carregar e
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
->>> # Load tokenizer and PyTorch weights form the Hub
+>>> # Load tokenizer and PyTorch weights from the Hub
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
->>> pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
+>>> pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
>>> # Save to disk
>>> tokenizer.save_pretrained("local-pt-checkpoint")
>>> pt_model.save_pretrained("local-pt-checkpoint")
@@ -225,8 +225,8 @@ python -m transformers.onnx --model=local-pt-checkpoint onnx/
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> # Load tokenizer and TensorFlow weights from the Hub
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
->>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
+>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
>>> # Save to disk
>>> tokenizer.save_pretrained("local-tf-checkpoint")
>>> tf_model.save_pretrained("local-tf-checkpoint")
@@ -271,7 +271,7 @@ pacote `transformers.onnx`. Por exemplo, para exportar um modelo de classificaç
escolher um modelo ajustado no Hub e executar:
```bash
-python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english \
+python -m transformers.onnx --model=distilbert/distilbert-base-uncased-finetuned-sst-2-english \
--feature=sequence-classification onnx/
```
@@ -287,7 +287,7 @@ All good, model saved at: onnx/model.onnx
```
Observe que, neste caso, os nomes de saída do modelo ajustado são `logits`
-em vez do `last_hidden_state` que vimos com o checkpoint `distilbert-base-uncased`
+em vez do `last_hidden_state` que vimos com o checkpoint `distilbert/distilbert-base-uncased`
mais cedo. Isso é esperado, pois o modelo ajustado (fine-tuned) possui uma cabeça de classificação de sequência.
@@ -379,7 +379,7 @@ configuração do modelo base da seguinte forma:
```python
>>> from transformers import AutoConfig
->>> config = AutoConfig.from_pretrained("distilbert-base-uncased")
+>>> config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased")
>>> onnx_config = DistilBertOnnxConfig(config)
```
@@ -410,7 +410,7 @@ de classificação, poderíamos usar:
```python
>>> from transformers import AutoConfig
->>> config = AutoConfig.from_pretrained("distilbert-base-uncased")
+>>> config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased")
>>> onnx_config_for_seq_clf = DistilBertOnnxConfig(config, task="sequence-classification")
>>> print(onnx_config_for_seq_clf.outputs)
OrderedDict([('logits', {0: 'batch'})])
@@ -437,7 +437,7 @@ e o caminho para salvar o arquivo exportado:
>>> from transformers import AutoTokenizer, AutoModel
>>> onnx_path = Path("model.onnx")
->>> model_ckpt = "distilbert-base-uncased"
+>>> model_ckpt = "distilbert/distilbert-base-uncased"
>>> base_model = AutoModel.from_pretrained(model_ckpt)
>>> tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
diff --git a/docs/source/pt/tasks/sequence_classification.md b/docs/source/pt/tasks/sequence_classification.md
index 02647f68f88..e7776894f87 100644
--- a/docs/source/pt/tasks/sequence_classification.md
+++ b/docs/source/pt/tasks/sequence_classification.md
@@ -20,7 +20,7 @@ rendered properly in your Markdown viewer.
A classificação de texto é uma tarefa comum de NLP que atribui um rótulo ou classe a um texto. Existem muitas aplicações práticas de classificação de texto amplamente utilizadas em produção por algumas das maiores empresas da atualidade. Uma das formas mais populares de classificação de texto é a análise de sentimento, que atribui um rótulo como positivo, negativo ou neutro a um texto.
-Este guia mostrará como realizar o fine-tuning do [DistilBERT](https://huggingface.co/distilbert-base-uncased) no conjunto de dados [IMDb](https://huggingface.co/datasets/imdb) para determinar se a crítica de filme é positiva ou negativa.
+Este guia mostrará como realizar o fine-tuning do [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) no conjunto de dados [IMDb](https://huggingface.co/datasets/imdb) para determinar se a crítica de filme é positiva ou negativa.
@@ -60,7 +60,7 @@ Carregue o tokenizador do DistilBERT para processar o campo `text`:
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
Crie uma função de pré-processamento para tokenizar o campo `text` e truncar as sequências para que não sejam maiores que o comprimento máximo de entrada do DistilBERT:
@@ -104,7 +104,7 @@ Carregue o DistilBERT com [`AutoModelForSequenceClassification`] junto com o nú
```py
>>> from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
->>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
+>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased", num_labels=2)
```
@@ -190,7 +190,7 @@ Carregue o DistilBERT com [`TFAutoModelForSequenceClassification`] junto com o n
```py
>>> from transformers import TFAutoModelForSequenceClassification
->>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
+>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased", num_labels=2)
```
Configure o modelo para treinamento com o método [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
diff --git a/docs/source/pt/tasks/token_classification.md b/docs/source/pt/tasks/token_classification.md
index 316d6a81021..3465680dcc2 100644
--- a/docs/source/pt/tasks/token_classification.md
+++ b/docs/source/pt/tasks/token_classification.md
@@ -20,7 +20,7 @@ rendered properly in your Markdown viewer.
A classificação de tokens atribui um rótulo a tokens individuais em uma frase. Uma das tarefas de classificação de tokens mais comuns é o Reconhecimento de Entidade Nomeada, também chamada de NER (sigla em inglês para Named Entity Recognition). O NER tenta encontrar um rótulo para cada entidade em uma frase, como uma pessoa, local ou organização.
-Este guia mostrará como realizar o fine-tuning do [DistilBERT](https://huggingface.co/distilbert-base-uncased) no conjunto de dados [WNUT 17](https://huggingface.co/datasets/wnut_17) para detectar novas entidades.
+Este guia mostrará como realizar o fine-tuning do [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) no conjunto de dados [WNUT 17](https://huggingface.co/datasets/wnut_17) para detectar novas entidades.
@@ -85,7 +85,7 @@ Carregue o tokenizer do DistilBERT para processar os `tokens`:
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
Como a entrada já foi dividida em palavras, defina `is_split_into_words=True` para tokenizar as palavras em subpalavras:
@@ -162,7 +162,7 @@ Carregue o DistilBERT com o [`AutoModelForTokenClassification`] junto com o núm
```py
>>> from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
->>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=14)
+>>> model = AutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased", num_labels=14)
```
@@ -246,7 +246,7 @@ Carregue o DistilBERT com o [`TFAutoModelForTokenClassification`] junto com o n
```py
>>> from transformers import TFAutoModelForTokenClassification
->>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
+>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased", num_labels=2)
```
Configure o modelo para treinamento com o método [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
diff --git a/docs/source/pt/training.md b/docs/source/pt/training.md
index 6e39a46b164..49f57dead24 100644
--- a/docs/source/pt/training.md
+++ b/docs/source/pt/training.md
@@ -58,7 +58,7 @@ todo o dataset.
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> def tokenize_function(examples):
@@ -93,7 +93,7 @@ sabemos ter 5 labels usamos o seguinte código:
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
@@ -232,7 +232,7 @@ Carregue um modelo do TensorFlow com o número esperado de rótulos:
>>> import tensorflow as tf
>>> from transformers import TFAutoModelForSequenceClassification
->>> model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
A seguir, compile e ajuste o fine-tuning a seu modelo com [`fit`](https://keras.io/api/models/model_training_apis/) como
@@ -311,7 +311,7 @@ Carregue seu modelo com o número de labels esperados:
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
### Otimização e configuração do Learning Rate
diff --git a/docs/source/te/quicktour.md b/docs/source/te/quicktour.md
index 862ec416da8..75efa841128 100644
--- a/docs/source/te/quicktour.md
+++ b/docs/source/te/quicktour.md
@@ -81,7 +81,7 @@ Here is the translation in Telugu:
>>> classifier = pipeline("sentiment-analysis")
```
-సెంటిమెంట్ విశ్లేషణ కోసం [`pipeline`] డిఫాల్ట్ [ప్రీట్రైన్డ్ మోడల్](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) మరియు టోకెనైజర్ని డౌన్లోడ్ చేస్తుంది మరియు కాష్ చేస్తుంది. ఇప్పుడు మీరు మీ లక్ష్య వచనంలో `classifier`ని ఉపయోగించవచ్చు:
+సెంటిమెంట్ విశ్లేషణ కోసం [`pipeline`] డిఫాల్ట్ [ప్రీట్రైన్డ్ మోడల్](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) మరియు టోకెనైజర్ని డౌన్లోడ్ చేస్తుంది మరియు కాష్ చేస్తుంది. ఇప్పుడు మీరు మీ లక్ష్య వచనంలో `classifier`ని ఉపయోగించవచ్చు:
```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
@@ -389,7 +389,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoConfig
->>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
+>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
@@ -425,7 +425,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoModelForSequenceClassification
- >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. [`TrainingArguments`] మీరు నేర్చుకునే రేటు, బ్యాచ్ పరిమాణం మరియు శిక్షణ పొందవలసిన యుగాల సంఖ్య వంటి మార్చగల మోడల్ హైపర్పారామీటర్లను కలిగి ఉంది. మీరు ఎలాంటి శిక్షణా వాదనలను పేర్కొనకుంటే డిఫాల్ట్ విలువలు ఉపయోగించబడతాయి:
@@ -446,7 +446,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
4. డేటాసెట్ను లోడ్ చేయండి:
@@ -517,7 +517,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import TFAutoModelForSequenceClassification
- >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. టోకెనైజర్, ఇమేజ్ ప్రాసెసర్, ఫీచర్ ఎక్స్ట్రాక్టర్ లేదా ప్రాసెసర్ వంటి ప్రీప్రాసెసింగ్ క్లాస్ని లోడ్ చేయండి:
@@ -525,7 +525,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
3. డేటాసెట్ను టోకనైజ్ చేయడానికి ఒక ఫంక్షన్ను సృష్టించండి:
diff --git a/docs/source/zh/autoclass_tutorial.md b/docs/source/zh/autoclass_tutorial.md
index 936080a8315..7205aa0872d 100644
--- a/docs/source/zh/autoclass_tutorial.md
+++ b/docs/source/zh/autoclass_tutorial.md
@@ -20,7 +20,7 @@ rendered properly in your Markdown viewer.
-请记住,架构指的是模型的结构,而checkpoints是给定架构的权重。例如,[BERT](https://huggingface.co/bert-base-uncased)是一种架构,而`bert-base-uncased`是一个checkpoint。模型是一个通用术语,可以指代架构或checkpoint。
+请记住,架构指的是模型的结构,而checkpoints是给定架构的权重。例如,[BERT](https://huggingface.co/google-bert/bert-base-uncased)是一种架构,而`google-bert/bert-base-uncased`是一个checkpoint。模型是一个通用术语,可以指代架构或checkpoint。
@@ -43,7 +43,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
```
然后按照如下方式对输入进行分词:
@@ -104,7 +104,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
轻松地重复使用相同的checkpoint来为不同任务加载模型架构:
@@ -113,7 +113,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoModelForTokenClassification
->>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
+>>> model = AutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -133,7 +133,7 @@ TensorFlow和Flax的checkpoints不受影响,并且可以在PyTorch架构中使
```py
>>> from transformers import TFAutoModelForSequenceClassification
->>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
轻松地重复使用相同的checkpoint来为不同任务加载模型架构:
@@ -141,7 +141,7 @@ TensorFlow和Flax的checkpoints不受影响,并且可以在PyTorch架构中使
```py
>>> from transformers import TFAutoModelForTokenClassification
->>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
+>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
一般来说,我们推荐使用`AutoTokenizer`类和`TFAutoModelFor`类来加载模型的预训练实例。这样可以确保每次加载正确的架构。在下一个[教程](preprocessing)中,学习如何使用新加载的`tokenizer`, `image processor`, `feature extractor`和`processor`对数据集进行预处理以进行微调。
diff --git a/docs/source/zh/big_models.md b/docs/source/zh/big_models.md
index ccb8b7ecbba..2215c706618 100644
--- a/docs/source/zh/big_models.md
+++ b/docs/source/zh/big_models.md
@@ -42,7 +42,7 @@ rendered properly in your Markdown viewer.
```py
from transformers import AutoModel
-model = AutoModel.from_pretrained("bert-base-cased")
+model = AutoModel.from_pretrained("google-bert/bert-base-cased")
```
如果您使用 [`PreTrainedModel.save_pretrained`](模型预训练保存) 进行保存,您将得到一个新的文件夹,其中包含两个文件:模型的配置和权重:
diff --git a/docs/source/zh/create_a_model.md b/docs/source/zh/create_a_model.md
index 9b36d539762..fd07497e7ab 100644
--- a/docs/source/zh/create_a_model.md
+++ b/docs/source/zh/create_a_model.md
@@ -87,7 +87,7 @@ DistilBertConfig {
预训练模型的属性可以在 [`~PretrainedConfig.from_pretrained`] 函数中进行修改:
```py
->>> my_config = DistilBertConfig.from_pretrained("distilbert-base-uncased", activation="relu", attention_dropout=0.4)
+>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```
当你对模型配置满意时,可以使用 [`~PretrainedConfig.save_pretrained`] 来保存配置。你的配置文件将以 JSON 文件的形式存储在指定的保存目录中:
@@ -128,13 +128,13 @@ DistilBertConfig {
使用 [`~PreTrainedModel.from_pretrained`] 创建预训练模型:
```py
->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
当加载预训练权重时,如果模型是由 🤗 Transformers 提供的,将自动加载默认模型配置。然而,如果你愿意,仍然可以将默认模型配置的某些或者所有属性替换成你自己的配置:
```py
->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
+>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
@@ -152,13 +152,13 @@ DistilBertConfig {
使用 [`~TFPreTrainedModel.from_pretrained`] 创建预训练模型:
```py
->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
当加载预训练权重时,如果模型是由 🤗 Transformers 提供的,将自动加载默认模型配置。然而,如果你愿意,仍然可以将默认模型配置的某些或者所有属性替换成自己的配置:
```py
->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
+>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
@@ -174,7 +174,7 @@ DistilBertConfig {
```py
>>> from transformers import DistilBertForSequenceClassification
->>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
通过切换到不同的模型头,可以轻松地将此检查点重复用于其他任务。对于问答任务,你可以使用 [`DistilBertForQuestionAnswering`] 模型头。问答头(question answering head)与序列分类头类似,不同点在于它是隐藏状态输出之上的线性层。
@@ -182,7 +182,7 @@ DistilBertConfig {
```py
>>> from transformers import DistilBertForQuestionAnswering
->>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -191,7 +191,7 @@ DistilBertConfig {
```py
>>> from transformers import TFDistilBertForSequenceClassification
->>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
通过切换到不同的模型头,可以轻松地将此检查点重复用于其他任务。对于问答任务,你可以使用 [`TFDistilBertForQuestionAnswering`] 模型头。问答头(question answering head)与序列分类头类似,不同点在于它是隐藏状态输出之上的线性层。
@@ -199,7 +199,7 @@ DistilBertConfig {
```py
>>> from transformers import TFDistilBertForQuestionAnswering
->>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
@@ -232,7 +232,7 @@ DistilBertConfig {
```py
>>> from transformers import DistilBertTokenizer
->>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
+>>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
使用 [`DistilBertTokenizerFast`] 类创建快速分词器:
@@ -240,7 +240,7 @@ DistilBertConfig {
```py
>>> from transformers import DistilBertTokenizerFast
->>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
+>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")
```
diff --git a/docs/source/zh/installation.md b/docs/source/zh/installation.md
index 0ce10ba5290..91e09dc904b 100644
--- a/docs/source/zh/installation.md
+++ b/docs/source/zh/installation.md
@@ -180,14 +180,14 @@ conda install conda-forge::transformers
例如,你通常会使用以下命令对外部实例进行防火墙保护的的普通网络上运行程序:
```bash
-python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
在离线环境中运行相同的程序:
```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
-python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
现在脚本可以应该正常运行,而无需挂起或等待超时,因为它知道只应查找本地文件。
diff --git a/docs/source/zh/internal/generation_utils.md b/docs/source/zh/internal/generation_utils.md
index a8e191f1ca9..34e9bf2f787 100644
--- a/docs/source/zh/internal/generation_utils.md
+++ b/docs/source/zh/internal/generation_utils.md
@@ -36,8 +36,8 @@ rendered properly in your Markdown viewer.
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
-tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
-model = GPT2LMHeadModel.from_pretrained("gpt2")
+tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
diff --git a/docs/source/zh/main_classes/deepspeed.md b/docs/source/zh/main_classes/deepspeed.md
index 85c5d017ef3..75a0a13df75 100644
--- a/docs/source/zh/main_classes/deepspeed.md
+++ b/docs/source/zh/main_classes/deepspeed.md
@@ -178,7 +178,7 @@ deepspeed --num_gpus=2 your_program.py --deepspeed ds_config.js
```bash
deepspeed examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config_zero3.json \
---model_name_or_path t5-small --per_device_train_batch_size 1 \
+--model_name_or_path google-t5/t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir --fp16 \
--do_train --max_train_samples 500 --num_train_epochs 1 \
--dataset_name wmt16 --dataset_config "ro-en" \
@@ -201,7 +201,7 @@ deepspeed examples/pytorch/translation/run_translation.py \
```bash
deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config_zero2.json \
---model_name_or_path t5-small --per_device_train_batch_size 1 \
+--model_name_or_path google-t5/t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir --fp16 \
--do_train --max_train_samples 500 --num_train_epochs 1 \
--dataset_name wmt16 --dataset_config "ro-en" \
@@ -1628,7 +1628,7 @@ from transformers import T5ForConditionalGeneration, T5Config
import deepspeed
with deepspeed.zero.Init():
- config = T5Config.from_pretrained("t5-small")
+ config = T5Config.from_pretrained("google-t5/t5-small")
model = T5ForConditionalGeneration(config)
```
@@ -1640,7 +1640,7 @@ with deepspeed.zero.Init():
from transformers import AutoModel, Trainer, TrainingArguments
training_args = TrainingArguments(..., deepspeed=ds_config)
-model = AutoModel.from_pretrained("t5-small")
+model = AutoModel.from_pretrained("google-t5/t5-small")
trainer = Trainer(model=model, args=training_args, ...)
```
@@ -1690,7 +1690,7 @@ deepspeed --num_gpus=2 your_program.py --do_eval --deepspeed ds
```bash
deepspeed examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config_zero3.json \
---model_name_or_path t5-small --output_dir output_dir \
+--model_name_or_path google-t5/t5-small --output_dir output_dir \
--do_eval --max_eval_samples 50 --warmup_steps 50 \
--max_source_length 128 --val_max_target_length 128 \
--overwrite_output_dir --per_device_eval_batch_size 4 \
@@ -1870,7 +1870,7 @@ import deepspeed
ds_config = {...} # deepspeed config object or path to the file
# must run before instantiating the model to detect zero 3
dschf = HfDeepSpeedConfig(ds_config) # keep this object alive
-model = AutoModel.from_pretrained("gpt2")
+model = AutoModel.from_pretrained("openai-community/gpt2")
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
```
@@ -1884,7 +1884,7 @@ import deepspeed
ds_config = {...} # deepspeed config object or path to the file
# must run before instantiating the model to detect zero 3
dschf = HfDeepSpeedConfig(ds_config) # keep this object alive
-config = AutoConfig.from_pretrained("gpt2")
+config = AutoConfig.from_pretrained("openai-community/gpt2")
model = AutoModel.from_config(config)
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
```
diff --git a/docs/source/zh/main_classes/output.md b/docs/source/zh/main_classes/output.md
index 1619e27219d..f4d5c3c6941 100644
--- a/docs/source/zh/main_classes/output.md
+++ b/docs/source/zh/main_classes/output.md
@@ -24,8 +24,8 @@ rendered properly in your Markdown viewer.
from transformers import BertTokenizer, BertForSequenceClassification
import torch
-tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
-model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
+tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
+model = BertForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
diff --git a/docs/source/zh/main_classes/pipelines.md b/docs/source/zh/main_classes/pipelines.md
index 82d6de8e716..3cef40478c3 100644
--- a/docs/source/zh/main_classes/pipelines.md
+++ b/docs/source/zh/main_classes/pipelines.md
@@ -39,7 +39,7 @@ pipelines是使用模型进行推理的一种简单方法。这些pipelines是
如果您想使用 [hub](https://huggingface.co) 上的特定模型,可以忽略任务,如果hub上的模型已经定义了该任务:
```python
->>> pipe = pipeline(model="roberta-large-mnli")
+>>> pipe = pipeline(model="FacebookAI/roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
```
diff --git a/docs/source/zh/main_classes/trainer.md b/docs/source/zh/main_classes/trainer.md
index 049a3724114..cb0262140cb 100644
--- a/docs/source/zh/main_classes/trainer.md
+++ b/docs/source/zh/main_classes/trainer.md
@@ -462,7 +462,7 @@ sudo ln -s /usr/bin/g++-7 /usr/local/cuda-10.2/bin/g++
export TASK_NAME=mrpc
python examples/pytorch/text-classification/run_glue.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
@@ -597,7 +597,7 @@ cd transformers
accelerate launch \
./examples/pytorch/text-classification/run_glue.py \
---model_name_or_path bert-base-cased \
+--model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
@@ -622,7 +622,7 @@ accelerate launch --num_processes=2 \
--fsdp_sharding_strategy=1 \
--fsdp_state_dict_type=FULL_STATE_DICT \
./examples/pytorch/text-classification/run_glue.py
---model_name_or_path bert-base-cased \
+--model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
diff --git a/docs/source/zh/model_sharing.md b/docs/source/zh/model_sharing.md
index fbea41a9039..e28a000c115 100644
--- a/docs/source/zh/model_sharing.md
+++ b/docs/source/zh/model_sharing.md
@@ -235,4 +235,4 @@ pip install huggingface_hub
* 手动创建并上传一个`README.md`文件。
* 在你的模型仓库中点击**编辑模型卡片**按钮。
-可以参考DistilBert的[模型卡片](https://huggingface.co/distilbert-base-uncased)来了解模型卡片应该包含的信息类型。有关您可以在`README.md`文件中控制的更多选项的细节,例如模型的碳足迹或小部件示例,请参考文档[这里](https://huggingface.co/docs/hub/models-cards)。
\ No newline at end of file
+可以参考DistilBert的[模型卡片](https://huggingface.co/distilbert/distilbert-base-uncased)来了解模型卡片应该包含的信息类型。有关您可以在`README.md`文件中控制的更多选项的细节,例如模型的碳足迹或小部件示例,请参考文档[这里](https://huggingface.co/docs/hub/models-cards)。
\ No newline at end of file
diff --git a/docs/source/zh/multilingual.md b/docs/source/zh/multilingual.md
index 7e8ab1336d9..9c27bd5f335 100644
--- a/docs/source/zh/multilingual.md
+++ b/docs/source/zh/multilingual.md
@@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.
[[open-in-colab]]
-🤗 Transformers 中有多种多语言模型,它们的推理用法与单语言模型不同。但是,并非*所有*的多语言模型用法都不同。一些模型,例如 [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) 就可以像单语言模型一样使用。本指南将向您展示如何使用不同用途的多语言模型进行推理。
+🤗 Transformers 中有多种多语言模型,它们的推理用法与单语言模型不同。但是,并非*所有*的多语言模型用法都不同。一些模型,例如 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased) 就可以像单语言模型一样使用。本指南将向您展示如何使用不同用途的多语言模型进行推理。
## XLM
@@ -28,24 +28,24 @@ XLM 有十个不同的检查点,其中只有一个是单语言的。剩下的
以下 XLM 模型使用语言嵌入来指定推理中使用的语言:
-- `xlm-mlm-ende-1024` (掩码语言建模,英语-德语)
-- `xlm-mlm-enfr-1024` (掩码语言建模,英语-法语)
-- `xlm-mlm-enro-1024` (掩码语言建模,英语-罗马尼亚语)
-- `xlm-mlm-xnli15-1024` (掩码语言建模,XNLI 数据集语言)
-- `xlm-mlm-tlm-xnli15-1024` (掩码语言建模+翻译,XNLI 数据集语言)
-- `xlm-clm-enfr-1024` (因果语言建模,英语-法语)
-- `xlm-clm-ende-1024` (因果语言建模,英语-德语)
+- `FacebookAI/xlm-mlm-ende-1024` (掩码语言建模,英语-德语)
+- `FacebookAI/xlm-mlm-enfr-1024` (掩码语言建模,英语-法语)
+- `FacebookAI/xlm-mlm-enro-1024` (掩码语言建模,英语-罗马尼亚语)
+- `FacebookAI/xlm-mlm-xnli15-1024` (掩码语言建模,XNLI 数据集语言)
+- `FacebookAI/xlm-mlm-tlm-xnli15-1024` (掩码语言建模+翻译,XNLI 数据集语言)
+- `FacebookAI/xlm-clm-enfr-1024` (因果语言建模,英语-法语)
+- `FacebookAI/xlm-clm-ende-1024` (因果语言建模,英语-德语)
语言嵌入被表示一个张量,其形状与传递给模型的 `input_ids` 相同。这些张量中的值取决于所使用的语言,并由分词器的 `lang2id` 和 `id2lang` 属性识别。
-在此示例中,加载 `xlm-clm-enfr-1024` 检查点(因果语言建模,英语-法语):
+在此示例中,加载 `FacebookAI/xlm-clm-enfr-1024` 检查点(因果语言建模,英语-法语):
```py
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel
->>> tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
->>> model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")
+>>> tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
+>>> model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
```
分词器的 `lang2id` 属性显示了该模型的语言及其对应的id:
@@ -83,8 +83,8 @@ XLM 有十个不同的检查点,其中只有一个是单语言的。剩下的
以下 XLM 模型在推理时不需要语言嵌入:
-- `xlm-mlm-17-1280` (掩码语言建模,支持 17 种语言)
-- `xlm-mlm-100-1280` (掩码语言建模,支持 100 种语言)
+- `FacebookAI/xlm-mlm-17-1280` (掩码语言建模,支持 17 种语言)
+- `FacebookAI/xlm-mlm-100-1280` (掩码语言建模,支持 100 种语言)
与之前的 XLM 检查点不同,这些模型用于通用句子表示。
@@ -92,8 +92,8 @@ XLM 有十个不同的检查点,其中只有一个是单语言的。剩下的
以下 BERT 模型可用于多语言任务:
-- `bert-base-multilingual-uncased` (掩码语言建模 + 下一句预测,支持 102 种语言)
-- `bert-base-multilingual-cased` (掩码语言建模 + 下一句预测,支持 104 种语言)
+- `google-bert/bert-base-multilingual-uncased` (掩码语言建模 + 下一句预测,支持 102 种语言)
+- `google-bert/bert-base-multilingual-cased` (掩码语言建模 + 下一句预测,支持 104 种语言)
这些模型在推理时不需要语言嵌入。它们应该能够从上下文中识别语言并进行相应的推理。
@@ -101,8 +101,8 @@ XLM 有十个不同的检查点,其中只有一个是单语言的。剩下的
以下 XLM-RoBERTa 模型可用于多语言任务:
-- `xlm-roberta-base` (掩码语言建模,支持 100 种语言)
-- `xlm-roberta-large` (掩码语言建模,支持 100 种语言)
+- `FacebookAI/xlm-roberta-base` (掩码语言建模,支持 100 种语言)
+- `FacebookAI/xlm-roberta-large` (掩码语言建模,支持 100 种语言)
XLM-RoBERTa 使用 100 种语言的 2.5TB 新创建和清理的 CommonCrawl 数据进行了训练。与之前发布的 mBERT 或 XLM 等多语言模型相比,它在分类、序列标记和问答等下游任务上提供了更强大的优势。
diff --git a/docs/source/zh/perf_hardware.md b/docs/source/zh/perf_hardware.md
index e193e09cd8c..95a09eaab4e 100644
--- a/docs/source/zh/perf_hardware.md
+++ b/docs/source/zh/perf_hardware.md
@@ -136,7 +136,7 @@ GPU1 PHB X 0-11 N/A
# DDP w/ NVLink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
---nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
+--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
@@ -145,7 +145,7 @@ rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
# DDP w/o NVLink
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 torchrun \
---nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 \
+--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
diff --git a/docs/source/zh/pipeline_tutorial.md b/docs/source/zh/pipeline_tutorial.md
index 01e621840cd..568f8bb6360 100644
--- a/docs/source/zh/pipeline_tutorial.md
+++ b/docs/source/zh/pipeline_tutorial.md
@@ -175,7 +175,7 @@ def data():
yield f"My example {i}"
-pipe = pipeline(model="gpt2", device=0)
+pipe = pipeline(model="openai-community/gpt2", device=0)
generated_characters = 0
for out in pipe(data()):
generated_characters += len(out[0]["generated_text"])
diff --git a/docs/source/zh/preprocessing.md b/docs/source/zh/preprocessing.md
index 266cf0e6b9e..b90c89b36d1 100644
--- a/docs/source/zh/preprocessing.md
+++ b/docs/source/zh/preprocessing.md
@@ -56,7 +56,7 @@ pip install datasets
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
```
然后将您的文本传递给`tokenizer`:
diff --git a/docs/source/zh/quicktour.md b/docs/source/zh/quicktour.md
index 75b5f398e94..c23a38ab5f0 100644
--- a/docs/source/zh/quicktour.md
+++ b/docs/source/zh/quicktour.md
@@ -73,7 +73,7 @@ pip install tensorflow
>>> classifier = pipeline("sentiment-analysis")
```
-[`pipeline`] 会下载并缓存一个用于情感分析的默认的[预训练模型](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)和分词器。现在你可以在目标文本上使用 `classifier` 了:
+[`pipeline`] 会下载并缓存一个用于情感分析的默认的[预训练模型](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english)和分词器。现在你可以在目标文本上使用 `classifier` 了:
```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
@@ -379,7 +379,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoConfig
->>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
+>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
@@ -416,7 +416,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoModelForSequenceClassification
- >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. [`TrainingArguments`] 含有你可以修改的模型超参数,比如学习率,批次大小和训练时的迭代次数。如果你没有指定训练参数,那么它会使用默认值:
@@ -438,7 +438,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
4. 加载一个数据集:
@@ -506,7 +506,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import TFAutoModelForSequenceClassification
- >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
+ >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
2. 一个预处理类,比如分词器,特征提取器或者处理器:
@@ -514,7 +514,7 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> from transformers import AutoTokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
3. 创建一个给数据集分词的函数
diff --git a/docs/source/zh/run_scripts.md b/docs/source/zh/run_scripts.md
index 0a0121c32f0..b6e9c8ea6a2 100644
--- a/docs/source/zh/run_scripts.md
+++ b/docs/source/zh/run_scripts.md
@@ -88,11 +88,11 @@ pip install -r requirements.txt
-示例脚本从🤗 [Datasets](https://huggingface.co/docs/datasets/)库下载并预处理数据集。然后,脚本通过[Trainer](https://huggingface.co/docs/transformers/main_classes/trainer)使用支持摘要任务的架构对数据集进行微调。以下示例展示了如何在[CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail)数据集上微调[T5-small](https://huggingface.co/t5-small)。由于T5模型的训练方式,它需要一个额外的`source_prefix`参数。这个提示让T5知道这是一个摘要任务。
+示例脚本从🤗 [Datasets](https://huggingface.co/docs/datasets/)库下载并预处理数据集。然后,脚本通过[Trainer](https://huggingface.co/docs/transformers/main_classes/trainer)使用支持摘要任务的架构对数据集进行微调。以下示例展示了如何在[CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail)数据集上微调[T5-small](https://huggingface.co/google-t5/t5-small)。由于T5模型的训练方式,它需要一个额外的`source_prefix`参数。这个提示让T5知道这是一个摘要任务。
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -107,11 +107,11 @@ python examples/pytorch/summarization/run_summarization.py \
-示例脚本从 🤗 [Datasets](https://huggingface.co/docs/datasets/) 库下载并预处理数据集。然后,脚本使用 Keras 在支持摘要的架构上微调数据集。以下示例展示了如何在 [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) 数据集上微调 [T5-small](https://huggingface.co/t5-small)。T5 模型由于训练方式需要额外的 `source_prefix` 参数。这个提示让 T5 知道这是一个摘要任务。
+示例脚本从 🤗 [Datasets](https://huggingface.co/docs/datasets/) 库下载并预处理数据集。然后,脚本使用 Keras 在支持摘要的架构上微调数据集。以下示例展示了如何在 [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) 数据集上微调 [T5-small](https://huggingface.co/google-t5/t5-small)。T5 模型由于训练方式需要额外的 `source_prefix` 参数。这个提示让 T5 知道这是一个摘要任务。
```bash
python examples/tensorflow/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -136,7 +136,7 @@ python examples/tensorflow/summarization/run_summarization.py \
torchrun \
--nproc_per_node 8 pytorch/summarization/run_summarization.py \
--fp16 \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -161,7 +161,7 @@ TensorFlow脚本使用[`MirroredStrategy`](https://www.tensorflow.org/guide/dist
```bash
python xla_spawn.py --num_cores 8 \
summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -181,7 +181,7 @@ python xla_spawn.py --num_cores 8 \
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
@@ -219,7 +219,7 @@ accelerate test
```bash
accelerate launch run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
@@ -238,7 +238,7 @@ accelerate launch run_summarization_no_trainer.py \
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--train_file path_to_csv_or_jsonlines_file \
@@ -264,7 +264,7 @@ python examples/pytorch/summarization/run_summarization.py \
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--max_train_samples 50 \
--max_eval_samples 50 \
--max_predict_samples 50 \
@@ -294,7 +294,7 @@ examples/pytorch/summarization/run_summarization.py -h
```bash
python examples/pytorch/summarization/run_summarization.py
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -312,7 +312,7 @@ python examples/pytorch/summarization/run_summarization.py
```bash
python examples/pytorch/summarization/run_summarization.py
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -343,7 +343,7 @@ huggingface-cli login
```bash
python examples/pytorch/summarization/run_summarization.py
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
diff --git a/docs/source/zh/serialization.md b/docs/source/zh/serialization.md
index 584befebe2d..b9cc74e5849 100644
--- a/docs/source/zh/serialization.md
+++ b/docs/source/zh/serialization.md
@@ -56,10 +56,10 @@ pip install optimum[exporters]
optimum-cli export onnx --help
```
-运行以下命令,以从 🤗 Hub 导出模型的检查点(checkpoint),以 `distilbert-base-uncased-distilled-squad` 为例:
+运行以下命令,以从 🤗 Hub 导出模型的检查点(checkpoint),以 `distilbert/distilbert-base-uncased-distilled-squad` 为例:
```bash
-optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
+optimum-cli export onnx --model distilbert/distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
```
你应该能在日志中看到导出进度以及生成的 `model.onnx` 文件的保存位置,如下所示:
@@ -141,7 +141,7 @@ pip install transformers[onnx]
将 `transformers.onnx` 包作为 Python 模块使用,以使用现成的配置导出检查点:
```bash
-python -m transformers.onnx --model=distilbert-base-uncased onnx/
+python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
以上代码将导出由 `--model` 参数定义的检查点的 ONNX 图。传入任何 🤗 Hub 上或者存储与本地的检查点。生成的 `model.onnx` 文件可以在支持 ONNX 标准的众多加速引擎上运行。例如,使用 ONNX Runtime 加载并运行模型,如下所示:
@@ -150,7 +150,7 @@ python -m transformers.onnx --model=distilbert-base-uncased onnx/
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
diff --git a/docs/source/zh/task_summary.md b/docs/source/zh/task_summary.md
index da60f4a080a..8d088bfa71b 100644
--- a/docs/source/zh/task_summary.md
+++ b/docs/source/zh/task_summary.md
@@ -272,7 +272,7 @@ score: 0.9327, start: 30, end: 54, answer: huggingface/transformers
>>> from transformers import pipeline
>>> text = "translate English to French: Hugging Face is a community-based open-source platform for machine learning."
->>> translator = pipeline(task="translation", model="t5-small")
+>>> translator = pipeline(task="translation", model="google-t5/t5-small")
>>> translator(text)
[{'translation_text': "Hugging Face est une tribune communautaire de l'apprentissage des machines."}]
```
diff --git a/docs/source/zh/tf_xla.md b/docs/source/zh/tf_xla.md
index da8d13d8d04..2e5b444d876 100644
--- a/docs/source/zh/tf_xla.md
+++ b/docs/source/zh/tf_xla.md
@@ -86,8 +86,8 @@ from transformers.utils import check_min_version
check_min_version("4.21.0")
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
input_string = ["TensorFlow is"]
# One line to create an XLA generation function
@@ -115,8 +115,8 @@ print(f"Generated -- {decoded_text}")
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
input_string = ["TensorFlow is"]
xla_generate = tf.function(model.generate, jit_compile=True)
@@ -136,8 +136,8 @@ import time
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="")
-model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2", padding_side="left", pad_token="")
+model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
xla_generate = tf.function(model.generate, jit_compile=True)
diff --git a/docs/source/zh/tflite.md b/docs/source/zh/tflite.md
index bf47d411447..f0280156def 100644
--- a/docs/source/zh/tflite.md
+++ b/docs/source/zh/tflite.md
@@ -32,10 +32,10 @@ pip install optimum[exporters-tf]
optimum-cli export tflite --help
```
-Run the following command to export a model checkpoint from the 🤗 Hub, using `bert-base-uncased` as an example:
+Run the following command to export a model checkpoint from the 🤗 Hub, using `google-bert/bert-base-uncased` as an example:
```bash
-optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/
+optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```
You should see the export progress in the logs, along with the location where the resulting `model.tflite` file is saved, as shown below:
diff --git a/docs/source/zh/tokenizer_summary.md b/docs/source/zh/tokenizer_summary.md
index d3a4cf7a330..c349154f961 100644
--- a/docs/source/zh/tokenizer_summary.md
+++ b/docs/source/zh/tokenizer_summary.md
@@ -92,7 +92,7 @@ and [SentencePiece](#sentencepiece),并且给出了示例,哪个模型用到
```py
>>> from transformers import BertTokenizer
->>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> tokenizer.tokenize("I have a new GPU!")
["i", "have", "a", "new", "gp", "##u", "!"]
```
@@ -106,7 +106,7 @@ token应该附着在前面那个token的后面,不带空格的附着(分词
```py
>>> from transformers import XLNetTokenizer
->>> tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
+>>> tokenizer = XLNetTokenizer.from_pretrained("xlnet/xlnet-base-cased")
>>> tokenizer.tokenize("Don't you love 🤗 Transformers? We sure do.")
["▁Don", "'", "t", "▁you", "▁love", "▁", "🤗", "▁", "Transform", "ers", "?", "▁We", "▁sure", "▁do", "."]
```
diff --git a/docs/source/zh/training.md b/docs/source/zh/training.md
index 89908130fe3..773c58181c3 100644
--- a/docs/source/zh/training.md
+++ b/docs/source/zh/training.md
@@ -48,7 +48,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> def tokenize_function(examples):
@@ -85,7 +85,7 @@ rendered properly in your Markdown viewer.
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
@@ -180,7 +180,7 @@ dataset = dataset["train"] # Just take the training split for now
```py
from transformers import AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
@@ -194,7 +194,7 @@ from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# Load and compile our model
-model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
+model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
model.compile(optimizer=Adam(3e-5)) # No loss argument!
@@ -306,7 +306,7 @@ torch.cuda.empty_cache()
```py
>>> from transformers import AutoModelForSequenceClassification
->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
+>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
### Optimizer and learning rate scheduler
diff --git a/examples/README.md b/examples/README.md
index 3a18950064b..a38b4576b35 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -118,8 +118,8 @@ pip install runhouse
# For an on-demand V100 with whichever cloud provider you have configured:
python run_on_remote.py \
--example pytorch/text-generation/run_generation.py \
- --model_type=gpt2 \
- --model_name_or_path=gpt2 \
+ --model_type=gpt2 \
+ --model_name_or_path=openai-community/gpt2 \
--prompt "I am a language model and"
# For byo (bring your own) cluster:
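Review note on the hunk above: in these example scripts, `--model_type` appears to name an architecture (such as `gpt2`), while `--model_name_or_path` names a Hub repository, so only the latter should pick up the organization prefix. The sketch below illustrates that distinction under stated assumptions: `LEGACY_TO_NAMESPACED`, `REPO_ID_FLAGS`, `namespaced_id`, and `rewrite_args` are hypothetical names invented for illustration and are not part of transformers, and the mapping lists only a few of the renames that appear in this diff.

```python
# Hypothetical helper, not part of transformers: rewrites legacy bare
# checkpoint names to the namespaced Hub ids used in this diff, but only
# for flags that take a repository id.

# A partial mapping, taken from renames that appear in this diff.
LEGACY_TO_NAMESPACED = {
    "gpt2": "openai-community/gpt2",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "bert-base-cased": "google-bert/bert-base-cased",
    "roberta-base": "FacebookAI/roberta-base",
    "t5-small": "google-t5/t5-small",
    "distilbert-base-uncased": "distilbert/distilbert-base-uncased",
    "xlnet-base-cased": "xlnet/xlnet-base-cased",
}

# Flags whose value is a Hub repository id (assumption: this short list is
# enough for the illustration). `--model_type` is deliberately absent.
REPO_ID_FLAGS = {
    "--model_name_or_path",
    "--tokenizer_name",
    "--decoder_model_name_or_path",
}


def namespaced_id(name):
    """Return the namespaced id for a legacy name; pass anything else through."""
    if "/" in name:
        return name  # already namespaced
    return LEGACY_TO_NAMESPACED.get(name, name)


def rewrite_args(argv):
    """Rewrite repo-id flag values in argv, leaving every other argument as-is."""
    out = []
    expect_value = False  # True when the previous item was a bare repo-id flag
    for arg in argv:
        if expect_value:
            out.append(namespaced_id(arg))
            expect_value = False
            continue
        flag, sep, value = arg.partition("=")
        if flag in REPO_ID_FLAGS:
            if sep:  # "--flag=value" form
                out.append(f"{flag}={namespaced_id(value)}")
            else:  # "--flag value" form; the value is the next item
                out.append(arg)
                expect_value = True
        else:
            out.append(arg)
    return out
```

Applied to the arguments of the `run_on_remote.py` example, this would leave `--model_type=gpt2` alone and rewrite only `--model_name_or_path=gpt2`.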
diff --git a/examples/flax/image-captioning/README.md b/examples/flax/image-captioning/README.md
index b76dc4cd057..dd2b4206392 100644
--- a/examples/flax/image-captioning/README.md
+++ b/examples/flax/image-captioning/README.md
@@ -34,7 +34,7 @@ Next, we create a [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/tr
python3 create_model_from_encoder_decoder_models.py \
--output_dir model \
--encoder_model_name_or_path google/vit-base-patch16-224-in21k \
- --decoder_model_name_or_path gpt2
+ --decoder_model_name_or_path openai-community/gpt2
```
### Train the model
diff --git a/examples/flax/language-modeling/README.md b/examples/flax/language-modeling/README.md
index e687c76a9cc..cb8671147ff 100644
--- a/examples/flax/language-modeling/README.md
+++ b/examples/flax/language-modeling/README.md
@@ -28,7 +28,7 @@ way which enables simple and efficient model parallelism.
In the following, we demonstrate how to train a bi-directional transformer model
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
More specifically, we demonstrate how JAX/Flax can be leveraged
-to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
+to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in Norwegian on a single TPUv3-8 pod.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -76,13 +76,13 @@ tokenizer.save("./norwegian-roberta-base/tokenizer.json")
### Create configuration
Next, we create the model's configuration file. This is as simple
-as loading and storing [`**roberta-base**`](https://huggingface.co/roberta-base)
+as loading and storing [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in the local model folder:
```python
from transformers import RobertaConfig
-config = RobertaConfig.from_pretrained("roberta-base", vocab_size=50265)
+config = RobertaConfig.from_pretrained("FacebookAI/roberta-base", vocab_size=50265)
config.save_pretrained("./norwegian-roberta-base")
```
@@ -129,8 +129,8 @@ look at [this](https://colab.research.google.com/github/huggingface/notebooks/bl
In the following, we demonstrate how to train an auto-regressive causal transformer model
in JAX/Flax.
-More specifically, we pretrain a randomly initialized [**`gpt2`**](https://huggingface.co/gpt2) model in Norwegian on a single TPUv3-8.
-to pre-train 124M [**`gpt2`**](https://huggingface.co/gpt2)
+More specifically, we pretrain a randomly initialized 124M-parameter
+[**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2) model
in Norwegian on a single TPUv3-8 pod.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -179,13 +179,13 @@ tokenizer.save("./norwegian-gpt2/tokenizer.json")
### Create configuration
Next, we create the model's configuration file. This is as simple
-as loading and storing [`**gpt2**`](https://huggingface.co/gpt2)
+as loading and storing [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2)
in the local model folder:
```python
from transformers import GPT2Config
-config = GPT2Config.from_pretrained("gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
+config = GPT2Config.from_pretrained("openai-community/gpt2", resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0, vocab_size=50257)
config.save_pretrained("./norwegian-gpt2")
```
@@ -199,7 +199,7 @@ Finally, we can run the example script to pretrain the model:
```bash
python run_clm_flax.py \
--output_dir="./norwegian-gpt2" \
- --model_type="gpt2" \
+ --model_type="gpt2" \
--config_name="./norwegian-gpt2" \
--tokenizer_name="./norwegian-gpt2" \
--dataset_name="oscar" \
diff --git a/examples/flax/question-answering/README.md b/examples/flax/question-answering/README.md
index 822342a99e2..2f6caa984d4 100644
--- a/examples/flax/question-answering/README.md
+++ b/examples/flax/question-answering/README.md
@@ -29,7 +29,7 @@ The following example fine-tunes BERT on SQuAD:
```bash
python run_qa.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -67,7 +67,7 @@ Here is an example training on 4 TITAN RTX GPUs and Bert Whole Word Masking unca
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python run_qa.py \
---model_name_or_path bert-large-uncased-whole-word-masking \
+--model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
diff --git a/examples/flax/test_flax_examples.py b/examples/flax/test_flax_examples.py
index 47ac66de118..9fc424c1a75 100644
--- a/examples/flax/test_flax_examples.py
+++ b/examples/flax/test_flax_examples.py
@@ -78,7 +78,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_glue.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
--validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
@@ -101,7 +101,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm_flax.py
- --model_name_or_path distilgpt2
+ --model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -125,7 +125,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--test_file tests/fixtures/tests_samples/xsum/sample.json
@@ -155,7 +155,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
- --model_name_or_path distilroberta-base
+ --model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -179,7 +179,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_t5_mlm_flax.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -206,7 +206,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_flax_ner.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -233,7 +233,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
diff --git a/examples/flax/text-classification/README.md b/examples/flax/text-classification/README.md
index 8d43ab7725a..65e50a075b7 100644
--- a/examples/flax/text-classification/README.md
+++ b/examples/flax/text-classification/README.md
@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
export TASK_NAME=mrpc
python run_flax_glue.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--task_name ${TASK_NAME} \
--max_seq_length 128 \
--learning_rate 2e-5 \
diff --git a/examples/flax/token-classification/README.md b/examples/flax/token-classification/README.md
index 915cf6ae20f..1f817507214 100644
--- a/examples/flax/token-classification/README.md
+++ b/examples/flax/token-classification/README.md
@@ -25,7 +25,7 @@ The following example fine-tunes BERT on CoNLL-2003:
```bash
python run_flax_ner.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--max_seq_length 128 \
--learning_rate 2e-5 \
diff --git a/examples/legacy/benchmarking/README.md b/examples/legacy/benchmarking/README.md
index 7099ed9f6b3..03e174770d1 100644
--- a/examples/legacy/benchmarking/README.md
+++ b/examples/legacy/benchmarking/README.md
@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode
| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
-| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
-| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
+| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
+| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
diff --git a/examples/legacy/question-answering/README.md b/examples/legacy/question-answering/README.md
index 905fabf35bd..339837c94f5 100644
--- a/examples/legacy/question-answering/README.md
+++ b/examples/legacy/question-answering/README.md
@@ -1,7 +1,7 @@
#### Fine-tuning BERT on SQuAD1.0 with relative position embeddings
The following examples show how to fine-tune BERT models with different relative position embeddings. The BERT model
-`bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
+`google-bert/bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as in the BERT model
training, but with different relative position embeddings.
@@ -10,7 +10,7 @@ Shaw et al., [Self-Attention with Relative Position Representations](https://arx
* `zhiheng-huang/bert-base-uncased-embedding-relative-key-query`, trained from scratch with relative embedding method 4
in Huang et al. [Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
* `zhiheng-huang/bert-large-uncased-whole-word-masking-embedding-relative-key-query`, fine-tuned from model
-`bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
+`google-bert/bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
[Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
@@ -61,7 +61,7 @@ torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
--gradient_accumulation_steps 3
```
Training with the above command leads to an F1 score of 93.52, which is slightly better than the F1 score of 93.15 for
-`bert-large-uncased-whole-word-masking`.
+`google-bert/bert-large-uncased-whole-word-masking`.
#### Distributed training
@@ -69,7 +69,7 @@ Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word
```bash
torchrun --nproc_per_node=8 ./examples/question-answering/run_squad.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
@@ -90,7 +90,7 @@ exact_match = 86.91
```
This fine-tuned model is available as a checkpoint under the reference
-[`bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
+[`google-bert/bert-large-uncased-whole-word-masking-finetuned-squad`](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
## Results
diff --git a/examples/legacy/run_camembert.py b/examples/legacy/run_camembert.py
index 9651570b39e..67e04babe10 100755
--- a/examples/legacy/run_camembert.py
+++ b/examples/legacy/run_camembert.py
@@ -39,8 +39,8 @@ def fill_mask(masked_input, model, tokenizer, topk=5):
return topk_filled_outputs
-tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
-model = CamembertForMaskedLM.from_pretrained("camembert-base")
+tokenizer = CamembertTokenizer.from_pretrained("almanach/camembert-base")
+model = CamembertForMaskedLM.from_pretrained("almanach/camembert-base")
model.eval()
masked_input = "Le camembert est :)"
diff --git a/examples/legacy/run_openai_gpt.py b/examples/legacy/run_openai_gpt.py
index 03031f20576..d0c21aba27e 100755
--- a/examples/legacy/run_openai_gpt.py
+++ b/examples/legacy/run_openai_gpt.py
@@ -20,7 +20,7 @@
This script with default values fine-tunes and evaluates a pretrained OpenAI GPT on the RocStories dataset:
python run_openai_gpt.py \
- --model_name openai-gpt \
+ --model_name openai-community/openai-gpt \
--do_train \
--do_eval \
--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
@@ -104,7 +104,7 @@ def pre_process_datasets(encoded_datasets, input_len, cap_length, start_token, d
def main():
parser = argparse.ArgumentParser()
- parser.add_argument("--model_name", type=str, default="openai-gpt", help="pretrained model name")
+ parser.add_argument("--model_name", type=str, default="openai-community/openai-gpt", help="pretrained model name")
parser.add_argument("--do_train", action="store_true", help="Whether to run training.")
parser.add_argument("--do_eval", action="store_true", help="Whether to run eval on the dev set.")
parser.add_argument(
diff --git a/examples/legacy/run_transfo_xl.py b/examples/legacy/run_transfo_xl.py
index 7ee94115085..1c48974f39c 100755
--- a/examples/legacy/run_transfo_xl.py
+++ b/examples/legacy/run_transfo_xl.py
@@ -40,7 +40,7 @@ logger = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description="PyTorch Transformer Language Model")
- parser.add_argument("--model_name", type=str, default="transfo-xl-wt103", help="pretrained model name")
+ parser.add_argument("--model_name", type=str, default="transfo-xl/transfo-xl-wt103", help="pretrained model name")
parser.add_argument(
"--split", type=str, default="test", choices=["all", "valid", "test"], help="which split to evaluate"
)
diff --git a/examples/legacy/seq2seq/README.md b/examples/legacy/seq2seq/README.md
index e6e3e20dcf8..f574ccabda2 100644
--- a/examples/legacy/seq2seq/README.md
+++ b/examples/legacy/seq2seq/README.md
@@ -170,7 +170,7 @@ If 'translation' is in your task name, the computed metric will be BLEU. Otherwi
For t5, you need to specify --task translation_{src}_to_{tgt} as follows:
```bash
export DATA_DIR=wmt_en_ro
-./run_eval.py t5-base \
+./run_eval.py google-t5/t5-base \
$DATA_DIR/val.source t5_val_generations.txt \
--reference_path $DATA_DIR/val.target \
--score_path enro_bleu.json \
diff --git a/examples/legacy/seq2seq/old_test_datasets.py b/examples/legacy/seq2seq/old_test_datasets.py
index 0b907b1ed9f..be108f7645f 100644
--- a/examples/legacy/seq2seq/old_test_datasets.py
+++ b/examples/legacy/seq2seq/old_test_datasets.py
@@ -28,7 +28,7 @@ from transformers.testing_utils import TestCasePlus, slow
from utils import FAIRSEQ_AVAILABLE, DistributedSortishSampler, LegacySeq2SeqDataset, Seq2SeqDataset
-BERT_BASE_CASED = "bert-base-cased"
+BERT_BASE_CASED = "google-bert/bert-base-cased"
PEGASUS_XSUM = "google/pegasus-xsum"
ARTICLES = [" Sam ate lunch today.", "Sams lunch ingredients."]
SUMMARIES = ["A very interesting story about what I ate for lunch.", "Avocado, celery, turkey, coffee"]
diff --git a/examples/legacy/seq2seq/pack_dataset.py b/examples/legacy/seq2seq/pack_dataset.py
index 8b069e452a7..5c13c74f412 100755
--- a/examples/legacy/seq2seq/pack_dataset.py
+++ b/examples/legacy/seq2seq/pack_dataset.py
@@ -74,7 +74,7 @@ def pack_data_dir(tok, data_dir: Path, max_tokens, save_path):
def packer_cli():
parser = argparse.ArgumentParser()
- parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
+ parser.add_argument("--tok_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("--max_seq_len", type=int, default=128)
parser.add_argument("--data_dir", type=str)
parser.add_argument("--save_path", type=str)
diff --git a/examples/legacy/seq2seq/run_distributed_eval.py b/examples/legacy/seq2seq/run_distributed_eval.py
index 4e828372775..40a946f81c5 100755
--- a/examples/legacy/seq2seq/run_distributed_eval.py
+++ b/examples/legacy/seq2seq/run_distributed_eval.py
@@ -124,7 +124,7 @@ def run_generate():
parser.add_argument(
"--model_name",
type=str,
- help="like facebook/bart-large-cnn,t5-base, etc.",
+ help="like facebook/bart-large-cnn,google-t5/t5-base, etc.",
default="sshleifer/distilbart-xsum-12-3",
)
parser.add_argument("--save_dir", type=str, help="where to save", default="tmp_gen")
diff --git a/examples/legacy/seq2seq/run_eval.py b/examples/legacy/seq2seq/run_eval.py
index cc9ceae6f83..f69e5d51264 100755
--- a/examples/legacy/seq2seq/run_eval.py
+++ b/examples/legacy/seq2seq/run_eval.py
@@ -100,7 +100,7 @@ def run_generate(verbose=True):
"""
parser = argparse.ArgumentParser()
- parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,t5-base, etc.")
+ parser.add_argument("model_name", type=str, help="like facebook/bart-large-cnn,google-t5/t5-base, etc.")
parser.add_argument("input_path", type=str, help="like cnn_dm/test.source")
parser.add_argument("save_path", type=str, help="where to save summaries")
parser.add_argument("--reference_path", type=str, required=False, help="like cnn_dm/test.target")
diff --git a/examples/legacy/token-classification/README.md b/examples/legacy/token-classification/README.md
index c2fa6eec728..fbf17f84d2d 100644
--- a/examples/legacy/token-classification/README.md
+++ b/examples/legacy/token-classification/README.md
@@ -34,7 +34,7 @@ Let's define some variables that we need for further pre-processing steps and tr
```bash
export MAX_LENGTH=128
-export BERT_MODEL=bert-base-multilingual-cased
+export BERT_MODEL=google-bert/bert-base-multilingual-cased
```
Run the pre-processing script on training, dev and test datasets:
@@ -92,7 +92,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
{
"data_dir": ".",
"labels": "./labels.txt",
- "model_name_or_path": "bert-base-multilingual-cased",
+ "model_name_or_path": "google-bert/bert-base-multilingual-cased",
"output_dir": "germeval-model",
"max_seq_length": 128,
"num_train_epochs": 3,
@@ -222,7 +222,7 @@ Let's define some variables that we need for further pre-processing steps:
```bash
export MAX_LENGTH=128
-export BERT_MODEL=bert-large-cased
+export BERT_MODEL=google-bert/bert-large-cased
```
Here we use the English BERT large model for fine-tuning.
@@ -250,7 +250,7 @@ This configuration file looks like:
{
"data_dir": "./data_wnut_17",
"labels": "./data_wnut_17/labels.txt",
- "model_name_or_path": "bert-large-cased",
+ "model_name_or_path": "google-bert/bert-large-cased",
"output_dir": "wnut-17-model-1",
"max_seq_length": 128,
"num_train_epochs": 3,
diff --git a/examples/legacy/token-classification/utils_ner.py b/examples/legacy/token-classification/utils_ner.py
index 2b54c7c4a49..e7e3a157e30 100644
--- a/examples/legacy/token-classification/utils_ner.py
+++ b/examples/legacy/token-classification/utils_ner.py
@@ -113,7 +113,7 @@ class TokenClassificationTask:
for word, label in zip(example.words, example.labels):
word_tokens = tokenizer.tokenize(word)
- # bert-base-multilingual-cased sometimes output "nothing ([]) when calling tokenize with just a space.
+ # google-bert/bert-base-multilingual-cased sometimes outputs nothing ([]) when calling tokenize with just a space.
if len(word_tokens) > 0:
tokens.extend(word_tokens)
# Use the real label id for the first token of the word, and padding ids for the remaining tokens
diff --git a/examples/pytorch/README.md b/examples/pytorch/README.md
index be3c9c52a07..63a56a06e8d 100644
--- a/examples/pytorch/README.md
+++ b/examples/pytorch/README.md
@@ -109,7 +109,7 @@ classification MNLI task using the `run_glue` script, with 8 GPUs:
```bash
torchrun \
--nproc_per_node 8 pytorch/text-classification/run_glue.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--task_name mnli \
--do_train \
--do_eval \
@@ -153,7 +153,7 @@ classification MNLI task using the `run_glue` script, with 8 TPUs (from this fol
```bash
python xla_spawn.py --num_cores 8 \
text-classification/run_glue.py \
- --model_name_or_path bert-large-uncased-whole-word-masking \
+ --model_name_or_path google-bert/bert-large-uncased-whole-word-masking \
--task_name mnli \
--do_train \
--do_eval \
diff --git a/examples/pytorch/contrastive-image-text/README.md b/examples/pytorch/contrastive-image-text/README.md
index f22f2c82dce..c39f17a138a 100644
--- a/examples/pytorch/contrastive-image-text/README.md
+++ b/examples/pytorch/contrastive-image-text/README.md
@@ -64,10 +64,10 @@ from transformers import (
)
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
- "openai/clip-vit-base-patch32", "roberta-base"
+ "openai/clip-vit-base-patch32", "FacebookAI/roberta-base"
)
-tokenizer = AutoTokenizer.from_pretrained("roberta-base")
+tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
image_processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)
diff --git a/examples/pytorch/language-modeling/README.md b/examples/pytorch/language-modeling/README.md
index 3069fe9eb97..23c0bc2c79a 100644
--- a/examples/pytorch/language-modeling/README.md
+++ b/examples/pytorch/language-modeling/README.md
@@ -36,7 +36,7 @@ the tokenization). The loss here is that of causal language modeling.
```bash
python run_clm.py \
- --model_name_or_path gpt2 \
+ --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -53,7 +53,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_clm.py \
- --model_name_or_path gpt2 \
+ --model_name_or_path openai-community/gpt2 \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -69,7 +69,7 @@ This uses the built in HuggingFace `Trainer` for training. If you want to use a
python run_clm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
- --model_name_or_path gpt2 \
+ --model_name_or_path openai-community/gpt2 \
--output_dir /tmp/test-clm
```
@@ -84,7 +84,7 @@ converge slightly slower (over-fitting takes more epochs).
```bash
python run_mlm.py \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -98,7 +98,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_mlm.py \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -117,7 +117,7 @@ This uses the built in HuggingFace `Trainer` for training. If you want to use a
python run_mlm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--output_dir /tmp/test-mlm
```
@@ -144,7 +144,7 @@ Here is how to fine-tune XLNet on wikitext-2:
```bash
python run_plm.py \
- --model_name_or_path=xlnet-base-cased \
+ --model_name_or_path=xlnet/xlnet-base-cased \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
@@ -158,7 +158,7 @@ To fine-tune it on your own training and validation file, run:
```bash
python run_plm.py \
- --model_name_or_path=xlnet-base-cased \
+ --model_name_or_path=xlnet/xlnet-base-cased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--per_device_train_batch_size 8 \
@@ -188,7 +188,7 @@ When training a model from scratch, configuration values may be overridden with
```bash
-python run_clm.py --model_type gpt2 --tokenizer_name gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
+python run_clm.py --model_type gpt2 --tokenizer_name openai-community/gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
[...]
```
diff --git a/examples/pytorch/multiple-choice/README.md b/examples/pytorch/multiple-choice/README.md
index 8d56ccfe3db..118234002c8 100644
--- a/examples/pytorch/multiple-choice/README.md
+++ b/examples/pytorch/multiple-choice/README.md
@@ -22,7 +22,7 @@ limitations under the License.
```bash
python examples/multiple-choice/run_swag.py \
---model_name_or_path roberta-base \
+--model_name_or_path FacebookAI/roberta-base \
--do_train \
--do_eval \
--learning_rate 5e-5 \
@@ -62,7 +62,7 @@ then
export DATASET_NAME=swag
python run_swag_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name $DATASET_NAME \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
@@ -89,7 +89,7 @@ that will check everything is ready for training. Finally, you can launch traini
export DATASET_NAME=swag
accelerate launch run_swag_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name $DATASET_NAME \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
diff --git a/examples/pytorch/old_test_xla_examples.py b/examples/pytorch/old_test_xla_examples.py
index 4a29ce3beea..2f24035d723 100644
--- a/examples/pytorch/old_test_xla_examples.py
+++ b/examples/pytorch/old_test_xla_examples.py
@@ -54,7 +54,7 @@ class TorchXLAExamplesTests(TestCasePlus):
./examples/pytorch/text-classification/run_glue.py
--num_cores=8
./examples/pytorch/text-classification/run_glue.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--overwrite_output_dir
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
diff --git a/examples/pytorch/question-answering/README.md b/examples/pytorch/question-answering/README.md
index 6b86a4effa9..9fac0b30385 100644
--- a/examples/pytorch/question-answering/README.md
+++ b/examples/pytorch/question-answering/README.md
@@ -40,7 +40,7 @@ on a single tesla V100 16GB.
```bash
python run_qa.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -67,7 +67,7 @@ The [`run_qa_beam_search.py`](https://github.com/huggingface/transformers/blob/m
```bash
python run_qa_beam_search.py \
- --model_name_or_path xlnet-large-cased \
+ --model_name_or_path xlnet/xlnet-large-cased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -87,7 +87,7 @@ python run_qa_beam_search.py \
export SQUAD_DIR=/path/to/SQUAD
python run_qa_beam_search.py \
- --model_name_or_path xlnet-large-cased \
+ --model_name_or_path xlnet/xlnet-large-cased \
--dataset_name squad_v2 \
--do_train \
--do_eval \
@@ -111,7 +111,7 @@ This example code fine-tunes T5 on the SQuAD2.0 dataset.
```bash
python run_seq2seq_qa.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name squad_v2 \
--context_column context \
--question_column question \
@@ -143,7 +143,7 @@ then
```bash
python run_qa_no_trainer.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \
@@ -166,7 +166,7 @@ that will check everything is ready for training. Finally, you can launch traini
```bash
accelerate launch run_qa_no_trainer.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \
diff --git a/examples/pytorch/summarization/README.md b/examples/pytorch/summarization/README.md
index 027119681de..93c0bbccef6 100644
--- a/examples/pytorch/summarization/README.md
+++ b/examples/pytorch/summarization/README.md
@@ -41,7 +41,7 @@ and you also will find examples of these below.
Here is an example on a summarization task:
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
@@ -54,9 +54,9 @@ python examples/pytorch/summarization/run_summarization.py \
--predict_with_generate
```
-Only T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "summarize: "`.
+Only T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "summarize: "`.
-We used CNN/DailyMail dataset in this example as `t5-small` was trained on it and one can get good scores even when pre-training with a very small sample.
+We used CNN/DailyMail dataset in this example as `google-t5/t5-small` was trained on it and one can get good scores even when pre-training with a very small sample.
Extreme Summarization (XSum) Dataset is another commonly used dataset for the task of summarization. To use it replace `--dataset_name cnn_dailymail --dataset_config "3.0.0"` with `--dataset_name xsum`.
@@ -65,7 +65,7 @@ And here is how you would use it on your own files, after adjusting the values f
```bash
python examples/pytorch/summarization/run_summarization.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--train_file path_to_csv_or_jsonlines_file \
@@ -156,7 +156,7 @@ then
```bash
python run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
@@ -179,7 +179,7 @@ that will check everything is ready for training. Finally, you can launch traini
```bash
accelerate launch run_summarization_no_trainer.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
diff --git a/examples/pytorch/summarization/run_summarization.py b/examples/pytorch/summarization/run_summarization.py
index 92f59cb2c80..793917264a7 100755
--- a/examples/pytorch/summarization/run_summarization.py
+++ b/examples/pytorch/summarization/run_summarization.py
@@ -368,11 +368,11 @@ def main():
logger.info(f"Training/evaluation parameters {training_args}")
if data_args.source_prefix is None and model_args.model_name_or_path in [
- "t5-small",
- "t5-base",
- "t5-large",
- "t5-3b",
- "t5-11b",
+ "google-t5/t5-small",
+ "google-t5/t5-base",
+ "google-t5/t5-large",
+ "google-t5/t5-3b",
+ "google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "
diff --git a/examples/pytorch/summarization/run_summarization_no_trainer.py b/examples/pytorch/summarization/run_summarization_no_trainer.py
index 5432e508d6f..1cd9f3865df 100644
--- a/examples/pytorch/summarization/run_summarization_no_trainer.py
+++ b/examples/pytorch/summarization/run_summarization_no_trainer.py
@@ -339,11 +339,11 @@ def main():
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, **accelerator_log_kwargs)
if args.source_prefix is None and args.model_name_or_path in [
- "t5-small",
- "t5-base",
- "t5-large",
- "t5-3b",
- "t5-11b",
+ "google-t5/t5-small",
+ "google-t5/t5-base",
+ "google-t5/t5-large",
+ "google-t5/t5-3b",
+ "google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "
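Review note: the membership check above now matches only the namespaced IDs, so a run that still passes the legacy short name `t5-small` would no longer trigger the warning. A minimal sketch of a check that accepts both forms follows. The helper `needs_t5_prefix` and the constant `T5_CHECKPOINTS` are hypothetical names introduced for illustration and are not part of the scripts. Note that a local directory whose last component is `t5-small` would also match.

```python
# Hypothetical helper, not part of run_summarization.py or its no_trainer variant.
# It treats the legacy short name and the namespaced Hub ID as equivalent.
T5_CHECKPOINTS = {"t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"}


def needs_t5_prefix(model_name_or_path, source_prefix):
    """Return True when a known T5 checkpoint is used without a source prefix."""
    # Keep only the last path component: "google-t5/t5-small" -> "t5-small".
    short_name = model_name_or_path.rstrip("/").split("/")[-1]
    return source_prefix is None and short_name in T5_CHECKPOINTS
```

With such a helper, the warning condition could be written as `if needs_t5_prefix(args.model_name_or_path, args.source_prefix):`, which keeps the behavior identical for old and new checkpoint names.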
diff --git a/examples/pytorch/test_accelerate_examples.py b/examples/pytorch/test_accelerate_examples.py
index fc485cf59a2..918167635e8 100644
--- a/examples/pytorch/test_accelerate_examples.py
+++ b/examples/pytorch/test_accelerate_examples.py
@@ -80,7 +80,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/text-classification/run_glue_no_trainer.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
--validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv
@@ -105,7 +105,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/language-modeling/run_clm_no_trainer.py
- --model_name_or_path distilgpt2
+ --model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--block_size 128
@@ -133,7 +133,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/language-modeling/run_mlm_no_trainer.py
- --model_name_or_path distilroberta-base
+ --model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -156,7 +156,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/token-classification/run_ner_no_trainer.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -181,7 +181,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/question-answering/run_qa_no_trainer.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
@@ -209,7 +209,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/multiple-choice/run_swag_no_trainer.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/swag/sample.json
--validation_file tests/fixtures/tests_samples/swag/sample.json
--output_dir {tmp_dir}
@@ -232,7 +232,7 @@ class ExamplesTestsNoTrainer(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
{self.examples_dir}/pytorch/summarization/run_summarization_no_trainer.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--output_dir {tmp_dir}
diff --git a/examples/pytorch/test_pytorch_examples.py b/examples/pytorch/test_pytorch_examples.py
index 0aabbb4bcb8..1d4f8db9259 100644
--- a/examples/pytorch/test_pytorch_examples.py
+++ b/examples/pytorch/test_pytorch_examples.py
@@ -99,7 +99,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_glue.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--overwrite_output_dir
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
@@ -127,7 +127,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm.py
- --model_name_or_path distilgpt2
+ --model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -160,7 +160,7 @@ class ExamplesTests(TestCasePlus):
testargs = f"""
run_clm.py
--model_type gpt2
- --tokenizer_name gpt2
+ --tokenizer_name openai-community/gpt2
--train_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
--config_overrides n_embd=10,n_head=2
@@ -181,7 +181,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
- --model_name_or_path distilroberta-base
+ --model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--output_dir {tmp_dir}
@@ -207,7 +207,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_ner.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -235,7 +235,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
@@ -260,7 +260,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_seq2seq_qa.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--context_column context
--question_column question
--answer_column answers
@@ -289,7 +289,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_swag.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/swag/sample.json
--validation_file tests/fixtures/tests_samples/swag/sample.json
--output_dir {tmp_dir}
@@ -327,7 +327,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--output_dir {tmp_dir}
diff --git a/examples/pytorch/text-classification/README.md b/examples/pytorch/text-classification/README.md
index 95116bcfd6e..6eae65e7c4b 100644
--- a/examples/pytorch/text-classification/README.md
+++ b/examples/pytorch/text-classification/README.md
@@ -31,7 +31,7 @@ GLUE is made up of a total of 9 different tasks. Here is how to run the script o
export TASK_NAME=mrpc
python run_glue.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
@@ -68,7 +68,7 @@ The following example fine-tunes BERT on the `imdb` dataset hosted on our [hub](
```bash
python run_glue.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name imdb \
--do_train \
--do_predict \
@@ -90,7 +90,7 @@ We can specify the metric, the label column and aso choose which text columns to
dataset="amazon_reviews_multi"
subset="en"
python run_classification.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name ${dataset} \
--dataset_config_name ${subset} \
--shuffle_train_dataset \
@@ -113,7 +113,7 @@ The following is a multi-label classification example. It fine-tunes BERT on the
dataset="reuters21578"
subset="ModApte"
python run_classification.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name ${dataset} \
--dataset_config_name ${subset} \
--shuffle_train_dataset \
@@ -175,7 +175,7 @@ then
export TASK_NAME=mrpc
python run_glue_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--max_length 128 \
--per_device_train_batch_size 32 \
@@ -202,7 +202,7 @@ that will check everything is ready for training. Finally, you can launch traini
export TASK_NAME=mrpc
accelerate launch run_glue_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--task_name $TASK_NAME \
--max_length 128 \
--per_device_train_batch_size 32 \
@@ -232,7 +232,7 @@ This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It
```bash
python run_xnli.py \
- --model_name_or_path bert-base-multilingual-cased \
+ --model_name_or_path google-bert/bert-base-multilingual-cased \
--language de \
--train_language en \
--do_train \
diff --git a/examples/pytorch/text-generation/README.md b/examples/pytorch/text-generation/README.md
index cc914754adc..e619c25e162 100644
--- a/examples/pytorch/text-generation/README.md
+++ b/examples/pytorch/text-generation/README.md
@@ -26,6 +26,6 @@ Example usage:
```bash
python run_generation.py \
 --model_type=gpt2 \
- --model_name_or_path=gpt2
+ --model_name_or_path=openai-community/gpt2
```
diff --git a/examples/pytorch/text-generation/run_generation_contrastive_search.py b/examples/pytorch/text-generation/run_generation_contrastive_search.py
index 91781f05185..a48529fb30d 100755
--- a/examples/pytorch/text-generation/run_generation_contrastive_search.py
+++ b/examples/pytorch/text-generation/run_generation_contrastive_search.py
@@ -16,7 +16,7 @@
""" The examples of running contrastive search on the auto-APIs;
Running this example:
-python run_generation_contrastive_search.py --model_name_or_path=gpt2-large --penalty_alpha=0.6 --k=4 --length=256
+python run_generation_contrastive_search.py --model_name_or_path=openai-community/gpt2-large --penalty_alpha=0.6 --k=4 --length=256
"""
diff --git a/examples/pytorch/token-classification/README.md b/examples/pytorch/token-classification/README.md
index 496722cf6b9..568e5242fee 100644
--- a/examples/pytorch/token-classification/README.md
+++ b/examples/pytorch/token-classification/README.md
@@ -29,7 +29,7 @@ The following example fine-tunes BERT on CoNLL-2003:
```bash
python run_ner.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name conll2003 \
--output_dir /tmp/test-ner \
--do_train \
@@ -42,7 +42,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_ner.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--output_dir /tmp/test-ner \
@@ -84,7 +84,7 @@ then
export TASK_NAME=ner
python run_ner_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--task_name $TASK_NAME \
--max_length 128 \
@@ -112,7 +112,7 @@ that will check everything is ready for training. Finally, you can launch traini
export TASK_NAME=ner
accelerate launch run_ner_no_trainer.py \
- --model_name_or_path bert-base-cased \
+ --model_name_or_path google-bert/bert-base-cased \
--dataset_name conll2003 \
--task_name $TASK_NAME \
--max_length 128 \
diff --git a/examples/pytorch/translation/README.md b/examples/pytorch/translation/README.md
index bd95e3a5521..74ca16ccb0b 100644
--- a/examples/pytorch/translation/README.md
+++ b/examples/pytorch/translation/README.md
@@ -59,11 +59,11 @@ python examples/pytorch/translation/run_translation.py \
MBart and some T5 models require special handling.
-T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
+T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
```bash
python examples/pytorch/translation/run_translation.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \
@@ -105,7 +105,7 @@ values for the arguments `--train_file`, `--validation_file` to match your setup
```bash
python examples/pytorch/translation/run_translation.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \
@@ -134,7 +134,7 @@ If you want to use a pre-processed dataset that leads to high BLEU scores, but f
```bash
python examples/pytorch/translation/run_translation.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \
diff --git a/examples/pytorch/translation/run_translation.py b/examples/pytorch/translation/run_translation.py
index 807311531f9..f2718c1122a 100755
--- a/examples/pytorch/translation/run_translation.py
+++ b/examples/pytorch/translation/run_translation.py
@@ -317,11 +317,11 @@ def main():
logger.info(f"Training/evaluation parameters {training_args}")
if data_args.source_prefix is None and model_args.model_name_or_path in [
- "t5-small",
- "t5-base",
- "t5-large",
- "t5-3b",
- "t5-11b",
+ "google-t5/t5-small",
+ "google-t5/t5-base",
+ "google-t5/t5-large",
+ "google-t5/t5-3b",
+ "google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is expected, e.g. with "
diff --git a/examples/research_projects/bert-loses-patience/README.md b/examples/research_projects/bert-loses-patience/README.md
index d1e5baa92e9..b405e8a9488 100755
--- a/examples/research_projects/bert-loses-patience/README.md
+++ b/examples/research_projects/bert-loses-patience/README.md
@@ -15,7 +15,7 @@ export TASK_NAME=MRPC
python ./run_glue_with_pabee.py \
--model_type albert \
- --model_name_or_path bert-base-uncased/albert-base-v2 \
+ --model_name_or_path albert/albert-base-v2 \
--task_name $TASK_NAME \
--do_train \
--do_eval \
diff --git a/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py b/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py
index 57b649ec067..6881bf8d184 100644
--- a/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py
+++ b/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py
@@ -276,8 +276,8 @@ class AlbertForSequenceClassificationWithPabee(AlbertPreTrainedModel):
from torch import nn
import torch
- tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
- model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert-base-v2')
+ tokenizer = AlbertTokenizer.from_pretrained('albert/albert-base-v2')
+ model = AlbertForSequenceClassificationWithPabee.from_pretrained('albert/albert-base-v2')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
diff --git a/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_bert.py b/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_bert.py
index b32f47d0c30..dfa78585a64 100644
--- a/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_bert.py
+++ b/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_bert.py
@@ -300,8 +300,8 @@ class BertForSequenceClassificationWithPabee(BertPreTrainedModel):
from torch import nn
import torch
- tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
- model = BertForSequenceClassificationWithPabee.from_pretrained('bert-base-uncased')
+ tokenizer = BertTokenizer.from_pretrained('google-bert/bert-base-uncased')
+ model = BertForSequenceClassificationWithPabee.from_pretrained('google-bert/bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
diff --git a/examples/research_projects/bert-loses-patience/test_run_glue_with_pabee.py b/examples/research_projects/bert-loses-patience/test_run_glue_with_pabee.py
index 6a084d0741d..5516924f0f2 100644
--- a/examples/research_projects/bert-loses-patience/test_run_glue_with_pabee.py
+++ b/examples/research_projects/bert-loses-patience/test_run_glue_with_pabee.py
@@ -29,7 +29,7 @@ class PabeeTests(TestCasePlus):
testargs = f"""
run_glue_with_pabee.py
--model_type albert
- --model_name_or_path albert-base-v2
+ --model_name_or_path albert/albert-base-v2
--data_dir ./tests/fixtures/tests_samples/MRPC/
--output_dir {tmp_dir}
--overwrite_output_dir
diff --git a/examples/research_projects/bertabs/convert_bertabs_original_pytorch_checkpoint.py b/examples/research_projects/bertabs/convert_bertabs_original_pytorch_checkpoint.py
index 53ba3829b15..b6f5d177515 100644
--- a/examples/research_projects/bertabs/convert_bertabs_original_pytorch_checkpoint.py
+++ b/examples/research_projects/bertabs/convert_bertabs_original_pytorch_checkpoint.py
@@ -107,7 +107,7 @@ def convert_bertabs_checkpoints(path_to_checkpoints, dump_path):
# ----------------------------------
logging.info("Make sure that the models' outputs are identical")
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# prepare the model inputs
encoder_input_ids = tokenizer.encode("This is sample éàalj'-.")
diff --git a/examples/research_projects/bertabs/modeling_bertabs.py b/examples/research_projects/bertabs/modeling_bertabs.py
index 19e62804ef0..2ebce466561 100644
--- a/examples/research_projects/bertabs/modeling_bertabs.py
+++ b/examples/research_projects/bertabs/modeling_bertabs.py
@@ -128,7 +128,7 @@ class Bert(nn.Module):
def __init__(self):
super().__init__()
- config = BertConfig.from_pretrained("bert-base-uncased")
+ config = BertConfig.from_pretrained("google-bert/bert-base-uncased")
self.model = BertModel(config)
def forward(self, input_ids, attention_mask=None, token_type_ids=None, **kwargs):
diff --git a/examples/research_projects/bertabs/run_summarization.py b/examples/research_projects/bertabs/run_summarization.py
index 82ef8ab39ea..1f969f117ba 100644
--- a/examples/research_projects/bertabs/run_summarization.py
+++ b/examples/research_projects/bertabs/run_summarization.py
@@ -29,7 +29,7 @@ Batch = namedtuple("Batch", ["document_names", "batch_size", "src", "segs", "mas
def evaluate(args):
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased", do_lower_case=True)
model = BertAbs.from_pretrained("remi/bertabs-finetuned-extractive-abstractive-summarization")
model.to(args.device)
model.eval()
diff --git a/examples/research_projects/codeparrot/README.md b/examples/research_projects/codeparrot/README.md
index 3259041ba54..f0af3d144f7 100644
--- a/examples/research_projects/codeparrot/README.md
+++ b/examples/research_projects/codeparrot/README.md
@@ -79,7 +79,7 @@ python scripts/pretokenizing.py \
Before training a new model for code we create a new tokenizer that is efficient at code tokenization. To train the tokenizer you can run the following command:
```bash
python scripts/bpe_training.py \
- --base_tokenizer gpt2 \
+ --base_tokenizer openai-community/gpt2 \
--dataset_name codeparrot/codeparrot-clean-train
```
@@ -90,12 +90,12 @@ The models are randomly initialized and trained from scratch. To initialize a ne
```bash
python scripts/initialize_model.py \
---config_name gpt2-large \
+--config_name openai-community/gpt2-large \
--tokenizer_name codeparrot/codeparrot \
--model_name codeparrot \
--push_to_hub True
```
-This will initialize a new model with the architecture and configuration of `gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the hub.
+This will initialize a new model with the architecture and configuration of `openai-community/gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initialized model is pushed to the hub.
We can either pass the name of a text dataset or a pretokenized dataset which speeds up training a bit.
Now that the tokenizer and model are also ready we can start training the model. The main training script is built with `accelerate` to scale across a wide range of platforms and infrastructure scales. We train two models with [110M](https://huggingface.co/codeparrot/codeparrot-small/) and [1.5B](https://huggingface.co/codeparrot/codeparrot/) parameters for 25-30B tokens on a 16xA100 (40GB) machine which takes 1 day and 1 week, respectively.
diff --git a/examples/research_projects/codeparrot/scripts/arguments.py b/examples/research_projects/codeparrot/scripts/arguments.py
index 4def9ac3b85..5fee05eb04c 100644
--- a/examples/research_projects/codeparrot/scripts/arguments.py
+++ b/examples/research_projects/codeparrot/scripts/arguments.py
@@ -172,7 +172,7 @@ class TokenizerTrainingArguments:
"""
base_tokenizer: Optional[str] = field(
- default="gpt2", metadata={"help": "Base tokenizer to build new tokenizer from."}
+ default="openai-community/gpt2", metadata={"help": "Base tokenizer to build new tokenizer from."}
)
dataset_name: Optional[str] = field(
default="transformersbook/codeparrot-train", metadata={"help": "Dataset to train tokenizer on."}
@@ -211,7 +211,7 @@ class InitializationArguments:
"""
config_name: Optional[str] = field(
- default="gpt2-large", metadata={"help": "Configuration to use for model initialization."}
+ default="openai-community/gpt2-large", metadata={"help": "Configuration to use for model initialization."}
)
tokenizer_name: Optional[str] = field(
default="codeparrot/codeparrot", metadata={"help": "Tokenizer attached to model."}
diff --git a/examples/research_projects/deebert/test_glue_deebert.py b/examples/research_projects/deebert/test_glue_deebert.py
index 775c4d70b65..7a5f059c8ce 100644
--- a/examples/research_projects/deebert/test_glue_deebert.py
+++ b/examples/research_projects/deebert/test_glue_deebert.py
@@ -48,7 +48,7 @@ class DeeBertTests(TestCasePlus):
def test_glue_deebert_train(self):
train_args = """
--model_type roberta
- --model_name_or_path roberta-base
+ --model_name_or_path FacebookAI/roberta-base
--task_name MRPC
--do_train
--do_eval
@@ -61,7 +61,7 @@ class DeeBertTests(TestCasePlus):
--num_train_epochs 3
--overwrite_output_dir
--seed 42
- --output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
+ --output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--plot_data_dir ./examples/deebert/results/
--save_steps 0
--overwrite_cache
@@ -71,12 +71,12 @@ class DeeBertTests(TestCasePlus):
eval_args = """
--model_type roberta
- --model_name_or_path ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
+ --model_name_or_path ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--task_name MRPC
--do_eval
--do_lower_case
--data_dir ./tests/fixtures/tests_samples/MRPC/
- --output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
+ --output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--plot_data_dir ./examples/deebert/results/
--max_seq_length 128
--eval_each_highway
@@ -88,12 +88,12 @@ class DeeBertTests(TestCasePlus):
entropy_eval_args = """
--model_type roberta
- --model_name_or_path ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
+ --model_name_or_path ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--task_name MRPC
--do_eval
--do_lower_case
--data_dir ./tests/fixtures/tests_samples/MRPC/
- --output_dir ./examples/deebert/saved_models/roberta-base/MRPC/two_stage
+ --output_dir ./examples/deebert/saved_models/FacebookAI/roberta-base/MRPC/two_stage
--plot_data_dir ./examples/deebert/results/
--max_seq_length 128
--early_exit_entropy 0.1
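Review note: a Hub ID such as `FacebookAI/roberta-base` contains a slash, so using it verbatim inside a local path adds an extra directory level, and using it as a file name points into a directory that may not exist. A minimal sketch of deriving a flat local name follows. The helper `local_name` is hypothetical and is not defined anywhere in the touched files, and the `--` separator is an arbitrary choice for illustration.

```python
import os


def local_name(repo_id):
    """Map a Hub ID such as 'FacebookAI/roberta-base' to a single path component."""
    # Replace the namespace separator so the result contains no directory part.
    return repo_id.replace("/", "--")


# Example: build an output directory that stays one level deep per model.
output_dir = os.path.join(
    "examples", "deebert", "saved_models", local_name("FacebookAI/roberta-base"), "MRPC", "two_stage"
)
```

The same idea applies to checkpoint file names: a default such as `gpt2_finetuned.pt` can be saved to the working directory as-is, whereas a name containing a namespace prefix would require that directory to exist first.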
diff --git a/examples/research_projects/information-gain-filtration/README.md b/examples/research_projects/information-gain-filtration/README.md
index cba7a808947..f685a512509 100644
--- a/examples/research_projects/information-gain-filtration/README.md
+++ b/examples/research_projects/information-gain-filtration/README.md
@@ -64,7 +64,7 @@ To fine-tune a transformer model with IGF on a language modeling task, use the f
```python
python run_clm_igf.py\
---model_name_or_path "gpt2" \
+--model_name_or_path "openai-community/gpt2" \
--data_file="data/tokenized_stories_train_wikitext103" \
--igf_data_file="data/IGF_values" \
--context_len 32 \
diff --git a/examples/research_projects/information-gain-filtration/igf/igf.py b/examples/research_projects/information-gain-filtration/igf/igf.py
index 6861467a335..4c5aefd9584 100644
--- a/examples/research_projects/information-gain-filtration/igf/igf.py
+++ b/examples/research_projects/information-gain-filtration/igf/igf.py
@@ -69,9 +69,9 @@ def compute_perplexity(model, test_data, context_len):
return perplexity
-def load_gpt2(model_name="gpt2"):
+def load_gpt2(model_name="openai-community/gpt2"):
"""
- load original gpt2 and save off for quicker loading
+ load original openai-community/gpt2 and save off for quicker loading
Args:
model_name: GPT-2
diff --git a/examples/research_projects/information-gain-filtration/run_clm_igf.py b/examples/research_projects/information-gain-filtration/run_clm_igf.py
index 26b72072784..74973309c4e 100644
--- a/examples/research_projects/information-gain-filtration/run_clm_igf.py
+++ b/examples/research_projects/information-gain-filtration/run_clm_igf.py
@@ -84,7 +84,7 @@ def generate_n_pairs(
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# load pretrained model
- model = load_gpt2("gpt2").to(device)
+ model = load_gpt2("openai-community/gpt2").to(device)
print("computing perplexity on objective set")
orig_perp = compute_perplexity(model, objective_set, context_len).item()
print("perplexity on objective set:", orig_perp)
@@ -121,7 +121,7 @@ def training_secondary_learner(
set_seed(42)
# Load pre-trained model
- model = GPT2LMHeadModel.from_pretrained("gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
# Initialize secondary learner to use embedding weights of model
secondary_learner = SecondaryLearner(model)
@@ -153,7 +153,7 @@ def finetune(
recopy_model=recopy_gpt2,
secondary_learner=None,
eval_interval=10,
- finetuned_model_name="gpt2_finetuned.pt",
+ finetuned_model_name="gpt2_finetuned.pt",
):
"""
fine-tune with IGF if secondary_learner is not None, else standard fine-tuning
@@ -346,7 +346,10 @@ def main():
)
parser.add_argument(
- "--batch_size", default=16, type=int, help="batch size of training data of language model(gpt2) "
+ "--batch_size",
+ default=16,
+ type=int,
+ help="batch size of training data of language model(openai-community/gpt2) ",
)
parser.add_argument(
@@ -383,7 +386,9 @@ def main():
),
)
- parser.add_argument("--finetuned_model_name", default="gpt2_finetuned.pt", type=str, help="finetuned_model_name")
+ parser.add_argument(
+ "--finetuned_model_name", default="gpt2_finetuned.pt", type=str, help="finetuned_model_name"
+ )
parser.add_argument(
"--recopy_model",
@@ -416,16 +421,16 @@ def main():
igf_model_path="igf_model.pt",
)
- # load pretrained gpt2 model
- model = GPT2LMHeadModel.from_pretrained("gpt2")
+ # load pretrained openai-community/gpt2 model
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
set_seed(42)
- # Generate train and test data to train and evaluate gpt2 model
+ # Generate train and test data to train and evaluate openai-community/gpt2 model
train_dataset, test_dataset = generate_datasets(
context_len=32, file="data/tokenized_stories_train_wikitext103.jbl", number=100, min_len=1026, trim=True
)
- # fine-tuning of the gpt2 model using igf (Information Gain Filtration)
+ # fine-tuning of the openai-community/gpt2 model using igf (Information Gain Filtration)
finetune(
model,
train_dataset,
@@ -437,7 +442,7 @@ def main():
recopy_model=recopy_gpt2,
secondary_learner=secondary_learner,
eval_interval=10,
- finetuned_model_name="gpt2_finetuned.pt",
+ finetuned_model_name="gpt2_finetuned.pt",
)
diff --git a/examples/research_projects/jax-projects/README.md b/examples/research_projects/jax-projects/README.md
index cb670a0a520..88d8d7f9eba 100644
--- a/examples/research_projects/jax-projects/README.md
+++ b/examples/research_projects/jax-projects/README.md
@@ -159,13 +159,13 @@ to be used, but that everybody in team is on the same page on what type of model
To give an example, a well-defined project would be the following:
- task: summarization
-- model: [t5-small](https://huggingface.co/t5-small)
+- model: [google-t5/t5-small](https://huggingface.co/google-t5/t5-small)
- dataset: [CNN/Daily mail](https://huggingface.co/datasets/cnn_dailymail)
- training script: [run_summarization_flax.py](https://github.com/huggingface/transformers/blob/main/examples/flax/summarization/run_summarization_flax.py)
- outcome: t5 model that can summarize news
-- work flow: adapt `run_summarization_flax.py` to work with `t5-small`.
+- work flow: adapt `run_summarization_flax.py` to work with `google-t5/t5-small`.
-This example is a very easy and not the most interesting project since a `t5-small`
+This example is a very easy and not the most interesting project since a `google-t5/t5-small`
summarization model exists already for CNN/Daily mail and pretty much no code has to be
written.
A well-defined project does not need to have the dataset be part of
@@ -335,7 +335,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
dummy_input = next(iter(dataset))["text"]
-tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
+tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -492,7 +492,7 @@ dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', str
dummy_input = next(iter(dataset))["text"]
-tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
+tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
input_ids = tokenizer(dummy_input, return_tensors="np").input_ids[:, :10]
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -518,7 +518,7 @@ be available in a couple of days.
- [BigBird](https://github.com/huggingface/transformers/blob/main/src/transformers/models/big_bird/modeling_flax_big_bird.py)
- [CLIP](https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_flax_clip.py)
- [ELECTRA](https://github.com/huggingface/transformers/blob/main/src/transformers/models/electra/modeling_flax_electra.py)
-- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_flax_gpt2.py)
+- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_flax_gpt2.py)
- [(TODO) MBART](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mbart/modeling_flax_mbart.py)
- [RoBERTa](https://github.com/huggingface/transformers/blob/main/src/transformers/models/roberta/modeling_flax_roberta.py)
- [T5](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_flax_t5.py)
@@ -729,7 +729,7 @@ Let's use the base `FlaxRobertaModel` without any heads as an example.
from transformers import FlaxRobertaModel, RobertaTokenizerFast
import jax
-tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
+tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
inputs = tokenizer("JAX/Flax is amazing ", padding="max_length", max_length=128, return_tensors="np")
model = FlaxRobertaModel.from_pretrained("julien-c/dummy-unknown")
@@ -1011,7 +1011,7 @@ and run the following commands in a Python shell to save a config.
```python
from transformers import RobertaConfig
-config = RobertaConfig.from_pretrained("roberta-base")
+config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
config.save_pretrained("./")
```
@@ -1193,12 +1193,12 @@ All the widgets are open sourced in the `huggingface_hub` [repo](https://github.
**NLP**
* **Conversational:** To have the best conversations!. [Example](https://huggingface.co/microsoft/DialoGPT-large?).
* **Feature Extraction:** Retrieve the input embeddings. [Example](https://huggingface.co/sentence-transformers/distilbert-base-nli-mean-tokens?text=test).
-* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/bert-base-uncased?).
-* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad).
+* **Fill Mask:** Predict potential words for a mask token. [Example](https://huggingface.co/google-bert/bert-base-uncased?).
+* **Question Answering:** Given a context and a question, predict the answer. [Example](https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad).
* **Sentence Simmilarity:** Predict how similar a set of sentences are. Useful for Sentence Transformers.
* **Summarization:** Given a text, output a summary of it. [Example](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
* **Table Question Answering:** Given a table and a question, predict the answer. [Example](https://huggingface.co/google/tapas-base-finetuned-wtq).
-* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/gpt2)
+* **Text Generation:** Generate text based on a prompt. [Example](https://huggingface.co/openai-community/gpt2)
* **Token Classification:** Useful for tasks such as Named Entity Recognition and Part of Speech. [Example](https://huggingface.co/dslim/bert-base-NER).
* **Zero-Shot Classification:** Too cool to explain with words. Here is an [example](https://huggingface.co/typeform/distilbert-base-uncased-mnli)
* ([WIP](https://github.com/huggingface/huggingface_hub/issues/99)) **Table to Text Generation**.
diff --git a/examples/research_projects/jax-projects/dataset-streaming/README.md b/examples/research_projects/jax-projects/dataset-streaming/README.md
index bbb58037443..bdb6629e509 100644
--- a/examples/research_projects/jax-projects/dataset-streaming/README.md
+++ b/examples/research_projects/jax-projects/dataset-streaming/README.md
@@ -31,7 +31,7 @@ without ever having to download the full dataset.
In the following, we demonstrate how to train a bi-directional transformer model
using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
More specifically, we demonstrate how JAX/Flax and dataset streaming can be leveraged
-to pre-train [**`roberta-base`**](https://huggingface.co/roberta-base)
+to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
in English on a single TPUv3-8 pod for 10000 update steps.
The example script uses the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
@@ -80,8 +80,8 @@ from transformers import RobertaTokenizerFast, RobertaConfig
model_dir = "./english-roberta-base-dummy"
-tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
-config = RobertaConfig.from_pretrained("roberta-base")
+tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
+config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
tokenizer.save_pretrained(model_dir)
config.save_pretrained(model_dir)
diff --git a/examples/research_projects/jax-projects/hybrid_clip/README.md b/examples/research_projects/jax-projects/hybrid_clip/README.md
index 76df92e463c..72d3db19358 100644
--- a/examples/research_projects/jax-projects/hybrid_clip/README.md
+++ b/examples/research_projects/jax-projects/hybrid_clip/README.md
@@ -32,7 +32,7 @@ Models written in JAX/Flax are **immutable** and updated in a purely functional
way which enables simple and efficient model parallelism.
In this example we will use the vision model from [CLIP](https://huggingface.co/models?filter=clip)
-as the image encoder and [`roberta-base`](https://huggingface.co/roberta-base) as the text encoder.
+as the image encoder and [`FacebookAI/roberta-base`](https://huggingface.co/FacebookAI/roberta-base) as the text encoder.
Note that one can also use the [ViT](https://huggingface.co/models?filter=vit) model as image encoder and any other BERT or ROBERTa model as text encoder.
To train the model on languages other than English one should choose a text encoder trained on the desired
language and a image-text dataset in that language. One such dataset is [WIT](https://github.com/google-research-datasets/wit).
@@ -76,7 +76,7 @@ Here is an example of how to load the model using pre-trained text and vision mo
```python
from modeling_hybrid_clip import FlaxHybridCLIP
-model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32")
+model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32")
# save the model
model.save_pretrained("bert-clip")
@@ -89,7 +89,7 @@ If the checkpoints are in PyTorch then one could pass `text_from_pt=True` and `v
PyTorch checkpoints convert them to flax and load the model.
```python
-model = FlaxHybridCLIP.from_text_vision_pretrained("bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
+model = FlaxHybridCLIP.from_text_vision_pretrained("google-bert/bert-base-uncased", "openai/clip-vit-base-patch32", text_from_pt=True, vision_from_pt=True)
```
This loads both the text and vision encoders using pre-trained weights, the projection layers are randomly
@@ -154,9 +154,9 @@ Next we can run the example script to train the model:
```bash
python run_hybrid_clip.py \
--output_dir ${MODEL_DIR} \
- --text_model_name_or_path="roberta-base" \
+ --text_model_name_or_path="FacebookAI/roberta-base" \
--vision_model_name_or_path="openai/clip-vit-base-patch32" \
- --tokenizer_name="roberta-base" \
+ --tokenizer_name="FacebookAI/roberta-base" \
--train_file="coco_dataset/train_dataset.json" \
--validation_file="coco_dataset/validation_dataset.json" \
--do_train --do_eval \
diff --git a/examples/research_projects/jax-projects/hybrid_clip/modeling_hybrid_clip.py b/examples/research_projects/jax-projects/hybrid_clip/modeling_hybrid_clip.py
index e60f07bdd06..08cb3bd0b34 100644
--- a/examples/research_projects/jax-projects/hybrid_clip/modeling_hybrid_clip.py
+++ b/examples/research_projects/jax-projects/hybrid_clip/modeling_hybrid_clip.py
@@ -314,8 +314,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
Information necessary to initiate the text model. Can be either:
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
- a user or organization name, like ``dbmdz/bert-base-german-cased``.
- A path to a `directory` containing model weights saved using
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
@@ -327,8 +325,6 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
Information necessary to initiate the vision model. Can be either:
- A string, the `model id` of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
- a user or organization name, like ``dbmdz/bert-base-german-cased``.
- A path to a `directory` containing model weights saved using
:func:`~transformers.FlaxPreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
- A path or url to a `PyTorch checkpoint folder` (e.g, ``./pt_model``). In
@@ -354,7 +350,7 @@ class FlaxHybridCLIP(FlaxPreTrainedModel):
>>> from transformers import FlaxHybridCLIP
>>> # initialize a model from pretrained BERT and CLIP models. Note that the projection layers will be randomly initialized.
>>> # If using CLIP's vision model the vision projection layer will be initialized using pre-trained weights
- >>> model = FlaxHybridCLIP.from_text_vision_pretrained('bert-base-uncased', 'openai/clip-vit-base-patch32')
+ >>> model = FlaxHybridCLIP.from_text_vision_pretrained('google-bert/bert-base-uncased', 'openai/clip-vit-base-patch32')
>>> # saving model after fine-tuning
>>> model.save_pretrained("./bert-clip")
>>> # load fine-tuned model
diff --git a/examples/research_projects/jax-projects/model_parallel/README.md b/examples/research_projects/jax-projects/model_parallel/README.md
index 97f3cdb0477..393c9e89375 100644
--- a/examples/research_projects/jax-projects/model_parallel/README.md
+++ b/examples/research_projects/jax-projects/model_parallel/README.md
@@ -54,7 +54,7 @@ model.save_pretrained("gpt-neo-1.3B")
```bash
python run_clm_mp.py \
--model_name_or_path gpt-neo-1.3B \
- --tokenizer_name gpt2 \
+ --tokenizer_name openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --do_eval \
--block_size 1024 \
diff --git a/examples/research_projects/longform-qa/eli5_app.py b/examples/research_projects/longform-qa/eli5_app.py
index ae8d8f91568..6b1b15cc9cb 100644
--- a/examples/research_projects/longform-qa/eli5_app.py
+++ b/examples/research_projects/longform-qa/eli5_app.py
@@ -36,7 +36,7 @@ def load_models():
_ = s2s_model.eval()
else:
s2s_tokenizer, s2s_model = make_qa_s2s_model(
- model_name="t5-small", from_file="seq2seq_models/eli5_t5_model_1024_4.pth", device="cuda:0"
+ model_name="google-t5/t5-small", from_file="seq2seq_models/eli5_t5_model_1024_4.pth", device="cuda:0"
)
return (qar_tokenizer, qar_model, s2s_tokenizer, s2s_model)
diff --git a/examples/research_projects/mlm_wwm/README.md b/examples/research_projects/mlm_wwm/README.md
index 0144b1ad309..bf5aa941082 100644
--- a/examples/research_projects/mlm_wwm/README.md
+++ b/examples/research_projects/mlm_wwm/README.md
@@ -32,7 +32,7 @@ to that word). This technique has been refined for Chinese in [this paper](https
To fine-tune a model using whole word masking, use the following script:
```bash
python run_mlm_wwm.py \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--do_train \
@@ -83,7 +83,7 @@ export VALIDATION_REF_FILE=/path/to/validation/chinese_ref/file
export OUTPUT_DIR=/tmp/test-mlm-wwm
python run_mlm_wwm.py \
- --model_name_or_path roberta-base \
+ --model_name_or_path FacebookAI/roberta-base \
--train_file $TRAIN_FILE \
--validation_file $VALIDATION_FILE \
--train_ref_file $TRAIN_REF_FILE \
diff --git a/examples/research_projects/mm-imdb/README.md b/examples/research_projects/mm-imdb/README.md
index 73e77aeb962..68b2f15159e 100644
--- a/examples/research_projects/mm-imdb/README.md
+++ b/examples/research_projects/mm-imdb/README.md
@@ -10,7 +10,7 @@ Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformer
python run_mmimdb.py \
--data_dir /path/to/mmimdb/dataset/ \
--model_type bert \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--output_dir /path/to/save/dir/ \
--do_train \
--do_eval \
diff --git a/examples/research_projects/movement-pruning/README.md b/examples/research_projects/movement-pruning/README.md
index c2f74d6dcdd..575ec1a9b49 100644
--- a/examples/research_projects/movement-pruning/README.md
+++ b/examples/research_projects/movement-pruning/README.md
@@ -61,7 +61,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \
@@ -84,7 +84,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \
@@ -104,7 +104,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \
@@ -124,7 +124,7 @@ python examples/movement-pruning/masked_run_squad.py \
--predict_file dev-v1.1.json \
--do_train --do_eval --do_lower_case \
--model_type masked_bert \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--per_gpu_train_batch_size 16 \
--warmup_steps 5400 \
--num_train_epochs 10 \
diff --git a/examples/research_projects/performer/README.md b/examples/research_projects/performer/README.md
index 42cb6fa358f..fa847268b0c 100644
--- a/examples/research_projects/performer/README.md
+++ b/examples/research_projects/performer/README.md
@@ -10,8 +10,8 @@ Paper authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyo
## Examples
-`sanity_script.sh` will launch performer fine-tuning from the bert-base-cased checkpoint on the Simple Wikipedia dataset (a small, easy-language English Wikipedia) from `datasets`.
-`full_script.sh` will launch performer fine-tuning from the bert-large-cased checkpoint on the English Wikipedia dataset from `datasets`.
+`sanity_script.sh` will launch performer fine-tuning from the google-bert/bert-base-cased checkpoint on the Simple Wikipedia dataset (a small, easy-language English Wikipedia) from `datasets`.
+`full_script.sh` will launch performer fine-tuning from the google-bert/bert-large-cased checkpoint on the English Wikipedia dataset from `datasets`.
Here are a few key arguments:
- Remove the `--performer` argument to use a standard Bert model.
diff --git a/examples/research_projects/pplm/run_pplm.py b/examples/research_projects/pplm/run_pplm.py
index 54008d56c14..cc49b7fa83c 100644
--- a/examples/research_projects/pplm/run_pplm.py
+++ b/examples/research_projects/pplm/run_pplm.py
@@ -61,7 +61,7 @@ DISCRIMINATOR_MODELS_PARAMS = {
"embed_size": 1024,
"class_vocab": {"non_clickbait": 0, "clickbait": 1},
"default_class": 1,
- "pretrained_model": "gpt2-medium",
+ "pretrained_model": "openai-community/gpt2-medium",
},
"sentiment": {
"url": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/discriminators/SST_classifier_head.pt",
@@ -69,7 +69,7 @@ DISCRIMINATOR_MODELS_PARAMS = {
"embed_size": 1024,
"class_vocab": {"very_positive": 2, "very_negative": 3},
"default_class": 3,
- "pretrained_model": "gpt2-medium",
+ "pretrained_model": "openai-community/gpt2-medium",
},
}
@@ -585,7 +585,7 @@ def set_generic_model_params(discrim_weights, discrim_meta):
def run_pplm_example(
- pretrained_model="gpt2-medium",
+ pretrained_model="openai-community/gpt2-medium",
cond_text="",
uncond=False,
num_samples=1,
@@ -738,7 +738,7 @@ if __name__ == "__main__":
"--pretrained_model",
"-M",
type=str,
- default="gpt2-medium",
+ default="openai-community/gpt2-medium",
help="pretrained model name or path to local checkpoint",
)
parser.add_argument("--cond_text", type=str, default="The lake", help="Prefix texts to condition on")
diff --git a/examples/research_projects/pplm/run_pplm_discrim_train.py b/examples/research_projects/pplm/run_pplm_discrim_train.py
index 4ac603a33bc..43ec5823e37 100644
--- a/examples/research_projects/pplm/run_pplm_discrim_train.py
+++ b/examples/research_projects/pplm/run_pplm_discrim_train.py
@@ -45,7 +45,7 @@ max_length_seq = 100
class Discriminator(nn.Module):
"""Transformer encoder followed by a Classification Head"""
- def __init__(self, class_size, pretrained_model="gpt2-medium", cached_mode=False, device="cpu"):
+ def __init__(self, class_size, pretrained_model="openai-community/gpt2-medium", cached_mode=False, device="cpu"):
super().__init__()
self.tokenizer = GPT2Tokenizer.from_pretrained(pretrained_model)
self.encoder = GPT2LMHeadModel.from_pretrained(pretrained_model)
@@ -218,7 +218,7 @@ def get_cached_data_loader(dataset, batch_size, discriminator, shuffle=False, de
def train_discriminator(
dataset,
dataset_fp=None,
- pretrained_model="gpt2-medium",
+ pretrained_model="openai-community/gpt2-medium",
epochs=10,
batch_size=64,
log_interval=10,
@@ -502,7 +502,10 @@ if __name__ == "__main__":
help="File path of the dataset to use. Needed only in case of generic datadset",
)
parser.add_argument(
- "--pretrained_model", type=str, default="gpt2-medium", help="Pretrained model to use as encoder"
+ "--pretrained_model",
+ type=str,
+ default="openai-community/gpt2-medium",
+ help="Pretrained model to use as encoder",
)
parser.add_argument("--epochs", type=int, default=10, metavar="N", help="Number of training epochs")
parser.add_argument(
diff --git a/examples/research_projects/quantization-qdqbert/README.md b/examples/research_projects/quantization-qdqbert/README.md
index 4d459c4c715..2cc2d5e5f98 100644
--- a/examples/research_projects/quantization-qdqbert/README.md
+++ b/examples/research_projects/quantization-qdqbert/README.md
@@ -50,11 +50,11 @@ Calibrate the pretrained model and finetune with quantization awared:
```bash
python3 run_quant_qa.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--max_seq_length 128 \
--doc_stride 32 \
- --output_dir calib/bert-base-uncased \
+ --output_dir calib/google-bert/bert-base-uncased \
--do_calib \
--calibrator percentile \
--percentile 99.99
@@ -62,7 +62,7 @@ python3 run_quant_qa.py \
```bash
python3 run_quant_qa.py \
- --model_name_or_path calib/bert-base-uncased \
+ --model_name_or_path calib/google-bert/bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
@@ -71,8 +71,8 @@ python3 run_quant_qa.py \
--num_train_epochs 2 \
--max_seq_length 128 \
--doc_stride 32 \
- --output_dir finetuned_int8/bert-base-uncased \
- --tokenizer_name bert-base-uncased \
+ --output_dir finetuned_int8/google-bert/bert-base-uncased \
+ --tokenizer_name google-bert/bert-base-uncased \
--save_steps 0
```
@@ -82,14 +82,14 @@ To export the QAT model finetuned above:
```bash
python3 run_quant_qa.py \
- --model_name_or_path finetuned_int8/bert-base-uncased \
+ --model_name_or_path finetuned_int8/google-bert/bert-base-uncased \
--output_dir ./ \
--save_onnx \
--per_device_eval_batch_size 1 \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
- --tokenizer_name bert-base-uncased
+ --tokenizer_name google-bert/bert-base-uncased
```
Use `--recalibrate-weights` to calibrate the weight ranges according to the quantizer axis. Use `--quant-per-tensor` for per tensor quantization (default is per channel).
@@ -117,7 +117,7 @@ python3 evaluate-hf-trt-qa.py \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
- --tokenizer_name bert-base-uncased \
+ --tokenizer_name google-bert/bert-base-uncased \
--int8 \
--seed 42
```
@@ -128,14 +128,14 @@ Finetune a fp32 precision model with [transformers/examples/pytorch/question-ans
```bash
python3 ../../pytorch/question-answering/run_qa.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name squad \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 128 \
--doc_stride 32 \
- --output_dir ./finetuned_fp32/bert-base-uncased \
+ --output_dir ./finetuned_fp32/google-bert/bert-base-uncased \
--save_steps 0 \
--do_train \
--do_eval
@@ -147,13 +147,13 @@ python3 ../../pytorch/question-answering/run_qa.py \
```bash
python3 run_quant_qa.py \
- --model_name_or_path ./finetuned_fp32/bert-base-uncased \
+ --model_name_or_path ./finetuned_fp32/google-bert/bert-base-uncased \
--dataset_name squad \
--calibrator percentile \
--percentile 99.99 \
--max_seq_length 128 \
--doc_stride 32 \
- --output_dir ./calib/bert-base-uncased \
+ --output_dir ./calib/google-bert/bert-base-uncased \
--save_steps 0 \
--do_calib \
--do_eval
@@ -163,14 +163,14 @@ python3 run_quant_qa.py \
```bash
python3 run_quant_qa.py \
- --model_name_or_path ./calib/bert-base-uncased \
+ --model_name_or_path ./calib/google-bert/bert-base-uncased \
--output_dir ./ \
--save_onnx \
--per_device_eval_batch_size 1 \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
- --tokenizer_name bert-base-uncased
+ --tokenizer_name google-bert/bert-base-uncased
```
### Evaluate the INT8 PTQ ONNX model inference with TensorRT
@@ -183,7 +183,7 @@ python3 evaluate-hf-trt-qa.py \
--max_seq_length 128 \
--doc_stride 32 \
--dataset_name squad \
- --tokenizer_name bert-base-uncased \
+ --tokenizer_name google-bert/bert-base-uncased \
--int8 \
--seed 42
```
diff --git a/examples/tensorflow/benchmarking/README.md b/examples/tensorflow/benchmarking/README.md
index 7099ed9f6b3..03e174770d1 100644
--- a/examples/tensorflow/benchmarking/README.md
+++ b/examples/tensorflow/benchmarking/README.md
@@ -22,5 +22,5 @@ If you would like to list benchmark results on your favorite models of the [mode
| Benchmark description | Results | Environment info | Author |
|:----------|:-------------|:-------------|------:|
-| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
-| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Partick von Platen](https://github.com/patrickvonplaten) |
+| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
+| PyTorch Benchmark on inference for `google-bert/bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
diff --git a/examples/tensorflow/contrastive-image-text/README.md b/examples/tensorflow/contrastive-image-text/README.md
index 9e3a011fcb3..29d9b897734 100644
--- a/examples/tensorflow/contrastive-image-text/README.md
+++ b/examples/tensorflow/contrastive-image-text/README.md
@@ -65,7 +65,7 @@ Finally, we can run the example script to train the model:
python examples/tensorflow/contrastive-image-text/run_clip.py \
--output_dir ./clip-roberta-finetuned \
--vision_model_name_or_path openai/clip-vit-base-patch32 \
- --text_model_name_or_path roberta-base \
+ --text_model_name_or_path FacebookAI/roberta-base \
--data_dir $PWD/data \
--dataset_name ydshieh/coco_dataset_script \
--dataset_config_name=2017 \
diff --git a/examples/tensorflow/language-modeling-tpu/run_mlm.py b/examples/tensorflow/language-modeling-tpu/run_mlm.py
index 544bca716ad..7ed111ab127 100644
--- a/examples/tensorflow/language-modeling-tpu/run_mlm.py
+++ b/examples/tensorflow/language-modeling-tpu/run_mlm.py
@@ -57,7 +57,7 @@ def parse_args():
parser.add_argument(
"--pretrained_model_config",
type=str,
- default="roberta-base",
+ default="FacebookAI/roberta-base",
help="The model config to use. Note that we don't copy the model's weights, only the config!",
)
parser.add_argument(
diff --git a/examples/tensorflow/language-modeling/README.md b/examples/tensorflow/language-modeling/README.md
index e91639adb00..ed4f507d4e8 100644
--- a/examples/tensorflow/language-modeling/README.md
+++ b/examples/tensorflow/language-modeling/README.md
@@ -43,7 +43,7 @@ This script trains a masked language model.
### Example command
```bash
python run_mlm.py \
---model_name_or_path distilbert-base-cased \
+--model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--dataset_name wikitext \
--dataset_config_name wikitext-103-raw-v1
@@ -52,7 +52,7 @@ python run_mlm.py \
When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise some split (customizable) of training data is used as validation.
```bash
python run_mlm.py \
---model_name_or_path distilbert-base-cased \
+--model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--train_file train_file_path
```
@@ -64,7 +64,7 @@ This script trains a causal language model.
### Example command
```bash
python run_clm.py \
---model_name_or_path distilgpt2 \
+--model_name_or_path distilbert/distilgpt2 \
--output_dir output \
--dataset_name wikitext \
--dataset_config_name wikitext-103-raw-v1
@@ -74,7 +74,7 @@ When using a custom dataset, the validation file can be separately passed as an
```bash
python run_clm.py \
---model_name_or_path distilgpt2 \
+--model_name_or_path distilbert/distilgpt2 \
--output_dir output \
--train_file train_file_path
```
diff --git a/examples/tensorflow/multiple-choice/README.md b/examples/tensorflow/multiple-choice/README.md
index 01e33fb62db..a7f499963ec 100644
--- a/examples/tensorflow/multiple-choice/README.md
+++ b/examples/tensorflow/multiple-choice/README.md
@@ -36,7 +36,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_swag.py \
- --model_name_or_path distilbert-base-cased \
+ --model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--do_eval \
--do_train
diff --git a/examples/tensorflow/question-answering/README.md b/examples/tensorflow/question-answering/README.md
index b347ffad81a..41cc8b7ef30 100644
--- a/examples/tensorflow/question-answering/README.md
+++ b/examples/tensorflow/question-answering/README.md
@@ -47,7 +47,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_qa.py \
---model_name_or_path distilbert-base-cased \
+--model_name_or_path distilbert/distilbert-base-cased \
--output_dir output \
--dataset_name squad \
--do_train \
diff --git a/examples/tensorflow/summarization/run_summarization.py b/examples/tensorflow/summarization/run_summarization.py
index 92c2f11d598..d4430227860 100644
--- a/examples/tensorflow/summarization/run_summarization.py
+++ b/examples/tensorflow/summarization/run_summarization.py
@@ -334,11 +334,11 @@ def main():
# region T5 special-casing
if data_args.source_prefix is None and model_args.model_name_or_path in [
- "t5-small",
- "t5-base",
- "t5-large",
- "t5-3b",
- "t5-11b",
+ "google-t5/t5-small",
+ "google-t5/t5-base",
+ "google-t5/t5-large",
+ "google-t5/t5-3b",
+ "google-t5/t5-11b",
]:
logger.warning(
"You're running a t5 model but didn't provide a source prefix, which is the expected, e.g. with "
diff --git a/examples/tensorflow/test_tensorflow_examples.py b/examples/tensorflow/test_tensorflow_examples.py
index b07d5f7df89..914ea767d0f 100644
--- a/examples/tensorflow/test_tensorflow_examples.py
+++ b/examples/tensorflow/test_tensorflow_examples.py
@@ -107,7 +107,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_text_classification.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--output_dir {tmp_dir}
--overwrite_output_dir
--train_file ./tests/fixtures/tests_samples/MRPC/train.csv
@@ -137,7 +137,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_clm.py
- --model_name_or_path distilgpt2
+ --model_name_or_path distilbert/distilgpt2
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--do_train
@@ -163,7 +163,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_mlm.py
- --model_name_or_path distilroberta-base
+ --model_name_or_path distilbert/distilroberta-base
--train_file ./tests/fixtures/sample_text.txt
--validation_file ./tests/fixtures/sample_text.txt
--max_seq_length 64
@@ -188,7 +188,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_ner.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/conll/sample.json
--validation_file tests/fixtures/tests_samples/conll/sample.json
--output_dir {tmp_dir}
@@ -212,7 +212,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_qa.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--version_2_with_negative
--train_file tests/fixtures/tests_samples/SQUAD/sample.json
--validation_file tests/fixtures/tests_samples/SQUAD/sample.json
@@ -237,7 +237,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_swag.py
- --model_name_or_path bert-base-uncased
+ --model_name_or_path google-bert/bert-base-uncased
--train_file tests/fixtures/tests_samples/swag/sample.json
--validation_file tests/fixtures/tests_samples/swag/sample.json
--output_dir {tmp_dir}
@@ -261,7 +261,7 @@ class ExamplesTests(TestCasePlus):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
run_summarization.py
- --model_name_or_path t5-small
+ --model_name_or_path google-t5/t5-small
--train_file tests/fixtures/tests_samples/xsum/sample.json
--validation_file tests/fixtures/tests_samples/xsum/sample.json
--output_dir {tmp_dir}
diff --git a/examples/tensorflow/text-classification/README.md b/examples/tensorflow/text-classification/README.md
index 39ce9153034..b8bc0b367c4 100644
--- a/examples/tensorflow/text-classification/README.md
+++ b/examples/tensorflow/text-classification/README.md
@@ -71,7 +71,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_text_classification.py \
---model_name_or_path distilbert-base-cased \
+--model_name_or_path distilbert/distilbert-base-cased \
--train_file training_data.json \
--validation_file validation_data.json \
--output_dir output/ \
@@ -103,7 +103,7 @@ README, but for more information you can see the 'Input Datasets' section of
### Example command
```bash
python run_glue.py \
---model_name_or_path distilbert-base-cased \
+--model_name_or_path distilbert/distilbert-base-cased \
--task_name mnli \
--do_train \
--do_eval \
diff --git a/examples/tensorflow/token-classification/README.md b/examples/tensorflow/token-classification/README.md
index 0e5ec84528f..6c8a15c00e8 100644
--- a/examples/tensorflow/token-classification/README.md
+++ b/examples/tensorflow/token-classification/README.md
@@ -27,7 +27,7 @@ The following example fine-tunes BERT on CoNLL-2003:
```bash
python run_ner.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--dataset_name conll2003 \
--output_dir /tmp/test-ner
```
@@ -36,7 +36,7 @@ To run on your own training and validation files, use the following command:
```bash
python run_ner.py \
- --model_name_or_path bert-base-uncased \
+ --model_name_or_path google-bert/bert-base-uncased \
--train_file path_to_train_file \
--validation_file path_to_validation_file \
--output_dir /tmp/test-ner
diff --git a/examples/tensorflow/translation/README.md b/examples/tensorflow/translation/README.md
index df5ee9c1ae3..bbe6e27e9c7 100644
--- a/examples/tensorflow/translation/README.md
+++ b/examples/tensorflow/translation/README.md
@@ -29,11 +29,11 @@ can also be used by passing the name of the TPU resource with the `--tpu` argume
MBart and some T5 models require special handling.
-T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
+T5 models `google-t5/t5-small`, `google-t5/t5-base`, `google-t5/t5-large`, `google-t5/t5-3b` and `google-t5/t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:
```bash
python run_translation.py \
- --model_name_or_path t5-small \
+ --model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--source_lang en \
diff --git a/hubconf.py b/hubconf.py
index f2ef70b73db..412cb27f638 100644
--- a/hubconf.py
+++ b/hubconf.py
@@ -41,12 +41,12 @@ def config(*args, **kwargs):
# Using torch.hub !
import torch
- config = torch.hub.load('huggingface/transformers', 'config', 'bert-base-uncased') # Download configuration from huggingface.co and cache.
+ config = torch.hub.load('huggingface/transformers', 'config', 'google-bert/bert-base-uncased') # Download configuration from huggingface.co and cache.
config = torch.hub.load('huggingface/transformers', 'config', './test/bert_saved_model/') # E.g. config (or model) was saved using `save_pretrained('./test/saved_model/')`
config = torch.hub.load('huggingface/transformers', 'config', './test/bert_saved_model/my_configuration.json')
- config = torch.hub.load('huggingface/transformers', 'config', 'bert-base-uncased', output_attentions=True, foo=False)
+ config = torch.hub.load('huggingface/transformers', 'config', 'google-bert/bert-base-uncased', output_attentions=True, foo=False)
assert config.output_attentions == True
- config, unused_kwargs = torch.hub.load('huggingface/transformers', 'config', 'bert-base-uncased', output_attentions=True, foo=False, return_unused_kwargs=True)
+ config, unused_kwargs = torch.hub.load('huggingface/transformers', 'config', 'google-bert/bert-base-uncased', output_attentions=True, foo=False, return_unused_kwargs=True)
assert config.output_attentions == True
assert unused_kwargs == {'foo': False}
@@ -61,7 +61,7 @@ def tokenizer(*args, **kwargs):
# Using torch.hub !
import torch
- tokenizer = torch.hub.load('huggingface/transformers', 'tokenizer', 'bert-base-uncased') # Download vocabulary from huggingface.co and cache.
+ tokenizer = torch.hub.load('huggingface/transformers', 'tokenizer', 'google-bert/bert-base-uncased') # Download vocabulary from huggingface.co and cache.
tokenizer = torch.hub.load('huggingface/transformers', 'tokenizer', './test/bert_saved_model/') # E.g. tokenizer was saved using `save_pretrained('./test/saved_model/')`
"""
@@ -75,9 +75,9 @@ def model(*args, **kwargs):
# Using torch.hub !
import torch
- model = torch.hub.load('huggingface/transformers', 'model', 'bert-base-uncased') # Download model and configuration from huggingface.co and cache.
+ model = torch.hub.load('huggingface/transformers', 'model', 'google-bert/bert-base-uncased') # Download model and configuration from huggingface.co and cache.
model = torch.hub.load('huggingface/transformers', 'model', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
- model = torch.hub.load('huggingface/transformers', 'model', 'bert-base-uncased', output_attentions=True) # Update configuration during loading
+ model = torch.hub.load('huggingface/transformers', 'model', 'google-bert/bert-base-uncased', output_attentions=True) # Update configuration during loading
assert model.config.output_attentions == True
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
config = AutoConfig.from_pretrained('./tf_model/bert_tf_model_config.json')
@@ -94,9 +94,9 @@ def modelForCausalLM(*args, **kwargs):
# Using torch.hub !
import torch
- model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', 'gpt2') # Download model and configuration from huggingface.co and cache.
+ model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', 'openai-community/gpt2') # Download model and configuration from huggingface.co and cache.
model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', './test/saved_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
- model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', 'gpt2', output_attentions=True) # Update configuration during loading
+ model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', 'openai-community/gpt2', output_attentions=True) # Update configuration during loading
assert model.config.output_attentions == True
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
config = AutoConfig.from_pretrained('./tf_model/gpt_tf_model_config.json')
@@ -112,9 +112,9 @@ def modelForMaskedLM(*args, **kwargs):
# Using torch.hub !
import torch
- model = torch.hub.load('huggingface/transformers', 'modelForMaskedLM', 'bert-base-uncased') # Download model and configuration from huggingface.co and cache.
+ model = torch.hub.load('huggingface/transformers', 'modelForMaskedLM', 'google-bert/bert-base-uncased') # Download model and configuration from huggingface.co and cache.
model = torch.hub.load('huggingface/transformers', 'modelForMaskedLM', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
- model = torch.hub.load('huggingface/transformers', 'modelForMaskedLM', 'bert-base-uncased', output_attentions=True) # Update configuration during loading
+ model = torch.hub.load('huggingface/transformers', 'modelForMaskedLM', 'google-bert/bert-base-uncased', output_attentions=True) # Update configuration during loading
assert model.config.output_attentions == True
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
config = AutoConfig.from_pretrained('./tf_model/bert_tf_model_config.json')
@@ -131,9 +131,9 @@ def modelForSequenceClassification(*args, **kwargs):
# Using torch.hub !
import torch
- model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', 'bert-base-uncased') # Download model and configuration from huggingface.co and cache.
+ model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', 'google-bert/bert-base-uncased') # Download model and configuration from huggingface.co and cache.
model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
- model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', 'bert-base-uncased', output_attentions=True) # Update configuration during loading
+ model = torch.hub.load('huggingface/transformers', 'modelForSequenceClassification', 'google-bert/bert-base-uncased', output_attentions=True) # Update configuration during loading
assert model.config.output_attentions == True
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
config = AutoConfig.from_pretrained('./tf_model/bert_tf_model_config.json')
@@ -150,9 +150,9 @@ def modelForQuestionAnswering(*args, **kwargs):
# Using torch.hub !
import torch
- model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', 'bert-base-uncased') # Download model and configuration from huggingface.co and cache.
+ model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', 'google-bert/bert-base-uncased') # Download model and configuration from huggingface.co and cache.
model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', './test/bert_model/') # E.g. model was saved using `save_pretrained('./test/saved_model/')`
- model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', 'bert-base-uncased', output_attentions=True) # Update configuration during loading
+ model = torch.hub.load('huggingface/transformers', 'modelForQuestionAnswering', 'google-bert/bert-base-uncased', output_attentions=True) # Update configuration during loading
assert model.config.output_attentions == True
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
config = AutoConfig.from_pretrained('./tf_model/bert_tf_model_config.json')
diff --git a/scripts/benchmark/trainer-benchmark.py b/scripts/benchmark/trainer-benchmark.py
index 903b4e0dd6d..9eab3f638d7 100755
--- a/scripts/benchmark/trainer-benchmark.py
+++ b/scripts/benchmark/trainer-benchmark.py
@@ -54,7 +54,7 @@
#
# CUDA_VISIBLE_DEVICES=0 python ./scripts/benchmark/trainer-benchmark.py \
# --base-cmd \
-# ' examples/pytorch/translation/run_translation.py --model_name_or_path t5-small \
+# ' examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small \
# --output_dir output_dir --do_train --label_smoothing 0.1 --logging_strategy no \
# --save_strategy no --per_device_train_batch_size 32 --max_source_length 512 \
# --max_target_length 512 --num_train_epochs 1 --overwrite_output_dir \
diff --git a/src/transformers/benchmark/benchmark_args_utils.py b/src/transformers/benchmark/benchmark_args_utils.py
index 48fcb311b43..b63d792986c 100644
--- a/src/transformers/benchmark/benchmark_args_utils.py
+++ b/src/transformers/benchmark/benchmark_args_utils.py
@@ -151,7 +151,7 @@ class BenchmarkArguments:
if len(self.models) <= 0:
raise ValueError(
"Please make sure you provide at least one model name / model identifier, *e.g.* `--models"
- " bert-base-cased` or `args.models = ['bert-base-cased']."
+ " google-bert/bert-base-cased` or `args.models = ['google-bert/bert-base-cased']."
)
return self.models
diff --git a/src/transformers/commands/add_new_model_like.py b/src/transformers/commands/add_new_model_like.py
index df86a22799a..3b7fcdf19f8 100644
--- a/src/transformers/commands/add_new_model_like.py
+++ b/src/transformers/commands/add_new_model_like.py
@@ -1674,7 +1674,7 @@ def get_user_input():
"What will be the name of the config class for this model? ", default_value=f"{model_camel_cased}Config"
)
checkpoint = get_user_field(
- "Please give a checkpoint identifier (on the model Hub) for this new model (e.g. facebook/roberta-base): "
+ "Please give a checkpoint identifier (on the model Hub) for this new model (e.g. FacebookAI/roberta-base): "
)
old_processing_classes = [
diff --git a/src/transformers/commands/train.py b/src/transformers/commands/train.py
index bdcbae9e01b..5c264dbb068 100644
--- a/src/transformers/commands/train.py
+++ b/src/transformers/commands/train.py
@@ -82,7 +82,7 @@ class TrainCommand(BaseTransformersCLICommand):
"--task", type=str, default="text_classification", help="Task to train the model on."
)
train_parser.add_argument(
- "--model", type=str, default="bert-base-uncased", help="Model's name or path to stored model."
+ "--model", type=str, default="google-bert/bert-base-uncased", help="Model's name or path to stored model."
)
train_parser.add_argument("--train_batch_size", type=int, default=32, help="Batch size for training.")
train_parser.add_argument("--valid_batch_size", type=int, default=64, help="Batch size for validation.")
diff --git a/src/transformers/configuration_utils.py b/src/transformers/configuration_utils.py
index bd7c5b0c7fe..819fe5fcf28 100755
--- a/src/transformers/configuration_utils.py
+++ b/src/transformers/configuration_utils.py
@@ -527,8 +527,7 @@ class PretrainedConfig(PushToHubMixin):
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~PretrainedConfig.save_pretrained`] method, e.g., `./my_model_directory/`.
- a path or url to a saved configuration JSON *file*, e.g., `./my_model_directory/configuration.json`.
@@ -581,16 +580,16 @@ class PretrainedConfig(PushToHubMixin):
# We can't instantiate directly the base class *PretrainedConfig* so let's show the examples on a
# derived class: BertConfig
config = BertConfig.from_pretrained(
- "bert-base-uncased"
+ "google-bert/bert-base-uncased"
) # Download configuration from huggingface.co and cache.
config = BertConfig.from_pretrained(
"./test/saved_model/"
) # E.g. config (or model) was saved using *save_pretrained('./test/saved_model/')*
config = BertConfig.from_pretrained("./test/saved_model/my_configuration.json")
- config = BertConfig.from_pretrained("bert-base-uncased", output_attentions=True, foo=False)
+ config = BertConfig.from_pretrained("google-bert/bert-base-uncased", output_attentions=True, foo=False)
assert config.output_attentions == True
config, unused_kwargs = BertConfig.from_pretrained(
- "bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
+ "google-bert/bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
)
assert config.output_attentions == True
assert unused_kwargs == {"foo": False}
diff --git a/src/transformers/convert_graph_to_onnx.py b/src/transformers/convert_graph_to_onnx.py
index 4538f381f2e..e3270bb9deb 100644
--- a/src/transformers/convert_graph_to_onnx.py
+++ b/src/transformers/convert_graph_to_onnx.py
@@ -61,9 +61,9 @@ class OnnxConverterArgumentParser(ArgumentParser):
"--model",
type=str,
required=True,
- help="Model's id or path (ex: bert-base-cased)",
+ help="Model's id or path (ex: google-bert/bert-base-cased)",
)
- self.add_argument("--tokenizer", type=str, help="Tokenizer's id or path (ex: bert-base-cased)")
+ self.add_argument("--tokenizer", type=str, help="Tokenizer's id or path (ex: google-bert/bert-base-cased)")
self.add_argument(
"--framework",
type=str,
diff --git a/src/transformers/convert_pytorch_checkpoint_to_tf2.py b/src/transformers/convert_pytorch_checkpoint_to_tf2.py
index 26b19a4e81f..12f89ff2e57 100755
--- a/src/transformers/convert_pytorch_checkpoint_to_tf2.py
+++ b/src/transformers/convert_pytorch_checkpoint_to_tf2.py
@@ -148,19 +148,19 @@ MODEL_CLASSES = {
BertForPreTraining,
BERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
),
- "bert-large-uncased-whole-word-masking-finetuned-squad": (
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": (
BertConfig,
TFBertForQuestionAnswering,
BertForQuestionAnswering,
BERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
),
- "bert-large-cased-whole-word-masking-finetuned-squad": (
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": (
BertConfig,
TFBertForQuestionAnswering,
BertForQuestionAnswering,
BERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
),
- "bert-base-cased-finetuned-mrpc": (
+ "google-bert/bert-base-cased-finetuned-mrpc": (
BertConfig,
TFBertForSequenceClassification,
BertForSequenceClassification,
@@ -178,7 +178,7 @@ MODEL_CLASSES = {
DPR_QUESTION_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST,
DPR_READER_PRETRAINED_MODEL_ARCHIVE_LIST,
),
- "gpt2": (
+ "openai-community/gpt2": (
GPT2Config,
TFGPT2LMHeadModel,
GPT2LMHeadModel,
@@ -208,7 +208,7 @@ MODEL_CLASSES = {
TransfoXLLMHeadModel,
TRANSFO_XL_PRETRAINED_CONFIG_ARCHIVE_MAP,
),
- "openai-gpt": (
+ "openai-community/openai-gpt": (
OpenAIGPTConfig,
TFOpenAIGPTLMHeadModel,
OpenAIGPTLMHeadModel,
@@ -227,7 +227,7 @@ MODEL_CLASSES = {
LayoutLMForMaskedLM,
LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST,
),
- "roberta-large-mnli": (
+ "FacebookAI/roberta-large-mnli": (
RobertaConfig,
TFRobertaForSequenceClassification,
RobertaForSequenceClassification,
@@ -269,7 +269,7 @@ MODEL_CLASSES = {
LxmertVisualFeatureEncoder,
LXMERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
),
- "ctrl": (
+ "Salesforce/ctrl": (
CTRLConfig,
TFCTRLLMHeadModel,
CTRLLMHeadModel,
diff --git a/src/transformers/convert_tf_hub_seq_to_seq_bert_to_pytorch.py b/src/transformers/convert_tf_hub_seq_to_seq_bert_to_pytorch.py
index 9be405f4719..2b003d4bc48 100755
--- a/src/transformers/convert_tf_hub_seq_to_seq_bert_to_pytorch.py
+++ b/src/transformers/convert_tf_hub_seq_to_seq_bert_to_pytorch.py
@@ -33,7 +33,7 @@ logging.set_verbosity_info()
def convert_tf_checkpoint_to_pytorch(tf_hub_path, pytorch_dump_path, is_encoder_named_decoder, vocab_size, is_encoder):
# Initialise PyTorch model
bert_config = BertConfig.from_pretrained(
- "bert-large-cased",
+ "google-bert/bert-large-cased",
vocab_size=vocab_size,
max_position_embeddings=512,
is_decoder=True,
diff --git a/src/transformers/dynamic_module_utils.py b/src/transformers/dynamic_module_utils.py
index 7cdc0ad93d5..2236b30f778 100644
--- a/src/transformers/dynamic_module_utils.py
+++ b/src/transformers/dynamic_module_utils.py
@@ -224,8 +224,7 @@ def get_cached_module_file(
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced
- under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
@@ -401,6 +400,8 @@ def get_class_from_dynamic_module(
+
+
Args:
class_reference (`str`):
The full name of the class to load, including its module and optionally its repo.
@@ -408,8 +409,7 @@ def get_class_from_dynamic_module(
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced
- under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
diff --git a/src/transformers/feature_extraction_utils.py b/src/transformers/feature_extraction_utils.py
index fe1f7a78c93..bed343e48d6 100644
--- a/src/transformers/feature_extraction_utils.py
+++ b/src/transformers/feature_extraction_utils.py
@@ -281,8 +281,7 @@ class FeatureExtractionMixin(PushToHubMixin):
This can be either:
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a feature extractor file saved using the
[`~feature_extraction_utils.FeatureExtractionMixin.save_pretrained`] method, e.g.,
`./my_model_directory/`.
diff --git a/src/transformers/generation/configuration_utils.py b/src/transformers/generation/configuration_utils.py
index 4c3cdc12a44..ad8cfd796b4 100644
--- a/src/transformers/generation/configuration_utils.py
+++ b/src/transformers/generation/configuration_utils.py
@@ -636,8 +636,7 @@ class GenerationConfig(PushToHubMixin):
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~GenerationConfig.save_pretrained`] method, e.g., `./my_model_directory/`.
config_file_name (`str` or `os.PathLike`, *optional*, defaults to `"generation_config.json"`):
@@ -691,7 +690,7 @@ class GenerationConfig(PushToHubMixin):
>>> from transformers import GenerationConfig
>>> # Download configuration from huggingface.co and cache.
- >>> generation_config = GenerationConfig.from_pretrained("gpt2")
+ >>> generation_config = GenerationConfig.from_pretrained("openai-community/gpt2")
>>> # E.g. config was saved using *save_pretrained('./test/saved_model/')*
>>> generation_config.save_pretrained("./test/saved_model/")
@@ -704,7 +703,7 @@ class GenerationConfig(PushToHubMixin):
>>> # If you'd like to try a minor variation to an existing configuration, you can also pass generation
>>> # arguments to `.from_pretrained()`. Be mindful that typos and unused arguments will be ignored
>>> generation_config, unused_kwargs = GenerationConfig.from_pretrained(
- ... "gpt2", top_k=1, foo=False, do_sample=True, return_unused_kwargs=True
+ ... "openai-community/gpt2", top_k=1, foo=False, do_sample=True, return_unused_kwargs=True
... )
>>> generation_config.top_k
1
diff --git a/src/transformers/generation/logits_process.py b/src/transformers/generation/logits_process.py
index 04120e39fbd..aa773f3bc6a 100644
--- a/src/transformers/generation/logits_process.py
+++ b/src/transformers/generation/logits_process.py
@@ -246,8 +246,8 @@ class TemperatureLogitsWarper(LogitsWarper):
>>> set_seed(0) # for reproducibility
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> model.config.pad_token_id = model.config.eos_token_id
>>> inputs = tokenizer(["Hugging Face Company is"], return_tensors="pt")
@@ -306,8 +306,8 @@ class RepetitionPenaltyLogitsProcessor(LogitsProcessor):
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> # Initializing the model and tokenizer for it
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer(["I'm not going to"], return_tensors="pt")
>>> # This shows a normal generate without any specific parameters
@@ -414,8 +414,8 @@ class TopPLogitsWarper(LogitsWarper):
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
>>> set_seed(0)
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer("A sequence: 1, 2", return_tensors="pt")
@@ -478,8 +478,8 @@ class TopKLogitsWarper(LogitsWarper):
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
>>> set_seed(0)
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer("A sequence: A, B, C, D", return_tensors="pt")
@@ -619,8 +619,8 @@ class EpsilonLogitsWarper(LogitsWarper):
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
>>> set_seed(0)
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer("A sequence: 1, 2", return_tensors="pt")
@@ -696,8 +696,8 @@ class EtaLogitsWarper(LogitsWarper):
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
>>> set_seed(0)
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer("A sequence: 1, 2", return_tensors="pt")
@@ -840,8 +840,8 @@ class NoRepeatNGramLogitsProcessor(LogitsProcessor):
```py
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer(["Today I"], return_tensors="pt")
>>> output = model.generate(**inputs)
@@ -967,8 +967,8 @@ class SequenceBiasLogitsProcessor(LogitsProcessor):
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> inputs = tokenizer(["The full name of Donald is Donald"], return_tensors="pt")
>>> summary_ids = model.generate(inputs["input_ids"], max_new_tokens=4)
@@ -976,7 +976,7 @@ class SequenceBiasLogitsProcessor(LogitsProcessor):
The full name of Donald is Donald J. Trump Jr
>>> # Now let's control generation through a bias. Please note that the tokenizer is initialized differently!
- >>> tokenizer_with_prefix_space = AutoTokenizer.from_pretrained("gpt2", add_prefix_space=True)
+ >>> tokenizer_with_prefix_space = AutoTokenizer.from_pretrained("openai-community/gpt2", add_prefix_space=True)
>>> def get_tokens_as_tuple(word):
@@ -1112,8 +1112,8 @@ class NoBadWordsLogitsProcessor(SequenceBiasLogitsProcessor):
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> inputs = tokenizer(["In a word, the cake is a"], return_tensors="pt")
>>> output_ids = model.generate(inputs["input_ids"], max_new_tokens=5, pad_token_id=tokenizer.eos_token_id)
@@ -1121,7 +1121,7 @@ class NoBadWordsLogitsProcessor(SequenceBiasLogitsProcessor):
In a word, the cake is a bit of a mess.
>>> # Now let's take the bad words out. Please note that the tokenizer is initialized differently
- >>> tokenizer_with_prefix_space = AutoTokenizer.from_pretrained("gpt2", add_prefix_space=True)
+ >>> tokenizer_with_prefix_space = AutoTokenizer.from_pretrained("openai-community/gpt2", add_prefix_space=True)
>>> def get_tokens_as_list(word_list):
@@ -1272,8 +1272,8 @@ class HammingDiversityLogitsProcessor(LogitsProcessor):
>>> import torch
>>> # Initialize the model and tokenizer
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
- >>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> # A long text about the solar system
>>> text = (
@@ -1436,8 +1436,8 @@ class ForcedEOSTokenLogitsProcessor(LogitsProcessor):
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer("A sequence: 1, 2, 3", return_tensors="pt")
@@ -1511,8 +1511,8 @@ class ExponentialDecayLengthPenalty(LogitsProcessor):
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> text = "Just wanted to let you know, I"
>>> inputs = tokenizer(text, return_tensors="pt")
@@ -1595,8 +1595,8 @@ class LogitNormalization(LogitsProcessor, LogitsWarper):
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> import torch
- >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> inputs = tokenizer("A sequence: 1, 2, 3", return_tensors="pt")
@@ -2083,8 +2083,8 @@ class UnbatchedClassifierFreeGuidanceLogitsProcessor(LogitsProcessor):
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> inputs = tokenizer(["Today, a dragon flew over Paris, France,"], return_tensors="pt")
>>> out = model.generate(inputs["input_ids"], guidance_scale=1.5)
>>> tokenizer.batch_decode(out, skip_special_tokens=True)[0]
diff --git a/src/transformers/generation/streamers.py b/src/transformers/generation/streamers.py
index 4b299db5da6..c75b43466af 100644
--- a/src/transformers/generation/streamers.py
+++ b/src/transformers/generation/streamers.py
@@ -58,8 +58,8 @@ class TextStreamer(BaseStreamer):
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
- >>> tok = AutoTokenizer.from_pretrained("gpt2")
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tok = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> inputs = tok(["An increasing sequence: one,"], return_tensors="pt")
>>> streamer = TextStreamer(tok)
@@ -185,8 +185,8 @@ class TextIteratorStreamer(TextStreamer):
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
>>> from threading import Thread
- >>> tok = AutoTokenizer.from_pretrained("gpt2")
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tok = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> inputs = tok(["An increasing sequence: one,"], return_tensors="pt")
>>> streamer = TextIteratorStreamer(tok)
diff --git a/src/transformers/generation/tf_utils.py b/src/transformers/generation/tf_utils.py
index 7e015d718e7..3021e1e5594 100644
--- a/src/transformers/generation/tf_utils.py
+++ b/src/transformers/generation/tf_utils.py
@@ -511,8 +511,8 @@ class TFGenerationMixin:
>>> from transformers import GPT2Tokenizer, TFAutoModelForCausalLM
>>> import numpy as np
- >>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
- >>> model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> tokenizer.pad_token_id = tokenizer.eos_token_id
>>> inputs = tokenizer(["Today is"], return_tensors="tf")
@@ -1583,8 +1583,8 @@ class TFGenerationMixin:
... TFMinLengthLogitsProcessor,
... )
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id
@@ -1857,8 +1857,8 @@ class TFGenerationMixin:
... TFTemperatureLogitsWarper,
... )
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = TFAutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
- >>> # set pad_token_id to eos_token_id because GPT2 does not have a EOS token
+ >>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id
@@ -2180,8 +2180,8 @@ class TFGenerationMixin:
... )
>>> import tensorflow as tf
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
- >>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ >>> model = TFAutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="tf").input_ids
diff --git a/src/transformers/generation/utils.py b/src/transformers/generation/utils.py
index 87d14d2c85e..d131b2f8d59 100644
--- a/src/transformers/generation/utils.py
+++ b/src/transformers/generation/utils.py
@@ -976,7 +976,7 @@ class GenerationMixin:
>>> import numpy as np
- >>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ >>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> tokenizer.pad_token_id = tokenizer.eos_token_id
>>> inputs = tokenizer(["Today is"], return_tensors="pt")
@@ -2263,8 +2263,8 @@ class GenerationMixin:
... MaxLengthCriteria,
... )
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id
@@ -2530,8 +2530,8 @@ class GenerationMixin:
... )
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
- >>> # set pad_token_id to eos_token_id because GPT2 does not have a EOS token
+ >>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.config.pad_token_id = model.config.eos_token_id
@@ -2838,8 +2838,8 @@ class GenerationMixin:
... )
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
- >>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
@@ -2959,7 +2959,16 @@ class GenerationMixin:
if sequential:
if any(
model_name in self.__class__.__name__.lower()
- for model_name in ["fsmt", "reformer", "bloom", "ctrl", "gpt_bigcode", "transo_xl", "xlnet", "cpm"]
+ for model_name in [
+ "fsmt",
+ "reformer",
+ "bloom",
+ "ctrl",
+ "gpt_bigcode",
+ "transo_xl",
+ "xlnet",
+ "cpm",
+ ]
):
raise RuntimeError(
f"Currently generation for {self.__class__.__name__} is not supported "
@@ -3203,8 +3212,8 @@ class GenerationMixin:
... )
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
- >>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
@@ -3535,8 +3544,8 @@ class GenerationMixin:
... )
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
- >>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
@@ -3925,8 +3934,8 @@ class GenerationMixin:
... )
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
- >>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
@@ -4277,9 +4286,9 @@ class GenerationMixin:
... MaxLengthCriteria,
... )
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = AutoModelForCausalLM.from_pretrained("gpt2")
- >>> assistant_model = AutoModelForCausalLM.from_pretrained("distilgpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+ >>> assistant_model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
>>> # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
>>> model.generation_config.pad_token_id = model.generation_config.eos_token_id
>>> input_prompt = "It might be possible to"
diff --git a/src/transformers/image_processing_utils.py b/src/transformers/image_processing_utils.py
index 4a7b06621a4..a2004a8b559 100644
--- a/src/transformers/image_processing_utils.py
+++ b/src/transformers/image_processing_utils.py
@@ -111,8 +111,7 @@ class ImageProcessingMixin(PushToHubMixin):
This can be either:
- a string, the *model id* of a pretrained image_processor hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a image processor file saved using the
[`~image_processing_utils.ImageProcessingMixin.save_pretrained`] method, e.g.,
`./my_model_directory/`.
diff --git a/src/transformers/integrations/bitsandbytes.py b/src/transformers/integrations/bitsandbytes.py
index 43aeaf6708d..d58e749f824 100644
--- a/src/transformers/integrations/bitsandbytes.py
+++ b/src/transformers/integrations/bitsandbytes.py
@@ -76,7 +76,7 @@ def set_module_quantized_tensor_to_device(module, tensor_name, device, value=Non
else:
new_value = torch.tensor(value, device="cpu")
- # Support models using `Conv1D` in place of `nn.Linear` (e.g. gpt2) by transposing the weight matrix prior to quantization.
+ # Support models using `Conv1D` in place of `nn.Linear` (e.g. openai-community/gpt2) by transposing the weight matrix prior to quantization.
# Since weights are saved in the correct "orientation", we skip transposing when loading.
if issubclass(module.source_cls, Conv1D) and not prequantized_loading:
new_value = new_value.T
diff --git a/src/transformers/modelcard.py b/src/transformers/modelcard.py
index 9e8f2becae0..4776737a374 100644
--- a/src/transformers/modelcard.py
+++ b/src/transformers/modelcard.py
@@ -131,8 +131,6 @@ class ModelCard:
pretrained_model_name_or_path: either:
- a string, the *model id* of a pretrained model card hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- a path to a *directory* containing a model card file saved using the [`~ModelCard.save_pretrained`]
method, e.g.: `./my_model_directory/`.
- a path or url to a saved model card JSON *file*, e.g.: `./my_model_directory/modelcard.json`.
@@ -163,11 +161,11 @@ class ModelCard:
```python
# Download model card from huggingface.co and cache.
- modelcard = ModelCard.from_pretrained("bert-base-uncased")
+ modelcard = ModelCard.from_pretrained("google-bert/bert-base-uncased")
# Model card was saved using *save_pretrained('./test/saved_model/')*
modelcard = ModelCard.from_pretrained("./test/saved_model/")
modelcard = ModelCard.from_pretrained("./test/saved_model/modelcard.json")
- modelcard = ModelCard.from_pretrained("bert-base-uncased", output_attentions=True, foo=False)
+ modelcard = ModelCard.from_pretrained("google-bert/bert-base-uncased", output_attentions=True, foo=False)
```"""
cache_dir = kwargs.pop("cache_dir", None)
proxies = kwargs.pop("proxies", None)
diff --git a/src/transformers/modeling_flax_utils.py b/src/transformers/modeling_flax_utils.py
index b57458c0826..eaf5410bc2f 100644
--- a/src/transformers/modeling_flax_utils.py
+++ b/src/transformers/modeling_flax_utils.py
@@ -347,14 +347,14 @@ class FlaxPreTrainedModel(PushToHubMixin, FlaxGenerationMixin):
>>> from transformers import FlaxBertModel
>>> # load model
- >>> model = FlaxBertModel.from_pretrained("bert-base-cased")
+ >>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model parameters will be in fp32 precision, to cast these to bfloat16 precision
>>> model.params = model.to_bf16(model.params)
- >>> # If you want don't want to cast certain parameters (for example layer norm bias and scale)
+ >>> # If you don't want to cast certain parameters (for example layer norm bias and scale)
>>> # then pass the mask as follows
>>> from flax import traverse_util
- >>> model = FlaxBertModel.from_pretrained("bert-base-cased")
+ >>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> flat_params = traverse_util.flatten_dict(model.params)
>>> mask = {
- ... path: (path[-2] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
+ ... path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
@@ -383,7 +383,7 @@ class FlaxPreTrainedModel(PushToHubMixin, FlaxGenerationMixin):
>>> from transformers import FlaxBertModel
>>> # Download model and configuration from huggingface.co
- >>> model = FlaxBertModel.from_pretrained("bert-base-cased")
+ >>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model params will be in fp32, to illustrate the use of this method,
>>> # we'll first cast to fp16 and back to fp32
- >>> model.params = model.to_f16(model.params)
+ >>> model.params = model.to_fp16(model.params)
@@ -413,14 +413,14 @@ class FlaxPreTrainedModel(PushToHubMixin, FlaxGenerationMixin):
>>> from transformers import FlaxBertModel
>>> # load model
- >>> model = FlaxBertModel.from_pretrained("bert-base-cased")
+ >>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model params will be in fp32, to cast these to float16
>>> model.params = model.to_fp16(model.params)
- >>> # If you want don't want to cast certain parameters (for example layer norm bias and scale)
+ >>> # If you don't want to cast certain parameters (for example layer norm bias and scale)
>>> # then pass the mask as follows
>>> from flax import traverse_util
- >>> model = FlaxBertModel.from_pretrained("bert-base-cased")
+ >>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> flat_params = traverse_util.flatten_dict(model.params)
>>> mask = {
- ... path: (path[-2] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
+ ... path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
@@ -545,8 +545,6 @@ class FlaxPreTrainedModel(PushToHubMixin, FlaxGenerationMixin):
Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *pt index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In this case,
@@ -639,7 +637,7 @@ class FlaxPreTrainedModel(PushToHubMixin, FlaxGenerationMixin):
>>> from transformers import BertConfig, FlaxBertModel
>>> # Download model and configuration from huggingface.co and cache.
- >>> model = FlaxBertModel.from_pretrained("bert-base-cased")
+ >>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = FlaxBertModel.from_pretrained("./test/saved_model/")
>>> # Loading from a PyTorch checkpoint file instead of a PyTorch model (slower, for example purposes, not runnable).
diff --git a/src/transformers/modeling_tf_utils.py b/src/transformers/modeling_tf_utils.py
index f8b1122d467..92f713a9706 100644
--- a/src/transformers/modeling_tf_utils.py
+++ b/src/transformers/modeling_tf_utils.py
@@ -2493,8 +2493,6 @@ class TFPreTrainedModel(keras.Model, TFModelUtilsMixin, TFGenerationMixin, PushT
Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch state_dict save file* (e.g, `./pt_model/pytorch_model.bin`). In this
@@ -2592,11 +2590,11 @@ class TFPreTrainedModel(keras.Model, TFModelUtilsMixin, TFGenerationMixin, PushT
>>> from transformers import BertConfig, TFBertModel
>>> # Download model and configuration from huggingface.co and cache.
- >>> model = TFBertModel.from_pretrained("bert-base-uncased")
+ >>> model = TFBertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = TFBertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
- >>> model = TFBertModel.from_pretrained("bert-base-uncased", output_attentions=True)
+ >>> model = TFBertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
>>> # Loading from a Pytorch model file instead of a TensorFlow checkpoint (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./pt_model/my_pt_model_config.json")
@@ -3075,7 +3073,7 @@ class TFPreTrainedModel(keras.Model, TFModelUtilsMixin, TFGenerationMixin, PushT
```python
from transformers import TFAutoModel
- model = TFAutoModel.from_pretrained("bert-base-cased")
+ model = TFAutoModel.from_pretrained("google-bert/bert-base-cased")
# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")
diff --git a/src/transformers/modeling_utils.py b/src/transformers/modeling_utils.py
index 0d9050f5fad..25731ced53f 100644
--- a/src/transformers/modeling_utils.py
+++ b/src/transformers/modeling_utils.py
@@ -1251,7 +1251,7 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
```python
from transformers import AutoModel
- model = AutoModel.from_pretrained("bert-base-cased")
+ model = AutoModel.from_pretrained("google-bert/bert-base-cased")
model.add_model_tags(["custom", "custom-bert"])
@@ -2608,8 +2608,6 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -2788,17 +2786,17 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
>>> from transformers import BertConfig, BertModel
>>> # Download model and configuration from huggingface.co and cache.
- >>> model = BertModel.from_pretrained("bert-base-uncased")
+ >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = BertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
- >>> model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
+ >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./tf_model/my_tf_model_config.json")
>>> model = BertModel.from_pretrained("./tf_model/my_tf_checkpoint.ckpt.index", from_tf=True, config=config)
>>> # Loading from a Flax checkpoint file instead of a PyTorch model (slower)
- >>> model = BertModel.from_pretrained("bert-base-uncased", from_flax=True)
+ >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)
```
* `low_cpu_mem_usage` algorithm:
diff --git a/src/transformers/models/albert/configuration_albert.py b/src/transformers/models/albert/configuration_albert.py
index cacc0499035..690be7fbbf2 100644
--- a/src/transformers/models/albert/configuration_albert.py
+++ b/src/transformers/models/albert/configuration_albert.py
@@ -22,14 +22,14 @@ from ...onnx import OnnxConfig
ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "albert-base-v1": "https://huggingface.co/albert-base-v1/resolve/main/config.json",
- "albert-large-v1": "https://huggingface.co/albert-large-v1/resolve/main/config.json",
- "albert-xlarge-v1": "https://huggingface.co/albert-xlarge-v1/resolve/main/config.json",
- "albert-xxlarge-v1": "https://huggingface.co/albert-xxlarge-v1/resolve/main/config.json",
- "albert-base-v2": "https://huggingface.co/albert-base-v2/resolve/main/config.json",
- "albert-large-v2": "https://huggingface.co/albert-large-v2/resolve/main/config.json",
- "albert-xlarge-v2": "https://huggingface.co/albert-xlarge-v2/resolve/main/config.json",
- "albert-xxlarge-v2": "https://huggingface.co/albert-xxlarge-v2/resolve/main/config.json",
+ "albert/albert-base-v1": "https://huggingface.co/albert/albert-base-v1/resolve/main/config.json",
+ "albert/albert-large-v1": "https://huggingface.co/albert/albert-large-v1/resolve/main/config.json",
+ "albert/albert-xlarge-v1": "https://huggingface.co/albert/albert-xlarge-v1/resolve/main/config.json",
+ "albert/albert-xxlarge-v1": "https://huggingface.co/albert/albert-xxlarge-v1/resolve/main/config.json",
+ "albert/albert-base-v2": "https://huggingface.co/albert/albert-base-v2/resolve/main/config.json",
+ "albert/albert-large-v2": "https://huggingface.co/albert/albert-large-v2/resolve/main/config.json",
+ "albert/albert-xlarge-v2": "https://huggingface.co/albert/albert-xlarge-v2/resolve/main/config.json",
+ "albert/albert-xxlarge-v2": "https://huggingface.co/albert/albert-xxlarge-v2/resolve/main/config.json",
}
@@ -38,7 +38,7 @@ class AlbertConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`AlbertModel`] or a [`TFAlbertModel`]. It is used
to instantiate an ALBERT model according to the specified arguments, defining the model architecture. Instantiating
a configuration with the defaults will yield a similar configuration to that of the ALBERT
- [albert-xxlarge-v2](https://huggingface.co/albert-xxlarge-v2) architecture.
+ [albert/albert-xxlarge-v2](https://huggingface.co/albert/albert-xxlarge-v2) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/albert/modeling_albert.py b/src/transformers/models/albert/modeling_albert.py
index fe6b3773233..25ae832b03a 100755
--- a/src/transformers/models/albert/modeling_albert.py
+++ b/src/transformers/models/albert/modeling_albert.py
@@ -48,19 +48,19 @@ from .configuration_albert import AlbertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "albert-base-v2"
+_CHECKPOINT_FOR_DOC = "albert/albert-base-v2"
_CONFIG_FOR_DOC = "AlbertConfig"
ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "albert-base-v1",
- "albert-large-v1",
- "albert-xlarge-v1",
- "albert-xxlarge-v1",
- "albert-base-v2",
- "albert-large-v2",
- "albert-xlarge-v2",
- "albert-xxlarge-v2",
+ "albert/albert-base-v1",
+ "albert/albert-large-v1",
+ "albert/albert-xlarge-v1",
+ "albert/albert-xxlarge-v1",
+ "albert/albert-base-v2",
+ "albert/albert-large-v2",
+ "albert/albert-xlarge-v2",
+ "albert/albert-xxlarge-v2",
# See all ALBERT models at https://huggingface.co/models?filter=albert
]
@@ -816,8 +816,8 @@ class AlbertForPreTraining(AlbertPreTrainedModel):
>>> from transformers import AutoTokenizer, AlbertForPreTraining
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
- >>> model = AlbertForPreTraining.from_pretrained("albert-base-v2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
+ >>> model = AlbertForPreTraining.from_pretrained("albert/albert-base-v2")
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)
>>> # Batch size 1
@@ -958,8 +958,8 @@ class AlbertForMaskedLM(AlbertPreTrainedModel):
>>> import torch
>>> from transformers import AutoTokenizer, AlbertForMaskedLM
- >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
- >>> model = AlbertForMaskedLM.from_pretrained("albert-base-v2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
+ >>> model = AlbertForMaskedLM.from_pretrained("albert/albert-base-v2")
>>> # add mask_token
>>> inputs = tokenizer("The capital of [MASK] is Paris.", return_tensors="pt")
diff --git a/src/transformers/models/albert/modeling_flax_albert.py b/src/transformers/models/albert/modeling_flax_albert.py
index 6333f0bd3ac..b2c01ded361 100644
--- a/src/transformers/models/albert/modeling_flax_albert.py
+++ b/src/transformers/models/albert/modeling_flax_albert.py
@@ -47,7 +47,7 @@ from .configuration_albert import AlbertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "albert-base-v2"
+_CHECKPOINT_FOR_DOC = "albert/albert-base-v2"
_CONFIG_FOR_DOC = "AlbertConfig"
@@ -754,8 +754,8 @@ FLAX_ALBERT_FOR_PRETRAINING_DOCSTRING = """
```python
>>> from transformers import AutoTokenizer, FlaxAlbertForPreTraining
- >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
- >>> model = FlaxAlbertForPreTraining.from_pretrained("albert-base-v2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
+ >>> model = FlaxAlbertForPreTraining.from_pretrained("albert/albert-base-v2")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
>>> outputs = model(**inputs)
diff --git a/src/transformers/models/albert/modeling_tf_albert.py b/src/transformers/models/albert/modeling_tf_albert.py
index acdc8c886c5..1225465c526 100644
--- a/src/transformers/models/albert/modeling_tf_albert.py
+++ b/src/transformers/models/albert/modeling_tf_albert.py
@@ -62,18 +62,18 @@ from .configuration_albert import AlbertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "albert-base-v2"
+_CHECKPOINT_FOR_DOC = "albert/albert-base-v2"
_CONFIG_FOR_DOC = "AlbertConfig"
TF_ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "albert-base-v1",
- "albert-large-v1",
- "albert-xlarge-v1",
- "albert-xxlarge-v1",
- "albert-base-v2",
- "albert-large-v2",
- "albert-xlarge-v2",
- "albert-xxlarge-v2",
+ "albert/albert-base-v1",
+ "albert/albert-large-v1",
+ "albert/albert-xlarge-v1",
+ "albert/albert-xxlarge-v1",
+ "albert/albert-base-v2",
+ "albert/albert-large-v2",
+ "albert/albert-xlarge-v2",
+ "albert/albert-xxlarge-v2",
# See all ALBERT models at https://huggingface.co/models?filter=albert
]
@@ -971,8 +971,8 @@ class TFAlbertForPreTraining(TFAlbertPreTrainedModel, TFAlbertPreTrainingLoss):
>>> import tensorflow as tf
>>> from transformers import AutoTokenizer, TFAlbertForPreTraining
- >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
- >>> model = TFAlbertForPreTraining.from_pretrained("albert-base-v2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
+ >>> model = TFAlbertForPreTraining.from_pretrained("albert/albert-base-v2")
>>> input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True))[None, :]
>>> # Batch size 1
@@ -1103,8 +1103,8 @@ class TFAlbertForMaskedLM(TFAlbertPreTrainedModel, TFMaskedLanguageModelingLoss)
>>> import tensorflow as tf
>>> from transformers import AutoTokenizer, TFAlbertForMaskedLM
- >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
- >>> model = TFAlbertForMaskedLM.from_pretrained("albert-base-v2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
+ >>> model = TFAlbertForMaskedLM.from_pretrained("albert/albert-base-v2")
>>> # add mask_token
>>> inputs = tokenizer(f"The capital of [MASK] is Paris.", return_tensors="tf")
diff --git a/src/transformers/models/albert/tokenization_albert.py b/src/transformers/models/albert/tokenization_albert.py
index 3ff31919952..7baaa0a6000 100644
--- a/src/transformers/models/albert/tokenization_albert.py
+++ b/src/transformers/models/albert/tokenization_albert.py
@@ -31,26 +31,26 @@ VOCAB_FILES_NAMES = {"vocab_file": "spiece.model"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "albert-base-v1": "https://huggingface.co/albert-base-v1/resolve/main/spiece.model",
- "albert-large-v1": "https://huggingface.co/albert-large-v1/resolve/main/spiece.model",
- "albert-xlarge-v1": "https://huggingface.co/albert-xlarge-v1/resolve/main/spiece.model",
- "albert-xxlarge-v1": "https://huggingface.co/albert-xxlarge-v1/resolve/main/spiece.model",
- "albert-base-v2": "https://huggingface.co/albert-base-v2/resolve/main/spiece.model",
- "albert-large-v2": "https://huggingface.co/albert-large-v2/resolve/main/spiece.model",
- "albert-xlarge-v2": "https://huggingface.co/albert-xlarge-v2/resolve/main/spiece.model",
- "albert-xxlarge-v2": "https://huggingface.co/albert-xxlarge-v2/resolve/main/spiece.model",
+ "albert/albert-base-v1": "https://huggingface.co/albert/albert-base-v1/resolve/main/spiece.model",
+ "albert/albert-large-v1": "https://huggingface.co/albert/albert-large-v1/resolve/main/spiece.model",
+ "albert/albert-xlarge-v1": "https://huggingface.co/albert/albert-xlarge-v1/resolve/main/spiece.model",
+ "albert/albert-xxlarge-v1": "https://huggingface.co/albert/albert-xxlarge-v1/resolve/main/spiece.model",
+ "albert/albert-base-v2": "https://huggingface.co/albert/albert-base-v2/resolve/main/spiece.model",
+ "albert/albert-large-v2": "https://huggingface.co/albert/albert-large-v2/resolve/main/spiece.model",
+ "albert/albert-xlarge-v2": "https://huggingface.co/albert/albert-xlarge-v2/resolve/main/spiece.model",
+ "albert/albert-xxlarge-v2": "https://huggingface.co/albert/albert-xxlarge-v2/resolve/main/spiece.model",
}
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "albert-base-v1": 512,
- "albert-large-v1": 512,
- "albert-xlarge-v1": 512,
- "albert-xxlarge-v1": 512,
- "albert-base-v2": 512,
- "albert-large-v2": 512,
- "albert-xlarge-v2": 512,
- "albert-xxlarge-v2": 512,
+ "albert/albert-base-v1": 512,
+ "albert/albert-large-v1": 512,
+ "albert/albert-xlarge-v1": 512,
+ "albert/albert-xxlarge-v1": 512,
+ "albert/albert-base-v2": 512,
+ "albert/albert-large-v2": 512,
+ "albert/albert-xlarge-v2": 512,
+ "albert/albert-xxlarge-v2": 512,
}
SPIECE_UNDERLINE = "▁"
diff --git a/src/transformers/models/albert/tokenization_albert_fast.py b/src/transformers/models/albert/tokenization_albert_fast.py
index 200953f8e6b..91cf403d07e 100644
--- a/src/transformers/models/albert/tokenization_albert_fast.py
+++ b/src/transformers/models/albert/tokenization_albert_fast.py
@@ -34,36 +34,36 @@ VOCAB_FILES_NAMES = {"vocab_file": "spiece.model", "tokenizer_file": "tokenizer.
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "albert-base-v1": "https://huggingface.co/albert-base-v1/resolve/main/spiece.model",
- "albert-large-v1": "https://huggingface.co/albert-large-v1/resolve/main/spiece.model",
- "albert-xlarge-v1": "https://huggingface.co/albert-xlarge-v1/resolve/main/spiece.model",
- "albert-xxlarge-v1": "https://huggingface.co/albert-xxlarge-v1/resolve/main/spiece.model",
- "albert-base-v2": "https://huggingface.co/albert-base-v2/resolve/main/spiece.model",
- "albert-large-v2": "https://huggingface.co/albert-large-v2/resolve/main/spiece.model",
- "albert-xlarge-v2": "https://huggingface.co/albert-xlarge-v2/resolve/main/spiece.model",
- "albert-xxlarge-v2": "https://huggingface.co/albert-xxlarge-v2/resolve/main/spiece.model",
+ "albert/albert-base-v1": "https://huggingface.co/albert/albert-base-v1/resolve/main/spiece.model",
+ "albert/albert-large-v1": "https://huggingface.co/albert/albert-large-v1/resolve/main/spiece.model",
+ "albert/albert-xlarge-v1": "https://huggingface.co/albert/albert-xlarge-v1/resolve/main/spiece.model",
+ "albert/albert-xxlarge-v1": "https://huggingface.co/albert/albert-xxlarge-v1/resolve/main/spiece.model",
+ "albert/albert-base-v2": "https://huggingface.co/albert/albert-base-v2/resolve/main/spiece.model",
+ "albert/albert-large-v2": "https://huggingface.co/albert/albert-large-v2/resolve/main/spiece.model",
+ "albert/albert-xlarge-v2": "https://huggingface.co/albert/albert-xlarge-v2/resolve/main/spiece.model",
+ "albert/albert-xxlarge-v2": "https://huggingface.co/albert/albert-xxlarge-v2/resolve/main/spiece.model",
},
"tokenizer_file": {
- "albert-base-v1": "https://huggingface.co/albert-base-v1/resolve/main/tokenizer.json",
- "albert-large-v1": "https://huggingface.co/albert-large-v1/resolve/main/tokenizer.json",
- "albert-xlarge-v1": "https://huggingface.co/albert-xlarge-v1/resolve/main/tokenizer.json",
- "albert-xxlarge-v1": "https://huggingface.co/albert-xxlarge-v1/resolve/main/tokenizer.json",
- "albert-base-v2": "https://huggingface.co/albert-base-v2/resolve/main/tokenizer.json",
- "albert-large-v2": "https://huggingface.co/albert-large-v2/resolve/main/tokenizer.json",
- "albert-xlarge-v2": "https://huggingface.co/albert-xlarge-v2/resolve/main/tokenizer.json",
- "albert-xxlarge-v2": "https://huggingface.co/albert-xxlarge-v2/resolve/main/tokenizer.json",
+ "albert/albert-base-v1": "https://huggingface.co/albert/albert-base-v1/resolve/main/tokenizer.json",
+ "albert/albert-large-v1": "https://huggingface.co/albert/albert-large-v1/resolve/main/tokenizer.json",
+ "albert/albert-xlarge-v1": "https://huggingface.co/albert/albert-xlarge-v1/resolve/main/tokenizer.json",
+ "albert/albert-xxlarge-v1": "https://huggingface.co/albert/albert-xxlarge-v1/resolve/main/tokenizer.json",
+ "albert/albert-base-v2": "https://huggingface.co/albert/albert-base-v2/resolve/main/tokenizer.json",
+ "albert/albert-large-v2": "https://huggingface.co/albert/albert-large-v2/resolve/main/tokenizer.json",
+ "albert/albert-xlarge-v2": "https://huggingface.co/albert/albert-xlarge-v2/resolve/main/tokenizer.json",
+ "albert/albert-xxlarge-v2": "https://huggingface.co/albert/albert-xxlarge-v2/resolve/main/tokenizer.json",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "albert-base-v1": 512,
- "albert-large-v1": 512,
- "albert-xlarge-v1": 512,
- "albert-xxlarge-v1": 512,
- "albert-base-v2": 512,
- "albert-large-v2": 512,
- "albert-xlarge-v2": 512,
- "albert-xxlarge-v2": 512,
+ "albert/albert-base-v1": 512,
+ "albert/albert-large-v1": 512,
+ "albert/albert-xlarge-v1": 512,
+ "albert/albert-xxlarge-v1": 512,
+ "albert/albert-base-v2": 512,
+ "albert/albert-large-v2": 512,
+ "albert/albert-xlarge-v2": 512,
+ "albert/albert-xxlarge-v2": 512,
}
SPIECE_UNDERLINE = "▁"
diff --git a/src/transformers/models/align/convert_align_tf_to_hf.py b/src/transformers/models/align/convert_align_tf_to_hf.py
index 96e98107976..610db8482f9 100644
--- a/src/transformers/models/align/convert_align_tf_to_hf.py
+++ b/src/transformers/models/align/convert_align_tf_to_hf.py
@@ -78,7 +78,7 @@ def get_processor():
include_top=False,
resample=Image.BILINEAR,
)
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
tokenizer.model_max_length = 64
processor = AlignProcessor(image_processor=image_processor, tokenizer=tokenizer)
return processor
diff --git a/src/transformers/models/auto/auto_factory.py b/src/transformers/models/auto/auto_factory.py
index 0ef455fea47..ce7884d2ef1 100644
--- a/src/transformers/models/auto/auto_factory.py
+++ b/src/transformers/models/auto/auto_factory.py
@@ -87,8 +87,6 @@ FROM_PRETRAINED_TORCH_DOCSTRING = """
Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -194,8 +192,6 @@ FROM_PRETRAINED_TF_DOCSTRING = """
Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch state_dict save file* (e.g, `./pt_model/pytorch_model.bin`). In this
@@ -295,8 +291,6 @@ FROM_PRETRAINED_FLAX_DOCSTRING = """
Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch state_dict save file* (e.g, `./pt_model/pytorch_model.bin`). In this
@@ -642,7 +636,7 @@ def insert_head_doc(docstring, head_doc=""):
)
-def auto_class_update(cls, checkpoint_for_example="bert-base-cased", head_doc=""):
+def auto_class_update(cls, checkpoint_for_example="google-bert/bert-base-cased", head_doc=""):
# Create a new class with the right name from the base class
model_mapping = cls._model_mapping
name = cls.__name__
diff --git a/src/transformers/models/auto/configuration_auto.py b/src/transformers/models/auto/configuration_auto.py
index 682241ea4a8..44d435bc45a 100755
--- a/src/transformers/models/auto/configuration_auto.py
+++ b/src/transformers/models/auto/configuration_auto.py
@@ -1017,8 +1017,7 @@ class AutoConfig:
Can be either:
- A string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- A path to a *directory* containing a configuration file saved using the
[`~PretrainedConfig.save_pretrained`] method, or the [`~PreTrainedModel.save_pretrained`] method,
e.g., `./my_model_directory/`.
@@ -1061,7 +1060,7 @@ class AutoConfig:
>>> from transformers import AutoConfig
>>> # Download configuration from huggingface.co and cache.
- >>> config = AutoConfig.from_pretrained("bert-base-uncased")
+ >>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")
>>> # Download configuration from huggingface.co (user-uploaded) and cache.
>>> config = AutoConfig.from_pretrained("dbmdz/bert-base-german-cased")
@@ -1073,12 +1072,12 @@ class AutoConfig:
>>> config = AutoConfig.from_pretrained("./test/bert_saved_model/my_configuration.json")
>>> # Change some config attributes when loading a pretrained config.
- >>> config = AutoConfig.from_pretrained("bert-base-uncased", output_attentions=True, foo=False)
+ >>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased", output_attentions=True, foo=False)
>>> config.output_attentions
True
>>> config, unused_kwargs = AutoConfig.from_pretrained(
- ... "bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
+ ... "google-bert/bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
... )
>>> config.output_attentions
True
diff --git a/src/transformers/models/auto/feature_extraction_auto.py b/src/transformers/models/auto/feature_extraction_auto.py
index b3461e8b56a..f8cb55091b0 100644
--- a/src/transformers/models/auto/feature_extraction_auto.py
+++ b/src/transformers/models/auto/feature_extraction_auto.py
@@ -155,8 +155,7 @@ def get_feature_extractor_config(
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced
- under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
@@ -194,14 +193,14 @@ def get_feature_extractor_config(
```python
# Download configuration from huggingface.co and cache.
- tokenizer_config = get_tokenizer_config("bert-base-uncased")
+ tokenizer_config = get_tokenizer_config("google-bert/bert-base-uncased")
# This model does not have a tokenizer config so the result will be an empty dict.
- tokenizer_config = get_tokenizer_config("xlm-roberta-base")
+ tokenizer_config = get_tokenizer_config("FacebookAI/xlm-roberta-base")
# Save a pretrained tokenizer locally and you can reload its config
from transformers import AutoTokenizer
- tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenizer.save_pretrained("tokenizer-test")
tokenizer_config = get_tokenizer_config("tokenizer-test")
```"""
@@ -267,8 +266,7 @@ class AutoFeatureExtractor:
This can be either:
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a feature extractor file saved using the
[`~feature_extraction_utils.FeatureExtractionMixin.save_pretrained`] method, e.g.,
`./my_model_directory/`.
diff --git a/src/transformers/models/auto/image_processing_auto.py b/src/transformers/models/auto/image_processing_auto.py
index 54675f5693c..c9cd6fca69d 100644
--- a/src/transformers/models/auto/image_processing_auto.py
+++ b/src/transformers/models/auto/image_processing_auto.py
@@ -168,8 +168,7 @@ def get_image_processor_config(
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced
- under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
@@ -207,9 +206,9 @@ def get_image_processor_config(
```python
# Download configuration from huggingface.co and cache.
- image_processor_config = get_image_processor_config("bert-base-uncased")
+ image_processor_config = get_image_processor_config("google-bert/bert-base-uncased")
# This model does not have a image processor config so the result will be an empty dict.
- image_processor_config = get_image_processor_config("xlm-roberta-base")
+ image_processor_config = get_image_processor_config("FacebookAI/xlm-roberta-base")
# Save a pretrained image processor locally and you can reload its config
from transformers import AutoTokenizer
@@ -280,8 +279,7 @@ class AutoImageProcessor:
This can be either:
- a string, the *model id* of a pretrained image_processor hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a image processor file saved using the
[`~image_processing_utils.ImageProcessingMixin.save_pretrained`] method, e.g.,
`./my_model_directory/`.
diff --git a/src/transformers/models/auto/modeling_auto.py b/src/transformers/models/auto/modeling_auto.py
index 6aa882a5340..1de0249831d 100755
--- a/src/transformers/models/auto/modeling_auto.py
+++ b/src/transformers/models/auto/modeling_auto.py
@@ -1354,7 +1354,7 @@ class AutoModelForSeq2SeqLM(_BaseAutoModelClass):
AutoModelForSeq2SeqLM = auto_class_update(
AutoModelForSeq2SeqLM,
head_doc="sequence-to-sequence language modeling",
- checkpoint_for_example="t5-base",
+ checkpoint_for_example="google-t5/t5-base",
)
diff --git a/src/transformers/models/auto/modeling_flax_auto.py b/src/transformers/models/auto/modeling_flax_auto.py
index 3438e1c7bc7..785035b98fb 100644
--- a/src/transformers/models/auto/modeling_flax_auto.py
+++ b/src/transformers/models/auto/modeling_flax_auto.py
@@ -308,7 +308,9 @@ class FlaxAutoModelForSeq2SeqLM(_BaseAutoModelClass):
FlaxAutoModelForSeq2SeqLM = auto_class_update(
- FlaxAutoModelForSeq2SeqLM, head_doc="sequence-to-sequence language modeling", checkpoint_for_example="t5-base"
+ FlaxAutoModelForSeq2SeqLM,
+ head_doc="sequence-to-sequence language modeling",
+ checkpoint_for_example="google-t5/t5-base",
)
diff --git a/src/transformers/models/auto/modeling_tf_auto.py b/src/transformers/models/auto/modeling_tf_auto.py
index e79922f9282..deed743162e 100644
--- a/src/transformers/models/auto/modeling_tf_auto.py
+++ b/src/transformers/models/auto/modeling_tf_auto.py
@@ -621,7 +621,9 @@ class TFAutoModelForSeq2SeqLM(_BaseAutoModelClass):
TFAutoModelForSeq2SeqLM = auto_class_update(
- TFAutoModelForSeq2SeqLM, head_doc="sequence-to-sequence language modeling", checkpoint_for_example="t5-base"
+ TFAutoModelForSeq2SeqLM,
+ head_doc="sequence-to-sequence language modeling",
+ checkpoint_for_example="google-t5/t5-base",
)
diff --git a/src/transformers/models/auto/processing_auto.py b/src/transformers/models/auto/processing_auto.py
index 2a8823fea7c..e41e39e56ee 100644
--- a/src/transformers/models/auto/processing_auto.py
+++ b/src/transformers/models/auto/processing_auto.py
@@ -156,8 +156,7 @@ class AutoProcessor:
This can be either:
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a processor files saved using the `save_pretrained()` method,
e.g., `./my_model_directory/`.
cache_dir (`str` or `os.PathLike`, *optional*):
diff --git a/src/transformers/models/auto/tokenization_auto.py b/src/transformers/models/auto/tokenization_auto.py
index ff464c578c2..7760369507b 100644
--- a/src/transformers/models/auto/tokenization_auto.py
+++ b/src/transformers/models/auto/tokenization_auto.py
@@ -295,7 +295,10 @@ else:
),
),
("oneformer", ("CLIPTokenizer", "CLIPTokenizerFast" if is_tokenizers_available() else None)),
- ("openai-gpt", ("OpenAIGPTTokenizer", "OpenAIGPTTokenizerFast" if is_tokenizers_available() else None)),
+ (
+ "openai-gpt",
+ ("OpenAIGPTTokenizer", "OpenAIGPTTokenizerFast" if is_tokenizers_available() else None),
+ ),
("opt", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("owlv2", ("CLIPTokenizer", "CLIPTokenizerFast" if is_tokenizers_available() else None)),
("owlvit", ("CLIPTokenizer", "CLIPTokenizerFast" if is_tokenizers_available() else None)),
@@ -524,8 +527,7 @@ def get_tokenizer_config(
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced
- under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
@@ -566,14 +568,14 @@ def get_tokenizer_config(
```python
# Download configuration from huggingface.co and cache.
- tokenizer_config = get_tokenizer_config("bert-base-uncased")
+ tokenizer_config = get_tokenizer_config("google-bert/bert-base-uncased")
# This model does not have a tokenizer config so the result will be an empty dict.
- tokenizer_config = get_tokenizer_config("xlm-roberta-base")
+ tokenizer_config = get_tokenizer_config("FacebookAI/xlm-roberta-base")
# Save a pretrained tokenizer locally and you can reload its config
from transformers import AutoTokenizer
- tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenizer.save_pretrained("tokenizer-test")
tokenizer_config = get_tokenizer_config("tokenizer-test")
```"""
@@ -646,8 +648,6 @@ class AutoTokenizer:
Can be either:
- A string, the *model id* of a predefined tokenizer hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing vocabulary files required by the tokenizer, for instance saved
using the [`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
- A path or url to a single saved vocabulary file if and only if the tokenizer only requires a
@@ -697,7 +697,7 @@ class AutoTokenizer:
>>> from transformers import AutoTokenizer
>>> # Download vocabulary from huggingface.co and cache.
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> # Download vocabulary from huggingface.co (user-uploaded) and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-cased")
@@ -706,7 +706,7 @@ class AutoTokenizer:
>>> # tokenizer = AutoTokenizer.from_pretrained("./test/bert_saved_model/")
>>> # Download vocabulary from huggingface.co and define model-specific arguments
- >>> tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
+ >>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base", add_prefix_space=True)
```"""
use_auth_token = kwargs.pop("use_auth_token", None)
if use_auth_token is not None:
diff --git a/src/transformers/models/bark/processing_bark.py b/src/transformers/models/bark/processing_bark.py
index b322615ae23..d58b89bf6f8 100644
--- a/src/transformers/models/bark/processing_bark.py
+++ b/src/transformers/models/bark/processing_bark.py
@@ -73,8 +73,7 @@ class BarkProcessor(ProcessorMixin):
This can be either:
- a string, the *model id* of a pretrained [`BarkProcessor`] hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a processor saved using the [`~BarkProcessor.save_pretrained`]
method, e.g., `./my_model_directory/`.
speaker_embeddings_dict_path (`str`, *optional*, defaults to `"speaker_embeddings_path.json"`):
diff --git a/src/transformers/models/bert/configuration_bert.py b/src/transformers/models/bert/configuration_bert.py
index e0db2c9f1bb..1f79260f510 100644
--- a/src/transformers/models/bert/configuration_bert.py
+++ b/src/transformers/models/bert/configuration_bert.py
@@ -25,29 +25,29 @@ from ...utils import logging
logger = logging.get_logger(__name__)
BERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "bert-base-uncased": "https://huggingface.co/bert-base-uncased/resolve/main/config.json",
- "bert-large-uncased": "https://huggingface.co/bert-large-uncased/resolve/main/config.json",
- "bert-base-cased": "https://huggingface.co/bert-base-cased/resolve/main/config.json",
- "bert-large-cased": "https://huggingface.co/bert-large-cased/resolve/main/config.json",
- "bert-base-multilingual-uncased": "https://huggingface.co/bert-base-multilingual-uncased/resolve/main/config.json",
- "bert-base-multilingual-cased": "https://huggingface.co/bert-base-multilingual-cased/resolve/main/config.json",
- "bert-base-chinese": "https://huggingface.co/bert-base-chinese/resolve/main/config.json",
- "bert-base-german-cased": "https://huggingface.co/bert-base-german-cased/resolve/main/config.json",
- "bert-large-uncased-whole-word-masking": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking/resolve/main/config.json"
+ "google-bert/bert-base-uncased": "https://huggingface.co/google-bert/bert-base-uncased/resolve/main/config.json",
+ "google-bert/bert-large-uncased": "https://huggingface.co/google-bert/bert-large-uncased/resolve/main/config.json",
+ "google-bert/bert-base-cased": "https://huggingface.co/google-bert/bert-base-cased/resolve/main/config.json",
+ "google-bert/bert-large-cased": "https://huggingface.co/google-bert/bert-large-cased/resolve/main/config.json",
+ "google-bert/bert-base-multilingual-uncased": "https://huggingface.co/google-bert/bert-base-multilingual-uncased/resolve/main/config.json",
+ "google-bert/bert-base-multilingual-cased": "https://huggingface.co/google-bert/bert-base-multilingual-cased/resolve/main/config.json",
+ "google-bert/bert-base-chinese": "https://huggingface.co/google-bert/bert-base-chinese/resolve/main/config.json",
+ "google-bert/bert-base-german-cased": "https://huggingface.co/google-bert/bert-base-german-cased/resolve/main/config.json",
+ "google-bert/bert-large-uncased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking/resolve/main/config.json"
),
- "bert-large-cased-whole-word-masking": (
- "https://huggingface.co/bert-large-cased-whole-word-masking/resolve/main/config.json"
+ "google-bert/bert-large-cased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking/resolve/main/config.json"
),
- "bert-large-uncased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/config.json"
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/config.json"
),
- "bert-large-cased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/config.json"
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/config.json"
),
- "bert-base-cased-finetuned-mrpc": "https://huggingface.co/bert-base-cased-finetuned-mrpc/resolve/main/config.json",
- "bert-base-german-dbmdz-cased": "https://huggingface.co/bert-base-german-dbmdz-cased/resolve/main/config.json",
- "bert-base-german-dbmdz-uncased": "https://huggingface.co/bert-base-german-dbmdz-uncased/resolve/main/config.json",
+ "google-bert/bert-base-cased-finetuned-mrpc": "https://huggingface.co/google-bert/bert-base-cased-finetuned-mrpc/resolve/main/config.json",
+ "google-bert/bert-base-german-dbmdz-cased": "https://huggingface.co/google-bert/bert-base-german-dbmdz-cased/resolve/main/config.json",
+ "google-bert/bert-base-german-dbmdz-uncased": "https://huggingface.co/google-bert/bert-base-german-dbmdz-uncased/resolve/main/config.json",
"cl-tohoku/bert-base-japanese": "https://huggingface.co/cl-tohoku/bert-base-japanese/resolve/main/config.json",
"cl-tohoku/bert-base-japanese-whole-word-masking": (
"https://huggingface.co/cl-tohoku/bert-base-japanese-whole-word-masking/resolve/main/config.json"
@@ -74,7 +74,7 @@ class BertConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`BertModel`] or a [`TFBertModel`]. It is used to
instantiate a BERT model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the BERT
- [bert-base-uncased](https://huggingface.co/bert-base-uncased) architecture.
+ [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
@@ -127,10 +127,10 @@ class BertConfig(PretrainedConfig):
```python
>>> from transformers import BertConfig, BertModel
- >>> # Initializing a BERT bert-base-uncased style configuration
+ >>> # Initializing a BERT google-bert/bert-base-uncased style configuration
>>> configuration = BertConfig()
- >>> # Initializing a model (with random weights) from the bert-base-uncased style configuration
+ >>> # Initializing a model (with random weights) from the google-bert/bert-base-uncased style configuration
>>> model = BertModel(configuration)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/bert/convert_bert_pytorch_checkpoint_to_original_tf.py b/src/transformers/models/bert/convert_bert_pytorch_checkpoint_to_original_tf.py
index 418e1f89051..f7cb149053a 100644
--- a/src/transformers/models/bert/convert_bert_pytorch_checkpoint_to_original_tf.py
+++ b/src/transformers/models/bert/convert_bert_pytorch_checkpoint_to_original_tf.py
@@ -91,7 +91,7 @@ def convert_pytorch_checkpoint_to_tf(model: BertModel, ckpt_dir: str, model_name
def main(raw_args=None):
parser = argparse.ArgumentParser()
- parser.add_argument("--model_name", type=str, required=True, help="model name e.g. bert-base-uncased")
+ parser.add_argument("--model_name", type=str, required=True, help="model name e.g. google-bert/bert-base-uncased")
parser.add_argument(
"--cache_dir", type=str, default=None, required=False, help="Directory containing pytorch model"
)
diff --git a/src/transformers/models/bert/modeling_bert.py b/src/transformers/models/bert/modeling_bert.py
index 3eff1447002..ea5bae4a8bb 100755
--- a/src/transformers/models/bert/modeling_bert.py
+++ b/src/transformers/models/bert/modeling_bert.py
@@ -54,7 +54,7 @@ from .configuration_bert import BertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "bert-base-uncased"
+_CHECKPOINT_FOR_DOC = "google-bert/bert-base-uncased"
_CONFIG_FOR_DOC = "BertConfig"
# TokenClassification docstring
@@ -78,21 +78,21 @@ _SEQ_CLASS_EXPECTED_LOSS = 0.01
BERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "bert-base-uncased",
- "bert-large-uncased",
- "bert-base-cased",
- "bert-large-cased",
- "bert-base-multilingual-uncased",
- "bert-base-multilingual-cased",
- "bert-base-chinese",
- "bert-base-german-cased",
- "bert-large-uncased-whole-word-masking",
- "bert-large-cased-whole-word-masking",
- "bert-large-uncased-whole-word-masking-finetuned-squad",
- "bert-large-cased-whole-word-masking-finetuned-squad",
- "bert-base-cased-finetuned-mrpc",
- "bert-base-german-dbmdz-cased",
- "bert-base-german-dbmdz-uncased",
+ "google-bert/bert-base-uncased",
+ "google-bert/bert-large-uncased",
+ "google-bert/bert-base-cased",
+ "google-bert/bert-large-cased",
+ "google-bert/bert-base-multilingual-uncased",
+ "google-bert/bert-base-multilingual-cased",
+ "google-bert/bert-base-chinese",
+ "google-bert/bert-base-german-cased",
+ "google-bert/bert-large-uncased-whole-word-masking",
+ "google-bert/bert-large-cased-whole-word-masking",
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad",
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad",
+ "google-bert/bert-base-cased-finetuned-mrpc",
+ "google-bert/bert-base-german-dbmdz-cased",
+ "google-bert/bert-base-german-dbmdz-uncased",
"cl-tohoku/bert-base-japanese",
"cl-tohoku/bert-base-japanese-whole-word-masking",
"cl-tohoku/bert-base-japanese-char",
@@ -1105,8 +1105,8 @@ class BertForPreTraining(BertPreTrainedModel):
>>> from transformers import AutoTokenizer, BertForPreTraining
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- >>> model = BertForPreTraining.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ >>> model = BertForPreTraining.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
@@ -1459,8 +1459,8 @@ class BertForNextSentencePrediction(BertPreTrainedModel):
>>> from transformers import AutoTokenizer, BertForNextSentencePrediction
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- >>> model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ >>> model = BertForNextSentencePrediction.from_pretrained("google-bert/bert-base-uncased")
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
diff --git a/src/transformers/models/bert/modeling_flax_bert.py b/src/transformers/models/bert/modeling_flax_bert.py
index b32a618655e..772ea2bf12b 100644
--- a/src/transformers/models/bert/modeling_flax_bert.py
+++ b/src/transformers/models/bert/modeling_flax_bert.py
@@ -52,7 +52,7 @@ from .configuration_bert import BertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "bert-base-uncased"
+_CHECKPOINT_FOR_DOC = "google-bert/bert-base-uncased"
_CONFIG_FOR_DOC = "BertConfig"
remat = nn_partitioning.remat
@@ -1114,8 +1114,8 @@ FLAX_BERT_FOR_PRETRAINING_DOCSTRING = """
```python
>>> from transformers import AutoTokenizer, FlaxBertForPreTraining
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- >>> model = FlaxBertForPreTraining.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ >>> model = FlaxBertForPreTraining.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
>>> outputs = model(**inputs)
@@ -1269,8 +1269,8 @@ FLAX_BERT_FOR_NEXT_SENT_PRED_DOCSTRING = """
```python
>>> from transformers import AutoTokenizer, FlaxBertForNextSentencePrediction
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- >>> model = FlaxBertForNextSentencePrediction.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ >>> model = FlaxBertForNextSentencePrediction.from_pretrained("google-bert/bert-base-uncased")
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
diff --git a/src/transformers/models/bert/modeling_tf_bert.py b/src/transformers/models/bert/modeling_tf_bert.py
index 853ec6e6df4..7fe89e43e86 100644
--- a/src/transformers/models/bert/modeling_tf_bert.py
+++ b/src/transformers/models/bert/modeling_tf_bert.py
@@ -67,7 +67,7 @@ from .configuration_bert import BertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "bert-base-uncased"
+_CHECKPOINT_FOR_DOC = "google-bert/bert-base-uncased"
_CONFIG_FOR_DOC = "BertConfig"
# TokenClassification docstring
@@ -90,19 +90,19 @@ _SEQ_CLASS_EXPECTED_OUTPUT = "'LABEL_1'"
_SEQ_CLASS_EXPECTED_LOSS = 0.01
TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "bert-base-uncased",
- "bert-large-uncased",
- "bert-base-cased",
- "bert-large-cased",
- "bert-base-multilingual-uncased",
- "bert-base-multilingual-cased",
- "bert-base-chinese",
- "bert-base-german-cased",
- "bert-large-uncased-whole-word-masking",
- "bert-large-cased-whole-word-masking",
- "bert-large-uncased-whole-word-masking-finetuned-squad",
- "bert-large-cased-whole-word-masking-finetuned-squad",
- "bert-base-cased-finetuned-mrpc",
+ "google-bert/bert-base-uncased",
+ "google-bert/bert-large-uncased",
+ "google-bert/bert-base-cased",
+ "google-bert/bert-large-cased",
+ "google-bert/bert-base-multilingual-uncased",
+ "google-bert/bert-base-multilingual-cased",
+ "google-bert/bert-base-chinese",
+ "google-bert/bert-base-german-cased",
+ "google-bert/bert-large-uncased-whole-word-masking",
+ "google-bert/bert-large-cased-whole-word-masking",
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad",
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad",
+ "google-bert/bert-base-cased-finetuned-mrpc",
"cl-tohoku/bert-base-japanese",
"cl-tohoku/bert-base-japanese-whole-word-masking",
"cl-tohoku/bert-base-japanese-char",
@@ -1327,8 +1327,8 @@ class TFBertForPreTraining(TFBertPreTrainedModel, TFBertPreTrainingLoss):
>>> import tensorflow as tf
>>> from transformers import AutoTokenizer, TFBertForPreTraining
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- >>> model = TFBertForPreTraining.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ >>> model = TFBertForPreTraining.from_pretrained("google-bert/bert-base-uncased")
>>> input_ids = tokenizer("Hello, my dog is cute", add_special_tokens=True, return_tensors="tf")
>>> # Batch size 1
@@ -1657,8 +1657,8 @@ class TFBertForNextSentencePrediction(TFBertPreTrainedModel, TFNextSentencePredi
>>> import tensorflow as tf
>>> from transformers import AutoTokenizer, TFBertForNextSentencePrediction
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- >>> model = TFBertForNextSentencePrediction.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ >>> model = TFBertForNextSentencePrediction.from_pretrained("google-bert/bert-base-uncased")
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
diff --git a/src/transformers/models/bert/tokenization_bert.py b/src/transformers/models/bert/tokenization_bert.py
index 16044973343..c95e9ff0f8b 100644
--- a/src/transformers/models/bert/tokenization_bert.py
+++ b/src/transformers/models/bert/tokenization_bert.py
@@ -30,34 +30,34 @@ VOCAB_FILES_NAMES = {"vocab_file": "vocab.txt"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "bert-base-uncased": "https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt",
- "bert-large-uncased": "https://huggingface.co/bert-large-uncased/resolve/main/vocab.txt",
- "bert-base-cased": "https://huggingface.co/bert-base-cased/resolve/main/vocab.txt",
- "bert-large-cased": "https://huggingface.co/bert-large-cased/resolve/main/vocab.txt",
- "bert-base-multilingual-uncased": (
- "https://huggingface.co/bert-base-multilingual-uncased/resolve/main/vocab.txt"
+ "google-bert/bert-base-uncased": "https://huggingface.co/google-bert/bert-base-uncased/resolve/main/vocab.txt",
+ "google-bert/bert-large-uncased": "https://huggingface.co/google-bert/bert-large-uncased/resolve/main/vocab.txt",
+ "google-bert/bert-base-cased": "https://huggingface.co/google-bert/bert-base-cased/resolve/main/vocab.txt",
+ "google-bert/bert-large-cased": "https://huggingface.co/google-bert/bert-large-cased/resolve/main/vocab.txt",
+ "google-bert/bert-base-multilingual-uncased": (
+ "https://huggingface.co/google-bert/bert-base-multilingual-uncased/resolve/main/vocab.txt"
),
- "bert-base-multilingual-cased": "https://huggingface.co/bert-base-multilingual-cased/resolve/main/vocab.txt",
- "bert-base-chinese": "https://huggingface.co/bert-base-chinese/resolve/main/vocab.txt",
- "bert-base-german-cased": "https://huggingface.co/bert-base-german-cased/resolve/main/vocab.txt",
- "bert-large-uncased-whole-word-masking": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking/resolve/main/vocab.txt"
+ "google-bert/bert-base-multilingual-cased": "https://huggingface.co/google-bert/bert-base-multilingual-cased/resolve/main/vocab.txt",
+ "google-bert/bert-base-chinese": "https://huggingface.co/google-bert/bert-base-chinese/resolve/main/vocab.txt",
+ "google-bert/bert-base-german-cased": "https://huggingface.co/google-bert/bert-base-german-cased/resolve/main/vocab.txt",
+ "google-bert/bert-large-uncased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking/resolve/main/vocab.txt"
),
- "bert-large-cased-whole-word-masking": (
- "https://huggingface.co/bert-large-cased-whole-word-masking/resolve/main/vocab.txt"
+ "google-bert/bert-large-cased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking/resolve/main/vocab.txt"
),
- "bert-large-uncased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
),
- "bert-large-cased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
),
- "bert-base-cased-finetuned-mrpc": (
- "https://huggingface.co/bert-base-cased-finetuned-mrpc/resolve/main/vocab.txt"
+ "google-bert/bert-base-cased-finetuned-mrpc": (
+ "https://huggingface.co/google-bert/bert-base-cased-finetuned-mrpc/resolve/main/vocab.txt"
),
- "bert-base-german-dbmdz-cased": "https://huggingface.co/bert-base-german-dbmdz-cased/resolve/main/vocab.txt",
- "bert-base-german-dbmdz-uncased": (
- "https://huggingface.co/bert-base-german-dbmdz-uncased/resolve/main/vocab.txt"
+ "google-bert/bert-base-german-dbmdz-cased": "https://huggingface.co/google-bert/bert-base-german-dbmdz-cased/resolve/main/vocab.txt",
+ "google-bert/bert-base-german-dbmdz-uncased": (
+ "https://huggingface.co/google-bert/bert-base-german-dbmdz-uncased/resolve/main/vocab.txt"
),
"TurkuNLP/bert-base-finnish-cased-v1": (
"https://huggingface.co/TurkuNLP/bert-base-finnish-cased-v1/resolve/main/vocab.txt"
@@ -72,42 +72,42 @@ PRETRAINED_VOCAB_FILES_MAP = {
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "bert-base-uncased": 512,
- "bert-large-uncased": 512,
- "bert-base-cased": 512,
- "bert-large-cased": 512,
- "bert-base-multilingual-uncased": 512,
- "bert-base-multilingual-cased": 512,
- "bert-base-chinese": 512,
- "bert-base-german-cased": 512,
- "bert-large-uncased-whole-word-masking": 512,
- "bert-large-cased-whole-word-masking": 512,
- "bert-large-uncased-whole-word-masking-finetuned-squad": 512,
- "bert-large-cased-whole-word-masking-finetuned-squad": 512,
- "bert-base-cased-finetuned-mrpc": 512,
- "bert-base-german-dbmdz-cased": 512,
- "bert-base-german-dbmdz-uncased": 512,
+ "google-bert/bert-base-uncased": 512,
+ "google-bert/bert-large-uncased": 512,
+ "google-bert/bert-base-cased": 512,
+ "google-bert/bert-large-cased": 512,
+ "google-bert/bert-base-multilingual-uncased": 512,
+ "google-bert/bert-base-multilingual-cased": 512,
+ "google-bert/bert-base-chinese": 512,
+ "google-bert/bert-base-german-cased": 512,
+ "google-bert/bert-large-uncased-whole-word-masking": 512,
+ "google-bert/bert-large-cased-whole-word-masking": 512,
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": 512,
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": 512,
+ "google-bert/bert-base-cased-finetuned-mrpc": 512,
+ "google-bert/bert-base-german-dbmdz-cased": 512,
+ "google-bert/bert-base-german-dbmdz-uncased": 512,
"TurkuNLP/bert-base-finnish-cased-v1": 512,
"TurkuNLP/bert-base-finnish-uncased-v1": 512,
"wietsedv/bert-base-dutch-cased": 512,
}
PRETRAINED_INIT_CONFIGURATION = {
- "bert-base-uncased": {"do_lower_case": True},
- "bert-large-uncased": {"do_lower_case": True},
- "bert-base-cased": {"do_lower_case": False},
- "bert-large-cased": {"do_lower_case": False},
- "bert-base-multilingual-uncased": {"do_lower_case": True},
- "bert-base-multilingual-cased": {"do_lower_case": False},
- "bert-base-chinese": {"do_lower_case": False},
- "bert-base-german-cased": {"do_lower_case": False},
- "bert-large-uncased-whole-word-masking": {"do_lower_case": True},
- "bert-large-cased-whole-word-masking": {"do_lower_case": False},
- "bert-large-uncased-whole-word-masking-finetuned-squad": {"do_lower_case": True},
- "bert-large-cased-whole-word-masking-finetuned-squad": {"do_lower_case": False},
- "bert-base-cased-finetuned-mrpc": {"do_lower_case": False},
- "bert-base-german-dbmdz-cased": {"do_lower_case": False},
- "bert-base-german-dbmdz-uncased": {"do_lower_case": True},
+ "google-bert/bert-base-uncased": {"do_lower_case": True},
+ "google-bert/bert-large-uncased": {"do_lower_case": True},
+ "google-bert/bert-base-cased": {"do_lower_case": False},
+ "google-bert/bert-large-cased": {"do_lower_case": False},
+ "google-bert/bert-base-multilingual-uncased": {"do_lower_case": True},
+ "google-bert/bert-base-multilingual-cased": {"do_lower_case": False},
+ "google-bert/bert-base-chinese": {"do_lower_case": False},
+ "google-bert/bert-base-german-cased": {"do_lower_case": False},
+ "google-bert/bert-large-uncased-whole-word-masking": {"do_lower_case": True},
+ "google-bert/bert-large-cased-whole-word-masking": {"do_lower_case": False},
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": {"do_lower_case": True},
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": {"do_lower_case": False},
+ "google-bert/bert-base-cased-finetuned-mrpc": {"do_lower_case": False},
+ "google-bert/bert-base-german-dbmdz-cased": {"do_lower_case": False},
+ "google-bert/bert-base-german-dbmdz-uncased": {"do_lower_case": True},
"TurkuNLP/bert-base-finnish-cased-v1": {"do_lower_case": False},
"TurkuNLP/bert-base-finnish-uncased-v1": {"do_lower_case": True},
"wietsedv/bert-base-dutch-cased": {"do_lower_case": False},
diff --git a/src/transformers/models/bert/tokenization_bert_fast.py b/src/transformers/models/bert/tokenization_bert_fast.py
index 80d542367dc..e7754b2fb5a 100644
--- a/src/transformers/models/bert/tokenization_bert_fast.py
+++ b/src/transformers/models/bert/tokenization_bert_fast.py
@@ -30,34 +30,34 @@ VOCAB_FILES_NAMES = {"vocab_file": "vocab.txt", "tokenizer_file": "tokenizer.jso
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "bert-base-uncased": "https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt",
- "bert-large-uncased": "https://huggingface.co/bert-large-uncased/resolve/main/vocab.txt",
- "bert-base-cased": "https://huggingface.co/bert-base-cased/resolve/main/vocab.txt",
- "bert-large-cased": "https://huggingface.co/bert-large-cased/resolve/main/vocab.txt",
- "bert-base-multilingual-uncased": (
- "https://huggingface.co/bert-base-multilingual-uncased/resolve/main/vocab.txt"
+ "google-bert/bert-base-uncased": "https://huggingface.co/google-bert/bert-base-uncased/resolve/main/vocab.txt",
+ "google-bert/bert-large-uncased": "https://huggingface.co/google-bert/bert-large-uncased/resolve/main/vocab.txt",
+ "google-bert/bert-base-cased": "https://huggingface.co/google-bert/bert-base-cased/resolve/main/vocab.txt",
+ "google-bert/bert-large-cased": "https://huggingface.co/google-bert/bert-large-cased/resolve/main/vocab.txt",
+ "google-bert/bert-base-multilingual-uncased": (
+ "https://huggingface.co/google-bert/bert-base-multilingual-uncased/resolve/main/vocab.txt"
),
- "bert-base-multilingual-cased": "https://huggingface.co/bert-base-multilingual-cased/resolve/main/vocab.txt",
- "bert-base-chinese": "https://huggingface.co/bert-base-chinese/resolve/main/vocab.txt",
- "bert-base-german-cased": "https://huggingface.co/bert-base-german-cased/resolve/main/vocab.txt",
- "bert-large-uncased-whole-word-masking": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking/resolve/main/vocab.txt"
+ "google-bert/bert-base-multilingual-cased": "https://huggingface.co/google-bert/bert-base-multilingual-cased/resolve/main/vocab.txt",
+ "google-bert/bert-base-chinese": "https://huggingface.co/google-bert/bert-base-chinese/resolve/main/vocab.txt",
+ "google-bert/bert-base-german-cased": "https://huggingface.co/google-bert/bert-base-german-cased/resolve/main/vocab.txt",
+ "google-bert/bert-large-uncased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking/resolve/main/vocab.txt"
),
- "bert-large-cased-whole-word-masking": (
- "https://huggingface.co/bert-large-cased-whole-word-masking/resolve/main/vocab.txt"
+ "google-bert/bert-large-cased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking/resolve/main/vocab.txt"
),
- "bert-large-uncased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
),
- "bert-large-cased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/vocab.txt"
),
- "bert-base-cased-finetuned-mrpc": (
- "https://huggingface.co/bert-base-cased-finetuned-mrpc/resolve/main/vocab.txt"
+ "google-bert/bert-base-cased-finetuned-mrpc": (
+ "https://huggingface.co/google-bert/bert-base-cased-finetuned-mrpc/resolve/main/vocab.txt"
),
- "bert-base-german-dbmdz-cased": "https://huggingface.co/bert-base-german-dbmdz-cased/resolve/main/vocab.txt",
- "bert-base-german-dbmdz-uncased": (
- "https://huggingface.co/bert-base-german-dbmdz-uncased/resolve/main/vocab.txt"
+ "google-bert/bert-base-german-dbmdz-cased": "https://huggingface.co/google-bert/bert-base-german-dbmdz-cased/resolve/main/vocab.txt",
+ "google-bert/bert-base-german-dbmdz-uncased": (
+ "https://huggingface.co/google-bert/bert-base-german-dbmdz-uncased/resolve/main/vocab.txt"
),
"TurkuNLP/bert-base-finnish-cased-v1": (
"https://huggingface.co/TurkuNLP/bert-base-finnish-cased-v1/resolve/main/vocab.txt"
@@ -70,38 +70,38 @@ PRETRAINED_VOCAB_FILES_MAP = {
),
},
"tokenizer_file": {
- "bert-base-uncased": "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json",
- "bert-large-uncased": "https://huggingface.co/bert-large-uncased/resolve/main/tokenizer.json",
- "bert-base-cased": "https://huggingface.co/bert-base-cased/resolve/main/tokenizer.json",
- "bert-large-cased": "https://huggingface.co/bert-large-cased/resolve/main/tokenizer.json",
- "bert-base-multilingual-uncased": (
- "https://huggingface.co/bert-base-multilingual-uncased/resolve/main/tokenizer.json"
+ "google-bert/bert-base-uncased": "https://huggingface.co/google-bert/bert-base-uncased/resolve/main/tokenizer.json",
+ "google-bert/bert-large-uncased": "https://huggingface.co/google-bert/bert-large-uncased/resolve/main/tokenizer.json",
+ "google-bert/bert-base-cased": "https://huggingface.co/google-bert/bert-base-cased/resolve/main/tokenizer.json",
+ "google-bert/bert-large-cased": "https://huggingface.co/google-bert/bert-large-cased/resolve/main/tokenizer.json",
+ "google-bert/bert-base-multilingual-uncased": (
+ "https://huggingface.co/google-bert/bert-base-multilingual-uncased/resolve/main/tokenizer.json"
),
- "bert-base-multilingual-cased": (
- "https://huggingface.co/bert-base-multilingual-cased/resolve/main/tokenizer.json"
+ "google-bert/bert-base-multilingual-cased": (
+ "https://huggingface.co/google-bert/bert-base-multilingual-cased/resolve/main/tokenizer.json"
),
- "bert-base-chinese": "https://huggingface.co/bert-base-chinese/resolve/main/tokenizer.json",
- "bert-base-german-cased": "https://huggingface.co/bert-base-german-cased/resolve/main/tokenizer.json",
- "bert-large-uncased-whole-word-masking": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking/resolve/main/tokenizer.json"
+ "google-bert/bert-base-chinese": "https://huggingface.co/google-bert/bert-base-chinese/resolve/main/tokenizer.json",
+ "google-bert/bert-base-german-cased": "https://huggingface.co/google-bert/bert-base-german-cased/resolve/main/tokenizer.json",
+ "google-bert/bert-large-uncased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking/resolve/main/tokenizer.json"
),
- "bert-large-cased-whole-word-masking": (
- "https://huggingface.co/bert-large-cased-whole-word-masking/resolve/main/tokenizer.json"
+ "google-bert/bert-large-cased-whole-word-masking": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking/resolve/main/tokenizer.json"
),
- "bert-large-uncased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/tokenizer.json"
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-uncased-whole-word-masking-finetuned-squad/resolve/main/tokenizer.json"
),
- "bert-large-cased-whole-word-masking-finetuned-squad": (
- "https://huggingface.co/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/tokenizer.json"
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": (
+ "https://huggingface.co/google-bert/bert-large-cased-whole-word-masking-finetuned-squad/resolve/main/tokenizer.json"
),
- "bert-base-cased-finetuned-mrpc": (
- "https://huggingface.co/bert-base-cased-finetuned-mrpc/resolve/main/tokenizer.json"
+ "google-bert/bert-base-cased-finetuned-mrpc": (
+ "https://huggingface.co/google-bert/bert-base-cased-finetuned-mrpc/resolve/main/tokenizer.json"
),
- "bert-base-german-dbmdz-cased": (
- "https://huggingface.co/bert-base-german-dbmdz-cased/resolve/main/tokenizer.json"
+ "google-bert/bert-base-german-dbmdz-cased": (
+ "https://huggingface.co/google-bert/bert-base-german-dbmdz-cased/resolve/main/tokenizer.json"
),
- "bert-base-german-dbmdz-uncased": (
- "https://huggingface.co/bert-base-german-dbmdz-uncased/resolve/main/tokenizer.json"
+ "google-bert/bert-base-german-dbmdz-uncased": (
+ "https://huggingface.co/google-bert/bert-base-german-dbmdz-uncased/resolve/main/tokenizer.json"
),
"TurkuNLP/bert-base-finnish-cased-v1": (
"https://huggingface.co/TurkuNLP/bert-base-finnish-cased-v1/resolve/main/tokenizer.json"
@@ -116,42 +116,42 @@ PRETRAINED_VOCAB_FILES_MAP = {
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "bert-base-uncased": 512,
- "bert-large-uncased": 512,
- "bert-base-cased": 512,
- "bert-large-cased": 512,
- "bert-base-multilingual-uncased": 512,
- "bert-base-multilingual-cased": 512,
- "bert-base-chinese": 512,
- "bert-base-german-cased": 512,
- "bert-large-uncased-whole-word-masking": 512,
- "bert-large-cased-whole-word-masking": 512,
- "bert-large-uncased-whole-word-masking-finetuned-squad": 512,
- "bert-large-cased-whole-word-masking-finetuned-squad": 512,
- "bert-base-cased-finetuned-mrpc": 512,
- "bert-base-german-dbmdz-cased": 512,
- "bert-base-german-dbmdz-uncased": 512,
+ "google-bert/bert-base-uncased": 512,
+ "google-bert/bert-large-uncased": 512,
+ "google-bert/bert-base-cased": 512,
+ "google-bert/bert-large-cased": 512,
+ "google-bert/bert-base-multilingual-uncased": 512,
+ "google-bert/bert-base-multilingual-cased": 512,
+ "google-bert/bert-base-chinese": 512,
+ "google-bert/bert-base-german-cased": 512,
+ "google-bert/bert-large-uncased-whole-word-masking": 512,
+ "google-bert/bert-large-cased-whole-word-masking": 512,
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": 512,
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": 512,
+ "google-bert/bert-base-cased-finetuned-mrpc": 512,
+ "google-bert/bert-base-german-dbmdz-cased": 512,
+ "google-bert/bert-base-german-dbmdz-uncased": 512,
"TurkuNLP/bert-base-finnish-cased-v1": 512,
"TurkuNLP/bert-base-finnish-uncased-v1": 512,
"wietsedv/bert-base-dutch-cased": 512,
}
PRETRAINED_INIT_CONFIGURATION = {
- "bert-base-uncased": {"do_lower_case": True},
- "bert-large-uncased": {"do_lower_case": True},
- "bert-base-cased": {"do_lower_case": False},
- "bert-large-cased": {"do_lower_case": False},
- "bert-base-multilingual-uncased": {"do_lower_case": True},
- "bert-base-multilingual-cased": {"do_lower_case": False},
- "bert-base-chinese": {"do_lower_case": False},
- "bert-base-german-cased": {"do_lower_case": False},
- "bert-large-uncased-whole-word-masking": {"do_lower_case": True},
- "bert-large-cased-whole-word-masking": {"do_lower_case": False},
- "bert-large-uncased-whole-word-masking-finetuned-squad": {"do_lower_case": True},
- "bert-large-cased-whole-word-masking-finetuned-squad": {"do_lower_case": False},
- "bert-base-cased-finetuned-mrpc": {"do_lower_case": False},
- "bert-base-german-dbmdz-cased": {"do_lower_case": False},
- "bert-base-german-dbmdz-uncased": {"do_lower_case": True},
+ "google-bert/bert-base-uncased": {"do_lower_case": True},
+ "google-bert/bert-large-uncased": {"do_lower_case": True},
+ "google-bert/bert-base-cased": {"do_lower_case": False},
+ "google-bert/bert-large-cased": {"do_lower_case": False},
+ "google-bert/bert-base-multilingual-uncased": {"do_lower_case": True},
+ "google-bert/bert-base-multilingual-cased": {"do_lower_case": False},
+ "google-bert/bert-base-chinese": {"do_lower_case": False},
+ "google-bert/bert-base-german-cased": {"do_lower_case": False},
+ "google-bert/bert-large-uncased-whole-word-masking": {"do_lower_case": True},
+ "google-bert/bert-large-cased-whole-word-masking": {"do_lower_case": False},
+ "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad": {"do_lower_case": True},
+ "google-bert/bert-large-cased-whole-word-masking-finetuned-squad": {"do_lower_case": False},
+ "google-bert/bert-base-cased-finetuned-mrpc": {"do_lower_case": False},
+ "google-bert/bert-base-german-dbmdz-cased": {"do_lower_case": False},
+ "google-bert/bert-base-german-dbmdz-uncased": {"do_lower_case": True},
"TurkuNLP/bert-base-finnish-cased-v1": {"do_lower_case": False},
"TurkuNLP/bert-base-finnish-uncased-v1": {"do_lower_case": True},
"wietsedv/bert-base-dutch-cased": {"do_lower_case": False},
diff --git a/src/transformers/models/bert/tokenization_bert_tf.py b/src/transformers/models/bert/tokenization_bert_tf.py
index 5f3a02b5478..ebf88eeac9b 100644
--- a/src/transformers/models/bert/tokenization_bert_tf.py
+++ b/src/transformers/models/bert/tokenization_bert_tf.py
@@ -116,7 +116,7 @@ class TFBertTokenizer(keras.layers.Layer):
```python
from transformers import AutoTokenizer, TFBertTokenizer
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
tf_tokenizer = TFBertTokenizer.from_tokenizer(tokenizer)
```
"""
@@ -155,7 +155,7 @@ class TFBertTokenizer(keras.layers.Layer):
```python
from transformers import TFBertTokenizer
- tf_tokenizer = TFBertTokenizer.from_pretrained("bert-base-uncased")
+ tf_tokenizer = TFBertTokenizer.from_pretrained("google-bert/bert-base-uncased")
```
"""
try:
diff --git a/src/transformers/models/blip/convert_blip_original_pytorch_to_hf.py b/src/transformers/models/blip/convert_blip_original_pytorch_to_hf.py
index 7609b4a40e8..714aaa1e273 100644
--- a/src/transformers/models/blip/convert_blip_original_pytorch_to_hf.py
+++ b/src/transformers/models/blip/convert_blip_original_pytorch_to_hf.py
@@ -105,7 +105,7 @@ def convert_blip_checkpoint(pytorch_dump_folder_path, config_path=None):
image_size = 384
image = load_demo_image(image_size=image_size, device="cpu")
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
input_ids = tokenizer(["a picture of"]).input_ids
out = hf_model.generate(image, input_ids)
diff --git a/src/transformers/models/camembert/configuration_camembert.py b/src/transformers/models/camembert/configuration_camembert.py
index d712726492a..d904c35ad7b 100644
--- a/src/transformers/models/camembert/configuration_camembert.py
+++ b/src/transformers/models/camembert/configuration_camembert.py
@@ -26,7 +26,7 @@ from ...utils import logging
logger = logging.get_logger(__name__)
CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "camembert-base": "https://huggingface.co/camembert-base/resolve/main/config.json",
+ "almanach/camembert-base": "https://huggingface.co/almanach/camembert-base/resolve/main/config.json",
"umberto-commoncrawl-cased-v1": (
"https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1/resolve/main/config.json"
),
@@ -41,7 +41,7 @@ class CamembertConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`CamembertModel`] or a [`TFCamembertModel`]. It is
used to instantiate a Camembert model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the Camembert
- [camembert-base](https://huggingface.co/camembert-base) architecture.
+ [almanach/camembert-base](https://huggingface.co/almanach/camembert-base) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
@@ -94,10 +94,10 @@ class CamembertConfig(PretrainedConfig):
```python
>>> from transformers import CamembertConfig, CamembertModel
- >>> # Initializing a Camembert camembert-base style configuration
+ >>> # Initializing a Camembert almanach/camembert-base style configuration
>>> configuration = CamembertConfig()
- >>> # Initializing a model (with random weights) from the camembert-base style configuration
+ >>> # Initializing a model (with random weights) from the almanach/camembert-base style configuration
>>> model = CamembertModel(configuration)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/camembert/modeling_camembert.py b/src/transformers/models/camembert/modeling_camembert.py
index 50fac0efd00..cd0b329b6ae 100644
--- a/src/transformers/models/camembert/modeling_camembert.py
+++ b/src/transformers/models/camembert/modeling_camembert.py
@@ -48,11 +48,11 @@ from .configuration_camembert import CamembertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "camembert-base"
+_CHECKPOINT_FOR_DOC = "almanach/camembert-base"
_CONFIG_FOR_DOC = "CamembertConfig"
CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "camembert-base",
+ "almanach/camembert-base",
"Musixmatch/umberto-commoncrawl-cased-v1",
"Musixmatch/umberto-wikipedia-uncased-v1",
# See all CamemBERT models at https://huggingface.co/models?filter=camembert
@@ -1397,7 +1397,7 @@ class CamembertForQuestionAnswering(CamembertPreTrainedModel):
@add_start_docstrings(
"""CamemBERT Model with a `language modeling` head on top for CLM fine-tuning.""", CAMEMBERT_START_DOCSTRING
)
-# Copied from transformers.models.roberta.modeling_roberta.RobertaForCausalLM with Roberta->Camembert, ROBERTA->CAMEMBERT, roberta-base->camembert-base
+# Copied from transformers.models.roberta.modeling_roberta.RobertaForCausalLM with Roberta->Camembert, ROBERTA->CAMEMBERT, FacebookAI/roberta-base->almanach/camembert-base
class CamembertForCausalLM(CamembertPreTrainedModel):
_tied_weights_keys = ["lm_head.decoder.weight", "lm_head.decoder.bias"]
@@ -1471,10 +1471,10 @@ class CamembertForCausalLM(CamembertPreTrainedModel):
>>> from transformers import AutoTokenizer, CamembertForCausalLM, AutoConfig
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("camembert-base")
- >>> config = AutoConfig.from_pretrained("camembert-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("almanach/camembert-base")
+ >>> config = AutoConfig.from_pretrained("almanach/camembert-base")
>>> config.is_decoder = True
- >>> model = CamembertForCausalLM.from_pretrained("camembert-base", config=config)
+ >>> model = CamembertForCausalLM.from_pretrained("almanach/camembert-base", config=config)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
diff --git a/src/transformers/models/camembert/modeling_tf_camembert.py b/src/transformers/models/camembert/modeling_tf_camembert.py
index c4bb10891db..e3e3fca4cef 100644
--- a/src/transformers/models/camembert/modeling_tf_camembert.py
+++ b/src/transformers/models/camembert/modeling_tf_camembert.py
@@ -62,7 +62,7 @@ from .configuration_camembert import CamembertConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "camembert-base"
+_CHECKPOINT_FOR_DOC = "almanach/camembert-base"
_CONFIG_FOR_DOC = "CamembertConfig"
TF_CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
diff --git a/src/transformers/models/camembert/tokenization_camembert.py b/src/transformers/models/camembert/tokenization_camembert.py
index 40755494901..0949db02fbb 100644
--- a/src/transformers/models/camembert/tokenization_camembert.py
+++ b/src/transformers/models/camembert/tokenization_camembert.py
@@ -31,12 +31,12 @@ VOCAB_FILES_NAMES = {"vocab_file": "sentencepiece.bpe.model"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "camembert-base": "https://huggingface.co/camembert-base/resolve/main/sentencepiece.bpe.model",
+ "almanach/camembert-base": "https://huggingface.co/almanach/camembert-base/resolve/main/sentencepiece.bpe.model",
}
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "camembert-base": 512,
+ "almanach/camembert-base": 512,
}
SPIECE_UNDERLINE = "▁"
diff --git a/src/transformers/models/camembert/tokenization_camembert_fast.py b/src/transformers/models/camembert/tokenization_camembert_fast.py
index f5720e45f2c..627971eb51d 100644
--- a/src/transformers/models/camembert/tokenization_camembert_fast.py
+++ b/src/transformers/models/camembert/tokenization_camembert_fast.py
@@ -36,15 +36,15 @@ VOCAB_FILES_NAMES = {"vocab_file": "sentencepiece.bpe.model", "tokenizer_file":
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "camembert-base": "https://huggingface.co/camembert-base/resolve/main/sentencepiece.bpe.model",
+ "almanach/camembert-base": "https://huggingface.co/almanach/camembert-base/resolve/main/sentencepiece.bpe.model",
},
"tokenizer_file": {
- "camembert-base": "https://huggingface.co/camembert-base/resolve/main/tokenizer.json",
+ "almanach/camembert-base": "https://huggingface.co/almanach/camembert-base/resolve/main/tokenizer.json",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "camembert-base": 512,
+ "almanach/camembert-base": 512,
}
SPIECE_UNDERLINE = "▁"
diff --git a/src/transformers/models/ctrl/modeling_tf_ctrl.py b/src/transformers/models/ctrl/modeling_tf_ctrl.py
index b0dc90424bd..19a6a84fc75 100644
--- a/src/transformers/models/ctrl/modeling_tf_ctrl.py
+++ b/src/transformers/models/ctrl/modeling_tf_ctrl.py
@@ -45,7 +45,7 @@ _CONFIG_FOR_DOC = "CTRLConfig"
TF_CTRL_PRETRAINED_MODEL_ARCHIVE_LIST = [
"Salesforce/ctrl"
- # See all CTRL models at https://huggingface.co/models?filter=ctrl
+ # See all CTRL models at https://huggingface.co/models?filter=ctrl
]
diff --git a/src/transformers/models/ctrl/tokenization_ctrl.py b/src/transformers/models/ctrl/tokenization_ctrl.py
index f00b5034804..3aac022897d 100644
--- a/src/transformers/models/ctrl/tokenization_ctrl.py
+++ b/src/transformers/models/ctrl/tokenization_ctrl.py
@@ -33,12 +33,12 @@ VOCAB_FILES_NAMES = {
}
PRETRAINED_VOCAB_FILES_MAP = {
- "vocab_file": {"ctrl": "https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-vocab.json"},
- "merges_file": {"ctrl": "https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-merges.txt"},
+ "vocab_file": {"Salesforce/ctrl": "https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-vocab.json"},
+ "merges_file": {"Salesforce/ctrl": "https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-merges.txt"},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "ctrl": 256,
+ "Salesforce/ctrl": 256,
}
CONTROL_CODES = {
diff --git a/src/transformers/models/deprecated/bort/convert_bort_original_gluonnlp_checkpoint_to_pytorch.py b/src/transformers/models/deprecated/bort/convert_bort_original_gluonnlp_checkpoint_to_pytorch.py
index 4753f593da1..5dc9a244c43 100644
--- a/src/transformers/models/deprecated/bort/convert_bort_original_gluonnlp_checkpoint_to_pytorch.py
+++ b/src/transformers/models/deprecated/bort/convert_bort_original_gluonnlp_checkpoint_to_pytorch.py
@@ -277,7 +277,7 @@ def convert_bort_checkpoint_to_pytorch(bort_checkpoint_path: str, pytorch_dump_f
hf_bort_model.half()
# Compare output of both models
- tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
+ tokenizer = RobertaTokenizer.from_pretrained("FacebookAI/roberta-base")
input_ids = tokenizer.encode_plus(SAMPLE_TEXT)["input_ids"]
diff --git a/src/transformers/models/deprecated/mmbt/modeling_mmbt.py b/src/transformers/models/deprecated/mmbt/modeling_mmbt.py
index db0cef3a650..8dc450ce8f6 100644
--- a/src/transformers/models/deprecated/mmbt/modeling_mmbt.py
+++ b/src/transformers/models/deprecated/mmbt/modeling_mmbt.py
@@ -213,7 +213,7 @@ class MMBTModel(nn.Module, ModuleUtilsMixin):
```python
# For example purposes. Not runnable.
- transformer = BertModel.from_pretrained("bert-base-uncased")
+ transformer = BertModel.from_pretrained("google-bert/bert-base-uncased")
encoder = ImageEncoder(args)
mmbt = MMBTModel(config, transformer, encoder)
```"""
@@ -333,7 +333,7 @@ class MMBTForClassification(nn.Module):
```python
# For example purposes. Not runnable.
- transformer = BertModel.from_pretrained("bert-base-uncased")
+ transformer = BertModel.from_pretrained("google-bert/bert-base-uncased")
encoder = ImageEncoder(args)
model = MMBTForClassification(config, transformer, encoder)
outputs = model(input_modal, input_ids, labels=labels)
diff --git a/src/transformers/models/deprecated/transfo_xl/configuration_transfo_xl.py b/src/transformers/models/deprecated/transfo_xl/configuration_transfo_xl.py
index 842c1643a00..f7d5f2f87fb 100644
--- a/src/transformers/models/deprecated/transfo_xl/configuration_transfo_xl.py
+++ b/src/transformers/models/deprecated/transfo_xl/configuration_transfo_xl.py
@@ -22,7 +22,7 @@ from ....utils import logging
logger = logging.get_logger(__name__)
TRANSFO_XL_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "transfo-xl-wt103": "https://huggingface.co/transfo-xl-wt103/resolve/main/config.json",
+ "transfo-xl/transfo-xl-wt103": "https://huggingface.co/transfo-xl/transfo-xl-wt103/resolve/main/config.json",
}
@@ -31,7 +31,7 @@ class TransfoXLConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`TransfoXLModel`] or a [`TFTransfoXLModel`]. It is
used to instantiate a Transformer-XL model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the TransfoXL
- [transfo-xl-wt103](https://huggingface.co/transfo-xl-wt103) architecture.
+ [transfo-xl/transfo-xl-wt103](https://huggingface.co/transfo-xl/transfo-xl-wt103) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/deprecated/transfo_xl/modeling_tf_transfo_xl.py b/src/transformers/models/deprecated/transfo_xl/modeling_tf_transfo_xl.py
index c99d8346701..ab2725df0c4 100644
--- a/src/transformers/models/deprecated/transfo_xl/modeling_tf_transfo_xl.py
+++ b/src/transformers/models/deprecated/transfo_xl/modeling_tf_transfo_xl.py
@@ -48,11 +48,11 @@ from .modeling_tf_transfo_xl_utilities import TFAdaptiveSoftmaxMask
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "transfo-xl-wt103"
+_CHECKPOINT_FOR_DOC = "transfo-xl/transfo-xl-wt103"
_CONFIG_FOR_DOC = "TransfoXLConfig"
TF_TRANSFO_XL_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "transfo-xl-wt103",
+ "transfo-xl/transfo-xl-wt103",
# See all Transformer XL models at https://huggingface.co/models?filter=transfo-xl
]
diff --git a/src/transformers/models/deprecated/transfo_xl/modeling_transfo_xl.py b/src/transformers/models/deprecated/transfo_xl/modeling_transfo_xl.py
index 2fa251399b1..1b8f222f508 100644
--- a/src/transformers/models/deprecated/transfo_xl/modeling_transfo_xl.py
+++ b/src/transformers/models/deprecated/transfo_xl/modeling_transfo_xl.py
@@ -39,11 +39,11 @@ from .modeling_transfo_xl_utilities import ProjectedAdaptiveLogSoftmax
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "transfo-xl-wt103"
+_CHECKPOINT_FOR_DOC = "transfo-xl/transfo-xl-wt103"
_CONFIG_FOR_DOC = "TransfoXLConfig"
TRANSFO_XL_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "transfo-xl-wt103",
+ "transfo-xl/transfo-xl-wt103",
# See all Transformer XL models at https://huggingface.co/models?filter=transfo-xl
]
diff --git a/src/transformers/models/deprecated/transfo_xl/tokenization_transfo_xl.py b/src/transformers/models/deprecated/transfo_xl/tokenization_transfo_xl.py
index cea74e76bc1..12d360076fb 100644
--- a/src/transformers/models/deprecated/transfo_xl/tokenization_transfo_xl.py
+++ b/src/transformers/models/deprecated/transfo_xl/tokenization_transfo_xl.py
@@ -57,16 +57,16 @@ VOCAB_FILES_NAMES = {
PRETRAINED_VOCAB_FILES_MAP = {
"pretrained_vocab_file": {
- "transfo-xl-wt103": "https://huggingface.co/transfo-xl-wt103/resolve/main/vocab.pkl",
+ "transfo-xl/transfo-xl-wt103": "https://huggingface.co/transfo-xl/transfo-xl-wt103/resolve/main/vocab.pkl",
}
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "transfo-xl-wt103": None,
+ "transfo-xl/transfo-xl-wt103": None,
}
PRETRAINED_CORPUS_ARCHIVE_MAP = {
- "transfo-xl-wt103": "https://huggingface.co/transfo-xl-wt103/resolve/main/corpus.bin",
+ "transfo-xl/transfo-xl-wt103": "https://huggingface.co/transfo-xl/transfo-xl-wt103/resolve/main/corpus.bin",
}
CORPUS_NAME = "corpus.bin"
@@ -451,7 +451,7 @@ class TransfoXLTokenizer(PreTrainedTokenizer):
Example:
```python
- >>> tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
+ >>> tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl/transfo-xl-wt103")
>>> tokenizer.moses_pipeline("23,000 people are 1.80 m tall")
['23', '@,@', '000', 'people', 'are', '1', '@.@', '80', 'm', 'tall']
```"""
diff --git a/src/transformers/models/dpr/convert_dpr_original_checkpoint_to_pytorch.py b/src/transformers/models/dpr/convert_dpr_original_checkpoint_to_pytorch.py
index b4965857b55..c11345d1eb4 100644
--- a/src/transformers/models/dpr/convert_dpr_original_checkpoint_to_pytorch.py
+++ b/src/transformers/models/dpr/convert_dpr_original_checkpoint_to_pytorch.py
@@ -54,7 +54,7 @@ class DPRState:
class DPRContextEncoderState(DPRState):
def load_dpr_model(self):
- model = DPRContextEncoder(DPRConfig(**BertConfig.get_config_dict("bert-base-uncased")[0]))
+ model = DPRContextEncoder(DPRConfig(**BertConfig.get_config_dict("google-bert/bert-base-uncased")[0]))
print(f"Loading DPR biencoder from {self.src_file}")
saved_state = load_states_from_checkpoint(self.src_file)
encoder, prefix = model.ctx_encoder, "ctx_model."
@@ -72,7 +72,7 @@ class DPRContextEncoderState(DPRState):
class DPRQuestionEncoderState(DPRState):
def load_dpr_model(self):
- model = DPRQuestionEncoder(DPRConfig(**BertConfig.get_config_dict("bert-base-uncased")[0]))
+ model = DPRQuestionEncoder(DPRConfig(**BertConfig.get_config_dict("google-bert/bert-base-uncased")[0]))
print(f"Loading DPR biencoder from {self.src_file}")
saved_state = load_states_from_checkpoint(self.src_file)
encoder, prefix = model.question_encoder, "question_model."
@@ -90,7 +90,7 @@ class DPRQuestionEncoderState(DPRState):
class DPRReaderState(DPRState):
def load_dpr_model(self):
- model = DPRReader(DPRConfig(**BertConfig.get_config_dict("bert-base-uncased")[0]))
+ model = DPRReader(DPRConfig(**BertConfig.get_config_dict("google-bert/bert-base-uncased")[0]))
print(f"Loading DPR reader from {self.src_file}")
saved_state = load_states_from_checkpoint(self.src_file)
# Fix changes from https://github.com/huggingface/transformers/commit/614fef1691edb806de976756d4948ecbcd0c0ca3
diff --git a/src/transformers/models/encoder_decoder/configuration_encoder_decoder.py b/src/transformers/models/encoder_decoder/configuration_encoder_decoder.py
index 9f373ea4544..8c0ae2771e8 100644
--- a/src/transformers/models/encoder_decoder/configuration_encoder_decoder.py
+++ b/src/transformers/models/encoder_decoder/configuration_encoder_decoder.py
@@ -45,13 +45,13 @@ class EncoderDecoderConfig(PretrainedConfig):
```python
>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
- >>> # Initializing a BERT bert-base-uncased style configuration
+ >>> # Initializing a BERT google-bert/bert-base-uncased style configuration
>>> config_encoder = BertConfig()
>>> config_decoder = BertConfig()
>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
- >>> # Initializing a Bert2Bert model (with random weights) from the bert-base-uncased style configurations
+ >>> # Initializing a Bert2Bert model (with random weights) from the google-bert/bert-base-uncased style configurations
>>> model = EncoderDecoderModel(config=config)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/encoder_decoder/modeling_encoder_decoder.py b/src/transformers/models/encoder_decoder/modeling_encoder_decoder.py
index 12959f8f200..1a6adcee1f8 100644
--- a/src/transformers/models/encoder_decoder/modeling_encoder_decoder.py
+++ b/src/transformers/models/encoder_decoder/modeling_encoder_decoder.py
@@ -403,8 +403,6 @@ class EncoderDecoderModel(PreTrainedModel):
Information necessary to initiate the encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -416,8 +414,6 @@ class EncoderDecoderModel(PreTrainedModel):
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -444,7 +440,7 @@ class EncoderDecoderModel(PreTrainedModel):
>>> from transformers import EncoderDecoderModel
>>> # initialize a bert2bert from two pretrained BERT models. Note that the cross-attention layers will be randomly initialized
- >>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
+ >>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-uncased", "google-bert/bert-base-uncased")
>>> # saving model after fine-tuning
>>> model.save_pretrained("./bert2bert")
>>> # load fine-tuned model
@@ -560,9 +556,9 @@ class EncoderDecoderModel(PreTrainedModel):
>>> from transformers import EncoderDecoderModel, BertTokenizer
>>> import torch
- >>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ >>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "bert-base-uncased", "bert-base-uncased"
+ ... "google-bert/bert-base-uncased", "google-bert/bert-base-uncased"
... ) # initialize Bert2Bert from pre-trained checkpoints
>>> # training
diff --git a/src/transformers/models/encoder_decoder/modeling_flax_encoder_decoder.py b/src/transformers/models/encoder_decoder/modeling_flax_encoder_decoder.py
index 93cac0b3f65..beecd080328 100644
--- a/src/transformers/models/encoder_decoder/modeling_flax_encoder_decoder.py
+++ b/src/transformers/models/encoder_decoder/modeling_flax_encoder_decoder.py
@@ -449,9 +449,9 @@ class FlaxEncoderDecoderModel(FlaxPreTrainedModel):
>>> from transformers import FlaxEncoderDecoderModel, BertTokenizer
>>> # initialize a bert2gpt2 from pretrained BERT and GPT2 models. Note that the cross-attention layers will be randomly initialized
- >>> model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
+ >>> model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-cased", "openai-community/gpt2")
- >>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+ >>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> text = "My friends are cool but they eat too many carbs."
>>> input_ids = tokenizer.encode(text, return_tensors="np")
@@ -527,9 +527,9 @@ class FlaxEncoderDecoderModel(FlaxPreTrainedModel):
>>> import jax.numpy as jnp
>>> # initialize a bert2gpt2 from pretrained BERT and GPT2 models. Note that the cross-attention layers will be randomly initialized
- >>> model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
+ >>> model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-cased", "openai-community/gpt2")
- >>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+ >>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> text = "My friends are cool but they eat too many carbs."
>>> input_ids = tokenizer.encode(text, max_length=1024, return_tensors="np")
@@ -653,8 +653,8 @@ class FlaxEncoderDecoderModel(FlaxPreTrainedModel):
>>> # load a fine-tuned bert2gpt2 model
>>> model = FlaxEncoderDecoderModel.from_pretrained("patrickvonplaten/bert2gpt2-cnn_dailymail-fp16")
>>> # load input & output tokenizer
- >>> tokenizer_input = BertTokenizer.from_pretrained("bert-base-cased")
- >>> tokenizer_output = GPT2Tokenizer.from_pretrained("gpt2")
+ >>> tokenizer_input = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
+ >>> tokenizer_output = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
>>> article = '''Sigma Alpha Epsilon is under fire for a video showing party-bound fraternity members
>>> singing a racist chant. SAE's national chapter suspended the students,
@@ -774,8 +774,6 @@ class FlaxEncoderDecoderModel(FlaxPreTrainedModel):
Information necessary to initiate the encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
@@ -783,8 +781,6 @@ class FlaxEncoderDecoderModel(FlaxPreTrainedModel):
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
@@ -807,7 +803,7 @@ class FlaxEncoderDecoderModel(FlaxPreTrainedModel):
>>> from transformers import FlaxEncoderDecoderModel
>>> # initialize a bert2gpt2 from pretrained BERT and GPT2 models. Note that the cross-attention layers will be randomly initialized
- >>> model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
+ >>> model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-cased", "openai-community/gpt2")
>>> # saving model after fine-tuning
>>> model.save_pretrained("./bert2gpt2")
>>> # load fine-tuned model
diff --git a/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py b/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py
index b4b2503bd00..855fb767d13 100644
--- a/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py
+++ b/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py
@@ -327,8 +327,6 @@ class TFEncoderDecoderModel(TFPreTrainedModel, TFCausalLanguageModelingLoss):
Information necessary to initiate the encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *pytorch index checkpoint file* (e.g, `./pt_model/`). In this case,
@@ -338,8 +336,6 @@ class TFEncoderDecoderModel(TFPreTrainedModel, TFCausalLanguageModelingLoss):
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *pytorch checkpoint file* (e.g, `./pt_model/`). In this case,
@@ -364,7 +360,7 @@ class TFEncoderDecoderModel(TFPreTrainedModel, TFCausalLanguageModelingLoss):
>>> from transformers import TFEncoderDecoderModel
>>> # initialize a bert2gpt2 from two pretrained BERT models. Note that the cross-attention layers will be randomly initialized
- >>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")
+ >>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-uncased", "openai-community/gpt2")
>>> # saving model after fine-tuning
>>> model.save_pretrained("./bert2gpt2")
>>> # load fine-tuned model
@@ -486,9 +482,9 @@ class TFEncoderDecoderModel(TFPreTrainedModel, TFCausalLanguageModelingLoss):
>>> from transformers import TFEncoderDecoderModel, BertTokenizer
>>> # initialize a bert2gpt2 from a pretrained BERT and GPT2 models. Note that the cross-attention layers will be randomly initialized
- >>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
+ >>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-cased", "openai-community/gpt2")
- >>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+ >>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> # forward
>>> input_ids = tokenizer.encode(
diff --git a/src/transformers/models/flaubert/modeling_flaubert.py b/src/transformers/models/flaubert/modeling_flaubert.py
index 318e9bfd471..4786fc6d578 100644
--- a/src/transformers/models/flaubert/modeling_flaubert.py
+++ b/src/transformers/models/flaubert/modeling_flaubert.py
@@ -1143,8 +1143,8 @@ class FlaubertForQuestionAnswering(FlaubertPreTrainedModel):
>>> from transformers import XLMTokenizer, XLMForQuestionAnswering
>>> import torch
- >>> tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-en-2048")
- >>> model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-en-2048")
+ >>> tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
+ >>> model = XLMForQuestionAnswering.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(
... 0
diff --git a/src/transformers/models/git/convert_git_to_pytorch.py b/src/transformers/models/git/convert_git_to_pytorch.py
index 5dde4da15e5..4e3e8e7b317 100644
--- a/src/transformers/models/git/convert_git_to_pytorch.py
+++ b/src/transformers/models/git/convert_git_to_pytorch.py
@@ -311,7 +311,9 @@ def convert_git_checkpoint(model_name, pytorch_dump_folder_path, push_to_hub=Fal
size={"shortest_edge": image_size}, crop_size={"height": image_size, "width": image_size}
)
)
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_input_names=["input_ids", "attention_mask"])
+ tokenizer = AutoTokenizer.from_pretrained(
+ "google-bert/bert-base-uncased", model_input_names=["input_ids", "attention_mask"]
+ )
processor = GitProcessor(tokenizer=tokenizer, image_processor=image_processor)
if is_video:
diff --git a/src/transformers/models/gpt2/configuration_gpt2.py b/src/transformers/models/gpt2/configuration_gpt2.py
index d35a1614288..395e2b4873f 100644
--- a/src/transformers/models/gpt2/configuration_gpt2.py
+++ b/src/transformers/models/gpt2/configuration_gpt2.py
@@ -26,11 +26,11 @@ from ...utils import logging
logger = logging.get_logger(__name__)
GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "gpt2": "https://huggingface.co/gpt2/resolve/main/config.json",
- "gpt2-medium": "https://huggingface.co/gpt2-medium/resolve/main/config.json",
- "gpt2-large": "https://huggingface.co/gpt2-large/resolve/main/config.json",
- "gpt2-xl": "https://huggingface.co/gpt2-xl/resolve/main/config.json",
- "distilgpt2": "https://huggingface.co/distilgpt2/resolve/main/config.json",
+ "openai-community/gpt2": "https://huggingface.co/openai-community/gpt2/resolve/main/config.json",
+ "openai-community/gpt2-medium": "https://huggingface.co/openai-community/gpt2-medium/resolve/main/config.json",
+ "openai-community/gpt2-large": "https://huggingface.co/openai-community/gpt2-large/resolve/main/config.json",
+ "openai-community/gpt2-xl": "https://huggingface.co/openai-community/gpt2-xl/resolve/main/config.json",
+ "distilbert/distilgpt2": "https://huggingface.co/distilbert/distilgpt2/resolve/main/config.json",
}
@@ -39,7 +39,7 @@ class GPT2Config(PretrainedConfig):
This is the configuration class to store the configuration of a [`GPT2Model`] or a [`TFGPT2Model`]. It is used to
instantiate a GPT-2 model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the GPT-2
- [gpt2](https://huggingface.co/gpt2) architecture.
+ [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/gpt2/modeling_flax_gpt2.py b/src/transformers/models/gpt2/modeling_flax_gpt2.py
index 50cfb5e1122..c3ef377642a 100644
--- a/src/transformers/models/gpt2/modeling_flax_gpt2.py
+++ b/src/transformers/models/gpt2/modeling_flax_gpt2.py
@@ -35,7 +35,7 @@ from .configuration_gpt2 import GPT2Config
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "gpt2"
+_CHECKPOINT_FOR_DOC = "openai-community/gpt2"
_CONFIG_FOR_DOC = "GPT2Config"
diff --git a/src/transformers/models/gpt2/modeling_gpt2.py b/src/transformers/models/gpt2/modeling_gpt2.py
index 25c92dd2dd5..e1b357cefb6 100644
--- a/src/transformers/models/gpt2/modeling_gpt2.py
+++ b/src/transformers/models/gpt2/modeling_gpt2.py
@@ -51,15 +51,15 @@ from .configuration_gpt2 import GPT2Config
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "gpt2"
+_CHECKPOINT_FOR_DOC = "openai-community/gpt2"
_CONFIG_FOR_DOC = "GPT2Config"
GPT2_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "gpt2",
- "gpt2-medium",
- "gpt2-large",
- "gpt2-xl",
- "distilgpt2",
+ "openai-community/gpt2",
+ "openai-community/gpt2-medium",
+ "openai-community/gpt2-large",
+ "openai-community/gpt2-xl",
+ "distilbert/distilgpt2",
# See all GPT-2 models at https://huggingface.co/models?filter=gpt2
]
@@ -619,16 +619,16 @@ PARALLELIZE_DOCSTRING = r"""
have fewer attention modules mapped to it than other devices. For reference, the gpt2 models have the
following number of attention modules:
- - gpt2: 12
- - gpt2-medium: 24
- - gpt2-large: 36
- - gpt2-xl: 48
+ - openai-community/gpt2: 12
+ - openai-community/gpt2-medium: 24
+ - openai-community/gpt2-large: 36
+ - openai-community/gpt2-xl: 48
Example:
```python
# Here is an example of a device map on a machine with 4 GPUs using gpt2-xl, which has a total of 48 attention modules:
- model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2-xl")
device_map = {
0: [0, 1, 2, 3, 4, 5, 6, 7, 8],
1: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
@@ -644,8 +644,8 @@ DEPARALLELIZE_DOCSTRING = r"""
Example:
```python
- # On a 4 GPU machine with gpt2-large:
- model = GPT2LMHeadModel.from_pretrained("gpt2-large")
+ # On a 4 GPU machine with openai-community/gpt2-large:
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2-large")
device_map = {
0: [0, 1, 2, 3, 4, 5, 6, 7],
1: [8, 9, 10, 11, 12, 13, 14, 15],
@@ -1277,8 +1277,8 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
>>> import torch
>>> from transformers import AutoTokenizer, GPT2DoubleHeadsModel
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = GPT2DoubleHeadsModel.from_pretrained("gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = GPT2DoubleHeadsModel.from_pretrained("openai-community/gpt2")
>>> # Add a [CLS] to the vocabulary (we should train it also!)
>>> num_added_tokens = tokenizer.add_special_tokens({"cls_token": "[CLS]"})
diff --git a/src/transformers/models/gpt2/modeling_tf_gpt2.py b/src/transformers/models/gpt2/modeling_tf_gpt2.py
index fd40df97ddc..2c17593e268 100644
--- a/src/transformers/models/gpt2/modeling_tf_gpt2.py
+++ b/src/transformers/models/gpt2/modeling_tf_gpt2.py
@@ -55,16 +55,16 @@ from .configuration_gpt2 import GPT2Config
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "gpt2"
+_CHECKPOINT_FOR_DOC = "openai-community/gpt2"
_CONFIG_FOR_DOC = "GPT2Config"
TF_GPT2_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "gpt2",
- "gpt2-medium",
- "gpt2-large",
- "gpt2-xl",
- "distilgpt2",
+ "openai-community/gpt2",
+ "openai-community/gpt2-medium",
+ "openai-community/gpt2-large",
+ "openai-community/gpt2-xl",
+ "distilbert/distilgpt2",
# See all GPT-2 models at https://huggingface.co/models?filter=gpt2
]
@@ -1026,8 +1026,8 @@ class TFGPT2DoubleHeadsModel(TFGPT2PreTrainedModel):
>>> import tensorflow as tf
>>> from transformers import AutoTokenizer, TFGPT2DoubleHeadsModel
- >>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
- >>> model = TFGPT2DoubleHeadsModel.from_pretrained("gpt2")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ >>> model = TFGPT2DoubleHeadsModel.from_pretrained("openai-community/gpt2")
>>> # Add a [CLS] to the vocabulary (we should train it also!)
>>> num_added_tokens = tokenizer.add_special_tokens({"cls_token": "[CLS]"})
diff --git a/src/transformers/models/gpt2/tokenization_gpt2.py b/src/transformers/models/gpt2/tokenization_gpt2.py
index a7b576e92de..801e997344a 100644
--- a/src/transformers/models/gpt2/tokenization_gpt2.py
+++ b/src/transformers/models/gpt2/tokenization_gpt2.py
@@ -35,27 +35,27 @@ VOCAB_FILES_NAMES = {
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "gpt2": "https://huggingface.co/gpt2/resolve/main/vocab.json",
- "gpt2-medium": "https://huggingface.co/gpt2-medium/resolve/main/vocab.json",
- "gpt2-large": "https://huggingface.co/gpt2-large/resolve/main/vocab.json",
- "gpt2-xl": "https://huggingface.co/gpt2-xl/resolve/main/vocab.json",
- "distilgpt2": "https://huggingface.co/distilgpt2/resolve/main/vocab.json",
+ "openai-community/gpt2": "https://huggingface.co/openai-community/gpt2/resolve/main/vocab.json",
+ "openai-community/gpt2-medium": "https://huggingface.co/openai-community/gpt2-medium/resolve/main/vocab.json",
+ "openai-community/gpt2-large": "https://huggingface.co/openai-community/gpt2-large/resolve/main/vocab.json",
+ "openai-community/gpt2-xl": "https://huggingface.co/openai-community/gpt2-xl/resolve/main/vocab.json",
+ "distilbert/distilgpt2": "https://huggingface.co/distilbert/distilgpt2/resolve/main/vocab.json",
},
"merges_file": {
- "gpt2": "https://huggingface.co/gpt2/resolve/main/merges.txt",
- "gpt2-medium": "https://huggingface.co/gpt2-medium/resolve/main/merges.txt",
- "gpt2-large": "https://huggingface.co/gpt2-large/resolve/main/merges.txt",
- "gpt2-xl": "https://huggingface.co/gpt2-xl/resolve/main/merges.txt",
- "distilgpt2": "https://huggingface.co/distilgpt2/resolve/main/merges.txt",
+ "openai-community/gpt2": "https://huggingface.co/openai-community/gpt2/resolve/main/merges.txt",
+ "openai-community/gpt2-medium": "https://huggingface.co/openai-community/gpt2-medium/resolve/main/merges.txt",
+ "openai-community/gpt2-large": "https://huggingface.co/openai-community/gpt2-large/resolve/main/merges.txt",
+ "openai-community/gpt2-xl": "https://huggingface.co/openai-community/gpt2-xl/resolve/main/merges.txt",
+ "distilbert/distilgpt2": "https://huggingface.co/distilbert/distilgpt2/resolve/main/merges.txt",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "gpt2": 1024,
- "gpt2-medium": 1024,
- "gpt2-large": 1024,
- "gpt2-xl": 1024,
- "distilgpt2": 1024,
+ "openai-community/gpt2": 1024,
+ "openai-community/gpt2-medium": 1024,
+ "openai-community/gpt2-large": 1024,
+ "openai-community/gpt2-xl": 1024,
+ "distilbert/distilgpt2": 1024,
}
@@ -108,7 +108,7 @@ class GPT2Tokenizer(PreTrainedTokenizer):
```python
>>> from transformers import GPT2Tokenizer
- >>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ >>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
>>> tokenizer("Hello world")["input_ids"]
[15496, 995]
diff --git a/src/transformers/models/gpt2/tokenization_gpt2_fast.py b/src/transformers/models/gpt2/tokenization_gpt2_fast.py
index a5dcade90a0..c4e49d23d14 100644
--- a/src/transformers/models/gpt2/tokenization_gpt2_fast.py
+++ b/src/transformers/models/gpt2/tokenization_gpt2_fast.py
@@ -32,34 +32,34 @@ VOCAB_FILES_NAMES = {"vocab_file": "vocab.json", "merges_file": "merges.txt", "t
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "gpt2": "https://huggingface.co/gpt2/resolve/main/vocab.json",
- "gpt2-medium": "https://huggingface.co/gpt2-medium/resolve/main/vocab.json",
- "gpt2-large": "https://huggingface.co/gpt2-large/resolve/main/vocab.json",
- "gpt2-xl": "https://huggingface.co/gpt2-xl/resolve/main/vocab.json",
- "distilgpt2": "https://huggingface.co/distilgpt2/resolve/main/vocab.json",
+ "openai-community/gpt2": "https://huggingface.co/openai-community/gpt2/resolve/main/vocab.json",
+ "openai-community/gpt2-medium": "https://huggingface.co/openai-community/gpt2-medium/resolve/main/vocab.json",
+ "openai-community/gpt2-large": "https://huggingface.co/openai-community/gpt2-large/resolve/main/vocab.json",
+ "openai-community/gpt2-xl": "https://huggingface.co/openai-community/gpt2-xl/resolve/main/vocab.json",
+ "distilbert/distilgpt2": "https://huggingface.co/distilbert/distilgpt2/resolve/main/vocab.json",
},
"merges_file": {
- "gpt2": "https://huggingface.co/gpt2/resolve/main/merges.txt",
- "gpt2-medium": "https://huggingface.co/gpt2-medium/resolve/main/merges.txt",
- "gpt2-large": "https://huggingface.co/gpt2-large/resolve/main/merges.txt",
- "gpt2-xl": "https://huggingface.co/gpt2-xl/resolve/main/merges.txt",
- "distilgpt2": "https://huggingface.co/distilgpt2/resolve/main/merges.txt",
+ "openai-community/gpt2": "https://huggingface.co/openai-community/gpt2/resolve/main/merges.txt",
+ "openai-community/gpt2-medium": "https://huggingface.co/openai-community/gpt2-medium/resolve/main/merges.txt",
+ "openai-community/gpt2-large": "https://huggingface.co/openai-community/gpt2-large/resolve/main/merges.txt",
+ "openai-community/gpt2-xl": "https://huggingface.co/openai-community/gpt2-xl/resolve/main/merges.txt",
+ "distilbert/distilgpt2": "https://huggingface.co/distilbert/distilgpt2/resolve/main/merges.txt",
},
"tokenizer_file": {
- "gpt2": "https://huggingface.co/gpt2/resolve/main/tokenizer.json",
- "gpt2-medium": "https://huggingface.co/gpt2-medium/resolve/main/tokenizer.json",
- "gpt2-large": "https://huggingface.co/gpt2-large/resolve/main/tokenizer.json",
- "gpt2-xl": "https://huggingface.co/gpt2-xl/resolve/main/tokenizer.json",
- "distilgpt2": "https://huggingface.co/distilgpt2/resolve/main/tokenizer.json",
+ "openai-community/gpt2": "https://huggingface.co/openai-community/gpt2/resolve/main/tokenizer.json",
+ "openai-community/gpt2-medium": "https://huggingface.co/openai-community/gpt2-medium/resolve/main/tokenizer.json",
+ "openai-community/gpt2-large": "https://huggingface.co/openai-community/gpt2-large/resolve/main/tokenizer.json",
+ "openai-community/gpt2-xl": "https://huggingface.co/openai-community/gpt2-xl/resolve/main/tokenizer.json",
+ "distilbert/distilgpt2": "https://huggingface.co/distilbert/distilgpt2/resolve/main/tokenizer.json",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "gpt2": 1024,
- "gpt2-medium": 1024,
- "gpt2-large": 1024,
- "gpt2-xl": 1024,
- "distilgpt2": 1024,
+ "openai-community/gpt2": 1024,
+ "openai-community/gpt2-medium": 1024,
+ "openai-community/gpt2-large": 1024,
+ "openai-community/gpt2-xl": 1024,
+ "distilbert/distilgpt2": 1024,
}
@@ -74,7 +74,7 @@ class GPT2TokenizerFast(PreTrainedTokenizerFast):
```python
>>> from transformers import GPT2TokenizerFast
- >>> tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
+ >>> tokenizer = GPT2TokenizerFast.from_pretrained("openai-community/gpt2")
>>> tokenizer("Hello world")["input_ids"]
[15496, 995]
diff --git a/src/transformers/models/gpt2/tokenization_gpt2_tf.py b/src/transformers/models/gpt2/tokenization_gpt2_tf.py
index 41f0874919a..d763eb84855 100644
--- a/src/transformers/models/gpt2/tokenization_gpt2_tf.py
+++ b/src/transformers/models/gpt2/tokenization_gpt2_tf.py
@@ -45,7 +45,7 @@ class TFGPT2Tokenizer(keras.layers.Layer):
```python
from transformers import AutoTokenizer, TFGPT2Tokenizer
- tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tf_tokenizer = TFGPT2Tokenizer.from_tokenizer(tokenizer)
```
"""
@@ -65,7 +65,7 @@ class TFGPT2Tokenizer(keras.layers.Layer):
```python
from transformers import TFGPT2Tokenizer
- tf_tokenizer = TFGPT2Tokenizer.from_pretrained("gpt2")
+ tf_tokenizer = TFGPT2Tokenizer.from_pretrained("openai-community/gpt2")
```
"""
tokenizer = GPT2Tokenizer.from_pretrained(pretrained_model_name_or_path, *init_inputs, **kwargs)
diff --git a/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py b/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py
index 31f8a7708ad..16ed6b1e753 100644
--- a/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py
+++ b/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py
@@ -48,7 +48,7 @@ class GPTNeoXTokenizerFast(PreTrainedTokenizerFast):
```python
>>> from transformers import GPTNeoXTokenizerFast
- >>> tokenizer = GPTNeoXTokenizerFast.from_pretrained("gpt2")
+ >>> tokenizer = GPTNeoXTokenizerFast.from_pretrained("openai-community/gpt2")
>>> tokenizer("Hello world")["input_ids"]
[15496, 995]
diff --git a/src/transformers/models/instructblip/convert_instructblip_original_to_pytorch.py b/src/transformers/models/instructblip/convert_instructblip_original_to_pytorch.py
index 87e8b90d6cc..f8b9c86cfdd 100644
--- a/src/transformers/models/instructblip/convert_instructblip_original_to_pytorch.py
+++ b/src/transformers/models/instructblip/convert_instructblip_original_to_pytorch.py
@@ -132,7 +132,7 @@ def convert_blip2_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
"""
Copy/paste/tweak model's weights to Transformers design.
"""
- qformer_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", truncation_side="left")
+ qformer_tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", truncation_side="left")
qformer_tokenizer.add_special_tokens({"bos_token": "[DEC]"})
if "t5" in model_name:
diff --git a/src/transformers/models/llama/tokenization_llama.py b/src/transformers/models/llama/tokenization_llama.py
index a7c2155b0da..7a5db51987d 100644
--- a/src/transformers/models/llama/tokenization_llama.py
+++ b/src/transformers/models/llama/tokenization_llama.py
@@ -117,7 +117,7 @@ class LlamaTokenizer(PreTrainedTokenizer):
```python
>>> from transformers import T5Tokenizer
- >>> tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=True)
+ >>> tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=True)
>>> tokenizer.encode("Hello <extra_id_0>.")
[8774, 32099, 3, 5, 1]
```
@@ -125,7 +125,7 @@ class LlamaTokenizer(PreTrainedTokenizer):
```python
>>> from transformers import T5Tokenizer
- >>> tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=False)
+ >>> tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=False)
>>> tokenizer.encode("Hello <extra_id_0>.") # the extra space `[3]` is no longer here
[8774, 32099, 5, 1]
```
diff --git a/src/transformers/models/longformer/tokenization_longformer.py b/src/transformers/models/longformer/tokenization_longformer.py
index 4f76f16d518..cf0477bac10 100644
--- a/src/transformers/models/longformer/tokenization_longformer.py
+++ b/src/transformers/models/longformer/tokenization_longformer.py
@@ -112,7 +112,7 @@ def get_pairs(word):
return pairs
-# Copied from transformers.models.roberta.tokenization_roberta.RobertaTokenizer with roberta-base->allenai/longformer-base-4096, RoBERTa->Longformer all-casing, RobertaTokenizer->LongformerTokenizer
+# Copied from transformers.models.roberta.tokenization_roberta.RobertaTokenizer with FacebookAI/roberta-base->allenai/longformer-base-4096, RoBERTa->Longformer all-casing, RobertaTokenizer->LongformerTokenizer
class LongformerTokenizer(PreTrainedTokenizer):
"""
Constructs a Longformer tokenizer, derived from the GPT-2 tokenizer, using byte-level Byte-Pair-Encoding.
diff --git a/src/transformers/models/longformer/tokenization_longformer_fast.py b/src/transformers/models/longformer/tokenization_longformer_fast.py
index fb35a8b67bb..e40ebff3b65 100644
--- a/src/transformers/models/longformer/tokenization_longformer_fast.py
+++ b/src/transformers/models/longformer/tokenization_longformer_fast.py
@@ -87,7 +87,7 @@ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
}
-# Copied from transformers.models.roberta.tokenization_roberta_fast.RobertaTokenizerFast with roberta-base->allenai/longformer-base-4096, RoBERTa->Longformer all-casing, Roberta->Longformer
+# Copied from transformers.models.roberta.tokenization_roberta_fast.RobertaTokenizerFast with FacebookAI/roberta-base->allenai/longformer-base-4096, RoBERTa->Longformer all-casing, Roberta->Longformer
class LongformerTokenizerFast(PreTrainedTokenizerFast):
"""
Construct a "fast" Longformer tokenizer (backed by HuggingFace's *tokenizers* library), derived from the GPT-2
diff --git a/src/transformers/models/longt5/modeling_flax_longt5.py b/src/transformers/models/longt5/modeling_flax_longt5.py
index 36e273d5725..d47f644ba37 100644
--- a/src/transformers/models/longt5/modeling_flax_longt5.py
+++ b/src/transformers/models/longt5/modeling_flax_longt5.py
@@ -1828,7 +1828,7 @@ class FlaxLongT5PreTrainedModel(FlaxPreTrainedModel):
```python
>>> from transformers import AutoTokenizer, FlaxLongT5ForConditionalGeneration
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
>>> model = FlaxLongT5ForConditionalGeneration.from_pretrained("google/long-t5-local-base")
>>> text = "My friends are cool but they eat too many carbs."
@@ -1890,7 +1890,7 @@ class FlaxLongT5PreTrainedModel(FlaxPreTrainedModel):
>>> from transformers import AutoTokenizer, FlaxLongT5ForConditionalGeneration
>>> import jax.numpy as jnp
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
>>> model = FlaxLongT5ForConditionalGeneration.from_pretrained("google/long-t5-local-base")
>>> text = "My friends are cool but they eat too many carbs."
@@ -2119,7 +2119,7 @@ FLAX_LONGT5_MODEL_DOCSTRING = """
```python
>>> from transformers import AutoTokenizer, FlaxLongT5Model
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
>>> model = FlaxLongT5Model.from_pretrained("google/long-t5-local-base")
>>> input_ids = tokenizer(
@@ -2278,7 +2278,7 @@ class FlaxLongT5ForConditionalGeneration(FlaxLongT5PreTrainedModel):
>>> from transformers import AutoTokenizer, FlaxLongT5ForConditionalGeneration
>>> import jax.numpy as jnp
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
>>> model = FlaxLongT5ForConditionalGeneration.from_pretrained("google/long-t5-local-base")
>>> text = "summarize: My friends are cool but they eat too many carbs."
@@ -2426,7 +2426,7 @@ FLAX_LONGT5_CONDITIONAL_GENERATION_DOCSTRING = """
```python
>>> from transformers import AutoTokenizer, FlaxLongT5ForConditionalGeneration
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
>>> model = FlaxLongT5ForConditionalGeneration.from_pretrained("google/long-t5-local-base")
>>> ARTICLE_TO_SUMMARIZE = "summarize: My friends are cool but they eat too many carbs."
diff --git a/src/transformers/models/megatron_bert/configuration_megatron_bert.py b/src/transformers/models/megatron_bert/configuration_megatron_bert.py
index 874aaa331d7..02cdf289432 100644
--- a/src/transformers/models/megatron_bert/configuration_megatron_bert.py
+++ b/src/transformers/models/megatron_bert/configuration_megatron_bert.py
@@ -81,10 +81,10 @@ class MegatronBertConfig(PretrainedConfig):
```python
>>> from transformers import MegatronBertConfig, MegatronBertModel
- >>> # Initializing a MEGATRON_BERT bert-base-uncased style configuration
+ >>> # Initializing a MEGATRON_BERT google-bert/bert-base-uncased style configuration
>>> configuration = MegatronBertConfig()
- >>> # Initializing a model (with random weights) from the bert-base-uncased style configuration
+ >>> # Initializing a model (with random weights) from the google-bert/bert-base-uncased style configuration
>>> model = MegatronBertModel(configuration)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/megatron_gpt2/checkpoint_reshaping_and_interoperability.py b/src/transformers/models/megatron_gpt2/checkpoint_reshaping_and_interoperability.py
index b535e599ad6..15ccfb4dcb1 100644
--- a/src/transformers/models/megatron_gpt2/checkpoint_reshaping_and_interoperability.py
+++ b/src/transformers/models/megatron_gpt2/checkpoint_reshaping_and_interoperability.py
@@ -550,7 +550,7 @@ def convert_checkpoint_from_megatron_to_transformers(args):
# see https://github.com/huggingface/transformers/issues/13906)
if args.tokenizer_name is None:
- tokenizer_name = "gpt2"
+ tokenizer_name = "openai-community/gpt2"
else:
tokenizer_name = args.tokenizer_name
diff --git a/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py b/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
index 88d54f10e26..38060f8af5c 100644
--- a/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
+++ b/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
@@ -324,13 +324,13 @@ def main():
if ds_args is not None:
tokenizer_type = ds_args.tokenizer_type
if tokenizer_type == "GPT2BPETokenizer":
- tokenizer_model_name = "gpt2"
+ tokenizer_model_name = "openai-community/gpt2"
elif tokenizer_type == "PretrainedFromHF":
tokenizer_model_name = ds_args.tokenizer_name_or_path
else:
raise ValueError(f"Unrecognized tokenizer_type {tokenizer_type}")
else:
- tokenizer_model_name = "gpt2"
+ tokenizer_model_name = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_model_name)
tokenizer_class = type(tokenizer).__name__
diff --git a/src/transformers/models/mgp_str/processing_mgp_str.py b/src/transformers/models/mgp_str/processing_mgp_str.py
index 71422e844d0..207d4230ba0 100644
--- a/src/transformers/models/mgp_str/processing_mgp_str.py
+++ b/src/transformers/models/mgp_str/processing_mgp_str.py
@@ -71,8 +71,8 @@ class MgpstrProcessor(ProcessorMixin):
raise ValueError("You need to specify a `tokenizer`.")
self.char_tokenizer = tokenizer
- self.bpe_tokenizer = AutoTokenizer.from_pretrained("gpt2")
- self.wp_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ self.bpe_tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+ self.wp_tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
super().__init__(image_processor, tokenizer)
diff --git a/src/transformers/models/mt5/modeling_mt5.py b/src/transformers/models/mt5/modeling_mt5.py
index f9d42afc22e..100273a5ac5 100644
--- a/src/transformers/models/mt5/modeling_mt5.py
+++ b/src/transformers/models/mt5/modeling_mt5.py
@@ -1470,8 +1470,8 @@ class MT5Model(MT5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, MT5Model
- >>> tokenizer = AutoTokenizer.from_pretrained("mt5-small")
- >>> model = MT5Model.from_pretrained("mt5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
+ >>> model = MT5Model.from_pretrained("google/mt5-small")
>>> input_ids = tokenizer(
... "Studies have been shown that owning a dog is good for you", return_tensors="pt"
@@ -1706,8 +1706,8 @@ class MT5ForConditionalGeneration(MT5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, MT5ForConditionalGeneration
- >>> tokenizer = AutoTokenizer.from_pretrained("mt5-small")
- >>> model = MT5ForConditionalGeneration.from_pretrained("mt5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
+ >>> model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
>>> # training
>>> input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids
@@ -2017,8 +2017,8 @@ class MT5EncoderModel(MT5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, MT5EncoderModel
- >>> tokenizer = AutoTokenizer.from_pretrained("mt5-small")
- >>> model = MT5EncoderModel.from_pretrained("mt5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
+ >>> model = MT5EncoderModel.from_pretrained("google/mt5-small")
>>> input_ids = tokenizer(
... "Studies have been shown that owning a dog is good for you", return_tensors="pt"
... ).input_ids # Batch size 1
diff --git a/src/transformers/models/musicgen/convert_musicgen_transformers.py b/src/transformers/models/musicgen/convert_musicgen_transformers.py
index d4b61046e5e..f1eb9e40704 100644
--- a/src/transformers/models/musicgen/convert_musicgen_transformers.py
+++ b/src/transformers/models/musicgen/convert_musicgen_transformers.py
@@ -138,7 +138,7 @@ def convert_musicgen_checkpoint(
decoder_state_dict, hidden_size=decoder_config.hidden_size
)
- text_encoder = T5EncoderModel.from_pretrained("t5-base")
+ text_encoder = T5EncoderModel.from_pretrained("google-t5/t5-base")
audio_encoder = EncodecModel.from_pretrained("facebook/encodec_32khz")
decoder = MusicgenForCausalLM(decoder_config).eval()
@@ -172,7 +172,7 @@ def convert_musicgen_checkpoint(
raise ValueError("Incorrect shape for logits")
# now construct the processor
- tokenizer = AutoTokenizer.from_pretrained("t5-base")
+ tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
feature_extractor = AutoFeatureExtractor.from_pretrained(
"facebook/encodec_32khz", padding_side="left", feature_size=decoder_config.audio_channels
)
diff --git a/src/transformers/models/musicgen/modeling_musicgen.py b/src/transformers/models/musicgen/modeling_musicgen.py
index 9a6518a4d11..2514a487632 100644
--- a/src/transformers/models/musicgen/modeling_musicgen.py
+++ b/src/transformers/models/musicgen/modeling_musicgen.py
@@ -1576,8 +1576,6 @@ class MusicgenForConditionalGeneration(PreTrainedModel):
Information necessary to initiate the text encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `t5-base`, or namespaced under a user or
- organization name, like `google/flan-t5-base.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
@@ -1585,8 +1583,6 @@ class MusicgenForConditionalGeneration(PreTrainedModel):
Information necessary to initiate the audio encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `facebook/encodec_24khz`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
@@ -1594,8 +1590,6 @@ class MusicgenForConditionalGeneration(PreTrainedModel):
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `gpt2`, or namespaced under a user or
- organization name, like `facebook/musicgen-small`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
@@ -1622,7 +1616,7 @@ class MusicgenForConditionalGeneration(PreTrainedModel):
>>> # initialize a musicgen model from a t5 text encoder, encodec audio encoder, and musicgen decoder
>>> model = MusicgenForConditionalGeneration.from_sub_models_pretrained(
- ... text_encoder_pretrained_model_name_or_path="t5-base",
+ ... text_encoder_pretrained_model_name_or_path="google-t5/t5-base",
... audio_encoder_pretrained_model_name_or_path="facebook/encodec_24khz",
... decoder_pretrained_model_name_or_path="facebook/musicgen-small",
... )
diff --git a/src/transformers/models/openai/configuration_openai.py b/src/transformers/models/openai/configuration_openai.py
index dd6f349249e..38e646b3934 100644
--- a/src/transformers/models/openai/configuration_openai.py
+++ b/src/transformers/models/openai/configuration_openai.py
@@ -21,7 +21,9 @@ from ...utils import logging
logger = logging.get_logger(__name__)
-OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP = {"openai-gpt": "https://huggingface.co/openai-gpt/resolve/main/config.json"}
+OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
+ "openai-community/openai-gpt": "https://huggingface.co/openai-community/openai-gpt/resolve/main/config.json"
+}
class OpenAIGPTConfig(PretrainedConfig):
@@ -29,7 +31,7 @@ class OpenAIGPTConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`OpenAIGPTModel`] or a [`TFOpenAIGPTModel`]. It is
used to instantiate a GPT model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the GPT
- [openai-gpt](https://huggingface.co/openai-gpt) architecture from OpenAI.
+ [openai-community/openai-gpt](https://huggingface.co/openai-community/openai-gpt) architecture from OpenAI.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/openai/modeling_openai.py b/src/transformers/models/openai/modeling_openai.py
index ebb83cfc6bd..747118bd27f 100644
--- a/src/transformers/models/openai/modeling_openai.py
+++ b/src/transformers/models/openai/modeling_openai.py
@@ -43,12 +43,12 @@ from .configuration_openai import OpenAIGPTConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "openai-gpt"
+_CHECKPOINT_FOR_DOC = "openai-community/openai-gpt"
_CONFIG_FOR_DOC = "OpenAIGPTConfig"
OPENAI_GPT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "openai-gpt",
- # See all OpenAI GPT models at https://huggingface.co/models?filter=openai-gpt
+ "openai-community/openai-gpt",
+ # See all OpenAI GPT models at https://huggingface.co/models?filter=openai-gpt
]
@@ -678,8 +678,8 @@ class OpenAIGPTDoubleHeadsModel(OpenAIGPTPreTrainedModel):
>>> from transformers import AutoTokenizer, OpenAIGPTDoubleHeadsModel
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
- >>> model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-gpt")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
+ >>> model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-community/openai-gpt")
>>> tokenizer.add_special_tokens(
... {"cls_token": "[CLS]"}
... ) # Add a [CLS] to the vocabulary (we should train it also!)
diff --git a/src/transformers/models/openai/modeling_tf_openai.py b/src/transformers/models/openai/modeling_tf_openai.py
index 8c213bcebdb..34bc5aa522d 100644
--- a/src/transformers/models/openai/modeling_tf_openai.py
+++ b/src/transformers/models/openai/modeling_tf_openai.py
@@ -52,12 +52,12 @@ from .configuration_openai import OpenAIGPTConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "openai-gpt"
+_CHECKPOINT_FOR_DOC = "openai-community/openai-gpt"
_CONFIG_FOR_DOC = "OpenAIGPTConfig"
TF_OPENAI_GPT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "openai-gpt",
- # See all OpenAI GPT models at https://huggingface.co/models?filter=openai-gpt
+ "openai-community/openai-gpt",
+ # See all OpenAI GPT models at https://huggingface.co/models?filter=openai-gpt
]
@@ -731,8 +731,8 @@ class TFOpenAIGPTDoubleHeadsModel(TFOpenAIGPTPreTrainedModel):
>>> import tensorflow as tf
>>> from transformers import AutoTokenizer, TFOpenAIGPTDoubleHeadsModel
- >>> tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
- >>> model = TFOpenAIGPTDoubleHeadsModel.from_pretrained("openai-gpt")
+ >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
+ >>> model = TFOpenAIGPTDoubleHeadsModel.from_pretrained("openai-community/openai-gpt")
>>> # Add a [CLS] to the vocabulary (we should train it also!)
>>> tokenizer.add_special_tokens({"cls_token": "[CLS]"})
diff --git a/src/transformers/models/openai/tokenization_openai.py b/src/transformers/models/openai/tokenization_openai.py
index cfdeb3207a6..e189b15035b 100644
--- a/src/transformers/models/openai/tokenization_openai.py
+++ b/src/transformers/models/openai/tokenization_openai.py
@@ -33,12 +33,16 @@ VOCAB_FILES_NAMES = {
}
PRETRAINED_VOCAB_FILES_MAP = {
- "vocab_file": {"openai-gpt": "https://huggingface.co/openai-gpt/resolve/main/vocab.json"},
- "merges_file": {"openai-gpt": "https://huggingface.co/openai-gpt/resolve/main/merges.txt"},
+ "vocab_file": {
+ "openai-community/openai-gpt": "https://huggingface.co/openai-community/openai-gpt/resolve/main/vocab.json"
+ },
+ "merges_file": {
+ "openai-community/openai-gpt": "https://huggingface.co/openai-community/openai-gpt/resolve/main/merges.txt"
+ },
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "openai-gpt": 512,
+ "openai-community/openai-gpt": 512,
}
diff --git a/src/transformers/models/openai/tokenization_openai_fast.py b/src/transformers/models/openai/tokenization_openai_fast.py
index 2df26c3a2f6..e1f04722ee2 100644
--- a/src/transformers/models/openai/tokenization_openai_fast.py
+++ b/src/transformers/models/openai/tokenization_openai_fast.py
@@ -27,13 +27,19 @@ logger = logging.get_logger(__name__)
VOCAB_FILES_NAMES = {"vocab_file": "vocab.json", "merges_file": "merges.txt", "tokenizer_file": "tokenizer.json"}
PRETRAINED_VOCAB_FILES_MAP = {
- "vocab_file": {"openai-gpt": "https://huggingface.co/openai-gpt/resolve/main/vocab.json"},
- "merges_file": {"openai-gpt": "https://huggingface.co/openai-gpt/resolve/main/merges.txt"},
- "tokenizer_file": {"openai-gpt": "https://huggingface.co/openai-gpt/resolve/main/tokenizer.json"},
+ "vocab_file": {
+ "openai-community/openai-gpt": "https://huggingface.co/openai-community/openai-gpt/resolve/main/vocab.json"
+ },
+ "merges_file": {
+ "openai-community/openai-gpt": "https://huggingface.co/openai-community/openai-gpt/resolve/main/merges.txt"
+ },
+ "tokenizer_file": {
+ "openai-community/openai-gpt": "https://huggingface.co/openai-community/openai-gpt/resolve/main/tokenizer.json"
+ },
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "openai-gpt": 512,
+ "openai-community/openai-gpt": 512,
}
diff --git a/src/transformers/models/prophetnet/modeling_prophetnet.py b/src/transformers/models/prophetnet/modeling_prophetnet.py
index eb1576197e5..81eb503ddbe 100644
--- a/src/transformers/models/prophetnet/modeling_prophetnet.py
+++ b/src/transformers/models/prophetnet/modeling_prophetnet.py
@@ -2192,10 +2192,10 @@ class ProphetNetForCausalLM(ProphetNetPreTrainedModel):
>>> from transformers import BertTokenizer, EncoderDecoderModel, AutoTokenizer
>>> import torch
- >>> tokenizer_enc = BertTokenizer.from_pretrained("bert-large-uncased")
+ >>> tokenizer_enc = BertTokenizer.from_pretrained("google-bert/bert-large-uncased")
>>> tokenizer_dec = AutoTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "bert-large-uncased", "microsoft/prophetnet-large-uncased"
+ ... "google-bert/bert-large-uncased", "microsoft/prophetnet-large-uncased"
... )
>>> ARTICLE = (
diff --git a/src/transformers/models/qdqbert/configuration_qdqbert.py b/src/transformers/models/qdqbert/configuration_qdqbert.py
index b790dd1efc5..1efa2ef811e 100644
--- a/src/transformers/models/qdqbert/configuration_qdqbert.py
+++ b/src/transformers/models/qdqbert/configuration_qdqbert.py
@@ -21,7 +21,7 @@ from ...utils import logging
logger = logging.get_logger(__name__)
QDQBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "bert-base-uncased": "https://huggingface.co/bert-base-uncased/resolve/main/config.json",
+ "google-bert/bert-base-uncased": "https://huggingface.co/google-bert/bert-base-uncased/resolve/main/config.json",
# QDQBERT models can be loaded from any BERT checkpoint, available at https://huggingface.co/models?filter=bert
}
@@ -31,7 +31,7 @@ class QDQBertConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`QDQBertModel`]. It is used to instantiate an
QDQBERT model according to the specified arguments, defining the model architecture. Instantiating a configuration
with the defaults will yield a similar configuration to that of the BERT
- [bert-base-uncased](https://huggingface.co/bert-base-uncased) architecture.
+ [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
@@ -76,10 +76,10 @@ class QDQBertConfig(PretrainedConfig):
```python
>>> from transformers import QDQBertModel, QDQBertConfig
- >>> # Initializing a QDQBERT bert-base-uncased style configuration
+ >>> # Initializing a QDQBERT google-bert/bert-base-uncased style configuration
>>> configuration = QDQBertConfig()
- >>> # Initializing a model from the bert-base-uncased style configuration
+ >>> # Initializing a model from the google-bert/bert-base-uncased style configuration
>>> model = QDQBertModel(configuration)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/qdqbert/modeling_qdqbert.py b/src/transformers/models/qdqbert/modeling_qdqbert.py
index 5e7704c77ce..ff4b5441ea8 100755
--- a/src/transformers/models/qdqbert/modeling_qdqbert.py
+++ b/src/transformers/models/qdqbert/modeling_qdqbert.py
@@ -66,11 +66,11 @@ if is_pytorch_quantization_available():
" https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization."
)
-_CHECKPOINT_FOR_DOC = "bert-base-uncased"
+_CHECKPOINT_FOR_DOC = "google-bert/bert-base-uncased"
_CONFIG_FOR_DOC = "QDQBertConfig"
QDQBERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "bert-base-uncased",
+ "google-bert/bert-base-uncased",
# See all BERT models at https://huggingface.co/models?filter=bert
]
@@ -1080,10 +1080,10 @@ class QDQBertLMHeadModel(QDQBertPreTrainedModel):
>>> from transformers import AutoTokenizer, QDQBertLMHeadModel, QDQBertConfig
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
- >>> config = QDQBertConfig.from_pretrained("bert-base-cased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
+ >>> config = QDQBertConfig.from_pretrained("google-bert/bert-base-cased")
>>> config.is_decoder = True
- >>> model = QDQBertLMHeadModel.from_pretrained("bert-base-cased", config=config)
+ >>> model = QDQBertLMHeadModel.from_pretrained("google-bert/bert-base-cased", config=config)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
@@ -1324,8 +1324,8 @@ class QDQBertForNextSentencePrediction(QDQBertPreTrainedModel):
>>> from transformers import AutoTokenizer, QDQBertForNextSentencePrediction
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- >>> model = QDQBertForNextSentencePrediction.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ >>> model = QDQBertForNextSentencePrediction.from_pretrained("google-bert/bert-base-uncased")
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
diff --git a/src/transformers/models/rag/modeling_rag.py b/src/transformers/models/rag/modeling_rag.py
index 09fc9dabe84..a840b0681ed 100644
--- a/src/transformers/models/rag/modeling_rag.py
+++ b/src/transformers/models/rag/modeling_rag.py
@@ -260,8 +260,6 @@ class RagPreTrainedModel(PreTrainedModel):
Information necessary to initiate the question encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -273,8 +271,6 @@ class RagPreTrainedModel(PreTrainedModel):
Information necessary to initiate the generator. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -304,7 +300,7 @@ class RagPreTrainedModel(PreTrainedModel):
>>> # initialize a RAG from two pretrained models.
>>> model = RagModel.from_pretrained_question_encoder_generator(
- ... "facebook/dpr-question_encoder-single-nq-base", "t5-small"
+ ... "facebook/dpr-question_encoder-single-nq-base", "google-t5/t5-small"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./rag")
diff --git a/src/transformers/models/rag/modeling_tf_rag.py b/src/transformers/models/rag/modeling_tf_rag.py
index e586bed87c8..9d8ed650497 100644
--- a/src/transformers/models/rag/modeling_tf_rag.py
+++ b/src/transformers/models/rag/modeling_tf_rag.py
@@ -248,7 +248,7 @@ class TFRagPreTrainedModel(TFPreTrainedModel):
Information necessary to initiate the question encoder. Can be either:
- A string with the *shortcut name* of a pretrained model to load from cache or download, e.g.,
- `bert-base-uncased`.
+ `google-bert/bert-base-uncased`.
- A string with the *identifier name* of a pretrained model that was user-uploaded to our S3, e.g.,
`dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
@@ -260,7 +260,7 @@ class TFRagPreTrainedModel(TFPreTrainedModel):
Information necessary to initiate the generator. Can be either:
- A string with the *shortcut name* of a pretrained model to load from cache or download, e.g.,
- `t5-small`.
+ `google-t5/t5-small`.
- A string with the *identifier name* of a pretrained model that was user-uploaded to our S3, e.g.,
`facebook/bart-base`.
- A path to a *directory* containing model weights saved using
@@ -290,7 +290,7 @@ class TFRagPreTrainedModel(TFPreTrainedModel):
>>> # initialize a RAG from two pretrained models.
>>> model = TFRagModel.from_pretrained_question_encoder_generator(
- ... "facebook/dpr-question_encoder-single-nq-base", "t5-small"
+ ... "facebook/dpr-question_encoder-single-nq-base", "google-t5/t5-small"
... )
>>> # alternatively, initialize from pytorch pretrained models can also be done
>>> model = TFRagModel.from_pretrained_question_encoder_generator(
diff --git a/src/transformers/models/roberta/configuration_roberta.py b/src/transformers/models/roberta/configuration_roberta.py
index 86334f0a224..8cc35d6090c 100644
--- a/src/transformers/models/roberta/configuration_roberta.py
+++ b/src/transformers/models/roberta/configuration_roberta.py
@@ -25,12 +25,12 @@ from ...utils import logging
logger = logging.get_logger(__name__)
ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "roberta-base": "https://huggingface.co/roberta-base/resolve/main/config.json",
- "roberta-large": "https://huggingface.co/roberta-large/resolve/main/config.json",
- "roberta-large-mnli": "https://huggingface.co/roberta-large-mnli/resolve/main/config.json",
- "distilroberta-base": "https://huggingface.co/distilroberta-base/resolve/main/config.json",
- "roberta-base-openai-detector": "https://huggingface.co/roberta-base-openai-detector/resolve/main/config.json",
- "roberta-large-openai-detector": "https://huggingface.co/roberta-large-openai-detector/resolve/main/config.json",
+ "FacebookAI/roberta-base": "https://huggingface.co/FacebookAI/roberta-base/resolve/main/config.json",
+ "FacebookAI/roberta-large": "https://huggingface.co/FacebookAI/roberta-large/resolve/main/config.json",
+ "FacebookAI/roberta-large-mnli": "https://huggingface.co/FacebookAI/roberta-large-mnli/resolve/main/config.json",
+ "distilbert/distilroberta-base": "https://huggingface.co/distilbert/distilroberta-base/resolve/main/config.json",
+ "openai-community/roberta-base-openai-detector": "https://huggingface.co/openai-community/roberta-base-openai-detector/resolve/main/config.json",
+ "openai-community/roberta-large-openai-detector": "https://huggingface.co/openai-community/roberta-large-openai-detector/resolve/main/config.json",
}
@@ -39,7 +39,7 @@ class RobertaConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`RobertaModel`] or a [`TFRobertaModel`]. It is
used to instantiate a RoBERTa model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the RoBERTa
- [roberta-base](https://huggingface.co/roberta-base) architecture.
+ [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/roberta/modeling_flax_roberta.py b/src/transformers/models/roberta/modeling_flax_roberta.py
index 70a6f540a23..ecdd31386b2 100644
--- a/src/transformers/models/roberta/modeling_flax_roberta.py
+++ b/src/transformers/models/roberta/modeling_flax_roberta.py
@@ -43,7 +43,7 @@ from .configuration_roberta import RobertaConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "roberta-base"
+_CHECKPOINT_FOR_DOC = "FacebookAI/roberta-base"
_CONFIG_FOR_DOC = "RobertaConfig"
remat = nn_partitioning.remat
diff --git a/src/transformers/models/roberta/modeling_roberta.py b/src/transformers/models/roberta/modeling_roberta.py
index 8f34098f7bb..f755bd9d566 100644
--- a/src/transformers/models/roberta/modeling_roberta.py
+++ b/src/transformers/models/roberta/modeling_roberta.py
@@ -48,16 +48,16 @@ from .configuration_roberta import RobertaConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "roberta-base"
+_CHECKPOINT_FOR_DOC = "FacebookAI/roberta-base"
_CONFIG_FOR_DOC = "RobertaConfig"
ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "roberta-base",
- "roberta-large",
- "roberta-large-mnli",
- "distilroberta-base",
- "roberta-base-openai-detector",
- "roberta-large-openai-detector",
+ "FacebookAI/roberta-base",
+ "FacebookAI/roberta-large",
+ "FacebookAI/roberta-large-mnli",
+ "distilbert/distilroberta-base",
+ "openai-community/roberta-base-openai-detector",
+ "openai-community/roberta-large-openai-detector",
# See all RoBERTa models at https://huggingface.co/models?filter=roberta
]
@@ -936,10 +936,10 @@ class RobertaForCausalLM(RobertaPreTrainedModel):
>>> from transformers import AutoTokenizer, RobertaForCausalLM, AutoConfig
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("roberta-base")
- >>> config = AutoConfig.from_pretrained("roberta-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
+ >>> config = AutoConfig.from_pretrained("FacebookAI/roberta-base")
>>> config.is_decoder = True
- >>> model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)
+ >>> model = RobertaForCausalLM.from_pretrained("FacebookAI/roberta-base", config=config)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
diff --git a/src/transformers/models/roberta/modeling_tf_roberta.py b/src/transformers/models/roberta/modeling_tf_roberta.py
index afe773ec97b..0bc5e85e808 100644
--- a/src/transformers/models/roberta/modeling_tf_roberta.py
+++ b/src/transformers/models/roberta/modeling_tf_roberta.py
@@ -62,14 +62,14 @@ from .configuration_roberta import RobertaConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "roberta-base"
+_CHECKPOINT_FOR_DOC = "FacebookAI/roberta-base"
_CONFIG_FOR_DOC = "RobertaConfig"
TF_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "roberta-base",
- "roberta-large",
- "roberta-large-mnli",
- "distilroberta-base",
+ "FacebookAI/roberta-base",
+ "FacebookAI/roberta-large",
+ "FacebookAI/roberta-large-mnli",
+ "distilbert/distilroberta-base",
# See all RoBERTa models at https://huggingface.co/models?filter=roberta
]
diff --git a/src/transformers/models/roberta/tokenization_roberta.py b/src/transformers/models/roberta/tokenization_roberta.py
index b7b3c75be18..c7dc51b9729 100644
--- a/src/transformers/models/roberta/tokenization_roberta.py
+++ b/src/transformers/models/roberta/tokenization_roberta.py
@@ -34,34 +34,34 @@ VOCAB_FILES_NAMES = {
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "roberta-base": "https://huggingface.co/roberta-base/resolve/main/vocab.json",
- "roberta-large": "https://huggingface.co/roberta-large/resolve/main/vocab.json",
- "roberta-large-mnli": "https://huggingface.co/roberta-large-mnli/resolve/main/vocab.json",
- "distilroberta-base": "https://huggingface.co/distilroberta-base/resolve/main/vocab.json",
- "roberta-base-openai-detector": "https://huggingface.co/roberta-base-openai-detector/resolve/main/vocab.json",
- "roberta-large-openai-detector": (
- "https://huggingface.co/roberta-large-openai-detector/resolve/main/vocab.json"
+ "FacebookAI/roberta-base": "https://huggingface.co/FacebookAI/roberta-base/resolve/main/vocab.json",
+ "FacebookAI/roberta-large": "https://huggingface.co/FacebookAI/roberta-large/resolve/main/vocab.json",
+ "FacebookAI/roberta-large-mnli": "https://huggingface.co/FacebookAI/roberta-large-mnli/resolve/main/vocab.json",
+ "distilbert/distilroberta-base": "https://huggingface.co/distilbert/distilroberta-base/resolve/main/vocab.json",
+ "openai-community/roberta-base-openai-detector": "https://huggingface.co/openai-community/roberta-base-openai-detector/resolve/main/vocab.json",
+ "openai-community/roberta-large-openai-detector": (
+ "https://huggingface.co/openai-community/roberta-large-openai-detector/resolve/main/vocab.json"
),
},
"merges_file": {
- "roberta-base": "https://huggingface.co/roberta-base/resolve/main/merges.txt",
- "roberta-large": "https://huggingface.co/roberta-large/resolve/main/merges.txt",
- "roberta-large-mnli": "https://huggingface.co/roberta-large-mnli/resolve/main/merges.txt",
- "distilroberta-base": "https://huggingface.co/distilroberta-base/resolve/main/merges.txt",
- "roberta-base-openai-detector": "https://huggingface.co/roberta-base-openai-detector/resolve/main/merges.txt",
- "roberta-large-openai-detector": (
- "https://huggingface.co/roberta-large-openai-detector/resolve/main/merges.txt"
+ "FacebookAI/roberta-base": "https://huggingface.co/FacebookAI/roberta-base/resolve/main/merges.txt",
+ "FacebookAI/roberta-large": "https://huggingface.co/FacebookAI/roberta-large/resolve/main/merges.txt",
+ "FacebookAI/roberta-large-mnli": "https://huggingface.co/FacebookAI/roberta-large-mnli/resolve/main/merges.txt",
+ "distilbert/distilroberta-base": "https://huggingface.co/distilbert/distilroberta-base/resolve/main/merges.txt",
+ "openai-community/roberta-base-openai-detector": "https://huggingface.co/openai-community/roberta-base-openai-detector/resolve/main/merges.txt",
+ "openai-community/roberta-large-openai-detector": (
+ "https://huggingface.co/openai-community/roberta-large-openai-detector/resolve/main/merges.txt"
),
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "roberta-base": 512,
- "roberta-large": 512,
- "roberta-large-mnli": 512,
- "distilroberta-base": 512,
- "roberta-base-openai-detector": 512,
- "roberta-large-openai-detector": 512,
+ "FacebookAI/roberta-base": 512,
+ "FacebookAI/roberta-large": 512,
+ "FacebookAI/roberta-large-mnli": 512,
+ "distilbert/distilroberta-base": 512,
+ "openai-community/roberta-base-openai-detector": 512,
+ "openai-community/roberta-large-openai-detector": 512,
}
@@ -114,7 +114,7 @@ class RobertaTokenizer(PreTrainedTokenizer):
```python
>>> from transformers import RobertaTokenizer
- >>> tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
+ >>> tokenizer = RobertaTokenizer.from_pretrained("FacebookAI/roberta-base")
>>> tokenizer("Hello world")["input_ids"]
[0, 31414, 232, 2]
diff --git a/src/transformers/models/roberta/tokenization_roberta_fast.py b/src/transformers/models/roberta/tokenization_roberta_fast.py
index 05f64ac2ab1..00341e870f8 100644
--- a/src/transformers/models/roberta/tokenization_roberta_fast.py
+++ b/src/transformers/models/roberta/tokenization_roberta_fast.py
@@ -30,46 +30,46 @@ VOCAB_FILES_NAMES = {"vocab_file": "vocab.json", "merges_file": "merges.txt", "t
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "roberta-base": "https://huggingface.co/roberta-base/resolve/main/vocab.json",
- "roberta-large": "https://huggingface.co/roberta-large/resolve/main/vocab.json",
- "roberta-large-mnli": "https://huggingface.co/roberta-large-mnli/resolve/main/vocab.json",
- "distilroberta-base": "https://huggingface.co/distilroberta-base/resolve/main/vocab.json",
- "roberta-base-openai-detector": "https://huggingface.co/roberta-base-openai-detector/resolve/main/vocab.json",
- "roberta-large-openai-detector": (
- "https://huggingface.co/roberta-large-openai-detector/resolve/main/vocab.json"
+ "FacebookAI/roberta-base": "https://huggingface.co/FacebookAI/roberta-base/resolve/main/vocab.json",
+ "FacebookAI/roberta-large": "https://huggingface.co/FacebookAI/roberta-large/resolve/main/vocab.json",
+ "FacebookAI/roberta-large-mnli": "https://huggingface.co/FacebookAI/roberta-large-mnli/resolve/main/vocab.json",
+ "distilbert/distilroberta-base": "https://huggingface.co/distilbert/distilroberta-base/resolve/main/vocab.json",
+ "openai-community/roberta-base-openai-detector": "https://huggingface.co/openai-community/roberta-base-openai-detector/resolve/main/vocab.json",
+ "openai-community/roberta-large-openai-detector": (
+ "https://huggingface.co/openai-community/roberta-large-openai-detector/resolve/main/vocab.json"
),
},
"merges_file": {
- "roberta-base": "https://huggingface.co/roberta-base/resolve/main/merges.txt",
- "roberta-large": "https://huggingface.co/roberta-large/resolve/main/merges.txt",
- "roberta-large-mnli": "https://huggingface.co/roberta-large-mnli/resolve/main/merges.txt",
- "distilroberta-base": "https://huggingface.co/distilroberta-base/resolve/main/merges.txt",
- "roberta-base-openai-detector": "https://huggingface.co/roberta-base-openai-detector/resolve/main/merges.txt",
- "roberta-large-openai-detector": (
- "https://huggingface.co/roberta-large-openai-detector/resolve/main/merges.txt"
+ "FacebookAI/roberta-base": "https://huggingface.co/FacebookAI/roberta-base/resolve/main/merges.txt",
+ "FacebookAI/roberta-large": "https://huggingface.co/FacebookAI/roberta-large/resolve/main/merges.txt",
+ "FacebookAI/roberta-large-mnli": "https://huggingface.co/FacebookAI/roberta-large-mnli/resolve/main/merges.txt",
+ "distilbert/distilroberta-base": "https://huggingface.co/distilbert/distilroberta-base/resolve/main/merges.txt",
+ "openai-community/roberta-base-openai-detector": "https://huggingface.co/openai-community/roberta-base-openai-detector/resolve/main/merges.txt",
+ "openai-community/roberta-large-openai-detector": (
+ "https://huggingface.co/openai-community/roberta-large-openai-detector/resolve/main/merges.txt"
),
},
"tokenizer_file": {
- "roberta-base": "https://huggingface.co/roberta-base/resolve/main/tokenizer.json",
- "roberta-large": "https://huggingface.co/roberta-large/resolve/main/tokenizer.json",
- "roberta-large-mnli": "https://huggingface.co/roberta-large-mnli/resolve/main/tokenizer.json",
- "distilroberta-base": "https://huggingface.co/distilroberta-base/resolve/main/tokenizer.json",
- "roberta-base-openai-detector": (
- "https://huggingface.co/roberta-base-openai-detector/resolve/main/tokenizer.json"
+ "FacebookAI/roberta-base": "https://huggingface.co/FacebookAI/roberta-base/resolve/main/tokenizer.json",
+ "FacebookAI/roberta-large": "https://huggingface.co/FacebookAI/roberta-large/resolve/main/tokenizer.json",
+ "FacebookAI/roberta-large-mnli": "https://huggingface.co/FacebookAI/roberta-large-mnli/resolve/main/tokenizer.json",
+ "distilbert/distilroberta-base": "https://huggingface.co/distilbert/distilroberta-base/resolve/main/tokenizer.json",
+ "openai-community/roberta-base-openai-detector": (
+ "https://huggingface.co/openai-community/roberta-base-openai-detector/resolve/main/tokenizer.json"
),
- "roberta-large-openai-detector": (
- "https://huggingface.co/roberta-large-openai-detector/resolve/main/tokenizer.json"
+ "openai-community/roberta-large-openai-detector": (
+ "https://huggingface.co/openai-community/roberta-large-openai-detector/resolve/main/tokenizer.json"
),
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "roberta-base": 512,
- "roberta-large": 512,
- "roberta-large-mnli": 512,
- "distilroberta-base": 512,
- "roberta-base-openai-detector": 512,
- "roberta-large-openai-detector": 512,
+ "FacebookAI/roberta-base": 512,
+ "FacebookAI/roberta-large": 512,
+ "FacebookAI/roberta-large-mnli": 512,
+ "distilbert/distilroberta-base": 512,
+ "openai-community/roberta-base-openai-detector": 512,
+ "openai-community/roberta-large-openai-detector": 512,
}
@@ -84,7 +84,7 @@ class RobertaTokenizerFast(PreTrainedTokenizerFast):
```python
>>> from transformers import RobertaTokenizerFast
- >>> tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
+ >>> tokenizer = RobertaTokenizerFast.from_pretrained("FacebookAI/roberta-base")
>>> tokenizer("Hello world")["input_ids"]
[0, 31414, 232, 2]
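
Review note: the RoBERTa hunks above all apply the same rule. Each legacy root-level id gains an organization prefix, and the matching `resolve/main/<file>` URL is rebuilt from the new id. A minimal sketch of that rule follows, assuming a hypothetical lookup table and two hypothetical helpers (`namespaced_id`, `resolve_url`) that are not part of `transformers`; the id pairs themselves are copied from the hunks above.

```python
# Hypothetical helpers for illustration only; they are NOT part of transformers.
# The id pairs are taken from the RoBERTa hunks in this diff.
LEGACY_TO_NAMESPACED = {
    "roberta-base": "FacebookAI/roberta-base",
    "roberta-large": "FacebookAI/roberta-large",
    "roberta-large-mnli": "FacebookAI/roberta-large-mnli",
    "distilroberta-base": "distilbert/distilroberta-base",
    "roberta-base-openai-detector": "openai-community/roberta-base-openai-detector",
    "roberta-large-openai-detector": "openai-community/roberta-large-openai-detector",
}


def namespaced_id(model_id: str) -> str:
    """Return the namespaced id for a legacy id; ids not in the table pass through unchanged."""
    return LEGACY_TO_NAMESPACED.get(model_id, model_id)


def resolve_url(model_id: str, filename: str, revision: str = "main") -> str:
    """Build a URL in the same shape as the archive-map entries in this diff."""
    return f"https://huggingface.co/{namespaced_id(model_id)}/resolve/{revision}/{filename}"


if __name__ == "__main__":
    for legacy in LEGACY_TO_NAMESPACED:
        print(legacy, "->", resolve_url(legacy, "vocab.json"))
```

Ids that already carry a namespace, such as `andreasmadsen/efficient_mlm_m0.40`, are returned unchanged, which mirrors how this diff leaves them untouched.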
diff --git a/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py b/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py
index 1957a30f41b..f9325138165 100644
--- a/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py
+++ b/src/transformers/models/roberta_prelayernorm/configuration_roberta_prelayernorm.py
@@ -31,7 +31,7 @@ ROBERTA_PRELAYERNORM_PRETRAINED_CONFIG_ARCHIVE_MAP = {
}
-# Copied from transformers.models.roberta.configuration_roberta.RobertaConfig with roberta-base->andreasmadsen/efficient_mlm_m0.40,RoBERTa->RoBERTa-PreLayerNorm,Roberta->RobertaPreLayerNorm,roberta->roberta-prelayernorm
+# Copied from transformers.models.roberta.configuration_roberta.RobertaConfig with FacebookAI/roberta-base->andreasmadsen/efficient_mlm_m0.40,RoBERTa->RoBERTa-PreLayerNorm,Roberta->RobertaPreLayerNorm,roberta->roberta-prelayernorm
class RobertaPreLayerNormConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`RobertaPreLayerNormModel`] or a [`TFRobertaPreLayerNormModel`]. It is
diff --git a/src/transformers/models/roberta_prelayernorm/modeling_roberta_prelayernorm.py b/src/transformers/models/roberta_prelayernorm/modeling_roberta_prelayernorm.py
index cb22bbe14a0..7c37950e478 100644
--- a/src/transformers/models/roberta_prelayernorm/modeling_roberta_prelayernorm.py
+++ b/src/transformers/models/roberta_prelayernorm/modeling_roberta_prelayernorm.py
@@ -867,7 +867,7 @@ class RobertaPreLayerNormModel(RobertaPreLayerNormPreTrainedModel):
"""RoBERTa-PreLayerNorm Model with a `language modeling` head on top for CLM fine-tuning.""",
ROBERTA_PRELAYERNORM_START_DOCSTRING,
)
-# Copied from transformers.models.roberta.modeling_roberta.RobertaForCausalLM with roberta-base->andreasmadsen/efficient_mlm_m0.40,ROBERTA->ROBERTA_PRELAYERNORM,Roberta->RobertaPreLayerNorm,roberta->roberta_prelayernorm, RobertaPreLayerNormTokenizer->RobertaTokenizer
+# Copied from transformers.models.roberta.modeling_roberta.RobertaForCausalLM with FacebookAI/roberta-base->andreasmadsen/efficient_mlm_m0.40,ROBERTA->ROBERTA_PRELAYERNORM,Roberta->RobertaPreLayerNorm,roberta->roberta_prelayernorm, RobertaPreLayerNormTokenizer->RobertaTokenizer
class RobertaPreLayerNormForCausalLM(RobertaPreLayerNormPreTrainedModel):
_tied_weights_keys = ["lm_head.decoder.weight", "lm_head.decoder.bias"]
diff --git a/src/transformers/models/speech_encoder_decoder/configuration_speech_encoder_decoder.py b/src/transformers/models/speech_encoder_decoder/configuration_speech_encoder_decoder.py
index 378f082e4b9..32a58ec5589 100644
--- a/src/transformers/models/speech_encoder_decoder/configuration_speech_encoder_decoder.py
+++ b/src/transformers/models/speech_encoder_decoder/configuration_speech_encoder_decoder.py
@@ -52,7 +52,7 @@ class SpeechEncoderDecoderConfig(PretrainedConfig):
>>> config = SpeechEncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
- >>> # Initializing a Wav2Vec2Bert model from a Wav2Vec2 & bert-base-uncased style configurations
+ >>> # Initializing a Wav2Vec2Bert model from a Wav2Vec2 & google-bert/bert-base-uncased style configurations
>>> model = SpeechEncoderDecoderModel(config=config)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py b/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py
index b9975510abf..e3bbd86266e 100644
--- a/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py
+++ b/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py
@@ -796,8 +796,6 @@ class FlaxSpeechEncoderDecoderModel(FlaxPreTrainedModel):
Information necessary to initiate the encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
@@ -805,8 +803,6 @@ class FlaxSpeechEncoderDecoderModel(FlaxPreTrainedModel):
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
diff --git a/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py b/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py
index 5028e30344c..942dfb5f9c4 100644
--- a/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py
+++ b/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py
@@ -301,8 +301,6 @@ class SpeechEncoderDecoderModel(PreTrainedModel):
Information necessary to initiate the encoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -314,8 +312,6 @@ class SpeechEncoderDecoderModel(PreTrainedModel):
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -343,7 +339,7 @@ class SpeechEncoderDecoderModel(PreTrainedModel):
>>> # initialize a wav2vec2bert from a pretrained Wav2Vec2 and a pretrained BERT model. Note that the cross-attention layers will be randomly initialized
>>> model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "facebook/wav2vec2-base-960h", "bert-base-uncased"
+ ... "facebook/wav2vec2-base-960h", "google-bert/bert-base-uncased"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./wav2vec2bert")
diff --git a/src/transformers/models/switch_transformers/convert_big_switch.py b/src/transformers/models/switch_transformers/convert_big_switch.py
index 86c673b48a4..e4b8af07cd4 100644
--- a/src/transformers/models/switch_transformers/convert_big_switch.py
+++ b/src/transformers/models/switch_transformers/convert_big_switch.py
@@ -185,7 +185,7 @@ def sanity_check():
"/home/arthur_huggingface_co/transformers/switch_converted", device_map="auto"
)
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
text = "A walks into a bar a orders a with pinch of ."
input_ids = tokenizer(text, return_tensors="pt").input_ids
diff --git a/src/transformers/models/t5/configuration_t5.py b/src/transformers/models/t5/configuration_t5.py
index 05d737d035a..6a1d3c529e0 100644
--- a/src/transformers/models/t5/configuration_t5.py
+++ b/src/transformers/models/t5/configuration_t5.py
@@ -23,11 +23,11 @@ from ...utils import logging
logger = logging.get_logger(__name__)
T5_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "t5-small": "https://huggingface.co/t5-small/resolve/main/config.json",
- "t5-base": "https://huggingface.co/t5-base/resolve/main/config.json",
- "t5-large": "https://huggingface.co/t5-large/resolve/main/config.json",
- "t5-3b": "https://huggingface.co/t5-3b/resolve/main/config.json",
- "t5-11b": "https://huggingface.co/t5-11b/resolve/main/config.json",
+ "google-t5/t5-small": "https://huggingface.co/google-t5/t5-small/resolve/main/config.json",
+ "google-t5/t5-base": "https://huggingface.co/google-t5/t5-base/resolve/main/config.json",
+ "google-t5/t5-large": "https://huggingface.co/google-t5/t5-large/resolve/main/config.json",
+ "google-t5/t5-3b": "https://huggingface.co/google-t5/t5-3b/resolve/main/config.json",
+ "google-t5/t5-11b": "https://huggingface.co/google-t5/t5-11b/resolve/main/config.json",
}
@@ -36,7 +36,7 @@ class T5Config(PretrainedConfig):
This is the configuration class to store the configuration of a [`T5Model`] or a [`TFT5Model`]. It is used to
instantiate a T5 model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the T5
- [t5-small](https://huggingface.co/t5-small) architecture.
+ [google-t5/t5-small](https://huggingface.co/google-t5/t5-small) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/t5/modeling_flax_t5.py b/src/transformers/models/t5/modeling_flax_t5.py
index 09575fdcc3b..94b24bd42f9 100644
--- a/src/transformers/models/t5/modeling_flax_t5.py
+++ b/src/transformers/models/t5/modeling_flax_t5.py
@@ -49,7 +49,7 @@ from .configuration_t5 import T5Config
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "t5-small"
+_CHECKPOINT_FOR_DOC = "google-t5/t5-small"
_CONFIG_FOR_DOC = "T5Config"
remat = nn_partitioning.remat
@@ -1090,8 +1090,8 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel):
```python
>>> from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors="np")
@@ -1152,8 +1152,8 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel):
>>> from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors="np")
@@ -1378,8 +1378,8 @@ FLAX_T5_MODEL_DOCSTRING = """
```python
>>> from transformers import AutoTokenizer, FlaxT5Model
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = FlaxT5Model.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = FlaxT5Model.from_pretrained("google-t5/t5-small")
>>> input_ids = tokenizer(
... "Studies have been shown that owning a dog is good for you", return_tensors="np"
@@ -1630,8 +1630,8 @@ class FlaxT5ForConditionalGeneration(FlaxT5PreTrainedModel):
>>> from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
>>> text = "summarize: My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors="np")
@@ -1778,8 +1778,8 @@ FLAX_T5_CONDITIONAL_GENERATION_DOCSTRING = """
```python
>>> from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
>>> ARTICLE_TO_SUMMARIZE = "summarize: My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], return_tensors="np")
diff --git a/src/transformers/models/t5/modeling_t5.py b/src/transformers/models/t5/modeling_t5.py
index 9d4ba820d04..a3febdd1aa7 100644
--- a/src/transformers/models/t5/modeling_t5.py
+++ b/src/transformers/models/t5/modeling_t5.py
@@ -53,18 +53,18 @@ from .configuration_t5 import T5Config
logger = logging.get_logger(__name__)
_CONFIG_FOR_DOC = "T5Config"
-_CHECKPOINT_FOR_DOC = "t5-small"
+_CHECKPOINT_FOR_DOC = "google-t5/t5-small"
####################################################
# This dict contains ids and associated url
# for the pretrained weights provided with the models
####################################################
T5_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "t5-small",
- "t5-base",
- "t5-large",
- "t5-3b",
- "t5-11b",
+ "google-t5/t5-small",
+ "google-t5/t5-base",
+ "google-t5/t5-large",
+ "google-t5/t5-3b",
+ "google-t5/t5-11b",
# See all T5 models at https://huggingface.co/models?filter=t5
]
@@ -196,17 +196,17 @@ PARALLELIZE_DOCSTRING = r"""
have fewer attention modules mapped to it than other devices. For reference, the t5 models have the
following number of attention modules:
- - t5-small: 6
- - t5-base: 12
- - t5-large: 24
- - t5-3b: 24
- - t5-11b: 24
+ - google-t5/t5-small: 6
+ - google-t5/t5-base: 12
+ - google-t5/t5-large: 24
+ - google-t5/t5-3b: 24
+ - google-t5/t5-11b: 24
Example:
```python
- # Here is an example of a device map on a machine with 4 GPUs using t5-3b, which has a total of 24 attention modules:
- model = T5ForConditionalGeneration.from_pretrained("t5-3b")
+ # Here is an example of a device map on a machine with 4 GPUs using google-t5/t5-3b, which has a total of 24 attention modules:
+ model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-3b")
device_map = {
0: [0, 1, 2],
1: [3, 4, 5, 6, 7, 8, 9],
@@ -222,8 +222,8 @@ DEPARALLELIZE_DOCSTRING = r"""
Example:
```python
- # On a 4 GPU machine with t5-3b:
- model = T5ForConditionalGeneration.from_pretrained("t5-3b")
+ # On a 4 GPU machine with google-t5/t5-3b:
+ model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-3b")
device_map = {
0: [0, 1, 2],
1: [3, 4, 5, 6, 7, 8, 9],
@@ -1463,8 +1463,8 @@ class T5Model(T5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, T5Model
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = T5Model.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = T5Model.from_pretrained("google-t5/t5-small")
>>> input_ids = tokenizer(
... "Studies have been shown that owning a dog is good for you", return_tensors="pt"
@@ -1678,8 +1678,8 @@ class T5ForConditionalGeneration(T5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, T5ForConditionalGeneration
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = T5ForConditionalGeneration.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
>>> # training
>>> input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids
@@ -1967,8 +1967,8 @@ class T5EncoderModel(T5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, T5EncoderModel
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = T5EncoderModel.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = T5EncoderModel.from_pretrained("google-t5/t5-small")
>>> input_ids = tokenizer(
... "Studies have been shown that owning a dog is good for you", return_tensors="pt"
... ).input_ids # Batch size 1
diff --git a/src/transformers/models/t5/modeling_tf_t5.py b/src/transformers/models/t5/modeling_tf_t5.py
index c0a05a8a39e..c809659477b 100644
--- a/src/transformers/models/t5/modeling_tf_t5.py
+++ b/src/transformers/models/t5/modeling_tf_t5.py
@@ -59,11 +59,11 @@ logger = logging.get_logger(__name__)
_CONFIG_FOR_DOC = "T5Config"
TF_T5_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "t5-small",
- "t5-base",
- "t5-large",
- "t5-3b",
- "t5-11b",
+ "google-t5/t5-small",
+ "google-t5/t5-base",
+ "google-t5/t5-large",
+ "google-t5/t5-3b",
+ "google-t5/t5-11b",
# See all T5 models at https://huggingface.co/models?filter=t5
]
@@ -1236,8 +1236,8 @@ class TFT5Model(TFT5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, TFT5Model
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = TFT5Model.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = TFT5Model.from_pretrained("google-t5/t5-small")
>>> input_ids = tokenizer(
... "Studies have been shown that owning a dog is good for you", return_tensors="tf"
@@ -1418,8 +1418,8 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel, TFCausalLanguageModeling
```python
>>> from transformers import AutoTokenizer, TFT5ForConditionalGeneration
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
>>> # training
>>> inputs = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="tf").input_ids
@@ -1642,8 +1642,8 @@ class TFT5EncoderModel(TFT5PreTrainedModel):
```python
>>> from transformers import AutoTokenizer, TFT5EncoderModel
- >>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
- >>> model = TFT5EncoderModel.from_pretrained("t5-small")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+ >>> model = TFT5EncoderModel.from_pretrained("google-t5/t5-small")
>>> input_ids = tokenizer(
... "Studies have been shown that owning a dog is good for you", return_tensors="tf"
diff --git a/src/transformers/models/t5/tokenization_t5.py b/src/transformers/models/t5/tokenization_t5.py
index af2d8ef6e04..ffd58a4d5a5 100644
--- a/src/transformers/models/t5/tokenization_t5.py
+++ b/src/transformers/models/t5/tokenization_t5.py
@@ -39,22 +39,22 @@ VOCAB_FILES_NAMES = {"vocab_file": "spiece.model"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "t5-small": "https://huggingface.co/t5-small/resolve/main/spiece.model",
- "t5-base": "https://huggingface.co/t5-base/resolve/main/spiece.model",
- "t5-large": "https://huggingface.co/t5-large/resolve/main/spiece.model",
- "t5-3b": "https://huggingface.co/t5-3b/resolve/main/spiece.model",
- "t5-11b": "https://huggingface.co/t5-11b/resolve/main/spiece.model",
+ "google-t5/t5-small": "https://huggingface.co/google-t5/t5-small/resolve/main/spiece.model",
+ "google-t5/t5-base": "https://huggingface.co/google-t5/t5-base/resolve/main/spiece.model",
+ "google-t5/t5-large": "https://huggingface.co/google-t5/t5-large/resolve/main/spiece.model",
+ "google-t5/t5-3b": "https://huggingface.co/google-t5/t5-3b/resolve/main/spiece.model",
+ "google-t5/t5-11b": "https://huggingface.co/google-t5/t5-11b/resolve/main/spiece.model",
}
}
# TODO(PVP) - this should be removed in Transformers v5
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "t5-small": 512,
- "t5-base": 512,
- "t5-large": 512,
- "t5-3b": 512,
- "t5-11b": 512,
+ "google-t5/t5-small": 512,
+ "google-t5/t5-base": 512,
+ "google-t5/t5-large": 512,
+ "google-t5/t5-3b": 512,
+ "google-t5/t5-11b": 512,
}
SPIECE_UNDERLINE = "▁"
@@ -117,7 +117,7 @@ class T5Tokenizer(PreTrainedTokenizer):
```python
>>> from transformers import T5Tokenizer
- >>> tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=True)
+ >>> tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=True)
>>> tokenizer.encode("Hello <extra_id_0>.")
[8774, 32099, 3, 5, 1]
```
@@ -125,7 +125,7 @@ class T5Tokenizer(PreTrainedTokenizer):
```python
>>> from transformers import T5Tokenizer
- >>> tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=False)
+ >>> tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=False)
>>> tokenizer.encode("Hello <extra_id_0>.") # the extra space `[3]` is no longer here
[8774, 32099, 5, 1]
```
diff --git a/src/transformers/models/t5/tokenization_t5_fast.py b/src/transformers/models/t5/tokenization_t5_fast.py
index a0fedd9e3be..71a7bd07b4d 100644
--- a/src/transformers/models/t5/tokenization_t5_fast.py
+++ b/src/transformers/models/t5/tokenization_t5_fast.py
@@ -37,29 +37,29 @@ VOCAB_FILES_NAMES = {"vocab_file": "spiece.model", "tokenizer_file": "tokenizer.
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "t5-small": "https://huggingface.co/t5-small/resolve/main/spiece.model",
- "t5-base": "https://huggingface.co/t5-base/resolve/main/spiece.model",
- "t5-large": "https://huggingface.co/t5-large/resolve/main/spiece.model",
- "t5-3b": "https://huggingface.co/t5-3b/resolve/main/spiece.model",
- "t5-11b": "https://huggingface.co/t5-11b/resolve/main/spiece.model",
+ "google-t5/t5-small": "https://huggingface.co/google-t5/t5-small/resolve/main/spiece.model",
+ "google-t5/t5-base": "https://huggingface.co/google-t5/t5-base/resolve/main/spiece.model",
+ "google-t5/t5-large": "https://huggingface.co/google-t5/t5-large/resolve/main/spiece.model",
+ "google-t5/t5-3b": "https://huggingface.co/google-t5/t5-3b/resolve/main/spiece.model",
+ "google-t5/t5-11b": "https://huggingface.co/google-t5/t5-11b/resolve/main/spiece.model",
},
"tokenizer_file": {
- "t5-small": "https://huggingface.co/t5-small/resolve/main/tokenizer.json",
- "t5-base": "https://huggingface.co/t5-base/resolve/main/tokenizer.json",
- "t5-large": "https://huggingface.co/t5-large/resolve/main/tokenizer.json",
- "t5-3b": "https://huggingface.co/t5-3b/resolve/main/tokenizer.json",
- "t5-11b": "https://huggingface.co/t5-11b/resolve/main/tokenizer.json",
+ "google-t5/t5-small": "https://huggingface.co/google-t5/t5-small/resolve/main/tokenizer.json",
+ "google-t5/t5-base": "https://huggingface.co/google-t5/t5-base/resolve/main/tokenizer.json",
+ "google-t5/t5-large": "https://huggingface.co/google-t5/t5-large/resolve/main/tokenizer.json",
+ "google-t5/t5-3b": "https://huggingface.co/google-t5/t5-3b/resolve/main/tokenizer.json",
+ "google-t5/t5-11b": "https://huggingface.co/google-t5/t5-11b/resolve/main/tokenizer.json",
},
}
# TODO(PVP) - this should be removed in Transformers v5
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "t5-small": 512,
- "t5-base": 512,
- "t5-large": 512,
- "t5-3b": 512,
- "t5-11b": 512,
+ "google-t5/t5-small": 512,
+ "google-t5/t5-base": 512,
+ "google-t5/t5-large": 512,
+ "google-t5/t5-3b": 512,
+ "google-t5/t5-11b": 512,
}
diff --git a/src/transformers/models/trocr/convert_trocr_unilm_to_pytorch.py b/src/transformers/models/trocr/convert_trocr_unilm_to_pytorch.py
index b82adf690e7..428406d82c6 100644
--- a/src/transformers/models/trocr/convert_trocr_unilm_to_pytorch.py
+++ b/src/transformers/models/trocr/convert_trocr_unilm_to_pytorch.py
@@ -183,7 +183,7 @@ def convert_tr_ocr_checkpoint(checkpoint_url, pytorch_dump_folder_path):
# Check outputs on an image
image_processor = ViTImageProcessor(size=encoder_config.image_size)
- tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
+ tokenizer = RobertaTokenizer.from_pretrained("FacebookAI/roberta-large")
processor = TrOCRProcessor(image_processor, tokenizer)
pixel_values = processor(images=prepare_img(checkpoint_url), return_tensors="pt").pixel_values
diff --git a/src/transformers/models/umt5/modeling_umt5.py b/src/transformers/models/umt5/modeling_umt5.py
index a93a6801689..1bf8469f77e 100644
--- a/src/transformers/models/umt5/modeling_umt5.py
+++ b/src/transformers/models/umt5/modeling_umt5.py
@@ -1418,7 +1418,7 @@ class UMT5EncoderModel(UMT5PreTrainedModel):
@add_start_docstrings_to_model_forward(UMT5_ENCODER_INPUTS_DOCSTRING)
@replace_return_docstrings(output_type=BaseModelOutput, config_class=_CONFIG_FOR_DOC)
- # Copied from transformers.models.t5.modeling_t5.T5EncoderModel.forward with T5->UMT5, t5-small->google/umt5-small
+ # Copied from transformers.models.t5.modeling_t5.T5EncoderModel.forward with T5->UMT5, google-t5/t5-small->google/umt5-small
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
diff --git a/src/transformers/models/vilt/convert_vilt_original_to_pytorch.py b/src/transformers/models/vilt/convert_vilt_original_to_pytorch.py
index 015db07453d..e597d0d7e77 100644
--- a/src/transformers/models/vilt/convert_vilt_original_to_pytorch.py
+++ b/src/transformers/models/vilt/convert_vilt_original_to_pytorch.py
@@ -224,7 +224,7 @@ def convert_vilt_checkpoint(checkpoint_url, pytorch_dump_folder_path):
# Define processor
image_processor = ViltImageProcessor(size=384)
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
processor = ViltProcessor(image_processor, tokenizer)
# Forward pass on example inputs (image + text)
diff --git a/src/transformers/models/vision_encoder_decoder/configuration_vision_encoder_decoder.py b/src/transformers/models/vision_encoder_decoder/configuration_vision_encoder_decoder.py
index ba380ed3ea3..a4aa663f985 100644
--- a/src/transformers/models/vision_encoder_decoder/configuration_vision_encoder_decoder.py
+++ b/src/transformers/models/vision_encoder_decoder/configuration_vision_encoder_decoder.py
@@ -59,7 +59,7 @@ class VisionEncoderDecoderConfig(PretrainedConfig):
>>> config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
- >>> # Initializing a ViTBert model (with random weights) from a ViT & bert-base-uncased style configurations
+ >>> # Initializing a ViTBert model (with random weights) from a ViT & google-bert/bert-base-uncased style configurations
>>> model = VisionEncoderDecoderModel(config=config)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/vision_encoder_decoder/modeling_flax_vision_encoder_decoder.py b/src/transformers/models/vision_encoder_decoder/modeling_flax_vision_encoder_decoder.py
index 899acd10703..987c9a1afa3 100644
--- a/src/transformers/models/vision_encoder_decoder/modeling_flax_vision_encoder_decoder.py
+++ b/src/transformers/models/vision_encoder_decoder/modeling_flax_vision_encoder_decoder.py
@@ -421,7 +421,7 @@ class FlaxVisionEncoderDecoderModel(FlaxPreTrainedModel):
>>> # initialize a vit-gpt2 from pretrained ViT and GPT2 models. Note that the cross-attention layers will be randomly initialized
>>> model = FlaxVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "google/vit-base-patch16-224-in21k", "gpt2"
+ ... "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
... )
>>> pixel_values = image_processor(images=image, return_tensors="np").pixel_values
@@ -500,7 +500,7 @@ class FlaxVisionEncoderDecoderModel(FlaxPreTrainedModel):
>>> # initialize a vit-gpt2 from pretrained ViT and GPT2 models. Note that the cross-attention layers will be randomly initialized
>>> model = FlaxVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "google/vit-base-patch16-224-in21k", "gpt2"
+ ... "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
... )
>>> pixel_values = image_processor(images=image, return_tensors="np").pixel_values
@@ -627,11 +627,11 @@ class FlaxVisionEncoderDecoderModel(FlaxPreTrainedModel):
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
>>> # load output tokenizer
- >>> tokenizer_output = AutoTokenizer.from_pretrained("gpt2")
+ >>> tokenizer_output = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> # initialize a vit-gpt2 from pretrained ViT and GPT2 models. Note that the cross-attention layers will be randomly initialized
>>> model = FlaxVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "google/vit-base-patch16-224-in21k", "gpt2"
+ ... "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
... )
>>> pixel_values = image_processor(images=image, return_tensors="np").pixel_values
@@ -746,8 +746,6 @@ class FlaxVisionEncoderDecoderModel(FlaxPreTrainedModel):
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
@@ -771,7 +769,7 @@ class FlaxVisionEncoderDecoderModel(FlaxPreTrainedModel):
>>> # initialize a vit-gpt2 from a pretrained ViT and a pretrained GPT2 model. Note that the cross-attention layers will be randomly initialized
>>> model = FlaxVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "google/vit-base-patch16-224-in21k", "gpt2"
+ ... "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./vit-gpt2")
diff --git a/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py b/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py
index a323c0607f4..75ff2dbd82e 100644
--- a/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py
+++ b/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py
@@ -335,8 +335,6 @@ class TFVisionEncoderDecoderModel(TFPreTrainedModel, TFCausalLanguageModelingLos
Information necessary to initiate the decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *pytorch checkpoint file* (e.g, `./pt_model/`). In this case,
@@ -362,7 +360,7 @@ class TFVisionEncoderDecoderModel(TFPreTrainedModel, TFCausalLanguageModelingLos
>>> # initialize a vit-bert from a pretrained ViT and a pretrained BERT model. Note that the cross-attention layers will be randomly initialized
>>> model = TFVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "google/vit-base-patch16-224-in21k", "bert-base-uncased"
+ ... "google/vit-base-patch16-224-in21k", "google-bert/bert-base-uncased"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./vit-bert")
@@ -487,11 +485,11 @@ class TFVisionEncoderDecoderModel(TFPreTrainedModel, TFCausalLanguageModelingLos
>>> import requests
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
- >>> decoder_tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ >>> decoder_tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> # initialize a vit-gpt2 from pretrained ViT and GPT2 models. Note that the cross-attention layers will be randomly initialized
>>> model = TFVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "google/vit-base-patch16-224-in21k", "gpt2"
+ ... "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
... )
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
diff --git a/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py b/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py
index f7134c94ff0..88b5efd0476 100644
--- a/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py
+++ b/src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py
@@ -391,8 +391,6 @@ class VisionEncoderDecoderModel(PreTrainedModel):
Information necessary to initiate the text decoder. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
@@ -420,7 +418,7 @@ class VisionEncoderDecoderModel(PreTrainedModel):
>>> # initialize a vit-bert from a pretrained ViT and a pretrained BERT model. Note that the cross-attention layers will be randomly initialized
>>> model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "google/vit-base-patch16-224-in21k", "bert-base-uncased"
+ ... "google/vit-base-patch16-224-in21k", "google-bert/bert-base-uncased"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./vit-bert")
diff --git a/src/transformers/models/vision_text_dual_encoder/modeling_flax_vision_text_dual_encoder.py b/src/transformers/models/vision_text_dual_encoder/modeling_flax_vision_text_dual_encoder.py
index f38b6b931f5..ba8bf7091b3 100644
--- a/src/transformers/models/vision_text_dual_encoder/modeling_flax_vision_text_dual_encoder.py
+++ b/src/transformers/models/vision_text_dual_encoder/modeling_flax_vision_text_dual_encoder.py
@@ -426,8 +426,6 @@ class FlaxVisionTextDualEncoderModel(FlaxPreTrainedModel):
Information necessary to initiate the vision model. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt`
@@ -439,8 +437,6 @@ class FlaxVisionTextDualEncoderModel(FlaxPreTrainedModel):
Information necessary to initiate the text model. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt`
@@ -468,7 +464,7 @@ class FlaxVisionTextDualEncoderModel(FlaxPreTrainedModel):
>>> # initialize a model from pretrained ViT and BERT models. Note that the projection layers will be randomly initialized.
>>> model = FlaxVisionTextDualEncoderModel.from_vision_text_pretrained(
- ... "google/vit-base-patch16-224", "bert-base-uncased"
+ ... "google/vit-base-patch16-224", "google-bert/bert-base-uncased"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./vit-bert")
@@ -560,11 +556,11 @@ VISION_TEXT_DUAL_ENCODER_MODEL_DOCSTRING = r"""
... AutoTokenizer,
... )
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
>>> processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)
>>> model = FlaxVisionTextDualEncoderModel.from_vision_text_pretrained(
- ... "google/vit-base-patch16-224", "bert-base-uncased"
+ ... "google/vit-base-patch16-224", "google-bert/bert-base-uncased"
... )
>>> # contrastive training
diff --git a/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py b/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
index 3f3cc81795b..6f7e30d3f6f 100644
--- a/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
+++ b/src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
@@ -374,11 +374,11 @@ class TFVisionTextDualEncoderModel(TFPreTrainedModel):
... AutoTokenizer,
... )
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
>>> processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)
>>> model = TFVisionTextDualEncoderModel.from_vision_text_pretrained(
- ... "google/vit-base-patch16-224", "bert-base-uncased"
+ ... "google/vit-base-patch16-224", "google-bert/bert-base-uncased"
... )
>>> # contrastive training
@@ -477,8 +477,6 @@ class TFVisionTextDualEncoderModel(TFPreTrainedModel):
Information necessary to initiate the vision model. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt`
@@ -488,8 +486,6 @@ class TFVisionTextDualEncoderModel(TFPreTrainedModel):
Information necessary to initiate the text model. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt`
@@ -515,7 +511,7 @@ class TFVisionTextDualEncoderModel(TFPreTrainedModel):
>>> # initialize a model from pretrained ViT and BERT models. Note that the projection layers will be randomly initialized.
>>> model = TFVisionTextDualEncoderModel.from_vision_text_pretrained(
- ... "google/vit-base-patch16-224", "bert-base-uncased"
+ ... "google/vit-base-patch16-224", "google-bert/bert-base-uncased"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./vit-bert")
diff --git a/src/transformers/models/vision_text_dual_encoder/modeling_vision_text_dual_encoder.py b/src/transformers/models/vision_text_dual_encoder/modeling_vision_text_dual_encoder.py
index 106ff462e3e..cd4d5bd7a1f 100755
--- a/src/transformers/models/vision_text_dual_encoder/modeling_vision_text_dual_encoder.py
+++ b/src/transformers/models/vision_text_dual_encoder/modeling_vision_text_dual_encoder.py
@@ -319,11 +319,11 @@ class VisionTextDualEncoderModel(PreTrainedModel):
... AutoTokenizer,
... )
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
>>> processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)
>>> model = VisionTextDualEncoderModel.from_vision_text_pretrained(
- ... "google/vit-base-patch16-224", "bert-base-uncased"
+ ... "google/vit-base-patch16-224", "google-bert/bert-base-uncased"
... )
>>> # contrastive training
@@ -425,8 +425,6 @@ class VisionTextDualEncoderModel(PreTrainedModel):
Information necessary to initiate the vision model. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt`
@@ -438,8 +436,6 @@ class VisionTextDualEncoderModel(PreTrainedModel):
Information necessary to initiate the text model. Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
- A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt`
@@ -467,7 +463,7 @@ class VisionTextDualEncoderModel(PreTrainedModel):
>>> # initialize a model from pretrained ViT and BERT models. Note that the projection layers will be randomly initialized.
>>> model = VisionTextDualEncoderModel.from_vision_text_pretrained(
- ... "google/vit-base-patch16-224", "bert-base-uncased"
+ ... "google/vit-base-patch16-224", "google-bert/bert-base-uncased"
... )
>>> # saving model after fine-tuning
>>> model.save_pretrained("./vit-bert")
diff --git a/src/transformers/models/visual_bert/modeling_visual_bert.py b/src/transformers/models/visual_bert/modeling_visual_bert.py
index f81f7b04c8f..68e77505e12 100755
--- a/src/transformers/models/visual_bert/modeling_visual_bert.py
+++ b/src/transformers/models/visual_bert/modeling_visual_bert.py
@@ -736,7 +736,7 @@ class VisualBertModel(VisualBertPreTrainedModel):
from transformers import AutoTokenizer, VisualBertModel
import torch
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")
@@ -924,7 +924,7 @@ class VisualBertForPreTraining(VisualBertPreTrainedModel):
# Assumption: *get_visual_embeddings(image)* gets the visual embeddings of the image in the batch.
from transformers import AutoTokenizer, VisualBertForPreTraining
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertForPreTraining.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
@@ -1064,7 +1064,7 @@ class VisualBertForMultipleChoice(VisualBertPreTrainedModel):
from transformers import AutoTokenizer, VisualBertForMultipleChoice
import torch
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertForMultipleChoice.from_pretrained("uclanlp/visualbert-vcr")
prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
@@ -1215,7 +1215,7 @@ class VisualBertForQuestionAnswering(VisualBertPreTrainedModel):
from transformers import AutoTokenizer, VisualBertForQuestionAnswering
import torch
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertForQuestionAnswering.from_pretrained("uclanlp/visualbert-vqa")
text = "Who is eating the apple?"
@@ -1341,7 +1341,7 @@ class VisualBertForVisualReasoning(VisualBertPreTrainedModel):
from transformers import AutoTokenizer, VisualBertForVisualReasoning
import torch
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertForVisualReasoning.from_pretrained("uclanlp/visualbert-nlvr2")
text = "Who is eating the apple?"
@@ -1507,7 +1507,7 @@ class VisualBertForRegionToPhraseAlignment(VisualBertPreTrainedModel):
from transformers import AutoTokenizer, VisualBertForRegionToPhraseAlignment
import torch
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertForRegionToPhraseAlignment.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
text = "Who is eating the apple?"
diff --git a/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py b/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
index 916cca51a98..b388be245f1 100644
--- a/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
+++ b/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
@@ -131,8 +131,7 @@ class Wav2Vec2ProcessorWithLM(ProcessorMixin):
This can be either:
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a feature extractor file saved using the
[`~SequenceFeatureExtractor.save_pretrained`] method, e.g., `./my_model_directory/`.
- a path or url to a saved feature extractor JSON *file*, e.g.,
diff --git a/src/transformers/models/xlm/configuration_xlm.py b/src/transformers/models/xlm/configuration_xlm.py
index cd8d721bfc3..2992a3ab322 100644
--- a/src/transformers/models/xlm/configuration_xlm.py
+++ b/src/transformers/models/xlm/configuration_xlm.py
@@ -24,16 +24,16 @@ from ...utils import logging
logger = logging.get_logger(__name__)
XLM_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "xlm-mlm-en-2048": "https://huggingface.co/xlm-mlm-en-2048/resolve/main/config.json",
- "xlm-mlm-ende-1024": "https://huggingface.co/xlm-mlm-ende-1024/resolve/main/config.json",
- "xlm-mlm-enfr-1024": "https://huggingface.co/xlm-mlm-enfr-1024/resolve/main/config.json",
- "xlm-mlm-enro-1024": "https://huggingface.co/xlm-mlm-enro-1024/resolve/main/config.json",
- "xlm-mlm-tlm-xnli15-1024": "https://huggingface.co/xlm-mlm-tlm-xnli15-1024/resolve/main/config.json",
- "xlm-mlm-xnli15-1024": "https://huggingface.co/xlm-mlm-xnli15-1024/resolve/main/config.json",
- "xlm-clm-enfr-1024": "https://huggingface.co/xlm-clm-enfr-1024/resolve/main/config.json",
- "xlm-clm-ende-1024": "https://huggingface.co/xlm-clm-ende-1024/resolve/main/config.json",
- "xlm-mlm-17-1280": "https://huggingface.co/xlm-mlm-17-1280/resolve/main/config.json",
- "xlm-mlm-100-1280": "https://huggingface.co/xlm-mlm-100-1280/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-en-2048": "https://huggingface.co/FacebookAI/xlm-mlm-en-2048/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-ende-1024": "https://huggingface.co/FacebookAI/xlm-mlm-ende-1024/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-enfr-1024": "https://huggingface.co/FacebookAI/xlm-mlm-enfr-1024/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-enro-1024": "https://huggingface.co/FacebookAI/xlm-mlm-enro-1024/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-tlm-xnli15-1024": "https://huggingface.co/FacebookAI/xlm-mlm-tlm-xnli15-1024/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-xnli15-1024": "https://huggingface.co/FacebookAI/xlm-mlm-xnli15-1024/resolve/main/config.json",
+ "FacebookAI/xlm-clm-enfr-1024": "https://huggingface.co/FacebookAI/xlm-clm-enfr-1024/resolve/main/config.json",
+ "FacebookAI/xlm-clm-ende-1024": "https://huggingface.co/FacebookAI/xlm-clm-ende-1024/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-17-1280": "https://huggingface.co/FacebookAI/xlm-mlm-17-1280/resolve/main/config.json",
+ "FacebookAI/xlm-mlm-100-1280": "https://huggingface.co/FacebookAI/xlm-mlm-100-1280/resolve/main/config.json",
}
@@ -42,7 +42,7 @@ class XLMConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`XLMModel`] or a [`TFXLMModel`]. It is used to
instantiate a XLM model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the
- [xlm-mlm-en-2048](https://huggingface.co/xlm-mlm-en-2048) architecture.
+ [FacebookAI/xlm-mlm-en-2048](https://huggingface.co/FacebookAI/xlm-mlm-en-2048) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/xlm/modeling_tf_xlm.py b/src/transformers/models/xlm/modeling_tf_xlm.py
index 63d807317b2..173f1d0acdb 100644
--- a/src/transformers/models/xlm/modeling_tf_xlm.py
+++ b/src/transformers/models/xlm/modeling_tf_xlm.py
@@ -63,20 +63,20 @@ from .configuration_xlm import XLMConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "xlm-mlm-en-2048"
+_CHECKPOINT_FOR_DOC = "FacebookAI/xlm-mlm-en-2048"
_CONFIG_FOR_DOC = "XLMConfig"
TF_XLM_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "xlm-mlm-en-2048",
- "xlm-mlm-ende-1024",
- "xlm-mlm-enfr-1024",
- "xlm-mlm-enro-1024",
- "xlm-mlm-tlm-xnli15-1024",
- "xlm-mlm-xnli15-1024",
- "xlm-clm-enfr-1024",
- "xlm-clm-ende-1024",
- "xlm-mlm-17-1280",
- "xlm-mlm-100-1280",
+ "FacebookAI/xlm-mlm-en-2048",
+ "FacebookAI/xlm-mlm-ende-1024",
+ "FacebookAI/xlm-mlm-enfr-1024",
+ "FacebookAI/xlm-mlm-enro-1024",
+ "FacebookAI/xlm-mlm-tlm-xnli15-1024",
+ "FacebookAI/xlm-mlm-xnli15-1024",
+ "FacebookAI/xlm-clm-enfr-1024",
+ "FacebookAI/xlm-clm-ende-1024",
+ "FacebookAI/xlm-mlm-17-1280",
+ "FacebookAI/xlm-mlm-100-1280",
# See all XLM models at https://huggingface.co/models?filter=xlm
]
diff --git a/src/transformers/models/xlm/modeling_xlm.py b/src/transformers/models/xlm/modeling_xlm.py
index 2b7265489bd..de07829974d 100755
--- a/src/transformers/models/xlm/modeling_xlm.py
+++ b/src/transformers/models/xlm/modeling_xlm.py
@@ -50,20 +50,20 @@ from .configuration_xlm import XLMConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "xlm-mlm-en-2048"
+_CHECKPOINT_FOR_DOC = "FacebookAI/xlm-mlm-en-2048"
_CONFIG_FOR_DOC = "XLMConfig"
XLM_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "xlm-mlm-en-2048",
- "xlm-mlm-ende-1024",
- "xlm-mlm-enfr-1024",
- "xlm-mlm-enro-1024",
- "xlm-mlm-tlm-xnli15-1024",
- "xlm-mlm-xnli15-1024",
- "xlm-clm-enfr-1024",
- "xlm-clm-ende-1024",
- "xlm-mlm-17-1280",
- "xlm-mlm-100-1280",
+ "FacebookAI/xlm-mlm-en-2048",
+ "FacebookAI/xlm-mlm-ende-1024",
+ "FacebookAI/xlm-mlm-enfr-1024",
+ "FacebookAI/xlm-mlm-enro-1024",
+ "FacebookAI/xlm-mlm-tlm-xnli15-1024",
+ "FacebookAI/xlm-mlm-xnli15-1024",
+ "FacebookAI/xlm-clm-enfr-1024",
+ "FacebookAI/xlm-clm-ende-1024",
+ "FacebookAI/xlm-mlm-17-1280",
+ "FacebookAI/xlm-mlm-100-1280",
# See all XLM models at https://huggingface.co/models?filter=xlm
]
@@ -1030,8 +1030,8 @@ class XLMForQuestionAnswering(XLMPreTrainedModel):
>>> from transformers import AutoTokenizer, XLMForQuestionAnswering
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
- >>> model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-en-2048")
+ >>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
+ >>> model = XLMForQuestionAnswering.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(
... 0
diff --git a/src/transformers/models/xlm/tokenization_xlm.py b/src/transformers/models/xlm/tokenization_xlm.py
index 49d22934e07..a99b5cb73c9 100644
--- a/src/transformers/models/xlm/tokenization_xlm.py
+++ b/src/transformers/models/xlm/tokenization_xlm.py
@@ -35,62 +35,62 @@ VOCAB_FILES_NAMES = {
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "xlm-mlm-en-2048": "https://huggingface.co/xlm-mlm-en-2048/resolve/main/vocab.json",
- "xlm-mlm-ende-1024": "https://huggingface.co/xlm-mlm-ende-1024/resolve/main/vocab.json",
- "xlm-mlm-enfr-1024": "https://huggingface.co/xlm-mlm-enfr-1024/resolve/main/vocab.json",
- "xlm-mlm-enro-1024": "https://huggingface.co/xlm-mlm-enro-1024/resolve/main/vocab.json",
- "xlm-mlm-tlm-xnli15-1024": "https://huggingface.co/xlm-mlm-tlm-xnli15-1024/resolve/main/vocab.json",
- "xlm-mlm-xnli15-1024": "https://huggingface.co/xlm-mlm-xnli15-1024/resolve/main/vocab.json",
- "xlm-clm-enfr-1024": "https://huggingface.co/xlm-clm-enfr-1024/resolve/main/vocab.json",
- "xlm-clm-ende-1024": "https://huggingface.co/xlm-clm-ende-1024/resolve/main/vocab.json",
- "xlm-mlm-17-1280": "https://huggingface.co/xlm-mlm-17-1280/resolve/main/vocab.json",
- "xlm-mlm-100-1280": "https://huggingface.co/xlm-mlm-100-1280/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-en-2048": "https://huggingface.co/FacebookAI/xlm-mlm-en-2048/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-ende-1024": "https://huggingface.co/FacebookAI/xlm-mlm-ende-1024/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-enfr-1024": "https://huggingface.co/FacebookAI/xlm-mlm-enfr-1024/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-enro-1024": "https://huggingface.co/FacebookAI/xlm-mlm-enro-1024/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-tlm-xnli15-1024": "https://huggingface.co/FacebookAI/xlm-mlm-tlm-xnli15-1024/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-xnli15-1024": "https://huggingface.co/FacebookAI/xlm-mlm-xnli15-1024/resolve/main/vocab.json",
+ "FacebookAI/xlm-clm-enfr-1024": "https://huggingface.co/FacebookAI/xlm-clm-enfr-1024/resolve/main/vocab.json",
+ "FacebookAI/xlm-clm-ende-1024": "https://huggingface.co/FacebookAI/xlm-clm-ende-1024/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-17-1280": "https://huggingface.co/FacebookAI/xlm-mlm-17-1280/resolve/main/vocab.json",
+ "FacebookAI/xlm-mlm-100-1280": "https://huggingface.co/FacebookAI/xlm-mlm-100-1280/resolve/main/vocab.json",
},
"merges_file": {
- "xlm-mlm-en-2048": "https://huggingface.co/xlm-mlm-en-2048/resolve/main/merges.txt",
- "xlm-mlm-ende-1024": "https://huggingface.co/xlm-mlm-ende-1024/resolve/main/merges.txt",
- "xlm-mlm-enfr-1024": "https://huggingface.co/xlm-mlm-enfr-1024/resolve/main/merges.txt",
- "xlm-mlm-enro-1024": "https://huggingface.co/xlm-mlm-enro-1024/resolve/main/merges.txt",
- "xlm-mlm-tlm-xnli15-1024": "https://huggingface.co/xlm-mlm-tlm-xnli15-1024/resolve/main/merges.txt",
- "xlm-mlm-xnli15-1024": "https://huggingface.co/xlm-mlm-xnli15-1024/resolve/main/merges.txt",
- "xlm-clm-enfr-1024": "https://huggingface.co/xlm-clm-enfr-1024/resolve/main/merges.txt",
- "xlm-clm-ende-1024": "https://huggingface.co/xlm-clm-ende-1024/resolve/main/merges.txt",
- "xlm-mlm-17-1280": "https://huggingface.co/xlm-mlm-17-1280/resolve/main/merges.txt",
- "xlm-mlm-100-1280": "https://huggingface.co/xlm-mlm-100-1280/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-en-2048": "https://huggingface.co/FacebookAI/xlm-mlm-en-2048/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-ende-1024": "https://huggingface.co/FacebookAI/xlm-mlm-ende-1024/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-enfr-1024": "https://huggingface.co/FacebookAI/xlm-mlm-enfr-1024/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-enro-1024": "https://huggingface.co/FacebookAI/xlm-mlm-enro-1024/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-tlm-xnli15-1024": "https://huggingface.co/FacebookAI/xlm-mlm-tlm-xnli15-1024/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-xnli15-1024": "https://huggingface.co/FacebookAI/xlm-mlm-xnli15-1024/resolve/main/merges.txt",
+ "FacebookAI/xlm-clm-enfr-1024": "https://huggingface.co/FacebookAI/xlm-clm-enfr-1024/resolve/main/merges.txt",
+ "FacebookAI/xlm-clm-ende-1024": "https://huggingface.co/FacebookAI/xlm-clm-ende-1024/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-17-1280": "https://huggingface.co/FacebookAI/xlm-mlm-17-1280/resolve/main/merges.txt",
+ "FacebookAI/xlm-mlm-100-1280": "https://huggingface.co/FacebookAI/xlm-mlm-100-1280/resolve/main/merges.txt",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "xlm-mlm-en-2048": 512,
- "xlm-mlm-ende-1024": 512,
- "xlm-mlm-enfr-1024": 512,
- "xlm-mlm-enro-1024": 512,
- "xlm-mlm-tlm-xnli15-1024": 512,
- "xlm-mlm-xnli15-1024": 512,
- "xlm-clm-enfr-1024": 512,
- "xlm-clm-ende-1024": 512,
- "xlm-mlm-17-1280": 512,
- "xlm-mlm-100-1280": 512,
+ "FacebookAI/xlm-mlm-en-2048": 512,
+ "FacebookAI/xlm-mlm-ende-1024": 512,
+ "FacebookAI/xlm-mlm-enfr-1024": 512,
+ "FacebookAI/xlm-mlm-enro-1024": 512,
+ "FacebookAI/xlm-mlm-tlm-xnli15-1024": 512,
+ "FacebookAI/xlm-mlm-xnli15-1024": 512,
+ "FacebookAI/xlm-clm-enfr-1024": 512,
+ "FacebookAI/xlm-clm-ende-1024": 512,
+ "FacebookAI/xlm-mlm-17-1280": 512,
+ "FacebookAI/xlm-mlm-100-1280": 512,
}
PRETRAINED_INIT_CONFIGURATION = {
- "xlm-mlm-en-2048": {"do_lowercase_and_remove_accent": True},
- "xlm-mlm-ende-1024": {
+ "FacebookAI/xlm-mlm-en-2048": {"do_lowercase_and_remove_accent": True},
+ "FacebookAI/xlm-mlm-ende-1024": {
"do_lowercase_and_remove_accent": True,
"id2lang": {0: "de", 1: "en"},
"lang2id": {"de": 0, "en": 1},
},
- "xlm-mlm-enfr-1024": {
+ "FacebookAI/xlm-mlm-enfr-1024": {
"do_lowercase_and_remove_accent": True,
"id2lang": {0: "en", 1: "fr"},
"lang2id": {"en": 0, "fr": 1},
},
- "xlm-mlm-enro-1024": {
+ "FacebookAI/xlm-mlm-enro-1024": {
"do_lowercase_and_remove_accent": True,
"id2lang": {0: "en", 1: "ro"},
"lang2id": {"en": 0, "ro": 1},
},
- "xlm-mlm-tlm-xnli15-1024": {
+ "FacebookAI/xlm-mlm-tlm-xnli15-1024": {
"do_lowercase_and_remove_accent": True,
"id2lang": {
0: "ar",
@@ -127,7 +127,7 @@ PRETRAINED_INIT_CONFIGURATION = {
"zh": 14,
},
},
- "xlm-mlm-xnli15-1024": {
+ "FacebookAI/xlm-mlm-xnli15-1024": {
"do_lowercase_and_remove_accent": True,
"id2lang": {
0: "ar",
@@ -164,17 +164,17 @@ PRETRAINED_INIT_CONFIGURATION = {
"zh": 14,
},
},
- "xlm-clm-enfr-1024": {
+ "FacebookAI/xlm-clm-enfr-1024": {
"do_lowercase_and_remove_accent": True,
"id2lang": {0: "en", 1: "fr"},
"lang2id": {"en": 0, "fr": 1},
},
- "xlm-clm-ende-1024": {
+ "FacebookAI/xlm-clm-ende-1024": {
"do_lowercase_and_remove_accent": True,
"id2lang": {0: "de", 1: "en"},
"lang2id": {"de": 0, "en": 1},
},
- "xlm-mlm-17-1280": {
+ "FacebookAI/xlm-mlm-17-1280": {
"do_lowercase_and_remove_accent": False,
"id2lang": {
0: "ar",
@@ -215,7 +215,7 @@ PRETRAINED_INIT_CONFIGURATION = {
"zh": 16,
},
},
- "xlm-mlm-100-1280": {
+ "FacebookAI/xlm-mlm-100-1280": {
"do_lowercase_and_remove_accent": False,
"id2lang": {
0: "af",
@@ -512,7 +512,7 @@ def remove_non_printing_char(text):
def romanian_preprocessing(text):
- """Sennrich's WMT16 scripts for Romanian preprocessing, used by model `xlm-mlm-enro-1024`"""
+ """Sennrich's WMT16 scripts for Romanian preprocessing, used by model `FacebookAI/xlm-mlm-enro-1024`"""
# https://github.com/rsennrich/wmt16-scripts/blob/master/preprocess/normalise-romanian.py
text = text.replace("\u015e", "\u0218").replace("\u015f", "\u0219")
text = text.replace("\u0162", "\u021a").replace("\u0163", "\u021b")
@@ -807,7 +807,7 @@ class XLMTokenizer(PreTrainedTokenizer):
text = text.split()
elif lang not in self.lang_with_custom_tokenizer:
text = self.moses_pipeline(text, lang=lang)
- # TODO: make sure we are using `xlm-mlm-enro-1024`, since XLM-100 doesn't have this step
+ # TODO: make sure we are using `FacebookAI/xlm-mlm-enro-1024`, since XLM-100 doesn't have this step
if lang == "ro":
text = romanian_preprocessing(text)
text = self.moses_tokenize(text, lang=lang)
diff --git a/src/transformers/models/xlm_prophetnet/modeling_xlm_prophetnet.py b/src/transformers/models/xlm_prophetnet/modeling_xlm_prophetnet.py
index 37bd32186af..e705b95b177 100644
--- a/src/transformers/models/xlm_prophetnet/modeling_xlm_prophetnet.py
+++ b/src/transformers/models/xlm_prophetnet/modeling_xlm_prophetnet.py
@@ -2216,10 +2216,10 @@ class XLMProphetNetForCausalLM(XLMProphetNetPreTrainedModel):
>>> from transformers import BertTokenizer, EncoderDecoderModel, AutoTokenizer
>>> import torch
- >>> tokenizer_enc = BertTokenizer.from_pretrained("bert-large-uncased")
+ >>> tokenizer_enc = BertTokenizer.from_pretrained("google-bert/bert-large-uncased")
>>> tokenizer_dec = AutoTokenizer.from_pretrained("patrickvonplaten/xprophetnet-large-uncased-standalone")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained(
- ... "bert-large-uncased", "patrickvonplaten/xprophetnet-large-uncased-standalone"
+ ... "google-bert/bert-large-uncased", "patrickvonplaten/xprophetnet-large-uncased-standalone"
... )
>>> ARTICLE = (
diff --git a/src/transformers/models/xlm_roberta/configuration_xlm_roberta.py b/src/transformers/models/xlm_roberta/configuration_xlm_roberta.py
index 517b751f422..65c536ba437 100644
--- a/src/transformers/models/xlm_roberta/configuration_xlm_roberta.py
+++ b/src/transformers/models/xlm_roberta/configuration_xlm_roberta.py
@@ -25,19 +25,19 @@ from ...utils import logging
logger = logging.get_logger(__name__)
XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "xlm-roberta-base": "https://huggingface.co/xlm-roberta-base/resolve/main/config.json",
- "xlm-roberta-large": "https://huggingface.co/xlm-roberta-large/resolve/main/config.json",
- "xlm-roberta-large-finetuned-conll02-dutch": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/config.json"
+ "FacebookAI/xlm-roberta-base": "https://huggingface.co/FacebookAI/xlm-roberta-base/resolve/main/config.json",
+ "FacebookAI/xlm-roberta-large": "https://huggingface.co/FacebookAI/xlm-roberta-large/resolve/main/config.json",
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/config.json"
),
- "xlm-roberta-large-finetuned-conll02-spanish": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/config.json"
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/config.json"
),
- "xlm-roberta-large-finetuned-conll03-english": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-english/resolve/main/config.json"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-english": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english/resolve/main/config.json"
),
- "xlm-roberta-large-finetuned-conll03-german": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-german/resolve/main/config.json"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-german": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-german/resolve/main/config.json"
),
}
@@ -47,7 +47,7 @@ class XLMRobertaConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`XLMRobertaModel`] or a [`TFXLMRobertaModel`]. It
is used to instantiate a XLM-RoBERTa model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the XLMRoBERTa
- [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) architecture.
+ [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
@@ -101,10 +101,10 @@ class XLMRobertaConfig(PretrainedConfig):
```python
>>> from transformers import XLMRobertaConfig, XLMRobertaModel
- >>> # Initializing a XLM-RoBERTa xlm-roberta-base style configuration
+ >>> # Initializing a XLM-RoBERTa FacebookAI/xlm-roberta-base style configuration
>>> configuration = XLMRobertaConfig()
- >>> # Initializing a model (with random weights) from the xlm-roberta-base style configuration
+ >>> # Initializing a model (with random weights) from the FacebookAI/xlm-roberta-base style configuration
>>> model = XLMRobertaModel(configuration)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/xlm_roberta/modeling_flax_xlm_roberta.py b/src/transformers/models/xlm_roberta/modeling_flax_xlm_roberta.py
index e8247b3f28d..0017be6bd8c 100644
--- a/src/transformers/models/xlm_roberta/modeling_flax_xlm_roberta.py
+++ b/src/transformers/models/xlm_roberta/modeling_flax_xlm_roberta.py
@@ -46,14 +46,14 @@ from .configuration_xlm_roberta import XLMRobertaConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "xlm-roberta-base"
+_CHECKPOINT_FOR_DOC = "FacebookAI/xlm-roberta-base"
_CONFIG_FOR_DOC = "XLMRobertaConfig"
remat = nn_partitioning.remat
FLAX_XLM_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "xlm-roberta-base",
- "xlm-roberta-large",
+ "FacebookAI/xlm-roberta-base",
+ "FacebookAI/xlm-roberta-large",
# See all XLM-RoBERTa models at https://huggingface.co/models?filter=xlm-roberta
]
diff --git a/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py b/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py
index c33f12298a2..dcf1b018b2a 100644
--- a/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py
+++ b/src/transformers/models/xlm_roberta/modeling_tf_xlm_roberta.py
@@ -64,12 +64,12 @@ logger = logging.get_logger(__name__)
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "xlm-roberta-base"
+_CHECKPOINT_FOR_DOC = "FacebookAI/xlm-roberta-base"
_CONFIG_FOR_DOC = "XLMRobertaConfig"
TF_XLM_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "xlm-roberta-base",
- "xlm-roberta-large",
+ "FacebookAI/xlm-roberta-base",
+ "FacebookAI/xlm-roberta-large",
"joeddav/xlm-roberta-large-xnli",
"cardiffnlp/twitter-xlm-roberta-base-sentiment",
# See all XLM-RoBERTa models at https://huggingface.co/models?filter=xlm-roberta
diff --git a/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py b/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py
index 95ea2e7dca7..8abd77b8c30 100644
--- a/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py
+++ b/src/transformers/models/xlm_roberta/modeling_xlm_roberta.py
@@ -48,16 +48,16 @@ from .configuration_xlm_roberta import XLMRobertaConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "xlm-roberta-base"
+_CHECKPOINT_FOR_DOC = "FacebookAI/xlm-roberta-base"
_CONFIG_FOR_DOC = "XLMRobertaConfig"
XLM_ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "xlm-roberta-base",
- "xlm-roberta-large",
- "xlm-roberta-large-finetuned-conll02-dutch",
- "xlm-roberta-large-finetuned-conll02-spanish",
- "xlm-roberta-large-finetuned-conll03-english",
- "xlm-roberta-large-finetuned-conll03-german",
+ "FacebookAI/xlm-roberta-base",
+ "FacebookAI/xlm-roberta-large",
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch",
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish",
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-english",
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-german",
# See all XLM-RoBERTa models at https://huggingface.co/models?filter=xlm-roberta
]
@@ -940,10 +940,10 @@ class XLMRobertaForCausalLM(XLMRobertaPreTrainedModel):
>>> from transformers import AutoTokenizer, XLMRobertaForCausalLM, AutoConfig
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("roberta-base")
- >>> config = AutoConfig.from_pretrained("roberta-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
+ >>> config = AutoConfig.from_pretrained("FacebookAI/roberta-base")
>>> config.is_decoder = True
- >>> model = XLMRobertaForCausalLM.from_pretrained("roberta-base", config=config)
+ >>> model = XLMRobertaForCausalLM.from_pretrained("FacebookAI/roberta-base", config=config)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
diff --git a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py
index f704d136fae..3f87bd9b0dd 100644
--- a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py
+++ b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py
@@ -33,30 +33,30 @@ VOCAB_FILES_NAMES = {"vocab_file": "sentencepiece.bpe.model"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "xlm-roberta-base": "https://huggingface.co/xlm-roberta-base/resolve/main/sentencepiece.bpe.model",
- "xlm-roberta-large": "https://huggingface.co/xlm-roberta-large/resolve/main/sentencepiece.bpe.model",
- "xlm-roberta-large-finetuned-conll02-dutch": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-base": "https://huggingface.co/FacebookAI/xlm-roberta-base/resolve/main/sentencepiece.bpe.model",
+ "FacebookAI/xlm-roberta-large": "https://huggingface.co/FacebookAI/xlm-roberta-large/resolve/main/sentencepiece.bpe.model",
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/sentencepiece.bpe.model"
),
- "xlm-roberta-large-finetuned-conll02-spanish": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/sentencepiece.bpe.model"
),
- "xlm-roberta-large-finetuned-conll03-english": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-english/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-english": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english/resolve/main/sentencepiece.bpe.model"
),
- "xlm-roberta-large-finetuned-conll03-german": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-german/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-german": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-german/resolve/main/sentencepiece.bpe.model"
),
}
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "xlm-roberta-base": 512,
- "xlm-roberta-large": 512,
- "xlm-roberta-large-finetuned-conll02-dutch": 512,
- "xlm-roberta-large-finetuned-conll02-spanish": 512,
- "xlm-roberta-large-finetuned-conll03-english": 512,
- "xlm-roberta-large-finetuned-conll03-german": 512,
+ "FacebookAI/xlm-roberta-base": 512,
+ "FacebookAI/xlm-roberta-large": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-english": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-german": 512,
}
diff --git a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py
index 41079e29d8c..8f2c1e02a0a 100644
--- a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py
+++ b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py
@@ -36,46 +36,46 @@ VOCAB_FILES_NAMES = {"vocab_file": "sentencepiece.bpe.model", "tokenizer_file":
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "xlm-roberta-base": "https://huggingface.co/xlm-roberta-base/resolve/main/sentencepiece.bpe.model",
- "xlm-roberta-large": "https://huggingface.co/xlm-roberta-large/resolve/main/sentencepiece.bpe.model",
- "xlm-roberta-large-finetuned-conll02-dutch": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-base": "https://huggingface.co/FacebookAI/xlm-roberta-base/resolve/main/sentencepiece.bpe.model",
+ "FacebookAI/xlm-roberta-large": "https://huggingface.co/FacebookAI/xlm-roberta-large/resolve/main/sentencepiece.bpe.model",
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/sentencepiece.bpe.model"
),
- "xlm-roberta-large-finetuned-conll02-spanish": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/sentencepiece.bpe.model"
),
- "xlm-roberta-large-finetuned-conll03-english": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-english/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-english": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english/resolve/main/sentencepiece.bpe.model"
),
- "xlm-roberta-large-finetuned-conll03-german": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-german/resolve/main/sentencepiece.bpe.model"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-german": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-german/resolve/main/sentencepiece.bpe.model"
),
},
"tokenizer_file": {
- "xlm-roberta-base": "https://huggingface.co/xlm-roberta-base/resolve/main/tokenizer.json",
- "xlm-roberta-large": "https://huggingface.co/xlm-roberta-large/resolve/main/tokenizer.json",
- "xlm-roberta-large-finetuned-conll02-dutch": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/tokenizer.json"
+ "FacebookAI/xlm-roberta-base": "https://huggingface.co/FacebookAI/xlm-roberta-base/resolve/main/tokenizer.json",
+ "FacebookAI/xlm-roberta-large": "https://huggingface.co/FacebookAI/xlm-roberta-large/resolve/main/tokenizer.json",
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/tokenizer.json"
),
- "xlm-roberta-large-finetuned-conll02-spanish": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/tokenizer.json"
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/tokenizer.json"
),
- "xlm-roberta-large-finetuned-conll03-english": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-english/resolve/main/tokenizer.json"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-english": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english/resolve/main/tokenizer.json"
),
- "xlm-roberta-large-finetuned-conll03-german": (
- "https://huggingface.co/xlm-roberta-large-finetuned-conll03-german/resolve/main/tokenizer.json"
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-german": (
+ "https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-german/resolve/main/tokenizer.json"
),
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "xlm-roberta-base": 512,
- "xlm-roberta-large": 512,
- "xlm-roberta-large-finetuned-conll02-dutch": 512,
- "xlm-roberta-large-finetuned-conll02-spanish": 512,
- "xlm-roberta-large-finetuned-conll03-english": 512,
- "xlm-roberta-large-finetuned-conll03-german": 512,
+ "FacebookAI/xlm-roberta-base": 512,
+ "FacebookAI/xlm-roberta-large": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-dutch": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll02-spanish": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-english": 512,
+ "FacebookAI/xlm-roberta-large-finetuned-conll03-german": 512,
}
diff --git a/src/transformers/models/xlm_roberta_xl/configuration_xlm_roberta_xl.py b/src/transformers/models/xlm_roberta_xl/configuration_xlm_roberta_xl.py
index e2dee1cbe4e..acb9c630970 100644
--- a/src/transformers/models/xlm_roberta_xl/configuration_xlm_roberta_xl.py
+++ b/src/transformers/models/xlm_roberta_xl/configuration_xlm_roberta_xl.py
@@ -88,10 +88,10 @@ class XLMRobertaXLConfig(PretrainedConfig):
```python
>>> from transformers import XLMRobertaXLConfig, XLMRobertaXLModel
- >>> # Initializing a XLM_ROBERTA_XL bert-base-uncased style configuration
+ >>> # Initializing a XLM_ROBERTA_XL google-bert/bert-base-uncased style configuration
>>> configuration = XLMRobertaXLConfig()
- >>> # Initializing a model (with random weights) from the bert-base-uncased style configuration
+ >>> # Initializing a model (with random weights) from the google-bert/bert-base-uncased style configuration
>>> model = XLMRobertaXLModel(configuration)
>>> # Accessing the model configuration
diff --git a/src/transformers/models/xlm_roberta_xl/modeling_xlm_roberta_xl.py b/src/transformers/models/xlm_roberta_xl/modeling_xlm_roberta_xl.py
index 48bb28bf4ee..2799752ca4b 100644
--- a/src/transformers/models/xlm_roberta_xl/modeling_xlm_roberta_xl.py
+++ b/src/transformers/models/xlm_roberta_xl/modeling_xlm_roberta_xl.py
@@ -906,10 +906,10 @@ class XLMRobertaXLForCausalLM(XLMRobertaXLPreTrainedModel):
>>> from transformers import AutoTokenizer, RobertaForCausalLM, RobertaConfig
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("roberta-base")
- >>> config = RobertaConfig.from_pretrained("roberta-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
+ >>> config = RobertaConfig.from_pretrained("FacebookAI/roberta-base")
>>> config.is_decoder = True
- >>> model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)
+ >>> model = RobertaForCausalLM.from_pretrained("FacebookAI/roberta-base", config=config)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
>>> prediction_logits = outputs.logits
diff --git a/src/transformers/models/xlnet/configuration_xlnet.py b/src/transformers/models/xlnet/configuration_xlnet.py
index 9ebc1f8bb9f..8528bb06394 100644
--- a/src/transformers/models/xlnet/configuration_xlnet.py
+++ b/src/transformers/models/xlnet/configuration_xlnet.py
@@ -24,8 +24,8 @@ from ...utils import logging
logger = logging.get_logger(__name__)
XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP = {
- "xlnet-base-cased": "https://huggingface.co/xlnet-base-cased/resolve/main/config.json",
- "xlnet-large-cased": "https://huggingface.co/xlnet-large-cased/resolve/main/config.json",
+ "xlnet/xlnet-base-cased": "https://huggingface.co/xlnet/xlnet-base-cased/resolve/main/config.json",
+ "xlnet/xlnet-large-cased": "https://huggingface.co/xlnet/xlnet-large-cased/resolve/main/config.json",
}
@@ -34,7 +34,7 @@ class XLNetConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`XLNetModel`] or a [`TFXLNetModel`]. It is used to
instantiate a XLNet model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the
- [xlnet-large-cased](https://huggingface.co/xlnet-large-cased) architecture.
+ [xlnet/xlnet-large-cased](https://huggingface.co/xlnet/xlnet-large-cased) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
diff --git a/src/transformers/models/xlnet/modeling_tf_xlnet.py b/src/transformers/models/xlnet/modeling_tf_xlnet.py
index 9bf26872f80..598af1b707a 100644
--- a/src/transformers/models/xlnet/modeling_tf_xlnet.py
+++ b/src/transformers/models/xlnet/modeling_tf_xlnet.py
@@ -57,12 +57,12 @@ from .configuration_xlnet import XLNetConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "xlnet-base-cased"
+_CHECKPOINT_FOR_DOC = "xlnet/xlnet-base-cased"
_CONFIG_FOR_DOC = "XLNetConfig"
TF_XLNET_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "xlnet-base-cased",
- "xlnet-large-cased",
+ "xlnet/xlnet-base-cased",
+ "xlnet/xlnet-large-cased",
# See all XLNet models at https://huggingface.co/models?filter=xlnet
]
@@ -1325,8 +1325,8 @@ class TFXLNetLMHeadModel(TFXLNetPreTrainedModel, TFCausalLanguageModelingLoss):
>>> import numpy as np
>>> from transformers import AutoTokenizer, TFXLNetLMHeadModel
- >>> tokenizer = AutoTokenizer.from_pretrained("xlnet-large-cased")
- >>> model = TFXLNetLMHeadModel.from_pretrained("xlnet-large-cased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("xlnet/xlnet-large-cased")
+ >>> model = TFXLNetLMHeadModel.from_pretrained("xlnet/xlnet-large-cased")
>>> # We show how to set up inputs to predict a next token using a bi-directional context.
>>> input_ids = tf.constant(tokenizer.encode("Hello, my dog is very ", add_special_tokens=True))[
diff --git a/src/transformers/models/xlnet/modeling_xlnet.py b/src/transformers/models/xlnet/modeling_xlnet.py
index c987c1e187a..6def87ef07b 100755
--- a/src/transformers/models/xlnet/modeling_xlnet.py
+++ b/src/transformers/models/xlnet/modeling_xlnet.py
@@ -40,12 +40,12 @@ from .configuration_xlnet import XLNetConfig
logger = logging.get_logger(__name__)
-_CHECKPOINT_FOR_DOC = "xlnet-base-cased"
+_CHECKPOINT_FOR_DOC = "xlnet/xlnet-base-cased"
_CONFIG_FOR_DOC = "XLNetConfig"
XLNET_PRETRAINED_MODEL_ARCHIVE_LIST = [
- "xlnet-base-cased",
- "xlnet-large-cased",
+ "xlnet/xlnet-base-cased",
+ "xlnet/xlnet-large-cased",
# See all XLNet models at https://huggingface.co/models?filter=xlnet
]
@@ -1393,8 +1393,8 @@ class XLNetLMHeadModel(XLNetPreTrainedModel):
>>> from transformers import AutoTokenizer, XLNetLMHeadModel
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("xlnet-large-cased")
- >>> model = XLNetLMHeadModel.from_pretrained("xlnet-large-cased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("xlnet/xlnet-large-cased")
+ >>> model = XLNetLMHeadModel.from_pretrained("xlnet/xlnet-large-cased")
>>> # We show how to set up inputs to predict a next token using a bi-directional context.
>>> input_ids = torch.tensor(
@@ -1970,8 +1970,8 @@ class XLNetForQuestionAnswering(XLNetPreTrainedModel):
>>> from transformers import AutoTokenizer, XLNetForQuestionAnswering
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
- >>> model = XLNetForQuestionAnswering.from_pretrained("xlnet-base-cased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("xlnet/xlnet-base-cased")
+ >>> model = XLNetForQuestionAnswering.from_pretrained("xlnet/xlnet-base-cased")
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(
... 0
diff --git a/src/transformers/models/xlnet/tokenization_xlnet.py b/src/transformers/models/xlnet/tokenization_xlnet.py
index adc201abb96..808a7ff5bfc 100644
--- a/src/transformers/models/xlnet/tokenization_xlnet.py
+++ b/src/transformers/models/xlnet/tokenization_xlnet.py
@@ -32,14 +32,14 @@ VOCAB_FILES_NAMES = {"vocab_file": "spiece.model"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "xlnet-base-cased": "https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model",
- "xlnet-large-cased": "https://huggingface.co/xlnet-large-cased/resolve/main/spiece.model",
+ "xlnet/xlnet-base-cased": "https://huggingface.co/xlnet/xlnet-base-cased/resolve/main/spiece.model",
+ "xlnet/xlnet-large-cased": "https://huggingface.co/xlnet/xlnet-large-cased/resolve/main/spiece.model",
}
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "xlnet-base-cased": None,
- "xlnet-large-cased": None,
+ "xlnet/xlnet-base-cased": None,
+ "xlnet/xlnet-large-cased": None,
}
# Segments (not really needed)
diff --git a/src/transformers/models/xlnet/tokenization_xlnet_fast.py b/src/transformers/models/xlnet/tokenization_xlnet_fast.py
index 589675f0062..c43016a1a77 100644
--- a/src/transformers/models/xlnet/tokenization_xlnet_fast.py
+++ b/src/transformers/models/xlnet/tokenization_xlnet_fast.py
@@ -36,18 +36,18 @@ VOCAB_FILES_NAMES = {"vocab_file": "spiece.model", "tokenizer_file": "tokenizer.
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
- "xlnet-base-cased": "https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model",
- "xlnet-large-cased": "https://huggingface.co/xlnet-large-cased/resolve/main/spiece.model",
+ "xlnet/xlnet-base-cased": "https://huggingface.co/xlnet/xlnet-base-cased/resolve/main/spiece.model",
+ "xlnet/xlnet-large-cased": "https://huggingface.co/xlnet/xlnet-large-cased/resolve/main/spiece.model",
},
"tokenizer_file": {
- "xlnet-base-cased": "https://huggingface.co/xlnet-base-cased/resolve/main/tokenizer.json",
- "xlnet-large-cased": "https://huggingface.co/xlnet-large-cased/resolve/main/tokenizer.json",
+ "xlnet/xlnet-base-cased": "https://huggingface.co/xlnet/xlnet-base-cased/resolve/main/tokenizer.json",
+ "xlnet/xlnet-large-cased": "https://huggingface.co/xlnet/xlnet-large-cased/resolve/main/tokenizer.json",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
- "xlnet-base-cased": None,
- "xlnet-large-cased": None,
+ "xlnet/xlnet-base-cased": None,
+ "xlnet/xlnet-large-cased": None,
}
SPIECE_UNDERLINE = "▁"
diff --git a/src/transformers/models/xmod/modeling_xmod.py b/src/transformers/models/xmod/modeling_xmod.py
index cb048fb85e2..ba5ba6b7271 100644
--- a/src/transformers/models/xmod/modeling_xmod.py
+++ b/src/transformers/models/xmod/modeling_xmod.py
@@ -1045,7 +1045,7 @@ class XmodForCausalLM(XmodPreTrainedModel):
>>> from transformers import AutoTokenizer, XmodForCausalLM, AutoConfig
>>> import torch
- >>> tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
+ >>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
>>> config = AutoConfig.from_pretrained("facebook/xmod-base")
>>> config.is_decoder = True
>>> model = XmodForCausalLM.from_pretrained("facebook/xmod-base", config=config)
diff --git a/src/transformers/pipelines/__init__.py b/src/transformers/pipelines/__init__.py
index 72e8b2b4aa9..8ee0137a20b 100755
--- a/src/transformers/pipelines/__init__.py
+++ b/src/transformers/pipelines/__init__.py
@@ -713,12 +713,12 @@ def pipeline(
>>> # Question answering pipeline, specifying the checkpoint identifier
>>> oracle = pipeline(
- ... "question-answering", model="distilbert/distilbert-base-cased-distilled-squad", tokenizer="bert-base-cased"
+ ... "question-answering", model="distilbert/distilbert-base-cased-distilled-squad", tokenizer="google-bert/bert-base-cased"
... )
>>> # Named entity recognition pipeline, passing in a specific model and tokenizer
>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
- >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+ >>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> recognizer = pipeline("ner", model=model, tokenizer=tokenizer)
```"""
if model_kwargs is None:
diff --git a/src/transformers/pipelines/feature_extraction.py b/src/transformers/pipelines/feature_extraction.py
index 118baeccd0d..e8adb11b687 100644
--- a/src/transformers/pipelines/feature_extraction.py
+++ b/src/transformers/pipelines/feature_extraction.py
@@ -22,7 +22,7 @@ class FeatureExtractionPipeline(Pipeline):
```python
>>> from transformers import pipeline
- >>> extractor = pipeline(model="bert-base-uncased", task="feature-extraction")
+ >>> extractor = pipeline(model="google-bert/bert-base-uncased", task="feature-extraction")
>>> result = extractor("This is a simple test.", return_tensors=True)
>>> result.shape # This is a tensor of shape [1, sequence_length, hidden_dimension] representing the input string.
torch.Size([1, 8, 768])
diff --git a/src/transformers/pipelines/fill_mask.py b/src/transformers/pipelines/fill_mask.py
index 1d54c615ea2..a6f24082232 100644
--- a/src/transformers/pipelines/fill_mask.py
+++ b/src/transformers/pipelines/fill_mask.py
@@ -41,7 +41,7 @@ class FillMaskPipeline(Pipeline):
```python
>>> from transformers import pipeline
- >>> fill_masker = pipeline(model="bert-base-uncased")
+ >>> fill_masker = pipeline(model="google-bert/bert-base-uncased")
>>> fill_masker("This is a simple [MASK].")
[{'score': 0.042, 'token': 3291, 'token_str': 'problem', 'sequence': 'this is a simple problem.'}, {'score': 0.031, 'token': 3160, 'token_str': 'question', 'sequence': 'this is a simple question.'}, {'score': 0.03, 'token': 8522, 'token_str': 'equation', 'sequence': 'this is a simple equation.'}, {'score': 0.027, 'token': 2028, 'token_str': 'one', 'sequence': 'this is a simple one.'}, {'score': 0.024, 'token': 3627, 'token_str': 'rule', 'sequence': 'this is a simple rule.'}]
```
@@ -70,7 +70,7 @@ class FillMaskPipeline(Pipeline):
```python
>>> from transformers import pipeline
- >>> fill_masker = pipeline(model="bert-base-uncased")
+ >>> fill_masker = pipeline(model="google-bert/bert-base-uncased")
>>> tokenizer_kwargs = {"truncation": True}
>>> fill_masker(
... "This is a simple [MASK]. " + "...with a large amount of repeated text appended. " * 100,
diff --git a/src/transformers/pipelines/text2text_generation.py b/src/transformers/pipelines/text2text_generation.py
index 09f0b0c4490..bb8abdfcf7f 100644
--- a/src/transformers/pipelines/text2text_generation.py
+++ b/src/transformers/pipelines/text2text_generation.py
@@ -222,7 +222,7 @@ class SummarizationPipeline(Text2TextGenerationPipeline):
`"summarization"`.
The models that this pipeline can use are models that have been fine-tuned on a summarization task, which is
- currently, '*bart-large-cnn*', '*t5-small*', '*t5-base*', '*t5-large*', '*t5-3b*', '*t5-11b*'. See the up-to-date
+ currently, '*facebook/bart-large-cnn*', '*google-t5/t5-small*', '*google-t5/t5-base*', '*google-t5/t5-large*', '*google-t5/t5-3b*', '*google-t5/t5-11b*'. See the up-to-date
list of available models on [huggingface.co/models](https://huggingface.co/models?filter=summarization). For a list
of available parameters, see the [following
documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.generation.GenerationMixin.generate)
@@ -235,7 +235,7 @@ class SummarizationPipeline(Text2TextGenerationPipeline):
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
# use t5 in tf
- summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
+ summarizer = pipeline("summarization", model="google-t5/t5-base", tokenizer="google-t5/t5-base", framework="tf")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
```"""
diff --git a/src/transformers/pipelines/text_classification.py b/src/transformers/pipelines/text_classification.py
index 2b7717934dd..0c54fe1706c 100644
--- a/src/transformers/pipelines/text_classification.py
+++ b/src/transformers/pipelines/text_classification.py
@@ -55,7 +55,7 @@ class TextClassificationPipeline(Pipeline):
```python
>>> from transformers import pipeline
- >>> classifier = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")
+ >>> classifier = pipeline(model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")
>>> classifier("This movie is disgustingly good !")
[{'label': 'POSITIVE', 'score': 1.0}]
diff --git a/src/transformers/pipelines/text_generation.py b/src/transformers/pipelines/text_generation.py
index 839395d7fe0..ce7e180601f 100644
--- a/src/transformers/pipelines/text_generation.py
+++ b/src/transformers/pipelines/text_generation.py
@@ -31,7 +31,7 @@ class TextGenerationPipeline(Pipeline):
```python
>>> from transformers import pipeline
- >>> generator = pipeline(model="gpt2")
+ >>> generator = pipeline(model="openai-community/gpt2")
>>> generator("I can't believe you did such a ", do_sample=False)
[{'generated_text': "I can't believe you did such a icky thing to me. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I"}]
@@ -48,7 +48,7 @@ class TextGenerationPipeline(Pipeline):
`"text-generation"`.
The models that this pipeline can use are models that have been trained with an autoregressive language modeling
- objective, which includes the uni-directional models in the library (e.g. gpt2). See the list of available models
+ objective, which includes the uni-directional models in the library (e.g. openai-community/gpt2). See the list of available models
on [huggingface.co/models](https://huggingface.co/models?filter=text-generation).
"""
diff --git a/src/transformers/processing_utils.py b/src/transformers/processing_utils.py
index 30cbfddeed7..5b46d5ea4a4 100644
--- a/src/transformers/processing_utils.py
+++ b/src/transformers/processing_utils.py
@@ -432,8 +432,7 @@ class ProcessorMixin(PushToHubMixin):
This can be either:
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
- namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a feature extractor file saved using the
[`~SequenceFeatureExtractor.save_pretrained`] method, e.g., `./my_model_directory/`.
- a path or url to a saved feature extractor JSON *file*, e.g.,
diff --git a/src/transformers/quantizers/quantizer_bnb_4bit.py b/src/transformers/quantizers/quantizer_bnb_4bit.py
index 16745f756ca..6cea1b55123 100644
--- a/src/transformers/quantizers/quantizer_bnb_4bit.py
+++ b/src/transformers/quantizers/quantizer_bnb_4bit.py
@@ -204,7 +204,7 @@ class Bnb4BitHfQuantizer(HfQuantizer):
else:
new_value = param_value.to("cpu")
- # Support models using `Conv1D` in place of `nn.Linear` (e.g. gpt2) by transposing the weight matrix prior to quantization.
+ # Support models using `Conv1D` in place of `nn.Linear` (e.g. openai-community/gpt2) by transposing the weight matrix prior to quantization.
# Since weights are saved in the correct "orientation", we skip transposing when loading.
if issubclass(module.source_cls, Conv1D):
new_value = new_value.T
diff --git a/src/transformers/quantizers/quantizer_bnb_8bit.py b/src/transformers/quantizers/quantizer_bnb_8bit.py
index d41a280f89a..193da44d2c8 100644
--- a/src/transformers/quantizers/quantizer_bnb_8bit.py
+++ b/src/transformers/quantizers/quantizer_bnb_8bit.py
@@ -190,7 +190,7 @@ class Bnb8BitHfQuantizer(HfQuantizer):
"Make sure to download the latest `bitsandbytes` version. `pip install --upgrade bitsandbytes`."
)
- # Support models using `Conv1D` in place of `nn.Linear` (e.g. gpt2) by transposing the weight matrix prior to quantization.
+ # Support models using `Conv1D` in place of `nn.Linear` (e.g. openai-community/gpt2) by transposing the weight matrix prior to quantization.
# Since weights are saved in the correct "orientation", we skip transposing when loading.
if issubclass(module.source_cls, Conv1D):
if fp16_statistics is None:
diff --git a/src/transformers/testing_utils.py b/src/transformers/testing_utils.py
index 50e178fbea3..ca4b0db8b8c 100644
--- a/src/transformers/testing_utils.py
+++ b/src/transformers/testing_utils.py
@@ -1325,7 +1325,7 @@ def LoggingLevel(level):
```python
with LoggingLevel(logging.INFO):
- AutoModel.from_pretrained("gpt2") # calls logger.info() several times
+ AutoModel.from_pretrained("openai-community/gpt2") # calls logger.info() several times
```
"""
orig_level = transformers_logging.get_verbosity()
@@ -1611,7 +1611,7 @@ class TestCasePlus(unittest.TestCase):
Example:
```
- one_liner_str = 'from transformers import AutoModel; AutoModel.from_pretrained("t5-large")'
+ one_liner_str = 'from transformers import AutoModel; AutoModel.from_pretrained("google-t5/t5-large")'
max_rss = self.python_one_liner_max_rss(one_liner_str)
```
"""
diff --git a/src/transformers/tokenization_utils.py b/src/transformers/tokenization_utils.py
index 50a42b4bb5d..8f1b15c1c11 100644
--- a/src/transformers/tokenization_utils.py
+++ b/src/transformers/tokenization_utils.py
@@ -452,8 +452,8 @@ class PreTrainedTokenizer(PreTrainedTokenizerBase):
```python
# Let's see how to increase the vocabulary of Bert model and tokenizer
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
- model = BertModel.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ model = BertModel.from_pretrained("google-bert/bert-base-uncased")
num_added_toks = tokenizer.add_tokens(["new_tok1", "my_new-tok2"])
print("We have added", num_added_toks, "tokens")
diff --git a/src/transformers/tokenization_utils_base.py b/src/transformers/tokenization_utils_base.py
index d389af676fd..f4a467c32fa 100644
--- a/src/transformers/tokenization_utils_base.py
+++ b/src/transformers/tokenization_utils_base.py
@@ -916,8 +916,8 @@ class SpecialTokensMixin:
```python
# Let's see how to add a new classification token to GPT-2
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
- model = GPT2Model.from_pretrained("gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+ model = GPT2Model.from_pretrained("openai-community/gpt2")
special_tokens_dict = {"cls_token": "<CLS>"}
@@ -1005,8 +1005,8 @@ class SpecialTokensMixin:
```python
# Let's see how to increase the vocabulary of Bert model and tokenizer
- tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
- model = BertModel.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-uncased")
+ model = BertModel.from_pretrained("google-bert/bert-base-uncased")
num_added_toks = tokenizer.add_tokens(["new_tok1", "my_new-tok2"])
print("We have added", num_added_toks, "tokens")
@@ -1821,8 +1821,6 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
Can be either:
- A string, the *model id* of a predefined tokenizer hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing vocabulary files required by the tokenizer, for instance saved
using the [`~tokenization_utils_base.PreTrainedTokenizerBase.save_pretrained`] method, e.g.,
`./my_model_directory/`.
@@ -1871,7 +1869,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
```python
# We can't instantiate directly the base class *PreTrainedTokenizerBase* so let's show our examples on a derived class: BertTokenizer
# Download vocabulary from huggingface.co and cache.
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# Download vocabulary from huggingface.co (user-uploaded) and cache.
tokenizer = BertTokenizer.from_pretrained("dbmdz/bert-base-german-cased")
@@ -1883,7 +1881,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
tokenizer = BertTokenizer.from_pretrained("./test/saved_model/my_vocab.txt")
# You can link tokens to special vocabulary when instantiating
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", unk_token="<unk>")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased", unk_token="<unk>")
# You should be sure '<unk>' is in the vocabulary when doing that.
# Otherwise use tokenizer.add_special_tokens({'unk_token': '<unk>'}) instead.
assert tokenizer.unk_token == "<unk>"
diff --git a/src/transformers/training_args_seq2seq.py b/src/transformers/training_args_seq2seq.py
index ccacbbb3702..88ae662570a 100644
--- a/src/transformers/training_args_seq2seq.py
+++ b/src/transformers/training_args_seq2seq.py
@@ -48,8 +48,7 @@ class Seq2SeqTrainingArguments(TrainingArguments):
Allows to load a [`~generation.GenerationConfig`] from the `from_pretrained` method. This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
- huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced
- under a user or organization name, like `dbmdz/bert-base-german-cased`.
+ huggingface.co.
- a path to a *directory* containing a configuration file saved using the
[`~GenerationConfig.save_pretrained`] method, e.g., `./my_model_directory/`.
- a [`~generation.GenerationConfig`] object.
diff --git a/src/transformers/utils/hub.py b/src/transformers/utils/hub.py
index 3aa452cf27a..984fba1b6b7 100644
--- a/src/transformers/utils/hub.py
+++ b/src/transformers/utils/hub.py
@@ -332,7 +332,7 @@ def cached_file(
```python
# Download a model weight from the Hub and cache it.
- model_weights_file = cached_file("bert-base-uncased", "pytorch_model.bin")
+ model_weights_file = cached_file("google-bert/bert-base-uncased", "pytorch_model.bin")
```
"""
use_auth_token = deprecated_kwargs.pop("use_auth_token", None)
@@ -531,9 +531,9 @@ def get_file_from_repo(
```python
# Download a tokenizer configuration from huggingface.co and cache.
- tokenizer_config = get_file_from_repo("bert-base-uncased", "tokenizer_config.json")
+ tokenizer_config = get_file_from_repo("google-bert/bert-base-uncased", "tokenizer_config.json")
# This model does not have a tokenizer config so the result will be None.
- tokenizer_config = get_file_from_repo("xlm-roberta-base", "tokenizer_config.json")
+ tokenizer_config = get_file_from_repo("FacebookAI/xlm-roberta-base", "tokenizer_config.json")
```
"""
use_auth_token = deprecated_kwargs.pop("use_auth_token", None)
@@ -819,7 +819,7 @@ class PushToHubMixin:
```python
from transformers import {object_class}
- {object} = {object_class}.from_pretrained("bert-base-cased")
+ {object} = {object_class}.from_pretrained("google-bert/bert-base-cased")
# Push the {object} to your namespace with the name "my-finetuned-bert".
{object}.push_to_hub("my-finetuned-bert")
diff --git a/src/transformers/utils/quantization_config.py b/src/transformers/utils/quantization_config.py
index d2ab879f24a..d26cfca678c 100644
--- a/src/transformers/utils/quantization_config.py
+++ b/src/transformers/utils/quantization_config.py
@@ -393,8 +393,6 @@ class GPTQConfig(QuantizationConfigMixin):
The tokenizer used to process the dataset. You can pass either:
- A custom tokenizer object.
- A string, the *model id* of a predefined tokenizer hosted inside a model repo on huggingface.co.
- Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
- user or organization name, like `dbmdz/bert-base-german-cased`.
- A path to a *directory* containing vocabulary files required by the tokenizer, for instance saved
using the [`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
dataset (`Union[List[str]]`, *optional*):
diff --git a/tests/deepspeed/test_deepspeed.py b/tests/deepspeed/test_deepspeed.py
index fe623d972c8..e2d25a28316 100644
--- a/tests/deepspeed/test_deepspeed.py
+++ b/tests/deepspeed/test_deepspeed.py
@@ -70,7 +70,7 @@ set_seed(42)
# default torch.distributed port
DEFAULT_MASTER_PORT = "10999"
-T5_SMALL = "t5-small"
+T5_SMALL = "google-t5/t5-small"
T5_TINY = "patrickvonplaten/t5-tiny-random"
GPT2_TINY = "sshleifer/tiny-gpt2"
GPTJ_TINY = "hf-internal-testing/tiny-random-gptj"
diff --git a/tests/deepspeed/test_model_zoo.py b/tests/deepspeed/test_model_zoo.py
index e51fe1e7cfc..08c8b86dc07 100644
--- a/tests/deepspeed/test_model_zoo.py
+++ b/tests/deepspeed/test_model_zoo.py
@@ -50,7 +50,7 @@ DS_TESTS_DIRECTORY = dirname(os.path.abspath(__file__))
# default torch.distributed port
DEFAULT_MASTER_PORT = "10999"
-T5_SMALL = "t5-small"
+T5_SMALL = "google-t5/t5-small"
# *** Working Models ***
ALBERT_TINY = "hf-internal-testing/tiny-albert"
@@ -105,7 +105,7 @@ HUBERT_TINY = "hf-internal-testing/tiny-random-hubert"
# issues with tokenizer
CTRL_TINY = "hf-internal-testing/tiny-random-ctrl"
-TRANSFO_XL_TINY = "hf-internal-testing/tiny-random-transfo-xl" # same as ctrl
+TRANSFO_XL_TINY = "hf-internal-testing/tiny-random-transfo-xl" # same tokenizer issues as ctrl
# other issues with tiny models
IBERT_TINY = "hf-internal-testing/tiny-random-ibert" # multiple issues with either mlm/qa/clas
@@ -218,9 +218,9 @@ def make_task_cmds():
"xlnet",
# "hubert", # missing tokenizer files
# "ibert", # multiple issues with either mlm/qa/clas
- # "transfo-xl", # tokenizer issues as ctrl
- # "ctrl", # tokenizer issues
- # "openai-gpt", missing model files
+ # "transfo-xl", # same tokenizer issues as ctrl
+ # "ctrl", # tokenizer issues
+ # "openai-gpt", # missing model files
# "tapas", multiple issues
],
"img_clas": [
diff --git a/tests/fsdp/test_fsdp.py b/tests/fsdp/test_fsdp.py
index d883f29ed36..aa5b3537531 100644
--- a/tests/fsdp/test_fsdp.py
+++ b/tests/fsdp/test_fsdp.py
@@ -256,7 +256,7 @@ class TrainerIntegrationFSDP(TestCasePlus, TrainerIntegrationCommon):
def get_base_args(self, output_dir, num_epochs, logging_steps):
return f"""
- --model_name_or_path bert-base-cased
+ --model_name_or_path google-bert/bert-base-cased
--task_name mrpc
--output_dir {output_dir}
--overwrite_output_dir
diff --git a/tests/generation/test_configuration_utils.py b/tests/generation/test_configuration_utils.py
index dc69a673efa..7aabee4b521 100644
--- a/tests/generation/test_configuration_utils.py
+++ b/tests/generation/test_configuration_utils.py
@@ -52,7 +52,7 @@ class GenerationConfigTest(unittest.TestCase):
self.assertEqual(loaded_config.max_time, None)
def test_from_model_config(self):
- model_config = AutoConfig.from_pretrained("gpt2")
+ model_config = AutoConfig.from_pretrained("openai-community/gpt2")
generation_config_from_model = GenerationConfig.from_model_config(model_config)
default_generation_config = GenerationConfig()
diff --git a/tests/generation/test_framework_agnostic.py b/tests/generation/test_framework_agnostic.py
index 7efa4281b09..f4f13dd8d55 100644
--- a/tests/generation/test_framework_agnostic.py
+++ b/tests/generation/test_framework_agnostic.py
@@ -157,10 +157,10 @@ class GenerationIntegrationTestsMixin:
is_pt = not model_cls.__name__.startswith("TF")
articles = ["Justin Timberlake", "Michael Phelps"]
- tokenizer = AutoTokenizer.from_pretrained("distilgpt2", padding_side="left")
+ tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
- model = model_cls.from_pretrained("distilgpt2")
+ model = model_cls.from_pretrained("distilbert/distilgpt2")
input_ids = tokenizer(articles, return_tensors=return_tensors, padding=True).input_ids
if is_pt:
model = model.to(torch_device)
@@ -193,10 +193,10 @@ class GenerationIntegrationTestsMixin:
is_pt = not model_cls.__name__.startswith("TF")
articles = ["Justin Timberlake", "Michael Phelps"]
- tokenizer = AutoTokenizer.from_pretrained("distilgpt2", padding_side="left")
+ tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
- model = model_cls.from_pretrained("distilgpt2")
+ model = model_cls.from_pretrained("distilbert/distilgpt2")
input_ids = tokenizer(articles, return_tensors=return_tensors, padding=True).input_ids
if is_pt:
model = model.to(torch_device)
@@ -375,7 +375,7 @@ class GenerationIntegrationTestsMixin:
is_pt = not model_cls.__name__.startswith("TF")
input_ids = create_tensor_fn(2 * [[822, 10, 571, 33, 25, 58, 2625, 10, 27, 141, 3, 9, 307, 239, 6, 1]])
- model = model_cls.from_pretrained("t5-small")
+ model = model_cls.from_pretrained("google-t5/t5-small")
if is_pt:
model = model.to(torch_device)
input_ids = input_ids.to(torch_device)
diff --git a/tests/generation/test_streamers.py b/tests/generation/test_streamers.py
index 361f39e03e0..c82a5e99e0d 100644
--- a/tests/generation/test_streamers.py
+++ b/tests/generation/test_streamers.py
@@ -89,8 +89,8 @@ class StreamerTester(unittest.TestCase):
# Tests that we can pass `decode_kwargs` to the streamer to control how the tokens are decoded. Must be tested
# with actual models -- the dummy models' tokenizers are not aligned with their models, and
# `skip_special_tokens=True` has no effect on them
- tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
- model = AutoModelForCausalLM.from_pretrained("distilgpt2").to(torch_device)
+ tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
+ model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2").to(torch_device)
model.config.eos_token_id = -1
input_ids = torch.ones((1, 5), device=torch_device).long() * model.config.bos_token_id
diff --git a/tests/generation/test_utils.py b/tests/generation/test_utils.py
index 4a13487cf89..c91ff7993a1 100644
--- a/tests/generation/test_utils.py
+++ b/tests/generation/test_utils.py
@@ -2840,8 +2840,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
self.assertTrue(torch.allclose(transition_scores_sum, outputs.sequences_scores, atol=1e-3))
def test_beam_search_low_memory(self):
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
- model = AutoModelForCausalLM.from_pretrained("gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+ model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
model_inputs = tokenizer("I", return_tensors="pt")["input_ids"]
@@ -2857,8 +2857,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
# PT-only test: TF doesn't have a BeamSearchScorer
# exactly the example provided in the docstrings of beam search, which previously
# failed after directly copying from it. Refer to PR #15555
- tokenizer = AutoTokenizer.from_pretrained("t5-base")
- model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
encoder_input_str = "translate English to German: How old are you?"
encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
@@ -2898,8 +2898,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
@slow
def test_constrained_beam_search(self):
# PT-only test: TF doesn't have constrained beam search
- model = GPT2LMHeadModel.from_pretrained("gpt2").to(torch_device)
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2").to(torch_device)
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
force_tokens = tokenizer("scared", add_prefix_space=True, add_special_tokens=False).input_ids
force_tokens_2 = tokenizer("big weapons", add_prefix_space=True, add_special_tokens=False).input_ids
@@ -2936,8 +2936,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
@slow
def test_constrained_beam_search_mixed(self):
# PT-only test: TF doesn't have constrained beam search
- model = GPT2LMHeadModel.from_pretrained("gpt2").to(torch_device)
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2").to(torch_device)
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
force_phrase = tokenizer("scared", add_prefix_space=True, add_special_tokens=False).input_ids
flexible_phrases = tokenizer(
@@ -2977,8 +2977,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
@slow
def test_constrained_beam_search_mixed_mixin(self):
# PT-only test: TF doesn't have constrained beam search
- model = GPT2LMHeadModel.from_pretrained("gpt2").to(torch_device)
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2").to(torch_device)
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
force_word = "scared"
force_flexible = ["scream", "screams", "screaming", "screamed"]
@@ -3014,8 +3014,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
@slow
def test_cfg_mixin(self):
- model = GPT2LMHeadModel.from_pretrained("gpt2").to(torch_device)
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2").to(torch_device)
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
input = tokenizer(["The dragon flew over Paris,"], return_tensors="pt", return_attention_mask=True)
input["input_ids"] = input["input_ids"].to(torch_device)
@@ -3055,8 +3055,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
@slow
def test_constrained_beam_search_example_translation_mixin(self):
# PT-only test: TF doesn't have constrained beam search
- tokenizer = AutoTokenizer.from_pretrained("t5-base")
- model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
encoder_input_str = "translate English to German: How old are you?"
force_words = ["sind"]
@@ -3080,8 +3080,8 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
@slow
def test_constrained_beam_search_example_integration(self):
# PT-only test: TF doesn't have constrained beam search
- tokenizer = AutoTokenizer.from_pretrained("t5-base")
- model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
+ tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
+ model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")
encoder_input_str = "translate English to German: How old are you?"
encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
diff --git a/tests/models/albert/test_modeling_albert.py b/tests/models/albert/test_modeling_albert.py
index 75c84ad0d3d..823315bc678 100644
--- a/tests/models/albert/test_modeling_albert.py
+++ b/tests/models/albert/test_modeling_albert.py
@@ -331,7 +331,7 @@ class AlbertModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
class AlbertModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_no_head_absolute_embedding(self):
- model = AlbertModel.from_pretrained("albert-base-v2")
+ model = AlbertModel.from_pretrained("albert/albert-base-v2")
input_ids = torch.tensor([[0, 345, 232, 328, 740, 140, 1695, 69, 6078, 1588, 2]])
attention_mask = torch.tensor([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
with torch.no_grad():
diff --git a/tests/models/albert/test_modeling_flax_albert.py b/tests/models/albert/test_modeling_flax_albert.py
index 0bdc8065bce..956de9ebdc9 100644
--- a/tests/models/albert/test_modeling_flax_albert.py
+++ b/tests/models/albert/test_modeling_flax_albert.py
@@ -139,7 +139,7 @@ class FlaxAlbertModelTest(FlaxModelTesterMixin, unittest.TestCase):
@slow
def test_model_from_pretrained(self):
for model_class_name in self.all_model_classes:
- model = model_class_name.from_pretrained("albert-base-v2")
+ model = model_class_name.from_pretrained("albert/albert-base-v2")
outputs = model(np.ones((1, 1)))
self.assertIsNotNone(outputs)
@@ -148,7 +148,7 @@ class FlaxAlbertModelTest(FlaxModelTesterMixin, unittest.TestCase):
class FlaxAlbertModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_no_head_absolute_embedding(self):
- model = FlaxAlbertModel.from_pretrained("albert-base-v2")
+ model = FlaxAlbertModel.from_pretrained("albert/albert-base-v2")
input_ids = np.array([[0, 345, 232, 328, 740, 140, 1695, 69, 6078, 1588, 2]])
attention_mask = np.array([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
output = model(input_ids, attention_mask=attention_mask)[0]
diff --git a/tests/models/albert/test_modeling_tf_albert.py b/tests/models/albert/test_modeling_tf_albert.py
index 7314eb4749a..7bea29fa9cb 100644
--- a/tests/models/albert/test_modeling_tf_albert.py
+++ b/tests/models/albert/test_modeling_tf_albert.py
@@ -311,7 +311,7 @@ class TFAlbertModelTest(TFModelTesterMixin, PipelineTesterMixin, unittest.TestCa
class TFAlbertModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_masked_lm(self):
- model = TFAlbertForPreTraining.from_pretrained("albert-base-v2")
+ model = TFAlbertForPreTraining.from_pretrained("albert/albert-base-v2")
input_ids = tf.constant([[0, 1, 2, 3, 4, 5]])
output = model(input_ids)[0]
diff --git a/tests/models/albert/test_tokenization_albert.py b/tests/models/albert/test_tokenization_albert.py
index d9bb86bf299..343cba168f2 100644
--- a/tests/models/albert/test_tokenization_albert.py
+++ b/tests/models/albert/test_tokenization_albert.py
@@ -127,6 +127,6 @@ class AlbertTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
self.tokenizer_integration_test_util(
expected_encoding=expected_encoding,
- model_name="albert-base-v2",
+ model_name="albert/albert-base-v2",
revision="6b6560eaf5ff2e250b00c50f380c5389a9c2d82e",
)
diff --git a/tests/models/auto/test_configuration_auto.py b/tests/models/auto/test_configuration_auto.py
index fa05952d29a..8b202b90921 100644
--- a/tests/models/auto/test_configuration_auto.py
+++ b/tests/models/auto/test_configuration_auto.py
@@ -46,7 +46,7 @@ class AutoConfigTest(unittest.TestCase):
self.assertIsNotNone(importlib.util.find_spec("transformers.models.auto"))
def test_config_from_model_shortcut(self):
- config = AutoConfig.from_pretrained("bert-base-uncased")
+ config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")
self.assertIsInstance(config, BertConfig)
def test_config_model_type_from_local_file(self):
diff --git a/tests/models/auto/test_modeling_flax_auto.py b/tests/models/auto/test_modeling_flax_auto.py
index 5880551f54d..8880972e044 100644
--- a/tests/models/auto/test_modeling_flax_auto.py
+++ b/tests/models/auto/test_modeling_flax_auto.py
@@ -30,7 +30,7 @@ if is_flax_available():
class FlaxAutoModelTest(unittest.TestCase):
@slow
def test_bert_from_pretrained(self):
- for model_name in ["bert-base-cased", "bert-large-uncased"]:
+ for model_name in ["google-bert/bert-base-cased", "google-bert/bert-large-uncased"]:
with self.subTest(model_name):
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
@@ -42,7 +42,7 @@ class FlaxAutoModelTest(unittest.TestCase):
@slow
def test_roberta_from_pretrained(self):
- for model_name in ["roberta-base", "roberta-large"]:
+ for model_name in ["FacebookAI/roberta-base", "FacebookAI/roberta-large"]:
with self.subTest(model_name):
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
@@ -54,7 +54,7 @@ class FlaxAutoModelTest(unittest.TestCase):
@slow
def test_bert_jax_jit(self):
- for model_name in ["bert-base-cased", "bert-large-uncased"]:
+ for model_name in ["google-bert/bert-base-cased", "google-bert/bert-large-uncased"]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = FlaxBertModel.from_pretrained(model_name)
tokens = tokenizer("Do you support jax jitted function?", return_tensors=TensorType.JAX)
@@ -67,7 +67,7 @@ class FlaxAutoModelTest(unittest.TestCase):
@slow
def test_roberta_jax_jit(self):
- for model_name in ["roberta-base", "roberta-large"]:
+ for model_name in ["FacebookAI/roberta-base", "FacebookAI/roberta-large"]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = FlaxRobertaModel.from_pretrained(model_name)
tokens = tokenizer("Do you support jax jitted function?", return_tensors=TensorType.JAX)
diff --git a/tests/models/auto/test_modeling_tf_auto.py b/tests/models/auto/test_modeling_tf_auto.py
index 9c284a78aee..e0758610871 100644
--- a/tests/models/auto/test_modeling_tf_auto.py
+++ b/tests/models/auto/test_modeling_tf_auto.py
@@ -85,7 +85,7 @@ if is_tf_available():
class TFAutoModelTest(unittest.TestCase):
@slow
def test_model_from_pretrained(self):
- model_name = "bert-base-cased"
+ model_name = "google-bert/bert-base-cased"
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
@@ -96,7 +96,7 @@ class TFAutoModelTest(unittest.TestCase):
@slow
def test_model_for_pretraining_from_pretrained(self):
- model_name = "bert-base-cased"
+ model_name = "google-bert/bert-base-cased"
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
@@ -155,7 +155,7 @@ class TFAutoModelTest(unittest.TestCase):
@slow
def test_sequence_classification_model_from_pretrained(self):
# for model_name in TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
- for model_name in ["bert-base-uncased"]:
+ for model_name in ["google-bert/bert-base-uncased"]:
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
@@ -167,7 +167,7 @@ class TFAutoModelTest(unittest.TestCase):
@slow
def test_question_answering_model_from_pretrained(self):
# for model_name in TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
- for model_name in ["bert-base-uncased"]:
+ for model_name in ["google-bert/bert-base-uncased"]:
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
diff --git a/tests/models/auto/test_modeling_tf_pytorch.py b/tests/models/auto/test_modeling_tf_pytorch.py
index 3e213f29562..77b19a8e3a7 100644
--- a/tests/models/auto/test_modeling_tf_pytorch.py
+++ b/tests/models/auto/test_modeling_tf_pytorch.py
@@ -75,7 +75,7 @@ class TFPTAutoModelTest(unittest.TestCase):
@slow
def test_model_from_pretrained(self):
# for model_name in TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
- for model_name in ["bert-base-uncased"]:
+ for model_name in ["google-bert/bert-base-uncased"]:
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
@@ -91,7 +91,7 @@ class TFPTAutoModelTest(unittest.TestCase):
@slow
def test_model_for_pretraining_from_pretrained(self):
# for model_name in TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
- for model_name in ["bert-base-uncased"]:
+ for model_name in ["google-bert/bert-base-uncased"]:
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
@@ -185,7 +185,7 @@ class TFPTAutoModelTest(unittest.TestCase):
@slow
def test_sequence_classification_model_from_pretrained(self):
# for model_name in TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
- for model_name in ["bert-base-uncased"]:
+ for model_name in ["google-bert/bert-base-uncased"]:
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
@@ -201,7 +201,7 @@ class TFPTAutoModelTest(unittest.TestCase):
@slow
def test_question_answering_model_from_pretrained(self):
# for model_name in TF_BERT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
- for model_name in ["bert-base-uncased"]:
+ for model_name in ["google-bert/bert-base-uncased"]:
config = AutoConfig.from_pretrained(model_name)
self.assertIsNotNone(config)
self.assertIsInstance(config, BertConfig)
diff --git a/tests/models/auto/test_tokenization_auto.py b/tests/models/auto/test_tokenization_auto.py
index 597c995b6e3..8ebf834f12a 100644
--- a/tests/models/auto/test_tokenization_auto.py
+++ b/tests/models/auto/test_tokenization_auto.py
@@ -176,12 +176,14 @@ class AutoTokenizerTest(unittest.TestCase):
@require_tokenizers
def test_from_pretrained_use_fast_toggle(self):
- self.assertIsInstance(AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False), BertTokenizer)
- self.assertIsInstance(AutoTokenizer.from_pretrained("bert-base-cased"), BertTokenizerFast)
+ self.assertIsInstance(
+ AutoTokenizer.from_pretrained("google-bert/bert-base-cased", use_fast=False), BertTokenizer
+ )
+ self.assertIsInstance(AutoTokenizer.from_pretrained("google-bert/bert-base-cased"), BertTokenizerFast)
@require_tokenizers
def test_do_lower_case(self):
- tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased", do_lower_case=False)
+ tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased", do_lower_case=False)
sample = "Hello, world. How are you?"
tokens = tokenizer.tokenize(sample)
self.assertEqual("[UNK]", tokens[0])
@@ -211,15 +213,15 @@ class AutoTokenizerTest(unittest.TestCase):
self.assertEqual(tokenizer2.vocab_size, 12)
def test_auto_tokenizer_fast_no_slow(self):
- tokenizer = AutoTokenizer.from_pretrained("ctrl")
+ tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
# There is no fast CTRL so this always gives us a slow tokenizer.
self.assertIsInstance(tokenizer, CTRLTokenizer)
def test_get_tokenizer_config(self):
# Check we can load the tokenizer config of an online model.
- config = get_tokenizer_config("bert-base-cased")
+ config = get_tokenizer_config("google-bert/bert-base-cased")
_ = config.pop("_commit_hash", None)
- # If we ever update bert-base-cased tokenizer config, this dict here will need to be updated.
+ # If we ever update google-bert/bert-base-cased tokenizer config, this dict here will need to be updated.
self.assertEqual(config, {"do_lower_case": False})
# This model does not have a tokenizer_config so we get back an empty dict.
diff --git a/tests/models/bert/test_modeling_bert.py b/tests/models/bert/test_modeling_bert.py
index 2601c92cfb7..bc383568529 100644
--- a/tests/models/bert/test_modeling_bert.py
+++ b/tests/models/bert/test_modeling_bert.py
@@ -627,7 +627,7 @@ class BertModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin
class BertModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_no_head_absolute_embedding(self):
- model = BertModel.from_pretrained("bert-base-uncased")
+ model = BertModel.from_pretrained("google-bert/bert-base-uncased")
input_ids = torch.tensor([[0, 345, 232, 328, 740, 140, 1695, 69, 6078, 1588, 2]])
attention_mask = torch.tensor([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
with torch.no_grad():
diff --git a/tests/models/bert/test_modeling_flax_bert.py b/tests/models/bert/test_modeling_flax_bert.py
index 82268991751..fca54dbed84 100644
--- a/tests/models/bert/test_modeling_flax_bert.py
+++ b/tests/models/bert/test_modeling_flax_bert.py
@@ -158,6 +158,6 @@ class FlaxBertModelTest(FlaxModelTesterMixin, unittest.TestCase):
def test_model_from_pretrained(self):
# Only check this for base model, not necessary for all model classes.
# This will also help speed-up tests.
- model = FlaxBertModel.from_pretrained("bert-base-cased")
+ model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
outputs = model(np.ones((1, 1)))
self.assertIsNotNone(outputs)
diff --git a/tests/models/bert/test_tokenization_bert.py b/tests/models/bert/test_tokenization_bert.py
index f9383756e3b..bee1ccf0d15 100644
--- a/tests/models/bert/test_tokenization_bert.py
+++ b/tests/models/bert/test_tokenization_bert.py
@@ -242,7 +242,7 @@ class BertTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
@slow
def test_sequence_builders(self):
- tokenizer = self.tokenizer_class.from_pretrained("bert-base-uncased")
+ tokenizer = self.tokenizer_class.from_pretrained("google-bert/bert-base-uncased")
text = tokenizer.encode("sequence builders", add_special_tokens=False)
text_2 = tokenizer.encode("multi-sequence build", add_special_tokens=False)
diff --git a/tests/models/bert/test_tokenization_bert_tf.py b/tests/models/bert/test_tokenization_bert_tf.py
index 16ac1d4867e..f950e7439c3 100644
--- a/tests/models/bert/test_tokenization_bert_tf.py
+++ b/tests/models/bert/test_tokenization_bert_tf.py
@@ -16,7 +16,7 @@ if is_tensorflow_text_available():
from transformers.models.bert import TFBertTokenizer
-TOKENIZER_CHECKPOINTS = ["bert-base-uncased", "bert-base-cased"]
+TOKENIZER_CHECKPOINTS = ["google-bert/bert-base-uncased", "google-bert/bert-base-cased"]
TINY_MODEL_CHECKPOINT = "hf-internal-testing/tiny-bert-tf-only"
if is_tf_available():
diff --git a/tests/models/bert_japanese/test_tokenization_bert_japanese.py b/tests/models/bert_japanese/test_tokenization_bert_japanese.py
index cedf7492cfb..d2a7accb390 100644
--- a/tests/models/bert_japanese/test_tokenization_bert_japanese.py
+++ b/tests/models/bert_japanese/test_tokenization_bert_japanese.py
@@ -488,7 +488,7 @@ class BertTokenizerMismatchTest(unittest.TestCase):
" is called from."
)
)
- EXAMPLE_BERT_ID = "bert-base-cased"
+ EXAMPLE_BERT_ID = "google-bert/bert-base-cased"
with self.assertLogs("transformers", level="WARNING") as cm:
BertJapaneseTokenizer.from_pretrained(EXAMPLE_BERT_ID)
self.assertTrue(
diff --git a/tests/models/camembert/test_modeling_camembert.py b/tests/models/camembert/test_modeling_camembert.py
index a15ab8caa23..f2fba59496d 100644
--- a/tests/models/camembert/test_modeling_camembert.py
+++ b/tests/models/camembert/test_modeling_camembert.py
@@ -31,7 +31,7 @@ if is_torch_available():
class CamembertModelIntegrationTest(unittest.TestCase):
@slow
def test_output_embeds_base_model(self):
- model = CamembertModel.from_pretrained("camembert-base")
+ model = CamembertModel.from_pretrained("almanach/camembert-base")
model.to(torch_device)
input_ids = torch.tensor(
diff --git a/tests/models/camembert/test_tokenization_camembert.py b/tests/models/camembert/test_tokenization_camembert.py
index 7f72d304d5c..33254b96de8 100644
--- a/tests/models/camembert/test_tokenization_camembert.py
+++ b/tests/models/camembert/test_tokenization_camembert.py
@@ -128,7 +128,7 @@ class CamembertTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
self.tokenizer_integration_test_util(
expected_encoding=expected_encoding,
- model_name="camembert-base",
+ model_name="almanach/camembert-base",
revision="3a0641d9a1aeb7e848a74299e7e4c4bca216b4cf",
sequences=sequences,
)
diff --git a/tests/models/dpr/test_tokenization_dpr.py b/tests/models/dpr/test_tokenization_dpr.py
index db41052d4cd..2e0f41da4d5 100644
--- a/tests/models/dpr/test_tokenization_dpr.py
+++ b/tests/models/dpr/test_tokenization_dpr.py
@@ -50,7 +50,7 @@ class DPRReaderTokenizationTest(BertTokenizationTest):
@slow
def test_decode_best_spans(self):
- tokenizer = self.tokenizer_class.from_pretrained("bert-base-uncased")
+ tokenizer = self.tokenizer_class.from_pretrained("google-bert/bert-base-uncased")
text_1 = tokenizer.encode("question sequence", add_special_tokens=False)
text_2 = tokenizer.encode("title sequence", add_special_tokens=False)
@@ -73,7 +73,7 @@ class DPRReaderTokenizationTest(BertTokenizationTest):
@slow
def test_call(self):
- tokenizer = self.tokenizer_class.from_pretrained("bert-base-uncased")
+ tokenizer = self.tokenizer_class.from_pretrained("google-bert/bert-base-uncased")
text_1 = tokenizer.encode("question sequence", add_special_tokens=False)
text_2 = tokenizer.encode("title sequence", add_special_tokens=False)
diff --git a/tests/models/encoder_decoder/test_modeling_encoder_decoder.py b/tests/models/encoder_decoder/test_modeling_encoder_decoder.py
index 25444d7d32f..2ff3e3aa509 100644
--- a/tests/models/encoder_decoder/test_modeling_encoder_decoder.py
+++ b/tests/models/encoder_decoder/test_modeling_encoder_decoder.py
@@ -671,7 +671,9 @@ class EncoderDecoderMixin:
@require_torch
class BertEncoderDecoderModelTest(EncoderDecoderMixin, unittest.TestCase):
def get_pretrained_model(self):
- return EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "bert-base-cased")
+ return EncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-cased", "google-bert/bert-base-cased"
+ )
def get_encoder_decoder_model(self, config, decoder_config):
encoder_model = BertModel(config)
@@ -937,7 +939,9 @@ class RoBertaEncoderDecoderModelTest(EncoderDecoderMixin, unittest.TestCase):
}
def get_pretrained_model(self):
- return EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "roberta-base")
+ return EncoderDecoderModel.from_encoder_decoder_pretrained(
+ "FacebookAI/roberta-base", "FacebookAI/roberta-base"
+ )
@require_torch
@@ -994,7 +998,9 @@ class GPT2EncoderDecoderModelTest(EncoderDecoderMixin, unittest.TestCase):
}
def get_pretrained_model(self):
- return EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
+ return EncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-cased", "openai-community/gpt2"
+ )
def test_encoder_decoder_model_shared_weights(self):
pass
@@ -1004,8 +1010,8 @@ class GPT2EncoderDecoderModelTest(EncoderDecoderMixin, unittest.TestCase):
model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2gpt2-cnn_dailymail-fp16")
model.to(torch_device)
- tokenizer_in = AutoTokenizer.from_pretrained("bert-base-cased")
- tokenizer_out = AutoTokenizer.from_pretrained("gpt2")
+ tokenizer_in = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
+ tokenizer_out = AutoTokenizer.from_pretrained("openai-community/gpt2")
ARTICLE_STUDENTS = """(CNN)Sigma Alpha Epsilon is under fire for a video showing party-bound fraternity members singing a racist chant. SAE's national chapter suspended the students, but University of Oklahoma President David Boren took it a step further, saying the university's affiliation with the fraternity is permanently done. The news is shocking, but it's not the first time SAE has faced controversy. SAE was founded March 9, 1856, at the University of Alabama, five years before the American Civil War, according to the fraternity website. When the war began, the group had fewer than 400 members, of which "369 went to war for the Confederate States and seven for the Union Army," the website says. The fraternity now boasts more than 200,000 living alumni, along with about 15,000 undergraduates populating 219 chapters and 20 "colonies" seeking full membership at universities. SAE has had to work hard to change recently after a string of member deaths, many blamed on the hazing of new recruits, SAE national President Bradley Cohen wrote in a message on the fraternity's website. The fraternity's website lists more than 130 chapters cited or suspended for "health and safety incidents" since 2010. At least 30 of the incidents involved hazing, and dozens more involved alcohol. However, the list is missing numerous incidents from recent months. Among them, according to various media outlets: Yale University banned the SAEs from campus activities last month after members allegedly tried to interfere with a sexual misconduct investigation connected to an initiation rite. Stanford University in December suspended SAE housing privileges after finding sorority members attending a fraternity function were subjected to graphic sexual content. And Johns Hopkins University in November suspended the fraternity for underage drinking. "The media has labeled us as the 'nation's deadliest fraternity,' " Cohen said. 
In 2011, for example, a student died while being coerced into excessive alcohol consumption, according to a lawsuit. SAE's previous insurer dumped the fraternity. "As a result, we are paying Lloyd's of London the highest insurance rates in the Greek-letter world," Cohen said. Universities have turned down SAE's attempts to open new chapters, and the fraternity had to close 12 in 18 months over hazing incidents."""
@@ -1067,7 +1073,7 @@ class ProphetNetEncoderDecoderModelTest(EncoderDecoderMixin, unittest.TestCase):
def get_pretrained_model(self):
return EncoderDecoderModel.from_encoder_decoder_pretrained(
- "bert-large-uncased", "microsoft/prophetnet-large-uncased"
+ "google-bert/bert-large-uncased", "microsoft/prophetnet-large-uncased"
)
def test_encoder_decoder_model_shared_weights(self):
@@ -1122,7 +1128,9 @@ class BartEncoderDecoderModelTest(EncoderDecoderMixin, unittest.TestCase):
}
def get_pretrained_model(self):
- return EncoderDecoderModel.from_encoder_decoder_pretrained("bert-large-uncased", "facebook/bart-large")
+ return EncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-large-uncased", "facebook/bart-large"
+ )
def test_encoder_decoder_model_shared_weights(self):
pass
@@ -1131,10 +1139,12 @@ class BartEncoderDecoderModelTest(EncoderDecoderMixin, unittest.TestCase):
@require_torch
class EncoderDecoderModelTest(unittest.TestCase):
def get_from_encoderdecoder_pretrained_model(self):
- return EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
+ return EncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-uncased", "google-bert/bert-base-uncased"
+ )
def get_decoder_config(self):
- config = AutoConfig.from_pretrained("bert-base-uncased")
+ config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")
config.is_decoder = True
config.add_cross_attention = True
return config
@@ -1143,8 +1153,10 @@ class EncoderDecoderModelTest(unittest.TestCase):
return EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
def get_encoder_decoder_models(self):
- encoder_model = BertModel.from_pretrained("bert-base-uncased")
- decoder_model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=self.get_decoder_config())
+ encoder_model = BertModel.from_pretrained("google-bert/bert-base-uncased")
+ decoder_model = BertLMHeadModel.from_pretrained(
+ "google-bert/bert-base-uncased", config=self.get_decoder_config()
+ )
return {"encoder": encoder_model, "decoder": decoder_model}
def _check_configuration_tie(self, model):
diff --git a/tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py b/tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py
index 362a5f74a1b..c8f76a144be 100644
--- a/tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py
+++ b/tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py
@@ -483,12 +483,14 @@ class FlaxGPT2EncoderDecoderModelTest(FlaxEncoderDecoderMixin, unittest.TestCase
}
def get_pretrained_model(self):
- return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
+ return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-cased", "openai-community/gpt2"
+ )
@slow
def test_bert2gpt2_summarization(self):
- tokenizer_in = AutoTokenizer.from_pretrained("bert-base-cased")
- tokenizer_out = AutoTokenizer.from_pretrained("gpt2")
+ tokenizer_in = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
+ tokenizer_out = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = FlaxEncoderDecoderModel.from_pretrained(
"patrickvonplaten/bert2gpt2-cnn_dailymail-fp16", pad_token_id=tokenizer_out.eos_token_id
@@ -539,7 +541,9 @@ class FlaxBartEncoderDecoderModelTest(FlaxEncoderDecoderMixin, unittest.TestCase
}
def get_pretrained_model(self):
- return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "facebook/bart-base")
+ return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-cased", "facebook/bart-base"
+ )
@require_flax
@@ -576,13 +580,17 @@ class FlaxBertEncoderDecoderModelTest(FlaxEncoderDecoderMixin, unittest.TestCase
}
def get_pretrained_model(self):
- return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "bert-base-cased")
+ return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-cased", "google-bert/bert-base-cased"
+ )
@require_flax
class FlaxEncoderDecoderModelTest(unittest.TestCase):
def get_from_encoderdecoder_pretrained_model(self):
- return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "gpt2")
+ return FlaxEncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-cased", "openai-community/gpt2"
+ )
def _check_configuration_tie(self, model):
module = model.module.bind(model.params)
diff --git a/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py b/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py
index a9d32474c3d..99a09ada169 100644
--- a/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py
+++ b/tests/models/encoder_decoder/test_modeling_tf_encoder_decoder.py
@@ -764,7 +764,7 @@ class TFBertEncoderDecoderModelTest(TFEncoderDecoderMixin, unittest.TestCase):
def test_bert2bert_summarization(self):
from transformers import EncoderDecoderModel
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
"""Not working, because pt checkpoint has `encoder.encoder.layer...` while tf model has `encoder.bert.encoder.layer...`.
(For Bert decoder, there is no issue, because `BertModel` is wrapped into `decoder` as `bert`)
@@ -864,8 +864,8 @@ class TFGPT2EncoderDecoderModelTest(TFEncoderDecoderMixin, unittest.TestCase):
def test_bert2gpt2_summarization(self):
from transformers import EncoderDecoderModel
- tokenizer_in = AutoTokenizer.from_pretrained("bert-base-cased")
- tokenizer_out = AutoTokenizer.from_pretrained("gpt2")
+ tokenizer_in = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
+ tokenizer_out = AutoTokenizer.from_pretrained("openai-community/gpt2")
"""Not working, because pt checkpoint has `encoder.encoder.layer...` while tf model has `encoder.bert.encoder.layer...`.
(For GPT2 decoder, there is no issue)
@@ -1016,10 +1016,12 @@ class TFRembertEncoderDecoderModelTest(TFEncoderDecoderMixin, unittest.TestCase)
@require_tf
class TFEncoderDecoderModelTest(unittest.TestCase):
def get_from_encoderdecoder_pretrained_model(self):
- return TFEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-cased", "bert-base-cased")
+ return TFEncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google-bert/bert-base-cased", "google-bert/bert-base-cased"
+ )
def get_decoder_config(self):
- config = AutoConfig.from_pretrained("bert-base-cased")
+ config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
config.is_decoder = True
config.add_cross_attention = True
return config
@@ -1028,9 +1030,9 @@ class TFEncoderDecoderModelTest(unittest.TestCase):
return TFEncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
def get_encoder_decoder_models(self):
- encoder_model = TFBertModel.from_pretrained("bert-base-cased", name="encoder")
+ encoder_model = TFBertModel.from_pretrained("google-bert/bert-base-cased", name="encoder")
decoder_model = TFBertLMHeadModel.from_pretrained(
- "bert-base-cased", config=self.get_decoder_config(), name="decoder"
+ "google-bert/bert-base-cased", config=self.get_decoder_config(), name="decoder"
)
return {"encoder": encoder_model, "decoder": decoder_model}
@@ -1055,8 +1057,10 @@ class TFEncoderDecoderModelTest(unittest.TestCase):
@require_tf
class TFEncoderDecoderModelSaveLoadTests(unittest.TestCase):
def get_encoder_decoder_config(self):
- encoder_config = AutoConfig.from_pretrained("bert-base-uncased")
- decoder_config = AutoConfig.from_pretrained("bert-base-uncased", is_decoder=True, add_cross_attention=True)
+ encoder_config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")
+ decoder_config = AutoConfig.from_pretrained(
+ "google-bert/bert-base-uncased", is_decoder=True, add_cross_attention=True
+ )
return EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
def get_encoder_decoder_config_small(self):
@@ -1160,8 +1164,8 @@ class TFEncoderDecoderModelSaveLoadTests(unittest.TestCase):
load_weight_prefix = TFEncoderDecoderModel.load_weight_prefix
config = self.get_encoder_decoder_config()
- encoder_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
- decoder_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+ encoder_tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
+ decoder_tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
input_ids = encoder_tokenizer("who sings does he love me with reba", return_tensors="tf").input_ids
decoder_input_ids = decoder_tokenizer("Linda Davis", return_tensors="tf").input_ids
@@ -1173,10 +1177,10 @@ class TFEncoderDecoderModelSaveLoadTests(unittest.TestCase):
# So we create pretrained models (without `load_weight_prefix`), save them, and later,
# we load them using `from_pretrained`.
# (we don't need to do this for encoder, but let's make the code more similar between encoder/decoder)
- encoder = TFAutoModel.from_pretrained("bert-base-uncased", name="encoder")
+ encoder = TFAutoModel.from_pretrained("google-bert/bert-base-uncased", name="encoder")
# It's necessary to specify `add_cross_attention=True` here.
decoder = TFAutoModelForCausalLM.from_pretrained(
- "bert-base-uncased", is_decoder=True, add_cross_attention=True, name="decoder"
+ "google-bert/bert-base-uncased", is_decoder=True, add_cross_attention=True, name="decoder"
)
pretrained_encoder_dir = os.path.join(tmp_dirname, "pretrained_encoder")
pretrained_decoder_dir = os.path.join(tmp_dirname, "pretrained_decoder")
diff --git a/tests/models/gpt2/test_modeling_flax_gpt2.py b/tests/models/gpt2/test_modeling_flax_gpt2.py
index 1e24ad0b00d..fbf2d6c333f 100644
--- a/tests/models/gpt2/test_modeling_flax_gpt2.py
+++ b/tests/models/gpt2/test_modeling_flax_gpt2.py
@@ -237,10 +237,10 @@ class FlaxGPT2ModelTest(FlaxModelTesterMixin, FlaxGenerationTesterMixin, unittes
@slow
def test_batch_generation(self):
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2", pad_token="", padding_side="left")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2", pad_token="", padding_side="left")
inputs = tokenizer(["Hello this is a long string", "Hey"], return_tensors="np", padding=True, truncation=True)
- model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")
+ model = FlaxGPT2LMHeadModel.from_pretrained("openai-community/gpt2")
model.do_sample = False
model.config.pad_token_id = model.config.eos_token_id
@@ -359,6 +359,6 @@ class FlaxGPT2ModelTest(FlaxModelTesterMixin, FlaxGenerationTesterMixin, unittes
@slow
def test_model_from_pretrained(self):
for model_class_name in self.all_model_classes:
- model = model_class_name.from_pretrained("gpt2", from_pt=True)
+ model = model_class_name.from_pretrained("openai-community/gpt2", from_pt=True)
outputs = model(np.ones((1, 1)))
self.assertIsNotNone(outputs)
diff --git a/tests/models/gpt2/test_modeling_gpt2.py b/tests/models/gpt2/test_modeling_gpt2.py
index 245b29d56a6..c9ecbdde667 100644
--- a/tests/models/gpt2/test_modeling_gpt2.py
+++ b/tests/models/gpt2/test_modeling_gpt2.py
@@ -98,7 +98,7 @@ class GPT2ModelTester:
self.pad_token_id = vocab_size - 1
def get_large_model_config(self):
- return GPT2Config.from_pretrained("gpt2")
+ return GPT2Config.from_pretrained("openai-community/gpt2")
def prepare_config_and_inputs(
self, gradient_checkpointing=False, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False
@@ -582,9 +582,9 @@ class GPT2ModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin
@slow
def test_batch_generation(self):
- model = GPT2LMHeadModel.from_pretrained("gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
model.to(torch_device)
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.padding_side = "left"
@@ -641,9 +641,9 @@ class GPT2ModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin
@slow
def test_batch_generation_2heads(self):
- model = GPT2DoubleHeadsModel.from_pretrained("gpt2")
+ model = GPT2DoubleHeadsModel.from_pretrained("openai-community/gpt2")
model.to(torch_device)
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.padding_side = "left"
@@ -722,7 +722,7 @@ class GPT2ModelLanguageGenerationTest(unittest.TestCase):
verify_outputs=True,
):
model = GPT2LMHeadModel.from_pretrained(
- "gpt2",
+ "openai-community/gpt2",
reorder_and_upcast_attn=reorder_and_upcast_attn,
scale_attn_by_inverse_layer_idx=scale_attn_by_inverse_layer_idx,
)
@@ -759,8 +759,8 @@ class GPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_gpt2_sample(self):
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
- model = GPT2LMHeadModel.from_pretrained("gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
model.to(torch_device)
torch.manual_seed(0)
@@ -787,8 +787,8 @@ class GPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_gpt2_sample_max_time(self):
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
- model = GPT2LMHeadModel.from_pretrained("gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+ model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
model.to(torch_device)
torch.manual_seed(0)
@@ -833,8 +833,8 @@ class GPT2ModelLanguageGenerationTest(unittest.TestCase):
"laboratory founded in 2010. DeepMind was acquired by Google in 2014. The company is based"
)
- gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
- gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2-large").to(torch_device)
+ gpt2_tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2-large")
+ gpt2_model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2-large").to(torch_device)
input_ids = gpt2_tokenizer(article, return_tensors="pt").input_ids.to(torch_device)
outputs = gpt2_model.generate(input_ids, penalty_alpha=0.6, top_k=4, max_length=256)
diff --git a/tests/models/gpt2/test_modeling_tf_gpt2.py b/tests/models/gpt2/test_modeling_tf_gpt2.py
index d636097dc28..060d4b71985 100644
--- a/tests/models/gpt2/test_modeling_tf_gpt2.py
+++ b/tests/models/gpt2/test_modeling_tf_gpt2.py
@@ -461,8 +461,8 @@ class TFGPT2ModelTest(TFModelTesterMixin, TFCoreModelTesterMixin, PipelineTester
class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_greedy_distilgpt2_batch_special(self):
- model = TFGPT2LMHeadModel.from_pretrained("distilgpt2")
- tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
+ model = TFGPT2LMHeadModel.from_pretrained("distilbert/distilgpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
@@ -488,8 +488,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_sample_distilgpt2_batch_special(self):
- model = TFGPT2LMHeadModel.from_pretrained("distilgpt2")
- tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
+ model = TFGPT2LMHeadModel.from_pretrained("distilbert/distilgpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
@@ -522,8 +522,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_greedy_distilgpt2_beam_search_special(self):
- model = TFGPT2LMHeadModel.from_pretrained("distilgpt2")
- tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
+ model = TFGPT2LMHeadModel.from_pretrained("distilbert/distilgpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
@@ -550,8 +550,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_distilgpt2_left_padding(self):
"""Tests that the generated text is the same, regarless of left padding"""
- model = TFGPT2LMHeadModel.from_pretrained("distilgpt2")
- tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
+ model = TFGPT2LMHeadModel.from_pretrained("distilbert/distilgpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
@@ -582,8 +582,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_gpt2_greedy_xla(self):
- model = TFGPT2LMHeadModel.from_pretrained("gpt2")
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ model = TFGPT2LMHeadModel.from_pretrained("openai-community/gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
@@ -612,8 +612,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
# forces the generation to happen on CPU, to avoid GPU-related quirks
with tf.device(":/CPU:0"):
- model = TFGPT2LMHeadModel.from_pretrained("gpt2")
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ model = TFGPT2LMHeadModel.from_pretrained("openai-community/gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
@@ -642,8 +642,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_gpt2_beam_search_xla(self):
- model = TFGPT2LMHeadModel.from_pretrained("gpt2")
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+ model = TFGPT2LMHeadModel.from_pretrained("openai-community/gpt2")
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
@@ -671,8 +671,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
"laboratory founded in 2010. DeepMind was acquired by Google in 2014. The company is based"
)
- gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
- gpt2_model = TFGPT2LMHeadModel.from_pretrained("gpt2-large")
+ gpt2_tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2-large")
+ gpt2_model = TFGPT2LMHeadModel.from_pretrained("openai-community/gpt2-large")
input_ids = gpt2_tokenizer(article, return_tensors="tf")
outputs = gpt2_model.generate(**input_ids, penalty_alpha=0.6, top_k=4, max_length=256)
@@ -705,8 +705,8 @@ class TFGPT2ModelLanguageGenerationTest(unittest.TestCase):
"laboratory founded in 2010. DeepMind was acquired by Google in 2014. The company is based"
)
- gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
- gpt2_model = TFGPT2LMHeadModel.from_pretrained("gpt2-large")
+ gpt2_tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2-large")
+ gpt2_model = TFGPT2LMHeadModel.from_pretrained("openai-community/gpt2-large")
input_ids = gpt2_tokenizer(article, return_tensors="tf")
xla_generate = tf.function(gpt2_model.generate, jit_compile=True)
diff --git a/tests/models/gpt2/test_tokenization_gpt2_tf.py b/tests/models/gpt2/test_tokenization_gpt2_tf.py
index a3eac86fa60..0cea50db318 100644
--- a/tests/models/gpt2/test_tokenization_gpt2_tf.py
+++ b/tests/models/gpt2/test_tokenization_gpt2_tf.py
@@ -15,8 +15,8 @@ if is_keras_nlp_available():
from transformers.models.gpt2 import TFGPT2Tokenizer
-TOKENIZER_CHECKPOINTS = ["gpt2"]
-TINY_MODEL_CHECKPOINT = "gpt2"
+TOKENIZER_CHECKPOINTS = ["openai-community/gpt2"]
+TINY_MODEL_CHECKPOINT = "openai-community/gpt2"
if is_tf_available():
diff --git a/tests/models/gpt_neo/test_modeling_flax_gpt_neo.py b/tests/models/gpt_neo/test_modeling_flax_gpt_neo.py
index 58574a8b1da..ca41495a842 100644
--- a/tests/models/gpt_neo/test_modeling_flax_gpt_neo.py
+++ b/tests/models/gpt_neo/test_modeling_flax_gpt_neo.py
@@ -202,7 +202,9 @@ class FlaxGPTNeoModelTest(FlaxModelTesterMixin, FlaxGenerationTesterMixin, unitt
@slow
def test_batch_generation(self):
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2", pad_token="<|endoftext|>", padding_side="left")
+ tokenizer = GPT2Tokenizer.from_pretrained(
+ "openai-community/gpt2", pad_token="<|endoftext|>", padding_side="left"
+ )
inputs = tokenizer(["Hello this is a long string", "Hey"], return_tensors="np", padding=True, truncation=True)
model = FlaxGPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
diff --git a/tests/models/gptj/test_modeling_flax_gptj.py b/tests/models/gptj/test_modeling_flax_gptj.py
index 48061f84d86..aa3b7a99aa0 100644
--- a/tests/models/gptj/test_modeling_flax_gptj.py
+++ b/tests/models/gptj/test_modeling_flax_gptj.py
@@ -199,7 +199,9 @@ class FlaxGPTJModelTest(FlaxModelTesterMixin, FlaxGenerationTesterMixin, unittes
@tooslow
def test_batch_generation(self):
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2", pad_token="<|endoftext|>", padding_side="left")
+ tokenizer = GPT2Tokenizer.from_pretrained(
+ "openai-community/gpt2", pad_token="<|endoftext|>", padding_side="left"
+ )
inputs = tokenizer(["Hello this is a long string", "Hey"], return_tensors="np", padding=True, truncation=True)
model = FlaxGPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
diff --git a/tests/models/longformer/test_tokenization_longformer.py b/tests/models/longformer/test_tokenization_longformer.py
index 32dc0f952fe..42524ca65a6 100644
--- a/tests/models/longformer/test_tokenization_longformer.py
+++ b/tests/models/longformer/test_tokenization_longformer.py
@@ -28,7 +28,7 @@ from ...test_tokenization_common import TokenizerTesterMixin
@require_tokenizers
-# Copied from tests.models.roberta.test_tokenization_roberta.RobertaTokenizationTest with roberta-base->allenai/longformer-base-4096,Roberta->Longformer,roberta->longformer,
+# Copied from tests.models.roberta.test_tokenization_roberta.RobertaTokenizationTest with FacebookAI/roberta-base->allenai/longformer-base-4096,Roberta->Longformer,roberta->longformer,
class LongformerTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
# Ignore copy
tokenizer_class = LongformerTokenizer
diff --git a/tests/models/markuplm/test_tokenization_markuplm.py b/tests/models/markuplm/test_tokenization_markuplm.py
index 9d2af513e1a..e793a9a5070 100644
--- a/tests/models/markuplm/test_tokenization_markuplm.py
+++ b/tests/models/markuplm/test_tokenization_markuplm.py
@@ -1373,7 +1373,7 @@ class MarkupLMTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
inputs = new_tokenizer(text, xpaths=xpaths)
self.assertEqual(len(inputs["input_ids"]), 2)
decoded_input = new_tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)
- expected_result = ( # original expected result "this is the" seems contradicts to roberta-based tokenizer
+ expected_result = ( # original expected result "this is the" seems contradicts to FacebookAI/roberta-based tokenizer
"thisisthe"
)
diff --git a/tests/models/mobilebert/test_tokenization_mobilebert.py b/tests/models/mobilebert/test_tokenization_mobilebert.py
index babed7a8d9b..92ddd88684b 100644
--- a/tests/models/mobilebert/test_tokenization_mobilebert.py
+++ b/tests/models/mobilebert/test_tokenization_mobilebert.py
@@ -258,7 +258,7 @@ class MobileBERTTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
)
@slow
- # Copied from tests.models.bert.test_tokenization_bert.BertTokenizationTest.test_sequence_builders with bert-base-uncased->google/mobilebert-uncased
+ # Copied from tests.models.bert.test_tokenization_bert.BertTokenizationTest.test_sequence_builders with google-bert/bert-base-uncased->google/mobilebert-uncased
def test_sequence_builders(self):
tokenizer = self.tokenizer_class.from_pretrained("google/mobilebert-uncased")
diff --git a/tests/models/mt5/test_modeling_mt5.py b/tests/models/mt5/test_modeling_mt5.py
index ac34bcce7b9..9e7dd443e2b 100644
--- a/tests/models/mt5/test_modeling_mt5.py
+++ b/tests/models/mt5/test_modeling_mt5.py
@@ -104,7 +104,7 @@ class MT5ModelTester:
self.decoder_layers = decoder_layers
def get_large_model_config(self):
- return MT5Config.from_pretrained("t5-base")
+ return MT5Config.from_pretrained("google-t5/t5-base")
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.encoder_seq_length], self.vocab_size).clamp(2)
@@ -940,7 +940,7 @@ class MT5EncoderOnlyModelTester:
self.is_training = is_training
def get_large_model_config(self):
- return MT5Config.from_pretrained("t5-base")
+ return MT5Config.from_pretrained("google-t5/t5-base")
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.encoder_seq_length], self.vocab_size)
diff --git a/tests/models/openai/test_modeling_openai.py b/tests/models/openai/test_modeling_openai.py
index 98d74ee5f80..718c224bf04 100644
--- a/tests/models/openai/test_modeling_openai.py
+++ b/tests/models/openai/test_modeling_openai.py
@@ -279,7 +279,7 @@ class OpenAIGPTModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTester
class OPENAIGPTModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_openai_gpt(self):
- model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
+ model = OpenAIGPTLMHeadModel.from_pretrained("openai-community/openai-gpt")
model.to(torch_device)
input_ids = torch.tensor([[481, 4735, 544]], dtype=torch.long, device=torch_device) # the president is
expected_output_ids = [
diff --git a/tests/models/openai/test_modeling_tf_openai.py b/tests/models/openai/test_modeling_tf_openai.py
index 231758064f2..6704ec97532 100644
--- a/tests/models/openai/test_modeling_tf_openai.py
+++ b/tests/models/openai/test_modeling_tf_openai.py
@@ -262,7 +262,7 @@ class TFOpenAIGPTModelTest(TFModelTesterMixin, PipelineTesterMixin, unittest.Tes
class TFOPENAIGPTModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_openai_gpt(self):
- model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
+ model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-community/openai-gpt")
input_ids = tf.convert_to_tensor([[481, 4735, 544]], dtype=tf.int32) # the president is
expected_output_ids = [
481,
diff --git a/tests/models/pix2struct/test_processor_pix2struct.py b/tests/models/pix2struct/test_processor_pix2struct.py
index 318e6f301f6..88335296f03 100644
--- a/tests/models/pix2struct/test_processor_pix2struct.py
+++ b/tests/models/pix2struct/test_processor_pix2struct.py
@@ -41,7 +41,7 @@ class Pix2StructProcessorTest(unittest.TestCase):
self.tmpdirname = tempfile.mkdtemp()
image_processor = Pix2StructImageProcessor()
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
processor = Pix2StructProcessor(image_processor, tokenizer)
diff --git a/tests/models/qdqbert/test_modeling_qdqbert.py b/tests/models/qdqbert/test_modeling_qdqbert.py
index d10abb733e0..e8c6d17986d 100644
--- a/tests/models/qdqbert/test_modeling_qdqbert.py
+++ b/tests/models/qdqbert/test_modeling_qdqbert.py
@@ -563,7 +563,7 @@ class QDQBertModelIntegrationTest(unittest.TestCase):
quant_nn.QuantLinear.set_default_quant_desc_input(input_desc)
quant_nn.QuantLinear.set_default_quant_desc_weight(weight_desc)
- model = QDQBertModel.from_pretrained("bert-base-uncased")
+ model = QDQBertModel.from_pretrained("google-bert/bert-base-uncased")
input_ids = torch.tensor([[0, 345, 232, 328, 740, 140, 1695, 69, 6078, 1588, 2]])
attention_mask = torch.tensor([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
output = model(input_ids, attention_mask=attention_mask)[0]
diff --git a/tests/models/realm/test_tokenization_realm.py b/tests/models/realm/test_tokenization_realm.py
index 6a5a3878fd4..7dbd8df6ef2 100644
--- a/tests/models/realm/test_tokenization_realm.py
+++ b/tests/models/realm/test_tokenization_realm.py
@@ -236,7 +236,7 @@ class RealmTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
@slow
def test_sequence_builders(self):
- tokenizer = self.tokenizer_class.from_pretrained("bert-base-uncased")
+ tokenizer = self.tokenizer_class.from_pretrained("google-bert/bert-base-uncased")
text = tokenizer.encode("sequence builders", add_special_tokens=False)
text_2 = tokenizer.encode("multi-sequence build", add_special_tokens=False)
diff --git a/tests/models/roberta/test_modeling_flax_roberta.py b/tests/models/roberta/test_modeling_flax_roberta.py
index f82479aa706..d205a0e75f8 100644
--- a/tests/models/roberta/test_modeling_flax_roberta.py
+++ b/tests/models/roberta/test_modeling_flax_roberta.py
@@ -154,6 +154,6 @@ class FlaxRobertaModelTest(FlaxModelTesterMixin, unittest.TestCase):
@slow
def test_model_from_pretrained(self):
for model_class_name in self.all_model_classes:
- model = model_class_name.from_pretrained("roberta-base", from_pt=True)
+ model = model_class_name.from_pretrained("FacebookAI/roberta-base", from_pt=True)
outputs = model(np.ones((1, 1)))
self.assertIsNotNone(outputs)
diff --git a/tests/models/roberta/test_modeling_roberta.py b/tests/models/roberta/test_modeling_roberta.py
index 6cacf605a26..402d60d37a4 100644
--- a/tests/models/roberta/test_modeling_roberta.py
+++ b/tests/models/roberta/test_modeling_roberta.py
@@ -527,7 +527,7 @@ class RobertaModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMi
class RobertaModelIntegrationTest(TestCasePlus):
@slow
def test_inference_masked_lm(self):
- model = RobertaForMaskedLM.from_pretrained("roberta-base")
+ model = RobertaForMaskedLM.from_pretrained("FacebookAI/roberta-base")
input_ids = torch.tensor([[0, 31414, 232, 328, 740, 1140, 12695, 69, 46078, 1588, 2]])
with torch.no_grad():
@@ -547,7 +547,7 @@ class RobertaModelIntegrationTest(TestCasePlus):
@slow
def test_inference_no_head(self):
- model = RobertaModel.from_pretrained("roberta-base")
+ model = RobertaModel.from_pretrained("FacebookAI/roberta-base")
input_ids = torch.tensor([[0, 31414, 232, 328, 740, 1140, 12695, 69, 46078, 1588, 2]])
with torch.no_grad():
@@ -565,7 +565,7 @@ class RobertaModelIntegrationTest(TestCasePlus):
@slow
def test_inference_classification_head(self):
- model = RobertaForSequenceClassification.from_pretrained("roberta-large-mnli")
+ model = RobertaForSequenceClassification.from_pretrained("FacebookAI/roberta-large-mnli")
input_ids = torch.tensor([[0, 31414, 232, 328, 740, 1140, 12695, 69, 46078, 1588, 2]])
with torch.no_grad():
diff --git a/tests/models/roberta/test_modeling_tf_roberta.py b/tests/models/roberta/test_modeling_tf_roberta.py
index 2f2859391ad..37377ab5ba5 100644
--- a/tests/models/roberta/test_modeling_tf_roberta.py
+++ b/tests/models/roberta/test_modeling_tf_roberta.py
@@ -666,7 +666,7 @@ class TFRobertaModelTest(TFModelTesterMixin, PipelineTesterMixin, unittest.TestC
class TFRobertaModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_masked_lm(self):
- model = TFRobertaForMaskedLM.from_pretrained("roberta-base")
+ model = TFRobertaForMaskedLM.from_pretrained("FacebookAI/roberta-base")
input_ids = tf.constant([[0, 31414, 232, 328, 740, 1140, 12695, 69, 46078, 1588, 2]])
output = model(input_ids)[0]
@@ -680,7 +680,7 @@ class TFRobertaModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_no_head(self):
- model = TFRobertaModel.from_pretrained("roberta-base")
+ model = TFRobertaModel.from_pretrained("FacebookAI/roberta-base")
input_ids = tf.constant([[0, 31414, 232, 328, 740, 1140, 12695, 69, 46078, 1588, 2]])
output = model(input_ids)[0]
@@ -692,7 +692,7 @@ class TFRobertaModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_classification_head(self):
- model = TFRobertaForSequenceClassification.from_pretrained("roberta-large-mnli")
+ model = TFRobertaForSequenceClassification.from_pretrained("FacebookAI/roberta-large-mnli")
input_ids = tf.constant([[0, 31414, 232, 328, 740, 1140, 12695, 69, 46078, 1588, 2]])
output = model(input_ids)[0]
diff --git a/tests/models/roberta/test_tokenization_roberta.py b/tests/models/roberta/test_tokenization_roberta.py
index 3190ab13be4..5d457c4cb44 100644
--- a/tests/models/roberta/test_tokenization_roberta.py
+++ b/tests/models/roberta/test_tokenization_roberta.py
@@ -105,7 +105,7 @@ class RobertaTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
@slow
def test_sequence_builders(self):
- tokenizer = self.tokenizer_class.from_pretrained("roberta-base")
+ tokenizer = self.tokenizer_class.from_pretrained("FacebookAI/roberta-base")
text = tokenizer.encode("sequence builders", add_special_tokens=False)
text_2 = tokenizer.encode("multi-sequence build", add_special_tokens=False)
diff --git a/tests/models/roberta_prelayernorm/test_modeling_flax_roberta_prelayernorm.py b/tests/models/roberta_prelayernorm/test_modeling_flax_roberta_prelayernorm.py
index 65dbe65974d..0074323460a 100644
--- a/tests/models/roberta_prelayernorm/test_modeling_flax_roberta_prelayernorm.py
+++ b/tests/models/roberta_prelayernorm/test_modeling_flax_roberta_prelayernorm.py
@@ -134,7 +134,7 @@ class FlaxRobertaPreLayerNormModelTester(unittest.TestCase):
@require_flax
-# Copied from tests.models.roberta.test_modeling_flax_roberta.FlaxRobertaModelTest with ROBERTA->ROBERTA_PRELAYERNORM,Roberta->RobertaPreLayerNorm,roberta-base->andreasmadsen/efficient_mlm_m0.40
+# Copied from tests.models.roberta.test_modeling_flax_roberta.FlaxRobertaModelTest with ROBERTA->ROBERTA_PRELAYERNORM,Roberta->RobertaPreLayerNorm,FacebookAI/roberta-base->andreasmadsen/efficient_mlm_m0.40
class FlaxRobertaPreLayerNormModelTest(FlaxModelTesterMixin, unittest.TestCase):
test_head_masking = True
diff --git a/tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py b/tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py
index f2c75e702bf..62ce0d660a0 100644
--- a/tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py
+++ b/tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py
@@ -578,7 +578,7 @@ class FlaxEncoderDecoderMixin:
class FlaxWav2Vec2GPT2ModelTest(FlaxEncoderDecoderMixin, unittest.TestCase):
def get_pretrained_model_and_inputs(self):
model = FlaxSpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
- "facebook/wav2vec2-large-lv60", "gpt2-medium"
+ "facebook/wav2vec2-large-lv60", "openai-community/gpt2-medium"
)
batch_size = 13
input_values = floats_tensor([batch_size, 512], scale=1.0)
@@ -812,7 +812,7 @@ class FlaxWav2Vec2BartModelTest(FlaxEncoderDecoderMixin, unittest.TestCase):
class FlaxWav2Vec2BertModelTest(FlaxEncoderDecoderMixin, unittest.TestCase):
def get_pretrained_model_and_inputs(self):
model = FlaxSpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
- "facebook/wav2vec2-large-lv60", "bert-large-uncased"
+ "facebook/wav2vec2-large-lv60", "google-bert/bert-large-uncased"
)
batch_size = 13
input_values = floats_tensor([batch_size, 512], model.config.encoder.vocab_size)
diff --git a/tests/models/speech_encoder_decoder/test_modeling_speech_encoder_decoder.py b/tests/models/speech_encoder_decoder/test_modeling_speech_encoder_decoder.py
index 368232331a2..c3503702c2a 100644
--- a/tests/models/speech_encoder_decoder/test_modeling_speech_encoder_decoder.py
+++ b/tests/models/speech_encoder_decoder/test_modeling_speech_encoder_decoder.py
@@ -445,7 +445,7 @@ class EncoderDecoderMixin:
class Wav2Vec2BertModelTest(EncoderDecoderMixin, unittest.TestCase):
def get_pretrained_model_and_inputs(self):
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
- "facebook/wav2vec2-base-960h", "bert-base-cased"
+ "facebook/wav2vec2-base-960h", "google-bert/bert-base-cased"
)
batch_size = 13
input_values = floats_tensor([batch_size, 512], scale=1.0)
@@ -509,7 +509,7 @@ class Wav2Vec2BertModelTest(EncoderDecoderMixin, unittest.TestCase):
class Speech2TextBertModelTest(EncoderDecoderMixin, unittest.TestCase):
def get_pretrained_model_and_inputs(self):
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
- "facebook/s2t-small-librispeech-asr", "bert-base-cased"
+ "facebook/s2t-small-librispeech-asr", "google-bert/bert-base-cased"
)
batch_size = 13
input_features = floats_tensor([batch_size, 7, 80], scale=1.0)
diff --git a/tests/models/switch_transformers/test_modeling_switch_transformers.py b/tests/models/switch_transformers/test_modeling_switch_transformers.py
index aa226f82ae3..b21fa405c39 100644
--- a/tests/models/switch_transformers/test_modeling_switch_transformers.py
+++ b/tests/models/switch_transformers/test_modeling_switch_transformers.py
@@ -1065,7 +1065,7 @@ class SwitchTransformerModelIntegrationTests(unittest.TestCase):
model = SwitchTransformersForConditionalGeneration.from_pretrained(
"google/switch-base-8", torch_dtype=torch.bfloat16
).eval()
- tokenizer = AutoTokenizer.from_pretrained("t5-small", use_fast=False, legacy=False)
+ tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small", use_fast=False, legacy=False)
model = model.to(torch_device)
input_ids = tokenizer(
@@ -1093,7 +1093,7 @@ class SwitchTransformerModelIntegrationTests(unittest.TestCase):
model = SwitchTransformersForConditionalGeneration.from_pretrained(
"google/switch-base-8", torch_dtype=torch.bfloat16
).eval()
- tokenizer = AutoTokenizer.from_pretrained("t5-small", use_fast=False, legacy=False)
+ tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small", use_fast=False, legacy=False)
inputs = [
"A walks into a bar and orders a with pinch of ."
diff --git a/tests/models/t5/test_modeling_flax_t5.py b/tests/models/t5/test_modeling_flax_t5.py
index d5d729dac9a..204b84989be 100644
--- a/tests/models/t5/test_modeling_flax_t5.py
+++ b/tests/models/t5/test_modeling_flax_t5.py
@@ -773,8 +773,8 @@ class FlaxT5ModelIntegrationTests(unittest.TestCase):
>>> score = t5_model.score(inputs=["Hello there"], targets=["Hi I am"], vocabulary=vocab)
"""
- model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
input_ids = tokenizer("Hello there", return_tensors="np").input_ids
labels = tokenizer("Hi I am", return_tensors="np").input_ids
@@ -849,11 +849,11 @@ class FlaxT5ModelIntegrationTests(unittest.TestCase):
@slow
def test_small_generation(self):
- model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small")
+ model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
model.config.max_length = 8
model.config.num_beams = 1
model.config.do_sample = False
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
input_ids = tokenizer("summarize: Hello there", return_tensors="np").input_ids
@@ -864,11 +864,11 @@ class FlaxT5ModelIntegrationTests(unittest.TestCase):
@slow
def test_small_generation_bfloat16(self):
- model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small", dtype=jnp.bfloat16)
+ model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-small", dtype=jnp.bfloat16)
model.config.max_length = 8
model.config.num_beams = 1
model.config.do_sample = False
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
input_ids = tokenizer("summarize: Hello there", return_tensors="np").input_ids
@@ -879,8 +879,8 @@ class FlaxT5ModelIntegrationTests(unittest.TestCase):
@slow
def test_summarization(self):
- model = FlaxT5ForConditionalGeneration.from_pretrained("t5-base")
- tok = T5Tokenizer.from_pretrained("t5-base")
+ model = FlaxT5ForConditionalGeneration.from_pretrained("google-t5/t5-base")
+ tok = T5Tokenizer.from_pretrained("google-t5/t5-base")
FRANCE_ARTICLE = ( # @noqa
"Marseille, France (CNN)The French prosecutor leading an investigation into the crash of Germanwings"
diff --git a/tests/models/t5/test_modeling_t5.py b/tests/models/t5/test_modeling_t5.py
index 9defe3b23ef..c0a43dfeab6 100644
--- a/tests/models/t5/test_modeling_t5.py
+++ b/tests/models/t5/test_modeling_t5.py
@@ -108,7 +108,7 @@ class T5ModelTester:
self.decoder_layers = decoder_layers
def get_large_model_config(self):
- return T5Config.from_pretrained("t5-base")
+ return T5Config.from_pretrained("google-t5/t5-base")
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.encoder_seq_length], self.vocab_size).clamp(2)
@@ -942,7 +942,7 @@ class T5EncoderOnlyModelTester:
self.is_training = is_training
def get_large_model_config(self):
- return T5Config.from_pretrained("t5-base")
+ return T5Config.from_pretrained("google-t5/t5-base")
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.encoder_seq_length], self.vocab_size)
@@ -1096,36 +1096,40 @@ class T5ModelFp16Tests(unittest.TestCase):
with unittest.mock.patch("builtins.__import__", side_effect=import_accelerate_mock):
accelerate_available = False
- model = T5ForConditionalGeneration.from_pretrained("t5-small", torch_dtype=torch.float16)
+ model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small", torch_dtype=torch.float16)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.float32)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wi.weight.dtype == torch.float16)
# Load without in bf16
- model = T5ForConditionalGeneration.from_pretrained("t5-small", torch_dtype=torch.bfloat16)
+ model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small", torch_dtype=torch.bfloat16)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.bfloat16)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wi.weight.dtype == torch.bfloat16)
# Load using `accelerate` in bf16
- model = T5ForConditionalGeneration.from_pretrained("t5-small", torch_dtype=torch.bfloat16, device_map="auto")
+ model = T5ForConditionalGeneration.from_pretrained(
+ "google-t5/t5-small", torch_dtype=torch.bfloat16, device_map="auto"
+ )
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.bfloat16)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wi.weight.dtype == torch.bfloat16)
# Load using `accelerate` in bf16
model = T5ForConditionalGeneration.from_pretrained(
- "t5-small", torch_dtype=torch.bfloat16, low_cpu_mem_usage=True
+ "google-t5/t5-small", torch_dtype=torch.bfloat16, low_cpu_mem_usage=True
)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.bfloat16)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wi.weight.dtype == torch.bfloat16)
# Load without using `accelerate`
model = T5ForConditionalGeneration.from_pretrained(
- "t5-small", torch_dtype=torch.float16, low_cpu_mem_usage=True
+ "google-t5/t5-small", torch_dtype=torch.float16, low_cpu_mem_usage=True
)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.float32)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wi.weight.dtype == torch.float16)
# Load using `accelerate`
- model = T5ForConditionalGeneration.from_pretrained("t5-small", torch_dtype=torch.float16, device_map="auto")
+ model = T5ForConditionalGeneration.from_pretrained(
+ "google-t5/t5-small", torch_dtype=torch.float16, device_map="auto"
+ )
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.float32)
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wi.weight.dtype == torch.float16)
@@ -1136,11 +1140,11 @@ class T5ModelFp16Tests(unittest.TestCase):
class T5ModelIntegrationTests(unittest.TestCase):
@cached_property
def model(self):
- return T5ForConditionalGeneration.from_pretrained("t5-base").to(torch_device)
+ return T5ForConditionalGeneration.from_pretrained("google-t5/t5-base").to(torch_device)
@cached_property
def tokenizer(self):
- return T5Tokenizer.from_pretrained("t5-base")
+ return T5Tokenizer.from_pretrained("google-t5/t5-base")
@slow
def test_torch_quant(self):
@@ -1157,11 +1161,11 @@ class T5ModelIntegrationTests(unittest.TestCase):
@slow
def test_small_generation(self):
- model = T5ForConditionalGeneration.from_pretrained("t5-small").to(torch_device)
+ model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small").to(torch_device)
model.config.max_length = 8
model.config.num_beams = 1
model.config.do_sample = False
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
input_ids = tokenizer("summarize: Hello there", return_tensors="pt").input_ids.to(torch_device)
@@ -1184,8 +1188,8 @@ class T5ModelIntegrationTests(unittest.TestCase):
>>> score = t5_model.score(inputs=["Hello there"], targets=["Hi I am"], vocabulary=vocab)
"""
- model = T5ForConditionalGeneration.from_pretrained("t5-small").to(torch_device)
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small").to(torch_device)
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
input_ids = tokenizer("Hello there", return_tensors="pt").input_ids
labels = tokenizer("Hi I am", return_tensors="pt").input_ids
@@ -1501,7 +1505,7 @@ class T5ModelIntegrationTests(unittest.TestCase):
@slow
def test_translation_en_to_fr(self):
- model = self.model # t5-base
+ model = self.model # google-t5/t5-base
tok = self.tokenizer
use_task_specific_params(model, "translation_en_to_fr")
diff --git a/tests/models/t5/test_modeling_tf_t5.py b/tests/models/t5/test_modeling_tf_t5.py
index 9976e20baf3..cab41c2b041 100644
--- a/tests/models/t5/test_modeling_tf_t5.py
+++ b/tests/models/t5/test_modeling_tf_t5.py
@@ -302,7 +302,7 @@ class TFT5ModelTest(TFModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
@slow
def test_model_from_pretrained(self):
- model = TFT5Model.from_pretrained("t5-small")
+ model = TFT5Model.from_pretrained("google-t5/t5-small")
self.assertIsNotNone(model)
def test_generate_with_headmasking(self):
@@ -448,8 +448,8 @@ class TFT5EncoderOnlyModelTest(TFModelTesterMixin, unittest.TestCase):
class TFT5GenerationIntegrationTests(unittest.TestCase):
@slow
def test_greedy_xla_generate_simple(self):
- model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
# two examples with different lengths to confirm that attention masks are operational in XLA
sentences = [
@@ -476,8 +476,8 @@ class TFT5GenerationIntegrationTests(unittest.TestCase):
@slow
def test_greedy_generate(self):
- model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
sentences = ["Yesterday, my name was", "Today is a beautiful day and"]
input_ids = tokenizer(sentences, return_tensors="tf", padding=True).input_ids
@@ -505,8 +505,8 @@ class TFT5GenerationIntegrationTests(unittest.TestCase):
# forces the generation to happen on CPU, to avoid GPU-related quirks
with tf.device(":/CPU:0"):
- model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
sentence = "Translate English to German: I have two bananas"
input_ids = tokenizer(sentence, return_tensors="tf", padding=True).input_ids
@@ -526,8 +526,8 @@ class TFT5GenerationIntegrationTests(unittest.TestCase):
@slow
def test_sample_generate(self):
- model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
sentences = ["I really love my", "Translate English to German: the transformers are truly amazing"]
input_ids = tokenizer(sentences, return_tensors="tf", padding=True).input_ids
@@ -557,8 +557,8 @@ class TFT5GenerationIntegrationTests(unittest.TestCase):
@unittest.skip("Skip for now as TF 2.13 breaks it on GPU")
@slow
def test_beam_search_xla_generate_simple(self):
- model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
# tests XLA with task specific arguments
task_specific_config = getattr(model.config, "task_specific_params", {})
@@ -590,8 +590,8 @@ class TFT5GenerationIntegrationTests(unittest.TestCase):
@slow
def test_beam_search_generate(self):
- model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
sentences = ["I really love my", "Translate English to German: the transformers are truly amazing"]
input_ids = tokenizer(sentences, return_tensors="tf", padding=True).input_ids
@@ -622,7 +622,7 @@ class TFT5GenerationIntegrationTests(unittest.TestCase):
class TFT5ModelIntegrationTests(unittest.TestCase):
@cached_property
def model(self):
- return TFT5ForConditionalGeneration.from_pretrained("t5-base")
+ return TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-base")
@slow
def test_small_integration_test(self):
@@ -638,8 +638,8 @@ class TFT5ModelIntegrationTests(unittest.TestCase):
>>> score = t5_model.score(inputs=["Hello there"], targets=["Hi I am"], vocabulary=vocab)
"""
- model = TFT5ForConditionalGeneration.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
input_ids = tokenizer("Hello there", return_tensors="tf").input_ids
labels = tokenizer("Hi I am", return_tensors="tf").input_ids
@@ -703,7 +703,7 @@ class TFT5ModelIntegrationTests(unittest.TestCase):
@slow
def test_summarization(self):
model = self.model
- tok = T5Tokenizer.from_pretrained("t5-base")
+ tok = T5Tokenizer.from_pretrained("google-t5/t5-base")
FRANCE_ARTICLE = ( # @noqa
"Marseille, France (CNN)The French prosecutor leading an investigation into the crash of Germanwings"
@@ -948,7 +948,7 @@ class TFT5ModelIntegrationTests(unittest.TestCase):
@slow
def test_translation_en_to_de(self):
- tok = T5Tokenizer.from_pretrained("t5-base")
+ tok = T5Tokenizer.from_pretrained("google-t5/t5-base")
model = self.model
task_specific_config = getattr(model.config, "task_specific_params", {})
@@ -978,7 +978,7 @@ class TFT5ModelIntegrationTests(unittest.TestCase):
@slow
def test_translation_en_to_fr(self):
model = self.model
- tok = T5Tokenizer.from_pretrained("t5-base")
+ tok = T5Tokenizer.from_pretrained("google-t5/t5-base")
task_specific_config = getattr(model.config, "task_specific_params", {})
translation_config = task_specific_config.get("translation_en_to_fr", {})
@@ -1015,7 +1015,7 @@ class TFT5ModelIntegrationTests(unittest.TestCase):
@slow
def test_translation_en_to_ro(self):
model = self.model
- tok = T5Tokenizer.from_pretrained("t5-base")
+ tok = T5Tokenizer.from_pretrained("google-t5/t5-base")
task_specific_config = getattr(model.config, "task_specific_params", {})
translation_config = task_specific_config.get("translation_en_to_ro", {})
diff --git a/tests/models/t5/test_tokenization_t5.py b/tests/models/t5/test_tokenization_t5.py
index 5fa0e19c792..fdd4f253001 100644
--- a/tests/models/t5/test_tokenization_t5.py
+++ b/tests/models/t5/test_tokenization_t5.py
@@ -138,11 +138,11 @@ class T5TokenizationTest(TokenizerTesterMixin, unittest.TestCase):
@cached_property
def t5_base_tokenizer(self):
- return T5Tokenizer.from_pretrained("t5-base")
+ return T5Tokenizer.from_pretrained("google-t5/t5-base")
@cached_property
def t5_base_tokenizer_fast(self):
- return T5TokenizerFast.from_pretrained("t5-base")
+ return T5TokenizerFast.from_pretrained("google-t5/t5-base")
def get_tokenizer(self, **kwargs) -> T5Tokenizer:
return self.tokenizer_class.from_pretrained(self.tmpdirname, **kwargs)
@@ -373,7 +373,7 @@ class T5TokenizationTest(TokenizerTesterMixin, unittest.TestCase):
self.tokenizer_integration_test_util(
expected_encoding=expected_encoding,
- model_name="t5-base",
+ model_name="google-t5/t5-base",
revision="5a7ff2d8f5117c194c7e32ec1ccbf04642cca99b",
)
@@ -400,7 +400,7 @@ class T5TokenizationTest(TokenizerTesterMixin, unittest.TestCase):
self.assertListEqual(sorted(tokenizer.get_sentinel_token_ids()), sorted(range(1000, 1010)))
def test_some_edge_cases(self):
- tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=False)
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=False)
sp_tokens = tokenizer.sp_model.encode("</s>>", out_type=str)
self.assertEqual(sp_tokens, ["<", "/", "s", ">", ">"])
@@ -426,8 +426,8 @@ class T5TokenizationTest(TokenizerTesterMixin, unittest.TestCase):
def test_fast_slow_edge_cases(self):
# We are testing spaces before and spaces after special tokens + space transformations
- slow_tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=False)
- fast_tokenizer = T5TokenizerFast.from_pretrained("t5-base", legacy=False, from_slow=True)
+ slow_tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=False)
+ fast_tokenizer = T5TokenizerFast.from_pretrained("google-t5/t5-base", legacy=False, from_slow=True)
slow_tokenizer.add_tokens(AddedToken("", rstrip=False, lstrip=False, normalized=False))
fast_tokenizer.add_tokens(AddedToken("", rstrip=False, lstrip=False, normalized=False))
@@ -445,7 +445,7 @@ class T5TokenizationTest(TokenizerTesterMixin, unittest.TestCase):
with self.subTest(f"fast {edge_case} normalized = False"):
self.assertEqual(fast_tokenizer.tokenize(hard_case), EXPECTED_SLOW)
- fast_tokenizer = T5TokenizerFast.from_pretrained("t5-base", legacy=False, from_slow=True)
+ fast_tokenizer = T5TokenizerFast.from_pretrained("google-t5/t5-base", legacy=False, from_slow=True)
fast_tokenizer.add_tokens(AddedToken("", rstrip=False, lstrip=False, normalized=True))
# `normalized=True` is the default normalization scheme when adding a token. Normalize -> don't strip the space.
@@ -604,7 +604,7 @@ class CommonSpmIntegrationTests(unittest.TestCase):
)
# Test with T5
- hf_tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ hf_tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
vocab_path = "gs://t5-data/vocabs/cc_all.32000/sentencepiece.model"
t5x_tokenizer = SentencePieceVocabulary(vocab_path, extra_ids=300)
for text in input_texts:
diff --git a/tests/models/umt5/test_modeling_umt5.py b/tests/models/umt5/test_modeling_umt5.py
index b25873eae54..5bd961dbb3d 100644
--- a/tests/models/umt5/test_modeling_umt5.py
+++ b/tests/models/umt5/test_modeling_umt5.py
@@ -603,7 +603,7 @@ class UMT5EncoderOnlyModelTester:
self.is_training = is_training
def get_large_model_config(self):
- return UMT5Config.from_pretrained("t5-base")
+ return UMT5Config.from_pretrained("google-t5/t5-base")
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.encoder_seq_length], self.vocab_size)
diff --git a/tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py b/tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py
index c6926e002a9..98c3a275825 100644
--- a/tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py
+++ b/tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py
@@ -426,7 +426,7 @@ class FlaxViT2GPT2EncoderDecoderModelTest(FlaxEncoderDecoderMixin, unittest.Test
def get_pretrained_model(self):
return FlaxVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- "google/vit-base-patch16-224-in21k", "gpt2"
+ "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
)
@@ -434,7 +434,7 @@ class FlaxViT2GPT2EncoderDecoderModelTest(FlaxEncoderDecoderMixin, unittest.Test
class FlaxVisionEncoderDecoderModelTest(unittest.TestCase):
def get_from_encoderdecoder_pretrained_model(self):
return FlaxVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
- "google/vit-base-patch16-224-in21k", "gpt2"
+ "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
)
def _check_configuration_tie(self, model):
diff --git a/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py b/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py
index 057df26d303..b87673c0511 100644
--- a/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py
+++ b/tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py
@@ -627,7 +627,9 @@ class TFVisionEncoderDecoderMixin:
@require_tf
class TFViT2GPT2EncoderDecoderModelTest(TFVisionEncoderDecoderMixin, unittest.TestCase):
def get_pretrained_model(self):
- return TFVisionEncoderDecoderModel.from_encoder_decoder_pretrained("google/vit-base-patch16-224-in21k", "gpt2")
+ return TFVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
+ )
def get_encoder_decoder_model(self, config, decoder_config):
encoder_model = TFViTModel(config, name="encoder")
@@ -672,10 +674,12 @@ class TFViT2GPT2EncoderDecoderModelTest(TFVisionEncoderDecoderMixin, unittest.Te
@require_tf
class TFVisionEncoderDecoderModelTest(unittest.TestCase):
def get_from_encoderdecoder_pretrained_model(self):
- return TFVisionEncoderDecoderModel.from_encoder_decoder_pretrained("google/vit-base-patch16-224-in21k", "gpt2")
+ return TFVisionEncoderDecoderModel.from_encoder_decoder_pretrained(
+ "google/vit-base-patch16-224-in21k", "openai-community/gpt2"
+ )
def get_decoder_config(self):
- config = AutoConfig.from_pretrained("gpt2")
+ config = AutoConfig.from_pretrained("openai-community/gpt2")
config.is_decoder = True
config.add_cross_attention = True
return config
@@ -685,7 +689,9 @@ class TFVisionEncoderDecoderModelTest(unittest.TestCase):
def get_encoder_decoder_models(self):
encoder_model = TFViTModel.from_pretrained("google/vit-base-patch16-224-in21k", name="encoder")
- decoder_model = TFGPT2LMHeadModel.from_pretrained("gpt2", config=self.get_decoder_config(), name="decoder")
+ decoder_model = TFGPT2LMHeadModel.from_pretrained(
+ "openai-community/gpt2", config=self.get_decoder_config(), name="decoder"
+ )
return {"encoder": encoder_model, "decoder": decoder_model}
def _check_configuration_tie(self, model):
@@ -714,7 +720,7 @@ def prepare_img():
class TFVisionEncoderDecoderModelSaveLoadTests(unittest.TestCase):
def get_encoder_decoder_config(self):
encoder_config = AutoConfig.from_pretrained("google/vit-base-patch16-224-in21k")
- decoder_config = AutoConfig.from_pretrained("gpt2", is_decoder=True, add_cross_attention=True)
+ decoder_config = AutoConfig.from_pretrained("openai-community/gpt2", is_decoder=True, add_cross_attention=True)
return VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
def get_encoder_decoder_config_small(self):
@@ -829,7 +835,7 @@ class TFVisionEncoderDecoderModelSaveLoadTests(unittest.TestCase):
config = self.get_encoder_decoder_config()
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
- decoder_tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ decoder_tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
img = prepare_img()
pixel_values = image_processor(images=img, return_tensors="tf").pixel_values
@@ -845,7 +851,7 @@ class TFVisionEncoderDecoderModelSaveLoadTests(unittest.TestCase):
encoder = TFAutoModel.from_pretrained("google/vit-base-patch16-224-in21k", name="encoder")
# It's necessary to specify `add_cross_attention=True` here.
decoder = TFAutoModelForCausalLM.from_pretrained(
- "gpt2", is_decoder=True, add_cross_attention=True, name="decoder"
+ "openai-community/gpt2", is_decoder=True, add_cross_attention=True, name="decoder"
)
pretrained_encoder_dir = os.path.join(tmp_dirname, "pretrained_encoder")
pretrained_decoder_dir = os.path.join(tmp_dirname, "pretrained_decoder")
diff --git a/tests/models/xlm/test_modeling_tf_xlm.py b/tests/models/xlm/test_modeling_tf_xlm.py
index 7bfa33828f7..51ba6c2476b 100644
--- a/tests/models/xlm/test_modeling_tf_xlm.py
+++ b/tests/models/xlm/test_modeling_tf_xlm.py
@@ -369,7 +369,7 @@ class TFXLMModelTest(TFModelTesterMixin, PipelineTesterMixin, unittest.TestCase)
class TFXLMModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_xlm_mlm_en_2048(self):
- model = TFXLMWithLMHeadModel.from_pretrained("xlm-mlm-en-2048")
+ model = TFXLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-mlm-en-2048")
input_ids = tf.convert_to_tensor([[14, 447]], dtype=tf.int32) # the president
expected_output_ids = [
14,
diff --git a/tests/models/xlm/test_modeling_xlm.py b/tests/models/xlm/test_modeling_xlm.py
index b551e7e645d..09ad95e81ac 100644
--- a/tests/models/xlm/test_modeling_xlm.py
+++ b/tests/models/xlm/test_modeling_xlm.py
@@ -514,7 +514,7 @@ class XLMModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin,
class XLMModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_xlm_mlm_en_2048(self):
- model = XLMWithLMHeadModel.from_pretrained("xlm-mlm-en-2048")
+ model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-mlm-en-2048")
model.to(torch_device)
input_ids = torch.tensor([[14, 447]], dtype=torch.long, device=torch_device) # the president
expected_output_ids = [
diff --git a/tests/models/xlm/test_tokenization_xlm.py b/tests/models/xlm/test_tokenization_xlm.py
index 6e310352158..4b5982ca985 100644
--- a/tests/models/xlm/test_tokenization_xlm.py
+++ b/tests/models/xlm/test_tokenization_xlm.py
@@ -85,7 +85,7 @@ class XLMTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
@slow
def test_sequence_builders(self):
- tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-en-2048")
+ tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
text = tokenizer.encode("sequence builders", add_special_tokens=False)
text_2 = tokenizer.encode("multi-sequence build", add_special_tokens=False)
diff --git a/tests/models/xlm_roberta/test_modeling_flax_xlm_roberta.py b/tests/models/xlm_roberta/test_modeling_flax_xlm_roberta.py
index 0ceaa739f3f..6af80600607 100644
--- a/tests/models/xlm_roberta/test_modeling_flax_xlm_roberta.py
+++ b/tests/models/xlm_roberta/test_modeling_flax_xlm_roberta.py
@@ -32,8 +32,8 @@ if is_flax_available():
class FlaxXLMRobertaModelIntegrationTest(unittest.TestCase):
@slow
def test_flax_xlm_roberta_base(self):
- model = FlaxXLMRobertaModel.from_pretrained("xlm-roberta-base")
- tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
+ model = FlaxXLMRobertaModel.from_pretrained("FacebookAI/xlm-roberta-base")
+ tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
text = "The dog is cute and lives in the garden house"
input_ids = jnp.array([tokenizer.encode(text)])
diff --git a/tests/models/xlm_roberta/test_modeling_xlm_roberta.py b/tests/models/xlm_roberta/test_modeling_xlm_roberta.py
index ca9db17270d..d9b69bb9ab5 100644
--- a/tests/models/xlm_roberta/test_modeling_xlm_roberta.py
+++ b/tests/models/xlm_roberta/test_modeling_xlm_roberta.py
@@ -32,7 +32,7 @@ if is_torch_available():
class XLMRobertaModelIntegrationTest(unittest.TestCase):
@slow
def test_xlm_roberta_base(self):
- model = XLMRobertaModel.from_pretrained("xlm-roberta-base")
+ model = XLMRobertaModel.from_pretrained("FacebookAI/xlm-roberta-base")
input_ids = torch.tensor([[0, 581, 10269, 83, 99942, 136, 60742, 23, 70, 80583, 18276, 2]])
# The dog is cute and lives in the garden house
@@ -51,7 +51,7 @@ class XLMRobertaModelIntegrationTest(unittest.TestCase):
@slow
def test_xlm_roberta_large(self):
- model = XLMRobertaModel.from_pretrained("xlm-roberta-large")
+ model = XLMRobertaModel.from_pretrained("FacebookAI/xlm-roberta-large")
input_ids = torch.tensor([[0, 581, 10269, 83, 99942, 136, 60742, 23, 70, 80583, 18276, 2]])
# The dog is cute and lives in the garden house
diff --git a/tests/models/xlm_roberta/test_tokenization_xlm_roberta.py b/tests/models/xlm_roberta/test_tokenization_xlm_roberta.py
index 1cba1c01d58..6e2d4446a02 100644
--- a/tests/models/xlm_roberta/test_tokenization_xlm_roberta.py
+++ b/tests/models/xlm_roberta/test_tokenization_xlm_roberta.py
@@ -212,7 +212,7 @@ class XLMRobertaTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
@cached_property
def big_tokenizer(self):
- return XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
+ return XLMRobertaTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
def test_picklable_without_disk(self):
with tempfile.NamedTemporaryFile() as f:
@@ -338,6 +338,6 @@ class XLMRobertaTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
self.tokenizer_integration_test_util(
expected_encoding=expected_encoding,
- model_name="xlm-roberta-base",
+ model_name="FacebookAI/xlm-roberta-base",
revision="d9d8a8ea5eb94b1c6654ae9249df7793cd2933d3",
)
diff --git a/tests/models/xlnet/test_modeling_tf_xlnet.py b/tests/models/xlnet/test_modeling_tf_xlnet.py
index 03eba74f406..5d17299f9b3 100644
--- a/tests/models/xlnet/test_modeling_tf_xlnet.py
+++ b/tests/models/xlnet/test_modeling_tf_xlnet.py
@@ -491,7 +491,7 @@ class TFXLNetModelTest(TFModelTesterMixin, PipelineTesterMixin, unittest.TestCas
class TFXLNetModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_xlnet_base_cased(self):
- model = TFXLNetLMHeadModel.from_pretrained("xlnet-base-cased")
+ model = TFXLNetLMHeadModel.from_pretrained("xlnet/xlnet-base-cased")
# fmt: off
input_ids = tf.convert_to_tensor(
[
diff --git a/tests/models/xlnet/test_modeling_xlnet.py b/tests/models/xlnet/test_modeling_xlnet.py
index 2b0c95cd6d1..cd5a3d52b34 100644
--- a/tests/models/xlnet/test_modeling_xlnet.py
+++ b/tests/models/xlnet/test_modeling_xlnet.py
@@ -694,7 +694,7 @@ class XLNetModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixi
class XLNetModelLanguageGenerationTest(unittest.TestCase):
@slow
def test_lm_generate_xlnet_base_cased(self):
- model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")
+ model = XLNetLMHeadModel.from_pretrained("xlnet/xlnet-base-cased")
model.to(torch_device)
# fmt: off
input_ids = torch.tensor(
diff --git a/tests/models/xlnet/test_tokenization_xlnet.py b/tests/models/xlnet/test_tokenization_xlnet.py
index 9fb28658aab..8a7476fad92 100644
--- a/tests/models/xlnet/test_tokenization_xlnet.py
+++ b/tests/models/xlnet/test_tokenization_xlnet.py
@@ -186,7 +186,7 @@ class XLNetTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
@slow
def test_sequence_builders(self):
- tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
+ tokenizer = XLNetTokenizer.from_pretrained("xlnet/xlnet-base-cased")
text = tokenizer.encode("sequence builders", add_special_tokens=False)
text_2 = tokenizer.encode("multi-sequence build", add_special_tokens=False)
@@ -203,6 +203,6 @@ class XLNetTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
self.tokenizer_integration_test_util(
expected_encoding=expected_encoding,
- model_name="xlnet-base-cased",
+ model_name="xlnet/xlnet-base-cased",
revision="c841166438c31ec7ca9a106dee7bb312b73ae511",
)
diff --git a/tests/models/xmod/test_modeling_xmod.py b/tests/models/xmod/test_modeling_xmod.py
index fc1ce44e35d..1a9eab5507e 100644
--- a/tests/models/xmod/test_modeling_xmod.py
+++ b/tests/models/xmod/test_modeling_xmod.py
@@ -630,7 +630,7 @@ class XmodModelIntegrationTest(unittest.TestCase):
@slow
def test_end_to_end_mask_fill(self):
- tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
+ tokenizer = XLMRobertaTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
model = XmodForMaskedLM.from_pretrained("facebook/xmod-base", default_language="en_XX")
model.to(torch_device)
diff --git a/tests/pipelines/test_pipelines_common.py b/tests/pipelines/test_pipelines_common.py
index e760d279014..5e3e15f39c1 100644
--- a/tests/pipelines/test_pipelines_common.py
+++ b/tests/pipelines/test_pipelines_common.py
@@ -143,7 +143,7 @@ class CommonPipelineTest(unittest.TestCase):
self.assertIsInstance(text_classifier, MyPipeline)
def test_check_task(self):
- task = get_task("gpt2")
+ task = get_task("openai-community/gpt2")
self.assertEqual(task, "text-generation")
with self.assertRaises(RuntimeError):
diff --git a/tests/pipelines/test_pipelines_fill_mask.py b/tests/pipelines/test_pipelines_fill_mask.py
index 571b320d617..bbf2b6cf3f4 100644
--- a/tests/pipelines/test_pipelines_fill_mask.py
+++ b/tests/pipelines/test_pipelines_fill_mask.py
@@ -169,13 +169,13 @@ class FillMaskPipelineTests(unittest.TestCase):
@slow
@require_torch
def test_large_model_pt(self):
- unmasker = pipeline(task="fill-mask", model="distilroberta-base", top_k=2, framework="pt")
+ unmasker = pipeline(task="fill-mask", model="distilbert/distilroberta-base", top_k=2, framework="pt")
self.run_large_test(unmasker)
@slow
@require_tf
def test_large_model_tf(self):
- unmasker = pipeline(task="fill-mask", model="distilroberta-base", top_k=2, framework="tf")
+ unmasker = pipeline(task="fill-mask", model="distilbert/distilroberta-base", top_k=2, framework="tf")
self.run_large_test(unmasker)
def run_large_test(self, unmasker):
diff --git a/tests/pipelines/test_pipelines_token_classification.py b/tests/pipelines/test_pipelines_token_classification.py
index b139fbfd2f7..eda9ac014bf 100644
--- a/tests/pipelines/test_pipelines_token_classification.py
+++ b/tests/pipelines/test_pipelines_token_classification.py
@@ -468,7 +468,7 @@ class TokenClassificationPipelineTests(unittest.TestCase):
@slow
def test_aggregation_strategy_byte_level_tokenizer(self):
sentence = "Groenlinks praat over Schiphol."
- ner = pipeline("ner", model="xlm-roberta-large-finetuned-conll02-dutch", aggregation_strategy="max")
+ ner = pipeline("ner", model="FacebookAI/xlm-roberta-large-finetuned-conll02-dutch", aggregation_strategy="max")
self.assertEqual(
nested_simplify(ner(sentence)),
[
diff --git a/tests/pipelines/test_pipelines_zero_shot.py b/tests/pipelines/test_pipelines_zero_shot.py
index 9c37014ab81..2e61d97c1dc 100644
--- a/tests/pipelines/test_pipelines_zero_shot.py
+++ b/tests/pipelines/test_pipelines_zero_shot.py
@@ -199,7 +199,9 @@ class ZeroShotClassificationPipelineTests(unittest.TestCase):
@slow
@require_torch
def test_large_model_pt(self):
- zero_shot_classifier = pipeline("zero-shot-classification", model="roberta-large-mnli", framework="pt")
+ zero_shot_classifier = pipeline(
+ "zero-shot-classification", model="FacebookAI/roberta-large-mnli", framework="pt"
+ )
outputs = zero_shot_classifier(
"Who are you voting for in 2020?", candidate_labels=["politics", "public health", "science"]
)
@@ -254,7 +256,9 @@ class ZeroShotClassificationPipelineTests(unittest.TestCase):
@slow
@require_tf
def test_large_model_tf(self):
- zero_shot_classifier = pipeline("zero-shot-classification", model="roberta-large-mnli", framework="tf")
+ zero_shot_classifier = pipeline(
+ "zero-shot-classification", model="FacebookAI/roberta-large-mnli", framework="tf"
+ )
outputs = zero_shot_classifier(
"Who are you voting for in 2020?", candidate_labels=["politics", "public health", "science"]
)
diff --git a/tests/quantization/bnb/test_4bit.py b/tests/quantization/bnb/test_4bit.py
index 4c33270af67..782e9a082fd 100644
--- a/tests/quantization/bnb/test_4bit.py
+++ b/tests/quantization/bnb/test_4bit.py
@@ -43,7 +43,7 @@ from transformers.testing_utils import (
def get_some_linear_layer(model):
if model.config.model_type == "gpt2":
return model.transformer.h[0].mlp.c_fc
elif model.config.model_type == "opt":
try:
@@ -283,7 +283,7 @@ class Bnb4BitTest(Base4bitTest):
r"""
Test whether it is possible to mix both `4bit` and `fp32` weights when using `keep_in_fp32_modules` correctly.
"""
- model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", load_in_4bit=True, device_map="auto")
+ model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small", load_in_4bit=True, device_map="auto")
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.float32)
@@ -295,7 +295,7 @@ class Bnb4BitTest(Base4bitTest):
class Bnb4BitT5Test(unittest.TestCase):
@classmethod
def setUpClass(cls):
- cls.model_name = "t5-small"
+ cls.model_name = "google-t5/t5-small"
cls.dense_act_model_name = "google/flan-t5-small" # flan-t5 uses dense-act instead of dense-relu-dense
cls.tokenizer = AutoTokenizer.from_pretrained(cls.model_name)
cls.input_text = "Translate in German: Hello, my dog is cute"
@@ -311,7 +311,7 @@ class Bnb4BitT5Test(unittest.TestCase):
def test_inference_without_keep_in_fp32(self):
r"""
Test whether it is possible to mix both `4bit` and `fp32` weights when using `keep_in_fp32_modules` correctly.
- `flan-t5-small` uses `T5DenseGatedActDense` whereas `t5-small` uses `T5DenseReluDense`. We need to test
+ `flan-t5-small` uses `T5DenseGatedActDense` whereas `google-t5/t5-small` uses `T5DenseReluDense`. We need to test
both cases.
"""
from transformers import T5ForConditionalGeneration
@@ -319,7 +319,7 @@ class Bnb4BitT5Test(unittest.TestCase):
modules = T5ForConditionalGeneration._keep_in_fp32_modules
T5ForConditionalGeneration._keep_in_fp32_modules = None
- # test with `t5-small`
+ # test with `google-t5/t5-small`
model = T5ForConditionalGeneration.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto")
encoded_input = self.tokenizer(self.input_text, return_tensors="pt").to(0)
_ = model.generate(**encoded_input)
@@ -335,12 +335,12 @@ class Bnb4BitT5Test(unittest.TestCase):
def test_inference_with_keep_in_fp32(self):
r"""
Test whether it is possible to mix both `4bit` and `fp32` weights when using `keep_in_fp32_modules` correctly.
- `flan-t5-small` uses `T5DenseGatedActDense` whereas `t5-small` uses `T5DenseReluDense`. We need to test
+ `flan-t5-small` uses `T5DenseGatedActDense` whereas `google-t5/t5-small` uses `T5DenseReluDense`. We need to test
both cases.
"""
from transformers import T5ForConditionalGeneration
- # test with `t5-small`
+ # test with `google-t5/t5-small`
model = T5ForConditionalGeneration.from_pretrained(self.model_name, load_in_4bit=True, device_map="auto")
# there was a bug with decoders - this test checks that it is fixed
@@ -362,7 +362,7 @@ class Classes4BitModelTest(Base4bitTest):
super().setUp()
# model_name
self.model_name = "bigscience/bloom-560m"
- self.seq_to_seq_name = "t5-small"
+ self.seq_to_seq_name = "google-t5/t5-small"
# Different types of model
@@ -509,7 +509,7 @@ class Bnb4BitTestTraining(Base4bitTest):
class Bnb4BitGPT2Test(Bnb4BitTest):
- model_name = "gpt2-xl"
+ model_name = "openai-community/gpt2-xl"
EXPECTED_RELATIVE_DIFFERENCE = 3.3191854854152187
@@ -647,7 +647,7 @@ class GPTSerializationTest(BaseSerializationTest):
default BaseSerializationTest config tested with GPT family model
"""
- model_name = "gpt2-xl"
+ model_name = "openai-community/gpt2-xl"
@require_bitsandbytes
diff --git a/tests/quantization/bnb/test_mixed_int8.py b/tests/quantization/bnb/test_mixed_int8.py
index 0ce7274d259..b926c80398c 100644
--- a/tests/quantization/bnb/test_mixed_int8.py
+++ b/tests/quantization/bnb/test_mixed_int8.py
@@ -42,7 +42,7 @@ from transformers.testing_utils import (
def get_some_linear_layer(model):
if model.config.model_type == "gpt2":
return model.transformer.h[0].mlp.c_fc
return model.transformer.h[0].mlp.dense_4h_to_h
@@ -174,7 +174,7 @@ class MixedInt8Test(BaseMixedInt8Test):
model = OPTForCausalLM(config)
self.assertEqual(get_keys_to_not_convert(model).sort(), ["lm_head", "model.decoder.embed_tokens"].sort())
- model_id = "roberta-large"
+ model_id = "FacebookAI/roberta-large"
config = AutoConfig.from_pretrained(model_id, revision="716877d372b884cad6d419d828bac6c85b3b18d9")
with init_empty_weights():
model = AutoModelForMaskedLM.from_config(config)
@@ -240,7 +240,7 @@ class MixedInt8Test(BaseMixedInt8Test):
quantization_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_skip_modules=["classifier"])
seq_classification_model = AutoModelForSequenceClassification.from_pretrained(
- "roberta-large-mnli", quantization_config=quantization_config
+ "FacebookAI/roberta-large-mnli", quantization_config=quantization_config
)
self.assertTrue(seq_classification_model.roberta.encoder.layer[0].output.dense.weight.dtype == torch.int8)
self.assertTrue(
@@ -340,7 +340,7 @@ class MixedInt8Test(BaseMixedInt8Test):
r"""
Test whether it is possible to mix both `int8` and `fp32` weights when using `keep_in_fp32_modules` correctly.
"""
- model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", load_in_8bit=True, device_map="auto")
+ model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small", load_in_8bit=True, device_map="auto")
self.assertTrue(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype == torch.float32)
def test_int8_serialization(self):
@@ -447,7 +447,7 @@ class MixedInt8Test(BaseMixedInt8Test):
class MixedInt8T5Test(unittest.TestCase):
@classmethod
def setUpClass(cls):
- cls.model_name = "t5-small"
+ cls.model_name = "google-t5/t5-small"
cls.dense_act_model_name = "google/flan-t5-small" # flan-t5 uses dense-act instead of dense-relu-dense
cls.tokenizer = AutoTokenizer.from_pretrained(cls.model_name)
cls.input_text = "Translate in German: Hello, my dog is cute"
@@ -463,7 +463,7 @@ class MixedInt8T5Test(unittest.TestCase):
def test_inference_without_keep_in_fp32(self):
r"""
Test whether it is possible to mix both `int8` and `fp32` weights when using `keep_in_fp32_modules` correctly.
- `flan-t5-small` uses `T5DenseGatedActDense` whereas `t5-small` uses `T5DenseReluDense`. We need to test
+ `flan-t5-small` uses `T5DenseGatedActDense` whereas `google-t5/t5-small` uses `T5DenseReluDense`. We need to test
both cases.
"""
from transformers import T5ForConditionalGeneration
@@ -471,7 +471,7 @@ class MixedInt8T5Test(unittest.TestCase):
modules = T5ForConditionalGeneration._keep_in_fp32_modules
T5ForConditionalGeneration._keep_in_fp32_modules = None
- # test with `t5-small`
+ # test with `google-t5/t5-small`
model = T5ForConditionalGeneration.from_pretrained(self.model_name, load_in_8bit=True, device_map="auto")
encoded_input = self.tokenizer(self.input_text, return_tensors="pt").to(0)
_ = model.generate(**encoded_input)
@@ -487,14 +487,14 @@ class MixedInt8T5Test(unittest.TestCase):
def test_inference_with_keep_in_fp32(self):
r"""
Test whether it is possible to mix both `int8` and `fp32` weights when using `keep_in_fp32_modules` correctly.
- `flan-t5-small` uses `T5DenseGatedActDense` whereas `t5-small` uses `T5DenseReluDense`. We need to test
+ `flan-t5-small` uses `T5DenseGatedActDense` whereas `google-t5/t5-small` uses `T5DenseReluDense`. We need to test
both cases.
"""
import bitsandbytes as bnb
from transformers import T5ForConditionalGeneration
- # test with `t5-small`
+ # test with `google-t5/t5-small`
model = T5ForConditionalGeneration.from_pretrained(self.model_name, load_in_8bit=True, device_map="auto")
# there was a bug with decoders - this test checks that it is fixed
@@ -514,14 +514,14 @@ class MixedInt8T5Test(unittest.TestCase):
r"""
Test whether it is possible to mix both `int8` and `fp32` weights when using `keep_in_fp32_modules` correctly on
a serialized model.
- `flan-t5-small` uses `T5DenseGatedActDense` whereas `t5-small` uses `T5DenseReluDense`. We need to test
+ `flan-t5-small` uses `T5DenseGatedActDense` whereas `google-t5/t5-small` uses `T5DenseReluDense`. We need to test
both cases.
"""
import bitsandbytes as bnb
from transformers import T5ForConditionalGeneration
- # test with `t5-small`
+ # test with `google-t5/t5-small`
model = T5ForConditionalGeneration.from_pretrained(self.model_name, load_in_8bit=True, device_map="auto")
with tempfile.TemporaryDirectory() as tmp_dir:
@@ -548,7 +548,7 @@ class MixedInt8ModelClassesTest(BaseMixedInt8Test):
super().setUp()
# model_name
self.model_name = "bigscience/bloom-560m"
- self.seq_to_seq_name = "t5-small"
+ self.seq_to_seq_name = "google-t5/t5-small"
# Different types of model
@@ -842,7 +842,7 @@ class MixedInt8TestTraining(BaseMixedInt8Test):
class MixedInt8GPT2Test(MixedInt8Test):
- model_name = "gpt2-xl"
+ model_name = "openai-community/gpt2-xl"
EXPECTED_RELATIVE_DIFFERENCE = 1.8720077507258357
EXPECTED_OUTPUTS = set()
EXPECTED_OUTPUTS.add("Hello my name is John Doe, and I'm a big fan of")
diff --git a/tests/sagemaker/test_multi_node_data_parallel.py b/tests/sagemaker/test_multi_node_data_parallel.py
index cc7f9e5e84f..2ea029a2855 100644
--- a/tests/sagemaker/test_multi_node_data_parallel.py
+++ b/tests/sagemaker/test_multi_node_data_parallel.py
@@ -25,21 +25,21 @@ if is_sagemaker_available():
{
"framework": "pytorch",
"script": "run_glue.py",
- "model_name_or_path": "distilbert-base-cased",
+ "model_name_or_path": "distilbert/distilbert-base-cased",
"instance_type": "ml.p3.16xlarge",
"results": {"train_runtime": 650, "eval_accuracy": 0.7, "eval_loss": 0.6},
},
{
"framework": "pytorch",
"script": "run_ddp.py",
- "model_name_or_path": "distilbert-base-cased",
+ "model_name_or_path": "distilbert/distilbert-base-cased",
"instance_type": "ml.p3.16xlarge",
"results": {"train_runtime": 600, "eval_accuracy": 0.7, "eval_loss": 0.6},
},
{
"framework": "tensorflow",
"script": "run_tf_dist.py",
- "model_name_or_path": "distilbert-base-cased",
+ "model_name_or_path": "distilbert/distilbert-base-cased",
"instance_type": "ml.p3.16xlarge",
"results": {"train_runtime": 600, "eval_accuracy": 0.6, "eval_loss": 0.7},
},
diff --git a/tests/sagemaker/test_multi_node_model_parallel.py b/tests/sagemaker/test_multi_node_model_parallel.py
index 95d5b9fa855..216d31de471 100644
--- a/tests/sagemaker/test_multi_node_model_parallel.py
+++ b/tests/sagemaker/test_multi_node_model_parallel.py
@@ -25,14 +25,14 @@ if is_sagemaker_available():
{
"framework": "pytorch",
"script": "run_glue_model_parallelism.py",
- "model_name_or_path": "roberta-large",
+ "model_name_or_path": "FacebookAI/roberta-large",
"instance_type": "ml.p3dn.24xlarge",
"results": {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2},
},
{
"framework": "pytorch",
"script": "run_glue.py",
- "model_name_or_path": "roberta-large",
+ "model_name_or_path": "FacebookAI/roberta-large",
"instance_type": "ml.p3dn.24xlarge",
"results": {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2},
},
diff --git a/tests/sagemaker/test_single_node_gpu.py b/tests/sagemaker/test_single_node_gpu.py
index f2a62547e78..53d966bd1e8 100644
--- a/tests/sagemaker/test_single_node_gpu.py
+++ b/tests/sagemaker/test_single_node_gpu.py
@@ -25,14 +25,14 @@ if is_sagemaker_available():
{
"framework": "pytorch",
"script": "run_glue.py",
- "model_name_or_path": "distilbert-base-cased",
+ "model_name_or_path": "distilbert/distilbert-base-cased",
"instance_type": "ml.g4dn.xlarge",
"results": {"train_runtime": 650, "eval_accuracy": 0.6, "eval_loss": 0.9},
},
{
"framework": "tensorflow",
"script": "run_tf.py",
- "model_name_or_path": "distilbert-base-cased",
+ "model_name_or_path": "distilbert/distilbert-base-cased",
"instance_type": "ml.g4dn.xlarge",
"results": {"train_runtime": 600, "eval_accuracy": 0.3, "eval_loss": 0.9},
},
diff --git a/tests/test_configuration_utils.py b/tests/test_configuration_utils.py
index 413060ddfde..5c9861e48bb 100644
--- a/tests/test_configuration_utils.py
+++ b/tests/test_configuration_utils.py
@@ -255,7 +255,7 @@ class ConfigTestUtils(unittest.TestCase):
)
def test_local_versioning(self):
- configuration = AutoConfig.from_pretrained("bert-base-cased")
+ configuration = AutoConfig.from_pretrained("google-bert/bert-base-cased")
configuration.configuration_files = ["config.4.0.0.json"]
with tempfile.TemporaryDirectory() as tmp_dir:
diff --git a/tests/test_modeling_utils.py b/tests/test_modeling_utils.py
index cef56822dc3..0d52e5a87be 100755
--- a/tests/test_modeling_utils.py
+++ b/tests/test_modeling_utils.py
@@ -709,7 +709,7 @@ class ModelUtilsTest(TestCasePlus):
def test_from_pretrained_low_cpu_mem_usage_measured(self):
# test that `from_pretrained(..., low_cpu_mem_usage=True)` uses less cpu memory than default
- mname = "bert-base-cased"
+ mname = "google-bert/bert-base-cased"
preamble = "from transformers import AutoModel"
one_liner_str = f'{preamble}; AutoModel.from_pretrained("{mname}", low_cpu_mem_usage=False)'
@@ -753,9 +753,9 @@ class ModelUtilsTest(TestCasePlus):
for i in range(12):
device_map[f"transformer.h.{i}"] = 0 if i <= 5 else 1
- model = AutoModelForCausalLM.from_pretrained("gpt2", device_map=device_map)
+ model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2", device_map=device_map)
- tokenizer = AutoTokenizer.from_pretrained("gpt2")
+ tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello, my name is", return_tensors="pt")
output = model.generate(inputs["input_ids"].to(0))
@@ -1165,7 +1165,7 @@ class ModelUtilsTest(TestCasePlus):
@slow
def test_pretrained_low_mem_new_config(self):
# Checking for 1 model(the same one which was described in the issue) .
- model_ids = ["gpt2"]
+ model_ids = ["openai-community/gpt2"]
for model_id in model_ids:
model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_id)
@@ -1246,7 +1246,7 @@ class ModelUtilsTest(TestCasePlus):
self.assertTrue(torch.equal(p1, p2))
def test_modifying_model_config_causes_warning_saving_generation_config(self):
- model = AutoModelForCausalLM.from_pretrained("gpt2")
+ model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
model.config.top_k = 1
with tempfile.TemporaryDirectory() as tmp_dir:
with self.assertLogs("transformers.modeling_utils", level="WARNING") as logs:
@@ -1514,7 +1514,7 @@ class ModelPushToHubTester(unittest.TestCase):
The commit description supports markdown synthax see:
```python
>>> form transformers import AutoConfig
->>> config = AutoConfig.from_pretrained("bert-base-uncased")
+>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")
```
"""
commit_details = model.push_to_hub(
diff --git a/tests/test_tokenization_common.py b/tests/test_tokenization_common.py
index e5b9a34702e..d0c58749114 100644
--- a/tests/test_tokenization_common.py
+++ b/tests/test_tokenization_common.py
@@ -3990,7 +3990,7 @@ class TokenizerTesterMixin:
# TODO This is ran for all models but only tests bert...
def test_clean_up_tokenization_spaces(self):
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
assert tokenizer.clean_up_tokenization_spaces is True
tokens = tokenizer.encode("This shouldn't be! He'll go.")
diff --git a/tests/test_tokenization_utils.py b/tests/test_tokenization_utils.py
index 3f7f7249f97..3f23fdb156b 100644
--- a/tests/test_tokenization_utils.py
+++ b/tests/test_tokenization_utils.py
@@ -73,11 +73,11 @@ class TokenizerUtilTester(unittest.TestCase):
response_mock.json.return_value = {}
# Download this model to make sure it's in the cache.
- _ = GPT2TokenizerFast.from_pretrained("gpt2")
+ _ = GPT2TokenizerFast.from_pretrained("openai-community/gpt2")
# Under the mock environment we get a 500 error when trying to reach the tokenizer.
with mock.patch("requests.Session.request", return_value=response_mock) as mock_head:
- _ = GPT2TokenizerFast.from_pretrained("gpt2")
+ _ = GPT2TokenizerFast.from_pretrained("openai-community/gpt2")
# This check we did call the fake head request
mock_head.assert_called()
@@ -86,7 +86,7 @@ class TokenizerUtilTester(unittest.TestCase):
try:
tmp_file = tempfile.mktemp()
with open(tmp_file, "wb") as f:
- http_get("https://huggingface.co/albert-base-v1/resolve/main/spiece.model", f)
+ http_get("https://huggingface.co/albert/albert-base-v1/resolve/main/spiece.model", f)
_ = AlbertTokenizer.from_pretrained(tmp_file)
finally:
@@ -101,7 +101,7 @@ class TokenizerUtilTester(unittest.TestCase):
with open("tokenizer.json", "wb") as f:
http_get("https://huggingface.co/hf-internal-testing/tiny-random-bert/blob/main/tokenizer.json", f)
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-gpt2")
- # The tiny random BERT has a vocab size of 1024, tiny gpt2 as a vocab size of 1000
+ # The tiny random BERT has a vocab size of 1024, tiny gpt2 has a vocab size of 1000
self.assertEqual(tokenizer.vocab_size, 1000)
# Tokenizer should depend on the remote checkpoint, not the local tokenizer.json file.
@@ -110,7 +110,7 @@ class TokenizerUtilTester(unittest.TestCase):
def test_legacy_load_from_url(self):
# This test is for deprecated behavior and can be removed in v5
- _ = AlbertTokenizer.from_pretrained("https://huggingface.co/albert-base-v1/resolve/main/spiece.model")
+ _ = AlbertTokenizer.from_pretrained("https://huggingface.co/albert/albert-base-v1/resolve/main/spiece.model")
@is_staging_test
diff --git a/tests/tokenization/test_tokenization_fast.py b/tests/tokenization/test_tokenization_fast.py
index 48ac31b97c4..6e24009ecd0 100644
--- a/tests/tokenization/test_tokenization_fast.py
+++ b/tests/tokenization/test_tokenization_fast.py
@@ -132,7 +132,7 @@ class PreTrainedTokenizationFastTest(TokenizerTesterMixin, unittest.TestCase):
sentences = ["Hello, y'all!", "How are you 😁 ? There should not be any issue right?"]
- tokenizer = Tokenizer.from_pretrained("t5-base")
+ tokenizer = Tokenizer.from_pretrained("google-t5/t5-base")
# Enable padding
tokenizer.enable_padding(pad_id=0, pad_token="", length=512, pad_to_multiple_of=8)
self.assertEqual(
@@ -179,7 +179,7 @@ class PreTrainedTokenizationFastTest(TokenizerTesterMixin, unittest.TestCase):
@require_tokenizers
class TokenizerVersioningTest(unittest.TestCase):
def test_local_versioning(self):
- tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+ tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
json_tokenizer = json.loads(tokenizer._tokenizer.to_str())
json_tokenizer["model"]["vocab"]["huggingface"] = len(tokenizer)
diff --git a/tests/tokenization/test_tokenization_utils.py b/tests/tokenization/test_tokenization_utils.py
index 186fabb7aea..e5838dd4a32 100644
--- a/tests/tokenization/test_tokenization_utils.py
+++ b/tests/tokenization/test_tokenization_utils.py
@@ -91,8 +91,8 @@ class TokenizerUtilsTest(unittest.TestCase):
def test_batch_encoding_pickle(self):
import numpy as np
- tokenizer_p = BertTokenizer.from_pretrained("bert-base-cased")
- tokenizer_r = BertTokenizerFast.from_pretrained("bert-base-cased")
+ tokenizer_p = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
+ tokenizer_r = BertTokenizerFast.from_pretrained("google-bert/bert-base-cased")
# Python no tensor
with self.subTest("BatchEncoding (Python, return_tensors=None)"):
@@ -119,8 +119,8 @@ class TokenizerUtilsTest(unittest.TestCase):
def tf_array_equals(t1, t2):
return tf.reduce_all(tf.equal(t1, t2))
- tokenizer_p = BertTokenizer.from_pretrained("bert-base-cased")
- tokenizer_r = BertTokenizerFast.from_pretrained("bert-base-cased")
+ tokenizer_p = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
+ tokenizer_r = BertTokenizerFast.from_pretrained("google-bert/bert-base-cased")
with self.subTest("BatchEncoding (Python, return_tensors=TENSORFLOW)"):
self.assert_dump_and_restore(
@@ -137,8 +137,8 @@ class TokenizerUtilsTest(unittest.TestCase):
def test_batch_encoding_pickle_pt(self):
import torch
- tokenizer_p = BertTokenizer.from_pretrained("bert-base-cased")
- tokenizer_r = BertTokenizerFast.from_pretrained("bert-base-cased")
+ tokenizer_p = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
+ tokenizer_r = BertTokenizerFast.from_pretrained("google-bert/bert-base-cased")
with self.subTest("BatchEncoding (Python, return_tensors=PYTORCH)"):
self.assert_dump_and_restore(
@@ -152,8 +152,8 @@ class TokenizerUtilsTest(unittest.TestCase):
@require_tokenizers
def test_batch_encoding_is_fast(self):
- tokenizer_p = BertTokenizer.from_pretrained("bert-base-cased")
- tokenizer_r = BertTokenizerFast.from_pretrained("bert-base-cased")
+ tokenizer_p = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
+ tokenizer_r = BertTokenizerFast.from_pretrained("google-bert/bert-base-cased")
with self.subTest("Python Tokenizer"):
self.assertFalse(tokenizer_p("Small example to_encode").is_fast)
@@ -163,7 +163,7 @@ class TokenizerUtilsTest(unittest.TestCase):
@require_tokenizers
def test_batch_encoding_word_to_tokens(self):
- tokenizer_r = BertTokenizerFast.from_pretrained("bert-base-cased")
+ tokenizer_r = BertTokenizerFast.from_pretrained("google-bert/bert-base-cased")
encoded = tokenizer_r(["Test", "\xad", "test"], is_split_into_words=True)
self.assertEqual(encoded.word_to_tokens(0), TokenSpan(start=1, end=2))
@@ -235,7 +235,7 @@ class TokenizerUtilsTest(unittest.TestCase):
def test_padding_accepts_tensors(self):
features = [{"input_ids": np.array([0, 1, 2])}, {"input_ids": np.array([0, 1, 2, 3])}]
- tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
batch = tokenizer.pad(features, padding=True)
self.assertTrue(isinstance(batch["input_ids"], np.ndarray))
@@ -249,7 +249,7 @@ class TokenizerUtilsTest(unittest.TestCase):
import torch
features = [{"input_ids": torch.tensor([0, 1, 2])}, {"input_ids": torch.tensor([0, 1, 2, 3])}]
- tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
batch = tokenizer.pad(features, padding=True)
self.assertTrue(isinstance(batch["input_ids"], torch.Tensor))
@@ -263,7 +263,7 @@ class TokenizerUtilsTest(unittest.TestCase):
import tensorflow as tf
features = [{"input_ids": tf.constant([0, 1, 2])}, {"input_ids": tf.constant([0, 1, 2, 3])}]
- tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
batch = tokenizer.pad(features, padding=True)
self.assertTrue(isinstance(batch["input_ids"], tf.Tensor))
diff --git a/tests/trainer/test_trainer.py b/tests/trainer/test_trainer.py
index d53ec2d8180..b03423bde2a 100644
--- a/tests/trainer/test_trainer.py
+++ b/tests/trainer/test_trainer.py
@@ -1537,7 +1537,7 @@ class TrainerIntegrationTest(TestCasePlus, TrainerIntegrationCommon):
with tempfile.TemporaryDirectory() as tmpdir:
testargs = f"""
run_glue.py
- --model_name_or_path distilbert-base-uncased
+ --model_name_or_path distilbert/distilbert-base-uncased
--task_name mrpc
--do_train
--do_eval
@@ -1886,7 +1886,7 @@ class TrainerIntegrationTest(TestCasePlus, TrainerIntegrationCommon):
@slow
def test_trainer_eval_mrpc(self):
- MODEL_ID = "bert-base-cased-finetuned-mrpc"
+ MODEL_ID = "google-bert/bert-base-cased-finetuned-mrpc"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
data_args = GlueDataTrainingArguments(
@@ -1901,7 +1901,7 @@ class TrainerIntegrationTest(TestCasePlus, TrainerIntegrationCommon):
@slow
def test_trainer_eval_multiple(self):
- MODEL_ID = "gpt2"
+ MODEL_ID = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
dataset = LineByLineTextDataset(
@@ -1930,7 +1930,7 @@ class TrainerIntegrationTest(TestCasePlus, TrainerIntegrationCommon):
@slow
def test_trainer_eval_lm(self):
- MODEL_ID = "distilroberta-base"
+ MODEL_ID = "distilbert/distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
dataset = LineByLineTextDataset(
tokenizer=tokenizer,
@@ -2384,7 +2384,7 @@ class TrainerIntegrationTest(TestCasePlus, TrainerIntegrationCommon):
"launch",
script_path,
"--model_name_or_path",
- "t5-small",
+ "google-t5/t5-small",
"--per_device_train_batch_size",
"1",
"--output_dir",
diff --git a/tests/trainer/test_trainer_seq2seq.py b/tests/trainer/test_trainer_seq2seq.py
index 3f875e6d365..7a76ede3a55 100644
--- a/tests/trainer/test_trainer_seq2seq.py
+++ b/tests/trainer/test_trainer_seq2seq.py
@@ -35,7 +35,7 @@ class Seq2seqTrainerTester(TestCasePlus):
@require_torch
def test_finetune_bert2bert(self):
bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("prajjwal1/bert-tiny", "prajjwal1/bert-tiny")
- tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+ tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
bert2bert.config.vocab_size = bert2bert.config.encoder.vocab_size
bert2bert.config.eos_token_id = tokenizer.sep_token_id
@@ -144,11 +144,11 @@ class Seq2seqTrainerTester(TestCasePlus):
MAX_TARGET_LENGTH = 256
dataset = datasets.load_dataset("gsm8k", "main", split="train[:38]")
- model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
- tokenizer = T5Tokenizer.from_pretrained("t5-small")
+ model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small")
+ tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="pt", padding="longest")
gen_config = GenerationConfig.from_pretrained(
- "t5-small", max_length=None, min_length=None, max_new_tokens=256, min_new_tokens=1, num_beams=5
+ "google-t5/t5-small", max_length=None, min_length=None, max_new_tokens=256, min_new_tokens=1, num_beams=5
)
training_args = Seq2SeqTrainingArguments(".", predict_with_generate=True)
diff --git a/tests/utils/test_add_new_model_like.py b/tests/utils/test_add_new_model_like.py
index 61ccc184f55..b7eceb6e76c 100644
--- a/tests/utils/test_add_new_model_like.py
+++ b/tests/utils/test_add_new_model_like.py
@@ -228,7 +228,7 @@ class SomeClass:
)
def test_replace_model_patterns(self):
- bert_model_patterns = ModelPatterns("Bert", "bert-base-cased")
+ bert_model_patterns = ModelPatterns("Bert", "google-bert/bert-base-cased")
new_bert_model_patterns = ModelPatterns("New Bert", "huggingface/bert-new-base")
bert_test = '''class TFBertPreTrainedModel(PreTrainedModel):
"""
@@ -312,14 +312,14 @@ GPT_NEW_NEW_CONSTANT = "value"
# in others.
self.assertEqual(replacements, "")
- roberta_model_patterns = ModelPatterns("RoBERTa", "roberta-base", model_camel_cased="Roberta")
+ roberta_model_patterns = ModelPatterns("RoBERTa", "FacebookAI/roberta-base", model_camel_cased="Roberta")
new_roberta_model_patterns = ModelPatterns(
"RoBERTa-New", "huggingface/roberta-new-base", model_camel_cased="RobertaNew"
)
roberta_test = '''# Copied from transformers.models.bert.BertModel with Bert->Roberta
class RobertaModel(RobertaPreTrainedModel):
""" The base RoBERTa model. """
- checkpoint = roberta-base
+ checkpoint = FacebookAI/roberta-base
base_model_prefix = "roberta"
'''
roberta_expected = '''# Copied from transformers.models.bert.BertModel with Bert->RobertaNew
@@ -346,7 +346,7 @@ class RobertaNewModel(RobertaNewPreTrainedModel):
get_module_from_file("/models/gpt2/modeling_gpt2.py")
def test_duplicate_module(self):
- bert_model_patterns = ModelPatterns("Bert", "bert-base-cased")
+ bert_model_patterns = ModelPatterns("Bert", "google-bert/bert-base-cased")
new_bert_model_patterns = ModelPatterns("New Bert", "huggingface/bert-new-base")
bert_test = '''class TFBertPreTrainedModel(PreTrainedModel):
"""
@@ -395,7 +395,7 @@ NEW_BERT_CONSTANT = "value"
self.check_result(dest_file_name, bert_expected)
def test_duplicate_module_with_copied_from(self):
- bert_model_patterns = ModelPatterns("Bert", "bert-base-cased")
+ bert_model_patterns = ModelPatterns("Bert", "google-bert/bert-base-cased")
new_bert_model_patterns = ModelPatterns("New Bert", "huggingface/bert-new-base")
bert_test = '''# Copied from transformers.models.xxx.XxxModel with Xxx->Bert
class TFBertPreTrainedModel(PreTrainedModel):
@@ -656,7 +656,7 @@ NEW_BERT_CONSTANT = "value"
self.assertEqual(test_files, wav2vec2_test_files)
def test_find_base_model_checkpoint(self):
- self.assertEqual(find_base_model_checkpoint("bert"), "bert-base-uncased")
+ self.assertEqual(find_base_model_checkpoint("bert"), "google-bert/bert-base-uncased")
self.assertEqual(find_base_model_checkpoint("gpt2"), "gpt2")
def test_retrieve_model_classes(self):
@@ -719,7 +719,7 @@ NEW_BERT_CONSTANT = "value"
bert_model_patterns = bert_info["model_patterns"]
self.assertEqual(bert_model_patterns.model_name, "BERT")
- self.assertEqual(bert_model_patterns.checkpoint, "bert-base-uncased")
+ self.assertEqual(bert_model_patterns.checkpoint, "google-bert/bert-base-uncased")
self.assertEqual(bert_model_patterns.model_type, "bert")
self.assertEqual(bert_model_patterns.model_lower_cased, "bert")
self.assertEqual(bert_model_patterns.model_camel_cased, "Bert")
@@ -768,7 +768,7 @@ NEW_BERT_CONSTANT = "value"
bert_model_patterns = bert_info["model_patterns"]
self.assertEqual(bert_model_patterns.model_name, "BERT")
- self.assertEqual(bert_model_patterns.checkpoint, "bert-base-uncased")
+ self.assertEqual(bert_model_patterns.checkpoint, "google-bert/bert-base-uncased")
self.assertEqual(bert_model_patterns.model_type, "bert")
self.assertEqual(bert_model_patterns.model_lower_cased, "bert")
self.assertEqual(bert_model_patterns.model_camel_cased, "Bert")
diff --git a/tests/utils/test_hub_utils.py b/tests/utils/test_hub_utils.py
index dffc018e284..c1320baadda 100644
--- a/tests/utils/test_hub_utils.py
+++ b/tests/utils/test_hub_utils.py
@@ -105,7 +105,7 @@ class GetFromCacheTests(unittest.TestCase):
def test_get_file_from_repo_distant(self):
# `get_file_from_repo` returns None if the file does not exist
- self.assertIsNone(get_file_from_repo("bert-base-cased", "ahah.txt"))
+ self.assertIsNone(get_file_from_repo("google-bert/bert-base-cased", "ahah.txt"))
# The function raises if the repository does not exist.
with self.assertRaisesRegex(EnvironmentError, "is not a valid model identifier"):
@@ -113,9 +113,9 @@ class GetFromCacheTests(unittest.TestCase):
# The function raises if the revision does not exist.
with self.assertRaisesRegex(EnvironmentError, "is not a valid git identifier"):
- get_file_from_repo("bert-base-cased", CONFIG_NAME, revision="ahaha")
+ get_file_from_repo("google-bert/bert-base-cased", CONFIG_NAME, revision="ahaha")
- resolved_file = get_file_from_repo("bert-base-cased", CONFIG_NAME)
+ resolved_file = get_file_from_repo("google-bert/bert-base-cased", CONFIG_NAME)
# The name is the cached name which is not very easy to test, so instead we load the content.
config = json.loads(open(resolved_file, "r").read())
self.assertEqual(config["hidden_size"], 768)
diff --git a/utils/check_config_docstrings.py b/utils/check_config_docstrings.py
index 02ec510baba..8cb2c4e2fea 100644
--- a/utils/check_config_docstrings.py
+++ b/utils/check_config_docstrings.py
@@ -30,7 +30,7 @@ transformers = direct_transformers_import(PATH_TO_TRANSFORMERS)
CONFIG_MAPPING = transformers.models.auto.configuration_auto.CONFIG_MAPPING
# Regex pattern used to find the checkpoint mentioned in the docstring of `config_class`.
-# For example, `[bert-base-uncased](https://huggingface.co/bert-base-uncased)`
+# For example, `[google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)`
_re_checkpoint = re.compile(r"\[(.+?)\]\((https://huggingface\.co/.+?)\)")
@@ -55,7 +55,7 @@ def get_checkpoint_from_config_class(config_class):
checkpoints = _re_checkpoint.findall(config_source)
# Each `checkpoint` is a tuple of a checkpoint name and a checkpoint link.
- # For example, `('bert-base-uncased', 'https://huggingface.co/bert-base-uncased')`
+ # For example, `('google-bert/bert-base-uncased', 'https://huggingface.co/google-bert/bert-base-uncased')`
for ckpt_name, ckpt_link in checkpoints:
# allow the link to end with `/`
if ckpt_link.endswith("/"):