Create concept guide section (#16369)

* ✨ create concept guide section
* 🖍 make fixup
* 🖍 apply feedback

Co-authored-by: Steven <stevhliu@gmail.com>
2025-07-31 02:02:21 +06:00 · 2022-03-25 12:51:43 -07:00 · 2022-03-25 12:51:43 -07:00 · b320d87ece
commit b320d87ece
parent ed2ee373d0
8 changed files with 113 additions and 815 deletions
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -5,10 +5,6 @@
    title: Quick tour
  - local: installation
    title: Installation
-  - local: philosophy
-    title: Philosophy
-  - local: glossary
-    title: Glossary
  title: Get started
 - sections:
  - local: pipeline_tutorial
@@ -17,30 +13,20 @@
    title: Load pretrained instances with an AutoClass
  - local: preprocessing
    title: Preprocess
-  - local: task_summary
-    title: Summary of the tasks
-  - local: model_summary
-    title: Summary of the models
  - local: training
    title: Fine-tune a pretrained model
  - local: accelerate
    title: Distributed training with 🤗 Accelerate
  - local: model_sharing
    title: Share a model
-  - local: tokenizer_summary
-    title: Summary of the tokenizers
-  - local: multilingual
-    title: Multi-lingual models
  title: Tutorials
 - sections:
+  - local: fast_tokenizers
+    title: "Use tokenizers from 🤗 Tokenizers"
  - local: create_a_model
-    title: Create a custom model
-  - local: multilingual
-    title: Inference for multilingual models
-  - local: troubleshooting
-    title: Troubleshoot
-  - local: custom_datasets
-    title: Fine-tuning with custom datasets
+    title: Create a custom architecture
+  - local: custom_models
+    title: Sharing custom models
  - sections:
    - local: tasks/sequence_classification
      title: Text classification
@@ -65,47 +51,59 @@
    title: Fine-tune for downstream tasks
  - local: run_scripts
    title: Train with a script
-  - local: notebooks
-    title: "🤗 Transformers Notebooks"
  - local: sagemaker
    title: Run training on Amazon SageMaker
-  - local: community
-    title: Community
+  - local: multilingual
+    title: Inference for multilingual models
  - local: converting_tensorflow_models
-    title: Converting Tensorflow Checkpoints
+    title: Converting TensorFlow Checkpoints
+  - local: serialization
+    title: Export 🤗 Transformers models
+  - local: performance
+    title: 'Performance and Scalability: How To Fit a Bigger Model and Train It Faster'
+  - local: parallelism
+    title: Model Parallelism
+  - local: benchmarks
+    title: Benchmarks
  - local: migration
    title: Migrating from previous packages
+  - local: troubleshooting
+    title: Troubleshoot
+  - local: debugging
+    title: Debugging
+  - local: notebooks
+    title: "🤗 Transformers Notebooks"
+  - local: community
+    title: Community
  - local: contributing
    title: How to contribute to transformers?
  - local: add_new_model
    title: "How to add a model to 🤗 Transformers?"
  - local: add_new_pipeline
    title: "How to add a pipeline to 🤗 Transformers?"
-  - local: fast_tokenizers
-    title: "Using tokenizers from 🤗 Tokenizers"
-  - local: performance
-    title: 'Performance and Scalability: How To Fit a Bigger Model and Train It Faster'
-  - local: parallelism
-    title: Model Parallelism
  - local: testing
    title: Testing
-  - local: debugging
-    title: Debugging
-  - local: serialization
-    title: Exporting 🤗 Transformers models
-  - local: custom_models
-    title: Sharing custom models
  - local: pr_checks
    title: Checks on a Pull Request
  title: How-to guides
 - sections:
+  - local: philosophy
+    title: Philosophy
+  - local: glossary
+    title: Glossary
+  - local: task_summary
+    title: Summary of the tasks
+  - local: model_summary
+    title: Summary of the models
+  - local: tokenizer_summary
+    title: Summary of the tokenizers
+  - local: pad_truncation
+    title: Padding and truncation
  - local: bertology
    title: BERTology
  - local: perplexity
    title: Perplexity of fixed-length models
-  - local: benchmarks
-    title: Benchmarks
-  title: Research
+  title: Conceptual guides
 - sections:
  - sections:
    - local: main_classes/callback
--- a/docs/source/create_a_model.mdx
+++ b/docs/source/create_a_model.mdx
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Create a custom model
+# Create a custom architecture

 An [`AutoClass`](model_doc/auto) automatically infers the model architecture and downloads pretrained configuration and weights. Generally, we recommend using an `AutoClass` to produce checkpoint-agnostic code. But users who want more control over specific model parameters can create a custom 🤗 Transformers model from just a few base classes. This could be particularly useful for anyone who is interested in studying, training or experimenting with a 🤗 Transformers model. In this guide, dive deeper into creating a custom model without an `AutoClass`. Learn how to:

--- a/docs/source/custom_datasets.mdx
+++ b/docs/source/custom_datasets.mdx
@@ -1,702 +0,0 @@
-<!--Copyright 2020 The HuggingFace Team. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-specific language governing permissions and limitations under the License.
-->
-
-# How to fine-tune a model for common downstream tasks
-
-[[open-in-colab]]
-
-This guide will show you how to fine-tune 🤗 Transformers models for common downstream tasks. You will use the 🤗
-Datasets library to quickly load and preprocess the datasets, getting them ready for training with PyTorch and
-TensorFlow.
-
-Before you begin, make sure you have the 🤗 Datasets library installed. For more detailed installation instructions,
-refer to the 🤗 Datasets [installation page](https://huggingface.co/docs/datasets/installation.html). All of the
-examples in this guide will use 🤗 Datasets to load and preprocess a dataset.
-
-```bash
-pip install datasets
-```
-
-Learn how to fine-tune a model for:
-
-- [seq_imdb](#seq_imdb)
-- [tok_ner](#tok_ner)
-- [qa_squad](#qa_squad)
-
-<a id='seq_imdb'></a>
-
-## Sequence classification with IMDb reviews
-
-Sequence classification refers to the task of classifying sequences of text according to a given number of classes. In
-this example, learn how to fine-tune a model on the [IMDb dataset](https://huggingface.co/datasets/imdb) to determine
-whether a review is positive or negative.
-
-<Tip>
-
-For a more in-depth example of how to fine-tune a model for text classification, take a look at the corresponding
-[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)
-or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
-
-</Tip>
-
-### Load IMDb dataset
-
-The 🤗 Datasets library makes it simple to load a dataset:
-
-```python
-from datasets import load_dataset
-
-imdb = load_dataset("imdb")
-```
-
-This loads a `DatasetDict` object which you can index into to view an example:
-
-```python
-imdb["train"][0]
-{
-    "label": 1,
-    "text": "Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as \"Teachers\". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is \"Teachers\". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!",
-}
-```
-
-### Preprocess
-
-The next step is to tokenize the text into a format the model can read. It is important to load the same tokenizer a
-model was trained with to ensure appropriately tokenized words. Load the DistilBERT tokenizer with the
-[`AutoTokenizer`] because we will eventually train a classifier using a pretrained [DistilBERT](https://huggingface.co/distilbert-base-uncased) model:
-
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
-```
-
-Now that you have instantiated a tokenizer, create a function that will tokenize the text. You should also truncate
-longer sequences in the text to be no longer than the model's maximum input length:
-
-```python
-def preprocess_function(examples):
-    return tokenizer(examples["text"], truncation=True)
-```
-
-Use 🤗 Datasets `map` function to apply the preprocessing function to the entire dataset. You can also set
-`batched=True` to apply the preprocessing function to multiple elements of the dataset at once for faster
-preprocessing:
-
-```python
-tokenized_imdb = imdb.map(preprocess_function, batched=True)
-```
-
-Lastly, pad your text so the sequences are a uniform length. While it is possible to pad your text in the `tokenizer` function
-by setting `padding=True`, it is more efficient to only pad the text to the length of the longest element in its
-batch. This is known as **dynamic padding**. You can do this with the [`DataCollatorWithPadding`] class:
-
-```python
-from transformers import DataCollatorWithPadding
-
-data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
-```
-
-### Fine-tune with the Trainer API
-
-Now load your model with the [`AutoModelForSequenceClassification`] class along with the number of expected labels:
-
-```python
-from transformers import AutoModelForSequenceClassification
-
-model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
-```
-
-At this point, only three steps remain:
-
-1. Define your training hyperparameters in [`TrainingArguments`].
-2. Pass the training arguments to a [`Trainer`] along with the model, dataset, tokenizer, and data collator.
-3. Call [`Trainer.train()`] to fine-tune your model.
-
-```python
-from transformers import TrainingArguments, Trainer
-
-training_args = TrainingArguments(
-    output_dir="./results",
-    learning_rate=2e-5,
-    per_device_train_batch_size=16,
-    per_device_eval_batch_size=16,
-    num_train_epochs=5,
-    weight_decay=0.01,
-)
-
-trainer = Trainer(
-    model=model,
-    args=training_args,
-    train_dataset=tokenized_imdb["train"],
-    eval_dataset=tokenized_imdb["test"],
-    tokenizer=tokenizer,
-    data_collator=data_collator,
-)
-
-trainer.train()
-```
-
-### Fine-tune with TensorFlow
-
-Fine-tuning with TensorFlow is just as easy, with only a few differences.
-
-Start by batching the processed examples together with dynamic padding using the [`DataCollatorWithPadding`] function.
-Make sure you set `return_tensors="tf"` to return `tf.Tensor` outputs instead of PyTorch tensors!
-
-```python
-from transformers import DataCollatorWithPadding
-
-data_collator = DataCollatorWithPadding(tokenizer, return_tensors="tf")
-```
-
-Next, convert your datasets to the `tf.data.Dataset` format with `to_tf_dataset`. Specify inputs and labels in the
-`columns` argument:
-
-```python
-tf_train_set = tokenized_imdb["train"].to_tf_dataset(
-    columns=["attention_mask", "input_ids", "label"],
-    shuffle=True,
-    batch_size=16,
-    collate_fn=data_collator,
-)
-
-tf_validation_set = tokenized_imdb["test"].to_tf_dataset(
-    columns=["attention_mask", "input_ids", "label"],
-    shuffle=False,
-    batch_size=16,
-    collate_fn=data_collator,
-)
-```
-
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
-
-```python
-from transformers import create_optimizer
-import tensorflow as tf
-
-batch_size = 16
-num_train_epochs = 5
-batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
-total_train_steps = int(batches_per_epoch * num_train_epochs)
-optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
-```
-
-Load your model with the [`TFAutoModelForSequenceClassification`] class along with the number of expected labels:
-
-```python
-from transformers import TFAutoModelForSequenceClassification
-
-model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
-```
-
-Compile the model:
-
-```python
-import tensorflow as tf
-
-model.compile(optimizer=optimizer)
-```
-
-Finally, fine-tune the model by calling `model.fit`:
-
-```python
-model.fit(
-    tf_train_set,
-    validation_data=tf_validation_set,
-    epochs=num_train_epochs,
-)
-```
-
-<a id='tok_ner'></a>
-
-## Token classification with WNUT emerging entities
-
-Token classification refers to the task of classifying individual tokens in a sentence. One of the most common token
-classification tasks is Named Entity Recognition (NER). NER attempts to find a label for each entity in a sentence,
-such as a person, location, or organization. In this example, learn how to fine-tune a model on the [WNUT 17](https://huggingface.co/datasets/wnut_17) dataset to detect new entities.
-
-<Tip>
-
-For a more in-depth example of how to fine-tune a model for token classification, take a look at the corresponding
-[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)
-or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
-
-</Tip>
-
-### Load WNUT 17 dataset
-
-Load the WNUT 17 dataset from the 🤗 Datasets library:
-
-```python
->>> from datasets import load_dataset
-
->>> wnut = load_dataset("wnut_17")
-```
-
-A quick look at the dataset shows the labels associated with each word in the sentence:
-
-```python
->>> wnut["train"][0]
-{'id': '0',
- 'ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 8, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0],
- 'tokens': ['@paulwalk', 'It', "'s", 'the', 'view', 'from', 'where', 'I', "'m", 'living', 'for', 'two', 'weeks', '.', 'Empire', 'State', 'Building', '=', 'ESB', '.', 'Pretty', 'bad', 'storm', 'here', 'last', 'evening', '.']
-}
-```
-
-View the specific NER tags by:
-
-```python
->>> label_list = wnut["train"].features[f"ner_tags"].feature.names
->>> label_list
-[
-    "O",
-    "B-corporation",
-    "I-corporation",
-    "B-creative-work",
-    "I-creative-work",
-    "B-group",
-    "I-group",
-    "B-location",
-    "I-location",
-    "B-person",
-    "I-person",
-    "B-product",
-    "I-product",
-]
-```
-
-A letter prefixes each NER tag, which can mean:
-
-- `B-` indicates the beginning of an entity.
-- `I-` indicates a token is contained inside the same entity (e.g., the `State` token is a part of an entity like
-  `Empire State Building`).
-- `O` indicates the token doesn't correspond to any entity.
-
-### Preprocess
-
-Now you need to tokenize the text. Load the DistilBERT tokenizer with an [`AutoTokenizer`]:
-
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
-```
-
-Since the input has already been split into words, set `is_split_into_words=True` to tokenize the words into
-subwords:
-
-```python
->>> tokenized_input = tokenizer(wnut["train"][0]["tokens"], is_split_into_words=True)
->>> tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
->>> tokens
-['[CLS]', '@', 'paul', '##walk', 'it', "'", 's', 'the', 'view', 'from', 'where', 'i', "'", 'm', 'living', 'for', 'two', 'weeks', '.', 'empire', 'state', 'building', '=', 'es', '##b', '.', 'pretty', 'bad', 'storm', 'here', 'last', 'evening', '.', '[SEP]']
-```
-
-The addition of the special tokens `[CLS]` and `[SEP]` and subword tokenization creates a mismatch between the
-input and labels. Realign the labels and tokens by:
-
-1. Mapping all tokens to their corresponding word with the `word_ids` method.
-2. Assigning the label `-100` to the special tokens `[CLS]` and `[SEP]` so the PyTorch loss function ignores
-   them.
-3. Only labeling the first token of a given word. Assign `-100` to the other subtokens from the same word.
-
-Here is how you can create a function that will realign the labels and tokens:
-
-```python
-def tokenize_and_align_labels(examples):
-    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
-
-    labels = []
-    for i, label in enumerate(examples[f"ner_tags"]):
-        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
-        previous_word_idx = None
-        label_ids = []
-        for word_idx in word_ids:  # Set the special tokens to -100.
-            if word_idx is None:
-                label_ids.append(-100)
-            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
-                label_ids.append(label[word_idx])
-            else:
-                label_ids.append(-100)
-            previous_word_idx = word_idx
-        labels.append(label_ids)
-
-    tokenized_inputs["labels"] = labels
-    return tokenized_inputs
-```
-
-Now tokenize and align the labels over the entire dataset with 🤗 Datasets `map` function:
-
-```python
-tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
-```
-
-Finally, pad your text and labels, so they are a uniform length:
-
-```python
-from transformers import DataCollatorForTokenClassification
-
-data_collator = DataCollatorForTokenClassification(tokenizer)
-```
-
-### Fine-tune with the Trainer API
-
-Load your model with the [`AutoModelForTokenClassification`] class along with the number of expected labels:
-
-```python
-from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
-
-model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
-```
-
-Gather your training arguments in [`TrainingArguments`]:
-
-```python
-training_args = TrainingArguments(
-    output_dir="./results",
-    evaluation_strategy="epoch",
-    learning_rate=2e-5,
-    per_device_train_batch_size=16,
-    per_device_eval_batch_size=16,
-    num_train_epochs=3,
-    weight_decay=0.01,
-)
-```
-
-Collect your model, training arguments, dataset, data collator, and tokenizer in [`Trainer`]:
-
-```python
-trainer = Trainer(
-    model=model,
-    args=training_args,
-    train_dataset=tokenized_wnut["train"],
-    eval_dataset=tokenized_wnut["test"],
-    data_collator=data_collator,
-    tokenizer=tokenizer,
-)
-```
-
-Fine-tune your model:
-
-```python
-trainer.train()
-```
-
-### Fine-tune with TensorFlow
-
-Batch your examples together and pad your text and labels, so they are a uniform length:
-
-```python
-from transformers import DataCollatorForTokenClassification
-
-data_collator = DataCollatorForTokenClassification(tokenizer, return_tensors="tf")
-```
-
-Convert your datasets to the `tf.data.Dataset` format with `to_tf_dataset`:
-
-```python
-tf_train_set = tokenized_wnut["train"].to_tf_dataset(
-    columns=["attention_mask", "input_ids", "labels"],
-    shuffle=True,
-    batch_size=16,
-    collate_fn=data_collator,
-)
-
-tf_validation_set = tokenized_wnut["validation"].to_tf_dataset(
-    columns=["attention_mask", "input_ids", "labels"],
-    shuffle=False,
-    batch_size=16,
-    collate_fn=data_collator,
-)
-```
-
-Load the model with the [`TFAutoModelForTokenClassification`] class along with the number of expected labels:
-
-```python
-from transformers import TFAutoModelForTokenClassification
-
-model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
-```
-
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
-
-```python
-from transformers import create_optimizer
-
-batch_size = 16
-num_train_epochs = 3
-num_train_steps = (len(tokenized_wnut["train"]) // batch_size) * num_train_epochs
-optimizer, lr_schedule = create_optimizer(
-    init_lr=2e-5,
-    num_train_steps=num_train_steps,
-    weight_decay_rate=0.01,
-    num_warmup_steps=0,
-)
-```
-
-Compile the model:
-
-```python
-import tensorflow as tf
-
-model.compile(optimizer=optimizer)
-```
-
-Call `model.fit` to fine-tune your model:
-
-```python
-model.fit(
-    tf_train_set,
-    validation_data=tf_validation_set,
-    epochs=num_train_epochs,
-)
-```
-
-<a id='qa_squad'></a>
-
-## Question Answering with SQuAD
-
-There are many types of question answering (QA) tasks. Extractive QA focuses on identifying the answer from the text
-given a question. In this example, learn how to fine-tune a model on the [SQuAD](https://huggingface.co/datasets/squad) dataset.
-
-<Tip>
-
-For a more in-depth example of how to fine-tune a model for question answering, take a look at the corresponding
-[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)
-or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
-
-</Tip>
-
-### Load SQuAD dataset
-
-Load the SQuAD dataset from the 🤗 Datasets library:
-
-```python
-from datasets import load_dataset
-
-squad = load_dataset("squad")
-```
-
-Take a look at an example from the dataset:
-
-```python
->>> squad["train"][0]
-{'answers': {'answer_start': [515], 'text': ['Saint Bernadette Soubirous']},
- 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
- 'id': '5733be284776f41900661182',
- 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
- 'title': 'University_of_Notre_Dame'
-}
-```
-
-### Preprocess
-
-Load the DistilBERT tokenizer with an [`AutoTokenizer`]:
-
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
-```
-
-There are a few things to be aware of when preprocessing text for question answering:
-
-1. Some examples in a dataset may have a very long `context` that exceeds the maximum input length of the model. You
-   can deal with this by truncating only the `context` with `truncation="only_second"`.
-2. Next, you need to map the start and end positions of the answer to the original context. Set
-   `return_offsets_mapping=True` to handle this.
-3. With the mapping in hand, you can find the start and end tokens of the answer. Use the `sequence_ids` method to
-   find which part of the offset corresponds to the question, and which part of the offset corresponds to the context.
-
-Assemble everything in a preprocessing function as shown below:
-
-```python
-def preprocess_function(examples):
-    questions = [q.strip() for q in examples["question"]]
-    inputs = tokenizer(
-        questions,
-        examples["context"],
-        max_length=384,
-        truncation="only_second",
-        return_offsets_mapping=True,
-        padding="max_length",
-    )
-
-    offset_mapping = inputs.pop("offset_mapping")
-    answers = examples["answers"]
-    start_positions = []
-    end_positions = []
-
-    for i, offset in enumerate(offset_mapping):
-        answer = answers[i]
-        start_char = answer["answer_start"][0]
-        end_char = answer["answer_start"][0] + len(answer["text"][0])
-        sequence_ids = inputs.sequence_ids(i)
-
-        # Find the start and end of the context
-        idx = 0
-        while sequence_ids[idx] != 1:
-            idx += 1
-        context_start = idx
-        while sequence_ids[idx] == 1:
-            idx += 1
-        context_end = idx - 1
-
-        # If the answer is not fully inside the context, label it (0, 0)
-        if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
-            start_positions.append(0)
-            end_positions.append(0)
-        else:
-            # Otherwise it's the start and end token positions
-            idx = context_start
-            while idx <= context_end and offset[idx][0] <= start_char:
-                idx += 1
-            start_positions.append(idx - 1)
-
-            idx = context_end
-            while idx >= context_start and offset[idx][1] >= end_char:
-                idx -= 1
-            end_positions.append(idx + 1)
-
-    inputs["start_positions"] = start_positions
-    inputs["end_positions"] = end_positions
-    return inputs
-```
-
-Apply the preprocessing function over the entire dataset with 🤗 Datasets `map` function:
-
-```python
-tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names)
-```
-
-Batch the processed examples together:
-
-```python
-from transformers import default_data_collator
-
-data_collator = default_data_collator
-```
-
-### Fine-tune with the Trainer API
-
-Load your model with the [`AutoModelForQuestionAnswering`] class:
-
-```python
-from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
-
-model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
-```
-
-Gather your training arguments in [`TrainingArguments`]:
-
-```python
-training_args = TrainingArguments(
-    output_dir="./results",
-    evaluation_strategy="epoch",
-    learning_rate=2e-5,
-    per_device_train_batch_size=16,
-    per_device_eval_batch_size=16,
-    num_train_epochs=3,
-    weight_decay=0.01,
-)
-```
-
-Collect your model, training arguments, dataset, data collator, and tokenizer in [`Trainer`]:
-
-```python
-trainer = Trainer(
-    model=model,
-    args=training_args,
-    train_dataset=tokenized_squad["train"],
-    eval_dataset=tokenized_squad["validation"],
-    data_collator=data_collator,
-    tokenizer=tokenizer,
-)
-```
-
-Fine-tune your model:
-
-```python
-trainer.train()
-```
-
-### Fine-tune with TensorFlow
-
-Batch the processed examples together with a TensorFlow default data collator:
-
-```python
-from transformers.data.data_collator import tf_default_data_collator
-
-data_collator = tf_default_data_collator
-```
-
-Convert your datasets to the `tf.data.Dataset` format with the `to_tf_dataset` function:
-
-```python
-tf_train_set = tokenized_squad["train"].to_tf_dataset(
-    columns=["attention_mask", "input_ids", "start_positions", "end_positions"],
-    dummy_labels=True,
-    shuffle=True,
-    batch_size=16,
-    collate_fn=data_collator,
-)
-
-tf_validation_set = tokenized_squad["validation"].to_tf_dataset(
-    columns=["attention_mask", "input_ids", "start_positions", "end_positions"],
-    dummy_labels=True,
-    shuffle=False,
-    batch_size=16,
-    collate_fn=data_collator,
-)
-```
-
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
-
-```python
-from transformers import create_optimizer
-
-batch_size = 16
-num_epochs = 2
-total_train_steps = (len(tokenized_squad["train"]) // batch_size) * num_epochs
-optimizer, schedule = create_optimizer(
-    init_lr=2e-5,
-    num_warmup_steps=0,
-    num_train_steps=total_train_steps,
-)
-```
-
-Load your model with the [`TFAutoModelForQuestionAnswering`] class:
-
-```python
-from transformers import TFAutoModelForQuestionAnswering
-
-model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
-```
-
-Compile the model:
-
-```python
-import tensorflow as tf
-
-model.compile(optimizer=optimizer)
-```
-
-Call `model.fit` to fine-tune the model:
-
-```python
-model.fit(
-    tf_train_set,
-    validation_data=tf_validation_set,
-    epochs=num_epochs,
-)
-```
--- a/docs/source/fast_tokenizers.mdx
+++ b/docs/source/fast_tokenizers.mdx
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Using tokenizers from 🤗 Tokenizers
+# Use tokenizers from 🤗 Tokenizers

 The [`PreTrainedTokenizerFast`] depends on the [🤗 Tokenizers](https://huggingface.co/docs/tokenizers) library. The tokenizers obtained from the 🤗 Tokenizers library can be
 loaded very simply into 🤗 Transformers.
--- a/docs/source/index.mdx
+++ b/docs/source/index.mdx
@@ -35,20 +35,17 @@ Each 🤗 Transformers architecture is defined in a standalone Python module so

 The documentation is organized in five parts:

-- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
-  and a glossary.
-- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
-- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
-- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
-  transformers model
-- **API** contains the documentation of each public class and function, grouped in:
+- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
+- **TUTORIALS** are a great place to begin if you are new to our library. This section will help you gain the basic skills you need to start using 🤗 Transformers.
+- **HOW-TO GUIDES** will show you how to achieve a specific goal like fine-tuning a pretrained model for language modeling or how to create a custom model head.
+- **CONCEPTUAL GUIDES** provide more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
+- **API** describes each class and function, grouped in:

  - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
  - **MODELS** for the classes and functions related to each model implemented in the library.
  - **INTERNAL HELPERS** for the classes and functions we use internally.

-The library currently contains Jax, PyTorch and Tensorflow implementations, pretrained model weights, usage scripts and
-conversion utilities for the following models.
+The library currently contains JAX, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and conversion utilities for the following models.

 ### Supported models

--- a/docs/source/pad_truncation.mdx
+++ b/docs/source/pad_truncation.mdx
@@ -0,0 +1,66 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Padding and truncation
+
+Batched inputs are often different lengths, so they can't be converted to fixed-size tensors. Padding and truncation are strategies for dealing with this problem, to create rectangular tensors from batches of varying lengths. Padding adds a special **padding token** to ensure shorter sequences will have the same length as either the longest sequence in a batch or the maximum length accepted by the model. Truncation works in the other direction by truncating long sequences.
+
+In most cases, padding your batch to the length of the longest sequence and truncating to the maximum length a model can accept works well. However, the API supports more strategies if you need them. The three arguments you need to know are `padding`, `truncation` and `max_length`.
+
+The `padding` argument controls padding. It can be a boolean or a string:
+
+  - `True` or `'longest'`: pad to the longest sequence in the batch (no padding is applied if you only provide
+    a single sequence).
+  - `'max_length'`: pad to a length specified by the `max_length` argument or the maximum length accepted
+    by the model if no `max_length` is provided (`max_length=None`). Padding will still be applied if you only provide a single sequence.
+  - `False` or `'do_not_pad'`: no padding is applied. This is the default behavior.
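The following plain-Python sketch shows how each `padding` option picks a target length. It is not the tokenizer's implementation: `pad` is a hypothetical function, `model_max_length` is a stand-in parameter for the model's limit, and `pad_id=0` is an assumed padding token ID.

```python
from typing import List, Optional, Union


def pad(
    batch: List[List[int]],
    padding: Union[bool, str] = False,
    max_length: Optional[int] = None,
    model_max_length: Optional[int] = None,
    pad_id: int = 0,
) -> List[List[int]]:
    """Pad token ID lists according to a padding strategy (sketch, not the library code)."""
    if padding is True or padding == "longest":
        # A lone sequence is already the longest, so it receives no padding.
        target = max(len(seq) for seq in batch)
    elif padding == "max_length":
        # Fall back to the model limit; if neither is known, padding is skipped.
        target = max_length if max_length is not None else model_max_length
    elif padding is False or padding == "do_not_pad":
        target = None
    else:
        raise ValueError(f"Unknown padding strategy: {padding!r}")

    if target is None:
        return [list(seq) for seq in batch]
    # Padding only lengthens; sequences already longer than target are left as-is.
    return [list(seq) + [pad_id] * max(0, target - len(seq)) for seq in batch]


batch = [[7, 12, 3], [4, 9]]
print(pad(batch, padding=True))                           # [[7, 12, 3], [4, 9, 0]]
print(pad([[4, 9]], padding="longest"))                   # [[4, 9]]
print(pad([[4, 9]], padding="max_length", max_length=4))  # [[4, 9, 0, 0]]
```

Note that padding never shortens a sequence; that is the job of the `truncation` argument.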
+
+The `truncation` argument controls truncation. It can be a boolean or a string:
+
+  - `True` or `'longest_first'`: truncate to a maximum length specified by the `max_length` argument or
+    the maximum length accepted by the model if no `max_length` is provided (`max_length=None`). This will
+    truncate token by token, removing a token from the longest sequence in the pair until the proper length is
+    reached.
+  - `'only_second'`: truncate to a maximum length specified by the `max_length` argument or the maximum
+    length accepted by the model if no `max_length` is provided (`max_length=None`). This will only truncate
+    the second sentence of a pair if a pair of sequences (or a batch of pairs of sequences) is provided.
+  - `'only_first'`: truncate to a maximum length specified by the `max_length` argument or the maximum
+    length accepted by the model if no `max_length` is provided (`max_length=None`). This will only truncate
+    the first sentence of a pair if a pair of sequences (or a batch of pairs of sequences) is provided.
+  - `False` or `'do_not_truncate'`: no truncation is applied. This is the default behavior.
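To make the difference between the truncation strategies concrete, here is a minimal plain-Python sketch for a pair of sequences. The function `truncate_pair` is hypothetical: it ignores special tokens, breaks ties in `'longest_first'` by trimming the second sequence, and raises an error when `'only_first'` or `'only_second'` would have to empty the selected sequence. The library's exact behavior in these edge cases may differ.

```python
from typing import List, Optional, Tuple, Union


def truncate_pair(
    first: List[int],
    second: List[int],
    truncation: Union[bool, str] = True,
    max_length: Optional[int] = None,
    model_max_length: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Truncate a pair of token ID lists so their combined length fits the limit (sketch)."""
    first, second = list(first), list(second)
    limit = max_length if max_length is not None else model_max_length
    if truncation is False or truncation == "do_not_truncate" or limit is None:
        return first, second

    excess = len(first) + len(second) - limit
    if excess <= 0:
        return first, second

    if truncation is True or truncation == "longest_first":
        # Remove one token at a time from whichever sequence is currently longer.
        for _ in range(excess):
            if len(first) > len(second):
                first.pop()
            else:
                second.pop()
    elif truncation in ("only_first", "only_second"):
        target = first if truncation == "only_first" else second
        if excess >= len(target):
            raise ValueError("The selected sequence is too short to absorb the truncation.")
        del target[-excess:]
    else:
        raise ValueError(f"Unknown truncation strategy: {truncation!r}")
    return first, second


a, b = [1, 2, 3, 4, 5, 6], [7, 8, 9]
print(truncate_pair(a, b, True, max_length=4))           # ([1, 2], [7, 8])
print(truncate_pair(a, b, "only_first", max_length=6))   # ([1, 2, 3], [7, 8, 9])
print(truncate_pair(a, b, "only_second", max_length=7))  # ([1, 2, 3, 4, 5, 6], [7])
```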
+
+The `max_length` argument controls the length of the padding and truncation. It can be an integer or `None`, in which case it will default to the maximum length the model can accept. If the model has no specific maximum input length, truncation or padding to `max_length` is deactivated.
+
+The following table summarizes the recommended way to set up padding and truncation. If you use pairs of input sequences in any of the following examples, you can replace `truncation=True` with a `STRATEGY` selected from
+`['only_first', 'only_second', 'longest_first']`, e.g. `truncation='only_second'` or `truncation='longest_first'`, to control how both sequences in the pair are truncated, as detailed above.
+
+| Truncation                           | Padding                           | Instruction                                                                                 |
+|--------------------------------------|-----------------------------------|---------------------------------------------------------------------------------------------|
+| no truncation                        | no padding                        | `tokenizer(batch_sentences)`                                                           |
+|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True)` or                                          |
+|                                      |                                   | `tokenizer(batch_sentences, padding='longest')`                                        |
+|                                      | padding to max model input length | `tokenizer(batch_sentences, padding='max_length')`                                     |
+|                                      | padding to specific length        | `tokenizer(batch_sentences, padding='max_length', max_length=42)`                      |
+| truncation to max model input length | no padding                        | `tokenizer(batch_sentences, truncation=True)` or                                       |
+|                                      |                                   | `tokenizer(batch_sentences, truncation=STRATEGY)`                                      |
+|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True, truncation=True)` or                         |
+|                                      |                                   | `tokenizer(batch_sentences, padding=True, truncation=STRATEGY)`                        |
+|                                      | padding to max model input length | `tokenizer(batch_sentences, padding='max_length', truncation=True)` or                 |
+|                                      |                                   | `tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY)`                |
+|                                      | padding to specific length        | Not possible                                                                                |
+| truncation to specific length        | no padding                        | `tokenizer(batch_sentences, truncation=True, max_length=42)` or                        |
+|                                      |                                   | `tokenizer(batch_sentences, truncation=STRATEGY, max_length=42)`                       |
+|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True, truncation=True, max_length=42)` or          |
+|                                      |                                   | `tokenizer(batch_sentences, padding=True, truncation=STRATEGY, max_length=42)`         |
+|                                      | padding to max model input length | Not possible                                                                                |
+|                                      | padding to specific length        | `tokenizer(batch_sentences, padding='max_length', truncation=True, max_length=42)` or  |
+|                                      |                                   | `tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY, max_length=42)` |
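The two "Not possible" cells follow from the fact that padding and truncation share a single `max_length` argument. The hypothetical helper below encodes the table as a function that returns the keyword arguments for `tokenizer(batch_sentences, **kwargs)` and raises an error for the two impossible combinations. The names `tokenizer_kwargs`, `truncate_to`, `pad_to` and their string values are illustrative assumptions, not part of the 🤗 Transformers API.

```python
from typing import Optional, Union


def tokenizer_kwargs(
    truncate_to: Optional[str] = None,  # None, "model_max" or "specific"
    pad_to: Optional[str] = None,       # None, "longest", "model_max" or "specific"
    length: int = 42,
    strategy: Union[bool, str] = True,  # True or one of the pair strategies
) -> dict:
    """Map a cell of the table above to tokenizer keyword arguments (hypothetical helper)."""
    kwargs = {}
    if truncate_to is not None:
        kwargs["truncation"] = strategy
    if pad_to == "longest":
        kwargs["padding"] = True
    elif pad_to in ("model_max", "specific"):
        kwargs["padding"] = "max_length"

    # A single max_length drives both padding and truncation, so their targets must agree.
    targets = {truncate_to, pad_to}
    if "specific" in targets and "model_max" in targets:
        raise ValueError("Not possible: padding and truncation share one max_length")
    if "specific" in targets:
        kwargs["max_length"] = length
    return kwargs


print(tokenizer_kwargs(truncate_to="model_max", pad_to="longest"))  # {'truncation': True, 'padding': True}
print(tokenizer_kwargs(pad_to="specific"))  # {'padding': 'max_length', 'max_length': 42}
```

With a loaded tokenizer, the result would be unpacked into the call, for example `tokenizer(batch_sentences, **tokenizer_kwargs(truncate_to="specific", pad_to="longest"))`.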
--- a/docs/source/preprocessing.mdx
+++ b/docs/source/preprocessing.mdx
@ -494,65 +494,4 @@ A processor combines a feature extractor and tokenizer. Load a processor with [`

 Notice the processor has added `input_values` and `labels`. The sampling rate has also been correctly downsampled to 16kHz.

-Awesome, you should now be able to preprocess data for any modality and even combine different modalities! In the next tutorial, learn how to fine-tune a model on your newly preprocessed data.
-
-## Everything you always wanted to know about padding and truncation
-
-We have seen the commands that will work for most cases (pad your batch to the length of the maximum sentence and
-truncate to the maximum length the model can accept). However, the API supports more strategies if you need them. The
-three arguments you need to know for this are `padding`, `truncation` and `max_length`.
-
- `padding` controls the padding. It can be a boolean or a string which should be:
-
-  - `True` or `'longest'` to pad to the longest sequence in the batch (doing no padding if you only provide
-    a single sequence).
-  - `'max_length'` to pad to a length specified by the `max_length` argument or the maximum length accepted
-    by the model if no `max_length` is provided (`max_length=None`). If you only provide a single sequence,
-    padding will still be applied to it.
-  - `False` or `'do_not_pad'` to not pad the sequences. As we have seen before, this is the default
-    behavior.
-
- `truncation` controls the truncation. It can be a boolean or a string which should be:
-
-  - `True` or `'longest_first'` truncate to a maximum length specified by the `max_length` argument or
-    the maximum length accepted by the model if no `max_length` is provided (`max_length=None`). This will
-    truncate token by token, removing a token from the longest sequence in the pair until the proper length is
-    reached.
-  - `'only_second'` truncate to a maximum length specified by the `max_length` argument or the maximum
-    length accepted by the model if no `max_length` is provided (`max_length=None`). This will only truncate
-    the second sentence of a pair if a pair of sequence (or a batch of pairs of sequences) is provided.
-  - `'only_first'` truncate to a maximum length specified by the `max_length` argument or the maximum
-    length accepted by the model if no `max_length` is provided (`max_length=None`). This will only truncate
-    the first sentence of a pair if a pair of sequence (or a batch of pairs of sequences) is provided.
-  - `False` or `'do_not_truncate'` to not truncate the sequences. As we have seen before, this is the
-    default behavior.
-
- `max_length` to control the length of the padding/truncation. It can be an integer or `None`, in which case
-  it will default to the maximum length the model can accept. If the model has no specific maximum input length,
-  truncation/padding to `max_length` is deactivated.
-
-Here is a table summarizing the recommend way to setup padding and truncation. If you use pair of inputs sequence in
-any of the following examples, you can replace `truncation=True` by a `STRATEGY` selected in
-`['only_first', 'only_second', 'longest_first']`, i.e. `truncation='only_second'` or `truncation= 'longest_first'` to control how both sequence in the pair are truncated as detailed before.
-
-| Truncation                           | Padding                           | Instruction                                                                                 |
-|--------------------------------------|-----------------------------------|---------------------------------------------------------------------------------------------|
-| no truncation                        | no padding                        | `tokenizer(batch_sentences)`                                                           |
-|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True)` or                                          |
-|                                      |                                   | `tokenizer(batch_sentences, padding='longest')`                                        |
-|                                      | padding to max model input length | `tokenizer(batch_sentences, padding='max_length')`                                     |
-|                                      | padding to specific length        | `tokenizer(batch_sentences, padding='max_length', max_length=42)`                      |
-| truncation to max model input length | no padding                        | `tokenizer(batch_sentences, truncation=True)` or                                       |
-|                                      |                                   | `tokenizer(batch_sentences, truncation=STRATEGY)`                                      |
-|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True, truncation=True)` or                         |
-|                                      |                                   | `tokenizer(batch_sentences, padding=True, truncation=STRATEGY)`                        |
-|                                      | padding to max model input length | `tokenizer(batch_sentences, padding='max_length', truncation=True)` or                 |
-|                                      |                                   | `tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY)`                |
-|                                      | padding to specific length        | Not possible                                                                                |
-| truncation to specific length        | no padding                        | `tokenizer(batch_sentences, truncation=True, max_length=42)` or                        |
-|                                      |                                   | `tokenizer(batch_sentences, truncation=STRATEGY, max_length=42)`                       |
-|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True, truncation=True, max_length=42)` or          |
-|                                      |                                   | `tokenizer(batch_sentences, padding=True, truncation=STRATEGY, max_length=42)`         |
-|                                      | padding to max model input length | Not possible                                                                                |
-|                                      | padding to specific length        | `tokenizer(batch_sentences, padding='max_length', truncation=True, max_length=42)` or  |
-|                                      |                                   | `tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY, max_length=42)` |
+Awesome, you should now be able to preprocess data for any modality and even combine different modalities! In the next tutorial, learn how to fine-tune a model on your newly preprocessed data.
--- a/docs/source/serialization.mdx
+++ b/docs/source/serialization.mdx
@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Exporting 🤗 Transformers Models
+# Export 🤗 Transformers Models

 If you need to deploy 🤗 Transformers models in production environments, we
 recommend exporting them to a serialized format that can be loaded and executed