Added missing code in exemplary notebook - custom datasets fine-tuning (#15300)

* Added missing code in exemplary notebook - custom datasets fine-tuning

  Added missing code to the `tokenize_and_align_labels` function in the example notebook on custom datasets (token classification). The missing code adds labels for every token of a word except the first. The added code was taken directly from the official Hugging Face example, this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible
2025-08-01 02:31:11 +06:00 · 2022-01-25 23:26:17 +01:00
commit e79a0faeae
parent 0501beb846
1 changed file with 3 additions and 1 deletion
--- a/docs/source/custom_datasets.mdx
+++ b/docs/source/custom_datasets.mdx
@@ -326,7 +326,9 @@ def tokenize_and_align_labels(examples):
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
-
+            else:
+                label_ids.append(-100)
+            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
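
The patched loop can be tried in isolation with a small sketch like the one below. It is not the notebook's code: `align_labels`, `ignore_index`, and the sample `word_ids` / `word_labels` values are hypothetical names and data chosen for illustration. The `word_ids` list stands in for what a fast tokenizer would return for one example, and `-100` is used because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Align word-level labels to tokens, labelling only the first token of each word.

    word_ids: one entry per token; None for special tokens, otherwise the
              index of the word the token belongs to.
    word_labels: one label id per word.
    """
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens ([CLS], [SEP], padding) are ignored by the loss.
            label_ids.append(ignore_index)
        elif word_idx != previous_word_idx:
            # First token of a word carries the word's label.
            label_ids.append(word_labels[word_idx])
        else:
            # Remaining sub-tokens of the same word are ignored.
            label_ids.append(ignore_index)
        # Remember the current word so the next token can be compared to it.
        previous_word_idx = word_idx
    return label_ids


# Hypothetical example: three words, the second split into three sub-tokens.
word_ids = [None, 0, 1, 1, 1, 2, None]
word_labels = [3, 0, 7]
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 3, 0, -100, -100, 7, -100]
```

The result has one entry per token, which is what the `else` branch guarantees; without it, the label list would be shorter than the token sequence whenever a word is split into several sub-tokens.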