Added missing code in exemplary notebook - custom datasets fine-tuning (#15300)

* Added missing code in exemplary notebook - custom datasets fine-tuning

Added the missing code to the tokenize_and_align_labels function in the example notebook on custom datasets (token classification).
The missing code assigns the label -100 to every token of a word except the first.
The added code was taken directly from the official Hugging Face example, this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible
Maciej Pawłowski 2022-01-25 23:26:17 +01:00 committed by GitHub
parent 0501beb846
commit e79a0faeae

@@ -326,7 +326,9 @@ def tokenize_and_align_labels(examples):
                 label_ids.append(-100)
             elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                 label_ids.append(label[word_idx])
+            else:
+                label_ids.append(-100)
             previous_word_idx = word_idx
         labels.append(label_ids)
 
     tokenized_inputs["labels"] = labels
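
The effect of the added `else` branch can be seen in isolation with a minimal sketch. The helper name `align_labels` and the hand-written `word_ids` list are hypothetical stand-ins for the tokenizer output; they are not part of the notebook. The constant -100 matches the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Hypothetical helper mirroring the inner loop of tokenize_and_align_labels."""
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens have no source word.
            label_ids.append(IGNORE_INDEX)
        elif word_idx != previous_word_idx:
            # First token of a word carries the word's label.
            label_ids.append(word_labels[word_idx])
        else:
            # Remaining tokens of the same word are ignored (the branch this commit adds).
            label_ids.append(IGNORE_INDEX)
        previous_word_idx = word_idx
    return label_ids


# Assumed example: word 1 is split into three sub-word tokens.
word_ids = [None, 0, 1, 1, 1, 2, None]
word_labels = [3, 0, 7]
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 3, 0, -100, -100, 7, -100]
```

Without the `else` branch, nothing is appended for the second and third tokens of word 1, so the label list comes out shorter than the token sequence and the two no longer line up.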