# Glossary

This glossary defines general machine learning and 🀗 Transformers terms to help you better understand the documentation.

## A

### attention mask

The attention mask is an optional argument used when batching sequences together. This argument indicates to the model which tokens should be attended to, and which should not.

For example, consider these two sequences:
```python
>>> from transformers import BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")

>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."

>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
```
The encoded versions have different lengths:
```python
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
```
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length of the second one, or the second one needs to be truncated down to the length of the first one.

In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask it to pad like this:
```python
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
```
We can see that 0s have been added to make the first sentence the same length as the second one:
```python
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
```
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating the position of the padded indices so that the model does not attend to them. For [`BertTokenizer`], `1` indicates a value that should be attended to, while `0` indicates a padded value. This attention mask is in the dictionary returned by the tokenizer under the key "attention_mask":
```python
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```
### autoencoding models

See encoder models and masked language modeling.

### autoregressive models

See causal language modeling and decoder models.
## B

### backbone

The backbone is the network (embeddings and layers) that outputs the raw hidden states or features. It is usually connected to a head, which accepts the features as its input to make a prediction. For example, [`ViTModel`] is a backbone without a specific head on top. Other models can also use [`ViTModel`] as a backbone, such as DPT.
## C

### causal language modeling

A pretraining task where the model reads the texts in order and has to predict the next word. It is usually done by reading the whole sentence, but using a mask inside the model to hide the future tokens at a certain timestep.

### channel

Color images are made up of some combination of values in three channels (red, green, and blue: RGB), while grayscale images only have one channel. In 🀗 Transformers, the channel can be the first or last dimension of an image's tensor: [`n_channels`, `height`, `width`] or [`height`, `width`, `n_channels`].
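As a minimal sketch (assuming PyTorch is available; the sizes below are illustrative), the two layouts are just permutations of the same data:

```python
import torch

# A channels-first image tensor: [n_channels, height, width]
image_channels_first = torch.rand(3, 224, 224)

# The same pixels laid out channels-last: [height, width, n_channels]
image_channels_last = image_channels_first.permute(1, 2, 0)

print(image_channels_first.shape)  # torch.Size([3, 224, 224])
print(image_channels_last.shape)   # torch.Size([224, 224, 3])
```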
### connectionist temporal classification (CTC)

An algorithm which allows a model to learn without knowing exactly how the input and output are aligned; CTC calculates the distribution of all possible outputs for a given input and chooses the most likely output from it. CTC is commonly used in speech recognition tasks because speech doesn't always align cleanly with the transcript, for a variety of reasons such as a speaker's different speech rates.

### convolution

A type of layer in a neural network where the input matrix is multiplied element-wise by a smaller matrix (kernel or filter) and the values are summed up in a new matrix. This is known as a convolution operation, which is repeated over the entire input matrix, each operation being applied to a different segment of the input matrix. Convolutional neural networks (CNNs) are commonly used in computer vision.
## D

### decoder input IDs

This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder. These inputs should be used for sequence-to-sequence tasks, such as translation or summarization, and are usually built in a way specific to each model.

Most encoder-decoder models (BART, T5) create their `decoder_input_ids` on their own from the `labels`. In such models, passing the `labels` is the preferred way to handle training.

Please check each model's docs to see how they handle these input IDs for sequence-to-sequence training.
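As a minimal sketch with T5 (the `google-t5/t5-small` checkpoint is just an example), passing `labels` is enough and the decoder inputs are built internally:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
labels = tokenizer("Das Haus ist wunderbar.", return_tensors="pt").input_ids

# No decoder_input_ids are passed: T5 derives them from `labels` by shifting them right.
outputs = model(**inputs, labels=labels)
print(outputs.loss)
```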
### decoder models

Also referred to as autoregressive models, decoder models involve a pretraining task (causal language modeling) where the model reads the texts in order and has to predict the next word. It is usually done by reading the whole sentence with a mask to hide future tokens at a certain timestep.

### deep learning (DL)

Machine learning algorithms which use neural networks with several layers.
## E

### encoder models

Also known as autoencoding models, encoder models take an input (such as text or images) and transform it into a condensed numerical representation called an embedding. Encoder models are often pretrained using techniques like [masked language modeling](#masked-language-modeling-mlm), which mask part of the input sequence and force the model to create more meaningful representations.

## F

### feature extraction

The process of selecting and transforming raw data into a set of features that are more informative and useful for machine learning algorithms. Some examples of feature extraction include transforming raw text into word embeddings and extracting important features such as edges or shapes from image/video data.
### feed forward chunking

In each residual attention block in transformers, the self-attention layer is usually followed by two feed forward layers. The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (for example, for `google-bert/bert-base-uncased`).

For an input of size `[batch_size, sequence_length]`, the memory required to store the intermediate feed forward embeddings `[batch_size, sequence_length, config.intermediate_size]` can account for a large fraction of the memory use. The authors of Reformer: The Efficient Transformer noticed that, since the computation is independent of the `sequence_length` dimension, it is mathematically equivalent to compute the output embeddings of both feed forward layers `[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n` individually and concatenate them afterwards to `[batch_size, sequence_length, config.hidden_size]` with `n = sequence_length`. This trades increased computation time for reduced memory use, while yielding mathematically equivalent results.

For models employing the [`apply_chunking_to_forward`] function, the `chunk_size` defines the number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time complexity. If `chunk_size` is set to 0, no feed forward chunking is done.
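A minimal sketch of the idea, assuming PyTorch and the `apply_chunking_to_forward` helper from `transformers.pytorch_utils` (the layer sizes here are arbitrary):

```python
import torch
from transformers.pytorch_utils import apply_chunking_to_forward

hidden_states = torch.rand(1, 512, 256)  # [batch_size, sequence_length, hidden_size]
dense = torch.nn.Linear(256, 1024)       # intermediate size larger than the hidden size
output = torch.nn.Linear(1024, 256)

def feed_forward(chunk):
    return output(torch.nn.functional.gelu(dense(chunk)))

# Process the sequence dimension (dim 1) in chunks of 128 positions at a time,
# so only [1, 128, 1024] intermediate embeddings are kept in memory per chunk.
chunked_output = apply_chunking_to_forward(feed_forward, 128, 1, hidden_states)
print(chunked_output.shape)  # torch.Size([1, 512, 256])
```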
### finetuned models

Finetuning is a form of transfer learning which involves taking a pretrained model, freezing its weights, and replacing the output layer with a newly added model head. The model head is then trained on your target dataset.

See the Fine-tune a pretrained model tutorial for more details on how to fine-tune models with 🀗 Transformers.
## H

### head

The model head refers to the last layer of a neural network that accepts the raw hidden states and projects them onto a different dimension. There is a different model head for each task. For example:

- [`GPT2ForSequenceClassification`] is a sequence classification head (a linear layer) on top of the base [`GPT2Model`].
- [`ViTForImageClassification`] is an image classification head (a linear layer on top of the final hidden state of the `CLS` token) on top of the base [`ViTModel`].
- [`Wav2Vec2ForCTC`] is a language modeling head with CTC on top of the base [`Wav2Vec2Model`].
## I

### image patch

Vision-based Transformer models split an image into smaller patches which are linearly embedded, and then passed as a sequence to the model. A model's `patch_size` (its resolution) can be found in its configuration.

### inference

Inference is the process of evaluating a model on new data after training is complete. See the Pipeline for inference tutorial to learn how to perform inference with 🀗 Transformers.

### input IDs

The input IDs are often the only required parameters to be passed to the model as input. They are token indices, numerical representations of the tokens that build the sequence used as input by the model.

Each tokenizer works differently, but the underlying mechanism remains the same. Here's an example using the BERT tokenizer, which is a WordPiece tokenizer:
```python
>>> from transformers import BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")

>>> sequence = "A Titan RTX has 24GB of VRAM"
```
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary:
```python
>>> tokenized_sequence = tokenizer.tokenize(sequence)
```
The tokens are either words or subwords. Here for instance, "VRAM" wasn't in the model vocabulary, so it was split into "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash prefix is added for "RA" and "M":
```python
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding the sentence to the tokenizer, which leverages the Rust implementation of 🀗 Tokenizers for peak performance:
```python
>>> inputs = tokenizer(sequence)
```
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The token indices are under the key `input_ids`:
```python
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
```
泚æïŒããŒã¯ãã€ã¶ã¯ãé¢é£ããã¢ãã«ãããããå¿ èŠãšããå Žåã«èªåçã«ãç¹å¥ãªããŒã¯ã³ãã远å ããŸãããããã¯ãã¢ãã«ãææäœ¿çšããç¹å¥ãªIDã§ãã
åã®IDã·ãŒã±ã³ã¹ããã³ãŒãããå Žåã
```python
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
```
we will see
```python
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
```
because this is the way a [`BertModel`] expects its inputs.
## L

### labels

The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels should be the expected prediction of the model: it will use its standard loss to compute the loss between its predictions and the expected value (the label).

These labels are different according to the model head, for example:

- For sequence classification models ([`BertForSequenceClassification`]), the model expects a tensor of dimension `(batch_size)`, with each value of the batch corresponding to the expected label of the entire sequence.
- For token classification models ([`BertForTokenClassification`]), the model expects a tensor of dimension `(batch_size, seq_length)`, with each value corresponding to the expected label of each individual token.
- For masked language modeling ([`BertForMaskedLM`]), the model expects a tensor of dimension `(batch_size, seq_length)`, with each value corresponding to the expected label of each individual token: the labels being the token ID of the masked tokens, and a value such as -100 usually set for the other tokens.
- For sequence-to-sequence tasks ([`BartForConditionalGeneration`], [`MBartForConditionalGeneration`]), the model expects a tensor of dimension `(batch_size, tgt_seq_length)`, with each value corresponding to the target sequence associated with each input sequence. During training, both BART and T5 will make the appropriate `decoder_input_ids` and decoder attention masks internally; they usually do not need to be supplied. This does not apply to models leveraging the Encoder-Decoder framework.
- For image classification models ([`ViTForImageClassification`]), the model expects a tensor of dimension `(batch_size)`, with each value of the batch corresponding to the expected label of each individual image.
- For semantic segmentation models ([`SegformerForSemanticSegmentation`]), the model expects a tensor of dimension `(batch_size, height, width)`, with each value of the batch corresponding to the expected label of each individual pixel.
- For object detection models ([`DetrForObjectDetection`]), the model expects a list of dictionaries with `class_labels` and `boxes` keys, where each value of the batch corresponds to the expected labels and bounding boxes of each individual image.
- For automatic speech recognition models ([`Wav2Vec2ForCTC`]), the model expects a tensor of dimension `(batch_size, target_length)`, with each value corresponding to the expected label of each individual token.

Each model's labels may be different, so be sure to always check the documentation of each model for more information about its specific labels!

The base models ([`BertModel`]) do not accept labels, as they are the base transformer models, simply outputting features.
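As a minimal sketch for a sequence classification head (the label value below is arbitrary and the classification head is freshly initialized), passing `labels` makes the model compute the loss itself:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=2)

inputs = tokenizer("This is a short sequence.", return_tensors="pt")
labels = torch.tensor([1])  # one expected class label per sequence in the batch

# When `labels` is passed, the head computes the loss between its logits and the labels.
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)
```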
### large language models (LLM)

A generic term that refers to transformer language models (GPT-3, BLOOM, OPT) trained on a large quantity of data. These models also tend to have a large number of learnable parameters (for example, 175 billion for GPT-3).
## M

### masked language modeling (MLM)

A pretraining task where the model sees a corrupted version of the texts, usually done by masking some tokens randomly, and has to predict the original text.

### multimodal

A task that combines texts with another kind of input (for instance, images).

## N

### Natural language generation (NLG)

All tasks related to generating text (for instance, Write With Transformers, translation).

### Natural language processing (NLP)

A generic way to say "deal with texts".

### Natural language understanding (NLU)

All tasks related to understanding what is in a text (for instance, classifying the whole text or individual words).
## P

### pipeline

A pipeline in 🀗 Transformers is an abstraction referring to a series of steps that are executed in a specific order to preprocess and transform data and return a prediction from a model. Some example stages found in a pipeline might be data preprocessing, feature extraction, and normalization.

For more details, see Pipelines for inference.
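As a minimal sketch (the task name below selects a default checkpoint, which is downloaded on first use):

```python
from transformers import pipeline

# A pipeline bundles preprocessing, the model forward pass, and postprocessing.
classifier = pipeline("sentiment-analysis")
print(classifier("We are very happy to show you the 🀗 Transformers library."))
# e.g. [{'label': 'POSITIVE', 'score': ...}]
```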
### pixel values

A tensor of the numerical representations of an image that is passed to a model. The pixel values have a shape of [`batch_size`, `num_channels`, `height`, `width`], and are generated from an image processor.
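A minimal sketch of producing pixel values with an image processor (the checkpoint and image URL are just examples; any RGB image works):

```python
import requests
from PIL import Image
from transformers import AutoImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
inputs = image_processor(images=image, return_tensors="pt")

# [batch_size, num_channels, height, width]
print(inputs["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])
```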
### pooling

An operation that reduces a matrix into a smaller matrix, usually by taking the maximum or the average value of the pooled dimension(s). Pooling layers are commonly found between convolutional layers to downsample the feature representation.
### position IDs

Contrary to RNNs, which have the position of each token embedded within them, transformers are unaware of the position of each token. Therefore, the model uses the position IDs (`position_ids`) to identify each token's position in the sequence.

They are an optional parameter. If no `position_ids` are passed to the model, the IDs are automatically created as absolute positional embeddings.

Absolute positional embeddings are selected in the range `[0, config.max_position_embeddings - 1]`. Some models use other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
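As a minimal sketch with BERT (passing explicit `position_ids` here simply reproduces the default absolute positions the model would create on its own):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
model = BertModel.from_pretrained("google-bert/bert-base-cased")

inputs = tokenizer("A Titan RTX has 24GB of VRAM", return_tensors="pt")
seq_length = inputs["input_ids"].shape[1]

# Absolute positions 0 .. seq_length - 1, one row per sequence in the batch.
position_ids = torch.arange(seq_length).unsqueeze(0)
outputs = model(**inputs, position_ids=position_ids)
```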
### preprocessing

The task of preparing raw data into a format that can be easily consumed by machine learning models. For example, text is typically preprocessed by tokenization. To get a better idea of what preprocessing looks like for other input types, check out the Preprocess tutorial.

### pretrained model

A model that has been pretrained on some data (for instance, all of Wikipedia). Pretraining methods involve a self-supervised objective, which can be reading the text and trying to predict the next word (see causal language modeling), or masking some words and trying to predict them (see masked language modeling).

Speech and vision models have their own pretraining objectives. For example, Wav2Vec2 is a speech model pretrained on a contrastive task which requires the model to identify the "true" speech representation from a set of "false" speech representations. On the other hand, BEiT is a vision model pretrained on a masked image modeling task which masks some of the image patches and requires the model to predict the masked patches (similar to the masked language modeling objective).
## R

### recurrent neural network (RNN)

A type of model that uses a loop over a layer to process texts.

### representation learning

A subfield of machine learning which focuses on learning meaningful representations of raw data. Some examples of representation learning techniques include word embeddings, autoencoders, and Generative Adversarial Networks (GANs).
## S

### sampling rate

A measurement in hertz of the number of samples (of an audio signal, for example) taken per second. The sampling rate is a result of discretizing a continuous signal such as speech.

### self-attention

Each element of the input finds out which other elements of the input it should attend to.

### self-supervised learning

A category of machine learning techniques in which a model creates its own learning objective from unlabeled data. It differs from unsupervised and supervised learning in that the learning process is supervised, but not explicitly by the user.

One example of self-supervised learning is masked language modeling, where a model is passed sentences with a proportion of their tokens removed and learns to predict the missing tokens.

### semi-supervised learning

A broad category of machine learning training techniques that combines a small amount of labeled data with a larger quantity of unlabeled data to improve the accuracy of a model, unlike supervised learning and unsupervised learning.

An example of a semi-supervised learning approach is "self-training", in which a model is trained on labeled data and then used to make predictions on the unlabeled data. The portion of the unlabeled data that the model predicts with the most confidence gets added to the labeled dataset and is used to retrain the model.

### sequence-to-sequence (seq2seq)

Models that generate a new sequence from an input, such as translation models or summarization models (Bart and T5, for instance).
### stride

In convolution or pooling, the stride refers to the distance the kernel moves over the matrix. A stride of 1 means the kernel is moved one pixel at a time, and a stride of 2 means the kernel is moved two pixels at a time.
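A minimal sketch of how the stride affects the output size, assuming PyTorch (the tensor sizes are arbitrary):

```python
import torch

image = torch.rand(1, 1, 8, 8)  # [batch_size, channels, height, width]

conv_stride_1 = torch.nn.Conv2d(1, 1, kernel_size=2, stride=1)
conv_stride_2 = torch.nn.Conv2d(1, 1, kernel_size=2, stride=2)

print(conv_stride_1(image).shape)  # torch.Size([1, 1, 7, 7])
print(conv_stride_2(image).shape)  # torch.Size([1, 1, 4, 4])
```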
### supervised learning

A form of model training that directly uses labeled data to correct and instruct model performance. Data is fed into the model being trained, and its predictions are compared to the known labels. The model updates its weights based on how incorrect its predictions were, and the process is repeated to optimize model performance.
## T

### token

A part of a sentence, usually a word, but can also be a subword (uncommon words are often split into subwords) or a punctuation symbol.

### token Type IDs

Some models' purpose is to do classification on pairs of sentences or question answering.

These require two different sequences to be joined in a single `input_ids` entry, which is usually performed with the help of special tokens, such as the classifier (`[CLS]`) and separator (`[SEP]`) tokens. For example, the BERT model builds its two-sequence input as follows:
```python
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
```
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to `tokenizer` as two arguments (and not a list, as before), like this:
```python
>>> from transformers import BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"

>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
```
which will return:
```python
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
```
This is enough for some models to understand where one sequence ends and where another begins. However, other models, such as BERT, also use token type IDs (also called segment IDs). They are represented as a binary mask identifying the two types of sequences in the model.

The tokenizer returns this mask under the `token_type_ids` key:
```python
>>> encoded_dict["token_type_ids"]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
The first sequence, the "context" used for the question, has all its tokens represented by a 0, whereas the second sequence, corresponding to the question, has all its tokens represented by a 1.
Some models, like [`XLNetModel`], use an additional token represented by a 2.
### transfer learning

A technique that involves taking a pretrained model and adapting it to a dataset specific to your task. Instead of training a model from scratch, you can leverage the knowledge obtained from an existing model as a starting point. This speeds up the learning process and reduces the amount of training data needed.

### transformer

Self-attention-based deep learning model architecture.
## U

### unsupervised learning

A form of model training in which the data provided to the model is not labeled. Unsupervised learning techniques leverage statistical information of the data distribution to find patterns useful for the task at hand.