
English | 简体中文 | 繁體中文 | 한국어 | Español | 日本語 | हिन्दी | Русский | Português | తెలుగు | Français | Deutsch | Tiếng Việt |
State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
These models can be applied to:
- 📝 Text, for tasks such as text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
- 🖼️ Images, for tasks such as image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks such as speech recognition and audio classification.
Transformer models can also perform tasks that combine several modalities, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
🤗 Transformers is backed by the three major deep learning libraries, Jax, PyTorch and TensorFlow, and integrates seamlessly with each of them. It is straightforward to train a model with one of them and then load it for inference with another.
Online demos
You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, and an inference API for public and private models.
Here are a few examples:
In Natural Language Processing:
- Masked word completion with BERT
- Named entity recognition with Electra
- Text generation with GPT-2
- Natural language inference with RoBERTa
- Summarization with BART
- Question answering with DistilBERT
- Translation with T5
In Computer Vision:
- Image classification with ViT
- Object detection with DETR
- Semantic segmentation with SegFormer
- Panoptic segmentation with DETR
In Audio:
In Multimodal tasks:
Write With Transformer, built by the Hugging Face team, is the official demo of this repository's text generation capabilities.
If you are looking for custom support from the Hugging Face team

Quick tour
To immediately use a model on a given input (text, image, audio, ...), we provide the pipeline API. A pipeline groups together a pretrained model with the preprocessing that was used during that model's training. Here is how to use a pipeline to classify positive versus negative texts:
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here, the answer is "positive" with a confidence of 99.97%.
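As a small illustration of how the returned list of dictionaries can be consumed, the sketch below formats the sample output shown above as a readable string. The `describe` helper is a hypothetical name introduced here for illustration only; it is not part of the library, and the input is simply the output copied from the example.

```python
# Sample output copied from the sentiment-analysis example above.
results = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def describe(result):
    """Hypothetical helper: format one result dict as 'LABEL (xx.xx%)'."""
    return f"{result['label']} ({result['score']:.2%})"


# One formatted string per classified input.
summary = [describe(r) for r in results]
print(summary)
```

The same loop would apply unchanged if the classifier were called on several texts, assuming it returns one dictionary per input as in the example.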
Many tasks have a pretrained pipeline ready to go, not only in natural language processing but also in computer vision and speech. For example, we can easily extract the objects detected in an image:
>>> import requests
>>> from PIL import Image
>>> from transformers import pipeline
# Download an image with cute cats
>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
>>> image_data = requests.get(url, stream=True).raw
>>> image = Image.open(image_data)
# Allocate a pipeline for object detection
>>> object_detector = pipeline('object-detection')
>>> object_detector(image)
[{'score': 0.9982201457023621,
'label': 'remote',
'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
{'score': 0.9960021376609802,
'label': 'remote',
'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
{'score': 0.9954745173454285,
'label': 'couch',
'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
{'score': 0.9988006353378296,
'label': 'cat',
'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
{'score': 0.9986783862113953,
'label': 'cat',
'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]
Here we get a list of the objects detected in the image, each with a box surrounding the object and a confidence score. The original image is shown on the left and the predictions on the right.
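The detection output is a plain list of dictionaries, so it can be post-processed with ordinary Python. The sketch below, which uses the sample output copied from the example above, keeps only the detections for one label and computes the pixel area of each box. The helpers `filter_detections` and `box_area` and the `min_score` threshold are hypothetical names chosen for illustration, not library functions.

```python
# Sample output copied from the object-detection example above.
detections = [
    {'score': 0.9982201457023621, 'label': 'remote',
     'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
    {'score': 0.9960021376609802, 'label': 'remote',
     'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
    {'score': 0.9954745173454285, 'label': 'couch',
     'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
    {'score': 0.9988006353378296, 'label': 'cat',
     'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
    {'score': 0.9986783862113953, 'label': 'cat',
     'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}},
]


def box_area(box):
    """Hypothetical helper: area of a box in pixels (width * height)."""
    return (box['xmax'] - box['xmin']) * (box['ymax'] - box['ymin'])


def filter_detections(items, label, min_score=0.9):
    """Hypothetical helper: keep detections with the given label and a high enough score."""
    return [d for d in items if d['label'] == label and d['score'] >= min_score]


cats = filter_detections(detections, 'cat')
cat_areas = [box_area(d['box']) for d in cats]
print(len(cats), cat_areas)
```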
You can learn more about the tasks supported by the pipeline API in this tutorial.
In addition to pipeline, it takes only three lines of code to download and use any of the pretrained models on your given task. Here is the PyTorch version:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = AutoModel.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
And here is the equivalent code for TensorFlow:
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the examples above) or on a list. It outputs a dictionary that you can use in downstream code, or simply pass directly to your model using the ** argument unpacking operator.
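For readers less familiar with `**`, the following sketch shows what argument unpacking does, without loading any model. `fake_model` is a hypothetical stand-in for a model's call signature and the token ids are made-up values shaped like a batch of one sequence; only the unpacking behavior itself is the point.

```python
def fake_model(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model call; reports what it received."""
    return {
        "num_tokens": len(input_ids[0]),
        "has_mask": attention_mask is not None,
    }


# Made-up values, shaped like a tokenizer output for a batch of one sequence.
inputs = {
    "input_ids": [[101, 7592, 2088, 999, 102]],
    "attention_mask": [[1, 1, 1, 1, 1]],
}

# `**inputs` expands each dictionary key into a keyword argument ...
unpacked = fake_model(**inputs)

# ... which is equivalent to spelling the keyword arguments out by hand.
explicit = fake_model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)
print(unpacked == explicit)
```

This is why the dictionary keys produced by the tokenizer need to match the parameter names the model expects.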
The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend), which you can use as usual. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.
Why should I use transformers?
- Easy-to-use state-of-the-art models:
  - High performance on natural language understanding and generation, computer vision, and audio tasks.
  - Low barrier to entry for educators and practitioners.
  - Few user-facing abstractions, with just three classes to learn.
  - A unified API for using all our pretrained models.
- Lower compute costs, smaller carbon footprint:
  - Researchers can share trained models instead of always retraining.
  - Practitioners can reduce compute time and production costs.
  - Dozens of architectures with over 60,000 pretrained models across all modalities.
- Choose the right framework for every part of a model's lifetime:
  - Train state-of-the-art models in three lines of code.
  - Move a single model between TF2.0/PyTorch/JAX frameworks at will.
  - Seamlessly pick the right framework for training, evaluation, and production.
- Easily customize a model or an example to your needs:
  - We provide examples for each architecture to reproduce the results published by its original authors.
  - Model internals are exposed as consistently as possible.
  - Model files can be used independently of the library for quick experiments.
Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is intentionally not refactored with additional abstractions, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model; it is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly Accelerate).
- While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. They are not expected to work out of the box on your specific problem, and you will likely need to change a few lines of code to adapt them to your needs.
Installation
With pip
This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+.
You should install 🤗 Transformers in a virtual environment. If you are unfamiliar with Python virtual environments, check out the user guide.
First, create a virtual environment with the version of Python you are going to use and activate it.
Then, you will need to install at least one of Flax, PyTorch, or TensorFlow. Please refer to the TensorFlow installation page, the PyTorch installation page, and/or the Flax and Jax installation pages for the specific installation command for your platform.
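If you are unsure which backends are already present in the active environment, a quick check along the following lines may help. This is only a sketch using the standard library: `installed_backends` is a hypothetical helper name, and the import names (`torch`, `tensorflow`, `flax`, `jax`) are assumed to be the usual top-level module names for these packages.

```python
import importlib.util

# Assumed top-level import names for the supported backends.
BACKENDS = ("torch", "tensorflow", "flax", "jax")


def installed_backends(names=BACKENDS):
    """Hypothetical helper: map each module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_backends()
print(status)
```

The check only looks the modules up without importing them, so it stays fast even when a large framework is installed.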
When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
pip install transformers
If you would like to play with the examples, or need the bleeding edge of the code and cannot wait for a new release, you must install the library from source.
With conda
🤗 Transformers can be installed using conda as follows:
conda install conda-forge::transformers
NOTE: Installing transformers from the huggingface channel is deprecated.
Follow the respective installation pages of Flax, PyTorch, or TensorFlow to see how to install them with conda.
NOTE: On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If that happens, please let us know in this issue.
Model architectures
All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub, where they are uploaded directly by users and organizations.
Current number of checkpoints:
🤗 Transformers currently provides the following architectures (see here for a high-level summary of each of them):
- ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
- ALIGN (from Google Research) released with the paper Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
- AltCLIP (from BAAI) released with the paper AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
- Audio Spectrogram Transformer (from MIT) released with the paper AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
- Autoformer (from Tsinghua University) released with the paper Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
- Bark (from Suno) released in the repository suno-ai/bark by Suno AI team.
- BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
- BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
- BARTpho (from VinAI Research) released with the paper BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
- BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
- BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
- BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
- BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- BioGpt (from Microsoft Research AI4Science) released with the paper BioGPT: generative pre-trained transformer for biomedical text generation and mining by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
- BiT (from Google AI) released with the paper Big Transfer (BiT): General Visual Representation Learning by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
- Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BLIP (from Salesforce) released with the paper BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
- BLIP-2 (from Salesforce) released with the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
- BLOOM (from BigScience workshop) released by the BigScience Workshop.
- BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
- BridgeTower (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
- BROS (from NAVER CLOVA) released with the paper BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park.
- ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
- CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
- CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
- Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
- CLAP (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
- CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
- CLIPSeg (from University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
- CLVP released with the paper Better speech synthesis through scaling by James Betker.
- CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
- CodeLlama (from MetaAI) released with the paper Code Llama: Open Foundation Models for Code by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
- Cohere (from Cohere) released with the paper Command-R: Retrieval Augmented Generation at Production Scale by Cohere.
- Conditional DETR (from Microsoft Research Asia) released with the paper Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
- ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
- ConvNeXT (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
- ConvNeXTV2 (from Facebook AI) released with the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
- CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
- CPM-Ant (from OpenBMB) released by OpenBMB.
- CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
- CvT (from Microsoft) released with the paper CvT: Introducing Convolutions to Vision Transformers by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
- Data2Vec (from Facebook) released with the paper Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
- DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
- Deformable DETR (from SenseTime Research) released with the paper Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
- DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
- DePlot (from Google AI) released with the paper DePlot: One-shot visual language reasoning by plot-to-table translation by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
- Depth Anything (from University of Hong Kong and TikTok) released with the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
- DETA (from The University of Texas at Austin) released with the paper NMS Strikes Back by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
- DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
- DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
- DiNAT (from SHI Labs) released with the paper Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.
- DINOv2 (from Meta AI) released with the paper DINOv2: Learning Robust Visual Features without Supervision by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
- DistilBERT (from HuggingFace) released with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method was used to compress GPT2, RoBERTa and Multilingual BERT; the compressed models are named DistilGPT2, DistilRoBERTa and DistilmBERT, respectively.
- DiT (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
- Donut (from NAVER) released with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
- DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
- DPT (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
- EfficientFormer (from Snap Research) released with the paper EfficientFormer: Vision Transformers at MobileNetSpeed by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
- EfficientNet (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
- ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
- EnCodec (from Meta AI) released with the paper High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
- EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- ERNIE (from Baidu) released with the paper ERNIE: Enhanced Representation through Knowledge Integration by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
- ErnieM (from Baidu) released with the paper ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
- ESM (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
- Falcon (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
- FastSpeech2Conformer (from ESPnet and Microsoft Research) released with the paper Recent Developments On Espnet Toolkit Boosted By Conformer by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang.
- FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
- FLAN-UL2 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
- FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
- FLAVA (from Facebook AI) released with the paper FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
- FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
- FocalNet (from Microsoft Research) released with the paper Focal Modulation Networks by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao.
- Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
- Fuyu (from ADEPT) released with a blog post by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar.
- Gemma (from Google) released with the paper Gemma: Open Models Based on Gemini Technology and Research by the Gemma Google team.
- GIT (from Microsoft Research) released with the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
- GLPN (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
- GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
- GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
- GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
- GPT NeoX Japanese (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.
- GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
- GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
- GPT-Sw3 (from AI-Sweden) released with the paper Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
- GPTBigCode (from BigCode) released with the paper SantaCoder: don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
- GPTSAN-japanese released in the repository tanreinama/GPTSAN by Toshiyuki Sakamoto (tanreinama).
- Graphormer (from Microsoft) released with the paper Do Transformers Really Perform Bad for Graph Representation? by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
- GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
- HerBERT (from Allegro.pl, AGH University of Science and Technology) released with the paper KLEJ: Comprehensive Benchmark for Polish Language Understanding by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
- Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
- I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
- IDEFICS (from HuggingFace) released with the paper OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
- ImageGPT (from OpenAI) released with the paper Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
- Informer (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
- InstructBLIP (from Salesforce) released with the paper InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
- Jukebox (from OpenAI) released with the paper Jukebox: A Generative Model for Music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
- KOSMOS-2 (from Microsoft Research Asia) released with the paper Kosmos-2: Grounding Multimodal Large Language Models to the World by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
- LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
- LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
- LayoutLMv3 (from Microsoft Research Asia) released with the paper LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
- LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
- LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- LeViT (from Meta AI) released with the paper LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
- LiLT (from South China University of Technology) released with the paper LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.
- LLaMA (from The FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
- Llama2 (from The FAIR team of Meta AI) released with the paper Llama2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
- LLaVa (from Microsoft Research & University of Wisconsin-Madison) released with the paper Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
- LLaVA-NeXT (from Microsoft Research & University of Wisconsin-Madison) released with the paper Improved Baselines with Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
- Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- LongT5 (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
- LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
- LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
- M-CTC-T (from Facebook) released with the paper Pseudo-Labeling For Massively Multilingual Speech Recognition by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
- M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
- MADLAD-400 (from Google) released with the paper MADLAD-400: A Multilingual And Document-Level Large Audited Dataset by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat.
- Mamba (from Albert Gu and Tri Dao) released with the paper Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu and Tri Dao.
- MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
- MarkupLM (from Microsoft Research Asia) released with the paper MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
- Mask2Former (from FAIR and UIUC) released with the paper Masked-attention Mask Transformer for Universal Image Segmentation by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
- MaskFormer (from Meta and UIUC) released with the paper Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
- MatCha (from Google AI) released with the paper MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.
- mBART (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
- mBART-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
- MEGA (from Facebook) released with the paper Mega: Moving Average Equipped Gated Attention by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
- Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- MGP-STR (from Alibaba Research) released with the paper Multi-Granularity Prediction for Scene Text Recognition by Peng Wang, Cheng Da, and Cong Yao.
- Mistral (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
- Mixtral (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
- mLUKE (from Studio Ousia) released with the paper mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
- MMS (from Facebook) released with the paper Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
- MobileBERT (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
- MobileNetV1 (from Google Inc.) released with the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
- MobileNetV2 (from Google Inc.) released with the paper MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
- MobileViT (from Apple) released with the paper MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.
- MobileViTV2 (from Apple) released with the paper Separable Self-attention for Mobile Vision Transformers by Sachin Mehta and Mohammad Rastegari.
- MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
- MPT (from MosaicML) released with the repository llm-foundry by the MosaicML NLP Team.
- MRA (from the University of Wisconsin - Madison) released with the paper Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh.
- MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
- MusicGen (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
- MusicGen Melody (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
- MVP (from RUC AI Box) released with the paper MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
- NAT (from SHI Labs) released with the paper Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
- Nezha (from Huawei Noah's Ark Lab) released with the paper NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
- NLLB (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
- NLLB-MOE (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
- Nougat (from Meta AI) released with the paper Nougat: Neural Optical Understanding for Academic Documents by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
- Nyströmformer (from the University of Wisconsin - Madison) released with the paper Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
- OneFormer (from SHI Labs) released with the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
- OpenLlama (from s-JoL) released on GitHub (now removed).
- OPT (from Meta AI) released with the paper OPT: Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
- OWL-ViT (from Google AI) released with the paper Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
- OWLv2 (from Google AI) released with the paper Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
- PatchTSMixer (from IBM Research) released with the paper TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
- PatchTST (from IBM) released with the paper A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
- Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
- PEGASUS-X (from Google) released with the paper Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao, and Peter J. Liu.
- Perceiver IO (from Deepmind) released with the paper Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
- Persimmon (from ADEPT) released in a blog post by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
- Phi (from Microsoft) released with the papers - Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
- PhoBERT (from VinAI Research) released with the paper PhoBERT: Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
- Pix2Struct (from Google) released with the paper Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
- PLBart (from UCLA NLP) released with the paper Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
- PoolFormer (from Sea AI Labs) released with the paper MetaFormer is Actually What You Need for Vision by Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan.
- Pop2Piano released with the paper Pop2Piano : Pop Audio-based Piano Cover Generation by Jongho Choi, Kyogu Lee.
- ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- PVT (from Nanjing University, The University of Hong Kong etc.) released with the paper Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
- PVTv2 (from Shanghai AI Laboratory, Nanjing University, The University of Hong Kong etc.) released with the paper PVT v2: Improved Baselines with Pyramid Vision Transformer by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
- QDQBert (from NVIDIA) released with the paper Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
- Qwen2 (from the Qwen team, Alibaba Group) released with the paper Qwen Technical Report by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
- RAG (from Facebook) released with the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
- REALM (from Google Research) released with the paper REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
- Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
- RegNet (from META Platforms) released with the paper Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
- RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
- ResNet (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
- RoBERTa (from Facebook) released with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
- RoBERTa-PreLayerNorm (from Facebook) released with the paper fairseq: A Fast, Extensible Toolkit for Sequence Modeling by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
- RoCBert (from WeChatAI) released with the paper RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.
- RoFormer (from ZhuiyiTechnology) released with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
- RWKV (from Bo Peng) released on this repo by Bo Peng.
- SeamlessM4T (from Meta AI) released with the paper SeamlessM4T: Massively Multilingual & Multimodal Machine Translation by the Seamless Communication team.
- SeamlessM4Tv2 (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
- SegFormer (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
- SegGPT (from Beijing Academy of Artificial Intelligence (BAAI)) released with the paper SegGPT: Segmenting Everything In Context by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang.
- Segment Anything (from Meta AI) released with the paper Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
- SEW (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SEW-D (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SigLIP (from Google AI) released with the paper Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
- SpeechT5 (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
- SpeechToTextTransformer (from Facebook) released with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
- SpeechToTextTransformer2 (from Facebook) released with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
- Splinter (from Tel Aviv University) released with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
- SqueezeBERT (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
- StableLm (from Stability AI) released with the paper StableLM 3B 4E1T (Technical Report) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
- Starcoder2 (from BigCode team) released with the paper StarCoder 2 and The Stack v2: The Next Generation by Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries.
- SuperPoint (from MagicLeap) released with the paper SuperPoint: Self-Supervised Interest Point Detection and Description by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
- SwiftFormer (from MBZUAI) released with the paper SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan.
- Swin Transformer (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
- Swin Transformer V2 (from Microsoft) released with the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
- Swin2SR (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
- SwitchTransformers (from Google) released with the paper Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.
- T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
- T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
- Table Transformer (from Microsoft Research) released with the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
- TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
- TAPEX (from Microsoft Research) released with the paper TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
- Time Series Transformer (from HuggingFace).
- TimeSformer (from Facebook) released with the paper Is Space-Time Attention All You Need for Video Understanding? by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
- Trajectory Transformer (from the University of California at Berkeley) released with the paper Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li, Sergey Levine.
- Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
- TrOCR (from Microsoft) released with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
- TVLT (from UNC Chapel Hill) released with the paper TVLT: Textless Vision-Language Transformer by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
- TVP (from Intel) released with the paper Text-Visual Prompting for Efficient 2D Temporal Video Grounding by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
- UDOP (from Microsoft Research) released with the paper Unifying Vision, Text, and Layout for Universal Document Processing by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
- UL2 (from Google Research) released with the paper Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
- UMT5 (from Google Research) released with the paper UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
- UniSpeech (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
- UniSpeechSat (from Microsoft Research) released with the paper UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
- UnivNet (from Kakao Corporation) released with the paper UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim.
- UPerNet (from Peking University) released with the paper Unified Perceptual Parsing for Scene Understanding by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
- VAN (from Tsinghua University and Nankai University) released with the paper Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
- VideoMAE (from Multimedia Computing Group, Nanjing University) released with the paper VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
- ViLT (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.
- VipLlava (from University of Wisconsin–Madison) released with the paper Making Large Multimodal Models Understand Arbitrary Visual Prompts by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.
- Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
- VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
- ViT Hybrid (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
- VitDet (from Meta AI) released with the paper Exploring Plain Vision Transformer Backbones for Object Detection by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He.
- ViTMAE (from Meta AI) released with the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
- ViTMatte (from HUST-VL) released with the paper ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.
- ViTMSN (from Meta AI) released with the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
- VITS (from Kakao Enterprise) released with the paper Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech by Jaehyeon Kim, Jungil Kong, Juhee Son.
- ViViT (from Google Research) released with the paper ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid.
- Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
- Wav2Vec2-BERT (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
- Wav2Vec2-Conformer (from Facebook AI) released with the paper FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
- Wav2Vec2Phoneme (from Facebook AI) released with the paper Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
- WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
- Whisper (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
- X-CLIP (from Microsoft Research) released with the paper Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
- X-MOD (from Meta AI) released with the paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.
- XGLM (from Facebook AI) released with the paper Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
- XLM (from Facebook) released with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
- XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- XLM-RoBERTa (from Facebook AI) released with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
- XLM-RoBERTa-XL (from Facebook AI) released with the paper Larger-Scale Transformers for Multilingual Masked Language Modeling by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
- XLM-V (from Meta AI) released with the paper XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa.
- XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- XLS-R (from Facebook AI) released with the paper XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
- XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
- YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
- YOSO (from the University of Wisconsin - Madison) released with the paper You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
- Want to contribute a new model? A detailed guide and templates have been added to walk you through adding a new model. You can find them in the templates folder of the repository. Before starting your PR, be sure to check the contributing guidelines and either contact the maintainers or open an issue to collect feedback.
To check whether each model has an implementation in Flax, PyTorch, or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.
Learn more
Section | Description |
---|---|
Documentation | Full API documentation and tutorials |
Task summary | Tasks supported by 🤗 Transformers |
Preprocessing tutorial | Using the Tokenizer class to prepare data for the models |
Training and fine-tuning | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the Trainer API |
Quick tour: Fine-tuning/usage scripts | Example scripts for fine-tuning models on a wide range of tasks |
Model sharing and uploading | Upload and share your fine-tuned models with the community |
Migration | Migrate to 🤗 Transformers from pytorch-transformers or pytorch-pretrained-bert |
Citation
We now have a paper you can cite for the 🤗 Transformers library:
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
pages = "38--45"
}