English | 简体中文 | 繁體中文 | 한국어 | Español | 日本語 | हिन्दी | Русский | Português | తెలుగు | Français | Deutsch | Tiếng Việt

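State-of-the-art machine learning for JAX, PyTorch, and TensorFlow

🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

These models can be applied to:

📝 Text, for tasks such as text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
🖼️ Images, for tasks such as image classification, object detection, and segmentation.
🗣️ Audio, for tasks such as speech recognition and audio classification.

Transformer models can also perform tasks that combine several modalities, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

🤗 Transformers is backed by the three major deep learning libraries, Jax, PyTorch, and TensorFlow, and integrates seamlessly with each of them. It is straightforward to train a model with one and then load it for inference with another.

Online demos

You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, and an inference API for public and private models.

Here are a few examples:

In Natural Language Processing:

In Computer Vision:

In Audio:

In Multimodal tasks:

Visual Question Answering with ViLT

Write With Transformer, built by the Hugging Face team, is the official demo of this repo's text generation capabilities.

If you are looking for custom support from the Hugging Face team

Quick tour

To immediately use a model on a given input (text, image, audio, ...), we provide the pipeline API. A pipeline groups together a pretrained model with the preprocessing that was used during that model's training. Here is how to use a pipeline to classify positive versus negative texts: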

>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]

The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here, the answer is "positive" with a confidence of 99.97%.
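The 99.97% figure is simply the `score` field rendered as a percentage. The following is a minimal sketch of that formatting step, assuming the list-of-dicts result shape shown in the output above. The result literal is copied from that output rather than produced by a live pipeline call, and `describe` is a hypothetical helper, not part of the library.

```python
# Result copied from the example output above; a live classifier(...) call
# is assumed to return the same list-of-dicts shape.
result = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def describe(prediction):
    """Format one prediction as 'LABEL (xx.xx%)'. Hypothetical helper."""
    return f"{prediction['label']} ({prediction['score']:.2%})"


summary = describe(result[0])
print(summary)  # POSITIVE (99.97%)
```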

Many tasks have a pretrained pipeline ready to go, in NLP but also in computer vision and speech. For example, we can easily extract the objects detected in an image:

>>> import requests
>>> from PIL import Image
>>> from transformers import pipeline

# Download an image with cute cats
>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
>>> image_data = requests.get(url, stream=True).raw
>>> image = Image.open(image_data)

# Allocate a pipeline for object detection
>>> object_detector = pipeline('object-detection')
>>> object_detector(image)
[{'score': 0.9982201457023621,
  'label': 'remote',
  'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
 {'score': 0.9960021376609802,
  'label': 'remote',
  'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
 {'score': 0.9954745173454285,
  'label': 'couch',
  'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
 {'score': 0.9988006353378296,
  'label': 'cat',
  'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
 {'score': 0.9986783862113953,
  'label': 'cat',
  'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]

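Here, we get a list of objects detected in the image, with a box surrounding each object and a confidence score. The original image is shown on the left and the predictions on the right.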
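Because the detector returns plain Python dictionaries, the result can be post-processed without any library support. The sketch below assumes the output shape shown above and uses three entries copied from it. The helper names `box_area` and `keep_confident` and the 0.998 threshold are illustrative choices, not part of the pipeline API.

```python
# Three detections copied from the example output above.
detections = [
    {'score': 0.9982201457023621, 'label': 'remote',
     'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
    {'score': 0.9954745173454285, 'label': 'couch',
     'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
    {'score': 0.9988006353378296, 'label': 'cat',
     'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
]


def box_area(box):
    """Area in pixels of an xmin/ymin/xmax/ymax box. Hypothetical helper."""
    return (box['xmax'] - box['xmin']) * (box['ymax'] - box['ymin'])


def keep_confident(items, threshold):
    """Keep detections whose score is at least `threshold`. Hypothetical helper."""
    return [d for d in items if d['score'] >= threshold]


confident = keep_confident(detections, threshold=0.998)
labels = [d['label'] for d in confident]
cat_area = box_area(detections[2]['box'])

print(labels)    # ['remote', 'cat']
print(cat_area)  # 125818
```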

You can learn more about the tasks supported by the pipeline API in this tutorial.

In addition to pipeline, all it takes to download and use a pretrained model on your given task is three lines of code. Here is the PyTorch version:

>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = AutoModel.from_pretrained("google-bert/bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)

And here is the equivalent code for TensorFlow:

>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)

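The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the examples above) or on a list. It outputs a dictionary that you can use in downstream code or pass directly to your model using the ** argument unpacking operator.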
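The `**` operator is ordinary Python keyword-argument unpacking: each dictionary key becomes a parameter name. A minimal sketch of the mechanism, using a plain dictionary and a hypothetical `fake_model` function as stand-ins for the tokenizer output and the model (the token ids are made-up placeholders, and no library call is involved):

```python
# Stand-in for the dictionary a tokenizer returns. The ids are arbitrary
# placeholders, not real vocabulary entries.
inputs = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
}


def fake_model(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model's forward call: reports what it received."""
    return {
        "batch_size": len(input_ids),
        "seq_len": len(input_ids[0]),
        "has_mask": attention_mask is not None,
    }


# fake_model(**inputs) expands the dict into keyword arguments,
# so the two calls below are equivalent.
unpacked = fake_model(**inputs)
explicit = fake_model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)
print(unpacked == explicit)  # True
```

This is why the dictionary keys produced by the tokenizer have to match the parameter names the model accepts.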

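The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend), which you can use as usual. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.

Why should I use transformers?

Easy-to-use state-of-the-art models:
- High performance on natural language understanding and generation, computer vision, and audio tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abstractions, with just three classes to learn.
- A unified API for using all our pretrained models.
Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Numerous architectures with over 60,000 pretrained models across all modalities.
Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch/JAX frameworks at will.
- Seamlessly pick the right framework for training, evaluation, and production.
Easily customize a model or an example to your needs:
- We provide examples for each architecture to reproduce the results published by its original authors.
- Model internals are exposed as consistently as possible.
- Model files can be used independently of the library for quick experiments.

Why shouldn't I use transformers?

This library is not a modular toolbox of building blocks for neural nets. The code in the model files is intentionally not refactored with additional abstractions, so that researchers can quickly iterate on each model without diving into additional abstractions or files.
The training API is not intended to work on any model; it is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly Accelerate).
While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. They are not expected to work out of the box on your specific problem, and you will likely need to change a few lines of code to adapt them to your needs.

Installation

With pip

This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+.

You should install 🤗 Transformers in a virtual environment. If you are unfamiliar with Python virtual environments, check out the user guide.

First, create a virtual environment with the version of Python you are going to use and activate it.

Then, you will need to install at least one of Flax, PyTorch, or TensorFlow. Please refer to the TensorFlow installation page, the PyTorch installation page, and the Flax and Jax installation pages for the specific installation command for your platform.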
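Before installing, it can help to confirm that the interpreter meets the stated Python minimum and to see which backends are already present. The sketch below uses only the standard library. It assumes the usual import names `torch`, `tensorflow`, and `flax`, and it checks only for presence, not for the minimum backend versions listed above. `python_ok` and `installed_backends` are hypothetical helper names.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum Python version stated above
BACKENDS = ("torch", "tensorflow", "flax")  # assumed import names of the backends


def python_ok(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_backends(names=BACKENDS):
    """Return the subset of `names` that can be located as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


print("Python OK:", python_ok())
print("Backends found:", installed_backends() or "none")
```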

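Once one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:

pip install transformers

If you would like to play with the examples, or need the bleeding edge of the code and cannot wait for a new release, you must install the library from source.

With conda

🤗 Transformers can be installed using conda as follows:

conda install conda-forge::transformers

NOTE: Installing transformers from the huggingface channel is deprecated.

Follow the installation pages of Flax, PyTorch, or TensorFlow to see how to install them with conda.

NOTE: On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in this issue.

Model architectures

All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub, where they are uploaded directly by users and organizations.

Current number of checkpoints:

🤗 Transformers currently provides the following architectures (see here for a high-level summary of each of them):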

ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
ALIGN (from Google Research) released with the paper Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
AltCLIP (from BAAI) released with the paper AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
Audio Spectrogram Transformer (from MIT) released with the paper AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
Autoformer (from Tsinghua University) released with the paper Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
Bark (from Suno) released in the repository suno-ai/bark by Suno AI team.
BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
BARTpho (from VinAI Research) released with the paper BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
BioGpt (from Microsoft Research AI4Science) released with the paper BioGPT: generative pre-trained transformer for biomedical text generation and mining by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
BiT (from Google AI) released with the paper Big Transfer (BiT): General Visual Representation Learning by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
BLIP (from Salesforce) released with the paper BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
BLIP-2 (from Salesforce) released with the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
BLOOM (from BigScience workshop) released by the BigScience Workshop.
BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
BridgeTower (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
BROS (from NAVER CLOVA) released with the paper BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park.
ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
CLAP (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
CLIPSeg (from University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
CLVP released with the paper Better speech synthesis through scaling by James Betker.
CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
CodeLlama (from MetaAI) released with the paper Code Llama: Open Foundation Models for Code by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
Cohere (from Cohere) released with the paper Command-R: Retrieval Augmented Generation at Production Scale by Cohere.
Conditional DETR (from Microsoft Research Asia) released with the paper Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
ConvNeXT (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
ConvNeXTV2 (from Facebook AI) released with the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
CPM-Ant (from OpenBMB) released by OpenBMB.
CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
CvT (from Microsoft) released with the paper CvT: Introducing Convolutions to Vision Transformers by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
Data2Vec (from Facebook) released with the paper Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
Deformable DETR (from SenseTime Research) released with the paper Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
DePlot (from Google AI) released with the paper DePlot: One-shot visual language reasoning by plot-to-table translation by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
Depth Anything (from University of Hong Kong and TikTok) released with the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
DETA (from The University of Texas at Austin) released with the paper NMS Strikes Back by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
DiNAT (from SHI Labs) released with the paper Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.
DINOv2 (from Meta AI) released with the paper DINOv2: Learning Robust Visual Features without Supervision by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, and Multilingual BERT into DistilmBERT.
DiT (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
Donut (from NAVER), released together with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
DPT (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
EfficientFormer (from Snap Research) released with the paper EfficientFormer: Vision Transformers at MobileNetSpeed by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
EfficientNet (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
EnCodec (from Meta AI) released with the paper High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
ERNIE (from Baidu) released with the paper ERNIE: Enhanced Representation through Knowledge Integration by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
ErnieM (from Baidu) released with the paper ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
ESM (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
Falcon (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
FastSpeech2Conformer (from ESPnet and Microsoft Research) released with the paper Recent Developments On Espnet Toolkit Boosted By Conformer by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang.
FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
FLAN-UL2 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
FLAVA (from Facebook AI) released with the paper FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
FocalNet (from Microsoft Research) released with the paper Focal Modulation Networks by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao.
Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
Fuyu (from ADEPT) released with a blog post by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar.
Gemma (from Google) released with the paper Gemma: Open Models Based on Gemini Technology and Research by the Gemma Google team.
GIT (from Microsoft Research) released with the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
GLPN (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
GPT NeoX Japanese (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.
GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
GPT-Sw3 (from AI-Sweden) released with the paper Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
GPTBigCode (from BigCode) released with the paper SantaCoder: don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
GPTSAN-japanese released in the repository tanreinama/GPTSAN by Toshiyuki Sakamoto (tanreinama).
Graphormer (from Microsoft) released with the paper Do Transformers Really Perform Bad for Graph Representation? by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
HerBERT (from Allegro.pl, AGH University of Science and Technology) released with the paper KLEJ: Comprehensive Benchmark for Polish Language Understanding by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
IDEFICS (from HuggingFace) released with the paper OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
ImageGPT (from OpenAI) released with the paper Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
Informer (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
InstructBLIP (from Salesforce) released with the paper InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
Jukebox (from OpenAI) released with the paper Jukebox: A Generative Model for Music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
KOSMOS-2 (from Microsoft Research Asia) released with the paper Kosmos-2: Grounding Multimodal Large Language Models to the World by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
LayoutLMv3 (from Microsoft Research Asia) released with the paper LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
LeViT (from Meta AI) released with the paper LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
LiLT (from South China University of Technology) released with the paper LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.
LLaMA (from The FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
Llama2 (from The FAIR team of Meta AI) released with the paper Llama2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
LLaVa (Microsoft Research & University of Wisconsin-Madison から) Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee. から公開された研究論文 Visual Instruction Tuning
LLaVA-NeXT (Microsoft Research & University of Wisconsin-Madison から) Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee. から公開された研究論文 Improved Baselines with Visual Instruction Tuning
Longformer (AllenAI から) Iz Beltagy, Matthew E. Peters, Arman Cohan から公開された研究論文: Longformer: The Long-Document Transformer
LongT5 (Google AI から) Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang から公開された研究論文: LongT5: Efficient Text-To-Text Transformer for Long Sequences
LUKE (Studio Ousia から) Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto から公開された研究論文: LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
LXMERT (UNC Chapel Hill から) Hao Tan and Mohit Bansal から公開された研究論文: LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering
M-CTC-T (Facebook から) Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert から公開された研究論文: Pseudo-Labeling For Massively Multilingual Speech Recognition
M2M100 (Facebook から) Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin から公開された研究論文: Beyond English-Centric Multilingual Machine Translation
MADLAD-400 (from Google) released with the paper MADLAD-400: A Multilingual And Document-Level Large Audited Dataset by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat.
Mamba (Albert Gu and Tri Dao から) Albert Gu and Tri Dao. から公開された研究論文 Mamba: Linear-Time Sequence Modeling with Selective State Spaces
MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
MarkupLM (Microsoft Research Asia から) Junlong Li, Yiheng Xu, Lei Cui, Furu Wei から公開された研究論文: MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Mask2Former (FAIR and UIUC から) Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar. から公開された研究論文 Masked-attention Mask Transformer for Universal Image Segmentation
MaskFormer (Meta and UIUC から) Bowen Cheng, Alexander G. Schwing, Alexander Kirillov から公開された研究論文: Per-Pixel Classification is Not All You Need for Semantic Segmentation
MatCha (Google AI から) Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos. から公開された研究論文 MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
mBART (Facebook から) Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer から公開された研究論文: Multilingual Denoising Pre-training for Neural Machine Translation
mBART-50 (Facebook から) Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan から公開された研究論文: Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
MEGA (Facebook から) Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. から公開された研究論文 Mega: Moving Average Equipped Gated Attention
Megatron-BERT (NVIDIA から) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro から公開された研究論文: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Megatron-GPT2 (NVIDIA から) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro から公開された研究論文: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
MGP-STR (Alibaba Research から) Peng Wang, Cheng Da, and Cong Yao. から公開された研究論文 Multi-Granularity Prediction for Scene Text Recognition
Mistral (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
Mixtral (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
mLUKE (Studio Ousia から) Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka から公開された研究論文: mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models
MMS (Facebook から) Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. から公開された研究論文 Scaling Speech Technology to 1,000+ Languages
MobileBERT (CMU/Google Brain から) Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou から公開された研究論文: MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
MobileNetV1 (Google Inc. から) Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam から公開された研究論文: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
MobileNetV2 (Google Inc. から) Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen から公開された研究論文: MobileNetV2: Inverted Residuals and Linear Bottlenecks
MobileViT (Apple から) Sachin Mehta and Mohammad Rastegari から公開された研究論文: MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
MobileViTV2 (Apple から) Sachin Mehta and Mohammad Rastegari. から公開された研究論文 Separable Self-attention for Mobile Vision Transformers
MPNet (Microsoft Research から) Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu から公開された研究論文: MPNet: Masked and Permuted Pre-training for Language Understanding
MPT (from MosaicML) released with the repository llm-foundry by the MosaicML NLP Team.
MRA (the University of Wisconsin - Madison から) Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh. から公開された研究論文 Multi Resolution Analysis (MRA) for Approximate Self-Attention
MT5 (Google AI から) Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel から公開された研究論文: mT5: A massively multilingual pre-trained text-to-text transformer
MusicGen (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MusicGen Melody (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MVP (RUC AI Box から) Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen から公開された研究論文: MVP: Multi-task Supervised Pre-training for Natural Language Generation
NAT (SHI Labs から) Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi から公開された研究論文: Neighborhood Attention Transformer
Nezha (Huawei Noah’s Ark Lab から) Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu から公開された研究論文: NEZHA: Neural Contextualized Representation for Chinese Language Understanding
NLLB (Meta から) the NLLB team から公開された研究論文: No Language Left Behind: Scaling Human-Centered Machine Translation
NLLB-MOE (Meta から) the NLLB team. から公開された研究論文 No Language Left Behind: Scaling Human-Centered Machine Translation
Nougat (Meta AI から) Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. から公開された研究論文 Nougat: Neural Optical Understanding for Academic Documents
Nyströmformer (the University of Wisconsin - Madison から) Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh から公開された研究論文: Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
OneFormer (SHI Labs から) Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi から公開された研究論文: OneFormer: One Transformer to Rule Universal Image Segmentation
OpenLlama (from s-JoL) released on GitHub (now removed).
OPT (Meta AI から) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al から公開された研究論文: OPT: Open Pre-trained Transformer Language Models
OWL-ViT (Google AI から) Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby から公開された研究論文: Simple Open-Vocabulary Object Detection with Vision Transformers
OWLv2 (Google AI から) Matthias Minderer, Alexey Gritsenko, Neil Houlsby. から公開された研究論文 Scaling Open-Vocabulary Object Detection
PatchTSMixer (from IBM Research) released with the paper TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
PatchTST (IBM から) Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. から公開された研究論文 A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
Pegasus (Google から) Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu から公開された研究論文: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
PEGASUS-X (Google から) Jason Phang, Yao Zhao, and Peter J. Liu から公開された研究論文: Investigating Efficiently Extending Transformers for Long Input Summarization
Perceiver IO (Deepmind から) Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira から公開された研究論文: Perceiver IO: A General Architecture for Structured Inputs & Outputs
Persimmon (from ADEPT) released in a blog post by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
Phi (from Microsoft) released with the papers - Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
PhoBERT (VinAI Research から) Dat Quoc Nguyen and Anh Tuan Nguyen から公開された研究論文: PhoBERT: Pre-trained language models for Vietnamese
Pix2Struct (Google から) Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. から公開された研究論文 Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
PLBart (UCLA NLP から) Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang から公開された研究論文: Unified Pre-training for Program Understanding and Generation
PoolFormer (Sea AI Labs から) Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng から公開された研究論文: MetaFormer is Actually What You Need for Vision
Pop2Piano released with the paper Pop2Piano : Pop Audio-based Piano Cover Generation by Jongho Choi, Kyogu Lee.
ProphetNet (Microsoft Research から) Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou から公開された研究論文: ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
PVT (Nanjing University, The University of Hong Kong etc. から) Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. から公開された研究論文 Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
PVTv2 (Shanghai AI Laboratory, Nanjing University, The University of Hong Kong etc. から) Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. から公開された研究論文 PVT v2: Improved Baselines with Pyramid Vision Transformer
QDQBert (NVIDIA から) Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius から公開された研究論文: Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
Qwen2 (the Qwen team, Alibaba Group から) Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu. から公開された研究論文 Qwen Technical Report
RAG (from Facebook) released with the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
REALM (Google Research から) Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang から公開された研究論文: REALM: Retrieval-Augmented Language Model Pre-Training
Reformer (Google Research から) Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya から公開された研究論文: Reformer: The Efficient Transformer
RegNet (from Meta Platforms) released with the paper Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
RemBERT (Google Research から) Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder から公開された研究論文: Rethinking embedding coupling in pre-trained language models
ResNet (Microsoft Research から) Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun から公開された研究論文: Deep Residual Learning for Image Recognition
RoBERTa (Facebook から), Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov から公開された研究論文: RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa-PreLayerNorm (Facebook から) Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli から公開された研究論文: fairseq: A Fast, Extensible Toolkit for Sequence Modeling
RoCBert (from WeChatAI) released with the paper RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.
RoFormer (ZhuiyiTechnology から), Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu から公開された研究論文: RoFormer: Enhanced Transformer with Rotary Position Embedding
RWKV (from Bo Peng) released in this repo by Bo Peng.
SeamlessM4T (from Meta AI) released with the paper SeamlessM4T — Massively Multilingual & Multimodal Machine Translation by the Seamless Communication team.
SeamlessM4Tv2 (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
SegFormer (NVIDIA から) Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo から公開された研究論文: SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
SegGPT (from Beijing Academy of Artificial Intelligence (BAAI)) released with the paper SegGPT: Segmenting Everything In Context by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang.
Segment Anything (Meta AI から) Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. から公開された研究論文 Segment Anything
SEW (ASAPP から) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi から公開された研究論文: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
SEW-D (ASAPP から) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi から公開された研究論文: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
SigLIP (Google AI から) Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer. から公開された研究論文 Sigmoid Loss for Language Image Pre-Training
SpeechT5 (Microsoft Research から) Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. から公開された研究論文 SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
SpeechToTextTransformer (Facebook から), Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino から公開された研究論文: fairseq S2T: Fast Speech-to-Text Modeling with fairseq
SpeechToTextTransformer2 (Facebook から), Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau から公開された研究論文: Large-Scale Self- and Semi-Supervised Learning for Speech Translation
Splinter (Tel Aviv University から), Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy から公開された研究論文: Few-Shot Question Answering by Pretraining Span Selection
SqueezeBERT (Berkeley から) Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer から公開された研究論文: SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
StableLm (from Stability AI) released with the paper StableLM 3B 4E1T (Technical Report) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
Starcoder2 (from BigCode team) released with the paper StarCoder 2 and The Stack v2: The Next Generation by Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries.
SuperPoint (from MagicLeap) released with the paper SuperPoint: Self-Supervised Interest Point Detection and Description by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
SwiftFormer (MBZUAI から) Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. から公開された研究論文 SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
Swin Transformer (Microsoft から) Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo から公開された研究論文: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer V2 (Microsoft から) Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo から公開された研究論文: Swin Transformer V2: Scaling Up Capacity and Resolution
Swin2SR (University of Würzburg から) Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte から公開された研究論文: Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration
SwitchTransformers (Google から) William Fedus, Barret Zoph, Noam Shazeer から公開された研究論文: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
T5 (Google AI から) Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu から公開された研究論文: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
T5v1.1 (Google AI から) Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu から公開されたレポジトリー google-research/text-to-text-transfer-transformer
Table Transformer (Microsoft Research から) Brandon Smock, Rohith Pesala, Robin Abraham から公開された研究論文: PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents
TAPAS (Google AI から) Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos から公開された研究論文: TAPAS: Weakly Supervised Table Parsing via Pre-training
TAPEX (Microsoft Research から) Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou から公開された研究論文: TAPEX: Table Pre-training via Learning a Neural SQL Executor
Time Series Transformer (HuggingFace から).
TimeSformer (Facebook から) Gedas Bertasius, Heng Wang, Lorenzo Torresani から公開された研究論文: Is Space-Time Attention All You Need for Video Understanding?
Trajectory Transformer (the University of California at Berkeley から) Michael Janner, Qiyang Li, Sergey Levine から公開された研究論文: Offline Reinforcement Learning as One Big Sequence Modeling Problem
Transformer-XL (Google/CMU から) Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov から公開された研究論文: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
TrOCR (Microsoft から), Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei から公開された研究論文: TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
TVLT (from UNC Chapel Hill) released with the paper TVLT: Textless Vision-Language Transformer by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
TVP (Intel から), Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding から公開された研究論文: Text-Visual Prompting for Efficient 2D Temporal Video Grounding
UDOP (Microsoft Research から) Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal. から公開された研究論文 Unifying Vision, Text, and Layout for Universal Document Processing
UL2 (from Google Research) released with the paper Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
UMT5 (Google Research から) Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. から公開された研究論文 UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
UniSpeech (Microsoft Research から) Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang から公開された研究論文: UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
UniSpeechSat (Microsoft Research から) Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu から公開された研究論文: UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING
UnivNet (from Kakao Corporation) released with the paper UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim.
UPerNet (Peking University から) Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. から公開された研究論文 Unified Perceptual Parsing for Scene Understanding
VAN (Tsinghua University and Nankai University から) Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu から公開された研究論文: Visual Attention Network
VideoMAE (Multimedia Computing Group, Nanjing University から) Zhan Tong, Yibing Song, Jue Wang, Limin Wang から公開された研究論文: VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
ViLT (NAVER AI Lab/Kakao Enterprise/Kakao Brain から) Wonjae Kim, Bokyung Son, Ildoo Kim から公開された研究論文: ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
VipLlava (University of Wisconsin–Madison から) Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee. から公開された研究論文 Making Large Multimodal Models Understand Arbitrary Visual Prompts
Vision Transformer (ViT) (Google AI から) Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby から公開された研究論文: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
VisualBERT (UCLA NLP から) Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang から公開された研究論文: VisualBERT: A Simple and Performant Baseline for Vision and Language
ViT Hybrid (Google AI から) Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby から公開された研究論文: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
VitDet (Meta AI から) Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. から公開された研究論文 Exploring Plain Vision Transformer Backbones for Object Detection
ViTMAE (Meta AI から) Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick から公開された研究論文: Masked Autoencoders Are Scalable Vision Learners
ViTMatte (HUST-VL から) Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. から公開された研究論文 ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
ViTMSN (Meta AI から) Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas から公開された研究論文: Masked Siamese Networks for Label-Efficient Learning
VITS (Kakao Enterprise から) Jaehyeon Kim, Jungil Kong, Juhee Son. から公開された研究論文 Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ViViT (from Google Research) released with the paper ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid.
Wav2Vec2 (Facebook AI から) Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli から公開された研究論文: wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Wav2Vec2-BERT (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
Wav2Vec2-Conformer (Facebook AI から) Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino から公開された研究論文: FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ
Wav2Vec2Phoneme (Facebook AI から) Qiantong Xu, Alexei Baevski, Michael Auli から公開された研究論文: Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
WavLM (Microsoft Research から) Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei から公開された研究論文: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Whisper (OpenAI から) Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever から公開された研究論文: Robust Speech Recognition via Large-Scale Weak Supervision
X-CLIP (Microsoft Research から) Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling から公開された研究論文: Expanding Language-Image Pretrained Models for General Video Recognition
X-MOD (Meta AI から) Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. から公開された研究論文 Lifting the Curse of Multilinguality by Pre-training Modular Transformers
XGLM (from Facebook AI) released with the paper Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
XLM (Facebook から) Guillaume Lample and Alexis Conneau から公開された研究論文: Cross-lingual Language Model Pretraining
XLM-ProphetNet (Microsoft Research から) Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou から公開された研究論文: ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
XLM-RoBERTa (Facebook AI から), Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov から公開された研究論文: Unsupervised Cross-lingual Representation Learning at Scale
XLM-RoBERTa-XL (Facebook AI から), Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau から公開された研究論文: Larger-Scale Transformers for Multilingual Masked Language Modeling
XLM-V (Meta AI から) Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa から公開された研究論文: XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
XLNet (Google/CMU から) Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le から公開された研究論文: XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLS-R (Facebook AI から) Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli から公開された研究論文: XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
XLSR-Wav2Vec2 (Facebook AI から) Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli から公開された研究論文: Unsupervised Cross-Lingual Representation Learning For Speech Recognition
YOLOS (Huazhong University of Science & Technology から) Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu から公開された研究論文: You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
YOSO (the University of Wisconsin - Madison から) Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh から公開された研究論文: You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
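Want to contribute a new model? We have added a detailed guide and templates to walk you through adding one. You can find them in the templates folder of the repository. Before starting your PR, be sure to check the contributing guidelines, and contact the maintainers or open an issue to collect feedback.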

To check whether each model has an implementation in Flax, PyTorch, or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.
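Whether a framework-specific implementation can be used in practice also depends on that framework being installed locally. The following is a minimal sketch, using only the Python standard library, of how one might check this before picking a model class. It is not part of the 🤗 Transformers API: the names `BACKEND_MODULES` and `available_backends` are hypothetical and introduced here for illustration, and the mapping assumes the usual import names `torch`, `tensorflow`, and `flax`.

```python
from importlib.util import find_spec

# Assumed top-level import names for the three frameworks.
# This mapping is illustrative and not taken from the Transformers codebase.
BACKEND_MODULES = {"PyTorch": "torch", "TensorFlow": "tensorflow", "Flax": "flax"}


def available_backends(modules=BACKEND_MODULES):
    """Return {framework name: bool} telling whether each module can be located."""
    # find_spec returns None when a top-level module is not installed,
    # and it does not import the module itself.
    return {name: find_spec(module) is not None for name, module in modules.items()}


if __name__ == "__main__":
    for name, installed in available_backends().items():
        print(f"{name}: {'installed' if installed else 'not installed'}")
```

Because `find_spec` only locates a module without importing it, the check stays cheap even when a heavy framework is installed.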

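These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.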

Learn more

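Section	Description
Documentation	Full API documentation and tutorials
Task summary	Tasks supported by 🤗 Transformers
Preprocessing tutorial	Using the `Tokenizer` class to prepare data for the models
Training and fine-tuning	Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API
Quick tour: Fine-tuning/usage scripts	Example scripts for fine-tuning models on a wide range of tasks
Model sharing and uploading	Upload and share your fine-tuned models with the community
Migration	Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert`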

Citation

We now have a paper you can cite for the 🤗 Transformers library:

@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}
