Rebase ESM PR and update all file formats (#19055)

* Rebase ESM PR and update all file formats

* Fix test relative imports

* Add __init__.py to the test dir

* Disable gradient checkpointing

* Remove references to TFESM... FOR NOW >:|

* Remove completed TODOs from tests

* Convert docstrings to mdx, fix-copies from BERT

* fix-copies for the README and index

* Update ESM's __init__.py to the modern format

* Add to _toctree.yml

* Ensure we correctly copy the pad_token_id from the original ESM model

* Ensure we correctly copy the pad_token_id from the original ESM model

* Tiny grammar nitpicks

* Make the layer norm after embeddings an optional flag

* Make the layer norm after embeddings an optional flag

* Update the conversion script to handle other model classes

* Remove token_type_ids entirely, fix attention_masking and add checks to convert_esm.py

* Break the copied from link from BertModel.forward to remove token_type_ids

* Remove debug array saves

* Begin ESM-2 porting

* Add a hacky workaround for the precision issue in original repo

* Code cleanup

* Remove unused checkpoint conversion code

* Remove unused checkpoint conversion code

* Fix copyright notices

* Get rid of all references to the TF weights conversion

* Remove token_type_ids from the tests

* Fix test code

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add credit

* Remove _ args and __ kwargs in rotary embedding

* Assertively remove asserts

* Replace einsum with torch.outer()

* Fix docstring formatting

* Remove assertions in tokenization

* Add paper citation to ESMModel docstring

* Move vocab list to single line

* Remove ESMLayer from init

* Add Facebook copyrights

* Clean up RotaryEmbedding docstring

* Fix docstring formatting

* Fix docstring for config object

* Add explanation for new config methods

* make fix-copies

* Rename all the ESM- classes to Esm-

* Update conversion script to allow pushing to hub

* Update tests to point at my repo for now

* Set config properly for tests

* Remove the gross hack that forced loss of precision in inv_freq and instead copy the data from the model being converted

* make fixup

* Update expected values for slow tests

* make fixup

* Remove EsmForCausalLM for now

* Remove EsmForCausalLM for now

* Fix padding idx test

* Updated README and docs with ESM-1b and ESM-2 separately (#19221)

* Updated README and docs with ESM-1b and ESM-2 separately

* Update READMEs, longer entry with 3 citations

* make fix-copies

Co-authored-by: Your Name <you@example.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Tom Sercu <tsercu@fb.com>
Co-authored-by: Your Name <you@example.com>
This commit is contained in:
Matt 2022-09-30 14:16:25 +01:00 committed by GitHub
parent 4fd32a1f49
commit 368b649af6
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
20 changed files with 2387 additions and 0 deletions

View File

@ -300,6 +300,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.

View File

@ -250,6 +250,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.

View File

@ -274,6 +274,7 @@ conda install -c huggingface transformers
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (来自 CNRS) 伴随论文 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) 由 Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab 发布。
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (来自 Facebook AI) 伴随论文 [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) 由 Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela 发布。
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (来自 Google Research) 伴随论文 [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) 由 James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon 发布。

View File

@ -286,6 +286,7 @@ conda install -c huggingface transformers
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.

View File

@ -241,6 +241,8 @@
title: Encoder Decoder Models
- local: model_doc/ernie
title: ERNIE
- local: model_doc/esm
title: ESM
- local: model_doc/flaubert
title: FlauBERT
- local: model_doc/fnet

View File

@ -90,6 +90,7 @@ The documentation is organized into five sections:
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ESM](model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
@ -239,6 +240,7 @@ Flax), PyTorch, and/or TensorFlow.
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| ERNIE | ❌ | ❌ | ✅ | ❌ | ❌ |
| ESM | ✅ | ❌ | ✅ | ❌ | ❌ |
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
| FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ |

View File

@ -0,0 +1,109 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ESM
## Overview
This page provides code and pre-trained weights for Transformer protein language models from Meta AI's Fundamental
AI Research Team, providing the state-of-the-art ESM-2, and the previously released ESM-1b and ESM-1v. Transformer
protein language models were introduced in the paper [Biological structure and function emerge from scaling
unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by
Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott,
C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.
The first version of this paper was [preprinted in 2019](https://www.biorxiv.org/content/10.1101/622803v1?versioned=true).
ESM-2 outperforms all tested single-sequence protein language models across a range of structure prediction tasks,
and enables atomic resolution structure prediction.
It was released with the paper [Language models of protein sequences at the scale of evolution enable accurate
structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie,
Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido and Alexander Rives.
The abstract from
"Biological structure and function emerge from scaling unsupervised learning to 250
million protein sequences" is
*In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised
learning has led to major advances in representation learning and statistical generation. In the life sciences, the
anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling
at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To
this end, we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250
million protein sequences spanning evolutionary diversity. The resulting model contains information about biological
properties in its representations. The representations are learned from sequence data alone. The learned representation
space has a multiscale organization reflecting structure from the level of biochemical properties of amino acids to
remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and
can be identified by linear projections. Representation learning produces features that generalize across a range of
applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and
improving state-of-the-art features for long-range contact prediction.*
The abstract from
"Language models of protein sequences at the scale of evolution enable accurate structure prediction" is
*Large language models have recently been shown to develop emergent capabilities with scale, going beyond
simple pattern matching to perform higher level reasoning and generate lifelike images and text. While
language models trained on protein sequences have been studied at a smaller scale, little is known about
what they learn about biology as they are scaled up. In this work we train models up to 15 billion parameters,
the largest language models of proteins to be evaluated to date. We find that as models are scaled they learn
information enabling the prediction of the three-dimensional structure of a protein at the resolution of
individual atoms. We present ESMFold for high accuracy end-to-end atomic level structure prediction directly
from the individual sequence of a protein. ESMFold has similar accuracy to AlphaFold2 and RoseTTAFold for
sequences with low perplexity that are well understood by the language model. ESMFold inference is an
order of magnitude faster than AlphaFold2, enabling exploration of the structural space of metagenomic
proteins in practical timescales.*
Tips:
- ESM models are trained with a masked language modeling (MLM) objective.
The original code can be found [here](https://github.com/facebookresearch/esm) and was
was developed by the Fundamental AI Research team at Meta AI.
This model was contributed to huggingface by [jasonliu](https://huggingface.co/jasonliu)
and [Matt](https://huggingface.co/Rocketknight1).
## EsmConfig
[[autodoc]] EsmConfig
- all
## EsmTokenizer
[[autodoc]] EsmTokenizer
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- save_vocabulary
## EsmModel
[[autodoc]] EsmModel
- forward
## EsmForMaskedLM
[[autodoc]] EsmForMaskedLM
- forward
## EsmForSequenceClassification
[[autodoc]] EsmForSequenceClassification
- forward
## EsmForTokenClassification
[[autodoc]] EsmForTokenClassification
- forward

View File

@ -210,6 +210,7 @@ _import_structure = {
"ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP",
"ErnieConfig",
],
"models.esm": ["ESM_PRETRAINED_CONFIG_ARCHIVE_MAP", "EsmConfig", "EsmTokenizer"],
"models.flaubert": ["FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "FlaubertConfig", "FlaubertTokenizer"],
"models.flava": [
"FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP",
@ -1220,6 +1221,16 @@ else:
"ErniePreTrainedModel",
]
)
_import_structure["models.esm"].extend(
[
"ESM_PRETRAINED_MODEL_ARCHIVE_LIST",
"EsmForMaskedLM",
"EsmForSequenceClassification",
"EsmForTokenClassification",
"EsmModel",
"EsmPreTrainedModel",
]
)
_import_structure["models.flaubert"].extend(
[
"FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
@ -3158,6 +3169,7 @@ if TYPE_CHECKING:
from .models.electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig, ElectraTokenizer
from .models.encoder_decoder import EncoderDecoderConfig
from .models.ernie import ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieConfig
from .models.esm import ESM_PRETRAINED_CONFIG_ARCHIVE_MAP, EsmConfig, EsmTokenizer
from .models.flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig, FlaubertTokenizer
from .models.flava import (
FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP,
@ -4010,6 +4022,14 @@ if TYPE_CHECKING:
ErnieModel,
ErniePreTrainedModel,
)
from .models.esm import (
ESM_PRETRAINED_MODEL_ARCHIVE_LIST,
EsmForMaskedLM,
EsmForSequenceClassification,
EsmForTokenClassification,
EsmModel,
EsmPreTrainedModel,
)
from .models.flaubert import (
FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
FlaubertForMultipleChoice,

View File

@ -60,6 +60,7 @@ from . import (
electra,
encoder_decoder,
ernie,
esm,
flaubert,
flava,
fnet,

View File

@ -64,6 +64,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
("electra", "ElectraConfig"),
("encoder-decoder", "EncoderDecoderConfig"),
("ernie", "ErnieConfig"),
("esm", "EsmConfig"),
("flaubert", "FlaubertConfig"),
("flava", "FlavaConfig"),
("fnet", "FNetConfig"),
@ -197,6 +198,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
("dpt", "DPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("electra", "ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("ernie", "ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("esm", "ESM_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("flaubert", "FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("flava", "FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("fnet", "FNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@ -331,6 +333,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("electra", "ELECTRA"),
("encoder-decoder", "Encoder decoder"),
("ernie", "ERNIE"),
("esm", "ESM"),
("flaubert", "FlauBERT"),
("flava", "FLAVA"),
("fnet", "FNet"),

View File

@ -63,6 +63,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
("dpt", "DPTModel"),
("electra", "ElectraModel"),
("ernie", "ErnieModel"),
("esm", "EsmModel"),
("flaubert", "FlaubertModel"),
("flava", "FlavaModel"),
("fnet", "FNetModel"),
@ -231,6 +232,7 @@ MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
("electra", "ElectraForMaskedLM"),
("encoder-decoder", "EncoderDecoderModel"),
("ernie", "ErnieForMaskedLM"),
("esm", "EsmForMaskedLM"),
("flaubert", "FlaubertWithLMHeadModel"),
("fnet", "FNetForMaskedLM"),
("fsmt", "FSMTForConditionalGeneration"),
@ -519,6 +521,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
("distilbert", "DistilBertForSequenceClassification"),
("electra", "ElectraForSequenceClassification"),
("ernie", "ErnieForSequenceClassification"),
("esm", "EsmForSequenceClassification"),
("flaubert", "FlaubertForSequenceClassification"),
("fnet", "FNetForSequenceClassification"),
("funnel", "FunnelForSequenceClassification"),
@ -648,6 +651,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
("distilbert", "DistilBertForTokenClassification"),
("electra", "ElectraForTokenClassification"),
("ernie", "ErnieForTokenClassification"),
("esm", "EsmForTokenClassification"),
("flaubert", "FlaubertForTokenClassification"),
("fnet", "FNetForTokenClassification"),
("funnel", "FunnelForTokenClassification"),

View File

@ -0,0 +1,67 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2022 Facebook and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
_import_structure = {
"configuration_esm": ["ESM_PRETRAINED_CONFIG_ARCHIVE_MAP", "EsmConfig"],
"tokenization_esm": ["EsmTokenizer"],
}
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_esm"] = [
"ESM_PRETRAINED_MODEL_ARCHIVE_LIST",
"EsmForMaskedLM",
"EsmForSequenceClassification",
"EsmForTokenClassification",
"EsmModel",
"EsmPreTrainedModel",
]
if TYPE_CHECKING:
from .configuration_esm import ESM_PRETRAINED_CONFIG_ARCHIVE_MAP, EsmConfig
from .tokenization_esm import EsmTokenizer
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_esm import (
ESM_PRETRAINED_MODEL_ARCHIVE_LIST,
EsmForMaskedLM,
EsmForSequenceClassification,
EsmForTokenClassification,
EsmModel,
EsmPreTrainedModel,
)
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)

View File

@ -0,0 +1,142 @@
# coding=utf-8
# Copyright 2021 Facebook and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" ESM model configuration"""
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)
ESM_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"facebook/esm1b": "https://huggingface.co/facebook/esm1b/resolve/main/config.json",
# See all ESM models at https://huggingface.co/models?filter=esm
}
class EsmConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`ESMModel`]. It is used to instantiate a ESM model
according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the ESM
[esm-base-uncased](https://huggingface.co/esm-base-uncased) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*):
Vocabulary size of the ESM model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`ESMModel`].
mask_token_id (`int`, *optional*):
The index of the mask token in the vocabulary. This must be included in the config because of the
"mask-dropout" scaling trick, which will scale the inputs depending on the number of masked tokens.
pad_token_id (`int`, *optional*):
The index of the padding token in the vocabulary. This must be included in the config because certain parts
of the ESM code use this instead of the attention mask.
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"silu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 1026):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query", "rotary"`.
For positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
[Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
classifier_dropout (`float`, *optional*):
The dropout ratio for the classification head.
emb_layer_norm_before (`bool`, *optional*):
Whether to apply layer normalization after embeddings but before the main stem of the network.
token_dropout (`bool`, defaults to `False`):
When this is enabled, masked tokens are treated as if they had been dropped out by input dropout.
Examples:
```python
>>> from transformers import EsmModel, EsmConfig
>>> # Initializing a ESM esm-base-uncased style configuration >>> configuration = EsmConfig()
>>> # Initializing a model from the configuration >>> model = ESMModel(configuration)
>>> # Accessing the model configuration >>> configuration = model.config
```"""
model_type = "esm"
def __init__(
self,
vocab_size=None,
mask_token_id=None,
pad_token_id=None,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=1026,
initializer_range=0.02,
layer_norm_eps=1e-12,
position_embedding_type="absolute",
use_cache=True,
classifier_dropout=None,
emb_layer_norm_before=None,
token_dropout=False,
**kwargs
):
super().__init__(pad_token_id=pad_token_id, **kwargs)
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.hidden_act = hidden_act
self.intermediate_size = intermediate_size
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.max_position_embeddings = max_position_embeddings
self.initializer_range = initializer_range
self.layer_norm_eps = layer_norm_eps
self.position_embedding_type = position_embedding_type
self.use_cache = use_cache
self.classifier_dropout = classifier_dropout
self.emb_layer_norm_before = emb_layer_norm_before
self.token_dropout = token_dropout
self.mask_token_id = mask_token_id
self.pad_token_id = pad_token_id

View File

@ -0,0 +1,264 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convert ESM checkpoint."""
import argparse
import pathlib
from pathlib import Path
from tempfile import TemporaryDirectory
import torch
import esm as esm_module
from transformers.models.esm.configuration_esm import EsmConfig
from transformers.models.esm.modeling_esm import (
EsmForMaskedLM,
EsmForSequenceClassification,
EsmIntermediate,
EsmLayer,
EsmOutput,
EsmSelfAttention,
EsmSelfOutput,
)
from transformers.models.esm.tokenization_esm import EsmTokenizer
from transformers.utils import logging
logging.set_verbosity_info()
logger = logging.get_logger(__name__)
SAMPLE_DATA = [
("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
("protein2", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA"),
("protein3", "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLAGG"),
("protein4", "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLA"),
]
MODEL_MAPPING = {
"esm1b_t33_650M_UR50S": esm_module.pretrained.esm1b_t33_650M_UR50S,
"esm1v_t33_650M_UR90S_1": esm_module.pretrained.esm1v_t33_650M_UR90S_1,
"esm1v_t33_650M_UR90S_2": esm_module.pretrained.esm1v_t33_650M_UR90S_2,
"esm1v_t33_650M_UR90S_3": esm_module.pretrained.esm1v_t33_650M_UR90S_3,
"esm1v_t33_650M_UR90S_4": esm_module.pretrained.esm1v_t33_650M_UR90S_4,
"esm1v_t33_650M_UR90S_5": esm_module.pretrained.esm1v_t33_650M_UR90S_5,
"esm2_t48_15B_UR50D": esm_module.pretrained.esm2_t48_15B_UR50D,
"esm2_t36_3B_UR50D": esm_module.pretrained.esm2_t36_3B_UR50D,
"esm2_t33_650M_UR50D": esm_module.pretrained.esm2_t33_650M_UR50D,
"esm2_t30_150M_UR50D": esm_module.pretrained.esm2_t30_150M_UR50D,
"esm2_t12_35M_UR50D": esm_module.pretrained.esm2_t12_35M_UR50D,
"esm2_t6_8M_UR50D": esm_module.pretrained.esm2_t6_8M_UR50D,
}
def convert_esm_checkpoint_to_pytorch(
model: str, pytorch_dump_folder_path: str, classification_head: bool, push_to_repo: str, auth_token: str
):
"""
Copy/paste/tweak esm's weights to our BERT structure.
"""
esm, alphabet = MODEL_MAPPING[model]()
esm.eval() # disable dropout
esm_sent_encoder = esm
if hasattr(esm, "args"):
# Indicates an ESM-1b or ESM-1v model
embed_dim = esm.args.embed_dim
num_layers = esm.args.layers
num_attention_heads = esm.args.attention_heads
intermediate_size = esm.args.ffn_embed_dim
token_dropout = esm.args.token_dropout
emb_layer_norm_before = True if esm.emb_layer_norm_before else False
position_embedding_type = "absolute"
else:
# Indicates an ESM-2 model
embed_dim = esm.embed_dim
num_layers = esm.num_layers
num_attention_heads = esm.attention_heads
intermediate_size = 4 * embed_dim # This is hardcoded in ESM-2
token_dropout = esm.token_dropout
emb_layer_norm_before = False # This code path does not exist in ESM-2
position_embedding_type = "rotary"
config = EsmConfig(
vocab_size=esm_sent_encoder.embed_tokens.num_embeddings,
mask_token_id=alphabet.mask_idx,
hidden_size=embed_dim,
num_hidden_layers=num_layers,
num_attention_heads=num_attention_heads,
intermediate_size=intermediate_size,
max_position_embeddings=1026,
layer_norm_eps=1e-5, # PyTorch default used in fairseq
attention_probs_dropout_prob=0.0,
hidden_dropout_prob=0.0,
pad_token_id=esm.padding_idx,
emb_layer_norm_before=emb_layer_norm_before,
token_dropout=token_dropout,
position_embedding_type=position_embedding_type,
)
if classification_head:
config.num_labels = esm.classification_heads["mnli"].out_proj.weight.shape[0]
print("Our BERT config:", config)
model = EsmForSequenceClassification(config) if classification_head else EsmForMaskedLM(config)
model.eval()
# Now let's copy all the weights.
# Embeddings
model.esm.embeddings.word_embeddings.weight = esm_sent_encoder.embed_tokens.weight
if position_embedding_type == "absolute":
model.esm.embeddings.position_embeddings.weight = esm_sent_encoder.embed_positions.weight
if config.emb_layer_norm_before:
model.esm.embeddings.layer_norm.weight = esm_sent_encoder.emb_layer_norm_before.weight
model.esm.embeddings.layer_norm.bias = esm_sent_encoder.emb_layer_norm_before.bias
model.esm.encoder.emb_layer_norm_after.weight = esm_sent_encoder.emb_layer_norm_after.weight
model.esm.encoder.emb_layer_norm_after.bias = esm_sent_encoder.emb_layer_norm_after.bias
for i in range(config.num_hidden_layers):
# Encoder: start of layer
layer: EsmLayer = model.esm.encoder.layer[i]
# esm_layer: TransformerSentenceEncoderLayer = esm_sent_encoder.layers[i]
esm_layer = esm_sent_encoder.layers[i]
# self attention
self_attn: EsmSelfAttention = layer.attention.self
assert (
esm_layer.self_attn.k_proj.weight.data.shape
== esm_layer.self_attn.q_proj.weight.data.shape
== esm_layer.self_attn.v_proj.weight.data.shape
== torch.Size((config.hidden_size, config.hidden_size))
)
self_attn.query.weight.data = esm_layer.self_attn.q_proj.weight
self_attn.query.bias.data = esm_layer.self_attn.q_proj.bias
self_attn.key.weight.data = esm_layer.self_attn.k_proj.weight
self_attn.key.bias.data = esm_layer.self_attn.k_proj.bias
self_attn.value.weight.data = esm_layer.self_attn.v_proj.weight
self_attn.value.bias.data = esm_layer.self_attn.v_proj.bias
if hasattr(esm_layer.self_attn, "rot_emb"):
# Matt: Although inv_freq is not a trainable weight, it is computed at model init and cached.
# During the training of ESM-2 the model was converted to float16 precision, which also converts
# the inv_freq tensor, and the loss of precision remains even if the model is loaded later as float32.
# If we recompute inv_freq without this loss of precision then we will get subtly different rotary
# embeddings, which are enough to cause significant discrepancies in model outputs. To avoid this,
# we make sure the new model copies the data from the old inv_freq.
self_attn.rotary_embeddings.inv_freq.data = esm_layer.self_attn.rot_emb.inv_freq
# LayerNorm changes for pre-activation
layer.attention.LayerNorm.weight = esm_layer.self_attn_layer_norm.weight
layer.attention.LayerNorm.bias = esm_layer.self_attn_layer_norm.bias
layer.LayerNorm.weight = esm_layer.final_layer_norm.weight
layer.LayerNorm.bias = esm_layer.final_layer_norm.bias
# self-attention output
self_output: EsmSelfOutput = layer.attention.output
assert self_output.dense.weight.shape == esm_layer.self_attn.out_proj.weight.shape
self_output.dense.weight = esm_layer.self_attn.out_proj.weight
self_output.dense.bias = esm_layer.self_attn.out_proj.bias
# intermediate
intermediate: EsmIntermediate = layer.intermediate
assert intermediate.dense.weight.shape == esm_layer.fc1.weight.shape
intermediate.dense.weight = esm_layer.fc1.weight
intermediate.dense.bias = esm_layer.fc1.bias
# output
bert_output: EsmOutput = layer.output
assert bert_output.dense.weight.shape == esm_layer.fc2.weight.shape
bert_output.dense.weight = esm_layer.fc2.weight
bert_output.dense.bias = esm_layer.fc2.bias
# end of layer
if classification_head:
model.classifier.dense.weight = esm.esm.classification_heads["mnli"].dense.weight
model.classifier.dense.bias = esm.classification_heads["mnli"].dense.bias
model.classifier.out_proj.weight = esm.classification_heads["mnli"].out_proj.weight
model.classifier.out_proj.bias = esm.classification_heads["mnli"].out_proj.bias
else:
# LM Head
model.lm_head.dense.weight = esm.lm_head.dense.weight
model.lm_head.dense.bias = esm.lm_head.dense.bias
model.lm_head.layer_norm.weight = esm.lm_head.layer_norm.weight
model.lm_head.layer_norm.bias = esm.lm_head.layer_norm.bias
model.lm_head.decoder.weight = esm.lm_head.weight
model.lm_head.decoder.bias = esm.lm_head.bias
# Let's check that we get the same results.
batch_converter = alphabet.get_batch_converter()
# Prepare data (first 2 sequences from ESMStructuralSplitDataset superfamily / 4)
batch_labels, batch_strs, batch_tokens = batch_converter(SAMPLE_DATA)
# Prepare tokenizer and make sure it matches
with TemporaryDirectory() as tempdir:
vocab = "\n".join(alphabet.all_toks)
vocab_file = Path(tempdir) / "vocab.txt"
vocab_file.write_text(vocab)
hf_tokenizer = EsmTokenizer(vocab_file=str(vocab_file))
hf_tokens = hf_tokenizer([row[1] for row in SAMPLE_DATA], return_tensors="pt", padding=True)
success = torch.all(hf_tokens["input_ids"] == batch_tokens)
print("Do both models tokenizers output the same tokens?", "🔥" if success else "💩")
if not success:
raise Exception("Tokenization does not match!")
with torch.no_grad():
our_output = model(**hf_tokens, output_hidden_states=True)
our_output = our_output["logits"]
if classification_head:
their_output = esm.model.classification_heads["mnli"](esm.extract_features(batch_tokens))
else:
their_output = esm(batch_tokens, repr_layers=list(range(999)))
their_output = their_output["logits"]
print(our_output.shape, their_output.shape)
max_absolute_diff = torch.max(torch.abs(our_output - their_output)).item()
print(f"max_absolute_diff = {max_absolute_diff}") # ~ 1e-5
success = torch.allclose(our_output, their_output, atol=3e-4)
print("Do both models output the same tensors?", "🔥" if success else "💩")
if not success:
raise Exception("Something went wRoNg")
pathlib.Path(pytorch_dump_folder_path).mkdir(parents=True, exist_ok=True)
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving tokenizer to {pytorch_dump_folder_path}")
hf_tokenizer.save_pretrained(pytorch_dump_folder_path)
if push_to_repo:
model.push_to_hub(repo_id=push_to_repo, use_auth_token=auth_token)
hf_tokenizer.push_to_hub(repo_id=push_to_repo, use_auth_token=auth_token)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# Required parameters
parser.add_argument(
"--pytorch_dump_folder_path", type=str, required=True, help="Path to the output PyTorch model."
)
parser.add_argument(
"--classification_head", action="store_true", help="Whether to convert a final classification head."
)
parser.add_argument("--model", default=None, type=str, required=True, help="Name of model to convert.")
parser.add_argument("--push_to_repo", type=str, help="Repo to upload to (including username!).")
parser.add_argument("--auth_token", type=str, help="HuggingFace auth token.")
args = parser.parse_args()
convert_esm_checkpoint_to_pytorch(
args.model, args.pytorch_dump_folder_path, args.classification_head, args.push_to_repo, args.auth_token
)

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,106 @@
# coding=utf-8
# Copyright Facebook and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes for ESM."""
import os
from typing import List, Optional, Union
from ...tokenization_utils import PreTrainedTokenizer
from ...tokenization_utils_base import AddedToken
from ...utils import logging
logger = logging.get_logger(__name__)
VOCAB_FILES_NAMES = {"vocab_file": "vocab.txt"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
"facebook/esm1b": "https://huggingface.co/facebook/esm1b/resolve/main/vocab.txt",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
"facebook/esm1b": 1024,
}
def load_vocab_file(vocab_file):
with open(vocab_file, "r") as f:
lines = f.read().splitlines()
return [l.strip() for l in lines]
class EsmTokenizer(PreTrainedTokenizer):
"""
Constructs an ESM tokenizer.
"""
vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
model_input_names = ["input_ids", "attention_mask"]
def __init__(self, vocab_file, **kwargs):
super().__init__(**kwargs)
self.all_tokens = load_vocab_file(vocab_file)
self._id_to_token = {ind: tok for ind, tok in enumerate(self.all_tokens)}
self._token_to_id = {tok: ind for ind, tok in enumerate(self.all_tokens)}
self.unk_token = "<unk>"
self.cls_token = "<cls>"
self.pad_token = "<pad>"
self.mask_token = "<mask>"
self.eos_token = "<eos>"
self.unique_no_split_tokens = self.all_tokens
self._create_trie(self.unique_no_split_tokens)
def _convert_id_to_token(self, index: int) -> str:
return self._id_to_token.get(index, self.unk_token)
def _convert_token_to_id(self, token: str) -> int:
return self._token_to_id.get(token, self._token_to_id.get(self.unk_token))
def _tokenize(self, text, **kwargs):
return text.split()
def get_vocab_size(self, with_added_tokens=False):
return len(self._id_to_token)
def token_to_id(self, token: str) -> int:
return self._token_to_id.get(token, self._token_to_id.get(self.unk_token))
def id_to_token(self, index: int) -> str:
return self._id_to_token.get(index, self.unk_token)
def build_inputs_with_special_tokens(
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
if token_ids_1 is not None:
raise ValueError("Multiple input sentences are not supported!")
cls_: List[int] = [self.cls_token_id]
eos_: List[int] = [self.eos_token_id]
return cls_ + token_ids_0 + eos_
def save_vocabulary(self, save_directory, filename_prefix):
vocab_file = os.path.join(save_directory, (filename_prefix + "-" if filename_prefix else "") + "vocab.txt")
with open(vocab_file, "w") as f:
f.write("\n".join(self.all_tokens))
return (vocab_file,)
@property
def vocab_size(self) -> int:
return self.get_vocab_size(with_added_tokens=False)
def _add_tokens(self, new_tokens: Union[List[str], List[AddedToken]], special_tokens: bool = False) -> int:
return super()._add_tokens(new_tokens, special_tokens=True)

View File

@ -1948,6 +1948,44 @@ class ErniePreTrainedModel(metaclass=DummyObject):
requires_backends(self, ["torch"])
ESM_PRETRAINED_MODEL_ARCHIVE_LIST = None
class EsmForMaskedLM(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class EsmForSequenceClassification(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class EsmForTokenClassification(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class EsmModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class EsmPreTrainedModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None

View File

View File

@ -0,0 +1,293 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Testing suite for the PyTorch ESM model. """
import unittest
from transformers import EsmConfig, is_torch_available
from transformers.testing_utils import TestCasePlus, require_torch, slow, torch_device
from ...generation.test_generation_utils import GenerationTesterMixin
from ...test_configuration_common import ConfigTester
from ...test_modeling_common import ModelTesterMixin, ids_tensor, random_attention_mask
if is_torch_available():
import torch
from transformers import EsmForMaskedLM, EsmForSequenceClassification, EsmForTokenClassification, EsmModel
from transformers.models.esm.modeling_esm import (
ESM_PRETRAINED_MODEL_ARCHIVE_LIST,
EsmEmbeddings,
create_position_ids_from_input_ids,
)
# copied from tests.test_modeling_roberta
class EsmModelTester:
def __init__(
self,
parent,
):
self.parent = parent
self.batch_size = 13
self.seq_length = 7
self.is_training = False
self.use_input_mask = True
self.use_token_type_ids = False
self.use_labels = True
self.vocab_size = 99
self.hidden_size = 32
self.num_hidden_layers = 5
self.num_attention_heads = 4
self.intermediate_size = 37
self.hidden_act = "gelu"
self.hidden_dropout_prob = 0.1
self.attention_probs_dropout_prob = 0.1
self.max_position_embeddings = 512
self.type_vocab_size = 16
self.type_sequence_label_size = 2
self.initializer_range = 0.02
self.num_labels = 3
self.num_choices = 4
self.scope = None
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)
input_mask = None
if self.use_input_mask:
input_mask = random_attention_mask([self.batch_size, self.seq_length])
sequence_labels = None
token_labels = None
choice_labels = None
if self.use_labels:
sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels)
choice_labels = ids_tensor([self.batch_size], self.num_choices)
config = self.get_config()
return config, input_ids, input_mask, sequence_labels, token_labels, choice_labels
def get_config(self):
return EsmConfig(
vocab_size=self.vocab_size,
hidden_size=self.hidden_size,
pad_token_id=1,
num_hidden_layers=self.num_hidden_layers,
num_attention_heads=self.num_attention_heads,
intermediate_size=self.intermediate_size,
hidden_act=self.hidden_act,
hidden_dropout_prob=self.hidden_dropout_prob,
attention_probs_dropout_prob=self.attention_probs_dropout_prob,
max_position_embeddings=self.max_position_embeddings,
type_vocab_size=self.type_vocab_size,
initializer_range=self.initializer_range,
)
def create_and_check_model(self, config, input_ids, input_mask, sequence_labels, token_labels, choice_labels):
model = EsmModel(config=config)
model.to(torch_device)
model.eval()
result = model(input_ids, attention_mask=input_mask)
result = model(input_ids)
result = model(input_ids)
self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
self.parent.assertEqual(result.pooler_output.shape, (self.batch_size, self.hidden_size))
def create_and_check_for_masked_lm(
self, config, input_ids, input_mask, sequence_labels, token_labels, choice_labels
):
model = EsmForMaskedLM(config=config)
model.to(torch_device)
model.eval()
result = model(input_ids, attention_mask=input_mask, labels=token_labels)
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))
def create_and_check_for_token_classification(
self, config, input_ids, input_mask, sequence_labels, token_labels, choice_labels
):
config.num_labels = self.num_labels
model = EsmForTokenClassification(config=config)
model.to(torch_device)
model.eval()
result = model(input_ids, attention_mask=input_mask, labels=token_labels)
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.num_labels))
def prepare_config_and_inputs_for_common(self):
config_and_inputs = self.prepare_config_and_inputs()
(
config,
input_ids,
input_mask,
sequence_labels,
token_labels,
choice_labels,
) = config_and_inputs
inputs_dict = {"input_ids": input_ids, "attention_mask": input_mask}
return config, inputs_dict
@require_torch
class EsmModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):
test_mismatched_shapes = False
all_model_classes = (
(
EsmForMaskedLM,
EsmModel,
EsmForSequenceClassification,
EsmForTokenClassification,
)
if is_torch_available()
else ()
)
all_generative_model_classes = ()
test_sequence_classification_problem_types = True
def setUp(self):
self.model_tester = EsmModelTester(self)
self.config_tester = ConfigTester(self, config_class=EsmConfig, hidden_size=37)
def test_config(self):
self.config_tester.run_common_tests()
def test_model(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_model(*config_and_inputs)
def test_model_various_embeddings(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
for type in ["absolute", "relative_key", "relative_key_query"]:
config_and_inputs[0].position_embedding_type = type
self.model_tester.create_and_check_model(*config_and_inputs)
def test_for_masked_lm(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_for_masked_lm(*config_and_inputs)
def test_for_token_classification(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_for_token_classification(*config_and_inputs)
@slow
def test_model_from_pretrained(self):
for model_name in ESM_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
model = EsmModel.from_pretrained(model_name)
self.assertIsNotNone(model)
def test_create_position_ids_respects_padding_index(self):
"""Ensure that the default position ids only assign a sequential . This is a regression
test for https://github.com/huggingface/transformers/issues/1761
The position ids should be masked with the embedding object's padding index. Therefore, the
first available non-padding position index is EsmEmbeddings.padding_idx + 1
"""
config = self.model_tester.prepare_config_and_inputs()[0]
model = EsmEmbeddings(config=config)
input_ids = torch.as_tensor([[12, 31, 13, model.padding_idx]])
expected_positions = torch.as_tensor(
[
[
0 + model.padding_idx + 1,
1 + model.padding_idx + 1,
2 + model.padding_idx + 1,
model.padding_idx,
]
]
)
position_ids = create_position_ids_from_input_ids(input_ids, model.padding_idx)
self.assertEqual(position_ids.shape, expected_positions.shape)
self.assertTrue(torch.all(torch.eq(position_ids, expected_positions)))
def test_create_position_ids_from_inputs_embeds(self):
"""Ensure that the default position ids only assign a sequential . This is a regression
test for https://github.com/huggingface/transformers/issues/1761
The position ids should be masked with the embedding object's padding index. Therefore, the
first available non-padding position index is EsmEmbeddings.padding_idx + 1
"""
config = self.model_tester.prepare_config_and_inputs()[0]
embeddings = EsmEmbeddings(config=config)
inputs_embeds = torch.empty(2, 4, 30)
expected_single_positions = [
0 + embeddings.padding_idx + 1,
1 + embeddings.padding_idx + 1,
2 + embeddings.padding_idx + 1,
3 + embeddings.padding_idx + 1,
]
expected_positions = torch.as_tensor([expected_single_positions, expected_single_positions])
position_ids = embeddings.create_position_ids_from_inputs_embeds(inputs_embeds)
self.assertEqual(position_ids.shape, expected_positions.shape)
self.assertTrue(torch.all(torch.eq(position_ids, expected_positions)))
@require_torch
class EsmModelIntegrationTest(TestCasePlus):
@slow
def test_inference_masked_lm(self):
model = EsmForMaskedLM.from_pretrained("Rocketknight1/esm-2-8m")
input_ids = torch.tensor([[0, 1, 2, 3, 4, 5]])
output = model(input_ids)[0]
vocab_size = 33
expected_shape = torch.Size((1, 6, vocab_size))
self.assertEqual(output.shape, expected_shape)
expected_slice = torch.tensor(
[[[15.0973, -6.6406, -1.1351], [-0.2209, -9.9622, 4.2109], [-1.6055, -10.0023, 1.5914]]]
)
self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))
@slow
def test_inference_no_head(self):
model = EsmModel.from_pretrained("Rocketknight1/esm-2-8m")
input_ids = torch.tensor([[0, 6, 4, 13, 5, 4, 16, 12, 11, 7, 2]])
output = model(input_ids)[0]
# compare the actual values for a slice.
expected_slice = torch.tensor(
[[[0.1444, 0.5413, 0.3248], [0.3034, 0.0053, 0.3108], [0.3228, -0.2499, 0.3415]]]
)
self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))
def test_lm_head_ignore_keys(self):
from copy import deepcopy
keys_to_ignore_on_save_tied = [r"lm_head.decoder.weight", r"lm_head.decoder.bias"]
keys_to_ignore_on_save_untied = [r"lm_head.decoder.bias"]
config = EsmConfig.from_pretrained("Rocketknight1/esm-2-8m")
config_tied = deepcopy(config)
config_tied.tie_word_embeddings = True
config_untied = deepcopy(config)
config_untied.tie_word_embeddings = False
for cls in [EsmForMaskedLM]:
model = cls(config_tied)
self.assertEqual(model._keys_to_ignore_on_save, keys_to_ignore_on_save_tied, cls)
# the keys should be different when embeddings aren't tied
model = cls(config_untied)
self.assertEqual(model._keys_to_ignore_on_save, keys_to_ignore_on_save_untied, cls)
# test that saving works with updated ignore keys - just testing that it doesn't fail
model.save_pretrained(self.get_auto_remove_tmp_dir())

View File

@ -0,0 +1,91 @@
# coding=utf-8
# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import tempfile
import unittest
from typing import List
from transformers.models.esm.tokenization_esm import VOCAB_FILES_NAMES, EsmTokenizer
from transformers.testing_utils import require_tokenizers
from transformers.tokenization_utils import PreTrainedTokenizer
from transformers.tokenization_utils_base import PreTrainedTokenizerBase
@require_tokenizers
class ESMTokenizationTest(unittest.TestCase):
tokenizer_class = EsmTokenizer
def setUp(self):
super().setUp()
self.tmpdirname = tempfile.mkdtemp()
# fmt: off
vocab_tokens: List[str] = ["<cls>", "<pad>", "<eos>", "<unk>", "L", "A", "G", "V", "S", "E", "R", "T", "I", "D", "P", "K", "Q", "N", "F", "Y", "M", "H", "W", "C", "X", "B", "U", "Z", "O", ".", "-", "<null_1>", "<mask>"] # noqa: E501
# fmt: on
self.vocab_file = os.path.join(self.tmpdirname, VOCAB_FILES_NAMES["vocab_file"])
with open(self.vocab_file, "w", encoding="utf-8") as vocab_writer:
vocab_writer.write("".join([x + "\n" for x in vocab_tokens]))
def get_tokenizers(self, **kwargs) -> List[PreTrainedTokenizerBase]:
return [self.get_tokenizer(**kwargs)]
def get_tokenizer(self, **kwargs) -> PreTrainedTokenizer:
return self.tokenizer_class.from_pretrained(self.tmpdirname, **kwargs)
def test_tokenizer_single_example(self):
tokenizer = self.tokenizer_class(self.vocab_file)
tokens = tokenizer.tokenize("LAGVS")
self.assertListEqual(tokens, ["L", "A", "G", "V", "S"])
self.assertListEqual(tokenizer.convert_tokens_to_ids(tokens), [4, 5, 6, 7, 8])
def test_tokenizer_encode_single(self):
tokenizer = self.tokenizer_class(self.vocab_file)
seq = "LAGVS"
self.assertListEqual(tokenizer.encode(seq), [0, 4, 5, 6, 7, 8, 2])
def test_tokenizer_call_no_pad(self):
tokenizer = self.tokenizer_class(self.vocab_file)
seq_batch = ["LAGVS", "WCB"]
tokens_batch = tokenizer(seq_batch, padding=False)["input_ids"]
self.assertListEqual(tokens_batch, [[0, 4, 5, 6, 7, 8, 2], [0, 22, 23, 25, 2]])
def test_tokenizer_call_pad(self):
tokenizer = self.tokenizer_class(self.vocab_file)
seq_batch = ["LAGVS", "WCB"]
tokens_batch = tokenizer(seq_batch, padding=True)["input_ids"]
self.assertListEqual(tokens_batch, [[0, 4, 5, 6, 7, 8, 2], [0, 22, 23, 25, 2, 1, 1]])
def test_tokenize_special_tokens(self):
"""Test `tokenize` with special tokens."""
tokenizers = self.get_tokenizers(fast=True)
for tokenizer in tokenizers:
with self.subTest(f"{tokenizer.__class__.__name__}"):
SPECIAL_TOKEN_1 = "<unk>"
SPECIAL_TOKEN_2 = "<mask>"
token_1 = tokenizer.tokenize(SPECIAL_TOKEN_1)
token_2 = tokenizer.tokenize(SPECIAL_TOKEN_2)
self.assertEqual(len(token_1), 1)
self.assertEqual(len(token_2), 1)
self.assertEqual(token_1[0], SPECIAL_TOKEN_1)
self.assertEqual(token_2[0], SPECIAL_TOKEN_2)