Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-03 21:00:08 +06:00
Rebase ESM PR and update all file formats (#19055)
* Rebase ESM PR and update all file formats
* Fix test relative imports
* Add __init__.py to the test dir
* Disable gradient checkpointing
* Remove references to TFESM... FOR NOW >:|
* Remove completed TODOs from tests
* Convert docstrings to mdx, fix-copies from BERT
* fix-copies for the README and index
* Update ESM's __init__.py to the modern format
* Add to _toctree.yml
* Ensure we correctly copy the pad_token_id from the original ESM model
* Ensure we correctly copy the pad_token_id from the original ESM model
* Tiny grammar nitpicks
* Make the layer norm after embeddings an optional flag
* Make the layer norm after embeddings an optional flag
* Update the conversion script to handle other model classes
* Remove token_type_ids entirely, fix attention_masking and add checks to convert_esm.py
* Break the copied from link from BertModel.forward to remove token_type_ids
* Remove debug array saves
* Begin ESM-2 porting
* Add a hacky workaround for the precision issue in original repo
* Code cleanup
* Remove unused checkpoint conversion code
* Remove unused checkpoint conversion code
* Fix copyright notices
* Get rid of all references to the TF weights conversion
* Remove token_type_ids from the tests
* Fix test code
* Update src/transformers/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update README.md
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add credit
* Remove _ args and __ kwargs in rotary embedding
* Assertively remove asserts
* Replace einsum with torch.outer()
* Fix docstring formatting
* Remove assertions in tokenization
* Add paper citation to ESMModel docstring
* Move vocab list to single line
* Remove ESMLayer from init
* Add Facebook copyrights
* Clean up RotaryEmbedding docstring
* Fix docstring formatting
* Fix docstring for config object
* Add explanation for new config methods
* make fix-copies
* Rename all the ESM- classes to Esm-
* Update conversion script to allow pushing to hub
* Update tests to point at my repo for now
* Set config properly for tests
* Remove the gross hack that forced loss of precision in inv_freq and instead copy the data from the model being converted
* make fixup
* Update expected values for slow tests
* make fixup
* Remove EsmForCausalLM for now
* Remove EsmForCausalLM for now
* Fix padding idx test
* Updated README and docs with ESM-1b and ESM-2 separately (#19221)
* Updated README and docs with ESM-1b and ESM-2 separately
* Update READMEs, longer entry with 3 citations
* make fix-copies

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Tom Sercu <tsercu@fb.com>
Co-authored-by: Your Name <you@example.com>
This commit is contained in:
parent 4fd32a1f49
commit 368b649af6
@ -300,6 +300,7 @@ Current number of checkpoints:
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
|
||||
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) is a family of transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
|
||||
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
|
||||
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
|
||||
|
@ -250,6 +250,7 @@ installing these with conda from the Flax, PyTorch and TensorFlow installation pages
|
||||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
|
||||
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) is a family of transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
|
||||
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
|
||||
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
|
||||
|
@ -274,6 +274,7 @@ conda install -c huggingface transformers
|
||||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
|
||||
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) is a family of transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
|
||||
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
|
||||
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
|
||||
|
@ -286,6 +286,7 @@ conda install -c huggingface transformers
|
||||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
|
||||
1. **[ESM](https://huggingface.co/docs/transformers/main/model_doc/esm)** (from Meta AI) is a family of transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
|
||||
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
|
||||
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
|
||||
|
@ -241,6 +241,8 @@
|
||||
title: Encoder Decoder Models
|
||||
- local: model_doc/ernie
|
||||
title: ERNIE
|
||||
- local: model_doc/esm
|
||||
title: ESM
|
||||
- local: model_doc/flaubert
|
||||
title: FlauBERT
|
||||
- local: model_doc/fnet
|
||||
|
@ -90,6 +90,7 @@ The documentation is organized into five sections:
|
||||
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[ERNIE](model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
|
||||
1. **[ESM](model_doc/esm)** (from Meta AI) is a family of transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
|
||||
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
|
||||
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
|
||||
@ -239,6 +240,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
|
||||
| ERNIE | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| ESM | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
|
docs/source/en/model_doc/esm.mdx (new file, 109 lines)
@ -0,0 +1,109 @@
|
||||
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# ESM
|
||||
|
||||
## Overview
|
||||
This page provides code and pre-trained weights for Transformer protein language models from Meta AI's Fundamental
AI Research Team, covering the state-of-the-art ESM-2 as well as the previously released ESM-1b and ESM-1v. Transformer
|
||||
protein language models were introduced in the paper [Biological structure and function emerge from scaling
|
||||
unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by
|
||||
Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott,
|
||||
C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.
|
||||
The first version of this paper was [preprinted in 2019](https://www.biorxiv.org/content/10.1101/622803v1?versioned=true).
|
||||
|
||||
ESM-2 outperforms all tested single-sequence protein language models across a range of structure prediction tasks,
|
||||
and enables atomic resolution structure prediction.
|
||||
It was released with the paper [Language models of protein sequences at the scale of evolution enable accurate
|
||||
structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie,
|
||||
Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido and Alexander Rives.
|
||||
|
||||
|
||||
The abstract from
|
||||
"Biological structure and function emerge from scaling unsupervised learning to 250
|
||||
million protein sequences" is
|
||||
|
||||
|
||||
*In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised
|
||||
learning has led to major advances in representation learning and statistical generation. In the life sciences, the
|
||||
anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling
|
||||
at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To
|
||||
this end, we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250
|
||||
million protein sequences spanning evolutionary diversity. The resulting model contains information about biological
|
||||
properties in its representations. The representations are learned from sequence data alone. The learned representation
|
||||
space has a multiscale organization reflecting structure from the level of biochemical properties of amino acids to
|
||||
remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and
|
||||
can be identified by linear projections. Representation learning produces features that generalize across a range of
|
||||
applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and
|
||||
improving state-of-the-art features for long-range contact prediction.*
|
||||
|
||||
|
||||
The abstract from
|
||||
"Language models of protein sequences at the scale of evolution enable accurate structure prediction" is
|
||||
|
||||
*Large language models have recently been shown to develop emergent capabilities with scale, going beyond
|
||||
simple pattern matching to perform higher level reasoning and generate lifelike images and text. While
|
||||
language models trained on protein sequences have been studied at a smaller scale, little is known about
|
||||
what they learn about biology as they are scaled up. In this work we train models up to 15 billion parameters,
|
||||
the largest language models of proteins to be evaluated to date. We find that as models are scaled they learn
|
||||
information enabling the prediction of the three-dimensional structure of a protein at the resolution of
|
||||
individual atoms. We present ESMFold for high accuracy end-to-end atomic level structure prediction directly
|
||||
from the individual sequence of a protein. ESMFold has similar accuracy to AlphaFold2 and RoseTTAFold for
|
||||
sequences with low perplexity that are well understood by the language model. ESMFold inference is an
|
||||
order of magnitude faster than AlphaFold2, enabling exploration of the structural space of metagenomic
|
||||
proteins in practical timescales.*
|
||||
|
||||
|
||||
|
||||
|
||||
Tips:
|
||||
|
||||
- ESM models are trained with a masked language modeling (MLM) objective.
|
||||
|
||||
The original code can be found [here](https://github.com/facebookresearch/esm) and was
developed by the Fundamental AI Research team at Meta AI.
|
||||
This model was contributed to Hugging Face by [jasonliu](https://huggingface.co/jasonliu)
and [Matt](https://huggingface.co/Rocketknight1).
|
||||
|
||||
## EsmConfig
|
||||
|
||||
[[autodoc]] EsmConfig
|
||||
- all
|
||||
|
||||
## EsmTokenizer
|
||||
|
||||
[[autodoc]] EsmTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
|
||||
## EsmModel
|
||||
|
||||
[[autodoc]] EsmModel
|
||||
- forward
|
||||
|
||||
## EsmForMaskedLM
|
||||
|
||||
[[autodoc]] EsmForMaskedLM
|
||||
- forward
|
||||
|
||||
## EsmForSequenceClassification
|
||||
|
||||
[[autodoc]] EsmForSequenceClassification
|
||||
- forward
|
||||
|
||||
## EsmForTokenClassification
|
||||
|
||||
[[autodoc]] EsmForTokenClassification
|
||||
- forward
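Beyond the autodoc listings above, a minimal usage sketch for the masked-language-modeling head may help orient readers; the checkpoint id below is only an assumption, and any converted ESM checkpoint on the Hub or on disk will do:

```python
import torch
from transformers import EsmForMaskedLM, EsmTokenizer

# Illustrative checkpoint id; substitute whichever converted ESM checkpoint you actually have.
checkpoint = "facebook/esm2_t6_8M_UR50D"
tokenizer = EsmTokenizer.from_pretrained(checkpoint)
model = EsmForMaskedLM.from_pretrained(checkpoint)
model.eval()

# Protein sequences are tokenized per residue; <mask> marks the position to predict.
sequence = "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Recover the most likely amino acid at each masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.convert_ids_to_tokens(predicted_ids.tolist()))
```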
|
@ -210,6 +210,7 @@ _import_structure = {
|
||||
"ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP",
|
||||
"ErnieConfig",
|
||||
],
|
||||
"models.esm": ["ESM_PRETRAINED_CONFIG_ARCHIVE_MAP", "EsmConfig", "EsmTokenizer"],
|
||||
"models.flaubert": ["FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "FlaubertConfig", "FlaubertTokenizer"],
|
||||
"models.flava": [
|
||||
"FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP",
|
||||
@ -1220,6 +1221,16 @@ else:
|
||||
"ErniePreTrainedModel",
|
||||
]
|
||||
)
|
||||
_import_structure["models.esm"].extend(
|
||||
[
|
||||
"ESM_PRETRAINED_MODEL_ARCHIVE_LIST",
|
||||
"EsmForMaskedLM",
|
||||
"EsmForSequenceClassification",
|
||||
"EsmForTokenClassification",
|
||||
"EsmModel",
|
||||
"EsmPreTrainedModel",
|
||||
]
|
||||
)
|
||||
_import_structure["models.flaubert"].extend(
|
||||
[
|
||||
"FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
|
||||
@ -3158,6 +3169,7 @@ if TYPE_CHECKING:
|
||||
from .models.electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig, ElectraTokenizer
|
||||
from .models.encoder_decoder import EncoderDecoderConfig
|
||||
from .models.ernie import ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP, ErnieConfig
|
||||
from .models.esm import ESM_PRETRAINED_CONFIG_ARCHIVE_MAP, EsmConfig, EsmTokenizer
|
||||
from .models.flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig, FlaubertTokenizer
|
||||
from .models.flava import (
|
||||
FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||
@ -4010,6 +4022,14 @@ if TYPE_CHECKING:
|
||||
ErnieModel,
|
||||
ErniePreTrainedModel,
|
||||
)
|
||||
from .models.esm import (
|
||||
ESM_PRETRAINED_MODEL_ARCHIVE_LIST,
|
||||
EsmForMaskedLM,
|
||||
EsmForSequenceClassification,
|
||||
EsmForTokenClassification,
|
||||
EsmModel,
|
||||
EsmPreTrainedModel,
|
||||
)
|
||||
from .models.flaubert import (
|
||||
FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
|
||||
FlaubertForMultipleChoice,
|
||||
|
@ -60,6 +60,7 @@ from . import (
|
||||
electra,
|
||||
encoder_decoder,
|
||||
ernie,
|
||||
esm,
|
||||
flaubert,
|
||||
flava,
|
||||
fnet,
|
||||
|
@ -64,6 +64,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
|
||||
("electra", "ElectraConfig"),
|
||||
("encoder-decoder", "EncoderDecoderConfig"),
|
||||
("ernie", "ErnieConfig"),
|
||||
("esm", "EsmConfig"),
|
||||
("flaubert", "FlaubertConfig"),
|
||||
("flava", "FlavaConfig"),
|
||||
("fnet", "FNetConfig"),
|
||||
@ -197,6 +198,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
|
||||
("dpt", "DPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
|
||||
("electra", "ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP"),
|
||||
("ernie", "ERNIE_PRETRAINED_CONFIG_ARCHIVE_MAP"),
|
||||
("esm", "ESM_PRETRAINED_CONFIG_ARCHIVE_MAP"),
|
||||
("flaubert", "FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
|
||||
("flava", "FLAVA_PRETRAINED_CONFIG_ARCHIVE_MAP"),
|
||||
("fnet", "FNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
|
||||
@ -331,6 +333,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
|
||||
("electra", "ELECTRA"),
|
||||
("encoder-decoder", "Encoder decoder"),
|
||||
("ernie", "ERNIE"),
|
||||
("esm", "ESM"),
|
||||
("flaubert", "FlauBERT"),
|
||||
("flava", "FLAVA"),
|
||||
("fnet", "FNet"),
|
||||
|
@ -63,6 +63,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
|
||||
("dpt", "DPTModel"),
|
||||
("electra", "ElectraModel"),
|
||||
("ernie", "ErnieModel"),
|
||||
("esm", "EsmModel"),
|
||||
("flaubert", "FlaubertModel"),
|
||||
("flava", "FlavaModel"),
|
||||
("fnet", "FNetModel"),
|
||||
@ -231,6 +232,7 @@ MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
|
||||
("electra", "ElectraForMaskedLM"),
|
||||
("encoder-decoder", "EncoderDecoderModel"),
|
||||
("ernie", "ErnieForMaskedLM"),
|
||||
("esm", "EsmForMaskedLM"),
|
||||
("flaubert", "FlaubertWithLMHeadModel"),
|
||||
("fnet", "FNetForMaskedLM"),
|
||||
("fsmt", "FSMTForConditionalGeneration"),
|
||||
@ -519,6 +521,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
|
||||
("distilbert", "DistilBertForSequenceClassification"),
|
||||
("electra", "ElectraForSequenceClassification"),
|
||||
("ernie", "ErnieForSequenceClassification"),
|
||||
("esm", "EsmForSequenceClassification"),
|
||||
("flaubert", "FlaubertForSequenceClassification"),
|
||||
("fnet", "FNetForSequenceClassification"),
|
||||
("funnel", "FunnelForSequenceClassification"),
|
||||
@ -648,6 +651,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
|
||||
("distilbert", "DistilBertForTokenClassification"),
|
||||
("electra", "ElectraForTokenClassification"),
|
||||
("ernie", "ErnieForTokenClassification"),
|
||||
("esm", "EsmForTokenClassification"),
|
||||
("flaubert", "FlaubertForTokenClassification"),
|
||||
("fnet", "FNetForTokenClassification"),
|
||||
("funnel", "FunnelForTokenClassification"),
|
||||
|
src/transformers/models/esm/__init__.py (new file, 67 lines)
@ -0,0 +1,67 @@
|
||||
# flake8: noqa
|
||||
# There's no way to ignore "F401 '...' imported but unused" warnings in this
|
||||
# module, but to preserve other warnings. So, don't check this module at all.
|
||||
|
||||
# Copyright 2022 Facebook and The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
|
||||
|
||||
|
||||
_import_structure = {
|
||||
"configuration_esm": ["ESM_PRETRAINED_CONFIG_ARCHIVE_MAP", "EsmConfig"],
|
||||
"tokenization_esm": ["EsmTokenizer"],
|
||||
}
|
||||
|
||||
try:
|
||||
if not is_torch_available():
|
||||
raise OptionalDependencyNotAvailable()
|
||||
except OptionalDependencyNotAvailable:
|
||||
pass
|
||||
else:
|
||||
_import_structure["modeling_esm"] = [
|
||||
"ESM_PRETRAINED_MODEL_ARCHIVE_LIST",
|
||||
"EsmForMaskedLM",
|
||||
"EsmForSequenceClassification",
|
||||
"EsmForTokenClassification",
|
||||
"EsmModel",
|
||||
"EsmPreTrainedModel",
|
||||
]
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from .configuration_esm import ESM_PRETRAINED_CONFIG_ARCHIVE_MAP, EsmConfig
|
||||
from .tokenization_esm import EsmTokenizer
|
||||
|
||||
try:
|
||||
if not is_torch_available():
|
||||
raise OptionalDependencyNotAvailable()
|
||||
except OptionalDependencyNotAvailable:
|
||||
pass
|
||||
else:
|
||||
from .modeling_esm import (
|
||||
ESM_PRETRAINED_MODEL_ARCHIVE_LIST,
|
||||
EsmForMaskedLM,
|
||||
EsmForSequenceClassification,
|
||||
EsmForTokenClassification,
|
||||
EsmModel,
|
||||
EsmPreTrainedModel,
|
||||
)
|
||||
|
||||
|
||||
else:
|
||||
import sys
|
||||
|
||||
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
|
src/transformers/models/esm/configuration_esm.py (new file, 142 lines)
@ -0,0 +1,142 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2021 Facebook and The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
""" ESM model configuration"""
|
||||
|
||||
from ...configuration_utils import PretrainedConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
ESM_PRETRAINED_CONFIG_ARCHIVE_MAP = {
|
||||
"facebook/esm1b": "https://huggingface.co/facebook/esm1b/resolve/main/config.json",
|
||||
# See all ESM models at https://huggingface.co/models?filter=esm
|
||||
}
|
||||
|
||||
|
||||
class EsmConfig(PretrainedConfig):
|
||||
r"""
|
||||
This is the configuration class to store the configuration of an [`EsmModel`]. It is used to instantiate an ESM model
|
||||
according to the specified arguments, defining the model architecture. Instantiating a configuration with the
|
||||
defaults will yield a similar configuration to that of the ESM
|
||||
[esm-base-uncased](https://huggingface.co/esm-base-uncased) architecture.
|
||||
|
||||
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
|
||||
documentation from [`PretrainedConfig`] for more information.
|
||||
|
||||
|
||||
Args:
|
||||
vocab_size (`int`, *optional*):
|
||||
Vocabulary size of the ESM model. Defines the number of different tokens that can be represented by the
|
||||
`input_ids` passed when calling [`EsmModel`].
|
||||
mask_token_id (`int`, *optional*):
|
||||
The index of the mask token in the vocabulary. This must be included in the config because of the
|
||||
"mask-dropout" scaling trick, which will scale the inputs depending on the number of masked tokens.
|
||||
pad_token_id (`int`, *optional*):
|
||||
The index of the padding token in the vocabulary. This must be included in the config because certain parts
|
||||
of the ESM code use this instead of the attention mask.
|
||||
hidden_size (`int`, *optional*, defaults to 768):
|
||||
Dimensionality of the encoder layers and the pooler layer.
|
||||
num_hidden_layers (`int`, *optional*, defaults to 12):
|
||||
Number of hidden layers in the Transformer encoder.
|
||||
num_attention_heads (`int`, *optional*, defaults to 12):
|
||||
Number of attention heads for each attention layer in the Transformer encoder.
|
||||
intermediate_size (`int`, *optional*, defaults to 3072):
|
||||
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
|
||||
hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
|
||||
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
|
||||
`"relu"`, `"silu"` and `"gelu_new"` are supported.
|
||||
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
|
||||
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
|
||||
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
|
||||
The dropout ratio for the attention probabilities.
|
||||
max_position_embeddings (`int`, *optional*, defaults to 1026):
|
||||
The maximum sequence length that this model might ever be used with. Typically set this to something large
|
||||
just in case (e.g., 512 or 1024 or 2048).
|
||||
initializer_range (`float`, *optional*, defaults to 0.02):
|
||||
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
|
||||
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
|
||||
The epsilon used by the layer normalization layers.
|
||||
position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
|
||||
Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`, `"rotary"`.
|
||||
For positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
|
||||
[Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
|
||||
For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
|
||||
with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
|
||||
use_cache (`bool`, *optional*, defaults to `True`):
|
||||
Whether or not the model should return the last key/values attentions (not used by all models). Only
|
||||
relevant if `config.is_decoder=True`.
|
||||
classifier_dropout (`float`, *optional*):
|
||||
The dropout ratio for the classification head.
|
||||
emb_layer_norm_before (`bool`, *optional*):
|
||||
Whether to apply layer normalization after embeddings but before the main stem of the network.
|
||||
token_dropout (`bool`, defaults to `False`):
|
||||
When this is enabled, masked tokens are treated as if they had been dropped out by input dropout.
|
||||
|
||||
Examples:
|
||||
|
||||
```python
|
||||
>>> from transformers import EsmModel, EsmConfig
|
||||
|
||||
>>> # Initializing an ESM esm-base-uncased style configuration
>>> configuration = EsmConfig()

>>> # Initializing a model from the configuration
>>> model = EsmModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
|
||||
```"""
|
||||
model_type = "esm"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
vocab_size=None,
|
||||
mask_token_id=None,
|
||||
pad_token_id=None,
|
||||
hidden_size=768,
|
||||
num_hidden_layers=12,
|
||||
num_attention_heads=12,
|
||||
intermediate_size=3072,
|
||||
hidden_act="gelu",
|
||||
hidden_dropout_prob=0.1,
|
||||
attention_probs_dropout_prob=0.1,
|
||||
max_position_embeddings=1026,
|
||||
initializer_range=0.02,
|
||||
layer_norm_eps=1e-12,
|
||||
position_embedding_type="absolute",
|
||||
use_cache=True,
|
||||
classifier_dropout=None,
|
||||
emb_layer_norm_before=None,
|
||||
token_dropout=False,
|
||||
**kwargs
|
||||
):
|
||||
super().__init__(pad_token_id=pad_token_id, **kwargs)
|
||||
|
||||
self.vocab_size = vocab_size
|
||||
self.hidden_size = hidden_size
|
||||
self.num_hidden_layers = num_hidden_layers
|
||||
self.num_attention_heads = num_attention_heads
|
||||
self.hidden_act = hidden_act
|
||||
self.intermediate_size = intermediate_size
|
||||
self.hidden_dropout_prob = hidden_dropout_prob
|
||||
self.attention_probs_dropout_prob = attention_probs_dropout_prob
|
||||
self.max_position_embeddings = max_position_embeddings
|
||||
self.initializer_range = initializer_range
|
||||
self.layer_norm_eps = layer_norm_eps
|
||||
self.position_embedding_type = position_embedding_type
|
||||
self.use_cache = use_cache
|
||||
self.classifier_dropout = classifier_dropout
|
||||
self.emb_layer_norm_before = emb_layer_norm_before
|
||||
self.token_dropout = token_dropout
|
||||
self.mask_token_id = mask_token_id
|
||||
self.pad_token_id = pad_token_id
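A small sketch of exercising this configuration class end to end; the hyperparameter values below are illustrative and do not correspond to any released checkpoint:

```python
from transformers import EsmConfig

# Illustrative hyperparameters for a small rotary-embedding, ESM-2-style model.
config = EsmConfig(
    vocab_size=33,
    mask_token_id=32,
    pad_token_id=1,
    hidden_size=320,
    num_hidden_layers=6,
    num_attention_heads=20,
    intermediate_size=1280,
    position_embedding_type="rotary",
    token_dropout=True,
)

# Round-trip through config.json to check serialization.
config.save_pretrained("esm_config_demo")
reloaded = EsmConfig.from_pretrained("esm_config_demo")
assert reloaded.position_embedding_type == "rotary"
```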
|
src/transformers/models/esm/convert_esm.py (new file, 264 lines)
@ -0,0 +1,264 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2022 The HuggingFace Inc. team.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
"""Convert ESM checkpoint."""
|
||||
|
||||
|
||||
import argparse
|
||||
import pathlib
|
||||
from pathlib import Path
|
||||
from tempfile import TemporaryDirectory
|
||||
|
||||
import torch
|
||||
|
||||
import esm as esm_module
|
||||
from transformers.models.esm.configuration_esm import EsmConfig
|
||||
from transformers.models.esm.modeling_esm import (
|
||||
EsmForMaskedLM,
|
||||
EsmForSequenceClassification,
|
||||
EsmIntermediate,
|
||||
EsmLayer,
|
||||
EsmOutput,
|
||||
EsmSelfAttention,
|
||||
EsmSelfOutput,
|
||||
)
|
||||
from transformers.models.esm.tokenization_esm import EsmTokenizer
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
logging.set_verbosity_info()
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
SAMPLE_DATA = [
|
||||
("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
|
||||
("protein2", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA"),
|
||||
("protein3", "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLAGG"),
|
||||
("protein4", "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLA"),
|
||||
]
|
||||
|
||||
MODEL_MAPPING = {
|
||||
"esm1b_t33_650M_UR50S": esm_module.pretrained.esm1b_t33_650M_UR50S,
|
||||
"esm1v_t33_650M_UR90S_1": esm_module.pretrained.esm1v_t33_650M_UR90S_1,
|
||||
"esm1v_t33_650M_UR90S_2": esm_module.pretrained.esm1v_t33_650M_UR90S_2,
|
||||
"esm1v_t33_650M_UR90S_3": esm_module.pretrained.esm1v_t33_650M_UR90S_3,
|
||||
"esm1v_t33_650M_UR90S_4": esm_module.pretrained.esm1v_t33_650M_UR90S_4,
|
||||
"esm1v_t33_650M_UR90S_5": esm_module.pretrained.esm1v_t33_650M_UR90S_5,
|
||||
"esm2_t48_15B_UR50D": esm_module.pretrained.esm2_t48_15B_UR50D,
|
||||
"esm2_t36_3B_UR50D": esm_module.pretrained.esm2_t36_3B_UR50D,
|
||||
"esm2_t33_650M_UR50D": esm_module.pretrained.esm2_t33_650M_UR50D,
|
||||
"esm2_t30_150M_UR50D": esm_module.pretrained.esm2_t30_150M_UR50D,
|
||||
"esm2_t12_35M_UR50D": esm_module.pretrained.esm2_t12_35M_UR50D,
|
||||
"esm2_t6_8M_UR50D": esm_module.pretrained.esm2_t6_8M_UR50D,
|
||||
}
|
||||
|
||||
|
||||
def convert_esm_checkpoint_to_pytorch(
|
||||
model: str, pytorch_dump_folder_path: str, classification_head: bool, push_to_repo: str, auth_token: str
|
||||
):
|
||||
"""
|
||||
Copy/paste/tweak esm's weights to our BERT structure.
|
||||
"""
|
||||
esm, alphabet = MODEL_MAPPING[model]()
|
||||
esm.eval() # disable dropout
|
||||
esm_sent_encoder = esm
|
||||
if hasattr(esm, "args"):
|
||||
# Indicates an ESM-1b or ESM-1v model
|
||||
embed_dim = esm.args.embed_dim
|
||||
num_layers = esm.args.layers
|
||||
num_attention_heads = esm.args.attention_heads
|
||||
intermediate_size = esm.args.ffn_embed_dim
|
||||
token_dropout = esm.args.token_dropout
|
||||
emb_layer_norm_before = True if esm.emb_layer_norm_before else False
|
||||
position_embedding_type = "absolute"
|
||||
else:
|
||||
# Indicates an ESM-2 model
|
||||
embed_dim = esm.embed_dim
|
||||
num_layers = esm.num_layers
|
||||
num_attention_heads = esm.attention_heads
|
||||
intermediate_size = 4 * embed_dim # This is hardcoded in ESM-2
|
||||
token_dropout = esm.token_dropout
|
||||
emb_layer_norm_before = False # This code path does not exist in ESM-2
|
||||
position_embedding_type = "rotary"
|
||||
|
||||
config = EsmConfig(
|
||||
vocab_size=esm_sent_encoder.embed_tokens.num_embeddings,
|
||||
mask_token_id=alphabet.mask_idx,
|
||||
hidden_size=embed_dim,
|
||||
num_hidden_layers=num_layers,
|
||||
num_attention_heads=num_attention_heads,
|
||||
intermediate_size=intermediate_size,
|
||||
max_position_embeddings=1026,
|
||||
layer_norm_eps=1e-5, # PyTorch default used in fairseq
|
||||
attention_probs_dropout_prob=0.0,
|
||||
hidden_dropout_prob=0.0,
|
||||
pad_token_id=esm.padding_idx,
|
||||
emb_layer_norm_before=emb_layer_norm_before,
|
||||
token_dropout=token_dropout,
|
||||
position_embedding_type=position_embedding_type,
|
||||
)
|
||||
if classification_head:
|
||||
config.num_labels = esm.classification_heads["mnli"].out_proj.weight.shape[0]
|
||||
print("Our BERT config:", config)
|
||||
|
||||
model = EsmForSequenceClassification(config) if classification_head else EsmForMaskedLM(config)
|
||||
model.eval()
|
||||
|
||||
# Now let's copy all the weights.
|
||||
# Embeddings
|
||||
model.esm.embeddings.word_embeddings.weight = esm_sent_encoder.embed_tokens.weight
|
||||
if position_embedding_type == "absolute":
|
||||
model.esm.embeddings.position_embeddings.weight = esm_sent_encoder.embed_positions.weight
|
||||
|
||||
if config.emb_layer_norm_before:
|
||||
model.esm.embeddings.layer_norm.weight = esm_sent_encoder.emb_layer_norm_before.weight
|
||||
model.esm.embeddings.layer_norm.bias = esm_sent_encoder.emb_layer_norm_before.bias
|
||||
|
||||
model.esm.encoder.emb_layer_norm_after.weight = esm_sent_encoder.emb_layer_norm_after.weight
|
||||
model.esm.encoder.emb_layer_norm_after.bias = esm_sent_encoder.emb_layer_norm_after.bias
|
||||
|
||||
for i in range(config.num_hidden_layers):
|
||||
# Encoder: start of layer
|
||||
layer: EsmLayer = model.esm.encoder.layer[i]
|
||||
# esm_layer: TransformerSentenceEncoderLayer = esm_sent_encoder.layers[i]
|
||||
esm_layer = esm_sent_encoder.layers[i]
|
||||
|
||||
# self attention
|
||||
self_attn: EsmSelfAttention = layer.attention.self
|
||||
assert (
|
||||
esm_layer.self_attn.k_proj.weight.data.shape
|
||||
== esm_layer.self_attn.q_proj.weight.data.shape
|
||||
== esm_layer.self_attn.v_proj.weight.data.shape
|
||||
== torch.Size((config.hidden_size, config.hidden_size))
|
||||
)
|
||||
|
||||
self_attn.query.weight.data = esm_layer.self_attn.q_proj.weight
|
||||
self_attn.query.bias.data = esm_layer.self_attn.q_proj.bias
|
||||
self_attn.key.weight.data = esm_layer.self_attn.k_proj.weight
|
||||
self_attn.key.bias.data = esm_layer.self_attn.k_proj.bias
|
||||
self_attn.value.weight.data = esm_layer.self_attn.v_proj.weight
|
||||
self_attn.value.bias.data = esm_layer.self_attn.v_proj.bias
|
||||
|
||||
if hasattr(esm_layer.self_attn, "rot_emb"):
|
||||
# Matt: Although inv_freq is not a trainable weight, it is computed at model init and cached.
|
||||
# During the training of ESM-2 the model was converted to float16 precision, which also converts
|
||||
# the inv_freq tensor, and the loss of precision remains even if the model is loaded later as float32.
|
||||
# If we recompute inv_freq without this loss of precision then we will get subtly different rotary
|
||||
# embeddings, which are enough to cause significant discrepancies in model outputs. To avoid this,
|
||||
# we make sure the new model copies the data from the old inv_freq.
|
||||
self_attn.rotary_embeddings.inv_freq.data = esm_layer.self_attn.rot_emb.inv_freq
|
||||
|
||||
# LayerNorm changes for pre-activation
|
||||
layer.attention.LayerNorm.weight = esm_layer.self_attn_layer_norm.weight
|
||||
layer.attention.LayerNorm.bias = esm_layer.self_attn_layer_norm.bias
|
||||
layer.LayerNorm.weight = esm_layer.final_layer_norm.weight
|
||||
layer.LayerNorm.bias = esm_layer.final_layer_norm.bias
|
||||
|
||||
# self-attention output
|
||||
self_output: EsmSelfOutput = layer.attention.output
|
||||
assert self_output.dense.weight.shape == esm_layer.self_attn.out_proj.weight.shape
|
||||
self_output.dense.weight = esm_layer.self_attn.out_proj.weight
|
||||
self_output.dense.bias = esm_layer.self_attn.out_proj.bias
|
||||
|
||||
# intermediate
|
||||
intermediate: EsmIntermediate = layer.intermediate
|
||||
assert intermediate.dense.weight.shape == esm_layer.fc1.weight.shape
|
||||
intermediate.dense.weight = esm_layer.fc1.weight
|
||||
intermediate.dense.bias = esm_layer.fc1.bias
|
||||
|
||||
# output
|
||||
bert_output: EsmOutput = layer.output
|
||||
assert bert_output.dense.weight.shape == esm_layer.fc2.weight.shape
|
||||
bert_output.dense.weight = esm_layer.fc2.weight
|
||||
bert_output.dense.bias = esm_layer.fc2.bias
|
||||
# end of layer
|
||||
|
||||
if classification_head:
|
||||
model.classifier.dense.weight = esm.classification_heads["mnli"].dense.weight
|
||||
model.classifier.dense.bias = esm.classification_heads["mnli"].dense.bias
|
||||
model.classifier.out_proj.weight = esm.classification_heads["mnli"].out_proj.weight
|
||||
model.classifier.out_proj.bias = esm.classification_heads["mnli"].out_proj.bias
|
||||
else:
|
||||
# LM Head
|
||||
model.lm_head.dense.weight = esm.lm_head.dense.weight
|
||||
model.lm_head.dense.bias = esm.lm_head.dense.bias
|
||||
model.lm_head.layer_norm.weight = esm.lm_head.layer_norm.weight
|
||||
model.lm_head.layer_norm.bias = esm.lm_head.layer_norm.bias
|
||||
model.lm_head.decoder.weight = esm.lm_head.weight
|
||||
model.lm_head.decoder.bias = esm.lm_head.bias
|
||||
|
||||
# Let's check that we get the same results.
|
||||
batch_converter = alphabet.get_batch_converter()
|
||||
|
||||
# Prepare sample data (four synthetic protein sequences, two of them containing <mask> tokens)
|
||||
|
||||
batch_labels, batch_strs, batch_tokens = batch_converter(SAMPLE_DATA)
|
||||
|
||||
# Prepare tokenizer and make sure it matches
|
||||
with TemporaryDirectory() as tempdir:
|
||||
vocab = "\n".join(alphabet.all_toks)
|
||||
vocab_file = Path(tempdir) / "vocab.txt"
|
||||
vocab_file.write_text(vocab)
|
||||
hf_tokenizer = EsmTokenizer(vocab_file=str(vocab_file))
|
||||
|
||||
hf_tokens = hf_tokenizer([row[1] for row in SAMPLE_DATA], return_tensors="pt", padding=True)
|
||||
success = torch.all(hf_tokens["input_ids"] == batch_tokens)
|
||||
print("Do both models tokenizers output the same tokens?", "🔥" if success else "💩")
|
||||
if not success:
|
||||
raise Exception("Tokenization does not match!")
|
||||
|
||||
with torch.no_grad():
|
||||
our_output = model(**hf_tokens, output_hidden_states=True)
|
||||
our_output = our_output["logits"]
|
||||
if classification_head:
|
||||
their_output = esm.model.classification_heads["mnli"](esm.extract_features(batch_tokens))
|
||||
else:
|
||||
their_output = esm(batch_tokens, repr_layers=list(range(999)))
|
||||
their_output = their_output["logits"]
|
||||
print(our_output.shape, their_output.shape)
|
||||
max_absolute_diff = torch.max(torch.abs(our_output - their_output)).item()
|
||||
print(f"max_absolute_diff = {max_absolute_diff}") # ~ 1e-5
|
||||
success = torch.allclose(our_output, their_output, atol=3e-4)
|
||||
print("Do both models output the same tensors?", "🔥" if success else "💩")
|
||||
|
||||
if not success:
|
||||
raise Exception("Something went wRoNg")
|
||||
|
||||
pathlib.Path(pytorch_dump_folder_path).mkdir(parents=True, exist_ok=True)
|
||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving tokenizer to {pytorch_dump_folder_path}")
|
||||
hf_tokenizer.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_repo:
|
||||
model.push_to_hub(repo_id=push_to_repo, use_auth_token=auth_token)
|
||||
hf_tokenizer.push_to_hub(repo_id=push_to_repo, use_auth_token=auth_token)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
# Required parameters
|
||||
parser.add_argument(
|
||||
"--pytorch_dump_folder_path", type=str, required=True, help="Path to the output PyTorch model."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--classification_head", action="store_true", help="Whether to convert a final classification head."
|
||||
)
|
||||
parser.add_argument("--model", default=None, type=str, required=True, help="Name of model to convert.")
|
||||
parser.add_argument("--push_to_repo", type=str, help="Repo to upload to (including username!).")
|
||||
parser.add_argument("--auth_token", type=str, help="HuggingFace auth token.")
|
||||
args = parser.parse_args()
|
||||
convert_esm_checkpoint_to_pytorch(
|
||||
args.model, args.pytorch_dump_folder_path, args.classification_head, args.push_to_repo, args.auth_token
|
||||
)
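A sketch of driving the converter programmatically instead of through the CLI above; it assumes the original `esm` package is installed, the model key must be one of the `MODEL_MAPPING` entries, and the output folder is a placeholder:

```python
from transformers.models.esm.convert_esm import convert_esm_checkpoint_to_pytorch

convert_esm_checkpoint_to_pytorch(
    model="esm2_t6_8M_UR50D",                       # any key of MODEL_MAPPING
    pytorch_dump_folder_path="./esm2_t6_8M_UR50D",  # placeholder output directory
    classification_head=False,
    push_to_repo=None,                              # e.g. "your-username/esm2_t6_8M_UR50D" to upload after conversion
    auth_token=None,
)
```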
|
src/transformers/models/esm/modeling_esm.py (new executable file, 1241 lines; diff suppressed because it is too large)
src/transformers/models/esm/tokenization_esm.py (new file, 106 lines)
@ -0,0 +1,106 @@
|
||||
# coding=utf-8
|
||||
# Copyright Facebook and The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes for ESM."""
import os
from typing import List, Optional, Union

from ...tokenization_utils import PreTrainedTokenizer
from ...tokenization_utils_base import AddedToken
from ...utils import logging


logger = logging.get_logger(__name__)

VOCAB_FILES_NAMES = {"vocab_file": "vocab.txt"}

PRETRAINED_VOCAB_FILES_MAP = {
    "vocab_file": {
        "facebook/esm1b": "https://huggingface.co/facebook/esm1b/resolve/main/vocab.txt",
    },
}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "facebook/esm1b": 1024,
}


def load_vocab_file(vocab_file):
    with open(vocab_file, "r") as f:
        lines = f.read().splitlines()
        return [l.strip() for l in lines]


class EsmTokenizer(PreTrainedTokenizer):
    """
    Constructs an ESM tokenizer.
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self, vocab_file, **kwargs):
        super().__init__(**kwargs)
        self.all_tokens = load_vocab_file(vocab_file)
        self._id_to_token = {ind: tok for ind, tok in enumerate(self.all_tokens)}
        self._token_to_id = {tok: ind for ind, tok in enumerate(self.all_tokens)}
        self.unk_token = "<unk>"
        self.cls_token = "<cls>"
        self.pad_token = "<pad>"
        self.mask_token = "<mask>"
        self.eos_token = "<eos>"
        self.unique_no_split_tokens = self.all_tokens
        self._create_trie(self.unique_no_split_tokens)

    def _convert_id_to_token(self, index: int) -> str:
        return self._id_to_token.get(index, self.unk_token)

    def _convert_token_to_id(self, token: str) -> int:
        return self._token_to_id.get(token, self._token_to_id.get(self.unk_token))

    def _tokenize(self, text, **kwargs):
        return text.split()

    def get_vocab_size(self, with_added_tokens=False):
        return len(self._id_to_token)

    def token_to_id(self, token: str) -> int:
        return self._token_to_id.get(token, self._token_to_id.get(self.unk_token))

    def id_to_token(self, index: int) -> str:
        return self._id_to_token.get(index, self.unk_token)

    def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        if token_ids_1 is not None:
            raise ValueError("Multiple input sentences are not supported!")
        cls_: List[int] = [self.cls_token_id]
        eos_: List[int] = [self.eos_token_id]
        return cls_ + token_ids_0 + eos_

    def save_vocabulary(self, save_directory, filename_prefix):
        vocab_file = os.path.join(save_directory, (filename_prefix + "-" if filename_prefix else "") + "vocab.txt")
        with open(vocab_file, "w") as f:
            f.write("\n".join(self.all_tokens))
        return (vocab_file,)

    @property
    def vocab_size(self) -> int:
        return self.get_vocab_size(with_added_tokens=False)

    def _add_tokens(self, new_tokens: Union[List[str], List[AddedToken]], special_tokens: bool = False) -> int:
        return super()._add_tokens(new_tokens, special_tokens=True)
@@ -1948,6 +1948,44 @@ class ErniePreTrainedModel(metaclass=DummyObject):
        requires_backends(self, ["torch"])


ESM_PRETRAINED_MODEL_ARCHIVE_LIST = None


class EsmForMaskedLM(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])


class EsmForSequenceClassification(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])


class EsmForTokenClassification(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])


class EsmModel(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])


class EsmPreTrainedModel(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])


FLAUBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None
0 tests/models/esm/__init__.py Normal file
293 tests/models/esm/test_modeling_esm.py Normal file
@@ -0,0 +1,293 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Testing suite for the PyTorch ESM model. """


import unittest

from transformers import EsmConfig, is_torch_available
from transformers.testing_utils import TestCasePlus, require_torch, slow, torch_device

from ...generation.test_generation_utils import GenerationTesterMixin
from ...test_configuration_common import ConfigTester
from ...test_modeling_common import ModelTesterMixin, ids_tensor, random_attention_mask


if is_torch_available():
    import torch

    from transformers import EsmForMaskedLM, EsmForSequenceClassification, EsmForTokenClassification, EsmModel
    from transformers.models.esm.modeling_esm import (
        ESM_PRETRAINED_MODEL_ARCHIVE_LIST,
        EsmEmbeddings,
        create_position_ids_from_input_ids,
    )


# copied from tests.test_modeling_roberta
class EsmModelTester:
    def __init__(
        self,
        parent,
    ):
        self.parent = parent
        self.batch_size = 13
        self.seq_length = 7
        self.is_training = False
        self.use_input_mask = True
        self.use_token_type_ids = False
        self.use_labels = True
        self.vocab_size = 99
        self.hidden_size = 32
        self.num_hidden_layers = 5
        self.num_attention_heads = 4
        self.intermediate_size = 37
        self.hidden_act = "gelu"
        self.hidden_dropout_prob = 0.1
        self.attention_probs_dropout_prob = 0.1
        self.max_position_embeddings = 512
        self.type_vocab_size = 16
        self.type_sequence_label_size = 2
        self.initializer_range = 0.02
        self.num_labels = 3
        self.num_choices = 4
        self.scope = None

    def prepare_config_and_inputs(self):
        input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)

        input_mask = None
        if self.use_input_mask:
            input_mask = random_attention_mask([self.batch_size, self.seq_length])

        sequence_labels = None
        token_labels = None
        choice_labels = None
        if self.use_labels:
            sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
            token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels)
            choice_labels = ids_tensor([self.batch_size], self.num_choices)

        config = self.get_config()

        return config, input_ids, input_mask, sequence_labels, token_labels, choice_labels

    def get_config(self):
        return EsmConfig(
            vocab_size=self.vocab_size,
            hidden_size=self.hidden_size,
            pad_token_id=1,
            num_hidden_layers=self.num_hidden_layers,
            num_attention_heads=self.num_attention_heads,
            intermediate_size=self.intermediate_size,
            hidden_act=self.hidden_act,
            hidden_dropout_prob=self.hidden_dropout_prob,
            attention_probs_dropout_prob=self.attention_probs_dropout_prob,
            max_position_embeddings=self.max_position_embeddings,
            type_vocab_size=self.type_vocab_size,
            initializer_range=self.initializer_range,
        )

    def create_and_check_model(self, config, input_ids, input_mask, sequence_labels, token_labels, choice_labels):
        model = EsmModel(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask)
        result = model(input_ids)
        result = model(input_ids)

        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
        self.parent.assertEqual(result.pooler_output.shape, (self.batch_size, self.hidden_size))

    def create_and_check_for_masked_lm(
        self, config, input_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        model = EsmForMaskedLM(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, labels=token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))

    def create_and_check_for_token_classification(
        self, config, input_ids, input_mask, sequence_labels, token_labels, choice_labels
    ):
        config.num_labels = self.num_labels
        model = EsmForTokenClassification(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, labels=token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.num_labels))

    def prepare_config_and_inputs_for_common(self):
        config_and_inputs = self.prepare_config_and_inputs()
        (
            config,
            input_ids,
            input_mask,
            sequence_labels,
            token_labels,
            choice_labels,
        ) = config_and_inputs
        inputs_dict = {"input_ids": input_ids, "attention_mask": input_mask}
        return config, inputs_dict


@require_torch
class EsmModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):

    test_mismatched_shapes = False

    all_model_classes = (
        (
            EsmForMaskedLM,
            EsmModel,
            EsmForSequenceClassification,
            EsmForTokenClassification,
        )
        if is_torch_available()
        else ()
    )
    all_generative_model_classes = ()
    test_sequence_classification_problem_types = True

    def setUp(self):
        self.model_tester = EsmModelTester(self)
        self.config_tester = ConfigTester(self, config_class=EsmConfig, hidden_size=37)

    def test_config(self):
        self.config_tester.run_common_tests()

    def test_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)

    def test_model_various_embeddings(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        for type in ["absolute", "relative_key", "relative_key_query"]:
            config_and_inputs[0].position_embedding_type = type
            self.model_tester.create_and_check_model(*config_and_inputs)

    def test_for_masked_lm(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_masked_lm(*config_and_inputs)

    def test_for_token_classification(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_token_classification(*config_and_inputs)

    @slow
    def test_model_from_pretrained(self):
        for model_name in ESM_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
            model = EsmModel.from_pretrained(model_name)
            self.assertIsNotNone(model)

    def test_create_position_ids_respects_padding_index(self):
        """Ensure that position ids are only assigned sequentially to non-padding tokens. This is a
        regression test for https://github.com/huggingface/transformers/issues/1761

        The position ids should be masked with the embedding object's padding index. Therefore, the
        first available non-padding position index is EsmEmbeddings.padding_idx + 1
        """
        config = self.model_tester.prepare_config_and_inputs()[0]
        model = EsmEmbeddings(config=config)

        input_ids = torch.as_tensor([[12, 31, 13, model.padding_idx]])
        expected_positions = torch.as_tensor(
            [
                [
                    0 + model.padding_idx + 1,
                    1 + model.padding_idx + 1,
                    2 + model.padding_idx + 1,
                    model.padding_idx,
                ]
            ]
        )
        position_ids = create_position_ids_from_input_ids(input_ids, model.padding_idx)
        self.assertEqual(position_ids.shape, expected_positions.shape)
        self.assertTrue(torch.all(torch.eq(position_ids, expected_positions)))

    def test_create_position_ids_from_inputs_embeds(self):
        """Ensure that position ids are only assigned sequentially to non-padding tokens. This is a
        regression test for https://github.com/huggingface/transformers/issues/1761

        The position ids should be masked with the embedding object's padding index. Therefore, the
        first available non-padding position index is EsmEmbeddings.padding_idx + 1
        """
        config = self.model_tester.prepare_config_and_inputs()[0]
        embeddings = EsmEmbeddings(config=config)

        inputs_embeds = torch.empty(2, 4, 30)
        expected_single_positions = [
            0 + embeddings.padding_idx + 1,
            1 + embeddings.padding_idx + 1,
            2 + embeddings.padding_idx + 1,
            3 + embeddings.padding_idx + 1,
        ]
        expected_positions = torch.as_tensor([expected_single_positions, expected_single_positions])
        position_ids = embeddings.create_position_ids_from_inputs_embeds(inputs_embeds)
        self.assertEqual(position_ids.shape, expected_positions.shape)
        self.assertTrue(torch.all(torch.eq(position_ids, expected_positions)))


@require_torch
class EsmModelIntegrationTest(TestCasePlus):
    @slow
    def test_inference_masked_lm(self):
        model = EsmForMaskedLM.from_pretrained("Rocketknight1/esm-2-8m")
        input_ids = torch.tensor([[0, 1, 2, 3, 4, 5]])
        output = model(input_ids)[0]

        vocab_size = 33

        expected_shape = torch.Size((1, 6, vocab_size))
        self.assertEqual(output.shape, expected_shape)

        expected_slice = torch.tensor(
            [[[15.0973, -6.6406, -1.1351], [-0.2209, -9.9622, 4.2109], [-1.6055, -10.0023, 1.5914]]]
        )
        self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))

    @slow
    def test_inference_no_head(self):
        model = EsmModel.from_pretrained("Rocketknight1/esm-2-8m")

        input_ids = torch.tensor([[0, 6, 4, 13, 5, 4, 16, 12, 11, 7, 2]])
        output = model(input_ids)[0]
        # compare the actual values for a slice.
        expected_slice = torch.tensor(
            [[[0.1444, 0.5413, 0.3248], [0.3034, 0.0053, 0.3108], [0.3228, -0.2499, 0.3415]]]
        )
        self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))

    def test_lm_head_ignore_keys(self):
        from copy import deepcopy

        keys_to_ignore_on_save_tied = [r"lm_head.decoder.weight", r"lm_head.decoder.bias"]
        keys_to_ignore_on_save_untied = [r"lm_head.decoder.bias"]
        config = EsmConfig.from_pretrained("Rocketknight1/esm-2-8m")
        config_tied = deepcopy(config)
        config_tied.tie_word_embeddings = True
        config_untied = deepcopy(config)
        config_untied.tie_word_embeddings = False
        for cls in [EsmForMaskedLM]:
            model = cls(config_tied)
            self.assertEqual(model._keys_to_ignore_on_save, keys_to_ignore_on_save_tied, cls)

            # the keys should be different when embeddings aren't tied
            model = cls(config_untied)
            self.assertEqual(model._keys_to_ignore_on_save, keys_to_ignore_on_save_untied, cls)

            # test that saving works with updated ignore keys - just testing that it doesn't fail
            model.save_pretrained(self.get_auto_remove_tmp_dir())
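The behaviour checked by the two position-id regression tests above can be reproduced with a small standalone sketch. The helper below is an illustrative re-implementation, not the library's create_position_ids_from_input_ids; padding_idx=1 matches the pad_token_id the test config sets.

import torch


def sequential_position_ids(input_ids: torch.Tensor, padding_idx: int) -> torch.Tensor:
    # Non-padding tokens get sequential positions starting at padding_idx + 1;
    # padding tokens keep padding_idx itself.
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = torch.cumsum(mask, dim=1) * mask
    return incremental_indices.long() + padding_idx


ids = torch.tensor([[12, 31, 13, 1]])               # last token is padding (padding_idx == 1)
print(sequential_position_ids(ids, padding_idx=1))  # tensor([[2, 3, 4, 1]])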
91 tests/models/esm/test_tokenization_esm.py Normal file
@@ -0,0 +1,91 @@
# coding=utf-8
# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import os
import tempfile
import unittest
from typing import List

from transformers.models.esm.tokenization_esm import VOCAB_FILES_NAMES, EsmTokenizer
from transformers.testing_utils import require_tokenizers
from transformers.tokenization_utils import PreTrainedTokenizer
from transformers.tokenization_utils_base import PreTrainedTokenizerBase


@require_tokenizers
class ESMTokenizationTest(unittest.TestCase):
    tokenizer_class = EsmTokenizer

    def setUp(self):
        super().setUp()
        self.tmpdirname = tempfile.mkdtemp()
        # fmt: off
        vocab_tokens: List[str] = ["<cls>", "<pad>", "<eos>", "<unk>", "L", "A", "G", "V", "S", "E", "R", "T", "I", "D", "P", "K", "Q", "N", "F", "Y", "M", "H", "W", "C", "X", "B", "U", "Z", "O", ".", "-", "<null_1>", "<mask>"]  # noqa: E501
        # fmt: on
        self.vocab_file = os.path.join(self.tmpdirname, VOCAB_FILES_NAMES["vocab_file"])
        with open(self.vocab_file, "w", encoding="utf-8") as vocab_writer:
            vocab_writer.write("".join([x + "\n" for x in vocab_tokens]))

    def get_tokenizers(self, **kwargs) -> List[PreTrainedTokenizerBase]:
        return [self.get_tokenizer(**kwargs)]

    def get_tokenizer(self, **kwargs) -> PreTrainedTokenizer:
        return self.tokenizer_class.from_pretrained(self.tmpdirname, **kwargs)

    def test_tokenizer_single_example(self):
        tokenizer = self.tokenizer_class(self.vocab_file)

        tokens = tokenizer.tokenize("LAGVS")
        self.assertListEqual(tokens, ["L", "A", "G", "V", "S"])
        self.assertListEqual(tokenizer.convert_tokens_to_ids(tokens), [4, 5, 6, 7, 8])

    def test_tokenizer_encode_single(self):
        tokenizer = self.tokenizer_class(self.vocab_file)

        seq = "LAGVS"
        self.assertListEqual(tokenizer.encode(seq), [0, 4, 5, 6, 7, 8, 2])

    def test_tokenizer_call_no_pad(self):
        tokenizer = self.tokenizer_class(self.vocab_file)

        seq_batch = ["LAGVS", "WCB"]
        tokens_batch = tokenizer(seq_batch, padding=False)["input_ids"]

        self.assertListEqual(tokens_batch, [[0, 4, 5, 6, 7, 8, 2], [0, 22, 23, 25, 2]])

    def test_tokenizer_call_pad(self):
        tokenizer = self.tokenizer_class(self.vocab_file)

        seq_batch = ["LAGVS", "WCB"]
        tokens_batch = tokenizer(seq_batch, padding=True)["input_ids"]

        self.assertListEqual(tokens_batch, [[0, 4, 5, 6, 7, 8, 2], [0, 22, 23, 25, 2, 1, 1]])

    def test_tokenize_special_tokens(self):
        """Test `tokenize` with special tokens."""
        tokenizers = self.get_tokenizers(fast=True)
        for tokenizer in tokenizers:
            with self.subTest(f"{tokenizer.__class__.__name__}"):
                SPECIAL_TOKEN_1 = "<unk>"
                SPECIAL_TOKEN_2 = "<mask>"

                token_1 = tokenizer.tokenize(SPECIAL_TOKEN_1)
                token_2 = tokenizer.tokenize(SPECIAL_TOKEN_2)

                self.assertEqual(len(token_1), 1)
                self.assertEqual(len(token_2), 1)
                self.assertEqual(token_1[0], SPECIAL_TOKEN_1)
                self.assertEqual(token_2[0], SPECIAL_TOKEN_2)
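The expected ids in these tests follow directly from each token's position in the vocab list written in setUp; a quick sanity check of the encoding used above (illustrative):

vocab = ["<cls>", "<pad>", "<eos>", "<unk>", "L", "A", "G", "V", "S", "E", "R", "T", "I",
         "D", "P", "K", "Q", "N", "F", "Y", "M", "H", "W", "C", "X", "B", "U", "Z", "O",
         ".", "-", "<null_1>", "<mask>"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

# "LAGVS" wrapped in <cls> ... <eos>, as EsmTokenizer.encode would produce it:
print([token_to_id["<cls>"]] + [token_to_id[t] for t in "LAGVS"] + [token_to_id["<eos>"]])
# [0, 4, 5, 6, 7, 8, 2] -- matches test_tokenizer_encode_single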