add model resources for CPMAnt (new) (#20906)

* resolve conflicts

* rebase and make style

* test

* test

* test

* rebase and make style

* rebase and make style

* tests

* tests

* rewrite some functions

* rebase and make style

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* add models and tests

* solve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* save resolution

* make style

* delete redefinition code

* reformat function

* reformat

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* make style

* fix bugs and refactor

* modify docstrings and make style

* unify import format in __init__.py

* fix import-altclp bug

* fix copies to update index.md

* fix unused config parameters

* fix unused config parameters

* fix unused config parameters

* update README_ja.md

* dummy commit for unit test

* fix attention mask

* add CPMAntTokenizer&-Fast to auto-mapping

* drop redundant changes in README_ko

* fix  defaults in docstring

* fix use_cache and some docstring

* add missing args in tokenizer

* modify tester inheritance

* add is_jieba_available

* fix some bugs

* make style and fix-copies

* add doctests

* skip integration tests

* add is_jieba_available

* fix bugs in common tests

* adjust docstrings and make style

* add argument docstring

* adjust code to some specifications

* make style and fix-copies

* add fast tokenization test

* dummy commit for unit test

* dummy commit for unit test

* dummy commit for unit test

* normalize some comments and names

* Bert->CPMAnt

* camel names and drop redundant codes

* make style and fix-coies

* add CpmTokenizerFast _import_structure

* drop cpmanttokenizerfast in model_doc

* fix some problems

* fix CPMAnt tokenization for common test

* make style and fixup

* fix copies and fixup

* fix bugs in tokenization test

* dummy commit for connection failure in unittest

* fix copies

* drop trailing comma

* fix decorator in tests

* dummy commit for connection failure in unittest

---------

Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>
This commit is contained in:
pioliverse 2023-04-12 19:33:20 +08:00 committed by GitHub
parent 17503b00ea
commit 523ca4e016
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
27 changed files with 1780 additions and 1 deletions

View File

@ -310,6 +310,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CPM-Ant](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.

View File

@ -298,6 +298,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CPM-Ant](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.

View File

@ -270,6 +270,7 @@ conda install -c huggingface transformers
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (Facebook AI से) साथ वाला पेपर [A ConvNet for the 2020s](https://arxiv.org/abs /2201.03545) ज़ुआंग लियू, हेंज़ी माओ, चाओ-युआन वू, क्रिस्टोफ़ फीचटेनहोफ़र, ट्रेवर डेरेल, सैनिंग ज़ी द्वारा।
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (सिंघुआ यूनिवर्सिटी से) साथ में पेपर [सीपीएम: ए लार्ज-स्केल जेनेरेटिव चाइनीज प्री-ट्रेंड लैंग्वेज मॉडल](https : //arxiv.org/abs/2012.00413) झेंग्यान झांग, जू हान, हाओ झोउ, पेई के, युक्सियन गु, डेमिंग ये, युजिया किन, युशेंग सु, हाओझे जी, जियान गुआन, फैंचाओ क्यूई, ज़ियाओझी वांग, यानान झेंग द्वारा , गुओयांग ज़ेंग, हुआनकी काओ, शेंगकी चेन, डाइक्सुआन ली, ज़ेनबो सन, ज़ियुआन लियू, मिनली हुआंग, वेंटाओ हान, जी तांग, जुआनज़ी ली, ज़ियाओयान झू, माओसोंग सन।
1. **[CPM-Ant](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (सेल्सफोर्स से) साथ में पेपर [CTRL: ए कंडिशनल ट्रांसफॉर्मर लैंग्वेज मॉडल फॉर कंट्रोलेबल जेनरेशन](https://arxiv.org/abs/1909.05858) नीतीश शिरीष केसकर*, ब्रायन मैककैन*, लव आर. वार्ष्णेय, कैमिंग जिओंग और रिचर्ड द्वारा सोचर द्वारा जारी किया गया।
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (Microsoft से) साथ में दिया गया पेपर [CvT: इंट्रोड्यूसिंग कनवॉल्यूशन टू विजन ट्रांसफॉर्मर्स](https://arxiv.org/ एब्स/2103.15808) हैपिंग वू, बिन जिओ, नोएल कोडेला, मेंगचेन लियू, जियांग दाई, लू युआन, लेई झांग द्वारा।
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (फेसबुक से) साथ में कागज [Data2Vec: भाषण, दृष्टि और भाषा में स्व-पर्यवेक्षित सीखने के लिए एक सामान्य ढांचा] (https://arxiv.org/abs/2202.03555) एलेक्सी बाएव्स्की, वेई-निंग सू, कियानटोंग जू, अरुण बाबू, जियाताओ गु, माइकल औली द्वारा पोस्ट किया गया।

View File

@ -332,6 +332,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (Facebook AI から) Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie から公開された研究論文: [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (Tsinghua University から) Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun から公開された研究論文: [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413)
1. **[CPM-Ant](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (OpenBMB から) [OpenBMB](https://www.openbmb.org/) から公開されました.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (Salesforce から) Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher から公開された研究論文: [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858)
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (Microsoft から) Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang から公開された研究論文: [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808)
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (Facebook から) Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli から公開された研究論文: [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555)

View File

@ -247,6 +247,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (Facebook AI 에서) Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie 의 [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) 논문과 함께 발표했습니다.
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (Tsinghua University 에서) Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun 의 [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) 논문과 함께 발표했습니다.
1. **[CPM-Ant](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (Salesforce 에서) Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher 의 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) 논문과 함께 발표했습니다.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (Microsoft 에서) Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang 의 [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) 논문과 함께 발표했습니다.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (Facebook 에서) Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli 의 [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) 논문과 함께 발표했습니다.

View File

@ -271,6 +271,7 @@ conda install -c huggingface transformers
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (来自 Facebook AI) 伴随论文 [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) 由 Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie 发布。
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (来自 Tsinghua University) 伴随论文 [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) 由 Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun 发布。
1. **[CPM-Ant](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (来自 Salesforce) 伴随论文 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) 由 Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher 发布。
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (来自 Microsoft) 伴随论文 [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) 由 Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang 发布。
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (来自 Facebook) 伴随论文 [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) 由 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli 发布。

View File

@ -283,6 +283,7 @@ conda install -c huggingface transformers
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CPM-Ant](https://huggingface.co/docs/transformers/main/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.

View File

@ -263,6 +263,8 @@
title: ConvBERT
- local: model_doc/cpm
title: CPM
- local: model_doc/cpmant
title: CPMANT
- local: model_doc/ctrl
title: CTRL
- local: model_doc/deberta

View File

@ -84,6 +84,7 @@ The documentation is organized into five sections:
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXTV2](model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CPM-Ant](model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
@ -287,6 +288,7 @@ Flax), PyTorch, and/or TensorFlow.
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ConvNeXT | ❌ | ❌ | ✅ | ✅ | ❌ |
| ConvNeXTV2 | ❌ | ❌ | ✅ | ❌ | ❌ |
| CPM-Ant | ✅ | ❌ | ✅ | ❌ | ❌ |
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
| CvT | ❌ | ❌ | ✅ | ✅ | ❌ |
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ |

View File

@ -0,0 +1,44 @@
<!--Copyright 2022 The HuggingFace Team and The OpenBMB Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# CPMAnt
## Overview
CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. It is also the first milestone of the live training process of CPM-Live. The training process is cost-effective and environment-friendly. CPM-Ant also achieves promising results with delta tuning on the CUGE benchmark. Besides the full model, we also provide various compressed versions to meet the requirements of different hardware configurations. [See more](https://github.com/OpenBMB/CPM-Live/tree/cpm-ant/cpm-live)
Tips:
This model was contributed by [OpenBMB](https://huggingface.co/openbmb). The original code can be found [here](https://github.com/OpenBMB/CPM-Live/tree/cpm-ant/cpm-live).
⚙️ Training & Inference
- A tutorial on [CPM-Live](https://github.com/OpenBMB/CPM-Live/tree/cpm-ant/cpm-live).
## CpmAntConfig
[[autodoc]] CpmAntConfig
- all
## CpmAntTokenizer
[[autodoc]] CpmAntTokenizer
- all
## CpmAntModel
[[autodoc]] CpmAntModel
- all
## CpmAntForCausalLM
[[autodoc]] CpmAntForCausalLM
- all

View File

@ -34,7 +34,7 @@ Choose one of the following architectures:
<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->
[BART](../model_doc/bart), [BERT](../model_doc/bert), [Bert Generation](../model_doc/bert-generation), [BigBird](../model_doc/big_bird), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [BioGpt](../model_doc/biogpt), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [BLOOM](../model_doc/bloom), [CamemBERT](../model_doc/camembert), [CodeGen](../model_doc/codegen), [CTRL](../model_doc/ctrl), [Data2VecText](../model_doc/data2vec-text), [ELECTRA](../model_doc/electra), [ERNIE](../model_doc/ernie), [GIT](../model_doc/git), [GPT-Sw3](../model_doc/gpt-sw3), [OpenAI GPT-2](../model_doc/gpt2), [GPTBigCode](../model_doc/gpt_bigcode), [GPT Neo](../model_doc/gpt_neo), [GPT NeoX](../model_doc/gpt_neox), [GPT NeoX Japanese](../model_doc/gpt_neox_japanese), [GPT-J](../model_doc/gptj), [LLaMA](../model_doc/llama), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MEGA](../model_doc/mega), [Megatron-BERT](../model_doc/megatron-bert), [MVP](../model_doc/mvp), [OpenAI GPT](../model_doc/openai-gpt), [OPT](../model_doc/opt), [Pegasus](../model_doc/pegasus), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [QDQBert](../model_doc/qdqbert), [Reformer](../model_doc/reformer), [RemBERT](../model_doc/rembert), [RoBERTa](../model_doc/roberta), [RoBERTa-PreLayerNorm](../model_doc/roberta-prelayernorm), [RoCBert](../model_doc/roc_bert), [RoFormer](../model_doc/roformer), [Speech2Text2](../model_doc/speech_to_text_2), [Transformer-XL](../model_doc/transfo-xl), [TrOCR](../model_doc/trocr), [XGLM](../model_doc/xglm), [XLM](../model_doc/xlm), [XLM-ProphetNet](../model_doc/xlm-prophetnet), [XLM-RoBERTa](../model_doc/xlm-roberta), [XLM-RoBERTa-XL](../model_doc/xlm-roberta-xl), [XLNet](../model_doc/xlnet), [X-MOD](../model_doc/xmod)
[BART](../model_doc/bart), [BERT](../model_doc/bert), [Bert Generation](../model_doc/bert-generation), [BigBird](../model_doc/big_bird), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [BioGpt](../model_doc/biogpt), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [BLOOM](../model_doc/bloom), [CamemBERT](../model_doc/camembert), [CodeGen](../model_doc/codegen), [CPM-Ant](../model_doc/cpmant), [CTRL](../model_doc/ctrl), [Data2VecText](../model_doc/data2vec-text), [ELECTRA](../model_doc/electra), [ERNIE](../model_doc/ernie), [GIT](../model_doc/git), [GPT-Sw3](../model_doc/gpt-sw3), [OpenAI GPT-2](../model_doc/gpt2), [GPTBigCode](../model_doc/gpt_bigcode), [GPT Neo](../model_doc/gpt_neo), [GPT NeoX](../model_doc/gpt_neox), [GPT NeoX Japanese](../model_doc/gpt_neox_japanese), [GPT-J](../model_doc/gptj), [LLaMA](../model_doc/llama), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MEGA](../model_doc/mega), [Megatron-BERT](../model_doc/megatron-bert), [MVP](../model_doc/mvp), [OpenAI GPT](../model_doc/openai-gpt), [OPT](../model_doc/opt), [Pegasus](../model_doc/pegasus), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [QDQBert](../model_doc/qdqbert), [Reformer](../model_doc/reformer), [RemBERT](../model_doc/rembert), [RoBERTa](../model_doc/roberta), [RoBERTa-PreLayerNorm](../model_doc/roberta-prelayernorm), [RoCBert](../model_doc/roc_bert), [RoFormer](../model_doc/roformer), [Speech2Text2](../model_doc/speech_to_text_2), [Transformer-XL](../model_doc/transfo-xl), [TrOCR](../model_doc/trocr), [XGLM](../model_doc/xglm), [XLM](../model_doc/xlm), [XLM-ProphetNet](../model_doc/xlm-prophetnet), [XLM-RoBERTa](../model_doc/xlm-roberta), [XLM-RoBERTa-XL](../model_doc/xlm-roberta-xl), [XLNet](../model_doc/xlnet), [X-MOD](../model_doc/xmod)
<!--End of the generated tip-->

View File

@ -243,6 +243,7 @@ _import_structure = {
"models.convnext": ["CONVNEXT_PRETRAINED_CONFIG_ARCHIVE_MAP", "ConvNextConfig"],
"models.convnextv2": ["CONVNEXTV2_PRETRAINED_CONFIG_ARCHIVE_MAP", "ConvNextV2Config"],
"models.cpm": [],
"models.cpmant": ["CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CpmAntConfig", "CpmAntTokenizer"],
"models.ctrl": ["CTRL_PRETRAINED_CONFIG_ARCHIVE_MAP", "CTRLConfig", "CTRLTokenizer"],
"models.cvt": ["CVT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CvtConfig"],
"models.data2vec": [
@ -1325,6 +1326,14 @@ else:
"ConvNextV2PreTrainedModel",
]
)
_import_structure["models.cpmant"].extend(
[
"CPMANT_PRETRAINED_MODEL_ARCHIVE_LIST",
"CpmAntForCausalLM",
"CpmAntModel",
"CpmAntPreTrainedModel",
]
)
_import_structure["models.ctrl"].extend(
[
"CTRL_PRETRAINED_MODEL_ARCHIVE_LIST",
@ -3941,6 +3950,7 @@ if TYPE_CHECKING:
from .models.convbert import CONVBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, ConvBertConfig, ConvBertTokenizer
from .models.convnext import CONVNEXT_PRETRAINED_CONFIG_ARCHIVE_MAP, ConvNextConfig
from .models.convnextv2 import CONVNEXTV2_PRETRAINED_CONFIG_ARCHIVE_MAP, ConvNextV2Config
from .models.cpmant import CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP, CpmAntConfig, CpmAntTokenizer
from .models.ctrl import CTRL_PRETRAINED_CONFIG_ARCHIVE_MAP, CTRLConfig, CTRLTokenizer
from .models.cvt import CVT_PRETRAINED_CONFIG_ARCHIVE_MAP, CvtConfig
from .models.data2vec import (
@ -4889,6 +4899,12 @@ if TYPE_CHECKING:
ConvNextV2Model,
ConvNextV2PreTrainedModel,
)
from .models.cpmant import (
CPMANT_PRETRAINED_MODEL_ARCHIVE_LIST,
CpmAntForCausalLM,
CpmAntModel,
CpmAntPreTrainedModel,
)
from .models.ctrl import (
CTRL_PRETRAINED_MODEL_ARCHIVE_LIST,
CTRLForSequenceClassification,

View File

@ -50,6 +50,7 @@ from . import (
convnext,
convnextv2,
cpm,
cpmant,
ctrl,
cvt,
data2vec,

View File

@ -58,6 +58,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
("convbert", "ConvBertConfig"),
("convnext", "ConvNextConfig"),
("convnextv2", "ConvNextV2Config"),
("cpmant", "CpmAntConfig"),
("ctrl", "CTRLConfig"),
("cvt", "CvtConfig"),
("data2vec-audio", "Data2VecAudioConfig"),
@ -243,6 +244,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
("convbert", "CONVBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("convnext", "CONVNEXT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("convnextv2", "CONVNEXTV2_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("cpmant", "CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("ctrl", "CTRL_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("cvt", "CVT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("data2vec-audio", "DATA2VEC_AUDIO_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@ -419,6 +421,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("convnext", "ConvNeXT"),
("convnextv2", "ConvNeXTV2"),
("cpm", "CPM"),
("cpmant", "CPM-Ant"),
("ctrl", "CTRL"),
("cvt", "CvT"),
("data2vec-audio", "Data2VecAudio"),

View File

@ -57,6 +57,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
("convbert", "ConvBertModel"),
("convnext", "ConvNextModel"),
("convnextv2", "ConvNextV2Model"),
("cpmant", "CpmAntModel"),
("ctrl", "CTRLModel"),
("cvt", "CvtModel"),
("data2vec-audio", "Data2VecAudioModel"),
@ -279,6 +280,7 @@ MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
("camembert", "CamembertForMaskedLM"),
("codegen", "CodeGenForCausalLM"),
("convbert", "ConvBertForMaskedLM"),
("cpmant", "CpmAntForCausalLM"),
("ctrl", "CTRLLMHeadModel"),
("data2vec-text", "Data2VecTextForMaskedLM"),
("deberta", "DebertaForMaskedLM"),
@ -358,6 +360,7 @@ MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict(
("bloom", "BloomForCausalLM"),
("camembert", "CamembertForCausalLM"),
("codegen", "CodeGenForCausalLM"),
("cpmant", "CpmAntForCausalLM"),
("ctrl", "CTRLLMHeadModel"),
("data2vec-text", "Data2VecTextForCausalLM"),
("electra", "ElectraForCausalLM"),

View File

@ -127,6 +127,7 @@ else:
"CpmTokenizerFast" if is_tokenizers_available() else None,
),
),
("cpmant", ("CpmAntTokenizer", None)),
("ctrl", ("CTRLTokenizer", None)),
("data2vec-text", ("RobertaTokenizer", "RobertaTokenizerFast" if is_tokenizers_available() else None)),
("deberta", ("DebertaTokenizer", "DebertaTokenizerFast" if is_tokenizers_available() else None)),

View File

@ -0,0 +1,64 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2022 The HuggingFace Team and The OpenBMB Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
# rely on isort to merge the imports
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
_import_structure = {
"configuration_cpmant": ["CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CpmAntConfig"],
"tokenization_cpmant": ["CpmAntTokenizer"],
}
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_cpmant"] = [
"CPMANT_PRETRAINED_MODEL_ARCHIVE_LIST",
"CpmAntForCausalLM",
"CpmAntModel",
"CpmAntPreTrainedModel",
]
if TYPE_CHECKING:
from .configuration_cpmant import CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP, CpmAntConfig
from .tokenization_cpmant import CpmAntTokenizer
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_cpmant import (
CPMANT_PRETRAINED_MODEL_ARCHIVE_LIST,
CpmAntForCausalLM,
CpmAntModel,
CpmAntPreTrainedModel,
)
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)

View File

@ -0,0 +1,127 @@
# coding=utf-8
# Copyright 2022 The OpenBMB Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" CPMAnt model configuration"""
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)
CPMANT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"openbmb/cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/config.json"
# See all CPMAnt models at https://huggingface.co/models?filter=cpmant
}
class CpmAntConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`CpmAntModel`]. It is used to instantiate an
CPMAnt model according to the specified arguments, defining the model architecture. Instantiating a configuration
with the defaults will yield a similar configuration to that of the CPMAnt
[openbmb/cpm-ant-10b](https://huggingface.co/openbmb/cpm-ant-10b) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 30720):
Vocabulary size of the CPMAnt model. Defines the number of different tokens that can be represented by the
`input` passed when calling [`CpmAntModel`].
hidden_size (`int`, *optional*, defaults to 4096):
Dimension of the encoder layers.
num_attention_heads (`int`, *optional*, defaults to 32):
Number of attention heads in the Transformer encoder.
dim_head (`int`, *optional*, defaults to 128):
Dimension of attention heads for each attention layer in the Transformer encoder.
dim_ff (`int`, *optional*, defaults to 10240):
Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
num_hidden_layers (`int`, *optional*, defaults to 48):
Number of layers of the Transformer encoder.
dropout_p (`float`, *optional*, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder.
position_bias_num_buckets (`int`, *optional*, defaults to 512):
The number of position_bias buckets.
position_bias_max_distance (`int`, *optional*, defaults to 2048):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
eps (`float`, *optional*, defaults to 1e-6):
The epsilon used by the layer normalization layers.
prompt_types (`int`, *optional*, defaults to 32):
The type of prompt.
prompt_length (`int`, *optional*, defaults to 32):
The length of prompt.
segment_types (`int`, *optional*, defaults to 32):
The type of segment.
use_cache (`bool`, *optional*, defaults to `True`):
Whether to use cache.
init_std (`float`, *optional*, defaults to 1.0):
Initialize parameters with std = init_std.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
Example:
```python
>>> from transformers import CpmAntModel, CpmAntConfig
>>> # Initializing a CPMAnt cpm-ant-10b style configuration
>>> configuration = CpmAntConfig()
>>> # Initializing a model from the cpm-ant-10b style configuration
>>> model = CpmAntModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "cpmant"
def __init__(
self,
vocab_size: int = 30720,
hidden_size: int = 4096,
num_attention_heads: int = 32,
dim_head: int = 128,
dim_ff: int = 10240,
num_hidden_layers: int = 48,
dropout_p: int = 0.0,
position_bias_num_buckets: int = 512,
position_bias_max_distance: int = 2048,
eps: int = 1e-6,
init_std: float = 1.0,
prompt_types: int = 32,
prompt_length: int = 32,
segment_types: int = 32,
use_cache: bool = True,
return_dict: bool = True,
**kwargs,
):
super().__init__(**kwargs)
self.prompt_types = prompt_types
self.prompt_length = prompt_length
self.segment_types = segment_types
self.hidden_size = hidden_size
self.num_attention_heads = num_attention_heads
self.dim_head = dim_head
self.dim_ff = dim_ff
self.num_hidden_layers = num_hidden_layers
self.position_bias_num_buckets = position_bias_num_buckets
self.position_bias_max_distance = position_bias_max_distance
self.dropout_p = dropout_p
self.eps = eps
self.use_cache = use_cache
self.vocab_size = vocab_size
self.return_dict = return_dict
self.init_std = init_std

View File

@ -0,0 +1,880 @@
# coding=utf-8
# Copyright 2022 The OpenBMB Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" PyTorch CPMAnt"""
import math
from typing import List, Optional, Tuple, Union
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
from torch import nn
from torch.nn import CrossEntropyLoss
from ...activations import ACT2FN
from ...modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
from ...modeling_utils import PreTrainedModel
from ...utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
from .configuration_cpmant import CpmAntConfig
logger = logging.get_logger(__name__)
_CHECKPOINT_FOR_DOC = "openbmb/cpm-ant-10b"
_CONFIG_FOR_DOC = "CpmAntConfig"
CPMANT_PRETRAINED_MODEL_ARCHIVE_LIST = [
"openbmb/cpm-ant-10b",
# See all CPMAnt models at https://huggingface.co/models?filter=cpmant
]
class CpmAntLayerNorm(nn.Module):
"""
We use Root Mean Square (RMS) Layer Normalization, please see https://arxiv.org/abs/1910.07467 for details."
"""
def __init__(self, config: CpmAntConfig):
super().__init__()
self.eps = config.eps
self.dim_norm = config.hidden_size
self.weight = nn.Parameter(torch.empty(config.hidden_size))
def forward(self, hidden_states: torch.Tensor):
"""
Args:
hidden_states (`torch.Tensor` of shape `(batch, seq_len, dim_in)`)
"""
if hidden_states.size(-1) != self.dim_norm:
raise AssertionError("hidden_states.size(-1) != self.dim_norm")
old_dtype = hidden_states.dtype
variance = hidden_states.to(torch.float32).pow(2).mean(dim=-1, keepdim=True)
hidden_states = (hidden_states * torch.rsqrt(variance + self.eps)).to(old_dtype) * self.weight
return hidden_states
class CpmAntAttention(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.dim_model = config.hidden_size
self.num_heads = config.num_attention_heads
self.dim_head = config.dim_head
self.project_q = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
self.project_k = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
self.project_v = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
self.attention_out = nn.Linear(self.num_heads * self.dim_head, self.dim_model, bias=False)
self.softmax = torch.nn.Softmax(dim=-1)
if config.dropout_p is not None:
self.dropout = torch.nn.Dropout(p=config.dropout_p)
else:
self.dropout = None
def forward(
self,
hidden_q: torch.Tensor,
hidden_kv: torch.Tensor,
attention_mask: torch.BoolTensor,
position_bias: torch.Tensor,
output_attentions: Optional[bool] = False,
past_key_values: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
use_cache: Optional[bool] = None,
):
"""
Args:
hidden_q (`torch.Tensor`):
Input of transformer block(self-attention block). It can be the raw embedding of a batch of sequences.
hidden_kv (`torch.Tensor` of shape `(batch, len_k, dim_model)`)):
Tensor *key_value* and *query* of shape `(batch, len_k, dim_model)`
attention_mask (`torch.Tensor` of shape `(batch, len_seq, len_seq)`):
Avoid invalid areas to participate in the calculation of self-attention.
position_bias (`torch.Tensor` of shape `(batch, len_seq, len_seq)`):
Provide positional information to self-attention block.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers.
past_key_values (`Tuple[torch.Tensor, torch.Tensor]`, *optional*):
Cached past key and value projection states.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
(see `past_key_values`).
"""
batch_size = hidden_q.size(0)
len_q = hidden_q.size(1)
len_k = hidden_kv.size(1)
query = self.project_q(hidden_q)
key = self.project_k(hidden_kv)
value = self.project_v(hidden_kv)
query = query.view(batch_size, len_q, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
key = key.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
value = value.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
if past_key_values is not None:
key = torch.cat([past_key_values[0], key], dim=-2)
value = torch.cat([past_key_values[1], value], dim=-2)
len_k = key.size(-2)
# (batch_size, num_heads, len_q, dim_head) @ (batch_size, num_heads, dim_head, len_k) -> (batch_size, num_heads, len_q, len_k)
score = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(self.dim_head)
score = score + position_bias
score = torch.masked_fill(
score,
attention_mask.view(batch_size, 1, len_q, len_k) == torch.tensor(False),
torch.scalar_tensor(float("-inf"), device=score.device, dtype=score.dtype),
)
score = self.softmax(score)
score = torch.masked_fill(
score,
attention_mask.view(batch_size, 1, len_q, len_k) == torch.tensor(False),
torch.scalar_tensor(0, device=score.device, dtype=score.dtype),
)
if output_attentions:
attn_weights = score
else:
attn_weights = None
if self.dropout is not None:
score = self.dropout(score)
# (batch_size, num_heads, len_q, len_k) @ (batch_size, num_heads, len_k, dim_head) -> (batch_size, num_heads, len_q, dim_head)
score = torch.matmul(score, value)
score = score.view(batch_size, self.num_heads, len_q, self.dim_head).permute(0, 2, 1, 3)
score = score.contiguous().view(batch_size, len_q, self.num_heads * self.dim_head)
score = self.attention_out(score)
past_key_values = None
if use_cache:
past_key_values = (key, value)
return score, attn_weights, past_key_values
class CpmAntSelfAttentionBlock(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.layernorm_before_attention = CpmAntLayerNorm(config)
self.self_attention = CpmAntAttention(config)
if config.dropout_p:
self.dropout = torch.nn.Dropout(config.dropout_p)
else:
self.dropout = None
def forward(
self,
hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
position_bias: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = False,
past_key_values: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
use_cache: Optional[bool] = None,
):
"""
Args:
hidden_states (`torch.Tensor` of shape `(batch, len_seq, dim_model)`):
Input of transformer block(self-attention block). It can be the raw embedding of a batch of sequences.
attention_mask (`torch.Tensor` of shape `(batch, len_seq, len_seq)`):
Avoid invalid areas to participate in the calculation of self-attention.
position_bias (`torch.Tensor` of shape `(batch, len_seq, len_seq)`):
Provide positional information to self-attention block.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers.
past_key_values (`Tuple(torch.FloatTensor)`, *optional*):
Cached past key and value projection states.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
(see `past_key_values`).
"""
outputs = self.layernorm_before_attention(hidden_states)
outputs = self.self_attention(
outputs, outputs, attention_mask, position_bias, output_attentions, past_key_values, use_cache
)
outputs, attn_weights, current_key_value = outputs
if self.dropout is not None:
outputs = self.dropout(outputs)
hidden_states = hidden_states + outputs
return hidden_states, attn_weights, current_key_value
class CpmAntDenseGatedACT(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.w_0 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
self.w_1 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
self.act = torch.nn.GELU()
def forward(self, hidden_states: torch.Tensor):
"""Transform an input tensor from one feature space to another via a nonlinear operation
Args:
hidden_states (`torch.Tensor` of shape `(batch, seq_len, dim_in)`)
"""
gate_score = self.act(self.w_0(hidden_states))
hidden_states = self.w_1(hidden_states)
hidden_states = gate_score * hidden_states
return hidden_states
class CpmAntFeedForward(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.w_in = CpmAntDenseGatedACT(config)
if config.dropout_p is not None:
self.dropout = torch.nn.Dropout(config.dropout_p)
else:
self.dropout = None
self.w_out = nn.Linear(config.dim_ff, config.hidden_size, bias=False)
def forward(self, hidden_states: torch.Tensor):
"""
Args:
hidden_states (`torch.Tensor` of shape `(batch, seq_len, dim_in)`)
"""
hidden_states = self.w_in(hidden_states)
if self.dropout is not None:
hidden_states = self.dropout(hidden_states)
hidden_states = self.w_out(hidden_states)
return hidden_states
class CpmAntFFNBlock(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.layernorm_before_ffn = CpmAntLayerNorm(config)
self.ffn = CpmAntFeedForward(config)
if config.dropout_p:
self.dropout = torch.nn.Dropout(config.dropout_p)
else:
self.dropout = None
def forward(
self,
hidden_states: torch.Tensor,
):
"""
Args:
hidden_states (`torch.Tensor` of shape `(batch, len_seq, dim_model)`):
Hidden states before feed forward layer.
"""
ln_outputs = self.layernorm_before_ffn(hidden_states)
outputs = self.ffn(ln_outputs)
if self.dropout is not None:
outputs = self.dropout(outputs)
hidden_states = hidden_states + outputs
return hidden_states
class CpmAntTransformerBlock(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.self_att = CpmAntSelfAttentionBlock(config)
self.ffn = CpmAntFFNBlock(config)
def forward(
self,
hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
position_bias: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = False,
past_key_values: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
use_cache: Optional[bool] = None,
):
"""
Args:
hidden_states (`torch.Tensor`):
Input to the layer of shape `(batch, seq_len, dim_model)`
attention_mask (`torch.Tensor`):
Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
position_bias (`torch.Tensor`):
Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers.
past_key_values (`Tuple[torch.Tensor, torch.Tensor])`, *optional*):
Cached past key and value projection states
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
(see `past_key_values`).
"""
hidden_states = self.self_att(
hidden_states,
attention_mask=attention_mask,
position_bias=position_bias,
output_attentions=output_attentions,
past_key_values=past_key_values,
use_cache=use_cache,
)
hidden_states, attn_weights, current_key_value = hidden_states
hidden_states = self.ffn(hidden_states)
return hidden_states, attn_weights, current_key_value
class CpmAntEncoder(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.num_layers = config.num_hidden_layers
self.layers = nn.ModuleList([CpmAntTransformerBlock(config) for ith in range(self.num_layers)])
self.output_layernorm = CpmAntLayerNorm(config)
def forward(
self,
hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
position_bias: torch.Tensor,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
past_key_values: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
use_cache: Optional[bool] = None,
):
"""
Args:
hidden_states (`torch.Tensor`):
Input to the layer of shape `(batch, seq_len, dim_model)`
attention_mask (`torch.Tensor`):
Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
position_bias (`torch.Tensor`):
Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers.
past_key_values (`Tuple[torch.Tensor, torch.Tensor])`, *optional*):
Cached past key and value projection states
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
(see `past_key_values`).
"""
all_hidden_states = () if output_hidden_states else None
all_self_attns = () if output_attentions else None
current_key_values = [] if use_cache else None
for i, layer in enumerate(self.layers):
if output_hidden_states:
all_hidden_states += (hidden_states,)
layer_outputs = layer(
hidden_states,
attention_mask,
position_bias,
output_attentions=output_attentions,
past_key_values=past_key_values[i] if past_key_values else None,
use_cache=use_cache,
)
hidden_states, attn_weights, current_key_value = layer_outputs
if output_attentions:
all_self_attns += (attn_weights,)
if current_key_value is not None:
current_key_values.append(current_key_value)
hidden_states = self.output_layernorm(hidden_states)
if output_hidden_states:
all_hidden_states += (hidden_states,)
return hidden_states, current_key_values, all_hidden_states, all_self_attns
# Copied from transformers.models.bert.modeling_bert.BertIntermediate with Bert->CPMAnt
class CpmAntIntermediate(nn.Module):
def __init__(self, config):
super().__init__()
self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
if isinstance(config.hidden_act, str):
self.intermediate_act_fn = ACT2FN[config.hidden_act]
else:
self.intermediate_act_fn = config.hidden_act
def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
hidden_states = self.dense(hidden_states)
hidden_states = self.intermediate_act_fn(hidden_states)
return hidden_states
class CpmAntSegmentPositionEmbedding(nn.Module):
def __init__(self, config: CpmAntConfig):
super().__init__()
self.num_heads = config.num_attention_heads
self.num_buckets = config.position_bias_num_buckets
self.max_distance = config.position_bias_max_distance
self.num_segments = config.segment_types
self.relative_attention_bias = nn.Parameter(
torch.empty(
config.segment_types * config.segment_types + config.position_bias_num_buckets,
config.num_attention_heads,
)
)
def forward(
self,
key_pos: torch.Tensor,
query_pos: torch.Tensor,
key_segment: torch.Tensor,
query_segment: torch.Tensor,
):
with torch.no_grad():
batch = key_pos.size(0)
keylen = key_pos.size(1)
querylen = query_pos.size(1)
if key_pos.size(0) != query_pos.size(0):
raise AssertionError(
f"key_pos.size(0) should be equal to query_pos.size(0), but got {key_pos.size(0)} and {query_pos.size(0)}!"
)
if keylen != key_segment.size(1) or querylen != query_segment.size(1):
raise AssertionError(
f"keylen should be equal to key_segment.size(1), but got {keylen} and {key_segment.size(1)}!"
)
if querylen != query_segment.size(1):
raise AssertionError(
f"querylen should be equal to query_segment.size(1), but got {querylen} and {query_segment.szie(1)}!"
)
key_pos = key_pos.view(batch, -1, keylen)
query_pos = query_pos.view(batch, querylen, -1)
key_segment = key_segment.view(batch, -1, keylen)
query_segment = query_segment.view(batch, querylen, -1)
relative_position_bucket = self._segment_relative_position_bucket(query_segment, key_segment)
relative_position_bucket = relative_position_bucket + self.num_buckets
# (batch, len_q, len_k)
absolute_position_bucket = self._position_bucket(
torch.arange(keylen, dtype=torch.int32, device=relative_position_bucket.device)[None, :]
- torch.arange(querylen, dtype=torch.int32, device=relative_position_bucket.device)[:, None],
num_buckets=self.num_buckets,
max_distance=self.max_distance,
)
relative_position_bucket = torch.where(
(key_segment == query_segment),
absolute_position_bucket[None, :, :],
relative_position_bucket,
)
# (batch, len_q, len_k, num_heads)
embeds = F.embedding(relative_position_bucket, self.relative_attention_bias)
# (batch, num_heads, len_q, len_k)
embeds = embeds.permute(0, 3, 1, 2).contiguous()
return embeds
def _segment_relative_position_bucket(self, query_segment, key_segment):
return query_segment * self.num_segments + key_segment
def _position_bucket(self, relative_position, num_buckets=32, max_distance=128):
relative_buckets = 0
# always bidirectional in CPMAnt
num_buckets //= 2
relative_buckets = (relative_position > 0).to(torch.int32) * num_buckets
relative_position = torch.abs(relative_position)
max_exact = num_buckets // 2
is_small = relative_position < max_exact
relative_postion_if_large = max_exact + (
torch.log(relative_position.float() / max_exact)
/ math.log(max_distance / max_exact)
* (num_buckets - max_exact)
).to(torch.int32)
relative_postion_if_large = torch.min(
relative_postion_if_large,
torch.full_like(relative_postion_if_large, num_buckets - 1),
)
relative_buckets += torch.where(is_small, relative_position.to(torch.int32), relative_postion_if_large)
return relative_buckets
# Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->CPMAnt
class CpmAntOutput(nn.Module):
def __init__(self, config):
super().__init__()
self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
self.dropout = nn.Dropout(config.hidden_dropout_prob)
def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
hidden_states = self.dense(hidden_states)
hidden_states = self.dropout(hidden_states)
hidden_states = self.LayerNorm(hidden_states + input_tensor)
return hidden_states
class CpmAntPreTrainedModel(PreTrainedModel):
"""
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
models.
"""
config_class = CpmAntConfig
base_model_prefix = "cpmant"
supports_gradient_checkpointing = True
_keys_to_ignore_on_load_missing = [r"position_ids"]
def _init_weights(self, module):
"""Initialize the weights"""
if isinstance(module, nn.Linear):
module.weight.data.normal_(mean=0.0, std=self.config.init_std)
if module.bias is not None:
module.bias.data.zero_()
elif isinstance(module, nn.Embedding):
module.weight.data.normal_(mean=0.0, std=self.config.init_std)
if module.padding_idx is not None:
module.weight.data[module.padding_idx].zero_()
elif isinstance(module, nn.LayerNorm):
module.bias.data.zero_()
module.weight.data.fill_(1.0)
elif isinstance(module, CpmAntLayerNorm):
module.weight.data.fill_(1.0)
elif isinstance(module, CpmAntSegmentPositionEmbedding):
module.relative_attention_bias.data.normal_(mean=0.0, std=self.config.init_std)
def _set_gradient_checkpointing(self, module, value=False):
if isinstance(module, CpmAntEncoder):
module.gradient_checkpointing = value
CPMANT_START_DOCSTRING = r"""
This model is a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) sub-class. Use
it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and
behavior.
Parameters
config ([`~CpmAntConfig`]): Model configuration class with all the parameters of the
Initializing with a config file does not load the weights associated with the model, only the
configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
CPMANT_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.Tensor` of shape `(batch_size, seq_len)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [`CPMAntTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""
@add_start_docstrings(
"The bare CPMAnt Model outputting raw hidden-states without any specific head on top.",
CPMANT_START_DOCSTRING,
)
class CpmAntModel(CpmAntPreTrainedModel):
def __init__(self, config: CpmAntConfig):
super().__init__(config)
self.encoder = CpmAntEncoder(config)
self.segment_embedding = nn.Embedding(config.segment_types, config.hidden_size)
self.input_embedding = nn.Embedding(
config.vocab_size + config.prompt_types * config.prompt_length, config.hidden_size
)
self.position_bias = CpmAntSegmentPositionEmbedding(config)
self.prompt_length = config.prompt_length
self.vocab_size = config.vocab_size
self.post_init()
def get_input_embeddings(self):
return self.input_embedding
def set_input_embeddings(self, embeddings, **kwargs):
self.input_embedding = embeddings
def _prepare_attention_mask(self, input_ids, span, context, length):
batch = input_ids.size(0)
seqlen = input_ids.size(1)
device = input_ids.device
directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(seqlen, device=device).view(-1, 1)
attention_mask = context[:, None, :] | (
context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
)
attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
# mask for left padding
mask_1d = (
torch.tensor(list(range(seqlen - self.prompt_length))[::-1], device=device)[None, :].repeat(batch, 1)
< length[:, None]
)
mask_1d = torch.cat((torch.ones(batch, self.prompt_length, device=device).bool(), mask_1d), dim=1)
attention_mask = mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
return attention_mask
@add_start_docstrings_to_model_forward(CPMANT_INPUTS_DOCSTRING)
@add_code_sample_docstrings(
checkpoint=_CHECKPOINT_FOR_DOC,
output_type=BaseModelOutputWithPast,
config_class=_CONFIG_FOR_DOC,
)
def forward(
self,
input_ids: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
past_key_values: Optional[Tuple[Tuple[torch.Tensor]]] = None,
use_cache: Optional[bool] = None,
return_dict: Optional[bool] = None,
**kwargs,
):
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.return_dict
use_cache = use_cache if use_cache is not None else self.config.use_cache
# add prompts ahead
if input_ids.dtype != torch.int32:
input_ids = input_ids.to(torch.int32)
dtype, device = input_ids.dtype, input_ids.device
segment = torch.where(input_ids != 0, 2, 0).to(dtype=dtype, device=device)
length = (segment != 0).sum(-1).to(dtype=dtype, device=device)
input_ids = torch.cat(
(
torch.arange(
self.prompt_length * 2 + self.vocab_size,
self.prompt_length * 3 + self.vocab_size,
dtype=dtype,
device=device,
).repeat(input_ids.size(0), 1),
input_ids,
),
dim=1,
)
batch, seq_length = input_ids.size()
segment = torch.cat((torch.zeros(batch, self.prompt_length, dtype=dtype, device=device), segment), dim=1)
context = torch.full((batch, seq_length), 1, dtype=dtype, device=device)
position = torch.arange(seq_length, dtype=dtype, device=device).repeat(batch, 1)
span = torch.full((batch, seq_length), 0, dtype=dtype, device=device)
if past_key_values is None:
past_length = 0
past_key_values = tuple([None] * self.encoder.num_layers)
input_ids = input_ids.contiguous()
hidden_states = self.input_embedding(input_ids)
segment_states = self.segment_embedding(segment)
hidden_states = hidden_states + segment_states
else:
past_length = past_key_values[0][0].size(-2)
segment_states = self.segment_embedding(segment)
hidden_states = self.input_embedding(input_ids) + segment_states[:, -1:, :]
attention_mask = self._prepare_attention_mask(input_ids, span, context, length)
position_bias = self.position_bias(position, position, segment, segment)
attention_mask = attention_mask[:, past_length:, :]
position_bias = position_bias[:, :, past_length:, :]
hidden_states = hidden_states[:, past_length:, :]
hidden_states, present_key_values, all_hidden_states, all_attentions = self.encoder(
hidden_states,
attention_mask,
position_bias,
output_attentions,
output_hidden_states,
past_key_values,
use_cache,
)
if past_length == 0:
hidden_states = hidden_states[:, self.prompt_length :, :]
# drop the prompt
if all_attentions is not None:
new_attentions = ()
for attention in all_attentions:
new_attentions += (attention[:, :, self.prompt_length :, self.prompt_length :],)
all_attentions = new_attentions
if all_hidden_states is not None:
new_hidden_states = ()
for hidden_state in all_hidden_states:
new_hidden_states += (hidden_state[:, self.prompt_length :, :],)
all_hidden_states = new_hidden_states
if not return_dict:
return tuple(
v for v in [hidden_states, present_key_values, all_hidden_states, all_attentions] if v is not None
)
return BaseModelOutputWithPast(
last_hidden_state=hidden_states,
past_key_values=present_key_values,
hidden_states=all_hidden_states,
attentions=all_attentions,
)
@add_start_docstrings(
"""
The CPMAnt Model with a language modeling head on top (linear layer with weights tied to the input embeddings).
""",
CPMANT_START_DOCSTRING,
)
class CpmAntForCausalLM(CpmAntPreTrainedModel):
_keys_to_ignore_on_load_missing = [r"lm_head.weight"]
def __init__(self, config: CpmAntConfig):
super().__init__(config)
self.cpmant = CpmAntModel(config)
# lm_head.weight is tied to cpmant.input_embedding.weight
self.lm_head = nn.Linear(
config.hidden_size, config.vocab_size + config.prompt_types * config.prompt_length, bias=False
)
self.post_init()
@add_start_docstrings_to_model_forward(CPMANT_INPUTS_DOCSTRING)
@add_code_sample_docstrings(
checkpoint=_CHECKPOINT_FOR_DOC,
output_type=CausalLMOutputWithPast,
config_class=_CONFIG_FOR_DOC,
)
def forward(
self,
input_ids: Optional[torch.Tensor] = None,
past_key_values: Optional[List[Tuple[torch.Tensor, torch.Tensor]]] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
labels: Optional[torch.Tensor] = None,
return_dict: Optional[bool] = None,
attention_mask: Optional[torch.Tensor] = None, # dummy parameter for text-generation pipeline
**kwargs,
) -> Union[Tuple, CausalLMOutputWithPast]:
r"""
Args:
input_ids (`torch.Tensor` of shape `(batch_size, seq_len)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [`CPMAntTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
(see `past_key_values`).
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers.
labels (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
Labels for computing the masked language modeling loss.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
CPMAnt will process attention mask automatically, this parameter is a dummy parameter for
text-generation pipeline.
Example:
Text Generation with CpmAntForCausalLM.
```python
>>> from transformers import CPMAntTokenizer, CpmAntForCausalLM
>>> texts = "今天天气不错,"
>>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
>>> tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
>>> input_ids = tokenizer(texts, return_tensors="pt")
>>> outputs = model.generate(**input_ids)
>>> output_texts = tokenizer.batch_decode(outputs)
>>> print(output_texts)
['今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的']
```
"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
model_output = self.cpmant(
input_ids, output_attentions, output_hidden_states, past_key_values, use_cache, return_dict
)
hidden_states = model_output.last_hidden_state if return_dict else model_output[0]
logits = self.lm_head(hidden_states)
loss = None
if labels is not None:
loss_func = CrossEntropyLoss()
loss = loss_func(logits.view(-1, logits.size(-1)), labels.view(-1))
if not return_dict:
output = (logits,) + model_output[1:]
return ((loss,) + output) if loss is not None else output
return CausalLMOutputWithPast(
loss=loss,
logits=logits,
past_key_values=model_output.past_key_values,
hidden_states=model_output.hidden_states,
attentions=model_output.attentions,
)
def get_input_embeddings(self):
return self.cpmant.input_embedding
def set_input_embeddings(self, embeddings):
self.cpmant.input_embedding = embeddings
def get_output_embeddings(self):
return self.lm_head
def set_output_embeddings(self, new_embeddings):
self.lm_head = new_embeddings
def prepare_inputs_for_generation(self, input_ids, **kwargs):
input_ids = input_ids.int()
# save the memory usage of dummy attention mask
if "attention_mask" in kwargs:
kwargs["attention_mask"] = torch.zeros(1, 1)
return {
"input_ids": input_ids,
"use_cache": kwargs["use_cache"],
"past_key_values": kwargs.get("past_key_values", None),
}
def _reorder_cache(self, past_key_values, beam_idx):
past_key_values = [list(each) if each is not None else each for each in past_key_values]
for key_value_layer in past_key_values:
key_value_layer[0] = key_value_layer[0][beam_idx]
key_value_layer[1] = key_value_layer[1][beam_idx]
return past_key_values

View File

@ -0,0 +1,277 @@
# coding=utf-8
# Copyright 2022 The OpenBMB Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes for CPMAnt."""
import collections
import os
from typing import List, Optional, Tuple
from transformers.utils import is_jieba_available, requires_backends
if is_jieba_available():
import jieba
from ...tokenization_utils import PreTrainedTokenizer
from ...utils import logging
logger = logging.get_logger(__name__)
VOCAB_FILES_NAMES = {"vocab_file": "vocab.txt"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
"openbmb/cpm-ant-10b": "https://huggingface.co/openbmb/cpm-ant-10b/blob/main/vocab.txt",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
"openbmb/cpm-ant-10b": 1024,
}
def load_vocab(vocab_file):
"""Loads a vocabulary file into a dictionary."""
vocab = collections.OrderedDict()
with open(vocab_file, "r", encoding="utf-8") as reader:
tokens = reader.readlines()
for index, token in enumerate(tokens):
token = token.rstrip("\n")
vocab[token] = index
return vocab
class WordpieceTokenizer(object):
def __init__(self, vocab, unk_token="<unk>", max_input_chars_per_word=200):
self.vocab = vocab
self.unk_token = unk_token
self.max_input_chars_per_word = max_input_chars_per_word
def tokenize(self, token):
chars = list(token)
if len(chars) > self.max_input_chars_per_word:
return [self.unk_token]
start = 0
sub_tokens = []
while start < len(chars):
end = len(chars)
cur_substr = None
while start < end:
substr = "".join(chars[start:end])
if substr in self.vocab:
cur_substr = substr
break
end -= 1
if cur_substr is None:
sub_tokens.append(self.unk_token)
start += 1
else:
sub_tokens.append(cur_substr)
start = end
return sub_tokens
class CpmAntTokenizer(PreTrainedTokenizer):
"""
Construct a CPMAnt tokenizer. Based on byte-level Byte-Pair-Encoding.
Args:
vocab_file (`str`):
Path to the vocabulary file.
bod_token (`str`, *optional*, defaults to `"<d>"`):
The beginning of document token.
eod_token (`str`, *optional*, defaults to `"</d>"`):
The end of document token.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token.
eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token.
line_token (`str`, *optional*, defaults to `"</n>"`):
The line token.
space_token (`str`, *optional*, defaults to `"</_>"`):
The space token.
"""
vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
model_input_names = ["input_ids", "attention_mask"]
add_prefix_space = False
def __init__(
self,
vocab_file,
bod_token="<d>",
eod_token="</d>",
bos_token="<s>",
eos_token="</s>",
pad_token="<pad>",
unk_token="<unk>",
line_token="</n>",
space_token="</_>",
padding_side="left",
**kwargs,
):
requires_backends(self, ["jieba"])
super().__init__(
bod_token=bod_token,
eod_token=eod_token,
bos_token=bos_token,
eos_token=eos_token,
pad_token=pad_token,
unk_token=unk_token,
line_token=line_token,
space_token=space_token,
padding_side=padding_side,
**kwargs,
)
self.bod_token = bod_token
self.eod_token = eod_token
self.encoder = load_vocab(vocab_file)
self.encoder[" "] = self.encoder[space_token]
self.encoder["\n"] = self.encoder[line_token]
del self.encoder[space_token]
del self.encoder[line_token]
self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
self.decoder = {v: k for k, v in self.encoder.items()}
self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.encoder, unk_token=self.unk_token)
@property
def bod_token_id(self):
return self.encoder[self.bod_token]
@property
def eod_token_id(self):
return self.encoder[self.eod_token]
@property
def newline_id(self):
return self.encoder["\n"]
@property
def vocab_size(self) -> int:
return len(self.encoder)
def get_vocab(self):
return dict(self.encoder, **self.added_tokens_encoder)
def _tokenize(self, text):
"""Tokenize a string."""
output_tokens = []
for x in jieba.cut(text, cut_all=False):
output_tokens.extend(self.wordpiece_tokenizer.tokenize(x))
return output_tokens
def _decode(self, token_ids, **kwargs):
"""Decode ids into a string."""
token_ids = [i for i in token_ids if i >= 0]
token_ids = [
x for x in token_ids if x != self.pad_token_id and x != self.eos_token_id and x != self.bos_token_id
]
return super()._decode(token_ids, **kwargs)
def check(self, token):
return token in self.encoder
def convert_tokens_to_string(self, tokens: List[str]) -> str:
return "".join(tokens)
def _convert_token_to_id(self, token):
"""Converts a token (str) in an id using the vocab."""
return self.encoder.get(token, self.encoder.get(self.unk_token))
def _convert_id_to_token(self, index):
"""Converts an index (integer) in a token (str) using the vocab."""
return self.decoder.get(index, self.unk_token)
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
if os.path.isdir(save_directory):
vocab_file = os.path.join(
save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
)
else:
vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
index = 0
if " " in self.encoder:
self.encoder["</_>"] = self.encoder[" "]
del self.encoder[" "]
if "\n" in self.encoder:
self.encoder["</n>"] = self.encoder["\n"]
del self.encoder["\n"]
self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
with open(vocab_file, "w", encoding="utf-8") as writer:
for token, token_index in self.encoder.items():
if index != token_index:
logger.warning(
f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
" Please check that the vocabulary is not corrupted!"
)
index = token_index
writer.write(token + "\n")
index += 1
return (vocab_file,)
def build_inputs_with_special_tokens(self, token_ids_0: List[int], token_ids_1: List[int] = None) -> List[int]:
"""
Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
adding special tokens. A CPMAnt sequence has the following format:
- single sequence: `[BOS] Sequence`.
Args:
token_ids_0 (`List[int]`): The first tokenized sequence that special tokens will be added.
token_ids_1 (`List[int]`): The optional second tokenized sequence that special tokens will be added.
Returns:
`List[int]`: The model input with special tokens.
"""
if token_ids_1 is None:
return [self.bos_token_id] + token_ids_0
return [self.bos_token_id] + token_ids_0 + [self.bos_token_id] + token_ids_1
def get_special_tokens_mask(
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer `prepare_for_model` method.
Args:
token_ids_0 (`List[int]`): List of IDs.
token_ids_1 (`List[int]`, *optional*): Optional second list of IDs for sequence pairs.
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""
if already_has_special_tokens:
return super().get_special_tokens_mask(
token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
)
if token_ids_1 is not None:
return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1))
return [1] + ([0] * len(token_ids_0))

View File

@ -58,6 +58,7 @@ from .utils import (
is_flax_available,
is_ftfy_available,
is_ipex_available,
is_jieba_available,
is_jumanpp_available,
is_keras_nlp_available,
is_librosa_available,
@ -277,6 +278,13 @@ def require_rjieba(test_case):
return unittest.skipUnless(is_rjieba_available(), "test requires rjieba")(test_case)
def require_jieba(test_case):
"""
Decorator marking a test that requires jieba. These tests are skipped when jieba isn't installed.
"""
return unittest.skipUnless(is_jieba_available(), "test requires jieba")(test_case)
def require_tf2onnx(test_case):
return unittest.skipUnless(is_tf2onnx_available(), "test requires tf2onnx")(test_case)

View File

@ -110,6 +110,7 @@ from .import_utils import (
is_ftfy_available,
is_in_notebook,
is_ipex_available,
is_jieba_available,
is_jumanpp_available,
is_kenlm_available,
is_keras_nlp_available,

View File

@ -1841,6 +1841,30 @@ class ConvNextV2PreTrainedModel(metaclass=DummyObject):
requires_backends(self, ["torch"])
CPMANT_PRETRAINED_MODEL_ARCHIVE_LIST = None
class CpmAntForCausalLM(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class CpmAntModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class CpmAntPreTrainedModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
CTRL_PRETRAINED_MODEL_ARCHIVE_LIST = None

View File

@ -274,6 +274,13 @@ try:
except importlib_metadata.PackageNotFoundError:
_decord_availale = False
_jieba_available = importlib.util.find_spec("jieba") is not None
try:
_jieba_version = importlib_metadata.version("jieba")
logger.debug(f"Successfully imported jieba version {_jieba_version}")
except importlib_metadata.PackageNotFoundError:
_jieba_available = False
# This is the version of torch required to run torch.fx features and torch.onnx with dictionary inputs.
TORCH_FX_REQUIRED_VERSION = version.parse("1.10")
@ -740,6 +747,10 @@ def is_cython_available():
return importlib.util.find_spec("pyximport") is not None
def is_jieba_available():
return _jieba_available
# docstyle-ignore
DATASETS_IMPORT_ERROR = """
{0} requires the 🤗 Datasets library but it was not found in your environment. You can install it with:
@ -997,6 +1008,11 @@ CYTHON_IMPORT_ERROR = """
Cython`. Please note that you may need to restart your runtime after installation.
"""
JIEBA_IMPORT_ERROR = """
{0} requires the jieba library but it was not found in your environment. You can install it with pip: `pip install
jieba`. Please note that you may need to restart your runtime after installation.
"""
BACKENDS_MAPPING = OrderedDict(
[
("bs4", (is_bs4_available, BS4_IMPORT_ERROR)),
@ -1029,6 +1045,7 @@ BACKENDS_MAPPING = OrderedDict(
("oneccl_bind_pt", (is_ccl_available, CCL_IMPORT_ERROR)),
("decord", (is_decord_available, DECORD_IMPORT_ERROR)),
("cython", (is_cython_available, CYTHON_IMPORT_ERROR)),
("jieba", (is_jieba_available, JIEBA_IMPORT_ERROR)),
]
)

View File

View File

@ -0,0 +1,233 @@
# coding=utf-8
# Copyright 2022 The OpenBMB Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Testing suite for the PyTorch CPMAnt model. """
import unittest
from transformers.testing_utils import is_torch_available, require_torch, tooslow
from ...generation.test_utils import torch_device
from ...test_configuration_common import ConfigTester
from ...test_modeling_common import ModelTesterMixin, ids_tensor
if is_torch_available():
import torch
from transformers import (
CpmAntConfig,
CpmAntForCausalLM,
CpmAntModel,
CpmAntTokenizer,
)
@require_torch
class CpmAntModelTester:
def __init__(
self,
parent,
batch_size=2,
seq_length=8,
is_training=True,
use_token_type_ids=False,
use_input_mask=False,
use_labels=False,
use_mc_token_ids=False,
vocab_size=99,
hidden_size=32,
num_hidden_layers=3,
num_attention_heads=4,
intermediate_size=37,
num_buckets=32,
max_distance=128,
prompt_length=8,
prompt_types=8,
segment_types=8,
init_std=1.0,
return_dict=True,
):
self.parent = parent
self.batch_size = batch_size
self.seq_length = seq_length
self.is_training = is_training
self.use_token_type_ids = use_token_type_ids
self.use_input_mask = use_input_mask
self.use_labels = use_labels
self.use_mc_token_ids = use_mc_token_ids
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.intermediate_size = intermediate_size
self.num_buckets = num_buckets
self.max_distance = max_distance
self.prompt_length = prompt_length
self.prompt_types = prompt_types
self.segment_types = segment_types
self.init_std = init_std
self.return_dict = return_dict
def prepare_config_and_inputs(self):
input_ids = {}
input_ids["input_ids"] = ids_tensor([self.batch_size, self.seq_length], self.vocab_size).type(torch.int32)
input_ids["use_cache"] = False
config = self.get_config()
return (config, input_ids)
def get_config(self):
return CpmAntConfig(
vocab_size=self.vocab_size,
hidden_size=self.hidden_size,
num_hidden_layers=self.num_hidden_layers,
num_attention_heads=self.num_attention_heads,
dim_ff=self.intermediate_size,
position_bias_num_buckets=self.num_buckets,
position_bias_max_distance=self.max_distance,
prompt_types=self.prompt_types,
prompt_length=self.prompt_length,
segment_types=self.segment_types,
use_cache=True,
init_std=self.init_std,
return_dict=self.return_dict,
)
def create_and_check_cpmant_model(self, config, input_ids, *args):
model = CpmAntModel(config=config)
model.to(torch_device)
model.eval()
hidden_states = model(**input_ids).last_hidden_state
self.parent.assertEqual(hidden_states.shape, (self.batch_size, self.seq_length, config.hidden_size))
def create_and_check_lm_head_model(self, config, input_ids, *args):
model = CpmAntForCausalLM(config)
model.to(torch_device)
input_ids["input_ids"] = input_ids["input_ids"].to(torch_device)
model.eval()
model_output = model(**input_ids)
self.parent.assertEqual(
model_output.logits.shape,
(self.batch_size, self.seq_length, config.vocab_size + config.prompt_types * config.prompt_length),
)
def prepare_config_and_inputs_for_common(self):
config, inputs_dict = self.prepare_config_and_inputs()
return config, inputs_dict
@require_torch
class CpmAntModelTest(ModelTesterMixin, unittest.TestCase):
all_model_classes = (CpmAntModel, CpmAntForCausalLM) if is_torch_available() else ()
test_pruning = False
test_missing_keys = False
test_mismatched_shapes = False
test_head_masking = False
test_resize_embeddings = False
def setUp(self):
self.model_tester = CpmAntModelTester(self)
self.config_tester = ConfigTester(self, config_class=CpmAntConfig)
def test_config(self):
self.config_tester.create_and_test_config_common_properties()
self.config_tester.create_and_test_config_to_json_string()
self.config_tester.create_and_test_config_to_json_file()
self.config_tester.create_and_test_config_from_and_save_pretrained()
self.config_tester.check_config_can_be_init_without_params()
self.config_tester.check_config_arguments_init()
def test_inputs_embeds(self):
unittest.skip("CPMAnt doesn't support input_embeds.")(self.test_inputs_embeds)
def test_retain_grad_hidden_states_attentions(self):
unittest.skip(
"CPMAnt doesn't support retain grad in hidden_states or attentions, because prompt management will peel off the output.hidden_states from graph.\
So is attentions. We strongly recommand you use loss to tune model."
)(self.test_retain_grad_hidden_states_attentions)
def test_cpmant_model(self):
config, inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_cpmant_model(config, inputs)
def test_cpmant_lm_head_model(self):
config, inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_lm_head_model(config, inputs)
@require_torch
class CpmAntModelIntegrationTest(unittest.TestCase):
@tooslow
def test_inference_masked_lm(self):
texts = "今天天气真好!"
model_path = "openbmb/cpm-ant-10b"
model = CpmAntModel.from_pretrained(model_path)
tokenizer = CpmAntTokenizer.from_pretrained(model_path)
inputs = tokenizer(texts, return_tensors="pt")
hidden_states = model(**inputs).last_hidden_state
expected_slice = torch.tensor(
[[[6.1708, 5.9244, 1.0835], [6.5207, 6.2893, -11.3324], [-1.0107, -0.0576, -5.9577]]],
)
self.assertTrue(torch.allclose(hidden_states[:, :3, :3], expected_slice, atol=1e-2))
@require_torch
class CpmAntForCausalLMlIntegrationTest(unittest.TestCase):
@tooslow
def test_inference_casual(self):
texts = "今天天气真好!"
model_path = "openbmb/cpm-ant-10b"
model = CpmAntForCausalLM.from_pretrained(model_path)
tokenizer = CpmAntTokenizer.from_pretrained(model_path)
inputs = tokenizer(texts, return_tensors="pt")
hidden_states = model(**inputs).logits
expected_slice = torch.tensor(
[[[-6.4267, -6.4083, -6.3958], [-5.8802, -5.9447, -5.7811], [-5.3896, -5.4820, -5.4295]]],
)
self.assertTrue(torch.allclose(hidden_states[:, :3, :3], expected_slice, atol=1e-2))
@tooslow
def test_simple_generation(self):
model_path = "openbmb/cpm-ant-10b"
model = CpmAntForCausalLM.from_pretrained(model_path)
tokenizer = CpmAntTokenizer.from_pretrained(model_path)
texts = "今天天气不错,"
expected_output = "今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的"
model_inputs = tokenizer(texts, return_tensors="pt")
token_ids = model.generate(**model_inputs)
output_texts = tokenizer.batch_decode(token_ids)
self.assertEqual(expected_output, output_texts)
@tooslow
def test_batch_generation(self):
model_path = "openbmb/cpm-ant-10b"
model = CpmAntForCausalLM.from_pretrained(model_path)
tokenizer = CpmAntTokenizer.from_pretrained(model_path)
texts = ["今天天气不错,", "新年快乐,万事如意!"]
expected_output = [
"今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的",
"新年快乐,万事如意!在这辞旧迎新的美好时刻,我谨代表《农村新技术》杂志社全体同仁,向一直以来关心、支持《农村新技术》杂志发展的各级领导、各界朋友和广大读者致以最诚挚的",
]
model_inputs = tokenizer(texts, return_tensors="pt", padding=True)
token_ids = model.generate(**model_inputs)
output_texts = tokenizer.batch_decode(token_ids)
self.assertEqual(expected_output, output_texts)

View File

@ -0,0 +1,69 @@
# coding=utf-8
# Copyright 2022 The OpenBMB Team and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import unittest
from transformers.models.cpmant.tokenization_cpmant import VOCAB_FILES_NAMES, CpmAntTokenizer
from transformers.testing_utils import require_jieba, tooslow
from ...test_tokenization_common import TokenizerTesterMixin
@require_jieba
class CPMAntTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
tokenizer_class = CpmAntTokenizer
test_rust_tokenizer = False
def setUp(self):
super().setUp()
vocab_tokens = [
"<d>",
"</d>",
"<s>",
"</s>",
"</_>",
"<unk>",
"<pad>",
"</n>",
"",
"",
"C",
"P",
"M",
"A",
"n",
"t",
]
self.vocab_file = os.path.join(self.tmpdirname, VOCAB_FILES_NAMES["vocab_file"])
with open(self.vocab_file, "w", encoding="utf-8") as vocab_writer:
vocab_writer.write("".join([x + "\n" for x in vocab_tokens]))
@tooslow
def test_pre_tokenization(self):
tokenizer = CpmAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
texts = "今天天气真好!"
jieba_tokens = ["今天", "天气", "", "", ""]
tokens = tokenizer.tokenize(texts)
self.assertListEqual(tokens, jieba_tokens)
normalized_text = "今天天气真好!"
input_tokens = [tokenizer.bos_token] + tokens
input_jieba_tokens = [6, 9802, 14962, 2082, 831, 244]
self.assertListEqual(tokenizer.convert_tokens_to_ids(input_tokens), input_jieba_tokens)
reconstructed_text = tokenizer.decode(input_jieba_tokens)
self.assertEqual(reconstructed_text, normalized_text)