Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-24 06:48:58 +06:00)
Add ImageGPT (#14240)
* First draft
* More improvements
* Improve conversion script
* Fix init weights for layer norm
* Fix correct model for conversion script
* Don't tie input and output embeddings
* Add print statements for debugging
* Add print statements for debugging
* Fix vocab size of model
* Improve documentation, remove fast tokenizer
* Add ImageGPTForImageClassification, improve docs
* Fix docs issue
* Set verbosity level back to info
* Improve tests
* Fix tests and add figure
* Delete tokenizer file
* Remove ImageGPTTokenizer from init files
* Remove ImageGPTLayer from init files
* Remove ImageGPT tokenizer from docs
* First draft of ImageGPTFeatureExtractor
* Fix typo
* Fix bug
* More improvements
* Apply suggestions from code review, add tests for feature extractor
* Fix layernorm
* Update save_pretrained method
* Fix issue
* Make all tests of ImageGPTFeatureExtractor pass
* Update code examples
* Rename model inputs to pixel_values
* Improve code examples
* Update init_weights to post_init
* Fix post_init
This commit is contained in:
parent d83b0e0c07
commit da36c557f7
@ -249,6 +249,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[GPT Neo](https://huggingface.co/transformers/model_doc/gpt_neo.html)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/transformers/master/model_doc/imagegpt.html)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
@ -247,6 +247,7 @@ how to install them with conda from the Flax, PyTorch, and TensorFlow installation pages
1. **[GPT-J](https://huggingface.co/transformers/model_doc/gptj.html)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/transformers/master/model_doc/imagegpt.html)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
@ -271,6 +271,7 @@ conda install -c huggingface transformers
1. **[GPT-J](https://huggingface.co/transformers/model_doc/gptj.html)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/transformers/master/model_doc/imagegpt.html)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
@ -283,6 +283,7 @@ conda install -c huggingface transformers
1. **[GPT-J](https://huggingface.co/transformers/model_doc/gptj.html)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/transformers/master/model_doc/imagegpt.html)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/transformers/model_doc/layoutlmv2.html)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
BIN docs/source/imgs/ImageGPT.png (new file)
Binary file not shown. Size: 177 KiB
@ -213,139 +213,142 @@ Supported models
Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
38. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization <https://arxiv.org/abs/2101.01321>`__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
39. `ImageGPT <https://huggingface.co/transformers/master/model_doc/imagegpt.html>`__ (from OpenAI) released with the paper `Generative Pretraining from Pixels <https://openai.com/blog/image-gpt/>`__ by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
40. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
41. :doc:`LayoutLMv2 <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding <https://arxiv.org/abs/2012.14740>`__ by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
42. :doc:`LayoutXLM <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding <https://arxiv.org/abs/2104.08836>`__ by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
43. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
44. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
45. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
46. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__ by Hao Tan and Mohit Bansal.
47. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual Machine Translation <https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
48. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft Translator Team.
49. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
50. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
51. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
52. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
53. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
54. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
55. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
56. :doc:`PhoBERT <model_doc/phobert>` (from VinAI Research) released with the paper `PhoBERT: Pre-trained language models for Vietnamese <https://www.aclweb.org/anthology/2020.findings-emnlp.92/>`__ by Dat Quoc Nguyen and Anh Tuan Nguyen.
57. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
58. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
59. :doc:`RemBERT <model_doc/rembert>` (from Google Research) released with the paper `Rethinking embedding coupling in pre-trained language models <https://arxiv.org/pdf/2010.12821.pdf>`__ by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
60. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
61. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper a `RoFormer: Enhanced Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
62. :doc:`SegFormer <model_doc/segformer>` (from NVIDIA) released with the paper `SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers <https://arxiv.org/abs/2105.15203>`__ by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
63. :doc:`SEW <model_doc/sew>` (from ASAPP) released with the paper `Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition <https://arxiv.org/abs/2109.06870>`__ by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
64. :doc:`SEW-D <model_doc/sew_d>` (from ASAPP) released with the paper `Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition <https://arxiv.org/abs/2109.06870>`__ by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
65. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper `fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
66. :doc:`SpeechToTextTransformer2 <model_doc/speech_to_text_2>` (from Facebook), released together with the paper `Large-Scale Self- and Semi-Supervised Learning for Speech Translation <https://arxiv.org/abs/2104.06678>`__ by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
67. :doc:`Splinter <model_doc/splinter>` (from Tel Aviv University), released together with the paper `Few-Shot Question Answering by Pretraining Span Selection <https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
68. :doc:`SqueezeBert <model_doc/squeezebert>` (from Berkeley) released with the paper `SqueezeBERT: What can computer vision teach NLP about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
69. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
70. :doc:`T5v1.1 <model_doc/t5v1.1>` (from Google AI) released in the repository `google-research/text-to-text-transfer-transformer <https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511>`__ by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
71. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
72. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
73. :doc:`TrOCR <model_doc/trocr>` (from Microsoft), released together with the paper `TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models <https://arxiv.org/abs/2109.10282>`__ by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
74. :doc:`UniSpeech <model_doc/unispeech>` (from Microsoft Research) released with the paper `UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data <https://arxiv.org/abs/2101.07597>`__ by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
75. :doc:`UniSpeechSat <model_doc/unispeech_sat>` (from Microsoft Research) released with the paper `UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING <https://arxiv.org/abs/2110.05752>`__ by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
76. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
77. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
78. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
79. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
80. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
81. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
82. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
83. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
@ -425,6 +428,8 @@ Flax), PyTorch, and/or TensorFlow.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| I-BERT                      | ❌             | ❌             | ✅              | ❌                 | ❌           |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ImageGPT                    | ❌             | ❌             | ✅              | ❌                 | ❌           |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LayoutLM                    | ✅             | ✅             | ✅              | ✅                 | ❌           |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LayoutLMv2                  | ✅             | ✅             | ✅              | ❌                 | ❌           |
@ -629,6 +634,7 @@ Flax), PyTorch, and/or TensorFlow.
    model_doc/funnel
    model_doc/herbert
    model_doc/ibert
    model_doc/imagegpt
    model_doc/layoutlm
    model_doc/layoutlmv2
    model_doc/layoutxlm
docs/source/model_doc/imagegpt.rst (new file, 110 lines)
@ -0,0 +1,110 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

ImageGPT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ImageGPT model was proposed in `Generative Pretraining from Pixels <https://openai.com/blog/image-gpt/>`__ by Mark
Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. ImageGPT (iGPT) is a GPT-2-like
model trained to predict the next pixel value, allowing for both unconditional and conditional image generation.

The abstract from the paper is the following:

*Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models
can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels,
without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels,
we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and
low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide
ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. We are also
competitive with self-supervised benchmarks on ImageNet when substituting pixels for a VQVAE encoding, achieving 69.0%
top-1 accuracy on a linear probe of our features.*

The figure below summarizes the approach (taken from the `original paper
<https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf>`__):

.. image:: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/imagegpt_architecture.png
    :width: 600

Tips:

- ImageGPT is almost exactly the same as :doc:`GPT-2 <gpt2>`, except that a different activation function is used
  (namely "quick gelu") and the layer normalization layers do not mean-center the inputs. ImageGPT also does not have
  tied input and output embeddings.
- As the time and memory requirements of the attention mechanism of Transformers scale quadratically in the sequence
  length, the authors pre-trained ImageGPT on smaller input resolutions, such as 32x32 and 64x64. However, feeding a
  sequence of 32x32x3=3072 tokens from 0..255 into a Transformer is still prohibitively large. Therefore, the authors
  applied k-means clustering to the (R,G,B) pixel values with k=512. This way, we only have a 32*32 = 1024-long
  sequence, but now of integers in the range 0..511. So we are shrinking the sequence length at the cost of a bigger
  embedding matrix. In other words, the vocabulary size of ImageGPT is 512, plus 1 for a special "start of sentence"
  (SOS) token, used at the beginning of every sequence. One can use :class:`~transformers.ImageGPTFeatureExtractor` to
  prepare images for the model.
- Despite being pre-trained entirely unsupervised (i.e. without the use of any labels), ImageGPT produces fairly
  performant image features useful for downstream tasks, such as image classification. The authors showed that the
  features in the middle of the network are the most performant, and can be used as-is to train a linear model (such
  as a sklearn logistic regression model, for example). This is also referred to as "linear probing". Features can
  easily be obtained by forwarding the image through the model with `output_hidden_states=True` and average-pooling
  the hidden states at whatever layer you like (a minimal sketch is shown below the table).
- Alternatively, one can further fine-tune the entire model on a downstream dataset, similar to BERT. For this, you
  can use :class:`~transformers.ImageGPTForImageClassification`.
- ImageGPT comes in different sizes: there's ImageGPT-small, ImageGPT-medium and ImageGPT-large. The authors also
  trained an XL variant, which they didn't release. The differences in size are summarized in the following table:

+-------------------+----------------------+-----------------+---------------------+--------------+
| **Model variant** | **Number of layers** | **Hidden size** | **Number of heads** | **# params** |
+-------------------+----------------------+-----------------+---------------------+--------------+
| iGPT-small        | 24                   | 512             | 8                   | 76 million   |
+-------------------+----------------------+-----------------+---------------------+--------------+
| iGPT-medium       | 36                   | 1024            | 8                   | 455 million  |
+-------------------+----------------------+-----------------+---------------------+--------------+
| iGPT-large        | 48                   | 1536            | 16                  | 1.4 billion  |
+-------------------+----------------------+-----------------+---------------------+--------------+
| iGPT-XL           | 60                   | 3072            | not specified       | 6.8 billion  |
+-------------------+----------------------+-----------------+---------------------+--------------+
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__, based on `this issue
<https://github.com/openai/image-gpt/issues/7>`__. The original code can be found `here
<https://github.com/openai/image-gpt>`__.

ImageGPTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ImageGPTConfig
    :members:

ImageGPTFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ImageGPTFeatureExtractor
    :members: __call__
ImageGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ImageGPTModel
    :members: forward
ImageGPTForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ImageGPTForCausalLM
    :members: forward
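A hedged sketch of unconditional generation with this head (illustration only; it assumes an ``openai/imagegpt-small``
checkpoint, GPT-2-style ``vocab_size``/``n_positions`` config attributes, and a feature extractor exposing the colour
``clusters`` and image ``size`` used for quantization):

.. code-block:: python

    import numpy as np
    import torch
    from transformers import ImageGPTFeatureExtractor, ImageGPTForCausalLM

    feature_extractor = ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small")  # assumed checkpoint name
    model = ImageGPTForCausalLM.from_pretrained("openai/imagegpt-small")

    # start every sequence with the SOS token (id = vocab_size - 1) and sample pixel by pixel
    batch_size = 4
    context = torch.full((batch_size, 1), model.config.vocab_size - 1, dtype=torch.long)
    output = model.generate(context, max_length=model.config.n_positions + 1, do_sample=True, top_k=40)

    # map the sampled colour-cluster ids back to (R, G, B) values in 0..255
    clusters = np.array(feature_extractor.clusters)  # shape (512, 3), values in [-1, 1]
    n_px = feature_extractor.size                    # 32 for the small/medium/large checkpoints
    samples = output[:, 1:].numpy()                  # drop the SOS token
    images = [
        np.rint(127.5 * (clusters[s] + 1.0)).reshape(n_px, n_px, 3).astype(np.uint8)
        for s in samples
    ]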
ImageGPTForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ImageGPTForImageClassification
    :members: forward
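A minimal fine-tuning sketch for this classification head (illustration only; the ``openai/imagegpt-small`` checkpoint
name, ``num_labels=10`` and the tiny dummy batch are assumptions standing in for a real downstream dataset):

.. code-block:: python

    import torch
    from PIL import Image
    from transformers import ImageGPTFeatureExtractor, ImageGPTForImageClassification

    feature_extractor = ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small")
    model = ImageGPTForImageClassification.from_pretrained("openai/imagegpt-small", num_labels=10)

    # dummy batch: lists of PIL images with integer labels, as a real dataloader would yield
    images = [Image.new("RGB", (64, 64), color="white") for _ in range(2)]
    labels = torch.tensor([0, 1])

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()

    encoding = feature_extractor(images=images, return_tensors="pt")
    outputs = model(**encoding, labels=labels)  # the head pools the hidden states and computes cross-entropy
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()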
@ -221,6 +221,7 @@ _import_structure = {
    "models.herbert": ["HerbertTokenizer"],
    "models.hubert": ["HUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "HubertConfig"],
    "models.ibert": ["IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "IBertConfig"],
    "models.imagegpt": ["IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "ImageGPTConfig"],
    "models.layoutlm": ["LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP", "LayoutLMConfig", "LayoutLMTokenizer"],
    "models.layoutlmv2": [
        "LAYOUTLMV2_PRETRAINED_CONFIG_ARCHIVE_MAP",
@ -478,6 +479,7 @@ if is_vision_available():
|
|||||||
_import_structure["models.clip"].append("CLIPProcessor")
|
_import_structure["models.clip"].append("CLIPProcessor")
|
||||||
_import_structure["models.deit"].append("DeiTFeatureExtractor")
|
_import_structure["models.deit"].append("DeiTFeatureExtractor")
|
||||||
_import_structure["models.detr"].append("DetrFeatureExtractor")
|
_import_structure["models.detr"].append("DetrFeatureExtractor")
|
||||||
|
_import_structure["models.imagegpt"].append("ImageGPTFeatureExtractor")
|
||||||
_import_structure["models.layoutlmv2"].append("LayoutLMv2FeatureExtractor")
|
_import_structure["models.layoutlmv2"].append("LayoutLMv2FeatureExtractor")
|
||||||
_import_structure["models.layoutlmv2"].append("LayoutLMv2Processor")
|
_import_structure["models.layoutlmv2"].append("LayoutLMv2Processor")
|
||||||
_import_structure["models.layoutxlm"].append("LayoutXLMProcessor")
|
_import_structure["models.layoutxlm"].append("LayoutXLMProcessor")
|
||||||
@ -943,6 +945,16 @@ if is_torch_available():
|
|||||||
"IBertPreTrainedModel",
|
"IBertPreTrainedModel",
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
_import_structure["models.imagegpt"].extend(
|
||||||
|
[
|
||||||
|
"IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST",
|
||||||
|
"ImageGPTForCausalLM",
|
||||||
|
"ImageGPTForImageClassification",
|
||||||
|
"ImageGPTModel",
|
||||||
|
"ImageGPTPreTrainedModel",
|
||||||
|
"load_tf_weights_in_imagegpt",
|
||||||
|
]
|
||||||
|
)
|
||||||
_import_structure["models.layoutlm"].extend(
|
_import_structure["models.layoutlm"].extend(
|
||||||
[
|
[
|
||||||
"LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST",
|
"LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST",
|
||||||
@@ -2150,6 +2162,7 @@ if TYPE_CHECKING:
     from .models.herbert import HerbertTokenizer
     from .models.hubert import HUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, HubertConfig
     from .models.ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig
+    from .models.imagegpt import IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP, ImageGPTConfig
     from .models.layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig, LayoutLMTokenizer
     from .models.layoutlmv2 import (
         LAYOUTLMV2_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -2369,6 +2382,7 @@ if TYPE_CHECKING:
     from .models.clip import CLIPFeatureExtractor, CLIPProcessor
     from .models.deit import DeiTFeatureExtractor
     from .models.detr import DetrFeatureExtractor
+    from .models.imagegpt import ImageGPTFeatureExtractor
     from .models.layoutlmv2 import LayoutLMv2FeatureExtractor, LayoutLMv2Processor
     from .models.layoutxlm import LayoutXLMProcessor
     from .models.segformer import SegformerFeatureExtractor
@@ -2756,6 +2770,14 @@ if TYPE_CHECKING:
             IBertModel,
             IBertPreTrainedModel,
         )
+        from .models.imagegpt import (
+            IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST,
+            ImageGPTForCausalLM,
+            ImageGPTForImageClassification,
+            ImageGPTModel,
+            ImageGPTPreTrainedModel,
+            load_tf_weights_in_imagegpt,
+        )
         from .models.layoutlm import (
             LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST,
             LayoutLMForMaskedLM,
@@ -470,7 +470,13 @@ class FeatureExtractionMixin:
             :obj:`str`: String containing all the attributes that make up this feature_extractor instance in JSON
             format.
         """
-        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"
+        dictionary = self.to_dict()
+
+        for key, value in dictionary.items():
+            if isinstance(value, np.ndarray):
+                dictionary[key] = value.tolist()
+
+        return json.dumps(dictionary, indent=2, sort_keys=True) + "\n"

     def to_json_file(self, json_file_path: Union[str, os.PathLike]):
         """
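This change is what allows the ImageGPT feature extractor to be serialized: its ``clusters`` attribute is a NumPy
array, and ``np.ndarray`` is not JSON-serializable, so array-valued attributes are converted to plain lists first. A
standalone sketch of the same idea (the attribute names are illustrative only, not library API):

    import json

    import numpy as np

    attributes = {"clusters": np.zeros((2, 3)), "do_resize": True, "size": 32}
    # convert ndarray values to lists so the dict can be dumped as JSON
    serializable = {k: v.tolist() if isinstance(v, np.ndarray) else v for k, v in attributes.items()}
    print(json.dumps(serializable, indent=2, sort_keys=True))
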
@@ -57,6 +57,7 @@ from . import (
     herbert,
     hubert,
     ibert,
+    imagegpt,
     layoutlm,
     layoutlmv2,
     layoutxlm,
@@ -30,6 +30,7 @@ logger = logging.get_logger(__name__)
 CONFIG_MAPPING_NAMES = OrderedDict(
     [
         # Add configs here
+        ("imagegpt", "ImageGPTConfig"),
         ("vision-encoder-decoder", "VisionEncoderDecoderConfig"),
         ("trocr", "TrOCRConfig"),
         ("fnet", "FNetConfig"),
@@ -111,6 +112,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
 CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
     [
         # Add archive maps here
+        ("imagegpt", "IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("fnet", "FNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("pegasus", "PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("segformer", "SEGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -182,6 +184,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
 MODEL_NAMES_MAPPING = OrderedDict(
     [
         # Add full (and cased) model names here
+        ("imagegpt", "ImageGPT"),
         ("vision-encoder-decoder", "Vision Encoder decoder"),
         ("trocr", "TrOCR"),
         ("fnet", "FNet"),
@@ -28,6 +28,7 @@ logger = logging.get_logger(__name__)
 MODEL_MAPPING_NAMES = OrderedDict(
     [
         # Base model mapping
+        ("imagegpt", "ImageGPTModel"),
         ("fnet", "FNetModel"),
         ("segformer", "SegformerModel"),
         ("gptj", "GPTJModel"),
@@ -145,6 +146,7 @@ MODEL_FOR_PRETRAINING_MAPPING_NAMES = OrderedDict(
 MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
     [
         # Model with LM heads mapping
+        ("imagegpt", "ImageGPTForCausalLM"),
         ("fnet", "FNetForMaskedLM"),
         ("gptj", "GPTJForCausalLM"),
         ("rembert", "RemBertForMaskedLM"),
@@ -195,6 +197,7 @@ MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
 MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict(
     [
         # Model for Causal LM mapping
+        ("imagegpt", "ImageGPTForCausalLM"),
         ("trocr", "TrOCRForCausalLM"),
         ("gptj", "GPTJForCausalLM"),
         ("rembert", "RemBertForCausalLM"),
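These mappings are what make the Auto classes aware of ImageGPT. A rough sketch of what they enable (the checkpoint
name is the one used in the tests of this commit; resolution assumes the checkpoint's config declares
``model_type="imagegpt"``):

    from transformers import AutoConfig, AutoModel, AutoModelForCausalLM

    config = AutoConfig.from_pretrained("openai/imagegpt-small")  # resolves to ImageGPTConfig
    model = AutoModel.from_pretrained("openai/imagegpt-small")  # resolves to ImageGPTModel
    lm_model = AutoModelForCausalLM.from_pretrained("openai/imagegpt-small")  # resolves to ImageGPTForCausalLM
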
src/transformers/models/imagegpt/__init__.py (new file, 61 lines)
@@ -0,0 +1,61 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from ...file_utils import _LazyModule, is_torch_available, is_vision_available


_import_structure = {
    "configuration_imagegpt": ["IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "ImageGPTConfig"],
}

if is_vision_available():
    _import_structure["feature_extraction_imagegpt"] = ["ImageGPTFeatureExtractor"]

if is_torch_available():
    _import_structure["modeling_imagegpt"] = [
        "IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST",
        "ImageGPTForCausalLM",
        "ImageGPTForImageClassification",
        "ImageGPTModel",
        "ImageGPTPreTrainedModel",
        "load_tf_weights_in_imagegpt",
    ]


if TYPE_CHECKING:
    from .configuration_imagegpt import IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP, ImageGPTConfig

    if is_vision_available():
        from .feature_extraction_imagegpt import ImageGPTFeatureExtractor

    if is_torch_available():
        from .modeling_imagegpt import (
            IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST,
            ImageGPTForCausalLM,
            ImageGPTForImageClassification,
            ImageGPTModel,
            ImageGPTPreTrainedModel,
            load_tf_weights_in_imagegpt,
        )

else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
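Because the module is registered as a ``_LazyModule``, importing ImageGPT names from ``transformers`` only triggers
the heavy submodule imports when the objects are first accessed. A small illustrative sketch (the lazy-loading
behaviour described here is an assumption about ``_LazyModule``, not part of this diff):

    import transformers

    # the attribute lookup below is what actually imports configuration_imagegpt
    config_cls = transformers.ImageGPTConfig
    print(config_cls.model_type)  # "imagegpt"
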
src/transformers/models/imagegpt/configuration_imagegpt.py (new file, 142 lines)
@@ -0,0 +1,142 @@
# coding=utf-8
# Copyright 2021 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" OpenAI ImageGPT configuration """


from ...configuration_utils import PretrainedConfig
from ...utils import logging


logger = logging.get_logger(__name__)

IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
    "openai/imagegpt-small": "",
    "openai/imagegpt-medium": "",
    "openai/imagegpt-large": "",
}


class ImageGPTConfig(PretrainedConfig):
    """
    This is the configuration class to store the configuration of a :class:`~transformers.ImageGPTModel` or a
    :class:`~transformers.TFImageGPTModel`. It is used to instantiate a GPT-2 model according to the specified
    arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar
    configuration to that of the ImageGPT `small <https://huggingface.co/imagegpt>`__ architecture.

    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
    outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.

    Args:
        vocab_size (:obj:`int`, `optional`, defaults to 513):
            Vocabulary size of the GPT-2 model. Defines the number of different tokens that can be represented by the
            :obj:`inputs_ids` passed when calling :class:`~transformers.ImageGPTModel` or
            :class:`~transformers.TFImageGPTModel`.
        n_positions (:obj:`int`, `optional`, defaults to 32*32):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        n_embd (:obj:`int`, `optional`, defaults to 512):
            Dimensionality of the embeddings and hidden states.
        n_layer (:obj:`int`, `optional`, defaults to 24):
            Number of hidden layers in the Transformer encoder.
        n_head (:obj:`int`, `optional`, defaults to 8):
            Number of attention heads for each attention layer in the Transformer encoder.
        n_inner (:obj:`int`, `optional`, defaults to :obj:`None`):
            Dimensionality of the inner feed-forward layers. :obj:`None` will set it to 4 times :obj:`n_embd`.
        activation_function (:obj:`str`, `optional`, defaults to :obj:`"quick_gelu"`):
            Activation function (can be one of the activation functions defined in src/transformers/activations.py).
        resid_pdrop (:obj:`float`, `optional`, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        embd_pdrop (:obj:`float`, `optional`, defaults to 0.1):
            The dropout ratio for the embeddings.
        attn_pdrop (:obj:`float`, `optional`, defaults to 0.1):
            The dropout ratio for the attention.
        layer_norm_epsilon (:obj:`float`, `optional`, defaults to 1e-5):
            The epsilon to use in the layer normalization layers.
        initializer_range (:obj:`float`, `optional`, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        scale_attn_weights (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Scale attention weights by dividing by sqrt(hidden_size).
        use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not the model should return the last key/values attentions (not used by all models).
        scale_attn_by_inverse_layer_idx (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to additionally scale attention weights by ``1 / (layer_idx + 1)``.
        reorder_and_upcast_attn (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to scale keys (K) prior to computing attention (dot-product) and upcast attention
            dot-product/softmax to float() when training with mixed precision.

    Example::

        >>> from transformers import ImageGPTModel, ImageGPTConfig

        >>> # Initializing a ImageGPT configuration
        >>> configuration = ImageGPTConfig()

        >>> # Initializing a model from the configuration
        >>> model = ImageGPTModel(configuration)

        >>> # Accessing the model configuration
        >>> configuration = model.config
    """

    model_type = "imagegpt"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {
        "hidden_size": "n_embd",
        "max_position_embeddings": "n_positions",
        "num_attention_heads": "n_head",
        "num_hidden_layers": "n_layer",
    }

    def __init__(
        self,
        vocab_size=512 + 1,  # add one for start of sentence (sos) token
        n_positions=32 * 32,
        n_embd=512,
        n_layer=24,
        n_head=8,
        n_inner=None,
        activation_function="quick_gelu",
        resid_pdrop=0.1,
        embd_pdrop=0.1,
        attn_pdrop=0.1,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        scale_attn_weights=True,
        use_cache=True,
        tie_word_embeddings=False,
        scale_attn_by_inverse_layer_idx=False,
        reorder_and_upcast_attn=False,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.n_positions = n_positions
        self.n_embd = n_embd
        self.n_layer = n_layer
        self.n_head = n_head
        self.n_inner = n_inner
        self.activation_function = activation_function
        self.resid_pdrop = resid_pdrop
        self.embd_pdrop = embd_pdrop
        self.attn_pdrop = attn_pdrop
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.scale_attn_weights = scale_attn_weights
        self.use_cache = use_cache
        self.scale_attn_by_inverse_layer_idx = scale_attn_by_inverse_layer_idx
        self.reorder_and_upcast_attn = reorder_and_upcast_attn
        self.tie_word_embeddings = tie_word_embeddings

        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)
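The ``attribute_map`` defined above lets the GPT-2-style argument names be read through the generic configuration
attributes used elsewhere in the library. A small sketch of that behaviour (values are the defaults from this file):

    from transformers import ImageGPTConfig

    config = ImageGPTConfig()
    assert config.hidden_size == config.n_embd == 512
    assert config.num_hidden_layers == config.n_layer == 24
    assert config.vocab_size == 513  # 512 color clusters + 1 start of sentence (sos) token
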
@@ -0,0 +1,73 @@
# coding=utf-8
# Copyright 2021 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convert OpenAI Image GPT checkpoints."""


import argparse

import torch

from transformers import ImageGPTConfig, ImageGPTForCausalLM, load_tf_weights_in_imagegpt
from transformers.file_utils import CONFIG_NAME, WEIGHTS_NAME
from transformers.utils import logging


logging.set_verbosity_info()


def convert_imagegpt_checkpoint_to_pytorch(imagegpt_checkpoint_path, model_size, pytorch_dump_folder_path):
    # Construct configuration depending on size
    MODELS = {"small": (512, 8, 24), "medium": (1024, 8, 36), "large": (1536, 16, 48)}
    n_embd, n_head, n_layer = MODELS[model_size]  # set model hyperparameters
    config = ImageGPTConfig(n_embd=n_embd, n_layer=n_layer, n_head=n_head)
    model = ImageGPTForCausalLM(config)

    # Load weights from numpy
    load_tf_weights_in_imagegpt(model, config, imagegpt_checkpoint_path)

    # Save pytorch-model
    pytorch_weights_dump_path = pytorch_dump_folder_path + "/" + WEIGHTS_NAME
    pytorch_config_dump_path = pytorch_dump_folder_path + "/" + CONFIG_NAME
    print(f"Save PyTorch model to {pytorch_weights_dump_path}")
    torch.save(model.state_dict(), pytorch_weights_dump_path)
    print(f"Save configuration file to {pytorch_config_dump_path}")
    with open(pytorch_config_dump_path, "w", encoding="utf-8") as f:
        f.write(config.to_json_string())


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Required parameters
    parser.add_argument(
        "--imagegpt_checkpoint_path",
        default=None,
        type=str,
        required=True,
        help="Path to the TensorFlow checkpoint path.",
    )
    parser.add_argument(
        "--model_size",
        default=None,
        type=str,
        required=True,
        help="Size of the model (can be either 'small', 'medium' or 'large').",
    )
    parser.add_argument(
        "--pytorch_dump_folder_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
    )
    args = parser.parse_args()
    convert_imagegpt_checkpoint_to_pytorch(
        args.imagegpt_checkpoint_path, args.model_size, args.pytorch_dump_folder_path
    )
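An example invocation of the script above (the file name is an assumption, since the diff header does not show the
path; the flags are exactly the ones defined by the argument parser):

    python convert_imagegpt_checkpoint_to_pytorch.py \
        --imagegpt_checkpoint_path /path/to/imagegpt/tf_checkpoint \
        --model_size small \
        --pytorch_dump_folder_path /path/to/output_folder
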
src/transformers/models/imagegpt/feature_extraction_imagegpt.py (new file, 176 lines)
@@ -0,0 +1,176 @@
# coding=utf-8
|
||||||
|
# Copyright 2021 The HuggingFace Inc. team. All rights reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
"""Feature extractor class for ImageGPT."""
|
||||||
|
|
||||||
|
from typing import List, Optional, Union
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
from ...feature_extraction_utils import BatchFeature, FeatureExtractionMixin
|
||||||
|
from ...file_utils import TensorType
|
||||||
|
from ...image_utils import ImageFeatureExtractionMixin, is_torch_tensor
|
||||||
|
from ...utils import logging
|
||||||
|
|
||||||
|
|
||||||
|
logger = logging.get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def squared_euclidean_distance(a, b):
|
||||||
|
b = b.T
|
||||||
|
a2 = np.sum(np.square(a), axis=1)
|
||||||
|
b2 = np.sum(np.square(b), axis=0)
|
||||||
|
ab = np.matmul(a, b)
|
||||||
|
d = a2[:, None] - 2 * ab + b2[None, :]
|
||||||
|
return d
|
||||||
|
|
||||||
|
|
||||||
|
def color_quantize(x, clusters):
|
||||||
|
x = x.reshape(-1, 3)
|
||||||
|
d = squared_euclidean_distance(x, clusters)
|
||||||
|
return np.argmin(d, axis=1)
|
||||||
|
|
||||||
|
|
||||||
|
class ImageGPTFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionMixin):
|
||||||
|
r"""
|
||||||
|
Constructs an ImageGPT feature extractor. This feature extractor can be used to resize images to a smaller
|
||||||
|
resolution (such as 32x32 or 64x64), normalize them and finally color quantize them to obtain sequences of "pixel
|
||||||
|
values" (color clusters).
|
||||||
|
|
||||||
|
This feature extractor inherits from :class:`~transformers.FeatureExtractionMixin` which contains most of the main
|
||||||
|
methods. Users should refer to this superclass for more information regarding those methods.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
clusters (:obj:`np.ndarray`):
|
||||||
|
The color clusters to use, as a :obj:`np.ndarray` of shape :obj:`(n_clusters, 3)`.
|
||||||
|
do_resize (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
|
Whether to resize the input to a certain :obj:`size`.
|
||||||
|
size (:obj:`int` or :obj:`Tuple(int)`, `optional`, defaults to 32):
|
||||||
|
Resize the input to the given size. If a tuple is provided, it should be (width, height). If only an
|
||||||
|
integer is provided, then the input will be resized to (size, size). Only has an effect if :obj:`do_resize`
|
||||||
|
is set to :obj:`True`.
|
||||||
|
resample (:obj:`int`, `optional`, defaults to :obj:`PIL.Image.BILINEAR`):
|
||||||
|
An optional resampling filter. This can be one of :obj:`PIL.Image.NEAREST`, :obj:`PIL.Image.BOX`,
|
||||||
|
:obj:`PIL.Image.BILINEAR`, :obj:`PIL.Image.HAMMING`, :obj:`PIL.Image.BICUBIC` or :obj:`PIL.Image.LANCZOS`.
|
||||||
|
Only has an effect if :obj:`do_resize` is set to :obj:`True`.
|
||||||
|
do_normalize (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
|
Whether or not to normalize the input to the range between -1 and +1.
|
||||||
|
"""
|
||||||
|
|
||||||
|
model_input_names = ["pixel_values"]
|
||||||
|
|
||||||
|
def __init__(self, clusters, do_resize=True, size=32, resample=Image.BILINEAR, do_normalize=True, **kwargs):
|
||||||
|
super().__init__(**kwargs)
|
||||||
|
self.clusters = np.asarray(clusters)
|
||||||
|
self.do_resize = do_resize
|
||||||
|
self.size = size
|
||||||
|
self.resample = resample
|
||||||
|
self.do_normalize = do_normalize
|
||||||
|
|
||||||
|
def normalize(self, image):
|
||||||
|
"""
|
||||||
|
Normalizes :obj:`image` into the range -1 to +1.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
image (:obj:`PIL.Image.Image` or :obj:`np.ndarray` or :obj:`torch.Tensor`):
|
||||||
|
The image to normalize.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
:obj:`np.ndarray`: The normalized image.
|
||||||
|
"""
|
||||||
|
image = self.to_numpy_array(image, rescale=False, channel_first=False)
|
||||||
|
|
||||||
|
return image / 127.5 - 1
|
||||||
|
|
||||||
|
def __call__(
|
||||||
|
self,
|
||||||
|
images: Union[
|
||||||
|
Image.Image, np.ndarray, "torch.Tensor", List[Image.Image], List[np.ndarray], List["torch.Tensor"] # noqa
|
||||||
|
],
|
||||||
|
return_tensors: Optional[Union[str, TensorType]] = None,
|
||||||
|
**kwargs
|
||||||
|
) -> BatchFeature:
|
||||||
|
"""
|
||||||
|
Main method to prepare for the model one or several image(s).
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
|
||||||
|
NumPy arrays and PyTorch tensors are converted to PIL images when resizing, so the most efficient is to pass
|
||||||
|
PIL images.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
images (:obj:`PIL.Image.Image`, :obj:`np.ndarray`, :obj:`torch.Tensor`, :obj:`List[PIL.Image.Image]`, :obj:`List[np.ndarray]`, :obj:`List[torch.Tensor]`):
|
||||||
|
The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch
|
||||||
|
tensor. In case of a NumPy array/PyTorch tensor, each image should be of shape (C, H, W), where C is a
|
||||||
|
number of channels, H and W are image height and width.
|
||||||
|
|
||||||
|
return_tensors (:obj:`str` or :class:`~transformers.file_utils.TensorType`, `optional`, defaults to :obj:`'np'`):
|
||||||
|
If set, will return tensors of a particular framework. Acceptable values are:
|
||||||
|
|
||||||
|
* :obj:`'tf'`: Return TensorFlow :obj:`tf.constant` objects.
|
||||||
|
* :obj:`'pt'`: Return PyTorch :obj:`torch.Tensor` objects.
|
||||||
|
* :obj:`'np'`: Return NumPy :obj:`np.ndarray` objects.
|
||||||
|
* :obj:`'jax'`: Return JAX :obj:`jnp.ndarray` objects.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
:class:`~transformers.BatchFeature`: A :class:`~transformers.BatchFeature` with the following fields:
|
||||||
|
|
||||||
|
- **pixel_values** -- Pixel values to be fed to a model, of shape (batch_size, num_channels, height,
|
||||||
|
width).
|
||||||
|
"""
|
||||||
|
# Input type checking for clearer error
|
||||||
|
valid_images = False
|
||||||
|
|
||||||
|
# Check that images has a valid type
|
||||||
|
if isinstance(images, (Image.Image, np.ndarray)) or is_torch_tensor(images):
|
||||||
|
valid_images = True
|
||||||
|
elif isinstance(images, (list, tuple)):
|
||||||
|
if len(images) == 0 or isinstance(images[0], (Image.Image, np.ndarray)) or is_torch_tensor(images[0]):
|
||||||
|
valid_images = True
|
||||||
|
|
||||||
|
if not valid_images:
|
||||||
|
raise ValueError(
|
||||||
|
"Images must of type `PIL.Image.Image`, `np.ndarray` or `torch.Tensor` (single example), "
|
||||||
|
"`List[PIL.Image.Image]`, `List[np.ndarray]` or `List[torch.Tensor]` (batch of examples)."
|
||||||
|
)
|
||||||
|
|
||||||
|
is_batched = bool(
|
||||||
|
isinstance(images, (list, tuple))
|
||||||
|
and (isinstance(images[0], (Image.Image, np.ndarray)) or is_torch_tensor(images[0]))
|
||||||
|
)
|
||||||
|
|
||||||
|
if not is_batched:
|
||||||
|
images = [images]
|
||||||
|
|
||||||
|
# transformations (resizing + normalization)
|
||||||
|
if self.do_resize and self.size is not None:
|
||||||
|
images = [self.resize(image, size=self.size, resample=self.resample) for image in images]
|
||||||
|
|
||||||
|
if self.do_normalize:
|
||||||
|
images = [self.normalize(image) for image in images]
|
||||||
|
|
||||||
|
# color quantize from (batch_size, height, width, 3) to (batch_size, height, width)
|
||||||
|
images = np.array(images)
|
||||||
|
images = color_quantize(images, self.clusters).reshape(images.shape[:-1])
|
||||||
|
|
||||||
|
# flatten to (batch_size, height*width)
|
||||||
|
batch_size = images.shape[0]
|
||||||
|
images = images.reshape(batch_size, -1)
|
||||||
|
|
||||||
|
# return as BatchFeature
|
||||||
|
data = {"pixel_values": images}
|
||||||
|
encoded_inputs = BatchFeature(data=data, tensor_type=return_tensors)
|
||||||
|
|
||||||
|
return encoded_inputs
|
src/transformers/models/imagegpt/modeling_imagegpt.py (new executable file, 1159 lines)
(diff suppressed because the file is too large)
@@ -2637,6 +2637,54 @@ class IBertPreTrainedModel:
         requires_backends(self, ["torch"])


+IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST = None
+
+
+class ImageGPTForCausalLM:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+    def forward(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ImageGPTForImageClassification:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ImageGPTModel:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+    def forward(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class ImageGPTPreTrainedModel:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+    def forward(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+def load_tf_weights_in_imagegpt(*args, **kwargs):
+    requires_backends(load_tf_weights_in_imagegpt, ["torch"])
+
+
 LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST = None
@@ -36,6 +36,11 @@ class DetrFeatureExtractor:
         requires_backends(self, ["vision"])


+class ImageGPTFeatureExtractor:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["vision"])
+
+
 class LayoutLMv2FeatureExtractor:
     def __init__(self, *args, **kwargs):
         requires_backends(self, ["vision"])
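These dummy objects are what a user gets when PyTorch (or Pillow, for the vision objects) is not installed: the
import itself succeeds, but using the class raises an informative error via ``requires_backends``. A rough sketch of
the resulting behaviour (error message paraphrased, not copied from the library):

    # in an environment without torch installed
    from transformers import ImageGPTModel

    ImageGPTModel()  # raises ImportError pointing the user to the PyTorch installation instructions
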
tests/test_feature_extraction_imagegpt.py (new file, 177 lines)
@@ -0,0 +1,177 @@
|
# coding=utf-8
|
||||||
|
# Copyright 2021 HuggingFace Inc.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import tempfile
|
||||||
|
import unittest
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
from datasets import load_dataset
|
||||||
|
|
||||||
|
from transformers.file_utils import is_torch_available, is_vision_available
|
||||||
|
from transformers.testing_utils import require_torch, require_vision, slow
|
||||||
|
|
||||||
|
from .test_feature_extraction_common import FeatureExtractionSavingTestMixin
|
||||||
|
|
||||||
|
|
||||||
|
if is_torch_available():
|
||||||
|
import torch
|
||||||
|
|
||||||
|
if is_vision_available():
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
from transformers import ImageGPTFeatureExtractor
|
||||||
|
|
||||||
|
|
||||||
|
class ImageGPTFeatureExtractionTester(unittest.TestCase):
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
parent,
|
||||||
|
batch_size=7,
|
||||||
|
num_channels=3,
|
||||||
|
image_size=18,
|
||||||
|
min_resolution=30,
|
||||||
|
max_resolution=400,
|
||||||
|
do_resize=True,
|
||||||
|
size=18,
|
||||||
|
do_normalize=True,
|
||||||
|
):
|
||||||
|
self.parent = parent
|
||||||
|
self.batch_size = batch_size
|
||||||
|
self.num_channels = num_channels
|
||||||
|
self.image_size = image_size
|
||||||
|
self.min_resolution = min_resolution
|
||||||
|
self.max_resolution = max_resolution
|
||||||
|
self.do_resize = do_resize
|
||||||
|
self.size = size
|
||||||
|
self.do_normalize = do_normalize
|
||||||
|
|
||||||
|
def prepare_feat_extract_dict(self):
|
||||||
|
return {
|
||||||
|
# here we create 2 clusters for the sake of simplicity
|
||||||
|
"clusters": np.asarray(
|
||||||
|
[
|
||||||
|
[0.8866443634033203, 0.6618829369544983, 0.3891746401786804],
|
||||||
|
[-0.6042559146881104, -0.02295008860528469, 0.5423797369003296],
|
||||||
|
]
|
||||||
|
),
|
||||||
|
"do_resize": self.do_resize,
|
||||||
|
"size": self.size,
|
||||||
|
"do_normalize": self.do_normalize,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@require_torch
|
||||||
|
@require_vision
|
||||||
|
class ImageGPTFeatureExtractionTest(FeatureExtractionSavingTestMixin, unittest.TestCase):
|
||||||
|
|
||||||
|
feature_extraction_class = ImageGPTFeatureExtractor if is_vision_available() else None
|
||||||
|
|
||||||
|
def setUp(self):
|
||||||
|
self.feature_extract_tester = ImageGPTFeatureExtractionTester(self)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def feat_extract_dict(self):
|
||||||
|
return self.feature_extract_tester.prepare_feat_extract_dict()
|
||||||
|
|
||||||
|
def test_feat_extract_properties(self):
|
||||||
|
feature_extractor = self.feature_extraction_class(**self.feat_extract_dict)
|
||||||
|
self.assertTrue(hasattr(feature_extractor, "clusters"))
|
||||||
|
self.assertTrue(hasattr(feature_extractor, "do_resize"))
|
||||||
|
self.assertTrue(hasattr(feature_extractor, "size"))
|
||||||
|
self.assertTrue(hasattr(feature_extractor, "do_normalize"))
|
||||||
|
|
||||||
|
def test_feat_extract_to_json_string(self):
|
||||||
|
feat_extract = self.feature_extraction_class(**self.feat_extract_dict)
|
||||||
|
obj = json.loads(feat_extract.to_json_string())
|
||||||
|
for key, value in self.feat_extract_dict.items():
|
||||||
|
if key == "clusters":
|
||||||
|
self.assertTrue(np.array_equal(value, obj[key]))
|
||||||
|
else:
|
||||||
|
self.assertEqual(obj[key], value)
|
||||||
|
|
||||||
|
def test_feat_extract_to_json_file(self):
|
||||||
|
feat_extract_first = self.feature_extraction_class(**self.feat_extract_dict)
|
||||||
|
|
||||||
|
with tempfile.TemporaryDirectory() as tmpdirname:
|
||||||
|
json_file_path = os.path.join(tmpdirname, "feat_extract.json")
|
||||||
|
feat_extract_first.to_json_file(json_file_path)
|
||||||
|
feat_extract_second = self.feature_extraction_class.from_json_file(json_file_path).to_dict()
|
||||||
|
|
||||||
|
feat_extract_first = feat_extract_first.to_dict()
|
||||||
|
for key, value in feat_extract_first.items():
|
||||||
|
if key == "clusters":
|
||||||
|
self.assertTrue(np.array_equal(value, feat_extract_second[key]))
|
||||||
|
else:
|
||||||
|
self.assertEqual(feat_extract_first[key], value)
|
||||||
|
|
||||||
|
def test_feat_extract_from_and_save_pretrained(self):
|
||||||
|
feat_extract_first = self.feature_extraction_class(**self.feat_extract_dict)
|
||||||
|
|
||||||
|
with tempfile.TemporaryDirectory() as tmpdirname:
|
||||||
|
feat_extract_first.save_pretrained(tmpdirname)
|
||||||
|
feat_extract_second = self.feature_extraction_class.from_pretrained(tmpdirname).to_dict()
|
||||||
|
|
||||||
|
feat_extract_first = feat_extract_first.to_dict()
|
||||||
|
for key, value in feat_extract_first.items():
|
||||||
|
if key == "clusters":
|
||||||
|
self.assertTrue(np.array_equal(value, feat_extract_second[key]))
|
||||||
|
else:
|
||||||
|
self.assertEqual(feat_extract_first[key], value)
|
||||||
|
|
||||||
|
@unittest.skip("ImageGPT requires clusters at initialization")
|
||||||
|
def test_init_without_params(self):
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
def prepare_images():
|
||||||
|
dataset = load_dataset("hf-internal-testing/fixtures_image_utils", split="test")
|
||||||
|
|
||||||
|
image1 = Image.open(dataset[4]["file"])
|
||||||
|
image2 = Image.open(dataset[5]["file"])
|
||||||
|
|
||||||
|
images = [image1, image2]
|
||||||
|
|
||||||
|
return images
|
||||||
|
|
||||||
|
|
||||||
|
@require_vision
|
||||||
|
@require_torch
|
||||||
|
class ImageGPTFeatureExtractorIntegrationTest(unittest.TestCase):
|
||||||
|
@slow
|
||||||
|
def test_image(self):
|
||||||
|
feature_extractor = ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small")
|
||||||
|
|
||||||
|
images = prepare_images()
|
||||||
|
|
||||||
|
# test non-batched
|
||||||
|
encoding = feature_extractor(images[0], return_tensors="pt")
|
||||||
|
|
||||||
|
self.assertIsInstance(encoding.pixel_values, torch.LongTensor)
|
||||||
|
self.assertEqual(encoding.pixel_values.shape, (1, 1024))
|
||||||
|
|
||||||
|
expected_slice = [306, 191, 191]
|
||||||
|
self.assertEqual(encoding.pixel_values[0, :3].tolist(), expected_slice)
|
||||||
|
|
||||||
|
# test batched
|
||||||
|
encoding = feature_extractor(images, return_tensors="pt")
|
||||||
|
|
||||||
|
self.assertIsInstance(encoding.pixel_values, torch.LongTensor)
|
||||||
|
self.assertEqual(encoding.pixel_values.shape, (2, 1024))
|
||||||
|
|
||||||
|
expected_slice = [303, 13, 13]
|
||||||
|
self.assertEqual(encoding.pixel_values[1, -3:].tolist(), expected_slice)
|
tests/test_modeling_imagegpt.py (new file, 498 lines)
@@ -0,0 +1,498 @@
|
# coding=utf-8
|
||||||
|
# Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
|
||||||
|
import copy
|
||||||
|
import inspect
|
||||||
|
import os
|
||||||
|
import tempfile
|
||||||
|
import unittest
|
||||||
|
|
||||||
|
from transformers import ImageGPTConfig, is_torch_available
|
||||||
|
from transformers.testing_utils import require_torch, slow, torch_device
|
||||||
|
|
||||||
|
from .test_configuration_common import ConfigTester
|
||||||
|
from .test_generation_utils import GenerationTesterMixin
|
||||||
|
from .test_modeling_common import ModelTesterMixin, _config_zero_init, floats_tensor, ids_tensor, random_attention_mask
|
||||||
|
|
||||||
|
|
||||||
|
if is_torch_available():
|
||||||
|
import torch
|
||||||
|
|
||||||
|
from transformers import (
|
||||||
|
IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST,
|
||||||
|
ImageGPTForCausalLM,
|
||||||
|
ImageGPTForImageClassification,
|
||||||
|
ImageGPTModel,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class ImageGPTModelTester:
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
parent,
|
||||||
|
batch_size=14,
|
||||||
|
seq_length=7,
|
||||||
|
is_training=True,
|
||||||
|
use_token_type_ids=True,
|
||||||
|
use_input_mask=True,
|
||||||
|
use_labels=True,
|
||||||
|
use_mc_token_ids=True,
|
||||||
|
vocab_size=99,
|
||||||
|
hidden_size=32,
|
||||||
|
num_hidden_layers=5,
|
||||||
|
num_attention_heads=4,
|
||||||
|
intermediate_size=37,
|
||||||
|
hidden_act="gelu",
|
||||||
|
hidden_dropout_prob=0.1,
|
||||||
|
attention_probs_dropout_prob=0.1,
|
||||||
|
max_position_embeddings=512,
|
||||||
|
type_vocab_size=16,
|
||||||
|
type_sequence_label_size=2,
|
||||||
|
initializer_range=0.02,
|
||||||
|
num_labels=3,
|
||||||
|
num_choices=4,
|
||||||
|
scope=None,
|
||||||
|
):
|
||||||
|
self.parent = parent
|
||||||
|
self.batch_size = batch_size
|
||||||
|
self.seq_length = seq_length
|
||||||
|
self.is_training = is_training
|
||||||
|
self.use_token_type_ids = use_token_type_ids
|
||||||
|
self.use_input_mask = use_input_mask
|
||||||
|
self.use_labels = use_labels
|
||||||
|
self.use_mc_token_ids = use_mc_token_ids
|
||||||
|
self.vocab_size = vocab_size
|
||||||
|
self.hidden_size = hidden_size
|
||||||
|
self.num_hidden_layers = num_hidden_layers
|
||||||
|
self.num_attention_heads = num_attention_heads
|
||||||
|
self.intermediate_size = intermediate_size
|
||||||
|
self.hidden_act = hidden_act
|
||||||
|
self.hidden_dropout_prob = hidden_dropout_prob
|
||||||
|
self.attention_probs_dropout_prob = attention_probs_dropout_prob
|
||||||
|
self.max_position_embeddings = max_position_embeddings
|
||||||
|
self.type_vocab_size = type_vocab_size
|
||||||
|
self.type_sequence_label_size = type_sequence_label_size
|
||||||
|
self.initializer_range = initializer_range
|
||||||
|
self.num_labels = num_labels
|
||||||
|
self.num_choices = num_choices
|
||||||
|
self.scope = None
|
||||||
|
|
||||||
|
def get_large_model_config(self):
|
||||||
|
return ImageGPTConfig.from_pretrained("imagegpt")
|
||||||
|
|
||||||
|
def prepare_config_and_inputs(
|
||||||
|
self, gradient_checkpointing=False, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False
|
||||||
|
):
|
||||||
|
pixel_values = ids_tensor([self.batch_size, self.seq_length], self.vocab_size - 1)
|
||||||
|
|
||||||
|
input_mask = None
|
||||||
|
if self.use_input_mask:
|
||||||
|
input_mask = random_attention_mask([self.batch_size, self.seq_length])
|
||||||
|
|
||||||
|
token_type_ids = None
|
||||||
|
if self.use_token_type_ids:
|
||||||
|
token_type_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size)
|
||||||
|
|
||||||
|
mc_token_ids = None
|
||||||
|
if self.use_mc_token_ids:
|
||||||
|
mc_token_ids = ids_tensor([self.batch_size, self.num_choices], self.seq_length)
|
||||||
|
|
||||||
|
sequence_labels = None
|
||||||
|
token_labels = None
|
||||||
|
choice_labels = None
|
||||||
|
if self.use_labels:
|
||||||
|
sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
|
||||||
|
token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels)
|
||||||
|
choice_labels = ids_tensor([self.batch_size], self.num_choices)
|
||||||
|
|
||||||
|
config = self.get_config(
|
||||||
|
gradient_checkpointing=gradient_checkpointing,
|
||||||
|
scale_attn_by_inverse_layer_idx=scale_attn_by_inverse_layer_idx,
|
||||||
|
reorder_and_upcast_attn=reorder_and_upcast_attn,
|
||||||
|
)
|
||||||
|
|
||||||
|
head_mask = ids_tensor([self.num_hidden_layers, self.num_attention_heads], 2)
|
||||||
|
|
||||||
|
return (
|
||||||
|
config,
|
||||||
|
pixel_values,
|
||||||
|
input_mask,
|
||||||
|
head_mask,
|
||||||
|
token_type_ids,
|
||||||
|
mc_token_ids,
|
||||||
|
sequence_labels,
|
||||||
|
token_labels,
|
||||||
|
choice_labels,
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_config(
|
||||||
|
self, gradient_checkpointing=False, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False
|
||||||
|
):
|
||||||
|
return ImageGPTConfig(
|
||||||
|
vocab_size=self.vocab_size,
|
||||||
|
n_embd=self.hidden_size,
|
||||||
|
n_layer=self.num_hidden_layers,
|
||||||
|
n_head=self.num_attention_heads,
|
||||||
|
n_inner=self.intermediate_size,
|
||||||
|
activation_function=self.hidden_act,
|
||||||
|
resid_pdrop=self.hidden_dropout_prob,
|
||||||
|
attn_pdrop=self.attention_probs_dropout_prob,
|
||||||
|
n_positions=self.max_position_embeddings,
|
||||||
|
type_vocab_size=self.type_vocab_size,
|
||||||
|
initializer_range=self.initializer_range,
|
||||||
|
use_cache=True,
|
||||||
|
gradient_checkpointing=gradient_checkpointing,
|
||||||
|
scale_attn_by_inverse_layer_idx=scale_attn_by_inverse_layer_idx,
|
||||||
|
reorder_and_upcast_attn=reorder_and_upcast_attn,
|
||||||
|
)
|
||||||
|
|
||||||
|
def prepare_config_and_inputs_for_decoder(self):
|
||||||
|
(
|
||||||
|
config,
|
||||||
|
pixel_values,
|
||||||
|
input_mask,
|
||||||
|
head_mask,
|
||||||
|
token_type_ids,
|
||||||
|
mc_token_ids,
|
||||||
|
sequence_labels,
|
||||||
|
token_labels,
|
||||||
|
choice_labels,
|
||||||
|
) = self.prepare_config_and_inputs()
|
||||||
|
|
||||||
|
encoder_hidden_states = floats_tensor([self.batch_size, self.seq_length, self.hidden_size])
|
||||||
|
encoder_attention_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2)
|
||||||
|
|
||||||
|
return (
|
||||||
|
config,
|
||||||
|
pixel_values,
|
||||||
|
input_mask,
|
||||||
|
head_mask,
|
||||||
|
token_type_ids,
|
||||||
|
sequence_labels,
|
||||||
|
token_labels,
|
||||||
|
choice_labels,
|
||||||
|
encoder_hidden_states,
|
||||||
|
encoder_attention_mask,
|
||||||
|
)
|
||||||
|
|
||||||
|
def create_and_check_imagegpt_model(self, config, pixel_values, input_mask, head_mask, token_type_ids, *args):
|
||||||
|
model = ImageGPTModel(config=config)
|
||||||
|
model.to(torch_device)
|
||||||
|
model.eval()
|
||||||
|
|
||||||
|
result = model(pixel_values, token_type_ids=token_type_ids, head_mask=head_mask)
|
||||||
|
result = model(pixel_values, token_type_ids=token_type_ids)
|
||||||
|
result = model(pixel_values)
|
||||||
|
|
||||||
|
self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
|
||||||
|
self.parent.assertEqual(len(result.past_key_values), config.n_layer)
|
||||||
|
|
||||||
|
def create_and_check_lm_head_model(self, config, pixel_values, input_mask, head_mask, token_type_ids, *args):
|
||||||
|
model = ImageGPTForCausalLM(config)
|
||||||
|
model.to(torch_device)
|
||||||
|
model.eval()
|
||||||
|
|
||||||
|
labels = ids_tensor([self.batch_size, self.seq_length], self.vocab_size - 1)
|
||||||
|
result = model(pixel_values, token_type_ids=token_type_ids, labels=labels)
|
||||||
|
self.parent.assertEqual(result.loss.shape, ())
|
||||||
|
# ImageGPTForCausalLM doens't have tied input- and output embeddings
|
||||||
|
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size - 1))
|
||||||
|
|
||||||
|
def create_and_check_imagegpt_for_image_classification(
|
||||||
|
self, config, pixel_values, input_mask, head_mask, token_type_ids, mc_token_ids, sequence_labels, *args
|
||||||
|
):
|
||||||
|
config.num_labels = self.num_labels
|
||||||
|
model = ImageGPTForImageClassification(config)
|
||||||
|
model.to(torch_device)
|
||||||
|
model.eval()
|
||||||
|
result = model(pixel_values, attention_mask=input_mask, token_type_ids=token_type_ids, labels=sequence_labels)
|
||||||
|
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_labels))
|
||||||
|
|
||||||
|
def prepare_config_and_inputs_for_common(self):
|
||||||
|
config_and_inputs = self.prepare_config_and_inputs()
|
||||||
|
|
||||||
|
(
|
||||||
|
config,
|
||||||
|
pixel_values,
|
||||||
|
input_mask,
|
||||||
|
head_mask,
|
||||||
|
token_type_ids,
|
||||||
|
mc_token_ids,
|
||||||
|
sequence_labels,
|
||||||
|
token_labels,
|
||||||
|
choice_labels,
|
||||||
|
) = config_and_inputs
|
||||||
|
|
||||||
|
inputs_dict = {
|
||||||
|
"pixel_values": pixel_values,
|
||||||
|
"token_type_ids": token_type_ids,
|
||||||
|
"head_mask": head_mask,
|
||||||
|
}
|
||||||
|
|
||||||
|
return config, inputs_dict
|
||||||
|
|
||||||
|
|
||||||
|
@require_torch
|
||||||
|
class ImageGPTModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):
|
||||||
|
|
||||||
|
all_model_classes = (
|
||||||
|
(ImageGPTForCausalLM, ImageGPTForImageClassification, ImageGPTModel) if is_torch_available() else ()
|
||||||
|
)
|
||||||
|
all_generative_model_classes = (ImageGPTForCausalLM,) if is_torch_available() else ()
|
||||||
|
test_missing_keys = False
|
||||||
|
input_name = "pixel_values"
|
||||||
|
|
||||||
|
# as ImageGPTForImageClassification isn't included in any auto mapping, we add labels here
|
||||||
|
def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
|
||||||
|
inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)
|
||||||
|
|
||||||
|
if return_labels:
|
||||||
|
if model_class.__name__ == "ImageGPTForImageClassification":
|
||||||
|
inputs_dict["labels"] = torch.zeros(
|
||||||
|
self.model_tester.batch_size, dtype=torch.long, device=torch_device
|
||||||
|
)
|
||||||
|
|
||||||
|
return inputs_dict
|
||||||
|
|
||||||
|
# we overwrite the _check_scores method of GenerationTesterMixin, as ImageGPTForCausalLM doesn't have tied input- and output embeddings
|
||||||
|
def _check_scores(self, batch_size, scores, length, config):
|
||||||
|
expected_shape = (batch_size, config.vocab_size - 1)
|
||||||
|
self.assertIsInstance(scores, tuple)
|
||||||
|
self.assertEqual(len(scores), length)
|
||||||
|
self.assertListEqual([iter_scores.shape for iter_scores in scores], [expected_shape] * len(scores))
|
||||||
|
|
||||||
|
def setUp(self):
|
||||||
|
self.model_tester = ImageGPTModelTester(self)
|
||||||
|
self.config_tester = ConfigTester(self, config_class=ImageGPTConfig, n_embd=37)
|
||||||
|
|
||||||
|
def test_config(self):
|
||||||
|
self.config_tester.run_common_tests()
|
||||||
|
|
||||||
|
def test_imagegpt_model(self):
|
||||||
|
config_and_inputs = self.model_tester.prepare_config_and_inputs()
|
||||||
|
self.model_tester.create_and_check_imagegpt_model(*config_and_inputs)
|
||||||
|
|
||||||
|
def test_imagegpt_causal_lm(self):
|
||||||
|
config_and_inputs = self.model_tester.prepare_config_and_inputs()
|
||||||
|
self.model_tester.create_and_check_lm_head_model(*config_and_inputs)
|
||||||
|
|
||||||
|
def test_imagegpt_image_classification(self):
|
||||||
|
config_and_inputs = self.model_tester.prepare_config_and_inputs()
|
||||||
|
self.model_tester.create_and_check_imagegpt_for_image_classification(*config_and_inputs)
|
||||||
|
|
||||||
|
@slow
|
||||||
|
def test_model_from_pretrained(self):
|
||||||
|
for model_name in IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
|
||||||
|
model = ImageGPTModel.from_pretrained(model_name)
|
||||||
|
self.assertIsNotNone(model)
|
||||||
|
|
||||||
|
def test_forward_signature(self):
|
||||||
|
config, _ = self.model_tester.prepare_config_and_inputs_for_common()
|
||||||
|
|
||||||
|
for model_class in self.all_model_classes:
|
||||||
|
model = model_class(config)
|
||||||
|
signature = inspect.signature(model.forward)
|
||||||
|
# signature.parameters is an OrderedDict => so arg_names order is deterministic
|
||||||
|
arg_names = [*signature.parameters.keys()]
|
||||||
|
|
||||||
|
expected_arg_names = ["pixel_values"]
|
||||||
|
self.assertListEqual(arg_names[:1], expected_arg_names)
|
||||||
|
|
||||||
|
def test_resize_tokens_embeddings(self):
|
||||||
|
(
|
||||||
|
original_config,
|
||||||
|
inputs_dict,
|
||||||
|
) = self.model_tester.prepare_config_and_inputs_for_common()
|
||||||
|
if not self.test_resize_embeddings:
|
||||||
|
return
|
||||||
|
|
||||||
|
for model_class in self.all_model_classes:
|
||||||
|
config = copy.deepcopy(original_config)
|
||||||
|
model = model_class(config)
|
||||||
|
model.to(torch_device)
|
||||||
|
|
||||||
|
if self.model_tester.is_training is False:
|
||||||
|
model.eval()
|
||||||
|
|
||||||
|
model_vocab_size = config.vocab_size
|
||||||
|
# Retrieve the embeddings and clone theme
|
||||||
|
model_embed = model.resize_token_embeddings(model_vocab_size)
|
||||||
|
cloned_embeddings = model_embed.weight.clone()
|
||||||
|
|
||||||
|
# Check that resizing the token embeddings with a larger vocab size increases the model's vocab size
|
||||||
|
model_embed = model.resize_token_embeddings(model_vocab_size + 10)
|
||||||
|
self.assertEqual(model.config.vocab_size, model_vocab_size + 10)
|
||||||
|
# Check that it actually resizes the embeddings matrix
|
||||||
|
self.assertEqual(model_embed.weight.shape[0], cloned_embeddings.shape[0] + 10)
|
||||||
|
# Check that the model can still do a forward pass successfully (every parameter should be resized)
|
||||||
|
model(**self._prepare_for_class(inputs_dict, model_class))
|
||||||
|
|
||||||
|
# Check that resizing the token embeddings with a smaller vocab size decreases the model's vocab size
|
||||||
|
model_embed = model.resize_token_embeddings(model_vocab_size - 15)
|
||||||
|
self.assertEqual(model.config.vocab_size, model_vocab_size - 15)
|
||||||
|
# Check that it actually resizes the embeddings matrix
|
||||||
|
self.assertEqual(model_embed.weight.shape[0], cloned_embeddings.shape[0] - 15)
|
||||||
|
|
||||||
|
# Check that the model can still do a forward pass successfully (every parameter should be resized)
|
||||||
|
# Input ids should be clamped to the maximum size of the vocabulary
|
||||||
|
inputs_dict["pixel_values"].clamp_(max=model_vocab_size - 15 - 1)
|
||||||
|
|
||||||
|
# Check that adding and removing tokens has not modified the first part of the embedding matrix.
|
||||||
|
models_equal = True
|
||||||
|
for p1, p2 in zip(cloned_embeddings, model_embed.weight):
|
||||||
|
if p1.data.ne(p2.data).sum() > 0:
|
||||||
|
models_equal = False
|
||||||
|
|
||||||
|
self.assertTrue(models_equal)
|
||||||
|
|
||||||
|
def test_resize_embeddings_untied(self):
        (
            original_config,
            inputs_dict,
        ) = self.model_tester.prepare_config_and_inputs_for_common()
        if not self.test_resize_embeddings:
            return

        original_config.tie_word_embeddings = False

        # if the model cannot untie its embeddings -> leave test
        if original_config.tie_word_embeddings:
            return

        for model_class in self.all_model_classes:
            config = copy.deepcopy(original_config)
            model = model_class(config).to(torch_device)

            # if no output embeddings -> leave test
            if model.get_output_embeddings() is None:
                continue

            # Check that resizing the token embeddings with a larger vocab size increases the model's vocab size
            model_vocab_size = config.vocab_size
            model.resize_token_embeddings(model_vocab_size + 10)
            self.assertEqual(model.config.vocab_size, model_vocab_size + 10)
            output_embeds = model.get_output_embeddings()
            self.assertEqual(output_embeds.weight.shape[0], model_vocab_size + 10)
            # Check bias if present
            if output_embeds.bias is not None:
                self.assertEqual(output_embeds.bias.shape[0], model_vocab_size + 10)
            # Check that the model can still do a forward pass successfully (every parameter should be resized)
            model(**self._prepare_for_class(inputs_dict, model_class))

            # Check that resizing the token embeddings with a smaller vocab size decreases the model's vocab size
            model.resize_token_embeddings(model_vocab_size - 15)
            self.assertEqual(model.config.vocab_size, model_vocab_size - 15)
            # Check that it actually resizes the embeddings matrix
            output_embeds = model.get_output_embeddings()
            self.assertEqual(output_embeds.weight.shape[0], model_vocab_size - 15)
            # Check bias if present
            if output_embeds.bias is not None:
                self.assertEqual(output_embeds.bias.shape[0], model_vocab_size - 15)
            # Input ids (pixel_values for ImageGPT) should be clamped to the maximum size of the vocabulary
            inputs_dict["pixel_values"].clamp_(max=model_vocab_size - 15 - 1)
            # Check that the model can still do a forward pass successfully (every parameter should be resized)
            model(**self._prepare_for_class(inputs_dict, model_class))

    def test_inputs_embeds(self):
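        # Overridden from the common tests: the embedding lookup is done manually on
        # `pixel_values`, and the result is fed to the model as `inputs_embeds`.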
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()

        for model_class in self.all_model_classes:
            model = model_class(config)
            model.to(torch_device)
            model.eval()

            inputs = copy.deepcopy(self._prepare_for_class(inputs_dict, model_class))

            pixel_values = inputs["pixel_values"]
            del inputs["pixel_values"]

            wte = model.get_input_embeddings()
            inputs["inputs_embeds"] = wte(pixel_values)

            with torch.no_grad():
                model(**inputs)[0]

    def _create_and_check_torchscript(self, config, inputs_dict):
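        # Overridden from the common tests so that `pixel_values` (instead of `input_ids`)
        # is used as the example input when tracing the model with TorchScript.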
        if not self.test_torchscript:
            return

        configs_no_init = _config_zero_init(config)  # To be sure we have no NaN
        configs_no_init.torchscript = True
        for model_class in self.all_model_classes:
            model = model_class(config=configs_no_init)
            model.to(torch_device)
            model.eval()
            inputs = self._prepare_for_class(inputs_dict, model_class)

            try:
                pixel_values = inputs["pixel_values"]
                traced_model = torch.jit.trace(model, pixel_values)
            except RuntimeError:
                self.fail("Couldn't trace module.")

            with tempfile.TemporaryDirectory() as tmp_dir_name:
                pt_file_name = os.path.join(tmp_dir_name, "traced_model.pt")

                try:
                    torch.jit.save(traced_model, pt_file_name)
                except Exception:
                    self.fail("Couldn't save module.")

                try:
                    loaded_model = torch.jit.load(pt_file_name)
                except Exception:
                    self.fail("Couldn't load module.")

                model.to(torch_device)
                model.eval()

                loaded_model.to(torch_device)
                loaded_model.eval()

                model_state_dict = model.state_dict()
                loaded_model_state_dict = loaded_model.state_dict()

                non_persistent_buffers = {}
                for key in loaded_model_state_dict.keys():
                    if key not in model_state_dict.keys():
                        non_persistent_buffers[key] = loaded_model_state_dict[key]

                loaded_model_state_dict = {
                    key: value for key, value in loaded_model_state_dict.items() if key not in non_persistent_buffers
                }

                self.assertEqual(set(model_state_dict.keys()), set(loaded_model_state_dict.keys()))

                model_buffers = list(model.buffers())
                for non_persistent_buffer in non_persistent_buffers.values():
                    found_buffer = False
                    for i, model_buffer in enumerate(model_buffers):
                        if torch.equal(non_persistent_buffer, model_buffer):
                            found_buffer = True
                            break

                    self.assertTrue(found_buffer)
                    model_buffers.pop(i)

                models_equal = True
                for layer_name, p1 in model_state_dict.items():
                    if layer_name in loaded_model_state_dict:
                        p2 = loaded_model_state_dict[layer_name]
                        if p1.data.ne(p2.data).sum() > 0:
                            models_equal = False

                self.assertTrue(models_equal)

@@ -100,6 +100,7 @@ TEST_FILES_WITH_NO_COMMON_TESTS = [
# should **not** be the rule.
IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
    # models to ignore for model xxx mapping
    "ImageGPTForImageClassification",
    "SegformerDecodeHead",
    "SegformerForSemanticSegmentation",
    "BeitForSemanticSegmentation",