diff --git a/README.md b/README.md index 6e21a36e800..06c17f76325 100644 --- a/README.md +++ b/README.md @@ -238,6 +238,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 1. **[EncoderDecoder](https://huggingface.co/transformers/model_doc/encoderdecoder.html)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. 1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. 1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab. +1. **[FNet](https://huggingface.co/transformers/master/model_doc/fnet.html)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. 1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. 1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. 1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**. diff --git a/README_zh-hans.md b/README_zh-hans.md index 62e0a48a068..a3f3c221be2 100644 --- a/README_zh-hans.md +++ b/README_zh-hans.md @@ -260,6 +260,7 @@ conda install -c huggingface transformers 1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。 1. **[EncoderDecoder](https://huggingface.co/transformers/model_doc/encoderdecoder.html)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。 1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (来自 CNRS) 伴随论文 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) 由 Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab 发布。 +1. 
**[FNet](https://huggingface.co/transformers/master/model_doc/fnet.html)** (来自 Google Research) 伴随论文 [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) 由 James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon 发布。 1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (来自 CMU/Google Brain) 伴随论文 [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) 由 Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le 发布。 1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (来自 OpenAI) 伴随论文 [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) 由 Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever 发布。 1. **[GPT Neo](https://huggingface.co/transformers/model_doc/gpt_neo.html)** (来自 EleutherAI) 随仓库 [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) 发布。作者为 Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy 发布。 diff --git a/README_zh-hant.md b/README_zh-hant.md index 8230813e34c..444acedb2a5 100644 --- a/README_zh-hant.md +++ b/README_zh-hant.md @@ -272,6 +272,7 @@ conda install -c huggingface transformers 1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. 1. **[EncoderDecoder](https://huggingface.co/transformers/model_doc/encoderdecoder.html)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. 1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab. +1. **[FNet](https://huggingface.co/transformers/master/model_doc/fnet.html)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. 1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. 1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. 1. **[GPT Neo](https://huggingface.co/transformers/model_doc/gpt_neo.html)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. 
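A quick usage sketch for context (assuming the `google/fnet-base` checkpoint referenced throughout this diff is published on the Hub, and that the tokenizer exposes `[MASK]` as its mask token):

```python
import torch

from transformers import FNetForMaskedLM, FNetTokenizer

tokenizer = FNetTokenizer.from_pretrained("google/fnet-base")
model = FNetForMaskedLM.from_pretrained("google/fnet-base")

# FNet was trained without an attention mask (see the docs added below)
inputs = tokenizer("FNet mixes tokens with [MASK] transforms.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring prediction at the masked position
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_positions].argmax(dim=-1)))
```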
diff --git a/docs/source/index.rst b/docs/source/index.rst index 39e1ea5ff09..c80219e76ee 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -185,137 +185,140 @@ Supported models 28. :doc:`FlauBERT ` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model Pre-training for French `__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab. -29. :doc:`Funnel Transformer ` (from CMU/Google Brain) released with the paper `Funnel-Transformer: +29. `FNet `__ (from Google Research) released with the + paper `FNet: Mixing Tokens with Fourier Transforms `__ by James Lee-Thorp, Joshua + Ainslie, Ilya Eckstein, Santiago Ontanon. +30. :doc:`Funnel Transformer ` (from CMU/Google Brain) released with the paper `Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing `__ by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. -30. :doc:`GPT ` (from OpenAI) released with the paper `Improving Language Understanding by Generative +31. :doc:`GPT ` (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training `__ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. -31. :doc:`GPT-2 ` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask +32. :doc:`GPT-2 ` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners `__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**. -32. :doc:`GPT-J ` (from EleutherAI) released in the repository `kingoflolz/mesh-transformer-jax +33. :doc:`GPT-J ` (from EleutherAI) released in the repository `kingoflolz/mesh-transformer-jax `__ by Ben Wang and Aran Komatsuzaki. -33. :doc:`GPT Neo ` (from EleutherAI) released in the repository `EleutherAI/gpt-neo +34. :doc:`GPT Neo ` (from EleutherAI) released in the repository `EleutherAI/gpt-neo `__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. -34. :doc:`Hubert ` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech +35. :doc:`Hubert ` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units `__ by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed. -35. :doc:`I-BERT ` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization +36. :doc:`I-BERT ` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization `__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer. -36. :doc:`LayoutLM ` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training +37. :doc:`LayoutLM ` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training of Text and Layout for Document Image Understanding `__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. -37. :doc:`LayoutLMv2 ` (from Microsoft Research Asia) released with the paper `LayoutLMv2: +38. :doc:`LayoutLMv2 ` (from Microsoft Research Asia) released with the paper `LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding `__ by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. -38. :doc:`LayoutXLM ` (from Microsoft Research Asia) released with the paper `LayoutXLM: +39. 
:doc:`LayoutXLM ` (from Microsoft Research Asia) released with the paper `LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding `__ by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. -39. :doc:`LED ` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer +40. :doc:`LED ` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer `__ by Iz Beltagy, Matthew E. Peters, Arman Cohan. -40. :doc:`Longformer ` (from AllenAI) released with the paper `Longformer: The Long-Document +41. :doc:`Longformer ` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer `__ by Iz Beltagy, Matthew E. Peters, Arman Cohan. -41. :doc:`LUKE ` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity +42. :doc:`LUKE ` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention `__ by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto. -42. :doc:`LXMERT ` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality +43. :doc:`LXMERT ` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering `__ by Hao Tan and Mohit Bansal. -43. :doc:`M2M100 ` (from Facebook) released with the paper `Beyond English-Centric Multilingual +44. :doc:`M2M100 ` (from Facebook) released with the paper `Beyond English-Centric Multilingual Machine Translation `__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin. -44. :doc:`MarianMT ` Machine translation models trained using `OPUS `__ data by +45. :doc:`MarianMT ` Machine translation models trained using `OPUS `__ data by Jörg Tiedemann. The `Marian Framework `__ is being developed by the Microsoft Translator Team. -45. :doc:`MBart ` (from Facebook) released with the paper `Multilingual Denoising Pre-training for +46. :doc:`MBart ` (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine Translation `__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. -46. :doc:`MBart-50 ` (from Facebook) released with the paper `Multilingual Translation with Extensible +47. :doc:`MBart-50 ` (from Facebook) released with the paper `Multilingual Translation with Extensible Multilingual Pretraining and Finetuning `__ by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan. -47. :doc:`Megatron-BERT ` (from NVIDIA) released with the paper `Megatron-LM: Training +48. :doc:`Megatron-BERT ` (from NVIDIA) released with the paper `Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism `__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -48. :doc:`Megatron-GPT2 ` (from NVIDIA) released with the paper `Megatron-LM: Training +49. :doc:`Megatron-GPT2 ` (from NVIDIA) released with the paper `Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism `__ by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -49. 
:doc:`MPNet ` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted +50. :doc:`MPNet ` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted Pre-training for Language Understanding `__ by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. -50. :doc:`MT5 ` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained +51. :doc:`MT5 ` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained text-to-text transformer `__ by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. -51. :doc:`Pegasus ` (from Google) released with the paper `PEGASUS: Pre-training with Extracted +52. :doc:`Pegasus ` (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization `__ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. -52. :doc:`ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting +53. :doc:`ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -53. :doc:`Reformer ` (from Google Research) released with the paper `Reformer: The Efficient +54. :doc:`Reformer ` (from Google Research) released with the paper `Reformer: The Efficient Transformer `__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya. -54. :doc:`RemBERT ` (from Google Research) released with the paper `Rethinking embedding coupling in +55. :doc:`RemBERT ` (from Google Research) released with the paper `Rethinking embedding coupling in pre-trained language models `__ by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder. -55. :doc:`RoBERTa ` (from Facebook), released together with the paper a `Robustly Optimized BERT +56. :doc:`RoBERTa ` (from Facebook), released together with the paper a `Robustly Optimized BERT Pretraining Approach `__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. -56. :doc:`RoFormer ` (from ZhuiyiTechnology), released together with the paper a `RoFormer: +57. :doc:`RoFormer ` (from ZhuiyiTechnology), released together with the paper a `RoFormer: Enhanced Transformer with Rotary Position Embedding `__ by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu. -57. `SpeechEncoderDecoder `__ -58. :doc:`SpeechToTextTransformer ` (from Facebook), released together with the paper +58. `SpeechEncoderDecoder `__ +59. :doc:`SpeechToTextTransformer ` (from Facebook), released together with the paper `fairseq S2T: Fast Speech-to-Text Modeling with fairseq `__ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino. -59. `SpeechToTextTransformer2 `__ (from +60. `SpeechToTextTransformer2 `__ (from Facebook), released together with the paper `Large-Scale Self- and Semi-Supervised Learning for Speech Translation `__ by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau. -60. :doc:`Splinter ` (from Tel Aviv University), released together with the paper `Few-Shot +61. :doc:`Splinter ` (from Tel Aviv University), released together with the paper `Few-Shot Question Answering by Pretraining Span Selection `__ by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. -61. 
:doc:`SqueezeBert ` (from Berkeley) released with the paper `SqueezeBERT: What can computer +62. :doc:`SqueezeBert ` (from Berkeley) released with the paper `SqueezeBERT: What can computer vision teach NLP about efficient neural networks? `__ by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer. -62. :doc:`T5 ` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a +63. :doc:`T5 ` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer `__ by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -63. :doc:`T5v1.1 ` (from Google AI) released in the repository +64. :doc:`T5v1.1 ` (from Google AI) released in the repository `google-research/text-to-text-transfer-transformer `__ by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -64. :doc:`TAPAS ` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via +65. :doc:`TAPAS ` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via Pre-training `__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos. -65. :doc:`Transformer-XL ` (from Google/CMU) released with the paper `Transformer-XL: +66. :doc:`Transformer-XL ` (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context `__ by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. -66. :doc:`Vision Transformer (ViT) ` (from Google AI) released with the paper `An Image is Worth 16x16 +67. :doc:`Vision Transformer (ViT) ` (from Google AI) released with the paper `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale `__ by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -67. :doc:`VisualBERT ` (from UCLA NLP) released with the paper `VisualBERT: A Simple and +68. :doc:`VisualBERT ` (from UCLA NLP) released with the paper `VisualBERT: A Simple and Performant Baseline for Vision and Language `__ by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. -68. :doc:`Wav2Vec2 ` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for +69. :doc:`Wav2Vec2 ` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations `__ by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. -69. :doc:`XLM ` (from Facebook) released together with the paper `Cross-lingual Language Model +70. :doc:`XLM ` (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining `__ by Guillaume Lample and Alexis Conneau. -70. :doc:`XLM-ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: +71. :doc:`XLM-ProphetNet ` (from Microsoft Research) released with the paper `ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -71. :doc:`XLM-RoBERTa ` (from Facebook AI), released together with the paper `Unsupervised +72. 
:doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
    Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
    Zettlemoyer and Veselin Stoyanov.
-72. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
+73. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
    Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
-73. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
+74. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
    Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
    Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
@@ -383,6 +386,8 @@ Flax), PyTorch, and/or TensorFlow.
 +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
 | FlauBERT                    | ✅             | ❌             | ✅              | ✅                 | ❌           |
 +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
+| FNet                        | ✅             | ✅             | ✅              | ❌                 | ❌           |
++-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
 | Funnel Transformer          | ✅             | ✅             | ✅              | ✅                 | ❌           |
 +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
 | GPT Neo                     | ❌             | ❌             | ✅              | ❌                 | ✅           |
 +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
@@ -575,6 +580,7 @@ Flax), PyTorch, and/or TensorFlow.
     model_doc/electra
     model_doc/encoderdecoder
     model_doc/flaubert
+    model_doc/fnet
     model_doc/fsmt
     model_doc/funnel
     model_doc/herbert
diff --git a/docs/source/model_doc/fnet.rst b/docs/source/model_doc/fnet.rst
new file mode 100644
index 00000000000..743b34333ae
--- /dev/null
+++ b/docs/source/model_doc/fnet.rst
@@ -0,0 +1,121 @@
+..
+    Copyright 2021 The HuggingFace Team. All rights reserved.
+
+    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations under the License.
+
+FNet
+-----------------------------------------------------------------------------------------------------------------------
+
+Overview
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FNet model was proposed in `FNet: Mixing Tokens with Fourier Transforms <https://arxiv.org/abs/2105.03824>`__ by
+James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. The model replaces the self-attention layer in a BERT
+model with a Fourier transform, keeping only the real parts of the transform. Because the Fourier sublayer has no
+learnable parameters, the model is smaller and more memory efficient than BERT: it reaches about 92-97% of the accuracy
+of BERT counterparts on the GLUE benchmark while training much faster.
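+
+As a rough sketch (assuming only what the modeling code in this diff does), the core mixing step amounts to a 2D FFT
+over the sequence and hidden dimensions, of which only the real part is kept::
+
+    import torch
+
+    hidden_states = torch.randn(2, 128, 768)  # (batch, sequence, hidden)
+    mixed = torch.fft.fftn(hidden_states, dim=(1, 2)).real  # parameter-free token mixing
+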
The abstract from the paper is the following:
+
+*We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the
+self-attention sublayers with simple linear transformations that "mix" input tokens. These linear mixers, along with
+standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text
+classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder
+with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE
+benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths,
+our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena
+benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all
+sequence lengths on GPUs (and across relatively shorter lengths on TPUs). Finally, FNet has a light memory footprint
+and is particularly efficient at smaller model sizes; for a fixed speed and accuracy budget, small FNet models
+outperform Transformer counterparts.*
+
+Tips on usage:
+
+- The model was trained without an attention mask as it is based on a Fourier transform. The model was trained with a
+  maximum sequence length of 512, which includes pad tokens. Hence, it is highly recommended to use the same maximum
+  sequence length for fine-tuning and inference.
+
+This model was contributed by `gchhablani <https://huggingface.co/gchhablani>`__. The original code can be found `here
+<https://github.com/google-research/google-research/tree/master/f_net>`__.
+
+FNetConfig
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FNetConfig
+    :members:
+
+
+FNetTokenizer
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FNetTokenizer
+    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
+        create_token_type_ids_from_sequences, save_vocabulary
+
+
+FNetTokenizerFast
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FNetTokenizerFast
+    :members:
+
+
+FNetModel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FNetModel
+    :members: forward
+
+
+FNetForPreTraining
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FNetForPreTraining
+    :members: forward
+
+
+FNetForMaskedLM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FNetForMaskedLM
+    :members: forward
+
+
+FNetForNextSentencePrediction
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FNetForNextSentencePrediction
+    :members: forward
+
+
+FNetForSequenceClassification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. 
autoclass:: transformers.FNetForSequenceClassification + :members: forward + + +FNetForMultipleChoice +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autoclass:: transformers.FNetForMultipleChoice + :members: forward + + +FNetForTokenClassification +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autoclass:: transformers.FNetForTokenClassification + :members: forward + + +FNetForQuestionAnswering +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autoclass:: transformers.FNetForQuestionAnswering + :members: forward diff --git a/src/transformers/__init__.py b/src/transformers/__init__.py index f67da668b11..5c633e8e9b3 100755 --- a/src/transformers/__init__.py +++ b/src/transformers/__init__.py @@ -209,6 +209,7 @@ _import_structure = { "models.electra": ["ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP", "ElectraConfig", "ElectraTokenizer"], "models.encoder_decoder": ["EncoderDecoderConfig"], "models.flaubert": ["FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "FlaubertConfig", "FlaubertTokenizer"], + "models.fnet": ["FNET_PRETRAINED_CONFIG_ARCHIVE_MAP", "FNetConfig", "FNetTokenizer"], "models.fsmt": ["FSMT_PRETRAINED_CONFIG_ARCHIVE_MAP", "FSMTConfig", "FSMTTokenizer"], "models.funnel": ["FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP", "FunnelConfig", "FunnelTokenizer"], "models.gpt2": ["GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP", "GPT2Config", "GPT2Tokenizer"], @@ -365,6 +366,7 @@ else: # tokenizers-backed objects if is_tokenizers_available(): # Fast tokenizers + _import_structure["models.fnet"].append("FNetTokenizerFast") _import_structure["models.roformer"].append("RoFormerTokenizerFast") _import_structure["models.clip"].append("CLIPTokenizerFast") _import_structure["models.convbert"].append("ConvBertTokenizerFast") @@ -800,6 +802,21 @@ if is_torch_available(): "FlaubertWithLMHeadModel", ] ) + _import_structure["models.fnet"].extend( + [ + "FNET_PRETRAINED_MODEL_ARCHIVE_LIST", + "FNetForMaskedLM", + "FNetForMultipleChoice", + "FNetForNextSentencePrediction", + "FNetForPreTraining", + "FNetForQuestionAnswering", + "FNetForSequenceClassification", + "FNetForTokenClassification", + "FNetLayer", + "FNetModel", + "FNetPreTrainedModel", + ] + ) _import_structure["models.fsmt"].extend(["FSMTForConditionalGeneration", "FSMTModel", "PretrainedFSMTModel"]) _import_structure["models.funnel"].extend( [ @@ -2000,6 +2017,7 @@ if TYPE_CHECKING: from .models.electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig, ElectraTokenizer from .models.encoder_decoder import EncoderDecoderConfig from .models.flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig, FlaubertTokenizer + from .models.fnet import FNET_PRETRAINED_CONFIG_ARCHIVE_MAP, FNetConfig, FNetTokenizer from .models.fsmt import FSMT_PRETRAINED_CONFIG_ARCHIVE_MAP, FSMTConfig, FSMTTokenizer from .models.funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig, FunnelTokenizer from .models.gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config, GPT2Tokenizer @@ -2161,6 +2179,7 @@ if TYPE_CHECKING: from .models.distilbert import DistilBertTokenizerFast from .models.dpr import DPRContextEncoderTokenizerFast, DPRQuestionEncoderTokenizerFast, DPRReaderTokenizerFast from .models.electra import ElectraTokenizerFast + from .models.fnet import FNetTokenizerFast from .models.funnel import FunnelTokenizerFast from 
.models.gpt2 import GPT2TokenizerFast from .models.herbert import HerbertTokenizerFast @@ -2505,6 +2524,19 @@ if TYPE_CHECKING: FlaubertModel, FlaubertWithLMHeadModel, ) + from .models.fnet import ( + FNET_PRETRAINED_MODEL_ARCHIVE_LIST, + FNetForMaskedLM, + FNetForMultipleChoice, + FNetForNextSentencePrediction, + FNetForPreTraining, + FNetForQuestionAnswering, + FNetForSequenceClassification, + FNetForTokenClassification, + FNetLayer, + FNetModel, + FNetPreTrainedModel, + ) from .models.fsmt import FSMTForConditionalGeneration, FSMTModel, PretrainedFSMTModel from .models.funnel import ( FUNNEL_PRETRAINED_MODEL_ARCHIVE_LIST, diff --git a/src/transformers/convert_slow_tokenizer.py b/src/transformers/convert_slow_tokenizer.py index c0f5af86cd1..4dd528c55ed 100644 --- a/src/transformers/convert_slow_tokenizer.py +++ b/src/transformers/convert_slow_tokenizer.py @@ -526,6 +526,18 @@ class AlbertConverter(SpmConverter): ) +class FNetConverter(SpmConverter): + def post_processor(self): + return processors.TemplateProcessing( + single="[CLS]:0 $A:0 [SEP]:0", + pair="[CLS]:0 $A:0 [SEP]:0 $B:1 [SEP]:1", + special_tokens=[ + ("[CLS]", self.original_tokenizer.convert_tokens_to_ids("[CLS]")), + ("[SEP]", self.original_tokenizer.convert_tokens_to_ids("[SEP]")), + ], + ) + + class BarthezConverter(SpmConverter): def unk_id(self, proto): unk_id = 3 @@ -896,6 +908,7 @@ SLOW_TO_FAST_CONVERTERS = { "DPRQuestionEncoderTokenizer": BertConverter, "DPRContextEncoderTokenizer": BertConverter, "ElectraTokenizer": BertConverter, + "FNetTokenizer": FNetConverter, "FunnelTokenizer": FunnelConverter, "GPT2Tokenizer": GPT2Converter, "HerbertTokenizer": HerbertConverter, diff --git a/src/transformers/models/__init__.py b/src/transformers/models/__init__.py index 75fcbfa6376..9268916cb79 100644 --- a/src/transformers/models/__init__.py +++ b/src/transformers/models/__init__.py @@ -46,6 +46,7 @@ from . 
import ( electra, encoder_decoder, flaubert, + fnet, fsmt, funnel, gpt2, diff --git a/src/transformers/models/auto/configuration_auto.py b/src/transformers/models/auto/configuration_auto.py index 27e1c633583..44774960453 100644 --- a/src/transformers/models/auto/configuration_auto.py +++ b/src/transformers/models/auto/configuration_auto.py @@ -26,6 +26,7 @@ from ...file_utils import CONFIG_NAME CONFIG_MAPPING_NAMES = OrderedDict( [ # Add configs here + ("fnet", "FNetConfig"), ("gptj", "GPTJConfig"), ("layoutlmv2", "LayoutLMv2Config"), ("beit", "BeitConfig"), @@ -99,6 +100,7 @@ CONFIG_MAPPING_NAMES = OrderedDict( CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict( [ # Add archive maps here + ("fnet", "FNET_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("pegasus", "PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("gptj", "GPTJ_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("layoutlmv2", "LAYOUTLMV2_PRETRAINED_CONFIG_ARCHIVE_MAP"), @@ -164,6 +166,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict( MODEL_NAMES_MAPPING = OrderedDict( [ # Add full (and cased) model names here + ("fnet", "FNet"), ("gptj", "GPT-J"), ("beit", "BeiT"), ("rembert", "RemBERT"), diff --git a/src/transformers/models/auto/modeling_auto.py b/src/transformers/models/auto/modeling_auto.py index 2f8f099e3d8..eb971c96898 100644 --- a/src/transformers/models/auto/modeling_auto.py +++ b/src/transformers/models/auto/modeling_auto.py @@ -28,6 +28,7 @@ logger = logging.get_logger(__name__) MODEL_MAPPING_NAMES = OrderedDict( [ # Base model mapping + ("fnet", "FNetModel"), ("gptj", "GPTJModel"), ("layoutlmv2", "LayoutLMv2Model"), ("beit", "BeitModel"), @@ -97,6 +98,7 @@ MODEL_MAPPING_NAMES = OrderedDict( MODEL_FOR_PRETRAINING_MAPPING_NAMES = OrderedDict( [ # Model for pre-training mapping + ("fnet", "FNetForPreTraining"), ("visual_bert", "VisualBertForPreTraining"), ("layoutlm", "LayoutLMForMaskedLM"), ("retribert", "RetriBertModel"), @@ -136,6 +138,7 @@ MODEL_FOR_PRETRAINING_MAPPING_NAMES = OrderedDict( MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict( [ # Model with LM heads mapping + ("fnet", "FNetForMaskedLM"), ("gptj", "GPTJForCausalLM"), ("rembert", "RemBertForMaskedLM"), ("roformer", "RoFormerForMaskedLM"), @@ -228,6 +231,7 @@ MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = OrderedDict( MODEL_FOR_MASKED_LM_MAPPING_NAMES = OrderedDict( [ # Model for Masked LM mapping + ("fnet", "FNetForMaskedLM"), ("rembert", "RemBertForMaskedLM"), ("roformer", "RoFormerForMaskedLM"), ("big_bird", "BigBirdForMaskedLM"), @@ -290,6 +294,7 @@ MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES = OrderedDict( MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict( [ # Model for Sequence Classification mapping + ("fnet", "FNetForSequenceClassification"), ("gptj", "GPTJForSequenceClassification"), ("layoutlmv2", "LayoutLMv2ForSequenceClassification"), ("rembert", "RemBertForSequenceClassification"), @@ -334,6 +339,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict( MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict( [ # Model for Question Answering mapping + ("fnet", "FNetForQuestionAnswering"), ("layoutlmv2", "LayoutLMv2ForQuestionAnswering"), ("rembert", "RemBertForQuestionAnswering"), ("canine", "CanineForQuestionAnswering"), @@ -379,6 +385,7 @@ MODEL_FOR_TABLE_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict( MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict( [ # Model for Token Classification mapping + ("fnet", "FNetForTokenClassification"), ("layoutlmv2", "LayoutLMv2ForTokenClassification"), ("rembert", "RemBertForTokenClassification"), ("canine", 
"CanineForTokenClassification"), @@ -412,6 +419,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict( MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict( [ # Model for Multiple Choice mapping + ("fnet", "FNetForMultipleChoice"), ("rembert", "RemBertForMultipleChoice"), ("canine", "CanineForMultipleChoice"), ("roformer", "RoFormerForMultipleChoice"), @@ -440,6 +448,7 @@ MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict( MODEL_FOR_NEXT_SENTENCE_PREDICTION_MAPPING_NAMES = OrderedDict( [ ("bert", "BertForNextSentencePrediction"), + ("fnet", "FNetForNextSentencePrediction"), ("megatron-bert", "MegatronBertForNextSentencePrediction"), ("mobilebert", "MobileBertForNextSentencePrediction"), ] diff --git a/src/transformers/models/auto/tokenization_auto.py b/src/transformers/models/auto/tokenization_auto.py index c8751df1745..a706532e9ed 100644 --- a/src/transformers/models/auto/tokenization_auto.py +++ b/src/transformers/models/auto/tokenization_auto.py @@ -51,6 +51,7 @@ if TYPE_CHECKING: else: TOKENIZER_MAPPING_NAMES = OrderedDict( [ + ("fnet", ("FNetTokenizer", "FNetTokenizerFast" if is_tokenizers_available() else None)), ("retribert", ("RetriBertTokenizer", "RetriBertTokenizerFast" if is_tokenizers_available() else None)), ("roformer", ("RoFormerTokenizer", "RoFormerTokenizerFast" if is_tokenizers_available() else None)), ( diff --git a/src/transformers/models/fnet/__init__.py b/src/transformers/models/fnet/__init__.py new file mode 100644 index 00000000000..a28047df9ee --- /dev/null +++ b/src/transformers/models/fnet/__init__.py @@ -0,0 +1,73 @@ +# flake8: noqa +# There's no way to ignore "F401 '...' imported but unused" warnings in this +# module, but to preserve other warnings. So, don't check this module at all. + +# Copyright 2021 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from typing import TYPE_CHECKING + +from ...file_utils import _LazyModule, is_tokenizers_available, is_torch_available + + +_import_structure = { + "configuration_fnet": ["FNET_PRETRAINED_CONFIG_ARCHIVE_MAP", "FNetConfig"], + "tokenization_fnet": ["FNetTokenizer"], +} + +if is_tokenizers_available(): + _import_structure["tokenization_fnet_fast"] = ["FNetTokenizerFast"] + +if is_torch_available(): + _import_structure["modeling_fnet"] = [ + "FNET_PRETRAINED_MODEL_ARCHIVE_LIST", + "FNetForMaskedLM", + "FNetForMultipleChoice", + "FNetForNextSentencePrediction", + "FNetForPreTraining", + "FNetForQuestionAnswering", + "FNetForSequenceClassification", + "FNetForTokenClassification", + "FNetLayer", + "FNetModel", + "FNetPreTrainedModel", + ] + + +if TYPE_CHECKING: + from .configuration_fnet import FNET_PRETRAINED_CONFIG_ARCHIVE_MAP, FNetConfig + from .tokenization_fnet import FNetTokenizer + + if is_tokenizers_available(): + from .tokenization_fnet_fast import FNetTokenizerFast + + if is_torch_available(): + from .modeling_fnet import ( + FNET_PRETRAINED_MODEL_ARCHIVE_LIST, + FNetForMaskedLM, + FNetForMultipleChoice, + FNetForNextSentencePrediction, + FNetForPreTraining, + FNetForQuestionAnswering, + FNetForSequenceClassification, + FNetForTokenClassification, + FNetLayer, + FNetModel, + FNetPreTrainedModel, + ) + + +else: + import sys + + sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure) diff --git a/src/transformers/models/fnet/configuration_fnet.py b/src/transformers/models/fnet/configuration_fnet.py new file mode 100644 index 00000000000..047190a3ed2 --- /dev/null +++ b/src/transformers/models/fnet/configuration_fnet.py @@ -0,0 +1,125 @@ +# coding=utf-8 +# Copyright 2021 Google AI and The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" FNet model configuration """ + +from ...configuration_utils import PretrainedConfig +from ...utils import logging + + +logger = logging.get_logger(__name__) + +FNET_PRETRAINED_CONFIG_ARCHIVE_MAP = { + "google/fnet-base": "https://huggingface.co/google/fnet-base/resolve/main/config.json", + "google/fnet-large": "https://huggingface.co/google/fnet-large/resolve/main/config.json" + # See all FNet models at https://huggingface.co/models?filter=fnet +} + + +class FNetConfig(PretrainedConfig): + r""" + This is the configuration class to store the configuration of a :class:`~transformers.FNetModel`. It is used to + instantiate an FNet model according to the specified arguments, defining the model architecture. Instantiating a + configuration with the defaults will yield a similar configuration to that of the FNet `fnet-base + `__ architecture. + + Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model + outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information. + + + Args: + vocab_size (:obj:`int`, `optional`, defaults to 32000): + Vocabulary size of the FNet model. 
Defines the number of different tokens that can be represented by the
+            :obj:`input_ids` passed when calling :class:`~transformers.FNetModel`.
+        hidden_size (:obj:`int`, `optional`, defaults to 768):
+            Dimension of the encoder layers and the pooler layer.
+        num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
+            Number of hidden layers in the Transformer encoder.
+        intermediate_size (:obj:`int`, `optional`, defaults to 3072):
+            Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
+        hidden_act (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"gelu_new"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string,
+            :obj:`"gelu"`, :obj:`"relu"`, :obj:`"selu"` and :obj:`"gelu_new"` are supported.
+        hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        type_vocab_size (:obj:`int`, `optional`, defaults to 4):
+            The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.FNetModel`.
+        initializer_range (:obj:`float`, `optional`, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
+            The epsilon used by the layer normalization layers.
+        gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            If :obj:`True`, use gradient checkpointing to save memory at the expense of a slower backward pass.
+        use_tpu_fourier_optimizations (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Determines whether to use TPU optimized FFTs. If :obj:`True`, the model will favor axis-wise FFT
+            transforms. Set to :obj:`False` for GPU/CPU hardware, in which case n-dimensional FFTs are used.
+        tpu_short_seq_length (:obj:`int`, `optional`, defaults to 512):
+            The sequence length that is expected by the model when using TPUs. This is used to initialize the DFT
+            matrix only when :obj:`use_tpu_fourier_optimizations` is set to :obj:`True` and the input sequence is
+            shorter than or equal to 4096 tokens.
+ + Example:: + + + >>> from transformers import FNetModel, FNetConfig + + >>> # Initializing a FNet fnet-base style configuration + >>> configuration = FNetConfig() + + >>> # Initializing a model from the fnet-base style configuration + >>> model = FNetModel(configuration) + + >>> # Accessing the model configuration + >>> configuration = model.config + """ + model_type = "fnet" + + def __init__( + self, + vocab_size=32000, + hidden_size=768, + num_hidden_layers=12, + intermediate_size=3072, + hidden_act="gelu_new", + hidden_dropout_prob=0.1, + max_position_embeddings=512, + type_vocab_size=4, + initializer_range=0.02, + layer_norm_eps=1e-12, + use_tpu_fourier_optimizations=False, + tpu_short_seq_length=512, + pad_token_id=3, + bos_token_id=1, + eos_token_id=2, + **kwargs + ): + super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs) + + self.vocab_size = vocab_size + self.max_position_embeddings = max_position_embeddings + self.hidden_size = hidden_size + self.num_hidden_layers = num_hidden_layers + self.intermediate_size = intermediate_size + self.hidden_act = hidden_act + self.hidden_dropout_prob = hidden_dropout_prob + self.initializer_range = initializer_range + self.type_vocab_size = type_vocab_size + self.layer_norm_eps = layer_norm_eps + self.use_tpu_fourier_optimizations = use_tpu_fourier_optimizations + self.tpu_short_seq_length = tpu_short_seq_length diff --git a/src/transformers/models/fnet/convert_fnet_original_flax_checkpoint_to_pytorch.py b/src/transformers/models/fnet/convert_fnet_original_flax_checkpoint_to_pytorch.py new file mode 100644 index 00000000000..ffb5667f843 --- /dev/null +++ b/src/transformers/models/fnet/convert_fnet_original_flax_checkpoint_to_pytorch.py @@ -0,0 +1,155 @@ +# coding=utf-8 +# Copyright 2021 The HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
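+# Example invocation (all paths are placeholders):
+#   python convert_fnet_original_flax_checkpoint_to_pytorch.py \
+#       --flax_checkpoint_path /path/to/flax_ckpt \
+#       --fnet_config_file /path/to/config.json \
+#       --save_path /path/to/pytorch_model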
+"""Convert FNet checkpoint.""" + + +import argparse + +import torch + +from flax.training.checkpoints import restore_checkpoint +from transformers import FNetConfig, FNetForPreTraining +from transformers.utils import logging + + +logging.set_verbosity_info() + + +def convert_flax_checkpoint_to_pytorch(flax_checkpoint_path, fnet_config_file, save_path): + # Initialise PyTorch model + config = FNetConfig.from_json_file(fnet_config_file) + print(f"Building PyTorch model from configuration: {config}") + fnet_pretraining_model = FNetForPreTraining(config) + + checkpoint_dict = restore_checkpoint(flax_checkpoint_path, None) + pretrained_model_params = checkpoint_dict["target"] + + # Embeddings + # Position IDs + state_dict = fnet_pretraining_model.state_dict() + + position_ids = state_dict["fnet.embeddings.position_ids"] + new_state_dict = {"fnet.embeddings.position_ids": position_ids} + # Embedding Layers + new_state_dict["fnet.embeddings.word_embeddings.weight"] = torch.tensor( + pretrained_model_params["encoder"]["embedder"]["word"]["embedding"] + ) + new_state_dict["fnet.embeddings.position_embeddings.weight"] = torch.tensor( + pretrained_model_params["encoder"]["embedder"]["position"]["embedding"][0] + ) + new_state_dict["fnet.embeddings.token_type_embeddings.weight"] = torch.tensor( + pretrained_model_params["encoder"]["embedder"]["type"]["embedding"] + ) + new_state_dict["fnet.embeddings.projection.weight"] = torch.tensor( + pretrained_model_params["encoder"]["embedder"]["hidden_mapping_in"]["kernel"] + ).T + new_state_dict["fnet.embeddings.projection.bias"] = torch.tensor( + pretrained_model_params["encoder"]["embedder"]["hidden_mapping_in"]["bias"] + ) + new_state_dict["fnet.embeddings.LayerNorm.weight"] = torch.tensor( + pretrained_model_params["encoder"]["embedder"]["layer_norm"]["scale"] + ) + new_state_dict["fnet.embeddings.LayerNorm.bias"] = torch.tensor( + pretrained_model_params["encoder"]["embedder"]["layer_norm"]["bias"] + ) + + # Encoder Layers + for layer in range(config.num_hidden_layers): + new_state_dict[f"fnet.encoder.layer.{layer}.fourier.output.LayerNorm.weight"] = torch.tensor( + pretrained_model_params["encoder"][f"encoder_{layer}"]["mixing_layer_norm"]["scale"] + ) + new_state_dict[f"fnet.encoder.layer.{layer}.fourier.output.LayerNorm.bias"] = torch.tensor( + pretrained_model_params["encoder"][f"encoder_{layer}"]["mixing_layer_norm"]["bias"] + ) + + new_state_dict[f"fnet.encoder.layer.{layer}.intermediate.dense.weight"] = torch.tensor( + pretrained_model_params["encoder"][f"feed_forward_{layer}"]["intermediate"]["kernel"] + ).T + new_state_dict[f"fnet.encoder.layer.{layer}.intermediate.dense.bias"] = torch.tensor( + pretrained_model_params["encoder"][f"feed_forward_{layer}"]["intermediate"]["bias"] + ) + + new_state_dict[f"fnet.encoder.layer.{layer}.output.dense.weight"] = torch.tensor( + pretrained_model_params["encoder"][f"feed_forward_{layer}"]["output"]["kernel"] + ).T + new_state_dict[f"fnet.encoder.layer.{layer}.output.dense.bias"] = torch.tensor( + pretrained_model_params["encoder"][f"feed_forward_{layer}"]["output"]["bias"] + ) + + new_state_dict[f"fnet.encoder.layer.{layer}.output.LayerNorm.weight"] = torch.tensor( + pretrained_model_params["encoder"][f"encoder_{layer}"]["output_layer_norm"]["scale"] + ) + new_state_dict[f"fnet.encoder.layer.{layer}.output.LayerNorm.bias"] = torch.tensor( + pretrained_model_params["encoder"][f"encoder_{layer}"]["output_layer_norm"]["bias"] + ) + + # Pooler Layers + new_state_dict["fnet.pooler.dense.weight"] = 
torch.tensor(pretrained_model_params["encoder"]["pooler"]["kernel"]).T
+    new_state_dict["fnet.pooler.dense.bias"] = torch.tensor(pretrained_model_params["encoder"]["pooler"]["bias"])
+
+    # Masked LM Layers
+    new_state_dict["cls.predictions.transform.dense.weight"] = torch.tensor(
+        pretrained_model_params["predictions_dense"]["kernel"]
+    ).T
+    new_state_dict["cls.predictions.transform.dense.bias"] = torch.tensor(
+        pretrained_model_params["predictions_dense"]["bias"]
+    )
+    new_state_dict["cls.predictions.transform.LayerNorm.weight"] = torch.tensor(
+        pretrained_model_params["predictions_layer_norm"]["scale"]
+    )
+    new_state_dict["cls.predictions.transform.LayerNorm.bias"] = torch.tensor(
+        pretrained_model_params["predictions_layer_norm"]["bias"]
+    )
+    # The MLM decoder shares its weights with the input word embeddings (weight tying)
+    new_state_dict["cls.predictions.decoder.weight"] = torch.tensor(
+        pretrained_model_params["encoder"]["embedder"]["word"]["embedding"]
+    )
+    new_state_dict["cls.predictions.decoder.bias"] = torch.tensor(
+        pretrained_model_params["predictions_output"]["output_bias"]
+    )
+    new_state_dict["cls.predictions.bias"] = torch.tensor(pretrained_model_params["predictions_output"]["output_bias"])
+
+    # Seq Relationship Layers
+    new_state_dict["cls.seq_relationship.weight"] = torch.tensor(
+        pretrained_model_params["classification"]["output_kernel"]
+    )
+    new_state_dict["cls.seq_relationship.bias"] = torch.tensor(
+        pretrained_model_params["classification"]["output_bias"]
+    )
+
+    # Load State Dict
+    fnet_pretraining_model.load_state_dict(new_state_dict)
+
+    # Save PreTrained
+    print(f"Saving pretrained model to {save_path}")
+    fnet_pretraining_model.save_pretrained(save_path)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    # Required parameters
+    parser.add_argument(
+        "--flax_checkpoint_path", default=None, type=str, required=True, help="Path to the original Flax checkpoint."
+    )
+    parser.add_argument(
+        "--fnet_config_file",
+        default=None,
+        type=str,
+        required=True,
+        help="The config json file corresponding to the pre-trained FNet model. \n"
+        "This specifies the model architecture.",
+    )
+    parser.add_argument("--save_path", default=None, type=str, required=True, help="Path to the output model.")
+    args = parser.parse_args()
+    convert_flax_checkpoint_to_pytorch(args.flax_checkpoint_path, args.fnet_config_file, args.save_path)
diff --git a/src/transformers/models/fnet/modeling_fnet.py b/src/transformers/models/fnet/modeling_fnet.py
new file mode 100755
index 00000000000..2a1b7f5f2ab
--- /dev/null
+++ b/src/transformers/models/fnet/modeling_fnet.py
@@ -0,0 +1,1175 @@
+# coding=utf-8
+# Copyright 2021 Google Research and The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" PyTorch FNet model.
""" + +import warnings +from dataclasses import dataclass +from functools import partial +from typing import Optional, Tuple + +import torch +import torch.utils.checkpoint +from packaging import version +from torch import nn +from torch.nn import CrossEntropyLoss, MSELoss + +from ...file_utils import is_scipy_available + + +if is_scipy_available(): + from scipy import linalg + +from ...activations import ACT2FN +from ...file_utils import ( + add_code_sample_docstrings, + add_start_docstrings, + add_start_docstrings_to_model_forward, + replace_return_docstrings, +) +from ...modeling_outputs import ( + BaseModelOutput, + BaseModelOutputWithPooling, + MaskedLMOutput, + ModelOutput, + MultipleChoiceModelOutput, + NextSentencePredictorOutput, + QuestionAnsweringModelOutput, + SequenceClassifierOutput, + TokenClassifierOutput, +) +from ...modeling_utils import PreTrainedModel, apply_chunking_to_forward +from ...utils import logging +from .configuration_fnet import FNetConfig + + +logger = logging.get_logger(__name__) + +_CHECKPOINT_FOR_DOC = "google/fnet-base" +_CONFIG_FOR_DOC = "FNetConfig" +_TOKENIZER_FOR_DOC = "FNetTokenizer" + +FNET_PRETRAINED_MODEL_ARCHIVE_LIST = [ + "google/fnet-base", + "google/fnet-large" + # See all FNet models at https://huggingface.co/models?filter=fnet +] + + +# Adapted from https://github.com/google-research/google-research/blob/master/f_net/fourier.py +def _two_dim_matmul(x, matrix_dim_one, matrix_dim_two): + """Applies 2D matrix multiplication to 3D input arrays.""" + seq_length = x.shape[1] + matrix_dim_one = matrix_dim_one[:seq_length, :seq_length] + x = x.type(torch.complex64) + return torch.einsum("bij,jk,ni->bnk", x, matrix_dim_two, matrix_dim_one) + + +# # Adapted from https://github.com/google-research/google-research/blob/master/f_net/fourier.py +def two_dim_matmul(x, matrix_dim_one, matrix_dim_two): + return _two_dim_matmul(x, matrix_dim_one, matrix_dim_two) + + +# Adapted from https://github.com/google-research/google-research/blob/master/f_net/fourier.py +def fftn(x): + """ + Applies n-dimensional Fast Fourier Transform (FFT) to input array. + + Args: + x: Input n-dimensional array. + + Returns: + n-dimensional Fourier transform of input n-dimensional array. + """ + out = x + for axis in reversed(range(x.ndim)[1:]): # We don't need to apply FFT to last axis + out = torch.fft.fft(out, axis=axis) + return out + + +class FNetEmbeddings(nn.Module): + """Construct the embeddings from word, position and token_type embeddings.""" + + def __init__(self, config): + super().__init__() + self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id) + self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size) + self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size) + + # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load + # any TensorFlow checkpoint file + self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + # NOTE: This is the project layer and will be needed. The original code allows for different embedding and different model dimensions. 
+        self.projection = nn.Linear(config.hidden_size, config.hidden_size)
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)
+
+        # position_ids (1, len position emb) is contiguous in memory and exported when serialized
+        self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)))
+
+        if version.parse(torch.__version__) > version.parse("1.6.0"):
+            self.register_buffer(
+                "token_type_ids",
+                torch.zeros(self.position_ids.size(), dtype=torch.long, device=self.position_ids.device),
+                persistent=False,
+            )
+
+    def forward(self, input_ids=None, token_type_ids=None, position_ids=None, inputs_embeds=None):
+        if input_ids is not None:
+            input_shape = input_ids.size()
+        else:
+            input_shape = inputs_embeds.size()[:-1]
+
+        seq_length = input_shape[1]
+
+        if position_ids is None:
+            position_ids = self.position_ids[:, :seq_length]
+
+        # When token_type_ids is not passed (usually because it is auto-generated), default to the all-zeros
+        # buffer registered in the constructor. The registered buffer helps users trace the model without
+        # passing token_type_ids and solves issue #5664.
+        if token_type_ids is None:
+            if hasattr(self, "token_type_ids"):
+                buffered_token_type_ids = self.token_type_ids[:, :seq_length]
+                buffered_token_type_ids_expanded = buffered_token_type_ids.expand(input_shape[0], seq_length)
+                token_type_ids = buffered_token_type_ids_expanded
+            else:
+                token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device)
+
+        if inputs_embeds is None:
+            inputs_embeds = self.word_embeddings(input_ids)
+        token_type_embeddings = self.token_type_embeddings(token_type_ids)
+
+        embeddings = inputs_embeds + token_type_embeddings
+
+        position_embeddings = self.position_embeddings(position_ids)
+        embeddings += position_embeddings
+        embeddings = self.LayerNorm(embeddings)
+        embeddings = self.projection(embeddings)
+        embeddings = self.dropout(embeddings)
+        return embeddings
+
+
+class FNetBasicFourierTransform(nn.Module):
+    def __init__(self, config):
+        super().__init__()
+        self._init_fourier_transform(config)
+
+    def _init_fourier_transform(self, config):
+        if not config.use_tpu_fourier_optimizations:
+            self.fourier_transform = partial(torch.fft.fftn, dim=(1, 2))
+        elif config.max_position_embeddings <= 4096:
+            if is_scipy_available():
+                self.register_buffer(
+                    "dft_mat_hidden", torch.tensor(linalg.dft(config.hidden_size), dtype=torch.complex64)
+                )
+                self.register_buffer(
+                    "dft_mat_seq", torch.tensor(linalg.dft(config.tpu_short_seq_length), dtype=torch.complex64)
+                )
+                self.fourier_transform = partial(
+                    two_dim_matmul, matrix_dim_one=self.dft_mat_seq, matrix_dim_two=self.dft_mat_hidden
+                )
+            else:
+                logger.warning(
+                    "SciPy is needed for DFT matrix calculation but was not found. Using the TPU-optimized fast Fourier transform instead."
+                )
+                self.fourier_transform = fftn
+        else:
+            self.fourier_transform = fftn
+
+    def forward(self, hidden_states):
+
+        # NOTE: We do not use torch.vmap as it is not integrated into PyTorch stable versions.
+        # Interested users can modify the code to use vmap from the nightly versions, getting the vmap from here:
+        # https://pytorch.org/docs/master/generated/torch.vmap.html. Note that the Fourier transform methods will
+        # need to change accordingly.
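+        # A minimal sketch of what the default transform computes (shapes are assumed, not enforced):
+        #
+        #     x = torch.randn(2, 8, 16)                   # (batch, seq_len, hidden)
+        #     mixed = torch.fft.fftn(x, dim=(1, 2)).real  # token mixing, same shape as x
+        #
+        # i.e. a 2D DFT over the sequence and hidden axes, keeping only the real part.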
+ + outputs = self.fourier_transform(hidden_states).real + return (outputs,) + + +class FNetBasicOutput(nn.Module): + def __init__(self, config): + super().__init__() + self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + + def forward(self, hidden_states, input_tensor): + hidden_states = self.LayerNorm(input_tensor + hidden_states) + return hidden_states + + +class FNetFourierTransform(nn.Module): + def __init__(self, config): + super().__init__() + self.self = FNetBasicFourierTransform(config) + self.output = FNetBasicOutput(config) + + def forward(self, hidden_states): + self_outputs = self.self(hidden_states) + fourier_output = self.output(self_outputs[0], hidden_states) + outputs = (fourier_output,) + return outputs + + +# Copied from transformers.models.bert.modeling_bert.BertIntermediate with Bert->FNet +class FNetIntermediate(nn.Module): + def __init__(self, config): + super().__init__() + self.dense = nn.Linear(config.hidden_size, config.intermediate_size) + if isinstance(config.hidden_act, str): + self.intermediate_act_fn = ACT2FN[config.hidden_act] + else: + self.intermediate_act_fn = config.hidden_act + + def forward(self, hidden_states): + hidden_states = self.dense(hidden_states) + hidden_states = self.intermediate_act_fn(hidden_states) + return hidden_states + + +# Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->FNet +class FNetOutput(nn.Module): + def __init__(self, config): + super().__init__() + self.dense = nn.Linear(config.intermediate_size, config.hidden_size) + self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + self.dropout = nn.Dropout(config.hidden_dropout_prob) + + def forward(self, hidden_states, input_tensor): + hidden_states = self.dense(hidden_states) + hidden_states = self.dropout(hidden_states) + hidden_states = self.LayerNorm(hidden_states + input_tensor) + return hidden_states + + +class FNetLayer(nn.Module): + def __init__(self, config): + super().__init__() + self.chunk_size_feed_forward = config.chunk_size_feed_forward + self.seq_len_dim = 1 # The dimension which has the sequence length + self.fourier = FNetFourierTransform(config) + self.intermediate = FNetIntermediate(config) + self.output = FNetOutput(config) + + def forward(self, hidden_states): + self_fourier_outputs = self.fourier(hidden_states) + fourier_output = self_fourier_outputs[0] + + layer_output = apply_chunking_to_forward( + self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, fourier_output + ) + + outputs = (layer_output,) + + return outputs + + def feed_forward_chunk(self, fourier_output): + intermediate_output = self.intermediate(fourier_output) + layer_output = self.output(intermediate_output, fourier_output) + return layer_output + + +class FNetEncoder(nn.Module): + def __init__(self, config): + super().__init__() + self.config = config + self.layer = nn.ModuleList([FNetLayer(config) for _ in range(config.num_hidden_layers)]) + + def forward(self, hidden_states, output_hidden_states=False, return_dict=True): + all_hidden_states = () if output_hidden_states else None + + for i, layer_module in enumerate(self.layer): + if output_hidden_states: + all_hidden_states = all_hidden_states + (hidden_states,) + + if getattr(self.config, "gradient_checkpointing", False) and self.training: + + def create_custom_forward(module): + def custom_forward(*inputs): + return module(*inputs) + + return custom_forward + + layer_outputs = torch.utils.checkpoint.checkpoint(create_custom_forward(layer_module), 
hidden_states) + else: + layer_outputs = layer_module(hidden_states) + + hidden_states = layer_outputs[0] + + if output_hidden_states: + all_hidden_states = all_hidden_states + (hidden_states,) + + if not return_dict: + return tuple(v for v in [hidden_states, all_hidden_states] if v is not None) + + return BaseModelOutput(last_hidden_state=hidden_states, hidden_states=all_hidden_states) + + +# Copied from transformers.models.bert.modeling_bert.BertPooler with Bert->FNet +class FNetPooler(nn.Module): + def __init__(self, config): + super().__init__() + self.dense = nn.Linear(config.hidden_size, config.hidden_size) + self.activation = nn.Tanh() + + def forward(self, hidden_states): + # We "pool" the model by simply taking the hidden state corresponding + # to the first token. + first_token_tensor = hidden_states[:, 0] + pooled_output = self.dense(first_token_tensor) + pooled_output = self.activation(pooled_output) + return pooled_output + + +# Copied from transformers.models.bert.modeling_bert.BertPredictionHeadTransform with Bert->FNet +class FNetPredictionHeadTransform(nn.Module): + def __init__(self, config): + super().__init__() + self.dense = nn.Linear(config.hidden_size, config.hidden_size) + if isinstance(config.hidden_act, str): + self.transform_act_fn = ACT2FN[config.hidden_act] + else: + self.transform_act_fn = config.hidden_act + self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + + def forward(self, hidden_states): + hidden_states = self.dense(hidden_states) + hidden_states = self.transform_act_fn(hidden_states) + hidden_states = self.LayerNorm(hidden_states) + return hidden_states + + +class FNetLMPredictionHead(nn.Module): + def __init__(self, config): + super().__init__() + self.transform = FNetPredictionHeadTransform(config) + + # The output weights are the same as the input embeddings, but there is + # an output-only bias for each token. 
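+        # (The actual tying of ``decoder.weight`` to the input word embeddings is performed by
+        # :meth:`~transformers.PreTrainedModel.tie_weights` via ``get_output_embeddings``/``get_input_embeddings``;
+        # the lines below only define the projection and its standalone bias.)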
+ self.decoder = nn.Linear(config.hidden_size, config.vocab_size) + + self.bias = nn.Parameter(torch.zeros(config.vocab_size)) + self.decoder.bias = self.bias + + def forward(self, hidden_states): + hidden_states = self.transform(hidden_states) + hidden_states = self.decoder(hidden_states) + return hidden_states + + def _tie_weights(self): + # To tie those two weights if they get disconnected (on TPU or when the bias is resized) + self.bias = self.decoder.bias + + +class FNetOnlyMLMHead(nn.Module): + def __init__(self, config): + super().__init__() + self.predictions = FNetLMPredictionHead(config) + + def forward(self, sequence_output): + prediction_scores = self.predictions(sequence_output) + return prediction_scores + + +# Copied from transformers.models.bert.modeling_bert.BertOnlyNSPHead with Bert->FNet +class FNetOnlyNSPHead(nn.Module): + def __init__(self, config): + super().__init__() + self.seq_relationship = nn.Linear(config.hidden_size, 2) + + def forward(self, pooled_output): + seq_relationship_score = self.seq_relationship(pooled_output) + return seq_relationship_score + + +# Copied from transformers.models.bert.modeling_bert.BertPreTrainingHeads with Bert->FNet +class FNetPreTrainingHeads(nn.Module): + def __init__(self, config): + super().__init__() + self.predictions = FNetLMPredictionHead(config) + self.seq_relationship = nn.Linear(config.hidden_size, 2) + + def forward(self, sequence_output, pooled_output): + prediction_scores = self.predictions(sequence_output) + seq_relationship_score = self.seq_relationship(pooled_output) + return prediction_scores, seq_relationship_score + + +class FNetPreTrainedModel(PreTrainedModel): + """ + An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained + models. + """ + + config_class = FNetConfig + base_model_prefix = "fnet" + _keys_to_ignore_on_load_missing = [r"position_ids"] + + def _init_weights(self, module): + """Initialize the weights""" + if isinstance(module, nn.Linear): + # Slightly different from the TF version which uses truncated_normal for initialization + # cf https://github.com/pytorch/pytorch/pull/5617 + module.weight.data.normal_(mean=0.0, std=self.config.initializer_range) + # NOTE: Original code uses same initialization as weights for biases as well. + if module.bias is not None: + module.bias.data.zero_() + elif isinstance(module, nn.Embedding): + module.weight.data.normal_(mean=0.0, std=self.config.initializer_range) + if module.padding_idx is not None: + module.weight.data[module.padding_idx].zero_() + elif isinstance(module, nn.LayerNorm): + module.bias.data.zero_() + module.weight.data.fill_(1.0) + + +@dataclass +class FNetForPreTrainingOutput(ModelOutput): + """ + Output type of :class:`~transformers.FNetForPreTraining`. + + Args: + loss (`optional`, returned when ``labels`` is provided, ``torch.FloatTensor`` of shape :obj:`(1,)`): + Total loss as the sum of the masked language modeling loss and the next sequence prediction + (classification) loss. + prediction_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`): + Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). + seq_relationship_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, 2)`): + Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation + before SoftMax). 
+        hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
+            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
+            of shape :obj:`(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of
+            each layer plus the initial embedding outputs.
+    """
+
+    loss: Optional[torch.FloatTensor] = None
+    prediction_logits: torch.FloatTensor = None
+    seq_relationship_logits: torch.FloatTensor = None
+    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
+
+
+FNET_START_DOCSTRING = r"""
+    This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class. Use
+    it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and
+    behavior.
+
+    Parameters:
+        config (:class:`~transformers.FNetConfig`): Model configuration class with all the parameters of the model.
+            Initializing with a config file does not load the weights associated with the model, only the
+            configuration. Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model
+            weights.
+"""
+
+FNET_INPUTS_DOCSTRING = r"""
+    Args:
+        input_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`):
+            Indices of input sequence tokens in the vocabulary.
+
+            Indices can be obtained using :class:`transformers.FNetTokenizer`. See
+            :func:`transformers.PreTrainedTokenizer.encode` and :func:`transformers.PreTrainedTokenizer.__call__` for
+            details.
+
+            `What are input IDs? <../glossary.html#input-ids>`__
+        token_type_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
+            Segment token indices to indicate first and second portions of the inputs. Indices are selected in ``[0,
+            1]``:
+
+            - 0 corresponds to a `sentence A` token,
+            - 1 corresponds to a `sentence B` token.
+
+            `What are token type IDs? <../glossary.html#token-type-ids>`_
+        position_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
+            Indices of positions of each input sequence tokens in the position embeddings. Selected in the range ``[0,
+            config.max_position_embeddings - 1]``.
+
+            `What are position IDs? <../glossary.html#position-ids>`_
+
+        inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`({0}, hidden_size)`, `optional`):
+            Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
+        output_hidden_states (:obj:`bool`, `optional`):
+            Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
+            more detail.
+        return_dict (:obj:`bool`, `optional`):
+            Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
+"""
+
+
+@add_start_docstrings(
+    "The bare FNet Model transformer outputting raw hidden-states without any specific head on top.",
+    FNET_START_DOCSTRING,
+)
+class FNetModel(FNetPreTrainedModel):
+    """
+
+    The model can behave as an encoder, following the architecture described in `FNet: Mixing Tokens with Fourier
+    Transforms <https://arxiv.org/abs/2105.03824>`__ by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago
+    Ontanon.
+ + """ + + def __init__(self, config, add_pooling_layer=True): + super().__init__(config) + self.config = config + + self.embeddings = FNetEmbeddings(config) + self.encoder = FNetEncoder(config) + + self.pooler = FNetPooler(config) if add_pooling_layer else None + + self.init_weights() + + def get_input_embeddings(self): + return self.embeddings.word_embeddings + + def set_input_embeddings(self, value): + self.embeddings.word_embeddings = value + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + tokenizer_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=BaseModelOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + output_hidden_states=None, + return_dict=None, + ): + output_hidden_states = ( + output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states + ) + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + if input_ids is not None and inputs_embeds is not None: + raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") + elif input_ids is not None: + input_shape = input_ids.size() + batch_size, seq_length = input_shape + elif inputs_embeds is not None: + input_shape = inputs_embeds.size()[:-1] + batch_size, seq_length = input_shape + else: + raise ValueError("You have to specify either input_ids or inputs_embeds") + + if ( + self.config.use_tpu_fourier_optimizations + and seq_length <= 4096 + and self.config.tpu_short_seq_length != seq_length + ): + raise ValueError( + "The `tpu_short_seq_length` in FNetConfig should be set equal to the sequence length being passed to the model when using TPU optimizations." + ) + + device = input_ids.device if input_ids is not None else inputs_embeds.device + + if token_type_ids is None: + if hasattr(self.embeddings, "token_type_ids"): + buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length] + buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length) + token_type_ids = buffered_token_type_ids_expanded + else: + token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device) + + embedding_output = self.embeddings( + input_ids=input_ids, + position_ids=position_ids, + token_type_ids=token_type_ids, + inputs_embeds=inputs_embeds, + ) + encoder_outputs = self.encoder( + embedding_output, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + sequence_output = encoder_outputs[0] + + pooler_output = self.pooler(sequence_output) if self.pooler is not None else None + + if not return_dict: + return (sequence_output, pooler_output) + encoder_outputs[1:] + + return BaseModelOutputWithPooling( + last_hidden_state=sequence_output, + pooler_output=pooler_output, + hidden_states=encoder_outputs.hidden_states, + ) + + +@add_start_docstrings( + """ + FNet Model with two heads on top as done during the pretraining: a `masked language modeling` head and a `next + sentence prediction (classification)` head. 
+ """, + FNET_START_DOCSTRING, +) +class FNetForPreTraining(FNetPreTrainedModel): + def __init__(self, config): + super().__init__(config) + + self.fnet = FNetModel(config) + self.cls = FNetPreTrainingHeads(config) + + self.init_weights() + + def get_output_embeddings(self): + return self.cls.predictions.decoder + + def set_output_embeddings(self, new_embeddings): + self.cls.predictions.decoder = new_embeddings + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @replace_return_docstrings(output_type=FNetForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + labels=None, + next_sentence_label=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape ``(batch_size, sequence_length)``, `optional`): + Labels for computing the masked language modeling loss. Indices should be in ``[-100, 0, ..., + config.vocab_size]`` (see ``input_ids`` docstring) Tokens with indices set to ``-100`` are ignored + (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]`` + next_sentence_label (``torch.LongTensor`` of shape ``(batch_size,)``, `optional`): + Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair + (see :obj:`input_ids` docstring) Indices should be in ``[0, 1]``: + + - 0 indicates sequence B is a continuation of sequence A, + - 1 indicates sequence B is a random sequence. + kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`): + Used to hide legacy arguments that have been deprecated. + + Returns: + + Example:: + + + >>> from transformers import FNetTokenizer, FNetForPreTraining + >>> import torch + >>> tokenizer = FNetTokenizer.from_pretrained('google/fnet-base') + >>> model = FNetForPreTraining.from_pretrained('google/fnet-base') + >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") + >>> outputs = model(**inputs) + >>> prediction_logits = outputs.prediction_logits + >>> seq_relationship_logits = outputs.seq_relationship_logits + """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.fnet( + input_ids, + token_type_ids=token_type_ids, + position_ids=position_ids, + inputs_embeds=inputs_embeds, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + sequence_output, pooled_output = outputs[:2] + prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output) + + total_loss = None + if labels is not None and next_sentence_label is not None: + loss_fct = CrossEntropyLoss() + masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) + next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1)) + total_loss = masked_lm_loss + next_sentence_loss + + if not return_dict: + output = (prediction_scores, seq_relationship_score) + outputs[2:] + return ((total_loss,) + output) if total_loss is not None else output + + return FNetForPreTrainingOutput( + loss=total_loss, + prediction_logits=prediction_scores, + seq_relationship_logits=seq_relationship_score, + hidden_states=outputs.hidden_states, + ) + + +@add_start_docstrings("""FNet Model with a `language modeling` head on top. 
""", FNET_START_DOCSTRING) +class FNetForMaskedLM(FNetPreTrainedModel): + def __init__(self, config): + super().__init__(config) + + self.fnet = FNetModel(config) + self.cls = FNetOnlyMLMHead(config) + + self.init_weights() + + def get_output_embeddings(self): + return self.cls.predictions.decoder + + def set_output_embeddings(self, new_embeddings): + self.cls.predictions.decoder = new_embeddings + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + tokenizer_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=MaskedLMOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + labels=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): + Labels for computing the masked language modeling loss. Indices should be in ``[-100, 0, ..., + config.vocab_size]`` (see ``input_ids`` docstring) Tokens with indices set to ``-100`` are ignored + (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``. + """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.fnet( + input_ids, + token_type_ids=token_type_ids, + position_ids=position_ids, + inputs_embeds=inputs_embeds, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + sequence_output = outputs[0] + prediction_scores = self.cls(sequence_output) + + masked_lm_loss = None + if labels is not None: + loss_fct = CrossEntropyLoss() # -100 index = padding token + masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) + + if not return_dict: + output = (prediction_scores,) + outputs[2:] + return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output + + return MaskedLMOutput(loss=masked_lm_loss, logits=prediction_scores, hidden_states=outputs.hidden_states) + + +@add_start_docstrings( + """FNet Model with a `next sentence prediction (classification)` head on top. """, + FNET_START_DOCSTRING, +) +class FNetForNextSentencePrediction(FNetPreTrainedModel): + def __init__(self, config): + super().__init__(config) + + self.fnet = FNetModel(config) + self.cls = FNetOnlyNSPHead(config) + + self.init_weights() + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @replace_return_docstrings(output_type=NextSentencePredictorOutput, config_class=_CONFIG_FOR_DOC) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + labels=None, + output_hidden_states=None, + return_dict=None, + **kwargs, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair + (see ``input_ids`` docstring). Indices should be in ``[0, 1]``: + + - 0 indicates sequence B is a continuation of sequence A, + - 1 indicates sequence B is a random sequence. 
+ + Returns: + + Example:: + + + >>> from transformers import FNetTokenizer, FNetForNextSentencePrediction + >>> import torch + >>> tokenizer = FNetTokenizer.from_pretrained('google/fnet-base') + >>> model = FNetForNextSentencePrediction.from_pretrained('google/fnet-base') + >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." + >>> next_sentence = "The sky is blue due to the shorter wavelength of blue light." + >>> encoding = tokenizer(prompt, next_sentence, return_tensors='pt') + >>> outputs = model(**encoding, labels=torch.LongTensor([1])) + >>> logits = outputs.logits + >>> assert logits[0, 0] < logits[0, 1] # next sentence was random + """ + + if "next_sentence_label" in kwargs: + warnings.warn( + "The `next_sentence_label` argument is deprecated and will be removed in a future version, use `labels` instead.", + FutureWarning, + ) + labels = kwargs.pop("next_sentence_label") + + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.fnet( + input_ids, + token_type_ids=token_type_ids, + position_ids=position_ids, + inputs_embeds=inputs_embeds, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + pooled_output = outputs[1] + + seq_relationship_scores = self.cls(pooled_output) + + next_sentence_loss = None + if labels is not None: + loss_fct = CrossEntropyLoss() + next_sentence_loss = loss_fct(seq_relationship_scores.view(-1, 2), labels.view(-1)) + + if not return_dict: + output = (seq_relationship_scores,) + outputs[2:] + return ((next_sentence_loss,) + output) if next_sentence_loss is not None else output + + return NextSentencePredictorOutput( + loss=next_sentence_loss, + logits=seq_relationship_scores, + hidden_states=outputs.hidden_states, + ) + + +@add_start_docstrings( + """ + FNet Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled + output) e.g. for GLUE tasks. + """, + FNET_START_DOCSTRING, +) +class FNetForSequenceClassification(FNetPreTrainedModel): + def __init__(self, config): + super().__init__(config) + self.num_labels = config.num_labels + self.fnet = FNetModel(config) + + self.dropout = nn.Dropout(config.hidden_dropout_prob) + self.classifier = nn.Linear(config.hidden_size, config.num_labels) + + self.init_weights() + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + tokenizer_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=SequenceClassifierOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + labels=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ..., + config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss), + If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy). 
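+
+        Example (an illustrative sketch; assumes the ``google/fnet-base`` checkpoint and a single dummy label)::
+
+            >>> from transformers import FNetTokenizer, FNetForSequenceClassification
+            >>> import torch
+            >>> tokenizer = FNetTokenizer.from_pretrained('google/fnet-base')
+            >>> model = FNetForSequenceClassification.from_pretrained('google/fnet-base', num_labels=2)
+            >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
+            >>> outputs = model(**inputs, labels=torch.tensor([1]))
+            >>> loss, logits = outputs.loss, outputs.logits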
+ """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.fnet( + input_ids, + token_type_ids=token_type_ids, + position_ids=position_ids, + inputs_embeds=inputs_embeds, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + pooled_output = outputs[1] + pooled_output = self.dropout(pooled_output) + logits = self.classifier(pooled_output) + + loss = None + if labels is not None: + if self.num_labels == 1: + # We are doing regression + loss_fct = MSELoss() + loss = loss_fct(logits.view(-1), labels.view(-1)) + else: + loss_fct = CrossEntropyLoss() + loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) + + if not return_dict: + output = (logits,) + outputs[2:] + return ((loss,) + output) if loss is not None else output + + return SequenceClassifierOutput(loss=loss, logits=logits, hidden_states=outputs.hidden_states) + + +@add_start_docstrings( + """ + FNet Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a + softmax) e.g. for RocStories/SWAG tasks. + """, + FNET_START_DOCSTRING, +) +class FNetForMultipleChoice(FNetPreTrainedModel): + def __init__(self, config): + super().__init__(config) + + self.fnet = FNetModel(config) + self.dropout = nn.Dropout(config.hidden_dropout_prob) + self.classifier = nn.Linear(config.hidden_size, 1) + + self.init_weights() + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) + @add_code_sample_docstrings( + tokenizer_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=MultipleChoiceModelOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + labels=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for computing the multiple choice classification loss. Indices should be in ``[0, ..., + num_choices-1]`` where :obj:`num_choices` is the size of the second dimension of the input tensors. 
(See + :obj:`input_ids` above) + """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1] + + input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None + token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None + position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None + inputs_embeds = ( + inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1)) + if inputs_embeds is not None + else None + ) + + outputs = self.fnet( + input_ids, + token_type_ids=token_type_ids, + position_ids=position_ids, + inputs_embeds=inputs_embeds, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + pooled_output = outputs[1] + + pooled_output = self.dropout(pooled_output) + logits = self.classifier(pooled_output) + reshaped_logits = logits.view(-1, num_choices) + + loss = None + if labels is not None: + loss_fct = CrossEntropyLoss() + loss = loss_fct(reshaped_logits, labels) + + if not return_dict: + output = (reshaped_logits,) + outputs[2:] + return ((loss,) + output) if loss is not None else output + + return MultipleChoiceModelOutput(loss=loss, logits=reshaped_logits, hidden_states=outputs.hidden_states) + + +@add_start_docstrings( + """ + FNet Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for + Named-Entity-Recognition (NER) tasks. + """, + FNET_START_DOCSTRING, +) +class FNetForTokenClassification(FNetPreTrainedModel): + def __init__(self, config): + super().__init__(config) + self.num_labels = config.num_labels + + self.fnet = FNetModel(config) + + self.dropout = nn.Dropout(config.hidden_dropout_prob) + self.classifier = nn.Linear(config.hidden_size, config.num_labels) + + self.init_weights() + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + tokenizer_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=TokenClassifierOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + labels=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): + Labels for computing the token classification loss. Indices should be in ``[0, ..., config.num_labels - + 1]``. 
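+
+        Example (an illustrative sketch; the all-zero labels are dummy values)::
+
+            >>> from transformers import FNetTokenizer, FNetForTokenClassification
+            >>> import torch
+            >>> tokenizer = FNetTokenizer.from_pretrained('google/fnet-base')
+            >>> model = FNetForTokenClassification.from_pretrained('google/fnet-base')
+            >>> inputs = tokenizer("HuggingFace is based in NYC", return_tensors="pt")
+            >>> labels = torch.zeros_like(inputs["input_ids"])
+            >>> outputs = model(**inputs, labels=labels)
+            >>> logits = outputs.logits  # shape (batch_size, sequence_length, config.num_labels)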
+ """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.fnet( + input_ids, + token_type_ids=token_type_ids, + position_ids=position_ids, + inputs_embeds=inputs_embeds, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + sequence_output = outputs[0] + + sequence_output = self.dropout(sequence_output) + logits = self.classifier(sequence_output) + + loss = None + if labels is not None: + loss_fct = CrossEntropyLoss() + # Only keep active parts of the loss + loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) + + if not return_dict: + output = (logits,) + outputs[2:] + return ((loss,) + output) if loss is not None else output + + return TokenClassifierOutput(loss=loss, logits=logits, hidden_states=outputs.hidden_states) + + +@add_start_docstrings( + """ + FNet Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear + layers on top of the hidden-states output to compute `span start logits` and `span end logits`). + """, + FNET_START_DOCSTRING, +) +class FNetForQuestionAnswering(FNetPreTrainedModel): + def __init__(self, config): + super().__init__(config) + + self.num_labels = config.num_labels + + self.fnet = FNetModel(config) + self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels) + + self.init_weights() + + @add_start_docstrings_to_model_forward(FNET_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + tokenizer_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=QuestionAnsweringModelOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + start_positions=None, + end_positions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for position (index) of the start of the labelled span for computing the token classification loss. + Positions are clamped to the length of the sequence (:obj:`sequence_length`). Position outside of the + sequence are not taken into account for computing the loss. + end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for position (index) of the end of the labelled span for computing the token classification loss. + Positions are clamped to the length of the sequence (:obj:`sequence_length`). Position outside of the + sequence are not taken into account for computing the loss. 
+ """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.fnet( + input_ids, + token_type_ids=token_type_ids, + position_ids=position_ids, + inputs_embeds=inputs_embeds, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + sequence_output = outputs[0] + + logits = self.qa_outputs(sequence_output) + start_logits, end_logits = logits.split(1, dim=-1) + start_logits = start_logits.squeeze(-1).contiguous() + end_logits = end_logits.squeeze(-1).contiguous() + + total_loss = None + if start_positions is not None and end_positions is not None: + # If we are on multi-GPU, split add a dimension + if len(start_positions.size()) > 1: + start_positions = start_positions.squeeze(-1) + if len(end_positions.size()) > 1: + end_positions = end_positions.squeeze(-1) + # sometimes the start/end positions are outside our model inputs, we ignore these terms + ignored_index = start_logits.size(1) + start_positions = start_positions.clamp(0, ignored_index) + end_positions = end_positions.clamp(0, ignored_index) + + loss_fct = CrossEntropyLoss(ignore_index=ignored_index) + start_loss = loss_fct(start_logits, start_positions) + end_loss = loss_fct(end_logits, end_positions) + total_loss = (start_loss + end_loss) / 2 + + if not return_dict: + output = (start_logits, end_logits) + outputs[2:] + return ((total_loss,) + output) if total_loss is not None else output + + return QuestionAnsweringModelOutput( + loss=total_loss, start_logits=start_logits, end_logits=end_logits, hidden_states=outputs.hidden_states + ) diff --git a/src/transformers/models/fnet/tokenization_fnet.py b/src/transformers/models/fnet/tokenization_fnet.py new file mode 100644 index 00000000000..1e1bfc050a8 --- /dev/null +++ b/src/transformers/models/fnet/tokenization_fnet.py @@ -0,0 +1,304 @@ +# coding=utf-8 +# Copyright 2021 Google Research, Google AI, Google Brain and the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" Tokenization classes for FNet model.""" + +import os +import unicodedata +from shutil import copyfile +from typing import Any, Dict, List, Optional, Tuple + +import sentencepiece as spm + +from ...tokenization_utils import AddedToken, PreTrainedTokenizer +from ...utils import logging + + +logger = logging.get_logger(__name__) +VOCAB_FILES_NAMES = {"vocab_file": "spiece.model"} + +PRETRAINED_VOCAB_FILES_MAP = { + "vocab_file": { + "google/fnet-base": "https://huggingface.co/google/fnet-base/resolve/main/spiece.model", + "google/fnet-large": "https://huggingface.co/google/fnet-large/resolve/main/spiece.model", + }, +} + +PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = { + "google/fnet-base": 512, + "google/fnet-large": 512, +} + +SPIECE_UNDERLINE = "▁" + + +class FNetTokenizer(PreTrainedTokenizer): + """ + Construct an FNet tokenizer. Adapted from :class:`~transformers.AlbertTokenizer`. Based on `SentencePiece + `__. This tokenizer inherits from + :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods. 
Users should refer to this
+    superclass for more information regarding those methods.
+
+    Args:
+        vocab_file (:obj:`str`):
+            `SentencePiece <https://github.com/google/sentencepiece>`__ file (generally has a `.spm` extension) that
+            contains the vocabulary necessary to instantiate a tokenizer.
+        do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Whether or not to lowercase the input when tokenizing.
+        remove_space (:obj:`bool`, `optional`, defaults to :obj:`True`):
+            Whether or not to strip the text when tokenizing (removing excess spaces before and after the string).
+        keep_accents (:obj:`bool`, `optional`, defaults to :obj:`True`):
+            Whether or not to keep accents when tokenizing.
+        unk_token (:obj:`str`, `optional`, defaults to :obj:`"<unk>"`):
+            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
+            token instead.
+        sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
+            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
+            sequence classification or for a text and a question for question answering. It is also used as the last
+            token of a sequence built with special tokens.
+        pad_token (:obj:`str`, `optional`, defaults to :obj:`"<pad>"`):
+            The token used for padding, for example when batching sequences of different lengths.
+        cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
+            The classifier token which is used when doing sequence classification (classification of the whole sequence
+            instead of per-token classification). It is the first token of the sequence when built with special tokens.
+        mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
+            The token used for masking values. This is the token used when training this model with masked language
+            modeling. This is the token which the model will try to predict.
+        sp_model_kwargs (:obj:`dict`, `optional`):
+            Will be passed to the ``SentencePieceProcessor.__init__()`` method. The `Python wrapper for SentencePiece
+            <https://github.com/google/sentencepiece/tree/master/python>`__ can be used, among other things, to set:
+
+            - ``enable_sampling``: Enable subword regularization.
+            - ``nbest_size``: Sampling parameters for unigram. Invalid for BPE-Dropout.
+
+              - ``nbest_size = {0,1}``: No sampling is performed.
+              - ``nbest_size > 1``: samples from the nbest_size results.
+              - ``nbest_size < 0``: assuming that nbest_size is infinite, samples from all hypotheses (lattice)
+                using the forward-filtering-and-backward-sampling algorithm.
+
+            - ``alpha``: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
+              BPE-dropout.
+
+    Attributes:
+        sp_model (:obj:`SentencePieceProcessor`):
+            The `SentencePiece` processor that is used for every conversion (string, tokens and IDs).
+    """
+
+    vocab_files_names = VOCAB_FILES_NAMES
+    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
+    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
+    model_input_names = ["input_ids", "token_type_ids"]
+
+    def __init__(
+        self,
+        vocab_file,
+        do_lower_case=False,
+        remove_space=True,
+        keep_accents=True,
+        unk_token="<unk>",
+        sep_token="[SEP]",
+        pad_token="<pad>",
+        cls_token="[CLS]",
+        mask_token="[MASK]",
+        sp_model_kwargs: Optional[Dict[str, Any]] = None,
+        **kwargs
+    ) -> None:
+        # Mask token behaves like a normal word, i.e.
include the space before it + mask_token = AddedToken(mask_token, lstrip=True, rstrip=False) if isinstance(mask_token, str) else mask_token + + self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs + + super().__init__( + do_lower_case=do_lower_case, + remove_space=remove_space, + keep_accents=keep_accents, + unk_token=unk_token, + sep_token=sep_token, + pad_token=pad_token, + cls_token=cls_token, + mask_token=mask_token, + sp_model_kwargs=self.sp_model_kwargs, + **kwargs, + ) + + self.do_lower_case = do_lower_case + self.remove_space = remove_space + self.keep_accents = keep_accents + self.vocab_file = vocab_file + + self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs) + self.sp_model.Load(vocab_file) + + @property + def vocab_size(self): + return len(self.sp_model) + + def get_vocab(self): + vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)} + vocab.update(self.added_tokens_encoder) + return vocab + + def __getstate__(self): + state = self.__dict__.copy() + state["sp_model"] = None + return state + + def __setstate__(self, d): + self.__dict__ = d + + # for backward compatibility + if not hasattr(self, "sp_model_kwargs"): + self.sp_model_kwargs = {} + + self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs) + self.sp_model.Load(self.vocab_file) + + def preprocess_text(self, inputs): + if self.remove_space: + outputs = " ".join(inputs.strip().split()) + else: + outputs = inputs + outputs = outputs.replace("``", '"').replace("''", '"') + + if not self.keep_accents: + outputs = unicodedata.normalize("NFKD", outputs) + outputs = "".join([c for c in outputs if not unicodedata.combining(c)]) + if self.do_lower_case: + outputs = outputs.lower() + + return outputs + + def _tokenize(self, text: str) -> List[str]: + """Tokenize a string.""" + text = self.preprocess_text(text) + pieces = self.sp_model.encode(text, out_type=str) + new_pieces = [] + for piece in pieces: + if len(piece) > 1 and piece[-1] == str(",") and piece[-2].isdigit(): + cur_pieces = self.sp_model.EncodeAsPieces(piece[:-1].replace(SPIECE_UNDERLINE, "")) + if piece[0] != SPIECE_UNDERLINE and cur_pieces[0][0] == SPIECE_UNDERLINE: + if len(cur_pieces[0]) == 1: + cur_pieces = cur_pieces[1:] + else: + cur_pieces[0] = cur_pieces[0][1:] + cur_pieces.append(piece[-1]) + new_pieces.extend(cur_pieces) + else: + new_pieces.append(piece) + + return new_pieces + + def _convert_token_to_id(self, token): + """Converts a token (str) in an id using the vocab.""" + return self.sp_model.PieceToId(token) + + def _convert_id_to_token(self, index): + """Converts an index (integer) in a token (str) using the vocab.""" + return self.sp_model.IdToPiece(index) + + def convert_tokens_to_string(self, tokens): + return self.sp_model.decode(tokens) + + def build_inputs_with_special_tokens( + self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None + ) -> List[int]: + """ + Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and + adding special tokens. An FNet sequence has the following format: + + - single sequence: ``[CLS] X [SEP]`` + - pair of sequences: ``[CLS] A [SEP] B [SEP]`` + + Args: + token_ids_0 (:obj:`List[int]`): + List of IDs to which the special tokens will be added. + token_ids_1 (:obj:`List[int]`, `optional`): + Optional second list of IDs for sequence pairs. + + Returns: + :obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens. 
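+
+        Example (an illustrative sketch; the dummy IDs ``5`` and ``6`` stand in for real vocabulary IDs)::
+
+            >>> tokenizer = FNetTokenizer.from_pretrained('google/fnet-base')
+            >>> ids = tokenizer.build_inputs_with_special_tokens([5, 6])
+            >>> ids == [tokenizer.cls_token_id, 5, 6, tokenizer.sep_token_id]
+            True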
+ """ + sep = [self.sep_token_id] + cls = [self.cls_token_id] + if token_ids_1 is None: + return cls + token_ids_0 + sep + return cls + token_ids_0 + sep + token_ids_1 + sep + + def get_special_tokens_mask( + self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False + ) -> List[int]: + """ + Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding + special tokens using the tokenizer ``prepare_for_model`` method. + + Args: + token_ids_0 (:obj:`List[int]`): + List of IDs. + token_ids_1 (:obj:`List[int]`, `optional`): + Optional second list of IDs for sequence pairs. + already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not the token list is already formatted with special tokens for the model. + + Returns: + :obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token. + """ + + if already_has_special_tokens: + return super().get_special_tokens_mask( + token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True + ) + + if token_ids_1 is not None: + return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1] + return [1] + ([0] * len(token_ids_0)) + [1] + + def create_token_type_ids_from_sequences( + self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None + ) -> List[int]: + """ + Create a mask from the two sequences passed to be used in a sequence-pair classification task. An FNet sequence + pair mask has the following format: :: + + 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 | first sequence | second sequence | + + If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s). + + Args: + token_ids_0 (:obj:`List[int]`): + List of IDs. + token_ids_1 (:obj:`List[int]`, `optional`): + Optional second list of IDs for sequence pairs. + + Returns: + :obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given + sequence(s). + """ + sep = [self.sep_token_id] + cls = [self.cls_token_id] + + if token_ids_1 is None: + return len(cls + token_ids_0 + sep) * [0] + return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1] + + def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]: + if not os.path.isdir(save_directory): + logger.error(f"Vocabulary path ({save_directory}) should be a directory") + return + out_vocab_file = os.path.join( + save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"] + ) + + if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file): + copyfile(self.vocab_file, out_vocab_file) + + return (out_vocab_file,) diff --git a/src/transformers/models/fnet/tokenization_fnet_fast.py b/src/transformers/models/fnet/tokenization_fnet_fast.py new file mode 100644 index 00000000000..78640b0f26e --- /dev/null +++ b/src/transformers/models/fnet/tokenization_fnet_fast.py @@ -0,0 +1,200 @@ +# coding=utf-8 +# Copyright 2021 Google AI, Google Brain and the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" Tokenization classes for FNet model."""
+
+
+import os
+from shutil import copyfile
+from typing import List, Optional, Tuple
+
+from ...file_utils import is_sentencepiece_available
+from ...tokenization_utils import AddedToken
+from ...tokenization_utils_fast import PreTrainedTokenizerFast
+from ...utils import logging
+
+
+if is_sentencepiece_available():
+    from .tokenization_fnet import FNetTokenizer
+else:
+    FNetTokenizer = None
+
+logger = logging.get_logger(__name__)
+VOCAB_FILES_NAMES = {"vocab_file": "spiece.model", "tokenizer_file": "tokenizer.json"}
+
+PRETRAINED_VOCAB_FILES_MAP = {
+    "vocab_file": {
+        "google/fnet-base": "https://huggingface.co/google/fnet-base/resolve/main/spiece.model",
+        "google/fnet-large": "https://huggingface.co/google/fnet-large/resolve/main/spiece.model",
+    },
+    "tokenizer_file": {
+        "google/fnet-base": "https://huggingface.co/google/fnet-base/resolve/main/tokenizer.json",
+        "google/fnet-large": "https://huggingface.co/google/fnet-large/resolve/main/tokenizer.json",
+    },
+}
+
+PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
+    "google/fnet-base": 512,
+    "google/fnet-large": 512,
+}
+
+SPIECE_UNDERLINE = "▁"
+
+
+class FNetTokenizerFast(PreTrainedTokenizerFast):
+    """
+    Construct a "fast" FNetTokenizer (backed by HuggingFace's `tokenizers` library). Adapted from
+    :class:`~transformers.AlbertTokenizerFast`. Based on `Unigram
+    <https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=unigram#models>`__. This tokenizer
+    inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main methods. Users should
+    refer to this superclass for more information regarding those methods.
+
+    Args:
+        vocab_file (:obj:`str`):
+            `SentencePiece <https://github.com/google/sentencepiece>`__ file (generally has a `.spm` extension) that
+            contains the vocabulary necessary to instantiate a tokenizer.
+        do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Whether or not to lowercase the input when tokenizing.
+        remove_space (:obj:`bool`, `optional`, defaults to :obj:`True`):
+            Whether or not to strip the text when tokenizing (removing excess spaces before and after the string).
+        keep_accents (:obj:`bool`, `optional`, defaults to :obj:`True`):
+            Whether or not to keep accents when tokenizing.
+        unk_token (:obj:`str`, `optional`, defaults to :obj:`"<unk>"`):
+            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
+            token instead.
+        sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
+            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
+            sequence classification or for a text and a question for question answering. It is also used as the last
+            token of a sequence built with special tokens.
+        pad_token (:obj:`str`, `optional`, defaults to :obj:`"<pad>"`):
+            The token used for padding, for example when batching sequences of different lengths.
+        cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
+            The classifier token which is used when doing sequence classification (classification of the whole sequence
+            instead of per-token classification). It is the first token of the sequence when built with special tokens.
+        mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
+            The token used for masking values. This is the token used when training this model with masked language
+            modeling. This is the token which the model will try to predict.
+    """
+
+    vocab_files_names = VOCAB_FILES_NAMES
+    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
+    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
+    model_input_names = ["input_ids", "token_type_ids"]
+    slow_tokenizer_class = FNetTokenizer
+
+    def __init__(
+        self,
+        vocab_file=None,
+        tokenizer_file=None,
+        do_lower_case=False,
+        remove_space=True,
+        keep_accents=True,
+        unk_token="<unk>",
+        sep_token="[SEP]",
+        pad_token="<pad>",
+        cls_token="[CLS]",
+        mask_token="[MASK]",
+        **kwargs
+    ):
+        # Mask token behaves like a normal word, i.e. include the space before it
+        mask_token = AddedToken(mask_token, lstrip=True, rstrip=False) if isinstance(mask_token, str) else mask_token
+
+        super().__init__(
+            vocab_file,
+            tokenizer_file=tokenizer_file,
+            do_lower_case=do_lower_case,
+            remove_space=remove_space,
+            keep_accents=keep_accents,
+            unk_token=unk_token,
+            sep_token=sep_token,
+            pad_token=pad_token,
+            cls_token=cls_token,
+            mask_token=mask_token,
+            **kwargs,
+        )
+
+        self.do_lower_case = do_lower_case
+        self.remove_space = remove_space
+        self.keep_accents = keep_accents
+        self.vocab_file = vocab_file
+        self.can_save_slow_tokenizer = bool(self.vocab_file)
+
+    def build_inputs_with_special_tokens(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
+        adding special tokens. An FNet sequence has the following format:
+
+        - single sequence: ``[CLS] X [SEP]``
+        - pair of sequences: ``[CLS] A [SEP] B [SEP]``
+
+        Args:
+            token_ids_0 (:obj:`List[int]`):
+                List of IDs to which the special tokens will be added.
+            token_ids_1 (:obj:`List[int]`, `optional`):
+                Optional second list of IDs for sequence pairs.
+
+        Returns:
+            :obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
+        """
+        sep = [self.sep_token_id]
+        cls = [self.cls_token_id]
+        if token_ids_1 is None:
+            return cls + token_ids_0 + sep
+        return cls + token_ids_0 + sep + token_ids_1 + sep
+
+    def create_token_type_ids_from_sequences(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Creates a mask from the two sequences passed to be used in a sequence-pair classification task. An FNet
+        sequence pair mask has the following format:
+
+        ::
+
+            0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
+            | first sequence    | second sequence |
+
+        If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).
+
+        Args:
+            token_ids_0 (:obj:`List[int]`):
+                List of ids.
+            token_ids_1 (:obj:`List[int]`, `optional`):
+                Optional second list of IDs for sequence pairs.
+
+        Returns:
+            :obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
+            sequence(s).
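+
+        Example (an illustrative sketch; the dummy IDs stand in for real vocabulary IDs)::
+
+            >>> tokenizer = FNetTokenizerFast.from_pretrained('google/fnet-base')
+            >>> tokenizer.create_token_type_ids_from_sequences([5, 6], [7, 8])
+            [0, 0, 0, 0, 1, 1, 1]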
+ """ + sep = [self.sep_token_id] + cls = [self.cls_token_id] + + if token_ids_1 is None: + return len(cls + token_ids_0 + sep) * [0] + return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1] + + def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]: + if not os.path.isdir(save_directory): + logger.error(f"Vocabulary path ({save_directory}) should be a directory") + return + out_vocab_file = os.path.join( + save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"] + ) + + if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file): + copyfile(self.vocab_file, out_vocab_file) + + return (out_vocab_file,) diff --git a/src/transformers/pipelines/question_answering.py b/src/transformers/pipelines/question_answering.py index 0da5e54d6bc..cf8c550e162 100644 --- a/src/transformers/pipelines/question_answering.py +++ b/src/transformers/pipelines/question_answering.py @@ -301,11 +301,18 @@ class QuestionAnsweringPipeline(Pipeline): features = [] for span_idx in range(num_spans): + input_ids_span_idx = encoded_inputs["input_ids"][span_idx] + attention_mask_span_idx = ( + encoded_inputs["attention_mask"][span_idx] if "attention_mask" in encoded_inputs else None + ) + token_type_ids_span_idx = ( + encoded_inputs["token_type_ids"][span_idx] if "token_type_ids" in encoded_inputs else None + ) features.append( SquadFeatures( - input_ids=encoded_inputs["input_ids"][span_idx], - attention_mask=encoded_inputs["attention_mask"][span_idx], - token_type_ids=encoded_inputs["token_type_ids"][span_idx], + input_ids=input_ids_span_idx, + attention_mask=attention_mask_span_idx, + token_type_ids=token_type_ids_span_idx, p_mask=p_mask[span_idx].tolist(), encoding=encoded_inputs[span_idx], # We don't use the rest of the values - and actually @@ -358,7 +365,10 @@ class QuestionAnsweringPipeline(Pipeline): feature = model_outputs["features"][0] example = model_outputs["example"] # Ensure padded tokens & question tokens cannot belong to the set of candidate answers. 
- undesired_tokens = np.abs(np.array(feature.p_mask) - 1) & feature.attention_mask + undesired_tokens = np.abs(np.array(feature.p_mask) - 1) + + if feature.attention_mask is not None: + undesired_tokens = undesired_tokens & feature.attention_mask # Generate mask undesired_tokens_mask = undesired_tokens == 0.0 diff --git a/src/transformers/utils/dummy_pt_objects.py b/src/transformers/utils/dummy_pt_objects.py index 4eb80cec3f0..59dcd850f72 100644 --- a/src/transformers/utils/dummy_pt_objects.py +++ b/src/transformers/utils/dummy_pt_objects.py @@ -1614,6 +1614,87 @@ class FlaubertWithLMHeadModel: requires_backends(cls, ["torch"]) +FNET_PRETRAINED_MODEL_ARCHIVE_LIST = None + + +class FNetForMaskedLM: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["torch"]) + + +class FNetForMultipleChoice: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["torch"]) + + +class FNetForNextSentencePrediction: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + +class FNetForPreTraining: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + +class FNetForQuestionAnswering: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["torch"]) + + +class FNetForSequenceClassification: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["torch"]) + + +class FNetForTokenClassification: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["torch"]) + + +class FNetLayer: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + +class FNetModel: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["torch"]) + + +class FNetPreTrainedModel: + def __init__(self, *args, **kwargs): + requires_backends(self, ["torch"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["torch"]) + + class FSMTForConditionalGeneration: def __init__(self, *args, **kwargs): requires_backends(self, ["torch"]) diff --git a/src/transformers/utils/dummy_tokenizers_objects.py b/src/transformers/utils/dummy_tokenizers_objects.py index 382f0314bbf..b2d4feb0d15 100644 --- a/src/transformers/utils/dummy_tokenizers_objects.py +++ b/src/transformers/utils/dummy_tokenizers_objects.py @@ -128,6 +128,15 @@ class ElectraTokenizerFast: requires_backends(cls, ["tokenizers"]) +class FNetTokenizerFast: + def __init__(self, *args, **kwargs): + requires_backends(self, ["tokenizers"]) + + @classmethod + def from_pretrained(cls, *args, **kwargs): + requires_backends(cls, ["tokenizers"]) + + class FunnelTokenizerFast: def __init__(self, *args, **kwargs): requires_backends(self, ["tokenizers"]) diff --git a/tests/test_modeling_fnet.py b/tests/test_modeling_fnet.py new file mode 100644 index 00000000000..eaa61f779f0 --- /dev/null +++ b/tests/test_modeling_fnet.py @@ -0,0 +1,559 @@ +# coding=utf-8 +# Copyright 2021 The HuggingFace Inc. team. All rights reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" Testing suite for the PyTorch FNet model. """ + + +import unittest +from typing import Dict, List, Tuple + +from transformers import FNetConfig, is_torch_available +from transformers.models.auto import get_values +from transformers.testing_utils import require_tokenizers, require_torch, slow, torch_device + +from .test_configuration_common import ConfigTester +from .test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor + + +if is_torch_available(): + import torch + + from transformers import ( + MODEL_FOR_PRETRAINING_MAPPING, + FNetForMaskedLM, + FNetForMultipleChoice, + FNetForNextSentencePrediction, + FNetForPreTraining, + FNetForQuestionAnswering, + FNetForSequenceClassification, + FNetForTokenClassification, + FNetModel, + FNetTokenizerFast, + ) + from transformers.models.fnet.modeling_fnet import ( + FNET_PRETRAINED_MODEL_ARCHIVE_LIST, + FNetBasicFourierTransform, + is_scipy_available, + ) + + +# Override ConfigTester +class FNetConfigTester(ConfigTester): + def create_and_test_config_common_properties(self): + config = self.config_class(**self.inputs_dict) + if self.has_text_modality: + self.parent.assertTrue(hasattr(config, "vocab_size")) + self.parent.assertTrue(hasattr(config, "hidden_size")) + self.parent.assertTrue(hasattr(config, "num_hidden_layers")) + + +class FNetModelTester: + def __init__( + self, + parent, + batch_size=13, + seq_length=7, + is_training=True, + use_token_type_ids=True, + use_labels=True, + vocab_size=99, + hidden_size=32, + num_hidden_layers=5, + intermediate_size=37, + hidden_act="gelu", + hidden_dropout_prob=0.1, + max_position_embeddings=512, + type_vocab_size=16, + type_sequence_label_size=2, + initializer_range=0.02, + num_labels=3, + num_choices=4, + scope=None, + ): + self.parent = parent + self.batch_size = batch_size + self.seq_length = seq_length + self.is_training = is_training + self.use_token_type_ids = use_token_type_ids + self.use_labels = use_labels + self.vocab_size = vocab_size + self.hidden_size = hidden_size + self.num_hidden_layers = num_hidden_layers + self.intermediate_size = intermediate_size + self.hidden_act = hidden_act + self.hidden_dropout_prob = hidden_dropout_prob + self.max_position_embeddings = max_position_embeddings + self.type_vocab_size = type_vocab_size + self.type_sequence_label_size = type_sequence_label_size + self.initializer_range = initializer_range + self.num_labels = num_labels + self.num_choices = num_choices + self.scope = scope + + def prepare_config_and_inputs(self): + input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) + + token_type_ids = None + if self.use_token_type_ids: + token_type_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size) + + sequence_labels = None + token_labels = None + choice_labels = None + if self.use_labels: + sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size) + token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels) + 
choice_labels = ids_tensor([self.batch_size], self.num_choices) + + config = self.get_config() + + return config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels + + def get_config(self): + return FNetConfig( + vocab_size=self.vocab_size, + hidden_size=self.hidden_size, + num_hidden_layers=self.num_hidden_layers, + intermediate_size=self.intermediate_size, + hidden_act=self.hidden_act, + hidden_dropout_prob=self.hidden_dropout_prob, + max_position_embeddings=self.max_position_embeddings, + type_vocab_size=self.type_vocab_size, + initializer_range=self.initializer_range, + tpu_short_seq_length=self.seq_length, + ) + + @require_torch + def create_and_check_fourier_transform(self, config): + hidden_states = floats_tensor([self.batch_size, self.seq_length, config.hidden_size]) + transform = FNetBasicFourierTransform(config) + fftn_output = transform(hidden_states) + + config.use_tpu_fourier_optimizations = True + if is_scipy_available(): + transform = FNetBasicFourierTransform(config) + dft_output = transform(hidden_states) + + config.max_position_embeddings = 4097 + transform = FNetBasicFourierTransform(config) + fft_output = transform(hidden_states) + + if is_scipy_available(): + self.parent.assertTrue(torch.allclose(fftn_output[0][0], dft_output[0][0], atol=1e-4)) + self.parent.assertTrue(torch.allclose(fft_output[0][0], dft_output[0][0], atol=1e-4)) + self.parent.assertTrue(torch.allclose(fftn_output[0][0], fft_output[0][0], atol=1e-4)) + + def create_and_check_model(self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels): + model = FNetModel(config=config) + model.to(torch_device) + model.eval() + result = model(input_ids, token_type_ids=token_type_ids) + result = model(input_ids) + self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size)) + + def create_and_check_for_pretraining( + self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels + ): + model = FNetForPreTraining(config=config) + model.to(torch_device) + model.eval() + result = model( + input_ids, + token_type_ids=token_type_ids, + labels=token_labels, + next_sentence_label=sequence_labels, + ) + self.parent.assertEqual(result.prediction_logits.shape, (self.batch_size, self.seq_length, self.vocab_size)) + self.parent.assertEqual(result.seq_relationship_logits.shape, (self.batch_size, 2)) + + def create_and_check_for_masked_lm( + self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels + ): + model = FNetForMaskedLM(config=config) + model.to(torch_device) + model.eval() + result = model(input_ids, token_type_ids=token_type_ids, labels=token_labels) + self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size)) + + def create_and_check_for_next_sentence_prediction( + self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels + ): + model = FNetForNextSentencePrediction(config=config) + model.to(torch_device) + model.eval() + result = model( + input_ids, + token_type_ids=token_type_ids, + next_sentence_label=sequence_labels, + ) + self.parent.assertEqual(result.logits.shape, (self.batch_size, 2)) + + def create_and_check_for_question_answering( + self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels + ): + model = FNetForQuestionAnswering(config=config) + model.to(torch_device) + model.eval() + result = model( + input_ids, + token_type_ids=token_type_ids, + 
start_positions=sequence_labels,
+            end_positions=sequence_labels,
+        )
+        self.parent.assertEqual(result.start_logits.shape, (self.batch_size, self.seq_length))
+        self.parent.assertEqual(result.end_logits.shape, (self.batch_size, self.seq_length))
+
+    def create_and_check_for_sequence_classification(
+        self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels
+    ):
+        config.num_labels = self.num_labels
+        model = FNetForSequenceClassification(config)
+        model.to(torch_device)
+        model.eval()
+        result = model(input_ids, token_type_ids=token_type_ids, labels=sequence_labels)
+        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_labels))
+
+    def create_and_check_for_token_classification(
+        self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels
+    ):
+        config.num_labels = self.num_labels
+        model = FNetForTokenClassification(config=config)
+        model.to(torch_device)
+        model.eval()
+        result = model(input_ids, token_type_ids=token_type_ids, labels=token_labels)
+        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.num_labels))
+
+    def create_and_check_for_multiple_choice(
+        self, config, input_ids, token_type_ids, sequence_labels, token_labels, choice_labels
+    ):
+        config.num_choices = self.num_choices
+        model = FNetForMultipleChoice(config=config)
+        model.to(torch_device)
+        model.eval()
+        multiple_choice_inputs_ids = input_ids.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
+        multiple_choice_token_type_ids = token_type_ids.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
+        result = model(
+            multiple_choice_inputs_ids,
+            token_type_ids=multiple_choice_token_type_ids,
+            labels=choice_labels,
+        )
+        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_choices))
+
+    def prepare_config_and_inputs_for_common(self):
+        config_and_inputs = self.prepare_config_and_inputs()
+        (
+            config,
+            input_ids,
+            token_type_ids,
+            sequence_labels,
+            token_labels,
+            choice_labels,
+        ) = config_and_inputs
+        inputs_dict = {"input_ids": input_ids, "token_type_ids": token_type_ids}
+        return config, inputs_dict
+
+
+@require_torch
+class FNetModelTest(ModelTesterMixin, unittest.TestCase):
+
+    all_model_classes = (
+        (
+            FNetModel,
+            FNetForPreTraining,
+            FNetForMaskedLM,
+            FNetForNextSentencePrediction,
+            FNetForMultipleChoice,
+            FNetForQuestionAnswering,
+            FNetForSequenceClassification,
+            FNetForTokenClassification,
+        )
+        if is_torch_available()
+        else ()
+    )
+
+    # Skip tests that do not apply to FNet
+    test_pruning = False
+    test_torchscript = False
+    test_head_masking = False
+
+    # special case for ForPreTraining model
+    def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
+        inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)
+
+        if return_labels:
+            if model_class in get_values(MODEL_FOR_PRETRAINING_MAPPING):
+                inputs_dict["labels"] = torch.zeros(
+                    (self.model_tester.batch_size, self.model_tester.seq_length), dtype=torch.long, device=torch_device
+                )
+                inputs_dict["next_sentence_label"] = torch.zeros(
+                    self.model_tester.batch_size, dtype=torch.long, device=torch_device
+                )
+        return inputs_dict
+
+    # Overridden tests
+    def test_attention_outputs(self):
+        # FNet replaces self-attention with Fourier transforms and returns no attention weights
+        pass
+
+    def test_model_outputs_equivalence(self):
+        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
+
+        def set_nan_tensor_to_zero(t):
+            t[t != t] = 0
+            return t
+
+        def check_equivalence(model, tuple_inputs, dict_inputs, additional_kwargs=None):
+            additional_kwargs = additional_kwargs if additional_kwargs is not None else {}
+            with torch.no_grad():
+                tuple_output = model(**tuple_inputs, return_dict=False, **additional_kwargs)
+                dict_output = model(**dict_inputs, return_dict=True, **additional_kwargs).to_tuple()
+
+                def recursive_check(tuple_object, dict_object):
+                    if isinstance(tuple_object, (List, Tuple)):
+                        for tuple_iterable_value, dict_iterable_value in zip(tuple_object, dict_object):
+                            recursive_check(tuple_iterable_value, dict_iterable_value)
+                    elif isinstance(tuple_object, Dict):
+                        for tuple_iterable_value, dict_iterable_value in zip(
+                            tuple_object.values(), dict_object.values()
+                        ):
+                            recursive_check(tuple_iterable_value, dict_iterable_value)
+                    elif tuple_object is None:
+                        return
+                    else:
+                        self.assertTrue(
+                            torch.allclose(
+                                set_nan_tensor_to_zero(tuple_object), set_nan_tensor_to_zero(dict_object), atol=1e-5
+                            ),
+                            msg=f"Tuple and dict output are not equal. Difference: {torch.max(torch.abs(tuple_object - dict_object))}. Tuple has `nan`: {torch.isnan(tuple_object).any()} and `inf`: {torch.isinf(tuple_object).any()}. Dict has `nan`: {torch.isnan(dict_object).any()} and `inf`: {torch.isinf(dict_object).any()}.",
+                        )
+
+                recursive_check(tuple_output, dict_output)
+
+        for model_class in self.all_model_classes:
+            model = model_class(config)
+            model.to(torch_device)
+            model.eval()
+
+            tuple_inputs = self._prepare_for_class(inputs_dict, model_class)
+            dict_inputs = self._prepare_for_class(inputs_dict, model_class)
+            check_equivalence(model, tuple_inputs, dict_inputs)
+
+            tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
+            dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
+            check_equivalence(model, tuple_inputs, dict_inputs)
+
+            # tuple_inputs = self._prepare_for_class(inputs_dict, model_class)
+            # dict_inputs = self._prepare_for_class(inputs_dict, model_class)
+            # check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True})
+
+            tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
+            dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
+            check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True})
+
+    def test_retain_grad_hidden_states_attentions(self):
+        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
+        config.output_hidden_states = True
+        config.output_attentions = True
+
+        # no need to test all models as different heads yield the same functionality
+        model_class = self.all_model_classes[0]
+        model = model_class(config)
+        model.to(torch_device)
+
+        inputs = self._prepare_for_class(inputs_dict, model_class)
+
+        outputs = model(**inputs)
+
+        output = outputs[0]
+
+        hidden_states = outputs.hidden_states[0]
+
+        hidden_states.retain_grad()
+
+        output.flatten()[0].backward(retain_graph=True)
+
+        self.assertIsNotNone(hidden_states.grad)
+
+    def setUp(self):
+        self.model_tester = FNetModelTester(self)
+        self.config_tester = FNetConfigTester(self, config_class=FNetConfig, hidden_size=37)
+
+    def test_config(self):
+        self.config_tester.run_common_tests()
+
+    def test_model(self):
+        config_and_inputs = self.model_tester.prepare_config_and_inputs()
+        self.model_tester.create_and_check_model(*config_and_inputs)
+
+    def test_for_pretraining(self):
+        config_and_inputs = self.model_tester.prepare_config_and_inputs()
+        self.model_tester.create_and_check_for_pretraining(*config_and_inputs)
+
+    def test_for_masked_lm(self):
+        config_and_inputs = self.model_tester.prepare_config_and_inputs()
+
self.model_tester.create_and_check_for_masked_lm(*config_and_inputs) + + def test_for_multiple_choice(self): + config_and_inputs = self.model_tester.prepare_config_and_inputs() + self.model_tester.create_and_check_for_multiple_choice(*config_and_inputs) + + def test_for_question_answering(self): + config_and_inputs = self.model_tester.prepare_config_and_inputs() + self.model_tester.create_and_check_for_question_answering(*config_and_inputs) + + def test_for_sequence_classification(self): + config_and_inputs = self.model_tester.prepare_config_and_inputs() + self.model_tester.create_and_check_for_sequence_classification(*config_and_inputs) + + def test_for_token_classification(self): + config_and_inputs = self.model_tester.prepare_config_and_inputs() + self.model_tester.create_and_check_for_token_classification(*config_and_inputs) + + @slow + def test_model_from_pretrained(self): + for model_name in FNET_PRETRAINED_MODEL_ARCHIVE_LIST[:1]: + model = FNetModel.from_pretrained(model_name) + self.assertIsNotNone(model) + + +@require_torch +class FNetModelIntegrationTest(unittest.TestCase): + @slow + def test_inference_for_masked_lm(self): + """ + For comparison: + 1. Modify the pre-training model `__call__` to skip computing metrics and return masked_lm_output like so: + ``` + ... + sequence_output, pooled_output = EncoderModel( + self.config, random_seed=self.random_seed, name="encoder")( + input_ids, input_mask, type_ids, deterministic=deterministic) + + masked_lm_output = nn.Dense( + self.config.d_emb, + kernel_init=default_kernel_init, + name="predictions_dense")( + sequence_output) + masked_lm_output = nn.gelu(masked_lm_output) + masked_lm_output = nn.LayerNorm( + epsilon=LAYER_NORM_EPSILON, name="predictions_layer_norm")( + masked_lm_output) + masked_lm_logits = layers.OutputProjection( + kernel=self._get_embedding_table(), name="predictions_output")( + masked_lm_output) + + next_sentence_logits = layers.OutputProjection( + n_out=2, kernel_init=default_kernel_init, name="classification")( + pooled_output) + + return masked_lm_logits + ... + ``` + 2. Run the following: + >>> import jax.numpy as jnp + >>> import sentencepiece as spm + >>> from flax.training import checkpoints + >>> from f_net.models import PreTrainingModel + >>> from f_net.configs.pretraining import get_config, ModelArchitecture + + >>> pretrained_params = checkpoints.restore_checkpoint('./f_net/f_net_checkpoint', None) # Location of original checkpoint + >>> pretrained_config = get_config() + >>> pretrained_config.model_arch = ModelArchitecture.F_NET + + >>> vocab_filepath = "./f_net/c4_bpe_sentencepiece.model" # Location of the sentence piece model + >>> tokenizer = spm.SentencePieceProcessor() + >>> tokenizer.Load(vocab_filepath) + >>> with pretrained_config.unlocked(): + >>> pretrained_config.vocab_size = tokenizer.GetPieceSize() + >>> tokens = jnp.array([[0, 1, 2, 3, 4, 5]]) + >>> type_ids = jnp.zeros_like(tokens, dtype="i4") + >>> attention_mask = jnp.ones_like(tokens) # Dummy. This gets deleted inside the model. 
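+        >>> # Assumed import for `freeze` used below (not shown in the original snippet)
+        >>> from flax.core.frozen_dict import freeze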
+ + >>> flax_pretraining_model = PreTrainingModel(pretrained_config) + >>> pretrained_model_params = freeze(pretrained_params['target']) + >>> flax_model_outputs = flax_pretraining_model.apply({"params": pretrained_model_params}, tokens, attention_mask, type_ids, None, None, None, None, deterministic=True) + >>> masked_lm_logits[:, :3, :3] + """ + + model = FNetForMaskedLM.from_pretrained("google/fnet-base") + model.to(torch_device) + + input_ids = torch.tensor([[0, 1, 2, 3, 4, 5]], device=torch_device) + output = model(input_ids)[0] + + vocab_size = 32000 + + expected_shape = torch.Size((1, 6, vocab_size)) + self.assertEqual(output.shape, expected_shape) + + expected_slice = torch.tensor( + [[[-1.7819, -7.7384, -7.5002], [-3.4746, -8.5943, -7.7762], [-3.2052, -9.0771, -8.3468]]], + device=torch_device, + ) + + self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4)) + + @slow + @require_tokenizers + def test_inference_long_sentence(self): + model = FNetForMaskedLM.from_pretrained("google/fnet-base") + model.to(torch_device) + tokenizer = FNetTokenizerFast.from_pretrained("google/fnet-base") + + inputs = tokenizer( + "the man worked as a [MASK].", + "this is his [MASK].", + return_tensors="pt", + padding="max_length", + max_length=512, + ) + inputs = {k: v.to(torch_device) for k, v in inputs.items()} + + logits = model(**inputs).logits + predictions_mask_1 = tokenizer.decode(logits[0, 6].topk(5).indices) + predictions_mask_2 = tokenizer.decode(logits[0, 12].topk(5).indices) + + self.assertEqual(predictions_mask_1.split(" "), ["man", "child", "teacher", "woman", "model"]) + self.assertEqual(predictions_mask_2.split(" "), ["work", "wife", "job", "story", "name"]) + + @slow + def test_inference_for_next_sentence_prediction(self): + model = FNetForNextSentencePrediction.from_pretrained("google/fnet-base") + model.to(torch_device) + + input_ids = torch.tensor([[0, 1, 2, 3, 4, 5]], device=torch_device) + output = model(input_ids)[0] + + expected_shape = torch.Size((1, 2)) + self.assertEqual(output.shape, expected_shape) + + expected_slice = torch.tensor([[-0.2234, -0.0226]], device=torch_device) + + self.assertTrue(torch.allclose(output, expected_slice, atol=1e-4)) + + @slow + def test_inference_model(self): + model = FNetModel.from_pretrained("google/fnet-base") + model.to(torch_device) + + input_ids = torch.tensor([[0, 1, 2, 3, 4, 5]], device=torch_device) + output = model(input_ids)[0] + + expected_shape = torch.Size((1, 6, model.config.hidden_size)) + self.assertEqual(output.shape, expected_shape) + + expected_slice = torch.tensor( + [[[4.1541, -0.1051, -0.1667], [-0.9144, 0.2939, -0.0086], [-0.8472, -0.7281, 0.0256]]], device=torch_device + ) + + self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4)) diff --git a/tests/test_tokenization_fnet.py b/tests/test_tokenization_fnet.py new file mode 100644 index 00000000000..dda35c92517 --- /dev/null +++ b/tests/test_tokenization_fnet.py @@ -0,0 +1,392 @@ +# coding=utf-8 +# Copyright 2019 Hugging Face inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import unittest
+
+from transformers import FNetTokenizer, FNetTokenizerFast
+from transformers.testing_utils import require_sentencepiece, require_tokenizers, slow
+
+from .test_tokenization_common import TokenizerTesterMixin
+
+
+SAMPLE_VOCAB = os.path.join(os.path.dirname(os.path.abspath(__file__)), "fixtures/spiece.model")
+
+
+@require_sentencepiece
+@require_tokenizers
+class FNetTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
+
+    tokenizer_class = FNetTokenizer
+    rust_tokenizer_class = FNetTokenizerFast
+    test_rust_tokenizer = True
+    test_sentencepiece = True
+    test_sentencepiece_ignore_case = True
+    test_seq2seq = False
+
+    def setUp(self):
+        super().setUp()
+
+        # We have a SentencePiece fixture for testing
+        tokenizer = FNetTokenizer(SAMPLE_VOCAB)
+        tokenizer.save_pretrained(self.tmpdirname)
+
+    def get_input_output_texts(self, tokenizer):
+        input_text = "this is a test"
+        output_text = "this is a test"
+        return input_text, output_text
+
+    def test_convert_token_and_id(self):
+        """Test ``_convert_token_to_id`` and ``_convert_id_to_token``."""
+        token = "<pad>"
+        token_id = 0
+
+        self.assertEqual(self.get_tokenizer()._convert_token_to_id(token), token_id)
+        self.assertEqual(self.get_tokenizer()._convert_id_to_token(token_id), token)
+
+    def test_get_vocab(self):
+        vocab_keys = list(self.get_tokenizer().get_vocab().keys())
+
+        self.assertEqual(vocab_keys[0], "<pad>")
+        self.assertEqual(vocab_keys[1], "<unk>")
+        self.assertEqual(vocab_keys[-1], "▁eloquent")
+        self.assertEqual(len(vocab_keys), 30_000)
+
+    def test_vocab_size(self):
+        self.assertEqual(self.get_tokenizer().vocab_size, 30_000)
+
+    def test_rust_and_python_full_tokenizers(self):
+        if not self.test_rust_tokenizer:
+            return
+
+        tokenizer = self.get_tokenizer()
+        rust_tokenizer = self.get_rust_tokenizer()
+
+        sequence = "I was born in 92000, and this is falsé."
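+        # A sentence with accents and digits stresses SentencePiece segmentation; the slow (Python)
+        # and fast (Rust) tokenizers are expected to agree token-for-token and ID-for-ID.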
+
+        tokens = tokenizer.tokenize(sequence)
+        rust_tokens = rust_tokenizer.tokenize(sequence)
+        self.assertListEqual(tokens, rust_tokens)
+
+        ids = tokenizer.encode(sequence, add_special_tokens=False)
+        rust_ids = rust_tokenizer.encode(sequence, add_special_tokens=False)
+        self.assertListEqual(ids, rust_ids)
+
+        rust_tokenizer = self.get_rust_tokenizer()
+        ids = tokenizer.encode(sequence)
+        rust_ids = rust_tokenizer.encode(sequence)
+        self.assertListEqual(ids, rust_ids)
+
+    def test_full_tokenizer(self):
+        tokenizer = FNetTokenizer(SAMPLE_VOCAB, keep_accents=True)
+
+        tokens = tokenizer.tokenize("This is a test")
+        self.assertListEqual(tokens, ["▁", "T", "his", "▁is", "▁a", "▁test"])
+
+        self.assertListEqual(tokenizer.convert_tokens_to_ids(tokens), [13, 1, 4398, 25, 21, 1289])
+
+        tokens = tokenizer.tokenize("I was born in 92000, and this is falsé.")
+        self.assertListEqual(
+            tokens,
+            ["▁", "I", "▁was", "▁born", "▁in", "▁9", "2000", ",", "▁and", "▁this", "▁is", "▁fal", "s", "é", "."],
+        )
+        ids = tokenizer.convert_tokens_to_ids(tokens)
+        self.assertListEqual(ids, [13, 1, 23, 386, 19, 561, 3050, 15, 17, 48, 25, 8256, 18, 1, 9])
+
+        back_tokens = tokenizer.convert_ids_to_tokens(ids)
+        self.assertListEqual(
+            back_tokens,
+            [
+                "▁",
+                "<unk>",
+                "▁was",
+                "▁born",
+                "▁in",
+                "▁9",
+                "2000",
+                ",",
+                "▁and",
+                "▁this",
+                "▁is",
+                "▁fal",
+                "s",
+                "<unk>",
+                ".",
+            ],
+        )
+
+    def test_sequence_builders(self):
+        tokenizer = FNetTokenizer(SAMPLE_VOCAB)
+
+        text = tokenizer.encode("sequence builders")
+        text_2 = tokenizer.encode("multi-sequence build")
+
+        encoded_sentence = tokenizer.build_inputs_with_special_tokens(text)
+        encoded_pair = tokenizer.build_inputs_with_special_tokens(text, text_2)
+
+        assert encoded_sentence == [tokenizer.cls_token_id] + text + [tokenizer.sep_token_id]
+        assert encoded_pair == [tokenizer.cls_token_id] + text + [tokenizer.sep_token_id] + text_2 + [
+            tokenizer.sep_token_id
+        ]
+
+    # Overridden tests
+    def test_padding(self, max_length=50):
+        if not self.test_slow_tokenizer:
+            # as we don't have a slow version, we can't compare the outputs between slow and fast versions
+            return
+
+        for tokenizer, pretrained_name, kwargs in self.tokenizers_list:
+            with self.subTest(f"{tokenizer.__class__.__name__} ({pretrained_name})"):
+                tokenizer_r = self.rust_tokenizer_class.from_pretrained(pretrained_name, **kwargs)
+                tokenizer_p = self.tokenizer_class.from_pretrained(pretrained_name, **kwargs)
+
+                self.assertEqual(tokenizer_p.pad_token_id, tokenizer_r.pad_token_id)
+                pad_token_id = tokenizer_p.pad_token_id
+
+                # Encode - Simple input
+                input_r = tokenizer_r.encode("This is a simple input", max_length=max_length, pad_to_max_length=True)
+                input_p = tokenizer_p.encode("This is a simple input", max_length=max_length, pad_to_max_length=True)
+                self.assert_padded_input_match(input_r, input_p, max_length, pad_token_id)
+                input_r = tokenizer_r.encode("This is a simple input", max_length=max_length, padding="max_length")
+                input_p = tokenizer_p.encode("This is a simple input", max_length=max_length, padding="max_length")
+                self.assert_padded_input_match(input_r, input_p, max_length, pad_token_id)
+
+                input_r = tokenizer_r.encode("This is a simple input", padding="longest")
+                input_p = tokenizer_p.encode("This is a simple input", padding=True)
+                self.assert_padded_input_match(input_r, input_p, len(input_r), pad_token_id)
+
+                # Encode - Pair input
+                input_r = tokenizer_r.encode(
+                    "This is a simple input", "This is a pair", max_length=max_length, pad_to_max_length=True
+                )
+                input_p =
tokenizer_p.encode( + "This is a simple input", "This is a pair", max_length=max_length, pad_to_max_length=True + ) + self.assert_padded_input_match(input_r, input_p, max_length, pad_token_id) + input_r = tokenizer_r.encode( + "This is a simple input", "This is a pair", max_length=max_length, padding="max_length" + ) + input_p = tokenizer_p.encode( + "This is a simple input", "This is a pair", max_length=max_length, padding="max_length" + ) + self.assert_padded_input_match(input_r, input_p, max_length, pad_token_id) + input_r = tokenizer_r.encode("This is a simple input", "This is a pair", padding=True) + input_p = tokenizer_p.encode("This is a simple input", "This is a pair", padding="longest") + self.assert_padded_input_match(input_r, input_p, len(input_r), pad_token_id) + + # Encode_plus - Simple input + input_r = tokenizer_r.encode_plus( + "This is a simple input", max_length=max_length, pad_to_max_length=True + ) + input_p = tokenizer_p.encode_plus( + "This is a simple input", max_length=max_length, pad_to_max_length=True + ) + self.assert_padded_input_match(input_r["input_ids"], input_p["input_ids"], max_length, pad_token_id) + + input_r = tokenizer_r.encode_plus( + "This is a simple input", max_length=max_length, padding="max_length" + ) + input_p = tokenizer_p.encode_plus( + "This is a simple input", max_length=max_length, padding="max_length" + ) + self.assert_padded_input_match(input_r["input_ids"], input_p["input_ids"], max_length, pad_token_id) + + input_r = tokenizer_r.encode_plus("This is a simple input", padding="longest") + input_p = tokenizer_p.encode_plus("This is a simple input", padding=True) + self.assert_padded_input_match( + input_r["input_ids"], input_p["input_ids"], len(input_r["input_ids"]), pad_token_id + ) + + # Encode_plus - Pair input + input_r = tokenizer_r.encode_plus( + "This is a simple input", "This is a pair", max_length=max_length, pad_to_max_length=True + ) + input_p = tokenizer_p.encode_plus( + "This is a simple input", "This is a pair", max_length=max_length, pad_to_max_length=True + ) + self.assert_padded_input_match(input_r["input_ids"], input_p["input_ids"], max_length, pad_token_id) + + input_r = tokenizer_r.encode_plus( + "This is a simple input", "This is a pair", max_length=max_length, padding="max_length" + ) + input_p = tokenizer_p.encode_plus( + "This is a simple input", "This is a pair", max_length=max_length, padding="max_length" + ) + self.assert_padded_input_match(input_r["input_ids"], input_p["input_ids"], max_length, pad_token_id) + + input_r = tokenizer_r.encode_plus("This is a simple input", "This is a pair", padding="longest") + input_p = tokenizer_p.encode_plus("This is a simple input", "This is a pair", padding=True) + self.assert_padded_input_match( + input_r["input_ids"], input_p["input_ids"], len(input_r["input_ids"]), pad_token_id + ) + + # Batch_encode_plus - Simple input + input_r = tokenizer_r.batch_encode_plus( + ["This is a simple input 1", "This is a simple input 2"], + max_length=max_length, + pad_to_max_length=True, + ) + input_p = tokenizer_p.batch_encode_plus( + ["This is a simple input 1", "This is a simple input 2"], + max_length=max_length, + pad_to_max_length=True, + ) + self.assert_batch_padded_input_match(input_r, input_p, max_length, pad_token_id) + + input_r = tokenizer_r.batch_encode_plus( + ["This is a simple input 1", "This is a simple input 2"], + max_length=max_length, + padding="max_length", + ) + input_p = tokenizer_p.batch_encode_plus( + ["This is a simple input 1", "This is a simple input 2"], + 
max_length=max_length,
+                    padding="max_length",
+                )
+                self.assert_batch_padded_input_match(input_r, input_p, max_length, pad_token_id)
+
+                input_r = tokenizer_r.batch_encode_plus(
+                    ["This is a simple input 1", "This is a simple input 2"],
+                    max_length=max_length,
+                    padding="longest",
+                )
+                input_p = tokenizer_p.batch_encode_plus(
+                    ["This is a simple input 1", "This is a simple input 2"],
+                    max_length=max_length,
+                    padding=True,
+                )
+                self.assert_batch_padded_input_match(input_r, input_p, len(input_r["input_ids"][0]), pad_token_id)
+
+                input_r = tokenizer_r.batch_encode_plus(
+                    ["This is a simple input 1", "This is a simple input 2"], padding="longest"
+                )
+                input_p = tokenizer_p.batch_encode_plus(
+                    ["This is a simple input 1", "This is a simple input 2"], padding=True
+                )
+                self.assert_batch_padded_input_match(input_r, input_p, len(input_r["input_ids"][0]), pad_token_id)
+
+                # Batch_encode_plus - Pair input
+                input_r = tokenizer_r.batch_encode_plus(
+                    [
+                        ("This is a simple input 1", "This is a simple input 2"),
+                        ("This is a simple pair 1", "This is a simple pair 2"),
+                    ],
+                    max_length=max_length,
+                    truncation=True,
+                    padding="max_length",
+                )
+                input_p = tokenizer_p.batch_encode_plus(
+                    [
+                        ("This is a simple input 1", "This is a simple input 2"),
+                        ("This is a simple pair 1", "This is a simple pair 2"),
+                    ],
+                    max_length=max_length,
+                    truncation=True,
+                    padding="max_length",
+                )
+                self.assert_batch_padded_input_match(input_r, input_p, max_length, pad_token_id)
+
+                input_r = tokenizer_r.batch_encode_plus(
+                    [
+                        ("This is a simple input 1", "This is a simple input 2"),
+                        ("This is a simple pair 1", "This is a simple pair 2"),
+                    ],
+                    padding=True,
+                )
+                input_p = tokenizer_p.batch_encode_plus(
+                    [
+                        ("This is a simple input 1", "This is a simple input 2"),
+                        ("This is a simple pair 1", "This is a simple pair 2"),
+                    ],
+                    padding="longest",
+                )
+                self.assert_batch_padded_input_match(input_r, input_p, len(input_r["input_ids"][0]), pad_token_id)
+
+                # Using pad on single examples after tokenization
+                input_r = tokenizer_r.encode_plus("This is a input 1")
+                input_r = tokenizer_r.pad(input_r)
+
+                input_p = tokenizer_r.encode_plus("This is a input 1")
+                input_p = tokenizer_r.pad(input_p)
+
+                self.assert_padded_input_match(
+                    input_r["input_ids"], input_p["input_ids"], len(input_r["input_ids"]), pad_token_id
+                )
+
+                # Using pad on single examples after tokenization
+                input_r = tokenizer_r.encode_plus("This is a input 1")
+                input_r = tokenizer_r.pad(input_r, max_length=max_length, padding="max_length")
+
+                input_p = tokenizer_r.encode_plus("This is a input 1")
+                input_p = tokenizer_r.pad(input_p, max_length=max_length, padding="max_length")
+
+                self.assert_padded_input_match(input_r["input_ids"], input_p["input_ids"], max_length, pad_token_id)
+
+                # Using pad after tokenization
+                input_r = tokenizer_r.batch_encode_plus(
+                    ["This is a input 1", "This is a much longer input which should be padded"]
+                )
+                input_r = tokenizer_r.pad(input_r)
+
+                input_p = tokenizer_r.batch_encode_plus(
+                    ["This is a input 1", "This is a much longer input which should be padded"]
+                )
+                input_p = tokenizer_r.pad(input_p)
+
+                self.assert_batch_padded_input_match(input_r, input_p, len(input_r["input_ids"][0]), pad_token_id)
+
+                # Using pad after tokenization
+                input_r = tokenizer_r.batch_encode_plus(
+                    ["This is a input 1", "This is a much longer input which should be padded"]
+                )
+                input_r = tokenizer_r.pad(input_r, max_length=max_length, padding="max_length")
+
+                input_p = tokenizer_r.batch_encode_plus(
+                    ["This is a input 1", "This is a much longer input which should be padded"]
input 1", "This is a much longer input whilch should be padded"] + ) + input_p = tokenizer_r.pad(input_p, max_length=max_length, padding="max_length") + + self.assert_batch_padded_input_match(input_r, input_p, max_length, pad_token_id) + + def assert_batch_padded_input_match( + self, + input_r: dict, + input_p: dict, + max_length: int, + pad_token_id: int, + model_main_input_name: str = "input_ids", + ): + for i_r in input_r.values(): + self.assertEqual(len(i_r), 2), self.assertEqual(len(i_r[0]), max_length), self.assertEqual( + len(i_r[1]), max_length + ) + self.assertEqual(len(i_r), 2), self.assertEqual(len(i_r[0]), max_length), self.assertEqual( + len(i_r[1]), max_length + ) + + for i_r, i_p in zip(input_r[model_main_input_name], input_p[model_main_input_name]): + self.assert_padded_input_match(i_r, i_p, max_length, pad_token_id) + + @slow + def test_tokenizer_integration(self): + # fmt: off + expected_encoding = {'input_ids': [[4, 4616, 107, 163, 328, 14, 63, 1726, 106, 11954, 16659, 23, 83, 16688, 11427, 328, 107, 36, 11954, 16659, 23, 83, 16688, 6153, 82, 961, 16688, 3474, 16710, 1696, 2306, 16688, 10854, 2524, 3827, 561, 163, 3474, 16680, 62, 226, 2092, 16680, 379, 3474, 16660, 16680, 2436, 16667, 16671, 16680, 999, 87, 3474, 16680, 2436, 16667, 5208, 800, 16710, 68, 2018, 2959, 3037, 163, 16663, 11617, 16710, 36, 2018, 2959, 4737, 163, 16663, 16667, 16674, 16710, 91, 372, 5087, 16745, 2205, 82, 961, 3608, 38, 1770, 16745, 7984, 36, 2565, 751, 9017, 1204, 864, 218, 1244, 16680, 11954, 16659, 23, 83, 36, 14686, 23, 7619, 16678, 5], [4, 28, 532, 65, 1929, 33, 391, 16688, 3979, 9, 2565, 7849, 299, 225, 34, 2040, 305, 167, 289, 16667, 16078, 32, 1966, 181, 4626, 63, 10575, 71, 851, 1491, 36, 624, 4757, 38, 208, 8038, 16678, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], [4, 13, 1467, 5187, 26, 2521, 4567, 16664, 372, 13, 16209, 3314, 16678, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]} # noqa: E501 + # fmt: on + + self.tokenizer_integration_test_util( + expected_encoding=expected_encoding, + model_name="google/fnet-base", + revision="58e0d1f96af163dc8d0a84a2fddf4bd403e4e802", + )