diff --git a/docs/source/en/model_doc/albert.md b/docs/source/en/model_doc/albert.md
index b56fca55b28..49d207fe579 100644
--- a/docs/source/en/model_doc/albert.md
+++ b/docs/source/en/model_doc/albert.md
@@ -27,20 +27,13 @@ rendered properly in your Markdown viewer.
 
 [ALBERT](https://huggingface.co/papers/1909.11942) is designed to address memory limitations of scaling and training of [BERT](./bert). It adds two parameter reduction techniques. The first, factorized embedding parametrization, splits the larger vocabulary embedding matrix into two smaller matrices so you can grow the hidden size without adding a lot more parameters. The second, cross-layer parameter sharing, allows layer to share parameters which keeps the number of learnable parameters lower.
 
-<<<<<<< HEAD
-=======
-
-<<<<<<< HEAD
 ALBERT was created to address problems like -- GPU/TPU memory limitations, longer training times, and unexpected model degradation in BERT.
 
 ALBERT uses two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
 
 - **Factorized embedding parameterization:** The large vocabulary embedding matrix is decomposed into two smaller matrices, reducing memory consumption.
 - **Cross-layer parameter sharing:** Instead of learning separate parameters for each transformer layer, ALBERT shares parameters across layers, further reducing the number of learnable weights.
 
-ALBERT uses absolute position embeddings (like BERT) so padding is applied at right. Size of embeddings is 128 While BERT uses 768. ALBERT can processes maximum 512 token at a time.
->>>>>>> 7ba1110083 (Update docs/source/en/model_doc/albert.md )
+ALBERT uses absolute position embeddings (like BERT) so padding is applied at right. Size of embeddings is 128 While BERT uses 768. ALBERT can processes maximum 512 token at a time.
-=======
->>>>>>> 155b733538 (Update albert.md)
 
 You can find all the original ALBERT checkpoints under the [ALBERT community](https://huggingface.co/albert) organization.
 
 > [!TIP]
@@ -51,7 +44,7 @@ The example below demonstrates how to predict the `[MASK]` token with [`Pipeline
 
-```py
+```py
 import torch
 from transformers import pipeline
 
@@ -80,7 +73,7 @@ model = AutoModelForMaskedLM.from_pretrained(
 )
 
 prompt = "Plants create energy through a process known as [MASK]."
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 
 with torch.no_grad():
     outputs = model(**inputs)
@@ -103,41 +96,30 @@ echo -e "Plants create [MASK] through a process known as photosynthesis." | tran
 
-
 ## Notes
 
 - Inputs should be padded on the right because BERT uses absolute position embeddings.
 - The embedding size `E` is different from the hidden size `H` because the embeddings are context independent (one embedding vector represents one token) and the hidden states are context dependent (one hidden state represents a sequence of tokens). The embedding matrix is also larger because `V x E` where `V` is the vocabulary size. As a result, it's more logical if `H >> E`. If `E < H`, the model has less parameters.
 
-
 ## Resources
 
-
 The resources provided in the following sections consist of a list of official Hugging Face and community (indicated by 🌎) resources to help you get started with AlBERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
 
-
-
 - [`AlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification).
 - [`TFAlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification).
 - [`FlaxAlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
 - Check the [Text classification task guide](../tasks/sequence_classification) on how to use the model.
-
-
 - [`AlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification).
-
 - [`TFAlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
-
-
 - [`FlaxAlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
 - [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
 - Check the [Token classification task guide](../tasks/token_classification) on how to use the model.
@@ -163,8 +145,7 @@ The resources provided in the following sections consist of a list of official H
 
 - [`AlbertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
 - [`TFAlbertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
-- Check the [Multiple choice task guide](../tasks/multiple_choice) on how to use the model.
-
+- Check the [Multiple choice task guide](../tasks/multiple_choice) on how to use the model.
 
 ## AlbertConfig
 
@@ -172,11 +153,7 @@ The resources provided in the following sections consist of a list of official H
 
 ## AlbertTokenizer
 
-[[autodoc]] AlbertTokenizer
-    - build_inputs_with_special_tokens
-    - get_special_tokens_mask
-    - create_token_type_ids_from_sequences
-    - save_vocabulary
+[[autodoc]] AlbertTokenizer - build_inputs_with_special_tokens - get_special_tokens_mask - create_token_type_ids_from_sequences - save_vocabulary
 
 ## AlbertTokenizerFast
 
@@ -193,23 +170,19 @@ The resources provided in the following sections consist of a list of official H
 
 ## AlbertModel
 
-[[autodoc]] AlbertModel
-    - forward
+[[autodoc]] AlbertModel - forward
 
 ## AlbertForPreTraining
 
-[[autodoc]] AlbertForPreTraining
-    - forward
+[[autodoc]] AlbertForPreTraining - forward
 
 ## AlbertForMaskedLM
 
-[[autodoc]] AlbertForMaskedLM
-    - forward
+[[autodoc]] AlbertForMaskedLM - forward
 
 ## AlbertForSequenceClassification
 
-[[autodoc]] AlbertForSequenceClassification
-    - forward
+[[autodoc]] AlbertForSequenceClassification - forward
 
 ## AlbertForMultipleChoice
 
@@ -217,13 +190,11 @@ The resources provided in the following sections consist of a list of official H
 
 ## AlbertForTokenClassification
 
-[[autodoc]] AlbertForTokenClassification
-    - forward
+[[autodoc]] AlbertForTokenClassification - forward
 
 ## AlbertForQuestionAnswering
 
-[[autodoc]] AlbertForQuestionAnswering
-    - forward
+[[autodoc]] AlbertForQuestionAnswering - forward
 
@@ -231,78 +202,62 @@ The resources provided in the following sections consist of a list of official H
 
 ## TFAlbertModel
 
-[[autodoc]] TFAlbertModel
-    - call
+[[autodoc]] TFAlbertModel - call
 
 ## TFAlbertForPreTraining
 
-[[autodoc]] TFAlbertForPreTraining
-    - call
+[[autodoc]] TFAlbertForPreTraining - call
 
 ## TFAlbertForMaskedLM
 
-[[autodoc]] TFAlbertForMaskedLM
-    - call
+[[autodoc]] TFAlbertForMaskedLM - call
 
 ## TFAlbertForSequenceClassification
 
-[[autodoc]] TFAlbertForSequenceClassification
-    - call
+[[autodoc]] TFAlbertForSequenceClassification - call
 
 ## TFAlbertForMultipleChoice
 
-[[autodoc]] TFAlbertForMultipleChoice
-    - call
+[[autodoc]] TFAlbertForMultipleChoice - call
 
 ## TFAlbertForTokenClassification
 
-[[autodoc]] TFAlbertForTokenClassification
-    - call
+[[autodoc]] TFAlbertForTokenClassification - call
 
 ## TFAlbertForQuestionAnswering
 
-[[autodoc]] TFAlbertForQuestionAnswering
-    - call
+[[autodoc]] TFAlbertForQuestionAnswering - call
 
 ## FlaxAlbertModel
 
-[[autodoc]] FlaxAlbertModel
-    - __call__
+[[autodoc]] FlaxAlbertModel - **call**
 
 ## FlaxAlbertForPreTraining
 
-[[autodoc]] FlaxAlbertForPreTraining
-    - __call__
+[[autodoc]] FlaxAlbertForPreTraining - **call**
 
 ## FlaxAlbertForMaskedLM
 
-[[autodoc]] FlaxAlbertForMaskedLM
-    - __call__
+[[autodoc]] FlaxAlbertForMaskedLM - **call**
 
 ## FlaxAlbertForSequenceClassification
 
-[[autodoc]] FlaxAlbertForSequenceClassification
-    - __call__
+[[autodoc]] FlaxAlbertForSequenceClassification - **call**
 
 ## FlaxAlbertForMultipleChoice
 
-[[autodoc]] FlaxAlbertForMultipleChoice
-    - __call__
+[[autodoc]] FlaxAlbertForMultipleChoice - **call**
 
 ## FlaxAlbertForTokenClassification
 
-[[autodoc]] FlaxAlbertForTokenClassification
-    - __call__
+[[autodoc]] FlaxAlbertForTokenClassification - **call**
 
 ## FlaxAlbertForQuestionAnswering
 
-[[autodoc]] FlaxAlbertForQuestionAnswering
-    - __call__
+[[autodoc]] FlaxAlbertForQuestionAnswering - **call**
-
-