mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-04 05:10:06 +06:00
Remove merge conflict artifacts in Albert model doc (#38849)
This commit is contained in:
parent
64e9b049d9
commit
e61160c5db
@@ -27,20 +27,13 @@ rendered properly in your Markdown viewer.
[ALBERT](https://huggingface.co/papers/1909.11942) is designed to address the memory limitations of scaling and training [BERT](./bert). It adds two parameter reduction techniques. The first, factorized embedding parametrization, splits the larger vocabulary embedding matrix into two smaller matrices so you can grow the hidden size without adding a lot more parameters. The second, cross-layer parameter sharing, allows layers to share parameters, which keeps the number of learnable parameters lower.
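The savings from factorized embedding parametrization come down to simple arithmetic. The sketch below uses the ALBERT-base sizes (vocabulary 30,000, hidden size 768, embedding size 128) purely as an illustration of the technique, not as an exact model parameter count:

```python
# Back-of-the-envelope comparison of embedding parameter counts.
# V, H, E follow ALBERT-base sizes (illustrative only).
V, H, E = 30_000, 768, 128

tied = V * H                # BERT-style: one V x H embedding matrix
factorized = V * E + E * H  # ALBERT: V x E lookup, then an E x H projection

print(f"{tied:,}")        # 23,040,000
print(f"{factorized:,}")  # 3,938,304
print(f"{1 - factorized / tied:.1%}")  # 82.9% fewer embedding parameters
```

Because the projection term `E x H` is small, the embedding cost grows with `V x E` instead of `V x H`, which is what lets the hidden size grow cheaply.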
ALBERT was created to address problems like GPU/TPU memory limitations, longer training times, and unexpected model degradation in BERT. ALBERT uses two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:

- **Factorized embedding parameterization:** The large vocabulary embedding matrix is decomposed into two smaller matrices, reducing memory consumption.
- **Cross-layer parameter sharing:** Instead of learning separate parameters for each transformer layer, ALBERT shares parameters across layers, further reducing the number of learnable weights.

ALBERT uses absolute position embeddings (like BERT), so padding is applied on the right. The embedding size is 128, while BERT uses 768, and ALBERT can process a maximum of 512 tokens at a time.
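Cross-layer parameter sharing can be illustrated with a toy encoder. This is a hypothetical minimal sketch in plain Python, not the actual ALBERT implementation: one weight matrix is reused by every layer, so the parameter count stays flat no matter how deep the stack is.

```python
import math

HIDDEN = 4
# A single shared weight matrix (a plain list of lists for the sketch).
shared_W = [[0.1 * (i + j) for j in range(HIDDEN)] for i in range(HIDDEN)]

def layer(x, W):
    # One "layer": matrix-vector product followed by tanh.
    return [math.tanh(sum(x[j] * W[j][i] for j in range(HIDDEN)))
            for i in range(HIDDEN)]

def encoder(x, num_layers=12):
    # BERT would learn num_layers distinct W's; ALBERT reuses the same one.
    for _ in range(num_layers):
        x = layer(x, shared_W)
    return x

params = HIDDEN * HIDDEN  # independent of num_layers
print(params)  # 16
print(encoder([1.0, 0.0, 0.0, 0.0]))
```

Doubling `num_layers` adds compute but zero parameters, which is the whole point of the technique.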
You can find all the original ALBERT checkpoints under the [ALBERT community](https://huggingface.co/albert) organization.

> [!TIP]
@@ -103,41 +96,30 @@ echo -e "Plants create [MASK] through a process known as photosynthesis." | tran
</hfoptions>

## Notes

- Inputs should be padded on the right because BERT uses absolute position embeddings.
- The embedding size `E` is different from the hidden size `H` because the embeddings are context independent (one embedding vector represents one token) while the hidden states are context dependent (one hidden state represents a sequence of tokens). The embedding matrix is also large because its size is `V x E`, where `V` is the vocabulary size. As a result, it's more logical if `H >> E`. If `E < H`, the model has fewer parameters.
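The right-padding requirement from the first note amounts to appending pad tokens after the real tokens, so the absolute positions of real tokens are unchanged. A minimal sketch (the `pad_id=0` default is an assumption for illustration; real code should use the tokenizer's pad token id):

```python
def pad_right(batch, pad_id=0):
    # Append pad ids so every sequence matches the longest one;
    # real tokens keep their left-anchored absolute positions.
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]

print(pad_right([[5, 9, 2], [7, 2]]))  # [[5, 9, 2], [7, 2, 0]]
```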
## Resources

The resources provided in the following sections consist of a list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ALBERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="text-classification"/>

- [`AlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification).
- [`TFAlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification).
- [`FlaxAlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
- Check the [Text classification task guide](../tasks/sequence_classification) on how to use the model.

<PipelineTag pipeline="token-classification"/>

- [`AlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification).
- [`TFAlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
- [`FlaxAlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
- Check the [Token classification task guide](../tasks/token_classification) on how to use the model.

@@ -165,18 +147,13 @@ The resources provided in the following sections consist of a list of official H
- Check the [Multiple choice task guide](../tasks/multiple_choice) on how to use the model.

## AlbertConfig

[[autodoc]] AlbertConfig

## AlbertTokenizer

[[autodoc]] AlbertTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## AlbertTokenizerFast

@@ -193,23 +170,19 @@ The resources provided in the following sections consist of a list of official H
## AlbertModel

[[autodoc]] AlbertModel
    - forward

## AlbertForPreTraining

[[autodoc]] AlbertForPreTraining
    - forward

## AlbertForMaskedLM

[[autodoc]] AlbertForMaskedLM
    - forward

## AlbertForSequenceClassification

[[autodoc]] AlbertForSequenceClassification
    - forward

## AlbertForMultipleChoice

@@ -217,13 +190,11 @@ The resources provided in the following sections consist of a list of official H
## AlbertForTokenClassification

[[autodoc]] AlbertForTokenClassification
    - forward

## AlbertForQuestionAnswering

[[autodoc]] AlbertForQuestionAnswering
    - forward

</pt>

@@ -231,78 +202,62 @@ The resources provided in the following sections consist of a list of official H
## TFAlbertModel

[[autodoc]] TFAlbertModel
    - call

## TFAlbertForPreTraining

[[autodoc]] TFAlbertForPreTraining
    - call

## TFAlbertForMaskedLM

[[autodoc]] TFAlbertForMaskedLM
    - call

## TFAlbertForSequenceClassification

[[autodoc]] TFAlbertForSequenceClassification
    - call

## TFAlbertForMultipleChoice

[[autodoc]] TFAlbertForMultipleChoice
    - call

## TFAlbertForTokenClassification

[[autodoc]] TFAlbertForTokenClassification
    - call

## TFAlbertForQuestionAnswering

[[autodoc]] TFAlbertForQuestionAnswering
    - call

</tf>
<jax>

## FlaxAlbertModel

[[autodoc]] FlaxAlbertModel
    - __call__

## FlaxAlbertForPreTraining

[[autodoc]] FlaxAlbertForPreTraining
    - __call__

## FlaxAlbertForMaskedLM

[[autodoc]] FlaxAlbertForMaskedLM
    - __call__

## FlaxAlbertForSequenceClassification

[[autodoc]] FlaxAlbertForSequenceClassification
    - __call__

## FlaxAlbertForMultipleChoice

[[autodoc]] FlaxAlbertForMultipleChoice
    - __call__

## FlaxAlbertForTokenClassification

[[autodoc]] FlaxAlbertForTokenClassification
    - __call__

## FlaxAlbertForQuestionAnswering

[[autodoc]] FlaxAlbertForQuestionAnswering
    - __call__

</jax>
</frameworkcontent>