diff --git a/docs/source/en/model_doc/albert.md b/docs/source/en/model_doc/albert.md
index b56fca55b28..49d207fe579 100644
--- a/docs/source/en/model_doc/albert.md
+++ b/docs/source/en/model_doc/albert.md
@@ -27,20 +27,13 @@ rendered properly in your Markdown viewer.
 
 [ALBERT](https://huggingface.co/papers/1909.11942) is designed to address memory limitations of scaling and training of [BERT](./bert). It adds two parameter reduction techniques. The first, factorized embedding parametrization, splits the larger vocabulary embedding matrix into two smaller matrices so you can grow the hidden size without adding a lot more parameters. The second, cross-layer parameter sharing, allows layer to share parameters which keeps the number of learnable parameters lower.
 
-<<<<<<< HEAD
-=======
-
-<<<<<<< HEAD
 ALBERT was created to address problems like -- GPU/TPU memory limitations, longer training times, and unexpected model degradation in BERT. ALBERT uses two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
 
 - **Factorized embedding parameterization:** The large vocabulary embedding matrix is decomposed into two smaller matrices, reducing memory consumption.
 - **Cross-layer parameter sharing:** Instead of learning separate parameters for each transformer layer, ALBERT shares parameters across layers, further reducing the number of learnable weights.
 
-ALBERT uses absolute position embeddings (like BERT) so padding is applied at right. Size of embeddings is 128 While BERT uses 768. ALBERT can processes maximum 512 token at a time. 
->>>>>>> 7ba1110083 (Update docs/source/en/model_doc/albert.md)
+ALBERT uses absolute position embeddings (like BERT), so inputs should be padded on the right. The embedding size is 128, while BERT uses 768. ALBERT can process a maximum of 512 tokens at a time.
 
-=======
->>>>>>> 155b733538 (Update albert.md)
 You can find all the original ALBERT checkpoints under the [ALBERT community](https://huggingface.co/albert) organization.
 
 > [!TIP]
@@ -51,7 +44,7 @@ The example below demonstrates how to predict the `[MASK]` token with [`Pipeline
 <hfoptions id="usage">
 <hfoption id="Pipeline">
 
-```py 
+```py
 import torch
 from transformers import pipeline
 
@@ -80,7 +73,7 @@ model = AutoModelForMaskedLM.from_pretrained(
 )
 
 prompt = "Plants create energy through a process known as [MASK]."
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device) 
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 
 with torch.no_grad():
     outputs = model(**inputs)
@@ -103,41 +96,30 @@ echo -e "Plants create [MASK] through a process known as photosynthesis." | tran
 
 </hfoptions>
 
-
 ## Notes
 
 - Inputs should be padded on the right because BERT uses absolute position embeddings.
 - The embedding size `E` is different from the hidden size `H` because the embeddings are context independent (one embedding vector represents one token) and the hidden states are context dependent (one hidden state represents a sequence of tokens). The embedding matrix is also larger because `V x E` where `V` is the vocabulary size. As a result, it's more logical if `H >> E`. If `E < H`, the model has less parameters.
 
-
 ## Resources
 
-
 The resources provided in the following sections consist of a list of official Hugging Face and community (indicated by 🌎) resources to help you get started with AlBERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
 
-
 <PipelineTag pipeline="text-classification"/>
 
-
 - [`AlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification).
 
-
 - [`TFAlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification).
 
 - [`FlaxAlbertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
 - Check the [Text classification task guide](../tasks/sequence_classification) on how to use the model.
 
-
 <PipelineTag pipeline="token-classification"/>
 
-
 - [`AlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification).
 
-
 - [`TFAlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
 
-
-
 - [`FlaxAlbertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
 - [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
 - Check the [Token classification task guide](../tasks/token_classification) on how to use the model.
@@ -163,8 +145,7 @@ The resources provided in the following sections consist of a list of official H
 - [`AlbertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
 - [`TFAlbertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
 
-- Check the  [Multiple choice task guide](../tasks/multiple_choice) on how to use the model.
-
+- Check the [Multiple choice task guide](../tasks/multiple_choice) on how to use the model.
 
 ## AlbertConfig
 
@@ -172,11 +153,7 @@ The resources provided in the following sections consist of a list of official H
 
 ## AlbertTokenizer
 
-[[autodoc]] AlbertTokenizer
-    - build_inputs_with_special_tokens
-    - get_special_tokens_mask
-    - create_token_type_ids_from_sequences
-    - save_vocabulary
+[[autodoc]] AlbertTokenizer - build_inputs_with_special_tokens - get_special_tokens_mask - create_token_type_ids_from_sequences - save_vocabulary
 
 ## AlbertTokenizerFast
 
@@ -193,23 +170,19 @@ The resources provided in the following sections consist of a list of official H
 
 ## AlbertModel
 
-[[autodoc]] AlbertModel
-    - forward
+[[autodoc]] AlbertModel - forward
 
 ## AlbertForPreTraining
 
-[[autodoc]] AlbertForPreTraining
-    - forward
+[[autodoc]] AlbertForPreTraining - forward
 
 ## AlbertForMaskedLM
 
-[[autodoc]] AlbertForMaskedLM
-    - forward
+[[autodoc]] AlbertForMaskedLM - forward
 
 ## AlbertForSequenceClassification
 
-[[autodoc]] AlbertForSequenceClassification
-    - forward
+[[autodoc]] AlbertForSequenceClassification - forward
 
 ## AlbertForMultipleChoice
 
@@ -217,13 +190,11 @@ The resources provided in the following sections consist of a list of official H
 
 ## AlbertForTokenClassification
 
-[[autodoc]] AlbertForTokenClassification
-    - forward
+[[autodoc]] AlbertForTokenClassification - forward
 
 ## AlbertForQuestionAnswering
 
-[[autodoc]] AlbertForQuestionAnswering
-    - forward
+[[autodoc]] AlbertForQuestionAnswering - forward
 
 </pt>
 
@@ -231,78 +202,62 @@ The resources provided in the following sections consist of a list of official H
 
 ## TFAlbertModel
 
-[[autodoc]] TFAlbertModel
-    - call
+[[autodoc]] TFAlbertModel - call
 
 ## TFAlbertForPreTraining
 
-[[autodoc]] TFAlbertForPreTraining
-    - call
+[[autodoc]] TFAlbertForPreTraining - call
 
 ## TFAlbertForMaskedLM
 
-[[autodoc]] TFAlbertForMaskedLM
-    - call
+[[autodoc]] TFAlbertForMaskedLM - call
 
 ## TFAlbertForSequenceClassification
 
-[[autodoc]] TFAlbertForSequenceClassification
-    - call
+[[autodoc]] TFAlbertForSequenceClassification - call
 
 ## TFAlbertForMultipleChoice
 
-[[autodoc]] TFAlbertForMultipleChoice
-    - call
+[[autodoc]] TFAlbertForMultipleChoice - call
 
 ## TFAlbertForTokenClassification
 
-[[autodoc]] TFAlbertForTokenClassification
-    - call
+[[autodoc]] TFAlbertForTokenClassification - call
 
 ## TFAlbertForQuestionAnswering
 
-[[autodoc]] TFAlbertForQuestionAnswering
-    - call
+[[autodoc]] TFAlbertForQuestionAnswering - call
 
 </tf>
 <jax>
 
 ## FlaxAlbertModel
 
-[[autodoc]] FlaxAlbertModel
-    - __call__
+[[autodoc]] FlaxAlbertModel - __call__
 
 ## FlaxAlbertForPreTraining
 
-[[autodoc]] FlaxAlbertForPreTraining
-    - __call__
+[[autodoc]] FlaxAlbertForPreTraining - __call__
 
 ## FlaxAlbertForMaskedLM
 
-[[autodoc]] FlaxAlbertForMaskedLM
-    - __call__
+[[autodoc]] FlaxAlbertForMaskedLM - __call__
 
 ## FlaxAlbertForSequenceClassification
 
-[[autodoc]] FlaxAlbertForSequenceClassification
-    - __call__
+[[autodoc]] FlaxAlbertForSequenceClassification - __call__
 
 ## FlaxAlbertForMultipleChoice
 
-[[autodoc]] FlaxAlbertForMultipleChoice
-    - __call__
+[[autodoc]] FlaxAlbertForMultipleChoice - __call__
 
 ## FlaxAlbertForTokenClassification
 
-[[autodoc]] FlaxAlbertForTokenClassification
-    - __call__
+[[autodoc]] FlaxAlbertForTokenClassification - __call__
 
 ## FlaxAlbertForQuestionAnswering
 
-[[autodoc]] FlaxAlbertForQuestionAnswering
-    - __call__
+[[autodoc]] FlaxAlbertForQuestionAnswering - __call__
 
 </jax>
 </frameworkcontent>
-
-