[doc] consistent True/False/None default format (#14951)

* [doc] consistent True/False/None default format

* Update src/transformers/models/xlnet/modeling_xlnet.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Stas Bekman 2021-12-27 14:31:40 -08:00 committed by GitHub
parent b2f500256e
commit 133c5e40c4
30 changed files with 72 additions and 72 deletions

View File

@@ -57,13 +57,13 @@ Tips:
 important preprocessing step is that images and segmentation maps are randomly cropped and padded to the same size,
 such as 512x512 or 640x640, after which they are normalized.
 - One additional thing to keep in mind is that one can initialize [`SegformerFeatureExtractor`] with
-`reduce_labels` set to *True* or *False*. In some datasets (like ADE20k), the 0 index is used in the annotated
+`reduce_labels` set to `True` or `False`. In some datasets (like ADE20k), the 0 index is used in the annotated
 segmentation maps for background. However, ADE20k doesn't include the "background" class in its 150 labels.
 Therefore, `reduce_labels` is used to reduce all labels by 1, and to make sure no loss is computed for the
 background class (i.e. it replaces 0 in the annotated maps by 255, which is the *ignore_index* of the loss function
 used by [`SegformerForSemanticSegmentation`]). However, other datasets use the 0 index as
 background class and include this class as part of all labels. In that case, `reduce_labels` should be set to
-*False*, as loss should also be computed for the background class.
+`False`, as loss should also be computed for the background class.
 - As most models, SegFormer comes in different sizes, the details of which can be found in the table below.
 | **Model variant** | **Depths** | **Hidden sizes** | **Decoder hidden size** | **Params (M)** | **ImageNet-1k Top 1** |
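A minimal sketch of the two `reduce_labels` modes (the dummy image and map below are stand-ins for real dataset samples):

```python
import numpy as np
from PIL import Image
from transformers import SegformerFeatureExtractor

image = Image.new("RGB", (640, 640))  # stand-in for a dataset image
segmentation_map = Image.fromarray(np.zeros((640, 640), dtype=np.uint8))  # stand-in annotation

# ADE20k-style: index 0 is unlabeled background -> shift all labels down by 1
# and map 0 to 255, the ignore_index of the loss.
feature_extractor = SegformerFeatureExtractor(reduce_labels=True)
encoding = feature_extractor(image, segmentation_map, return_tensors="pt")

# For datasets where index 0 is a real class, keep reduce_labels=False (the
# default) so the background class contributes to the loss.
```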

View File

@@ -446,15 +446,15 @@ class TFGenerationMixin:
 use_cache: (`bool`, *optional*, defaults to `True`):
 Whether or not the model should use the past last key/values attentions (if applicable to the model) to
 speed up decoding.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 forced_bos_token_id (`int`, *optional*):
 The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful

View File

@@ -878,15 +878,15 @@ class GenerationMixin:
 Custom stopping criteria that complement the default stopping criteria built from arguments and a
 model's config. If a stopping criteria is passed that is already created with the arguments or a
 model's config an error is thrown. This feature is intended for advanced users.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 forced_bos_token_id (`int`, *optional*):
 The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful
@@ -1302,15 +1302,15 @@ class GenerationMixin:
 The id of the *padding* token.
 eos_token_id (`int`, *optional*):
 The id of the *end-of-sequence* token.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 synced_gpus (`bool`, *optional*, defaults to `False`):
 Whether to continue running the while loop until max_length (needed for ZeRO stage 3)
@@ -1529,15 +1529,15 @@ class GenerationMixin:
 The id of the *padding* token.
 eos_token_id (`int`, *optional*):
 The id of the *end-of-sequence* token.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 synced_gpus (`bool`, *optional*, defaults to `False`):
 Whether to continue running the while loop until max_length (needed for ZeRO stage 3)
@@ -1767,15 +1767,15 @@ class GenerationMixin:
 The id of the *padding* token.
 eos_token_id (`int`, *optional*):
 The id of the *end-of-sequence* token.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 synced_gpus (`bool`, *optional*, defaults to `False`):
 Whether to continue running the while loop until max_length (needed for ZeRO stage 3)
@@ -2061,15 +2061,15 @@ class GenerationMixin:
 The id of the *padding* token.
 eos_token_id (`int`, *optional*):
 The id of the *end-of-sequence* token.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 synced_gpus (`bool`, *optional*, defaults to `False`):
 Whether to continue running the while loop until max_length (needed for ZeRO stage 3)
@@ -2356,15 +2356,15 @@ class GenerationMixin:
 The id of the *padding* token.
 eos_token_id (`int`, *optional*):
 The id of the *end-of-sequence* token.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 synced_gpus (`bool`, *optional*, defaults to `False`):
 Whether to continue running the while loop until max_length (needed for ZeRO stage 3)
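Since all four flags default to `False`, callers opt in explicitly. A short sketch (gpt2 is just an illustrative checkpoint):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("Hello, my dog", return_tensors="pt").input_ids
outputs = model.generate(
    input_ids,
    max_length=20,
    return_dict_in_generate=True,  # a ModelOutput instead of a plain tensor/tuple
    output_scores=True,            # one scores tensor per generated step
)
print(outputs.sequences.shape)  # generated ids, shape (batch, sequence_length)
print(len(outputs.scores))      # == number of newly generated tokens
```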

View File

@@ -707,7 +707,7 @@ class MLflowCallback(TrainerCallback):
 HF_MLFLOW_LOG_ARTIFACTS (`str`, *optional*):
 Whether to use MLflow .log_artifact() facility to log artifacts.
-This only makes sense if logging to a remote server, e.g. s3 or GCS. If set to *True* or *1*, will copy
+This only makes sense if logging to a remote server, e.g. s3 or GCS. If set to `True` or *1*, will copy
 whatever is in [`TrainingArguments`]'s `output_dir` to the local or remote artifact storage. Using it
 without a remote storage will just copy the files to your artifact location.
 """

View File

@@ -1840,7 +1840,7 @@ class PoolerEndLogits(nn.Module):
 <Tip>
-One of `start_states` or `start_positions` should be not obj:*None*. If both are set, `start_positions`
+One of `start_states` or `start_positions` should be not obj:`None`. If both are set, `start_positions`
 overrides `start_states`.
 </Tip>
@@ -1906,7 +1906,7 @@ class PoolerAnswerClass(nn.Module):
 <Tip>
-One of `start_states` or `start_positions` should be not obj:*None*. If both are set, `start_positions`
+One of `start_states` or `start_positions` should be not obj:`None`. If both are set, `start_positions`
 overrides `start_states`.
 </Tip>

View File

@@ -219,7 +219,7 @@ class MecabTokenizer:
 Whether to apply unicode normalization to text before tokenization.
 **mecab_dic**: (*optional*) string (default "ipadic")
 Name of dictionary to be used for MeCab initialization. If you are using a system-installed dictionary,
-set this option to *None* and modify *mecab_option*.
+set this option to `None` and modify *mecab_option*.
 **mecab_option**: (*optional*) string
 String passed to MeCab constructor.
 """

View File

@@ -632,7 +632,7 @@ def _replace_html_entities(text, keep=(), remove_illegal=True, encoding="utf-8")
 List of entity names which should not be replaced. This supports both numeric entities (`&#nnnn;` and
 `&#hhhh;`) and named entities (such as `&nbsp;` or `&gt;`).
 remove_illegal (bool):
-If *True*, entities that can't be converted are removed. Otherwise, entities that can't be converted are
+If `True`, entities that can't be converted are removed. Otherwise, entities that can't be converted are
 kept "as is".
 Returns: A unicode string with the entities removed.
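A rough sketch of the two behaviors; `_replace_html_entities` is a private helper, so the import path below is an assumption and may change between releases:

```python
from transformers.models.bertweet.tokenization_bertweet import _replace_html_entities

print(_replace_html_entities(b"Price: &pound;100"))  # "Price: £100"

# With remove_illegal=False, an entity that cannot be decoded (e.g. an
# out-of-range numeric reference) is kept "as is" instead of being dropped.
print(_replace_html_entities(b"bad &#999999999; entity", remove_illegal=False))
```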

View File

@@ -150,7 +150,7 @@ class DetrObjectDetectionOutput(ModelOutput):
 possible padding). You can use [`~DetrFeatureExtractor.post_process`] to retrieve the unnormalized bounding
 boxes.
 auxiliary_outputs (`list[Dict]`, *optional*):
-Optional, only returned when auxilary losses are activated (i.e. `config.auxiliary_loss` is set to *True*)
+Optional, only returned when auxilary losses are activated (i.e. `config.auxiliary_loss` is set to `True`)
 and labels are provided. It is a list of dictionaries containing the two above keys (`logits` and
 `pred_boxes`) for each decoder layer.
 last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
@@ -217,7 +217,7 @@ class DetrSegmentationOutput(ModelOutput):
 [`~DetrFeatureExtractor.post_process_panoptic`] to evaluate instance and panoptic segmentation masks
 respectively.
 auxiliary_outputs (`list[Dict]`, *optional*):
-Optional, only returned when auxiliary losses are activated (i.e. `config.auxiliary_loss` is set to *True*)
+Optional, only returned when auxiliary losses are activated (i.e. `config.auxiliary_loss` is set to `True`)
 and labels are provided. It is a list of dictionaries containing the two above keys (`logits` and
 `pred_boxes`) for each decoder layer.
 last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
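A small sketch of turning the auxiliary losses on when training from scratch (config field as named in this docstring):

```python
from transformers import DetrConfig, DetrForObjectDetection

config = DetrConfig(auxiliary_loss=True)  # add losses after each decoder layer
model = DetrForObjectDetection(config)
# When `labels` are passed to forward(), `outputs.auxiliary_outputs` is populated
# with per-decoder-layer `logits` and `pred_boxes`.
```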

View File

@@ -306,7 +306,7 @@ class EncoderDecoderModel(PreTrainedModel):
 `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
 PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
-decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the decoder. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
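For instance, both arguments can point at checkpoints, in which case the decoder's cross-attention weights are newly initialized:

```python
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased",  # encoder_pretrained_model_name_or_path
    "bert-base-uncased",  # decoder_pretrained_model_name_or_path
)
```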

View File

@@ -755,7 +755,7 @@ class FlaxEncoderDecoderModel(FlaxPreTrainedModel):
 - A path to a *directory* containing model weights saved using
 [`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
-decoder_pretrained_model_name_or_path (:obj: *Union[str, os.PathLike]*, *optional*, defaults to *None*):
+decoder_pretrained_model_name_or_path (:obj: *Union[str, os.PathLike]*, *optional*, defaults to `None`):
 Information necessary to initiate the decoder. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.

View File

@@ -319,7 +319,7 @@ class TFEncoderDecoderModel(TFPreTrainedModel):
 - A path or url to a *pytorch index checkpoint file* (e.g, `./pt_model/`). In this case,
 `encoder_from_pt` should be set to `True`.
-decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the decoder. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.

View File

@@ -888,8 +888,8 @@ class LayoutLMv2Tokenizer(PreTrainedTokenizer):
 """
 Prepares a sequence or a pair of sequences so that it can be used by the model. It adds special tokens,
 truncates sequences if overflowing while taking into account the special tokens and manages a moving window
-(with user defined stride) for overflowing tokens. Please Note, for *text_pair* different than *None* and
-*truncation_strategy = longest_first* or *True*, it is not possible to return overflowing tokens. Such a
+(with user defined stride) for overflowing tokens. Please Note, for *text_pair* different than `None` and
+*truncation_strategy = longest_first* or `True`, it is not possible to return overflowing tokens. Such a
 combination of arguments will raise an error.
 Word-level `boxes` are turned into token-level `bbox`. If provided, word-level `word_labels` are turned into

View File

@@ -879,8 +879,8 @@ class LukeTokenizer(RobertaTokenizer):
 Prepares a sequence of input id, entity id and entity span, or a pair of sequences of inputs ids, entity ids,
 entity spans so that it can be used by the model. It adds special tokens, truncates sequences if overflowing
 while taking into account the special tokens and manages a moving window (with user defined stride) for
-overflowing tokens. Please Note, for *pair_ids* different than *None* and *truncation_strategy = longest_first*
-or *True*, it is not possible to return overflowing tokens. Such a combination of arguments will raise an
+overflowing tokens. Please Note, for *pair_ids* different than `None` and *truncation_strategy = longest_first*
+or `True`, it is not possible to return overflowing tokens. Such a combination of arguments will raise an
 error.
 Args:

View File

@@ -1324,7 +1324,7 @@ class TFLxmertForPreTraining(TFLxmertPreTrainedModel):
 Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
 config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
 loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
-obj_labels: (`Dict[Str: Tuple[tf.Tensor, tf.Tensor]]`, *optional*, defaults to :obj: *None*):
+obj_labels: (`Dict[Str: Tuple[tf.Tensor, tf.Tensor]]`, *optional*, defaults to :obj: `None`):
 each key is named after each one of the visual losses and each element of the tuple is of the shape
 `(batch_size, num_features)` and `(batch_size, num_features, visual_feature_dim)` for each the label id and
 the label score respectively
@@ -1334,7 +1334,7 @@ class TFLxmertForPreTraining(TFLxmertPreTrainedModel):
 - 0 indicates that the sentence does not match the image,
 - 1 indicates that the sentence does match the image.
-ans (`Torch.Tensor` of shape `(batch_size)`, *optional*, defaults to :obj: *None*):
+ans (`Torch.Tensor` of shape `(batch_size)`, *optional*, defaults to :obj: `None`):
 a one hot representation hof the correct answer *optional*
 Returns:

View File

@@ -991,8 +991,8 @@ class MLukeTokenizer(PreTrainedTokenizer):
 Prepares a sequence of input id, entity id and entity span, or a pair of sequences of inputs ids, entity ids,
 entity spans so that it can be used by the model. It adds special tokens, truncates sequences if overflowing
 while taking into account the special tokens and manages a moving window (with user defined stride) for
-overflowing tokens. Please Note, for *pair_ids* different than *None* and *truncation_strategy = longest_first*
-or *True*, it is not possible to return overflowing tokens. Such a combination of arguments will raise an
+overflowing tokens. Please Note, for *pair_ids* different than `None` and *truncation_strategy = longest_first*
+or `True`, it is not possible to return overflowing tokens. Such a combination of arguments will raise an
 error.
 Args:

View File

@@ -2035,7 +2035,7 @@ class PerceiverBasicDecoder(PerceiverAbstractDecoder):
 config ([*PerceiverConfig*]):
 Model configuration.
 output_num_channels (`int`, *optional*):
-The number of channels in the output. Will only be used in case *final_project* is set to *True*.
+The number of channels in the output. Will only be used in case *final_project* is set to `True`.
 position_encoding_type (`str`, *optional*, defaults to "trainable"):
 The type of position encoding to use. Can be either "trainable", "fourier", or "none".
 output_index_dims (`int`, *optional*):
@@ -2583,7 +2583,7 @@ def generate_fourier_features(pos, num_bands, max_resolution=(224, 224), concat_
 Returns:
 `torch.FloatTensor` of shape `(batch_size, sequence_length, n_channels)`: The Fourier position embeddings. If
-`concat_pos` is *True* and `sine_only` is *False*, output dimensions are ordered as: [dim_1, dim_2, ..., dim_d,
+`concat_pos` is `True` and `sine_only` is `False`, output dimensions are ordered as: [dim_1, dim_2, ..., dim_d,
 sin(pi*f_1*dim_1), ..., sin(pi*f_K*dim_1), ..., sin(pi*f_1*dim_d), ..., sin(pi*f_K*dim_d), cos(pi*f_1*dim_1),
 ..., cos(pi*f_K*dim_1), ..., cos(pi*f_1*dim_d), ..., cos(pi*f_K*dim_d)], where dim_i is pos[:, i] and f_k is the
 kth frequency band.

View File

@@ -258,7 +258,7 @@ class RagPreTrainedModel(PreTrainedModel):
 the model, you need to first set it back in training mode with `model.train()`.
 Params:
-question_encoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+question_encoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the question encoder. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
@@ -271,7 +271,7 @@ class RagPreTrainedModel(PreTrainedModel):
 `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
 PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
-generator_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+generator_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the generator. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
@@ -444,7 +444,7 @@ RAG_FORWARD_INPUTS_DOCSTRING = r"""
 Used by the ([`RagModel`]) model during decoding.
 decoder_input_ids (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*):
-Provide for generation tasks. *None* by default, construct as per instructions for the generator model
+Provide for generation tasks. `None` by default, construct as per instructions for the generator model
 you're using with your RAG instance.
 decoder_attention_mask (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*):
 Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also
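A sketch of composing a RAG model from the two checkpoints these arguments name (the checkpoint choices are illustrative):

```python
from transformers import RagModel

model = RagModel.from_pretrained_question_encoder_generator(
    "facebook/dpr-question_encoder-single-nq-base",  # question encoder
    "facebook/bart-large",                           # generator
)
```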

View File

@@ -245,7 +245,7 @@ class TFRagPreTrainedModel(TFPreTrainedModel):
 - A path or url to a *pytorch index checkpoint file* (e.g, `./pt_model/`). In this case,
 `question_encoder_from_pt` should be set to `True`.
-generator_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+generator_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the generator. Can be either:
 - A string with the *shortcut name* of a pretrained model to load from cache or download, e.g.,
@@ -426,7 +426,7 @@ RAG_FORWARD_INPUTS_DOCSTRING = r"""
 Used by the ([`TFRagModel`]) model during decoding.
 decoder_input_ids (`tf.Tensor` of shape `(batch_size, target_sequence_length)`, *optional*):
-Provide for generation tasks. *None* by default, construct as per instructions for the generator model
+Provide for generation tasks. `None` by default, construct as per instructions for the generator model
 you're using with your RAG instance.
 decoder_attention_mask (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*):
 Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also
@@ -1136,15 +1136,15 @@ class TFRagTokenForGeneration(TFRagPreTrainedModel, TFCausalLanguageModelingLoss
 encoder-decoder model starts decoding with a different token than *bos*, the id of that token.
 n_docs (`int`, *optional*, defaults to `config.n_docs`)
 Number of documents to retrieve and/or number of documents for which to generate an answer.
-output_attentions (`bool`, *optional*, defaults to *False*):
+output_attentions (`bool`, *optional*, defaults to `False`):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more details.
-output_hidden_states (`bool`, *optional*, defaults to *False*):
+output_hidden_states (`bool`, *optional*, defaults to `False`):
 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
 for more details.
-output_scores (`bool`, *optional*, defaults to *False*):
+output_scores (`bool`, *optional*, defaults to `False`):
 Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
-return_dict_in_generate (`bool`, *optional*, defaults to *False*):
+return_dict_in_generate (`bool`, *optional*, defaults to `False`):
 Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
 model_specific_kwargs:
 Additional model specific kwargs will be forwarded to the `forward` function of the model.

View File

@@ -300,7 +300,7 @@ class SpeechEncoderDecoderModel(PreTrainedModel):
 `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
 PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
-decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the decoder. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.

View File

@@ -720,7 +720,7 @@ class FlaxVisionEncoderDecoderModel(FlaxPreTrainedModel):
 - A path to a *directory* containing model weights saved using
 [`~FlaxPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
-decoder_pretrained_model_name_or_path (:obj: *Union[str, os.PathLike]*, *optional*, defaults to *None*):
+decoder_pretrained_model_name_or_path (:obj: *Union[str, os.PathLike]*, *optional*, defaults to `None`):
 Information necessary to initiate the decoder. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.

View File

@@ -272,7 +272,7 @@ class VisionEncoderDecoderModel(PreTrainedModel):
 `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
 PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
-decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+decoder_pretrained_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the text decoder. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.

View File

@@ -403,7 +403,7 @@ class FlaxVisionTextDualEncoderModel(FlaxPreTrainedModel):
 ) -> FlaxPreTrainedModel:
 """
 Params:
-vision_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+vision_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the vision model. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.

View File

@@ -404,7 +404,7 @@ class VisionTextDualEncoderModel(PreTrainedModel):
 ) -> PreTrainedModel:
 """
 Params:
-vision_model_name_or_path (:obj: *str*, *optional*, defaults to *None*):
+vision_model_name_or_path (:obj: *str*, *optional*, defaults to `None`):
 Information necessary to initiate the vision model. Can be either:
 - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
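A sketch of the corresponding loader (assuming the `from_vision_text_pretrained` classmethod; the checkpoints are illustrative):

```python
from transformers import VisionTextDualEncoderModel

model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "openai/clip-vit-base-patch32",  # vision_model_name_or_path
    "roberta-base",                  # text model
)
```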

View File

@@ -1383,7 +1383,7 @@ class XLNetLMHeadModel(XLNetPreTrainedModel):
 r"""
 labels (`torch.LongTensor` of shape `(batch_size, num_predict)`, *optional*):
 Labels for masked language modeling. `num_predict` corresponds to `target_mapping.shape[1]`. If
-`target_mapping` is :obj*None*, then `num_predict` corresponds to `sequence_length`.
+`target_mapping` is `None`, then `num_predict` corresponds to `sequence_length`.
 The labels should correspond to the masked input words that should be predicted and depends on
 `target_mapping`. Note in order to perform standard auto-regressive language modeling a *<mask>* token has
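A sketch of predicting a single masked token, so `num_predict == target_mapping.shape[1] == 1` (mirrors the usage described above; the sentence and label token are illustrative):

```python
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is very <mask>", add_special_tokens=False)
).unsqueeze(0)

# Hide the last token from every position and predict only that token.
perm_mask = torch.zeros((1, input_ids.shape[1], input_ids.shape[1]))
perm_mask[:, :, -1] = 1.0
target_mapping = torch.zeros((1, 1, input_ids.shape[1]))  # (batch, num_predict, seq_len)
target_mapping[0, 0, -1] = 1.0

labels = torch.tensor([tokenizer.encode("cute", add_special_tokens=False)])  # (1, num_predict)
outputs = model(input_ids, perm_mask=perm_mask, target_mapping=target_mapping, labels=labels)
loss = outputs.loss
```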

View File

@@ -285,7 +285,7 @@ class AdamW(Optimizer):
 Adam's epsilon for numerical stability.
 weight_decay (`float`, *optional*, defaults to 0):
 Decoupled weight decay to apply.
-correct_bias (`bool`, *optional*, defaults to *True*):
+correct_bias (`bool`, *optional*, defaults to `True`):
 Whether or not to correct bias in Adam (for instance, in Bert TF repository they use `False`).
 """

View File

@@ -168,7 +168,7 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
 The beta2 parameter in Adam, which is the exponential decay rate for the 2nd momentum estimates.
 epsilon (`float`, *optional*, defaults to 1e-7):
 The epsilon parameter in Adam, which is a small constant for numerical stability.
-amsgrad (`bool`, *optional*, default to *False*):
+amsgrad (`bool`, *optional*, default to `False`):
 Whether to apply AMSGrad variant of this algorithm or not, see [On the Convergence of Adam and
 Beyond](https://arxiv.org/abs/1904.09237).
 weight_decay_rate (`float`, *optional*, defaults to 0):

View File

@@ -104,7 +104,7 @@ class AudioClassificationPipeline(Pipeline):
 requires *ffmpeg* to be installed on the system. If *inputs* is `bytes` it is supposed to be the
 content of an audio file and is interpreted by *ffmpeg* in the same way.
 top_k (`int`, *optional*, defaults to None):
-The number of top labels that will be returned by the pipeline. If the provided number is *None* or
+The number of top labels that will be returned by the pipeline. If the provided number is `None` or
 higher than the number of labels available in the model configuration, it will default to the number of
 labels.
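A sketch (the file name is a placeholder for a local audio file):

```python
from transformers import pipeline

classifier = pipeline("audio-classification")
# At most 3 labels, capped by the number of labels in the model config.
predictions = classifier("sample.wav", top_k=3)
```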

View File

@@ -2868,7 +2868,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
 Prepares a sequence of input id, or a pair of sequences of inputs ids so that it can be used by the model. It
 adds special tokens, truncates sequences if overflowing while taking into account the special tokens and
 manages a moving window (with user defined stride) for overflowing tokens. Please Note, for *pair_ids*
-different than *None* and *truncation_strategy = longest_first* or *True*, it is not possible to return
+different than `None` and *truncation_strategy = longest_first* or `True`, it is not possible to return
 overflowing tokens. Such a combination of arguments will raise an error.
 Args:
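A hedged sketch of the constraint (checkpoint and sentences are illustrative):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("a first, fairly long sequence"))
pair_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("a second sequence"))

# OK: an "only_*" strategy can return overflowing tokens for a pair.
enc = tokenizer.prepare_for_model(
    ids, pair_ids, max_length=8, truncation="only_first", return_overflowing_tokens=True
)

# Raises an error: pair_ids + truncation="longest_first" (or True)
# + return_overflowing_tokens=True is the forbidden combination.
# tokenizer.prepare_for_model(ids, pair_ids, max_length=8, truncation=True,
#                             return_overflowing_tokens=True)
```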

View File

@@ -1034,7 +1034,7 @@ class Trainer:
 Args:
 resume_from_checkpoint (`str` or `bool`, *optional*):
 If a `str`, local path to a saved checkpoint as saved by a previous instance of [`Trainer`]. If a
-`bool` and equals *True*, load the last checkpoint in *args.output_dir* as saved by a previous instance
+`bool` and equals `True`, load the last checkpoint in *args.output_dir* as saved by a previous instance
 of [`Trainer`]. If present, training will resume from the model/optimizer/scheduler states loaded here.
 trial (`optuna.Trial` or `Dict[str, Any]`, *optional*):
 The trial run or the hyperparameter dictionary for hyperparameter search.
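For example (`model` and `train_dataset` are placeholders for your own objects):

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="out", save_steps=500)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

trainer.train(resume_from_checkpoint=True)  # latest checkpoint under args.output_dir
# trainer.train(resume_from_checkpoint="out/checkpoint-500")  # a specific checkpoint
```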

View File

@@ -105,7 +105,7 @@ class TrainingArguments:
 - `"steps"`: Evaluation is done (and logged) every `eval_steps`.
 - `"epoch"`: Evaluation is done at the end of each epoch.
-prediction_loss_only (`bool`, *optional*, defaults to *False*):
+prediction_loss_only (`bool`, *optional*, defaults to `False`):
 When performing evaluation and generating predictions, only returns the loss.
 per_device_train_batch_size (`int`, *optional*, defaults to 8):
 The batch size per GPU/TPU core/CPU for training.
@@ -175,7 +175,7 @@ class TrainingArguments:
 logging_steps (`int`, *optional*, defaults to 500):
 Number of update steps between two logs if `logging_strategy="steps"`.
 logging_nan_inf_filter (`bool`, *optional*, defaults to `True`):
-Whether to filter `nan` and `inf` losses for logging. If set to obj:*True* the loss of every step that is
+Whether to filter `nan` and `inf` losses for logging. If set to obj:`True` the loss of every step that is
 `nan` or `inf` is filtered and the average loss of the current logging window is taken instead.
 <Tip>