mirror of https://github.com/huggingface/transformers.git
synced 2025-08-03 03:31:05 +06:00
top-k instead of top-p in MixtralConfig docstring (#30687)
top-k instead of top-p in docstring
This commit is contained in:
parent 835de4c833
commit 4980d62af3
@@ -83,7 +83,7 @@ class MixtralConfig(PretrainedConfig):
         attention_dropout (`float`, *optional*, defaults to 0.0):
             The dropout ratio for the attention probabilities.
         num_experts_per_tok (`int`, *optional*, defaults to 2):
-            The number of experts to root per-token, can be also interpreted as the `top-p` routing
+            The number of experts to route per-token, can be also interpreted as the `top-k` routing
             parameter
         num_local_experts (`int`, *optional*, defaults to 8):
             Number of experts per Sparse MLP layer.
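For context, here is a minimal sketch of what the corrected wording describes: `num_experts_per_tok` is the `k` of a top-k selection over the router's per-expert scores, while `num_local_experts` is the number of experts being scored. This is an illustration only, not the actual Mixtral forward pass; the router logits here are random placeholders.

    from transformers import MixtralConfig
    import torch
    import torch.nn.functional as F

    config = MixtralConfig()  # defaults: num_local_experts=8, num_experts_per_tok=2
    num_tokens = 4

    # Hypothetical router logits: one score per expert for each token.
    router_logits = torch.randn(num_tokens, config.num_local_experts)
    routing_weights = F.softmax(router_logits, dim=-1)

    # Each token keeps only its `num_experts_per_tok` highest-scoring experts,
    # i.e. a top-k (not top-p) selection.
    topk_weights, topk_indices = torch.topk(
        routing_weights, k=config.num_experts_per_tok, dim=-1
    )
    print(topk_indices.shape)  # torch.Size([4, 2])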