Mirror of https://github.com/huggingface/transformers.git, synced 2025-08-02 03:01:07 +06:00
Added Mish Activation Function
Mish is a recently proposed activation function (https://arxiv.org/abs/1908.08681). It has seen some recent success and has already been adopted in spaCy, Thinc, TensorFlow Addons, and FastAI-dev. All benchmarks recorded so far (including comparisons against ReLU, Swish, and GELU) are available in the repository: https://github.com/digantamisra98/Mish. It might be a good addition to experiment with, especially in the BERT model.
parent 1c542df7e5
commit 070dcf1c02
@@ -138,7 +138,11 @@ def swish(x):
     return x * torch.sigmoid(x)
 
 
-ACT2FN = {"gelu": gelu, "relu": torch.nn.functional.relu, "swish": swish, "gelu_new": gelu_new}
+def mish(x):
+    return x * torch.tanh(nn.functional.softplus(x))
+
+
+ACT2FN = {"gelu": gelu, "relu": torch.nn.functional.relu, "swish": swish, "gelu_new": gelu_new, "mish": mish}
 
 
 BertLayerNorm = torch.nn.LayerNorm
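For reference, below is a minimal, self-contained sketch (not part of the commit) of how the new "mish" entry could be exercised. The mish function mirrors the one added in the diff above; the ACT2FN dictionary here is trimmed to two entries for brevity, and the sample tensor is purely illustrative.

import torch
import torch.nn as nn


def mish(x):
    # Mish(x) = x * tanh(softplus(x)), matching the function added in the diff.
    return x * torch.tanh(nn.functional.softplus(x))


# Trimmed stand-in for the ACT2FN mapping; the real dictionary in the diff
# also contains gelu, swish, and gelu_new.
ACT2FN = {"relu": nn.functional.relu, "mish": mish}

x = torch.linspace(-3.0, 3.0, steps=7)
act = ACT2FN["mish"]   # activations are selected by a config string
print(act(x))          # smooth, non-monotonic output around zero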