mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-03 12:50:06 +06:00
![]() Previously, the identity function was used for dropped tokens with a weight from the expert that was not applied to the hidden states. This was misleading, because dropping means, the expert weight is zero. Instead of trying to fix the weight, we take an easier approach by initializing with zeros. Fixes issue https://github.com/huggingface/transformers/issues/37017 |
||
---|---|---|
.. | ||
__init__.py | ||
test_modeling_switch_transformers.py |