transformers/tests/models/switch_transformers
Mario Michael Krell bde41d69b4
Correctly drop tokens in SwitchTransformer (#37123)
Previously, the identity function was used for dropped tokens
with a weight from the expert that was not applied to the hidden states.
This was misleading, because dropping a token means that its expert weight is zero.
Instead of trying to fix the weight, we take an easier approach by initializing with zeros.

Fixes issue https://github.com/huggingface/transformers/issues/37017
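
The behaviour change described above can be illustrated with a small pure-Python sketch. It is not the code from the commit: the function names (`route_tokens`, `scale_by_router_prob`), the `init_with_zeros` flag, and the toy experts are hypothetical, and it assumes that each token's output is multiplied by its router probability after the experts have run.

```python
from typing import Callable, List, Optional

Vector = List[float]
Expert = Callable[[Vector], Vector]


def route_tokens(
    hidden_states: List[Vector],
    assignments: List[Optional[int]],
    experts: List[Expert],
    init_with_zeros: bool = True,
) -> List[Vector]:
    """Apply each token's assigned expert; None in `assignments` marks a dropped token.

    init_with_zeros=True  -> dropped tokens produce an all-zero output (new behaviour).
    init_with_zeros=False -> dropped tokens pass through unchanged (old identity behaviour).
    """
    if init_with_zeros:
        output = [[0.0] * len(token) for token in hidden_states]
    else:
        output = [list(token) for token in hidden_states]

    for i, (token, expert_idx) in enumerate(zip(hidden_states, assignments)):
        if expert_idx is not None:
            output[i] = experts[expert_idx](token)
    return output


def scale_by_router_prob(outputs: List[Vector], router_probs: List[float]) -> List[Vector]:
    """Multiply every token's output by its router probability (assumed post-processing step)."""
    return [[p * x for x in token] for token, p in zip(outputs, router_probs)]


# Two toy experts: one doubles its input, the other negates it.
experts: List[Expert] = [
    lambda v: [2.0 * x for x in v],
    lambda v: [-x for x in v],
]

hidden_states = [[1.0, 2.0], [4.0, 8.0], [5.0, 6.0]]
assignments = [0, None, 1]        # the middle token is dropped
router_probs = [0.5, 0.75, 0.5]   # the dropped token still carries a router weight

new_out = scale_by_router_prob(
    route_tokens(hidden_states, assignments, experts, init_with_zeros=True), router_probs
)
old_out = scale_by_router_prob(
    route_tokens(hidden_states, assignments, experts, init_with_zeros=False), router_probs
)

print(new_out[1])  # [0.0, 0.0]: the dropped token contributes nothing
print(old_out[1])  # [3.0, 6.0]: the identity output is scaled by an unrelated router weight
```

With zero initialization, the router weight of a dropped token has no effect on the result, so it does not need to be corrected separately.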
2025-04-10 16:58:57 +02:00
..
__init__.py Add Switch transformers (#19323) 2022-11-15 13:06:45 +01:00
test_modeling_switch_transformers.py Correctly drop tokens in SwitchTransformer (#37123) 2025-04-10 16:58:57 +02:00