diff --git a/docs/source/en/model_doc/hubert.md b/docs/source/en/model_doc/hubert.md
index 432e127c786..67e7d78beb6 100644
--- a/docs/source/en/model_doc/hubert.md
+++ b/docs/source/en/model_doc/hubert.md
@@ -71,9 +71,10 @@ pip install -U flash-attn --no-build-isolation
 Below is an expected speedup diagram comparing the pure inference time between the native implementation in transformers of `facebook/hubert-large-ls960-ft`, the flash-attention-2 and the sdpa (scale-dot-product-attention) version. We show the average speedup obtained on the `librispeech_asr` `clean` validation split:
 
 ```python
->>> from transformers import Wav2Vec2Model
+>>> from transformers import HubertModel
+>>> import torch
 
-model = Wav2Vec2Model.from_pretrained("facebook/hubert-large-ls960-ft", torch_dtype=torch.float16, attn_implementation="flash_attention_2").to(device)
+>>> model = HubertModel.from_pretrained("facebook/hubert-large-ls960-ft", torch_dtype=torch.float16, attn_implementation="flash_attention_2").to("cuda")
 ...
 ```
 
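For anyone who wants to try the corrected snippet end-to-end, below is a minimal, self-contained sketch (not part of the diff). It assumes a CUDA GPU with `flash-attn` installed; the `AutoFeatureExtractor` preprocessing and the random dummy audio are illustrative stand-ins for real 16 kHz speech from `librispeech_asr`.

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, HubertModel

# Checkpoint and loading arguments taken from the updated doc snippet
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertModel.from_pretrained(
    "facebook/hubert-large-ls960-ft",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")

# One second of 16 kHz random noise stands in for real speech
dummy_audio = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(dummy_audio, sampling_rate=16000, return_tensors="pt")
input_values = inputs.input_values.to("cuda", dtype=torch.float16)

with torch.no_grad():
    hidden_states = model(input_values).last_hidden_state
print(hidden_states.shape)  # (batch, frames, hidden_size)
```

Casting `input_values` to `torch.float16` matches the `torch_dtype` the model was loaded with, which is required since FlashAttention-2 only supports fp16/bf16 inputs.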