Deleted duplicate sentence (#26394)
This commit is contained in:
parent a09130feee
commit a8531f3bfd
@@ -68,8 +68,6 @@ You can benefit from considerable speedups for fine-tuning and inference, especi
To overcome this, one should use Flash Attention without padding tokens in the sequence for training (e.g., by packing a dataset, i.e., concatenating sequences until reaching the maximum sequence length). An example is provided [here](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py#L516).
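The packing step mentioned above (tokenize, concatenate everything, then split into fixed-length blocks so no padding is needed) can be sketched roughly as follows. This is a minimal sketch in the spirit of the linked `run_clm.py` example; the dataset choice, `block_size`, and the `group_texts` helper shown here are illustrative assumptions, not the exact script.

```python
from itertools import chain

from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative choices; the linked run_clm.py example drives these from CLI arguments.
model_name = "tiiuae/falcon-7b"
block_size = 4096

tokenizer = AutoTokenizer.from_pretrained(model_name)
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize_function(examples):
    return tokenizer(examples["text"])

tokenized = raw_datasets.map(tokenize_function, batched=True, remove_columns=["text"])

def group_texts(examples):
    # Concatenate all tokenized sequences, then split them into block_size chunks
    # so every training example is exactly block_size tokens long (no padding).
    concatenated = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

lm_dataset = tokenized.map(group_texts, batched=True)
```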
Below is the expected speedup you can get for a simple forward pass on [tiiuae/falcon-7b](https://hf.co/tiiuae/falcon-7b) with a sequence length of 4096 and various batch sizes without padding tokens:
Below is the expected speedup you can get for a simple forward pass on [tiiuae/falcon-7b](https://hf.co/tiiuae/falcon-7b) with a sequence length of 4096 and various batch sizes, without padding tokens:
<div style="text-align: center">
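For reference, a padding-free forward pass of the kind benchmarked above can be run roughly like this. It is a minimal sketch, assuming a recent `transformers` release that accepts `attn_implementation="flash_attention_2"`, a CUDA GPU with the `flash-attn` package installed, and an arbitrary batch size of 4 with random token ids (the exact benchmark inputs are not specified here).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package and a supported GPU
    device_map="auto",
)

# Every sequence has the same length (4096), so the batch contains no padding tokens.
batch_size, seq_len = 4, 4096  # illustrative batch size
input_ids = torch.randint(0, tokenizer.vocab_size, (batch_size, seq_len), device=model.device)

with torch.no_grad():
    outputs = model(input_ids)

print(outputs.logits.shape)  # (batch_size, seq_len, vocab_size)
```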