mirror of https://github.com/huggingface/transformers.git (synced 2025-07-03 12:50:06 +06:00)
[docs] Format fix (#38414)
fix table
parent 0f41c41a46
commit 78d771c3c2
@@ -56,10 +56,10 @@ Attention is calculated independently in each layer of the model, and caching is

 Refer to the table below to compare how caching improves efficiency.

-| without caching | with caching | | | |
-|---|---|---|---|---|
-| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V` | | | |
-| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) | | | |
+| without caching | with caching |
+|---|---|
+| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V`
+| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) |
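As a rough illustration of the comparison in the table above, the following sketch decodes a short sequence with a single attention head in NumPy, once recomputing `K` and `V` for the whole prefix at every step and once appending only the current token's `K` and `V` to a cache. It is not the Transformers implementation: the names `decode_without_cache` and `decode_with_cache`, the head dimension, and the random weights are all hypothetical, and the counter tracks only key/value projections, not the full attention cost.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical head dimension, chosen only for illustration
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))


def attend(q, K, V):
    """Scaled dot-product attention for one query over all available positions."""
    scores = K @ q / np.sqrt(d)               # shape (t,)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                        # shape (d,)


def decode_without_cache(xs):
    """At each step, recompute K and V for every token seen so far."""
    outputs, projections = [], 0
    for t in range(1, len(xs) + 1):
        prefix = xs[:t]
        K = prefix @ W_k.T                    # t key projections
        V = prefix @ W_v.T                    # t value projections
        projections += 2 * t
        outputs.append(attend(W_q @ xs[t - 1], K, V))
    return np.stack(outputs), projections


def decode_with_cache(xs):
    """At each step, compute K and V for the current token only and append them."""
    K_cache = np.empty((0, d))
    V_cache = np.empty((0, d))
    outputs, projections = [], 0
    for x in xs:
        K_cache = np.vstack([K_cache, W_k @ x])   # one new key
        V_cache = np.vstack([V_cache, W_v @ x])   # one new value
        projections += 2
        outputs.append(attend(W_q @ x, K_cache, V_cache))
    return np.stack(outputs), projections


xs = rng.standard_normal((6, d))  # six made-up token embeddings
out_plain, n_plain = decode_without_cache(xs)
out_cached, n_cached = decode_with_cache(xs)
print(np.allclose(out_plain, out_cached), n_plain, n_cached)
```

Both paths produce the same outputs. Without the cache the number of key/value projections grows with the square of the sequence length, while with the cache it grows linearly, at the price of keeping `K_cache` and `V_cache` in memory.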