[docs] Format fix (#38414)

fix table
Steven Liu 2025-06-03 09:53:23 -07:00 committed by GitHub
parent 0f41c41a46
commit 78d771c3c2


@@ -56,10 +56,10 @@ Attention is calculated independently in each layer of the model, and caching is
 Refer to the table below to compare how caching improves efficiency.

-| without caching | with caching | | | |
-|---|---|---|---|---|
-| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V` | | | |
-| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) | | | |
+| without caching | with caching |
+|---|---|
+| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V` |
+| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) |
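The table's claim reduces to a counting argument: without a cache, step *t* re-projects `K` and `V` for all *t* tokens seen so far, while with a cache only the newest token is projected and everything else is read back from memory. The sketch below only counts those projections to show the growth pattern; the helper names (`kv_projections_without_cache`, `kv_projections_with_cache`) are hypothetical and do not perform real attention.

```python
# Minimal sketch (illustrative only): count how many K/V projections each
# decoding step performs, with and without a KV cache.

def kv_projections_without_cache(num_steps):
    # Without caching, step t recomputes K and V for all t tokens so far.
    return [t for t in range(1, num_steps + 1)]

def kv_projections_with_cache(num_steps):
    # With caching, step t only projects K and V for the single new token;
    # previously computed K/V pairs are reused from the cache.
    return [1 for _ in range(1, num_steps + 1)]

steps = 6
print("without cache:", kv_projections_without_cache(steps))  # [1, 2, 3, 4, 5, 6] -> total work grows quadratically
print("with cache:   ", kv_projections_with_cache(steps))     # [1, 1, 1, 1, 1, 1] -> total work grows linearly
```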