transformers/benchmark

Benchmarks

To add a new benchmark, define a Python function named run_benchmark in a Python file located in this benchmark/ directory.

The expected function signature is the following:

def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
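As a minimal sketch of what such a file could look like, the example below fills in the signature with a placeholder workload. The file name `my_benchmark.py`, the workload, and the log message are assumptions made for illustration; only the function name and its parameters come from the signature above.

```python
# benchmark/my_benchmark.py (hypothetical file name)
import time
from logging import Logger


def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
    # Placeholder workload: a real benchmark would load a model and generate tokens here.
    start = time.perf_counter()
    generated = [i for i in range(num_tokens_to_generate)]
    elapsed_secs = time.perf_counter() - start
    logger.info(
        "branch=%s commit=%s generated %d placeholder tokens in %.6fs",
        branch, commit_id, len(generated), elapsed_secs,
    )
```

Calling it by hand with `logging.getLogger(__name__)` and dummy branch and commit values is a quick way to check the function before wiring it into the benchmark runner.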

Writing metrics to the database

MetricsRecorder is thread-safe with respect to Python threads (threading.Thread). You can therefore start a background thread that records device measurements without blocking the main thread, which runs the model measurements.

See llama.py for an example of this in practice.
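The background-thread pattern described above can be sketched with the standard library alone. This is not the code from llama.py: `StubRecorder` is a hypothetical in-memory stand-in for MetricsRecorder, and `read_device_stats`, `sample_device`, and `run_with_sampling` are illustrative names. Only the argument order of `collect_device_measurements` is taken from this README.

```python
import threading
import time


class StubRecorder:
    """Hypothetical stand-in for MetricsRecorder that stores rows in memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self.device_rows = []

    def collect_device_measurements(self, benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
        # The lock keeps appends from different threads consistent.
        with self._lock:
            self.device_rows.append((benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes))


def read_device_stats():
    # Placeholder readings; a real benchmark would query the CPU and GPU here.
    return 0.0, 0.0, 0.0, 0.0


def sample_device(recorder, benchmark_id, stop_event, interval_secs=0.01):
    # Runs in the background until the main thread sets stop_event.
    while not stop_event.is_set():
        recorder.collect_device_measurements(benchmark_id, *read_device_stats())
        stop_event.wait(interval_secs)


def run_with_sampling(recorder, benchmark_id, workload):
    stop_event = threading.Event()
    sampler = threading.Thread(target=sample_device, args=(recorder, benchmark_id, stop_event))
    sampler.start()
    try:
        return workload()  # model measurements run on the main thread
    finally:
        # Always stop and join the sampler, even if the workload raises.
        stop_event.set()
        sampler.join()
```

Using an `Event` both as the stop flag and as the sleep timer lets the sampler exit promptly once the workload finishes, instead of waiting out a full sampling interval.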

from logging import Logger

import psycopg2
from benchmarks_entrypoint import MetricsRecorder

def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
    # gpu_name, model_id, and the measurement variables below are placeholders for values your benchmark computes
    metrics_recorder = MetricsRecorder(psycopg2.connect("dbname=metrics"), logger, branch, commit_id, commit_msg)
    benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
    # To collect device measurements
    metrics_recorder.collect_device_measurements(
        benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
    )
    # To collect your model measurements
    metrics_recorder.collect_model_measurements(
        benchmark_id,
        {
            "model_load_time": model_load_time,
            "first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
            "second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
            "first_eager_generate_time_secs": first_eager_generate_time,
            "second_eager_generate_time_secs": second_eager_generate_time,
            "time_to_first_token_secs": time_to_first_token,
            "time_to_second_token_secs": time_to_second_token,
            "time_to_third_token_secs": time_to_third_token,
            "time_to_next_token_mean_secs": mean_time_to_next_token,
            "first_compile_generate_time_secs": first_compile_generate_time,
            "second_compile_generate_time_secs": second_compile_generate_time,
            "third_compile_generate_time_secs": third_compile_generate_time,
            "fourth_compile_generate_time_secs": fourth_compile_generate_time,
        },
    )
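The timing values passed to collect_model_measurements have to be measured by the benchmark itself. One possible approach, sketched here with the standard library only, is a small context manager that stores elapsed wall-clock seconds under a metric name. The helper name `timed`, the `measurements` dict, and the `time.sleep` placeholders are assumptions for illustration; on a GPU, asynchronous kernels may also need to be synchronized before the clock is read.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(measurements, name):
    # Store the wall-clock duration of the enclosed block under `name`, in seconds.
    start = time.perf_counter()
    try:
        yield
    finally:
        measurements[name] = time.perf_counter() - start


measurements = {}
with timed(measurements, "model_load_time"):
    time.sleep(0.01)  # placeholder for loading a model
with timed(measurements, "first_eager_forward_pass_time_secs"):
    time.sleep(0.01)  # placeholder for a forward pass
```

The resulting dict has the same shape as the one passed to collect_model_measurements, so it could be handed over directly once every expected key has been filled in.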