transformers/benchmark
Directory contents:

  File                          Last commit                                                Date
  config/                       [Benchmark] Reuse optimum-benchmark (#30615)               2024-05-21
  __init__.py                   [Benchmark] Reuse optimum-benchmark (#30615)               2024-05-21
  benchmark.py                  Fix typos in comments (#37694)                             2025-04-24
  benchmarks_entrypoint.py      No more Tuple, List, Dict (#38797)                         2025-06-17
  default.yml                   feat: add benchmarks_entrypoint.py (#34495)                2024-12-18
  grafana_dashboard.json        feat: add benchmarks_entrypoint.py (#34495)                2024-12-18
  grafana_datasource.yaml       feat: add benchmarks_entrypoint.py (#34495)                2024-12-18
  init_db.sql                   feat: add repository field to benchmarks table (#38582)    2025-06-04
  llama.py                      feat: add repository field to benchmarks table (#38582)    2025-06-04
  optimum_benchmark_wrapper.py  [Benchmark] Reuse optimum-benchmark (#30615)               2024-05-21
  README.md                     Fix some typos about benchmark scripts. (#37027)           2025-03-28
  requirements.txt              refactor: benchmarks (#33896)                              2024-10-11

Benchmarks

You might want to add new benchmarks.

You will need to define a Python function named run_benchmark in your Python file, and the file must be located in this benchmark/ directory.

The expected function signature is the following:

def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
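
For illustration, a minimal benchmark file might look like the sketch below. The workload shown is a placeholder; only the run_benchmark name and signature, and the file living in benchmark/, are required. The metric-recording calls described in the next section would go inside the function.

from logging import Logger


def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
    # Placeholder workload: a real benchmark would load a model, time its forward
    # and generate passes for num_tokens_to_generate tokens, and record the results.
    logger.info(f"benchmarking branch {branch} at commit {commit_id} ({commit_msg})")
    logger.info(f"would generate {num_tokens_to_generate} tokens here")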

Writing metrics to the database

MetricsRecorder is thread-safe with respect to Python threads (threading.Thread). This means you can run the device measurements in a background thread without blocking the main thread, which executes the model measurements.

See llama.py for an example of this in practice.

from logging import Logger

import psycopg2

from benchmarks_entrypoint import MetricsRecorder


def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
    metrics_recorder = MetricsRecorder(psycopg2.connect("dbname=metrics"), logger, branch, commit_id, commit_msg)
    # gpu_name, model_id and the measurement values below are produced by your own benchmark code.
    benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
    # To collect device measurements
    metrics_recorder.collect_device_measurements(
        benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
    )
    # To collect your model measurements
    metrics_recorder.collect_model_measurements(
        benchmark_id,
        {
            "model_load_time": model_load_time,
            "first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
            "second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
            "first_eager_generate_time_secs": first_eager_generate_time,
            "second_eager_generate_time_secs": second_eager_generate_time,
            "time_to_first_token_secs": time_to_first_token,
            "time_to_second_token_secs": time_to_second_token,
            "time_to_third_token_secs": time_to_third_token,
            "time_to_next_token_mean_secs": mean_time_to_next_token,
            "first_compile_generate_time_secs": first_compile_generate_time,
            "second_compile_generate_time_secs": second_compile_generate_time,
            "third_compile_generate_time_secs": third_compile_generate_time,
            "fourth_compile_generate_time_secs": fourth_compile_generate_time,
        },
    )
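
Because MetricsRecorder is thread-safe, the device readings can be collected in a background thread while the main thread runs the model. The following is only a rough sketch: the sampling helper, the psutil-based CPU/memory readings and the zeroed GPU values are assumptions made here for illustration, not the actual implementation, which lives in llama.py.

import threading
import time

import psutil


def sample_device_measurements(metrics_recorder, benchmark_id, stop_event, interval_secs=0.01):
    # Sample utilization until the main thread signals that the benchmark is done.
    while not stop_event.is_set():
        cpu_util = psutil.cpu_percent()
        mem_megabytes = psutil.virtual_memory().used / (1024 * 1024)
        # GPU readings are placeholders here; a real benchmark would query the GPU
        # (for example via gpustat or pynvml).
        gpu_util, gpu_mem_megabytes = 0, 0
        metrics_recorder.collect_device_measurements(
            benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
        )
        time.sleep(interval_secs)


stop_event = threading.Event()
thread = threading.Thread(target=sample_device_measurements, args=(metrics_recorder, benchmark_id, stop_event))
thread.start()
# ... run the model measurements on the main thread ...
stop_event.set()
thread.join()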