mirror of https://github.com/huggingface/transformers.git synced 2025-07-03 12:50:06 +06:00

* toctree

* not-doctested.txt

* collapse sections

* feedback

* update

* rewrite get started sections

* fixes

* fix

* loading models

* fix

* customize models

* share

* fix link

* contribute part 1

* contribute pt 2

* fix toctree

* tokenization pt 1

* Add new model (#32615)

* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* "to be not" -> "not to be" (#32636)

* "to be not" -> "not to be"

* Update sam.md

* Update trainer.py

* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* fix hfoption tag

* tokenization pt. 2

* image processor

* fix toctree

* backbones

* feature extractor

* fix file name

* processor

* update not-doctested

* update

* make style

* fix toctree

* revision

* make fixup

* fix toctree

* fix

* make style

* fix hfoption tag

* pipeline

* pipeline gradio

* pipeline web server

* add pipeline

* fix toctree

* not-doctested

* prompting

* llm optims

* fix toctree

* fixes

* cache

* text generation

* fix

* chat pipeline

* chat stuff

* xla

* torch.compile

* cpu inference

* toctree

* gpu inference

* agents and tools

* gguf/tiktoken

* finetune

* toctree

* trainer

* trainer pt 2

* optims

* optimizers

* accelerate

* parallelism

* fsdp

* update

* distributed cpu

* hardware training

* gpu training

* gpu training 2

* peft

* distrib debug

* deepspeed 1

* deepspeed 2

* chat toctree

* quant pt 1

* quant pt 2

* fix toctree

* fix

* fix

* quant pt 3

* quant pt 4

* serialization

* torchscript

* scripts

* tpu

* review

* model addition timeline

* modular

* more reviews

* reviews

* fix toctree

* reviews reviews

* continue reviews

* more reviews

* modular transformers

* more review

* zamba2

* fix

* all frameworks

* pytorch

* supported model frameworks

* flashattention

* rm check_table

* not-doctested.txt

* rm check_support_list.py

* feedback

* updates/feedback

* review

* feedback

* fix

* update

* feedback

* updates

* update

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

2025-03-03 10:33:46 -08:00

7.0 KiB

Raw Blame History

Hyperparameter search

Hyperparameter search discovers an optimal set of hyperparameters that produces the best model performance. [Trainer] supports several hyperparameter search backends - Optuna, SigOpt, Weights & Biases, Ray Tune - through [~Trainer.hyperparameter_search] to optimize an objective or even multiple objectives.

This guide will go over how to set up a hyperparameter search for each of the backends.

pip install optuna/sigopt/wandb/ray[tune]

To use [~Trainer.hyperparameter_search], you need to create a model_init function. This function includes basic model information (arguments and configuration) because it needs to be reinitialized for each search trial in the run.

Warning

The model_init function is incompatible with the optimizers parameter. Subclass [Trainer] and override the [~Trainer.create_optimizer_and_scheduler] method to create a custom optimizer and scheduler.

An example model_init function is shown below.

def model_init(trial):
    return AutoModelForSequenceClassification.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=True if model_args.use_auth_token else None,
    )

Pass model_init to [Trainer] along with everything else you need for training. Then you can call [~Trainer.hyperparameter_search] to start the search.

[~Trainer.hyperparameter_search] accepts a direction parameter to specify whether to minimize, maximize, or minimize and maximize multiple objectives. You'll also need to set the backend you're using, an object containing the hyperparameters to optimize for, the number of trials to run, and a compute_objective to return the objective values.

Tip

If compute_objective isn't defined, the default compute_objective is called which is the sum of an evaluation metric like F1.

from transformers import Trainer

trainer = Trainer(
    model=None,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
    processing_class=tokenizer,
    model_init=model_init,
    data_collator=data_collator,
)
trainer.hyperparameter_search(...)

The following examples demonstrate how to perform a hyperparameter search for the learning rate and training batch size using the different backends.

Optuna optimizes categories, integers, and floats.

def optuna_hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
    }

best_trials = trainer.hyperparameter_search(
    direction=["minimize", "maximize"],
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=20,
    compute_objective=compute_objective,
)

Ray Tune optimizes floats, integers, and categorical parameters. It also offers multiple sampling distributions for each parameter such as uniform and log-uniform.

def ray_hp_space(trial):
    return {
        "learning_rate": tune.loguniform(1e-6, 1e-4),
        "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
    }

best_trials = trainer.hyperparameter_search( 
    direction=["minimize", "maximize"],
    backend="ray",
    hp_space=ray_hp_space,
    n_trials=20,
    compute_objective=compute_objective,
)

SigOpt optimizes double, integer, and categorical parameters.

def sigopt_hp_space(trial):
    return [
        {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
        {
            "categorical_values": ["16", "32", "64", "128"],
            "name": "per_device_train_batch_size",
            "type": "categorical",
        },
    ]

best_trials = trainer.hyperparameter_search( 
    direction=["minimize", "maximize"],
    backend="sigopt",
    hp_space=sigopt_hp_space,
    n_trials=20,
    compute_objective=compute_objective,
)

Weights & Biases also optimizes integers, floats, and categorical parameters. It also includes support for different search strategies and distribution options.

def wandb_hp_space(trial):
    return {
        "method": "random",
        "metric": {"name": "objective", "goal": "minimize"},
        "parameters": {
            "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
            "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
        },
    }

best_trials = trainer.hyperparameter_search( 
    direction=["minimize", "maximize"],
    backend="wandb",
    hp_space=wandb_hp_space,
    n_trials=20,
    compute_objective=compute_objective,
)

Distributed Data Parallel

[Trainer] only supports hyperparameter search for distributed data parallel (DDP) on the Optuna and SigOpt backends. Only the rank-zero process is used to generate the search trial, and the resulting parameters are passed along to the other ranks.

7.0 KiB Raw Blame History

Hyperparameter search

Distributed Data Parallel

7.0 KiB

Raw Blame History