transformers/docs/source/en/main_classes/model.md
Zach Mueller e0dfd7bcaf
Speedup model init on CPU (by 10x+ for llama-3-8B as one example) (#31771)
* 1,100%!

* Clean

* Don't touch DS

* Experiment with dtype allocation

* skip test_load_save_without_tied_weights test

* A little faster

* Include proper upscaling?

* Fixup tests

* Potentially skip?

* Let's see if this fixes git history

* Maintain new dtype

* Fin

* Rm hook idea for now

* New approach, see what breaks

* stage

* Clean

* Stash

* Should be fin now, just need to mark failing models

* Clean up

* Simplify

* Deal with weird models

* Enc/Dec

* Skip w/ reason

* Adjust test

* Fix test

* one more test

* Keep experimenting

* Fix ref

* TO REMOVE: testing feedback CI

* Right push

* Update tests/utils/test_modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* disable

* Add new func

* Test nits from Amy

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Adjust comment

* Adjust comment on skip

* make private

* Fin

* Should be a not flag

* Clarify and rename test

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-16 09:32:01 -04:00

2.4 KiB

Models

The base classes [PreTrainedModel], [TFPreTrainedModel], and [FlaxPreTrainedModel] implement the common methods for loading/saving a model either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).

[PreTrainedModel] and [TFPreTrainedModel] also implement a few methods which are common among all the models to:

  • resize the input token embeddings when new tokens are added to the vocabulary
  • prune the attention heads of the model.

The other methods that are common to each model are defined in [~modeling_utils.ModuleUtilsMixin] (for the PyTorch models) and [~modeling_tf_utils.TFModuleUtilsMixin] (for the TensorFlow models) or for text generation, [~generation.GenerationMixin] (for the PyTorch models), [~generation.TFGenerationMixin] (for the TensorFlow models) and [~generation.FlaxGenerationMixin] (for the Flax/JAX models).

PreTrainedModel

autodoc PreTrainedModel - push_to_hub - all

Custom models should also include a _supports_assign_param_buffer, which determines if superfast init can apply on the particular model. Signs that your model needs this are if test_save_and_load_from_pretrained fails. If so, set this to False.

ModuleUtilsMixin

autodoc modeling_utils.ModuleUtilsMixin

TFPreTrainedModel

autodoc TFPreTrainedModel - push_to_hub - all

TFModelUtilsMixin

autodoc modeling_tf_utils.TFModelUtilsMixin

FlaxPreTrainedModel

autodoc FlaxPreTrainedModel - push_to_hub - all

Pushing to the Hub

autodoc utils.PushToHubMixin

Sharded checkpoints

autodoc modeling_utils.load_sharded_checkpoint