
# Models
The base classes [`PreTrainedModel`], [`TFPreTrainedModel`], and
[`FlaxPreTrainedModel`] implement the common methods for loading/saving a model either from a local
file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS
S3 repository).
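For instance, loading a pretrained checkpoint and saving it back to disk looks like this (a minimal sketch, assuming the `bert-base-uncased` checkpoint and an arbitrary local directory name):

```python
from transformers import AutoModel

# Download the pretrained configuration and weights
model = AutoModel.from_pretrained("bert-base-uncased")

# Save the model (configuration + weights) to a local directory...
model.save_pretrained("./my-bert")

# ...and reload it from that directory later
model = AutoModel.from_pretrained("./my-bert")
```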
[`PreTrainedModel`] and [`TFPreTrainedModel`] also implement a few methods which
are common among all the models to (both are illustrated in the sketch after this list):

- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model
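A minimal sketch of both operations on a PyTorch model (the added token and the pruned head indices are arbitrary illustrations):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Add a new token, then grow the input embedding matrix to match the new vocabulary size
tokenizer.add_tokens(["<new_token>"])
model.resize_token_embeddings(len(tokenizer))

# Prune attention heads 0 and 2 in layer 0, and head 1 in layer 1 (arbitrary choice)
model.prune_heads({0: [0, 2], 1: [1]})
```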
The other methods that are common to each model are defined in [`~modeling_utils.ModuleUtilsMixin`]
(for the PyTorch models) and [`~modeling_tf_utils.TFModelUtilsMixin`] (for the TensorFlow models) or,
for text generation, [`~generation.GenerationMixin`] (for the PyTorch models),
[`~generation.TFGenerationMixin`] (for the TensorFlow models) and
[`~generation.FlaxGenerationMixin`] (for the Flax/JAX models).
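For example, text generation through [`~generation.GenerationMixin`] on a PyTorch model (a minimal sketch, assuming the `gpt2` checkpoint; prompt and length are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# generate() is provided by GenerationMixin, which causal LM models inherit
inputs = tokenizer("The common methods for loading a model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```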
## PreTrainedModel

[[autodoc]] PreTrainedModel
    - push_to_hub
    - all
Custom models should also include a `_supports_assign_param_buffer` attribute, which determines whether superfast
init can be applied to the particular model. A sign that your model needs this is if
`test_save_and_load_from_pretrained` fails; if so, set this attribute to `False`.
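A minimal sketch of opting out on a custom model (the class and config names here are hypothetical illustrations):

```python
from transformers import PretrainedConfig, PreTrainedModel


class MyCustomConfig(PretrainedConfig):  # hypothetical config class
    model_type = "my-custom-model"


class MyCustomModel(PreTrainedModel):  # hypothetical custom model
    config_class = MyCustomConfig

    # Opt out of superfast init; set this if test_save_and_load_from_pretrained fails
    _supports_assign_param_buffer = False
```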
## ModuleUtilsMixin

[[autodoc]] modeling_utils.ModuleUtilsMixin
## TFPreTrainedModel

[[autodoc]] TFPreTrainedModel
    - push_to_hub
    - all
## TFModelUtilsMixin

[[autodoc]] modeling_tf_utils.TFModelUtilsMixin
## FlaxPreTrainedModel

[[autodoc]] FlaxPreTrainedModel
    - push_to_hub
    - all
## Pushing to the Hub

[[autodoc]] utils.PushToHubMixin
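For example, uploading a model with `push_to_hub` (a minimal sketch; the repository id is a hypothetical placeholder, and you must be authenticated, e.g. via `huggingface-cli login`):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# Pushes the model weights and configuration to a repo under your namespace
model.push_to_hub("my-username/my-bert-copy")  # hypothetical repo id
```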
## Sharded checkpoints

[[autodoc]] modeling_utils.load_sharded_checkpoint
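For example, saving a sharded checkpoint and loading it back with `load_sharded_checkpoint` (a minimal sketch; the 200MB shard size and directory name are arbitrary):

```python
from transformers import AutoModel
from transformers.modeling_utils import load_sharded_checkpoint

model = AutoModel.from_pretrained("bert-base-uncased")

# max_shard_size splits the weights into several shard files plus an index
model.save_pretrained("./sharded-bert", max_shard_size="200MB")

# Load the shards back into an already-instantiated model
load_sharded_checkpoint(model, "./sharded-bert")
```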