
# ExecuTorch

ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch ecosystem and supports the deployment of PyTorch models with a focus on portability, productivity, and performance.

ExecuTorch introduces well-defined entry points to perform model-, device-, and/or use-case-specific optimizations such as backend delegation, user-defined compiler transformations, memory planning, and more. The first step in preparing a PyTorch model for execution on an edge device with ExecuTorch is to export the model. This is done through the PyTorch `torch.export` API.
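As a quick illustration of that first step, the snippet below exports a small standalone `torch.nn.Module` with `torch.export.export`. The module and example inputs here are placeholders; the resulting `ExportedProgram` is the artifact that ExecuTorch can then lower and optimize for an edge target.

```python
import torch
from torch.export import export


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.linear(x))


# Example inputs are required so the export step can trace the model.
example_args = (torch.randn(1, 4),)

# `export` produces an ExportedProgram: a standardized, ahead-of-time
# representation of the model that downstream ExecuTorch tooling consumes.
exported_program = export(TinyModel(), example_args)
print(exported_program)
```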

## ExecuTorch Integration

An integration point is being developed to ensure that 🤗 Transformers models can be exported using `torch.export`. The goal of this integration is not only to enable export but also to ensure that the exported artifact can be further lowered and optimized to run efficiently in ExecuTorch, particularly for mobile and edge use cases.
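As a rough sketch of the intended workflow, a causal language model constructed with a static cache configuration can be exported with `convert_and_export_with_cache`. The checkpoint name, cache sizes, and exact configuration keys below are illustrative assumptions and may differ across versions; see the API reference that follows for the authoritative signatures.

```python
import torch
from transformers import AutoModelForCausalLM, GenerationConfig
from transformers.integrations.executorch import convert_and_export_with_cache

# Illustrative checkpoint: any causal LM that supports StaticCache should work.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    torch_dtype=torch.float32,
    # Configure the static cache at model construct time (illustrative values).
    generation_config=GenerationConfig(
        use_cache=True,
        cache_implementation="static",
        cache_config={"batch_size": 1, "max_cache_len": 20},
    ),
)

# Wrap the model with its StaticCache and export it into an ExportedProgram
# that can then be lowered to ExecuTorch.
exported_program = convert_and_export_with_cache(model)
```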

[[autodoc]] integrations.executorch.TorchExportableModuleWithStaticCache
    - forward

[[autodoc]] integrations.executorch.convert_and_export_with_cache