# Modular transformers

`transformers` is an opinionated framework; our philosophy is defined in the following [conceptual guide](./philosophy).

The core of that philosophy is exemplified by the [single model, single file](https://huggingface.co/blog/transformers-design-philosophy)
aspect of the library. The downside of this approach is that it limits the inheritance and importability of components
from one file to others in the toolkit.

As a result, model components tend to be repeated across many files. There are as many attention layers defined
in `transformers` as there are models, and a significant number of those are identical to each other.
The unfortunate consequence is that independent implementations tend to diverge as fixes and changes get applied
to specific parts of the code.

To address this issue, we introduced the concept of "copies" across the library. By adding a comment indicating
that code is a copy of another, we can enforce through CI and local commands that copies do not diverge. However,
while the complexity is low, this is often quite tedious to do.
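
For illustration, here is an abridged sketch of the convention: the marker sits directly above the class, and the
tooling verifies that the class body stays identical to the referenced BERT source once `Bert` is replaced with
`Roberta` (only the `__init__` is shown here).

```python
from torch import nn


# Copied from transformers.models.bert.modeling_bert.BertSelfOutput with Bert->Roberta
class RobertaSelfOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
```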

Finally, this adds a significant overhead to model contributions, one we would like to remove.
This approach often requires model contributions to add modeling code (~1k lines), processor (~500 lines), tests, docs,
etc. Model contribution PRs rarely add fewer than 3-5k lines of code, with much of this code being boilerplate.

This raises the bar for contributions, and with Modular Transformers, we're aiming to lower the bar to a much more
acceptable point.
## What is it?

Modular Transformers introduces the concept of a "modular" file to a model folder. This modular file accepts code
that isn't typically accepted in modeling/processing files, as it allows importing from neighbouring models as well
as inheriting from other classes.

This modular file defines models, processors, and the configuration class that would otherwise be defined in their
respective modules.

Finally, this feature introduces a new `linter` which will "unravel" the modular file into the "single model, single
file" directory structure. These files get auto-generated every time the script is run, reducing the required
contribution to the modular file, and therefore only to the changes between the contributed model and others.
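
As a rough sketch of that workflow, the linter is run on a given modular file; note that the exact script path and flag
name below are assumptions and may differ in your version of the repository.

```bash
# Regenerate the single-file modeling/configuration files from the modular file.
# Script name and flag are assumptions; adjust them to the converter utility in your checkout.
python utils/modular_model_converter.py --files_to_parse src/transformers/models/roberta/modular_roberta.py
```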

Model users will end up importing and using the single-file interface, so no change is expected here. Doing this, we
hope to combine the best of both worlds: enabling simple contributions while sticking to our philosophy.

This is therefore a replacement for the `# Copied from` markers, and previously contributed models can be expected to
be moved to the new Modular Transformers format in the coming months.

### Details

The "linter", which unravels the inheritance and creates all single-model files from the modular file, will flatten the
inheritance while trying to be invisible to Python users. At this time, the linter flattens a **single** level of
inheritance.

For example:
- If a configuration class inherits from another and adds/deletes an argument, the generated file will either directly
  reference it (in case of addition) or completely remove it (in case of deletion); see the sketch after this list.
- If a class inherits from another, for example `class GemmaModel(LlamaModel):`, dependencies are automatically
  inferred. All submodules will be automatically inferred from the superclass.
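
To make the configuration case concrete, here is a minimal sketch using a hypothetical `MyNewModelConfig` that inherits
from Gemma's configuration and adds one argument (the model name and argument are invented for illustration; they are
not part of the library).

```python
# Hypothetical modular file content: the configuration inherits from Gemma's and adds a
# single argument. When the linter unravels this file, the generated standalone
# configuration would spell out the full inherited argument list with `new_arg` added.
from ..gemma.configuration_gemma import GemmaConfig


class MyNewModelConfig(GemmaConfig):
    model_type = "my_new_model"

    def __init__(self, new_arg=128, **super_kwargs):
        self.new_arg = new_arg
        super().__init__(**super_kwargs)
```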

You should be able to write everything (the tokenizer, the image processor, the model, the config) in this `modular`
file, and the corresponding files will be created for you.

### Enforcement

[TODO] We are introducing a new test that makes sure the generated content matches what is present in `modular_xxxx.py`.

### Examples

Here is a quick example with BERT and RoBERTa. The two models are intimately related: their modeling implementation
differs solely by a change in the embedding layer.

Instead of redefining the model entirely, here is what the `modular_roberta.py` file looks like for the modeling and
configuration classes (for the sake of the example, the tokenizer is ignored at this time, as it is very different).
```python
from torch import nn

from ..bert.configuration_bert import BertConfig
from ..bert.modeling_bert import (
    BertModel,
    BertEmbeddings,
    BertForMaskedLM,
)


# The RoBERTa config is identical to BERT's config
class RobertaConfig(BertConfig):
    model_type = 'roberta'


# We redefine the embeddings here to highlight the padding ID difference, and we redefine the position embeddings
class RobertaEmbeddings(BertEmbeddings):
    def __init__(self, config):
        super().__init__(config)

        self.padding_idx = config.pad_token_id
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
        )


# The RoBERTa model is identical to the BERT model, except for the embedding layer.
# We redefine the embeddings above, so here there is no need to do additional work
class RobertaModel(BertModel):
    def __init__(self, config):
        super().__init__(config)
        self.embeddings = RobertaEmbeddings(config)


# The heads now only need to redefine the model inside to the correct `RobertaModel`
class RobertaForMaskedLM(BertForMaskedLM):
    def __init__(self, config):
        super().__init__(config)
        self.model = RobertaModel(config)
```

Note that if you do not use the dependency that you defined, you will have the following error:

```bash
ValueError: You defined `RobertaEmbeddings` in the modular_roberta.py, it should be used
when you define `BertModel`, as it is one of it's direct dependencies. Make sure
you use it in the `__init__` function.
```

Additionally, you may find a list of examples here:

## What it is not

It is not a replacement for the modeling code (yet?), and if your model is not based on anything else that ever existed, then you can add a `modeling` file as usual.