transformers/examples/diff-conversion
Arthur 96eb06286b
Diff converter v2 (#30868)
* current working example!

* commit regex and result file

* update

* nit

* push the conversion file

* oups

* roadmap and nits

* attempt diffs for 3 files

* persimmon

* nit

* add diff file that is the same as the modeling_llama.py

* fix rope nits

* updates

* updates with converted versions

* give some breathing space to the code

* delete

* update

* update

* push the actual result

* update regex patterns

* update regex patterns

* fix some issues

* fix some issues

* fix some issues

* updates

* updates

* updates

* updates

* updates

* revert changes done to llama

* updates

* update gemma

* updates

* oups

* current state

* current state

* update

* ouiiii

* nit

* clear diffs

* nit

* fixup

* update

* doc 🚀

* 🔥

* for now use gemma

* deal with comments

* style

* handle funtions

* deal with assigns

* todos

* process inheritage

* keep decorators?

* 🤗

* deal with duplicates

* fixup

* correctly remove duplicate code

* run ruff post script

* ruff deals pretty well with imports, let's leave it to him

* ah maybe not lol

* for now remove all imports from child.

* nit

* conversion of llama

* okay

* convert starcoder2

* synch with main

* update llama diff

* updates

* https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, bit needs later version of ruff

* updates

* okay actual state

* non zero exit

* update!

* revert unrelated

* remove other diff files

* updates

* cleanup

* update

* less diff!

* stash

* current updates

* updates

* No need for call

* finished fining deps

* update

* current changes

* current state

* current state

* new status

* nit

* finally

* fixes

* nits

* order is now expected

* use logger info instead of prints

* fixup

* up

* nit

* update

* nits

* update

* correct merge

* update

* update

* update

* add warning

* update caution message

* update

* better merging strategy

* copy class statements :wink

* fixups

* nits

* update

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* nits

* smaller header

* do cleanup some stuff

* even simpler header?

* fixup

* updates

* ruff

* update examples

* nit

* TODO

* state

* OUUUUUUF

* current state

* nits

* final state

* add a readme

* fixup

* remove diff llama

* fix

* nit

* dummy noy funny

* ruff format tests src utils --check

* everless diffs

* less diffs and fix test

* fixes

* naming nit?

* update converter and add supper example

* nits

* updated for function signatures

* update

* update

* add converted dummies

* autoformat

* single target assign fix

* fixup

* fix some imports

* fixes

* don't push them

* `# noqa: F841`

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-31 18:37:43 +02:00
..
convert_examples.sh Diff converter v2 (#30868) 2024-05-31 18:37:43 +02:00
diff_dummy.py Diff converter v2 (#30868) 2024-05-31 18:37:43 +02:00
diff_my_new_model.py Diff converter v2 (#30868) 2024-05-31 18:37:43 +02:00
diff_my_new_model2.py Diff converter v2 (#30868) 2024-05-31 18:37:43 +02:00
diff_new_model.py Diff converter v2 (#30868) 2024-05-31 18:37:43 +02:00
diff_super.py Diff converter v2 (#30868) 2024-05-31 18:37:43 +02:00
README.md Diff converter v2 (#30868) 2024-05-31 18:37:43 +02:00

Using the diff_converter linter

pip install libcst is a must!

sh examples/diff-conversion/convert_examples.sh to get the converted outputs

The diff converter is a new linter specific to transformers. It allows us to unpack inheritance in python to convert a modular diff file like diff_gemma.py into a single model single file.

Examples of possible usage are available in the examples/diff-conversion, or diff_gemma for a full model usage.

python utils/diff_model_converter.py --files_to_parse "/Users/arthurzucker/Work/transformers/examples/diff-conversion/diff_my_new_model2.py"

How it works

We use the libcst parser to produce an AST representation of the diff_xxx.py file. For any imports that are made from transformers.models.modeling_xxxx we parse the source code of that module, and build a class dependency mapping, which allows us to unpack the difference dependencies.

The code from the diff file and the class dependency mapping are "merged" to produce the single model single file. We use ruff to automatically remove the potential duplicate imports.

Why we use libcst instead of the native AST?

AST is super powerful, but it does not keep the docstring, comment or code formatting. Thus we decided to go with libcst