* add idefics
* conflicts after merging main
* enable tests but need to fix some
* fix tests
* no print
* fix/skip some slow tests
* continue not skip
* rebasing broke something, this is the fix
* Initial commit
* Just a copy of modeling_idefics.py that will be ported to TF
* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)
* Add TF imports
* Add autotranslated files
* Add TF classes to model_tf_auto.py
* Add the TF classes in model_doc
* include auto-translated code
* Adopted from auto-translated version
* Add a forgotten super().build
* Add test code for TF version.
* Fix indentation and load pytorch weights for now
* Some fixes. Many tests are still failing but some are passing now.
- I have added TODOs for some of the hacks I made to unblock myself
and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily
* Add ALL_LAYERNORM_LAYERS to match pytorch
* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.
* Fix freeze_relevant_params()
* Some more fixes
* Fix test_attention_outputs
* Add tf stuff to processing_idefics.py
processing_idefics.py supports both pytorch and tf now.
test_processor_idefics.py for pytorch is passing, so I didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.
* Pass return_tensors to image processing code and fix test
* Pass return_tensors to the image processor __init__
* Fix several test cases
- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
* Some more fixes forgotten in last commit
* Fix processing code and vision_tf.py
* Fix perceiver bug
* Import from
* Auto-add build() methods + style pass
* Fix build() errors due to `None` being passed as shape to some layers
* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
* Fix pytorch weights load for tf2
There were a lot of `name=` missing in weight initialization code.
* Attempt to fix CI
* Add back accidentally removed line
* Remove torch-specific stuff from the TF test file
* make fix-copies, make style, remove autotranslated files
* Fixes to imports/docstrings
* Let's try the from future import in desperation
* Fix the core random_attention_mask fn to match the torch/flax behaviour
* Clean random_attention_mask up correctly
* Remove torch-only test
* Fix loss shape, couple of nits
* make style
* Don't test for OOB embeddings because IDEFICS uses those deliberately
* Fix loss computation to handle masking
* Fix test failures when flattening
* Fix some test failures
- Add cross attention gate which was missing and wasn't being passed around
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs
* Add a proper stateless scaled_dot_product_attention
* make style
* Adding missing attribute from the PyTorch version
* Small cleanups to decoupledlinearlayer in case that helps
* Pass epsilon to LayerNormalization
* Attempt to fix pytorch weight cross-loading for TFIdeficsEmbedding
* Fix a bug in TFIdeficsGatedCrossAttentionLayer
* Patching up build() methods
* Constant self.inv_freq
* Constant self.inv_freq
* First working version
The TF implementation works now. There was a bug in TFIdeficsDecoupledLinear
where the weights were initialized with shape (in_features, out_features)
when it should be (out_features, in_features).
I have tested this so far with tiny-random and idefics-9b-instruct
and it gives correct output.
I also dumped the final outputs for both pytorch and TF
and they are identical.
* Fix some test failures
* remove print statement
* Fix return_tensors
* Fix CI test failure check_code_quality
* Attempt to fix CI failures by running `make fixup`
The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test; they make that file unreadable and should probably be moved to a separate file.
* Attempt to fix tests_pr_documentation_tests
* Fix a test failure in test_image_processing_idefics.py
* Fix test test_pt_tf_model_equivalence
* Fix a few failures
* Tiny fix
* Some minor fixes
* Remove a duplicate test
* Override a few test failures for IDEFICS
- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing
* Fix processing_idefics.py after rebase
* Guard import keras with is_tf_available
* fix check code quality
* fix check code quality
* Minor fixes
* Skip test_save_load temporarily
This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.
* Run `ruff format tests src utils`
* Fix last failing test, `test_compile_tf_model`
* Add fixes for vision_tf.py
I forgot to add this file in last commit.
* Minor fixes
* Replace "<<<" with "<<" for doc tests
IDEFICS-9B is too big for doctest runner, so don't run it there
* Make code more readable
* Fix bug after code review
I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.
* Fix after code review
Use original code tokenizer.convert_tokens_to_ids
* Keep PyTorch as the default return_tensors
* Fixes to modeling_tf after code review
* Fixes from code review
- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver
* Run ruff
* Undo a change
* Refactor processing code after Matt's suggestion
* Remove TODOs that aren't needed anymore
* For pytorch, use original pytorch processing code from main
Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This change undoes the pytorch processing
modifications I made and uses original code from main.
* Update tests/models/idefics/test_modeling_idefics.py
* Update tests/models/idefics/test_modeling_tf_idefics.py
* Add missing imports for is_pt_tf_cross_test
* [DO NOT MERGE]: This is a commit for debugging and will be reverted
The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.
* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
* Don't skip test_save_load
IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests
* Debugging commit, will be reverted
* Revert "Debugging commit, will be reverted"
This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
* Override `test_save_load` and push model to save
Maybe this will help me repro this weird bug
* pass my repo_id
* add endpoint
* Pass a temp (write) token just for this CI
* Undo last few commits, still pushing to hub for model debugging
The issue seems to be with save_pretrained(): when I looked at the model saved
from the CI test failure, it was basically empty and had no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging.
* Add logging to modeling tf utils, will be reverted just for debugging
* Debugging, will revert
* Revert "Debugging, will revert"
This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
* Revert "Add logging to modeling tf utils, will be reverted just for debugging"
This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
* Remove `test_save_load`
The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.
* Run make fix-copies
* Run ruff format tests src utils
* Debugging commit, will be reverted
* Run ruff, also trigger CI run
* Run ruff again
* Undo debugging commit
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* redefaulted padding=longest again
* fixup/doc
* add sdpa
* wip
* cleaning
* add ref
* yet more cleaning
* and more :)
* wip llama
* working llama
* add output_attentions=True support
* bigcode sdpa support
* fixes
* gpt-bigcode support, require torch>=2.1.1
* add falcon support
* fix conflicts falcon
* style
* fix attention_mask definition
* remove output_attentions from attnmaskconverter
* support whisper without removing any Copied from statement
* fix mbart default to eager renaming
* fix typo in falcon
* fix is_causal in SDPA
* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
* add warnings when falling back on the manual implementation
* make doc more precise
* wip replace _flash_attn_enabled by config.attn_implementation
* fix typo
* add tests
* style
* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
* obey config.attn_implementation if a config is passed in from_pretrained
* fix is_torch_sdpa_available when torch is not installed
* remove dead code
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove duplicate pretraining_tp code
* add dropout in llama
* make comment on attn_mask more precise
* add fmt: off for _unmask_unattended docstring
* make num_masks comment more precise
* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
* cleanup modeling_utils
* backward compatibility
* fix style as requested
* style
* improve documentation
* test pass
* style
* add _unmask_unattended tests
* skip meaningless tests for idefics
* hard_check SDPA requirements when specifically requested
* standardize the use of XXX_ATTENTION_CLASSES
* fix SDPA bug with mem-efficient backend on CUDA when using fp32
* fix test
* rely on SDPA is_causal parameter to handle the causal mask in some cases
* fix FALCON_ATTENTION_CLASSES
* remove _flash_attn_2_enabled occurrences
* fix test
* add OPT to the list of supported flash models
* improve test
* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
* remove remaining _flash_attn_2_enabled occurrence
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove use_attn_implementation
* fix docstring & slight bug
* make attn_implementation internal (_attn_implementation)
* typos
* fix tests
* deprecate use_flash_attention_2=True
* fix test
* add back llama that was removed by mistake
* fix tests
* remove _flash_attn_2_enabled occurrences (again)
* add check & test that passed attn_implementation is valid
* fix falcon torchscript export
* fix device of mask in tests
* add tip about torch.jit.trace and move bt doc below sdpa
* fix parameterized.expand order
* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
* update sdpaattention class with the new cache
* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bark/modeling_bark.py
* address review comments
* WIP torch.jit.trace fix. left: test both eager & sdpa
* add test for torch.jit.trace for both eager/sdpa
* fix falcon with torch==2.0 that needs to use sdpa
* fix doc
* hopefully last fix
* fix key_value_length that has no default now in mask converter
* is it flaky?
* fix speculative decoding bug
* tests do pass
* fix following #27907
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix image_attention gate in idefics modeling
* update comment
* cleaner gating
* fix gate condition
* create attention gate once
* update comment
* update doc of cross-attention forward
* improve comment
* bring back no_images
* pass cross_attention_gate similarly to no_images gate
* add information on gate shape
* fix no_images placement
* make tests for gate
* remove no_images logic
* update test based on comments
* raise value error if cross_attention_gate is None
* send cross_attention_gate to device
* Revert "send cross_attention_gate to device"
This reverts commit 054f842284.
* send cross_attention_gate to device
* fix device in test + nit
* fill hidden_states with zeros instead of multiplying with the gate
* style
* Update src/transformers/models/idefics/modeling_idefics.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/idefics/modeling_idefics.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* stronger GC tests
* better tests and skip failing tests
* break down into 3 sub-tests
* break down into 3 sub-tests
* refactor a bit
* more refactor
* fix
* last nit
* credits contrib and suggestions
* credits contrib and suggestions
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix wav2vec2
* nit
* stash
* one more file to update
* fix byt5
* vocab size is 256, don't change that!
* use other revision
* test persimmon in smaller size
* style
* tests
* nits
* update add tokens from pretrained
* test tokenization
* nits
* potential fnet fix?
* more nits
* nits
* correct test
* assert close
* update
* ouch
* fix it
* some more nits
* FINALLY
* use `adept` checkpoints
* more adept checkpoints
* that was involved!
* add pos embed interpolation for vision encoder
* style
* update config with interpolate_pos_encoding arg
* fix imports formatting
* remove Copied from on vision embeddings
* add test for image embeddings interpolation
* add credit for interpolation code
* Update src/transformers/models/idefics/configuration_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics/vision.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix condition to check that the number of image patches matches the shape of pos embeddings
* use kwargs in the forward methods for interpolation
* fix tests
* have interpolate_pos_encoding default to False instead of None
* Update tests/models/idefics/test_modeling_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics/test_modeling_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics/test_modeling_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics/configuration_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove for loop meant to print k,v
* add interpolate_pos_encoding arg in prepare_inputs_for_generation
* add test for interpolated generation
* fix edge case num_patches == num_positions and height == width
* add test for edge case
* fix pos_embed in interpolate
* allow interpolation in bf16 with upcasting
* Update src/transformers/models/idefics/vision.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/idefics/vision.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add multiple images tests for interpolation and generation
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>