transformers/tests/utils
Arthur 19d58d31f1
Add MLLama (#33703)
* current changes

* nit

* Add cross_attention_mask to processor

* multi-image fixed

* Add cross_attention_mask to processor

* cross attn works in all cases

* WIP refactoring function for image processor

* WIP refactoring image processor functions

* Refactor preprocess to use global loops instead of list nested list comps

* Docstrings

* Add channels unification

* fix dtype issues

* Update docstrings and format

* Consistent max_image_tiles

* current script

* updates

* Add convert to rgb

* Add image processor tests

* updates!

* update

* god damn it I am dumb sometimes

* Precompute aspect ratios

* now this works, full match

* fix 😉

* nits

* style

* fix model and conversion

* nit

* nit

* kinda works

* hack for sdpa non-contiguous bias

* nits here and there

* latest changes

* merge?

* run forward

* Add aspect_ratio_mask

* vision attention mask

* update script and config variable names

* nit

* nits

* be able to load

* style

* nits

* there

* nits

* make forward run

* small update

* enable generation multi-turn

* nit

* nit

* Clean up a bit for errors and typos

* A bit more constant fixes

* 90B keys and shapes match

* Fix for 11B model

* Fixup, remove debug part

* Docs

* Make max_aspect_ratio_id to be minimal

* Update image processing code to match new implementation

* Adjust conversion for final checkpoint state

* Change dim in repeat_interleave (according to meta code)

* tmp fix for num_tiles

* Fix for conversion (gate<->up, q/k_proj rope permute)

* nits

* codestyle

* Vision encoder fixes

* pass cross attn mask further

* Refactor aspect ratio mask

* Disable text-only generation

* Fix cross attention layers order, remove q/k norm rotation for cross attention layers

* Refactor gated position embeddings

* fix bugs but needs test with new weights

* rope scaling should be llama3

* Fix rope scaling name

* Remove debug for linear layer

* fix copies

* Make mask prepare private func

* Remove linear patch embed

* Make precomputed embeddings as nn.Embedding module

* MllamaPrecomputedAspectRatioEmbedding with config init

* Remove unused self.output_dim

* nit, intermediate layers

* Rename ln and pos_embed

* vision_chunk_size -> image_size

* return_intermediate -> intermediate_layers_indices

* vision_input_dim -> hidden_size

* Fix copied from statements

* fix most tests

* Fix more copied from

* layer_id->layer_idx

* Comment

* Fix tests for processor

* Copied from for _prepare_4d_causal_attention_mask_with_cache_position

* Style fix

* Add MllamaForCausalLM

* WIP fixing tests

* Remove duplicated layers

* Remove dummy file

* Fix style

* Fix consistency

* Fix some TODOs

* fix language_model instantiation, add docstring

* Move docstring, remove todos for precomputed embeds (we cannot init them properly)

* Add initial docstrings

* Fix

* fix some tests

* lets skip these

* nits, remove print, style

* Add one more copied from

* Improve test message

* Make validate func private

* Fix dummy objects

* Refactor `data_format` a bit + add comment

* typos/nits

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* fix dummy objects and imports

* Add chat template config json

* remove num_kv_heads from vision attention

* fix

* move some commits and add more tests

* fix test

* Remove `update_key_name` from modeling utils

* remove num-kv-heads again

* some preliminary docs

* Update chat template + tests

* nit, conversion script max_num_tiles from params

* Fix warning for text-only generation

* Update conversion script for instruct models

* Update chat template in conversion + test

* add tests for CausalLM model

* model_max_length, avoid null chat_template

* Refactor conversion script

* Fix forward

* Fix integration tests

* Refactor vision config + docs

* Fix default

* Refactor text config

* Doc fixes

* Remove unused args, fix docs example

* Squashed commit of the following:

commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date:   Wed Sep 18 13:39:15 2024 +0000

    Move model + add output hidden states and output attentions

* Fix num_channels

* Add mllama text and mllama vision models

* Fixing repo consistency

* Style fix

* Fixing repo consistency

* Fixing unused config params

* Fix failed tests after refactoring

* hidden_activation -> hidden_act  for text mlp

* Remove from_pretrained from sub-configs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Reuse lambda in conversion script

* Remove run.py

* Update docs/source/en/model_doc/mllama.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/processing_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused LlamaTokenizerFast

* Fix logging

* Refactor gating

* Remove cycle for collecting intermediate states

* Refactor text-only check, add integration test for text-only

* Revert from pretrained to configs

* Fix example

* Add auto `bos_token` adding in processor

* Fix tips

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Enable supports_gradient_checkpointing model flag

* add eager/sdpa options

* don't skip attn tests and bring back GC skips (did i really remove those?)

* Fix signature, but get error with None gradient

* Fix output attention tests

* Disable GC back

* Change no split modules

* Fix dropout

* Style

* Add Mllama to sdpa list

* Add post init for vision model

* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model

* if skipped, say it, don't pass

* Clean vision tester config

* Doc for args

* Update tests/models/mllama/test_modeling_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add cross_attention_mask to test

* typehint

* Remove todo

* Enable gradient checkpointing

* Docstring

* Style

* Fixing and skipping some tests for new cache

* Mark flaky test

* Skip `test_sdpa_can_compile_dynamic` test

* Fixing some offload tests

* Add direct GenerationMixin inheritance

* Remove unused code

* Add initializer_range to vision config

* update the test to make sure we show if split

* fix gc?

* Fix repo consistency

* Undo modeling utils debug changes

* Fix link

* mllama -> Mllama

* [mllama] -> [Mllama]

* Enable compile test for CausalLM model (text-only)

* Fix TextModel prefix

* Update doc

* Docs for forward, type hints, and vision model prefix

* make sure to reset

* fix init

* small script refactor and styling

* nit

* updates!

* some nits

* Interpolate embeddings for 560 size and update integration tests

* nit

* does not support static cache!

* update

* fix

* nit2

* this?

* Fix conversion

* Style

* 4x memory improvement with image cache AFAIK

* Token decorator for tests

* Skip failing tests

* update processor errors

* fix split issues

* style

* weird

* style

* fix failing tests

* update

* nit fixing the whisper tests

* fix path

* update

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
import_structures Import structure & first three model refactors (#31329) 2024-09-10 11:10:53 +02:00
__init__.py [Test refactor 1/5] Per-folder tests reorganization (#15725) 2022-02-23 15:46:28 -05:00
test_activations_tf.py TF: Add sigmoid activation function (#16819) 2022-04-19 16:13:08 +01:00
test_activations.py Add the GeLU activation from pytorch with the tanh approximation (#21345) 2023-02-02 09:33:04 -05:00
test_add_new_model_like.py fix: Fixed failing tests in tests/utils/test_add_new_model_like.py (#32678) 2024-08-14 12:06:17 +01:00
test_audio_utils.py Remove trust_remote_code when loading Libri Dummy (#31748) 2024-07-23 14:54:38 +08:00
test_backbone_utils.py 🚨 out_indices always a list (#30941) 2024-05-22 15:23:04 +01:00
test_cache_utils.py Add MLLama (#33703) 2024-09-25 19:56:25 +02:00
test_chat_template_utils.py Make tool JSON schemas consistent (#31756) 2024-07-02 20:00:42 +01:00
test_cli.py Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 (#32659) 2024-08-23 11:12:53 +01:00
test_configuration_utils.py support loading model without config.json file (#32356) 2024-09-06 13:49:47 +02:00
test_convert_slow_tokenizer.py Revert error back into warning for byte fallback conversion. (#22607) 2023-04-06 14:00:29 +02:00
test_deprecation.py Decorators for deprecation and named arguments validation (#30799) 2024-06-10 12:35:10 +01:00
test_doc_samples.py Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
test_dynamic_module_utils.py Fix the regex in get_imports to support multiline try blocks and excepts with specific exception types (#23725) 2023-05-24 15:40:19 -04:00
test_feature_extraction_utils.py Fix import paths for test_module (#32888) 2024-08-28 12:08:29 +01:00
test_file_utils.py Inheritance-based framework detection (#21784) 2023-02-27 15:31:55 +00:00
test_generic.py Decorators for deprecation and named arguments validation (#30799) 2024-06-10 12:35:10 +01:00
test_hf_argparser.py Allow for str versions of dicts based on typing (#30227) 2024-04-16 08:15:09 -04:00
test_hub_utils.py Use HF_HUB_OFFLINE + fix has_file in offline mode (#31016) 2024-05-29 11:55:43 +01:00
test_image_processing_utils.py Fix import paths for test_module (#32888) 2024-08-28 12:08:29 +01:00
test_image_utils.py fix: Replaced deprecated mktemp() function (#32123) 2024-07-22 14:13:39 +01:00
test_import_structure.py Fix some missing tests in circleci (#33559) 2024-09-20 20:58:51 +02:00
test_logging.py Fix flaky test for log level (#21776) 2023-02-28 16:24:14 -05:00
test_model_card.py Automatically add transformers tag to the modelcard (#32623) 2024-08-13 07:59:01 +02:00
test_model_output.py Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
test_modeling_flax_utils.py Follow up for #31973 (#32025) 2024-07-25 16:12:23 +02:00
test_modeling_rope_utils.py Bugfix/alexsherstinsky/fix none check for attention factor in rope scaling 2024 08 28 0 (#33188) 2024-09-04 17:01:12 +02:00
test_modeling_tf_core.py Add tf_keras imports to prepare for Keras 3 (#28588) 2024-01-30 17:26:36 +00:00
test_modeling_tf_utils.py Follow up for #31973 (#32025) 2024-07-25 16:12:23 +02:00
test_modeling_utils.py Generation: deprecate PreTrainedModel inheriting from GenerationMixin (#33203) 2024-09-23 18:28:36 +01:00
test_offline.py Use HF_HUB_OFFLINE + fix has_file in offline mode (#31016) 2024-05-29 11:55:43 +01:00
test_processing_utils.py Uniformize kwargs for Pixtral processor (#33521) 2024-09-17 14:44:27 -04:00
test_skip_decorators.py Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
test_tokenization_utils.py Fix: Fixed directory path for utils folder in test_tokenization_utils.py (#32601) 2024-08-13 16:48:15 +01:00
test_versions_utils.py improve dev setup comments and hints (#28495) 2024-01-15 18:36:40 +00:00
tiny_model_summary.json CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123) 2024-08-27 11:58:27 +01:00