transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-08-02 19:21:31 +06:00

History

Arthur 19d58d31f1 Add MLLama (#33703)	2024-09-25 19:56:25 +02:00
..
import_structures	Import structure & first three model refactors (#31329)	2024-09-10 11:10:53 +02:00
__init__.py	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
test_activations_tf.py	TF: Add sigmoid activation function (#16819)	2022-04-19 16:13:08 +01:00
test_activations.py	Add the GeLU activation from pytorch with the tanh approximation (#21345)	2023-02-02 09:33:04 -05:00
test_add_new_model_like.py	fix: Fixed failing tests in `tests/utils/test_add_new_model_like.py` (#32678)	2024-08-14 12:06:17 +01:00
test_audio_utils.py	Remove `trust_remote_code` when loading Libri Dummy (#31748)	2024-07-23 14:54:38 +08:00
test_backbone_utils.py	🚨 out_indices always a list (#30941)	2024-05-22 15:23:04 +01:00
test_cache_utils.py	Add MLLama (#33703)	2024-09-25 19:56:25 +02:00
test_chat_template_utils.py	Make tool JSON schemas consistent (#31756)	2024-07-02 20:00:42 +01:00
test_cli.py	Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659)	2024-08-23 11:12:53 +01:00
test_configuration_utils.py	support loading model without config.json file (#32356)	2024-09-06 13:49:47 +02:00
test_convert_slow_tokenizer.py	Revert error back into warning for byte fallback conversion. (#22607)	2023-04-06 14:00:29 +02:00
test_deprecation.py	Decorators for deprecation and named arguments validation (#30799)	2024-06-10 12:35:10 +01:00
test_doc_samples.py	Skip tests properly (#31308)	2024-06-26 21:59:08 +01:00
test_dynamic_module_utils.py	Fix the regex in `get_imports` to support multiline try blocks and excepts with specific exception types (#23725)	2023-05-24 15:40:19 -04:00
test_feature_extraction_utils.py	Fix import paths for test_module (#32888)	2024-08-28 12:08:29 +01:00
test_file_utils.py	Inheritance-based framework detection (#21784)	2023-02-27 15:31:55 +00:00
test_generic.py	Decorators for deprecation and named arguments validation (#30799)	2024-06-10 12:35:10 +01:00
test_hf_argparser.py	Allow for str versions of dicts based on typing (#30227)	2024-04-16 08:15:09 -04:00
test_hub_utils.py	Use `HF_HUB_OFFLINE` + fix has_file in offline mode (#31016)	2024-05-29 11:55:43 +01:00
test_image_processing_utils.py	Fix import paths for test_module (#32888)	2024-08-28 12:08:29 +01:00
test_image_utils.py	fix: Replaced deprecated `mktemp()` function (#32123)	2024-07-22 14:13:39 +01:00
test_import_structure.py	Fix some missing tests in circleci (#33559)	2024-09-20 20:58:51 +02:00
test_logging.py	Fix flaky test for log level (#21776)	2023-02-28 16:24:14 -05:00
test_model_card.py	Automatically add `transformers` tag to the modelcard (#32623)	2024-08-13 07:59:01 +02:00
test_model_output.py	Skip tests properly (#31308)	2024-06-26 21:59:08 +01:00
test_modeling_flax_utils.py	Follow up for #31973 (#32025)	2024-07-25 16:12:23 +02:00
test_modeling_rope_utils.py	Bugfix/alexsherstinsky/fix none check for attention factor in rope scaling 2024 08 28 0 (#33188)	2024-09-04 17:01:12 +02:00
test_modeling_tf_core.py	Add tf_keras imports to prepare for Keras 3 (#28588)	2024-01-30 17:26:36 +00:00
test_modeling_tf_utils.py	Follow up for #31973 (#32025)	2024-07-25 16:12:23 +02:00
test_modeling_utils.py	Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` (#33203)	2024-09-23 18:28:36 +01:00
test_offline.py	Use `HF_HUB_OFFLINE` + fix has_file in offline mode (#31016)	2024-05-29 11:55:43 +01:00
test_processing_utils.py	Uniformize kwargs for Pixtral processor (#33521)	2024-09-17 14:44:27 -04:00
test_skip_decorators.py	Update quality tooling for formatting (#21480)	2023-02-06 18:10:56 -05:00
test_tokenization_utils.py	Fix: Fixed directory path for utils folder in `test_tokenization_utils.py` (#32601)	2024-08-13 16:48:15 +01:00
test_versions_utils.py	improve dev setup comments and hints (#28495)	2024-01-15 18:36:40 +00:00
tiny_model_summary.json	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123)	2024-08-27 11:58:27 +01:00