transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-23 14:29:01 +06:00

Author	SHA1	Message	Date
hyenal	1c21f48a50	add sdpa to ViT [follow up of #29325 ] (#30555 ) remove blank line (+1 squashed commit) Squashed commits: [24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits) Squashed commits: [08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder [ec96a8db3] [run-slow]vit_msn [ead817eca] fix vit msn multi gpu [d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [3fdbfa88f] doc [a3ff33e4a] finish implementation [e20b7b7fb] Update test_modeling_common.py [e290c5810] Update test_modeling_flax_common.py [d3af86f46] comment [ff7dd32d8] more comments [59b137889] suggestion [7e2ba6d67] attn_implementation as attribute of the class [fe66ab71f] minor [38642b568] Apply suggestions from code review Accept comments Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [22cde7d52] Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [48e137cc6] Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [99f4c679f] Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [61f00ebb0] all tests are passing locally [e9e0b82b7] vision encoder/decoder [4d5076b56] 
test-vision (+20 squashed commits) Squashed commits: [d1add8db9] yolo [9fde65716] fix flax [986566c28] minor [ca2f21d1f] vit [3333efd7a] easy models change [ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos [48ecc7e26] all tests are passing locally [bff7fc366] minor [62f88306f] fix yolo and text_encoder tests [121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos [b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [cffaa10dd] fix-copies [ef6c511c4] test vit hybrid [7d4ba8644] vit hybrid [66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [1fcc0a031] fixes [cfde6eb21] fixup [e77df1ed3] all except yolo end encoder decoder (+17 squashed commits) Squashed commits: [602913e22] vit + vit_mae are working [547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/ passes [61a97dfa9] it s the complete opposite... [aefab37d4] fix more tests [71802a1b9] fix all torch tests [40b12eb58] encoder - decoder tests [941552b69] slow decorator where appropriate [14d055d80] has_attentions to yolo and msn [3381fa19f] add correct name [e261316a7] repo consistency [31c6d0c08] fixup [9d214276c] minor fix [11ed2e1b7] chore [eca6644c4] add sdpa to vit-based models [cffbf390b] make fix-copies result [6468319b0] fix style [d324cd02a] add sdpa for vit Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com>	2024-05-16 10:56:11 +01:00
amyeroberts	58faa7b824	Deprecate models script - correctly set the model name for the doc file (#30785 ) * Correctly set the model name for the doc file * Fix up	2024-05-15 15:14:11 +01:00
Lysandre Debut	a42844955f	Loading GGUF files support (#30391 ) * Adds support for loading GGUF files Co-authored-by: Younes Belkada <younesbelkada@gmail.com> Co-authored-by: 99991 <99991@users.noreply.github.com> * add q2_k q3_k q5_k support from @99991 * fix tests * Update doc * Style * Docs * fix CI * Update docs/source/en/gguf.md * Update docs/source/en/gguf.md * Compute merges * change logic * add comment for clarity * add comment for clarity * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change logic * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/modeling_gguf_pytorch_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * put back comment * add comment about mistral * comments and added tests * fix unconsistent type * more * fix tokenizer * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address comments about tests and tokenizer + add added_tokens * from_gguf -> gguf_file * replace on docs too --------- Co-authored-by: Younes Belkada <younesbelkada@gmail.com> Co-authored-by: 99991 <99991@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-15 14:28:20 +02:00
amyeroberts	0f8fefd481	Deprecate models script (#30184 ) * Add utility for finding candidate models for deprecation * Update model init * Make into configurable script * Fix path * Add sorting of base object alphabetically * Tidy * Refactor __init__ alpha ordering * Update script with logging * fix import * Fix logger * Fix logger * Get config file before moving files * Take models from CLI * Split models into lines to make easier to feed to deprecate_models script * Update * Use posix path * Print instead * Add example in module docstring * Fix up * Add clarifying comments; add models to DEPRECATE_MODELS * Address PR comments * Don't update relative paths on the same level	2024-05-13 16:30:55 +01:00
Yih-Dar	82c1625ec3	Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) (#30699 ) * update * update * update * update * update * update * update * update * Update utils/notification_service.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-13 17:27:44 +02:00
Omar Sanseviero	c99d88e520	Update CodeLlama references (#30218 ) * Update CodeLlama references * Update slow_documentation_tests.txt * Update slow_documentation_tests.txt	2024-05-09 22:57:52 +02:00
Yih-Dar	884e3b1c53	Rename artifact name `prev_ci_results` to `ci_results` (#30697 ) * rename * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-07 16:59:16 +02:00
Yih-Dar	05ec950c24	Update `workflow_id` in `utils/get_previous_daily_ci.py` (#30695 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-07 16:58:50 +02:00
Aymeric Roucher	0ba15cedbc	Reboot Agents (#30387 ) * Create CodeAgent and ReactAgent * Fix formatting errors * Update documentation for agents * Add custom errors, improve logging * Support variable usage in ReactAgent * add messages * Add message passing format * Create React Code Agent * Update * Refactoring * Fix errors * Improve python interpreter * Only non-tensor inputs should be sent to device * Calculator tool slight refactor * Improve docstrings * Refactor * Fix tests * Fix more tests * Fix even more tests * Fix tests by replacing output and input types * Fix operand type issue * two small fixes * EM TTS * Fix agent running type errors * Change text to speech tests to allow changed outputs * Update doc with new agent types * Improve code interpreter * If max iterations reached, provide a real answer instead of an error * Add edge case in interpreter * Add safe imports to the interpreter * Interpreter tweaks: tuples and listcomp * Make style * Make quality * Add dictcomp to interpreter * Rename ReactJSONAgent to ReactJsonAgent * Misc changes * ToolCollection * Rename agent's logger to self.logger * Add while loops to interpreter * Update doc with new tools. 
still need to mention collections * Add collections to the doc * Small fixes on logs and interpretor * Fix toolbox return type * Docs + fixup * Skip doctests * Correct prompts with improved examples and formatting * Update prompt * Remove outdated docs * Change agent to accept Toolbox object for tools * Remove calculator tool * Propagate removal of calculator in doc * Fix 2 failing workflows * Simplify additional argument passing * AgentType audio * Minor changes: function name, types * Remove calculator tests * Fix test * Fix torch requirement * Fix final answer tests * Style fixes * Fix tests * Update docstrings with calculator removal * Small type hint fixes * Update tests/agents/test_translation.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_python_interpreter.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/default_tools.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/tools.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_agents.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bert/configuration_bert.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/tools.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/speech_to_text.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_speech_to_text.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_tools_common.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * pygments * Answer comments * Cleaning up * Simplifying init for all agents * Improving prompts and making code nicer * Style fixes * Add multiple 
comparator test in interpreter * Style fixes * Improve BERT example in documentation * Add examples to doc * Fix python interpreter quality * Logging improvements * Change test flag to agents * Quality fix * Add example for HfEngine * Improve conversation example for HfEngine * typo fix * Verify doc * Update docs/source/en/agents.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/agents.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/prompts.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/python_interpreter.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/agents.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix style issues * local s2t tool --------- Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com> Co-authored-by: Lysandre <lysandre@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-05-07 12:59:49 +02:00
Yih-Dar	91d155ea92	Avoid duplication in PR slow CI model list (#30634 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-03 18:19:30 +02:00
Yih-Dar	87927b248e	General PR slow CI (#30540 ) * More general PR slow CI * Update utils/pr_slow_ci_models.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-30 21:05:09 +02:00
amyeroberts	30ee508c6c	Script for finding candidate models for deprecation (#29686 ) * Add utility for finding candidate models for deprecation * Better model filtering * Update * Add warning tip * Fix up * Review comments * Filter requests based on tags * Add copyright header	2024-04-25 10:10:01 +01:00
Yih-Dar	fbb41cd420	consistent job / pytest report / artifact name correspondence (#30392 ) * better names * run better names * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-24 22:32:42 +02:00
Yih-Dar	d0d430f14a	Fix wrong indent in `utils/check_if_new_model_added.py` (#30456 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-24 17:44:12 +02:00
Arthur	89c510d842	Add llama3 (#30334 ) * nuke * add co-author * add co-author * update card * fixup and fix copies to please our ci * nit fixup * super small nits * remove tokenizer_path from call to `write_model` * always safe serialize by default --------- Co-authored-by: pcuenca <pcuenca@users.noreply.github.com> Co-authored-by: xenova <xenova@users.noreply.github.com>	2024-04-24 10:11:19 +02:00
Yih-Dar	fc34f842cc	New model PR needs green (slow tests) CI (#30341 ) * You should not pass Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-04-24 09:52:55 +02:00
Lysandre Debut	c6bba94040	Remove mentions of models in the READMEs and link to the documentation page in which they are featured. (#30420 ) * READMEs * READMEs v2	2024-04-24 09:38:31 +02:00
Lysandre Debut	d4e92f1a21	Remove add-new-model in favor of add-new-model-like (#30424 ) * Remove add-new-model in favor of add-new-model-like * nits	2024-04-24 09:38:18 +02:00
Lysandre Debut	0eb8fbcdac	Remove task guides auto-update in favor of links towards task pages (#30429 )	2024-04-24 09:38:10 +02:00
Matt	696ededd2b	Remove old TF port docs (#30426 ) * Remove old TF port guide * repo-consistency * Remove some translations as well for consistency * Remove some translations as well for consistency	2024-04-23 16:06:20 +01:00
João David	d2cec09baa	Add TF swiftformer (#23342 ) * Duplicate swiftformer * Convert SwiftFormerPatchEmbedding * Convert SwiftFormerEmbeddings * Convert TFSwiftFormerMlp * Convert TFSwiftFormerConvEncoder * Convert TFSwiftFormerLocalRepresentation * convert TFSwiftFormerEncoderBlock * Convert SwiftFormerStage * Convert SwiftFormerEncoder * Add TFSWiftFormerPreTrainedModel * Convert SwiftFormerForImageClassification * Add kwargs and start drop path * Fix syntax * Change Model class name * Add TFSwiftFormer to __init__ * Duplicate test_modeling_swiftformer * First test conversions * Change require_torch to require_tf * Add exports to swiftformer __init__ * Add TFSwiftFormerModel wrapper * Fix __init__ and run black * Remove docstring from MainLayer, fix padding * Use keras.layers.Activation on keras.Sequential * Fix swiftformer exports * Fix activation layer from config * Remove post_inits * Use tf.keras.layers.ZeroPadding2D * Convert torch normalize * Change tf test input shape * Fix softmax and reduce_sum * Convert expand_dims and repeat * Add missing reshape and tranpose * Simplify TFSwiftFormerEncoderBlock.call * Fix mismatch in patch embeddings * Fix expected output shape to match channels last * Fix swiftformer typo * Disable test_onnx * Fix TFSwiftFormerForImageClassification call * Add unpack inputs * Convert flatten(2).mean(-1) * Change vision dummy inputs (to be reviewed) * Change test_forward_signature to use .call * Fix @unpack_inputs * Set return_tensors="tf" and rename class * Rename wrongly named patch_embeddings layer * Add serving_output and change dummy_input shape * Make dimensions BCHW and transpose inside embedding layer * Change SwiftFormerEncoderBlock * Fix ruff problems * Add image size to swiftformer config * Change tranpose to MainLayer and use -1 for reshape * Remove serving_outputs and dummy_inputs * Remove test_initialization test from tf model * Make Sequential component a separate layer * Fix layers' names * Tranpose encoder outputs * 
Fix tests and check if hidden states is not None * Fix TFSwiftFormerForImageClassification * Run make fixup * Run make fix-copies * Update modeling_tf_auto * Update docs * Fix modeling auto mapping * Update modelint_tf_swiftformer docs * Fill image_size doc and type * Add reduction=None to loss computation * Update docs * make style * Debug: Delete the tip to see if that changes anything * Re-add tip * Remove add_code_sample_docstrings * Remove unused import * Get the debug to actually tell us the problem it has with the docs * Try a substitution to match the PyTorch file? * Add swiftformer to ignore list * Add build() methods * Update copyright year Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove FIXME comment * Remove from_pt * Update copyright year Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Rename one-letter variables * Remove FIXMEs related to momentum * Remove old TODO comment * Remove outstanding FIXME comments * Get dropout rate from config * Add specific dropout config for MLP * Add convencoder dropout to config * Pass config to SwiftFormerDropPath layer * Fix drop_path variable name and add Adapted from comment * Run ruff * Removed copied from comment * Run fix copies * Change drop_path to identity to match pt * Cleanup build() methods and move to new keras imports * Update docs/source/en/model_doc/swiftformer.md Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Raise error if drop_path_rate > 0.0 * Apply suggestions from code review Replace (self.dim), with self.dim, Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Remove drop_path function * Add training to TFSwiftFormerEncoder * Set self.built = True last Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Should have been added to previous commit Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: 
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Change default_feature_extractor to default_image_processor Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Import Keras from modeling_tf_utils * Remove relative import * Run ruff --fix * Move import keras to tf_available * Add copied from comment to test_forward_signature * Reduce batch size and num_labels * Extract loss logic to hf_compute_loss * Run ruff format --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2024-04-19 18:31:43 +01:00
Lysandre Debut	e67ccf0610	Transformers Metadata (#30344 )	2024-04-19 15:08:53 +02:00
tomeras91	3f20877da9	Add jamba (#29943 ) * Add jamba arch * apply "make fix-copies" changes * fix link to model in JambaConfig docstring * Add n_ctx in modeling file because repo-consistency wants that * Add jamba to flash attention and sdpa documentation * mamba dt_proj quant fix now works for LoRA as well * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers * add jamba to tokenization auto * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24) * simple PR fixes * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530) * Add copied comment on JambaMLP (it's the same as MixtralMLP) * remove padding_mask warnings. It's not supported anymore * fix docstring. Float instead of int * A few more minor PR fixes * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass * Return None attention weights from mamba layers. Append to all attentions only if not None. * remove some leftover jamba archive lists * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers * Add Jamba paper on READMEs * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes) * Add copied from comment * remove the code path for apply_inner_layernorms=False. 
Jamba always has the inner mamba layernorms * clearer docstring for _convert_to_standard_cache * style fixes * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs * rename test so it still overrides what its meant to override * draft * oups * nit * remove more complexe logic * fix names used in config * fix fix fix * style * fix some more failing tests * generate did not init the cache 🙃 * more small nits * typo * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes * fix init of pkv with torch.tensor() * empty tensor * fix some init issues * stupid changes required by generate because it does not even support it's own DynamicCache class * more fixes * fix general assisted gen cache_position bug * tests passing * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore * fix docstrings and typehints for past_key_values * style fixes * fix docs * change typehint due to copy from Mixtral * forgot import * import order * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward) * Add integration test with tiny tandom Jamba model on hub * fix flash attention cache shapes * bring back forgotten hidden states * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model * align integration test after modeling fixes * bugfix - mamba can use precomputed states only of forward pass is on a single token * bugfix - mamba can use 
precomputed states only if they match the batch size * typo * remove making _prepare_4d_causal_attention_mask a leaf function * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2024-04-18 11:04:02 +02:00
Yih-Dar	5fabebdb7d	Fix test fetcher (doctest) + `Idefics2`'s doc example (#30274 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-16 21:25:06 +02:00
Yih-Dar	cbc2cc187a	More fixes for doctest (#30265 ) * fix * update * update * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-16 11:58:55 +02:00
amyeroberts	6b78360e6d	Add Idefics2 (#30253 ) * Initial add model additions * Test * All weights loading * Can perform full forward pass * Local and remote the same * Matching local and remote * Fixup * Idefics2Model importable; fixup docstrings * Don't skip by default * Remove deprecated use_resampler arg * Remove self.config * DecoupledLinear takes config * Tidy up * Enable eager attention and tidy up * Most tests passing * Update for batch of processed images * Add image processor * Update doc pages * Update conversion script * Remove erroneous breakpoint * Remove accidendtal spelling change * Update to reflect changes on hub - make generate work * Fix up * Image processor tests * Update tests * Add a processor * Add a processor * Update convert script * Update modeling file - remove fixmes * Bug fix * Add processing test * Use processor * Fix up * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix test * Update config - PR comments and defaults align with checkpoint * Reviewer comments * Add copied froms for flahs attention * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove qk_layer_norm and freeze_layers functionality * Fix * Remove freeze_layer options from config * Sync with upstream main * Fix attention shapes siglip * Remove Llava-next refs - TO REBASE * Use AutoModel for text model * Add comment to explain vision embeddings * Fix issue with tie_word_embeddings * Address review comments * Fix and fix up * Chat templates for idefics * Fix copies * Fix * Add layer norms to FA2 * Fix tests * Apply suggestions from code review Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix * Review comments * 
Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update inputs merger * Merge weights in correct order * Update convert script * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update template * Model code examples (fix idefics too) * More review comments * Tidy up * Update processing * Fix attention mask preparation * Update inputs_merger inputs * Vectorize inputs_merger * Update src/transformers/models/idefics2/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics2/modeling_idefics2.py * Review comments * saying bye to the `qk_layer_norms` * Simplify * Update latents * Remove erroneuous readme changes * Return images when applying chat template * Fix bug - prompt images are for a single sample * Update src/transformers/models/idefics2/modeling_idefics2.py * image splitting * fix test * some more comment * some comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics2/image_processing_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update processor * Update model tests * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Don't add BOS in template * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Remove index in examples * Update tests to reflect #13 * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * PR comment - consistent typing * Update readme and model doc * Update docs * Update checkpoint references * 
Update examples * Fix and update tests * Small addition * Update tests - remove copied from as no ignore placement copy could be found * Update example * small fixes * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update README.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Connector model as bridge * Fix up * Fix up * Don't pass model inputs for generation kwargs update * IDEFICS-2 -> Idefics2 * Remove config archive name * IDEFICS-2 -> Idefics2 * Add back llava-next * Update readmes * Add requirements for processor tester * Use custom convert_to_rgb to avoid possible BC * Fix doc example * Fix doc example * Skip model doc tests - as model to large * More doc example - account for image splitting * Update src/transformers/image_transforms.py * Fix config doctest --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Victor SANH <victorsanh@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-04-15 17:03:03 +01:00
Yih-Dar	440bd3c3c0	update github actions packages' version to suppress warnings (#30249 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-15 15:08:09 +02:00
Yih-Dar	fe2d20d275	Fix doctest more (for `docs/source/en`) (#30247 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-15 14:10:59 +02:00
Yih-Dar	b6b6daf2b7	Refactor doctest (#30210 ) * fix * update * fix * update * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-15 13:20:36 +02:00
Younes Belkada	2c66600c3f	ENH: [`CI`] Add new workflow to run slow tests of important models on push main if they are modified (#29235 ) * v1 * v1 * more changes * more models * add more markers * swtich to A10 * use cache * Update .github/workflows/push-important-models.yml * Update .github/workflows/push-important-models.yml * Update modeling_llama.py * test * test * another test * test * test * attempt to fix * fix * try automatic tagging * fix * alternative approach for collecting * fix * fix * fix * test * fix * fix * test * revert some changes * fix * fix * fix * final push * fix * revert * test new slack message * oops * Update send-slack.yml * test * test re-usable workflow in steps * Update action.yml * test * another test * test * another test * test * another test * another test (hopefully last one) * attempt to fix * allez * removing comma * test * another test * attempt * test * test * test push * test * test * another test * test * make it better * fix commas * valid json * test * another test * test * final push * test * final push * more customizable messages * test * push * oops * another test * another test * missing indentation * more tweaks * more tweaks * another test * another test * tests * final push * use global variables instead * Update .github/workflows/push-important-models.yml * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * commit to test all models * issue with arrays * another test * attempt to fix failing tests * Update .github/workflows/push-important-models.yml * add ssh * Update .github/workflows/push-important-models.yml * test * test * add install curl * attempt to fix * final fix * test * test * test * fix test * another test * add inherit secrets * push * revert unneeded changes * revert * add env variables * add pip freeze * revert change in gemma * Update .github/workflows/push-important-models.yml * fix mistral and mixtral * add pdb * fix mixtral tesst * fix * 
fix mistral ? * add fix gemma * fix mistral * fix * test * anoter test * fix * fix * fix mistral tests * fix them again * final fixes for mistral * fix padding right * fix whipser fa2 * fix * fix * fix gemma * test * fix llama * fix * fix * fix llama gemma * add class attribute * fix CI * clarify whisper * compute_capability * rename names in some comments * Add # fmt: skip * make style * Update tests/models/mistral/test_modeling_mistral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update * update * change branch * correct workflow * modify file * test * works * final test * another fix * install sudo * final fix * add `-y` * set to `main` * Update .github/actions/post-slack/action.yml Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change title * fixup * add upload report * fix * revert to main * add empty lines + add comment --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-12 10:01:28 +02:00
Arthur	0fe44059ae	Add recurrent gemma (#30143 ) * Fork. * RecurrentGemma initial commit. * Updating __init__.py. * Minor modification to how we initialize the cache. Changing how the config specifies the architecture. * Reformat code to 4 spaces. Fixed a few typos. * Fixed the forward pass. Still unclear on the cache? * Fixed the RecurrentGemmaForCausalLM * Minor comment that we might not need attention_mask and output_attention arguments. * Now cache should work as well. * Adding a temporary example to check whether the model generation works. * Adding the tests and updating imports. * Adding the example file missing in the previous commit. * First working example. * Removing .gitignore and reverting parts of __init__. * Re-add .gitignore. * Addressing comments for configuration. * Move mask creation to `_prepare_inputs_for_generation`. * First try at integration tests: 1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'. 2. `cache_position` not passed * Transfoering between machines. * Running normal tests. * Minor fix. * More fixes. * Addressing more comments. * Minor fixes. * first stab at cleanup * more refactoring * fix copies and else * renaming and get init to work * fix causal mask creation * update * nit * fix a hell lot of things * updates * update conversion script * make all keys importable * nits * add auto mappings * properly convert ffw_up and down * add scaling * fix generations * for recurrent dtype * update * fix going beyong window * fixup * add missing files * current updates to remove last einops * finish modeling refactor * TADA * fix compile * fix most failing testt ? ? * update tests * refactor and update * update * nits, fixup and update tests * more fixup * nits * fix imports * test format * fixups * nits * tuple typing * fix code quality * add model card * fix doc * skip most generation tests * nits * style * doc fixes * fix pr and check_copies? 
* last nit * oupsy * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * update * Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update based on review * doc nit * fix quality * quality * fix slow test model path * update default dype * ignore attributes that can be safely ignored in check config attributes * 0lallalala come on * save nit * style * remove to dict update * make sure we can also run in float16 * style --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Aleksandar Botev <botev@google.com> Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com> Co-authored-by: anushanf <anushanf@google.com> Co-authored-by: botev <botevmg@gmail.com> Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-10 16:59:13 +02:00
Marc Sun	58a939c6b7	Fix quantization tests (#29914 ) * revert back to torch 2.1.1 * run test * switch to torch 2.2.1 * udapte dockerfile * fix awq tests * fix test * run quanto tests * update tests * split quantization tests * fix * fix again * final fix * fix report artifact * build docker again * Revert "build docker again" This reverts commit `399a5f9d93`. * debug * revert * style * new notification system * testing notfication * rebuild docker * fix_prev_ci_results * typo * remove warning * fix typo * fix artifact name * debug * issue fixed * debug again * fix * fix time * test notif with faling test * typo * issues again * final fix ? * run all quantization tests again * remove name to clear space * revert modfiication done on workflow * fix * build docker * build only quant docker * fix quantization ci * fix * fix report * better quantization_matrix * add print * revert to the basic one	2024-04-09 17:10:29 +02:00
Yih-Dar	b17b54d3dd	Refactor daily CI workflow (#30012 ) * separate jobs * separate jobs * use channel name directly instead of ID * use channel name directly instead of ID * use channel name directly instead of ID --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-05 15:49:51 +02:00
Yih-Dar	48795317a2	[test fetcher] Always include the directly related test files (#30050 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-05 14:30:36 +02:00
Yih-Dar	24d787ce9d	Add `whisper` to `IMPORTANT_MODELS` (#30046 ) Add whisper Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-05 09:06:40 +02:00
Arthur	fa2c49b00b	Fix copies main ci (#29979 ) * fix copies * nit * style * Update utils/check_copies.py	2024-04-01 12:43:58 +02:00
Arthur	b256516a8c	[`make fix-copies`] update and help (#29924 ) * add some help * style	2024-03-28 08:56:14 +01:00
Bo Zheng	1c39974a4c	Add Qwen2MoE (#29377 ) * add support for qwen2 MoE models * update docs * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix style * fix test when there are sparse and non sparse layers * fixup * Update README.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * fixup * add archive back * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fixup * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * fix style * fix test when there are sparse and non sparse layers * fixup * add archive back * fix integration test * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-27 02:11:55 +01:00
NielsRogge	d91fd7f92c	Add LLaVa-1.6, bis (#29586 ) * First draft * Fix tests, add docs * Improve docstrings * Fix test * Address comments * Address comments * Remove vocab_size attribute * Remove batch_size * Address comment * Add image processor tests * Support fx * Update docstring * Add support for 34b * Convert 34b model * Add integration tests * Update checkpoints * Convert vicuna-13b, remove doc tests * Remove script * Remove file * Address comments * Improve docstrings * Deprecate vocab_size * Remove aspect_ratio_setting * Address comments * Update READMEs * Add tips about chat templates * Fix tests * Deprecate vocab_size safely * Update tests --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-20 15:51:12 +00:00
Yih-Dar	66ce9593fd	Fix `check_copies` not capturing the diff in model/paper title and link (#29724 ) * fix * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-19 18:52:36 +01:00
Yih-Dar	87e2ea33aa	Fix `filter_models` (#29710 ) * update * update * update * check --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-18 14:32:42 +01:00
Yoach Lacombe	c43b380e70	Add MusicGen Melody (#28819 ) * first modeling code * make repository * still WIP * update model * add tests * add latest change * clean docstrings and copied from * update docstrings md and readme * correct chroma function * correct copied from and remove unreleated test * add doc to toctree * correct imports * add convert script to notdoctested * Add suggestion from Sanchit Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * correct get_uncoditional_inputs docstrings * modify README according to SANCHIT feedback * add chroma to audio utils * clean librosa and torchaudio hard dependencies * fix FE * refactor audio decoder -> audio encoder for consistency with previous musicgen * refactor conditional -> encoder * modify sampling rate logics * modify license at the beginning * refactor all_self_attns->all_attentions * remove ignore copy from causallm generate * add copied from for from_sub_models * fix make copies * add warning if audio is truncated * add copied from where relevant * remove artefact * fix convert script * fix torchaudio and FE * modify chroma method according to feedback-> better naming * refactor input_values->input_features * refactor input_values->input_features and fix import fe * add input_features to docstrigs * correct inputs_embeds logics * remove dtype conversion * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation * change warning for chroma length * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * change way to save wav, using soundfile * correct docs and change to soundfile * fix import * fix init proj layers * remove line breaks from md * fix issue with docstrings * add FE suggestions * improve is in logics and remove useless imports * remove custom from_pretrained * simplify docstring code * 
add suggestions for modeling tests * make style * update converting script with sanity check * remove encoder attention mask from conditional generation * replace musicgen melody checkpoints with official orga * rename ylacombe->facebook in checkpoints * fix copies * remove unecessary warning * add shape in code docstrings * add files to slow doc tests * fix md bug and add md to not_tested * make fix-copies * fix hidden states test and batching --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-03-18 13:06:12 +00:00
Yih-Dar	5011908e10	Revert "Fix wrong condition used in `filter_models`" (#29682 ) Revert "Fix wrong condition used in `filter_models` (#29673)" This reverts commit `174aecd099`.	2024-03-15 18:59:37 +01:00
Yih-Dar	174aecd099	Fix wrong condition used in `filter_models` (#29673 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-15 15:38:36 +01:00
Nate Cibik	1fc505b816	Add PvT-v2 Model (#26812 ) * Added pytests for pvt-v2, all passed * Added pvt_v2 to docs/source/end/model_doc * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat. Added additional type support for image size in config * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Reverted batch eval changes for PR * Updated index.md * Expanded type support for Pvt-v2 config * Fixed config docstring. 
Added channels property * Fixed model names in tests * Fixed config backbone compat * Ran fix-copies * Fixed PvtV2Backbone tests * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py * Fixed backbone stuff and fixed tests: all passing * Ran make fixup * Made modifications for code checks * Remove ONNX config from configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use explicit image size dict in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make image_size optional in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove _ntuple use in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove reference to fp16_enabled * Model modules now take config as first argument even when not used * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling" * All LayerNorm now instantiates with config.layer_norm_eps * Added docstring for depth-wise conv layer * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size * Refactored PVTv2 in prep for gradient checkpointing * Gradient checkpointing ready to test * Removed override of _set_gradient_checkpointing * Cleaned out old code * Applied code fixup * Applied code fixup * Began debug of pvt_v2 tests * Leave handling of num_labels to base pretrained config class * Deactivated gradient checkpointing tests until it is fixed * Removed PvtV2ImageProcessor which duped PvtImageProcessor * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. 
Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Added pvt_v2 to docs/source/end/model_doc * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat. Added additional type support for image size in config * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. 
All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat * Ran fix-copies * Fixed PvtV2Backbone tests * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py * Fixed backbone stuff and fixed tests: all passing * Ran make fixup * Made modifications for code checks * Remove ONNX config from configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use explicit image size dict in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make image_size optional in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove _ntuple use in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove reference to fp16_enabled * Model modules now take config as first argument even when not used * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling" * All LayerNorm now instantiates with config.layer_norm_eps * Added docstring for depth-wise conv layer * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size * Refactored PVTv2 in prep for gradient checkpointing * Gradient checkpointing ready to test * Removed override of _set_gradient_checkpointing * Cleaned out old code * Applied code fixup * Applied code fixup * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Ran fix-copies and fixup. All checks passed * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Fixed config docstring. 
Added channels property * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Ran fix-copies and fixup. All checks passed * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Fixed config backbone compat * Ran fix-copies * Began debug of pvt_v2 tests * Leave handling of num_labels to base pretrained config class * Deactivated gradient checkpointing tests until it is fixed * Removed PvtV2ImageProcessor which duped PvtImageProcessor * Fixed issue from rebase * Fixed issue from rebase * Set tests for gradient checkpointing to skip those using reentrant since it isn't supported * Fixed issue from rebase * Fixed issue from rebase * Changed model name in docs * Removed duplicate PvtV2Backbone * Work around type switching issue in tests * Fix model name in config comments * Update docs/source/en/model_doc/pvt_v2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Changed name of variable from 'attn_reduce' to 'sr_type' * Changed name of variable from 'attn_reduce' to 'sr_type' * Changed from using 'sr_type' to 'linear_attention' for clarity * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Removed old code * Changed from using 'sr_type' to 'linear_attention' for clarity * Fixed Class names to be more descriptive * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Removed outdated code * Moved paper abstract to single line in pvt_v2.md * Added usage tips to pvt_v2.md * Simplified module inits by passing layer_idx * Fixed typing for hidden_act in PvtV2Config * Removed unusued import * Add pvt_v2 to docs/source/en/_toctree.yml * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive. * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive. 
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Move function parameters to single line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Update year of copyright to 2024 Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Make code more explicit Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated sr_ratio to be more explicit spatial_reduction_ratio * Removed excess type hints in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Move params to single line in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Removed needless comment in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update copyright date in pvt_v2.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Moved params to single line in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated copyright date in configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Cleaned comments in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Renamed spatial_reduction Conv2D operation * Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py " This reverts commit `c4a04416dd`. 
* Updated conversion script to reflect module name change * Deprecated reshape_last_stage option in config * Removed unused imports * Code formatting * Fixed outdated decorators on test_inference_fp16 * Added "Copied from" comments in test_modeling_pvt_v2.py * Fixed import listing * Updated model name * Force empty commit for PR refresh * Fixed linting issue * Removed # Copied from comments * Added PVTv2 to README_fr.md * Ran make fix-copies * Replace all FoamoftheSea hub references with OpenGVLab * Fixed out_indices and out_features logic in configuration_pvt_v2.py * Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py * Ran code fixup * Fixed order of parent classes in PvtV2Config to fix the to_dict method override --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-13 19:05:20 +00:00
Klaus Hipp	c1e478aa7f	Add missing localized READMEs to the copies check (#29575 ) * Add missing localized READMEs to the copies check * Run check to resolve all inconsistencies	2024-03-11 17:17:42 +00:00
Yih-Dar	e5eb55b88b	Don't use a subset in test fetcher if on `main` branch (#28816 ) save ci life Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-11 16:58:06 +01:00
Arthur	fb1c62e973	[`Add Mamba`] Adds support for the `Mamba` models (#28094 ) * initial-commit * start cleaning * small nits * small nits * current updates * add kernels * small refactoring little step * add comments * styling * nit * nits * Style * Small changes * Push dummy mambda simple slow * nit * Use original names * Use original names and remove norm * Updates for inference params * Style nd updates * nits * Match logits * Add a test * Add expected generated text * nits doc, imports and styling * style * oups * dont install kernels, invite users to install the required kernels * let use use the original packages * styling * nits * fix some copieds * update doc * fix-copies * styling done * nits * fix import check * run but wrong cuda ress * mamba CUDA works :) * fix the fast path * config naming nits * conversion script is not required at this stage * finish fixing the fast path: generation make sense now! * nit * Let's start working on the CIs * style * better style * more nits * test nit * quick fix for now * nits * nit * nit * nit * nits * update test rest * fixup * update test * nit * some fixes * nits * update test values * fix styling * nit * support peft * integrations tests require torchg * also add slow markers * styling * chose forward wisely * nits * update tests * fix gradient checkpointing * fixup * nit * fix doc * check copies * fix the docstring * fix some more tests * style * fix beam search * add init schene * update * nit * fix * fixup the doc * fix the doc * fixup * tentative update but slow is no longer good * nit * should we always use float32? * nits * revert wrong changes * res in float32 * cleanup * skip fmt for now * update generation values * update test values running original model * fixup * update tests + rename inference_params to cache_params + make sure training does not use cache_params * small nits * more nits * fix final CIs * style * nit doc * I hope final doc nits * nit * 🫠 * final touch! 
* fix torch import * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * Apply suggestions from code review * fix fix and fix * fix base model prefix! * nit * Update src/transformers/models/mamba/__init__.py * Update docs/source/en/model_doc/mamba.md Co-authored-by: Lysandre Debut <hi@lysand.re> * nit --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-03-05 20:01:06 +09:00
NielsRogge	836921fdeb	Add UDOP (#22940 ) * First draft * More improvements * More improvements * More fixes * Fix copies * More improvements * More fixes * More improvements * Convert checkpoint * More improvements, set up tests * Fix more tests * Add UdopModel * More improvements * Fix equivalence test * More fixes * Redesign model * Extend conversion script * Use real inputs for conversion script * Add image processor * Improve conversion script * Add UdopTokenizer * Add fast tokenizer * Add converter * Update README's * Add processor * Add fully fledged tokenizer * Add fast tokenizer * Use processor in conversion script * Add tokenizer tests * Fix one more test * Fix more tests * Fix tokenizer tests * Enable fast tokenizer tests * Fix more tests * Fix additional_special_tokens of fast tokenizer * Fix tokenizer tests * Fix more tests * Fix equivalence test * Rename image to pixel_values * Rename seg_data to bbox * More renamings * Remove vis_special_token * More improvements * Add docs * Fix copied from * Update slow tokenizer * Update fast tokenizer design * Make text input optional * Add first draft of processor tests * Fix more processor tests * Fix decoder_start_token_id * Fix test_initialization * Add integration test * More improvements * Improve processor, add test * Add more copied from * Add more copied from * Add more copied from * Add more copied from * Remove print statement * Update README and auto mapping * Delete files * Delete another file * Remove code * Fix test * Fix docs * Remove asserts * Add doc tests * Include UDOP in exotic model tests * Add expected tesseract decodings * Add sentencepiece * Use same design as T5 * Add UdopEncoderModel * Add UdopEncoderModel to tests * More fixes * Fix fast tokenizer * Fix one more test * Remove parallelisable attribute * Fix copies * Remove legacy file * Copy from T5Tokenizer * Fix rebase * More fixes, copy from T5 * More fixes * Fix init * Use ArthurZ/udop for tests * Make all model tests pass * Remove 
UdopForConditionalGeneration from auto mapping * Fix more tests * fixups * more fixups * fix the tokenizers * remove un-necessary changes * nits * nits * replace truncate_sequences_boxes with truncate_sequences for fix-copies * nit current path * add a test for input ids * ids that we should get taken from `c9f7a32f57` * nits converting * nits * apply ruff * nits * nits * style * fix slow order of addition * fix udop fast range as well * fixup * nits * Add docstrings * Fix gradient checkpointing * Update code examples * Skip tests * Update integration test * Address comment * Make fixup * Remove extra ids from tokenizer * Skip test * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update year * Address comment * Address more comments * Address comments * Add copied from * Update CI * Rename script * Update model id * Add AddedToken, skip tests * Update CI * Fix doc tests * Do not use Tesseract for the doc tests * Remove kwargs * Add original inputs * Update casting * Fix doc test * Update question * Update question * Use LayoutLMv3ImageProcessor * Update organization * Improve docs * Update forward signature * Make images optional * Remove deprecated device argument * Add comment, add add_prefix_space * More improvements * Remove kwargs --------- Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-04 18:49:02 +01:00
NielsRogge	5e4b69dc12	Convert SlimSAM checkpoints (#28379 ) * First commit * Improve conversion script * Convert more checkpoints * Update src/transformers/models/sam/convert_sam_original_to_hf_format.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Rename file * More updates * Update docstring * Update script --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-04 11:51:16 +01:00

844 Commits