Raushan Turganbay
23874f5948
Idefics: enable generation tests ( #34062 )
...
* add idefics
* conflicts after merging main
* enable tests but need to fix some
* fix tests
* no print
* fix/skip some slow tests
* continue not skip
* rebasing broken smth, this is the fix
2024-10-15 11:17:14 +02:00
Joao Gante
295a90cb40
Generate: remove most decoder-only LLMs prepare_inputs_for_generation
( #33870 )
2024-10-09 12:15:48 +01:00
Mohamed Abu El-Nasr
cdee5285ca
Fix Failed tests with mobile bert resize tokens embedding ( #33950 )
...
* Fix Failed tests with mobile bert
* Cast to the correct dtype
* Code fixup
* Fix padding_idx larger that embedding_size
* Reduce covariance more. use 1e-7 instead of 1e-5
* Comment fix
* Reduce covariance more. use 1e-9 instead of 1e-7
* Copy new config
* all but MRA fixed
* fix mra
* very flaky
* skip instead
* make fixup
---------
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-10-09 11:23:50 +01:00
Raushan Turganbay
612065efeb
Paligemma: fix static cache test ( #33941 )
...
* fix
* not flaky anymore + style
2024-10-05 09:47:37 +02:00
Longjie Zheng
0d1692a49b
Fix attn mask ignore logic in training-time trace ( #32613 )
...
* fix attn mask logic for training-time trace
* add test
* fix
* fix
* fix
* fix
* fix
* format
* [run-slow] llama
* avoid accelearate
* [run-slow] llama
2024-10-04 19:00:45 +02:00
Mohamed Abu El-Nasr
78ef58325c
🔴 🚨 Resizing tokens embeddings: initialize from old embeddings' normal distribution. ( #33325 )
...
* intilize new embeddings from normal distrib
* Fix typo in comments
* Fix typo in comments
* Fix style
* Fix variables naming
* Add tests
* Fix style
* code consistency nit
* Add deepspeed support
* Add deepspeed support
* Conver embeddings weights to float32 before computations
* Add deepspeed tests
* Cover when vocab_size is smaller than embedding_size
* Style fix
* Add tests for vocab_size smaller than hiddin_size
* Style fix
* Nits in tests
* Nits in tests
* Check for deepspeed before importing it
* Increase vocab_size for positive definite covariance matrix test
* Add warning
* Add multivariate_resizing flag and implement resizing for lm_heads
* Fix typo
* Fix wrong bias indexing
* Fix bias is zero check
* remove multivariate_resizing flag from tests
* Intialize bias from old bias normal distribution
* Fixup
* Code usability
* Use mean_resizing instead of multivariate_resizing
* Fix up
* Fix comments and docs
2024-10-04 16:29:55 +02:00
Joao Gante
6f0ce52760
VLM Generate: tag test_static_cache_matches_dynamic
as flaky ( #33630 )
...
flaky
2024-10-03 12:27:02 +01:00
Arthur
19d58d31f1
Add MLLama ( #33703 )
...
* current changes
* nit
* Add cross_attenttion_mask to processor
* multi-image fixed
* Add cross_attenttion_mask to processor
* cross attn works in all cases
* WIP refactoring function for image processor
* WIP refactoring image processor functions
* Refactor preprocess to use global loops instead of list nested list comps
* Docstrings
* Add channels unification
* fix dtype issues
* Update docsrings and format
* Consistent max_image_tiles
* current script
* updates
* Add convert to rgb
* Add image processor tests
* updates!
* update
* god damn it I am dumb sometimes
* Precompute aspect ratios
* now this works, full match
* fix 😉
* nits
* style
* fix model and conversion
* nit
* nit
* kinda works
* hack for sdpa non-contiguous bias
* nits here and there
* latest c hanges
* merge?
* run forward
* Add aspect_ratio_mask
* vision attention mask
* update script and config variable names
* nit
* nits
* be able to load
* style
* nits
* there
* nits
* make forward run
* small update
* enable generation multi-turn
* nit
* nit
* Clean up a bit for errors and typos
* A bit more constant fixes
* 90B keys and shapes match
* Fix for 11B model
* Fixup, remove debug part
* Docs
* Make max_aspect_ratio_id to be minimal
* Update image processing code to match new implementation
* Adjust conversion for final checkpoint state
* Change dim in repeat_interleave (accordig to meta code)
* tmp fix for num_tiles
* Fix for conversion (gate<->up, q/k_proj rope permute)
* nits
* codestyle
* Vision encoder fixes
* pass cross attn mask further
* Refactor aspect ratio mask
* Disable text-only generation
* Fix cross attention layers order, remove q/k norm rotation for cross atention layers
* Refactor gated position embeddings
* fix bugs but needs test with new weights
* rope scaling should be llama3
* Fix rope scaling name
* Remove debug for linear layer
* fix copies
* Make mask prepare private func
* Remove linear patch embed
* Make precomputed embeddings as nn.Embedding module
* MllamaPrecomputedAspectRatioEmbedding with config init
* Remove unused self.output_dim
* nit, intermediate layers
* Rename ln and pos_embed
* vision_chunk_size -> image_size
* return_intermediate -> intermediate_layers_indices
* vision_input_dim -> hidden_size
* Fix copied from statements
* fix most tests
* Fix more copied from
* layer_id->layer_idx
* Comment
* Fix tests for processor
* Copied from for _prepare_4d_causal_attention_mask_with_cache_position
* Style fix
* Add MllamaForCausalLM
* WIP fixing tests
* Remove duplicated layers
* Remove dummy file
* Fix style
* Fix consistency
* Fix some TODOs
* fix language_model instantiation, add docstring
* Move docstring, remove todos for precomputed embeds (we cannot init them properly)
* Add initial docstrings
* Fix
* fix some tests
* lets skip these
* nits, remove print, style
* Add one more copied from
* Improve test message
* Make validate func private
* Fix dummy objects
* Refactor `data_format` a bit + add comment
* typos/nits
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix dummy objects and imports
* Add chat template config json
* remove num_kv_heads from vision attention
* fix
* move some commits and add more tests
* fix test
* Remove `update_key_name` from modeling utils
* remove num-kv-heads again
* some prelimiary docs
* Update chat template + tests
* nit, conversion script max_num_tiles from params
* Fix warning for text-only generation
* Update conversion script for instruct models
* Update chat template in converstion + test
* add tests for CausalLM model
* model_max_length, avoid null chat_template
* Refactor conversion script
* Fix forward
* Fix integration tests
* Refactor vision config + docs
* Fix default
* Refactor text config
* Doc fixes
* Remove unused args, fix docs example
* Squashed commit of the following:
commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date: Wed Sep 18 13:39:15 2024 +0000
Move model + add output hidden states and output attentions
* Fix num_channels
* Add mllama text and mllama vision models
* Fixing repo consistency
* Style fix
* Fixing repo consistency
* Fixing unused config params
* Fix failed tests after refactoring
* hidden_activation -> hidden_act for text mlp
* Remove from_pretrained from sub-configs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Reuse lambda in conversion script
* Remove run.py
* Update docs/source/en/model_doc/mllama.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/processing_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused LlamaTokenizerFast
* Fix logging
* Refactor gating
* Remove cycle for collecting intermediate states
* Refactor text-only check, add integration test for text-only
* Revert from pretrained to configs
* Fix example
* Add auto `bos_token` adding in processor
* Fix tips
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Enable supports_gradient_checkpointing model flag
* add eager/sdpa options
* don't skip attn tests and bring back GC skips (did i really remove those?)
* Fix signature, but get error with None gradient
* Fix output attention tests
* Disable GC back
* Change no split modules
* Fix dropout
* Style
* Add Mllama to sdpa list
* Add post init for vision model
* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model
* if skipped, say it, don't pass
* Clean vision tester config
* Doc for args
* Update tests/models/mllama/test_modeling_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add cross_attention_mask to test
* typehint
* Remove todo
* Enable gradient checkpointing
* Docstring
* Style
* Fixing and skipping some tests for new cache
* Mark flaky test
* Skip `test_sdpa_can_compile_dynamic` test
* Fixing some offload tests
* Add direct GenerationMixin inheritance
* Remove unused code
* Add initializer_range to vision config
* update the test to make sure we show if split
* fix gc?
* Fix repo consistency
* Undo modeling utils debug changes
* Fix link
* mllama -> Mllama
* [mllama] -> [Mllama]
* Enable compile test for CausalLM model (text-only)
* Fix TextModel prefix
* Update doc
* Docs for forward, type hints, and vision model prefix
* make sure to reset
* fix init
* small script refactor and styling
* nit
* updates!
* some nits
* Interpolate embeddings for 560 size and update integration tests
* nit
* does not suppor static cache!
* update
* fix
* nit2
* this?
* Fix conversion
* Style
* 4x memory improvement with image cache AFAIK
* Token decorator for tests
* Skip failing tests
* update processor errors
* fix split issues
* style
* weird
* style
* fix failing tests
* update
* nit fixing the whisper tests
* fix path
* update
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
Fanli Lin
8bd1f2f338
[tests] make more tests device-agnostic ( #33580 )
...
* enable
* fix
* add xpu skip
* add marker
* skip for xpu
* add more
* enable on accelerator
* add more cases
* add more tests
* add more
2024-09-20 10:16:43 +01:00
Fanli Lin
b87755aa6d
[tests] skip tests for xpu ( #33553 )
...
* enable
* fix
* add xpu skip
* add marker
* skip for xpu
* add more
* add one more
2024-09-19 19:28:04 +01:00
Raushan Turganbay
d7975a5874
VLMs: enable generation tests ( #33533 )
...
* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of visio tower
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-09-19 12:04:24 +02:00
Wang, Yi
74026b473e
idefics2 enable_input_require_grads not aligned with disable_input_re… ( #33194 )
...
* idefics2 enable_input_require_grads not aligned with disable_input_require_grads
make peft+idefics2 checkpoints disable fail
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* split test case
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix ci failure
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refine test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2024-09-17 10:39:34 +01:00
Raushan Turganbay
65bb284448
Compile compatibilty for decoder-only models ( #32617 )
...
* squash into one commit
* add qwen2-vl for rope standardization
* fix mistral compile
* fix qwen2-vl
* fix-copies
2024-09-09 10:59:04 +02:00
Raushan Turganbay
43df47d8e7
Llava Onevision: add model ( #32673 )
...
* working version
* fix copies
* update
* tests
* update docs
* codestyle
* add more tests
* add returns for docs
* clean up
* Update src/transformers/models/llava_onevision/processing_llava_onevision.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates
* codestyle
* style
* shouldn't be reversed
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* add pooling in videos
* [run-slow] llava_onevision
* num-logits-to-keep
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* video matched orig impl
* fix tests
* chat template was modified
* Update docs/source/en/model_doc/llava_onevision.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add morer info in the doc page
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-05 14:43:20 +05:00
Joao Gante
d750b509fc
Config: unified logic to retrieve text config ( #33219 )
2024-09-04 12:03:30 +01:00
Joao Gante
894d421ee5
Test: add higher atol
in test_forward_with_num_logits_to_keep
( #33093 )
2024-08-26 15:23:30 +01:00
Cyril Vallez
22e6f14525
Reducing memory usage: removing useless logits computation in generate() ( #31292 )
...
* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_input_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refacor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs
2024-08-23 11:08:34 +01:00
Fanli Lin
e55b33ceb4
[tests] make test_sdpa_can_compile_dynamic
device-agnostic ( #32519 )
...
* enable
* fix
2024-08-19 12:46:59 +01:00
Pablo Montalvo
c1357834e8
Fix tests recurrent ( #32651 )
...
* add fix for recurrentgemma
* [no-filter]
* trigger-ci
* [no-filter]
* [no-filter]
* attempt to fix mysterious zip error
* [no-filter]
* fix lookup error
* [no-filter]
* remove summarization hack
* [no-filter]
2024-08-13 23:40:50 +02:00
Pablo Montalvo
a5a8291ad1
Fix tests ( #32649 )
...
* skip failing tests
* [no-filter]
* [no-filter]
* fix wording catch in FA2 test
* [no-filter]
* trigger normal CI without filtering
2024-08-13 09:46:21 +01:00
Raushan Turganbay
8f2b6d5e3d
Fix: FA2 with packed training ( #32487 )
...
* fix check
* add tests
* [run-slow] llama, gemma2
* oops, whisper actually runs but needed some special treatment
2024-08-12 13:40:07 +05:00
Pablo Montalvo
044281605f
Fix generate with inputs_embeds
as input ( #32493 )
...
* I think inputs_embeds has ndim == 3
* fix sequence length catch
* add generate test
* [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama
* skip whisper
* fix bart test
* more fixes
2024-08-08 18:44:53 +02:00
Raushan Turganbay
a30c865f99
Cache: new Cache format in decoder-only models ( #31421 )
...
* draft bart with new cache
* add cache for decoder-only models
* revert utils
* modify docstring
* revert bart
* minor fixes
* fix copies (not related)
* revert tests
* remove enc-dec related code
* remove bloom
* remove opt (enc-dec)
* update docstring
* git, codegen, gpt_neo, gpt_neox, gpj
* clean up
* copied from statements
* revert
* tmp
* update warning msg
* forgot git
* add more flags
* run-slow git,codegen,gpt_neo,gpt_neox,gpj
* add cache flag to VLMs
* remove files
* style
* video LLMs also need a flag
* style
* llava will go in another PR
* style
* [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copy from
* deprecate until v4.45 and warn if not training
* nit
* fix test
* test static cache
* add more tests and fix models
* fix copies
* return sliding window mask
* run slow tests & fix + codestyle
* one more falcon fix for alibi
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-07 10:02:16 +05:00
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models ( #32227 )
...
* draft
* apply changes to all relevant archs
* rerun ci - check_docstrings.py failing?
* fix docstring
* move 2D->4D mask creation to modeling file
* repo consistency
* fix the batch size = 1 case - calling contiguous is not enough
* nit
* style
* propagate to gemma/gemma-2
* prepare inputs for gemma generation
* implement test and tiny fix in gemma2
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix copies
* ci pass
* fix gemma's test_compile_static_cache tests
* flacky
* retrigger ci
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
Raushan Turganbay
81233c069c
Flash-Attn: fix generation when no attention mask or no pading ( #32241 )
...
* fix
* fix prev test (half of failures)
* [run-slow] llama, gemma2
* [run-slow] llama, gemma2
2024-07-26 14:45:55 +05:00
RhuiDih
9cf4f2aa9a
Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs ( #31629 )
...
* add DataCollatorBatchFlattening
* Update data_collator.py
* change name
* new FA2 flow if position_ids is provided
* add comments
* minor fix
* minor fix data collator
* add test cases for models
* add test case for data collator
* remove extra code
* formating for ruff check and check_repo.py
* ruff format
ruff format tests src utils
* custom_init_isort.py
2024-07-23 15:56:41 +02:00
bayllama
5a4a76edb7
Modify resize_token_embeddings to ensure output type is same as input ( #31979 )
...
* Change resize_token_embeddings to make it return same Class that is passed to it
* Add explanatory comment as requested in review
* Add explanatory comments for add resizing function in lxmert
* Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining
---------
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MBP.attlocal.net>
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MacBook-Pro.local>
2024-07-23 10:28:44 +01:00
Raushan Turganbay
24cfcc2114
Chameleon: add model ( #31534 )
...
* Chameleon model integration
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
* fix 7B, again. mask away image tokens
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove pretrained_config_map
* make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file
* remove tokenizer (use llama's); remove codechameleon tests
* a few copied from statements and minor changes
* copied from in ChameleonModel
* some copies in ChameleonForCausalLM
* a few more copies
* VQModel moved to ChameleonModel (as opposed to being in the processor)
* ChameleonProcessor ready
* Fix chameleon weights convert
* update conversion script
* clean-up processing
* update modeling a bit
* update
* update (throws error...)
* correct conversion ready
* fix tests
* fix docs
* docs
* ve swin norm
* fix device for vocab map
* add normalization
* update
* update script with rope rotations
* final fix on model conversion
* add slow tests
* more info in docs
* fix repo consistency tests
* fix repo tests
* fix-copies
* hope this will make CI happy
* fix for 30b model
* Update docs/source/en/index.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address comments
* remove assertion in conversion script
* add image processor test
* not copied
* port changes for qk layernorm
* fix-copies
* read token decorator for tests
* [run-slow] chameleon
* one more read-token
* address some comments
* qk norm changes
* tests and repo check
* moved rope permutations to conversion, YAY!
* fix past kv check
* docs
* layernorm done!
* let's be consistent in naming
* fix slow tests
* weird thing with slow CI, but let's see
* once more try
* remove past-kv as tuple following llama
* ignore
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: jacobkahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Leonid Shamis <lshamis@meta.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-17 10:41:43 +05:00
Joao Gante
e4682de635
Masking: remove flakiness from test ( #31939 )
2024-07-15 18:49:37 +01:00
Naman Garg
c1e139c2b0
Adding hiera ( #30356 )
...
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initilized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into tranformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initilized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into tranformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* Removed tim dependency
* added HieraBlock
* fixed: Model name
* added tests for HieraModel, HieraBlock
* fixed imports
* fixed quality & copies
* Fixes
* Update docs/source/en/model_doc/hiera.md
Fix name
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed formatting
* Code quality & Import differences
* quality and repo-consistency fix
* fixed no torch error
* Docstring fix
* Docstring fix
* doc string fix
* fixed example usage
* Resolved issues in modeling_hiera
* Removed Hiera MAE
* Added test and resolved bug
* fixed doc string
* First commit
* Finished conversion script and model forward working
* Resolved all issues
* nits
* Improving tests
* Nits
* More nits
* Improving HieraForMaskedImageModeling
* More improvements and nits
* Fixed docstrings of outputs
* More fixes
* More imrpovments
* Updated conversion script
* Fixed docstrings
* Improved tests
* Fixed attentou outputs test
* All tests green
* Removed unnecessary file
* contribution attribution
* Resolved a few issues
* Resolved Comments
* Updated model repo id and fixed bugs
* Removed loss print
* Make tests green
* Updated docstrings
* Fix style
* Fixed num_heads in config
* Removed unnecessary video checkpoint related code in the conversion script
* Fix style
* Changed atol in conversion script
* HieraConfig
* Fix copies
* Fixed typo
* Resolved few issues
* make
* converted conv_nd -> nn.Module
* Removed video complexities
* Removed video complexities
* fix style
* Addressing comments
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix style
* Fixed tests
* Fixed typo
* Fixed interpolate test
* Made torch fx compatible
* Made sure imageprocesor is correct
* Addressed comments
* Noise directly as torch
* Remove unnecesary attr
* Added return_dit
* Update src/transformers/models/hiera/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated checkpoints
* [run_slow] hiera
* Fixed device mismatch
* [run_slow] hiera
* Fixed GPU tests
* [run_slow] hiera
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-11 22:13:56 +01:00
fxmarty
ad4ef3a290
Fix fx tests with inputs_embeds ( #31862 )
...
* fix tests
* [test_all] check
* address review comments
2024-07-11 20:14:03 +08:00
Yih-Dar
9d98706b3f
Fix failed tests in #31851 ( #31879 )
...
* Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868 )"
This reverts commit b45dd5de9c
.
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-10 14:25:24 +02:00
Yih-Dar
b45dd5de9c
Revert "Fix _init_weights
for ResNetPreTrainedModel
" ( #31868 )
...
Revert "Fix `_init_weights` for `ResNetPreTrainedModel` (#31851 )"
This reverts commit 4c8149d643
.
2024-07-09 23:00:56 +02:00
Yih-Dar
4c8149d643
Fix _init_weights
for ResNetPreTrainedModel
( #31851 )
...
* init
* test
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-09 20:09:08 +02:00
fxmarty
0abf5e8eae
FX symbolic_trace: do not test decoder_inputs_embeds ( #31840 )
...
only test input_embeds, not decoder_input_embeds
2024-07-09 08:07:46 +02:00
fxmarty
ba743700f4
transformers.fx.symbolic_trace supports inputs_embeds ( #31574 )
...
* symbolic trace supports inputs_embeds
* fix test?
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-08 19:17:28 +08:00
Yih-Dar
93cd94b79d
Move some test files (tets/test_xxx_utils.py
) to tests/utils
( #31730 )
...
* move
* move
* move
* move
* Update tests/utils/test_image_processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-02 13:46:03 +02:00
Arthur
0cf60f13ab
Add gemma 2 ( #31659 )
...
* inital commit
* Add doc
* protect?
* fixup stuffs
* update tests
* fix build documentation
* mmmmmmm config attributes
* style
* nit
* uodate
* nit
* Fix docs
* protect some stuff
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-06-27 17:36:19 +02:00
amyeroberts
1de7dc7403
Skip tests properly ( #31308 )
...
* Skip tests properly
* [test_all]
* Add 'reason' as kwarg for skipTest
* [test_all] Fix up
* [test_all]
2024-06-26 21:59:08 +01:00
Pavel Iakubovskii
3c2d4d60d7
Correct @is_flaky test decoration ( #31480 )
...
* Correct @is_flaky decorator
2024-06-24 08:09:21 +01:00
amyeroberts
25245ec26d
Rename test_model_common_attributes -> test_model_get_set_embeddings ( #31321 )
...
* Rename to test_model_common_attributes
The method name is misleading - it is testing being able to get and set embeddings, not common attributes to all models
* Explicitly skip
2024-06-07 19:40:26 +01:00
Fanli Lin
04c7c176d7
[tests] make test_model_parallelism
device-agnostic ( #30844 )
...
* enable on xpu
* fix style
* add comment and mps
2024-05-24 11:51:51 +01:00
Poedator
6739e1d261
test_custom_4d_attention_mask skip with sliding window attn ( #30833 )
2024-05-23 15:22:10 +02:00
Marc Sun
5c186003b8
Fix low cpu mem usage tests ( #30808 )
...
* Fix tests
* fix udop failing test
* remove skip
* style
2024-05-22 14:09:01 +02:00
Benjamin Warner
cd6bd0af34
Add support for torch.compile dynamic shapes ( #30560 )
...
* add torch.compile dynamic support
* Add SDPA dynamic shapes compile test & improve SDPA comment
* comment consistency
2024-05-20 10:36:57 +02:00
Yih-Dar
1b3dba9417
Make Gemma
work with torch.compile
( #30775 )
...
* fix
* [run-slow] gemma
* add test
* add `test_compile_static_cache`
* fix
* style
* remove subprocess
* use attribute
* fix
* style
* update
* [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-16 13:41:33 +02:00
Joao Gante
9d889f870e
Cache: add new flag to distinguish models that Cache
but not static cache ( #30800 )
...
* jamba cache
* new flag
* generate exception
2024-05-16 12:08:35 +01:00
hyenal
1c21f48a50
add sdpa to ViT [follow up of #29325 ] ( #30555 )
...
remove blank line (+1 squashed commit)
Squashed commits:
[24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits)
Squashed commits:
[08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder
[ec96a8db3] [run-slow]vit_msn
[ead817eca] fix vit msn multi gpu
[d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[3fdbfa88f] doc
[a3ff33e4a] finish implementation
[e20b7b7fb] Update test_modeling_common.py
[e290c5810] Update test_modeling_flax_common.py
[d3af86f46] comment
[ff7dd32d8] more comments
[59b137889] suggestion
[7e2ba6d67] attn_implementation as attribute of the class
[fe66ab71f] minor
[38642b568] Apply suggestions from code review
Accept comments
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[22cde7d52] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[48e137cc6] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[99f4c679f] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[61f00ebb0] all tests are passing locally
[e9e0b82b7] vision encoder/decoder
[4d5076b56] test-vision (+20 squashed commits)
Squashed commits:
[d1add8db9] yolo
[9fde65716] fix flax
[986566c28] minor
[ca2f21d1f] vit
[3333efd7a] easy models change
[ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[48ecc7e26] all tests are passing locally
[bff7fc366] minor
[62f88306f] fix yolo and text_encoder tests
[121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[cffaa10dd] fix-copies
[ef6c511c4] test vit hybrid
[7d4ba8644] vit hybrid
[66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1fcc0a031] fixes
[cfde6eb21] fixup
[e77df1ed3] all except yolo end encoder decoder (+17 squashed commits)
Squashed commits:
[602913e22] vit + vit_mae are working
[547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/ passes
[61a97dfa9] it s the complete opposite...
[aefab37d4] fix more tests
[71802a1b9] fix all torch tests
[40b12eb58] encoder - decoder tests
[941552b69] slow decorator where appropriate
[14d055d80] has_attentions to yolo and msn
[3381fa19f] add correct name
[e261316a7] repo consistency
[31c6d0c08] fixup
[9d214276c] minor fix
[11ed2e1b7] chore
[eca6644c4] add sdpa to vit-based models
[cffbf390b] make fix-copies result
[6468319b0] fix style
[d324cd02a] add sdpa for vit
Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com>
2024-05-16 10:56:11 +01:00
Edoardo Cetin
4b3eb19fa7
Fix llama model sdpa attention forward function masking bug when output_attentions=True ( #30652 )
...
* Fix llama model forward function with attention=True, same-length encoded sequence.
* Fix style
* propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama)
* Fix style
* ignore unnecessary sdpa mask converter when output_attentions=True
* add tests checking sdpa and eager outputs match when output_attentions=True
* Split if statements in two lines
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix formatting
* Add fix to new jetmoe model
* Add missing output_attentions argument to jetmoe mask creation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-15 19:48:19 +02:00
Raushan Turganbay
bd9f4d7951
Add Video Llava ( #29733 )
...
* add model draft
* update docstring
* add tests
* support image and video as input
* update for better handling of mixed input and clean-up a bit
* bug when mixed inputs & add tests
* Update README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Merge remote-tracking branch 'upstream/main' into video_llava
* link to abstract of paper in README
* fix test
* fix-copies
* make tests happy
* skip docstest for now
* do not run doctest for now
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address review comments
* failing tests
* Fix vocab_size in common tests for VLMs
* codestyle
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* PR suggestions
* fix-copies
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add full example in docs
* clean-up with new model-id
* [run-slow] video_llava
* update docstring
* [run-slow] video_llava
* remove all achive maps
* fix some tests
* test was supposed to be skipped for llava :)
---------
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 16:42:29 +05:00