Raushan Turganbay
612065efeb
Paligemma: fix static cache test ( #33941 )
* fix
* not flaky anymore + style
2024-10-05 09:47:37 +02:00
Joao Gante
38f9f10dd9
Cache: revert DynamicCache init for BC ( #33861 )
* tmp commit
* tmp commit
* make fixup
* missing removal
* fix condition
* fix end-to-end compilation
* if -> elif
* BC
* BC
* use @deprecate_kwarg("num_hidden_layers", version="4.47.0")
* wups the import
* 🥴
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-04 22:47:08 +02:00
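The `@deprecate_kwarg("num_hidden_layers", version="4.47.0")` bullet refers to a kwarg-deprecation decorator. The real transformers helper has more options than this. Below is a minimal sketch of the idea, assuming only that the old keyword should trigger a warning and then be ignored. `deprecate_kwarg_sketch` and `ToyDynamicCache` are hypothetical names used for illustration; this is not the library's implementation.

```python
import functools
import warnings


def deprecate_kwarg_sketch(old_name: str, version: str):
    """Hypothetical stand-in for a kwarg-deprecation decorator.

    Warns when `old_name` is passed as a keyword argument, then drops it
    so the wrapped function never receives it.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                warnings.warn(
                    f"`{old_name}` is deprecated and will be removed in version {version}.",
                    FutureWarning,
                    stacklevel=2,
                )
                kwargs.pop(old_name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class ToyDynamicCache:
    # Hypothetical toy cache: layers are appended lazily, so no layer count is needed up front.
    @deprecate_kwarg_sketch("num_hidden_layers", version="4.47.0")
    def __init__(self):
        self.key_cache = []
        self.value_cache = []
```

Old call sites such as `ToyDynamicCache(num_hidden_layers=32)` keep working but emit a `FutureWarning`, which is the backward-compatibility (BC) behaviour the commit bullets describe.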
Arthur
f92d354823
fix red check-copies ( #33964 )
2024-10-04 22:45:37 +02:00
pglorio
f319ba16fa
Add Zamba ( #30950 )
* Update index.md
* Rebase
* Rebase
* Updates from make fixup
* Update zamba.md
* Batched inference
* Update
* Fix tests
* Fix tests
* Fix tests
* Fix tests
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update configuration_zamba.py
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update configuration_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba
* Update ZambaForCausalLM
* Update ZambaForCausalLM
* Describe diffs with original mamba layer
* Moved mamba init into `_init_weights`
* Update index.md
* Rebase
* Rebase
* Updates from make fixup
* Update zamba.md
* Batched inference
* Update
* Fix tests
* Fix tests
* Fix tests
* Fix tests
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update configuration_zamba.py
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update configuration_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba
* Update ZambaForCausalLM
* Moved mamba init into `_init_weights`
* Update ZambaForCausalLM
* Describe diffs with original mamba layer
* make fixup fixes
* quality test fixes
* Fix Zamba model path
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* Update
* circleci fixes
* fix zamba test from merge
* fix ValueError for disabling mamba kernels
* add HF copyright
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* shared_transf --> shared_transformer
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fixes
* Move attention head dim to config
* Fix circle/ci tests
* Update modeling_zamba.py
* apply GenerationMixin inheritance change from upstream
* apply import ordering
* update needed transformers version for zamba
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add contribution author
* add @slow to avoid CI
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Define attention_hidden_size
* Added doc for attention_head_size
* trigger CI
* Fix doc of attention_hidden_size
* [run-slow] zamba
* Fixed shared layer logic, swapped up<->gate in mlp
* shared_transformer -> shared_transf
* reformat HybridLayer __init__
* fix docstrings in zamba config
* added definition of _get_input_ids_and_config
* fixed formatting of _get_input_ids_and_config
---------
Co-authored-by: root <root@node-4.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: root <root@node-1.us-southcentral1-a.compute.internal>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
2024-10-04 22:28:05 +02:00
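The "swapped up<->gate in mlp" bullet matters because a gated MLP treats its two input projections asymmetrically. The sketch below assumes a Llama-style gated MLP, `down(act(gate(x)) * up(x))`, and uses SiLU as the activation. Whether Zamba uses exactly this activation is an assumption here. The sketch shows that loading the gate and up weights into each other's slots changes the output.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def gated_mlp(x, w_gate, w_up, w_down):
    # Llama-style gated MLP: down(act(gate(x)) * up(x)); weights stored as (in, out)
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


rng = np.random.default_rng(0)
hidden, inter = 4, 8
x = rng.standard_normal((2, hidden))
w_gate = rng.standard_normal((hidden, inter))
w_up = rng.standard_normal((hidden, inter))
w_down = rng.standard_normal((inter, hidden))

correct = gated_mlp(x, w_gate, w_up, w_down)
swapped = gated_mlp(x, w_up, w_gate, w_down)  # gate and up weights loaded into the wrong slots
```

The activation is applied only to the gate branch, so a checkpoint conversion that mixes up the two projections still runs but gives different outputs.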
Amit Garg
e3775539c8
PhiMoE ( #33363 )
* onboard phimoe model
* removed debug code
* added unit tests
* updated docs
* formatted
* fixed unit tests
* fixed test case
* fixed format
* refactored code
* fixed expected outputs in the integration tests
* Added a warning msg
* Addressed comments
* Addressed comments
* fixed test cases
* added paper link
* Addressed comments
* Refactored PhimoeForCausalLM forward fn
* Refactored PhimoeRotaryEmbedding class
* fixed test cases
* fixed testcase
* fixed test case
* Addressed comments
* fixed test cases
* fixed testcases
* Used cache position instead to get the seq len
2024-10-04 21:39:45 +02:00
Longjie Zheng
0d1692a49b
Fix attn mask ignore logic in training-time trace ( #32613 )
* fix attn mask logic for training-time trace
* add test
* fix
* fix
* fix
* fix
* fix
* format
* [run-slow] llama
* avoid accelerate
* [run-slow] llama
2024-10-04 19:00:45 +02:00
Yoach Lacombe
124713c32b
Fix distil whisper segment computation ( #33920 )
* Fix distil whisper segment computation
* [run-slow] whisper
2024-10-04 11:18:01 +02:00
Yoni Gozlan
074aa3b3fd
Uniformize kwargs for Idefics/2 processors ( #32568 )
* Add uniformize idefics processor kwargs and tests
* Uniformize idefics2 processor kwargs
* add image_processor tests idefics
* add BC args order change idefics2 processor and update doc
* Add support for multiple images per prompt in image-text-to-text mode idefics
* Fix processor input args in idefics tests
* improve test processing common, remove unnecessary tests, update process uniformization
* fix docstrings idefics
* fix tests processors idefics/2
2024-10-03 18:08:24 +02:00
Yoach Lacombe
bf0ffe3d29
[Tests] Diverse Whisper fixes ( #33665 )
* fix beam indices in token_timestamps
* fix attention_mask in FA2
* correct translation example with the right example
* correct how some tests are using outputs + correct num_frames
* fix shortform batch prev cond tests
* make fix-copies
* make fix-copies
* take care of shifting beam indices
* [run-slow] whisper
* [run-slow] whisper
2024-10-03 15:59:01 +02:00
Joao Gante
d29738f5b4
Generate tests: modality-agnostic input preparation ( #33685 )
2024-10-03 14:01:24 +01:00
Arie Pratama Sutiono
f2bf4fcf3d
Add SplinterTokenizer unit test ( #32652 )
* add unit tests for splinter_tokenizer
* add unit test for splinter tokenizer, pass in the question_token to be saved on save_pretrained called
* remove unused import
* remove vocab_splinter.txt, add Copied from, use fmt:on and fmt:off to prevent autoformatting on long lines
* remove all the spaces
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-03 14:49:56 +02:00
Yoni Gozlan
d7950bff82
uniformize processor Mllama ( #33876 )
* uniformize processor Mllama
* nit syntax
* nit
2024-10-02 16:50:15 +02:00
Yoni Gozlan
62e8c759c3
rename all test_processing_*.py to test_processor_*.py ( #33878 )
* rename all test_processing_*.py to test_processor_*.py and fix duplicate test processor paligemma
* fix copies
* fix broken tests
* fix-copies
* fix test processor bridgetower
2024-10-02 16:43:43 +02:00
Pablo Montalvo
50290cf7a0
Uniformize model processors ( #31368 )
* add initial design for uniform processors + align model
* add uniform processors for altclip + chinese_clip
* add uniform processors for blip + blip2
* fix mutable default 👀
* add configuration test
* handle structured kwargs w defaults + add test
* protect torch-specific test
* fix style
* fix
* rebase
* update processor to generic kwargs + test
* fix style
* add sensible kwargs merge
* update test
* fix assertEqual
* move kwargs merging to processing common
* rework kwargs for type hinting
* just get Unpack from extensions
* run-slow[align]
* handle kwargs passed as nested dict
* add from_pretrained test for nested kwargs handling
* [run-slow]align
* update documentation + imports
* update audio inputs
* protect audio types, silly
* try removing imports
* make things simpler
* simplerer
* move out kwargs test to common mixin
* [run-slow]align
* skip tests for old processors
* [run-slow]align, clip
* !$#@!! protect imports, darn it
* [run-slow]align, clip
* [run-slow]align, clip
* update common processor testing
* add altclip
* add chinese_clip
* add pad_size
* [run-slow]align, clip, chinese_clip, altclip
* remove duplicated tests
* fix
* add blip, blip2, bridgetower
Added tests for bridgetower which override common. Also modified common
tests to force center cropping if existing
* fix
* update doc
* improve documentation for default values
* add model_max_length testing
This parameter depends on tokenizers received.
* Raise if kwargs are specified in two places
* fix
* removed copied from
* match defaults
* force padding
* fix tokenizer test
* clean defaults
* move tests to common
* add missing import
* fix
* adapt bridgetower tests to shortest edge
* uniformize donut processor + tests
* add wav2vec2
* extend common testing to audio processors
* add testing + bert version
* propagate common kwargs to different modalities
* BC order of arguments
* check py version
* revert kwargs merging
* add draft overlap test
* update
* fix blip2 and wav2vec due to updates
* fix copies
* ensure overlapping kwargs do not disappear
* replace .pop by .get to handle duplicated kwargs
* fix copies
* fix missing import
* add clearly wav2vec2_bert to uniformized models
* fix copies
* increase number of features
* fix style
* [run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert
* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
* fix concatenation
* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
* Update tests/test_processing_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* 🧹
* address comments
* clean up + tests
* [run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-10-02 10:41:08 +02:00
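Several bullets in this PR concern kwargs merging. "propagate common kwargs to different modalities", "ensure overlapping kwargs do not disappear", "replace .pop by .get to handle duplicated kwargs" and "Raise if kwargs are specified in two places" all touch on it. Below is a hypothetical sketch of those rules, assuming per-modality default dicts and a flat-or-nested call signature. It is not the library's actual merge logic. The key point: reading a flat key without popping it lets every modality that declares the key receive the value.

```python
from typing import Any, Dict


def merge_kwargs_sketch(
    defaults: Dict[str, Dict[str, Any]],
    call_kwargs: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical merge of flat and nested (per-modality) processor kwargs."""
    # Copy the defaults so the caller's dicts are never mutated
    merged = {modality: dict(values) for modality, values in defaults.items()}
    for modality, values in merged.items():
        nested = call_kwargs.get(modality, {})
        for key in values:
            in_flat = key in call_kwargs
            in_nested = key in nested
            if in_flat and in_nested:
                raise ValueError(
                    f"Keyword argument {key!r} was passed both flat and under {modality!r}."
                )
            if in_nested:
                values[key] = nested[key]
            elif in_flat:
                # .get, not .pop: popping would hide a shared key (e.g. "padding")
                # from every modality processed after the first one
                values[key] = call_kwargs.get(key)
    return merged
```

With defaults `{"text_kwargs": {"padding": False}, "audio_kwargs": {"padding": False}}`, a flat `padding=True` reaches both modalities. Passing `padding` both flat and inside `text_kwargs` raises.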
Yoni Gozlan
61ac161a9d
Add support for custom inputs and batched inputs in ProcessorTesterMixin ( #33711 )
* add support for custom inputs and batched inputs in ProcessorTesterMixin
* Fix batch_size behavior ProcessorTesterMixin
* Change format prepare inputs batched
* Remove override test pixtral processor
* Remove unnecessary tests and cleanup after new prepare_inputs functions
* Fix instructBlipVideo image processor
2024-10-01 23:52:03 +02:00
Prakarsh Kaushik
68a2b50069
[Fix] ViViT interpolate_pos_encoding ( #33815 )
* fix:test_inference_interpolate_pos_encoding
* style:make style;make fixup
* test: add suggestion to test_modeling_vivit
* chore:add suggestions
* style:make style
* [run_slow] vivit
* ci:slow test fix
* [run_slow] vivit
2024-10-01 20:14:35 +01:00
Adibvafa Fallahpour
c269c5c74d
Fix Mamba slow path bug with dtype mismatch. ( #32691 )
* Fix Mamba slow path bug with dtype mismatch.
* Update test_modeling_mamba.py
* Improve style.
* Fix issue with cache position of dtype mismatch test.
* Change test for slow path.
* Revert changes.
* Switch to buggy code and add test to catch it.
* Fix the dtype mismatch bug and add test code to verify it.
* Fix minor bug with test.
* Fix incorrect dtype of model output.
* Fix incorrect dtype of cache.
* Fix incorrect dtype of ssm cache.
* Fix incorrect dtype of conv state.
* Remove assertion for ssm state.
* Add assertion for conv state dtype.
* Fix all issues with dtype mismatch test.
2024-10-01 09:28:40 +02:00
Joshua Lochner
18c5b216f1
Fix ViT-MAE decoder interpolate ( #33330 )
* Fix ViT-MAE decoder interpolate
* Add unit test for `interpolate_pos_encoding` w/ custom sizes
* [run_slow] vit_mae
2024-09-30 18:47:13 +02:00
Raushan Turganbay
3e039d3827
Paligemma support for multi-image ( #33447 )
* update
* Update src/transformers/models/paligemma/processing_paligemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update docs
* better example in tests
* support image tokens
* read token
* Update tests/models/paligemma/test_processing_paligemma.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* nit: naming
* Update docs/source/en/model_doc/paligemma.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* conflicts after rebasing
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-09-27 11:23:14 +02:00
Ita Zaporozhets
6730485b02
clean_up_tokenization_spaces=False if unset ( #31938 )
* clean_up_tokenization_spaces=False if unset
* deprecate warning
* updating param for old models
* update models
* make fix-copies
* fix-copies and update bert models
* warning msg
* update prophet and clvp
* updating test since space before is arbitrarily removed
* remove warning for 4.45
2024-09-26 19:38:20 +02:00
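This PR treats an unset `clean_up_tokenization_spaces` as `False`. The sketch below approximates the legacy post-decoding cleanup: it drops the space before common punctuation and English contractions. It also shows how the new default changes decoded text. `decode_sketch` is hypothetical. It simply joins tokens with spaces, which real tokenizers do not do, and the replacement list is an approximation rather than a verbatim copy of the library code.

```python
def clean_up_tokenization_sketch(text: str) -> str:
    """Approximate legacy cleanup: remove spaces before punctuation and contractions."""
    replacements = [
        (" .", "."), (" ?", "?"), (" !", "!"), (" ,", ","),
        (" ' ", "'"), (" n't", "n't"), (" 'm", "'m"), (" 's", "'s"),
        (" 've", "'ve"), (" 're", "'re"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


def decode_sketch(tokens, clean_up_tokenization_spaces=None):
    # Hypothetical decode: an unset flag now means "no cleanup"
    text = " ".join(tokens)
    if clean_up_tokenization_spaces is None:
        clean_up_tokenization_spaces = False
    return clean_up_tokenization_sketch(text) if clean_up_tokenization_spaces else text
```

This is why the tests needed updating ("space before is arbitrarily removed"). With cleanup on, spaces that were really in the text before punctuation are also removed, so decoding is not an exact inverse of encoding.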
Arthur
46841d3eb2
[MllamaProcessor] Update errors and API with multiple image ( #33715 )
* update error
* update and add a test
* update
* update
2024-09-26 16:33:25 +02:00
Franz Louis Cesista
0a21381ba3
Uniformize kwargs for chameleon processor ( #32181 )
* uniformize kwargs of Chameleon
* fix linter nit
* rm stride default
* add tests for chameleon processor
* fix tests
* add comment on get_component
* rm Chameleon's slow tokenizer
* add check order images text + nit
* update docs and tests
* Fix LlamaTokenizer tests
* fix gated repo access
* fix wrong import
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2024-09-26 10:18:07 -04:00
Andrés Marafioti
f2c388e3f9
Add Idefics 3! ( #32473 )
* Add Idefics 3!
* fixes to make both pipelines identical
* fix for quantized models
* First pass at the review
* remove vocab size from the main config (it's still in the text_config)
* hot fix for merve
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* re-add model_type for text_config
* remove support for old_cache
* remove hidden_size from main config
* rename idefics3 HF repo
* few changes suggested in the PR
* fix to input_data_format computation
* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion
* improve example
* few improvements from amy's review
* big change to enable processing input images as numpy arrays
* Changes to the code to uniformize processor kwargs
* image processing tests
* image processing tests fixes and some bugs they discovered
* addressed review comments from Yoni
* fix modeling tests
* remove special tokens that are not special
* fixes tests
* skip failing tests - they also fail for idefics2
* added paper and readded the tests with multi gpu, who knows
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* review amy until image_processing_idefics3
* last comments from Amy
* review amy
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* doc improvement - amy review
* fix runtime error during fine-tuning
* amy's review
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* ruff
* amy's comment on the order
* ruff ruff
* fix copies
* square images when they are not split
* ruff :(
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics3/test_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix small bug introduced in refactor
* amy's image processing changes
* fixes peft tests and ruff
* modify to_pil_image from transformers. and review from emanuele.
* add modified to_pil_image
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-25 21:28:49 +02:00
Manuel
a55adee890
adding positional encoder changes and tests ( #32600 )
* adding positional encoder changes and tests
* adding ruff suggestions
* changes added by python utils/check_copies.py --fix_and_overwrite
* removing pos_encoding added by script
* adding interpolation to clipseg
* formatting
* adding further testing to altclip and better documentation to kosmos2
* skipping test_inputs_embeds_matches_input_ids_with_generate in git model
* fixing clipseg comment suggestions
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing bridgetower test
* fixing altclip tensor output POS test
* adding ruff formatting
* fixing several tests
* formatting with ruff
* adding positional encoder changes and tests
* adding ruff suggestions
* changes added by python utils/check_copies.py --fix_and_overwrite
* removing pos_encoding added by script
* adding interpolation to clipseg
* formatting
* adding further testing to altclip and better documentation to kosmos2
* skipping test_inputs_embeds_matches_input_ids_with_generate in git model
* fixing clipseg comment suggestions
* fixing bridgetower test
* fixing altclip tensor output POS test
* adding ruff formatting
* fixing several tests
* formatting with ruff
* adding right pretrained model
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing test_inference_image_segmentation
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* adding ruff formatting
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* adding new interpolate_pos_encoding function
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing interpolate_POS function
* adapting output tensor in tests
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* modifying output tensor
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* adding the correct tensor
* [run_slow] clipseg
* fixing spaces
* [run_slow] clipseg
* [run_slow] clipseg
---------
Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
2024-09-25 19:05:01 +01:00
Arthur
19d58d31f1
Add MLLama ( #33703 )
* current changes
* nit
* Add cross_attention_mask to processor
* multi-image fixed
* Add cross_attention_mask to processor
* cross attn works in all cases
* WIP refactoring function for image processor
* WIP refactoring image processor functions
* Refactor preprocess to use global loops instead of list nested list comps
* Docstrings
* Add channels unification
* fix dtype issues
* Update docstrings and format
* Consistent max_image_tiles
* current script
* updates
* Add convert to rgb
* Add image processor tests
* updates!
* update
* god damn it I am dumb sometimes
* Precompute aspect ratios
* now this works, full match
* fix 😉
* nits
* style
* fix model and conversion
* nit
* nit
* kinda works
* hack for sdpa non-contiguous bias
* nits here and there
* latest changes
* merge?
* run forward
* Add aspect_ratio_mask
* vision attention mask
* update script and config variable names
* nit
* nits
* be able to load
* style
* nits
* there
* nits
* make forward run
* small update
* enable generation multi-turn
* nit
* nit
* Clean up a bit for errors and typos
* A bit more constant fixes
* 90B keys and shapes match
* Fix for 11B model
* Fixup, remove debug part
* Docs
* Make max_aspect_ratio_id to be minimal
* Update image processing code to match new implementation
* Adjust conversion for final checkpoint state
* Change dim in repeat_interleave (according to meta code)
* tmp fix for num_tiles
* Fix for conversion (gate<->up, q/k_proj rope permute)
* nits
* codestyle
* Vision encoder fixes
* pass cross attn mask further
* Refactor aspect ratio mask
* Disable text-only generation
* Fix cross attention layers order, remove q/k norm rotation for cross attention layers
* Refactor gated position embeddings
* fix bugs but needs test with new weights
* rope scaling should be llama3
* Fix rope scaling name
* Remove debug for linear layer
* fix copies
* Make mask prepare private func
* Remove linear patch embed
* Make precomputed embeddings as nn.Embedding module
* MllamaPrecomputedAspectRatioEmbedding with config init
* Remove unused self.output_dim
* nit, intermediate layers
* Rename ln and pos_embed
* vision_chunk_size -> image_size
* return_intermediate -> intermediate_layers_indices
* vision_input_dim -> hidden_size
* Fix copied from statements
* fix most tests
* Fix more copied from
* layer_id->layer_idx
* Comment
* Fix tests for processor
* Copied from for _prepare_4d_causal_attention_mask_with_cache_position
* Style fix
* Add MllamaForCausalLM
* WIP fixing tests
* Remove duplicated layers
* Remove dummy file
* Fix style
* Fix consistency
* Fix some TODOs
* fix language_model instantiation, add docstring
* Move docstring, remove todos for precomputed embeds (we cannot init them properly)
* Add initial docstrings
* Fix
* fix some tests
* lets skip these
* nits, remove print, style
* Add one more copied from
* Improve test message
* Make validate func private
* Fix dummy objects
* Refactor `data_format` a bit + add comment
* typos/nits
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix dummy objects and imports
* Add chat template config json
* remove num_kv_heads from vision attention
* fix
* move some commits and add more tests
* fix test
* Remove `update_key_name` from modeling utils
* remove num-kv-heads again
* some preliminary docs
* Update chat template + tests
* nit, conversion script max_num_tiles from params
* Fix warning for text-only generation
* Update conversion script for instruct models
* Update chat template in conversion + test
* add tests for CausalLM model
* model_max_length, avoid null chat_template
* Refactor conversion script
* Fix forward
* Fix integration tests
* Refactor vision config + docs
* Fix default
* Refactor text config
* Doc fixes
* Remove unused args, fix docs example
* Squashed commit of the following:
commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date: Wed Sep 18 13:39:15 2024 +0000
Move model + add output hidden states and output attentions
* Fix num_channels
* Add mllama text and mllama vision models
* Fixing repo consistency
* Style fix
* Fixing repo consistency
* Fixing unused config params
* Fix failed tests after refactoring
* hidden_activation -> hidden_act for text mlp
* Remove from_pretrained from sub-configs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Reuse lambda in conversion script
* Remove run.py
* Update docs/source/en/model_doc/mllama.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/processing_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused LlamaTokenizerFast
* Fix logging
* Refactor gating
* Remove cycle for collecting intermediate states
* Refactor text-only check, add integration test for text-only
* Revert from pretrained to configs
* Fix example
* Add auto `bos_token` adding in processor
* Fix tips
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Enable supports_gradient_checkpointing model flag
* add eager/sdpa options
* don't skip attn tests and bring back GC skips (did i really remove those?)
* Fix signature, but get error with None gradient
* Fix output attention tests
* Disable GC back
* Change no split modules
* Fix dropout
* Style
* Add Mllama to sdpa list
* Add post init for vision model
* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model
* if skipped, say it, don't pass
* Clean vision tester config
* Doc for args
* Update tests/models/mllama/test_modeling_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add cross_attention_mask to test
* typehint
* Remove todo
* Enable gradient checkpointing
* Docstring
* Style
* Fixing and skipping some tests for new cache
* Mark flaky test
* Skip `test_sdpa_can_compile_dynamic` test
* Fixing some offload tests
* Add direct GenerationMixin inheritance
* Remove unused code
* Add initializer_range to vision config
* update the test to make sure we show if split
* fix gc?
* Fix repo consistency
* Undo modeling utils debug changes
* Fix link
* mllama -> Mllama
* [mllama] -> [Mllama]
* Enable compile test for CausalLM model (text-only)
* Fix TextModel prefix
* Update doc
* Docs for forward, type hints, and vision model prefix
* make sure to reset
* fix init
* small script refactor and styling
* nit
* updates!
* some nits
* Interpolate embeddings for 560 size and update integration tests
* nit
* does not support static cache!
* update
* fix
* nit2
* this?
* Fix conversion
* Style
* 4x memory improvement with image cache AFAIK
* Token decorator for tests
* Skip failing tests
* update processor errors
* fix split issues
* style
* weird
* style
* fix failing tests
* update
* nit fixing the whisper tests
* fix path
* update
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
Yoni Gozlan
94f18cf23c
Add OmDet-Turbo ( #31843 )
* Add template with add-new-model-like
* Add rough OmDetTurboEncoder and OmDetTurboDecoder
* Add working OmDetTurbo convert to hf
* Change OmDetTurbo encoder to RT-DETR encoder
* Add swin timm backbone as default, add always partition fix for swin timm
* Add labels and tasks caching
* Fix make fix-copies
* Format omdet_turbo
* fix Tokenizer tests
* Fix style and quality
* Reformat omdet_turbo
* Fix quality, style, copies
* Standardize processor kwargs
* Fix style
* Add output_hidden_states and output_attentions
* Add personalize multi-head attention, improve docstrings
* Add integrated test and fix copy, style, quality
* Fix unprotected import
* Cleanup comments and fix unprotected imports
* Add fix different prompts in batch (key_padding_mask)
* Add key_padding_mask to custom multi-head attention module
* Replace attention_mask by key_padding_mask
* Remove OmDetTurboModel and refactor
* Refactor processing of classes and abstract use of timm backbone
* Add testing, fix output attentions and hidden states, add cache for anchors generation
* Fix copies, style, quality
* Add documentation, convert key_padding_mask to attention_mask
* revert changes to backbone_utils
* Fix docstrings rst
* Fix unused argument in config
* Fix image link documentation
* Reorder config and cleanup
* Add tokenizer_init_kwargs in merge_kwargs of the processor
* Change AutoTokenizer to CLIPTokenizer in convert
* Fix init_weights
* Add ProcessorMixin tests, Fix convert while waiting on uniform kwargs
* change processor kwargs and make task input optional
* Fix omdet docs
* Remove unnecessary tests for processor kwargs
* Replace nested BatchEncoding output of the processor by a flattened BatchFeature
* Make modifications from Pavel review
* Add changes Amy review
* Remove unused param
* Remove normalize_before param, Modify processor call docstring
* Remove redundant decoder class, add gradient checkpointing for decoder
* Remove commented out code
* Fix inference in fp16 and add fp16 integrated test
* update omdet md doc
* Add OmdetTurboModel
* fix caching and nit
* add OmDetTurboModel to tests
* nit change repeated key test
* Improve inference speed in eager mode
* fix copies
* Fix nit
* remove OmdetTurboModel
* [run-slow] omdet_turbo
* [run-slow] omdet_turbo
* skip dataparallel test
* [run-slow] omdet_turbo
* update weights to new path
* remove unnecessary config in class
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-91-248.ec2.internal>
2024-09-25 13:26:28 -04:00
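The OmDet-Turbo bullets switch between `key_padding_mask` and `attention_mask`, and the two use opposite conventions. In `torch.nn.MultiheadAttention`, `key_padding_mask` is `True` for positions to ignore. A transformers-style `attention_mask` is `1` for tokens to attend to. Below is a minimal NumPy sketch of converting the former into an additive bias for attention scores. It assumes a `(batch, seq)` padding mask and a `(batch, heads, query, key)` score layout, and it is not the OmDet-Turbo code itself.

```python
import numpy as np


def key_padding_to_additive_mask(key_padding_mask: np.ndarray, dtype=np.float32) -> np.ndarray:
    """Convert a (batch, seq) key_padding_mask (True = padding, ignore) into an
    additive (batch, 1, 1, seq) bias to add to attention scores."""
    attention_mask = ~key_padding_mask  # flip convention: True = attend, False = padding
    bias = np.where(attention_mask, 0.0, np.finfo(dtype).min).astype(dtype)
    return bias[:, None, None, :]  # broadcast over heads and query positions


def masked_softmax(scores: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the key axis after adding the bias
    x = scores + bias
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


kpm = np.array([[False, False, True]])  # last key is padding
probs = masked_softmax(np.zeros((1, 1, 1, 3), dtype=np.float32), key_padding_to_additive_mask(kpm))
```

The padded key receives zero attention weight, and the two real keys share the remaining probability equally.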
NielsRogge
06e27e3dc0
[Pixtral] Improve docs, rename model ( #33491 )
* Improve docs, rename model
* Fix style
* Update repo id
2024-09-25 13:53:12 +02:00
Dmitry Rogozhkin
5e2916bc14
tests: fix pytorch tensor placement errors ( #33485 )
This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"
According to the PyTorch documentation, torch.Tensor.numpy(force=False)
performs the conversion only if the tensor is on the CPU (plus a few other restrictions),
which is not the case here. We need force=True because we only
need the data and don't care about tensor coherency.
Fixes : #33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-09-25 12:21:53 +01:00
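A small illustration of the `force=True` behaviour described above, assuming PyTorch 1.13 or newer, where `Tensor.numpy` accepts `force`. The device case from the commit needs an accelerator, so this sketch shows one of the "other restrictions" instead: a tensor that is part of the autograd graph. `tensor_to_numpy` is a hypothetical helper name.

```python
import torch


def tensor_to_numpy(t: torch.Tensor):
    # force=True copies as needed: detaches from autograd, moves to CPU,
    # and resolves conjugate/negative bits before converting
    return t.numpy(force=True)


x = torch.ones(3, requires_grad=True)
y = x * 2  # part of the autograd graph, so plain .numpy() is rejected

try:
    y.numpy()
    plain_conversion_ok = True
except RuntimeError:
    plain_conversion_ok = False

arr = tensor_to_numpy(y)
print(plain_conversion_ok, arr)
```

The same `force=True` call also covers the cross-device case in the fixed tests, since it moves the data to the CPU before converting.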
Yoni Gozlan
5f0c181f4e
Uniformize kwargs for image-text-to-text processors ( #32544 )
* uniformize FUYU processor kwargs
* Uniformize instructblip processor kwargs
* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2
* Uniformize llava_next processor
* Fix save_load test for processor with chat_template only as extra init args
* Fix import Unpack
* Fix Fuyu Processor import
* Fix FuyuProcessor import
* Fix FuyuProcessor
* Add defaults for specific kwargs kosmos2
* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs
* Add tests processor Udop
* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature
* Fix overwrite tests kwargs processors
* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop
* Fix processing test fuyu
* remove unnecessary pad_token check in instructblip ProcessorTest
* Fix BC tests and cleanup
* Fix imports fuyu
* Uniformize Pix2Struct
* Fix wrong name for FuyuProcessorKwargs
* Fix slow tests reversed inputs align fuyu llava-next, change udop warning
* Fix wrong logging import udop
* Add check images text input order
* Fix copies
* change text pair handling when positional arg
* rebase on main, fix imports in test_processing_common
* remove optional args and udop uniformization from this PR
* fix failing tests
* remove unnecessary test, fix processing utils and test processing common
* cleanup Unpack
* cleanup
* fix conflict grounding dino
2024-09-24 21:28:19 -04:00
Joao Gante
a7734238ff
Generation tests: update imagegpt input name, remove unused functions ( #33663 )
2024-09-24 16:40:48 +01:00
Joao Gante
e15687fffe
Generation: deprecate PreTrainedModel inheriting from GenerationMixin ( #33203 )
2024-09-23 18:28:36 +01:00
Yoni Gozlan
1456120929
Uniformize kwargs for Udop processor and update docs ( #33628 )
...
* Add optional kwargs and uniformize udop
* cleanup Unpack
* nit Udop
2024-09-23 12:47:32 -04:00
Avishai Elmakies
78b2929c05
Sdpa dino v2 ( #33403 )
...
* add sdpa to dinov2
* fixup
* add dinov2 to sdpa doc
* update doc order
* [run-slow] dinov2
* common to eager
* [run-slow] dinov2
* update attn implementation in common
* update test_modeling_dinov2 to have mask_ratio, num_masks and mask_length similar to vit
* [run-slow] dinov2
---------
Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
2024-09-21 01:58:00 +01:00
Mayank Mishra
e472e077c2
Granitemoe ( #33207 )
...
* first commit
* drop tokenizer
* drop tokenizer
* drop tokenizer
* drop convert
* granite
* drop tokenization test
* mup
* fix
* reformat
* reformat
* reformat
* fix docs
* stop checking for checkpoint
* update support
* attention multiplier
* update model
* tiny drop
* saibo drop
* skip test
* fix test
* fix test
* drop
* drop useless imports
* update docs
* drop flash function
* copied from
* drop pretraining tp
* drop pretraining tp
* drop pretraining tp
* drop unused import
* drop code path
* change name
* softmax scale
* head dim
* drop legacy cache
* rename params
* cleanup
* fix copies
* comments
* add back legacy cache
* multipliers
* multipliers
* multipliers
* text fix
* fix copies
* merge
* multipliers
* attention multiplier
* drop unused imports
* add granitemoe
* add decoration
* remove moe from sequenceclassification
* fix test
* fix
* fix
* fix
* move rope?
* merge
* drop bias
* drop bias
* Update src/transformers/models/granite/configuration_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* Update src/transformers/models/granite/modeling_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix
* fix
* fix
* drop
* drop
* fix
* fix
* cleanup
* cleanup
* fix
* fix granite tests
* fp32 test
* fix
* drop jitter
* fix
* rename
* rename
* fix config
* add gen test
---------
Co-authored-by: Yikang Shen <yikang.shn@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-21 01:43:50 +02:00
Yoni Gozlan
c0c6815dc9
Add support for args to ProcessorMixin for backward compatibility ( #33479 )
...
* add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin
* change size and crop_size in processor kwargs tests to do_rescale and rescale_factor
* remove unnecessary llava processor kwargs test overwrite
* nit
* change data_arg_name to input_name
* Remove unnecessary test override
* Remove unnecessary tests Paligemma
* Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring
2024-09-20 11:40:59 -04:00
Joao Gante
2fdb5e74cc
VLM generate: tests can't generate image/video tokens ( #33623 )
2024-09-20 15:43:27 +01:00
amyeroberts
f9b4409726
Remove unnecessary CPM model tests ( #33621 )
...
Remove model tests
2024-09-20 14:20:57 +01:00
Lake Lee
ec1424c6a3
Update modeling_mamba2.py, fix pad size ( #32599 )
...
* Update modeling_mamba2.py
Fix pad_size calculation to ensure it's less than self.chunk_size
* [run_slow] mamba2
* [run-slow] mamba2
* [run-slow] Add @require_read_token decorator to failing tests for token propagation
* [run_slow] mamba2
2024-09-20 11:40:57 +01:00
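The pad_size fix above concerns a common off-by-one in chunked scans: padding a sequence up to a multiple of chunk_size. The sketch below uses hypothetical helper names and is not the code in modeling_mamba2.py; it shows one way to compute a padding amount that is always strictly less than chunk_size, next to a naive version that wastes a whole chunk when the length is already aligned.

```python
def chunk_pad_size(seq_len: int, chunk_size: int) -> int:
    """Padding needed so that seq_len becomes a multiple of chunk_size.

    The outer modulo maps the "already aligned" case (remainder 0) to 0
    instead of a full extra chunk, so the result is always < chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return (chunk_size - seq_len % chunk_size) % chunk_size


def naive_pad_size(seq_len: int, chunk_size: int) -> int:
    """Naive variant without the outer modulo (for comparison only).

    Returns chunk_size, i.e. one full chunk of useless padding,
    when seq_len is already a multiple of chunk_size.
    """
    return chunk_size - seq_len % chunk_size
```

For example, with chunk_size=256 a 300-token input needs 212 padding positions, while a 256-token input needs none; the naive variant would report 256 for the latter.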
Fanli Lin
8bd1f2f338
[tests] make more tests device-agnostic ( #33580 )
...
* enable
* fix
* add xpu skip
* add marker
* skip for xpu
* add more
* enable on accelerator
* add more cases
* add more tests
* add more
2024-09-20 10:16:43 +01:00
Fanli Lin
4d8908df27
[tests] enable GemmaIntegrationTest on XPU ( #33555 )
...
enable GemmaIntegrationTest
2024-09-19 19:39:19 +01:00
Fanli Lin
b87755aa6d
[tests] skip tests for xpu ( #33553 )
...
* enable
* fix
* add xpu skip
* add marker
* skip for xpu
* add more
* add one more
2024-09-19 19:28:04 +01:00
Yoni Gozlan
f111d5b783
Uniformize kwargs for Paligemma processor and update docs ( #33571 )
...
* Uniformize paligemma processor
* nit
2024-09-19 14:14:06 -04:00
Joao Gante
52920b5dd5
Cache: don't throw warnings on gemma2 when instantiating a new cache ( #33595 )
2024-09-19 17:42:47 +01:00
Anton Vlasjuk
b50ff5993a
[Mamba2] Move dt calculations to kernel ( #33520 )
...
* use kernel for dt calculations
* add small test
* [run-slow] mamba2
2024-09-19 17:41:17 +01:00
Pablo Montalvo
413008c580
add uniform processors for altclip + chinese_clip ( #31198 )
...
* add initial design for uniform processors + align model
* add uniform processors for altclip + chinese_clip
* fix mutable default 👀
* add configuration test
* handle structured kwargs w defaults + add test
* protect torch-specific test
* fix style
* fix
* rebase
* update processor to generic kwargs + test
* fix style
* add sensible kwargs merge
* update test
* fix assertEqual
* move kwargs merging to processing common
* rework kwargs for type hinting
* just get Unpack from extensions
* run-slow[align]
* handle kwargs passed as nested dict
* add from_pretrained test for nested kwargs handling
* [run-slow]align
* update documentation + imports
* update audio inputs
* protect audio types, silly
* try removing imports
* make things simpler
* simplerer
* move out kwargs test to common mixin
* [run-slow]align
* skip tests for old processors
* [run-slow]align, clip
* !$#@!! protect imports, darn it
* [run-slow]align, clip
* [run-slow]align, clip
* update common processor testing
* add altclip
* add chinese_clip
* add pad_size
* [run-slow]align, clip, chinese_clip, altclip
* remove duplicated tests
* fix
* update doc
* improve documentation for default values
* add model_max_length testing
This parameter depends on the tokenizer received.
* Raise if kwargs are specified in two places
* fix
* match defaults
* force padding
* fix tokenizer test
* clean defaults
* move tests to common
* remove try/catch block
* deprecate kwarg
* format
* add copyright + remove unused method
* [run-slow]altclip, chinese_clip
* clean imports
* fix version
* clean up deprecation
* fix style
* add corner case test on kwarg overlap
* resume processing - add Unpack as importable
* add tmpdirname
* fix altclip
* fix up
* add back crop_size to specific tests
* generalize tests to possible video_processor
* add back crop_size arg
* fixup overlapping kwargs test for qformer_tokenizer
* remove copied from
* fixup chinese_clip tests values
* fixup tests - qformer tokenizers
* [run-slow] altclip, chinese_clip
* remove prepare_image_inputs
2024-09-19 17:21:54 +02:00
Pablo Montalvo
4f0246e535
fix tests with main revision and read token ( #33560 )
...
* fix tests with main revision and read token
* [run-slow]mamba2
* test previously skipped tests
* [run-slow]mamba2
* skip some tests
* [run-slow]mamba2
* finalize tests
* [run-slow]mamba2
2024-09-19 17:10:22 +02:00
Joao Gante
f3b3810fe6
rag: fix CI ( #33578 )
2024-09-19 11:55:26 +01:00
Raushan Turganbay
d7975a5874
VLMs: enable generation tests ( #33533 )
...
* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of vision tower
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-09-19 12:04:24 +02:00
Raushan Turganbay
e40bb4845e
Load and save video-processor from separate folder ( #33562 )
...
* load and save from video-processor folder
* Update src/transformers/models/llava_onevision/processing_llava_onevision.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-19 09:56:52 +02:00
Yoach Lacombe
5af7d41e49
Codec integration ( #33565 )
...
* clean mimi commit
* some nits suggestions from Arthur
* make fixup
* rename repo id + change readme
* Update docs/source/en/model_doc/mimi.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add flaky flag to batching equivalence due to audio_codes failing sometimes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-18 19:23:44 +02:00
Raushan Turganbay
db72894b48
Chat template: save and load correctly for processors ( #33462 )
...
* fix
* add tests
* fix tests
* Update tests/models/llava/test_processor_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* fix tests
* update tests
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-18 13:00:44 +02:00
Wang, Yi
454a0f2efd
fix patch_attention_mask incorrect setting which leads to the differe… ( #33499 )
...
* fix patch_attention_mask incorrect setting which leads to the difference in the generated text if batch > 1
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix format
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* [run_slow] idefics2
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2024-09-17 22:24:42 +01:00
Yoni Gozlan
d8500cd229
Uniformize kwargs for Pixtral processor ( #33521 )
...
* add uniformized pixtral and kwargs
* update doc
* fix _validate_images_text_input_order
* nit
2024-09-17 14:44:27 -04:00
Nikita Krasnytskyi
c29a8694b0
Fix missing sequences_scores in the Whisper beam search output ( #32970 )
...
* added sequences_scores to the output
* added beam_indices to output
* added test to check for beam_indices, sequences_scores and their shape
* removed redundant whitespaces
* make fixup
2024-09-17 19:36:11 +01:00
ErezSC42
46c27577b3
fix to jamba config, asserting attention and expert offset ( #33316 )
...
* fix to jamba config, asserting attention and expert offset
* fix formatting
* fix formatting
* fix formatting
* changed to error raise instead of assertion, added unittests
* fix
* changed t_ to property_
* changed t_ to property_
* quickfix
* ran code styler
2024-09-17 19:29:27 +01:00
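The Jamba config fix replaces assertions on the attention and expert layer offsets with a raised error. A minimal sketch of that kind of check, assuming (as in JambaConfig's layer layout) that layer i gets the special block when i % period == offset; with offset >= period no layer would ever match. The function names here are hypothetical illustrations, not the library's API.

```python
from typing import List


def check_layer_offset(property_: str, period: int, offset: int) -> None:
    """Raise (rather than assert) when an offset falls outside its period."""
    if period <= 0:
        raise ValueError(f"{property_} layer period must be positive, got {period}")
    if not 0 <= offset < period:
        raise ValueError(
            f"{property_} layer offset ({offset}) must be in the range [0, {period})"
        )


def layer_pattern(num_layers: int, period: int, offset: int) -> List[int]:
    """Indices of layers that receive the special block (attention or experts)."""
    return [i for i in range(num_layers) if i % period == offset]
```

Raising a ValueError instead of using assert keeps the check active even when Python runs with -O, which strips assert statements.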
Wang, Yi
74026b473e
idefics2 enable_input_require_grads not aligned with disable_input_re… ( #33194 )
...
* idefics2 enable_input_require_grads not aligned with disable_input_require_grads
which made disabling fail for peft+idefics2 checkpoints
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* split test case
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix ci failure
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refine test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2024-09-17 10:39:34 +01:00
Insu Jang
bcf8946f0a
Fix number of patch check for different vision feature select strategy ( #32494 )
...
* Fix number of patch check for different vision feature select strategy
* add test
---------
Co-authored-by: raushan <raushan@huggingface.co>
2024-09-17 09:33:07 +02:00
Yoach Lacombe
18e1a9c719
Fix parametrization-based weight norm ( #33275 )
...
* refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading
* make style
* fix sew
* fix sew and sew_d tests
2024-09-17 08:05:21 +02:00
Yoach Lacombe
98adf24883
[Whisper test] Fix some failing tests ( #33450 )
...
* Fix failing tensor placement in Whisper
* fix long form generation tests
* more return_timestamps=True
* make fixup
* [run_slow] whisper
* [run_slow] whisper
2024-09-16 19:05:17 +02:00
Yoni Gozlan
2f62146f0e
Uniformize kwargs for LLaVa processor and update docs ( #32858 )
...
* Uniformize kwargs for LlaVa and update docs
* Change order of processor inputs in docstring
* Improve BC support for reversed images and text inputs
* cleanup llava processor call docstring
* Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning
* Put function check reversed images text outside base processor class
* Refactor _validate_images_text_input_order
* Add ProcessingUtilTester
* fix processing and test_processing
2024-09-16 11:26:26 -04:00
Arthur
8bd2b1e8c2
Add support for Pixtral ( #33449 )
...
* initial commit
* gloups
* updates
* work
* weights match
* nits
* nits
* updates to support the tokenizer :)
* updates
* Pixtral processor (#33454 )
* rough outline
* Add in image break and end tokens
* Fix
* Undo some formatting changes
* Set patch_size default
* Fix
* Fix token expansion
* nit in conversion script
* Fix image token list creation
* done
* add expected results
* Process list of list of images (#33465 )
* updates
* working image and processor
* this is the expected format
* some fixes
* push current updated
* working mult images!
* add a small integration test
* Update configuration docstring
* Formatting
* Config docstring fix
* simplify model test
* fixup modeling and tests
* Return BatchMixFeature in image processor
* fix some copies
* update
* nits
* Update model docstring
* Apply suggestions from code review
* Fix up
* updates
* revert modeling changes
* update
* update
* fix load safe
* add license
* update
* use pixel_values as required by the model
* skip some tests and refactor
* Add pixtral image processing tests (#33476 )
* Image processing tests
* Add processing tests
* woops
* defaults reflect pixtral image processor
* fixup post merge
* images -> pixel values
* oups sorry Mr docbuilder
* isort
* fix
* fix processor tests
* small fixes
* nit
* update
* last nits
* oups this was really breaking!
* nits
* is_composition needs to be true
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-14 12:28:39 +02:00
Amit Garg
dfd31158ee
[Phi-3] Bug on stale kv cache ( #33129 )
...
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* Addressed comments
* added a unit test
* fixed cache position
* Added a warning msg to the forward fn
* fixed test case
2024-09-13 14:07:19 +02:00
Raushan Turganbay
4b0418df11
Enable padding_side as call time kwargs ( #33385 )
...
* fix
* add padding-side kwarg
* add padding side in all models & fix tests
* fix copies
* fix tests
2024-09-13 11:58:38 +01:00
Raushan Turganbay
9c4639b622
Return image hidden states ( #33426 )
...
* fix
* return image hidden states
* fix copies
* fix test
2024-09-13 10:20:03 +02:00
benniekiss
5c6257d1fc
[whisper] Clarify error message when setting max_new_tokens ( #33324 )
...
* clarify error message when setting max_new_tokens
* sync error message in test_generate_with_prompt_ids_max_length
* there is no self
2024-09-12 18:48:36 +02:00
Raushan Turganbay
2f611d30d9
Qwen2-VL: clean-up and add more tests ( #33354 )
...
* clean-up on qwen2-vl and add generation tests
* add video tests
* Update tests/models/qwen2_vl/test_processing_qwen2_vl.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix and add better tests
* Update src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update docs and address comments
* Update docs/source/en/model_doc/qwen2_vl.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_vl.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update
* remove size at all
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-12 18:24:04 +02:00
Hannan Komari
8ed635258c
Fix flax whisper tokenizer bug ( #33151 )
...
* Update tokenization_whisper.py
Fix issue with flax whisper model
* Update tokenization_whisper_fast.py
Fix issue with flax whisper model
* Update tokenization_whisper.py
just check len of token_ids
* Update tokenization_whisper_fast.py
just use len of token_ids
* Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update test_tokenization_whisper.py to add test for _convert_to_list method
* Update test_tokenization_whisper.py to fix code style issues
* Fix code style
* Fix code check again
* Update test_tokenization_whisper.py to improve code style
* Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available
* Update tests/models/whisper/test_tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method
* Revert the changes automatically applied by formatter and was unrelated to PR
* Format for minimal changes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-12 12:21:59 +01:00
Ita Zaporozhets
781bbc4d98
use diff internal model in tests ( #33387 )
...
* use diff internal model in tests
* use diff internal model in tests
2024-09-11 11:27:00 +02:00
Maciej Adamiak
8e8e7d8558
fixed Mask2Former image processor segmentation maps handling ( #33364 )
...
* fixed mask2former image processor segmentation maps handling
* introduced review suggestions
* introduced review suggestions
2024-09-10 11:19:56 +01:00
Raushan Turganbay
7d2d6ce9cb
VLM: fixes after refactor ( #32907 )
...
* leave only half of the changes
* fix tests
* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava
* fix tests, first try
* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava
* fix, second try
* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava
* fix
* [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava
2024-09-10 12:02:37 +02:00
Lysandre Debut
f24f084329
Import structure & first three model refactors ( #31329 )
...
* Import structure & first three model refactors
* Register -> Export. Export all in __all__. Sensible defaults according to filename.
* Apply most comments from Amy and some comments from Lucain
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* Style
* Add comment
* Clearer .py management
* Raise if not in backend mapping
* More specific type
* More efficient listdir
* Misc fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
2024-09-10 11:10:53 +02:00
amyeroberts
f745e7d3f9
Remove repeated prepare_images in processor tests ( #33163 )
...
* Remove repeated prepare_images
* Address comments - update docstring; explanatory comment
2024-09-09 13:20:27 +01:00
Raushan Turganbay
65bb284448
Compile compatibility for decoder-only models ( #32617 )
...
* squash into one commit
* add qwen2-vl for rope standardization
* fix mistral compile
* fix qwen2-vl
* fix-copies
2024-09-09 10:59:04 +02:00
Ita Zaporozhets
e48e5f1f13
Support reading tiktoken tokenizer.model file ( #31656 )
...
* use existing TikTokenConverter to read tiktoken tokenizer.model file
* del test file
* create tiktoken integration file
* adding tiktoken llama test
* ALTERNATIVE IMPLEMENTATION: supports llama 405B
* fix one char
* remove redundant line
* small fix
* rm unused import
* flag for converting from tiktoken
* remove unneeded file
* ruff
* remove llamatiktokenconverter, stick to general converter
* tiktoken support v2
* update test
* remove stale changes
* update doc
* protect import
* use is_protobuf_available
* add templateprocessor in tiktokenconverter
* reverting templateprocessor from tiktoken support
* update test
* add require_tiktoken
* dev-ci
* trigger build
* trigger build again
* dev-ci
* [build-ci-image] tiktoken
* dev-ci
* dev-ci
* dev-ci
* dev-ci
* change tiktoken file name
* feedback review
* feedback rev
* applying feedback, removing tiktoken converters
* conform test
* adding docs for review
* add doc file for review
* add doc file for review
* add doc file for review
* support loading model without config.json file
* Revert "support loading model without config.json file"
This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb.
* remove dev var
* updating docs
* safely import protobuf
* fix protobuf import error
* fix protobuf import error
* trying isort to fix ruff error
* fix ruff error
* try to fix ruff again
* try to fix ruff again
* try to fix ruff again
* doc table of contents
* add fix for consistency.dockerfile torchaudio
* ruff
* applying feedback
* minor typo
* merging with push-ci-image
* clean up imports
* revert dockerfile consistency
2024-09-06 14:24:02 +02:00
Shiyu
342e800086
support 3D attention mask in bert ( #32105 )
...
* support 3D/4D attention mask in bert
* test cases
* update doc
* fix doc
2024-09-06 14:20:48 +02:00
GeLee
2b18354106
add self.head_dim for VisionAttention in Qwen2-VL ( #33211 )
...
* add self.head_dim for VisionAttention in Qwen2-VL
* add self.head_dim for VisionAttention in Qwen2-VL
* fix ci
* black the test_modeling_qwen2_vl.py
* use ruff to format test_modeling_qwen2_vl.py
* [run-slow] qwen2_vl
* use typing for python3.8
* fix the import format
* use ruff to fix the ci error I001
* [run-slow] qwen2_vl
* remove unused import
* commit for rebase
* use ruff fix ci
* [run-slow] qwen2_vl
---------
Co-authored-by: root <liji>
2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi
3314fe1760
Add validation for maximum sequence length in modeling_whisper.py ( #33196 )
...
* Add validation for maximum sequence length in modeling_whisper.py
Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message.
This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness.
* Change exception message in src/transformers/models/whisper/modeling_whisper.py
The exception message is for whisper's label's sequence max length.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py
It's for whisper's config.max_target_positions.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change method's documentation in src/transformers/models/whisper/modeling_whisper.py
* Add test for maximum label's sequence length in test_modeling_whisper.py
* Add self to modeling_whisper.py
* Update test_modeling_whisper.py with respect to automatic validations
* Update modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Separate test_labels_sequence_max_length tests in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Remove assert from test_modeling_whisper.py
* Add max_target_positions to WhisperModelTester in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py
* Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py
* Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py
* Add new tests in test_modeling_whisper.py
* Update test_modeling_whisper.py
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-09-06 14:09:49 +02:00
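The Whisper change above validates that label sequences do not exceed config.max_target_positions (448 by default, per the commit message). A minimal sketch of that validation on a padded batch of label IDs, using plain Python lists instead of tensors; the function name and message wording are illustrative assumptions, not the code in modeling_whisper.py.

```python
from typing import Sequence


def validate_label_length(
    labels: Sequence[Sequence[int]], max_target_positions: int
) -> None:
    """Reject label batches longer than the decoder's position-embedding table.

    Sketch only: the real model checks the second dimension of a labels
    tensor; here the longest row of a list-of-lists batch plays that role.
    """
    seq_len = max((len(row) for row in labels), default=0)
    if seq_len > max_target_positions:
        raise ValueError(
            f"Labels' sequence length {seq_len} cannot exceed the maximum "
            f"allowed length of {max_target_positions} tokens."
        )
```

Failing early with a descriptive ValueError is friendlier than letting an out-of-range position index surface later as an opaque indexing error inside the decoder.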
Ita Zaporozhets
363301f221
support loading model without config.json file ( #32356 )
...
* support loading model without config.json file
* fix condition
* update tests
* add test
* ruff
* ruff
* ruff
2024-09-06 13:49:47 +02:00
Xuehai Pan
e1c2b69c34
Load dynamic module (remote code) only once if code isn't changed ( #33162 )
...
* Load remote code only once
* Use hash as load indicator
* Add a new option `force_reload` for old behavior (i.e. always reload)
* Add test for dynamic module is cached
* Add more type annotations to improve code readability
* Address comments from code review
2024-09-06 12:49:35 +01:00
Sanchit Gandhi
51d15eb1c1
[whisper] alternative fix for long-form timestamps ( #32131 )
...
* [whisper] alternative fix for long-form timestamps
* update test
2024-09-06 12:57:08 +02:00
Raushan Turganbay
1759bb9126
Fix: StaticCache & inputs_embeds ( #32932 )
...
squash commit
2024-09-06 12:56:59 +05:00
Shijie
21fac7abba
simple align qwen2vl kv_seq_len calculation with qwen2 ( #33161 )
...
* qwen2vl_align_kv_seqlen_to_qwen2
* flash att test
* [run-slow] qwen2_vl
* [run-slow] qwen2_vl fix OOM
* [run-slow] qwen2_vl
* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* code quality
---------
Co-authored-by: baishuai.bs <1051314669@qq.com>
Co-authored-by: ShuaiBai623 <baishuai623@icloud.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2024-09-05 21:19:30 +05:00
Joshua Lochner
c6d2848a23
🚨 Fix torch.jit.trace for interpolate_pos_encoding in all vision models ( #33226 )
...
* Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models
* Apply formatting
* Add missing `self.config = config`
* Fix copies
* Fix hiera interpolation unit test
* Formatting
* Update `_import_structure`
* make style
* Fix docstring
* Use `# Copied from` instead of utils
* DeiT variable renaming (`class_and_dist_pos_embed`)
* Fix Hiera `interpolate_pos_encoding`
2024-09-05 16:17:34 +02:00
Younes Belkada
47b096412d
Fix: Fix FalconMamba training issues due to incompatible kernels ( #33195 )
...
* fix FM training kernels
* fix copies
* fix copies
* propagate to slow path
* make it BC
* add comment
* fix test
2024-09-05 11:55:08 +02:00
Raushan Turganbay
43df47d8e7
Llava Onevision: add model ( #32673 )
...
* working version
* fix copies
* update
* tests
* update docs
* codestyle
* add more tests
* add returns for docs
* clean up
* Update src/transformers/models/llava_onevision/processing_llava_onevision.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates
* codestyle
* style
* shouldn't be reversed
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* add pooling in videos
* [run-slow] llava_onevision
* num-logits-to-keep
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* video matched orig impl
* fix tests
* chat template was modified
* Update docs/source/en/model_doc/llava_onevision.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more info in the doc page
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-05 14:43:20 +05:00
amyeroberts
d2dcff96f8
[InstructBLIP] qformer_tokenizer is required input ( #33222 )
...
* [InstructBLIP] qformer_tokenizer is required input
* Bit safer
* Add to instructblipvideo processor
* Fix up
* Use video inputs
* Update tests/models/instructblipvideo/test_processor_instructblipvideo.py
2024-09-04 16:18:06 +01:00
laurentd-lunit
d703477265
[fix] LlavaNextProcessor '_get_unpadded_features' method ( #33263 )
...
* [fix] LlavaNextProcessor '_get_unpadded_features' method
* [tests] add test_image_token_filling
* [chore] style + comment
* [minor] improve readability
* [chore] run make fix-copies
2024-09-04 17:41:51 +05:00
Niklas Muennighoff
ecd61c6286
Add OLMoE ( #32406 )
...
* Add OLMoE
* Add OLMoE
* Updates
* Make norm optional; add keys
* Add output
* Add
* Fix dtype
* Fix eos config
* Update
* Add OLMoE
* Fix OLMoE path
* Format
* Format
* Rmv copy statement
* Rmv copy statement
* Format
* Add copies
* Cp rotary
* Fix naming
* Fix naming
* Update RoPE integration; num_logits_to_keep; Add copy statements
* Add eps to config
* Format
* Add aux loss
* Adapt router_aux_loss_coef
* Update md
* Adapt
* adapt tests
2024-09-03 18:43:12 +02:00
Arthur
b017a9eb11
Refactor CI: more explicit ( #30674 )
...
* don't run custom when not needed?
* update test fetcher filtering
* fixup and updates
* update
* update
* reduce burden
* nit
* nit
* missing comma
* this?
* this?
* more parallelism
* more
* nit for real parallelism on tf and torch examples
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update
* update
* update
* update
* update
* update
* use correct path
* fix path to test files and examples
* filter-tests
* filter?
* filter?
* filter?
* nits
* fix naming of the artifacts to be pushed
* list vs files
* list vs files
* fixup
* fix list of all tests
* fix the install steps
* fix the install steps
* fix the config
* fix the config
* only split if needed
* only split if needed
* extend should fix it
* extend should fix it
* arg
* arg
* update
* update
* run tests
* run tests
* run tests
* more nits
* update
* update
* update
* update
* update
* update
* update
* simpler way to show the test, reduces the complexity of the generated config
* simpler way to show the test, reduces the complexity of the generated config
* style
* oups
* oups
* fix import errors
* skip some tests for now
* update doctestjob
* more parallelism
* fixup
* test only the test in examples
* test only the test in examples
* nits
* from Arthur
* fix generated config
* update
* update
* show tests
* oups
* oups
* fix torch job for now
* use single upload step
* oups
* fu**k
* fix
* nit
* update
* nit
* fix
* fixes
* [test-all]
* add generate marker and generate job
* oups
* torch job runs not generate tests
* let repo utils test all utils
* Update
* styling
* fix repo utils test
* more parallel please
* don't test
* update
* bit more verbose sir
* more
* hub were skipped
* split by classname
* revert
* maybe?
* Amazing catch
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix
* update
* update
* maybe non capturing
* manual convert?
* pass artifacts as parameters as otherwise the config is too long
* artifact.json
* store output
* might not be safe?
* my token
* mmm?
* use CI job ID
* can't get a proper id?
* ups
* build num
* update
* echo url
* this?
* this!
* fix
* wget
* ish
* dang
* update
* there we go
* update
* update
* pass all
* not .txt
* update
* fetch
* fix naming
* fix
* up
* update
* update
* ??
* update
* more updates
* update
* more
* skip
* oups
* pr documentation tests are currently created differently
* update
* hmmmm
* oups
* curl -L
* update
* ????
* nit
* mmmm
* ish
* ouf
* update
* ish
* update
* update
* updatea
* nit
* nit
* up
* oups
* documentation_test fix
* test hub tests everything, just marker
* update
* fix
* test_hub is the only annoying one now
* tf threads?
* oups
* not sure what is happening?
* fix?
* just use folder for stating hub
* I am getting fucking annoyed
* fix the test?
* update
* uupdate
* ?
* fixes
* add comment!
* nit
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-08-30 18:17:25 +02:00
Matt
38d58a4427
Fix local repos with remote code not registering for pipelines ( #33100 )
...
* Extremely experimental fix!
* Try removing the clause entirely
* Add test
* make fixup
* stash commit
* Remove breakpoint
* Add anti-regression test
* make fixup
* Move repos to hf-internal-testing!
2024-08-30 16:56:22 +01:00
JB (Don)
f1a385b1de
[RoBERTa-based] Add support for sdpa ( #30510 )
...
* Adding SDPA support for RoBERTa-based models
* add not is_cross_attention
* fix copies
* fix test
* add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin
* address some review comments
* use copied from
* style
* consistency
* fix lists
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-28 10:26:00 +02:00
Anton Vlasjuk
3bfd3e4803
Fix: Jamba batched generation ( #32914 )
...
* init fix
* fix mask during cached forward, move mask related stuff to own function
* adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp)
* revert overwriting new integration tests
* move some comments to docstring
2024-08-28 09:24:06 +02:00
Mayank Mishra
c35d2ccf5a
Granite language models ( #31502 )
...
* first commit
* drop tokenizer
* drop tokenizer
* drop tokenizer
* drop convert
* granite
* drop tokenization test
* mup
* fix
* reformat
* reformat
* reformat
* fix docs
* stop checking for checkpoint
* update support
* attention multiplier
* update model
* tiny drop
* saibo drop
* skip test
* fix test
* fix test
* drop
* drop useless imports
* update docs
* drop flash function
* copied from
* drop pretraining tp
* drop pretraining tp
* drop pretraining tp
* drop unused import
* drop code path
* change name
* softmax scale
* head dim
* drop legacy cache
* rename params
* cleanup
* fix copies
* comments
* add back legacy cache
* multipliers
* multipliers
* multipliers
* text fix
* fix copies
* merge
* multipliers
* attention multiplier
* drop unused imports
* fix
* fix
* fix
* move rope?
* Update src/transformers/models/granite/configuration_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* Update src/transformers/models/granite/modeling_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix
* fix
* fix
* fix-copies
* torch rmsnorm
* add authors
* change model path
* fix
* test
* drop static cache test
* uupdate readme
* drop non-causal
* readme
* drop useless imports
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-27 21:27:21 +02:00
Juan Pizarro
7591ca5bc5
🚨 Add Blip2ForImageTextRetrieval ( #29261 )
...
* add Blip2ForImageTextRetrieval
* use one line and remove unnecessary space in tests
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* use value from the config, rather than hardcoded
* change order of params in Blip2QFormerModel.forward
* update docstring
* fix style
* update test_inference_opt
* move embeddings out of Blip2QFormerModel
* remove from_vision_qformer_configs
* remove autocast float16 in Blip2QFormerModel
* rename fiels into vision_projection,text_projection,use_image_text_matching_head
* use CLIPOutput for Blip2ImageTextMatchingModelOutput
* remove past_key_values_length from Blip2TextEmbeddings
* fix small typo in the CLIPOutput docstring
* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping
* update docstring and add require_torch_fp16
* rollback test_inference_opt
* use use_image_text_matching_head=True in convert
* skip test_model_get_set_embeddings
* fix create_rename_keys error on new itm fields
* revert to do scale after dot product between "query" and "key"
* fix ValueError on convert script for blip2-opt-2.7b
* update org of paths to Salesforce
* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests
* [run_slow] blip_2
* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED
* fix docstring of Blip2ImageTextMatchingModelOutput
* [run_slow] blip_2
* fix multi-gpu tests
* [run_slow] blip_2
* [run_slow] blip_2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 18:50:27 +01:00
Joao Gante
c6b23fda65
Llama: make slow tests green 🟢 ( #33138 )
2024-08-27 14:44:42 +01:00
Joao Gante
ab0ac3b98f
CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size ( #33123 )
...
* fix param not being passed in tested; add exceptions
* better source of model name
* Update utils/create_dummy_models.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 11:58:27 +01:00
Sai-Suraj-27
3bf6dd8aa1
fix: Fixed CodeGenTokenizationTest::test_truncation failing test ( #32850 )
...
* Fixed failing CodeGenTokenizationTest::test_truncation.
* [run_slow] Codegen
* [run_slow] codegen
2024-08-27 09:20:59 +02:00
Shijie
19e6e80e10
support qwen2-vl ( #32318 )
...
* support-qwen2-vl
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* hyphen->underscore
* make style
* add-flash2-tipd
* delete-tokenize=False
* remove-image_processor-in-init-file
* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES
* format-doct
* support-Qwen2VLVisionConfig
* remove-standardize_cache_format
* fix-letter-varaibles
* remove-torch-in-image-processor
* remove-useless-docstring
* fix-one-letter-varaible-name
* change-block-name
* default-quick-gelu-in-vision
* remove-useless-doc
* use-preimplemented-flash-forward
* fix-doc
* fix-image-processing-doc
* fix-apply-rotary-embed
* fix-flash-attn-sliding-window
* refactor
* remove-default_template
* remove-reorder_cache
* simple-get-rope_deltas
* update-prepare_inputs_for_generation
* update-attention-mask
* update-rotary_seq_len
* remove-state
* kv_seq_length
* remove-warning
* _supports_static_cache
* remove-legacy-cache
* refactor
* fix-replace
* mrope-section-doc
* code-quality
* code-quality
* polish-doc
* fix-image-processing-test
* update readme
* Update qwen2_vl.md
* fix-test
* Update qwen2_vl.md
* nit
* processor-kwargs
* hard-code-norm_layer
* code-quality
* discard-pixel-values-in-gen
* fix-inconsistent-error-msg
* unify-image-video
* hidden_act
* add-docstring
* vision-encode-as-PreTrainedModel
* pixel-to-target-dtype
* update doc and low memoryvit
* format
* format
* channel-foramt
* fix vit_flashatt
* format
* inherit-Qwen2VLPreTrainedModel
* simplify
* format-test
* remove-one-line-func-in-image-processing
* avoid-one-line-reshape
* simplify-rotary_seq_len
* avoid-single-letter-variable
* no-for-loop-sdpa
* avoid-single-letter-variable
* remove-one-line-reshape
* remove-one-line-reshape
* remove-no-rope-in-vit-logic
* default-mrope
* add-copied-from
* more-docs-for-mrope
* polish-doc
* comment-and-link
* polish-doc
* single-letter-variables
* simplify-image-processing
* video->images
* kv_seq_len-update
* vision-rope-on-the-fly
* vision-eager-attention
* change-processor-order
---------
Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
Joao Gante
970a16ec7f
Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 ( #32659 )
...
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-23 11:12:53 +01:00
Andrés Marafioti
18199b34e5
[run_slow] idefics2 ( #32840 )
2024-08-22 18:08:03 +02:00
Joao Gante
975b988bfe
Gemma2: eager attention by default ( #32865 )
2024-08-22 15:59:30 +01:00
Joao Gante
f6e2586a36
Jamba: update integration tests ( #32250 )
...
* try test updates
* a few more changes
* a few more changes
* a few more changes
* [run slow] jamba
* skip logits checks on older gpus
* [run slow] jamba
* oops
* [run slow] jamba
* Update tests/models/jamba/test_modeling_jamba.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/jamba/test_modeling_jamba.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-22 11:46:10 +01:00
Younes Belkada
93e538ae2e
Mamba / FalconMamba: Fix mamba left padding ( #32677 )
...
* fix mamba left padding
* Apply suggestions from code review
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix copies
* test with `inputs_embeds`
* Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copies
* clairfy
* fix last comments
* remove
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-19 16:01:35 +02:00
Kamil Akesbi
8260cb311e
Add Descript-Audio-Codec model ( #31494 )
...
* dac model
* original dac works
* add dac model
* dac can be instatiated
* add forward pass
* load weights
* all weights are used
* convert checkpoint script ready
* test
* add feature extractor
* up
* make style
* apply cookicutter
* fix tests
* iterate on FeatureExtractor
* nit
* update dac doc
* replace nn.Sequential with nn.ModuleList
* nit
* apply review suggestions 1/2
* Update src/transformers/models/dac/modeling_dac.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* up
* apply review suggestions 2/2
* update padding in FeatureExtractor
* apply review suggestions
* iterate on design and tests
* add integration tests
* feature extractor tests
* make style
* all tests pass
* make style
* fixup
* apply review suggestions
* fix-copies
* apply review suggestions
* apply review suggestions
* Update docs/source/en/model_doc/dac.md
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update docs/source/en/model_doc/dac.md
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* anticipate transfer weights to descript
* up
* make style
* apply review suggestions
* update slow test values
* update slow tests
* update test values
* update with CI values
* update with vorace values
* update test with slice
* make style
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-08-19 10:21:51 +01:00
MAHIR DAIYAN
843e5e20ca
Add Flax Dinov2 ( #31960 )
...
* tfmsenv restored in main
* installed flax
* forward pass done and all tests passed
* make fix-copies and cleaning the scripts
* fixup attempt 1
* fixup attempt 2
* fixup third attempt
* fixup attempt 4
* fixup attempt 5
* dinov2 doc fixed
* FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE
* external pos_encoding layer removed
* fixup attempt 6
* fixed integration test values
* fixup attempt 7
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* comments removed
* comment removed from the test
* fixup
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* new fixes 1
* interpolate_pos_encoding function removed
* droppath rng fixed, pretrained beit copied-from still not working
* modeling_flax_dinov2.py reformatted
* Update tests/models/dinov2/test_modeling_flax_dinov2.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* added Copied from, to the tests
* copied from statements removed from tests
* fixed copied from statements in the tests
* [run_slow] dinov2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-08-19 09:28:13 +01:00
Joao Gante
cf32ee1753
Cache: use batch_size instead of max_batch_size ( #32657 )
...
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-16 11:48:45 +01:00
Fanli Lin
8f9fa3b081
[tests] make test_sdpa_equivalence device-agnostic ( #32520 )
...
* fix on xpu
* [run_all]
2024-08-16 11:34:13 +01:00
Joao Gante
70d5df6107
Generate: unify LogitsWarper and LogitsProcessor ( #32626 )
2024-08-16 11:20:41 +01:00
jp
e840127370
reopen: llava-next fails to consider padding_side during Training ( #32679 )
...
restore #32386
2024-08-15 11:44:19 +01:00
Yih-Dar
20a04497a8
Fix JetMoeIntegrationTest ( #32332 )
...
JetMoeIntegrationTest
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-14 16:22:06 +02:00
Yoni Gozlan
5bcbdff159
Modify ProcessorTesterMixin for better generalization ( #32637 )
...
* Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs
* remove crop_size argument in align processor tests to be coherent with base tests
* Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
2024-08-13 11:48:53 -04:00
Bertrand Thia
cc25757a44
Add Depth Anything V2 Metric models ( #32126 )
...
* add checkpoint and repo names
* adapt head to support metric depth estimation
* add max_depth output scaling
* add expected logits
* improve docs
* fix docstring
* add checkpoint and repo names
* adapt head to support metric depth estimation
* add max_depth output scaling
* add expected logits
* improve docs
* fix docstring
* rename depth_estimation to depth_estimation_type
* add integration test
* Refactored tests to include metric depth model inference test
* Integration test pass when the timm backbone lines are commented (L220-L227)
* address feedback
* replace model path to use organization path
* formatting
* delete deprecated TODO
* address feedback
* [run_slow] depth_anything
2024-08-13 16:16:30 +02:00
Raushan Turganbay
a29eabd0eb
Expand inputs in processors for VLMs ( #30962 )
...
* let it be
* draft
* should not have changed
* add warnings
* fix & add tests
* fix tests
* ipnuts embeds cannot be passed with pixels
* more updates
* paligemma ready!
* minor typos
* update blip-2
* fix tests & raise error
* docstring
* add blip2 test
* tmp
* add image seq length to config
* update docstring
* delete
* fix tests
* fix blip
* fix paligemma
* out-of-place scatter
* add llava-next-video
* Update src/transformers/models/blip_2/modeling_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* remove tmp
* codestyle
* nits
* more nits
* remove overriding in tests
* comprehension when merging video
* fix-copies
* revert changes for embeds test
* fix tests after making comprehension
* Update src/transformers/models/blip_2/processing_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Update src/transformers/models/blip_2/processing_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* more updates
* fix tests
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-08-13 10:14:39 +05:00
Quentin Gallouédec
f1c8542ff7
"to be not" -> "not to be" ( #32636 )
...
* "to be not" -> "not to be"
* Update sam.md
* Update trainer.py
* Update modeling_utils.py
* Update test_modeling_utils.py
* Update test_modeling_utils.py
2024-08-12 20:20:17 +01:00
Raushan Turganbay
8f2b6d5e3d
Fix: FA2 with packed training ( #32487 )
...
* fix check
* add tests
* [run-slow] llama, gemma2
* oops, whisper actually runs but needed some special treatment
2024-08-12 13:40:07 +05:00
Younes Belkada
7c11491208
Add new model ( #32615 )
...
* v1 - working version
* fix
* fix
* fix
* fix
* rename to correct name
* fix title
* fixup
* rename files
* fix
* add copied from on tests
* rename to `FalconMamba` everywhere and fix bugs
* fix quantization + accelerate
* fix copies
* add `torch.compile` support
* fix tests
* fix tests and add slow tests
* copies on config
* merge the latest changes
* fix tests
* add few lines about instruct
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-12 08:22:47 +02:00
Arthur
e4522fe399
fix slow integration gemma2 test ( #32534 )
...
no empty revision
2024-08-09 11:28:22 +02:00
Pablo Montalvo
044281605f
Fix generate with inputs_embeds as input ( #32493 )
...
* I think inputs_embeds has ndim == 3
* fix sequence length catch
* add generate test
* [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama
* skip whisper
* fix bart test
* more fixes
2024-08-08 18:44:53 +02:00
Yunfei Chu
16ed0640be
Add Qwen2-Audio ( #32137 )
...
* add qwen2audio
* Update check_repo.py
* fix style
* fix test
* fix style
* add model size
* Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* switch the attention_mask and the feature_attention_mask
* add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py
* fix initialization
* update chat_template
* fix consistency issue after copy
* add docstrings to _merge_input_ids_with_audio_features
* add copied from to prepare_inputs_for_generation
* add more details to docs
* rm comment
* add init_std
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* update
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update tests
* rm ignore_index
* update processor
* rm ffmpeg_read
* Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update
* typo
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* fix quality
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* add official model
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-08 15:47:24 +02:00
Sangbum Daniel Choi
d3b3551750
Uniformize kwargs for processors - GroundingDINO ( #31964 )
...
* fix typo
* uniform kwargs
* make style
* add comments
* remove return_tensors
* remove common_kwargs from processor since it propagates
* make style
* return_token_type_ids to True
* revert the default imagekwargs since does not accept any value in the image processro
* revert processing_utils.py
* make style
* add molbap's commit
* fix typo
* fix common processor
* remain
* Revert "add molbap's commit"
This reverts commit a476c6ee88.
* add unsync PR
* revert
* make CI happy
* nit
* import annotationformat
2024-08-08 14:03:08 +01:00
Pablo Montalvo
80b90e7b2f
Add codestral mamba2 ( #32080 )
...
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* :update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that work?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplifcation
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-08-06 16:39:52 +02:00
Ao Tang
6a03942db7
Add Nemotron HF Support ( #31699 )
...
* Add nemotron support
* fix inference
* add unit test
* add layernorm1p as a class to avoid meta device mismatch
* test fixed
* Add copied_from statements
* remove pretraining_tp args
* remove nemotronlayernorm
* force LN computation done in FP32
* remove nemotrontokenizer and use llamatokenizer
* license update
* add option for kv_channels for minitron8b
* remove assert
* o_proj fixed
* o_proj reshape
* add gated_proj option
* typo
* remove todos
* fix broken test after merging latest main
* remove nezha/nat after meging main
* chnage default config to 15b model
* add nemo conversion script
* rename conversion script
* remove gate_proj option
* pr comment resolved
* fix unit test
* rename kv_channels to head_dim
* resolve PR issue
* add nemotron md
* fix broken tests
* refactor rope for nemotron
* test fix
* remove linearscaling
* whitespace and import
* fix some copied-from
* code style fix
* reformatted
* add position_embedding to nemotronattention
* rope refactor to only use config, copied-from fix
* format
* Run make fix-copies
* nemotron md with autodoc
* doc fix
* fix order
* pass check_config_docstrings.py
* fix config_attributes
* remove all llama BC related code
* Use PreTrainedTokenizerFast
* ruff check examples
* conversion script update
* add nemotron to toctree
2024-08-06 15:42:05 +02:00
Francisco Kurucz
438d06c95a
Fix get large model config for Switch Transformer encoder only tester ( #32438 )
2024-08-06 11:48:32 +01:00
Pavel Iakubovskii
fb66ef8147
Update kwargs validation for preprocess with decorator ( #32024 )
...
* BLIP preprocess
* BIT preprocess
* BRIDGETOWER preprocess
* CHAMELEON preprocess
* CHINESE_CLIP preprocess
* CONVNEXT preprocess
* DEIT preprocess
* DONUT preprocess
* DPT preprocess
* FLAVA preprocess
* EFFICIENTNET preprocess
* FUYU preprocess
* GLPN preprocess
* IMAGEGPT preprocess
* INTRUCTBLIPVIDEO preprocess
* VIVIT preprocess
* ZOEDEPTH preprocess
* VITMATTE preprocess
* VIT preprocess
* VILT preprocess
* VIDEOMAE preprocess
* VIDEOLLAVA
* TVP processing
* TVP fixup
* SWIN2SR preprocess
* SIGLIP preprocess
* SAM preprocess
* RT-DETR preprocess
* PVT preprocess
* POOLFORMER preprocess
* PERCEIVER preprocess
* OWLVIT preprocess
* OWLV2 preprocess
* NOUGAT preprocess
* MOBILEVIT preprocess
* MOBILENETV2 preprocess
* MOBILENETV1 preprocess
* LEVIT preprocess
* LAYOUTLMV2 preprocess
* LAYOUTLMV3 preprocess
* Add test
* Update tests
2024-08-06 11:33:05 +01:00
Fanli Lin
e85d86398a
add the missing flash attention test marker ( #32419 )
...
* add flash attention check
* fix
* fix
* add the missing marker
* bug fix
* add one more
* remove order
* add one more
2024-08-06 11:18:58 +01:00
Sai-Suraj-27
458b0cd2c5
fix: Updated test_embeded_special_tokens for luke and mluke models ( #32413 )
...
Fixed tokenizertests for luke, mluke models.
2024-08-05 15:19:42 +01:00
Abdi
baf7e5c927
Persist embedding type of BART and mBART models after resize ( #32242 )
...
* fix: persist embedding type of MBartConditonalGeneration after resize
* fix: persist embedding type of BartConditonalGeneration after resize
2024-08-05 14:15:36 +01:00
Raushan Turganbay
3bb646a54f
Phi3 tests: fix typing for Python 3.8 ( #32388 )
...
fix phi
2024-08-05 11:58:42 +05:00
TechInterMezzo
05ae3a300d
fix: SeamlessM4TFeatureExtractor stride remainder ( #32088 )
...
* fix: SeamlessM4TFeatureExtractor stride remainder
* Added attention mask size test
* Reran ruff for style correction
2024-08-05 08:40:58 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 ( #32339 )
...
* Remove size check between attn_weights and kv_seq_len
* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding ( #31772 )
...
* [whisper] compile compatibility with long-form decoding
* clarify comment
* fix after rebase
* finalise
* fix bsz
* fix cache split
* remove contiguous
* style
* finish
* update doc
* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models ( #32227 )
...
* draft
* apply changes to all relevant archs
* rerun ci - check_docstrings.py failing?
* fix docstring
* move 2D->4D mask creation to modeling file
* repo consistency
* fix the batch size = 1 case - calling contiguous is not enough
* nit
* style
* propagate to gemma/gemma-2
* prepare inputs for gemma generation
* implement test and tiny fix in gemma2
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix copies
* ci pass
* fix gemma's test_compile_static_cache tests
* flacky
* retrigger ci
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer ( #32275 )
...
* Fix FA2 call for Perciever layer
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
* Fix up
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect inv_freq assignment ( #32330 )
...
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention ( #32188 )
...
* enable flash-attn & static cache
* this works, not the prev
* fix for sliding window layers
* not needed anymore
2024-07-31 10:33:38 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process ( #32191 )
...
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
* CI
2024-07-30 23:36:38 +02:00
Kamil Akesbi
3fbaaaa64d
Whisper tokenizer word level timestamps ( #32197 )
...
* fix _fix_key in PreTrainedModel
* fix _find_longest_common_sequence
* add test
* remove result.json
* nit
* update test
2024-07-29 11:19:52 +01:00
Joao Gante
7ffe25f2b9
Generate: end-to-end compilation ( #30788 )
...
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
2024-07-29 10:52:13 +01:00
Sai-Suraj-27
b8e5cd5396
Refactor: Removed unnecessary object base class ( #32230 )
...
* Refactored to remove unnecessary object base class.
* small fix.
2024-07-26 10:33:02 +02:00
Raushan Turganbay
fad15fba78
Llava: generate without images ( #32183 )
...
* llava w/o images
* tests
2024-07-26 10:17:27 +05:00
Yih-Dar
df6eee9201
Follow up for #31973 ( #32025 )
...
* fix
* [test_all] trigger full CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-25 16:12:23 +02:00
Kashif Rasul
de2318894e
[warnings] fix E721 warnings ( #32223 )
...
fix E721 warnings
2024-07-25 15:12:23 +02:00
Sanchit Gandhi
5658e749ad
[whisper] fix short-form output type ( #32178 )
...
* [whisper] fix short-form output type
* add test
* make style
* update long-form tests
* fixes
* last fix
* finalise test
2024-07-25 16:58:02 +08:00
Sai-Suraj-27
85a1269e19
fix: Replaced deprecated unittest method with the correct one ( #32198 )
...
Replaced deprecated unittest method with the correct one.
2024-07-24 18:00:21 +01:00
Matt
edd68f4ed8
🚨 No more default chat templates ( #31733 )
...
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
2024-07-24 17:36:32 +01:00
Joao Gante
e0182f3bd7
RoPE: relaxed rope validation ( #32182 )
...
* relaxed rope check
* let's also accept rope_type=None, defaulting to the original implementation
* type and rope_type can coexist
2024-07-24 15:00:48 +01:00
Sai-Suraj-27
d2c687b3f1
Updated ruff to the latest version ( #31926 )
...
* Updated ruff version and fixed the required code according to the latest version.
* Updated ruff version and fixed the required code according to the latest version.
* Added noqa directive to ignore 1 error shown by ruff
2024-07-23 17:07:31 +02:00
Sanchit Gandhi
3263b34354
Revert "Incorrect Whisper long-form decoding timestamps " ( #32148 )
...
Revert "Incorrect Whisper long-form decoding timestamps (#32003 )"
This reverts commit cd48553fc8.
2024-07-23 18:34:30 +08:00
Amit Garg
034b477847
Rename Phi-3 rope scaling type ( #31436 )
...
* renamed phi3 rope_scaling type
* fixed trailing whitespaces
* fixed test
* added warning
* fixed format
2024-07-23 12:33:22 +02:00
Merve Noyan
9ced33ca7f
Fix video batching to videollava ( #32139 )
...
---------
Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
2024-07-23 13:23:23 +03:00
Joao Gante
2e113422b3
Llama: RoPE refactor ( #32135 )
...
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-23 10:42:55 +01:00
mig-mfreitas
34b43211d7
Add YaRN and Dynamic-YaRN RoPE Scaling Methods ( #30910 )
...
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.
Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.
We implement YaRN and Dynamic-YaRN for the following list of models:
- LLaMA
- Falcon
- GPT-NeoX
- Olmo
- Persimmon
- Phi
- StableLM
- OpenLLaMA
New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.
For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
* Refactor YaRN implementation for LLaMA
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.
This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
* Refactor Tensor Building Logic for YaRN
- Comply with the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
* remove unwanted file
---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-23 10:07:58 +01:00
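The YaRN commit above describes combining NTK-by-parts interpolation with attention scaling. Below is a minimal pure-Python sketch of the static (non-dynamic) variant, assuming the YaRN paper's default ramp bounds beta_fast=32 and beta_slow=1 and its 0.1 * ln(s) + 1 attention factor. The function name yarn_inv_freq and its parameters are hypothetical illustrations, not the transformers API or the code merged in this PR.

```python
import math


def yarn_inv_freq(dim, base, factor, original_max_position_embeddings,
                  beta_fast=32.0, beta_slow=1.0):
    """Hypothetical sketch of static YaRN RoPE scaling.

    Returns (inv_freq, attention_factor): one inverse frequency per pair of
    rotary dimensions, plus the attention scaling factor.
    """
    # Attention scaling from the YaRN paper: 0.1 * ln(s) + 1 (no-op when s <= 1).
    attention_factor = 0.1 * math.log(factor) + 1.0 if factor > 1 else 1.0

    def correction_dim(num_rotations):
        # Rotary pair index whose wavelength fits `num_rotations` full turns
        # into the original context length.
        return (dim * math.log(original_max_position_embeddings / (num_rotations * 2 * math.pi))) / (
            2 * math.log(base)
        )

    # NTK-by-parts boundaries: below `low` keep the original frequency,
    # above `high` fully interpolate, blend linearly in between.
    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    if low == high:
        high += 0.001  # avoid a zero-width ramp

    inv_freq = []
    for i in range(dim // 2):
        extrapolated = 1.0 / (base ** (2 * i / dim))  # original RoPE frequency
        interpolated = extrapolated / factor  # position-interpolated frequency
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        keep = 1.0 - ramp  # weight on the original (extrapolated) frequency
        inv_freq.append(interpolated * (1.0 - keep) + extrapolated * keep)
    return inv_freq, attention_factor


inv_freq, attention_factor = yarn_inv_freq(128, 10000.0, 4.0, 4096)
print(len(inv_freq), round(attention_factor, 4))  # 64 1.1386
```

In this sketch, low-index (high-frequency) pairs keep their original RoPE frequency, high-index (low-frequency) pairs are divided by the scaling factor, and the pairs in between are blended linearly. The attention factor is returned separately so the caller can decide where to apply it.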
Anton Vlasjuk
605f3245dc
Fix mask creations of GPTNeoX and GPT2 ( #31944 )
...
* fix mask creation of gpt2 and gpt_neox caused by me
* forgot the reshape of masks when shape > 2
* add tests for gpt neox and gpt2
* nit on a comment
2024-07-23 10:11:12 +02:00
Sanchit Gandhi
f83c6f1d02
Remove trust_remote_code when loading Libri Dummy ( #31748 )
...
* [whisper integration] use parquet dataset for testing
* propagate to others
* more propagation
* last one
2024-07-23 14:54:38 +08:00
Raushan Turganbay
3aefb4ec7f
LLaVaNeXT: pad on right if training ( #32134 )
...
* pad on right if training
* docs
* add tests
2024-07-23 10:23:55 +05:00
Yoni Gottesman
74d0eb3fed
Return assistant generated tokens mask in apply_chat_template ( #30650 )
...
return assistant generated tokens mask in apply_chat_template
2024-07-22 18:24:43 +01:00
Sai-Suraj-27
12b6880c81
fix: Fixed raising TypeError instead of ValueError for invalid type ( #32111 )
...
* Raised TypeError instead of ValueError for invalid types.
* Updated formatting using ruff.
* Retrieved few changes.
* Retrieved few changes.
* Updated tests accordingly.
2024-07-22 17:46:17 +01:00
Matt
7ba028fccb
Fix failing test with race condition ( #32140 )
...
* Fix failing test with race condition
* make fixup
* monotonic_ns instead of randint
* uuid4 instead of monotonic_ns
* Add a finally cleanup step
2024-07-22 16:07:29 +01:00
Lucain
f2a1e3ca68
Mention model_info.id instead of model_info.modelId ( #32106 )
2024-07-22 14:14:47 +01:00
Kamil Akesbi
89575b567e
Support generating with fallback for short form audio in Whisper ( #30984 )
...
* remove is_shortform
* adapt _retrieve_max_frames_and_seek for short_form
* return bos token in short and long form
* add decoder_input_ids to short form audios
* add eos token for short form
* handle short form token_timestamps
* no need to return scores
* add is_shortform conditions
* handle when max_new_tokens is None - short form
* handle assistant decoding
* fix
* handle return_dict_in_generate
* handle split_by_batch for encoder_attentions attribute
* handle num_beams>1
* handle num_return_sequences>1 in generate_with_fallback
* handle num_return_sequences>1 with return_dict_in_generate=True
* raise error if max_new_tokens + decoder_inputs_ids > max_target_pos
* fix
* apply review suggestions
* fix
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fix
* logits for both short form and long form
* handle if logits_processor is None
* test
* apply review changes to num_return_sequences
* add _expand_variables_for_generation
* remove short form commented section
* update comments
* uncomment num_beams line in generate_with_fallback
* update assistant decoding
* handle return_segment with short form generation
* up
* fix output format is_shortform
* overwrite beam_sample test
* update _set_return_timestamps
* apply review suggestions
* apply review suggestions
* remove seek_outputs_short_form
* fix _stack_split_outputs
* fix stack dim in _stack_split_outputs
* update tests
* fix past_key_values + beam tests
* fix
* clean _expand_variables_for_generation
* make style
* fix slow tests
* make style
* max_length condition
* make style
* add slow tests for shortform fallback
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* apply review changes
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* up
* fix slow tests
* apply review suggestions
* update test
* make style
* small fix
* fix
* fix test_new_cache_format
* fix past_key_values
* fix
* make style
* fix slow tests
* fix
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-07-19 13:42:22 +01:00
Kamil Akesbi
cd48553fc8
Incorrect Whisper long-form decoding timestamps ( #32003 )
...
* fix long-form timestamps in decode_batch
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* add test
* make style
* fix copies
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/whisper/processing_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply review suggestions
* fix
* fix copies
* fix
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix-copies
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-19 09:26:38 +01:00
Raushan Turganbay
b873234cb6
Llava: add default chat templates ( #31691 )
...
* add default chat templates
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* more clear docstring and docs
* Update docs/source/en/model_doc/llava.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* add tests
* remove default templates (see #31733 )
* load chat template from another file
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* revert some changes in docs
* forgot vipllava
* chat template file is not temporary hack
* warn if loading from processor
* not that file
* similarly modify `save_pretrained`
* Update tests/models/llava_next/test_processor_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_processor_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2024-07-19 10:08:56 +05:00
Longjie Zheng
c75969ee28
Add torch.compile Support For Mamba ( #31247 )
...
* modify mamba cache
* set up cache
* add test
* [run-slow] mamba
* [run-slow] mamba
* address comments
* [run-slow] mamba
* use_cache_position
* [run-slow] mamba
* [run-slow] mamba
* [run-slow] mamba
* [run-slow] mamba
* fix
* cache in generate
* [run-slow] mamba
* address comments
* [run-slow] mamba
* [run-slow] mamba
* address comments
* [run-slow] mamba
* fix
* [run-slow] mamba
* fix
* [run-slow] mamba
* fix cache name
* [run-slow] mamba
2024-07-18 11:54:54 -04:00
Raushan Turganbay
673d30b826
Chameleon: minor fixes after shipping ( #32037 )
...
* fix merging
* make chameleon conditional
2024-07-18 16:54:07 +05:00
Pavel Iakubovskii
1c37e8c1a6
Add sdpa and FA2 for CLIP ( #31940 )
...
* Squashed commit of the following:
commit 102842cd477219b9f9bcb23a0bca3a8b92bd732f
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Fri Jul 12 18:23:52 2024 +0000
Add model-specific sdpa tests
commit 60e4c88581abf89ec098da84ed8e92aa904c997d
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Fri Jul 12 18:20:53 2024 +0000
Add fallback to eager (expensive operation)
commit c29033d30e7ffde4327e8a15cbbc6bee37546f80
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Thu Jul 11 17:09:55 2024 +0000
Fix attn_implementation propagation
commit 783aed05f0f38cb2f99e758f81db6838ac55b9f8
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:05:27 2024 +0530
style
commit e77e703ca75d00447cda277eca6b886cd32bddc0
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:04:57 2024 +0530
add comment to explain why I had to touch forbidden codebase.
commit ab9d8849758e7773a31778ccba71588d18552623
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:03:02 2024 +0530
fix: flax attribute access.
commit c570fc0abf9d1bd58c291aae3c7e384f995996d2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 08:23:54 2024 +0530
fix tensorflow attribute name.
commit 32c812871cfdb268d8a6e3e2c61c5c925c8ed47e
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 07:57:10 2024 +0530
fix attribute access.
commit 4f41a0138b6c417aed9c9332278f8bcd979cb7c2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 07:44:02 2024 +0530
_from_config.
commit 35aed64ff602422adcf41d7f677a0a24bd9eccae
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 18:46:52 2024 +0530
propagation of attn_implementation.
commit 4c25c19845438b1dc1d35a5adf9436151c8c5940
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:24:36 2024 +0530
style again
commit 5f7dc5c5015c0f8116408f737e8c318d1802c80c
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:19:05 2024 +0530
use from_config.
commit b70c409956d0359fa6ae5372275d2a20ba7e3389
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:13:43 2024 +0530
quality
commit a7b63beff53d0fc754c6564e2a7b51731ddee49d
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 14:35:10 2024 +0200
add benchmark numbers
commit 455b0eaea50862b8458c8f422b60fe60ae40fdcb
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:50:16 2024 +0200
Revert "reflect feedback more"
This reverts commit dc123e71ef.
commit ca674829d28787349c2a9593a14e0f1d41f04ea4
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:50:05 2024 +0200
Revert "fix"
This reverts commit 37a1cb35b8.
commit fab2dd8576c099eb1a3464958cb206a664d28247
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:47:46 2024 +0200
fix
commit fbc6ae50fd6f2d36294d31e191761631b701d696
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:38:30 2024 +0200
reflect feedback more
commit 87245bb020b2d60a89afe318a951df0159404fc9
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 08:54:34 2024 +0530
fixes
commit 1057cc26390ee839251e7f8b3326c4207595fb23
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:49:03 2024 +0530
don't explicit set attn_implementation in tests
commit e33f75916fc8a99f516b1cf449dbbe9d3aabda81
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:43:54 2024 +0530
explicitly override attn_implementation in the towers.
commit 4cf41cb1bc885c39df7cb8f2a0694ebf23299235
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:38:42 2024 +0530
import in one-line.
commit f2cc447ae9e74ccfacb448140cdf88259d4afc8c
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:34:58 2024 +0530
move sdpa mention to usage tips.
commit 92884766c64dbb456926a3a84dd427be1349fa95
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 29 10:58:26 2024 +0530
fix: memory allocation problem.
commit d7ffbbfe12f7750b7d0a361420f35c13e0ea787d
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 29 09:56:59 2024 +0530
fix-copies
commit 8dfc3731cedd02e36acd3fe56bb2e6d61efd25d8
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri Apr 26 20:16:12 2024 +0530
address arthur's comments.
commit d2ed7b4ce4ff15ae9aa4d3d0500f1544e3dcd9e9
Author: Sayak Paul <spsayakpaul@gmail.com>
Date: Fri Apr 26 20:08:15 2024 +0530
Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
commit 46e04361f37ded5c522ff05e9f725b9f82dce40e
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:55:27 2024 +0530
add to docs.
commit 831629158ad40d34d8983f209afb2740ba041af2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:33:10 2024 +0530
styling.
commit d263a119c77314250f4b4c8469caf42559197f22
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:15:20 2024 +0530
up
commit d44f9d3d7633d4c241a737a1bc317f791f6aedb3
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 18:40:42 2024 +0530
handle causal and attention mask
commit 122f1d60153df6666b634a94e38d073f3f260926
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 15:18:21 2024 +0530
test fixes.
commit 4382d8cff6fa1dee5dbcf0d06b3e2841231e36f5
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 09:39:25 2024 +0530
fix: scaling inside sdpa.
commit 0f629989efc48b7315cf19405a81e02955efe7e5
Author: Sayak Paul <spsayakpaul@gmail.com>
Date: Tue Apr 23 08:14:58 2024 +0530
Update src/transformers/models/clip/modeling_clip.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
commit 14367316877dc27ea40f767ad1aee38bbc97e4ce
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 22 16:21:36 2024 +0530
add: sdpa support to clip.
* Remove fallback for empty attention mask (expensive operation)
* Fix typing in copies
* Add flash attention
* Add flash attention tests
* List CLIP in FA docs
* Fix embeddings attributes and tf
* [run-slow] clip
* Update clip documentation
* Remove commented code, skip compile dynamic for CLIPModel
* Fix doc
* Fix doc 2
* Remove double transpose
* Add torch version check for contiguous()
* Add comment to test mixin
* Fix copies
* Add comment for mask
* Update docs
* [run-slow] clip
2024-07-18 10:30:37 +05:30
Pavel Iakubovskii
691586b0dc
Fix tests skip ( #32012 )
...
* [run-slow] clip
* [run-slow] clip
* Fix skip -> skipTest
* [run-slow] clip
2024-07-17 08:37:43 +01:00
Raushan Turganbay
24cfcc2114
Chameleon: add model ( #31534 )
...
* Chameleon model integration
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
* fix 7B, again. mask away image tokens
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove pretrained_config_map
* make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file
* remove tokenizer (use llama's); remove codechameleon tests
* a few copied from statements and minor changes
* copied from in ChameleonModel
* some copies in ChameleonForCausalLM
* a few more copies
* VQModel moved to ChameleonModel (as opposed to being in the processor)
* ChameleonProcessor ready
* Fix chameleon weights convert
* update conversion script
* clean-up processing
* update modeling a bit
* update
* update (throws error...)
* correct conversion ready
* fix tests
* fix docs
* docs
* ve swin norm
* fix device for vocab map
* add normalization
* update
* update script with rope rotations
* final fix on model conversion
* add slow tests
* more info in docs
* fix repo consistency tests
* fix repo tests
* fix-copies
* hope this will make CI happy
* fix for 30b model
* Update docs/source/en/index.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address comments
* remove assertion in conversion script
* add image processor test
* not copied
* port changes for qk layernorm
* fix-copies
* read token decorator for tests
* [run-slow] chameleon
* one more read-token
* address some comments
* qk norm changes
* tests and repo check
* moved rope permutations to conversion, YAY!
* fix past kv check
* docs
* layernorm done!
* let's be consistent in naming
* fix slow tests
* weird thing with slow CI, but let's see
* once more try
* remove past-kv as tuple following llama
* ignore
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: jacobkahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Leonid Shamis <lshamis@meta.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-17 10:41:43 +05:00
Joao Gante
999981daf4
Tests: remove cuda versions when the result is the same 🧹 🧹 ( #31955 )
...
remove cuda versions when the result is the same
2024-07-16 16:49:54 +01:00
Zach Mueller
e0dfd7bcaf
Speedup model init on CPU (by 10x+ for llama-3-8B as one example) ( #31771 )
...
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-16 09:32:01 -04:00
Joao Gante
e4682de635
Masking: remove flakiness from test ( #31939 )
2024-07-15 18:49:37 +01:00
Aviv Shamsian
7f79a97399
fix prompt strip to support tensors and np arrays ( #27818 )
...
* fix prompt strip to support tensors and np arrays
* framework agnostic
* change logic check before converting prompt into list
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding _convert_to_list to tokenization_whisper_fast
* adding tests for prompt decoding
* adding comment
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding comment
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* revert minor
* make style formatting
* style formatting after update
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixing _strip_prompt to handle _decode_with_timestamps
* fix copies
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-07-12 20:07:10 +01:00
Naman Garg
c1e139c2b0
Adding hiera ( #30356 )
...
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initialized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into transformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstrings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* Removed tim dependency
* added HieraBlock
* fixed: Model name
* added tests for HieraModel, HieraBlock
* fixed imports
* fixed quality & copies
* Fixes
* Update docs/source/en/model_doc/hiera.md
Fix name
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed formatting
* Code quality & Import differences
* quality and repo-consistency fix
* fixed no torch error
* Docstring fix
* Docstring fix
* doc string fix
* fixed example usage
* Resolved issues in modeling_hiera
* Removed Hiera MAE
* Added test and resolved bug
* fixed doc string
* First commit
* Finished conversion script and model forward working
* Resolved all issues
* nits
* Improving tests
* Nits
* More nits
* Improving HieraForMaskedImageModeling
* More improvements and nits
* Fixed docstrings of outputs
* More fixes
* More improvements
* Updated conversion script
* Fixed docstrings
* Improved tests
* Fixed attention outputs test
* All tests green
* Removed unnecessary file
* contribution attribution
* Resolved a few issues
* Resolved Comments
* Updated model repo id and fixed bugs
* Removed loss print
* Make tests green
* Updated docstrings
* Fix style
* Fixed num_heads in config
* Removed unnecessary video checkpoint related code in the conversion script
* Fix style
* Changed atol in conversion script
* HieraConfig
* Fix copies
* Fixed typo
* Resolved few issues
* make
* converted conv_nd -> nn.Module
* Removed video complexities
* Removed video complexities
* fix style
* Addressing comments
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix style
* Fixed tests
* Fixed typo
* Fixed interpolate test
* Made torch fx compatible
* Made sure image processor is correct
* Addressed comments
* Noise directly as torch
* Remove unnecessary attr
* Added return_dict
* Update src/transformers/models/hiera/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated checkpoints
* [run_slow] hiera
* Fixed device mismatch
* [run_slow] hiera
* Fixed GPU tests
* [run_slow] hiera
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-11 22:13:56 +01:00
Sai-Suraj-27
da79b18087
fix: Removed duplicate field definitions in some classes ( #31888 )
...
Removed duplicate field definitions in classes.
2024-07-10 13:46:31 +01:00
Yung-Sung Chuang
d094d8d9ec
Generate: Add new decoding strategy "DoLa" in .generate() ( #29619 )
...
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-09 17:37:38 +01:00
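DoLa contrasts the final layer's next-token distribution with the distribution from an earlier ("premature") layer. Below is a simplified single-step sketch in plain Python, based on our reading of the DoLa paper. The premature layer is the candidate with the largest Jensen-Shannon divergence from the final layer. Tokens whose final-layer probability falls below `alpha` times the top probability are masked out, and the remaining tokens are scored by the log-probability difference. The names `dola_scores` and `jensen_shannon` are hypothetical. This is not the `.generate()` implementation, and all probabilities are assumed to be strictly positive (e.g. softmax outputs).

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); terms where p_i == 0 contribute nothing.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def dola_scores(
    final_probs: Sequence[float],
    candidate_layer_probs: Sequence[Sequence[float]],
    alpha: float = 0.1,
) -> List[float]:
    # Pick the premature layer whose distribution diverges most from the final layer.
    premature = max(candidate_layer_probs, key=lambda q: jensen_shannon(final_probs, q))
    # Plausibility constraint: keep only tokens the final layer finds likely enough.
    threshold = alpha * max(final_probs)
    scores = []
    for p, q in zip(final_probs, premature):
        if p >= threshold:
            scores.append(math.log(p) - math.log(q))
        else:
            scores.append(float("-inf"))
    return scores


final = [0.6, 0.35, 0.05]
layers = [[0.6, 0.35, 0.05], [0.2, 0.7, 0.1]]
scores = dola_scores(final, layers)
next_token = max(range(len(scores)), key=scores.__getitem__)
```

In the usage example, the second candidate layer diverges from the final layer, so it is used as the contrast. The third token fails the plausibility threshold.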
Yih-Dar
4879ac2b33
Avoid failure TFBlipModelTest::test_pipeline_image_to_text ( #31827 )
...
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-08 13:49:21 +02:00
Pavel Iakubovskii
a177821b24
Add FA2 and sdpa support for SigLIP ( #31499 )
...
* Rebase to main
* Fix attention implementation autoset for text and vision configs
* Fixup
* Minor fixes
* Fix copies
* Fix attention_mask for FA2
* Add equivalence tests for siglip
* Remove right padding test
* Uncomment flaky
* Fix import
* Add to docs
* Fix test message
* Add sdpa
* Add sdpa equivalence test
* Add siglip sdpa to docs
* Fix typing for attention output
* Add sdpa tests
* Fix signature of FA2
* Autoset attn_implementation in config
* Rename bsz -> batch_size
* Move back autoset attn method
* Mark as flaky
* Correct attention mask padding
* [run-slow] siglip
* Add FA2 and sdpa docs
* Style fix
* Remove flaky for FA2 test
* Change attention implementation set
* Change attn_implementation propagation
* Fix typos
* Add modality to assert message
* Add more sdpa backends in test
* [run slow] siglip
* Add math sdpa backend for all options
* [run slow] siglip
2024-07-08 11:10:02 +01:00
NielsRogge
06fd7972ac
Add ZoeDepth ( #30136 )
...
* First draft
* Add docs
* Clean up code
* Convert model
* Add image processor
* Convert Zoe_K
* More improvements
* Improve variable names and docstrings
* Improve variable names
* Improve variable names
* Replace nn.sequential
* More improvements
* Convert ZoeD_NK
* Fix most tests
* Verify pixel values
* Verify pixel values
* Add squeeze
* Update beit to support arbitrary window sizes
* Improve image processor
* Improve docstring
* Improve beit
* Improve model outputs
* Add figure
* Fix beit
* Update checkpoint
* Fix repo id
* Add _keys_to_ignore_on_load_unexpected
* More improvements
* Address comments
* Address comments
* Address comments
* Address comments
* Rename variable name
* Add backbone_hidden_size
* Vectorize
* Vectorize more
* Address comments
* Clarify docstring
* Remove backbone_hidden_size
* Fix image processor
* Remove print statements
* Remove print statement
* Add integration test
* Address comments
* Address comments
* Address comments
* Address comments
* Add requires_backends
* Clean up
* Simplify conversion script
* Simplify more
* Simplify more
* Simplify more
* Clean up
* Make sure beit is loaded correctly
* Address comment
* Address bin_configurations
* Use bin_configurations
* Convert models, add integration tests
* Fix doc test
* Address comments
* Unify regressor classes
* Clarify arguments
* Improve resize_image
* Add num_relative_features
* Address comment
* [run-slow]beit,data2vec,zoedepth
* [run-slow]beit,data2vec,zoedepth
* Address comments
* Address comment
* Address comment
* Replace nn.TransformerEncoderLayer and nn.TransformerEncoder
* Replace nn.MultiheadAttention
* Add attributes for patch transformer to config
* Add tests for ensure_multiple_of
* Update organization
* Add tests
* [run-slow] beit data2vec
* Update ruff
* [run-slow] beit data2vec
* Add comment
* Improve docstrings, add test
* Fix interpolate_pos_encoding
* Fix slow tests
* Add docstring
* Update src/transformers/models/zoedepth/image_processing_zoedepth.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/zoedepth/image_processing_zoedepth.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Improve tests and docstrings
* Use run_common_tests
* Improve docstrings
* Improve docstrings
* Improve tests
* Improve tests
* Remove print statements
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-08 11:43:33 +02:00
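The `ensure_multiple_of` tests cover snapping a resized height or width to a multiple of some value, such as a patch size. Below is a minimal plain-Python sketch of that idea. It assumes the value is first rounded to the nearest multiple, then rounded down if that exceeds a maximum, or rounded up if it falls below a minimum. `constrain_to_multiple_of` is an illustrative name and is not necessarily the ZoeDepth image processor's code.

```python
import math
from typing import Optional


def constrain_to_multiple_of(
    value: float, multiple: int, min_val: int = 0, max_val: Optional[int] = None
) -> int:
    # Round to the nearest multiple first (Python's round() rounds halves to even).
    x = int(round(value / multiple) * multiple)
    # If that overshoots the maximum, round down instead.
    if max_val is not None and x > max_val:
        x = int(math.floor(value / multiple) * multiple)
    # If it undershoots the minimum, round up instead.
    if x < min_val:
        x = int(math.ceil(value / multiple) * multiple)
    return x
```

With a multiple of 32, a target of 120 rounds up to 128. If `max_val=120` is given, the same target falls back to 96.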
Billy Cao
1d3eaa6f7e
Add training support for SigLIP ( #31495 )
...
* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
2024-07-05 14:50:39 +01:00
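The SigLIP loss replaces the softmax contrastive loss with an independent sigmoid for every image-text pair. Matching pairs (the diagonal) get label +1 and all other pairs get -1. The loss is the negative log-sigmoid of label times logit, summed per row and averaged over the batch. Below is a plain-Python sketch. It assumes `logits` is an n×n similarity matrix that already includes the learned scale and bias. The function names are illustrative, not the library's.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def siglip_style_loss(logits: Sequence[Sequence[float]]) -> float:
    """Pairwise sigmoid loss for an n x n image-text logits matrix."""
    n = len(logits)
    total = 0.0
    for i, row in enumerate(logits):
        row_nll = 0.0
        for j, z in enumerate(row):
            label = 1.0 if i == j else -1.0  # +1 on the diagonal, -1 elsewhere
            row_nll -= log_sigmoid(label * z)
        total += row_nll
    return total / n  # mean over the batch of per-row sums
```

Because each pair is scored independently, the loss needs no normalization over the whole batch, unlike a softmax.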
Yih-Dar
eef0507f3d
Fix gemma tests ( #31794 )
...
* skip 3 7b tests
* fix
* fix
* fix
* [run-slow] gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-05 10:17:59 +02:00
Pavel Iakubovskii
048f599f35
Fix RT-DETR weights initialization ( #31724 )
...
* Fix init for rt-detr heads
* Fixup
* Add separate prior_prob value to config for initialization
* Add bbox init
* Change to 1 / num_labels init
* Adjust weights init test
* Fix style for test
2024-07-03 14:29:02 +01:00
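Initializing a sigmoid classification head from a prior probability is a common trick in detectors. The bias b is chosen so that sigmoid(b) equals the prior p, i.e. b = -log((1 - p) / p). Below is a sketch with p = 1 / num_labels, following the commit line "Change to 1 / num_labels init". The helper name is hypothetical, and this is not the exact RT-DETR code.

```python
import math


def prior_prob_bias(prior_prob: float) -> float:
    """Bias b such that sigmoid(b) == prior_prob at initialization."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be in (0, 1)")
    return -math.log((1.0 - prior_prob) / prior_prob)


num_labels = 80  # e.g. the COCO detection classes
bias = prior_prob_bias(1.0 / num_labels)
```

With this init, every class starts with a small, uniform predicted probability instead of 0.5, which keeps the early classification loss from being dominated by background queries.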
Pavel Iakubovskii
b97521614a
Fix RT-DETR cache for generate_anchors ( #31671 )
...
* Fix cache and type conversion
* Add test
* Fixup
* nit
* [run slow] rt_detr
* Fix test
* Fixup
* [run slow] rt_detr
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
2024-07-03 14:19:57 +01:00
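Memoizing anchor generation with `functools.lru_cache` only works if every argument is hashable, so spatial shapes must be passed as tuples rather than lists or tensors. Returning an immutable value also stops callers from corrupting the cached result. Below is a plain-Python sketch. It assumes an RT-DETR-style layout: anchors at the center of each grid cell, with a size of `grid_size` doubling per feature level. The function name, the tuple output and the `grid_size=0.05` default are assumptions made for illustration.

```python
from functools import lru_cache
from typing import Tuple

Anchor = Tuple[float, float, float, float]


@lru_cache(maxsize=32)
def generate_anchors(
    spatial_shapes: Tuple[Tuple[int, int], ...], grid_size: float = 0.05
) -> Tuple[Anchor, ...]:
    """(cx, cy, w, h) anchors at the center of every cell of every feature level."""
    anchors = []
    for level, (height, width) in enumerate(spatial_shapes):
        size = grid_size * (2.0 ** level)  # anchor size doubles per level
        for y in range(height):
            for x in range(width):
                anchors.append(((x + 0.5) / width, (y + 0.5) / height, size, size))
    # An immutable tuple, so the cached object cannot be modified by callers.
    return tuple(anchors)


shapes = ((2, 2), (1, 1))
first = generate_anchors(shapes)
second = generate_anchors(shapes)  # same arguments: served from the cache
```

In a tensor version, the cached result would be dtype-agnostic, and each call would convert it to the model's dtype and device afterwards. This is our reading of why caching and type conversion interact; it is not a description of the actual fix.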
Joao Gante
ddfaf11926
Gemma 2: Update slow tests ( #31759 )
...
gemma 2 slow tests
2024-07-03 11:43:44 +02:00
Sanchit Gandhi
a9701953ff
[whisper] static kv cache ( #31166 )
...
* make work with cache abstraction
* correct for static cache
* hacks for compile
* make fast
* fix
* fix pos ids
* generate
* fix sdpa
* fix sdpa cache pos
* fix fa2
* clean fa2
* integrate cache into generate
* make style
* copies
* more copies
* update eager
* update sdpa
* update fa2
* simplify
* use cache pos
* always compute cross-cache for debug
* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co>
* fix fix
* fix fix fix
* more fix
* try encoder-decoder cache (too messy)
* revert encoder-decoder cache
* check cross-attn cache
* use enc-dec dataclass
* use richer enc-dec dataclass
* clean-up
* revert static cache changes
* small fixes
* revert to cpu flag
* fix copies
* add static slow test
* past k/v docstring
* more docstrings
* cache_position docstrings
* add to docs
* add enc-dec cache to docs
* make style
* fix after rebase
* fix beam
* style
* fix generation strategies
* fix most decoder-only tests
* style
* skip test
* more clean up
* small docstrings
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add todo
* only crop self-attn
* check cache in mixin
* style
* fix re-compile after rebase
* move `is_updated` logic to enc-dec wrapper
* revert back
* revert cache back
* finalise design
* fix
* fix fix
* style
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* deprecate
* updates
* final updates
* style
* style
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-02 13:24:15 +01:00
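A static cache preallocates key/value buffers of a fixed maximum length and writes new entries at explicit `cache_position` indices. Tensor shapes therefore stay constant across decoding steps, which lets a compiled graph be reused instead of recompiled. Below is a toy plain-Python sketch of that update rule for a single head. `ToyStaticCache` is hypothetical. It ignores batching, multiple layers and the separate cross-attention cache that Whisper also needs.

```python
from typing import List, Sequence


class ToyStaticCache:
    """Fixed-size key/value buffers for a single attention head."""

    def __init__(self, max_cache_len: int, head_dim: int):
        self.max_cache_len = max_cache_len
        # Preallocate every slot up front; the buffer length never changes.
        self.keys: List[List[float]] = [[0.0] * head_dim for _ in range(max_cache_len)]
        self.values: List[List[float]] = [[0.0] * head_dim for _ in range(max_cache_len)]

    def update(self, cache_position: Sequence[int], key_states, value_states):
        for pos, k, v in zip(cache_position, key_states, value_states):
            if not 0 <= pos < self.max_cache_len:
                raise IndexError(f"cache_position {pos} outside [0, {self.max_cache_len})")
            # Overwrite the slot in place instead of concatenating.
            self.keys[pos] = list(k)
            self.values[pos] = list(v)
        return self.keys, self.values


cache = ToyStaticCache(max_cache_len=4, head_dim=2)
# Prefill two tokens, then decode one more token at position 2.
cache.update([0, 1], [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.25, 0.75]])
keys, values = cache.update([2], [[1.0, 1.0]], [[0.0, 1.0]])
```

Slots that have not been written yet (position 3 here) still hold zeros. A real implementation would hide them from attention with a mask.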
Jacky Lee
82a1fc7256
Fix return_dict in encodec ( #31646 )
...
* fix: use return_dict parameter
* fix: type checks
* fix: unused imports
* update: one-line if else
* remove: recursive check
2024-06-28 12:18:01 +01:00
Arthur
75a6319864
Fix post gemma merge ( #31660 )
...
* nit
* toctree issue
* protect gemma2 tests as well
* sdpa supported
2024-06-27 17:51:42 +02:00
Arthur
0cf60f13ab
Add gemma 2 ( #31659 )
...
* initial commit
* Add doc
* protect?
* fixup stuffs
* update tests
* fix build documentation
* mmmmmmm config attributes
* style
* nit
* update
* nit
* Fix docs
* protect some stuff
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-06-27 17:36:19 +02:00
Sangbum Daniel Choi
be50a0338b
change anchor_image_size None for compatibility ( #31640 )
...
* change anchor_image_size None for compatibility
* make fix-copies
2024-06-27 12:36:55 +01:00
amyeroberts
1de7dc7403
Skip tests properly ( #31308 )
...
* Skip tests properly
* [test_all]
* Add 'reason' as kwarg for skipTest
* [test_all] Fix up
* [test_all]
2024-06-26 21:59:08 +01:00
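In `unittest`, calling `self.skipTest(reason=...)` marks a test as skipped and records why. An early `return` would instead make the test silently count as passed. Below is a minimal sketch; the test class and reason strings are made up for illustration.

```python
import unittest


class ExampleModelTest(unittest.TestCase):
    def test_training_gradient_checkpointing(self):
        # Reported as skipped, with the reason shown in the test summary.
        self.skipTest(reason="Gradient checkpointing is not supported by this model")

    @unittest.skip(reason="Decorator form: skipped before the test body runs")
    def test_feed_forward_chunking(self):
        pass
```

Both forms show up under "skipped" in the runner's report, so unsupported features stay visible instead of looking like passing tests.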
Billy Cao
1f9f57ab4c
Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference ( #31589 )
...
* Fix dtype casting in modeling_swin2sr to allow non-FP32 inference
* Fix formatting
* Fix for swinv2 too
* Update src/transformers/models/swin2sr/modeling_swin2sr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/swinv2/modeling_swinv2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add FP16 tests for swin2sr and swinv2
* [run_slow] swin2sr, swinv2
* [run_slow] swin2sr, swinv2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 18:46:48 +01:00
Pablo Montalvo
492ee17ec3
Fix paligemma detection inference ( #31587 )
...
* fix extended attention mask
* add slow test for detection instance
* [run-slow]paligemma
2024-06-26 19:17:09 +02:00
Raushan Turganbay
e71f2863d7
Add LLaVa NeXT Video ( #31252 )
...
* squash into single commit
* run diff once more
* docstring
* tests
* minor changes and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-26 21:52:28 +05:00
Pavel Iakubovskii
b1ec745475
Fix RT-DETR inference with float16 and bfloat16 ( #31639 )
...
* [run_slow] rt_detr
* Fix positional embeddings and anchors dtypes
* [run slow] rt_detr
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixup
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 17:50:10 +01:00
Anton Vlasjuk
b07770c5eb
[GPT-NeoX] Add SDPA support ( #31031 )
...
* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
added head mask check to sdpa mask creation
handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
* [run_slow] gpt_neox
2024-06-26 13:56:36 +01:00
amyeroberts
0f67ba1d74
Add ViTImageProcessorFast to tests ( #31424 )
...
* Add ViTImageProcessor to tests
* Correct data format
* Review comments
2024-06-25 13:36:58 +01:00
Raushan Turganbay
fc689d75a0
Add video modality for InstructBLIP ( #30182 )
...
* squash in single commit
* add docs
* dummy obj
* more changes in diff converter
* tiny fix
* make docs happy
* skip test
* repo consistency tests
* update docstring
* style
* fix tests
* change diff imports
* [run-slow] instructblipvideo
* [run-slow] instructblipvideo
* fix tests and remove logit check
* [run-slow] instructblipvideo
2024-06-25 15:45:39 +05:00
Raushan Turganbay
7e86cb6c6f
Siglip: add _no_split_module ( #31566 )
...
* device-map siglip
* move split modules to PretrainedSigLip
2024-06-25 09:49:55 +05:00
Pavel Iakubovskii
3c2d4d60d7
Correct @is_flaky test decoration ( #31480 )
...
* Correct @is_flaky decorator
2024-06-24 08:09:21 +01:00
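A retry decorator written as a factory (a function that takes options and returns the actual decorator) must be applied with parentheses. Below is a generic sketch; `retry_flaky` is a hypothetical name, not the `transformers.testing_utils` implementation. If it were applied bare as `@retry_flaky`, the test function would be bound to `max_attempts` and replaced by the inner decorator. The "test" would then pass without its body ever running.

```python
import functools


def retry_flaky(max_attempts: int = 5):
    """Return a decorator that reruns a test up to max_attempts times."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_func(*args, **kwargs)
                except Exception:
                    # Re-raise only once every attempt has failed.
                    if attempt == max_attempts:
                        raise

        return wrapper

    return decorator


calls = {"count": 0}


@retry_flaky(max_attempts=3)  # note the parentheses
def sometimes_fails():
    calls["count"] += 1
    if calls["count"] < 3:
        raise AssertionError("transient failure")
    return "passed"
```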
Sangbum Daniel Choi
74a207404e
New model support RTDETR ( #29077 )
...
* fill out docs string in configuration
75dcd3a0e8 (r1506391856)
* reduce the input image size for the tests
* remove the inappropriate tests
* only 5 failures remain
* make style
* fill up missed architecture for object detection in docs
* fix auto modeling
* simple fix in missing import
* major change including backbone refactor and objectdetectionoutput refactor
* minor fix only 4 fails left
* intermediate fix
* revert __init__.py
* revert __init__.py
* make style
* fixes in pr_docs
* intermediate fix
* make style
* two fixes
* pass doctest
* only one fix left
* intermediate commit
* all fixed
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/convert_rt_detr_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/rt_detr/test_modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* function class above the model definition in dice_loss
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* simple fix
* layernorm add config.layer_norm_eps
* fix inputs_docstring
* make style
* simple fix
* add custom coco loading test in image_processor
* fix error in BaseModelOutput
https://github.com/huggingface/transformers/pull/29077#discussion_r1516657790
* simple typo
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* intermediate fix
* fix with load_backbone format
* remove unused configuration
* 3 fix test left
* make style
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: Sounak Dey <dey.sounak@gmail.com>
* change last_hidden_state to first index
* all pass fix
TO DO: minor update in comments
* make fix-copies
* remove deepcopy
* pr_document fix
* revert deepcopy due to the issue of unexpected behavior in decoderlayer
* add atol in final
* add no_split_module
* _no_split_modules = None
* device transfer for model parallelism
* minor fix
* make fix-copies
* fix typo
* add test_image_processor with post_processing
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add config in RTDETRPredictionHead
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set lru_cache with max_size 32
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add lru_cache import and configuration change
* change the order of definition
* make fix-copies
* add docs and change config error
* revert strange make-fix
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* test pass
* fix get_clones related and remove deepcopy
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* nit for paper section
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* rename denoising related parameters
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* check the image transformation logic
* make style
* make style
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* pe_encoding -> positional_encoding_temperature
* remove TODO
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* remove eval_idx since transformer DETR is giving all decoder output
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* change variable name
* make style and docs import update
* Revert "Update src/transformers/models/rt_detr/image_processing_rt_detr.py"
This reverts commit 74aa3e1de0.
* fix typo
* add postprocessing in docs
* move import scipy to top
* change variable name
* make fix-copies
* remove eval_idx in test
* move to after first sentence
* update image_processor since box loss requires normalized one
* change appropriate name to auxiliary_outputs
* Update src/transformers/models/rt_detr/__init__.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/__init__.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* make style
* remove panoptic related comments
* make style
* revert valid_processor_keys
* fix aux related test
* make style
* change origination from config to backbone API
* enable the dn_loss
* fix test and conversion
* renewal weight initialization
* change initializer_range
* make fix-up
* fix the loss issue in the auxiliary output and denoising part
* change weight loss to original RTDETR
* fix in initialization
* sync shape format of dn and aux
* make style
* stable fine-tuning and compatible conversion for resnet101
* make style
* skip input_embed
* change encoder related variable
* enable converting rtdetr_r101
* add r101 related conversion code
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change name _shape to _reshape
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* make style
* make fix-copies
* remove deprecated import
* more fix
* remove last_hidden_state for task-specific model
* Revert "remove last_hidden_state for task-specific model"
This reverts commit ccb7a34051.
* minor change in convert
* remove print
* make style and fix-copies
* add custom rtdetr backbone for r18, r34
* remove print
* change copied
* add pad_size
* make style
* change layertype to optional to pass the CI
* make style
* add test in modeling_resnet_rt_detr
* make fix-copies
* skip tmp file test
* fix comment
* add docs
* change to modeling_resnet file format
* enabling resnet50 above
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com>
* enable all the rtdetr model :)
* finish except CI
* add RTDetrResNetBackbone
* make fix-copies
* fix
TO DO: CI enable
* make style
* rename test
* add docs
* add special fix
* revert resnet
* Update src/transformers/models/rt_detr/modeling_rt_detr_resnet.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* add more comment
* remove swin comment
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* rename convert and add verify backbone
* Update docs/source/en/_toctree.yml
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* make style
* requests for docs
* more general test docs
* general script docs
* make fix-copies
* final commit
* Revert "Update src/transformers/models/rt_detr/configuration_rt_detr.py"
This reverts commit d136225cd3.
* skip test_model_get_set_embeddings
* remove target
* add changes
* make fix-copies
* remove decoder_attention_mask
* add load_backbone function for auto_backbone
* remove comment
* fix repo name
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* final commit
* remove unused downsample_in_bottleneck
* new test for autobackbone
* change to appropriate indices
* test fix
* fix dict in test_image_processor
* fix test
* [run-slow] rt_detr, rt_detr_resnet
* change the slow test
* [run-slow] rt_detr
* [run-slow] rt_detr, rt_detr_resnet
* make in to same cuda in CSPRepLayer
* [run-slow] rt_detr, rt_detr_resnet
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sounak Dey <dey.sounak@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com>
Co-authored-by: ChoiSangBum <choisangbum@ChoiSangBumui-MacBookPro.local>
2024-06-21 17:50:08 +01:00
Ita Zaporozhets
1e79eade41
SPLIT PR: add user defined symbols and control symbols ( #31305 )
...
* PR SPLIT: moving original changes for adding user defined symbols
* adding gemma test and generalizing gemma converter
* ruff
* update common test
* update serialization test
* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added
* removing commented lines
* applying feedback - user only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
* add comment referencing sentencepiece
2024-06-21 01:48:10 -07:00
Yih-Dar
ec905f3a76
unskip 2 tests in cohere ( #31517 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-20 17:21:08 +02:00