Joao Gante
cf32ee1753
Cache: use `batch_size` instead of `max_batch_size` (#32657)
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-16 11:48:45 +01:00
Fanli Lin
8f9fa3b081
[tests] make test_sdpa_equivalence device-agnostic (#32520)
* fix on xpu
* [run_all]
2024-08-16 11:34:13 +01:00
Joao Gante
70d5df6107
Generate: unify `LogitsWarper` and `LogitsProcessor` (#32626)
2024-08-16 11:20:41 +01:00
jp
e840127370
reopen: llava-next fails to consider padding_side during Training (#32679)
restore #32386
2024-08-15 11:44:19 +01:00
Yih-Dar
20a04497a8
Fix `JetMoeIntegrationTest` (#32332)
JetMoeIntegrationTest
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-14 16:22:06 +02:00
Jerry Zhang
78d78cdf8a
Add TorchAOHfQuantizer (#32306)
* Add TorchAOHfQuantizer
Summary:
Enable loading torchao quantized model in huggingface.
Test Plan:
local test
* Fix a few issues
* style
* Added tests and addressed some comments about dtype conversion
* fix torch_dtype warning message
* fix tests
* style
* TorchAOConfig -> TorchAoConfig
* enable offload + fix memory with multi-gpu
* update torchao version requirement to 0.4.0
* better comments
* add torch.compile to torchao README, add perf number link
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
2024-08-14 16:14:24 +02:00
Sai-Suraj-27
df323476a3
fix: Fixed failing tests in `tests/utils/test_add_new_model_like.py` (#32678)
* Fixed failing tests in tests/utils/test_add_new_model_like.py
* Fixed formatting using ruff.
* Small nit.
2024-08-14 12:06:17 +01:00
Pablo Montalvo
c1357834e8
Fix tests recurrent (#32651)
* add fix for recurrentgemma
* [no-filter]
* trigger-ci
* [no-filter]
* [no-filter]
* attempt to fix mysterious zip error
* [no-filter]
* fix lookup error
* [no-filter]
* remove summarization hack
* [no-filter]
2024-08-13 23:40:50 +02:00
Yoni Gozlan
5bcbdff159
Modify ProcessorTesterMixin for better generalization (#32637)
* Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs
* remove crop_size argument in align processor tests to be coherent with base tests
* Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
2024-08-13 11:48:53 -04:00
Sai-Suraj-27
c3cd9d807e
Fix: Fixed directory path for utils folder in `test_tokenization_utils.py` (#32601)
* Removed unnecessary expressions.
* Fixed directory path for utils folder in test_tokenization_utils.py
2024-08-13 16:48:15 +01:00
Bertrand Thia
cc25757a44
Add Depth Anything V2 Metric models (#32126)
* add checkpoint and repo names
* adapt head to support metric depth estimation
* add max_depth output scaling
* add expected logits
* improve docs
* fix docstring
* add checkpoint and repo names
* adapt head to support metric depth estimation
* add max_depth output scaling
* add expected logits
* improve docs
* fix docstring
* rename depth_estimation to depth_estimation_type
* add integration test
* Refactored tests to include metric depth model inference test
* Integration test passes when the timm backbone lines are commented out (L220-L227)
* address feedback
* replace model path to use organization path
* formatting
* delete deprecated TODO
* address feedback
* [run_slow] depth_anything
2024-08-13 16:16:30 +02:00
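The "add max_depth output scaling" step turns an unbounded head output into a bounded metric depth. A minimal plain-Python sketch, assuming a sigmoid activation followed by multiplication by `max_depth`; the helper name `scale_metric_depth` and the default of 20.0 are illustrative assumptions, not the model's actual code.

```python
import math
from typing import List


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def scale_metric_depth(raw_outputs: List[float], max_depth: float = 20.0) -> List[float]:
    """Hypothetical helper: map raw head outputs to depths in (0, max_depth)."""
    # Sigmoid bounds each value to (0, 1); scaling by max_depth gives metric units.
    return [_sigmoid(x) * max_depth for x in raw_outputs]


depths = scale_metric_depth([-10.0, 0.0, 10.0], max_depth=20.0)
```

A bounded activation keeps predictions inside the sensor or dataset range, which is why the maximum depth becomes a configuration value rather than something the network must learn.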
Eric Hartford
481e15604a
Add support for GrokAdamW optimizer (#32521)
* add grokadamw
* reformat
* code review feedback, unit test
* reformat
* reformat
2024-08-13 13:20:28 +01:00
Fanli Lin
b5016d5de7
fix tensors on different devices in `WhisperGenerationMixin` (#32316)
* fix
* enable on xpu
* no manual remove
* move to device
* remove to
* add move to
2024-08-13 11:29:57 +01:00
Pablo Montalvo
a5a8291ad1
Fix tests (#32649)
* skip failing tests
* [no-filter]
* [no-filter]
* fix wording catch in FA2 test
* [no-filter]
* trigger normal CI without filtering
2024-08-13 09:46:21 +01:00
Lysandre Debut
29c3a0fa01
Automatically add `transformers` tag to the modelcard (#32623)
* Automatically add `transformers` tag to the modelcard
* Specify library_name and test
2024-08-13 07:59:01 +02:00
Raushan Turganbay
a29eabd0eb
Expand inputs in processors for VLMs (#30962)
* let it be
* draft
* should not have changed
* add warnings
* fix & add tests
* fix tests
* inputs embeds cannot be passed with pixels
* more updates
* paligemma ready!
* minor typos
* update blip-2
* fix tests & raise error
* docstring
* add blip2 test
* tmp
* add image seq length to config
* update docstring
* delete
* fix tests
* fix blip
* fix paligemma
* out-of-place scatter
* add llava-next-video
* Update src/transformers/models/blip_2/modeling_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* remove tmp
* codestyle
* nits
* more nits
* remove overriding in tests
* comprehension when merging video
* fix-copies
* revert changes for embeds test
* fix tests after making comprehension
* Update src/transformers/models/blip_2/processing_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Update src/transformers/models/blip_2/processing_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* more updates
* fix tests
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-08-13 10:14:39 +05:00
Quentin Gallouédec
f1c8542ff7
"to be not" -> "not to be" (#32636)
* "to be not" -> "not to be"
* Update sam.md
* Update trainer.py
* Update modeling_utils.py
* Update test_modeling_utils.py
* Update test_modeling_utils.py
2024-08-12 20:20:17 +01:00
Sai-Suraj-27
ce4b28830a
fix: Fixed failing `test_find_base_model_checkpoint` (#32638)
Fixed failing test_find_base_model_checkpoint.
2024-08-12 19:51:30 +01:00
Raushan Turganbay
8f2b6d5e3d
Fix: FA2 with packed training (#32487)
* fix check
* add tests
* [run-slow] llama, gemma2
* oops, whisper actually runs but needed some special treatment
2024-08-12 13:40:07 +05:00
Younes Belkada
7c11491208
Add new model (#32615)
* v1 - working version
* fix
* fix
* fix
* fix
* rename to correct name
* fix title
* fixup
* rename files
* fix
* add copied from on tests
* rename to `FalconMamba` everywhere and fix bugs
* fix quantization + accelerate
* fix copies
* add `torch.compile` support
* fix tests
* fix tests and add slow tests
* copies on config
* merge the latest changes
* fix tests
* add few lines about instruct
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-12 08:22:47 +02:00
Arthur
e4522fe399
fix slow integration gemma2 test (#32534)
no empty revision
2024-08-09 11:28:22 +02:00
Guang Yang
0164560353
Fixed test `test_static_cache_exportability` with torch 2.4.0 (#32516)
Work around the export issue in torch 2.4
Co-authored-by: Guang Yang <guangyang@fb.com>
2024-08-08 18:13:40 +01:00
Pablo Montalvo
044281605f
Fix generate with `inputs_embeds` as input (#32493)
* I think inputs_embeds has ndim == 3
* fix sequence length catch
* add generate test
* [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama
* skip whisper
* fix bart test
* more fixes
2024-08-08 18:44:53 +02:00
Yunfei Chu
16ed0640be
Add Qwen2-Audio (#32137)
* add qwen2audio
* Update check_repo.py
* fix style
* fix test
* fix style
* add model size
* Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* switch the attention_mask and the feature_attention_mask
* add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py
* fix initialization
* update chat_template
* fix consistency issue after copy
* add docstrings to _merge_input_ids_with_audio_features
* add copied from to prepare_inputs_for_generation
* add more details to docs
* rm comment
* add init_std
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* update
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update tests
* rm ignore_index
* update processor
* rm ffmpeg_read
* Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update
* typo
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* fix quality
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* add official model
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-08 15:47:24 +02:00
Sangbum Daniel Choi
d3b3551750
Uniformize kwargs for processors - GroundingDINO (#31964)
* fix typo
* uniform kwargs
* make style
* add comments
* remove return_tensors
* remove common_kwargs from processor since it propagates
* make style
* return_token_type_ids to True
* revert the default imagekwargs since it does not accept any value in the image processor
* revert processing_utils.py
* make style
* add molbap's commit
* fix typo
* fix common processor
* remain
* Revert "add molbap's commit"
This reverts commit a476c6ee88.
* add unsync PR
* revert
* make CI happy
* nit
* import annotationformat
2024-08-08 14:03:08 +01:00
Aymeric Roucher
e0d82534cc
Agents use grammar (#31735)
* Allow optional use of grammars to constrain generation
2024-08-07 11:42:52 +02:00
Raushan Turganbay
a30c865f99
Cache: new Cache format in decoder-only models (#31421)
* draft bart with new cache
* add cache for decoder-only models
* revert utils
* modify docstring
* revert bart
* minor fixes
* fix copies (not related)
* revert tests
* remove enc-dec related code
* remove bloom
* remove opt (enc-dec)
* update docstring
* git, codegen, gpt_neo, gpt_neox, gptj
* clean up
* copied from statements
* revert
* tmp
* update warning msg
* forgot git
* add more flags
* run-slow git,codegen,gpt_neo,gpt_neox,gptj
* add cache flag to VLMs
* remove files
* style
* video LLMs also need a flag
* style
* llava will go in another PR
* style
* [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copy from
* deprecate until v4.45 and warn if not training
* nit
* fix test
* test static cache
* add more tests and fix models
* fix copies
* return sliding window mask
* run slow tests & fix + codestyle
* one more falcon fix for alibi
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-07 10:02:16 +05:00
Pablo Montalvo
80b90e7b2f
Add codestral mamba2 (#32080)
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* :update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that works?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplification
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-08-06 16:39:52 +02:00
Ao Tang
6a03942db7
Add Nemotron HF Support (#31699)
* Add nemotron support
* fix inference
* add unit test
* add layernorm1p as a class to avoid meta device mismatch
* test fixed
* Add copied_from statements
* remove pretraining_tp args
* remove nemotronlayernorm
* force LN computation done in FP32
* remove nemotrontokenizer and use llamatokenizer
* license update
* add option for kv_channels for minitron8b
* remove assert
* o_proj fixed
* o_proj reshape
* add gated_proj option
* typo
* remove todos
* fix broken test after merging latest main
* remove nezha/nat after merging main
* change default config to 15b model
* add nemo conversion script
* rename conversion script
* remove gate_proj option
* pr comment resolved
* fix unit test
* rename kv_channels to head_dim
* resolve PR issue
* add nemotron md
* fix broken tests
* refactor rope for nemotron
* test fix
* remove linearscaling
* whitespace and import
* fix some copied-from
* code style fix
* reformatted
* add position_embedding to nemotronattention
* rope refactor to only use config, copied-from fix
* format
* Run make fix-copies
* nemotron md with autodoc
* doc fix
* fix order
* pass check_config_docstrings.py
* fix config_attributes
* remove all llama BC related code
* Use PreTrainedTokenizerFast
* ruff check examples
* conversion script update
* add nemotron to toctree
2024-08-06 15:42:05 +02:00
Francisco Kurucz
438d06c95a
Fix get large model config for Switch Transformer encoder only tester (#32438)
2024-08-06 11:48:32 +01:00
Pavel Iakubovskii
fb66ef8147
Update kwargs validation for `preprocess` with decorator (#32024)
* BLIP preprocess
* BIT preprocess
* BRIDGETOWER preprocess
* CHAMELEON preprocess
* CHINESE_CLIP preprocess
* CONVNEXT preprocess
* DEIT preprocess
* DONUT preprocess
* DPT preprocess
* FLAVA preprocess
* EFFICIENTNET preprocess
* FUYU preprocess
* GLPN preprocess
* IMAGEGPT preprocess
* INSTRUCTBLIPVIDEO preprocess
* VIVIT preprocess
* ZOEDEPTH preprocess
* VITMATTE preprocess
* VIT preprocess
* VILT preprocess
* VIDEOMAE preprocess
* VIDEOLLAVA
* TVP processing
* TVP fixup
* SWIN2SR preprocess
* SIGLIP preprocess
* SAM preprocess
* RT-DETR preprocess
* PVT preprocess
* POOLFORMER preprocess
* PERCEIVER preprocess
* OWLVIT preprocess
* OWLV2 preprocess
* NOUGAT preprocess
* MOBILEVIT preprocess
* MOBILENETV2 preprocess
* MOBILENETV1 preprocess
* LEVIT preprocess
* LAYOUTLMV2 preprocess
* LAYOUTLMV3 preprocess
* Add test
* Update tests
2024-08-06 11:33:05 +01:00
Fanli Lin
e85d86398a
add the missing flash attention test marker (#32419)
* add flash attention check
* fix
* fix
* add the missing marker
* bug fix
* add one more
* remove order
* add one more
2024-08-06 11:18:58 +01:00
amyeroberts
7e5d46ded4
Respect the config's attn_implementation if set (#32383)
* Respect the config's attn if set
* Update test - can override in from_config
* Fix
2024-08-05 16:33:19 +01:00
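The precedence this change aims for (a value set on the config is honoured, while an explicit request, for example at `from_config` time, can still override it) can be sketched as a small resolver. The function name and the `"eager"` fallback are illustrative assumptions, not the library's actual code path.

```python
from typing import Optional


def resolve_attn_implementation(
    requested: Optional[str], config_value: Optional[str], default: str = "eager"
) -> str:
    """Hypothetical resolver: explicit request > config value > default."""
    if requested is not None:
        return requested  # caller explicitly asked for a backend
    if config_value is not None:
        return config_value  # respect what the config already specifies
    return default  # nothing set anywhere: fall back (assumed default)
```

Keeping the order in one place makes the override rule testable on its own instead of being spread across loading paths.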
Sai-Suraj-27
458b0cd2c5
fix: Updated `test_embeded_special_tokens` for luke and mluke models (#32413)
Fixed tokenizer tests for luke, mluke models.
2024-08-05 15:19:42 +01:00
Abdi
baf7e5c927
Persist embedding type of BART and mBART models after resize (#32242)
* fix: persist embedding type of MBartForConditionalGeneration after resize
* fix: persist embedding type of BartForConditionalGeneration after resize
2024-08-05 14:15:36 +01:00
Ita Zaporozhets
3d7c2f9dea
#32184 save total_vocab_size (#32240)
* save total_vocab_size = vocab_size + user added tokens to speed up operation
* updating length when added_tokens_decoder is set
* add test len(tokenizer)
2024-08-05 09:22:48 +02:00
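The idea of storing `total_vocab_size` so that `len(tokenizer)` no longer recomputes it, while refreshing it whenever `added_tokens_decoder` is set, can be sketched with a toy class. `ToyTokenizer` is hypothetical; it counts the union of base and added token ids, which equals vocab size plus added tokens when the two sets do not overlap.

```python
from typing import Dict


class ToyTokenizer:
    """Hypothetical tokenizer that caches its length instead of recomputing it."""

    def __init__(self, vocab: Dict[str, int]):
        self.vocab = dict(vocab)
        self._added_tokens_decoder: Dict[int, str] = {}
        self._update_total_vocab_size()

    def _update_total_vocab_size(self) -> None:
        # Computed once and stored; overlapping ids are counted only once.
        ids = set(self.vocab.values()) | set(self._added_tokens_decoder)
        self.total_vocab_size = len(ids)

    @property
    def added_tokens_decoder(self) -> Dict[int, str]:
        return dict(self._added_tokens_decoder)

    @added_tokens_decoder.setter
    def added_tokens_decoder(self, value: Dict[int, str]) -> None:
        self._added_tokens_decoder = dict(value)
        self._update_total_vocab_size()  # keep the cached length in sync

    def __len__(self) -> int:
        return self.total_vocab_size
```

The trade-off is that every code path that mutates the vocabulary must refresh the cache, which is what the "updating length when added_tokens_decoder is set" bullet addresses.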
Raushan Turganbay
3bb646a54f
Phi3 tests: fix typing for Python 3.8 (#32388)
fix phi
2024-08-05 11:58:42 +05:00
TechInterMezzo
05ae3a300d
fix: SeamlessM4TFeatureExtractor stride remainder (#32088)
* fix: SeamlessM4TFeatureExtractor stride remainder
* Added attention mask size test
* Reran ruff for style correction
2024-08-05 08:40:58 +02:00
Joao Gante
083e13b7c4
RoPE: Add numerical tests ✨ (#32380)
tests! :D
2024-08-02 09:39:45 +01:00
Zach Mueller
82efc53513
Yell at the user if zero-3 init wasn't performed, but expected to have been done (#32299)
* Test this zach
* Test for improper init w/o zero3
* Move back
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Get rid of stars in warning
* Make private
* Make clear
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-01 15:18:43 -04:00
OsamaS99
51ab25e293
Fixed Hybrid Cache Shape Initialization (#32163)
* fixed hybrid cache init, added test
* Fix Test Typo
---------
Co-authored-by: Aaron Haag <aaron.haag@siemens.com>
2024-08-01 13:57:42 +01:00
Nikos Karampatziakis
ca59d6f77c
Offloaded KV Cache (#31325)
* Initial implementation of OffloadedCache
* enable usage via cache_implementation
* Address feedback, add tests, remove legacy methods.
* Remove flash-attn, discover synchronization bugs, fix bugs
* Prevent usage in CPU only mode
* Add a section about offloaded KV cache to the docs
* Fix typos in docs
* Clarifications and better explanation of streams
2024-08-01 14:42:07 +02:00
Omar Salman
b4727a1216
Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233)
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase
* Update code to check for callable key in save_pretrained
* Apply PR suggestions
* Invoke CI
* Updates based on PR suggestion
2024-08-01 14:32:13 +02:00
Ita Zaporozhets
2229ebe722
update clean_up_tokenization_spaces warning (#32371)
2024-08-01 13:57:41 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 (#32339)
* Remove size check between attn_weights and kv_seq_len
* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding (#31772)
* [whisper] compile compatibility with long-form decoding
* clarify comment
* fix after rebase
* finalise
* fix bsz
* fix cache split
* remove contiguous
* style
* finish
* update doc
* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)
* draft
* apply changes to all relevant archs
* rerun ci - check_docstrings.py failing?
* fix docstring
* move 2D->4D mask creation to modeling file
* repo consistency
* fix the batch size = 1 case - calling contiguous is not enough
* nit
* style
* propagate to gemma/gemma-2
* prepare inputs for gemma generation
* implement test and tiny fix in gemma2
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix copies
* ci pass
* fix gemma's test_compile_static_cache tests
* flaky
* retrigger ci
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer (#32275)
* Fix FA2 call for Perceiver layer
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
* Fix up
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect `inv_freq` assignment (#32330)
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention (#32188)
* enable flash-attn & static cache
* this works, not the prev
* fix for sliding window layers
* not needed anymore
2024-07-31 10:33:38 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
* CI
2024-07-30 23:36:38 +02:00
Guang Yang
811a9caa21
Make static cache compatible with torch.export (#32168)
2024-07-29 18:19:15 +01:00
Sanchit Gandhi
7f5d644e69
[pipeline] fix padding for 1-d tensors (#31776)
* [pipeline] fix padding for 1-d tensors
* add test
* make style
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
---------
Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>
2024-07-29 21:24:42 +08:00
Kamil Akesbi
3fbaaaa64d
Whisper tokenizer word level timestamps (#32197)
* fix _fix_key in PreTrainedModel
* fix _find_longest_common_sequence
* add test
* remove result.json
* nit
* update test
2024-07-29 11:19:52 +01:00
Joao Gante
7ffe25f2b9
Generate: end-to-end compilation (#30788)
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
2024-07-29 10:52:13 +01:00
Raushan Turganbay
f739687684
🚨 Bloom support for cache class (#31445)
* bloom dynamic cache
* bloom follows standard cache format
* no skips for bloom anymore
* use cache position when possible
* clean up
* codestyle
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* pr comments
* isinstance fix
* address comments
* make musicgen test happy
* [run-slow] bloom
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-29 10:58:59 +05:00
Raushan Turganbay
81233c069c
Flash-Attn: fix generation when no attention mask or no padding (#32241)
* fix
* fix prev test (half of failures)
* [run-slow] llama, gemma2
* [run-slow] llama, gemma2
2024-07-26 14:45:55 +05:00
Fanli Lin
27c7f971c0
[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039)
* add flash attention check
* fix
* fix
2024-07-26 11:41:27 +02:00
Sai-Suraj-27
b8e5cd5396
Refactor: Removed unnecessary `object` base class (#32230)
* Refactored to remove unnecessary object base class.
* small fix.
2024-07-26 10:33:02 +02:00
Raushan Turganbay
fad15fba78
Llava: generate without images (#32183)
* llava w/o images
* tests
2024-07-26 10:17:27 +05:00
Raushan Turganbay
4ab33c2d81
Generation: stop at `eos` for assisted decoding (#31301)
* fix
* move changes to prompt lookup
* add test
* set eos in assistant model
* style
* fix flakiness
* changes for new `main`
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add comment to explain
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-26 10:16:06 +05:00
Yih-Dar
df6eee9201
Follow up for #31973 (#32025)
* fix
* [test_all] trigger full CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-25 16:12:23 +02:00
Kashif Rasul
de2318894e
[warnings] fix E721 warnings (#32223)
fix E721 warnings
2024-07-25 15:12:23 +02:00
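E721 is the pycodestyle/ruff rule against comparing types with `==` (for example `type(x) == float`). A short illustration of the two compliant alternatives, using a hypothetical `Celsius` subclass to show where they differ:

```python
class Celsius(float):
    """A float subclass, used only to show how the two checks differ."""


def is_exact_float(value) -> bool:
    # E721-clean exact check: compare type objects with `is`, not `==`.
    return type(value) is float


def is_float_like(value) -> bool:
    # Instance check: also accepts subclasses such as Celsius.
    return isinstance(value, float)


reading = Celsius(21.5)
```

Which form is right depends on intent: `isinstance` is usually what code means, while `type(...) is ...` is for the rare case where subclasses must be rejected.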
Sanchit Gandhi
5658e749ad
[whisper] fix short-form output type (#32178)
* [whisper] fix short-form output type
* add test
* make style
* update long-form tests
* fixes
* last fix
* finalise test
2024-07-25 16:58:02 +08:00
Sai-Suraj-27
85a1269e19
fix: Replaced deprecated unittest method with the correct one (#32198)
Replaced deprecated unittest method with the correct one.
2024-07-24 18:00:21 +01:00
Matt
edd68f4ed8
🚨 No more default chat templates (#31733)
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
2024-07-24 17:36:32 +01:00
Penut Chen
1c122a46dc
Support dequantizing GGUF FP16 format (#31783)
* support gguf fp16
* support gguf bf16 with pytorch
* add gguf f16 test
* remove bf16
2024-07-24 17:59:59 +02:00
Joao Gante
e0182f3bd7
RoPE: relaxed rope validation (#32182)
* relaxed rope check
* lets also accept rope_type=None, defaulting to the original implementation
* type and rope_type can coexist
2024-07-24 15:00:48 +01:00
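The three bullets above (relaxed check, `rope_type=None` meaning the original implementation, `type` and `rope_type` coexisting) can be sketched as one lookup. The helper name and the `"default"` label for unscaled RoPE are assumptions for illustration, and giving `rope_type` priority over the legacy `type` key is likewise an assumed choice.

```python
from typing import Optional


def resolve_rope_type(rope_scaling: Optional[dict]) -> str:
    """Hypothetical helper: pick the RoPE variant named in a rope_scaling dict."""
    if not rope_scaling:
        return "default"  # no scaling config at all: original RoPE
    # Prefer the newer `rope_type` key and fall back to the legacy `type` key;
    # both may be present in the same dict.
    rope_type = rope_scaling.get("rope_type") or rope_scaling.get("type")
    # An explicit None is accepted and means the original implementation.
    return rope_type if rope_type is not None else "default"
```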
amyeroberts
165116bc14
Remove conversational pipeline tests (#32099)
Remove conversation pipeline tests
2024-07-24 14:03:40 +01:00
Sai-Suraj-27
d2c687b3f1
Updated `ruff` to the latest version (#31926)
* Updated ruff version and fixed the required code according to the latest version.
* Updated ruff version and fixed the required code according to the latest version.
* Added noqa directive to ignore 1 error shown by ruff
2024-07-23 17:07:31 +02:00
RhuiDih
9cf4f2aa9a
Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629)
* add DataCollatorBatchFlattening
* Update data_collator.py
* change name
* new FA2 flow if position_ids is provided
* add comments
* minor fix
* minor fix data collator
* add test cases for models
* add test case for data collator
* remove extra code
* formatting for ruff check and check_repo.py
* ruff format
ruff format tests src utils
* custom_init_isort.py
2024-07-23 15:56:41 +02:00
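The packing idea behind this PR can be sketched without any framework: examples are concatenated into one row with no padding, `position_ids` restart at 0 for every example, and the sequence boundaries (the cumulative lengths a variable-length FlashAttention kernel needs) can be recovered from where the positions reset. The helper names are hypothetical; the sketch assumes every example is non-empty and leaves out labels and tensors.

```python
from typing import Dict, List


def flatten_batch(examples: List[List[int]]) -> Dict[str, List[int]]:
    """Pack token-id lists into one padding-free row with per-example position_ids."""
    input_ids: List[int] = []
    position_ids: List[int] = []
    for ids in examples:
        input_ids.extend(ids)
        position_ids.extend(range(len(ids)))  # positions restart at 0 for every example
    return {"input_ids": input_ids, "position_ids": position_ids}


def cu_seqlens_from_position_ids(position_ids: List[int]) -> List[int]:
    """Recover cumulative sequence boundaries: each 0 marks the start of a new example."""
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    return starts + [len(position_ids)]


packed = flatten_batch([[5, 6, 7], [8, 9], [10]])
boundaries = cu_seqlens_from_position_ids(packed["position_ids"])
```

Because boundaries are derivable from `position_ids` alone, no padding tokens or block-diagonal attention mask need to be materialised for the packed row.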
Sanchit Gandhi
3263b34354
Revert "Incorrect Whisper long-form decoding timestamps (#32003)" (#32148)
Revert "Incorrect Whisper long-form decoding timestamps (#32003)"
This reverts commit cd48553fc8.
2024-07-23 18:34:30 +08:00
Amit Garg
034b477847
Rename Phi-3 rope scaling type (#31436)
* renamed phi3 rope_scaling type
* fixed trailing whitespaces
* fixed test
* added warning
* fixed format
2024-07-23 12:33:22 +02:00
Merve Noyan
9ced33ca7f
Fix video batching to videollava (#32139)
---------
Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
2024-07-23 13:23:23 +03:00
Ita Zaporozhets
a1844a3209
gguf conversion add_prefix_space=None for llama3 (#31937)
* gguf conversion forces add_prefix_space=False for llama3; this is not required and forces from_slow, which fails. Change to None + test
* typo
* clean test
2024-07-23 11:45:54 +02:00
Joao Gante
2e113422b3
Llama: RoPE refactor (#32135)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-23 10:42:55 +01:00
bayllama
5a4a76edb7
Modify resize_token_embeddings to ensure output type is same as input (#31979)
* Change resize_token_embeddings to make it return same Class that is passed to it
* Add explanatory comment as requested in review
* Add explanatory comments for add resizing function in lxmert
* Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining
---------
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MBP.attlocal.net>
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MacBook-Pro.local>
2024-07-23 10:28:44 +01:00
mig-mfreitas
34b43211d7
Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910)
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.
Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.
We implement YaRN and Dynamic-YaRN for the following list of models:
- LLaMA
- Falcon
- GPT-NeoX
- Olmo
- Persimmon
- Phi
- StableLM
- OpenLLaMA
New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.
For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
* Refactor YaRN implementation for LLaMA
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.
This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
* Refactor Tensor Building Logic for YaRN
- Comply with the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
* remove unwanted file
---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-23 10:07:58 +01:00
Anton Vlasjuk
605f3245dc
Fix mask creations of `GPTNeoX` and `GPT2` (#31944)
* fix mask creation of gpt2 and gpt_neox caused by me
* forgot the reshape of masks when shape > 2
* add tests for gpt neox and gpt2
* nit on a comment
2024-07-23 10:11:12 +02:00
Sanchit Gandhi
f83c6f1d02
Remove `trust_remote_code` when loading Libri Dummy (#31748)
* [whisper integration] use parquet dataset for testing
* propagate to others
* more propagation
* last one
2024-07-23 14:54:38 +08:00
Raushan Turganbay
3aefb4ec7f
LLaVaNeXT: pad on right if training (#32134)
* pad on right if training
* docs
* add tests
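The general idea of choosing the padding side by mode can be sketched with a hypothetical helper. The log does not show LLaVA-NeXT's actual code, and the name `pad_batch` is illustrative only. Left padding keeps every row's last real token at the end, which batched generation relies on. Right padding is the usual choice during training.

```python
def pad_batch(sequences, pad_id, training):
    """Hypothetical helper: pad token-id lists to a common length.

    Pads on the right while training and on the left otherwise (e.g. for generation).
    """
    side = "right" if training else "left"
    max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        padded.append(seq + pad if side == "right" else pad + seq)
    return padded
```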
2024-07-23 10:23:55 +05:00
Marc Sun
96a074fa7e
Add new quant method (#32047)
* Add new quant method
* update
* fix multi-device
* add test
* add offload
* style
* style
* add simple example
* initial doc
* docstring
* style again
* works ?
* better docs
* switch to non-persistent
* remove print
* fix init
* code review
2024-07-22 20:21:59 +02:00
amyeroberts
817a676bd7
Don't default to other weights file when use_safetensors=True (#31874)
* Don't default to other weights file when use_safetensors=True
* Add tests
* Update tests/utils/test_modeling_utils.py
* Add clarifying comments to tests
* Update tests/utils/test_modeling_utils.py
* Update tests/utils/test_modeling_utils.py
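The behavior in the title (an explicit `use_safetensors=True` must not fall back to another weights format) can be sketched as a simplified resolver for a local folder. The file names `model.safetensors` and `pytorch_model.bin` are the usual defaults. The function `resolve_weights_file` is hypothetical and is not the library's loading code, which also handles sharded checkpoints, the Hub, and other formats.

```python
import os

SAFE_WEIGHTS_NAME = "model.safetensors"
PT_WEIGHTS_NAME = "pytorch_model.bin"


def resolve_weights_file(folder, use_safetensors=None):
    """Hypothetical resolver: pick a weights file in `folder`.

    use_safetensors=None prefers safetensors but may fall back to .bin;
    True requires safetensors; False skips safetensors entirely.
    """
    safe_path = os.path.join(folder, SAFE_WEIGHTS_NAME)
    pt_path = os.path.join(folder, PT_WEIGHTS_NAME)
    if use_safetensors is not False and os.path.isfile(safe_path):
        return safe_path
    if use_safetensors:
        # Explicit request: raise instead of silently loading another format
        raise OSError(f"{SAFE_WEIGHTS_NAME} not found in {folder} and use_safetensors=True")
    if os.path.isfile(pt_path):
        return pt_path
    raise OSError(f"No weights file found in {folder}")
```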
2024-07-22 18:29:50 +01:00
Yoni Gottesman
74d0eb3fed
Return assistant generated tokens mask in apply_chat_template (#30650)
return assistant generated tokens mask in apply_chat_template
2024-07-22 18:24:43 +01:00
Sai-Suraj-27
12b6880c81
fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111)
* Raised TypeError instead of ValueError for invalid types.
* Updated formatting using ruff.
* Retrieved few changes.
* Retrieved few changes.
* Updated tests accordingly.
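The convention behind this change can be sketched as follows: raise `TypeError` when an argument has the wrong type, and `ValueError` when the type is right but the value is not acceptable. The validator below is hypothetical and is not taken from the PR.

```python
def validate_num_beams(value):
    """Hypothetical validator: wrong type -> TypeError, right type but bad value -> ValueError."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"num_beams must be an int, got {type(value).__name__}")
    if value < 1:
        raise ValueError(f"num_beams must be >= 1, got {value}")
    return value
```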
2024-07-22 17:46:17 +01:00
Matt
7ba028fccb
Fix failing test with race condition (#32140)
* Fix failing test with race condition
* make fixup
* monotonic_ns instead of randint
* uuid4 instead of monotonic_ns
* Add a finally cleanup step
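The two ideas in this fix (a `uuid4`-based name plus a `finally` cleanup step) can be sketched as a generic pattern. The log does not show which shared resource the test created, so a local temporary directory stands in for it here. The helper name is hypothetical.

```python
import os
import shutil
import tempfile
import uuid


def run_in_unique_dir(work):
    """Hypothetical helper: run work(path) in a fresh directory that is always removed."""
    # uuid4 names are effectively collision-free across parallel workers, whereas
    # randint has a small range and two workers can read the same clock value.
    path = os.path.join(tempfile.gettempdir(), f"test-{uuid.uuid4().hex}")
    os.makedirs(path)
    try:
        return work(path)
    finally:
        # Runs even when work() raises, so a failing test does not leak state
        shutil.rmtree(path, ignore_errors=True)
```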
2024-07-22 16:07:29 +01:00
Lucain
f2a1e3ca68
Mention model_info.id instead of model_info.modelId (#32106)
2024-07-22 14:14:47 +01:00
Sai-Suraj-27
0fcfc5ccc9
fix: Replaced deprecated `mktemp()` function (#32123)
Replaced deprecated mktemp function.
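The standard-library replacement for the deprecated `tempfile.mktemp()` can be sketched as follows. The log does not say which replacement the PR chose; `mkstemp` is shown as one option, and `NamedTemporaryFile` or `TemporaryDirectory` are common alternatives. The helper name is hypothetical.

```python
import os
import tempfile


def write_temp_text(text, suffix=".txt"):
    """Hypothetical helper: write text to a new temporary file and return its path."""
    # mkstemp creates and opens the file atomically; mktemp only returned a name,
    # leaving a window in which another process could create that path first.
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path
```

The caller owns the returned file and should delete it when done.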
2024-07-22 14:13:39 +01:00
Joao Gante
c38c55f4fb
Generate: store special token tensors under a unique variable name (#31980)
* rename stuff
* english; this one shouldn't be changed
* add a _ to the new var names
* musicgen
* derp
2024-07-22 14:06:49 +01:00
Aymeric Roucher
b381880597
Agents planning (#31702)
* Allow planning for agents
2024-07-22 10:49:57 +02:00
Lucain
0fdea8607d
Fix tests after `huggingface_hub` 0.24 (#32054)
* adapt tests
* style
* comment
2024-07-19 19:32:39 +01:00
Kamil Akesbi
89575b567e
Support generating with fallback for short form audio in Whisper (#30984)
* remove is_shortform
* adapt _retrieve_max_frames_and_seek for short_form
* return bos token in short and long form
* add decoder_input_ids to short form audios
* add eos token for short form
* handle short form token_timestamps
* no need to return scores
* add is_shortform conditions
* handle when max_new_tokens is None - short form
* handle assistant decoding
* fix
* handle return_dict_in_generate
* handle split_by_batch for encoder_attentions attribute
* handle num_beams>1
* handle num_return_sequences>1 in generate_with_fallback
* handle num_return_sequences>1 with return_dict_in_generate=True
* raise error if max_new_tokens + decoder_inputs_ids > max_target_pos
* fix
* apply review suggestions
* fix
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fix
* logits for both short form and long form
* handle if logits_processor is None
* test
* apply review changes to num_return_sequences
* add _expand_variables_for_generation
* remove short form commented section
* update comments
* uncomment num_beams line in generate_with_fallback
* update assistant decoding
* handle return_segment with short form generation
* up
* fix output format is_shortform
* overwrite beam_sample test
* update _set_return_timestamps
* apply review suggestions
* apply review suggestions
* remove seek_outputs_short_form
* fix _stack_split_outputs
* fix stack dim in _stack_split_outputs
* update tests
* fix past_key_values + beam tests
* fix
* clean _expand_variables_for_generation
* make style
* fix slow tests
* make style
* max_length condition
* make style
* add slow tests for shortform fallback
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* apply review changes
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* up
* fix slow tests
* apply review suggestions
* update test
* make style
* small fix
* fix
* fix test_new_cache_format
* fix past_key_values
* fix
* make style
* fix slow tests
* fix
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-07-19 13:42:22 +01:00
Kamil Akesbi
cd48553fc8
Incorrect Whisper long-form decoding timestamps (#32003)
* fix long-form timestamps in decode_batch
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* add test
* make style
* fix copies
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/whisper/processing_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply review suggestions
* fix
* fix copies
* fix
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix-copies
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-19 09:26:38 +01:00
Raushan Turganbay
b873234cb6
Llava: add default chat templates (#31691)
* add default chat templates
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* more clear docstring and docs
* Update docs/source/en/model_doc/llava.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* add tests
* remove default templates (see #31733)
* load chat template from another file
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* revert some changes in docs
* forgot vipllava
* chat template file is not temporary hack
* warn if loading from processor
* not that file
* similarly modify `save_pretrained`
* Update tests/models/llava_next/test_processor_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_processor_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2024-07-19 10:08:56 +05:00
Longjie Zheng
c75969ee28
Add torch.compile Support For Mamba (#31247)
* modify mamba cache
* set up cache
* add test
* [run-slow] mamba
* [run-slow] mamba
* address comments
* [run-slow] mamba
* use_cache_position
* [run-slow] mamba
* [run-slow] mamba
* [run-slow] mamba
* [run-slow] mamba
* fix
* cache in generate
* [run-slow] mamba
* address comments
* [run-slow] mamba
* [run-slow] mamba
* address comments
* [run-slow] mamba
* fix
* [run-slow] mamba
* fix
* [run-slow] mamba
* fix cache name
* [run-slow] mamba
2024-07-18 11:54:54 -04:00
Raushan Turganbay
673d30b826
Chameleon: minor fixes after shipping (#32037)
* fix merging
* make chameleon conditional
2024-07-18 16:54:07 +05:00
Pavel Iakubovskii
1c37e8c1a6
Add `sdpa` and FA2 for CLIP (#31940)
* Squashed commit of the following:
commit 102842cd477219b9f9bcb23a0bca3a8b92bd732f
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Fri Jul 12 18:23:52 2024 +0000
Add model-specific sdpa tests
commit 60e4c88581abf89ec098da84ed8e92aa904c997d
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Fri Jul 12 18:20:53 2024 +0000
Add fallback to eager (expensive operation)
commit c29033d30e7ffde4327e8a15cbbc6bee37546f80
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Thu Jul 11 17:09:55 2024 +0000
Fix attn_implementation propagation
commit 783aed05f0f38cb2f99e758f81db6838ac55b9f8
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:05:27 2024 +0530
style
commit e77e703ca75d00447cda277eca6b886cd32bddc0
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:04:57 2024 +0530
add comment to explain why I had to touch forbidden codebase.
commit ab9d8849758e7773a31778ccba71588d18552623
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:03:02 2024 +0530
fix: flax attribute access.
commit c570fc0abf9d1bd58c291aae3c7e384f995996d2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 08:23:54 2024 +0530
fix tensorflow attribute name.
commit 32c812871cfdb268d8a6e3e2c61c5c925c8ed47e
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 07:57:10 2024 +0530
fix attribute access.
commit 4f41a0138b6c417aed9c9332278f8bcd979cb7c2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 07:44:02 2024 +0530
_from_config.
commit 35aed64ff602422adcf41d7f677a0a24bd9eccae
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 18:46:52 2024 +0530
propagation of attn_implementation.
commit 4c25c19845438b1dc1d35a5adf9436151c8c5940
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:24:36 2024 +0530
style again
commit 5f7dc5c5015c0f8116408f737e8c318d1802c80c
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:19:05 2024 +0530
use from_config.
commit b70c409956d0359fa6ae5372275d2a20ba7e3389
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:13:43 2024 +0530
quality
commit a7b63beff53d0fc754c6564e2a7b51731ddee49d
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 14:35:10 2024 +0200
add benchmark numbers
commit 455b0eaea50862b8458c8f422b60fe60ae40fdcb
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:50:16 2024 +0200
Revert "reflect feedback more"
This reverts commit dc123e71ef.
commit ca674829d28787349c2a9593a14e0f1d41f04ea4
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:50:05 2024 +0200
Revert "fix"
This reverts commit 37a1cb35b8.
commit fab2dd8576c099eb1a3464958cb206a664d28247
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:47:46 2024 +0200
fix
commit fbc6ae50fd6f2d36294d31e191761631b701d696
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:38:30 2024 +0200
reflect feedback more
commit 87245bb020b2d60a89afe318a951df0159404fc9
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 08:54:34 2024 +0530
fixes
commit 1057cc26390ee839251e7f8b3326c4207595fb23
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:49:03 2024 +0530
don't explicit set attn_implementation in tests
commit e33f75916fc8a99f516b1cf449dbbe9d3aabda81
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:43:54 2024 +0530
explicitly override attn_implementation in the towers.
commit 4cf41cb1bc885c39df7cb8f2a0694ebf23299235
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:38:42 2024 +0530
import in one-line.
commit f2cc447ae9e74ccfacb448140cdf88259d4afc8c
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:34:58 2024 +0530
move sdpa mention to usage tips.
commit 92884766c64dbb456926a3a84dd427be1349fa95
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 29 10:58:26 2024 +0530
fix: memory allocation problem.
commit d7ffbbfe12f7750b7d0a361420f35c13e0ea787d
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 29 09:56:59 2024 +0530
fix-copies
commit 8dfc3731cedd02e36acd3fe56bb2e6d61efd25d8
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri Apr 26 20:16:12 2024 +0530
address arthur's comments.
commit d2ed7b4ce4ff15ae9aa4d3d0500f1544e3dcd9e9
Author: Sayak Paul <spsayakpaul@gmail.com>
Date: Fri Apr 26 20:08:15 2024 +0530
Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
commit 46e04361f37ded5c522ff05e9f725b9f82dce40e
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:55:27 2024 +0530
add to docs.
commit 831629158ad40d34d8983f209afb2740ba041af2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:33:10 2024 +0530
styling.
commit d263a119c77314250f4b4c8469caf42559197f22
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:15:20 2024 +0530
up
commit d44f9d3d7633d4c241a737a1bc317f791f6aedb3
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 18:40:42 2024 +0530
handle causal and attention mask
commit 122f1d60153df6666b634a94e38d073f3f260926
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 15:18:21 2024 +0530
test fixes.
commit 4382d8cff6fa1dee5dbcf0d06b3e2841231e36f5
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 09:39:25 2024 +0530
fix: scaling inside sdpa.
commit 0f629989efc48b7315cf19405a81e02955efe7e5
Author: Sayak Paul <spsayakpaul@gmail.com>
Date: Tue Apr 23 08:14:58 2024 +0530
Update src/transformers/models/clip/modeling_clip.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
commit 14367316877dc27ea40f767ad1aee38bbc97e4ce
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 22 16:21:36 2024 +0530
add: sdpa support to clip.
* Remove fallback for empty attention mask (expensive operation)
* Fix typing in copies
* Add flash attention
* Add flash attention tests
* List CLIP in FA docs
* Fix embeddings attributes and tf
* [run-slow] clip
* Update clip documentation
* Remove commented code, skip compile dynamic for CLIPModel
* Fix doc
* Fix doc 2
* Remove double transpose
* Add torch version check for contiguous()
* Add comment to test mixin
* Fix copies
* Add comment for mask
* Update docs
* [run-slow] clip
2024-07-18 10:30:37 +05:30
Robin Bakker
b31d595040
Add language to word timestamps for Whisper (#31572)
* add language to words
_collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information
* ran style checks
added missing comma
* add new language test
test that the pipeline can return both the language and timestamp
* remove model configuration in test
Removed model configurations that do not influence test results
* remove model configuration in test
Removed model configurations that do not influence test results
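The flag-driven behavior described for `_collate_word_timestamps` can be sketched with a hypothetical function. The chunk layout (`text`, `timestamp`, and an optional `language` key) mirrors the pipeline's word-level output as far as the log describes it, but the function below is illustrative and is not the library's implementation.

```python
def collate_word_timestamps(words, language, return_language=False):
    """Hypothetical sketch: turn (text, start, end) triples into word chunks.

    When return_language is set, each chunk also carries the language of its segment.
    """
    chunks = []
    for text, start, end in words:
        chunk = {"text": text, "timestamp": (start, end)}
        if return_language:
            chunk["language"] = language
        chunks.append(chunk)
    return chunks
```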
2024-07-17 21:32:53 +01:00
Sai-Suraj-27
72fb02c47d
Fixed log messages that are resulting in `TypeError` due to too many arguments (#32017)
* Fixed log messages that are resulting in TypeErrors due to too many arguments.
* Removed unnecessary imports.
2024-07-17 10:56:44 +01:00
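The failure mode in #32017 can be reproduced with the standard `logging` module. Passing more positional arguments than the message has `%s` placeholders makes `%`-formatting fail with a `TypeError`. When such a record is actually emitted, the handler typically reports the error on stderr instead of raising it to the caller. The `render` helper below is hypothetical and formats a `LogRecord` directly so the error is visible.

```python
import logging


def render(msg, *args):
    """Format a log message exactly as logging does when a record is emitted."""
    record = logging.LogRecord("demo", logging.WARNING, "demo.py", 0, msg, args, None)
    # getMessage() applies msg % args, which raises TypeError if args don't match msg
    return record.getMessage()
```

For example, `render("Loaded %s from %s", "weights", "the Hub")` formats cleanly, while `render("Loaded weights", "extra")` raises `TypeError`.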
Pavel Iakubovskii
691586b0dc
Fix tests skip (#32012)
* [run-slow] clip
* [run-slow] clip
* Fix skip -> skipTest
* [run-slow] clip
2024-07-17 08:37:43 +01:00
Raushan Turganbay
24cfcc2114
Chameleon: add model (#31534)
* Chameleon model integration
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
* fix 7B, again. mask away image tokens
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove pretrained_config_map
* make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file
* remove tokenizer (use llama's); remove codechameleon tests
* a few copied from statements and minor changes
* copied from in ChameleonModel
* some copies in ChameleonForCausalLM
* a few more copies
* VQModel moved to ChameleonModel (as opposed to being in the processor)
* ChameleonProcessor ready
* Fix chameleon weights convert
* update conversion script
* clean-up processing
* update modeling a bit
* update
* update (throws error...)
* correct conversion ready
* fix tests
* fix docs
* docs
* ve swin norm
* fix device for vocab map
* add normalization
* update
* update script with rope rotations
* final fix on model conversion
* add slow tests
* more info in docs
* fix repo consistency tests
* fix repo tests
* fix-copies
* hope this will make CI happy
* fix for 30b model
* Update docs/source/en/index.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address comments
* remove assertion in conversion script
* add image processor test
* not copied
* port changes for qk layernorm
* fix-copies
* read token decorator for tests
* [run-slow] chameleon
* one more read-token
* address some comments
* qk norm changes
* tests and repo check
* moved rope permutations to conversion, YAY!
* fix past kv check
* docs
* layernorm done!
* let's be consistent in naming
* fix slow tests
* weird thing with slow CI, but let's see
* once more try
* remove past-kv as tuple following llama
* ignore
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: jacobkahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Leonid Shamis <lshamis@meta.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-17 10:41:43 +05:00
Joao Gante
999981daf4
Tests: remove cuda versions when the result is the same 🧹 🧹 (#31955)
remove cuda versions when the result is the same
2024-07-16 16:49:54 +01:00
Zach Mueller
693cb828ff
Fix bad test about slower init (#32002)
Bronked main
2024-07-16 10:33:05 -04:00
Fanli Lin
25e5e3fa56
[tests] fix deepspeed zero3 config for `test_stage3_nvme_offload` (#31881)
fix config
2024-07-16 16:11:37 +02:00
Zach Mueller
e0dfd7bcaf
Speedup model init on CPU (by 10x+ for llama-3-8B as one example) (#31771)
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-16 09:32:01 -04:00
Penut Chen
ac946aac25
Fix the incorrect permutation of gguf (#31788)
* Fix the incorrect permutation of gguf
* rename num_kv_heads
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add typing to num_kv_heads
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* rename variables
* refactor permute function name
* update the expected text of the llama3 q4 test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-07-16 08:20:34 +02:00
Joao Gante
e4682de635
Masking: remove flakiness from test (#31939)
2024-07-15 18:49:37 +01:00
Yih-Dar
a1a34657d4
Avoid race condition (#31973)
* [test_all] hub
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-15 17:56:24 +02:00
Joao Gante
739a63166d
Generate: remove deprecated code due to `Cache` and `cache_position` being default (#31898)
* tmp commit
* shorter
* nit
* explicit kwargs
* propagate changes
* mass propagation with a few manual touches (let's see how CI behaves)
* fix cacheless case
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* make fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-14 15:16:58 +01:00
Aviv Shamsian
7f79a97399
fix prompt strip to support tensors and np arrays (#27818)
* fix prompt strip to support tensors and np arrays
* framework agnostic
* change logic check before converting prompt into list
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding _convert_to_list to tokenization_whisper_fast
* adding tests for prompt decoding
* adding comment
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding comment
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* revert minor
* make style formatting
* style formatting after update
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixing _strip_prompt to handle _decode_with_timestamps
* fix copies
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-07-12 20:07:10 +01:00
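The framework-agnostic approach behind `_convert_to_list` and the prompt stripping in #27818 can be sketched by relying on `.tolist()`, which NumPy arrays and PyTorch tensors both provide. This is a simplified sketch under that assumption, not the Whisper tokenizer's code. The token ids used in the example are arbitrary placeholders rather than real Whisper special tokens.

```python
def convert_to_list(token_ids):
    """Convert lists, NumPy arrays, tensors, or other iterables to a plain list."""
    if isinstance(token_ids, list):
        return token_ids
    if hasattr(token_ids, "tolist"):
        return token_ids.tolist()
    return list(token_ids)


def strip_prompt(token_ids, prompt_token_id, decoder_start_token_id):
    """Drop a leading prompt, keeping everything from the decoder start token onward."""
    token_ids = convert_to_list(token_ids)
    if not token_ids or token_ids[0] != prompt_token_id:
        return token_ids
    if decoder_start_token_id in token_ids:
        return token_ids[token_ids.index(decoder_start_token_id):]
    return []
```

For example, with placeholder ids `100` (prompt marker) and `200` (decoder start), the input `[100, 5, 6, 200, 7]` is stripped to `[200, 7]`.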
Naman Garg
c1e139c2b0
Adding hiera (#30356)
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initialized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into transformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstrings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initialized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into transformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstrings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* Removed tim dependency
* added HieraBlock
* fixed: Model name
* added tests for HieraModel, HieraBlock
* fixed imports
* fixed quality & copies
* Fixes
* Update docs/source/en/model_doc/hiera.md
Fix name
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed formatting
* Code quality & Import differences
* quality and repo-consistency fix
* fixed no torch error
* Docstring fix
* Docstring fix
* doc string fix
* fixed example usage
* Resolved issues in modeling_hiera
* Removed Hiera MAE
* Added test and resolved bug
* fixed doc string
* First commit
* Finished conversion script and model forward working
* Resolved all issues
* nits
* Improving tests
* Nits
* More nits
* Improving HieraForMaskedImageModeling
* More improvements and nits
* Fixed docstrings of outputs
* More fixes
* More improvements
* Updated conversion script
* Fixed docstrings
* Improved tests
* Fixed attention outputs test
* All tests green
* Removed unnecessary file
* contribution attribution
* Resolved a few issues
* Resolved Comments
* Updated model repo id and fixed bugs
* Removed loss print
* Make tests green
* Updated docstrings
* Fix style
* Fixed num_heads in config
* Removed unnecessary video checkpoint related code in the conversion script
* Fix style
* Changed atol in conversion script
* HieraConfig
* Fix copies
* Fixed typo
* Resolved few issues
* make
* converted conv_nd -> nn.Module
* Removed video complexities
* Removed video complexities
* fix style
* Addressing comments
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix style
* Fixed tests
* Fixed typo
* Fixed interpolate test
* Made torch fx compatible
* Made sure image processor is correct
* Addressed comments
* Noise directly as torch
* Remove unnecessary attr
* Added return_dict
* Update src/transformers/models/hiera/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated checkpoints
* [run_slow] hiera
* Fixed device mismatch
* [run_slow] hiera
* Fixed GPU tests
* [run_slow] hiera
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-11 22:13:56 +01:00
fxmarty
ad4ef3a290
Fix fx tests with inputs_embeds (#31862)
* fix tests
* [test_all] check
* address review comments
2024-07-11 20:14:03 +08:00
Omar Salman
1499a55008
Add warning message for beta and gamma parameters (#31654)
* Add warning message for `beta` and `gamma` parameters
* Fix when the warning is raised
* Formatting changes
* Improve testing and remove duplicated warning from _fix_key
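The renaming that this warning accompanies can be sketched as follows. Legacy parameter names containing `gamma` or `beta` are rewritten to `weight` or `bias` when a state dict is loaded. This is a simplified sketch: the real `_fix_key` rules and warning text are not shown in this log, and `fix_legacy_key` is a hypothetical name.

```python
import warnings


def fix_legacy_key(key):
    """Hypothetical sketch: rename legacy LayerNorm names and warn when a rename happens."""
    if "gamma" in key:
        warnings.warn(f"Parameter name {key!r} contains 'gamma'; renaming it to use 'weight'.")
        return key.replace("gamma", "weight")
    if "beta" in key:
        warnings.warn(f"Parameter name {key!r} contains 'beta'; renaming it to use 'bias'.")
        return key.replace("beta", "bias")
    return key
```

Warning only when a rename actually happens keeps loads of modern checkpoints silent.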
2024-07-11 13:01:47 +01:00
Sai-Suraj-27
2e48b3e872
fix: Fixed the 1st argument name in classmethods ( #31907 )
Fixed the first argument name in few classmethods.
2024-07-11 12:11:50 +01:00
Yih-Dar
080e14b24c
Modify `warnings` in a `with` block to avoid flaky tests ( #31893 )
* fix
* [test_all] check before merge
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-10 17:56:12 +02:00
Sai-Suraj-27
da79b18087
fix: Removed duplicate field definitions in some classes ( #31888 )
Removed duplicate field definitions in classes.
2024-07-10 13:46:31 +01:00
Yih-Dar
9d98706b3f
Fix failed tests in #31851 ( #31879 )
* Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868 )"
This reverts commit b45dd5de9c.
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-10 14:25:24 +02:00
Yih-Dar
b45dd5de9c
Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" ( #31868 )
Revert "Fix `_init_weights` for `ResNetPreTrainedModel` (#31851 )"
This reverts commit 4c8149d643.
2024-07-09 23:00:56 +02:00
Yih-Dar
4c8149d643
Fix `_init_weights` for `ResNetPreTrainedModel` ( #31851 )
* init
* test
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-09 20:09:08 +02:00
Yung-Sung Chuang
d094d8d9ec
Generate: Add new decoding strategy "DoLa" in .generate() ( #29619 )
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-09 17:37:38 +01:00
Joao Gante
4c2538b863
Test loading generation config with safetensor weights ( #31550 )
fix test
2024-07-09 16:22:43 +02:00
fxmarty
0abf5e8eae
FX symbolic_trace: do not test decoder_inputs_embeds ( #31840 )
only test input_embeds, not decoder_input_embeds
2024-07-09 08:07:46 +02:00
Yih-Dar
4879ac2b33
Avoid failure `TFBlipModelTest::test_pipeline_image_to_text` ( #31827 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-08 13:49:21 +02:00
fxmarty
ba743700f4
transformers.fx.symbolic_trace supports inputs_embeds ( #31574 )
* symbolic trace supports inputs_embeds
* fix test?
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-08 19:17:28 +08:00
Pavel Iakubovskii
a177821b24
Add FA2 and `sdpa` support for SigLIP ( #31499 )
* Rebase to main
* Fix attention implementation autoset for text and vision configs
* Fixup
* Minor fixes
* Fix copies
* Fix attention_mask for FA2
* Add equivalence tests for siglip
* Remove right padding test
* Uncomment flaky
* Fix import
* Add to docs
* Fix test message
* Add sdpa
* Add sdpa equivalence test
* Add siglip sdpa to docs
* Fix typing for attention output
* Add sdpa tests
* Fix signature of FA2
* Autoset attn_implementation in config
* Rename bsz -> batch_size
* Move back autoset attn method
* Mark as flaky
* Correct attention mask padding
* [run-slow] siglip
* Add FA2 and sdpa docs
* Style fix
* Remove flaky for FA2 test
* Change attention implementation set
* Change attn_implementation propagation
* Fix typos
* Add modality to assert message
* Add more sdpa backends in test
* [run slow] siglip
* Add math sdpa backend for all options
* [run slow] siglip
2024-07-08 11:10:02 +01:00
NielsRogge
06fd7972ac
Add ZoeDepth ( #30136 )
* First draft
* Add docs
* Clean up code
* Convert model
* Add image processor
* Convert Zoe_K
* More improvements
* Improve variable names and docstrings
* Improve variable names
* Improve variable names
* Replace nn.sequential
* More improvements
* Convert ZoeD_NK
* Fix most tests
* Verify pixel values
* Verify pixel values
* Add squeeze
* Update beit to support arbitrary window sizes
* Improve image processor
* Improve docstring
* Improve beit
* Improve model outputs
* Add figure
* Fix beit
* Update checkpoint
* Fix repo id
* Add _keys_to_ignore_on_load_unexpected
* More improvements
* Address comments
* Address comments
* Address comments
* Address comments
* Rename variable name
* Add backbone_hidden_size
* Vectorize
* Vectorize more
* Address comments
* Clarify docstring
* Remove backbone_hidden_size
* Fix image processor
* Remove print statements
* Remove print statement
* Add integration test
* Address comments
* Address comments
* Address comments
* Address comments
* Add requires_backends
* Clean up
* Simplify conversion script
* Simplify more
* Simplify more
* Simplify more
* Clean up
* Make sure beit is loaded correctly
* Address comment
* Address bin_configurations
* Use bin_configurations
* Convert models, add integration tests
* Fix doc test
* Address comments
* Unify regressor classes
* Clarify arguments
* Improve resize_image
* Add num_relative_features
* Address comment
* [run-slow]beit,data2vec,zoedepth
* [run-slow]beit,data2vec,zoedepth
* Address comments
* Address comment
* Address comment
* Replace nn.TransformerEncoderLayer and nn.TransformerEncoder
* Replace nn.MultiheadAttention
* Add attributes for patch transformer to config
* Add tests for ensure_multiple_of
* Update organization
* Add tests
* [run-slow] beit data2vec
* Update ruff
* [run-slow] beit data2vec
* Add comment
* Improve docstrings, add test
* Fix interpolate_pos_encoding
* Fix slow tests
* Add docstring
* Update src/transformers/models/zoedepth/image_processing_zoedepth.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/zoedepth/image_processing_zoedepth.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Improve tests and docstrings
* Use run_common_tests
* Improve docstrings
* Improve docstrings
* Improve tests
* Improve tests
* Remove print statements
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-08 11:43:33 +02:00
Anton Vlasjuk
a01b033cb4
Fix galore lr display with schedulers ( #31710 )
* fix galore lr display with lr schedulers
* style
* add some tests to check for displayed lrs
* copy-paste err for warmup steps
* standardize the default lr to be only in the optimizer
* trying out my luck with the reads
2024-07-05 18:59:09 +01:00
Billy Cao
ac26260436
Allow FP16 or other precision inference for Pipelines ( #31342 )
* cast image features to model.dtype where needed to support FP16 or other precision in pipelines
* Update src/transformers/pipelines/image_feature_extraction.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use .to instead
* Add FP16 pipeline support for zeroshot audio classification
* Remove unused torch imports
* Add docs on FP16 pipeline
* Remove unused import
* Add FP16 tests to pipeline mixin
* Add fp16 placeholder for mask_generation pipeline test
* Add FP16 tests for all pipelines
* Fix formatting
* Remove torch_dtype arg from is_pipeline_test_to_skip*
* Fix format
* trigger ci
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-05 17:21:50 +01:00
Billy Cao
1d3eaa6f7e
Add training support for SigLIP ( #31495 )
* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
2024-07-05 14:50:39 +01:00
Aymeric Roucher
1556025271
Code agent: allow function persistence between steps ( #31769 )
* Code agent: allow function persistence between steps
2024-07-05 11:09:11 +02:00
Yih-Dar
eef0507f3d
Fix gemma tests ( #31794 )
* skip 3 7b tests
* fix
* fix
* fix
* [run-slow] gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-05 10:17:59 +02:00
Marc Sun
8c5c180de0
Fix serialization for offloaded model ( #31727 )
* Fix serialization
* style
* add test
2024-07-05 08:07:07 +02:00
Pavel Iakubovskii
048f599f35
Fix RT-DETR weights initialization ( #31724 )
* Fix init for rt-detr heads
* Fixup
* Add separate prior_prob value to config for initialization
* Add bbox init
* Change to 1 / num_labels init
* Adjust weights init test
* Fix style for test
2024-07-03 14:29:02 +01:00
Pavel Iakubovskii
b97521614a
Fix RT-DETR cache for generate_anchors ( #31671 )
* Fix cache and type conversion
* Add test
* Fixup
* nit
* [run slow] rt_detr
* Fix test
* Fixup
* [run slow] rt_detr
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
2024-07-03 14:19:57 +01:00
Joao Gante
ddfaf11926
Gemma 2: Update slow tests ( #31759 )
gemma 2 slow tests
2024-07-03 11:43:44 +02:00
jiqing-feng
7f91f168a1
fix assisted decoding ( #31401 )
* fix assisted decoding
* check None
* fix typo
* fix _prepare_special_tokens
* fix style
* fix lint
* add tests for assisted decoding
* fix style
* fix tests check
2024-07-03 09:22:56 +01:00
Matt
cd0935dd55
Make tool JSON schemas consistent ( #31756 )
Make the order of array items consistent using sorted()
2024-07-02 20:00:42 +01:00
Joao Gante
82486e5995
🚨 🚨 TextGenerationPipeline: rely on the tokenizer default kwargs ( #31747 )
* rely on the tokenizer default kwargs
* fix a few tests
2024-07-02 16:17:42 +02:00
Sanchit Gandhi
a9701953ff
[whisper] static kv cache ( #31166 )
* make work with cache abstraction
* correct for static cache
* hacks for compile
* make fast
* fix
* fix pos ids
* generate
* fix sdpa
* fix sdpa cache pos
* fix fa2
* clean fa2
* integrate cache into generate
* make style
* copies
* more copies
* update eager
* update sdpa
* update fa2
* simplify
* use cache pos
* always compute cross-cache for debug
* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co>
* fix fix
* fix fix fix
* more fix
* try encoder-decoder cache (too messy)
* revert encoder-decoder cache
* check cross-attn cache
* use enc-dec dataclass
* use richer enc-dec dataclass
* clean-up
* revert static cache changes
* small fixes
* revert to cpu flag
* fix copies
* add static slow test
* past k/v docstring
* more docstrings
* cache_position docstrings
* add to docs
* add enc-dec cache to docs
* make style
* fix after rebase
* fix beam
* style
* fix generation strategies
* fix most decoder-only tests
* style
* skip test
* more clean up
* small docstrings
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add todo
* only crop self-attn
* check cache in mixin
* style
* fix re-compile after rebase
* move `is_updated` logic to enc-dec wrapper
* revert back
* revert cache back
* finalise design
* fix
* fix fix
* style
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* deprecate
* updates
* final updates
* style
* style
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-02 13:24:15 +01:00
Yih-Dar
93cd94b79d
Move some test files (`tests/test_xxx_utils.py`) to `tests/utils` ( #31730 )
* move
* move
* move
* move
* Update tests/utils/test_image_processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-02 13:46:03 +02:00
Sangbum Daniel Choi
cb298978ad
add gather_use_object arguments ( #31514 )
* add gather_use_object arguments
* fix name and pass the CI test for Seq2SeqTrainer
* make style
* make it to functools
* fix typo
* add accelerate version:
* adding warning
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* make style
* Update src/transformers/training_args.py
* check function move to initial part
* add test for eval_use_gather_object
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-06-28 13:50:27 +01:00
Jacky Lee
82a1fc7256
Fix return_dict in encodec ( #31646 )
* fix: use return_dict parameter
* fix: type checks
* fix: unused imports
* update: one-line if else
* remove: recursive check
2024-06-28 12:18:01 +01:00
Arthur
75a6319864
Fix post gemma merge ( #31660 )
* nit
* toctree issue
* protect gemma2 tests as well
* sdpa supported
2024-06-27 17:51:42 +02:00
Arthur
0cf60f13ab
Add gemma 2 ( #31659 )
* initial commit
* Add doc
* protect?
* fixup stuffs
* update tests
* fix build documentation
* mmmmmmm config attributes
* style
* nit
* update
* nit
* Fix docs
* protect some stuff
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-06-27 17:36:19 +02:00
Sangbum Daniel Choi
be50a0338b
change anchor_image_size None for compatibility ( #31640 )
* change anchor_image_size None for compatibility
* make fix-copies
2024-06-27 12:36:55 +01:00
Billy Cao
3a028101e9
[QoL] Allow dtype str for torch_dtype arg of from_pretrained ( #31590 )
* Allow dtype str for torch_dtype in from_pretrained
* Update docstring
* Add tests for str torch_dtype
2024-06-27 12:41:49 +02:00
amyeroberts
1de7dc7403
Skip tests properly ( #31308 )
* Skip tests properly
* [test_all]
* Add 'reason' as kwarg for skipTest
* [test_all] Fix up
* [test_all]
2024-06-26 21:59:08 +01:00
Billy Cao
1f9f57ab4c
Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference ( #31589 )
* Fix dtype casting in modeling_swin2sr to allow non-FP32 inference
* Fix formatting
* Fix for swinv2 too
* Update src/transformers/models/swin2sr/modeling_swin2sr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/swinv2/modeling_swinv2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add FP16 tests for swin2sr and swinv2
* [run_slow] swin2sr, swinv2
* [run_slow] swin2sr, swinv2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 18:46:48 +01:00
Pablo Montalvo
492ee17ec3
Fix paligemma detection inference ( #31587 )
* fix extended attention mask
* add slow test for detection instance
* [run-slow]paligemma
2024-06-26 19:17:09 +02:00
Raushan Turganbay
e71f2863d7
Add LLaVa NeXT Video ( #31252 )
* squash into single commit
* run diff once more
* docstring
* tests
* minor changes and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-26 21:52:28 +05:00
Pavel Iakubovskii
b1ec745475
Fix RT-DETR inference with float16 and bfloat16 ( #31639 )
* [run_slow] rt_detr
* Fix positional embeddings and anchors dtypes
* [run slow] rt_detr
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixup
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 17:50:10 +01:00
Younes Belkada
3f93fd0694
Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP ( #31161 )
* fix llama fsdp
* fixup
* adding FSDP tests for CPU offloading
* fixes
* fix tests
* fix tests
* add it for mixtral
* propagate the changes on other models
* Update src/transformers/models/phi/modeling_phi.py
* Delete utils/testing_scripts/fsdp_cpu_offloading.py
Remove script - FSDP + CPU offloading is tested in the test suite
* Delete utils/testing_scripts/dummy_fsdp_config.yml
* Update + add cache_positions docstring
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 14:50:08 +01:00
Anton Vlasjuk
b07770c5eb
[GPT-NeoX] Add SDPA support ( #31031 )
* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
added head mask check to sdpa mask creation
handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
* [run_slow] gpt_neox
2024-06-26 13:56:36 +01:00
amyeroberts
0f67ba1d74
Add ViTImageProcessorFast to tests ( #31424 )
* Add ViTImageProcessor to tests
* Correct data format
* Review comments
2024-06-25 13:36:58 +01:00
Raushan Turganbay
fc689d75a0
Add video modality for InstructBLIP ( #30182 )
* squash in single commit
* add docs
* dummy obj
* more changes in diff converter
* tiny fix
* make docs happy
* skip test
* repo consistency tests
* update docstring
* style
* fix tests
* change diff imports
* [run-slow] instructblipvideo
* [run-slow] instructblipvideo
* fix tests and remove logit check
* [run-slow] instructblipvideo
2024-06-25 15:45:39 +05:00
jiqing-feng
a958c4a801
fix output data type of image classification ( #31444 )
* fix output data type of image classification
* add tests for low-precision pipeline
* add bf16 pipeline tests
* fix bf16 tests
* Update tests/pipelines/test_pipelines_image_classification.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix import
* fix import torch
* fix style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-25 11:14:39 +01:00
Raushan Turganbay
7e86cb6c6f
Siglip: add `_no_split_module` ( #31566 )
* device-map siglip
* move split modules to PretrainedSigLip
2024-06-25 09:49:55 +05:00
Hiroshi Matsuda
0e23e60a5a
Fix bug about add_special_tokens and so on ( #31496 )
* fix bug about add_special_tokens and so on
* improve add_special_tokens and padding behavior
* add a test case for add_special_tokens and padding
2024-06-24 14:05:16 +01:00
Zhiyong Wang
dce253f645
Add implementation of `spectrogram_batch` ( #27159 )
* Add initial implementation of `spectrogram_batch`
* Format the initial implementation
* Add test suite for the `spectrogram_batch`
* Update `spectrogram_batch` to ensure compatibility with test suite
* Update `spectrogram_batch` to include pre and post-processing
* Add `amplitude_to_db_batch` function and associated tests
* Add `power_to_db_batch` function and associated tests
* Reimplement the test suite for `spectrogram_batch`
* Fix errors in `spectrogram_batch`
* Add the function annotation for `spectrogram_batch`
* Address code quality
* Re-add `test_chroma_equivalence` function
* Update src/transformers/audio_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/audio_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-24 09:19:12 +02:00
Pavel Iakubovskii
3c2d4d60d7
Correct @is_flaky test decoration ( #31480 )
* Correct @is_flaky decorator
2024-06-24 08:09:21 +01:00
Sangbum Daniel Choi
74a207404e
New model support RTDETR ( #29077 )
* fill out docs string in configuration
75dcd3a0e8 (r1506391856)
* reduce the input image size for the tests
* remove the inappropriate tests
* only 5 failures exist
* make style
* fill up missed architecture for object detection in docs
* fix auto modeling
* simple fix in missing import
* major change including backbone refactor and objectdetectionoutput refactor
* minor fix only 4 fails left
* intermediate fix
* revert __init__.py
* revert __init__.py
* make style
* fixes in pr_docs
* intermediate fix
* make style
* two fixes
* pass doctest
* only one fix left
* intermediate commit
* all fixed
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/convert_rt_detr_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/rt_detr/test_modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* function class above the model definition in dice_loss
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* simple fix
* layernorm add config.layer_norm_eps
* fix inputs_docstring
* make style
* simple fix
* add custom coco loading test in image_processor
* fix error in BaseModelOutput
https://github.com/huggingface/transformers/pull/29077#discussion_r1516657790
* simple typo
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* intermediate fix
* fix with load_backbone format
* remove unused configuration
* 3 fix test left
* make style
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: Sounak Dey <dey.sounak@gmail.com>
* change last_hidden_state to first index
* all pass fix
TO DO: minor update in comments
* make fix-copies
* remove deepcopy
* pr_document fix
* revert deepcopy due to the issue of unexpected behavior in decoderlayer
* add atol in final
* add no_split_module
* _no_split_modules = None
* device transfer for model parallelism
* minor fix
* make fix-copies
* fix typo
* add test_image_processor with post_processing
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add config in RTDETRPredictionHead
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set lru_cache with max_size 32
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add lru_cache import and configuration change
* change the order of definition
* make fix-copies
* add docs and change config error
* revert strange make-fix
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* test pass
* fix get_clones related and remove deepcopy
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* nit for paper section
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* rename denoising related parameters
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* check the image transformation logic
* make style
* make style
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* pe_encoding -> positional_encoding_temperature
* remove TODO
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* remove eval_idx since transformer DETR is giving all decoder output
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* change variable name
* make style and docs import update
* Revert "Update src/transformers/models/rt_detr/image_processing_rt_detr.py"
This reverts commit 74aa3e1de0.
* fix typo
* add postprocessing in docs
* move import scipy to top
* change variable name
* make fix-copies
* remove eval_idx in test
* move to after first sentence
* update image_processor since box loss requires normalized one
* change appropriate name to auxiliary_outputs
* Update src/transformers/models/rt_detr/__init__.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/rt_detr/__init__.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* make style
* remove panoptic related comments
* make style
* revert valid_processor_keys
* fix aux related test
* make style
* change origination from config to backbone API
* enable the dn_loss
* fix test and conversion
* renewal weight initialization
* change initializer_range
* make fix-up
* fix the loss issue in the auxiliary output and denoising part
* change weight loss to original RTDETR
* fix in initialization
* sync shape format of dn and aux
* make style
* stable fine-tuning and compatible conversion for resnet101
* make style
* skip input_embed
* change encoder related variable
* enable converting rtdetr_r101
* add r101 related conversion code
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/image_processing_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change name _shape to _reshape
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* make style
* make fix-copies
* remove deprecated import
* more fix
* remove last_hidden_state for task-specific model
* Revert "remove last_hidden_state for task-specific model"
This reverts commit ccb7a34051.
* minor change in convert
* remove print
* make style and fix-copies
* add custom rtdetr backbone for r18, r34
* remove print
* change copied
* add pad_size
* make style
* change layertype to optional to pass the CI
* make style
* add test in modeling_resnet_rt_detr
* make fix-copies
* skip tmp file test
* fix comment
* add docs
* change to modeling_resnet file format
* enabling resnet50 above
* Update src/transformers/models/rt_detr/modeling_rt_detr.py
Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com>
* enable all the rtdetr model :)
* finish except CI
* add RTDetrResNetBackbone
* make fix-copies
* fix
TO DO: CI enable
* make style
* rename test
* add docs
* add special fix
* revert resnet
* Update src/transformers/models/rt_detr/modeling_rt_detr_resnet.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* add more comment
* remove swin comment
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* rename convert and add verify backbone
* Update docs/source/en/_toctree.yml
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/rt_detr.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* make style
* requests for docs
* more general test docs
* general script docs
* make fix-copies
* final commit
* Revert "Update src/transformers/models/rt_detr/configuration_rt_detr.py"
This reverts commit d136225cd3.
* skip test_model_get_set_embeddings
* remove target
* add changes
* make fix-copies
* remove decoder_attention_mask
* add load_backbone function for auto_backbone
* remove comment
* fix repo name
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* final commit
* remove unused downsample_in_bottleneck
* new test for autobackbone
* change to appropriate indices
* test fix
* fix dict in test_image_processor
* fix test
* [run-slow] rt_detr, rt_detr_resnet
* change the slow test
* [run-slow] rt_detr
* [run-slow] rt_detr, rt_detr_resnet
* make in to same cuda in CSPRepLayer
* [run-slow] rt_detr, rt_detr_resnet
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sounak Dey <dey.sounak@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Jason Wu <jasonkit@users.noreply.github.com>
Co-authored-by: ChoiSangBum <choisangbum@ChoiSangBumui-MacBookPro.local>
2024-06-21 17:50:08 +01:00
Ita Zaporozhets
1e79eade41
SPLIT PR: add user defined symbols and control symbols ( #31305 )
* PR SPLIT: moving original changes for adding user defined symbols
* adding gemma test and generalizing gemma converter
* ruff
* update common test
* update serialization test
* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added
* removing commented lines
* applying feedback - use only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
* add comment referencing sentencepiece
2024-06-21 01:48:10 -07:00
Yih-Dar
ec905f3a76
unskip 2 tests in cohere ( #31517 )
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-20 17:21:08 +02:00
Joao Gante
1fd60fec75
RWKV: enable generation tests ( #31490 )
* add rwkv tests
* has_attentions set in individual tests
2024-06-20 14:15:01 +01:00
Younes Belkada
6d4306160a
GGUF: Fix llama 3 GGUF ( #31358 )
* Create push-important-models.yml
* llama3 support for GGUF
* fixup
* Update src/transformers/integrations/ggml.py
* fix pre-tokenizer
* fix
* fix
* fix
* fix
* fix
* fix
* address final comment
* handle special tokens + add tests
2024-06-20 14:29:58 +02:00
Joao Gante
83259e406d
Mamba: add generative tests ( #31478 )
2024-06-19 10:27:23 +01:00
Fanli Lin
077c139f57
[tests] rename test_config_object to test_ds_config_object ( #31403 )
fix name
2024-06-19 11:19:15 +02:00
amyeroberts
609e662243
Use self.config_tester.run_common_tests() ( #31431 )
* First testing updating config tests
* Use run_common_tests
2024-06-19 10:18:08 +01:00
Anton Vlasjuk
b275a41005
[GPT2] Add SDPA support ( #31172 )
* `gpt2` sdpa support
* fix (at least) one test, style, repo consistency
* fix sdpa mask in forward --> fixes generation
* test
* test2
* test3
* test4
* simplify shapes for attn mask creation and small comments
* hub fail test
* benchmarks
* flash attn 2 mask should not be inverted on enc-dec setup
* fix comment
* apply some suggestion from code review
- only save _attn_implementation once
- remove unnecessary comment
* change elif logic
* [run-slow] gpt2
* modify `test_gpt2_sample_max_time` to follow previous assertion patterns
2024-06-19 09:40:57 +02:00
Matt
28316d0e8b
Fix single letter stop strings ( #31448 )
* Fix single letter stop strings
* Change the 0 to a 1 to avoid potential empty vector headaches later
* Restructure for clarity
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add the unsqueeze
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-18 14:07:16 +01:00
Aymeric Roucher
b38612d312
Agents: Improve python interpreter ( #31409 )
* Improve Python interpreter
* Add with and assert statements
* Prevent overwriting existing tools
* Check interpreter errors are well logged in code agent
* Add lazy evaluation for `and` and `or`
* Improve variable assignment
* Fix early return statements in functions
* Add small import fix on interpreter tool
2024-06-18 11:55:36 +02:00
Ella Charlaix
02300273e2
🚨 Remove dataset with restrictive license ( #31452 )
remove dataset with restrictive license
2024-06-17 17:56:51 +01:00
Albert Villanova del Moral
a14b055b65
Pass datasets trust_remote_code ( #31406 )
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
2024-06-17 17:29:13 +01:00
Bastien Le Chenadec
485fd81471
Support multiple validation datasets when dataloader_persistent_workers=True ( #30627 )
* Support multiple validation datasets when dataloader_persistent_workers=True
* Test support of multiple validation datasets
2024-06-17 16:58:39 +01:00
Fanli Lin
9454f437b0
[tests] make TestDeepSpeedModelZoo device-agnostic ( #31402 )
* fix
* use accelerator device count
* ci fix
2024-06-17 16:42:57 +02:00
amyeroberts
02c525d226
Rename misnamed image processor test files ( #31430 )
2024-06-17 10:21:28 +01:00
amyeroberts
20812237ce
Remove empty create_and_test_config_common_properties tests ( #31359 )
Remove empty tests
2024-06-14 20:15:48 +01:00
Yoach Lacombe
7e1c7dc8b6
Fix SpeechT5 decoder_attention_mask shape ( #28071 )
* Fix SpeechT5
* add test forward with labels and attention mask
* make style
2024-06-14 15:20:11 +02:00
Yoach Lacombe
d9daeff297
Set seed for M4T retain grad test ( #31419 )
2024-06-14 14:48:04 +02:00
Pablo Montalvo
c624d5ba0b
add initial design for uniform processors + align model ( #31197 )
* add initial design for uniform processors + align model
* fix mutable default 👀
* add configuration test
* handle structured kwargs w defaults + add test
* protect torch-specific test
* fix style
* fix
* fix assertEqual
* move kwargs merging to processing common
* rework kwargs for type hinting
* just get Unpack from extensions
* run-slow[align]
* handle kwargs passed as nested dict
* add from_pretrained test for nested kwargs handling
* [run-slow]align
* update documentation + imports
* update audio inputs
* protect audio types, silly
* try removing imports
* make things simpler
* simplerer
* move out kwargs test to common mixin
* [run-slow]align
* skip tests for old processors
* [run-slow]align, clip
* !$#@!! protect imports, darn it
* [run-slow]align, clip
* [run-slow]align, clip
* update doc
* improve documentation for default values
* add model_max_length testing
This parameter depends on the tokenizer received.
* Raise if kwargs are specified in two places
* fix
* expand VideoInput
* fix
* fix style
* remove default values
* add comment to indicate documentation on adding kwargs
* protect imports
* [run-slow]align
* fix
* remove set() that breaks ordering
* test more
* removed unused func
* [run-slow]align
2024-06-13 16:27:16 +02:00
Marc Sun
254b25abd9
Use huggingface_hub helper function to split state dict ( #31091 )
* shard saving from hf hub
* index = None
* fix tests
* indent
2024-06-12 14:10:32 +02:00
Jason (Siyu) Zhu
a2ede66674
Add support to declare imports for code agent ( #31355 )
* Support import declaration in Code Agent
2024-06-12 09:32:28 +02:00
amyeroberts
f53fe35b29
Fast image processor ( #28847 )
* Draft fast image processors
* Draft working fast version
* py3.8 compatible cache
* Enable loading fast image processors through auto
* Tidy up; rescale behaviour based on input type
* Enable tests for fast image processors
* Smarter rescaling
* Don't default to Fast
* Safer imports
* Add necessary Pillow requirement
* Woops
* Add AutoImageProcessor test
* Fix up
* Fix test for imagegpt
* Fix test
* Review comments
* Add warning for TF and JAX input types
* Rearrange
* Return transforms
* NumpyToTensor transformation
* Rebase - include changes from upstream in ImageProcessingMixin
* Safe typing
* Fix up
* convert mean/std to tensor to rescale
* Don't store transforms in state
* Fix up
* Update src/transformers/image_processing_utils_fast.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Warn if fast image processor available
* Update src/transformers/models/vit/image_processing_vit_fast.py
* Transpose incoming numpy images to be in CHW format
* Update mapping names based on packages, auto set fast to None
* Fix up
* Fix
* Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test
* Update src/transformers/models/vit/image_processing_vit_fast.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add equivalence and speed tests
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-06-11 15:47:38 +01:00
Matt
edc1dffd00
Chat Template support for function calling and RAG ( #30621 )
* First draft, still missing automatic function conversion
* First draft of the automatic schema generator
* Lots of small fixes
* the walrus has betrayed me
* please stop committing your debug breakpoints
* Lots of cleanup and edge cases, looking better now
* Comments and bugfixes for the type hint parser
* More cleanup
* Add tests, update schema generator
* Update tests, proper handling of return values
* Small docstring change
* More doc updates
* More doc updates
* Add json_schema decorator
* Clean up the TODOs and finish the docs
* self.maxDiff = None to see the whole diff for the nested list test
* add import for add_json_schema
* Quick test fix
* Fix something that was bugging me in the chat template docstring
* Less "anyOf" when unnecessary
* Support return types for the templates that need them
* Proper return type tests
* Switch to Google format docstrings
* Update chat templating docs to match new format
* Stop putting the return type in with the other parameters
* Add Tuple support
* No more decorator - we just do it implicitly!
* Add enum support to get_json_schema
* Update docstring
* Add copyright header
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/chat_templating.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add copyright header
* make fixup
* Fix indentation
* Reformat chat_template_utils
* Correct return value
* Make regexes module-level
* Support more complex, multi-line arg docstrings
* Update error message for ...
* Update ruff
* Add document type validation
* Refactor docs
* Refactor docs
* Refactor docs
* Clean up Tuple error
* Add an extra test for very complex defs and docstrings and clean everything up for it
* Document enum block
* Quick test fixes
* Stop supporting type hints in docstring to fix bugs and simplify the regex
* Update docs for the regex change
* Clean up enum regex
* Wrap functions in {"type": "function", "function": ...}
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Temporary tool calling commit
* Add type hints to chat template utils, partially update docs (incomplete!)
* Code cleanup based on @molbap's suggestion
* Add comments to explain regexes
* Fix up type parsing for unions and lists
* Add custom exception types and adjust tests to look for them
* Update docs with a demo!
* Docs cleanup
* Pass content as string
* Update tool call formatting
* Update docs with new function format
* Update docs
* Update docs with a second tool to show the model choosing correctly
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-06-11 15:46:38 +01:00
amyeroberts
a4e1a1d028
🚨 FLAVA: Remove double softmax ( #31322 )
Remove double softmax
2024-06-10 15:01:27 +01:00
Yih-Dar
8fff07ded0
Fix Cohere CI ( #31263 )
* [run-slow] cohere
* [run-slow] cohere
* [run-slow] cohere
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-10 15:16:58 +02:00
Pavel Iakubovskii
517df566f5
Decorators for deprecation and named arguments validation ( #30799 )
* Fix do_reduce_labels for maskformer image processor
* Deprecate reduce_labels in favor of do_reduce_labels
* Deprecate reduce_labels in favor of do_reduce_labels (segformer)
* Deprecate reduce_labels in favor of do_reduce_labels (oneformer)
* Deprecate reduce_labels in favor of do_reduce_labels (maskformer)
* Deprecate reduce_labels in favor of do_reduce_labels (mask2former)
* Fix typo
* Update mask2former test
* fixup
* Update segmentation examples
* Update docs
* Fixup
* Imports fixup
* Add deprecation decorator draft
* Add deprecation decorator
* Fixup
* Add deprecate_kwarg decorator
* Validate kwargs decorator
* Kwargs validation (beit)
* fixup
* Kwargs validation (mask2former)
* Kwargs validation (maskformer)
* Kwargs validation (oneformer)
* Kwargs validation (segformer)
* Better message
* Fix oneformer processor save-load test
* Update src/transformers/utils/deprecation.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/deprecation.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/deprecation.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Update src/transformers/utils/deprecation.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Better handle classmethod warning
* Fix typo, remove warn
* Add header
* Docs and `additional_message`
* Move filter decorator to generic
* Proper deprecation for semantic segm scripts
* Add to __init__ and update import
* Basic tests for filter decorator
* Fix doc
* Override `to_dict()` to pop deprecated `_max_size`
* Pop unused parameters
* Fix trailing whitespace
* Add test for deprecation
* Add deprecation warning control parameter
* Update generic test
* Fixup deprecation tests
* Introduce init service kwargs
* Revert popping unused params
* Revert oneformer test
* Allow "metadata" to pass
* Better docs
* Fix test
* Add notion in docstring
* Fix notification for both names
* Add func name to warning message
* Fixup
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-06-10 12:35:10 +01:00
Pablo Montalvo
6b11f89c6b
Fix paligemma inverted mask ( #31207 )
* pass inverted causal mask
* add sanity check for paligemma finetuning
* [run-slow]paligemma
2024-06-10 11:22:39 +02:00
amyeroberts
25245ec26d
Rename test_model_common_attributes -> test_model_get_set_embeddings ( #31321 )
* Rename to test_model_common_attributes
The method name is misleading - it is testing being able to get and set embeddings, not attributes common to all models
* Explicitly skip
2024-06-07 19:40:26 +01:00
BHUVAN M
3b9174f248
interpolation added for TVP. ( #30863 )
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* docstring and argument fix
* doc fixes and test case fix suggested in review.
* variable typo fix
* styling and name fixes for padding interpolation flag.
2024-06-07 18:44:16 +01:00
Matt
065729a692
Remove ConversationalPipeline and Conversation object ( #31165 )
* Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal
* Update not-doctested.txt
* Fix JA and ZH docs
* Fix JA and ZH docs some more
* Fix JA and ZH docs some more
2024-06-07 17:50:18 +01:00
조준래
60861fe1fd
Implement JSON dump conversion for torch_dtype in TrainingArguments ( #31224 )
* Implement JSON dump conversion for torch_dtype in TrainingArguments
* Add unit test for converting torch_dtype in TrainingArguments to JSON
* move unit test for converting torch_dtype into TrainerIntegrationTest class
* reformatting using ruff
* convert dict_torch_dtype_to_str to private method _dict_torch_dtype_to_str
---------
Co-authored-by: jun.4 <jun.4@kakaobrain.com>
2024-06-07 15:43:34 +01:00
Benjamin Badger
ff689f57aa
Extend save_pretrained to offloaded models ( #27412 )
* added hidden subset
* debugged hidden subset contrastive search
* added contrastive search compression
* debugged compressed contrastive search
* memory reduction for contrastive search
* debugged mem red
* added low memory option feature
* debugged mem optimization output stack
* debugged mem optimization output stack
* debugged low mem
* added low mem cache
* fixed 2047 tensor view
* debugged 2042 past key val inputs
* reformatted tensors
* changed low mem output
* final clean
* removed subset hidden csearch
* fixed hidden device
* fixed hidden device
* changed compressor dtype
* removed hstate compression
* integrated csearch in generate
* test csearch integration into generation
exit()
* fixed csearch kwarg integration with generation
* final wrap and added doc
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* added debug print
* direct hstate cat
* direct hstate cat
* direct hstate cat debug
* direct hstate cat debug
* expanded full hidden state stack
* expanded full hidden state stack
* matched dims for hstates
* matched dims for hstates
* logits fix
* equality test
* equality hidden debug
* debug
* added prints for debug
* added prints for debug
* equality check
* switched squeeze dim
* input format debug
* tracing top_k_ids
* removed trace
* added test context
* added jitter
* added jitter
* added jitter
* returned state
* rebuilt past key value reconstruction
* debugged
* cleaned traces
* added selection for pkv
* changed output to dict
* cleaned
* cleaned
* cleaned up contrastive search test
* moved low_memory kwarg
* debugged
* changed low mem test batch size to 1
* removed output
* debugged test input shape
* reformatted csearch test
* added trace
* removed unsqueeze on final forward pass
* replaced unsqueeze with view
* removed traces
* cleaned
* debugged model kwargs
* removed special models from test
* ran make quality
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* refactored
* refactored
* refactored
* make fixup
* renamed flag sequential
* renamed flag sequential
* iterative onloading
* black style and test utils
* added traces for integrated test
* debugged
* added traces
* make style
* removed traces, make style
* included suggestions and added test
* debugged test
* added offload module check and make style
* is_accelerate_available and make style
* added test decorator
* changed test model and config spec
* added offload condition
* added lazy loading for each shard
* debugged
* modified sharding
* debugged
* added traces
* removed safe serialization
* no index overload;
* trace on safe save ptrs
* added ptr condition
* debugged
* debugged ptr
* moved module map init
* remake shard only for offloaded modules
* refactored
* debugged
* refactored
* debugged
* cleaned and make style
* cleaned and make style
* added trace
* sparse module map
* debugged
* removed module map conditional
* refactored
* debug
* debugged
* added traces
* added shard mem trace
* added shard mem trace
* removed underlying storage check
* refactored
* memory leak removal and make style
* cleaned
* swapped test decs and make style
* added mem checks and make style
* added free mem warning
* implemented some suggestions
* moved onloading to accelerate
* refactored for accelerate integration
* cleaned test
* make style
* debugged offload map name
* cleaned and make style
* replaced meta device check for sharding
* cleaned and make style
* implemented some suggestions
* more suggestions
* update warning
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* more suggestions
* make style
* new make style
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-07 07:50:35 -04:00
Cyril Vallez
8bcf9c8dd4
Fix jetmoe model ( #31279 )
* Fix jetmoe model
* Remove skip-tests
2024-06-07 11:51:41 +02:00
amyeroberts
bdf36dcd48
Enable HF pretrained backbones ( #31145 )
* Enable loading HF or timm backbone checkpoints
* Fix up
* Fix test - pass in proper out_indices
* Update docs
* Fix tvp tests
* Fix doc examples
* Fix doc examples
* Try to resolve DPT backbone param init
* Don't conditionally set to None
* Add condition based on whether backbone is defined
* Address review comments
2024-06-06 22:02:38 +01:00
Vu Huy Nguyen
f9296249a3
Pipeline VQA: Add support for list of images and questions as pipeline input ( #31217 )
* Add list check for image and question
* Handle passing two lists and update docstring
* Add tests
* Add support for dataset
* Add test for dataset as input
* fixup
* fix unprotected import
* fix unprotected import
* fix import again
* fix param type
2024-06-06 14:50:45 +01:00
amyeroberts
c53fcd8381
Mark MobileNetV1ModelTest::test_batching_equivalence as flaky ( #31258 )
* Mark MobileNetV1ModelTest::test_batching_equivalence as flaky
* Add link to issue
* woops
2024-06-06 14:47:58 +01:00
Omar Salman
681183974a
Enable dynamic resolution input for Beit ( #31053 )
* Initial attempt
* Updates: PR suggestions
* Interpolate the relative position bias when interpolate_pos_encoding is True
* Add slow tag for the added tests
* Add in DATA2VEC_VISION_INPUTS_DOCSTRING
2024-06-06 14:47:41 +01:00
Marc Sun
99895ae5e2
fix accelerate tests for roberta xl ( #31288 )
* fix accelerate tests for roberta xl
* style
2024-06-06 14:44:35 +01:00
Raushan Turganbay
5fabd1e83b
Generation: fix handling of special tokens ( #31254 )
* fix special tokens in generation
* fix test
* add warning
* fix the check
* warn once
* fix
2024-06-06 15:21:32 +05:00
Raushan Turganbay
7729b77478
Make mamba use cache ( #31116 )
* make mamba use cache
* use cache naming as in mamba
* fix musicgen
2024-06-06 13:37:29 +05:00
amyeroberts
940fde8daf
Skip failing JetMOE generation tests ( #31266 )
Skip failing tests for now
2024-06-05 19:06:46 +01:00
bastrob
464d986b6c
Add missing Flaubert tokenizer tests ( #30492 )
* add flaubert tokenization test, enrich inheritance in FlaubertTokenizer.
* fix quality code ci
* ensure parameter consistency
* fix ci
* fix copyright year and flatten vocab list.
* fix style
2024-06-05 13:52:16 +02:00
Yih-Dar
fd3238b4b0
Fix MistralIntegrationTest ( #31231 )
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-04 18:04:08 +02:00
amyeroberts
4ba66fdb4c
Fix pipeline tests - torch imports ( #31227 )
* Fix pipeline tests - torch imports
* Framework-dependent float conversion
2024-06-04 12:30:23 +01:00
Chujie Zheng
6b22a8f2d8
fix bf16 issue in text classification pipeline ( #30996 )
* fix logits dtype
* Add bf16/fp16 tests for text_classification pipeline
* Update test_pipelines_text_classification.py
* fix
* fix
2024-06-04 11:20:48 +01:00
Kristen Pereira
de460e28e1
Add dynamic resolution input/interpolate position embedding to deit ( #31131 )
* Added interpolate pos encoding feature and test to deit
* Added interpolate pos encoding feature and test for deit TF model
* re-added accidentally deleted test for multi_gpu
* storing only patch_size instead of entire config and removed commented code
* Update modeling_tf_deit.py to remove extra line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-04 10:29:01 +01:00
Raushan Turganbay
d64e4da713
Video-LLaVa: handle any number of frames ( #31221 )
video-llava can handle more frames
2024-06-04 14:20:03 +05:00
DomHudson
e83cf58145
Fix sentence fragment within test comments ( #31218 )
2024-06-04 10:09:24 +01:00
Raushan Turganbay
83238eeebc
Pass device in Logits Processor's init ( #29804 )
* add device in logits processor
* remove device when not needed
* codestyle
* tests
* forgot `melody` version
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* codestyle
* updates
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-06-04 10:19:19 +05:00
Yih-Dar
8a1a23ae4d
Fix GPU OOM for mistral.py::Mask4DTestHard ( #31212 )
* build
* build
* build
* build
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-03 19:25:15 +02:00
Sangbum Daniel Choi
874ac129bb
fix the get_size_with_aspect_ratio in max_size situation ( #30902 )
* fix the get_size_with_aspect_ratio in max_size situation
* make fix-up
* add more general solution
* consider when max_size is not defined
* fix typo
* fix typo
* simple fix
* fix error
* fix if else error
* fix error of size overwrite
* fix yolos image processing
* fix detr image processing
* make
* add longest related test script
* Update src/transformers/models/yolos/image_processing_yolos.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more test
* add test script about longest size
* remove deprecated
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-03 16:12:08 +01:00
Isotr0py
e4628434d8
Add Qwen2 GGUF loading support ( #31175 )
* add qwen2 gguf support
* Update docs
* fix qwen2 tokenizer
* add qwen2 gguf test
* fix typo in qwen2 gguf test
* format code
* Remove mistral, clarify the error message
* format code
* add typing and update docstring
2024-06-03 14:55:10 +01:00
Yih-Dar
df848acc5d
Fix test_compile_static_cache ( #30991 )
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-03 15:16:28 +02:00
fxmarty
221aaec6ec
Ignore non-causal mask in more cases with SDPA ( #30138 )
* update non-causal mask for sdpa
* add test
* update docstrings
* add one more test
* fix cross attention bug
* gentler atol/rtol
2024-06-03 19:08:41 +08:00
Ahmed Moubtahij
39b2ff69d6
Token healing ( #30081 )
* token healing impl + trie with extensions
* make fixup
* prefix-robust space tokenization
* examples readme and requirements
* make fixup
* allow input prompt and model
* redundant defaults
* Specialized Trie
* make fixup
* updated tests with new inherited Tree
* input ids to auto device_map
* rm unused import
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* naming convention
* Revert "naming convention"
This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0.
* naming convention
* last -hopefully- changes
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-06-03 10:53:15 +02:00
Aymeric Roucher
9837a25481
Add streaming, various fixes ( #30838 )
* Implement streaming run in ReAct agents
* Allow additional imports in code agents
* Python interpreter: support classes and exceptions, fixes
2024-05-31 14:16:23 +02:00
Marc Sun
48cada87c3
Fix quantized cache output ( #31143 )
2024-05-31 12:08:55 +02:00
zspo
cda9c82a63
fix get_scheduler when name is warmup_stable_decay ( #31128 )
fix get_scheduler args
2024-05-30 15:25:43 +01:00
Younes Belkada
5e5c4d629d
FIX / Quantization: Add extra validation for bnb config ( #31135 )
add validation for bnb config
2024-05-30 11:45:03 +02:00
Dhruv Pai
5c88253556
Add on_optimizer_step to callback options ( #31095 )
* Modified test
* Added on_optimizer_step to callbacks
* Move callback after step is called
* Added on optimizer step callback
2024-05-29 16:20:59 +02:00
Lucain
c3044ec2f3
Use HF_HUB_OFFLINE + fix has_file in offline mode ( #31016 )
* Fix has_file in offline mode
* harmonize env variable for offline mode
* Switch to HF_HUB_OFFLINE
* fix test
* revert test_offline to test TRANSFORMERS_OFFLINE
* Add new offline test
* merge conflicts
* docs
2024-05-29 11:55:43 +01:00
amyeroberts
a564d10afe
Deprecate low use models ( #30781 )
* Deprecate models
- graphormer
- time_series_transformer
- xlm_prophetnet
- qdqbert
- nat
- ernie_m
- tvlt
- nezha
- mega
- jukebox
- vit_hybrid
- x_clip
- deta
- speech_to_text_2
- efficientformer
- realm
- gptsan_japanese
* Fix up
* Fix speech2text2 imports
* Make sure message isn't indented
* Fix docstrings
* Correctly map for deprecated models from model_type
* Uncomment out
* Add back time series transformer and x-clip
* Import fix and fix-up
* Fix up with updated ruff
2024-05-28 18:07:07 +01:00
Younes Belkada
3264be4114
TST: Fix instruct-blip tests ( #31088 )
* fix flan t5 tests
* better format
2024-05-28 18:29:11 +02:00
Yih-Dar
3af7bf30ad
skip test_multi_gpu_data_parallel_forward for vit and deit ( #31086 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-28 17:44:52 +02:00
Raushan Turganbay
779bc360ff
Watermark: fix tests ( #30961 )
...
* fix tests
* style
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-28 17:07:42 +05:00
Lysandre Debut
a3c7b59e31
Fix failing tokenizer tests ( #31083 )
* Fix failing tokenizer tests
* Use small tokenizer
* Fix remaining reference
2024-05-28 13:34:23 +02:00
Pavel Iakubovskii
98e2d48e9a
Fix OWLv2 post_process_object_detection for multiple images ( #31082 )
* Add test for multiple images
* [run slow] owlv2
* Fix box rescaling
* [run slow] owlv2
2024-05-28 12:06:06 +01:00
oOraph
936ab7bae5
fix from_pretrained in offline mode when model is preloaded in cache ( #31010 )
* Unit test to verify fix
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
* fix from_pretrained in offline mode when model is preloaded in cache
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
* minor: fmt
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
---------
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>
2024-05-28 11:56:05 +02:00
Yih-Dar
8e3b1fef97
Remove ninja from docker image build ( #31080 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-28 11:36:26 +02:00
Yih-Dar
9d35edbb30
skip test_model_parallelism for 2 model test classes ( #31067 )
...
skip
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-27 18:36:39 +02:00
Yoach Lacombe
d355741eca
Fix pad_to_max_length Whisper ( #30787 )
...
* fix pad_to_max_length Whisper
* add tests
* make style
2024-05-27 16:09:05 +02:00
Marc Sun
b84cd67526
Fix quanto tests ( #31062 )
...
fix quanto tests
2024-05-27 15:53:45 +02:00
Ita Zaporozhets
deba7655e6
Add split special tokens ( #30772 )
...
* seems like `split_special_tokens` is used here
* split special token
* add new line at end of file
* moving split special token test to common tests
* added assertions
* test
* fixup
* add co-author
* passing rest of args to gptsan_japanese, fixing tests
* removing direct comparison of fast and slow models
* adding test support for UDOP and LayoutXLM
* ruff fix
* re-add check if slow tokenizer
* modify test to handle bos tokens
* removing commented function
* trigger build
* applying review feedback - updated docstrings, var names, and simplified tests
* ruff fixes
* Update tests/test_tokenization_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* applying feedback, comments
* shutil temp directory fix
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
Co-authored-by: itazap <itazap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>
2024-05-24 08:38:58 -07:00
BHUVAN M
e5103a76cc
added interpolation for vitmae model in pytorch as well as tf. ( #30732 )
...
* added interpolation for vitmae model in pytorch as well as tf.
* Update modeling_vit_mae.py
irregular import fixed
* small changes and proper formatting
* changes suggested in review.
* modified decoder interpolate_func
* arguments and docstring fix
* Apply suggestions from code review
doc fixes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-24 16:20:09 +01:00
Younes Belkada
658b849aeb
Quantization / TST: Fix remaining quantization tests ( #31000 )
...
* Fix remaining quant tests
* Update test_quanto.py
2024-05-24 14:35:59 +02:00
Lucain
fd3c128040
Fix resume_download future warning ( #31007 )
...
* Fix resume_download future warning
* better like this
* Add regression test
2024-05-24 14:35:40 +02:00
Marc Sun
ae87f9797b
FIX / TST: Fix expected results on Mistral AWQ test ( #30971 )
...
fix awq mistral test
2024-05-24 14:06:31 +02:00
Fanli Lin
04c7c176d7
[tests] make test_model_parallelism device-agnostic ( #30844 )
...
* enable on xpu
* fix style
* add comment and mps
2024-05-24 11:51:51 +01:00
Yixiang Gao
42d8dd8716
Perceiver interpolate position embedding ( #30979 )
...
* add test that currently fails
* test passed
* all perceiver passed
* fixup, style, quality, repo-consistency, all passed
* Apply suggestions from code review: default to False + compute sqrt once only
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix a minor bracket
* replace dim with self._num_channels
* add arguments to the rest of the preprocessors
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-24 11:13:58 +01:00
Ita Zaporozhets
7f6e87413f
add prefix space ignored in llama #29625 ( #30964 )
...
* add prefix space ignored in llama #29625
* adding test with add_prefix_space=False
* ruff
---------
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
2024-05-24 01:03:00 -07:00
Yasmin Moslem
6d3d5b1039
Remove deprecated properties in tokenization_nllb.py and tokenization_nllb_fast.py ( #29834 )
...
* Fix typo in tokenization_nllb.py
Change `adder_tokens_decoder` into `added_tokens_decoder` and improve the warning's readability.
* Fix typo in tokenization_nllb_fast.py
Change `adder_tokens_decoder` into `added_tokens_decoder` and improve the warning's readability.
* Remove deprecated attributes in tokenization_nllb.py
Remove deprecated attributes: `lang_code_to_id`, `fairseq_tokens_to_ids`, `id_to_lang_code`, and `fairseq_ids_to_tokens`
* Remove deprecated attribute in tokenization_nllb_fast.py
Remove deprecated attribute `lang_code_to_id`
* Remove deprecated properties in tokenization_nllb.py
Remove deprecated properties - fix format
* Remove deprecated properties in tokenization_nllb_fast.py
Remove deprecated properties - fix format
* Update test_tokenization_nllb.py
* update test_tokenization_nllb.py
* Update tokenization_nllb.py
* Update test_tokenization_seamless_m4t.py
* Update test_tokenization_seamless_m4t.py
2024-05-23 18:53:26 +02:00
Aritra Roy Gosthipaty
965e98dc54
[Port] TensorFlow implementation of Mistral ( #29708 )
...
* chore: initial commit
* chore: adding imports and inits
* chore: adding the causal and classification code
* chore: adding names to the layers
* chore: using single self attn layer
* chore: built the model and layers
* chore: start with testing
* chore: docstring change, transpose fix
* fix: rotary embedding
* chore: adding cache implementation
* remove unused torch
* chore: fixing the indexing issue
* make fix-copies
* Use modeling_tf_utils.keras
* make fixup
* chore: fixing tests
* chore: adding past key value logic
* chore: adding multi label classification test
* fix: switching on the built parameters in the layers
* fixing repo consistency
* ruff formats
* style changes
* fix: tf and pt equivalence
* removing returns from docstrings
* fix docstrings
* fix docstrings
* removing todos
* fix copies
* fix docstring
* fix docstring
* chore: using easier rotate_half
* adding integration tests
* chore: addressing review related to rotary embedding layer
* review changes
* [run-slow] mistral
* skip: test save load after resize token embedding
* style
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
2024-05-23 17:48:49 +01:00
Yih-Dar
2a89673fe5
Update 4 MptIntegrationTests expected outputs ( #30989 )
...
* fix
* fix
* fix
* fix
* fix
* [run-slow] mpt
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-23 18:27:54 +02:00
Fanli Lin
21339a5213
[tests] add torch.use_deterministic_algorithms for XPU ( #30774 )
...
* add xpu check
* add marker
* add documentation
* update doc
* fix ci
* remove from global init
* fix
2024-05-23 16:53:07 +01:00
Marc Sun
8366b57241
Fix accelerate failing tests ( #30836 )
...
* Fix accelerate tests
* fix clip
* skip dbrx tests
* fix GPTSan
* fix M2M100Model
* same fix as jamba
* fix mt5
* Fix T5Model
* Fix umt5 model
* fix switch_transformers
* fix whisper
* fix gptsan again
* fix siglip recent test
* skip siglip tests
* wrong place fixed
2024-05-23 17:18:58 +02:00
Poedator
6739e1d261
test_custom_4d_attention_mask skip with sliding window attn ( #30833 )
2024-05-23 15:22:10 +02:00
Raushan Turganbay
d583f1317b
Quantized KV Cache ( #30483 )
...
* clean-up
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* more suggestions
* mapping if torch available
* run tests & add 'support_quantized' flag
* fix jamba test
* revert, will be fixed by another PR
* codestyle
* HQQ and versatile cache classes
* final update
* typo
* make tests happy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-23 17:25:20 +05:00
Kamil Akesbi
eb1a77bbb0
Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size ( #30637 )
...
* fix input to generate in pipeline
* fixup
* pass input_features to generate with assistant
* error if model and assistant with different enc size
* fix
* apply review suggestions
* use self.config.is_encoder_decoder
* pass inputs to generate directly
* add slow tests
* Update src/transformers/generation/utils.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* apply review
* Update src/transformers/generation/utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply code review
* update attributes encoder_xyz to check
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add slow test
* solve conflicts
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-05-23 09:59:38 +01:00
Pablo Montalvo
a25f7d3c12
Paligemma causal attention mask ( #30967 )
...
* PaliGemma working causal attention
* Formatting
* Style
* Docstrings + remove commented code
* Update docstring for PaliGemma Config
* PaliGemma - add separator ind to model/labels
* Refactor + docstring paligemma processor method
* Style
* return token type ids when tokenizing labels
* use token type ids when building causal mask
* add token type ids to tester
* remove separator from config
* fix style
* don't ignore separator
* add processor documentation
* simplify tokenization
* fix causal mask
* style
* fix label propagation, revert suffix naming
* fix style
* fix labels tokenization
* [run-slow]paligemma
* add eos if suffixes are present
* [run-slow]paligemma
* [run-slow]paligemma
* add missing tokens to fast version
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix style
* [run-slow]paligemma
---------
Co-authored-by: Peter Robicheaux <peter@roboflow.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-22 19:37:15 +02:00
Sanchit Gandhi
0948c827de
[Whisper] Strip prompt before finding common subsequence ( #27836 )
2024-05-22 17:25:47 +01:00
Raushan Turganbay
b1065aa08a
Generation: get special tokens from model config ( #30899 )
...
* fix
* let's do this way?
* codestyle
* update
* add tests
2024-05-22 18:15:41 +02:00
amyeroberts
dff54ad2d9
🚨 out_indices always a list ( #30941 )
...
* out_indices always a list
* Update src/transformers/utils/backbone_utils.py
* Update src/transformers/utils/backbone_utils.py
* Move type casting
* nit
2024-05-22 15:23:04 +01:00
Pablo Montalvo
250ae9f746
Paligemma - fix slow tests, add bf16 and f16 slow tests ( #30851 )
...
* fix slow tests, add bf16 and f16 slow tests
* few fixes
* [run-slow]paligemma
* add gate decorator
* [run-slow]paligemma
* add missing gating
* [run-slow]paligemma
* [run-slow]paligemma
2024-05-22 16:20:07 +02:00
Jonatan Kłosko
1518508467
Avoid extra chunk in speech recognition ( #29539 )
2024-05-22 14:07:51 +01:00
Marc Sun
5c186003b8
Fix low cpu mem usage tests ( #30808 )
...
* Fix tests
* fix udop failing test
* remove skip
* style
2024-05-22 14:09:01 +02:00
Arthur
673440d073
update ruff version ( #30932 )
...
* update ruff version
* fix research projects
* Empty
* Fix errors
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-05-22 06:40:15 +02:00
Matthew Beckers
3b09d3f05f
fix: center_crop occasionally outputs off-by-one dimension matrix ( #30934 )
...
If required padding for a crop larger than input image is odd-numbered,
the padding would be rounded down instead of rounded up, causing the
output dimension to be one smaller than it should be.
2024-05-21 13:56:52 +01:00
Zach Mueller
daf281f44f
Enforce saving at end of training if saving option chosen ( #30160 )
...
* Enforce saving at end of training
* Fix test
* Rework test
* Fixup tests
* Update comment based on sourab feedback
* Clean
2024-05-21 07:50:11 -04:00
Mohit Sharma
7a4792e6b3
CI: AMD MI300 tests fix ( #30797 )
...
* add fix
* update import
* updated dicts and comments
* remove prints
* Update testing_utils.py
2024-05-21 12:46:07 +01:00
Younes Belkada
8871b26150
FEAT / Trainer: LOMO optimizer support ( #30178 )
...
* add V1 - adalomo not working yet
* add todo docs + refactor from comments
* adjust LR
* add docs
* add more elaborated test
* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix
* push
* add accelerate check
* fix DDP case
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* init kwargs
* safely add attribute
* revert to enum logic
* Update src/transformers/trainer.py
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-21 10:16:37 +02:00
Younes Belkada
c876d12127
FIX / TST: Fix expected results on Mistral slow test (A10) ( #30909 )
...
Update test_modeling_mistral.py
2024-05-21 09:14:14 +02:00
Longjie Zheng
616bb11d48
Add torch.compile for Mistral ( #30642 )
...
* first version
* fix sliding window
* fix style
* add sliding window cache
* fix style
* address comments
* fix test
* fix style
* move sliding window check inside cache init
* revert changes on irrelevant files & add comment on SlidingWindowCache
* address comments & fix style
fix style
* update causal mask
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] llama
* [run-slow] mistral
* [run-slow] mistral
* [run-slow] mistral
* revert CI from a10 to t4
* wrap up
2024-05-20 16:27:24 +02:00
Zach Mueller
92d1d97c05
Introduce configured_state arg for accelerator_config ( #29781 )
...
* Introduce configured_state
* Include note on tuning
* Allow for users to have defined a state already
* Include tests
* Add note on hpam tune
* Guard a bit better
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Finish rebase
* Finish rebase
* Guard carefully
* Fixup test
* Refactor
* Fin refactor
* Comment
* Update wrt feedback
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-20 09:21:40 -04:00
Yoach Lacombe
e6708709cb
Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM ( #28706 )
...
* Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM
* update with a type filter
* add raises error test
* fix added test
2024-05-20 13:40:42 +02:00
Hafedh
c11ac7857b
fix for custom pipeline configuration ( #29004 )
...
* fix for custom pipeline configuration
* fix for custom pipelines
* remove extra exception
* added test for custom pipelines extra tag
* format with ruff
* limit extra tag for first time only
* format with ruff
* improve tests for custom pipelines
2024-05-20 11:38:32 +02:00
Kamil Akesbi
1c2bb3ac54
add return_token_timestamps to WhisperProcessor ( #30812 )
...
* compute num_frames in WhisperFeatureExtractor
* add return_num_frames in WhisperFeatureProcessor + adapt pipeline
* return_timestamps renaming + pipeline fix
* fix
* fix
* fix
* add tests
* Update src/transformers/models/whisper/feature_extraction_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* apply review changes
* fix
* Update src/transformers/models/whisper/feature_extraction_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* apply review
* fix
* review changes
* Update src/transformers/models/whisper/feature_extraction_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* make style quality
* EXPECTED_OUTPUT in single line
* small numpy->torch fix
* fix
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-20 09:53:58 +01:00
Raushan Turganbay
5d0bf59b4d
LLaVa-Next: Update docs with batched inference ( #30857 )
...
* update docs with batch ex
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* accept nested list of img
---------
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2024-05-20 13:45:56 +05:00
Benjamin Warner
cd6bd0af34
Add support for torch.compile dynamic shapes ( #30560 )
...
* add torch.compile dynamic support
* Add SDPA dynamic shapes compile test & improve SDPA comment
* comment consistency
2024-05-20 10:36:57 +02:00
Joseph Enguehard
07bf2dff78
Add TokenClassification for Mistral, Mixtral and Qwen2 ( #29878 )
...
* Add MistralForTokenClassification
* Add tests and docs
* Add token classification for Mixtral and Qwen2
* Save llama for token classification draft
* Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
* Formatting
* Add token classification support for Qwen2Moe model
* Add dropout layer to each ForTokenClassification model
* Add copied from in tests
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Propagate suggested changes
* Style
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-20 10:06:57 +02:00
Abhiroop Tejomay
481a957814
Enable dynamic resolution input for Swin Transformer and variants ( #30656 )
...
* add interpolation of positional encoding support to swin
* add style changes
* use default image processor and make size a dictionary
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove logits testing
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor image size validation logic when interpolation is disabled
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove asserts in modeling
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add dynamic resolution input support to swinv2
* change size to ensure interpolation encoding path is triggered
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
* add dynamic resolution input to donut swin
* add dynamic resolution input to maskformer swin
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-17 18:38:46 +01:00
Pavel Iakubovskii
bf646fbf2d
Add fixed resize and pad strategy for object detection ( #30742 )
...
* Add resize and pad strategy
* Merge get_size functions
* Add pad_size + tests to object detection models
* Fixup
* Update docstrings
* Fixup
2024-05-17 16:21:26 +01:00
Arthur
0a9300f474
Support arbitrary processor ( #30875 )
...
* Support arbitrary processor
* fix
* nit
* update
* nit
* nit
* fix and revert
* add a small test
* better check
* fixup
* bug so let's just use class for now
* oups
* .
2024-05-17 16:51:31 +02:00
Younes Belkada
3d7d3a87a0
TEST: Add llama logits tests ( #30835 )
...
* add llama logits test
* fix
* fix tests
* fix for a10
* format
* format
* fix
* [run-slow] remove fmt: skip
* Your commit message
* test commit
* Revert "test commit"
This reverts commit b66e01e55f.
* [run-slow]llama
* Update tests/models/llama/test_modeling_llama.py
* [run-slow]llama
* empty commit
2024-05-17 12:23:00 +02:00
Yih-Dar
1b3dba9417
Make Gemma work with torch.compile ( #30775 )
...
* fix
* [run-slow] gemma
* add test
* add `test_compile_static_cache`
* fix
* style
* remove subprocess
* use attribute
* fix
* style
* update
* [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-16 13:41:33 +02:00
Joao Gante
9d889f870e
Cache: add new flag to distinguish models that Cache but not static cache ( #30800 )
...
* jamba cache
* new flag
* generate exception
2024-05-16 12:08:35 +01:00
hyenal
1c21f48a50
add sdpa to ViT [follow up of #29325 ] ( #30555 )
...
remove blank line (+1 squashed commit)
Squashed commits:
[24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits)
Squashed commits:
[08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder
[ec96a8db3] [run-slow]vit_msn
[ead817eca] fix vit msn multi gpu
[d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[3fdbfa88f] doc
[a3ff33e4a] finish implementation
[e20b7b7fb] Update test_modeling_common.py
[e290c5810] Update test_modeling_flax_common.py
[d3af86f46] comment
[ff7dd32d8] more comments
[59b137889] suggestion
[7e2ba6d67] attn_implementation as attribute of the class
[fe66ab71f] minor
[38642b568] Apply suggestions from code review
Accept comments
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[22cde7d52] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[48e137cc6] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[99f4c679f] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[61f00ebb0] all tests are passing locally
[e9e0b82b7] vision encoder/decoder
[4d5076b56] test-vision (+20 squashed commits)
Squashed commits:
[d1add8db9] yolo
[9fde65716] fix flax
[986566c28] minor
[ca2f21d1f] vit
[3333efd7a] easy models change
[ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[48ecc7e26] all tests are passing locally
[bff7fc366] minor
[62f88306f] fix yolo and text_encoder tests
[121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[cffaa10dd] fix-copies
[ef6c511c4] test vit hybrid
[7d4ba8644] vit hybrid
[66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1fcc0a031] fixes
[cfde6eb21] fixup
[e77df1ed3] all except yolo and encoder decoder (+17 squashed commits)
Squashed commits:
[602913e22] vit + vit_mae are working
[547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/ passes
[61a97dfa9] it's the complete opposite...
[aefab37d4] fix more tests
[71802a1b9] fix all torch tests
[40b12eb58] encoder - decoder tests
[941552b69] slow decorator where appropriate
[14d055d80] has_attentions to yolo and msn
[3381fa19f] add correct name
[e261316a7] repo consistency
[31c6d0c08] fixup
[9d214276c] minor fix
[11ed2e1b7] chore
[eca6644c4] add sdpa to vit-based models
[cffbf390b] make fix-copies result
[6468319b0] fix style
[d324cd02a] add sdpa for vit
Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com>
2024-05-16 10:56:11 +01:00
Edoardo Cetin
4b3eb19fa7
Fix llama model sdpa attention forward function masking bug when output_attentions=True ( #30652 )
...
* Fix llama model forward function with attention=True, same-length encoded sequence.
* Fix style
* propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama)
* Fix style
* ignore unnecessary sdpa mask converter when output_attentions=True
* add tests checking sdpa and eager outputs match when output_attentions=True
* Split if statements in two lines
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix formatting
* Add fix to new jetmoe model
* Add missing output_attentions argument to jetmoe mask creation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-15 19:48:19 +02:00
Younes Belkada
3f435823e0
FEAT / Bitsandbytes: Add dequantize API for bitsandbytes quantized models ( #30806 )
...
* add method
* change method name
* more comments
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fixup
* add docstrings and fix comment
* warn users on the de-quantized dtype
* Update src/transformers/quantizers/base.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/bitsandbytes.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* final suggestion - use private method
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 17:17:09 +02:00
Xuan-Phi Nguyen
5ca085b882
Better llava next. ( #29850 )
...
* Better llava next.
- Batched forward with multiple images of different sizes (number of patches).
- Support training, including cases without any image.
- Support multiple images in the same sequence, e.g.: ["<image> <image> the first image is a dog while the second is a cat", "<image> <image> <image> <image> these 4 image are..."]
Current limitations:
- Haven't done testing
- Only supports right padding (for training)
- Left padding (batched generation) is not ready yet.
- PR not ready.
* fix bugs in batched generation
* add tests
* fix batch-gen bugs, left-padding positions and incorrect attention mask
* remove better modeling llava
* fix formatting
* fix test
* fix test
* fix testing
* fix test
* fix formatting
* Update src/transformers/models/llava_next/modeling_llava_next.py
add clarity
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update modeling_llava_next.py
remove assert
* fix bug modeling_llava_next.py
* update modeling
* fix bugs
* fix format
* fix error
* fix new_token_positions
* Update modeling_llava_next.py
* update formatting
* add args
* remove comments
* add slow tests for batched inference
* failing tf/flax tests
* this one is correct
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix docs
* make fixup
* more fixup
* add test for batch equivalence
* Update tests/models/llava_next/test_modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* pr comments
* hardcode padding side for bs=1
* update
* [run-slow] llava_next
* [run-slow] llava_next
* make fix-copies
---------
Co-authored-by: NGUYEN, Xuan Phi <x.nguyen@alibaba-inc.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2024-05-15 19:02:56 +05:00
Sourab Mangrulkar
bdfefbadaf
Update ds_config_zero3.json ( #30829 )
2024-05-15 10:02:31 -04:00
amyeroberts
64c06df325
Jamba - Skip 4d custom attention mask test ( #30826 )
...
* Jamba - Skip 4d custom attention mask test
* Skip assistant greedy test
2024-05-15 13:57:28 +01:00
Lysandre Debut
a42844955f
Loading GGUF files support ( #30391 )
...
* Adds support for loading GGUF files
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
* add q2_k q3_k q5_k support from @99991
* fix tests
* Update doc
* Style
* Docs
* fix CI
* Update docs/source/en/gguf.md
* Update docs/source/en/gguf.md
* Compute merges
* change logic
* add comment for clarity
* add comment for clarity
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change logic
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/modeling_gguf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* put back comment
* add comment about mistral
* comments and added tests
* fix inconsistent type
* more
* fix tokenizer
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address comments about tests and tokenizer + add added_tokens
* from_gguf -> gguf_file
* replace on docs too
---------
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 14:28:20 +02:00
Raushan Turganbay
bd9f4d7951
Add Video Llava ( #29733 )
* add model draft
* update docstring
* add tests
* support image and video as input
* update for better handling of mixed input and clean-up a bit
* bug when mixed inputs & add tests
* Update README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Merge remote-tracking branch 'upstream/main' into video_llava
* link to abstract of paper in README
* fix test
* fix-copies
* make tests happy
* skip doctest for now
* do not run doctest for now
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address review comments
* failing tests
* Fix vocab_size in common tests for VLMs
* codestyle
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* PR suggestions
* fix-copies
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add full example in docs
* clean-up with new model-id
* [run-slow] video_llava
* update docstring
* [run-slow] video_llava
* remove all archive maps
* fix some tests
* test was supposed to be skipped for llava :)
---------
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 16:42:29 +05:00
Ondřej Cífka
be3aa43e5f
Support mixed-language batches in WhisperGenerationMixin ( #29688 )
* Add support for mixing languages in a single batch
* Update docstring
* Enable different detected languages in batch
* Do not require input_features
* Test list of languages
* Fix comment
* Make init_tokens length-1 if possible, broadcast at the end
* Test for ValueError with language list of incorrect length
* Slow test for batched multilingual transcription
* fixup
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Address review, refactor
* Second attempt to move this line where it was originally
* Split test, fix a bug
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-05-15 09:53:17 +02:00
Pablo Montalvo
1360801a69
Add PaliGemma ( #30814 )
* add new model like
* add state dict slicing + new model config
* update palma config and weights, passes vision activations
* fix
* update
* reorder loading/unpacking
* clean up
* add debug statements
* change device
* fix
* debugging
* fix noncausal mask
* fixup sdpa + causal mask
* fix activation function
* remove debug before changing modeling file
* add variants
* debug attention mask in generate
* revert to non-debug sdpa
* revert gemma modifications
* add custom language modeling
* use Processor
* add language modeling file to init
* try thin wrapper around generate
* Update
* update mask
* breakpoints galore
* remove conflict
* switch to left-padding
* add incomplete model doc
* add paligemma global files
* batch rename paligemma
* make generation match outputs and captioning
* style
* style
* remove copied from + doc
* remove more copied from
* remove copy from projector
* minor fix
* update config and style
* add readme - dummy
* CORRECT image captioning
* moving to args
* add siglip proper + fix merging image + text features
* take update_causal_mask from upstream
* remove breakpoint
* leverage AutoModel
* fix input_ids slicing
* make siglip head conditional
* remove encoder_decoder value
* remove unneeded modeling file
* add commented 4d attention mask
* FIXED generation with 4D mask
* Update src/transformers/models/siglip/modeling_siglip.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix left padding detection
* shuffle order of verifications
* fix missing labels for training
* fix
* vectorize merging of features, improve slicing
* improve testing before conversion
* handle merging in processor
* image token index depends on checkpoint
* add variants, save processor too
* save processors, base tokenizer off spm file
* expand model embeddings due to additional image token
* pass image processing args
* add convert rgb to siglip processor
* add \n token separately
* fix tokenizer and prompts
* fix docstrings
* change to camel
* fix casing
* debug pos_ids and sdpa
* pass and use cache_position
* add flag for newline tokenization
* Update src/transformers/models/paligemma/processing_paligemma.py
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
* simplify conversion script
* add copied from
* add precision to conversion script
* Update src/transformers/models/paligemma/modeling_paligemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* clean up
* Shift attention mask from `1:`
After discussion with @molbap
* add docs, fix quality
* quality, tied weights inheritance, and logits/label alignment
* fix more tests
* pass attn_implementation to language model correctly
* add SiglipVisionTransformer to no split modules
* skip paligemma test for sdpa dispatch to flash
* skip incompatible tests
* quality
* [broken archive maps]
* Apply suggestions
- remove archive lists
- style
- take shape of inputs_embeds for batch
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/utils/dummy_pt_objects.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* simplify conversion script
* add suggestions
* add suggestions
* add copied from
* fix
* move labels out
* revert
* fix
* remove placeholder labels if None
* use cache_position
* fix quality + docstrings
* fix quality
* fix paligemma 4d gemma mask incompatibility
* fix config docstring
* fix query and attn_mask dtype
---------
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-05-14 22:07:15 +02:00
Yikang Shen
ccdabc5642
Add JetMoE model ( #30005 )
* init jetmoe code
* update archive maps
* remove flax import
* fix import error
* update README
* ruff fix
* update readme
* fix
* update config
* fix issue
* merge files
* fix model bug
* fix test
* auto fix
* model size
* add comments
* fix form
* add flash attention support
* fix attention head number
* fix init
* fix support list
* sort auto mapping
* fix test
* fix docs
* update test
* fix test
* fix test
* change variable name
* fix config
* fix init
* update format
* clean code
* fix config
* fix config
* change default config
* update config
* fix issues
* update format
* update config argument
* update format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* change to mixtral aux loss
* change to cache_position
* debug
* fix bugs
* debug
* fix format
* fix format
* fix copy
* fix format
* fix format
* fix sort
* fix sort
* fix sort
* add copy comment
* add copy from
* remove debug code
* revert readme update
* add copy
* debug
* remove debug code
* fix flash attention
* add comments
* clean code
* clean format
* fix format
* fix format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* change variable name
* add copied from
* fix variable name
* remove deprecated functions
* sync to llama implementation
* fix format
* fix copy
* fix format
* update format
* remove repr
* add comment for moe weight
* fix copy
* Update src/transformers/models/jetmoe/configuration_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add comments and reformat config
* fix format
* fix format
* fix format
* update test
* update doc string in config
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update config doc
* update attention cache
* fix format
* fix copy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-14 16:32:01 +02:00
Raushan Turganbay
5ad960f1f4
Add Watermarking LogitsProcessor and WatermarkDetector ( #29676 )
* add watermarking processor
* remove the other hashing (context width=1 always)
* make style
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* update watermarking process
* add detector
* update tests to use detector
* fix failing tests
* rename `input_seq`
* make style
* doc for processor
* minor fixes
* docs
* make quality
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add PR suggestions
* let's use lru_cache's default max size (128)
* import processor if torch available
* maybe like this
* let's move the config to a torch-independent file
* add docs
* tiny docs fix to make the test happy
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* PR suggestions
* add docs
* fix test
* fix docs
* address pr comments
* style
* Revert "style"
This reverts commit 7f33cc34ff.
* correct style
* make doctest green
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-05-14 13:31:39 +05:00
fxmarty
37bba2a32d
CI: update to ROCm 6.0.2 and test MI300 ( #30266 )
* update to ROCm 6.0.2 and test MI300
* add callers for mi300
* update dockerfile
* fix trainer tests
* remove apex
* style
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* update to torch 2.3
* add workflow dispatch target
* we may need branches: mi300-ci after all
* nit
* fix docker build
* nit
* add check runner
* remove docker-gpu
* fix issues
* fix
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-13 18:14:36 +02:00
Marc Sun
539ed75d50
skip low_cpu_mem_usage tests ( #30782 )
2024-05-13 18:00:43 +02:00
Alazar
94306352f4
Port IDEFICS to tensorflow ( #26870 )
* Initial commit
* Just a copy of modeling_idefics.py that will be ported to TF
* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)
* Add TF imports
* Add autotranslated files
* Add TF classes to model_tf_auto.py
* Add the TF classes in model_doc
* include auto-translated code
* Adapted from auto-translated version
* Add a forgotten super().build
* Add test code for TF version.
* Fix indentation and load pytorch weights for now
* Some fixes. Many tests are still failing but some are passing now.
- I have added TODO's for some of the hacks I made to unblock me
and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily
* Add ALL_LAYERNORM_LAYERS to match pytorch
* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.
* Fix freeze_relevant_params()
* Some more fixes
* Fix test_attention_outputs
* Add tf stuff to processing_idefics.py
processing_idefics.py supports both pytorch and tf now.
test_processor_idefics.py for pytorch is passing, so I didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.
* Pass return_tensors to image processing code and fix test
* Pass return_tensors to the image processor __init__
* Fix several test cases
- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
* Some more fixes forgotten in last commit
* Fix processing code and vision_tf.py
* Fix perceiver bug
* Import from
* Auto-add build() methods + style pass
* Fix build() errors due to `None` being passed as shape to some layers
* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
* Fix pytorch weights load for tf2
There were a lot of `name=` missing in weight initialization code.
* Attempt to fix CI
* Add back accidentally removed line
* Remove torch-specific stuff from the TF test file
* make fix-copies, make style, remove autotranslated files
* Fixes to imports/docstrings
* Let's try the from future import in desperation
* Fix the core random_attention_mask fn to match the torch/flax behaviour
* Clean random_attention_mask up correctly
* Remove torch-only test
* Fix loss shape, couple of nits
* make style
* Don't test for OOB embeddings because IDEFICS uses those deliberately
* Fix loss computation to handle masking
* Fix test failures when flattening
* Fix some test failures
- Add cross attention gate which was missing and wasn't being passed around
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs
* Add a proper stateless scaled_dot_product_attention
* make style
* Adding missing attribute from the PyTorch version
* Small cleanups to decoupledlinearlayer in case that helps
* Pass epsilon to LayerNormalization
* Attempt to fix pytorch weight cross-loading for TFIdeficsEmbedding
* Fix a bug in TFIdeficsGatedCrossAttentionLayer
* Patching up build() methods
* Constant self.inv_freq
* Constant self.inv_freq
* First working version
The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear
where the weights were mis-initialized (in_features, out_features)
when it should be: (out_features, in_features)
I have tested this so far with tiny-random and idefics-9b-instruct
and gives correct output.
I also dumped the final outputs for both pytorch and TF
and they are identical.
* Fix some test failures
* remove print statement
* Fix return_tensors
* Fix CI test failure check_code_quality
* Attempt to fix CI failures by running `make fixup`
The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test and make that file unreadable; they should probably be moved to a separate file.
* Attempt to fix tests_pr_documentation_tests
* Fix a test failure in test_image_processing_idefics.py
* Fix test test_pt_tf_model_equivalence
* Fix a few failures
* Tiny fix
* Some minor fixes
* Remove a duplicate test
* Override a few test failures for IDEFICS
- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing
* Fix processing_idefics.py after rebase
* Guard import keras with is_tf_available
* fix check code quality
* fix check code quality
* Minor fixes
* Skip test_save_load temporarily
This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.
* Run `ruff format tests src utils`
* Fix last failing test, `test_compile_tf_model`
* Add fixes for vision_tf.py
I forgot to add this file in last commit.
* Minor fixes
* Replace "<<<" with "<<" for doc tests
IDEFICS-9B is too big for doctest runner, so don't run it there
* Make code more readable
* Fix bug after code review
I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.
* Fix after code review
Use original code tokenizer.convert_tokens_to_ids
* Keep PyTorch as the default return_tensors
* Fixes to modeling_tf after code review
* Fixes from code review
- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver
* Run ruff
* Undo a change
* Refactor processing code after Matt's suggestion
* Remove TODO's that aren't needed anymore
* For pytorch, Use original pytorch processing code from main
Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This change undoes the pytorch processing
modifications I made and uses original code from main.
* Update tests/models/idefics/test_modeling_idefics.py
* Update tests/models/idefics/test_modeling_tf_idefics.py
* Add missing imports for is_pt_tf_cross_test
* [DO NOT MERGE]: This is a commit for debugging and will be reverted
The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.
* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
* Don't skip test_save_load
IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests
* Debugging commit, will be reverted
* Revert "Debugging commit, will be reverted"
This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
* Override `test_save_load` and push model to save
Maybe this will help me repro this weird bug
* pass my repo_id
* add endpoint
* Pass a temp (write) token just for this CI
* Undo last few commits, still pushing to hub for model debugging
The issue seems to be with save_pretrained(), when I looked at the model saved
from the CI test failure it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging
* Add logging to modeling tf utils, will be reverted just for debugging
* Debugging, will revert
* Revert "Debugging, will revert"
This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
* Revert "Add logging to modeling tf utils, will be reverted just for debugging"
This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
* Remove `test_save_load`
The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.
* Run make fix-copies
* Run ruff format tests src utils
* Debugging commit, will be reverted
* Run ruff, also trigger CI run
* Run ruff again
* Undo debugging commit
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-05-13 15:59:46 +01:00
Fanli Lin
69d9bca55a
enable Pipeline to get device from model ( #30534 )
* check model.device
* fix
* style fix
* move model device
* remove print
* add comment
* fix
* add unit test
* optimize
* change test names and add more cases
* Update tests/pipelines/test_pipelines_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-13 15:00:39 +01:00
Poedator
a0779b9e19
Llama: fix custom 4D masks, v2 ( #30348 )
* 4d mask fixes
* Update custom 4D mask logic
* test moved to mixin
* extra tests 4d mask
* upd 4d mask and StaticCache handling
* added Mask4DTestHard to mistral tests
* post-rebase fixes
* test fixes for StaticCache
* make fix-copies
* upd 1 after #30476
* fix common tests
* rm elif attention_mask.dim() == 4:
* tests combined, fixed, mixtral supported
* bigbird style chg reverted
* rm if attention_mask.dim() == 2
* modeling_llama formatting chg
---------
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-05-13 13:46:06 +02:00
Nilabhra Roy Chowdhury
e52741f601
Support for Falcon2-11B ( #30771 )
* remove unrelated changes
* remove unrelated changes on phi and stable LM
* add: Test for Falcon 10B
* fix: formatting
* fix: loading the falcon 10B in 8 bit precision using bitsandbytes.
* fix: device placement
* fix: broken tests.
* fix: backwards compatibility for falcon 1B architecture.
* chore: updated test.
* chore: test_modeling_falcon.py to use the 11B model.
* chore: minor edit
* chore: formatting.
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
2024-05-13 13:32:43 +02:00
Zafir Stojanovski
f63d822242
Blip dynamic input resolution ( #30722 )
* blip with interpolated pos encoding
* feat: Add interpolate_pos_encoding option to other models from `BLIP` family.
* include check for textual generated content in tests
2024-05-13 12:20:16 +01:00
Marc Sun
de6e0db184
[awq] replace scale when we have GELU ( #30074 )
* fix awq test
* style
* add log
* new fix
* style
* only modifying impacted model in the end
* rename function
2024-05-13 11:41:03 +02:00
Joao Gante
7130a22db9
Generate: consistently handle special tokens as tensors ( #30624 )
* tmp commit
* [test_all] mvp
* missing not
* [test_all] final test fixes
* fix musicgen_melody and rag
* [test_all] empty commit
* PR comments
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-09 18:01:57 +01:00
Jacky Lee
218f44135f
Fix image post-processing for OWLv2 ( #30686 )
* feat: add note about owlv2
* fix: post processing coordinates
* remove: workaround document
* fix: extra quotes
* update: owlv2 docstrings
* fix: copies check
* feat: add unit test for resize
* Update tests/models/owlv2/test_image_processor_owlv2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-09 17:02:03 +01:00
Joao Gante
df53c6e5d9
Generate: add min_p sampling ( #30639 )
* min_p
* more relaxed test to avoid numerical issues
* Update src/transformers/generation/logits_process.py
Co-authored-by: menhguin <minh1228@gmail.com>
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: menhguin <minh1228@gmail.com>
* docstring clarifications
* PR comments
* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* make fixup
---------
Co-authored-by: menhguin <minh1228@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-09 14:36:53 +01:00
Lysandre Debut
297b732bdf
Removal of deprecated maps ( #30576 )
* [test_all] Remove all imports
Remove remaining ARCHIVE MAPS
Remove remaining PRETRAINED maps
* review comments
* [test_all] empty commit to trigger tests
2024-05-09 14:15:56 +02:00