Yih-Dar
765732e92c
unpin numpy<2.0 ( #32018 )
...
* unpin np
* [test_all] trigger full CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-18 11:26:01 +02:00
Pavel Iakubovskii
1c37e8c1a6
Add sdpa and FA2 for CLIP ( #31940 )
...
* Squashed commit of the following:
commit 102842cd477219b9f9bcb23a0bca3a8b92bd732f
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Fri Jul 12 18:23:52 2024 +0000
Add model-specific sdpa tests
commit 60e4c88581abf89ec098da84ed8e92aa904c997d
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Fri Jul 12 18:20:53 2024 +0000
Add fallback to eager (expensive operation)
commit c29033d30e7ffde4327e8a15cbbc6bee37546f80
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date: Thu Jul 11 17:09:55 2024 +0000
Fix attn_implementation propagation
commit 783aed05f0f38cb2f99e758f81db6838ac55b9f8
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:05:27 2024 +0530
style
commit e77e703ca75d00447cda277eca6b886cd32bddc0
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:04:57 2024 +0530
add comment to explain why I had to touch forbidden codebase.
commit ab9d8849758e7773a31778ccba71588d18552623
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 09:03:02 2024 +0530
fix: flax attribute access.
commit c570fc0abf9d1bd58c291aae3c7e384f995996d2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 08:23:54 2024 +0530
fix tensorflow attribute name.
commit 32c812871cfdb268d8a6e3e2c61c5c925c8ed47e
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 07:57:10 2024 +0530
fix attribute access.
commit 4f41a0138b6c417aed9c9332278f8bcd979cb7c2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Sat May 25 07:44:02 2024 +0530
_from_config.
commit 35aed64ff602422adcf41d7f677a0a24bd9eccae
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 18:46:52 2024 +0530
propagation of attn_implementation.
commit 4c25c19845438b1dc1d35a5adf9436151c8c5940
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:24:36 2024 +0530
style again
commit 5f7dc5c5015c0f8116408f737e8c318d1802c80c
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:19:05 2024 +0530
use from_config.
commit b70c409956d0359fa6ae5372275d2a20ba7e3389
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 24 09:13:43 2024 +0530
quality
commit a7b63beff53d0fc754c6564e2a7b51731ddee49d
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 14:35:10 2024 +0200
add benchmark numbers
commit 455b0eaea50862b8458c8f422b60fe60ae40fdcb
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:50:16 2024 +0200
Revert "reflect feedback more"
This reverts commit dc123e71ef.
commit ca674829d28787349c2a9593a14e0f1d41f04ea4
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:50:05 2024 +0200
Revert "fix"
This reverts commit 37a1cb35b8.
commit fab2dd8576c099eb1a3464958cb206a664d28247
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:47:46 2024 +0200
fix
commit fbc6ae50fd6f2d36294d31e191761631b701d696
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 10 13:38:30 2024 +0200
reflect feedback more
commit 87245bb020b2d60a89afe318a951df0159404fc9
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 08:54:34 2024 +0530
fixes
commit 1057cc26390ee839251e7f8b3326c4207595fb23
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:49:03 2024 +0530
don't explicit set attn_implementation in tests
commit e33f75916fc8a99f516b1cf449dbbe9d3aabda81
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:43:54 2024 +0530
explicitly override attn_implementation in the towers.
commit 4cf41cb1bc885c39df7cb8f2a0694ebf23299235
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:38:42 2024 +0530
import in one-line.
commit f2cc447ae9e74ccfacb448140cdf88259d4afc8c
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri May 3 07:34:58 2024 +0530
move sdpa mention to usage tips.
commit 92884766c64dbb456926a3a84dd427be1349fa95
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 29 10:58:26 2024 +0530
fix: memory allocation problem.
commit d7ffbbfe12f7750b7d0a361420f35c13e0ea787d
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 29 09:56:59 2024 +0530
fix-copies
commit 8dfc3731cedd02e36acd3fe56bb2e6d61efd25d8
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Fri Apr 26 20:16:12 2024 +0530
address arthur's comments.
commit d2ed7b4ce4ff15ae9aa4d3d0500f1544e3dcd9e9
Author: Sayak Paul <spsayakpaul@gmail.com>
Date: Fri Apr 26 20:08:15 2024 +0530
Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
commit 46e04361f37ded5c522ff05e9f725b9f82dce40e
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:55:27 2024 +0530
add to docs.
commit 831629158ad40d34d8983f209afb2740ba041af2
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:33:10 2024 +0530
styling.
commit d263a119c77314250f4b4c8469caf42559197f22
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Wed Apr 24 09:15:20 2024 +0530
up
commit d44f9d3d7633d4c241a737a1bc317f791f6aedb3
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 18:40:42 2024 +0530
handle causal and attention mask
commit 122f1d60153df6666b634a94e38d073f3f260926
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 15:18:21 2024 +0530
test fixes.
commit 4382d8cff6fa1dee5dbcf0d06b3e2841231e36f5
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Tue Apr 23 09:39:25 2024 +0530
fix: scaling inside sdpa.
commit 0f629989efc48b7315cf19405a81e02955efe7e5
Author: Sayak Paul <spsayakpaul@gmail.com>
Date: Tue Apr 23 08:14:58 2024 +0530
Update src/transformers/models/clip/modeling_clip.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
commit 14367316877dc27ea40f767ad1aee38bbc97e4ce
Author: sayakpaul <spsayakpaul@gmail.com>
Date: Mon Apr 22 16:21:36 2024 +0530
add: sdpa support to clip.
* Remove fallback for empty attention mask (expensive operation)
* Fix typing in copies
* Add flash attention
* Add flash attention tests
* List CLIP in FA docs
* Fix embeddings attributes and tf
* [run-slow] clip
* Update clip documentation
* Remove commented code, skip compile dynamic for CLIPModel
* Fix doc
* Fix doc 2
* Remove double transpose
* Add torch version check for contiguous()
* Add comment to test mixin
* Fix copies
* Add comment for mask
* Update docs
* [run-slow] clip
2024-07-18 10:30:37 +05:30
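For context, a minimal usage sketch of the new attention backends (not taken from the PR itself; the checkpoint name and dtype are only examples):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Select the attention backend this PR adds: "sdpa", or "flash_attention_2"
# (the latter requires the flash-attn package and fp16/bf16 weights on GPU).
model = CLIPModel.from_pretrained(
    "openai/clip-vit-base-patch32",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
).to("cuda")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
```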
Robin Bakker
b31d595040
Add language to word timestamps for Whisper ( #31572 )
...
* add language to words
_collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information
* ran style checks
added missing comma
* add new language test
test that the pipeline can return both the language and timestamp
* remove model configuration in test
Removed model configurations that do not influence test results
* remove model configuration in test
Removed model configurations that do not influence test results
2024-07-17 21:32:53 +01:00
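A short sketch of what the new flag enables in the ASR pipeline (model id and audio file are placeholders):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
out = asr(
    "sample.flac",
    return_timestamps="word",
    return_language=True,  # with this change, each word-level chunk also reports its language
)
print(out["chunks"][0])  # e.g. {"text": ..., "timestamp": (...), "language": ...}
```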
Francesco Cariaggi
cb23d1b20b
Pass missing arguments to SeamlessM4Tv2ConformerEncoderLayer.forward() when gradient checkpointing is enabled ( #31945 )
...
* pass missing arguments when gradient checkpointing is enabled for SeamlessM4Tv2
* fix same bug in SeamlessM4Tv1
* pass args, not kwargs
2024-07-17 20:42:53 +01:00
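The underlying pattern, sketched generically with torch's checkpoint API (argument names are illustrative, not the actual SeamlessM4T signature): when gradient checkpointing is enabled, every argument the layer needs must be forwarded through the checkpoint wrapper, otherwise the layer silently runs with its defaults.

```python
import torch
from torch.utils.checkpoint import checkpoint

def layer_forward(hidden_states, attention_mask=None, scale=1.0):
    # stand-in for an encoder layer forward with optional extra arguments
    if attention_mask is not None:
        hidden_states = hidden_states + attention_mask
    return hidden_states * scale

x = torch.randn(2, 4, requires_grad=True)
mask = torch.zeros(2, 4)

# The bug pattern: checkpoint(layer_forward, x) drops attention_mask and scale,
# so the layer falls back to defaults. The fix forwards every argument:
out = checkpoint(layer_forward, x, mask, 2.0, use_reentrant=False)
out.sum().backward()
```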
Dmitry Rogozhkin
bc36c26fa6
doc: fix broken BEiT and DiNAT model links on Backbone page ( #32029 )
...
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-07-17 20:24:10 +01:00
Moses Hohman
63be8e6f39
Fix typo in classification function selection logic to improve code consistency ( #32031 )
...
Make problem_type condition consistent with num_labels condition
The latter condition generally overrides the former, so this is more of a code reading issue. I'm not sure the bug would ever actually get triggered under normal use.
2024-07-17 20:20:39 +01:00
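For reference, the selection logic in question follows the usual transformers classification-head pattern, paraphrased below as a standalone helper; the fix makes the problem_type branch test the same condition as the num_labels branch.

```python
import torch

def resolve_problem_type(num_labels: int, labels: torch.Tensor) -> str:
    # Paraphrase of the problem_type selection used in *ForSequenceClassification heads.
    if num_labels == 1:
        return "regression"
    elif num_labels > 1 and labels.dtype in (torch.long, torch.int):
        return "single_label_classification"
    else:
        return "multi_label_classification"

print(resolve_problem_type(3, torch.tensor([0, 2, 1])))  # single_label_classification
```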
Sai-Suraj-27
72fb02c47d
Fixed log messages that are resulting in TypeError due to too many arguments ( #32017 )
...
* Fixed log messages that are resulting in TypeErrors due to too many arguments.
* Removed unnecessary imports.
2024-07-17 10:56:44 +01:00
Pavel Iakubovskii
691586b0dc
Fix tests skip ( #32012 )
...
* [run-slow] clip
* [run-slow] clip
* Fix skip -> skipTest
* [run-slow] clip
2024-07-17 08:37:43 +01:00
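The unittest detail behind the fix, as a small sketch: inside a test method, self.skipTest() is the call that actually marks the test as skipped.

```python
import unittest

class ExampleTests(unittest.TestCase):
    def test_requires_feature(self):
        feature_available = False  # placeholder condition
        if not feature_available:
            self.skipTest("feature not available")  # correctly reported as skipped
        self.assertTrue(feature_available)

if __name__ == "__main__":
    unittest.main()
```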
Raushan Turganbay
24cfcc2114
Chameleon: add model ( #31534 )
...
* Chameleon model integration
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
* fix 7B, again. mask away image tokens
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove pretrained_config_map
* make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file
* remove tokenizer (use llama's); remove codechameleon tests
* a few copied from statements and minor changes
* copied from in ChameleonModel
* some copies in ChameleonForCausalLM
* a few more copies
* VQModel moved to ChameleonModel (as opposed to being in the processor)
* ChameleonProcessor ready
* Fix chameleon weights convert
* update conversion script
* clean-up processing
* update modeling a bit
* update
* update (throws error...)
* correct conversion ready
* fix tests
* fix docs
* docs
* ve swin norm
* fix device for vocab map
* add normalization
* update
* update script with rope rotations
* final fix on model conversion
* add slow tests
* more info in docs
* fix repo consistency tests
* fix repo tests
* fix-copies
* hope this will make CI happy
* fix for 30b model
* Update docs/source/en/index.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address comments
* remove assertion in conversion script
* add image processor test
* not copied
* port changes for qk layernorm
* fix-copies
* read token decorator for tests
* [run-slow] chameleon
* one more read-token
* address some comments
* qk norm changes
* tests and repo check
* moved rope permutations to conversion, YAY!
* fix past kv check
* docs
* layernorm done!
* let's be consistent in naming
* fix slow tests
* weird thing with slow CI, but let's see
* once more try
* remove past-kv as tuple following llama
* ignore
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: jacobkahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Leonid Shamis <lshamis@meta.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-17 10:41:43 +05:00
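A minimal usage sketch for the new model; class and checkpoint names follow the Chameleon model docs, and the prompt and image path are placeholders, so treat this as illustrative rather than the PR's own example.

```python
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")
inputs = processor(
    text="What do you see in this image?<image>", images=image, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)
output = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(output[0], skip_special_tokens=True))
```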
Zach Mueller
4037a2b5b1
SpeechEncoderDecoder doesn't support param buffer assignments ( #32009 )
...
One more model
2024-07-16 18:18:32 -04:00
Zach Mueller
6f40a213eb
Fix if else and *actually* enable superfast init ( #32007 )
...
* Fix if else
* rm err raise
2024-07-16 14:35:57 -04:00
Alexander Wettig
e391706420
Fix gather when collecting 'num_input_tokens_seen' ( #31974 )
...
* Move token count to device before gathering
* Run 'make style; make quality'
2024-07-16 19:35:10 +01:00
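The gist of the fix, sketched directly with accelerate (simplified; the real change lives inside Trainer's training loop): the per-step token count has to sit on the accelerator's device before it is gathered across processes.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
step_tokens = 1234  # e.g. inputs["input_ids"].numel() for the current batch

# Move the count to the right device *before* gathering, then sum across processes.
step_tokens = torch.tensor(step_tokens, device=accelerator.device, dtype=torch.int64)
num_input_tokens_seen = accelerator.gather(step_tokens).sum().item()
print(num_input_tokens_seen)
```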
Joao Gante
c22efa6196
Bug report update -- round 2 ( #32006 )
...
* like this?
* Update .github/ISSUE_TEMPLATE/bug-report.yml
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-16 19:22:45 +01:00
Sai-Suraj-27
88e0813d8d
fix: Fixed incorrect dictionary assignment in src/transformers/__init__.py ( #31993 )
...
Fixed incorrect dictionary assignment.
2024-07-16 17:28:14 +01:00
조준래
036d3de23d
add flash-attn deterministic option to flash-attn>=2.4.1 ( #31961 )
...
* add flash-attn deterministic option to flash-attn>=2.4.1
* Add Missing Import
* Fix ruff linting issues
* Replace `is_flash_attn_greater_or_equal_2_41` with the existing `is_flash_attn_greater_or_equal`
---------
Co-authored-by: jun.4 <jun.4@kakaobrain.com>
2024-07-16 17:55:41 +02:00
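For reference, what the new option ultimately toggles is the deterministic backward pass of flash-attn (>= 2.4.1); a hedged sketch of a direct call is below, while the transformers-side plumbing and its exact switch are in the PR itself.

```python
import torch
from flash_attn import flash_attn_func

# (batch, seq_len, num_heads, head_dim), fp16/bf16 on GPU as flash-attn requires
q = k = v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)

out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True, deterministic=True)
```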
Joao Gante
89eec5cf20
Bug report update ( #31983 )
2024-07-16 16:51:05 +01:00
Joao Gante
999981daf4
Tests: remove cuda versions when the result is the same 🧹 🧹 ( #31955 )
...
remove cuda versions when the result is the same
2024-07-16 16:49:54 +01:00
Zach Mueller
693cb828ff
Fix bad test about slower init ( #32002 )
...
Borked main
2024-07-16 10:33:05 -04:00
Fanli Lin
25e5e3fa56
[tests] fix deepspeed zero3 config for test_stage3_nvme_offload ( #31881 )
...
fix config
2024-07-16 16:11:37 +02:00
Zach Mueller
e0dfd7bcaf
Speedup model init on CPU (by 10x+ for llama-3-8B as one example) ( #31771 )
...
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-16 09:32:01 -04:00
huismiling
03a3becc48
Cambricon MLUs support SDPA and flash_attn ( #31102 )
...
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
* Cambricon support SDPA and flash_attn
2024-07-16 14:33:22 +02:00
Penut Chen
ac946aac25
Fix the incorrect permutation of gguf ( #31788 )
...
* Fix the incorrect permutation of gguf
* rename num_kv_heads
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add typing to num_kv_heads
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* rename variables
* refactor permute function name
* update the expected text of the llama3 q4 test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-07-16 08:20:34 +02:00
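A schematic of the permutation involved (function name and the small demo are illustrative): GGUF/llama.cpp stores the rotary q/k projection weights with heads interleaved, and when loading them back the reverse permutation for k_proj must use num_key_value_heads rather than num_attention_heads on grouped-query models, which is the bug fixed here.

```python
import numpy as np

def reverse_permute(weights: np.ndarray, n_heads: int, dim1: int, dim2: int) -> np.ndarray:
    # Undo the llama.cpp head-interleaving applied to rotary projection weights.
    return weights.reshape(n_heads, dim1 // n_heads // 2, 2, dim2).swapaxes(1, 2).reshape(dim1, dim2)

w = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)
print(reverse_permute(w, n_heads=2, dim1=8, dim2=4).shape)  # (8, 4)
```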
Joao Gante
6fbea6d237
Generate: doc nits ( #31982 )
...
nits
2024-07-15 19:59:20 +01:00
Joao Gante
e4682de635
Masking: remove flakiness from test ( #31939 )
2024-07-15 18:49:37 +01:00
Yih-Dar
a1a34657d4
Avoid race condition ( #31973 )
...
* [test_all] hub
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-15 17:56:24 +02:00
Yih-Dar
11efb4fc09
Notify new docker images built for circleci ( #31701 )
...
* hello
* hello
* hello
* hello
* hello
* hello
* hello
* notify
* trigger
* use new channel
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-15 17:16:36 +02:00
Sai-Suraj-27
556a4205f0
fix: Fixed the arguments in create_repo() function call ( #31947 )
...
* Fixed the arguments in create_repo() function call.
* Formatted the code properly using ruff.
* Formatted the code more clearly.
2024-07-15 15:56:17 +01:00
Joao Gante
907500423d
Generate: handle logits_warper update in models with custom generate fn ( #31957 )
...
handle logits_warper update in models with custom generate fn
2024-07-15 12:07:53 +02:00
Sai-Suraj-27
454bc14d90
fix: Removed a wrong keyword argument in sigmoid_focal_loss() function call ( #31951 )
...
Removed a wrong keyword argument in sigmoid_focal_loss() function call.
2024-07-15 10:05:08 +01:00
Joao Gante
a5c642fe7a
Whisper: move to tensor cpu before converting to np array at decode time ( #31954 )
2024-07-14 16:39:42 +01:00
Joao Gante
df1c248a6d
Generate: v4.42 deprecations 🧹 🧹 ( #31956 )
...
v4_42 deprecations
2024-07-14 16:39:24 +01:00
Joao Gante
739a63166d
Generate: remove deprecated code due to Cache and cache_position being default ( #31898 )
...
* tmp commit
* shorter
* nit
* explicit kwargs
* propagate changes
* mass propagation with a few manual touches (let's see how CI behaves)
* fix cacheless case
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* make fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-14 15:16:58 +01:00
fxmarty
8480fda6ee
Fix GenerationMixin.generate compatibility with pytorch profiler ( #31935 )
...
use torch.compiler.is_compiling() when possible
2024-07-14 14:44:38 +01:00
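The gist of the change, as a small version-guarded helper (illustrative; the actual guard lives in transformers' utilities):

```python
import torch

def is_torch_compiling() -> bool:
    # Prefer the public API when available; fall back to the private one on older torch.
    if hasattr(torch, "compiler") and hasattr(torch.compiler, "is_compiling"):
        return torch.compiler.is_compiling()
    try:
        import torch._dynamo
        return torch._dynamo.is_compiling()
    except Exception:
        return False

print(is_torch_compiling())  # False outside of torch.compile tracing
```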
Aviv Shamsian
7f79a97399
fix prompt strip to support tensors and np arrays ( #27818 )
...
* fix prompt strip to support tensors and np arrays
* framework agnostic
* change logic check before converting prompt into list
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding _convert_to_list to tokenization_whisper_fast
* adding tests for prompt decoding
* adding comment
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding comment
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* revert minor
* make style formatting
* style formatting after update
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixing _strip_prompt to handle _decode_with_timestamps
* fix copies
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-07-12 20:07:10 +01:00
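A sketch of the framework-agnostic conversion the PR adds (_convert_to_list is named in the commit message; the body below is an approximation of it):

```python
import numpy as np

def _convert_to_list(token_ids):
    # Accept torch tensors, TF tensors and numpy arrays, and normalize to a plain
    # Python list before the prompt tokens are stripped from the decoded sequence.
    if hasattr(token_ids, "numpy"):        # torch.Tensor / tf.Tensor
        if hasattr(token_ids, "cpu"):      # torch: move off the GPU first
            token_ids = token_ids.cpu()
        token_ids = token_ids.numpy()
    if isinstance(token_ids, np.ndarray):
        token_ids = token_ids.tolist()
    return token_ids

print(_convert_to_list(np.array([50258, 50359, 50363])))
```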
Joao Gante
d1a1bcf56a
Docker: TF pin on the consistency job ( #31928 )
...
* pin
* dev-ci
* dev-ci
* dev-ci
* test pushed image
2024-07-12 14:28:46 +02:00
jiqing-feng
aec1ca3a58
[Bug Fix] fix qa pipeline tensor to numpy ( #31585 )
...
* fix qa pipeline
* fix tensor to numpy
2024-07-11 22:22:26 +01:00
Naman Garg
c1e139c2b0
Adding hiera ( #30356 )
...
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
Fixed Convert function, added hiera to HF files, Initialized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
Fixed integration into transformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
added Docstrings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* Removed timm dependency
* added HieraBlock
* fixed: Model name
* added tests for HieraModel, HieraBlock
* fixed imports
* fixed quality & copies
* Fixes
* Update docs/source/en/model_doc/hiera.md
Fix name
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed formatting
* Code quality & Import differences
* quality and repo-consistency fix
* fixed no torch error
* Docstring fix
* Docstring fix
* doc string fix
* fixed example usage
* Resolved issues in modeling_hiera
* Removed Hiera MAE
* Added test and resolved bug
* fixed doc string
* First commit
* Finished conversion script and model forward working
* Resolved all issues
* nits
* Improving tests
* Nits
* More nits
* Improving HieraForMaskedImageModeling
* More improvements and nits
* Fixed docstrings of outputs
* More fixes
* More improvements
* Updated conversion script
* Fixed docstrings
* Improved tests
* Fixed attention outputs test
* All tests green
* Removed unnecessary file
* contribution attribution
* Resolved a few issues
* Resolved Comments
* Updated model repo id and fixed bugs
* Removed loss print
* Make tests green
* Updated docstrings
* Fix style
* Fixed num_heads in config
* Removed unnecessary video checkpoint related code in the conversion script
* Fix style
* Changed atol in conversion script
* HieraConfig
* Fix copies
* Fixed typo
* Resolved few issues
* make
* converted conv_nd -> nn.Module
* Removed video complexities
* Removed video complexities
* fix style
* Addressing comments
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix style
* Fixed tests
* Fixed typo
* Fixed interpolate test
* Made torch fx compatible
* Made sure image processor is correct
* Addressed comments
* Noise directly as torch
* Remove unnecessary attr
* Added return_dict
* Update src/transformers/models/hiera/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated checkpoints
* [run_slow] hiera
* Fixed device mismatch
* [run_slow] hiera
* Fixed GPU tests
* [run_slow] hiera
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-11 22:13:56 +01:00
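A minimal usage sketch for the new backbone; the checkpoint id is a best guess at the released repo, so treat it and the image path as placeholders.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, HieraModel

checkpoint = "facebook/hiera-tiny-224-hf"  # placeholder: use the checkpoint listed in the Hiera docs
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = HieraModel.from_pretrained(checkpoint)

inputs = processor(images=Image.open("example.jpg"), return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```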
Apoorv Khandelwal
574e68d554
Allow Trainer.get_optimizer_cls_and_kwargs to be overridden ( #31875 )
...
* Change `Trainer.get_optimizer_cls_and_kwargs` to `self.`
* Make `get_optimizer_cls_and_kwargs` an instance method
* Fixing typo
* Revert `get_optimizer_cls_and_kwargs` to staticmethod
* restore newline to trainer.py eof
2024-07-11 22:13:06 +01:00
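What the change enables, sketched below (subclass name and hyperparameters are illustrative): since create_optimizer now resolves get_optimizer_cls_and_kwargs through the instance, a Trainer subclass can swap the optimizer by overriding the staticmethod.

```python
import torch
from transformers import Trainer

class CustomOptimizerTrainer(Trainer):
    @staticmethod
    def get_optimizer_cls_and_kwargs(args, model=None):
        # Return (optimizer class, optimizer kwargs) instead of relying on args.optim.
        return torch.optim.AdamW, {
            "lr": args.learning_rate,
            "betas": (0.9, 0.95),
            "weight_decay": args.weight_decay,
        }
```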
t11s
52585019a1
🚨 fix(SigLip): remove spurious exclusion of first vision output token ( #30952 )
...
fix(SigLip): remove spurious exclusion of first vision output token in classifier
2024-07-11 19:40:57 +01:00
Joao Gante
6a05f68f51
Generate: fix SlidingWindowCache.reset() ( #31917 )
...
fix sliding cache
2024-07-11 19:35:46 +01:00
Arthur
e314395277
Refactor flash attention implementation in transformers ( #31446 )
...
* dumb commit
* nit
* update
* something like this
* unpack in modeling utils
* safe import
* oups
* update
* nits
* diff convert gemma
* update
* start propagating
* update other modeling code as well
* update for sliding window models
* nits
* more init cleanups
* styling
* fixup
* noice
* pass fixup
* typo typing_extension -> typing_extensions
* torch.nn.functionnal -> torch.nn.functional
* add to import structure
* unpack
* simplify a bit more for this first version
* nut
* update
* update
* nit
* ease the import of `Unpack`
* remove useless `use_sliding_window`
* no qua please
* protect import?
* style
* [run-slow]
* [run slow] llama,gemma,mistral,mixtral
* remove extra kwargs
* fix llama
* address review comments
* apply diff_model_converter to modeling_gemma.py
* remove cache_position 1
* remove cache_position 2
* some cleaning
* refactor gemma2 as well
* apply review comments
* rename file to modeling_flash_attention_utils.py
* siglip refactor
* remove dead code
* is the hub down?
* still down?
* fix siglip
* fix gemma2
* fatal: Could not read from remote repository.
* fix typo in softcap implem
* flaky
* Failed: Timeout >120.0s
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
2024-07-11 20:37:31 +08:00
fxmarty
ad4ef3a290
Fix fx tests with inputs_embeds ( #31862 )
...
* fix tests
* [test_all] check
* address review comments
2024-07-11 20:14:03 +08:00
Omar Salman
1499a55008
Add warning message for beta and gamma parameters ( #31654 )
...
* Add warning message for beta and gamma parameters
* Fix when the warning is raised
* Formatting changes
* Improve testing and remove duplicated warning from _fix_key
2024-07-11 13:01:47 +01:00
Sangbum Daniel Choi
23d6d0cc06
add gather_use_object arguments II ( #31799 )
...
* add gather_use_object arguments
* fix name and pass the CI test for Seq2SeqTrainer
* make style
* make it to functools
* fix typo
* add accelerate version:
* adding warning
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* make style
* Update src/transformers/training_args.py
* check function move to initial part
* add test for eval_use_gather_object
* fix minor
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-07-11 12:23:02 +01:00
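Sketch of the new flag (the argument name matches the test added in this PR); it lets evaluation gather arbitrary Python objects across processes instead of only tensors:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_use_gather_object=True,  # gather non-tensor objects (needs a recent accelerate)
)
```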
Sai-Suraj-27
2e48b3e872
fix: Fixed the 1st argument name in classmethods ( #31907 )
...
Fixed the first argument name in a few classmethods.
2024-07-11 12:11:50 +01:00
Isotr0py
48c20700e1
Fix missing methods for Fuyu ( #31880 )
...
* add missing methods for FuyuForCausalLM
* fix a typo
* format code
* add missing tie_weights
* format code
2024-07-11 11:01:46 +01:00
Arthur
f4ec7a286a
[Gemma2] Support FA2 softcapping ( #31887 )
...
* Support softcapping
* strictly greater than
* update
2024-07-11 11:57:35 +02:00
Arthur
f67e0f7fb7
[ConvertSlow] make sure the order is preserved for addedtokens ( #31902 )
...
* preserve the order
* oups
* oups
* nit
* trick
* fix issues
2024-07-11 11:56:41 +02:00
Raushan Turganbay
14d3b3f0f0
Processor accepts any kwargs ( #31889 )
...
* accept kwargs in processors
* return unused kwargs
* fix tests
* typo
* update the other way
2024-07-11 13:20:30 +05:00
turboderp
a695c18649
Fixes to alternating SWA layers in Gemma2 ( #31775 )
...
* HybridCache: Flip order of alternating global-attn/sliding-attn layers
* HybridCache: Read sliding_window argument from cache_kwargs
* Gemma2Model: Flip order of alternating global-attn/sliding-attn layers
* Code formatting
2024-07-11 10:03:46 +02:00