* Support BatchNorm in Hubert pos_conv_emb as in fairseq
* Correct the new defaults (#34377)
* Correct the new defaults
* CIs
* add check
* Update utils.py
* Update utils.py
* Add the max_length in generate test checking shape without passing length
* style
* CIs
* fix fx CI issue
* [auto. ping] Avoid sending empty info + add more team members (#34383)
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix glm (#34388)
* Fix duplicated
* fix import
* Use non nested images and batched text Idefics2/3 (#34222)
* add support for non nested images and add tests
* add tests error scenario
* fix style
* added single and no image to error tests
* Fix onnx non-exportable inplace aten op (#34376)
* fix onnx non-exportable inplace op
* mistral, qwen2, qwen2_vl, starcoder2
* fixup copies
* Fix right padding in LLaVA models (#34305)
* fix right pad llavas
* device mismatch
* no filter (#34391)
* no filter
* no filter
* no filter
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* SynthID: better example (#34372)
* better example
* Update src/transformers/generation/configuration_utils.py
* Update src/transformers/generation/logits_process.py
* nits
* Tests: upgrade `test_eager_matches_sdpa_generate` (#34386)
* Fix bnb training test failure (#34414)
* Fix bnb training test: compatibility with OPTSdpaAttention
* Avoid check expected exception when it is on CUDA (#34408)
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix typos in agents_advanced.md (#34405)
* [docs] Cache implementations (#34325)
cache
* [run-slow] hubert
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm explicit variable
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
fix make fixup styling changes
* [run-slow] hubert
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
* [run-slow] hubert
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* gpt neox flex attention + refactor
* some formatting
* small fix on dropout
* add assertion on flex attn test
* flaky ci :(
* add head mask support
* style
* handle dtype, replace torch where
* fixup flex with output attns
* code review and several other fixes
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* style
* remove unnecessary comment
* remove incorrect comment
* make flex attn check more agnostic to versions and centralized
* change peft input dtype check to value since q and k could be affected by other stuff like RoPE
* i forgor
* flaky
* code review and small fixes
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Fix test_eager_matches_sdpa_inference for XPU backend
As of PyTorch 2.5, the XPU backend supports only torch.nn.attention.SDPBackend.MATH,
which is implemented at the PyTorch level using aten operators and is device
agnostic with respect to the implementation of each aten operator. Thus, we can
reuse the CUDA (or CPU) MATH weights for XPU.
Fixes: #34888
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
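
A small sketch of the device-agnostic autocast form, assuming a CPU-only environment so that it runs anywhere. In `torch.amp.autocast` the device type is an argument (for example "cuda") rather than part of the module path; the matrix sizes below are arbitrary.

```python
import torch

a = torch.randn(4, 4)  # float32 inputs
b = torch.randn(4, 4)

# The device type is passed explicitly instead of using torch.cuda.amp.autocast
with torch.amp.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = torch.mm(a, b)  # matmul runs in the lower-precision dtype
```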
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* fix test_tiny_timestamp_generation
* fix test_large_timestamp_generation
* fix test_whisper_shortform_single_batch_prev_cond
* fix test_whisper_shortform_multi_batch_hard_prev_cond
* return_timestamps necessary with long form
* fix test_default_multilingual_transcription_long_form
* fix test_tiny_token_timestamp_generation_longform
* fix test_whisper_longform_multi_batch_hard
* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* fix typo
* do not expect special tokens
* fix test_whisper_longform_single_batch_beam
* fix test_whisper_longform_multi_batch_hard_prev_cond
* update test_whisper_longform_multi_batch_hard_prev_cond
* update test_whisper_longform_multi_batch_hard_prev_cond
* these tests do not make sense anymore
* this test does not make sense anymore
* make fixup
* suggested nits
* add test with forced_decoder_ids
* this test does not make sense anymore
* change assert for unittest test cases
* make fixup
* test with prompt_ids and task and language
* fix unittest test case call
* fix test_tiny_generation
* fix test_tiny_en_generation
* fix test_tiny_en_batched_generation
* fix test_tiny_longform_timestamps_generation
* fix test_tiny_timestamp_generation
* fix test_large_generation
* fix test_large_batched_generation
* fix test_large_generation_multilingual
* fix test_large_timestamp_generation
* fix test_large_timestamp_generation
* fix test_tiny_token_timestamp_generation_longform
* fix test_tiny_en_batched_generation
* make fixup
* [run-slow] whisper
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* add deformable detr image processor fast
* add fast processor to doc
* fix copies
* nit docstring
* Add tests gpu/cpu and fix docstrings
* fix docstring
* import changes from detr
* fix imports
* rebase and fix
* fix input data format change in detr and rtdetr fast
* softcapping
* soft cap before the mask
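
Why the order in "soft cap before the mask" matters can be sketched in plain Python, assuming the commonly used tanh form of soft-capping, `cap * tanh(score / cap)`. The helper names and numbers are hypothetical and not taken from the model code.

```python
import math


def soft_cap(score, cap):
    """Squash a raw attention score smoothly into [-cap, cap]."""
    return cap * math.tanh(score / cap)


def capped_then_masked(scores, mask, cap):
    # Cap first, then add the additive mask, so masked positions keep
    # their -inf value instead of being squashed to -cap.
    return [soft_cap(s, cap) + m for s, m in zip(scores, mask)]


scores = [100.0, -3.0, 0.5]
mask = [0.0, 0.0, float("-inf")]  # last position is masked out
result = capped_then_masked(scores, mask, cap=50.0)

# The wrong order: capping an already masked score makes it finite again
wrong_order = soft_cap(0.5 + float("-inf"), 50.0)
```

With the wrong order the masked score becomes `-cap`, a finite value, so the masked position would still receive some softmax weight.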
* style
* ...
* super nit
* update
* fixes
* update
* small issue with modular
* fix modular imports
* update
* fixup
* simplify a hell lot
* simplify cleaning imports
* finish fixing
* update our design
* nits
* use a deprecation cycle
* updates
* Fix modular (recursive deps need to always be computed after merges!)
* push
* fix
* update
* fix modular order
* make fix-copies
* updates
* update
* ?
* don't compile for now
* ?
* fix some stuff
* donc!
* fix copies
* update
* fixup
* ?
* fix two tests
* fix?
* for now, don't use head info
* eager when output attention and sdpa or flash as it's the simplest behaviour (for our tests as well :))
* fix-copies
* revert sdpa check
* Apply suggestions from code review
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* rebase, fix-copies and push
* add a slow integration test
* update the test
* fix left padding issue
* fix test
* remove duplicate scaling
* quality
* add a small test and make sure it works
* 2b
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* Add model skeleton with transformers-cli add-new-model-like
* Convert config to modular, add rms_norm_eps, delete clip_qkv
* Convert model to modular, add RMSNorm
* Add flash attention with qk norm and no qkv clipping
* Add decoder layer with RMSNorm after attention/feedforward layers
* Add base and causal model
* Add converter improvements from OLMo repo
* Update weight loading in OLMo to HF converter
* Set correct default for rms_norm_eps
* Set correct pipeline_model_mapping in test
* Run make fixup
* Fix model type
* Re-run modular conversion
* Manually set config docs to fix build errors
* Convert olmo-1124 to olmo_1124 to fix flash attention docs errors
* Start updating tests
* Update tests
* Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124
* Rename input_layernorm and post_attention_layernorm to reflect their ops better
* Use correct tokenizer
* Remove test unsupported by GPT2 tokenizer
* Create GenerationConfig outside of from_pretrained call
* Use simpler init file structure
* Add explicit __all__ to support simplified init
* Make safetensor serialization the default
* Update OLMo November 2024 docs
* save/load sub-configs
* nit forgot these
* fix copies
* move test to common
* use dict for sub-configs
* add load-save-load test
* clean up modeling check
* oops these are correct keys
* fix some tests, missed some composite configs
* this model was missed
* kinda works
* update
* add tests
* update
* use special tokens in processors
* typo
* fix copies
* fix
* fix moshi after rebase
* update
* fix tests
* update
* Update docs/source/en/main_classes/tokenizer.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs
* test for load time adding tokens
* fix some more tests which are now fetched better
* one more fix
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* blip2 tests
* instructblips
* copies
* fix slow tests
* fix
* uncomment this
* clean up after rebase
* should be model main input
* fix overwritten tests
* oops len should be multiple of frame number
* style
* fix some tests
* Standardize image-text-to-text-models-output
add post_process_image_text_to_text to chameleon and cleanup
Fix legacy kwarg behavior and deprecation warning
add post_process_image_text_to_text to qwen2_vl and llava_onevision
Add post_process_image_text_to_text to idefics3, mllama, pixtral processor
* nit var name post_process_image_text_to_text udop
* nit fix deprecation warnings
* Add image-text-to-text pipeline
* add support for image url in chat template for pipeline
* Reformat to be fully compatible with chat templates
* Add tests chat template
* Fix imports and tests
* Add pipeline tag
* change logic handling of single prompt and multiple images
* add pipeline mapping to models
* fix batched inference
* fix tests
* Add manual batching for preprocessing
* Fix outputs with nested images
* Add support for all common processing kwargs
* Add default padding when multiple text inputs (batch size>1)
* nit change version deprecation warning
* Add support for text only inference
* add chat_template warnings
* Add pipeline tests and add copied from post process function
* Fix batched pipeline tests
* nit
* Fix pipeline tests blip2
* remove unnecessary max_new_tokens
* revert processing kosmos2 and remove unnecessary max_new_tokens
* fix pipeline tests idefics
* Force try loading processor if pipeline supports it
* revert load_processor change
* hardcode loading only processor
* remove unnecessary try except
* skip imagetexttotext tests for kosmos2 as tiny model causes problems
* Make code clearer
* Address review comments
* remove preprocessing logic from pipeline
* fix fuyu
* add BC resize fuyu
* Move post_process_image_text_to_text to ProcessorMixin
* add guard in post_process
* fix zero shot object detection pipeline
* add support for generator input in pipeline
* nit
* change default image-text-to-text model to llava onevision
* fix owlv2 size dict
* Change legacy deprecation warning to only show when True
* add fast image processor rtdetr
* add gpu/cpu test and fix docstring
* remove prints
* add to doc
* nit docstring
* avoid iterating over images/annotations several times
* change torch typing
* Add image processor fast documentation
* tmp commit
* tmp commit
* cull overwrites of deleted tests
* typo
* more specific docstring
* make fixup
* parameterize at the top?
* correction
* more deletions :D
* tmp commit
* for VLMs too
* fix _check_outputs
* test nit
* make fixup
* fix another flaky
* test_generate_from_inputs_embeds -- handle missing attention mask
* feat: Added int conversion and unwrapping
* test: added tests for post_process_keypoint_detection of SuperPointImageProcessor
* docs: changed docs to include post_process_keypoint_detection method and switched from opencv to matplotlib
* test: changed test to not depend on SuperPointModel forward
* test: added missing require_torch decorator
* docs: changed pyplot parameters for the keypoints to be more visible in the example
* tests: changed import torch location to make test_flax and test_tf
* Revert "tests: changed import torch location to make test_flax and test_tf"
This reverts commit 39b32a2f69.
* tests: fixed import
* chore: applied suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* tests: fixed import
* tests: fixed import (bis)
* tests: fixed import (ter)
* feat: added choice of type for target_size and changed tests accordingly
* docs: updated code snippet to reflect the addition of target size type choice in post process method
* tests: fixed imports (...)
* tests: fixed imports (...)
* style: formatting file
* docs: fixed typo from image[0] to image.size[0]
* docs: added output image and fixed some tests
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: included SuperPointKeypointDescriptionOutput in TYPE_CHECKING if statement and changed tests results to reflect changes to SuperPoint from absolute keypoints coordinates to relative
* docs: changed SuperPoint's docs to print output instead of just accessing
* style: applied make style
* docs: added missing output type and precision in docstring of post_process_keypoint_detection
* perf: deleted loop to perform keypoint conversion in one statement
* fix: moved keypoint conversion at the end of model forward
* docs: changed SuperPointInterestPointDecoder to SuperPointKeypointDecoder class name and added relative (x, y) coordinates information to its method
* fix: changed type hint
* refactor: removed unnecessary brackets
* revert: SuperPointKeypointDecoder to SuperPointInterestPointDecoder
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
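
The move from absolute to relative keypoint coordinates, with integer conversion in post-processing, can be sketched as follows. This is a hypothetical helper, not the `post_process_keypoint_detection` implementation; the `(height, width)` ordering of `image_size` is an assumption.

```python
def to_absolute_keypoints(relative_keypoints, image_size):
    """Map relative (x, y) in [0, 1] to integer pixel coordinates.

    image_size is assumed to be (height, width).
    """
    height, width = image_size
    return [(int(x * width), int(y * height)) for x, y in relative_keypoints]


keypoints = to_absolute_keypoints([(0.5, 0.25), (0.0, 1.0)], image_size=(480, 640))
```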
* Enable grad accum fix across all models + trainer fully in forward()
* handle peft case
* Account for DDP: need to run scale tests
* Use accelerator state
* Quality
* Guard
* Experiment w/ only fairseq fix
* Fairseq only
* Revert multiply_grads fix
* Mult by grad accum to fully bring back solution
* Style
* Good to go now
* Skip fx tests for now
* Bookmark
* Working now
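
The commit subjects above do not spell out the fix. As a sketch of one common reason gradient accumulation needs such a fix, assume a per-token averaged loss and micro-batches of unequal length. All numbers are made up, and the final compensation step is only one possible reading of "Mult by grad accum".

```python
def mean(xs):
    return sum(xs) / len(xs)


# Per-token losses for two micro-batches of unequal length (made-up numbers)
micro_batches = [[2.0, 2.0, 2.0, 2.0, 2.0, 2.0], [8.0, 8.0]]
accum_steps = len(micro_batches)

# Naive: average each micro-batch, then average the averages
naive = sum(mean(mb) / accum_steps for mb in micro_batches)

# Token-weighted: every micro-batch is normalized by the total token count
total_tokens = sum(len(mb) for mb in micro_batches)
weighted = sum(sum(mb) / total_tokens for mb in micro_batches)

# Reference: the loss of one large batch containing all tokens
full_batch = mean([x for mb in micro_batches for x in mb])

# Assumption: if a later stage still divides each micro-batch loss by
# accum_steps, multiplying by accum_steps beforehand cancels that division.
compensated = sum(
    (sum(mb) / total_tokens) * accum_steps / accum_steps for mb in micro_batches
)
```

The naive average overweights the short micro-batch, while the token-weighted sum matches the single large batch.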
* add colorize_depth and matplotlib availability check
* add post_process_depth_estimation for zoedepth + tests
* add post_process_depth_estimation for DPT + tests
* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth
* run `make fixup`
* fix import related error on tests
* fix more import related errors on test
* forgot some `torch` calls in declarations
* remove `torch` call in zoedepth tests that caused error
* updated docs for depth estimation
* small fix for `colorize` input/output types
* remove `colorize_depth`, fix various names, remove matplotlib dependency
* fix formatting
* run fixup
* different images for test
* update examples in `forward` functions
* fixed broken links
* fix output types for docs
* possible format fix inside `<Tip>`
* Readability related updates
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Readability related update
* cleanup after merge
* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`
* rewrite dict merging to support python 3.8
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
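
The Python 3.8 constraint on dict merging can be illustrated with plain dictionaries. The `|` operator for dicts only exists from Python 3.9 on, while unpacking works on 3.8 as well. The keys and values below are placeholders, not the actual output of `post_process_depth_estimation`.

```python
base = {"first_key": "placeholder-a", "shared": 1}
extra = {"second_key": "placeholder-b", "shared": 2}

# Python 3.9+ only: merged = base | extra
# Works on Python 3.8 too; later entries win on key conflicts
merged = {**base, **extra}
```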
* this worked in normal generation, needs more tests
* fix almost all tests in t5
* nit
* longt5, umt5, mt5
* style
* udop, pix2struct
* more models
* fix some tests
* fix onnx tests
* tracing tests fixed
* compile enabled and tested for t5 models
* fix small bug in slow tests
* [run-slow] t5
* uncomment
* style
* update with new generation refactoring
* nit
* fix copies
* this is the fix, had to change t5 to fix copies
* update
* [run-slow] t5
* [run-slow] t5
* update
* add test for encoder only T5
* clean up after rebase
* fix pop2piano
* add comment
* style
* fix copies after rebase
* fix copies missed this one
* first try
* codestyle
* idefics2 is happy
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma
* fix-copies
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo
* blip-2 needs to init vision from config
* when was this removed O_o
* minor fix
* tests
* this way?
* tests
* model-agnostic code
* codestyle
* add tests for idefics
* modify general test for VLMs
* no generation test for vlm yet!
* no generation test here also
* warn in VIT-SDPA if output attn
* add more tests
* user can pass dict as attn impl
* repo consistency
* update
* musicgen
* no prints
* forgot speech enc-dec and clip
* how many composite models we have?
* musicgen melody is same as musicgen
* +siglip
* fix tests + add some more
* remove idefics custom overridden code
* make idefics2 automappable
* nits
* skip tests
* doctests
* Update src/transformers/models/idefics2/configuration_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/clip/test_modeling_clip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* major update, no need for automap
* clean up
* add FA2 test
* more tests
* style
* skip tests
* why did these start failing now?
* no attributes for FA2 needed
* one tiny test
* address comment about FA2 false warning
* style
* add new models and resolve conflicts
* fix copies
* let it be this way for now, come back tomorrow to review
* some more fixes
* update
* more updates
* update
* fix copies
* style and tests
* another big update
* fix tests
* fix tests
* update
* another update
* fix tests
* fix copies
* fix tests
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add idefics
* conflicts after merging main
* enable tests but need to fix some
* fix tests
* no print
* fix/skip some slow tests
* continue not skip
* rebasing broken smth, this is the fix
* mistral qna start
* mixtral qna
* oops
* qwen2 qna
* qwen2moe qna
* add missing input embed methods
* add copied to all methods, can't directly from llama due to the prefix
* make top level copied from
* Generate using exported model and enable gemma2-2b in ExecuTorch
* [run_slow] gemma, gemma2
* truncate expected output message
* Bump required torch version to support gemma2 export
* [run_slow] gemma, gemma2
---------
Co-authored-by: Guang Yang <guangyang@fb.com>
* add sdpa to OPT
* chore: remove redundant whitespace in OPTDecoder class
* fixup
* bug fix
* add sdpa and attention generate test
* fixup
* Refactor OPTAttention forward method for improved readability and maintainability
* undo refactor for _shape and key,val states
* add OPT to doc, fixup didn't find it for some reason
* change order
* change default attn_implementation in testing to eager
* [run-slow] opt
* change test_eager_matches_sdpa_generate to the one llama
* Update default attention implementation in testing common
* [run-slow] opt
* remove unneeded print
* [run-slow] opt
* refactor model testers to have attn_implementation="eager"
* [run-slow] opt
* convert test_eager_matches_sdpa_generate to opt-350M
* bug fix when creating mask for opt
* [run-slow] opt
* if layer head mask default to eager
* if head mask is not none fall to eager
* [run-slow] opt
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
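
The fallback rule described above ("if head mask is not none fall to eager") can be sketched as a small dispatch function. The function name and signature are hypothetical. The motivation assumed here is that `torch.nn.functional.scaled_dot_product_attention` does not return attention weights, so anything that needs them has to use the eager path.

```python
def pick_attention_impl(requested, output_attentions=False, head_mask=None):
    """Hypothetical helper: decide which attention path to run."""
    needs_weights = output_attentions or head_mask is not None
    if requested == "sdpa" and needs_weights:
        # The fused kernel cannot expose or rescale per-head attention weights
        return "eager"
    return requested


default_choice = pick_attention_impl("sdpa")
with_attentions = pick_attention_impl("sdpa", output_attentions=True)
with_head_mask = pick_attention_impl("sdpa", head_mask=[1.0, 0.0])
```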
* Clean up Unpack imports (#33631)
clean up Unpack imports
* Fix DPT /Dinov2 sdpa regression on main (#33660)
* fallback to eager if output attentions.
* fix copies
* handle dependency errors in check_imports (#33622)
* handle dependency errors in check_imports
* change log level to warning
* add back self.max_position_embeddings = config.max_position_embeddings (#33550)
* add back self.max_position_embeddings = config.max_position_embeddings
* fix-copies
* Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower (#33613)
fix llavaqwen2 model conversion
* Uniformize kwargs for Udop processor and update docs (#33628)
* Add optional kwargs and uniformize udop
* cleanup Unpack
* nit Udop
* Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` (#33203)
* Enable BNB multi-backend support (#31098)
* enable cpu bnb path
* fix style
* fix code style
* fix 4 bit path
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* add multi backend refactor tests
* fix style
* tweak 4bit quantizer + fix corresponding tests
* tweak 8bit quantizer + *try* fixing corresponding tests
* fix dequant bnb 8bit
* account for Intel CPU in variability of expected outputs
* enable cpu and xpu device map
* further tweaks to account for Intel CPU
* fix autocast to work with both cpu + cuda
* fix comments
* fix comments
* switch to testing_utils.torch_device
* allow for xpu in multi-gpu tests
* fix tests 4bit for CPU NF4
* fix bug with is_torch_xpu_available needing to be called as func
* avoid issue where test reports attr err due to other failure
* fix formatting
* fix typo from resolving of merge conflict
* polish based on last PR review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix CI
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix error log
* fix error msg
* add \n in error log
* make quality
* rm bnb cuda restriction in doc
* cpu model don't need dispatch
* fix doc
* fix style
* check cuda available in testing
* fix tests
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix doc
* fix check multibackends
* fix import sort
* remove check torch in bnb
* docs: update bitsandbytes references with multi-backend info
* docs: fix small mistakes in bnb paragraph
* run formatting
* revert bnb check
* move bnb multi-backend check to import_utils
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix bnb check
* minor fix for bnb
* check lib first
* fix code style
* Revert "run formatting"
This reverts commit ac108c6d6b.
* fix format
* give warning when bnb version is low and no cuda found
* fix device assignment check to be multi-device capable
* address akx feedback on get_avlbl_dev fn
* revert partially, as we don't want the function that public, as docs would be too much (enforced)
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
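
The "needing to be called as func" bug mentioned above is a general Python pitfall: a function object is always truthy, so testing it without calling it never takes the false branch. The availability check below is a hypothetical stand-in that pretends no XPU is present.

```python
def is_xpu_available():
    """Hypothetical stand-in for an availability check; always reports no XPU."""
    return False


# Bug: the function object itself is truthy, regardless of what it returns
buggy = bool(is_xpu_available)

# Fix: call the function and test its result
fixed = bool(is_xpu_available())
```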
* Fix error string after refactoring into get_chat_template (#33652)
* Fix error string after refactoring into get_chat_template
* Take suggestion from CR
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* uniformize git processor (#33668)
* uniformize git processor
* update docstring
* Modular `transformers`: modularity and inheritance for new model additions (#33248)
* update example
* update
* push the converted diff files for testing and ci
* correct one example
* fix class attributes and docstring
* nits
* oups
* fixed config!
* update
* nits
* class attributes are not matched against the other, this is missing
* fixed overwriting self.xxx now onto the attributes I think
* partial fix, now order with docstring
* fix docstring order?
* more fixes
* update
* fix missing docstrings!
* examples don't all work yet
* fixup
* nit
* updated
* hick
* update
* delete
* update
* update
* update
* fix
* all default
* no local import
* fix more diff
* some fix related to "safe imports"
* push fixed
* add helper!
* style
* add a check
* all by default
* add the
* update
* FINALLY!
* nit
* fix config dependencies
* man that is it
* fix fix
* update diffs
* fix the last issue
* re-default to all
* all the fixes
* nice
* fix properties vs setter
* fixup
* updates
* update dependencies
* make sure to install what needs to be installed
* fixup
* quick fix for now
* fix!
* fixup
* update
* update
* updates
* whitespaces
* nit
* fix
* simplify everything, and make it file agnostic (should work for image processors)
* style
* finish fixing all import issues
* fixup
* empty modeling should not be written!
* Add logic to find who depends on what
* update
* cleanup
* update
* update gemma to support positions
* some small nits
* this is the correct docstring for gemma2
* fix merging of docstrings
* update
* fixup
* update
* take doc into account
* styling
* update
* fix hidden activation
* more fixes
* final fixes!
* fixup
* fixup instruct blip video
* update
* fix bugs
* align gemma2 with the rest as well
* updates
* revert
* update
* more reversion
* grind
* more
* arf
* update
* order will matter
* finish del stuff
* update
* rename to modular
* fixup
* nits
* update makefile
* fixup
* update order of the checks!
* fix
* fix docstring that has a call inside
* fix conversion check
* style
* add some initial documentation
* update
* update doc
* some fixup
* updates
* yups
* Mostly todo gimme a minute
* update
* fixup
* revert some stuff
* Review docs for the modular transformers (#33472)
Docs
* good update
* fixup
* mmm current updates lead to this code
* okay, this fixes it
* cool
* fixes
* update
* nit
* updates
* nits
* fix doc
* update
* revert bad changes
* update
* updates
* proper update
* update
* update?
* up
* update
* cool
* nits
* nits
* bon bon
* fix
* ?
* minimise changes
* update
* update
* update
* updates?
* fixed gemma2
* kind of a hack
* nits
* update
* remove `diffs` in favor of `modular`
* fix make fix copies
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Fix CIs post merging modular transformers (#33681)
update
* Fixed docstring for cohere model regarding unavailability of prune_he… (#33253)
* Fixed docstring for cohere model regarding unavailability of prune_head() methods
The docstring mentions that cohere model supports prune_heads() methods. I have fixed the docstring by explicitly mentioning that it doesn't support that functionality.
* Update src/transformers/models/cohere/modeling_cohere.py
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Generation tests: update imagegpt input name, remove unused functions (#33663)
* Improve Error Messaging for Flash Attention 2 on CPU (#33655)
Update flash-attn error message on CPU
Rebased to latest branch
* Gemma2: fix config initialization (`cache_implementation`) (#33684)
* Fix ByteLevel alphabet missing when Sequence pretokenizer is used (#33556)
* Fix ByteLevel alphabet missing when Sequence pretokenizer is used
* Fixed formatting with `ruff`.
* Uniformize kwargs for image-text-to-text processors (#32544)
* uniformize FUYU processor kwargs
* Uniformize instructblip processor kwargs
* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2
* Uniformize llava_next processor
* Fix save_load test for processor with chat_template only as extra init args
* Fix import Unpack
* Fix Fuyu Processor import
* Fix FuyuProcessor import
* Fix FuyuProcessor
* Add defaults for specific kwargs kosmos2
* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs
* Add tests processor Udop
* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature
* Fix overwrite tests kwargs processors
* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop
* Fix processing test fuyu
* remove unnecessary pad_token check in instructblip ProcessorTest
* Fix BC tests and cleanup
* Fix imports fuyu
* Uniformize Pix2Struct
* Fix wrong name for FuyuProcessorKwargs
* Fix slow tests reversed inputs align fuyu llava-next, change udop warning
* Fix wrong logging import udop
* Add check images text input order
* Fix copies
* change text pair handling when positional arg
* rebase on main, fix imports in test_processing_common
* remove optional args and udop uniformization from this PR
* fix failing tests
* remove unnecessary test, fix processing utils and test processing common
* cleanup Unpack
* cleanup
* fix conflict grounding dino
* 🚨🚨 Setting default behavior of assisted decoding (#33657)
* tests: fix pytorch tensor placement errors (#33485)
This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"
According to the PyTorch documentation, torch.Tensor.numpy(force=False)
performs the conversion only if the tensor is on CPU (plus a few other restrictions),
which is not the case here. We need force=True since we just
need the data and do not care about tensor coherency.
Fixes: #33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
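
A CPU-only sketch of the `force` flag, assuming PyTorch 1.13 or later and NumPy installed. A non-CPU device cannot be assumed here, so the example triggers one of the other restrictions instead (a tensor that requires grad). The documentation describes `force=True` as equivalent to `t.detach().cpu().resolve_conj().resolve_neg().numpy()`, which also covers the device case.

```python
import torch

t = torch.ones(3, requires_grad=True)

# The default force=False refuses to convert a tensor that requires grad
try:
    t.numpy()
    plain_call_failed = False
except RuntimeError:
    plain_call_failed = True

# force=True detaches and copies to CPU as needed before converting
arr = t.numpy(force=True)
```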
* bump tokenizers, fix added tokens fast (#32535)
* update based on tokenizers release
* update
* nits
* update
* revert re addition
* don't break that yet
* fmt
* revert unwanted
* update tokenizers version
* update dep table
* update
* update in conversion script as well
* some fix
* revert
* fully revert
* fix training
* remove set trace
* fixup
* update
* update
* [Pixtral] Improve docs, rename model (#33491)
* Improve docs, rename model
* Fix style
* Update repo id
* fix code quality after merge
* HFQuantizer implementation for compressed-tensors library (#31704)
* Add compressed-tensors HFQuantizer implementation
* flag serializable as False
* run
* revive lines deleted by ruff
* fixes to load+save from sparseml, edit config to quantization_config, and load back
* address satrat comment
* compressed_tensors to compressed-tensors and revert back is_serializable
* rename quant_method from sparseml to compressed-tensors
* tests
* edit tests
* clean up tests
* make style
* cleanup
* cleanup
* add test skip for when compressed tensors is not installed
* remove pydantic import + style
* delay torch import in test
* initial docs
* update main init for compressed tensors config
* make fix-copies
* docstring
* remove fill_docstring
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* review comments
* review comments
* comments - suppress warnings on state dict load, tests, fixes
* bug-fix - remove unnecessary call to apply quant lifecycle
* run_compressed compatibility
* revert changes not needed for compression
* no longer need unexpected keys fn
* unexpected keys not needed either
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add to_diff_dict
* update docs and expand testing
* Update _toctree.yml with compressed-tensors
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update doc
* add note about saving a loaded model
---------
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
* update model card for opt
* add batch size to inference table
* [slow-run] opt
* [run-slow] opt
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Tibor Reiss <75096465+tibor-reiss@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Muhammad Naufil <m.naufil1@gmail.com>
Co-authored-by: sizhky <yyeshr@gmail.com>
Co-authored-by: Umar Butler <umar@umar.au>
Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
* Fix Failed tests with mobile bert
* Cast to the correct dtype
* Code fixup
* Fix padding_idx larger than embedding_size
* Reduce covariance more. use 1e-7 instead of 1e-5
* Comment fix
* Reduce covariance more. use 1e-9 instead of 1e-7
* Copy new config
* all but MRA fixed
* fix mra
* very flaky
* skip instead
* make fixup
---------
Co-authored-by: Joao Gante <joao@huggingface.co>
* Add Auto model for image-text-to-text
* Remove donut from processing auto, add chameleon to image text to text models
* add qwen2_vl and llava_onevision
* add pixtral to auto model for image-text-to-text
* add mllama and idefics3
* remove models in IGNORE_NON_AUTO_CONFIGURED
* add AutoModelForImageTextToText to tests and doc
* Initial commit for MyT5 model
* custom implementation of MyT5 tokenizer, unused files deleted
* unittest for myt5 tokenizer
* update of import structure and style
* removed remnants of MyT5Config
* fixed docstrings
* Updates after review: filled documentation file, new docstrings and tests added
* Fixed code style issues
* fixed copied from to refer to function
* updated loading myt5 tokenizer in tests, added sample byte map file to fixtures
* changes after review
* removed redundant copied from
* removed redundant copied from
* optimization and loading model from hf
* [run_slow] myt5
* [run-slow] myt5
* Updated en documentation for myt5
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* onboard phimoe model
* removed debug code
* added unit tests
* updated docs
* formatted
* fixed unit tests
* fixed test case
* fixed format
* refactored code
* fixed expected outputs in the integration tests
* Added a warning msg
* Addressed comments
* Addressed comments
* fixed test cases
* added paper link
* Addressed comments
* Refactored PhimoeForCausalLM forward fn
* Refactored PhimoeRotaryEmbedding class
* fixed test cases
* fixed testcase
* fixed test case
* Addressed comments
* fixed test cases
* fixed testcases
* Used cache position instead to get the seq len
* fix beam indices in token_timestamps
* fix attention_mask in FA2
* correct translation example with the right example
* correct how some tests are using outputs + correct num_frames
* fix shortform batch prev cond tests
* make fix-copies
* make fix-copies
* take care of shifting beam indices
* [run-slow] whisper
* [run-slow] whisper
* add unit tests for splinter_tokenizer
* add unit test for splinter tokenizer, pass in the question_token to be saved on save_pretrained called
* remove unused import
* remove vocab_splinter.txt, add Copied from, use fmt:on and fmt:off to prevent autoformatting on long lines
* remove all the spaces
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add support for custom inputs and batched inputs in ProcessorTesterMixin
* Fix batch_size behavior ProcessorTesterMixin
* Change format prepare inputs batched
* Remove override test pixtral processor
* Remove unnecessary tests and cleanup after new prepare_inputs functions
* Fix instructBlipVideo image processor
* Fix Mamba slow path bug with dtype mismatch.
* Update test_modeling_mamba.py
* Improve style.
* Fix issue with cache position of dtype mismatch test.
* Change test for slow path.
* Revert changes.
* Switch to buggy code and add test to catch it.
* Fix the dtype mismatch bug and add test code to verify it.
* Fix minor bug with test.
* Fix incorrect dtype of model output.
* Fix incorrect dtype of cache.
* Fix incorrect dtype of ssm cache.
* Fix incorrect dtype of conv state.
* Remove assertion for ssm state.
* Add assertion for conv state dtype.
* Fix all issues with dtype mismatch test.
* clean_up_tokenization_spaces=False if unset
* deprecate warning
* updating param for old models
* update models
* make fix-copies
* fix-copies and update bert models
* warning msg
* update prophet and clvp
* updating test since space before is arbitrarily removed
* remove warning for 4.45
* Add Idefics 3!
* fixes to make both pipelines identical
* fix for quantized models
* First pass at the review
* remove vocab size from the main config (it's still in the text_config)
* hot fix for merve
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* re-add model_type for text_config
* remove support for old_cache
* remove hidden_size from main config
* rename idefics3 HF repo
* few changes suggested in the PR
* fix to input_data_format computation
* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion
* improve example
* few improvements from amy's review
* big change to enable processing input images as numpy arrays
* Changes to the code to uniformize processor kwargs
* image processing tests
* image processing tests fixes and some bugs they discovered
* addressed review comments from Yoni
* fix modeling tests
* remove special tokens that are not special
* fixes tests
* skip failing tests - they also fail for idefics2
* added paper and readded the tests with multi gpu, who knows
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* review amy until image_processing_idefics3
* last comments from Amy
* review amy
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* doc improvement - amy review
* fix runtime error during fine-tuning
* amy's review
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* ruff
* amy's comment on the order
* ruff ruff
* fix copies
* square images when they are not split
* ruff :(
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics3/test_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix small bug introduced in refactor
* amy's image processing changes
* fixes peft tests and ruff
* modify to_pil_image from transformers. and review from emanuele.
* add modified to_pil_image
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add sdpa to dinov2
* fixup
* add dinov2 to sdpa doc
* update doc order
* [run-slow] dinov2
* common to eager
* [run-slow] dinov2
* update attn implementation in common
* update test_modeling_dinov2 to have mask_ratio, num_masks and mask_length similar to vit
* [run-slow] dinov2
---------
Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
* add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin
* change size and crop_size in processor kwargs tests to do_rescale and rescale_factor
* remove unnecessary llava processor kwargs test overwrite
* nit
* change data_arg_name to input_name
* Remove unnecessary test override
* Remove unnecessary tests Paligemma
* Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring
* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of vision tower
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* clean mimi commit
* some nits suggestions from Arthur
* make fixup
* rename repo id + change readme
* Update docs/source/en/model_doc/mimi.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add flaky flag to batching equivalence due to audio_codes failing sometimes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix patch_attention_mask incorrect setting which leads to the difference in the generated text if batch > 1
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix format
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* [run_slow] idefics2
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* added sequences_scores to the output
* added beam_indices to output
* added test to check for beam_indices, sequences_scores and their shape
* removed redundant whitespaces
* make fixup
* idefics2 enable_input_require_grads not aligned with disable_input_require_grads
make peft+idefics2 checkpoints disable fail
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* split test case
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix ci failure
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refine test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading
* make style
* fix sew
* fix sew and sew_d tests
* Fix failing tensor placement in Whisper
* fix long form generation tests
* more return_timestamps=True
* make fixup
* [run_slow] whisper
* [run_slow] whisper
* Uniformize kwargs for LlaVa and update docs
* Change order of processor inputs in docstring
* Improve BC support for reversed images and text inputs
* cleanup llava processor call docstring
* Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning
* Put function check reversed images text outside base processor class
* Refactor _validate_images_text_input_order
* Add ProcessingUtilTester
* fix processing and test_processing
* initial commit
* gloups
* updates
* work
* weights match
* nits
* nits
* updates to support the tokenizer :)
* updates
* Pixtral processor (#33454)
* rough outline
* Add in image break and end tokens
* Fix
* Udo some formatting changes
* Set patch_size default
* Fix
* Fix token expansion
* nit in conversion script
* Fix image token list creation
* done
* add expected results
* Process list of list of images (#33465)
* updates
* working image and processor
* this is the expected format
* some fixes
* push current updated
* working mult images!
* add a small integration test
* Update configuration docstring
* Formatting
* Config docstring fix
* simplify model test
* fixup modeling and tests
* Return BatchMixFeature in image processor
* fix some copies
* update
* nits
* Update model docstring
* Apply suggestions from code review
* Fix up
* updates
* revert modeling changes
* update
* update
* fix load safe
* add license
* update
* use pixel_values as required by the model
* skip some tests and refactor
* Add pixtral image processing tests (#33476)
* Image processing tests
* Add processing tests
* woops
* defaults reflect pixtral image processor
* fixup post merge
* images -> pixel values
* oups sorry Mr docbuilder
* isort
* fix
* fix processor tests
* small fixes
* nit
* update
* last nits
* oups this was really breaking!
* nits
* is composition needs to be true
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* Addressed comments
* added a unit test
* fixed cache position
* Added a warning msg to the forward fn
* fixed test case
* Update tokenization_whisper.py
Fix issue with flax whisper model
* Update tokenization_whisper_fast.py
Fix issue with flax whisper model
* Update tokenization_whisper.py
just check len of token_ids
* Update tokenization_whisper_fast.py
just use len of token_ids
* Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update test_tokenization_whisper.py to add test for _convert_to_list method
* Update test_tokenization_whisper.py to fix code style issues
* Fix code style
* Fix code check again
* Update test_tokenization_whisper.py to improve code style
* Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available
* Update tests/models/whisper/test_tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method
* Revert the changes that were automatically applied by the formatter and were unrelated to the PR
* Format for minimal changes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Import structure & first three model refactors
* Register -> Export. Export all in __all__. Sensible defaults according to filename.
* Apply most comments from Amy and some comments from Lucain
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* Style
* Add comment
* Clearer .py management
* Raise if not in backend mapping
* More specific type
* More efficient listdir
* Misc fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* add self.head_dim for VisionAttention in Qwen2-VL
* add self.head_dim for VisionAttention in Qwen2-VL
* fix ci
* black the test_modeling_qwen2_vl.py
* use ruff to format test_modeling_qwen2_vl.py
* [run-slow] qwen2_vl
* use typing for python3.8
* fix the import format
* use ruff to fix the ci error I001
* [run-slow] qwen2_vl
* remove unused import
* commit for rebase
* use ruff fix ci
* [run-slow] qwen2_vl
---------
Co-authored-by: root <liji>
* Add validation for maximum sequence length in modeling_whisper.py
Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message.
This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness.
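The check described above can be sketched in plain Python. This is a minimal illustration, not the code from modeling_whisper.py: `ToyConfig` and `check_labels_length` are hypothetical names, and labels are modelled as nested lists rather than tensors. The limit is read from a config field, as a later commit in this log does with config.max_target_positions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the model config; 448 is the limit quoted above.
    max_target_positions: int = 448


def check_labels_length(labels: List[List[int]], config: ToyConfig) -> None:
    """Raise ValueError if any label sequence is longer than the decoder allows."""
    longest = max((len(seq) for seq in labels), default=0)
    if longest > config.max_target_positions:
        raise ValueError(
            f"Labels' sequence length {longest} cannot exceed the maximum "
            f"allowed length of {config.max_target_positions} tokens."
        )


config = ToyConfig()
check_labels_length([[0] * 448], config)  # exactly at the limit: accepted

try:
    check_labels_length([[0] * 449], config)  # one token too long
except ValueError as err:
    print(err)
```

Failing early with a descriptive message is the point of the change: without it, an over-long label sequence would only surface later as a less readable error.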
* Change exception message in src/transformers/models/whisper/modeling_whisper.py
The exception message is for whisper's label's sequence max length.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py
It's for whisper's config.max_target_positions.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change method's documentation in src/transformers/models/whisper/modeling_whisper.py
* Add test for maximum label's sequence length in test_modeling_whisper.py
* Add self to modeling_whisper.py
* Update test_modeling_whisper.py with respect to automatic validations
* Update modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Separate test_labels_sequence_max_length tests in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Remove assert from test_modeling_whisper.py
* Add max_target_positions to WhisperModelTester in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py
* Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py
* Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py
* Add new tests in test_modeling_whisper.py
* Update test_modeling_whisper.py
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Load remote code only once
* Use hash as load indicator
* Add a new option `force_reload` for old behavior (i.e. always reload)
* Add test for dynamic module is cached
* Add more type annotations to improve code readability
* Address comments from code review
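A rough sketch of the idea in the commits above (load once, use a hash as the load indicator, keep `force_reload` for the old behavior). Everything except the `force_reload` name is an assumption made for illustration: `load_once`, `_LOADED`, and `fake_loader` are hypothetical, and the real dynamic-module loader works on files rather than strings.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical in-process cache: module name -> (source hash, loaded object).
_LOADED: Dict[str, tuple] = {}


def load_once(name: str, source: str, loader: Callable[[str], object],
              force_reload: bool = False) -> object:
    """Run `loader(source)` only when the source changed or a reload is forced."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    cached = _LOADED.get(name)
    if cached is not None and cached[0] == digest and not force_reload:
        return cached[1]  # same hash: reuse the object loaded earlier
    obj = loader(source)
    _LOADED[name] = (digest, obj)
    return obj


calls = []


def fake_loader(src: str) -> dict:
    """Toy loader that records each call; real remote code needs far more care."""
    calls.append(src)
    namespace: dict = {}
    exec(src, namespace)
    return namespace


first = load_once("demo", "x = 1", fake_loader)
second = load_once("demo", "x = 1", fake_loader)  # cache hit, loader not called
third = load_once("demo", "x = 1", fake_loader, force_reload=True)  # always reload
```

Keying the cache on a content hash rather than on the name alone means an edited module is picked up automatically, while unchanged code is never executed twice.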
* [InstructBLIP] qformer_tokenizer is required input
* Bit safer
* Add to instructblipvideo processor
* Fix up
* Use video inputs
* Update tests/models/instructblipvideo/test_processor_instructblipvideo.py
* don't run custom when not needed?
* update test fetcher filtering
* fixup and updates
* update
* update
* reduce burden
* nit
* nit
* missing comma
* this?
* this?
* more parallelism
* more
* nit for real parallelism on tf and torch examples
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update
* update
* update
* update
* update
* update
* use correct path
* fix path to test files and examples
* filter-tests
* filter?
* filter?
* filter?
* nits
* fix naming of the artifacts to be pushed
* list vs files
* list vs files
* fixup
* fix list of all tests
* fix the install steps
* fix the install steps
* fix the config
* fix the config
* only split if needed
* only split if needed
* extend should fix it
* extend should fix it
* arg
* arg
* update
* update
* run tests
* run tests
* run tests
* more nits
* update
* update
* update
* update
* update
* update
* update
* simpler way to show the test, reduces the complexity of the generated config
* simpler way to show the test, reduces the complexity of the generated config
* style
* oups
* oups
* fix import errors
* skip some tests for now
* update doctestjob
* more parallelism
* fixup
* test only the test in examples
* test only the test in examples
* nits
* from Arthur
* fix generated config
* update
* update
* show tests
* oups
* oups
* fix torch job for now
* use single upload step
* oups
* fu**k
* fix
* nit
* update
* nit
* fix
* fixes
* [test-all]
* add generate marker and generate job
* oups
* torch job runs not generate tests
* let repo utils test all utils
* Update
* styling
* fix repo utils test
* more parallel please
* don't test
* update
* bit more verbose sir
* more
* hub were skipped
* split by classname
* revert
* maybe?
* Amazing catch
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix
* update
* update
* maybe non capturing
* manual convert?
* pass artifacts as parameters as otherwise the config is too long
* artifact.json
* store output
* might not be safe?
* my token
* mmm?
* use CI job ID
* can't get a proper id?
* ups
* build num
* update
* echo url
* this?
* this!
* fix
* wget
* ish
* dang
* update
* there we go
* update
* update
* pass all
* not .txt
* update
* fetch
* fix naming
* fix
* up
* update
* update
* ??
* update
* more updates
* update
* more
* skip
* oups
* pr documentation tests are currently created differently
* update
* hmmmm
* oups
* curl -L
* update
* ????
* nit
* mmmm
* ish
* ouf
* update
* ish
* update
* update
* update
* nit
* nit
* up
* oups
* documentation_test fix
* test hub tests everything, just marker
* update
* fix
* test_hub is the only annoying one now
* tf threads?
* oups
* not sure what is happening?
* fix?
* just use folder for stating hub
* I am getting fucking annoyed
* fix the test?
* update
* update
* ?
* fixes
* add comment!
* nit
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Adding SDPA support for RoBERTa-based models
* add not is_cross_attention
* fix copies
* fix test
* add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin
* address some review comments
* use copied from
* style
* consistency
* fix lists
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* init fix
* fix mask during cached forward, move mask related stuff to own function
* adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp)
* revert overwriting new integration tests
* move some comments to docstring
* add Blip2ForImageTextRetrieval
* use one line and remove unnecessary space in tests
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* use value from the config, rather than hardcoded
* change order of params in Blip2QFormerModel.forward
* update docstring
* fix style
* update test_inference_opt
* move embeddings out of Blip2QFormerModel
* remove from_vision_qformer_configs
* remove autocast float16 in Blip2QFormerModel
* rename fields into vision_projection,text_projection,use_image_text_matching_head
* use CLIPOutput for Blip2ImageTextMatchingModelOutput
* remove past_key_values_length from Blip2TextEmbeddings
* fix small typo in the CLIPOutput docstring
* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping
* update docstring and add require_torch_fp16
* rollback test_inference_opt
* use use_image_text_matching_head=True in convert
* skip test_model_get_set_embeddings
* fix create_rename_keys error on new itm fields
* revert to do scale after dot product between "query" and "key"
* fix ValueError on convert script for blip2-opt-2.7b
* update org of paths to Salesforce
* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests
* [run_slow] blip_2
* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED
* fix docstring of Blip2ImageTextMatchingModelOutput
* [run_slow] blip_2
* fix multi-gpu tests
* [run_slow] blip_2
* [run_slow] blip_2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix param not being passed in tested; add exceptions
* better source of model name
* Update utils/create_dummy_models.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs
* remove crop_size argument in align processor tests to be coherent with base tests
* Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
* fix typo
* uniform kwargs
* make style
* add comments
* remove return_tensors
* remove common_kwargs from processor since it propagates
* make style
* return_token_type_ids to True
* revert the default imagekwargs since it does not accept any value in the image processor
* revert processing_utils.py
* make style
* add molbap's commit
* fix typo
* fix common processor
* remain
* Revert "add molbap's commit"
This reverts commit a476c6ee88.
* add unsync PR
* revert
* make CI happy
* nit
* import annotationformat
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that works?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplification
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
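One plausible reading of "bias merge priority with length if score is the same" is a sort key that falls back to piece lengths when scores tie. The sketch below illustrates that reading only: `order_merges`, the tuple layout, and the exact tie-break order (left length, then right length) are assumptions, not the converter's code.

```python
from typing import List, Tuple

Merge = Tuple[str, str, float]  # (left piece, right piece, score)


def order_merges(merges: List[Merge]) -> List[Tuple[str, str]]:
    """Sort by score, then by left and right piece length, highest first."""
    ranked = sorted(merges, key=lambda m: (m[2], len(m[0]), len(m[1])), reverse=True)
    return [(left, right) for left, right, _ in ranked]


candidates = [
    ("a", "bc", -1.0),
    ("ab", "c", -1.0),  # same score as the line above, longer left piece
    ("x", "y", -0.5),   # best score, so it comes first
]
print(order_merges(candidates))
```

A deterministic tie-break matters because the order of merges decides which split of a token wins at encoding time.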
* CI
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
* Updated ruff version and fixed the required code according to the latest version.
* Updated ruff version and fixed the required code according to the latest version.
* Added noqa directive to ignore 1 error shown by ruff
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.
Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.
We implement YaRN and Dynamic-YaRN for the following list of models:
- LLaMA
- Falcon
- GPT-NeoX
- Olmo
- Persimmon
- Phi
- StableLM
- OpenLLaMA
New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.
For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
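The two ingredients named in the commit message above, NTK-by-parts interpolation and attention scaling, can be sketched in plain Python from the formulas in the linked paper. This is a minimal sketch, not the implementation added by the PR: the function names are hypothetical, and the ramp bounds `alpha=1` and `beta=32` are the values the paper suggests for LLaMA, assumed here as defaults.

```python
import math
from typing import List


def yarn_attention_factor(scale: float) -> float:
    """Attention scaling from the paper: 0.1 * ln(s) + 1 (1.0 when s <= 1)."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * math.log(scale) + 1.0


def yarn_inv_freq(dim: int, base: float, scale: float, orig_len: int,
                  alpha: float = 1.0, beta: float = 32.0) -> List[float]:
    """NTK-by-parts: blend interpolated and original RoPE frequencies per dimension."""
    out = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)         # original RoPE frequency
        wavelength = 2 * math.pi / theta
        rotations = orig_len / wavelength  # full turns within the original context
        # Linear ramp: 0 means fully interpolated, 1 means left untouched.
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        out.append((1 - gamma) * theta / scale + gamma * theta)
    return out


freqs = yarn_inv_freq(dim=128, base=10000.0, scale=4.0, orig_len=4096)
print(freqs[0], freqs[-1], yarn_attention_factor(4.0))
```

High-frequency dimensions (many rotations within the original context) keep their frequency, low-frequency dimensions are divided by the scale factor, and the ramp interpolates between the two regimes.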
* Refactor YaRN implementation for LLaMA
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.
This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
* Refactor Tensor Building Logic for YaRN
- Comply with the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
* remove unwanted file
---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* fix mask creation of gpt2 and gpt_neox caused by me
* forgot the reshape of masks when shape > 2
* add tests for gpt neox and gpt2
* nit on a comment
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
* Fix init for rt-detr heads
* Fixup
* Add separate prior_prob value to config for initialization
* Add bbox init
* Change to 1 / num_labels init
* Adjust weights init test
* Fix style for test
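The prior-probability initialization mentioned in the commits above can be sketched like this. It is an assumption-laden illustration, not the RT-DETR code: the helper name `prior_prob_bias` is hypothetical, and the formula is the usual focal-loss style bias init, with `1 / num_labels` as the fallback when no explicit prior is configured.

```python
import math


def prior_prob_bias(num_labels, prior_prob=None):
    """Bias value such that sigmoid(bias) equals the prior probability.

    Hypothetical helper; falls back to 1 / num_labels when no prior is given.
    """
    p = prior_prob if prior_prob is not None else 1.0 / num_labels
    if not 0.0 < p < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    return -math.log((1.0 - p) / p)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))
```

Initializing the classification bias this way makes every class start with a predicted probability equal to the prior, which tends to stabilize early training when most queries are background.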
* squash into single commit
* run diff once more
* docstring
* tests
* minor changes and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
added head mask check to sdpa mask creation
handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
* [run_slow] gpt_neox
* PR SPLIT: moving original changes for adding user defined symbols
* adding gemma test and generalizing gemma converter
* ruff
* update common test
* update serialization test
* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added
* removing commented lines
* applying feedback - use only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
* add comment referencing sentencepiece
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* Draft fast image processors
* Draft working fast version
* py3.8 compatible cache
* Enable loading fast image processors through auto
* Tidy up; rescale behaviour based on input type
* Enable tests for fast image processors
* Smarter rescaling
* Don't default to Fast
* Safer imports
* Add necessary Pillow requirement
* Woops
* Add AutoImageProcessor test
* Fix up
* Fix test for imagegpt
* Fix test
* Review comments
* Add warning for TF and JAX input types
* Rearrange
* Return transforms
* NumpyToTensor transformation
* Rebase - include changes from upstream in ImageProcessingMixin
* Safe typing
* Fix up
* convert mean/std to tensor to rescale
* Don't store transforms in state
* Fix up
* Update src/transformers/image_processing_utils_fast.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Warn if fast image processor available
* Update src/transformers/models/vit/image_processing_vit_fast.py
* Transpose incoming numpy images to be in CHW format
* Update mapping names based on packages, auto set fast to None
* Fix up
* Fix
* Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test
* Update src/transformers/models/vit/image_processing_vit_fast.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add equivalence and speed tests
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Rename to test_model_common_attributes
The method name is misleading - it is testing being able to get and set embeddings, not common attributes to all models
* Explicitly skip
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* docstring and argument fix
* doc fixes and test case fix suggested in review.
* variable typo fix
* styling and name fixes for padding interpolation flag.
* Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal
* Update not-doctested.txt
* Fix JA and ZH docs
* Fix JA and ZH docs some more
* Fix JA and ZH docs some more
* Initial attempt
* Updates: PR suggestions
* Interpolate the relative position bias when interpolate_pos_encoding is True
* Add slow tag for the added tests
* Add in DATA2VEC_VISION_INPUTS_DOCSTRING
* Added interpolate pos encoding feature and test to deit
* Added interpolate pos encoding feature and test for deit TF model
* readded accidentally deleted test for multi_gpu
* storing only patch_size instead of entire config and removed commented code
* Update modeling_tf_deit.py to remove extra line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix the get_size_with_aspect_ratio in max_size situation
* make fix-up
* add more general solution
* consider when max_size is not defined
* fix typo
* fix typo
* simple fix
* fix error
* fix if else error
* fix error of size overwrite
* fix yolos image processing
* fix detr image processing
* make
* add longest related test script
* Update src/transformers/models/yolos/image_processing_yolos.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more test
* add test script about longest size
* remove deprecated
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* seems like `split_special_tokens` is used here
* split special token
* add new line at end of file
* moving split special token test to common tests
* added assertions
* test
* fixup
* add co-author
* passing rest of args to gptsan_japanese, fixing tests
* removing direct comparison of fast and slow models
* adding test support for UDOP and LayoutXLM
* ruff fix
* readd check if slow tokenizer
* modify test to handle bos tokens
* removing commented function
* trigger build
* applying review feedback - updated docstrings, var names, and simplified tests
* ruff fixes
* Update tests/test_tokenization_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* applying feedback, comments
* shutil temp directory fix
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
Co-authored-by: itazap <itazap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>
* added interpolation for vitmae model in pytorch as well as tf.
* Update modeling_vit_mae.py
irregular import fixed
* small changes and proper formatting
* changes suggested in review.
* modified decoder interpolate_func
* arguments and docstring fix
* Apply suggestions from code review
doc fixes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
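The position-encoding interpolation added in the commits above amounts to resampling a grid of embeddings to a new spatial size. The sketch below shows the idea on a grid of scalars using bilinear interpolation with corner-aligned sampling. Both choices are simplifications picked for brevity, the function name `resize_grid` is hypothetical, and the modeling code may use a different interpolation mode on full embedding vectors.

```python
def resize_grid(grid, new_h, new_w):
    """Bilinearly resample a 2D grid (list of rows) to new_h x new_w.

    Hypothetical helper: corner-aligned sampling, one scalar per grid cell.
    """
    old_h, old_w = len(grid), len(grid[0])

    def source_coord(i, new, old):
        # Map output index i to a fractional input coordinate.
        return 0.0 if new == 1 else i * (old - 1) / (new - 1)

    out = []
    for i in range(new_h):
        y = source_coord(i, new_h, old_h)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = source_coord(j, new_w, old_w)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            # Blend along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

Resizing to the original size returns the grid unchanged, which matches the expectation that interpolation is a no-op at the pre-training resolution.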
* add test that currently fails
* test passed
* all perceiver passed
* fixup, style, quality, repo-consistency, all passed
* Apply suggestions from code review: default to False + compute sqrt once only
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix a minor bracket
* replace dim with self._num_channels
* add arguments to the rest preprocessors
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add prefix space ignored in llama #29625
* adding test with add_prefix_space=False
* ruff
---------
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
* Add MistralForTokenClassification
* Add tests and docs
* Add token classification for Mixtral and Qwen2
* Save llama for token classification draft
* Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
* Formatting
* Add token classification support for Qwen2Moe model
* Add dropout layer to each ForTokenClassification model
* Add copied from in tests
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Propagate suggested changes
* Style
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add interpolation of positional encoding support to swin
* add style changes
* use default image processor and make size a dictionary
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove logits testing
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor image size validation logic when interpolation is disabled
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove asserts in modeling
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add dynamic resolution input support to swinv2
* change size to ensure interpolation encoding path is triggered
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
* add dynamic resolution input to donut swin
* add dynamic resolution input to maskformer swin
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Support arbitrary processor
* fix
* nit
* update
* nit
* nit
* fix and revert
* add a small test
* better check
* fixup
* bug so let's just use class for now
* oups
* .
* Add support for mixing languages in a single batch
* Update docstring
* Enable different detected languages in batch
* Do not require input_features
* Test list of languages
* Fix comment
* Make init_tokens length-1 if possible, broadcast at the end
* Test for ValueError with language list of incorrect length
* Slow test for batched multilingual transcription
* fixup
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Address review, refactor
* Second attempt to move this line where it was originally
* Split test, fix a bug
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
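The per-batch language handling described in the commits above (accept one language or a list, reject a list of the wrong length, broadcast at the end) can be sketched as below. The function name `resolve_languages` and the error message are hypothetical and are not taken from the Whisper generation code.

```python
def resolve_languages(language, batch_size):
    """Return one language entry per batch item.

    Hypothetical helper: `language` may be None, a string, or a list of strings.
    """
    if language is None or isinstance(language, str):
        languages = [language]
    else:
        languages = list(language)

    # A list must either hold a single entry or exactly one entry per item.
    if len(languages) not in (1, batch_size):
        raise ValueError(
            f"Expected 1 or {batch_size} languages, got {len(languages)}."
        )

    # Keep the length-1 form as long as possible and broadcast only here.
    if len(languages) == 1:
        languages = languages * batch_size
    return languages
```

For example, `resolve_languages("en", 3)` gives three `"en"` entries, while `resolve_languages(["en", "fr"], 3)` raises a `ValueError`.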
* Initial commit
* Just a copy of modeling_idefics.py that will be ported to TF
* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)
* Add TF imports
* Add autotranslated files
* Add TF classes to model_tf_auto.py
* Add the TF classes in model_doc
* include auto-translated code
* Adopted from auto-translated version
* Add a forgotten super().build
* Add test code for TF version.
* Fix indentation and load pytorch weights for now
* Some fixes. Many tests are still failing but some are passing now.
- I have added TODO's for some of the hacks I made to unblock me
and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily
* Add ALL_LAYERNORM_LAYERS to match pytorch
* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.
* Fix freeze_relevant_params()
* Some more fixes
* Fix test_attention_outputs
* Add tf stuff to processing_idefics.py
processing_idefics.py supports both pytorch and tf now.
test_processor_idefics.py for pytorch is passing, so I didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.
* Pass return_tensors to image processing code and fix test
* Pass return_tensors to the image processor __init__
* Fix several test cases
- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
* Some more fixes forgotten in last commit
* Fix processing code and vision_tf.py
* Fix perceiver bug
* Import from
* Auto-add build() methods + style pass
* Fix build() errors due to `None` being passed as shape to some layers
* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
* Fix pytorch weights load for tf2
There were a lot of `name=` missing in weight initialization code.
* Attempt to fix CI
* Add back accidentally removed line
* Remove torch-specific stuff from the TF test file
* make fix-copies, make style, remove autotranslated files
* Fixes to imports/docstrings
* Let's try the from future import in desperation
* Fix the core random_attention_mask fn to match the torch/flax behaviour
* Clean random_attention_mask up correctly
* Remove torch-only test
* Fix loss shape, couple of nits
* make style
* Don't test for OOB embeddings because IDEFICS uses those deliberately
* Fix loss computation to handle masking
* Fix test failures when flattening
* Fix some test failures
- Add cross attention gate which was missing and wasn't being passed around
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs
* Add a proper stateless scaled_dot_product_attention
* make style
* Adding missing attribute from the PyTorch version
* Small cleanups to decoupledlinearlayer in case that helps
* Pass epsilon to LayerNormalization
* Attempt to fix pytorch weight cross-loading for TFIdeficsEmbedding
* Fix a bug in TFIdeficsGatedCrossAttentionLayer
* Patching up build() methods
* Constant self.inv_freq
* Constant self.inv_freq
* First working version
The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear
where the weights were mis-initialized (in_features,out_features)
when it should be: (out_features, in_features)
I have tested this so far with tiny-random and idefics-9b-instruct
and it gives correct output.
I also dumped the final outputs for both pytorch and TF
and they are identical.
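The shape convention behind the bug above can be illustrated with a small sketch. It assumes the PyTorch `nn.Linear` layout, where the weight has shape (out_features, in_features) and the layer computes `y = x @ W.T + b`. The function `linear` is a hypothetical pure-Python stand-in, not the TFIdeficsDecoupledLinear code.

```python
def linear(x, weight, bias=None):
    """Apply y = x @ weight.T + bias with weight shaped (out_features, in_features).

    Hypothetical helper operating on nested lists.
    """
    in_features = len(weight[0])
    out = []
    for row in x:
        if len(row) != in_features:
            # This is the failure mode when weights are stored transposed.
            raise ValueError(
                f"input has {len(row)} features, weight expects {in_features}"
            )
        y = [sum(w * v for w, v in zip(w_row, row)) for w_row in weight]
        if bias is not None:
            y = [a + b for a, b in zip(y, bias)]
        out.append(y)
    return out
```

A weight created as (in_features, out_features) only goes unnoticed when both sizes are equal, which is one reason this kind of bug can slip past small tests.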
* Fix some test failures
* remove print statement
* Fix return_tensors
* Fix CI test failure check_code_quality
* Attempt to fix CI failures by running `make fixup`
The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test and make that file unreadable; they should probably be moved to a separate file.
* Attempt to fix tests_pr_documentation_tests
* Fix a test failure in test_image_processing_idefics.py
* Fix test test_pt_tf_model_equivalence
* Fix a few failures
* Tiny fix
* Some minor fixes
* Remove a duplicate test
* Override a few test failures for IDEFICS
- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing
* Fix processing_idefics.py after rebase
* Guard import keras with is_tf_available
* fix check code quality
* fix check code quality
* Minor fixes
* Skip test_save_load temporarily
This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.
* Run `ruff format tests src utils`
* Fix last failing test, `test_compile_tf_model`
* Add fixes for vision_tf.py
I forgot to add this file in last commit.
* Minor fixes
* Replace "<<<" with "<<" for doc tests
IDEFICS-9B is too big for doctest runner, so don't run it there
* Make code more readable
* Fix bug after code review
I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.
* Fix after code review
Use original code tokenizer.convert_tokens_to_ids
* Keep PyTorch as the default return_tensors
* Fixes to modeling_tf after code review
* Fixes from code review
- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver
* Run ruff
* Undo a change
* Refactor processing code after Matt's suggestion
* Remove TODO's that aren't needed anymore
* For pytorch, Use original pytorch processing code from main
Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This change undoes the pytorch processing
modifications I made and uses original code from main.
* Update tests/models/idefics/test_modeling_idefics.py
* Update tests/models/idefics/test_modeling_tf_idefics.py
* Add missing imports for is_pt_tf_cross_test
* [DO NOT MERGE]: This is a commit for debugging and will be reverted
The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.
* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
* Don't skip test_save_load
IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests
* Debugging commit, will be reverted
* Revert "Debugging commit, will be reverted"
This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
* Override `test_save_load` and push model to save
Maybe this will help me repro this weird bug
* pass my repo_id
* add endpoint
* Pass a temp (write) token just for this CI
* Undo last few commits, still pushing to hub for model debugging
The issue seems to be with save_pretrained(), when I looked at the model saved
from the CI test failure it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging
* Add logging to modeling tf utils, will be reverted just for debugging
* Debugging, will revert
* Revert "Debugging, will revert"
This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
* Revert "Add logging to modeling tf utils, will be reverted just for debugging"
This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
* Remove `test_save_load`
The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.
* Run make fix-copies
* Run ruff format tests src utils
* Debugging commit, will be reverted
* Run ruff, also trigger CI run
* Run ruff again
* Undo debugging commit
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* blip with interpolated pos encoding
* feat: Add interpolate_pos_encoding option to other models from `BLIP` family.
* include check for textual generated content in tests
* Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True
* Testing for the non-safe-tensors case, since the default is safe-tensors already
* Running fixup/fix-copies
* Adding accelerate annotations to tests
* change cis
* nits
* update
* minor updates
* [push-ci-image]
* nit [push-ci-image]
* nitsssss
* [build-ci-image]
* [push-ci-image]
* [push-ci-image]
* both
* [push-ci-image]
* this?
* [push-ci-image]
* pypi-kenlm needs g++
* [push-ci-image]
* nit
* more nits [push-ci-image]
* nits [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* add vision
* [push-ci-image]
* [push-ci-image]
* add new dummy file but will need to update them [push-ci-image]
* [push-ci-image]
* show package size as well
* [push-ci-image]
* potentially ignore failures
* workflow updates
* nits [push-ci-image]
* [push-ci-image]
* fix consistency
* clean nvidia triton
* also show big packages [push-ci-image]
* nit
* update
* another one
* line escape?
* add accelerate [push-ci-image]
* updates [push-ci-image]
* nits to run tests, no push-ci
* try to parse skip reason to make sure nothing is skipped that should not be skipped
* nit?
* always show skipped reasons
* nits
* better parsing of the test outputs
* action="store_true",
* failure on failed
* show matched
* debug
* update short summary with skipped, failed and errors
* nits
* nits
* cool updates
* remove docbuilder
* fix
* always run checks
* oups
* nits
* don't error out on library printing
* non zero exit codes
* no warning
* nit
* WAT?
* format nit
* [push-ci-image]
* fail if fail is needed
* [push-ci-image]
* sound file for torch light?
* [push-ci-image]
* order is important [push-ci-image]
* [push-ci-image] reduce even further
* [push-ci-image]
* use pytest rich !
* yes [push-ci-image]
* oupsy
* bring back the full traceback, but pytest rich should help
* nit
* [push-ci-image]
* re run
* nit
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* empty push to trigger
* [push-ci-image]
* nit? [push-ci-image]
* empty
* try to install timm with no deps
* [push-ci-image]
* oups [push-ci-image]
* [push-ci-image]
* [push-ci-image] ?
* [push-ci-image] open ssh client for git checkout fast
* empty for torch light
* updates [push-ci-image]
* nit
* @v4 for checkout
* [push-ci-image]
* [push-ci-image]
* fix fetch tests with parallelism
* [push-ci-image]
* more parallelism
* nit
* more nits
* empty to re-trigger
* empty to re-trigger
* split by timing
* did not work with previous commit
* junit.xml
* no path?
* mmm this?
* junitxml format
* split by timing
* nit
* fix junit family
* now we can test if the xunit1 is compatible!
* this?
* fully list tests
* update
* update
* oups
* finally
* use classname
* remove working directory to make sure the path does not interfere
* okay now junit should have the correct path
* name split?
* sort by classname is what make most sense
* some testing
* name
* oups
* test something fun
* autodetect
* 18?
* nit
* file size?
* uip
* 4 is best
* update to see versions
* better print
* [push-ci-image]
* [push-ci-image]
* please install the correct keras version
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* uv is fucking me up
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* nits
* [push-ci-image]
* [push-ci-image]
* install issues and pins
* tapas as well
* nits
* more parallelism
* short tb
* soundfile
* soundfile
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* oups
* [push-ci-image]
* fix some things
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* use torch-light for hub
* small git lfs for hub job
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* fix tf tapas
* [push-ci-image]
* nits
* [push-ci-image]
* don't update the test
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* no use them
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* update tf proba
* [push-ci-image]
* [push-ci-image]
* woops
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* test with built dockers
* [push-ci-image]
* skip annoying tests
* revert fix copy
* update test values
* update
* last skip and fixup
* nit
* ALL GOOOD
* quality
* Update tests/models/layoutlmv2/test_image_processing_layoutlmv2.py
* Update docker/quality.dockerfile
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update src/transformers/models/tapas/modeling_tf_tapas.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* use torch-speed
* updates
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* fuck ken-lm [push-ci-image]
* [push-ci-image]
* [push-ci-image]
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move scaling to nn.Module
* let the test be here for now (need to fix)
* failing tests
* last failing models
* Revert commit 4c14817f38
* clean-up
* oops forgot
* codestyle
* raise NotImplemented when possible
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* skip tests in respective modeling files
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Enable instantiating model with pretrained backbone weights
* Clarify pretrained import
* Use load_backbone instead
* Add backbone_kwargs to config
* Fix up
* Add tests
* Tidy up
* Enable instantiating model with pretrained backbone weights
* Update tests so backbone checkpoint isn't passed in
* Clarify pretrained import
* Update configs - docs and validation check
* Update src/transformers/utils/backbone_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Clarify exception message
* Update config init in tests
* Add test for when use_timm_backbone=True
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add backbone_kwargs to config
* Pass kwargs to constructors
* Draft
* Fix tests
* Add back timm - weight naming
* More tidying up
* Whoops
* Tidy up
* Handle when kwargs are none
* Update tests
* Revert test changes
* Deformable detr test - don't use default
* Don't mutate; correct model attributes
* Add some clarifying comments
* nit - grammar is hard
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Adding SDPA support for BERT
* Using the proper input name for testing model input in inference()
* Adding documentation for SDPA in BERT model page
* Use the stable link for the documentation
* Adding a gate to only call .contiguous() for torch < 2.2.0
* Additions and fixes to the documentation
* Minor updates to documentation
* Adding extra requirements needed for the contiguous() bug
* Adding "Adapted from" in place of the "Copied from"
* Add benchmark speedup tables to the documentation
* Minor fixes to the documentation
* Use ClapText as a replacement for Bert in the Copied-From
* Some more fixes for the fix-copies references
* Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage
[test all]
* Undo changes to separate test
* Refactored SDPA self attention code for KV projections
* Change use_sdpa to attn_implementation
* Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
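The version gate mentioned above (only call `.contiguous()` for torch < 2.2.0) could look roughly like the sketch below. The helper name `needs_contiguous` is hypothetical and the parsing is deliberately simple: it ignores pre-release ordering, so real code would more likely rely on a dedicated version-parsing library.

```python
from itertools import takewhile


def needs_contiguous(torch_version):
    """Return True when the given torch version string is older than 2.2.0.

    Hypothetical helper: strips local build tags such as "+cu118" and keeps
    only the leading digits of each component.
    """
    release = torch_version.split("+")[0]
    parts = []
    for piece in release.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat "2.2" as "2.2.0"
    return tuple(parts) < (2, 2, 0)
```

The gate would then wrap the call site, for example by making query, key and value contiguous only when `needs_contiguous(...)` is true for the installed version.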
* first modeling code
* make repository
* still WIP
* update model
* add tests
* add latest change
* clean docstrings and copied from
* update docstrings md and readme
* correct chroma function
* correct copied from and remove unrelated test
* add doc to toctree
* correct imports
* add convert script to notdoctested
* Add suggestion from Sanchit
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct get_unconditional_inputs docstrings
* modify README according to SANCHIT feedback
* add chroma to audio utils
* clean librosa and torchaudio hard dependencies
* fix FE
* refactor audio decoder -> audio encoder for consistency with previous musicgen
* refactor conditional -> encoder
* modify sampling rate logics
* modify license at the beginning
* refactor all_self_attns->all_attentions
* remove ignore copy from causallm generate
* add copied from for from_sub_models
* fix make copies
* add warning if audio is truncated
* add copied from where relevant
* remove artefact
* fix convert script
* fix torchaudio and FE
* modify chroma method according to feedback-> better naming
* refactor input_values->input_features
* refactor input_values->input_features and fix import fe
* add input_features to docstrings
* correct inputs_embeds logics
* remove dtype conversion
* refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation
* change warning for chroma length
* Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* change way to save wav, using soundfile
* correct docs and change to soundfile
* fix import
* fix init proj layers
* add draft training
* fix cross entropy
* clean loss computation
* fix labels
* remove line breaks from md
* fix issue with docstrings
* add FE suggestions
* improve is in logics and remove useless imports
* remove custom from_pretrained
* simplify docstring code
* add suggestions for modeling tests
* make style
* update converting script with sanity check
* remove encoder attention mask from conditional generation
* replace musicgen melody checkpoints with official orga
* rename ylacombe->facebook in checkpoints
* fix copies
* remove unnecessary warning
* add shape in code docstrings
* add files to slow doc tests
* fix md bug and add md to not_tested
* make fix-copies
* fix hidden states test and batching
* update training code
* add training tests for melody
* add training for o.g musicgen
* fix copied from
* remove final todos
* make style
* fix style
* add suggestions from review
* add ref to the original loss computation code
* rename method + fix labels in tests
* make style
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* chore(root): Initial commit of Phi-3 files.
* fix(root): Fixes Phi-3 missing on readme.
* fix(root): Ensures files are consistent.
* fix(phi3): Fixes unit tests.
* fix(tests): Fixes style of phi-3 test file.
* chore(tests): Adds integration tests for Phi-3.
* fix(phi3): Removes additional flash-attention usage, e.g., swiglu and rmsnorm.
* fix(phi3): Fixes incorrect docstrings.
* fix(phi3): Fixes docstring typos.
* fix(phi3): Adds support for Su and Yarn embeddings.
* fix(phi3): Improves according to the first batch of reviews.
* fix(phi3): Uses up_states instead of y in Phi3MLP.
* fix(phi3): Uses gemma rotary embedding to support torch.compile.
* fix(phi3): Improves how rotary embedding classes are defined.
* fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.
* fix(phi3): Adds last suggestions to modeling file.
* fix(phi3): Splits inv_freq calculation in two lines.
* Fixed main train issues
* Added loss test
* Update src/transformers/models/seggpt/modeling_seggpt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added missing labels arg in SegGptModel forward
* Fixed typo
* Added slow test to test loss calculation
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* push legacy to fast as well
* super strange
* Update src/transformers/convert_slow_tokenizer.py
* make sure we are BC
* fix Llama test
* nit
* revert
* more test
* style
* update
* small update w.r.t tokenizers
* nit
* don't split
* lol
* add a test for `add_prefix_space=False`
* fix gemma tokenizer as well
* update
* fix gemma
* nicer failures
* fixup
* update
* fix the example for legacy = False
* use `huggyllama/llama-7b` for the PR doctest
* nit
* use from_slow
* fix llama
* Duplicate swiftformer
* Convert SwiftFormerPatchEmbedding
* Convert SwiftFormerEmbeddings
* Convert TFSwiftFormerMlp
* Convert TFSwiftFormerConvEncoder
* Convert TFSwiftFormerLocalRepresentation
* convert TFSwiftFormerEncoderBlock
* Convert SwiftFormerStage
* Convert SwiftFormerEncoder
* Add TFSwiftFormerPreTrainedModel
* Convert SwiftFormerForImageClassification
* Add kwargs and start drop path
* Fix syntax
* Change Model class name
* Add TFSwiftFormer to __init__
* Duplicate test_modeling_swiftformer
* First test conversions
* Change require_torch to require_tf
* Add exports to swiftformer __init__
* Add TFSwiftFormerModel wrapper
* Fix __init__ and run black
* Remove docstring from MainLayer, fix padding
* Use keras.layers.Activation on keras.Sequential
* Fix swiftformer exports
* Fix activation layer from config
* Remove post_inits
* Use tf.keras.layers.ZeroPadding2D
* Convert torch normalize
* Change tf test input shape
* Fix softmax and reduce_sum
* Convert expand_dims and repeat
* Add missing reshape and transpose
* Simplify TFSwiftFormerEncoderBlock.call
* Fix mismatch in patch embeddings
* Fix expected output shape to match channels last
* Fix swiftformer typo
* Disable test_onnx
* Fix TFSwiftFormerForImageClassification call
* Add unpack inputs
* Convert flatten(2).mean(-1)
* Change vision dummy inputs (to be reviewed)
* Change test_forward_signature to use .call
* Fix @unpack_inputs
* Set return_tensors="tf" and rename class
* Rename wrongly named patch_embeddings layer
* Add serving_output and change dummy_input shape
* Make dimensions BCHW and transpose inside embedding layer
* Change SwiftFormerEncoderBlock
* Fix ruff problems
* Add image size to swiftformer config
* Change transpose to MainLayer and use -1 for reshape
* Remove serving_outputs and dummy_inputs
* Remove test_initialization test from tf model
* Make Sequential component a separate layer
* Fix layers' names
* Transpose encoder outputs
* Fix tests and check if hidden states is not None
* Fix TFSwiftFormerForImageClassification
* Run make fixup
* Run make fix-copies
* Update modeling_tf_auto
* Update docs
* Fix modeling auto mapping
* Update modeling_tf_swiftformer docs
* Fill image_size doc and type
* Add reduction=None to loss computation
* Update docs
* make style
* Debug: Delete the tip to see if that changes anything
* Re-add tip
* Remove add_code_sample_docstrings
* Remove unused import
* Get the debug to actually tell us the problem it has with the docs
* Try a substitution to match the PyTorch file?
* Add swiftformer to ignore list
* Add build() methods
* Update copyright year
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove FIXME comment
* Remove from_pt
* Update copyright year
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Rename one-letter variables
* Remove FIXMEs related to momentum
* Remove old TODO comment
* Remove outstanding FIXME comments
* Get dropout rate from config
* Add specific dropout config for MLP
* Add convencoder dropout to config
* Pass config to SwiftFormerDropPath layer
* Fix drop_path variable name and add Adapted from comment
* Run ruff
* Removed copied from comment
* Run fix copies
* Change drop_path to identity to match pt
* Cleanup build() methods and move to new keras imports
* Update docs/source/en/model_doc/swiftformer.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Raise error if drop_path_rate > 0.0
* Apply suggestions from code review
Replace (self.dim), with self.dim,
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove drop_path function
* Add training to TFSwiftFormerEncoder
* Set self.built = True last
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Should have been added to previous commit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Change default_feature_extractor to default_image_processor
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Import Keras from modeling_tf_utils
* Remove relative import
* Run ruff --fix
* Move import keras to tf_available
* Add copied from comment to test_forward_signature
* Reduce batch size and num_labels
* Extract loss logic to hf_compute_loss
* Run ruff format
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"silu"` the default `ffn_act_fn`
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrxAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add usage example
* put on one line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more usage examples
---------
Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add jamba arch
* apply "make fix-copies" changes
* fix link to model in JambaConfig docstring
* Add n_ctx in modeling file because repo-consistency wants that
* Add jamba to flash attention and sdpa documentation
* mamba dt_proj quant fix now works for LoRA as well
* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical differences are accentuated by mamba layers
* add jamba to tokenization auto
* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
* simple PR fixes
* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
* Add copied comment on JambaMLP (it's the same as MixtralMLP)
* remove padding_mask warnings. It's not supported anymore
* fix docstring. Float instead of int
* A few more minor PR fixes
* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
* Return None attention weights from mamba layers. Append to all attentions only if not None.
* remove some leftover jamba archive lists
* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
* Add Jamba paper on READMEs
* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
* Add copied from comment
* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
* clearer docstring for _convert_to_standard_cache
* style fixes
* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
* rename test so it still overrides what it's meant to override
* draft
* oups
* nit
* remove more complex logic
* fix names used in config
* fix fix fix
* style
* fix some more failing tests
* generate did not init the cache 🙃
* more small nits
* typo
* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
* fix init of pkv with torch.tensor()
* empty tensor
* fix some init issues
* stupid changes required by generate because it does not even support its own DynamicCache class
* more fixes
* fix general assisted gen cache_position bug
* tests passing
* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
* fix docstrings and typehints for past_key_values
* style fixes
* fix docs
* change typehint due to copy from Mixtral
* forgot import
* import order
* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
* Add integration test with tiny random Jamba model on hub
* fix flash attention cache shapes
* bring back forgotten hidden states
* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
* align integration test after modeling fixes
* bugfix - mamba can use precomputed states only if forward pass is on a single token
* bugfix - mamba can use precomputed states only if they match the batch size
* typo
* remove making _prepare_4d_causal_attention_mask a leaf function
* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* Add OLMo using add-new-model-like with Llama
* Fix incorrect tokenizer for OLMo
* Copy-paste relevant OLMo methods and their imports
* Add OLMo config
* Modify OLMo config to follow HF conventions
* Remove unneeded Llama code from OLMo model
* Add ability for OLMo model to output attentions
* Add OLMoPreTrainedModel and OLMoModel
* Add OLMoForCausalLM
* Minor fixes to OLMo model for style and missing functions
* Implement OLMo tokenizer
* Implement OLMo to HF conversion script
* Add tests for OLMo model
* Add tests for OLMo fast tokenizer
* Add auto-generated dummy objects
* Remove unimplemented OLMo classes from auto and init classes and re-format
* Add README and associated auto-generated files
* Use OLMo names for common properties
* Run make fixup
* Remove `|` from OLMo typing
* Remove unneeded tokenization_olmo.py
* Revert model, config and converter to add-new-model-like Llama
* Move logic for adding bos/eos token into GPTNeoXTokenizerFast
* Change OLMoConfig defaults to match OLMo-7B
* Use GPTNeoXTokenizerFast in OLMo tokenizer tests
* Modify auto-generated OLMoModelTests to work for OLMo
* Add non-parametric layer norm OLMoLayerNorm
* Update weight conversion script for OLMo
* Fix __init__ and auto structure for OLMo
* Fix errors from make fixup
* Remove OLMoTokenizerFast from documentation
* Add missing 'Copied from' for OLMoModel._update_causal_mask
* Run make fix-copies
* Rearrange string replacements in OLMoForCausalLM Copied from
* Move OLMo and Llama CausalLM.forward example into global constants
* Fix OLMO_GENERATION_EXAMPLE doc string typo
* Add option for qkv clipping to OLMo
* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf
* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf
* Fix OLMo tokenization bug using conversion script
* Keep model in full precision after conversion
* Do not add eos token automatically
* Update references to OLMo model in HF Hub
* Do not add eos token during encoding by default
* Fix Llama generation example
* Run make fixup
* OLMo 7B integration test fix
* Remove unneeded special case for OLMoConfig
* OLMo 7B Twin 2T integration test fix
* Fix test_model_7b_greedy_generation
* Remove test_compile_static_cache
* Fix OLMo and Llama generation example
* Run make fixup
* Revert "OLMo 7B integration test fix"
This reverts commit 4df56a4b15.
* Revert "OLMo 7B Twin 2T integration test fix"
This reverts commit 9ff65a4a29.
* Ungate 7B integration tests and fix greedy generation test
* Add retries for flaky test_eager_matches_sdpa_generate
* Fix output of doc example for OLMoForCausalLM.forward
* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model
* Try fix incorrect characters in OLMoForCausalLM.forward doc test
* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes
* Remove pretraining_tp from OLMo config and model
* Add missing 'Copied from' instances
* Remove unneeded causal_mask from OLMoModel
* Revert Llama changes
* Ignore copy for OLMoForCausalLM.forward
* Change 'OLMo' to 'Olmo' in classes
* Move minimal OLMo tokenization tests to model tests
* Add missed 'Copied from' for repeat_kv
* Add create token type ids to CodeGenTokenizer
* Fix inconsistent length of token type ids
* Format source codes
* Fix inconsistent order of methods
* Update docstring
* add test_tokenizer_integration test
* Format source codes
* Add `copied from` comment to CodeGenTokenizerFast
* Add doc of create_token_type_ids_from_sequences
* Make return_token_type_ids False by default
* Make test_tokenizer_integration as slow test
* Add return_token_type_ids to tokenizer init arg
* Add test for tokenizer's init return_token_type_ids
* Format source codes
* Remove auto class
* Update ImagePointDescriptionOutput
* Update model outputs
* Rename output class
* Revert "Remove auto class"
This reverts commit ed4a8f549d.
* Address comments
* Fork.
* RecurrentGemma initial commit.
* Updating __init__.py.
* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.
* Reformat code to 4 spaces.
Fixed a few typos.
* Fixed the forward pass.
Still unclear on the cache?
* Fixed the RecurrentGemmaForCausalLM
* Minor comment that we might not need attention_mask and output_attention arguments.
* Now cache should work as well.
* Adding a temporary example to check whether the model generation works.
* Adding the tests and updating imports.
* Adding the example file missing in the previous commit.
* First working example.
* Removing .gitignore and reverting parts of __init__.
* Re-add .gitignore.
* Addressing comments for configuration.
* Move mask creation to `_prepare_inputs_for_generation`.
* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed
* Transferring between machines.
* Running normal tests.
* Minor fix.
* More fixes.
* Addressing more comments.
* Minor fixes.
* first stab at cleanup
* more refactoring
* fix copies and else
* renaming and get init to work
* fix causal mask creation
* update
* nit
* fix a hell lot of things
* updates
* update conversion script
* make all keys importable
* nits
* add auto mappings
* properly convert ffw_up and down
* add scaling
* fix generations
* for recurrent dtype
* update
* fix going beyond window
* fixup
* add missing files
* current updates to remove last einops
* finish modeling refactor
* TADA
* fix compile
* fix most failing tests?
* update tests
* refactor and update
* update
* nits, fixup and update tests
* more fixup
* nits
* fix imports
* test format
* fixups
* nits
* tuple typing
* fix code quality
* add model card
* fix doc
* skip most generation tests
* nits
* style
* doc fixes
* fix pr and check_copies?
* last nit
* oupsy
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update
* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update based on review
* doc nit
* fix quality
* quality
* fix slow test model path
* update default dtype
* ignore attributes that can be safely ignored in check config attributes
* 0lallalala come on
* save nit
* style
* remove to dict update
* make sure we can also run in float16
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* init: add StableLm 2 support
* add integration test for parallel residual and qk layernorm
* update(modeling): match qk norm naming for consistency with phi/persimmon
* fix(tests): run fwd/bwd on random init test model to jitter norm weights off identity
* `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`
* refactor: rename head states var in `StableLmLayerNormPerHead`
* tests: update test model and add generate check
* add _torch_extract_fbank_features_batch function in feature_extractor_whisper
* reformat feature_extraction_whisper.py file
* handle batching in single function
* add gpu test & doc
* add batch test & device in each __call__
* add device arg in doc string
---------
Co-authored-by: vaibhav.aggarwal <vaibhav.aggarwal@sprinklr.com>
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* redefaulted padding=longest again
* fixup/doc
* Fix generate_with_fallback **kwargs
* Change pop to get
* Delete keys from kwargs to prevent overriding generation_config
* Revert to passing kwargs by reference, but make a (shallow) copy
* dict -> copy.copy
* Add test_whisper_longform_multi_batch_beam
* Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode
* Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode
* Exclude pad_token filtering since it is used as CTC-blank token
* Add small test for skip_special_tokens
* Update decoding test for added new token
* add FA2 to o.g Musicgen
* make style
* add FA2 support to Musicgen Melody
* add generation FA2 tests to o.g Musicgen
* make style and fix copies
* add Musicgen to FA2 docs + deprecate list
* add sdpa support to Musicgen models
* make style and fix copies
* refactor attention implementation arguments
* add Copied from to sdpa tests
* add copied from in sdpa tests melody
* add copied for FA2 generation tests
* add FA2 inference copied from
* make style
* Fix sinusoidal_embeddings in FlaubertModel
* Fix for Informer
* Fix for XLM
* Move sinusoidal emb for XLM
* Move sinusoidal emb for Flaubert
* Small cleanup
* Add comments on tests code copied from
* Add with Distilbert->
* fix bug and add tests
* nit
* other way to get the cur len instead of attention mask
* more places where this might have been broken
* nit
* oups
* inputs_embeds vs input_embeds
* test generated outputs
* style
* nit
* fix
* skip failing biogpt
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* attempt to fix
* the actual fix that works with compilation!
* this?
* temporary update
* nit?
* dispatch to memory efficient?
* update both models that have static cache support
* fix copies fix compile
* make sure fix
* fix cohere and gemma
* fix beams?
* nit
* slipped through the cracks
* nit
* nits
* update
* fix-copies
* skip failing tests
* nits
* Added SuperPoint docs
* Added tests
* Removed commented part
* Commit to create and fix add_superpoint branch with a new branch
* Fixed dummy_pt_objects
* Committed missing files
* Fixed README.md
* Apply suggestions from code review
Fixed small changes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py
* Removed AutoModelForKeypointDetection and related stuff
* Fixed inconsistencies in image_processing_superpoint.py
* Moved infer_on_model logic simply in test_inference
* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py
* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale
* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixed from (w, h) to (h, w) as input for tests
* Removed unnecessary condition
* Moved last_hidden_state to be the first returned
* Moved last_hidden_state to be the first returned (bis)
* Moved last_hidden_state to be the first returned (ter)
* Switched image_width and image_height in tests to match recent changes
* Added config as first SuperPointConvBlock init argument
* Reordered README's after merge
* Added missing first config argument to SuperPointConvBlock instantiations
* Removed formatting error
* Added SuperPoint to README's de, pt-br, ru, te and vi
* Checked out README_fr.md
* Fixed README_fr.md
* Test fix README_fr.md
* Test fix README_fr.md
* Last make fix-copies !
* Updated checkpoint path
* Removed unused SuperPoint doc
* Added missing image
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed unnecessary import
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added SuperPoint to _toctree.yml
---------
Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
* use user_defined_symbols
* fixup
* nit
* add a very robust test
* make sure all models are tested with the `pretrained_tokenizer_to_test`
* should we make sure we test all of them?
* merge
* remove the id
* fix test
* update
* ousies
* oups
* fixup
* fix copies check
* remove `pretrained_tokenizer_to_test`
* Cohere Model Release (#1)
Cohere Model Release
* Remove unnecessary files and code (#2)
Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/en/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Updated index.md
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/en/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Fixed config docstring. Added channels property
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Fixed config backbone compat
* Ran fix-copies
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Fixed issue from rebase
* Fixed issue from rebase
* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
* Fixed issue from rebase
* Fixed issue from rebase
* Changed model name in docs
* Removed duplicate PvtV2Backbone
* Work around type switching issue in tests
* Fix model name in config comments
* Update docs/source/en/model_doc/pvt_v2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed old code
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Fixed Class names to be more descriptive
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed outdated code
* Moved paper abstract to single line in pvt_v2.md
* Added usage tips to pvt_v2.md
* Simplified module inits by passing layer_idx
* Fixed typing for hidden_act in PvtV2Config
* Removed unused import
* Add pvt_v2 to docs/source/en/_toctree.yml
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Move function parameters to single line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Update year of copyright to 2024
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Make code more explicit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated sr_ratio to be more explicit spatial_reduction_ratio
* Removed excess type hints in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Move params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed needless comment in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update copyright date in pvt_v2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated copyright date in configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Cleaned comments in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Renamed spatial_reduction Conv2D operation
* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py"
This reverts commit c4a04416dd.
* Updated conversion script to reflect module name change
* Deprecated reshape_last_stage option in config
* Removed unused imports
* Code formatting
* Fixed outdated decorators on test_inference_fp16
* Added "Copied from" comments in test_modeling_pvt_v2.py
* Fixed import listing
* Updated model name
* Force empty commit for PR refresh
* Fixed linting issue
* Removed # Copied from comments
* Added PVTv2 to README_fr.md
* Ran make fix-copies
* Replace all FoamoftheSea hub references with OpenGVLab
* Fixed out_indices and out_features logic in configuration_pvt_v2.py
* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
* Ran code fixup
* Fixed order of parent classes in PvtV2Config to fix the to_dict method override
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial implementation of flash attention for gptj
* modify flash attention and overwrite test_flash_attn_2_generate_padding_right
* update flash attention support list
* remove the copy line in the `CodeGenBlock`
* address copy mechanism
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add GPTJ attention classes
* add expected outputs in the gptj test
* Ensure repo consistency with 'make fix-copies'
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add tests for batching support
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* fixes and comments
* use cosine distance for conv models
* skip mra model testing
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* finalize and make style
* check model type by input names
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fixed batch size for all testers
* Revert "fixed batch size for all testers"
This reverts commit 525f3a0a05.
* add batch_size for all testers
* dict from model output
* do not skip layoutlm
* bring back some code from git revert
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* clean-up
* where did minus go in tolerance
* make whisper happy
* deal with consequences of losing minus
* deal with consequences of losing minus
* maskformer needs its own test for happiness
* fix more models
* tag flaky CV models from Amy's approval
* make codestyle
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* left-padding test revisited
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial-commit
* start cleaning
* small nits
* small nits
* current updates
* add kernels
* small refactoring little step
* add comments
* styling
* nit
* nits
* Style
* Small changes
* Push dummy mamba simple slow
* nit
* Use original names
* Use original names and remove norm
* Updates for inference params
* Style and updates
* nits
* Match logits
* Add a test
* Add expected generated text
* nits doc, imports and styling
* style
* oups
* don't install kernels, invite users to install the required kernels
* let users use the original packages
* styling
* nits
* fix some copieds
* update doc
* fix-copies
* styling done
* nits
* fix import check
* run but wrong cuda res
* mamba CUDA works :)
* fix the fast path
* config naming nits
* conversion script is not required at this stage
* finish fixing the fast path: generation make sense now!
* nit
* Let's start working on the CIs
* style
* better style
* more nits
* test nit
* quick fix for now
* nits
* nit
* nit
* nit
* nits
* update test rest
* fixup
* update test
* nit
* some fixes
* nits
* update test values
* fix styling
* nit
* support peft
* integration tests require torch
* also add slow markers
* styling
* chose forward wisely
* nits
* update tests
* fix gradient checkpointing
* fixup
* nit
* fix doc
* check copies
* fix the docstring
* fix some more tests
* style
* fix beam search
* add init scheme
* update
* nit
* fix
* fixup the doc
* fix the doc
* fixup
* tentative update but slow is no longer good
* nit
* should we always use float32?
* nits
* revert wrong changes
* res in float32
* cleanup
* skip fmt for now
* update generation values
* update test values running original model
* fixup
* update tests + rename inference_params to cache_params + make sure training does not use cache_params
* small nits
* more nits
* fix final CIs
* style
* nit doc
* I hope final doc nits
* nit
* 🫠
* final touch!
* fix torch import
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
* fix fix and fix
* fix base model prefix!
* nit
* Update src/transformers/models/mamba/__init__.py
* Update docs/source/en/model_doc/mamba.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* nit
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* First draft
* More improvements
* More improvements
* More fixes
* Fix copies
* More improvements
* More fixes
* More improvements
* Convert checkpoint
* More improvements, set up tests
* Fix more tests
* Add UdopModel
* More improvements
* Fix equivalence test
* More fixes
* Redesign model
* Extend conversion script
* Use real inputs for conversion script
* Add image processor
* Improve conversion script
* Add UdopTokenizer
* Add fast tokenizer
* Add converter
* Update README's
* Add processor
* Add fully fledged tokenizer
* Add fast tokenizer
* Use processor in conversion script
* Add tokenizer tests
* Fix one more test
* Fix more tests
* Fix tokenizer tests
* Enable fast tokenizer tests
* Fix more tests
* Fix additional_special_tokens of fast tokenizer
* Fix tokenizer tests
* Fix more tests
* Fix equivalence test
* Rename image to pixel_values
* Rename seg_data to bbox
* More renamings
* Remove vis_special_token
* More improvements
* Add docs
* Fix copied from
* Update slow tokenizer
* Update fast tokenizer design
* Make text input optional
* Add first draft of processor tests
* Fix more processor tests
* Fix decoder_start_token_id
* Fix test_initialization
* Add integration test
* More improvements
* Improve processor, add test
* Add more copied from
* Add more copied from
* Add more copied from
* Add more copied from
* Remove print statement
* Update README and auto mapping
* Delete files
* Delete another file
* Remove code
* Fix test
* Fix docs
* Remove asserts
* Add doc tests
* Include UDOP in exotic model tests
* Add expected tesseract decodings
* Add sentencepiece
* Use same design as T5
* Add UdopEncoderModel
* Add UdopEncoderModel to tests
* More fixes
* Fix fast tokenizer
* Fix one more test
* Remove parallelisable attribute
* Fix copies
* Remove legacy file
* Copy from T5Tokenizer
* Fix rebase
* More fixes, copy from T5
* More fixes
* Fix init
* Use ArthurZ/udop for tests
* Make all model tests pass
* Remove UdopForConditionalGeneration from auto mapping
* Fix more tests
* fixups
* more fixups
* fix the tokenizers
* remove un-necessary changes
* nits
* nits
* replace truncate_sequences_boxes with truncate_sequences for fix-copies
* nit current path
* add a test for input ids
* ids that we should get taken from c9f7a32f57
* nits converting
* nits
* apply ruff
* nits
* nits
* style
* fix slow order of addition
* fix udop fast range as well
* fixup
* nits
* Add docstrings
* Fix gradient checkpointing
* Update code examples
* Skip tests
* Update integration test
* Address comment
* Make fixup
* Remove extra ids from tokenizer
* Skip test
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update year
* Address comment
* Address more comments
* Address comments
* Add copied from
* Update CI
* Rename script
* Update model id
* Add AddedToken, skip tests
* Update CI
* Fix doc tests
* Do not use Tesseract for the doc tests
* Remove kwargs
* Add original inputs
* Update casting
* Fix doc test
* Update question
* Update question
* Use LayoutLMv3ImageProcessor
* Update organization
* Improve docs
* Update forward signature
* Make images optional
* Remove deprecated device argument
* Add comment, add add_prefix_space
* More improvements
* Remove kwargs
---------
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* 🐛 Fix oneformer instance post processing when using panoptic task type
* ✅ Add unit test for oneformer instance post processing panoptic bug
---------
Co-authored-by: Nick DeGroot <1966472+nickthegroot@users.noreply.github.com>