* Bookmark, initial impelemtation. Need to test
* Clean
* Working fully, woop woop
* I think working version now, testing
* Fin!
* rm cast, could keep None
* Fix typing issue
* rm typehint
* Add test
* Add tests and make more rigid
* Add test for parse_json_file
* Change Path to PathLike
* Fix `Import block is un-sorted or un-formatted`
* revert parse_json_file
* Fix ruff format
* Add parse_json_file test
* Remove auto class
* Update ImagePointDescriptionOutput
* Update model outputs
* Rename output class
* Revert "Remove auto class"
This reverts commit ed4a8f549d.
* Address comments
* Fork.
* RecurrentGemma initial commit.
* Updating __init__.py.
* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.
* Reformat code to 4 spaces.
Fixed a few typos.
* Fixed the forward pass.
Still unclear on the cache?
* Fixed the RecurrentGemmaForCausalLM
* Minor comment that we might not need attention_mask and output_attention arguments.
* Now cache should work as well.
* Adding a temporary example to check whether the model generation works.
* Adding the tests and updating imports.
* Adding the example file missing in the previous commit.
* First working example.
* Removing .gitignore and reverting parts of __init__.
* Re-add .gitignore.
* Addressing comments for configuration.
* Move mask creation to `_prepare_inputs_for_generation`.
* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed
* Transfoering between machines.
* Running normal tests.
* Minor fix.
* More fixes.
* Addressing more comments.
* Minor fixes.
* first stab at cleanup
* more refactoring
* fix copies and else
* renaming and get init to work
* fix causal mask creation
* update
* nit
* fix a hell lot of things
* updates
* update conversion script
* make all keys importable
* nits
* add auto mappings
* properly convert ffw_up and down
* add scaling
* fix generations
* for recurrent dtype
* update
* fix going beyong window
* fixup
* add missing files
* current updates to remove last einops
* finish modeling refactor
* TADA
* fix compile
* fix most failing testt ? ?
* update tests
* refactor and update
* update
* nits, fixup and update tests
* more fixup
* nits
* fix imports
* test format
* fixups
* nits
* tuple typing
* fix code quality
* add model card
* fix doc
* skip most generation tests
* nits
* style
* doc fixes
* fix pr and check_copies?
* last nit
* oupsy
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update
* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update based on review
* doc nit
* fix quality
* quality
* fix slow test model path
* update default dype
* ignore attributes that can be safely ignored in check config attributes
* 0lallalala come on
* save nit
* style
* remove to dict update
* make sure we can also run in float16
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* revert back to torch 2.1.1
* run test
* switch to torch 2.2.1
* udapte dockerfile
* fix awq tests
* fix test
* run quanto tests
* update tests
* split quantization tests
* fix
* fix again
* final fix
* fix report artifact
* build docker again
* Revert "build docker again"
This reverts commit 399a5f9d93.
* debug
* revert
* style
* new notification system
* testing notfication
* rebuild docker
* fix_prev_ci_results
* typo
* remove warning
* fix typo
* fix artifact name
* debug
* issue fixed
* debug again
* fix
* fix time
* test notif with faling test
* typo
* issues again
* final fix ?
* run all quantization tests again
* remove name to clear space
* revert modfiication done on workflow
* fix
* build docker
* build only quant docker
* fix quantization ci
* fix
* fix report
* better quantization_matrix
* add print
* revert to the basic one
* See if we can get tests to pass with the fixed weights
* See if we can get tests to pass with the fixed weights
* Replace the revisions now that we don't need them anymore
* init: add StableLm 2 support
* add integration test for parallel residual and qk layernorm
* update(modeling): match qk norm naming for consistency with phi/persimmon
* fix(tests): run fwd/bwd on random init test model to jitter norm weights off identity
* `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`
* refactor: rename head states var in `StableLmLayerNormPerHead`
* tests: update test model and add generate check
* add _torch_extract_fbank_features_batch function in feature_extractor_whisper
* reformat feature_extraction_whisper.py file
* handle batching in single function
* add gpu test & doc
* add batch test & device in each __call__
* add device arg in doc string
---------
Co-authored-by: vaibhav.aggarwal <vaibhav.aggarwal@sprinklr.com>
* if output is tuple like facebook/hf-seamless-m4t-medium, waveform is the first element
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* add test and fix batch issue
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* add dict output support for seamless_m4t
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* redefaulted padding=longest again
* fixup/doc
* Fix generate_with_fallback **kwargs
* Change pop to get
* Delete keys from kwargs to prevent overriding generation_config
* Revert to passing kwargs by reference, but make a (shallow) copy
* dict -> copy.copy
* Add test_whisper_longform_multi_batch_beam
* Hard error when ignoring tensors. (#27484)
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint.
- Find all names we should normally drop (those are in the transformers
config)
- Find all disjoint tensors (for those we can safely trigger a copy to
get rid of the sharing before saving)
- Clone those disjoint tensors getting rid of the issue
- Find all identical names (those should be declared in the config
but we try to find them all anyway.)
- For all identical names:
- If they are in the config, just ignore them everything is fine
- If they are not, warn about them.
- For all remainder tensors which are shared yet neither identical NOR
disjoint. raise a hard error.
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add small tests.
* Dead variable.
* Fixup.
* Fixing tied_Weights_keys on generic models.
* Fixup + T5 encoder/decoder tying (with different layers)
* Code quality.
* Dynamic member.
* trigger
* Fixing encoder name for other types of encoder/decoder combos.
* Fix scoping.
* Update .github/workflows/self-scheduled.yml
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fixing the tied_weights after the call.
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode
* Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode
* Exclude pad_token filtering since it is used as CTC-blank token
* Add small test for skip_special_tokens
* Update decoding test for added new token
* add FA2 to o.g Musicgen
* make style
* add FA2 support to Musicgen Melody
* add generation FA2 tests to o.g Musicgen
* make style and fix copies
* add Musicgen to FA2 docs + deprecate list
* add sdpa supports to Musicgen's
* make style and fix copies
* refactor attention implementation arguments
* add Copied from to sdpa tests
* add copied form in sdpa tests melody
* add copied for FA2 generation tests
* add FA2 inference copied from
* make style
* fix issue with logit processor in beam search in Flax
* adding FlaxNoRepeatNGramLogitsProcessor class + unit test
* style correction and code verification
* add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests
* fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams
* replace non-jit compatible masking of ngrams that are not yet generated with jittable version
* Revert "fix issue with logit processor in beam search in Flax"
This reverts commit 09b70d7e4d.
* add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor
* change the method of casting to boolean of banned tokens indices
* fix code style
* remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop
* remove useless loop iterations
* set some variables that were calculated and used multiple times
* fix format
* Fix sinusoidal_embeddings in FlaubertModel
* Fix for Informer
* Fix for XLM
* Move sinusoidal emb for XLM
* Move sinusoidal emb for Flaubert
* Small cleanup
* Add comments on tests code copied from
* Add with Distilbert->
* fix bug and add tests
* nit
* otherway to get the cur len instead of attention mask
* more places where this might have been broken
* nit
* oups
* inputs_embeds vs input_embeds
* test generated outptus
* style
* nit
* fix
* skip failing biogpt
* Start rework
* Fix failing test
* Include max
* Update src/transformers/trainer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weigths
* check if torch is available before running tests
* use decorator to check if torch is available
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test indentation
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Automatic safetensors conversion when lacking these files (#29390)
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* replace the 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff
---------
Co-authored-by: wanqiancheng <13541261013@163.com>
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling
---------
Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
* attempt to fix
* the actual fix that works with compilation!
* this?
* temporary update
* nit?
* dispatcg to memory efficient?
* update both models that have static cache support
* fix copies fix compile
* make sure fix
* fix cohere and gemma
* fix beams?
* nit
* slipped through the cracks
* nit
* nits
* update
* fix-copies
* skip failing tests
* nits
* Initial commit (still lots of unfinished bits)
* (Still untested) add safetensors sharding to save_pretrained
* Fix savetensors saving, update default shard size to match PT
* Add proper loading of TF-format safetensors
* Revert default size in case that changes things
* Fix incorrect index name
* Update loading priority
* Update tests
* Make the tests a little more stringent
* Expand tests
* Add sharded cross-test
* Fix argument name
* One more test fix
* Adding mlx to the list of allowed formats
* Remove irrelevant block for safetensors
* Refactor warning logging into a separate function
* Remove unused skip_logger_warnings arg
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Move function def
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added SuperPoint docs
* Added tests
* Removed commented part
* Commit to create and fix add_superpoint branch with a new branch
* Fixed dummy_pt_objects
* Committed missing files
* Fixed README.md
* Apply suggestions from code review
Fixed small changes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py
* Removed AutoModelForKeypointDetection and related stuff
* Fixed inconsistencies in image_processing_superpoint.py
* Moved infer_on_model logic simply in test_inference
* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py
* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale
* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixed from (w, h) to (h, w) as input for tests
* Removed unnecessary condition
* Moved last_hidden_state to be the first returned
* Moved last_hidden_state to be the first returned (bis)
* Moved last_hidden_state to be the first returned (ter)
* Switched image_width and image_height in tests to match recent changes
* Added config as first SuperPointConvBlock init argument
* Reordered README's after merge
* Added missing first config argument to SuperPointConvBlock instantiations
* Removed formatting error
* Added SuperPoint to README's de, pt-br, ru, te and vi
* Checked out README_fr.md
* Fixed README_fr.md
* Test fix README_fr.md
* Test fix README_fr.md
* Last make fix-copies !
* Updated checkpoint path
* Removed unused SuperPoint doc
* Added missing image
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed unnecessary import
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added SuperPoint to _toctree.yml
---------
Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
* use user_defined_symbols
* fixup
* nit
* add a very robust test
* make sure all models are tested with the `pretrained_tokenizer_to_test`
* should we make sure we test all of them?
* merge
* remove the id
* fix test
* update
* ousies
* oups
* fixup
* fix copies check
* remove `pretrained_tokenizer_to_test`
* add galore v1
* add import
* add tests and doc
* fix doctest
* forward contrib credits from discussions
* forward contrib credits from discussions
* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix failing tests'
* switch to `optim_target_modules` and clarify docs
* more clarification
* enhance lookup logic
* update a test to add peak memory
* add regex, all-linear and single string support
* add layer-wise optimization through DummyOptimizers and LRSchedulers
* forward contrib credits from discussions and original idea
* add a section about DDP not supported in layerwise
* Update src/transformers/trainer.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix self
* check only if layer_wise
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* oops
* make use of intervals
* clarify comment
* add matching tests
* GaLoRe -> GaLore
* move to `get_scheduler`
* add note on docs
* add a warning
* adapt a bit the docs
* update docstring
* support original API
* Update docs/source/en/trainer.md
* slightly refactor
* Update docs/source/en/trainer.md
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix args parsing and add tests
* remove warning for regex
* fix type hint
* add note about extra args
* make `is_regex` return optional
---------
Co-authored-by: Maxime <maximegmd @users.noreply.github.com>
Co-authored-by: Wing Lian <winglian @users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: hiyouga <hiyouga@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Cohere Model Release (#1)
Cohere Model Release
* Remove unnecessary files and code (#2)
Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Allow apply_chat_template to pass kwargs to the template
* Fix priority for template_kwargs
* Fix docstring
* style fix
* Add the option for the model to have a dict of templates
* Error message cleanup
* Add test for chat template dicts
* Simplify the chat template dict test and apply it to all tokenizers in self.get_tokenizers()
* Save chat template dicts as lists with fixed key names
* Add test for serialization/reloading
* Add require_jinja just to be safe, even though I don't think we use it
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/end/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Updated index.md
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/end/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Fixed config docstring. Added channels property
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Fixed config backbone compat
* Ran fix-copies
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Fixed issue from rebase
* Fixed issue from rebase
* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
* Fixed issue from rebase
* Fixed issue from rebase
* Changed model name in docs
* Removed duplicate PvtV2Backbone
* Work around type switching issue in tests
* Fix model name in config comments
* Update docs/source/en/model_doc/pvt_v2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed old code
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Fixed Class names to be more descriptive
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed outdated code
* Moved paper abstract to single line in pvt_v2.md
* Added usage tips to pvt_v2.md
* Simplified module inits by passing layer_idx
* Fixed typing for hidden_act in PvtV2Config
* Removed unusued import
* Add pvt_v2 to docs/source/en/_toctree.yml
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Move function parameters to single line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Update year of copyright to 2024
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Make code more explicit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated sr_ratio to be more explicit spatial_reduction_ratio
* Removed excess type hints in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Move params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed needless comment in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update copyright date in pvt_v2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated copyright date in configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Cleaned comments in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Renamed spatial_reduction Conv2D operation
* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
"
This reverts commit c4a04416dd.
* Updated conversion script to reflect module name change
* Deprecated reshape_last_stage option in config
* Removed unused imports
* Code formatting
* Fixed outdated decorators on test_inference_fp16
* Added "Copied from" comments in test_modeling_pvt_v2.py
* Fixed import listing
* Updated model name
* Force empty commit for PR refresh
* Fixed linting issue
* Removed # Copied from comments
* Added PVTv2 to README_fr.md
* Ran make fix-copies
* Replace all FoamoftheSea hub references with OpenGVLab
* Fixed out_indices and out_features logic in configuration_pvt_v2.py
* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
* Ran code fixup
* Fixed order of parent classes in PvtV2Config to fix the to_dict method override
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial implementation of flash attention for gptj
* modify flash attention and overwrite test_flash_attn_2_generate_padding_right
* update flash attention support list
* remove the copy line in the `CodeGenBlock`
* address copy mechanism
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add GPTJ attention classes
* add expected outputs in the gptj test
* Ensure repo consistency with 'make fix-copies'
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add tests for batching support
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* fixes and comments
* use cosine distance for conv models
* skip mra model testing
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* finzalize and make style
* check model type by input names
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fixed batch size for all testers
* Revert "fixed batch size for all testers"
This reverts commit 525f3a0a05.
* add batch_size for all testers
* dict from model output
* do not skip layoutlm
* bring back some code from git revert
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* clean-up
* where did minus go in tolerance
* make whisper happy
* deal with consequences of losing minus
* deal with consequences of losing minus
* maskformer needs its own test for happiness
* fix more models
* tag flaky CV models from Amy's approval
* make codestyle
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix image-to-text batch incorrect output issue
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add ci test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* update ci test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* left-padding test revisited
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial-commit
* start cleaning
* small nits
* small nits
* current updates
* add kernels
* small refactoring little step
* add comments
* styling
* nit
* nits
* Style
* Small changes
* Push dummy mambda simple slow
* nit
* Use original names
* Use original names and remove norm
* Updates for inference params
* Style nd updates
* nits
* Match logits
* Add a test
* Add expected generated text
* nits doc, imports and styling
* style
* oups
* dont install kernels, invite users to install the required kernels
* let use use the original packages
* styling
* nits
* fix some copieds
* update doc
* fix-copies
* styling done
* nits
* fix import check
* run but wrong cuda ress
* mamba CUDA works :)
* fix the fast path
* config naming nits
* conversion script is not required at this stage
* finish fixing the fast path: generation make sense now!
* nit
* Let's start working on the CIs
* style
* better style
* more nits
* test nit
* quick fix for now
* nits
* nit
* nit
* nit
* nits
* update test rest
* fixup
* update test
* nit
* some fixes
* nits
* update test values
* fix styling
* nit
* support peft
* integrations tests require torchg
* also add slow markers
* styling
* chose forward wisely
* nits
* update tests
* fix gradient checkpointing
* fixup
* nit
* fix doc
* check copies
* fix the docstring
* fix some more tests
* style
* fix beam search
* add init schene
* update
* nit
* fix
* fixup the doc
* fix the doc
* fixup
* tentative update but slow is no longer good
* nit
* should we always use float32?
* nits
* revert wrong changes
* res in float32
* cleanup
* skip fmt for now
* update generation values
* update test values running original model
* fixup
* update tests + rename inference_params to cache_params + make sure training does not use cache_params
* small nits
* more nits
* fix final CIs
* style
* nit doc
* I hope final doc nits
* nit
* 🫠
* final touch!
* fix torch import
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
* fix fix and fix
* fix base model prefix!
* nit
* Update src/transformers/models/mamba/__init__.py
* Update docs/source/en/model_doc/mamba.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* nit
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* added exllama kernels support for awq models
* doc
* style
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* refactor
* moved exllama post init to after device dispatching
* bump autoawq version
* added exllama test
* style
* configurable exllama kernels
* copy exllama_config from gptq
* moved exllama version check to post init
* moved to quantization dockerfile
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* First draft
* More improvements
* More improvements
* More fixes
* Fix copies
* More improvements
* More fixes
* More improvements
* Convert checkpoint
* More improvements, set up tests
* Fix more tests
* Add UdopModel
* More improvements
* Fix equivalence test
* More fixes
* Redesign model
* Extend conversion script
* Use real inputs for conversion script
* Add image processor
* Improve conversion script
* Add UdopTokenizer
* Add fast tokenizer
* Add converter
* Update README's
* Add processor
* Add fully fledged tokenizer
* Add fast tokenizer
* Use processor in conversion script
* Add tokenizer tests
* Fix one more test
* Fix more tests
* Fix tokenizer tests
* Enable fast tokenizer tests
* Fix more tests
* Fix additional_special_tokens of fast tokenizer
* Fix tokenizer tests
* Fix more tests
* Fix equivalence test
* Rename image to pixel_values
* Rename seg_data to bbox
* More renamings
* Remove vis_special_token
* More improvements
* Add docs
* Fix copied from
* Update slow tokenizer
* Update fast tokenizer design
* Make text input optional
* Add first draft of processor tests
* Fix more processor tests
* Fix decoder_start_token_id
* Fix test_initialization
* Add integration test
* More improvements
* Improve processor, add test
* Add more copied from
* Add more copied from
* Add more copied from
* Add more copied from
* Remove print statement
* Update README and auto mapping
* Delete files
* Delete another file
* Remove code
* Fix test
* Fix docs
* Remove asserts
* Add doc tests
* Include UDOP in exotic model tests
* Add expected tesseract decodings
* Add sentencepiece
* Use same design as T5
* Add UdopEncoderModel
* Add UdopEncoderModel to tests
* More fixes
* Fix fast tokenizer
* Fix one more test
* Remove parallelisable attribute
* Fix copies
* Remove legacy file
* Copy from T5Tokenizer
* Fix rebase
* More fixes, copy from T5
* More fixes
* Fix init
* Use ArthurZ/udop for tests
* Make all model tests pass
* Remove UdopForConditionalGeneration from auto mapping
* Fix more tests
* fixups
* more fixups
* fix the tokenizers
* remove un-necessary changes
* nits
* nits
* replace truncate_sequences_boxes with truncate_sequences for fix-copies
* nit current path
* add a test for input ids
* ids that we should get taken from c9f7a32f57
* nits converting
* nits
* apply ruff
* nits
* nits
* style
* fix slow order of addition
* fix udop fast range as well
* fixup
* nits
* Add docstrings
* Fix gradient checkpointing
* Update code examples
* Skip tests
* Update integration test
* Address comment
* Make fixup
* Remove extra ids from tokenizer
* Skip test
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update year
* Address comment
* Address more comments
* Address comments
* Add copied from
* Update CI
* Rename script
* Update model id
* Add AddedToken, skip tests
* Update CI
* Fix doc tests
* Do not use Tesseract for the doc tests
* Remove kwargs
* Add original inputs
* Update casting
* Fix doc test
* Update question
* Update question
* Use LayoutLMv3ImageProcessor
* Update organization
* Improve docs
* Update forward signature
* Make images optional
* Remove deprecated device argument
* Add comment, add add_prefix_space
* More improvements
* Remove kwargs
---------
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* 🐛 Fix oneformer instance post processing when using panoptic task type
* ✅ Add unit test for oneformer instance post processing panoptic bug
---------
Co-authored-by: Nick DeGroot <1966472+nickthegroot@users.noreply.github.com>
* remove control flow
* update gptneox
* update ....
* nits
* Actually let's just break. Otherwise we are silently failing which imo is not optimal
* version BC
* fix tests
* fix eager causal
* nit
* add a test
* style
* nits
* nits
* more nits for the test
* update and fix
* make sure cuda graphs are not skipped
* read token is needed for meta llama
* update!
* fiixup
* compile test should be slow
* fix thet fix copies
* stle 🫠