* Fork.
* RecurrentGemma initial commit.
* Updating __init__.py.
* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.
* Reformat code to 4 spaces.
Fixed a few typos.
* Fixed the forward pass.
Still unclear on the cache?
* Fixed the RecurrentGemmaForCausalLM
* Minor comment that we might not need attention_mask and output_attention arguments.
* Now cache should work as well.
* Adding a temporary example to check whether the model generation works.
* Adding the tests and updating imports.
* Adding the example file missing in the previous commit.
* First working example.
* Removing .gitignore and reverting parts of __init__.
* Re-add .gitignore.
* Addressing comments for configuration.
* Move mask creation to `_prepare_inputs_for_generation`.
* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed
* Transferring between machines.
* Running normal tests.
* Minor fix.
* More fixes.
* Addressing more comments.
* Minor fixes.
* first stab at cleanup
* more refactoring
* fix copies and else
* renaming and get init to work
* fix causal mask creation
* update
* nit
* fix a hell lot of things
* updates
* update conversion script
* make all keys importable
* nits
* add auto mappings
* properly convert ffw_up and down
* add scaling
* fix generations
* for recurrent dtype
* update
* fix going beyond window
* fixup
* add missing files
* current updates to remove last einops
* finish modeling refactor
* TADA
* fix compile
* fix most failing tests?
* update tests
* refactor and update
* update
* nits, fixup and update tests
* more fixup
* nits
* fix imports
* test format
* fixups
* nits
* tuple typing
* fix code quality
* add model card
* fix doc
* skip most generation tests
* nits
* style
* doc fixes
* fix pr and check_copies?
* last nit
* oupsy
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update
* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update based on review
* doc nit
* fix quality
* quality
* fix slow test model path
* update default dtype
* ignore attributes that can be safely ignored in check config attributes
* 0lallalala come on
* save nit
* style
* remove to dict update
* make sure we can also run in float16
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* ImportError: Trainer with PyTorch requires accelerate>=0.20.1 Fix
Adding the evaluate and accelerate installs at the beginning of the cell to fix the issue
* ImportError Fix: Trainer with PyTorch requires accelerate>=0.20.1
* Import Error Fix
* Update installation.md
* Update quicktour.md
* rollback other lang changes
* Update _config.py
* updates for other languages
* fixing error
* Tutorial Update
* Update tokenization_utils_base.py
* Just use an optimizer string to pass the doctest?
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
* add FA2 to o.g Musicgen
* make style
* add FA2 support to Musicgen Melody
* add generation FA2 tests to o.g Musicgen
* make style and fix copies
* add Musicgen to FA2 docs + deprecate list
* add sdpa support to Musicgen models
* make style and fix copies
* refactor attention implementation arguments
* add Copied from to sdpa tests
* add copied from in sdpa tests melody
* add copied for FA2 generation tests
* add FA2 inference copied from
* make style
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* model_summary.md - Add link to Harvard's Annotated Transformer.
* model_summary.md - slight wording change + capitalize name of the paper
* model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (great idea, stevhliu!)
* model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)
* Added SuperPoint docs
* Added tests
* Removed commented part
* Commit to create and fix add_superpoint branch with a new branch
* Fixed dummy_pt_objects
* Committed missing files
* Fixed README.md
* Apply suggestions from code review
Fixed small changes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py
* Removed AutoModelForKeypointDetection and related stuff
* Fixed inconsistencies in image_processing_superpoint.py
* Moved infer_on_model logic simply in test_inference
* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py
* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale
* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixed from (w, h) to (h, w) as input for tests
* Removed unnecessary condition
* Moved last_hidden_state to be the first returned
* Moved last_hidden_state to be the first returned (bis)
* Moved last_hidden_state to be the first returned (ter)
* Switched image_width and image_height in tests to match recent changes
* Added config as first SuperPointConvBlock init argument
* Reordered README's after merge
* Added missing first config argument to SuperPointConvBlock instantiations
* Removed formatting error
* Added SuperPoint to README's de, pt-br, ru, te and vi
* Checked out README_fr.md
* Fixed README_fr.md
* Test fix README_fr.md
* Test fix README_fr.md
* Last make fix-copies !
* Updated checkpoint path
* Removed unused SuperPoint doc
* Added missing image
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed unnecessary import
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added SuperPoint to _toctree.yml
---------
Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
* add galore v1
* add import
* add tests and doc
* fix doctest
* forward contrib credits from discussions
* forward contrib credits from discussions
* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix failing tests
* switch to `optim_target_modules` and clarify docs
* more clarification
* enhance lookup logic
* update a test to add peak memory
* add regex, all-linear and single string support
* add layer-wise optimization through DummyOptimizers and LRSchedulers
* forward contrib credits from discussions and original idea
* add a section about DDP not supported in layerwise
* Update src/transformers/trainer.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix self
* check only if layer_wise
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* oops
* make use of intervals
* clarify comment
* add matching tests
* GaLoRe -> GaLore
* move to `get_scheduler`
* add note on docs
* add a warning
* adapt a bit the docs
* update docstring
* support original API
* Update docs/source/en/trainer.md
* slightly refactor
* Update docs/source/en/trainer.md
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix args parsing and add tests
* remove warning for regex
* fix type hint
* add note about extra args
* make `is_regex` return optional
---------
Co-authored-by: Maxime <maximegmd @users.noreply.github.com>
Co-authored-by: Wing Lian <winglian @users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: hiyouga <hiyouga@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update pipeline_tutorial.md to include gradio
* Update pipeline_tutorial.md
* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update pipeline_tutorial.md
* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Cohere Model Release (#1)
Cohere Model Release
* Remove unnecessary files and code (#2)
Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/en/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Updated index.md
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/en/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Fixed config docstring. Added channels property
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Fixed config backbone compat
* Ran fix-copies
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Fixed issue from rebase
* Fixed issue from rebase
* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
* Fixed issue from rebase
* Fixed issue from rebase
* Changed model name in docs
* Removed duplicate PvtV2Backbone
* Work around type switching issue in tests
* Fix model name in config comments
* Update docs/source/en/model_doc/pvt_v2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed old code
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Fixed Class names to be more descriptive
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed outdated code
* Moved paper abstract to single line in pvt_v2.md
* Added usage tips to pvt_v2.md
* Simplified module inits by passing layer_idx
* Fixed typing for hidden_act in PvtV2Config
* Removed unused import
* Add pvt_v2 to docs/source/en/_toctree.yml
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Move function parameters to single line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Update year of copyright to 2024
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Make code more explicit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated sr_ratio to be more explicit spatial_reduction_ratio
* Removed excess type hints in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Move params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed needless comment in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update copyright date in pvt_v2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated copyright date in configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Cleaned comments in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Renamed spatial_reduction Conv2D operation
* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py"
This reverts commit c4a04416dd.
* Updated conversion script to reflect module name change
* Deprecated reshape_last_stage option in config
* Removed unused imports
* Code formatting
* Fixed outdated decorators on test_inference_fp16
* Added "Copied from" comments in test_modeling_pvt_v2.py
* Fixed import listing
* Updated model name
* Force empty commit for PR refresh
* Fixed linting issue
* Removed # Copied from comments
* Added PVTv2 to README_fr.md
* Ran make fix-copies
* Replace all FoamoftheSea hub references with OpenGVLab
* Fixed out_indices and out_features logic in configuration_pvt_v2.py
* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
* Ran code fixup
* Fixed order of parent classes in PvtV2Config to fix the to_dict method override
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial implementation of flash attention for gptj
* modify flash attention and overwrite test_flash_attn_2_generate_padding_right
* update flash attention support list
* remove the copy line in the `CodeGenBlock`
* address copy mechanism
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add GPTJ attention classes
* add expected outputs in the gptj test
* Ensure repo consistency with 'make fix-copies'
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial-commit
* start cleaning
* small nits
* small nits
* current updates
* add kernels
* small refactoring little step
* add comments
* styling
* nit
* nits
* Style
* Small changes
* Push dummy mamba simple slow
* nit
* Use original names
* Use original names and remove norm
* Updates for inference params
* Style and updates
* nits
* Match logits
* Add a test
* Add expected generated text
* nits doc, imports and styling
* style
* oups
* dont install kernels, invite users to install the required kernels
* let users use the original packages
* styling
* nits
* fix some copieds
* update doc
* fix-copies
* styling done
* nits
* fix import check
* run but wrong cuda ress
* mamba CUDA works :)
* fix the fast path
* config naming nits
* conversion script is not required at this stage
* finish fixing the fast path: generation make sense now!
* nit
* Let's start working on the CIs
* style
* better style
* more nits
* test nit
* quick fix for now
* nits
* nit
* nit
* nit
* nits
* update test rest
* fixup
* update test
* nit
* some fixes
* nits
* update test values
* fix styling
* nit
* support peft
* integration tests require torch
* also add slow markers
* styling
* chose forward wisely
* nits
* update tests
* fix gradient checkpointing
* fixup
* nit
* fix doc
* check copies
* fix the docstring
* fix some more tests
* style
* fix beam search
* add init scheme
* update
* nit
* fix
* fixup the doc
* fix the doc
* fixup
* tentative update but slow is no longer good
* nit
* should we always use float32?
* nits
* revert wrong changes
* res in float32
* cleanup
* skip fmt for now
* update generation values
* update test values running original model
* fixup
* update tests + rename inference_params to cache_params + make sure training does not use cache_params
* small nits
* more nits
* fix final CIs
* style
* nit doc
* I hope final doc nits
* nit
* 🫠
* final touch!
* fix torch import
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
* fix fix and fix
* fix base model prefix!
* nit
* Update src/transformers/models/mamba/__init__.py
* Update docs/source/en/model_doc/mamba.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* nit
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* torchscript and trainer md es translation
* corrected md es files and even corrected spelling in en md
* made es corrections to trainer.md
* deleted entrenamiento... title on yml
* placed entrenamiento in right place
* First draft
* More improvements
* More improvements
* More fixes
* Fix copies
* More improvements
* More fixes
* More improvements
* Convert checkpoint
* More improvements, set up tests
* Fix more tests
* Add UdopModel
* More improvements
* Fix equivalence test
* More fixes
* Redesign model
* Extend conversion script
* Use real inputs for conversion script
* Add image processor
* Improve conversion script
* Add UdopTokenizer
* Add fast tokenizer
* Add converter
* Update README's
* Add processor
* Add fully fledged tokenizer
* Add fast tokenizer
* Use processor in conversion script
* Add tokenizer tests
* Fix one more test
* Fix more tests
* Fix tokenizer tests
* Enable fast tokenizer tests
* Fix more tests
* Fix additional_special_tokens of fast tokenizer
* Fix tokenizer tests
* Fix more tests
* Fix equivalence test
* Rename image to pixel_values
* Rename seg_data to bbox
* More renamings
* Remove vis_special_token
* More improvements
* Add docs
* Fix copied from
* Update slow tokenizer
* Update fast tokenizer design
* Make text input optional
* Add first draft of processor tests
* Fix more processor tests
* Fix decoder_start_token_id
* Fix test_initialization
* Add integration test
* More improvements
* Improve processor, add test
* Add more copied from
* Add more copied from
* Add more copied from
* Add more copied from
* Remove print statement
* Update README and auto mapping
* Delete files
* Delete another file
* Remove code
* Fix test
* Fix docs
* Remove asserts
* Add doc tests
* Include UDOP in exotic model tests
* Add expected tesseract decodings
* Add sentencepiece
* Use same design as T5
* Add UdopEncoderModel
* Add UdopEncoderModel to tests
* More fixes
* Fix fast tokenizer
* Fix one more test
* Remove parallelisable attribute
* Fix copies
* Remove legacy file
* Copy from T5Tokenizer
* Fix rebase
* More fixes, copy from T5
* More fixes
* Fix init
* Use ArthurZ/udop for tests
* Make all model tests pass
* Remove UdopForConditionalGeneration from auto mapping
* Fix more tests
* fixups
* more fixups
* fix the tokenizers
* remove un-necessary changes
* nits
* nits
* replace truncate_sequences_boxes with truncate_sequences for fix-copies
* nit current path
* add a test for input ids
* ids that we should get taken from c9f7a32f57
* nits converting
* nits
* apply ruff
* nits
* nits
* style
* fix slow order of addition
* fix udop fast range as well
* fixup
* nits
* Add docstrings
* Fix gradient checkpointing
* Update code examples
* Skip tests
* Update integration test
* Address comment
* Make fixup
* Remove extra ids from tokenizer
* Skip test
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update year
* Address comment
* Address more comments
* Address comments
* Add copied from
* Update CI
* Rename script
* Update model id
* Add AddedToken, skip tests
* Update CI
* Fix doc tests
* Do not use Tesseract for the doc tests
* Remove kwargs
* Add original inputs
* Update casting
* Fix doc test
* Update question
* Update question
* Use LayoutLMv3ImageProcessor
* Update organization
* Improve docs
* Update forward signature
* Make images optional
* Remove deprecated device argument
* Add comment, add add_prefix_space
* More improvements
* Remove kwargs
---------
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add tasks_explained.md to es/
* Fix little typo in en/ version
* translate speech/audio section
* translate part of computer vision section | fix little typo in en/
* Fix little typo in en/
* Translate computer vision section | remove ** ** to * * in both files
* Translate NLP section | fix link to task/translation in en/
* Update link in es/tasks_summary.md
* Fix task_summary title link
The link in evaluation was missing a hyphen between post and processing. I fixed this, for English only. Someone with the ability to do a global search/replace should fix the other languages (if indeed they have this issue).
* Add chat support to text generation pipeline
* Better handling of single elements
* Deprecate ConversationalPipeline
* stash commit
* Add missing add_special_tokens kwarg
* Update chat templating docs to refer to TextGenerationPipeline instead of ConversationalPipeline
* Add ✨TF✨ tests
* @require_tf
* Add type hint
* Add specific deprecation version
* Remove unnecessary do_sample
* Remove todo - the discrepancy has been resolved
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/pipelines/text_generation.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* This is a test commit
* testing commit
* final commit with some changes
* Removed copy statement
* Fixed formatting issues
* Fixed error added past_key_values in the forward method
* Fixed a trailing whitespace. Damn the formatting rules are strict
* Added the copy statement
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Adding [T5/MT5/UMT5]ForTokenClassification
* Add auto mappings for T5ForTokenClassification and variants
* Adding ForTokenClassification to the list of models
* Adding attention_mask param to the T5ForTokenClassification test
* Remove outdated comment in test
* Adding EncoderOnly and Token Classification tests for MT5 and UMT5
* Fix typo in umt5 string
* Add tests for all the existing MT5 models
* Fix wrong comment in dependency_versions_table
* Reverting change to common test for _keys_to_ignore_on_load_missing
The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
* Add fix-copies to MT5ModelTest
The documentation says "We refer to this Model parallelism as “Vertical” because of how models are typically visualized.", but then visualizes the model horizontally. This change makes the visualization vertical to match the text.
Fix typo:
from: `model = TFAutoModelForQuestionAnswering("distilbert-base-uncased")`
to: `model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")`