* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"silu"` the default `ffn_act_fn`
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrxAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add usage example
* put on one line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more usage examples
---------
Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add jamba arch
* apply "make fix-copies" changes
* fix link to model in JambaConfig docstring
* Add n_ctx in modeling file because repo-consistency wants that
* Add jamba to flash attention and sdpa documentation
* mamba dt_proj quant fix now works for LoRA as well
* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical differences are accentuated by mamba layers
* add jamba to tokenization auto
* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
* simple PR fixes
* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
* Add copied comment on JambaMLP (it's the same as MixtralMLP)
* remove padding_mask warnings. It's not supported anymore
* fix docstring. Float instead of int
* A few more minor PR fixes
* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
* Return None attention weights from mamba layers. Append to all attentions only if not None.
* remove some leftover jamba archive lists
* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
* Add Jamba paper on READMEs
* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
* Add copied from comment
* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
* clearer docstring for _convert_to_standard_cache
* style fixes
* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
* rename test so it still overrides what it's meant to override
* draft
* oups
* nit
* remove more complex logic
* fix names used in config
* fix fix fix
* style
* fix some more failing tests
* generate did not init the cache 🙃
* more small nits
* typo
* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
* fix init of pkv with torch.tensor()
* empty tensor
* fix some init issues
* stupid changes required by generate because it does not even support its own DynamicCache class
* more fixes
* fix general assisted gen cache_position bug
* tests passing
* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
* fix docstrings and typehints for past_key_values
* style fixes
* fix docs
* change typehint due to copy from Mixtral
* forgot import
* import order
* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
* Add integration test with tiny random Jamba model on hub
* fix flash attention cache shapes
* bring back forgotten hidden states
* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
* align integration test after modeling fixes
* bugfix - mamba can use precomputed states only if forward pass is on a single token
* bugfix - mamba can use precomputed states only if they match the batch size
* typo
* remove making _prepare_4d_causal_attention_mask a leaf function
* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* Add OLMo using add-new-model-like with Llama
* Fix incorrect tokenizer for OLMo
* Copy-paste relevant OLMo methods and their imports
* Add OLMo config
* Modify OLMo config to follow HF conventions
* Remove unneeded Llama code from OLMo model
* Add ability for OLMo model to output attentions
* Add OLMoPreTrainedModel and OLMoModel
* Add OLMoForCausalLM
* Minor fixes to OLMo model for style and missing functions
* Implement OLMo tokenizer
* Implement OLMo to HF conversion script
* Add tests for OLMo model
* Add tests for OLMo fast tokenizer
* Add auto-generated dummy objects
* Remove unimplemented OLMo classes from auto and init classes and re-format
* Add README and associated auto-generated files
* Use OLMo names for common properties
* Run make fixup
* Remove `|` from OLMo typing
* Remove unneeded tokenization_olmo.py
* Revert model, config and converter to add-new-model-like Llama
* Move logic for adding bos/eos token into GPTNeoXTokenizerFast
* Change OLMoConfig defaults to match OLMo-7B
* Use GPTNeoXTokenizerFast in OLMo tokenizer tests
* Modify auto-generated OLMoModelTests to work for OLMo
* Add non-parametric layer norm OLMoLayerNorm
* Update weight conversion script for OLMo
* Fix __init__ and auto structure for OLMo
* Fix errors from make fixup
* Remove OLMoTokenizerFast from documentation
* Add missing 'Copied from' for OLMoModel._update_causal_mask
* Run make fix-copies
* Rearrange string replacements in OLMoForCausalLM Copied from
* Move OLMo and Llama CausalLM.forward example into global constants
* Fix OLMO_GENERATION_EXAMPLE doc string typo
* Add option for qkv clipping to OLMo
* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf
* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf
* Fix OLMo tokenization bug using conversion script
* Keep model in full precision after conversion
* Do not add eos token automatically
* Update references to OLMo model in HF Hub
* Do not add eos token during encoding by default
* Fix Llama generation example
* Run make fixup
* OLMo 7B integration test fix
* Remove unneeded special case for OLMoConfig
* OLMo 7B Twin 2T integration test fix
* Fix test_model_7b_greedy_generation
* Remove test_compile_static_cache
* Fix OLMo and Llama generation example
* Run make fixup
* Revert "OLMo 7B integration test fix"
This reverts commit 4df56a4b15.
* Revert "OLMo 7B Twin 2T integration test fix"
This reverts commit 9ff65a4a29.
* Ungate 7B integration tests and fix greedy generation test
* Add retries for flaky test_eager_matches_sdpa_generate
* Fix output of doc example for OLMoForCausalLM.forward
* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model
* Try fix incorrect characters in OLMoForCausalLM.forward doc test
* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes
* Remove pretraining_tp from OLMo config and model
* Add missing 'Copied from' instances
* Remove unneeded causal_mask from OLMoModel
* Revert Llama changes
* Ignore copy for OLMoForCausalLM.forward
* Change 'OLMo' to 'Olmo' in classes
* Move minimal OLMo tokenization tests to model tests
* Add missed 'Copied from' for repeat_kv
* Configuring Translation Pipelines documents update #27753
* Language Format Addition
* adding supported list of languages list
* Fork.
* RecurrentGemma initial commit.
* Updating __init__.py.
* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.
* Reformat code to 4 spaces.
Fixed a few typos.
* Fixed the forward pass.
Still unclear on the cache?
* Fixed the RecurrentGemmaForCausalLM
* Minor comment that we might not need attention_mask and output_attention arguments.
* Now cache should work as well.
* Adding a temporary example to check whether the model generation works.
* Adding the tests and updating imports.
* Adding the example file missing in the previous commit.
* First working example.
* Removing .gitignore and reverting parts of __init__.
* Re-add .gitignore.
* Addressing comments for configuration.
* Move mask creation to `_prepare_inputs_for_generation`.
* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed
* Transferring between machines.
* Running normal tests.
* Minor fix.
* More fixes.
* Addressing more comments.
* Minor fixes.
* first stab at cleanup
* more refactoring
* fix copies and else
* renaming and get init to work
* fix causal mask creation
* update
* nit
* fix a hell lot of things
* updates
* update conversion script
* make all keys importable
* nits
* add auto mappings
* properly convert ffw_up and down
* add scaling
* fix generations
* for recurrent dtype
* update
* fix going beyond window
* fixup
* add missing files
* current updates to remove last einops
* finish modeling refactor
* TADA
* fix compile
* fix most failing tests?
* update tests
* refactor and update
* update
* nits, fixup and update tests
* more fixup
* nits
* fix imports
* test format
* fixups
* nits
* tuple typing
* fix code quality
* add model card
* fix doc
* skip most generation tests
* nits
* style
* doc fixes
* fix pr and check_copies?
* last nit
* oupsy
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update
* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update based on review
* doc nit
* fix quality
* quality
* fix slow test model path
* update default dtype
* ignore attributes that can be safely ignored in check config attributes
* 0lallalala come on
* save nit
* style
* remove to dict update
* make sure we can also run in float16
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Cohere Model Release (#1)
Cohere Model Release
* Remove unnecessary files and code (#2)
Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/en/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Updated index.md
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Fixed issue from rebase
* Fixed issue from rebase
* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
* Fixed issue from rebase
* Fixed issue from rebase
* Changed model name in docs
* Removed duplicate PvtV2Backbone
* Work around type switching issue in tests
* Fix model name in config comments
* Update docs/source/en/model_doc/pvt_v2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed old code
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Fixed Class names to be more descriptive
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed outdated code
* Moved paper abstract to single line in pvt_v2.md
* Added usage tips to pvt_v2.md
* Simplified module inits by passing layer_idx
* Fixed typing for hidden_act in PvtV2Config
* Removed unused import
* Add pvt_v2 to docs/source/en/_toctree.yml
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Move function parameters to single line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Update year of copyright to 2024
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Make code more explicit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated sr_ratio to be more explicit spatial_reduction_ratio
* Removed excess type hints in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Move params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed needless comment in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update copyright date in pvt_v2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated copyright date in configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Cleaned comments in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Renamed spatial_reduction Conv2D operation
* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py"
This reverts commit c4a04416dd.
* Updated conversion script to reflect module name change
* Deprecated reshape_last_stage option in config
* Removed unused imports
* Code formatting
* Fixed outdated decorators on test_inference_fp16
* Added "Copied from" comments in test_modeling_pvt_v2.py
* Fixed import listing
* Updated model name
* Force empty commit for PR refresh
* Fixed linting issue
* Removed # Copied from comments
* Added PVTv2 to README_fr.md
* Ran make fix-copies
* Replace all FoamoftheSea hub references with OpenGVLab
* Fixed out_indices and out_features logic in configuration_pvt_v2.py
* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
* Ran code fixup
* Fixed order of parent classes in PvtV2Config to fix the to_dict method override
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial-commit
* start cleaning
* small nits
* small nits
* current updates
* add kernels
* small refactoring little step
* add comments
* styling
* nit
* nits
* Style
* Small changes
* Push dummy mamba simple slow
* nit
* Use original names
* Use original names and remove norm
* Updates for inference params
* Style and updates
* nits
* Match logits
* Add a test
* Add expected generated text
* nits doc, imports and styling
* style
* oups
* don't install kernels, invite users to install the required kernels
* let users use the original packages
* styling
* nits
* fix some copieds
* update doc
* fix-copies
* styling done
* nits
* fix import check
* run but wrong cuda ress
* mamba CUDA works :)
* fix the fast path
* config naming nits
* conversion script is not required at this stage
* finish fixing the fast path: generation make sense now!
* nit
* Let's start working on the CIs
* style
* better style
* more nits
* test nit
* quick fix for now
* nits
* nit
* nit
* nit
* nits
* update test rest
* fixup
* update test
* nit
* some fixes
* nits
* update test values
* fix styling
* nit
* support peft
* integration tests require torch
* also add slow markers
* styling
* chose forward wisely
* nits
* update tests
* fix gradient checkpointing
* fixup
* nit
* fix doc
* check copies
* fix the docstring
* fix some more tests
* style
* fix beam search
* add init scheme
* update
* nit
* fix
* fixup the doc
* fix the doc
* fixup
* tentative update but slow is no longer good
* nit
* should we always use float32?
* nits
* revert wrong changes
* res in float32
* cleanup
* skip fmt for now
* update generation values
* update test values running original model
* fixup
* update tests + rename inference_params to cache_params + make sure training does not use cache_params
* small nits
* more nits
* fix final CIs
* style
* nit doc
* I hope final doc nits
* nit
* 🫠
* final touch!
* fix torch import
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
* fix fix and fix
* fix base model prefix!
* nit
* Update src/transformers/models/mamba/__init__.py
* Update docs/source/en/model_doc/mamba.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* nit
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
The link in evaluation was missing a hyphen between "post" and "processing". I fixed this, for English only. Someone with the ability to do a global search/replace should fix the other languages (if indeed they have this issue).
* This is a test commit
* testing commit
* final commit with some changes
* Removed copy statement
* Fixed formatting issues
* Fixed error: added past_key_values to the forward method
* Fixed a trailing whitespace. Damn the formatting rules are strict
* Added the copy statement
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Adding [T5/MT5/UMT5]ForTokenClassification
* Add auto mappings for T5ForTokenClassification and variants
* Adding ForTokenClassification to the list of models
* Adding attention_mask param to the T5ForTokenClassification test
* Remove outdated comment in test
* Adding EncoderOnly and Token Classification tests for MT5 and UMT5
* Fix typo in umt5 string
* Add tests for all the existing MT5 models
* Fix wrong comment in dependency_versions_table
* Reverting change to common test for _keys_to_ignore_on_load_missing
The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
* Add fix-copies to MT5ModelTest
Fix typo:
from:
model = TFAutoModelForQuestionAnswering("distilbert-base-uncased")
to:
model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
* first commit
* correct default value non causal
* update config and modeling code
* update converting checkpoint
* clean modeling and fix tests
* make style
* add new config parameters to docstring
* fix copied from statements
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* make position_embeddings_type docstrings clearer
* clean converting script
* remove function not used
* clean modeling file
* apply suggestion for test file + add convert script to not_doctested
* modify tests according to review - cleaner logic and more tests
* Apply nit suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add checker of valid position embeddings type
* instantiate new layer norm layer with the right eps
* fix freeze_feature_encoder since it can be None in some cases
* add test same output in convert script
* restore wav2vec2conformer and add new model
* create processor and FE + clean
* add new model code
* fix convert script and set default config parameters
* correct model id paths
* make style
* make fix-copies and cleaning files
* fix copied from statements
* complete .md and fix copies
* clean convert script argument defaults
* fix config parameters docstrings
* fix config docstring
* add copied from and enrich FE tests
* fix copied from and repo-consistency
* add autotokenizer
* make test input length shorter and change docstring code
* fix docstrings and copied from
* add add_adapter to ASR training example
* make testing of adapters more robust
* adapt to multi adapter layers
* refactor input_values->input_features and remove w2v2-bert feature extractor
* remove pretraining model
* remove deprecated features and useless lines
* add copied from and ignore statements to modeling tests
* remove pretraining model #2
* change import in convert script
* change default in convert script
* update readme and remove useless line
* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* refactor BERT to Bert for consistency
* remove useless ignore copy statement
* add persistent to buffer in rotary
* add eps in LayerNorm init and remove copied from
* add adapter activation parameters and add copied from statements
* Fix copied statements and add unittest.skip reasons
* add copied statement in test_processor
* refactor processor
* make style
* replace numpy random by torch rand
* remove expected output CTC
* improve converting script with processor class
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove gumbel class
* remove tests related to previously deleted class
* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct typos
* remove unused parameters
* update processor to take both text and audio
* update checkpoints
* update expected output and add ctc expected output
* add label_attention_mask
* replace pt with np in processor tests
* fix typo
* revert to behaviour with labels_attention_mask
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* start - docs, SpeechT5 copy and rename
* add relevant code from FastSpeech2 draft, have tests pass
* make it an actual conformer, demo ex.
* matching inference with original repo, includes debug code
* refactor nn.Sequentials, start more desc. var names
* more renaming
* more renaming
* vocoder scratchwork
* matching vocoder outputs
* hifigan vocoder conversion script
* convert model script, rename some config vars
* replace postnet with speecht5's implementation
* passing common tests, file cleanup
* expand testing, add output hidden states and attention
* tokenizer + passing tokenizer tests
* variety of updates and tests
* g2p_en pckg setup
* import structure edits
* docstrings and cleanup
* repo consistency
* deps
* small cleanup
* forward signature param order
* address comments except for masks and labels
* address comments on attention_mask and labels
* address second round of comments
* remove old unneeded line
* address comments part 1
* address comments pt 2
* rename auto mapping
* fixes for failing tests
* address comments part 3 (bart-like, train loss)
* make style
* pass config where possible
* add forward method + tests to WithHifiGan model
* make style
* address arg passing and generate_speech comments
* address Arthur comments
* address Arthur comments pt2
* lint changes
* Sanchit comment
* add g2p-en to doctest deps
* move up self.encoder
* onnx compatible tensor method
* fix is symbolic
* fix paper url
* move models to espnet org
* make style
* make fix-copies
* update docstring
* Arthur comments
* update docstring w/ new updates
* add model architecture images
* header size
* md wording update
* make style
* fix: minor enhancement and fix in bounding box visualization example
The bounding box visualization example did not handle an edge case where the
bounding boxes are un-normalized, so the same code could not produce results on
a different dataset with un-normalized bounding boxes. This commit fixes that.
* run make clean
* add an additional note on the scenarios where the box viz code works
---------
Co-authored-by: Anindyadeep <anindya@pop-os.localdomain>