* init: add StableLm 2 support
* add integration test for parallel residual and qk layernorm
* update(modeling): match qk norm naming for consistency with phi/persimmon
* fix(tests): run fwd/bwd on random init test model to jitter norm weights off identity
* `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`
* refactor: rename head states var in `StableLmLayerNormPerHead`
* tests: update test model and add generate check
* ImportError: Trainer with PyTorch requires accelerate>=0.20.1 Fix
Adding the evaluate and accelerate installs at the beginning of the cell to fix the issue
* ImportError Fix: Trainer with PyTorch requires accelerate>=0.20.1
* Import Error Fix
* Update installation.md
* Update quicktour.md
* rollback other lang changes
* Update _config.py
* updates for other languages
* fixing error
* Tutorial Update
* Update tokenization_utils_base.py
* Just use an optimizer string to pass the doctest?
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
* add _torch_extract_fbank_features_batch function in feature_extractor_whisper
* reformat feature_extraction_whisper.py file
* handle batching in single function
* add gpu test & doc
* add batch test & device in each __call__
* add device arg in doc string
---------
Co-authored-by: vaibhav.aggarwal <vaibhav.aggarwal@sprinklr.com>
* separate jobs
* separate jobs
* use channel name directly instead of ID
* use channel name directly instead of ID
* use channel name directly instead of ID
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Update quantizer_bnb_4bit.py
There is an mistake in ValueError on line 86 of quantizer_bnb_4bit.py. In the error string there should be "....you need to set `llm_int8_enable_fp32_cpu_offload=True`...." instead of "load_in_8bit_fp32_cpu_offload=True". I think you updated the BitsAndBytesConfig() arguments, but forgot to change the ValueError in quantizer_bnb_4bit.py.
* Update quantizer_bnb_4bit.py
Changed ValueError string "...you need to set load_in_8bit_fp32_cpu_offload=True..." to "....you need to set llm_int8_enable_fp32_cpu_offload=True...."
* if output is tuple like facebook/hf-seamless-m4t-medium, waveform is the first element
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* add test and fix batch issue
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* add dict output support for seamless_m4t
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* redefaulted padding=longest again
* fixup/doc
* implement convert_mamba_ssm_checkpoint_to_pytorch
* Add test test_model_from_mamba_ssm_conversion
* moved convert_ssm_config_to_hf_config to inside mamba_ssm_available check
* fix skipif clause
* moved skips to inside test since skipif decorator isn't working for some reason
* Added validation
* removed test
* fixup
* only compare logits
* remove weight rename
* Update src/transformers/models/mamba/convert_mamba_ssm_checkpoint_to_pytorch.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* nits
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix generate_with_fallback **kwargs
* Change pop to get
* Delete keys from kwargs to prevent overriding generation_config
* Revert to passing kwargs by reference, but make a (shallow) copy
* dict -> copy.copy
* Add test_whisper_longform_multi_batch_beam
To address the issue of NaN logit outputs for certain combinations
of the `image_size`, `patch_size` and `depths` configuration
parameters, an assertion was made to ensure that the resulting
`window_size` field in the model's Self Attention class is greater
than 1, preventing divisions by zero in the normalization of
`relative_coords_table`.
Fix: #28675
* Hard error when ignoring tensors. (#27484)
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint.
- Find all names we should normally drop (those are in the transformers
config)
- Find all disjoint tensors (for those we can safely trigger a copy to
get rid of the sharing before saving)
- Clone those disjoint tensors getting rid of the issue
- Find all identical names (those should be declared in the config
but we try to find them all anyway.)
- For all identical names:
- If they are in the config, just ignore them everything is fine
- If they are not, warn about them.
- For all remainder tensors which are shared yet neither identical NOR
disjoint. raise a hard error.
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add small tests.
* Dead variable.
* Fixup.
* Fixing tied_Weights_keys on generic models.
* Fixup + T5 encoder/decoder tying (with different layers)
* Code quality.
* Dynamic member.
* trigger
* Fixing encoder name for other types of encoder/decoder combos.
* Fix scoping.
* Update .github/workflows/self-scheduled.yml
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fixing the tied_weights after the call.
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode
* Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode
* Exclude pad_token filtering since it is used as CTC-blank token
* Add small test for skip_special_tokens
* Update decoding test for added new token