* This is a test commit
* testing commit
* final commit with some changes
* Removed copy statement
* Fixed formatting issues
* Fixed error added past_key_values in the forward method
* Fixed a trailing whitespace. Damn the formatting rules are strict
* Added the copy statement
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Adding [T5/MT5/UMT5]ForTokenClassification
* Add auto mappings for T5ForTokenClassification and variants
* Adding ForTokenClassification to the list of models
* Adding attention_mask param to the T5ForTokenClassification test
* Remove outdated comment in test
* Adding EncoderOnly and Token Classification tests for MT5 and UMT5
* Fix typo in umt5 string
* Add tests for all the existing MT5 models
* Fix wrong comment in dependency_versions_table
* Reverting change to common test for _keys_to_ignore_on_load_missing
The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
* Add fix-copies to MT5ModelTest
* first commit
* correct default value non causal
* update config and modeling code
* update converting checkpoint
* clean modeling and fix tests
* make style
* add new config parameters to docstring
* fix copied from statements
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* make position_embeddings_type docstrings clearer
* clean converting script
* remove function not used
* clean modeling file
* apply suggestion for test file + add convert script to not_doctested
* modify tests according to review - cleaner logic and more tests
* Apply nit suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add checker of valid position embeddings type
* instantiate new layer norm layer with the right eps
* fix freeze_feature_encoder since it can be None in some cases
* add test same output in convert script
* restore wav2vec2conformer and add new model
* create processor and FE + clean
* add new model code
* fix convert script and set default config parameters
* correct model id paths
* make style
* make fix-copies and cleaning files
* fix copied from statements
* complete .md and fixe copies
* clean convert script argument defaults
* fix config parameters docstrings
* fix config docstring
* add copied from and enrich FE tests
* fix copied from and repo-consistency
* add autotokenizer
* make test input length shorter and change docstring code
* fix docstrings and copied from
* add add_adapter to ASR training example
* make testing of adapters more robust
* adapt to multi adapter layers
* refactor input_values->input_features and remove w2v2-bert feature extractor
* remove pretraining model
* remove depreciated features and useless lines
* add copied from and ignore statements to modeling tests
* remove pretraining model #2
* change import in convert script
* change default in convert script
* update readme and remove useless line
* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* refactor BERT to Bert for consistency
* remove useless ignore copy statement
* add persistent to buffer in rotary
* add eps in LayerNorm init and remove copied from
* add adapter activation parameters and add copied from statements
* Fix copied statements and add unitest.skip reasons
* add copied statement in test_processor
* refactor processor
* make style
* replace numpy random by torch rand
* remove expected output CTC
* improve converting script with processor class
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove gumbel class
* remove tests related to previously deleted class
* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct typos
* remove uused parameters
* update processor to takes both text and audio
* update checkpoints
* update expected output and add ctc expected output
* add label_attention_mask
* replace pt with np in processor tests
* fix typo
* revert to behaviour with labels_attention_mask
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add first draft
* Use appropriate gelu function
* More improvements
* More improvements
* More improvements
* Convert checkpoint
* More improvements
* Improve docs, remove print statements
* More improvements
* Add link
* remove unused masking function
* begin tokenizer
* do_lower_case
* debug
* set split_special_tokens=True
* Remove script
* Fix style
* Fix rebase
* Use same design as CLIP
* Add fast tokenizer
* Add SiglipTokenizer to init, remove extra_ids
* Improve conversion script
* Use smaller inputs in conversion script
* Update conversion script
* More improvements
* Add processor to conversion script
* Add tests
* Remove print statements
* Add tokenizer tests
* Fix more tests
* More improvements related to weight initialization
* More improvements
* Make more tests pass
* More improvements
* More improvements
* Add copied from
* Add canonicalize_text
* Enable fast tokenizer tests
* More improvements
* Fix most slow tokenizer tests
* Address comments
* Fix style
* Remove script
* Address some comments
* Add copied from to tests
* Add more copied from
* Add more copied from
* Add more copied from
* Remove is_flax_available
* More updates
* Address comment
* Remove SiglipTokenizerFast for now
* Add caching
* Remove umt5 test
* Add canonicalize_text inside _tokenize, thanks Arthur
* Fix image processor tests
* Skip tests which are not applicable
* Skip test_initialization
* More improvements
* Compare pixel values
* Fix doc tests, add integration test
* Add do_normalize
* Remove causal mask and leverage ignore copy
* Fix attention_mask
* Fix remaining tests
* Fix dummies
* Rename temperature and bias
* Address comments
* Add copied from to tokenizer tests
* Add SiglipVisionModel to auto mapping
* Add copied from to image processor tests
* Improve doc
* Remove SiglipVisionModel from index
* Address comments
* Improve docs
* Simplify config
* Add first draft
* Make it like mistral
* More improvements
* Fix attention_mask
* Fix output_attentions
* Add note in docs
* Convert multilingual model
* Convert large checkpoint
* Convert more checkpoints
* Add pipeline support, correct image_mean and image_std
* Use padding=max_length by default
* Make processor like llava
* Add code snippet
* Convert more checkpoints
* Set keep_punctuation_string=None as in OpenCLIP
* Set normalized=False for special tokens
* Fix doc test
* Update integration test
* Add figure
* Update organization
* Happy new year
* Use AutoModel everywhere
---------
Co-authored-by: patil-suraj <surajp815@gmail.com>
* start - docs, SpeechT5 copy and rename
* add relevant code from FastSpeech2 draft, have tests pass
* make it an actual conformer, demo ex.
* matching inference with original repo, includes debug code
* refactor nn.Sequentials, start more desc. var names
* more renaming
* more renaming
* vocoder scratchwork
* matching vocoder outputs
* hifigan vocoder conversion script
* convert model script, rename some config vars
* replace postnet with speecht5's implementation
* passing common tests, file cleanup
* expand testing, add output hidden states and attention
* tokenizer + passing tokenizer tests
* variety of updates and tests
* g2p_en pckg setup
* import structure edits
* docstrings and cleanup
* repo consistency
* deps
* small cleanup
* forward signature param order
* address comments except for masks and labels
* address comments on attention_mask and labels
* address second round of comments
* remove old unneeded line
* address comments part 1
* address comments pt 2
* rename auto mapping
* fixes for failing tests
* address comments part 3 (bart-like, train loss)
* make style
* pass config where possible
* add forward method + tests to WithHifiGan model
* make style
* address arg passing and generate_speech comments
* address Arthur comments
* address Arthur comments pt2
* lint changes
* Sanchit comment
* add g2p-en to doctest deps
* move up self.encoder
* onnx compatible tensor method
* fix is symbolic
* fix paper url
* move models to espnet org
* make style
* make fix-copies
* update docstring
* Arthur comments
* update docstring w/ new updates
* add model architecture images
* header size
* md wording update
* make style
* add sdpa
* wip
* cleaning
* add ref
* yet more cleaning
* and more :)
* wip llama
* working llama
* add output_attentions=True support
* bigcode sdpa support
* fixes
* gpt-bigcode support, require torch>=2.1.1
* add falcon support
* fix conflicts falcon
* style
* fix attention_mask definition
* remove output_attentions from attnmaskconverter
* support whisper without removing any Copied from statement
* fix mbart default to eager renaming
* fix typo in falcon
* fix is_causal in SDPA
* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
* add warnings when falling back on the manual implementation
* precise doc
* wip replace _flash_attn_enabled by config.attn_implementation
* fix typo
* add tests
* style
* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
* obey to config.attn_implementation if a config is passed in from_pretrained
* fix is_torch_sdpa_available when torch is not installed
* remove dead code
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove duplicate pretraining_tp code
* add dropout in llama
* precise comment on attn_mask
* add fmt: off for _unmask_unattended docstring
* precise num_masks comment
* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
* cleanup modeling_utils
* backward compatibility
* fix style as requested
* style
* improve documentation
* test pass
* style
* add _unmask_unattended tests
* skip meaningless tests for idefics
* hard_check SDPA requirements when specifically requested
* standardize the use if XXX_ATTENTION_CLASSES
* fix SDPA bug with mem-efficient backend on CUDA when using fp32
* fix test
* rely on SDPA is_causal parameter to handle the causal mask in some cases
* fix FALCON_ATTENTION_CLASSES
* remove _flash_attn_2_enabled occurences
* fix test
* add OPT to the list of supported flash models
* improve test
* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
* remove remaining _flash_attn_2_enabled occurence
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove use_attn_implementation
* fix docstring & slight bug
* make attn_implementation internal (_attn_implementation)
* typos
* fix tests
* deprecate use_flash_attention_2=True
* fix test
* add back llama that was removed by mistake
* fix tests
* remove _flash_attn_2_enabled occurences bis
* add check & test that passed attn_implementation is valid
* fix falcon torchscript export
* fix device of mask in tests
* add tip about torch.jit.trace and move bt doc below sdpa
* fix parameterized.expand order
* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
* update sdpaattention class with the new cache
* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bark/modeling_bark.py
* address review comments
* WIP torch.jit.trace fix. left: test both eager & sdpa
* add test for torch.jit.trace for both eager/sdpa
* fix falcon with torch==2.0 that needs to use sdpa
* fix doc
* hopefully last fix
* fix key_value_length that has no default now in mask converter
* is it flacky?
* fix speculative decoding bug
* tests do pass
* fix following #27907
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add model like
* logits match
* minor fixes
* fixes
* up
* up
* add todo
* llava processor
* keep the processor simple
* add conversion script
* fixup
* fix copies
* up
* add to index
* fix config + logits
* fix
* refactor
* more refactor
* more refactor
* fix copies
* add authors
* v1 tests
* add `LlavaProcessor` in init
* remove unneeded import
* up
* up
* docs
* up
* fix CI
* fix CI
* add attention mask in test
* make fixup
* remove the vision model
* that' s the dirty way to do it
* nits
* nits
* updates
* add more tests
* add input tests
* fixup
* more styling
* nits
* updates amd cleanup
* fixup the generation expected results
* fix the testing script
* some cleanup and simplification which does not work yet but almost there!
* make correct dispatch operations
* vectorize works for batch of images and text
* last todos
* nits
* update test and modeling code
* remove useless function for now
* fix few issues
* fix generation
* some nits
* add bakllava
* nits
* remove duplicated code
* finis merge
* cleanup
* missed this line
* fill the todos
* add left padding offset
* add left and rignt padding logic
* bool to properly index
* make sure
* more cleanups
* batch is fixed 😉
* add correct device for tensor creation
* fix some dtype missmatch
* ruff
* update conversion script
* Update src/transformers/__init__.py
* fa 2 support + fix conversion script
* more
* correct reshaping
* fix test dict
* fix copies by ignoring
* fix nit
* skip clip vision model
* fixup
* fixup
* LlavaForVisionText2Text -> LlavaForCausalLM
* update
* fix
* raise correct errors
* fix
* docs
* nuke for now
* nits here and there
* fixup
* fix remaining tests
* update LlavaForConditionalGeneration instead of CausalLM
* fixups
* pipeline support
* slow and piepline tests
* supports batch
* nits
* cleanup
* fix first integration tests
* add pad token where needed
* correct etsts
* fixups
* update pipeline testr
* fix quality
* nits
* revert unneeded change
* nit
* use BatchFeature
* from ...feature_extraction_utils import BatchFeature
* nits
* nits
* properly update
* more f*** nits
* fix copies
* comment
* keep slow test slow
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add piepline example
* add pixel values in docstrign
* update pr doctest
* fix
* fix slow tests
* remove hack
* fixup
* small note
* forward contrib credits from PR25789
* forward contrib credits from original implementation and work
* add arthur
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update docstring
* nit
* move to not doctested because of timeout issues
* fixup
* add description
* more
* fix-copies
* fix docs
* add beam search
* add more comments
* add typehints on processor
* add speedup plot
* update slow tests and docs
* push test
* push batched test
* fix batched generation with different number of images
* remove benchmark due to a bug
* fix test
* fix copies
* add gcolab demo
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Copies `modeling_flax_gpt_neo.py` to start
* MLP Block. WIP Attention and Block
* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue
* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.
* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Notice numerically different, but not by a huge amount. Needs
investigating
* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..
* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?
* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now
* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now
* implements `FlaxLlamaBlockCollection`]
tolerance must be higher than expected, kinda disconcerting
* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗
* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later
* start porting pretrained wrappers
realised it probably needs return dict as a prereq
* cleanup, quality, style
* readds `return_dict` and model output named tuples
* (tentatively) pretrained wrappers work 🔥
* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.
* [WIP] debugging numerics
* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.
* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict
* adds missing TYPE_CHECKING import and `make fixup`
* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO
* commenting out equivalence test as can just use common
* debugging
* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥
* cleanup of modeling file
* cleanup of test file
* Resolving simpler review comments
* addresses more minor review comments
* fixing introduced pytest errors from review
* wip additional slow tests
* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay
* `make quality`, `make style`
* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs
* `make fix-copies`
* fix mangled function following `make fix-copies`
* adds missing type checking imports
* fixes missing parameter checkpoint warning
* more finegrained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`
* swaps import guards
??? how did these get swapped initially?
* removing `inv_freq` again as pytorch version has now removed
* attempting to get CI to pass
* adds doc entries for llama flax models
* fixes typo in __init__.py imports
* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version
* overrides tests with dummy to see if CI passes
need to fill in these tests later
* adds my contribution to docs
* `make style; make quality`
* replaces random masking with fixed to work with flax version
* `make quality; make style`
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* updates `x`->`tensor` in `rotate_half`
* addresses smaller review comments
* Update docs/source/en/model_doc/llama.md
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adds integration test class
* adds `dtype` to rotary embedding to cast outputs
* adds type to flax llama rotary layer
* `make style`
* `make fix-copies`
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* applies suggestions from review
* Update modeling_flax_llama.py
* `make fix-copies`
* Update tests/models/llama/test_modeling_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixes shape mismatch in FlaxLlamaMLP
* applies some suggestions from reviews
* casts attn output logits to f32 regardless of dtype
* adds attn bias using `LlamaConfig.attention_bias`
* adds Copied From comments to Flax Llama test
* mistral and persimmon test change -copy from llama
* updates docs index
* removes Copied from in tests
it was preventing `make fix-copies` from succeeding
* quality and style
* ignores FlaxLlama input docstring
* adds revision to `_CHECKPOINT_FOR_DOC`
* repo consistency and quality
* removes unused import
* removes copied from from Phi test
now diverges from llama tests following FlaxLlama changes
* adds `_REAL_CHECKPOINT_FOR_DOC`
* removes refs from pr tests
* reformat to make ruff happy
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Add models
* Add models and update `_toctree.yml`
* Update docs/source/ja/model_doc/chinese_clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/camembert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/bros.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/bros.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/blip-2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/camembert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* solve merge conflicts and update paper titles
* Update docs/source/ja/model_doc/bridgetower.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/chinese_clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update the authons name in bros..md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add working convertion script
* first non-working version of modeling code
* update modeling code (working)
* make style
* make fix-copies
* add config docstrings
* add config to ignore docstrings formatage due to unconventional markdown
* fix copies
* fix generation num_return_sequences
* enrich docs
* add and fix tests beside integration tests
* update integration tests
* update repo id
* add tie weights and make style
* correct naming in .md
* fix imports and so on
* correct docstrings
* fix fp16 speech forward
* fix speechencoder attention
* make style
* fix copied from
* rename SeamlessM4Tv2-v2 to SeamlessM4Tv2
* Apply suggestions on configuration
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove useless public models
* fix private models + better naming for T2U models
* clean speech encoder relative position embeddings
* refactor chunk attention
* add docstrings to chunk attention method
* improve naming and docstrings
* rename some attention variables + add temperature sampling in T2U model
* rename DOCSTRINGS variable names
* make style + remove 2 useless config parameters
* enrich model card
* remove any attention_head reference + fix temperature in T2U
* new fmt and make style
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* rename spkr_id->speaker_id and change docstrings of get_char_input_ids
* simplify v2attention
* make style
* Update seamless_m4t_v2.md
* update code and tests with last update
* update repo ids
* fill article name, abstract andauthors
* update not_doctested and slow_doc tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add distribution head to forecasting
* formatting
* Add generate function for forecasting
* Add generate function to prediction task
* formatting
* use argsort
* add past_observed_mask ordering
* fix arguments
* docs
* add back test_model_outputs_equivalence test
* formatting
* cleanup
* formatting
* use ACT2CLS
* formatting
* fix add_start_docstrings decorator
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.
* fix typos
* add forecast_masking
* fixed tests
* use set_seed
* fix doc test
* formatting
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* better var names
* rename PatchTSTTranspose
* fix argument names and docs string
* remove compute_num_patches and unused class
* remove assert
* renamed to PatchTSTMasking
* use num_labels for classification
* use num_labels
* use default num_labels from super class
* move model_type after docstring
* renamed PatchTSTForMaskPretraining
* bs -> batch_size
* more review fixes
* use hidden_state
* rename encoder layer and block class
* remove commented seed_number
* edit docstring
* Add docstring
* formatting
* use past_observed_mask
* doc suggestion
* make fix-copies
* use Args:
* add docstring
* add docstring
* change some variable names and add PatchTST before some class names
* formatting
* fix argument types
* fix tests
* change x variable to patch_input
* format
* formatting
* fix-copies
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move loss to forward
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* formatting
* fix a bug when pre_norm is set to True
* output_hidden_states is set to False as default
* set pre_norm=True as default
* format docstring
* format
* output_hidden_states is None by default
* add missing docs
* better var names
* docstring: remove default to False in output_hidden_states
* change labels name to target_values in regression task
* format
* fix tests
* change to forecast_mask_ratios and random_mask_ratio
* change mask names
* change future_values to target_values param in the prediction class
* remove nn.Sequential and make PatchTSTBatchNorm class
* black
* fix argument name for prediction
* add output_attentions option
* add output_attentions to PatchTSTEncoder
* formatting
* Add attention output option to all classes
* Remove PatchTSTEncoderBlock
* create PatchTSTEmbedding class
* use config in PatchTSTPatchify
* Use config in PatchTSTMasking class
* add channel_attn_weights
* Add PatchTSTScaler class
* add output_attentions arg to test function
* format
* Update doc with image patchtst.md
* fix-copies
* rename Forecast <-> Prediction
* change name of a few parameters to match with PatchTSMixer.
* Remove *ForForecasting class to match with other time series models.
* make style
* Remove PatchTSTForForecasting in the test
* remove PatchTSTForForecastingOutput class
* change test_forecast_head to test_prediction_head
* style
* fix docs
* fix tests
* change num_labels to num_targets
* Remove PatchTSTTranspose
* remove arguments in PatchTSTMeanScaler
* remove arguments in PatchTSTStdScaler
* add config as an argument to all the scaler classes
* reformat
* Add norm_eps for batchnorm and layernorm
* reformat.
* reformat
* edit docstring
* update docstring
* change variable name pooling to pooling_type
* fix output_hidden_states as tuple
* fix bug when calling PatchTSTBatchNorm
* change stride to patch_stride
* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder
* formatting
* initialize scalers with configs
* edit output_hidden_states
* style
* fix forecast_mask_patches doc string
* doc improvements
* move summary to the start
* typo
* fix docstring
* turn off masking when using prediction, regression, classification
* return scaled output
* adjust output when using distribution head
* remove _num_patches function in the config
* get config.num_patches from patchifier init
* add output_attentions docstring, remove tuple in output_hidden_states
* change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput
* remove print("model_class: ", model_class)
* change encoder_attention_heads to num_attention_heads
* change norm to norm_layer
* change encoder_layers to num_hidden_layers
* change shared_embedding to share_embedding, shared_projection to share_projection
* add output_attentions
* more robust check of norm_type
* change dropout_path to path_dropout
* edit docstring
* remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding
* edit shape of cls_token and initialize it
* add a check on the num_input_channels.
* edit head_dim in the Prediction class to allow the use of cls_token
* remove some positional_encoding_type options, remove learn_pe arg, initalize pe
* change Exception to ValueError
* format
* norm_type is "batchnorm"
* make style
* change cls_token shape
* Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios.
* Bring PatchTSTClassificationHead on top of PatchTSTForClassification
* change encoder_ffn_dim to ffn_dim and edit the docstring.
* update variable names to match with the config
* add generation tests
* change num_mask_patches to num_forecast_mask_patches
* Add examples explaining the use of these models
* make style
* Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)"
This reverts commit 78f6ed6c70.
* make style
* fix default std scaler's minimum_scale
* fix docstring
* close code blocks
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/configuration_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix tests
* add add_start_docstrings
* move examples to the forward's docstrings
* update prepare_batch
* update test
* fix test_prediction_head
* fix generation test
* use seed to create generator
* add output_hidden_states and config.num_patches
* add loc and scale args in PatchTSTForPredictionOutput
* edit outputs if if not return_dict
* use self.share_embedding to check instead checking type.
* remove seed
* make style
* seed is an optional int
* fix test
* generator device
* Fix assertTrue test
* swap order of items in outputs when return_dict=False.
* add mask_type and random_mask_ratio to unittest
* Update modeling_patchtst.py
* add add_start_docstrings for regression model
* make style
* update model path
* Edit the ValueError comment in forecast_masking
* update examples
* make style
* fix commented code
* update examples: remove config from from_pretrained call
* Edit example outputs
* Set default target_values to None
* remove config setting in regression example
* Update configuration_patchtst.py
* Update configuration_patchtst.py
* remove config from examples
* change default d_model and ffn_dim
* norm_eps default
* set has_attentions to Trye and define self.seq_length = self.num_patche
* update docstring
* change variable mask_input to do_mask_input
* fix blank space.
* change logger.debug to logger.warning.
* remove unused PATCHTST_INPUTS_DOCSTRING
* remove all_generative_model_classes
* set test_missing_keys=True
* remove undefined params in the docstring.
---------
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* added flash attention for opt
* added to list
* fix use cache (#3)
* style fix
* fix text
* test fix2
* reverted until 689f599
* torch fx tests are working now!
* small fix
* added TODO docstring
* changes
* comments and .md file modification
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* initial commit
* Add inital testing files and modify __init__ files to add UnivNet imports.
* Fix some bugs
* Add checkpoint conversion script and add references to transformers pre-trained model.
* Add UnivNet entries for auto.
* Add initial docs for UnivNet.
* Handle input and output shapes in UnivNetGan.forward and add initial docstrings.
* Write tests and make them pass.
* Write docs.
* Add UnivNet doc to _toctree.yml and improve docs.
* fix typo
* make fixup
* make fix-copies
* Add upsample_rates parameter to config and improve config documentation.
* make fixup
* make fix-copies
* Remove unused upsample_rates config parameter.
* apply suggestions from review
* make style
* Verify and add reason for skipped tests inherited from ModelTesterMixin.
* Add initial UnivNetGan integration tests
* make style
* Remove noise_length input to UnivNetGan and improve integration tests.
* Fix bug and make style
* Make UnivNet integration tests pass
* Add initial code for UnivNetFeatureExtractor.
* make style
* Add initial tests for UnivNetFeatureExtractor.
* make style
* Properly initialize weights for UnivNetGan
* Get feature extractor fast tests passing
* make style
* Get feature extractor integration tests passing
* Get UnivNet integration tests passing
* make style
* Add UnivNetGan usage example
* make style and use feature extractor from hub in integration tests
* Update tips in docs
* apply suggestions from review
* make style
* Calculate padding directly instead of using get_padding methods.
* Update UnivNetFeatureExtractor.to_dict to be UnivNet-specific.
* Update feature extractor to support using model(**inputs) and add the ability to generate noise and pad the end of the spectrogram in __call__.
* Perform padding before generating noise to ensure the shapes are correct.
* Rename UnivNetGan.forward's noise_waveform argument to noise_sequence.
* make style
* Add tests to test generating noise and padding the end for UnivNetFeatureExtractor.__call__.
* Add tests for checking batched vs unbatched inputs for UnivNet feature extractor and model.
* Add expected mean and stddev checks to the integration tests and make them pass.
* make style
* Make it possible to use model(**inputs), where inputs is the output of the feature extractor.
* fix typo in UnivNetGanConfig example
* Calculate spectrogram_zero from other config values.
* apply suggestions from review
* make style
* Refactor UnivNet conversion script to use load_state_dict (following persimmon).
* Rename UnivNetFeatureExtractor to UnivNetGanFeatureExtractor.
* make style
* Switch to using torch.tensor and torch.testing.assert_close for testing expected values/slices.
* make style
* Use config in UnivNetGan modeling blocks.
* make style
* Rename the spectrogram argument of UnivNetGan.forward to input_features, following Whisper.
* make style
* Improving padding documentation.
* Add UnivNet usage example to the docs.
* apply suggestions from review
* Move dynamic_range_compression computation into the mel_spectrogram method of the feature extractor.
* Improve UnivNetGan.forward return docstring.
* Update table in docs/source/en/index.md.
* make fix-copies
* Rename UnivNet components to have pattern UnivNet*.
* make style
* make fix-copies
* Update docs
* make style
* Increase tolerance on flaky unbatched integration test.
* Remove torch.no_grad decorators from UnivNet integration tests to try to avoid flax/Tensorflow test errors.
* Add padding_mask argument to UnivNetModel.forward and add batch_decode feature extractor method to remove padding.
* Update documentation and clean up padding code.
* make style
* make style
* Remove torch dependency from UnivNetFeatureExtractor.
* make style
* Fix UnivNetModel usage example
* Clean up feature extractor code/docstrings.
* apply suggestions from review
* make style
* Add comments for tests skipped via ModelTesterMixin flags.
* Add comment for model parallel tests skipped via the test_model_parallel ModelTesterMixin flag.
* Add # Copied from statements to copied UnivNetFeatureExtractionTest tests.
* Simplify UnivNetFeatureExtractorTest.test_batch_decode.
* Add support for unbatched padding_masks in UnivNetModel.forward.
* Refactor unbatched padding_mask support.
* make style