* fix patch_attention_mask incorrect setting which leads to the difference in the generated text if batch > 1
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix format
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* [run_slow] idefics2
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* added sequences_scores to the output
* added beam_indices to output
* added test to check for beam_indices, sequences_scores and their shape
* removed redundant whitespaces
* make fixup
* idefics2 enable_input_require_grads not aligned with disable_input_require_grads
make peft+idefics2 checkpoints disable fail
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* split test case
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix ci failure
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refine test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading
* make style
* fix sew
* fix sew and sew_d tests
* Fix failing tensor placement in Whisper
* fix long form generation tests
* more return_timestamps=True
* make fixup
* [run_slow] whisper
* [run_slow] whisper
* Uniformize kwargs for LlaVa and update docs
* Change order of processor inputs in docstring
* Improve BC support for reversed images and text inputs
* cleanup llava processor call docstring
* Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning
* Put function check reversed images text outside base processor class
* Refactor _validate_images_text_input_order
* Add ProcessingUtilTester
* fix processing and test_processing
* initial commit
* gloups
* updates
* work
* weights match
* nits
* nits
* updates to support the tokenizer :)
* updates
* Pixtral processor (#33454)
* rough outline
* Add in image break and end tokens
* Fix
* Udo some formatting changes
* Set patch_size default
* Fix
* Fix token expansion
* nit in conversion script
* Fix image token list creation
* done
* add expected results
* Process list of list of images (#33465)
* updates
* working image and processor
* this is the expected format
* some fixes
* push current updated
* working mult images!
* add a small integration test
* Uodate configuration docstring
* Formatting
* Config docstring fix
* simplify model test
* fixup modeling and etests
* Return BatchMixFeature in image processor
* fix some copies
* update
* nits
* Update model docstring
* Apply suggestions from code review
* Fix up
* updates
* revert modeling changes
* update
* update
* fix load safe
* addd liscence
* update
* use pixel_values as required by the model
* skip some tests and refactor
* Add pixtral image processing tests (#33476)
* Image processing tests
* Add processing tests
* woops
* defaults reflect pixtral image processor
* fixup post merge
* images -> pixel values
* oups sorry Mr docbuilder
* isort
* fix
* fix processor tests
* small fixes
* nit
* update
* last nits
* oups this was really breaking!
* nits
* is composition needs to be true
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* Addressed comments
* added a unit test
* fixed cache position
* Added a warning msg to the forward fn
* fixed test case
* Update tokenization_whisper.py
Fix issue with flax whisper model
* Update tokenization_whisper_fast.py
Fix issue with flax whisper model
* Update tokenization_whisper.py
just check len of token_ids
* Update tokenization_whisper_fast.py
just use len of token_ids
* Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update test_tokenization_whisper.py to add test for _convert_to_list method
* Update test_tokenization_whisper.py to fix code style issues
* Fix code style
* Fix code check again
* Update test_tokenization)whisper.py to Improve code style
* Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available
* Update tests/models/whisper/test_tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method
* Revert the changes automatically applied by formatter and was unrelated to PR
* Format for minimal changes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Import structure & first three model refactors
* Register -> Export. Export all in __all__. Sensible defaults according to filename.
* Apply most comments from Amy and some comments from Lucain
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* Style
* Add comment
* Clearer .py management
* Raise if not in backend mapping
* More specific type
* More efficient listdir
* Misc fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* add self.head_dim for VisionAttention in Qwen2-VL
* add self.head_dim for VisionAttention in Qwen2-VL
* fix ci
* black the test_modeling_qwen2_vl.py
* use ruff to format test_modeling_qwen2_vl.py
* [run-slow] qwen2_vl
* use tying for python3.8
* fix the import format
* use ruff to fix the ci error I001
* [run-slow] qwen2_vl
* remove unused import
* commit for rebase
* use ruff fix ci
* [run-slow] qwen2_vl
---------
Co-authored-by: root <liji>
* Add validation for maximum sequence length in modeling_whisper.py
Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message.
This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness.
* Change exception message in src/transformers/models/whisper/modeling_whisper.py
The exception message is for whisper's label's sequence max length.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py
It's for whisper's config.max_target_positions.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change method's documentation in src/transformers/models/whisper/modeling_whisper.py
* Add test for maximum label's sequence length in test_modeling_whisper.py
* Add self to modeling_whisper.py
* Update test_modeling_whisper.py with respect to automatic validations
* Update modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Separate test_labels_sequence_max_length tests in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Remove assert from test_modeling_whisper.py
* Add max_target_positions to WhisperModelTester in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py
* Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py
* Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py
* Add new tests in test_modeling_whisper.py
* Update test_modeling_whisper.py
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Load remote code only once
* Use hash as load indicator
* Add a new option `force_reload` for old behavior (i.e. always reload)
* Add test for dynamic module is cached
* Add more type annotations to improve code readability
* Address comments from code review
* [InstructBLIP] qformer_tokenizer is required input
* Bit safer
* Add to instructblipvideo processor
* Fix up
* Use video inputs
* Update tests/models/instructblipvideo/test_processor_instructblipvideo.py
* don't run custom when not needed?
* update test fetcher filtering
* fixup and updates
* update
* update
* reduce burden
* nit
* nit
* mising comma
* this?
* this?
* more parallelism
* more
* nit for real parallelism on tf and torch examples
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update
* update
* update
* update
* update
* update
* use correct path
* fix path to test files and examples
* filter-tests
* filter?
* filter?
* filter?
* nits
* fix naming of the artifacts to be pushed
* list vs files
* list vs files
* fixup
* fix list of all tests
* fix the install steps
* fix the install steps
* fix the config
* fix the config
* only split if needed
* only split if needed
* extend should fix it
* extend should fix it
* arg
* arg
* update
* update
* run tests
* run tests
* run tests
* more nits
* update
* update
* update
* update
* update
* update
* update
* simpler way to show the test, reduces the complexity of the generated config
* simpler way to show the test, reduces the complexity of the generated config
* style
* oups
* oups
* fix import errors
* skip some tests for now
* update doctestjob
* more parallelism
* fixup
* test only the test in examples
* test only the test in examples
* nits
* from Arthur
* fix generated congi
* update
* update
* show tests
* oups
* oups
* fix torch job for now
* use single upload setp
* oups
* fu**k
* fix
* nit
* update
* nit
* fix
* fixes
* [test-all]
* add generate marker and generate job
* oups
* torch job runs not generate tests
* let repo utils test all utils
* UPdate
* styling
* fix repo utils test
* more parallel please
* don't test
* update
* bit more verbose sir
* more
* hub were skipped
* split by classname
* revert
* maybe?
* Amazing catch
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix
* update
* update
* maybe non capturing
* manual convert?
* pass artifacts as parameters as otherwise the config is too long
* artifact.json
* store output
* might not be safe?
* my token
* mmm?
* use CI job IS
* can't get a proper id?
* ups
* build num
* update
* echo url
* this?
* this!
* fix
* wget
* ish
* dang
* udpdate
* there we go
* update
* update
* pass all
* not .txt
* update
* fetcg
* fix naming
* fix
* up
* update
* update
* ??
* update
* more updates
* update
* more
* skip
* oups
* pr documentation tests are currently created differently
* update
* hmmmm
* oups
* curl -L
* update
* ????
* nit
* mmmm
* ish
* ouf
* update
* ish
* update
* update
* updatea
* nit
* nit
* up
* oups
* documentation_test fix
* test hub tests everything, just marker
* update
* fix
* test_hub is the only annoying one now
* tf threads?
* oups
* not sure what is happening?
* fix?
* just use folder for stating hub
* I am getting fucking annoyed
* fix the test?
* update
* uupdate
* ?
* fixes
* add comment!
* nit
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Adding SDPA support for RoBERTa-based models
* add not is_cross_attention
* fix copies
* fix test
* add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin
* address some review comments
* use copied from
* style
* consistency
* fix lists
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* init fix
* fix mask during cached forward, move mask related stuff to own function
* adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp)
* revert overwriting new integration tests
* move some comments to docstring
* add Blip2ForImageTextRetrieval
* use one line and remove unnecessary space in tests
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* use value from the config, rather than hardcoded
* change order of params in Blip2QFormerModel.forward
* update docstring
* fix style
* update test_inference_opt
* move embeddings out of Blip2QFormerModel
* remove from_vision_qformer_configs
* remove autocast float16 in Blip2QFormerModel
* rename fiels into vision_projection,text_projection,use_image_text_matching_head
* use CLIPOutput for Blip2ImageTextMatchingModelOutput
* remove past_key_values_length from Blip2TextEmbeddings
* fix small typo in the CLIPOutput docstring
* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping
* update docstring and add require_torch_fp16
* rollback test_inference_opt
* use use_image_text_matching_head=True in convert
* skip test_model_get_set_embeddings
* fix create_rename_keys error on new itm fields
* revert to do scale after dot product between "query" and "key"
* fix ValueError on convert script for blip2-opt-2.7b
* update org of paths to Salesforce
* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests
* [run_slow] blip_2
* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED
* fix docstring of Blip2ImageTextMatchingModelOutput
* [run_slow] blip_2
* fix multi-gpu tests
* [run_slow] blip_2
* [run_slow] blip_2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix param not being passed in tested; add exceptions
* better source of model name
* Update utils/create_dummy_models.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs
* remove crop_size argument in align processor tests to be coherent with base tests
* Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
* fix typo
* uniform kwargs
* make style
* add comments
* remove return_tensors
* remove common_kwargs from processor since it propagates
* make style
* return_token_type_ids to True
* revert the default imagekwargs since does not accept any value in the image processro
* revert processing_utils.py
* make style
* add molbap's commit
* fix typo
* fix common processor
* remain
* Revert "add molbap's commit"
This reverts commit a476c6ee88.
* add unsync PR
* revert
* make CI happy
* nit
* import annotationformat
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* :update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that work?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplifcation
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
* CI
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
* Updated ruff version and fixed the required code accorindg to the latest version.
* Updated ruff version and fixed the required code accorindg to the latest version.
* Added noqa directive to ignore 1 error shown by ruff
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.
Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.
We implement YaRN and Dynamic-YaRN for the following list of models:
- LLaMA
- Falcon
- GPT-NeoX
- Olmo
- Persimmon
- Phi
- StableLM
- OpenLLaMA
New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.
For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
* Refactor YaRN implementation for LLaMA
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.
This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
* Refactor Tensor Building Logic for YaRN
- Comply with the the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
* remove unwanted file
---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* fix mask creation of gpt2 and gpt_neox caused by me
* forgot the reshape of masks when shape > 2
* add tests for gpt neox and gpt2
* nit on a comment
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
* Fix init for rt-detr heads
* Fixup
* Add separate prior_prob value to config for initialization
* Add bbox init
* Change to 1 / num_labels init
* Adjust weights init test
* Fix style for test
* squash into single commit
* run diff once more
* docstring
* tests
* minor chnages and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
added head mask check to sdpa mask creation
handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
* [run_slow] gpt_neox
* PR SPLIT: moving origina changes for adding user defined symbols
* adding gemma test and generalizing gemma converter
* ruff
* update common test
* update serialization test
* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added
* removing commented lines
* applying feedback - user only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
* add comment referencing sentencepiece
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* Draft fast image processors
* Draft working fast version
* py3.8 compatible cache
* Enable loading fast image processors through auto
* Tidy up; rescale behaviour based on input type
* Enable tests for fast image processors
* Smarter rescaling
* Don't default to Fast
* Safer imports
* Add necessary Pillow requirement
* Woops
* Add AutoImageProcessor test
* Fix up
* Fix test for imagegpt
* Fix test
* Review comments
* Add warning for TF and JAX input types
* Rearrange
* Return transforms
* NumpyToTensor transformation
* Rebase - include changes from upstream in ImageProcessingMixin
* Safe typing
* Fix up
* convert mean/std to tesnor to rescale
* Don't store transforms in state
* Fix up
* Update src/transformers/image_processing_utils_fast.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Warn if fast image processor available
* Update src/transformers/models/vit/image_processing_vit_fast.py
* Transpose incoming numpy images to be in CHW format
* Update mapping names based on packages, auto set fast to None
* Fix up
* Fix
* Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test
* Update src/transformers/models/vit/image_processing_vit_fast.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add equivalence and speed tests
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Rename to test_model_common_attributes
The method name is misleading - it is testing being able to get and set embeddings, not common attributes to all models
* Explicitly skip
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* docstring and argument fix
* doc fixes and test case fix suggested in review.
* varibale typo fix
* styling and name fixes for padding interpolation flag.
* Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal
* Update not-doctested.txt
* Fix JA and ZH docs
* Fix JA and ZH docs some more
* Fix JA and ZH docs some more
* Initial attempt
* Updates: PR suggestions
* Interpolate the relative position bias when interpolate_pos_encoding is True
* Add slow tag for the added tests
* Add in DATA2VEC_VISION_INPUTS_DOCSTRING
* Added interpolate pos encoding feature and test to deit
* Added interpolate pos encoding feature and test for deit TF model
* readded accidentally delted test for multi_gpu
* storing only patch_size instead of entire config and removed commented code
* Update modeling_tf_deit.py to remove extra line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix the get_size_with_aspect_ratio in max_size situation
* make fix-up
* add more general solution
* consider when max_size is not defined
* fix typo
* fix typo
* simple fix
* fix error
* fix if else error
* fix error of size overwrite
* fix yolos image processing
* fix detr image processing
* make
* add longest related test script
* Update src/transformers/models/yolos/image_processing_yolos.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more test
* add test script about longest size
* remove deprecated
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* seems like `split_special_tokens` is used here
* split special token
* add new line at end of file
* moving split special token test to common tests
* added assertions
* test
* fixup
* add co-author
* passing rest of args to gptsan_japanese, fixing tests
* removing direct comparison of fast and slow models
* adding test support for UDOP and LayoutXLM
* ruff fix
* readd check if slow tokenizer
* modify test to handle bos tokens
* removing commented function
* trigger build
* applying review feedback - updated docstrings, var names, and simplified tests
* ruff fixes
* Update tests/test_tokenization_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* applying feedback, comments
* shutil temp directory fix
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
Co-authored-by: itazap <itazap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>
* added interpolation for vitmae model in pytorch as well as tf.
* Update modeling_vit_mae.py
irreugalr import fixed
* small changes and proper formatting
* changes suggested in review.
* modified decoder interpolate_func
* arguments and docstring fix
* Apply suggestions from code review
doc fixes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add test that currently fails
* test passed
* all perceiver passed
* fixup, style, quality, repo-consistency, all passed
* Apply suggestions from code review: default to False + compute sqrt once only
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix a minor bracket
* replace dim with self._num_channels
* add arguments to the rest preprocessors
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add prefix space ignored in llama #29625
* adding test with add_prefix_space=False
* ruff
---------
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
* Add MistralForTokenClassification
* Add tests and docs
* Add token classification for Mixtral and Qwen2
* Save llma for token classification draft
* Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
* Formatting
* Add token classification support for Qwen2Moe model
* Add dropout layer to each ForTokenClassification model
* Add copied from in tests
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Propagate suggested changes
* Style
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add interpolation of positional encoding support to swin
* add style changes
* use default image processor and make size a dictionary
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove logits testing
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor image size validation logic when interpolation is disabled
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove asserts in modeling
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add dynamic resolution input support to swinv2
* change size to ensure interpolation encoding path is triggered
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
* add dynamic resolution input to donut swin
* add dynamic resolution input to maskformer swin
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Support arbitrary processor
* fix
* nit
* update
* nit
* nit
* fix and revert
* add a small test
* better check
* fixup
* bug so let's just use class for now
* oups
* .
* Add support for mixing languages in a single batch
* Update docstring
* Enable different detected languages in batch
* Do not require input_features
* Test list of languages
* Fix comment
* Make init_tokens length-1 if possible, broadcast at the end
* Test for ValueError with language list of incorrect length
* Slow test for batched multilingual transcription
* fixup
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Address review, refactor
* Second attempt to move this line where it was originally
* Split test, fix a bug
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Initial commit
* Just a copy of modeling_idefics.py that will be ported to TF
* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)
* Add TF imports
* Add autotranslated files
* Add TF classes to model_tf_auto.py
* Add the TF classes in model_doc
* include auto-translated code
* Adopted from auto-translated version
* Add a forgotten super().build
* Add test code for TF version.
* Fix indentation and load pytorch weights for now
* Some fixes. Many tests are still failing but some are passing now.
- I have added TODO's for some of the hacks I made to unblock me
and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily
* Add ALL_LAYERNORM_LAYERS to match pytorch
* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.
* Fix freeze_relevant_params()
* Some more fixes
* Fix test_attention_outputs
* Add tf stuff to processing_idefics.py
processing_idefics.py supports both pytorch and tf now.
test_processor_idefics.py for pytorch is passing, so i didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.
* Pass return_tensors to image processing code and fix test
* Pass return_tensors to the image processor __init__
* Fix several test cases
- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
* Some more fixes forgotten in last commit
* Fix processing code and vision_tf.py
* Fix perceiver bug
* Import from
* Auto-add build() methods + style pass
* Fix build() errors due to `None` being passed as shape to some layers
* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
* Fix pytorch weights load for tf2
There were a lot of `name=` missing in weight initialization code.
* Attempt to fix CI
* Add back accidently removed line
* Remove torch-specific stuff from the TF test file
* make fix-copies, make style, remove autotranslated files
* Fixes to imports/docstrings
* Let's try the from future import in desperation
* Fix the core random_attention_mask fn to match the torch/flax behaviour
* Clean random_attention_mask up correctly
* Remove torch-only test
* Fix loss shape, couple of nits
* make style
* Don't test for OOB embeddings because IDEFICS uses those deliberately
* Fix loss computation to handle masking
* Fix test failures when flattening
* Fix some test failures
- Add cross attention gate which was missing and wasn't being passed arround
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs
* Add a proper stateless scaled_dot_product_attention
* make style
* Adding missing attribute from the PyTorch version
* Small cleanups to decoupledlinearlayer in case that helps
* Pass epsilon to LayerNormalization
* Attemp to fix pytorch weight cross-loading for TFIdeficsEmbedding
* Fix a bug in TFIdeficsGatedCrossAttentionLayer
* Patching up build() methods
* Constant self.inv_freq
* Constant self.inv_freq
* First working version
The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear
where the weights were mis-intialized (in_features,out_features)
when it should be: (out_features, in_features)
I have tested this so far with tiny-random and idefics-9b-instruct
and gives correct output.
I also dumped the final outputs for both pytorch and TF
and they are identical.
* Fix some test failures
* remove print statement
* Fix return_tensors
* Fix CI test failure check_code_quality
* Attempt to fix CI failures by running `make fixup`
The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test and makes that file unreadable and should probably be moved to a seperate file.
* Attempt to fix tests_pr_documentation_tests
* Fix a test failure in test_image_processing_idefics.py
* Fix test test_pt_tf_model_equivalence
* Fix a few failures
* Tiny fix
* Some minor fixes
* Remove a duplicate test
* Override a few test failures for IDEFICS
- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing
* Fix processing_idefics.py after rebase
* Guard import keras with is_tf_available
* fix check code quality
* fix check code quality
* Minor fixes
* Skip test_save_load temporarily
This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.
* Run `ruff format tests src utils`
* Fix last failing test, `test_compile_tf_model`
* Add fixes for vision_tf.py
I forgot to add this file in last commit.
* Minor fixes
* Replace "<<<" with "<<" for doc tests
IDEFICS-9B is too big for doctest runner, so don't run it there
* Make code more readable
* Fix bug after code review
I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.
* Fix after code review
Use original code tokenizer.convert_tokens_to_ids
* Keep PyTorch as the default return_tensors
* Fixes to modeling_tf after code review
* Fixes from code review
- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver
* Run ruff
* Undo a change
* Refactor processing code after Matt's suggestion
* Remove TODO's that aren't needed anymore
* For pytorch, Use original pytorch processing code from main
Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This changes undo's the pytorch processing
modifications I made and uses original code from main.
* Update tests/models/idefics/test_modeling_idefics.py
* Update tests/models/idefics/test_modeling_tf_idefics.py
* Add missing imports for is_pt_tf_cross_test
* [DO NOT MERGE]: This is a commit for debugging and will be reverted
The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.
* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
* Don't skip test_save_load
IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests
* Debugging commit, will be reverted
* Revert "Debugging commit, will be reverted"
This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
* Override `test_save_load` and push model to save
Maybe this will help me repro this weird bug
* pass my repo_id
* add endpoint
* Pass a temp (write) token just for this CI
* Undo last few commits, still pushing to hub for model debugging
The issue seems to be with save_pretrained(), when I looked at the model saved
from the CI test failure it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging
* Add logging to modeling tf utils, will be reverted just for debugging
* Debugging, will revert
* Revert "Debugging, will revert"
This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
* Revert "Add logging to modeling tf utils, will be reverted just for debugging"
This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
* Remove `test_save_load`
The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.
* Run make fix-copies
* Run ruff format tests src utils
* Debugging commit, will be reverted
* Run ruff, also trigger CI run
* Run ruff again
* Undo debugging commit
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* blip with interpolated pos encoding
* feat: Add interpolate_pos_encoding option to other models from `BLIP` family.
* include check for textual generated content in tests
* Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True
* Testing for the non-safe-tensors case, since the default is safe-tensors already
* Running fixup/fix-copies
* Adding accelerate annotations to tests
* change cis
* nits
* update
* minor updates
* [push-ci-image]
* nit [push-ci-image]
* nitsssss
* [build-ci-image]
* [push-ci-image]
* [push-ci-image]
* both
* [push-ci-image]
* this?
* [push-ci-image]
* pypi-kenlm needs g++
* [push-ci-image]
* nit
* more nits [push-ci-image]
* nits [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* add vision
* [push-ci-image]
* [push-ci-image]
* add new dummy file but will need to update them [push-ci-image]
* [push-ci-image]
* show package size as well
* [push-ci-image]
* potentially ignore failures
* workflow updates
* nits [push-ci-image]
* [push-ci-image]
* fix consistency
* clean nciida triton
* also show big packages [push-ci-image]
* nit
* update
* another one
* line escape?
* add accelerate [push-ci-image]
* updates [push-ci-image]
* nits to run tests, no push-ci
* try to parse skip reason to make sure nothing is skipped that should no be skippped
* nit?
* always show skipped reasons
* nits
* better parsing of the test outputs
* action="store_true",
* failure on failed
* show matched
* debug
* update short summary with skipped, failed and errors
* nits
* nits
* coolu pdates
* remove docbuilder
* fix
* always run checks
* oups
* nits
* don't error out on library printing
* non zero exi codes
* no warning
* nit
* WAT?
* format nit
* [push-ci-image]
* fail if fail is needed
* [push-ci-image]
* sound file for torch light?
* [push-ci-image]
* order is important [push-ci-image]
* [push-ci-image] reduce even further
* [push-ci-image]
* use pytest rich !
* yes [push-ci-image]
* oupsy
* bring back the full traceback, but pytest rich should help
* nit
* [push-ci-image]
* re run
* nit
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* empty push to trigger
* [push-ci-image]
* nit? [push-ci-image]
* empty
* try to install timm with no deps
* [push-ci-image]
* oups [push-ci-image]
* [push-ci-image]
* [push-ci-image] ?
* [push-ci-image] open ssh client for git checkout fast
* empty for torch light
* updates [push-ci-image]
* nit
* @v4 for checkout
* [push-ci-image]
* [push-ci-image]
* fix fetch tests with parallelism
* [push-ci-image]
* more parallelism
* nit
* more nits
* empty to re-trigger
* empty to re-trigger
* split by timing
* did not work with previous commit
* junit.xml
* no path?
* mmm this?
* junitxml format
* split by timing
* nit
* fix junit family
* now we can test if the xunit1 is compatible!
* this?
* fully list tests
* update
* update
* oups
* finally
* use classname
* remove working directory to make sure the path does not interfere
* okay no juni should have the correct path
* name split?
* sort by classname is what make most sense
* some testing
* naem
* oups
* test something fun
* autodetect
* 18?
* nit
* file size?
* uip
* 4 is best
* update to see versions
* better print
* [push-ci-image]
* [push-ci-image]
* please install the correct keras version
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* uv is fucking me up
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* nits
* [push-ci-image]
* [push-ci-image]
* install issues an pins
* tapas as well
* nits
* more paralellism
* short tb
* soundfile
* soundfile
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* oups
* [push-ci-image]
* fix some things
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* use torch-light for hub
* small git lfs for hub job
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* fix tf tapas
* [push-ci-image]
* nits
* [push-ci-image]
* don't update the test
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* no use them
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* update tf proba
* [push-ci-image]
* [push-ci-image]
* woops
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* test with built dockers
* [push-ci-image]
* skip annoying tests
* revert fix copy
* update test values
* update
* last skip and fixup
* nit
* ALL GOOOD
* quality
* Update tests/models/layoutlmv2/test_image_processing_layoutlmv2.py
* Update docker/quality.dockerfile
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update src/transformers/models/tapas/modeling_tf_tapas.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* use torch-speed
* updates
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* fuck ken-lm [push-ci-image]
* [push-ci-image]
* [push-ci-image]
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move scaling to nn.Module
* let the test be here for now (need to fix)
* failing tests
* last failing models
* Revert commit 4c14817f38
* clean-up
* oops forgot
* codestyle
* raise NotImplemented when possible
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* skip tests in respective modeling files
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Enable instantiating model with pretrained backbone weights
* Clarify pretrained import
* Use load_backbone instead
* Add backbone_kwargs to config
* Fix up
* Add tests
* Tidy up
* Enable instantiating model with pretrained backbone weights
* Update tests so backbone checkpoint isn't passed in
* Clarify pretrained import
* Update configs - docs and validation check
* Update src/transformers/utils/backbone_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Clarify exception message
* Update config init in tests
* Add test for when use_timm_backbone=True
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add backbone_kwargs to config
* Pass kwargs to constructors
* Draft
* Fix tests
* Add back timm - weight naming
* More tidying up
* Whoops
* Tidy up
* Handle when kwargs are none
* Update tests
* Revert test changes
* Deformable detr test - don't use default
* Don't mutate; correct model attributes
* Add some clarifying comments
* nit - grammar is hard
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Adding SDPA support for BERT
* Using the proper input name for testing model input in inference()
* Adding documentation for SDPA in BERT model page
* Use the stable link for the documentation
* Adding a gate to only call .contiguous() for torch < 2.2.0
* Additions and fixes to the documentation
* Minor updates to documentation
* Adding extra requirements needed for the contiguous() bug
* Adding "Adapted from" in plcae of the "Copied from"
* Add benchmark speedup tables to the documentation
* Minor fixes to the documentation
* Use ClapText as a replacemenet for Bert in the Copied-From
* Some more fixes for the fix-copies references
* Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage
[test all]
* Undo changes to separate test
* Refactored SDPA self attention code for KV projections
* Change use_sdpa to attn_implementation
* Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
* first modeling code
* make repository
* still WIP
* update model
* add tests
* add latest change
* clean docstrings and copied from
* update docstrings md and readme
* correct chroma function
* correct copied from and remove unreleated test
* add doc to toctree
* correct imports
* add convert script to notdoctested
* Add suggestion from Sanchit
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct get_uncoditional_inputs docstrings
* modify README according to SANCHIT feedback
* add chroma to audio utils
* clean librosa and torchaudio hard dependencies
* fix FE
* refactor audio decoder -> audio encoder for consistency with previous musicgen
* refactor conditional -> encoder
* modify sampling rate logics
* modify license at the beginning
* refactor all_self_attns->all_attentions
* remove ignore copy from causallm generate
* add copied from for from_sub_models
* fix make copies
* add warning if audio is truncated
* add copied from where relevant
* remove artefact
* fix convert script
* fix torchaudio and FE
* modify chroma method according to feedback-> better naming
* refactor input_values->input_features
* refactor input_values->input_features and fix import fe
* add input_features to docstrigs
* correct inputs_embeds logics
* remove dtype conversion
* refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation
* change warning for chroma length
* Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* change way to save wav, using soundfile
* correct docs and change to soundfile
* fix import
* fix init proj layers
* add draft training
* fix cross entropy
* clean loss computation
* fix labels
* remove line breaks from md
* fix issue with docstrings
* add FE suggestions
* improve is in logics and remove useless imports
* remove custom from_pretrained
* simplify docstring code
* add suggestions for modeling tests
* make style
* update converting script with sanity check
* remove encoder attention mask from conditional generation
* replace musicgen melody checkpoints with official orga
* rename ylacombe->facebook in checkpoints
* fix copies
* remove unecessary warning
* add shape in code docstrings
* add files to slow doc tests
* fix md bug and add md to not_tested
* make fix-copies
* fix hidden states test and batching
* update training code
* add training tests for melody
* add training for o.g musicgen
* fix copied from
* remove final todos
* make style
* fix style
* add suggestions from review
* add ref to the original loss computation code
* rename method + fix labels in tests
* make style
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>