* Add YaRN and Dynamic-YaRN RoPE Scaling Methods
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.
Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.
We implement YaRN and Dynamic-YaRN for the following list of models:
- LLaMA
- Falcon
- GPT-NeoX
- Olmo
- Persimmon
- Phi
- StableLM
- OpenLLaMA
New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.
For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
* Refactor YaRN implementation for LLaMA
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.
This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
* Refactor Tensor Building Logic for YaRN
- Comply with the the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
* remove unwanted file
---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* fix mask creation of gpt2 and gpt_neox caused by me
* forgot the reshape of masks when shape > 2
* add tests for gpt neox and gpt2
* nit on a comment
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
* Fix init for rt-detr heads
* Fixup
* Add separate prior_prob value to config for initialization
* Add bbox init
* Change to 1 / num_labels init
* Adjust weights init test
* Fix style for test
* squash into single commit
* run diff once more
* docstring
* tests
* minor chnages and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
added head mask check to sdpa mask creation
handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
* [run_slow] gpt_neox
* PR SPLIT: moving origina changes for adding user defined symbols
* adding gemma test and generalizing gemma converter
* ruff
* update common test
* update serialization test
* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added
* removing commented lines
* applying feedback - user only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
* add comment referencing sentencepiece
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* Draft fast image processors
* Draft working fast version
* py3.8 compatible cache
* Enable loading fast image processors through auto
* Tidy up; rescale behaviour based on input type
* Enable tests for fast image processors
* Smarter rescaling
* Don't default to Fast
* Safer imports
* Add necessary Pillow requirement
* Woops
* Add AutoImageProcessor test
* Fix up
* Fix test for imagegpt
* Fix test
* Review comments
* Add warning for TF and JAX input types
* Rearrange
* Return transforms
* NumpyToTensor transformation
* Rebase - include changes from upstream in ImageProcessingMixin
* Safe typing
* Fix up
* convert mean/std to tesnor to rescale
* Don't store transforms in state
* Fix up
* Update src/transformers/image_processing_utils_fast.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Warn if fast image processor available
* Update src/transformers/models/vit/image_processing_vit_fast.py
* Transpose incoming numpy images to be in CHW format
* Update mapping names based on packages, auto set fast to None
* Fix up
* Fix
* Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test
* Update src/transformers/models/vit/image_processing_vit_fast.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add equivalence and speed tests
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Rename to test_model_common_attributes
The method name is misleading - it is testing being able to get and set embeddings, not common attributes to all models
* Explicitly skip
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* docstring and argument fix
* doc fixes and test case fix suggested in review.
* varibale typo fix
* styling and name fixes for padding interpolation flag.
* Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal
* Update not-doctested.txt
* Fix JA and ZH docs
* Fix JA and ZH docs some more
* Fix JA and ZH docs some more
* Initial attempt
* Updates: PR suggestions
* Interpolate the relative position bias when interpolate_pos_encoding is True
* Add slow tag for the added tests
* Add in DATA2VEC_VISION_INPUTS_DOCSTRING
* Added interpolate pos encoding feature and test to deit
* Added interpolate pos encoding feature and test for deit TF model
* readded accidentally delted test for multi_gpu
* storing only patch_size instead of entire config and removed commented code
* Update modeling_tf_deit.py to remove extra line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix the get_size_with_aspect_ratio in max_size situation
* make fix-up
* add more general solution
* consider when max_size is not defined
* fix typo
* fix typo
* simple fix
* fix error
* fix if else error
* fix error of size overwrite
* fix yolos image processing
* fix detr image processing
* make
* add longest related test script
* Update src/transformers/models/yolos/image_processing_yolos.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more test
* add test script about longest size
* remove deprecated
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* seems like `split_special_tokens` is used here
* split special token
* add new line at end of file
* moving split special token test to common tests
* added assertions
* test
* fixup
* add co-author
* passing rest of args to gptsan_japanese, fixing tests
* removing direct comparison of fast and slow models
* adding test support for UDOP and LayoutXLM
* ruff fix
* readd check if slow tokenizer
* modify test to handle bos tokens
* removing commented function
* trigger build
* applying review feedback - updated docstrings, var names, and simplified tests
* ruff fixes
* Update tests/test_tokenization_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* applying feedback, comments
* shutil temp directory fix
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
Co-authored-by: itazap <itazap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>
* added interpolation for vitmae model in pytorch as well as tf.
* Update modeling_vit_mae.py
irreugalr import fixed
* small changes and proper formatting
* changes suggested in review.
* modified decoder interpolate_func
* arguments and docstring fix
* Apply suggestions from code review
doc fixes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add test that currently fails
* test passed
* all perceiver passed
* fixup, style, quality, repo-consistency, all passed
* Apply suggestions from code review: default to False + compute sqrt once only
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix a minor bracket
* replace dim with self._num_channels
* add arguments to the rest preprocessors
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add prefix space ignored in llama #29625
* adding test with add_prefix_space=False
* ruff
---------
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
* Add MistralForTokenClassification
* Add tests and docs
* Add token classification for Mixtral and Qwen2
* Save llma for token classification draft
* Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
* Formatting
* Add token classification support for Qwen2Moe model
* Add dropout layer to each ForTokenClassification model
* Add copied from in tests
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Propagate suggested changes
* Style
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add interpolation of positional encoding support to swin
* add style changes
* use default image processor and make size a dictionary
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove logits testing
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor image size validation logic when interpolation is disabled
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove asserts in modeling
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add dynamic resolution input support to swinv2
* change size to ensure interpolation encoding path is triggered
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
* add dynamic resolution input to donut swin
* add dynamic resolution input to maskformer swin
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Support arbitrary processor
* fix
* nit
* update
* nit
* nit
* fix and revert
* add a small test
* better check
* fixup
* bug so let's just use class for now
* oups
* .
* Add support for mixing languages in a single batch
* Update docstring
* Enable different detected languages in batch
* Do not require input_features
* Test list of languages
* Fix comment
* Make init_tokens length-1 if possible, broadcast at the end
* Test for ValueError with language list of incorrect length
* Slow test for batched multilingual transcription
* fixup
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Address review, refactor
* Second attempt to move this line where it was originally
* Split test, fix a bug
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Initial commit
* Just a copy of modeling_idefics.py that will be ported to TF
* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)
* Add TF imports
* Add autotranslated files
* Add TF classes to model_tf_auto.py
* Add the TF classes in model_doc
* include auto-translated code
* Adopted from auto-translated version
* Add a forgotten super().build
* Add test code for TF version.
* Fix indentation and load pytorch weights for now
* Some fixes. Many tests are still failing but some are passing now.
- I have added TODO's for some of the hacks I made to unblock me
and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily
* Add ALL_LAYERNORM_LAYERS to match pytorch
* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.
* Fix freeze_relevant_params()
* Some more fixes
* Fix test_attention_outputs
* Add tf stuff to processing_idefics.py
processing_idefics.py supports both pytorch and tf now.
test_processor_idefics.py for pytorch is passing, so i didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.
* Pass return_tensors to image processing code and fix test
* Pass return_tensors to the image processor __init__
* Fix several test cases
- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
* Some more fixes forgotten in last commit
* Fix processing code and vision_tf.py
* Fix perceiver bug
* Import from
* Auto-add build() methods + style pass
* Fix build() errors due to `None` being passed as shape to some layers
* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
* Fix pytorch weights load for tf2
There were a lot of `name=` missing in weight initialization code.
* Attempt to fix CI
* Add back accidently removed line
* Remove torch-specific stuff from the TF test file
* make fix-copies, make style, remove autotranslated files
* Fixes to imports/docstrings
* Let's try the from future import in desperation
* Fix the core random_attention_mask fn to match the torch/flax behaviour
* Clean random_attention_mask up correctly
* Remove torch-only test
* Fix loss shape, couple of nits
* make style
* Don't test for OOB embeddings because IDEFICS uses those deliberately
* Fix loss computation to handle masking
* Fix test failures when flattening
* Fix some test failures
- Add cross attention gate which was missing and wasn't being passed arround
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs
* Add a proper stateless scaled_dot_product_attention
* make style
* Adding missing attribute from the PyTorch version
* Small cleanups to decoupledlinearlayer in case that helps
* Pass epsilon to LayerNormalization
* Attemp to fix pytorch weight cross-loading for TFIdeficsEmbedding
* Fix a bug in TFIdeficsGatedCrossAttentionLayer
* Patching up build() methods
* Constant self.inv_freq
* Constant self.inv_freq
* First working version
The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear
where the weights were mis-intialized (in_features,out_features)
when it should be: (out_features, in_features)
I have tested this so far with tiny-random and idefics-9b-instruct
and gives correct output.
I also dumped the final outputs for both pytorch and TF
and they are identical.
* Fix some test failures
* remove print statement
* Fix return_tensors
* Fix CI test failure check_code_quality
* Attempt to fix CI failures by running `make fixup`
The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test and makes that file unreadable and should probably be moved to a seperate file.
* Attempt to fix tests_pr_documentation_tests
* Fix a test failure in test_image_processing_idefics.py
* Fix test test_pt_tf_model_equivalence
* Fix a few failures
* Tiny fix
* Some minor fixes
* Remove a duplicate test
* Override a few test failures for IDEFICS
- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing
* Fix processing_idefics.py after rebase
* Guard import keras with is_tf_available
* fix check code quality
* fix check code quality
* Minor fixes
* Skip test_save_load temporarily
This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.
* Run `ruff format tests src utils`
* Fix last failing test, `test_compile_tf_model`
* Add fixes for vision_tf.py
I forgot to add this file in last commit.
* Minor fixes
* Replace "<<<" with "<<" for doc tests
IDEFICS-9B is too big for doctest runner, so don't run it there
* Make code more readable
* Fix bug after code review
I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.
* Fix after code review
Use original code tokenizer.convert_tokens_to_ids
* Keep PyTorch as the default return_tensors
* Fixes to modeling_tf after code review
* Fixes from code review
- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver
* Run ruff
* Undo a change
* Refactor processing code after Matt's suggestion
* Remove TODO's that aren't needed anymore
* For pytorch, Use original pytorch processing code from main
Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This changes undo's the pytorch processing
modifications I made and uses original code from main.
* Update tests/models/idefics/test_modeling_idefics.py
* Update tests/models/idefics/test_modeling_tf_idefics.py
* Add missing imports for is_pt_tf_cross_test
* [DO NOT MERGE]: This is a commit for debugging and will be reverted
The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.
* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
* Don't skip test_save_load
IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests
* Debugging commit, will be reverted
* Revert "Debugging commit, will be reverted"
This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
* Override `test_save_load` and push model to save
Maybe this will help me repro this weird bug
* pass my repo_id
* add endpoint
* Pass a temp (write) token just for this CI
* Undo last few commits, still pushing to hub for model debugging
The issue seems to be with save_pretrained(), when I looked at the model saved
from the CI test failure it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging
* Add logging to modeling tf utils, will be reverted just for debugging
* Debugging, will revert
* Revert "Debugging, will revert"
This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
* Revert "Add logging to modeling tf utils, will be reverted just for debugging"
This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
* Remove `test_save_load`
The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.
* Run make fix-copies
* Run ruff format tests src utils
* Debugging commit, will be reverted
* Run ruff, also trigger CI run
* Run ruff again
* Undo debugging commit
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* blip with interpolated pos encoding
* feat: Add interpolate_pos_encoding option to other models from `BLIP` family.
* include check for textual generated content in tests
* Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True
* Testing for the non-safe-tensors case, since the default is safe-tensors already
* Running fixup/fix-copies
* Adding accelerate annotations to tests
* change cis
* nits
* update
* minor updates
* [push-ci-image]
* nit [push-ci-image]
* nitsssss
* [build-ci-image]
* [push-ci-image]
* [push-ci-image]
* both
* [push-ci-image]
* this?
* [push-ci-image]
* pypi-kenlm needs g++
* [push-ci-image]
* nit
* more nits [push-ci-image]
* nits [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* add vision
* [push-ci-image]
* [push-ci-image]
* add new dummy file but will need to update them [push-ci-image]
* [push-ci-image]
* show package size as well
* [push-ci-image]
* potentially ignore failures
* workflow updates
* nits [push-ci-image]
* [push-ci-image]
* fix consistency
* clean nciida triton
* also show big packages [push-ci-image]
* nit
* update
* another one
* line escape?
* add accelerate [push-ci-image]
* updates [push-ci-image]
* nits to run tests, no push-ci
* try to parse skip reason to make sure nothing is skipped that should no be skippped
* nit?
* always show skipped reasons
* nits
* better parsing of the test outputs
* action="store_true",
* failure on failed
* show matched
* debug
* update short summary with skipped, failed and errors
* nits
* nits
* coolu pdates
* remove docbuilder
* fix
* always run checks
* oups
* nits
* don't error out on library printing
* non zero exi codes
* no warning
* nit
* WAT?
* format nit
* [push-ci-image]
* fail if fail is needed
* [push-ci-image]
* sound file for torch light?
* [push-ci-image]
* order is important [push-ci-image]
* [push-ci-image] reduce even further
* [push-ci-image]
* use pytest rich !
* yes [push-ci-image]
* oupsy
* bring back the full traceback, but pytest rich should help
* nit
* [push-ci-image]
* re run
* nit
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* empty push to trigger
* [push-ci-image]
* nit? [push-ci-image]
* empty
* try to install timm with no deps
* [push-ci-image]
* oups [push-ci-image]
* [push-ci-image]
* [push-ci-image] ?
* [push-ci-image] open ssh client for git checkout fast
* empty for torch light
* updates [push-ci-image]
* nit
* @v4 for checkout
* [push-ci-image]
* [push-ci-image]
* fix fetch tests with parallelism
* [push-ci-image]
* more parallelism
* nit
* more nits
* empty to re-trigger
* empty to re-trigger
* split by timing
* did not work with previous commit
* junit.xml
* no path?
* mmm this?
* junitxml format
* split by timing
* nit
* fix junit family
* now we can test if the xunit1 is compatible!
* this?
* fully list tests
* update
* update
* oups
* finally
* use classname
* remove working directory to make sure the path does not interfere
* okay no juni should have the correct path
* name split?
* sort by classname is what make most sense
* some testing
* naem
* oups
* test something fun
* autodetect
* 18?
* nit
* file size?
* uip
* 4 is best
* update to see versions
* better print
* [push-ci-image]
* [push-ci-image]
* please install the correct keras version
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* uv is fucking me up
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* nits
* [push-ci-image]
* [push-ci-image]
* install issues an pins
* tapas as well
* nits
* more paralellism
* short tb
* soundfile
* soundfile
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* oups
* [push-ci-image]
* fix some things
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* use torch-light for hub
* small git lfs for hub job
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* fix tf tapas
* [push-ci-image]
* nits
* [push-ci-image]
* don't update the test
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* no use them
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* update tf proba
* [push-ci-image]
* [push-ci-image]
* woops
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* test with built dockers
* [push-ci-image]
* skip annoying tests
* revert fix copy
* update test values
* update
* last skip and fixup
* nit
* ALL GOOOD
* quality
* Update tests/models/layoutlmv2/test_image_processing_layoutlmv2.py
* Update docker/quality.dockerfile
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update src/transformers/models/tapas/modeling_tf_tapas.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* use torch-speed
* updates
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* fuck ken-lm [push-ci-image]
* [push-ci-image]
* [push-ci-image]
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move scaling to nn.Module
* let the test be here for now (need to fix)
* failing tests
* last failing models
* Revert commit 4c14817f38
* clean-up
* oops forgot
* codestyle
* raise NotImplemented when possible
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* skip tests in respective modeling files
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Enable instantiating model with pretrained backbone weights
* Clarify pretrained import
* Use load_backbone instead
* Add backbone_kwargs to config
* Fix up
* Add tests
* Tidy up
* Enable instantiating model with pretrained backbone weights
* Update tests so backbone checkpoint isn't passed in
* Clarify pretrained import
* Update configs - docs and validation check
* Update src/transformers/utils/backbone_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Clarify exception message
* Update config init in tests
* Add test for when use_timm_backbone=True
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add backbone_kwargs to config
* Pass kwargs to constructors
* Draft
* Fix tests
* Add back timm - weight naming
* More tidying up
* Whoops
* Tidy up
* Handle when kwargs are none
* Update tests
* Revert test changes
* Deformable detr test - don't use default
* Don't mutate; correct model attributes
* Add some clarifying comments
* nit - grammar is hard
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Adding SDPA support for BERT
* Using the proper input name for testing model input in inference()
* Adding documentation for SDPA in BERT model page
* Use the stable link for the documentation
* Adding a gate to only call .contiguous() for torch < 2.2.0
* Additions and fixes to the documentation
* Minor updates to documentation
* Adding extra requirements needed for the contiguous() bug
* Adding "Adapted from" in plcae of the "Copied from"
* Add benchmark speedup tables to the documentation
* Minor fixes to the documentation
* Use ClapText as a replacemenet for Bert in the Copied-From
* Some more fixes for the fix-copies references
* Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage
[test all]
* Undo changes to separate test
* Refactored SDPA self attention code for KV projections
* Change use_sdpa to attn_implementation
* Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
* first modeling code
* make repository
* still WIP
* update model
* add tests
* add latest change
* clean docstrings and copied from
* update docstrings md and readme
* correct chroma function
* correct copied from and remove unreleated test
* add doc to toctree
* correct imports
* add convert script to notdoctested
* Add suggestion from Sanchit
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct get_uncoditional_inputs docstrings
* modify README according to SANCHIT feedback
* add chroma to audio utils
* clean librosa and torchaudio hard dependencies
* fix FE
* refactor audio decoder -> audio encoder for consistency with previous musicgen
* refactor conditional -> encoder
* modify sampling rate logics
* modify license at the beginning
* refactor all_self_attns->all_attentions
* remove ignore copy from causallm generate
* add copied from for from_sub_models
* fix make copies
* add warning if audio is truncated
* add copied from where relevant
* remove artefact
* fix convert script
* fix torchaudio and FE
* modify chroma method according to feedback-> better naming
* refactor input_values->input_features
* refactor input_values->input_features and fix import fe
* add input_features to docstrigs
* correct inputs_embeds logics
* remove dtype conversion
* refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation
* change warning for chroma length
* Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* change way to save wav, using soundfile
* correct docs and change to soundfile
* fix import
* fix init proj layers
* add draft training
* fix cross entropy
* clean loss computation
* fix labels
* remove line breaks from md
* fix issue with docstrings
* add FE suggestions
* improve is in logics and remove useless imports
* remove custom from_pretrained
* simplify docstring code
* add suggestions for modeling tests
* make style
* update converting script with sanity check
* remove encoder attention mask from conditional generation
* replace musicgen melody checkpoints with official orga
* rename ylacombe->facebook in checkpoints
* fix copies
* remove unecessary warning
* add shape in code docstrings
* add files to slow doc tests
* fix md bug and add md to not_tested
* make fix-copies
* fix hidden states test and batching
* update training code
* add training tests for melody
* add training for o.g musicgen
* fix copied from
* remove final todos
* make style
* fix style
* add suggestions from review
* add ref to the original loss computation code
* rename method + fix labels in tests
* make style
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* chore(root): Initial commit of Phi-3 files.
* fix(root): Fixes Phi-3 missing on readme.
* fix(root): Ensures files are consistent.
* fix(phi3): Fixes unit tests.
* fix(tests): Fixes style of phi-3 test file.
* chore(tests): Adds integration tests for Phi-3.
* fix(phi3): Removes additional flash-attention usage, .e.g, swiglu and rmsnorm.
* fix(phi3): Fixes incorrect docstrings.
* fix(phi3): Fixes docstring typos.
* fix(phi3): Adds support for Su and Yarn embeddings.
* fix(phi3): Improves according first batch of reviews.
* fix(phi3): Uses up_states instead of y in Phi3MLP.
* fix(phi3): Uses gemma rotary embedding to support torch.compile.
* fix(phi3): Improves how rotary embedding classes are defined.
* fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.
* fix(phi3): Adds last suggestions to modeling file.
* fix(phi3): Splits inv_freq calculation in two lines.
* Fixed main train issues
* Added loss test
* Update src/transformers/models/seggpt/modeling_seggpt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added missing labels arg in SegGptModel forward
* Fixed typo
* Added slow test to test loss calculation
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* push legacy to fast as well
* super strange
* Update src/transformers/convert_slow_tokenizer.py
* make sure we are BC
* fix Llama test
* nit
* revert
* more test
* style
* update
* small update w.r.t tokenizers
* nit
* don't split
* lol
* add a test for `add_prefix_space=False`
* fix gemma tokenizer as well
* update
* fix gemma
* nicer failures
* fixup
* update
* fix the example for legacy = False
* use `huggyllama/llama-7b` for the PR doctest
* nit
* use from_slow
* fix llama
* Duplicate swiftformer
* Convert SwiftFormerPatchEmbedding
* Convert SwiftFormerEmbeddings
* Convert TFSwiftFormerMlp
* Convert TFSwiftFormerConvEncoder
* Convert TFSwiftFormerLocalRepresentation
* convert TFSwiftFormerEncoderBlock
* Convert SwiftFormerStage
* Convert SwiftFormerEncoder
* Add TFSWiftFormerPreTrainedModel
* Convert SwiftFormerForImageClassification
* Add kwargs and start drop path
* Fix syntax
* Change Model class name
* Add TFSwiftFormer to __init__
* Duplicate test_modeling_swiftformer
* First test conversions
* Change require_torch to require_tf
* Add exports to swiftformer __init__
* Add TFSwiftFormerModel wrapper
* Fix __init__ and run black
* Remove docstring from MainLayer, fix padding
* Use keras.layers.Activation on keras.Sequential
* Fix swiftformer exports
* Fix activation layer from config
* Remove post_inits
* Use tf.keras.layers.ZeroPadding2D
* Convert torch normalize
* Change tf test input shape
* Fix softmax and reduce_sum
* Convert expand_dims and repeat
* Add missing reshape and tranpose
* Simplify TFSwiftFormerEncoderBlock.call
* Fix mismatch in patch embeddings
* Fix expected output shape to match channels last
* Fix swiftformer typo
* Disable test_onnx
* Fix TFSwiftFormerForImageClassification call
* Add unpack inputs
* Convert flatten(2).mean(-1)
* Change vision dummy inputs (to be reviewed)
* Change test_forward_signature to use .call
* Fix @unpack_inputs
* Set return_tensors="tf" and rename class
* Rename wrongly named patch_embeddings layer
* Add serving_output and change dummy_input shape
* Make dimensions BCHW and transpose inside embedding layer
* Change SwiftFormerEncoderBlock
* Fix ruff problems
* Add image size to swiftformer config
* Change tranpose to MainLayer and use -1 for reshape
* Remove serving_outputs and dummy_inputs
* Remove test_initialization test from tf model
* Make Sequential component a separate layer
* Fix layers' names
* Tranpose encoder outputs
* Fix tests and check if hidden states is not None
* Fix TFSwiftFormerForImageClassification
* Run make fixup
* Run make fix-copies
* Update modeling_tf_auto
* Update docs
* Fix modeling auto mapping
* Update modelint_tf_swiftformer docs
* Fill image_size doc and type
* Add reduction=None to loss computation
* Update docs
* make style
* Debug: Delete the tip to see if that changes anything
* Re-add tip
* Remove add_code_sample_docstrings
* Remove unused import
* Get the debug to actually tell us the problem it has with the docs
* Try a substitution to match the PyTorch file?
* Add swiftformer to ignore list
* Add build() methods
* Update copyright year
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove FIXME comment
* Remove from_pt
* Update copyright year
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Rename one-letter variables
* Remove FIXMEs related to momentum
* Remove old TODO comment
* Remove outstanding FIXME comments
* Get dropout rate from config
* Add specific dropout config for MLP
* Add convencoder dropout to config
* Pass config to SwiftFormerDropPath layer
* Fix drop_path variable name and add Adapted from comment
* Run ruff
* Removed copied from comment
* Run fix copies
* Change drop_path to identity to match pt
* Cleanup build() methods and move to new keras imports
* Update docs/source/en/model_doc/swiftformer.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Raise error if drop_path_rate > 0.0
* Apply suggestions from code review
Replace (self.dim), with self.dim,
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove drop_path function
* Add training to TFSwiftFormerEncoder
* Set self.built = True last
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Should have been added to previous commit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Change default_feature_extractor to default_image_processor
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Import Keras from modeling_tf_utils
* Remove relative import
* Run ruff --fix
* Move import keras to tf_available
* Add copied from comment to test_forward_signature
* Reduce batch size and num_labels
* Extract loss logic to hf_compute_loss
* Run ruff format
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"suli"` the default `ffn_act_fn`
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrXAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add useage example
* put on one line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more useage examples
---------
Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add jamba arch
* apply "make fix-copies" changes
* fix link to model in JambaConfig docstring
* Add n_ctx in modeling file because repo-consistency wants that
* Add jamba to flash attention and sdpa documentation
* mamba dt_proj quant fix now works for LoRA as well
* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers
* add jamba to tokenization auto
* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
* simple PR fixes
* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
* Add copied comment on JambaMLP (it's the same as MixtralMLP)
* remove padding_mask warnings. It's not supported anymore
* fix docstring. Float instead of int
* A few more minor PR fixes
* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
* Return None attention weights from mamba layers. Append to all attentions only if not None.
* remove some leftover jamba archive lists
* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
* Add Jamba paper on READMEs
* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
* Add copied from comment
* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
* clearer docstring for _convert_to_standard_cache
* style fixes
* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
* rename test so it still overrides what its meant to override
* draft
* oups
* nit
* remove more complexe logic
* fix names used in config
* fix fix fix
* style
* fix some more failing tests
* generate did not init the cache 🙃
* more small nits
* typo
* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
* fix init of pkv with torch.tensor()
* empty tensor
* fix some init issues
* stupid changes required by generate because it does not even support it's own DynamicCache class
* more fixes
* fix general assisted gen cache_position bug
* tests passing
* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
* fix docstrings and typehints for past_key_values
* style fixes
* fix docs
* change typehint due to copy from Mixtral
* forgot import
* import order
* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
* Add integration test with tiny tandom Jamba model on hub
* fix flash attention cache shapes
* bring back forgotten hidden states
* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
* align integration test after modeling fixes
* bugfix - mamba can use precomputed states only of forward pass is on a single token
* bugfix - mamba can use precomputed states only if they match the batch size
* typo
* remove making _prepare_4d_causal_attention_mask a leaf function
* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* Add OLMo using add-new-model-like with Llama
* Fix incorrect tokenizer for OLMo
* Copy-paste relevant OLMo methods and their imports
* Add OLMo config
* Modify OLMo config to follow HF conventions
* Remove unneeded Llama code from OLMo model
* Add ability for OLMo model to output attentions
* Add OLMoPreTrainedModel and OLMoModel
* Add OLMoForCausalLM
* Minor fixes to OLMo model for style and missing functions
* Implement OLMo tokenizer
* Implement OLMo to HF conversion script
* Add tests for OLMo model
* Add tests for OLMo fast tokenizer
* Add auto-generated dummy objects
* Remove unimplemented OLMo classes from auto and init classes and re-format
* Add README and associated auto-generated files
* Use OLMo names for common properties
* Run make fixup
* Remove `|` from OLMo typing
* Remove unneeded tokenization_olmo.py
* Revert model, config and converter to add-new-model-like Llama
* Move logic for adding bos/eos token into GPTNeoxTokenizerFast
* Change OLMoConfig defaults to match OLMo-7B
* Use GPTNeoXToknizerFast in OLMo tokenizer tests
* Modify auto-generated OLMoModelTests to work for OLMo
* Add non-parametric layer norm OLMoLayerNorm
* Update weight conversion script for OLMo
* Fix __init__ and auto structure for OLMo
* Fix errors from make fixup
* Remove OLMoTokenizerFast from documentation
* Add missing 'Copied from' for OLMoModel._update_causal_mask
* Run make fix-copies
* Rearrange string replacements in OLMoForCausalLM Copied from
* Move OLMo and Llama CausalLM.forward example into global constants
* Fix OLMO_GENERATION_EXAMPLE doc string typo
* Add option for qkv clipping to OLMo
* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf
* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf
* Fix OLMo tokenization bug using conversion script
* Keep model in full precision after conversion
* Do not add eos token automatically
* Update references to OLMo model in HF Hub
* Do not add eos token during encoding by default
* Fix Llama generation example
* Run make fixup
* OLMo 7B integration test fix
* Remove unneeded special case for OLMoConfig
* OLMo 7B Twin 2T integration test fix
* Fix test_model_7b_greedy_generation
* Remove test_compile_static_cache
* Fix OLMo and Llama generation example
* Run make fixup
* Revert "OLMo 7B integration test fix"
This reverts commit 4df56a4b15.
* Revert "OLMo 7B Twin 2T integration test fix"
This reverts commit 9ff65a4a29.
* Ungate 7B integration tests and fix greedy generation test
* Add retries for flaky test_eager_matches_sdpa_generate
* Fix output of doc example for OLMoForCausalLM.forward
* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model
* Try fix incorrect characters in OLMoForCausalLM.forward doct test
* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes
* Remove pretraining_tp from OLMo config and model
* Add missing 'Copied from' instances
* Remove unneeded causal_mask from OLMoModel
* Revert Llama changes
* Ignore copy for OLMoForCausalLM.forward
* Change 'OLMo' to 'Olmo' in classes
* Move minimal OLMo tokenization tests to model tests
* Add missed 'Copied from' for repeat_kv
* Add create token type ids to CodeGenTokenizer
* Fix inconsistent length of token type ids
* Format source codes
* Fix inconsistent order of methods
* Update docstring
* add test_tokenizer_integration test
* Format source codes
* Add `copied from` comment to CodeGenTokenizerFast
* Add doc of create_token_type_ids_from_sequences
* Make return_token_type_ids False by default
* Make test_tokenizer_integration as slow test
* Add return_token_type_ids to tokenizer init arg
* Add test for tokenizer's init return_token_type_ids
* Format source codes
* Remove auto class
* Update ImagePointDescriptionOutput
* Update model outputs
* Rename output class
* Revert "Remove auto class"
This reverts commit ed4a8f549d.
* Address comments
* Fork.
* RecurrentGemma initial commit.
* Updating __init__.py.
* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.
* Reformat code to 4 spaces.
Fixed a few typos.
* Fixed the forward pass.
Still unclear on the cache?
* Fixed the RecurrentGemmaForCausalLM
* Minor comment that we might not need attention_mask and output_attention arguments.
* Now cache should work as well.
* Adding a temporary example to check whether the model generation works.
* Adding the tests and updating imports.
* Adding the example file missing in the previous commit.
* First working example.
* Removing .gitignore and reverting parts of __init__.
* Re-add .gitignore.
* Addressing comments for configuration.
* Move mask creation to `_prepare_inputs_for_generation`.
* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed
* Transfoering between machines.
* Running normal tests.
* Minor fix.
* More fixes.
* Addressing more comments.
* Minor fixes.
* first stab at cleanup
* more refactoring
* fix copies and else
* renaming and get init to work
* fix causal mask creation
* update
* nit
* fix a hell lot of things
* updates
* update conversion script
* make all keys importable
* nits
* add auto mappings
* properly convert ffw_up and down
* add scaling
* fix generations
* for recurrent dtype
* update
* fix going beyong window
* fixup
* add missing files
* current updates to remove last einops
* finish modeling refactor
* TADA
* fix compile
* fix most failing testt ? ?
* update tests
* refactor and update
* update
* nits, fixup and update tests
* more fixup
* nits
* fix imports
* test format
* fixups
* nits
* tuple typing
* fix code quality
* add model card
* fix doc
* skip most generation tests
* nits
* style
* doc fixes
* fix pr and check_copies?
* last nit
* oupsy
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update
* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update based on review
* doc nit
* fix quality
* quality
* fix slow test model path
* update default dype
* ignore attributes that can be safely ignored in check config attributes
* 0lallalala come on
* save nit
* style
* remove to dict update
* make sure we can also run in float16
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* init: add StableLm 2 support
* add integration test for parallel residual and qk layernorm
* update(modeling): match qk norm naming for consistency with phi/persimmon
* fix(tests): run fwd/bwd on random init test model to jitter norm weights off identity
* `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`
* refactor: rename head states var in `StableLmLayerNormPerHead`
* tests: update test model and add generate check
* add _torch_extract_fbank_features_batch function in feature_extractor_whisper
* reformat feature_extraction_whisper.py file
* handle batching in single function
* add gpu test & doc
* add batch test & device in each __call__
* add device arg in doc string
---------
Co-authored-by: vaibhav.aggarwal <vaibhav.aggarwal@sprinklr.com>
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* redefaulted padding=longest again
* fixup/doc
* Fix generate_with_fallback **kwargs
* Change pop to get
* Delete keys from kwargs to prevent overriding generation_config
* Revert to passing kwargs by reference, but make a (shallow) copy
* dict -> copy.copy
* Add test_whisper_longform_multi_batch_beam
* Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode
* Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode
* Exclude pad_token filtering since it is used as CTC-blank token
* Add small test for skip_special_tokens
* Update decoding test for added new token
* add FA2 to o.g Musicgen
* make style
* add FA2 support to Musicgen Melody
* add generation FA2 tests to o.g Musicgen
* make style and fix copies
* add Musicgen to FA2 docs + deprecate list
* add sdpa supports to Musicgen's
* make style and fix copies
* refactor attention implementation arguments
* add Copied from to sdpa tests
* add copied form in sdpa tests melody
* add copied for FA2 generation tests
* add FA2 inference copied from
* make style
* Fix sinusoidal_embeddings in FlaubertModel
* Fix for Informer
* Fix for XLM
* Move sinusoidal emb for XLM
* Move sinusoidal emb for Flaubert
* Small cleanup
* Add comments on tests code copied from
* Add with Distilbert->
* fix bug and add tests
* nit
* otherway to get the cur len instead of attention mask
* more places where this might have been broken
* nit
* oups
* inputs_embeds vs input_embeds
* test generated outptus
* style
* nit
* fix
* skip failing biogpt
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* attempt to fix
* the actual fix that works with compilation!
* this?
* temporary update
* nit?
* dispatcg to memory efficient?
* update both models that have static cache support
* fix copies fix compile
* make sure fix
* fix cohere and gemma
* fix beams?
* nit
* slipped through the cracks
* nit
* nits
* update
* fix-copies
* skip failing tests
* nits
* Added SuperPoint docs
* Added tests
* Removed commented part
* Commit to create and fix add_superpoint branch with a new branch
* Fixed dummy_pt_objects
* Committed missing files
* Fixed README.md
* Apply suggestions from code review
Fixed small changes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py
* Removed AutoModelForKeypointDetection and related stuff
* Fixed inconsistencies in image_processing_superpoint.py
* Moved infer_on_model logic simply in test_inference
* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py
* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale
* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixed from (w, h) to (h, w) as input for tests
* Removed unnecessary condition
* Moved last_hidden_state to be the first returned
* Moved last_hidden_state to be the first returned (bis)
* Moved last_hidden_state to be the first returned (ter)
* Switched image_width and image_height in tests to match recent changes
* Added config as first SuperPointConvBlock init argument
* Reordered README's after merge
* Added missing first config argument to SuperPointConvBlock instantiations
* Removed formatting error
* Added SuperPoint to README's de, pt-br, ru, te and vi
* Checked out README_fr.md
* Fixed README_fr.md
* Test fix README_fr.md
* Test fix README_fr.md
* Last make fix-copies !
* Updated checkpoint path
* Removed unused SuperPoint doc
* Added missing image
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed unnecessary import
* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added SuperPoint to _toctree.yml
---------
Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
* use user_defined_symbols
* fixup
* nit
* add a very robust test
* make sure all models are tested with the `pretrained_tokenizer_to_test`
* should we make sure we test all of them?
* merge
* remove the id
* fix test
* update
* ousies
* oups
* fixup
* fix copies check
* remove `pretrained_tokenizer_to_test`
* Cohere Model Release (#1)
Cohere Model Release
* Remove unnecessary files and code (#2)
Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/end/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Updated index.md
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/end/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Fixed config docstring. Added channels property
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Fixed config backbone compat
* Ran fix-copies
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Fixed issue from rebase
* Fixed issue from rebase
* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
* Fixed issue from rebase
* Fixed issue from rebase
* Changed model name in docs
* Removed duplicate PvtV2Backbone
* Work around type switching issue in tests
* Fix model name in config comments
* Update docs/source/en/model_doc/pvt_v2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed old code
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Fixed Class names to be more descriptive
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed outdated code
* Moved paper abstract to single line in pvt_v2.md
* Added usage tips to pvt_v2.md
* Simplified module inits by passing layer_idx
* Fixed typing for hidden_act in PvtV2Config
* Removed unusued import
* Add pvt_v2 to docs/source/en/_toctree.yml
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Move function parameters to single line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Update year of copyright to 2024
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Make code more explicit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated sr_ratio to be more explicit spatial_reduction_ratio
* Removed excess type hints in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Move params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed needless comment in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update copyright date in pvt_v2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated copyright date in configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Cleaned comments in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Renamed spatial_reduction Conv2D operation
* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
"
This reverts commit c4a04416dd.
* Updated conversion script to reflect module name change
* Deprecated reshape_last_stage option in config
* Removed unused imports
* Code formatting
* Fixed outdated decorators on test_inference_fp16
* Added "Copied from" comments in test_modeling_pvt_v2.py
* Fixed import listing
* Updated model name
* Force empty commit for PR refresh
* Fixed linting issue
* Removed # Copied from comments
* Added PVTv2 to README_fr.md
* Ran make fix-copies
* Replace all FoamoftheSea hub references with OpenGVLab
* Fixed out_indices and out_features logic in configuration_pvt_v2.py
* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
* Ran code fixup
* Fixed order of parent classes in PvtV2Config to fix the to_dict method override
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial implementation of flash attention for gptj
* modify flash attention and overwrite test_flash_attn_2_generate_padding_right
* update flash attention support list
* remove the copy line in the `CodeGenBlock`
* address copy mechanism
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add GPTJ attention classes
* add expected outputs in the gptj test
* Ensure repo consistency with 'make fix-copies'
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add tests for batching support
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* fixes and comments
* use cosine distance for conv models
* skip mra model testing
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* finzalize and make style
* check model type by input names
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fixed batch size for all testers
* Revert "fixed batch size for all testers"
This reverts commit 525f3a0a05.
* add batch_size for all testers
* dict from model output
* do not skip layoutlm
* bring back some code from git revert
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* clean-up
* where did minus go in tolerance
* make whisper happy
* deal with consequences of losing minus
* deal with consequences of losing minus
* maskformer needs its own test for happiness
* fix more models
* tag flaky CV models from Amy's approval
* make codestyle
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* left-padding test revisited
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial-commit
* start cleaning
* small nits
* small nits
* current updates
* add kernels
* small refactoring little step
* add comments
* styling
* nit
* nits
* Style
* Small changes
* Push dummy mambda simple slow
* nit
* Use original names
* Use original names and remove norm
* Updates for inference params
* Style nd updates
* nits
* Match logits
* Add a test
* Add expected generated text
* nits doc, imports and styling
* style
* oups
* dont install kernels, invite users to install the required kernels
* let use use the original packages
* styling
* nits
* fix some copieds
* update doc
* fix-copies
* styling done
* nits
* fix import check
* run but wrong cuda ress
* mamba CUDA works :)
* fix the fast path
* config naming nits
* conversion script is not required at this stage
* finish fixing the fast path: generation make sense now!
* nit
* Let's start working on the CIs
* style
* better style
* more nits
* test nit
* quick fix for now
* nits
* nit
* nit
* nit
* nits
* update test rest
* fixup
* update test
* nit
* some fixes
* nits
* update test values
* fix styling
* nit
* support peft
* integrations tests require torchg
* also add slow markers
* styling
* chose forward wisely
* nits
* update tests
* fix gradient checkpointing
* fixup
* nit
* fix doc
* check copies
* fix the docstring
* fix some more tests
* style
* fix beam search
* add init schene
* update
* nit
* fix
* fixup the doc
* fix the doc
* fixup
* tentative update but slow is no longer good
* nit
* should we always use float32?
* nits
* revert wrong changes
* res in float32
* cleanup
* skip fmt for now
* update generation values
* update test values running original model
* fixup
* update tests + rename inference_params to cache_params + make sure training does not use cache_params
* small nits
* more nits
* fix final CIs
* style
* nit doc
* I hope final doc nits
* nit
* 🫠
* final touch!
* fix torch import
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
* fix fix and fix
* fix base model prefix!
* nit
* Update src/transformers/models/mamba/__init__.py
* Update docs/source/en/model_doc/mamba.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* nit
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* First draft
* More improvements
* More improvements
* More fixes
* Fix copies
* More improvements
* More fixes
* More improvements
* Convert checkpoint
* More improvements, set up tests
* Fix more tests
* Add UdopModel
* More improvements
* Fix equivalence test
* More fixes
* Redesign model
* Extend conversion script
* Use real inputs for conversion script
* Add image processor
* Improve conversion script
* Add UdopTokenizer
* Add fast tokenizer
* Add converter
* Update README's
* Add processor
* Add fully fledged tokenizer
* Add fast tokenizer
* Use processor in conversion script
* Add tokenizer tests
* Fix one more test
* Fix more tests
* Fix tokenizer tests
* Enable fast tokenizer tests
* Fix more tests
* Fix additional_special_tokens of fast tokenizer
* Fix tokenizer tests
* Fix more tests
* Fix equivalence test
* Rename image to pixel_values
* Rename seg_data to bbox
* More renamings
* Remove vis_special_token
* More improvements
* Add docs
* Fix copied from
* Update slow tokenizer
* Update fast tokenizer design
* Make text input optional
* Add first draft of processor tests
* Fix more processor tests
* Fix decoder_start_token_id
* Fix test_initialization
* Add integration test
* More improvements
* Improve processor, add test
* Add more copied from
* Add more copied from
* Add more copied from
* Add more copied from
* Remove print statement
* Update README and auto mapping
* Delete files
* Delete another file
* Remove code
* Fix test
* Fix docs
* Remove asserts
* Add doc tests
* Include UDOP in exotic model tests
* Add expected tesseract decodings
* Add sentencepiece
* Use same design as T5
* Add UdopEncoderModel
* Add UdopEncoderModel to tests
* More fixes
* Fix fast tokenizer
* Fix one more test
* Remove parallelisable attribute
* Fix copies
* Remove legacy file
* Copy from T5Tokenizer
* Fix rebase
* More fixes, copy from T5
* More fixes
* Fix init
* Use ArthurZ/udop for tests
* Make all model tests pass
* Remove UdopForConditionalGeneration from auto mapping
* Fix more tests
* fixups
* more fixups
* fix the tokenizers
* remove un-necessary changes
* nits
* nits
* replace truncate_sequences_boxes with truncate_sequences for fix-copies
* nit current path
* add a test for input ids
* ids that we should get taken from c9f7a32f57
* nits converting
* nits
* apply ruff
* nits
* nits
* style
* fix slow order of addition
* fix udop fast range as well
* fixup
* nits
* Add docstrings
* Fix gradient checkpointing
* Update code examples
* Skip tests
* Update integration test
* Address comment
* Make fixup
* Remove extra ids from tokenizer
* Skip test
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update year
* Address comment
* Address more comments
* Address comments
* Add copied from
* Update CI
* Rename script
* Update model id
* Add AddedToken, skip tests
* Update CI
* Fix doc tests
* Do not use Tesseract for the doc tests
* Remove kwargs
* Add original inputs
* Update casting
* Fix doc test
* Update question
* Update question
* Use LayoutLMv3ImageProcessor
* Update organization
* Improve docs
* Update forward signature
* Make images optional
* Remove deprecated device argument
* Add comment, add add_prefix_space
* More improvements
* Remove kwargs
---------
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* 🐛 Fix oneformer instance post processing when using panoptic task type
* ✅ Add unit test for oneformer instance post processing panoptic bug
---------
Co-authored-by: Nick DeGroot <1966472+nickthegroot@users.noreply.github.com>
* remove control flow
* update gptneox
* update ....
* nits
* Actually let's just break. Otherwise we are silently failing which imo is not optimal
* version BC
* fix tests
* fix eager causal
* nit
* add a test
* style
* nits
* nits
* more nits for the test
* update and fix
* make sure cuda graphs are not skipped
* read token is needed for meta llama
* update!
* fiixup
* compile test should be slow
* fix thet fix copies
* stle 🫠
* stash commit
* stash commit
* It works!
* Remove unnecessary change
* We don't actually need the cache_dir!
* Update docstring
* Add test
* Add test with custom cache dir too
* Update model repo path
* Revert "Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948)"
This reverts commit 725f4ad1cc.
* Revert "Patch to skip failing `test_save_load_low_cpu_mem_usage` tests (#29043)"
This reverts commit 4156f517ce.
* add add_dummy_prefix_space option to slow
* checking kwargs might be better. Should be there for all spm tokenizer IMO
* nits
* fix copies
* more copied
* nits
* add prefix space
* nit
* nits
* Update src/transformers/convert_slow_tokenizer.py
* fix inti
* revert wrong styling
* fix
* nits
* style
* updates
* make sure we use slow tokenizer for conversion instead of looking for the decoder
* support llama ast well
* update llama tokenizer fast
* nits
* nits nits nits
* update the doc
* update
* update to fix tests
* skip unrelated tailing test
* Update src/transformers/convert_slow_tokenizer.py
* add proper testing
* test decode as well
* more testing
* format
* fix llama test
* Apply suggestions from code review
* pass through trust_remote_code for dynamically loading unregistered tokenizers specified by config
add test
* change directories back to previous directory after test
* fix ruff check
* Add a note to that block for future in case we want to remove it later
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
* Add tie_weights() to LM heads and set bias in set_output_embeddings()
The bias were not tied correctly in some LM heads, and this change should fix that.
* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
* Adding _tie_weights() to MPNet and Vilt
* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
* Rename to test name to save_load to match the convention
* Update the processing so bbox coords are adjusted for padding
* Just pad masks
* Tidy up, add tests
* Better tests
* Fix yolos and mark as slow for pycocotols
* Fix yolos - return_tensors
* Clarify padding and normalization behaviour
* add sudachi_projection option
* Upgrade sudachipy>=0.6.8
* add a test case for sudachi_projection
* Compatible with older versions of SudachiPy
* make fixup
* make style
* error message for unidic download
* revert jumanpp test cases
* format options for sudachi_projection
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* format options for sudachi_split_mode and sudachi_dict_type
* comment
* add tests for full_tokenizer kwargs
* pass projection arg directly
* require_sudachi_projection
* make style
* revert upgrade sudachipy
* check is_sudachi_projection_available()
* revert dependency_version_table and bugfix
* style format
* simply raise ImportError
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* simply raise ImportError
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* refactor with addedtokens decoder
* style
* get rid of lang code to id
* style
* keep some things for BC
* update tests
* add the mask token at the end of the vocab
* nits
* nits
* fix final tests
* style
* nits
* Update src/transformers/models/nllb/tokenization_nllb_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* nits
* style?
* Update src/transformers/convert_slow_tokenizer.py
* make it a tad bit more custom
* ruff please stop
Co-Authored by avidale
<dale.david@mail.ru>
* Update
Co-authored-by: avidale
<dale.david@mail.ru>
* Update
Co-authored-by: avidale <dale.david@mail.ru>
* oupts
* ouft
* nites
* test
* fix the remaining failing tests
* style
* fix failing test
* ficx other test
* temp dir + test the raw init
* update test
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* This is a test commit
* testing commit
* final commit with some changes
* Removed copy statement
* Fixed formatting issues
* Fixed error added past_key_values in the forward method
* Fixed a trailing whitespace. Damn the formatting rules are strict
* Added the copy statement
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Adding [T5/MT5/UMT5]ForTokenClassification
* Add auto mappings for T5ForTokenClassification and variants
* Adding ForTokenClassification to the list of models
* Adding attention_mask param to the T5ForTokenClassification test
* Remove outdated comment in test
* Adding EncoderOnly and Token Classification tests for MT5 and UMT5
* Fix typo in umt5 string
* Add tests for all the existing MT5 models
* Fix wrong comment in dependency_versions_table
* Reverting change to common test for _keys_to_ignore_on_load_missing
The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
* Add fix-copies to MT5ModelTest
* up
* Fix more
* Correct more
* Fix more tests
* fix fast tests
* Fix more
* fix more
* push all files
* finish all
* make style
* Fix timestamp wrap
* make style
* make style
* up
* up
* up
* Fix lang detection behavior
* Fix lang detection behavior
* Add lang detection test
* Fix lang detection behavior
* make style
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* better error message
* make style tests
* add warning
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test
* Ensure the pipeline tests correctly revert to float32 after each test
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras
* Add a cast for tf.assign, just in case
* Fix callback imports
* Enable instantiating model with pretrained backbone weights
* Remove doc updates until changes made in modeling code
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add missing imports and arguments
* Update docstrings
* Make sure test is properly configured
* Include recent DPT updates
* fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
* format code using black and ruff
* skip computing mask if attention_mask=None
* add tests for load balancing loss Mixtral-Moe
* fix assert loss is different in mixtral_test
* fix pad_leng
* use assertNotAlmostEqual and print to debug
* remove print for debug
* minor updates
* reduce rtol and atol
* Enable instantiating model with pretrained backbone weights
* Update tests so backbone checkpoint isn't passed in
* Remove doc updates until changes made in modeling code
* Clarify pretrained import
* Update configs - docs and validation check
* Update src/transformers/utils/backbone_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Clarify exception message
* Update config init in tests
* Add test for when use_timm_backbone=True
* Small test updates
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* [DETA] fix freeze/unfreeze function
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add freeze/unfreeze test case in DETA
* fix type
* fix typo 2
* fix : enable aux and enc loss in training pipeline
* Add unsynced variables from original DETA for training
* modification for passing CI test
* make style
* make fix
* manual make fix
* change deta_modeling_test of configuration 'two_stage' default to TRUE and minor change of dist checking
* remove print
* divide configuration in DetaModel and DetaForObjectDetection
* image smaller size than 224 will give topk error
* pred_boxes and logits should be equivalent to two_stage_num_proposals
* add missing part in DetaConfig
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add docstring in configure and prettify TO DO part
* change distribute related code to accelerate
* Update src/transformers/models/deta/configuration_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/deta/test_modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* protect importing accelerate
* change variable name to specific value
* wrong import
* fix aux_loss in conditional_detr
* add test aux_loss
* add aux_loss test in deta and table_transformer
* fix yolos since it doesn't have auxiliary function
* fix maskformer auxiliary_loss related code
* make style
* change param 'auxiliary_loss' to 'use_auxiliary_loss'
* change param 'auxiliary_loss' to 'use_auxiliary_loss' in tests
* make style & fix-copies, also revert yolos related parameter
* revert variable name 'use_auxiliary_loss' to 'auxiliary_loss' due to DetrConfig
* revert variable name in yolos
* revert maskformer
* add aux_loss test in maskformer
* make style
* Update src/transformers/models/yolos/configuration_yolos.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Allow non-special tokens to be added
* Add test, fix token adding code
* Revert changes to id_to_token and token_to_id
* Update the ESM tokenizer to be a bit more standardized
* Update src/transformers/models/esm/tokenization_esm.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* finalize
* make fix copies whisper
* [Tests] Make sure that we don't run tests mulitple times
* Update src/transformers/models/whisper/modeling_whisper.py
* [Tests] Make sure that we don't run tests mulitple times
* fix more
* improve
* improve
* improve further
* improve more
* improve
* fix more
* git commit and git push
* fix more
* fix more
* fix more
* New try
* Fix more whisper stuff
* Improve
* correct more
* correct more
* correct more
* Fix some tests
* Add more tests
* correct more
* correct more
* correct more
* push
* correct more
* Fix more
* Better
* without dec mask
* correct more
* clean
* save intermediate
* Fix more
* Fix VAD for large-v2
* Save new
* Correct more
* make cleaner
* correct tests
* correct src
* Finish
* Fix more
* Fix more
* finish
* Fix edge cases
* fix return_dict_in_generate
* fix all tests
* make style
* add docstrings
* add docstrings
* Fix logit processor
* make style
* fix pipeline test
* fix more style
* Apply suggestions from code review
* apply feedback Sanchit
* correct more
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct more
* correct more
* correct more
* Fix staticmethod
* correct more
* fix
* fix slow tests
* make style
* fix tokenizer test
* fix tokenizer test
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* finish
* finish
* revert kwargs change
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* first commit
* correct default value non causal
* update config and modeling code
* update converting checkpoint
* clean modeling and fix tests
* make style
* add new config parameters to docstring
* fix copied from statements
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* make position_embeddings_type docstrings clearer
* clean converting script
* remove function not used
* clean modeling file
* apply suggestion for test file + add convert script to not_doctested
* modify tests according to review - cleaner logic and more tests
* Apply nit suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add checker of valid position embeddings type
* instantiate new layer norm layer with the right eps
* fix freeze_feature_encoder since it can be None in some cases
* add test same output in convert script
* restore wav2vec2conformer and add new model
* create processor and FE + clean
* add new model code
* fix convert script and set default config parameters
* correct model id paths
* make style
* make fix-copies and cleaning files
* fix copied from statements
* complete .md and fixe copies
* clean convert script argument defaults
* fix config parameters docstrings
* fix config docstring
* add copied from and enrich FE tests
* fix copied from and repo-consistency
* add autotokenizer
* make test input length shorter and change docstring code
* fix docstrings and copied from
* add add_adapter to ASR training example
* make testing of adapters more robust
* adapt to multi adapter layers
* refactor input_values->input_features and remove w2v2-bert feature extractor
* remove pretraining model
* remove depreciated features and useless lines
* add copied from and ignore statements to modeling tests
* remove pretraining model #2
* change import in convert script
* change default in convert script
* update readme and remove useless line
* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* refactor BERT to Bert for consistency
* remove useless ignore copy statement
* add persistent to buffer in rotary
* add eps in LayerNorm init and remove copied from
* add adapter activation parameters and add copied from statements
* Fix copied statements and add unitest.skip reasons
* add copied statement in test_processor
* refactor processor
* make style
* replace numpy random by torch rand
* remove expected output CTC
* improve converting script with processor class
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove gumbel class
* remove tests related to previously deleted class
* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct typos
* remove uused parameters
* update processor to takes both text and audio
* update checkpoints
* update expected output and add ctc expected output
* add label_attention_mask
* replace pt with np in processor tests
* fix typo
* revert to behaviour with labels_attention_mask
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* last attempt
* current work
* fix forward compatibility
* save all special tokens
* current state
* revert additional changes
* updates
* remove tokenizer.model
* add a test and the fix
* nit
* revert one more break
* fix typefield issue
* quality
* more tests
* fix fields for FC
* more nits?
* new additional changes
* how
* some updates
* the fix
* where do we stand
* nits
* nits
* revert unrelated changes
* nits nits nits
* styling
* don't break llama just yet
* revert llama changes
* safe arg check
* fixup
* Add a test for T5
* Necessary changes
* Tests passing, added tokens need to not be normalized. If the added tokens are normalized, it will the stripping which seems to be unwanted for a normal functioning
* Add even more tests, when normalization is set to True (which does not work 😓 )
* Add even more tests, when normalization is set to True (which does not work 😓 )
* Update to main
* nits
* fmt
* more and more test
* comments
* revert change as tests are failing
* make the test more readble
* nits
* refactor the test
* nit
* updates
* simplify
* style
* style
* style convert slow
* Update src/transformers/convert_slow_tokenizer.py
* skip bf16 test if not supported by device
* fix
* fix bis
* use is_torch_bf16_available_on_device
* use is_torch_fp16_available_on_device
* fix & use public llama
* use 1b model
* fix flacky test
---------
Co-authored-by: Your Name <you@example.com>
* Fix bug in SpeechT5 speech decoder prenet's forward method
- Removed redundant `repeat` operation on speaker_embeddings in the forward method. This line was erroneously duplicating the embeddings, leading to incorrect input size for concatenation and performance issues.
- Maintained original functionality of the method, ensuring the integrity of the speech decoder prenet's forward pass remains intact.
- This change resolves a critical bug affecting the model's performance in handling speaker embeddings.
* Refactor SpeechT5 text to speech integration tests
- Updated SpeechT5ForTextToSpeechIntegrationTests to accommodate the variability in sequence lengths due to dropout in the speech decoder pre-net. This change ensures that our tests are robust against random variations in generated speech, enhancing the reliability of our test suite.
- Removed hardcoded dimensions in test assertions. Replaced with dynamic checks based on model configuration and seed settings, ensuring tests remain valid across different runs and configurations.
- Added new test cases to thoroughly validate the shapes of generated spectrograms and waveforms. These tests leverage seed settings to ensure consistent and predictable behavior in testing, addressing potential issues in speech generation and vocoder processing.
- Fixed existing test cases where incorrect assumptions about output shapes led to potential errors.
* Fix bug in SpeechT5 speech decoder prenet's forward method
- Removed redundant `repeat` operation on speaker_embeddings in the forward method. This line was erroneously duplicating the embeddings, leading to incorrect input size for concatenation and performance issues.
- Maintained original functionality of the method, ensuring the integrity of the speech decoder prenet's forward pass remains intact.
- This change resolves a critical bug affecting the model's performance in handling speaker embeddings.
* Refactor SpeechT5 text to speech integration tests
- Updated SpeechT5ForTextToSpeechIntegrationTests to accommodate the variability in sequence lengths due to dropout in the speech decoder pre-net. This change ensures that our tests are robust against random variations in generated speech, enhancing the reliability of our test suite.
- Removed hardcoded dimensions in test assertions. Replaced with dynamic checks based on model configuration and seed settings, ensuring tests remain valid across different runs and configurations.
- Added new test cases to thoroughly validate the shapes of generated spectrograms and waveforms. These tests leverage seed settings to ensure consistent and predictable behavior in testing, addressing potential issues in speech generation and vocoder processing.
- Fixed existing test cases where incorrect assumptions about output shapes led to potential errors.
* Enhance handling of speaker embeddings in SpeechT5
- Refined the generate and generate_speech functions in the SpeechT5 class to robustly handle two scenarios for speaker embeddings: matching the batch size (one embedding per sample) and one-to-many (a single embedding for all samples in the batch).
- The update includes logic to repeat the speaker embedding when a single embedding is provided for multiple samples, and a ValueError is raised for any mismatched dimensions.
- Also added corresponding test cases to validate both scenarios, ensuring complete coverage and functionality for diverse speaker embedding situations.
* Improve Test Robustness with Randomized Speaker Embeddings
* Correct the implementation of auxiliary loss of mixtrtal
* correct the implementation of auxiliary loss of mixtrtal
* Implement a simpler calculation method
---------
Co-authored-by: zhangliangxu3 <zhangliangxu3@jd.com>
* chore(phi): Updates configuration_phi with missing keys.
* chore(phi): Adds first draft of combined modeling_phi.
* fix(phi): Fixes according to latest review.
* fix(phi): Removes pad_vocab_size_multiple to prevent inconsistencies.
* fix(phi): Fixes unit and integration tests.
* fix(phi): Ensures that everything works with microsoft/phi-1 for first integration.
* fix(phi): Fixes output of docstring generation.
* fix(phi): Fixes according to latest review.
* fix(phi): Fixes according to latest review.
* fix(tests): Re-enables Phi-1.5 test.
* fix(phi): Fixes attention overflow on PhiAttention (for Phi-2).
* fix(phi): Improves how queries and keys are upcast.
* fix(phi): Small updates on latest changes.
* optionally preprocess segmentation maps for mobilevit
* changed pretrained model name to that of segmentation model
* removed voc-deeplabv3 from model archive list
* added preprocess_image and preprocess_mask methods for processing images and segmentation masks respectively
* added tests for segmentation masks based on segformer feature extractor
* use crop_size instead of size
* reverting to initial model
* Add first draft
* Use appropriate gelu function
* More improvements
* More improvements
* More improvements
* Convert checkpoint
* More improvements
* Improve docs, remove print statements
* More improvements
* Add link
* remove unused masking function
* begin tokenizer
* do_lower_case
* debug
* set split_special_tokens=True
* Remove script
* Fix style
* Fix rebase
* Use same design as CLIP
* Add fast tokenizer
* Add SiglipTokenizer to init, remove extra_ids
* Improve conversion script
* Use smaller inputs in conversion script
* Update conversion script
* More improvements
* Add processor to conversion script
* Add tests
* Remove print statements
* Add tokenizer tests
* Fix more tests
* More improvements related to weight initialization
* More improvements
* Make more tests pass
* More improvements
* More improvements
* Add copied from
* Add canonicalize_text
* Enable fast tokenizer tests
* More improvements
* Fix most slow tokenizer tests
* Address comments
* Fix style
* Remove script
* Address some comments
* Add copied from to tests
* Add more copied from
* Add more copied from
* Add more copied from
* Remove is_flax_available
* More updates
* Address comment
* Remove SiglipTokenizerFast for now
* Add caching
* Remove umt5 test
* Add canonicalize_text inside _tokenize, thanks Arthur
* Fix image processor tests
* Skip tests which are not applicable
* Skip test_initialization
* More improvements
* Compare pixel values
* Fix doc tests, add integration test
* Add do_normalize
* Remove causal mask and leverage ignore copy
* Fix attention_mask
* Fix remaining tests
* Fix dummies
* Rename temperature and bias
* Address comments
* Add copied from to tokenizer tests
* Add SiglipVisionModel to auto mapping
* Add copied from to image processor tests
* Improve doc
* Remove SiglipVisionModel from index
* Address comments
* Improve docs
* Simplify config
* Add first draft
* Make it like mistral
* More improvements
* Fix attention_mask
* Fix output_attentions
* Add note in docs
* Convert multilingual model
* Convert large checkpoint
* Convert more checkpoints
* Add pipeline support, correct image_mean and image_std
* Use padding=max_length by default
* Make processor like llava
* Add code snippet
* Convert more checkpoints
* Set keep_punctuation_string=None as in OpenCLIP
* Set normalized=False for special tokens
* Fix doc test
* Update integration test
* Add figure
* Update organization
* Happy new year
* Use AutoModel everywhere
---------
Co-authored-by: patil-suraj <surajp815@gmail.com>
* [DETA] fix freeze/unfreeze function
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add freeze/unfreeze test case in DETA
* fix type
* fix typo 2
* fix : enable aux and enc loss in training pipeline
* Add unsynced variables from original DETA for training
* modification for passing CI test
* make style
* make fix
* manual make fix
* change deta_modeling_test of configuration 'two_stage' default to TRUE and minor change of dist checking
* remove print
* divide configuration in DetaModel and DetaForObjectDetection
* image smaller size than 224 will give topk error
* pred_boxes and logits should be equivalent to two_stage_num_proposals
* add missing part in DetaConfig
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add docstring in configure and prettify TO DO part
* change distribute related code to accelerate
* Update src/transformers/models/deta/configuration_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/deta/test_modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* protect importing accelerate
* change variable name to specific value
* wrong import
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove token_type_ids from model_input_names (like #24788)
* removed test that assumed token_type_ids should be present and updated a model reference so that it points to an available model)
* start - docs, SpeechT5 copy and rename
* add relevant code from FastSpeech2 draft, have tests pass
* make it an actual conformer, demo ex.
* matching inference with original repo, includes debug code
* refactor nn.Sequentials, start more desc. var names
* more renaming
* more renaming
* vocoder scratchwork
* matching vocoder outputs
* hifigan vocoder conversion script
* convert model script, rename some config vars
* replace postnet with speecht5's implementation
* passing common tests, file cleanup
* expand testing, add output hidden states and attention
* tokenizer + passing tokenizer tests
* variety of updates and tests
* g2p_en pckg setup
* import structure edits
* docstrings and cleanup
* repo consistency
* deps
* small cleanup
* forward signature param order
* address comments except for masks and labels
* address comments on attention_mask and labels
* address second round of comments
* remove old unneeded line
* address comments part 1
* address comments pt 2
* rename auto mapping
* fixes for failing tests
* address comments part 3 (bart-like, train loss)
* make style
* pass config where possible
* add forward method + tests to WithHifiGan model
* make style
* address arg passing and generate_speech comments
* address Arthur comments
* address Arthur comments pt2
* lint changes
* Sanchit comment
* add g2p-en to doctest deps
* move up self.encoder
* onnx compatible tensor method
* fix is symbolic
* fix paper url
* move models to espnet org
* make style
* make fix-copies
* update docstring
* Arthur comments
* update docstring w/ new updates
* add model architecture images
* header size
* md wording update
* make style
* First draft
* More improvements
* More improvements
* Make all tests pass
* Remove script
* Update image processor
* Address comments
* Use new gradient checkpointing method
* Convert checkpoints, add integration test
* Do not keep aspect ratio for now
* Set keep_aspect_ratio=False for beit, add integration test
* Remove print statement
* fixes: code fixes on is_batched condition to also check for batched audio data in torch.Tensor format instead of only just checking for batched audio data in np.ndarray format
* Update src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* refactor: code refactoring to remove torch framework dependency
* docs: updated docstring to add torch tensor compatibility
* test: add test cases to incorporate torch tensor inputs
* test: ran make fix-copies for code conformity
* test: refactor test to separate the test_call into test_call_numpy and test_call_torch
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Fix vision text dual encoder
* Small cleanup for wav2vec2 (not fixed yet)
* Small fix for vision_encoder_decoder
* Fix SAM builds
* Update TFBertTokenizer test with modern exporting + tokenizer
* Fix DeBERTa
* Fix DeBERTav2
* Try RAG fix but it's impossible to test locally
* Actually fix RAG now that I got FAISS working somehow
* Fix Wav2Vec2, add sermon
* Fix Hubert
* some nits
* update test
* add support d\sd[a
* remove some dummy inputs
* all good
* style
* nits
* fixes
* fix more copies
* nits
* styling
* fix
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add a slow test just to be sure
* fixup
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Add a convenience method for building in your own name scope
* Second attempt at auto layer building
* Revert "Second attempt at auto layer building"
This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.
* Attempt #3
* Revert "Attempt #3"
This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.
* Add missing attributes that we're going to need later
* Add some attributes we're going to need later
* A fourth attempt! Feel the power flow through you!
* Revert "A fourth attempt! Feel the power flow through you!"
This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.
* Add more values we'll need later
* TF refactor that we'll need later
* Revert "TF refactor that we'll need later"
This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.
* Revert "Revert "TF refactor that we'll need later""
This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.
* make fixup
* Attempt five!
* Revert "Attempt five!"
This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.
* Attempt six - this time don't add empty methods
* Revert "Attempt six - this time don't add empty methods"
This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.
* Attempt seven - better base model class detection!
* Revert "Attempt seven - better base model class detection!"
This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.
* Another attribute we'll need later
* Try again with the missing attribute!
* Revert "Try again with the missing attribute!"
This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.
* This is the attempt that will pierce the heavens!
* Revert "This is the attempt that will pierce the heavens!"
This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.
* Attempt seven - snag list is steadily decreasing
* Revert "Attempt seven - snag list is steadily decreasing"
This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.
* Attempt eight - will an empty snag list do it?
* Revert "Attempt eight - will an empty snag list do it?"
This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.
* Fixes to Hubert issues that cause problems later
* Trying again with Conv1D/SeparableConv fixes
* Revert "Trying again with Conv1D/SeparableConv fixes"
This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.
* Apply the build shape fixes to Wav2Vec2 as well
* One more attempt!
* Revert "One more attempt!"
This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.
* Another attempt!
* Revert "Another attempt!"
This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.
* Let's see how many failures we get without the internal build method
* Fix OpenAI
* Fix MobileBERT
* (Mostly) fix GroupVIT
* Fix BLIP
* One more BLIP fix
* One more BLIP fix!
* Fix Regnet
* Finally fully fix GroupViT
* Fix Data2Vec and add the new AdaptivePool
* Fix Segformer
* Fix Albert
* Fix Deberta/DebertaV2
* Fix XLM
* Actually fix XLM
* Fix Flaubert
* Fix lxmert
* Fix Resnet
* Fix ConvBERT
* Fix ESM
* Fix Convnext / ConvnextV2
* Fix SAM
* Fix Efficientformer
* Fix LayoutLMv3
* Fix speech_to_text
* Fix mpnet and mobilevit
* Fix Swin
* Fix CTRL
* Fix CVT
* Fix DPR
* Fix Wav2Vec2
* Fix T5
* Fix Hubert
* Fix GPT2
* Fix Whisper
* Fix DeiT
* Fix the encoder-decoder / dual-encoder classes
* make fix-copies
* build in name scope
* Fix summarization test
* Fix tied weight names for BART + Blenderbot
* Fix tied weight name building
* Fix to TFESM weight building
* Update TF SAM
* Expand all the shapes out into Big Boy Shapes