Commit Graph

514 Commits

Author SHA1 Message Date
Yikang Shen
ccdabc5642
Add JetMoE model (#30005)
* init jetmoe code

* update archive maps

* remove flax import

* fix import error

* update README

* ruff fix

* update readme

* fix

* update config

* fix issue

* merge files

* fix model bug

* fix test

* auto fix

* model size

* add comments

* fix form

* add flash attention support

* fix attention head number

* fix init

* fix support list

* sort auto mapping

* fix test

* fix docs

* update test

* fix test

* fix test

* change variable name

* fix config

* fix init

* update format

* clean code

* fix config

* fix config

* change default config

* update config

* fix issues

* update formate

* update config argument

* update format

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* change to mixtral aux loss

* change to cache_position

* debug

* fix bugs

* debug

* fix format

* fix format

* fix copy

* fix format

* fix format

* fix sort

* fix sort

* fix sort

* add copy comment

* add copy from

* remove debug code

* revert readme update

* add copy

* debug

* remove debug code

* fix flash attention

* add comments

* clean code

* clean format

* fix format

* fix format

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* change variable name

* add copied from

* fix variable name

* remove deprecated functinos

* sync to llama implementation

* fix format

* fix copy

* fix format

* update format

* remove repr

* add comment for moe weight

* fix copy

* Update src/transformers/models/jetmoe/configuration_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add comments and reformat config

* fix format

* fix format

* fix format

* update test

* update doc string in config

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update config doc

* update attention cache

* fix format

* fix copy

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-14 16:32:01 +02:00
Jacky Lee
449894d2e5
Fix OWLv2 Doc (#30794)
fix: owlv2 doc
2024-05-14 08:36:11 +02:00
Alazar
94306352f4
Port IDEFICS to tensorflow (#26870)
* Initial commit

* Just a copy of modeling_idefics.py that will be ported to TF

* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)

* Add TF imports

* Add autotranslated files

* Add TF classes to model_tf_auto.py

* Add the TF classes in model_doc

* include auto-translated code

* Adopted from auto-translated version

* Add a forgotten super().build

* Add test code for TF version.

* Fix indentation and load pytorch weights for now

* Some fixes. Many tests are still failing but some are passing now.

- I have added TODO's for some of the hacks I made to unblock me
  and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily

* Add ALL_LAYERNORM_LAYERS to match pytorch

* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"

This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.

* Fix freeze_relevant_params()

* Some more fixes

* Fix test_attention_outputs

* Add tf stuff to processing_idefics.py

processing_idefics.py supports both pytorch and tf now.

test_processor_idefics.py for pytorch is passing, so i didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.

* Pass return_tensors to image processing code and fix test

* Pass return_tensors to the image processor __init__

* Fix several test cases

- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel

* Some more fixes forgotten in last commit

* Fix processing code and vision_tf.py

* Fix perceiver bug

* Import from

* Auto-add build() methods + style pass

* Fix build() errors due to `None` being passed as shape to some layers

* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text

* Fix pytorch weights load for tf2

There were a lot of `name=` missing in weight initialization code.

* Attempt to fix CI

* Add back accidently removed line

* Remove torch-specific stuff from the TF test file

* make fix-copies, make style, remove autotranslated files

* Fixes to imports/docstrings

* Let's try the from future import in desperation

* Fix the core random_attention_mask fn to match the torch/flax behaviour

* Clean random_attention_mask up correctly

* Remove torch-only test

* Fix loss shape, couple of nits

* make style

* Don't test for OOB embeddings because IDEFICS uses those deliberately

* Fix loss computation to handle masking

* Fix test failures when flattening

* Fix some test failures

- Add cross attention gate which was missing and wasn't being passed arround
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs

* Add a proper stateless scaled_dot_product_attention

* make style

* Adding missing attribute from the PyTorch version

* Small cleanups to decoupledlinearlayer in case that helps

* Pass epsilon to LayerNormalization

* Attemp to fix pytorch weight cross-loading for TFIdeficsEmbedding

* Fix a bug in TFIdeficsGatedCrossAttentionLayer

* Patching up build() methods

* Constant self.inv_freq

* Constant self.inv_freq

* First working version

The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear
where the weights were mis-intialized (in_features,out_features)
when it should be: (out_features, in_features)

I have tested this so far with tiny-random and idefics-9b-instruct
and gives correct output.

I also dumped the final outputs for both pytorch and TF
and they are identical.

* Fix some test failures

* remove print statement

* Fix return_tensors

* Fix CI test failure check_code_quality

* Attempt to fix CI failures by running `make fixup`

The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test and makes that file unreadable and should probably be moved to a seperate file.

* Attempt to fix tests_pr_documentation_tests

* Fix a test failure in test_image_processing_idefics.py

* Fix test test_pt_tf_model_equivalence

* Fix a few failures

* Tiny fix

* Some minor fixes

* Remove a duplicate test

* Override a few test failures for IDEFICS

- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing

* Fix processing_idefics.py after rebase

* Guard import keras with is_tf_available

* fix check code quality

* fix check code quality

* Minor fixes

* Skip test_save_load temporarily

This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.

* Run `ruff format tests src utils`

* Fix last failing test, `test_compile_tf_model`

* Add fixes for vision_tf.py

I forgot to add this file in last commit.

* Minor fixes

* Replace "<<<" with "<<" for doc tests

IDEFICS-9B is too big for doctest runner, so don't run it there

* Make code more readable

* Fix bug after code review

I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.

* Fix after code review

Use original code tokenizer.convert_tokens_to_ids

* Keep PyTorch as the default return_tensors

* Fixes to modeling_tf after code review

* Fixes from code review

- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver

* Run ruff

* Undo a change

* Refactor processing code after Matt's suggestion

* Remove TODO's that aren't needed anymore

* For pytorch, Use original pytorch processing code from main

Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This changes undo's the pytorch processing
modifications I made and uses original code from main.

* Update tests/models/idefics/test_modeling_idefics.py

* Update tests/models/idefics/test_modeling_tf_idefics.py

* Add missing imports for is_pt_tf_cross_test

* [DO NOT MERGE]: This is a commit for debugging and will be reverted

The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.

* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"

This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.

* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted

* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted

* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"

This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.

* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"

This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.

* Don't skip test_save_load

IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests

* Debugging commit, will be reverted

* Revert "Debugging commit, will be reverted"

This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.

* Override `test_save_load` and push model to save

Maybe this will help me repro this weird bug

* pass my repo_id

* add endpoint

* Pass a temp (write) token just for this CI

* Undo last few commits, still pushing to hub for model debugging

The issue seems to be with save_pretrained(),  when I looked at the model saved
from the CI test failure it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging

* Add logging to modeling tf utils, will be reverted just for debugging

* Debugging, will revert

* Revert "Debugging, will revert"

This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.

* Revert "Add logging to modeling tf utils, will be reverted just for debugging"

This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.

* Remove `test_save_load`

The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.

* Run make fix-copies

* Run ruff format tests src utils

* Debugging commit, will be reverted

* Run ruff, also trigger CI run

* Run ruff again

* Undo debugging commit

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-05-13 15:59:46 +01:00
Sungkyun Chang
73fcfb2861
Update llama3.md, fix typo (#30739)
Update llama3.md

fix typo again
2024-05-10 12:40:57 +01:00
Omar Sanseviero
c99d88e520
Update CodeLlama references (#30218)
* Update CodeLlama references

* Update slow_documentation_tests.txt

* Update slow_documentation_tests.txt
2024-05-09 22:57:52 +02:00
Pavel Iakubovskii
998dbe068b
Add examples for detection models finetuning (#30422)
* Training script for object detection

* Evaluation script for object detection

* Training script for object detection with eval loop outside trainer

* Trainer DETR finetuning

* No trainer DETR finetuning

* Eval script

* Refine object detection example with trainer

* Remove commented code and enable telemetry

* No trainer example

* Add requirements for object detection examples

* Add test for trainer example

* Readme draft

* Fix uploading to HUB

* Readme improvements

* Update eval script

* Adding tests for object-detection examples

* Add object-detection example

* Add object-detection resources to docs

* Update README with custom dataset instructions

* Update year

* Replace valid with validation

* Update instructions for custom dataset

* Remove eval script

* Remove use_auth_token

* Add copied from and telemetry

* Fixup

* Update readme

* Fix id2label

* Fix links in docs

* Update examples/pytorch/object-detection/run_object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update examples/pytorch/object-detection/run_object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Move description to the top

* Fix Trainer example

* Update no trainer example

* Update albumentations version

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2024-05-08 11:42:07 +01:00
Sungkyun Chang
277db238b7
Fix typo: llama3.md (#30653)
Update llama3.md

fix typo
2024-05-06 15:54:39 +02:00
amitportnoy
801894e08c
phi3 chat_template does not support system role (#30606)
* phi3 chat_template does not support system role

* fix doc test error
2024-05-02 15:30:21 +02:00
Eitan Turok
73014b561d
Fix link in dbrx.md (#30509) 2024-04-26 20:52:24 +01:00
Eduardo Pacheco
6d4cabda26
[SegGPT] Fix seggpt image processor (#29550)
* Fixed SegGptImageProcessor to handle 2D and 3D prompt mask inputs

* Added new test to check prompt mask equivalence

* New proposal

* Better proposal

* Removed unnecessary method

* Updated seggpt docs

* Introduced do_convert_rgb

* nits
2024-04-26 19:40:12 +01:00
JB (Don)
dfa7b580e9
[BERT] Add support for sdpa (#28802)
* Adding SDPA support for BERT

* Using the proper input name for testing model input in inference()

* Adding documentation for SDPA in BERT model page

* Use the stable link for the documentation

* Adding a gate to only call .contiguous() for torch < 2.2.0

* Additions and fixes to the documentation

* Minor updates to documentation

* Adding extra requirements needed for the contiguous() bug

* Adding "Adapted from" in plcae of the "Copied from"

* Add benchmark speedup tables to the documentation

* Minor fixes to the documentation

* Use ClapText as a replacemenet for Bert in the Copied-From

* Some more fixes for the fix-copies references

* Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage

[test all]

* Undo changes to separate test

* Refactored SDPA self attention code for KV projections

* Change use_sdpa to attn_implementation

* Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
2024-04-26 16:23:44 +01:00
Gustavo de Rosa
c9693db2fc
Phi-3 (#30423)
* chore(root): Initial commit of Phi-3 files.

* fix(root): Fixes Phi-3 missing on readme.

* fix(root): Ensures files are consistent.

* fix(phi3): Fixes unit tests.

* fix(tests): Fixes style of phi-3 test file.

* chore(tests): Adds integration tests for Phi-3.

* fix(phi3): Removes additional flash-attention usage, .e.g, swiglu and rmsnorm.

* fix(phi3): Fixes incorrect docstrings.

* fix(phi3): Fixes docstring typos.

* fix(phi3): Adds support for Su and Yarn embeddings.

* fix(phi3): Improves according first batch of reviews.

* fix(phi3): Uses up_states instead of y in Phi3MLP.

* fix(phi3): Uses gemma rotary embedding to support torch.compile.

* fix(phi3): Improves how rotary embedding classes are defined.

* fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.

* fix(phi3): Adds last suggestions to modeling file.

* fix(phi3): Splits inv_freq calculation in two lines.
2024-04-24 17:32:09 +02:00
Arthur
89c510d842
Add llama3 (#30334)
* nuke

* add co-author

* add co-author

* update card

* fixup and fix copies to please our ci

* nit fixup

* super small nits

* remove tokenizer_path from call to `write_model`

* always safe serialize by default

---------

Co-authored-by: pcuenca <pcuenca@users.noreply.github.com>
Co-authored-by: xenova <xenova@users.noreply.github.com>
2024-04-24 10:11:19 +02:00
Kamil Akesbi
569743f510
Add sdpa and fa2 the Wav2vec2 family. (#30121)
* add sdpa to wav2vec.
Co-authored-by: kamilakesbi <kamil@huggingface.co>
Co-authored-by: jp1924 <jp42maru@gmail.com>

* add fa2 to wav2vec2

* add tests

* fix attention_mask compatibility with fa2

* minor dtype fix

* replace fa2 slow test

* fix fa2 slow test

* apply code review + add fa2 batch test

* add sdpa and fa2 to hubert

* sdpa and fa2 to data2vec_audio

* sdpa and fa2 to Sew

* sdpa to unispeech + unispeech sat

* small fix

* attention mask in tests

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* add_speedup_benchmark_to_doc

---------

Co-authored-by: kamil@huggingface.co <kamil.akesbi@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-04-22 18:30:38 +01:00
Merve Noyan
b20b017949
Nits for model docs (#29795)
* Update llava_next.md

* Update seggpt.md
2024-04-22 10:41:03 +01:00
NielsRogge
8c12690cec
[Grounding DINO] Add resources (#30232)
* Add resources

* Address comments

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-19 21:03:07 +02:00
João David
d2cec09baa
Add TF swiftformer (#23342)
* Duplicate swiftformer

* Convert SwiftFormerPatchEmbedding

* Convert SwiftFormerEmbeddings

* Convert TFSwiftFormerMlp

* Convert TFSwiftFormerConvEncoder

* Convert TFSwiftFormerLocalRepresentation

* convert TFSwiftFormerEncoderBlock

* Convert SwiftFormerStage

* Convert SwiftFormerEncoder

* Add TFSWiftFormerPreTrainedModel

* Convert SwiftFormerForImageClassification

* Add kwargs and start drop path

* Fix syntax

* Change Model class name

* Add TFSwiftFormer to __init__

* Duplicate test_modeling_swiftformer

* First test conversions

* Change require_torch to require_tf

* Add exports to swiftformer __init__

* Add TFSwiftFormerModel wrapper

* Fix __init__ and run black

* Remove docstring from MainLayer, fix padding

* Use keras.layers.Activation on keras.Sequential

* Fix swiftformer exports

* Fix activation layer from config

* Remove post_inits

* Use tf.keras.layers.ZeroPadding2D

* Convert torch normalize

* Change tf test input shape

* Fix softmax and reduce_sum

* Convert expand_dims and repeat

* Add missing reshape and tranpose

* Simplify TFSwiftFormerEncoderBlock.call

* Fix mismatch in patch embeddings

* Fix expected output shape to match channels last

* Fix swiftformer typo

* Disable test_onnx

* Fix TFSwiftFormerForImageClassification call

* Add unpack inputs

* Convert flatten(2).mean(-1)

* Change vision dummy inputs (to be reviewed)

* Change test_forward_signature to use .call

* Fix @unpack_inputs

* Set return_tensors="tf" and rename class

* Rename wrongly named patch_embeddings layer

* Add serving_output and change dummy_input shape

* Make dimensions BCHW and transpose inside embedding layer

* Change SwiftFormerEncoderBlock

* Fix ruff problems

* Add image size to swiftformer config

* Change tranpose to MainLayer and use -1 for reshape

* Remove serving_outputs and dummy_inputs

* Remove test_initialization test from tf model

* Make Sequential component a separate layer

* Fix layers' names

* Tranpose encoder outputs

* Fix tests and check if hidden states is not None

* Fix TFSwiftFormerForImageClassification

* Run make fixup

* Run make fix-copies

* Update modeling_tf_auto

* Update docs

* Fix modeling auto mapping

* Update modelint_tf_swiftformer docs

* Fill image_size doc and type

* Add reduction=None to loss computation

* Update docs

* make style

* Debug: Delete the tip to see if that changes anything

* Re-add tip

* Remove add_code_sample_docstrings

* Remove unused import

* Get the debug to actually tell us the problem it has with the docs

* Try a substitution to match the PyTorch file?

* Add swiftformer to ignore list

* Add build() methods

* Update copyright year

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove FIXME comment

* Remove from_pt

* Update copyright year

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Rename one-letter variables

* Remove FIXMEs related to momentum

* Remove old TODO comment

* Remove outstanding FIXME comments

* Get dropout rate from config

* Add specific dropout config for MLP

* Add convencoder dropout to config

* Pass config to SwiftFormerDropPath layer

* Fix drop_path variable name and add Adapted from comment

* Run ruff

* Removed copied from comment

* Run fix copies

* Change drop_path to identity to match pt

* Cleanup build() methods and move to new keras imports

* Update docs/source/en/model_doc/swiftformer.md

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Raise error if drop_path_rate > 0.0

* Apply suggestions from code review

Replace (self.dim), with self.dim,

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove drop_path function

* Add training to TFSwiftFormerEncoder

* Set self.built = True last

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Should have been added to previous commit

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Change default_feature_extractor to default_image_processor

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Import Keras from modeling_tf_utils

* Remove relative import

* Run ruff --fix

* Move import keras to tf_available

* Add copied from comment to test_forward_signature

* Reduce batch size and num_labels

* Extract loss logic to hf_compute_loss

* Run ruff format

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-04-19 18:31:43 +01:00
Abhi Venigalla
005b957fb8
Add DBRX Model (#29921)
* wip

* fix __init__.py

* add docs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments 1

* work on make fixup

* pass configs down

* add sdpa attention

* remove DbrxBlock

* add to configuration_auto

* docstring now passes formatting test

* fix style

* update READMEs

* add dbrx to modeling_auto

* make fix-copies generated this

* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP

* config docstring passes formatting test

* rename moe_loss_weight to router_aux_loss_coef

* add to flash-attn documentation

* fix model-path in tests

* Explicitly make `"suli"` the default `ffn_act_fn`

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]

* fix _flash_attn_uses_top_left_mask and is_causal

* fix tests path

* don't use token type IDs

* follow Llama and remove token_type_ids from test

* init ConfigTester differently so tests pass

* remove multiple choice test

* remove question + answer test

* remove sequence classification test

* remove token classification test

* copy Llama tests and remove token_type_ids from test inputs

* do not test pruning or headmasking; style code

* add _tied_weights_keys parameter to pass test

* add type hints

* fix type check

* update config tester

* remove masked_lm test

* remove encoder tests

* initialize DbrxModelTester with correct params

* style

* torch_dtype does not rely on torch

* run make fixup, fix-copies

* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py

* add copyright info

* fix imports and DbrxRotaryEmbedding

* update DbrxModel docstring

* use copies

* change model path in docstring

* use config in DbrxFFN

* fix flashattention2, sdpaattention

* input config to DbrXAttention, DbrxNormAttentionNorm

* more fixes

* fix

* fix again!

* add informative comment

* fix ruff?

* remove print statement + style

* change doc-test

* fix doc-test

* fix docstring

* delete commented out text

* make defaults match dbrx-instruct

* replace `router_aux_loss_coef` with `moe_loss_weight`

* is_decoder=True

* remove is_decoder from configtester

* implement sdpa properly

* make is_decoder pass tests

* start on the GenerationTesterMixin tests

* add dbrx to sdpa documentation

* skip weight typing test

* style

* initialize smaller model

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Add DBRX to toctree

* skip test_new_cache_format

* make config defaults smaller again

* add pad_token_id

* remove pad_token_id from config

* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP

* Update src/transformers/models/dbrx/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/dbrx.md

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/models/dbrx/configuration_dbrx.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/dbrx.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix typo

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update docs, fix configuration_auto.py

* address pr comments

* remove is_decoder flag

* slice

* fix requires grad

* remove grad

* disconnect differently

* remove grad

* enable grads

* patch

* detach expert

* nissan al ghaib

* Update modeling_dbrx.py

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* replace "Gemma" with "Dbrx"

* remove # type: ignore

* don't hardcode vocab_size

* remove ToDo

* Re-add removed idefics2 line

* Update test to use tiny-random!

* Remove TODO

* Remove one more case of loading the entire dbrx-instruct in the tests

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address some comments

* small model

* add dbrx to tokenization_auto

* More docstrings with add_start_docstrings

* Dbrx for now

* add PipelineTesterMixin

* Update src/transformers/models/dbrx/configuration_dbrx.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove flash-attn2 import error

* fix docstring

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add useage example

* put on one line

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix ffn_act_fn

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change "dbrx" to "DBRX" for display purposes.

* fix __init__.py?

* fix __init__.py

* fix README

* return the aux_loss

* remove extra spaces

* fix configuration_auto.py

* fix format in tokenization_auto

* remove new line

* add more useage examples

---------

Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-18 15:18:52 +02:00
tomeras91
3f20877da9
Add jamba (#29943)
* Add jamba arch

* apply "make fix-copies" changes

* fix link to model in JambaConfig docstring

* Add n_ctx in modeling file because repo-consistency wants that

* Add jamba to flash attention and sdpa documentation

* mamba dt_proj quant fix now works for LoRA as well

* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers

* add jamba to tokenization auto

* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)

* simple PR fixes

* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer

* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)

* Add copied comment on JambaMLP (it's the same as MixtralMLP)

* remove padding_mask warnings. It's not supported anymore

* fix docstring. Float instead of int

* A few more minor PR fixes

* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass

* Return None attention weights from mamba layers. Append to all attentions only if not None.

* remove some leftover jamba archive lists

* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel

* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers

* Add Jamba paper on READMEs

* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)

* Add copied from comment

* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms

* clearer docstring for _convert_to_standard_cache

* style fixes

* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs

* rename test so it still overrides what its meant to override

* draft

* oups

* nit

* remove more complexe logic

* fix names used in config

* fix fix fix

* style

* fix some more failing tests

* generate did not init the cache 🙃

* more small nits

* typo

* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes

* fix init of pkv with torch.tensor()

* empty tensor

* fix some init issues

* stupid changes required by generate because it does not even support it's own DynamicCache class

* more fixes

* fix general assisted gen cache_position bug

* tests passing

* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py

* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache

* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore

* fix docstrings and typehints for past_key_values

* style fixes

* fix docs

* change typehint due to copy from Mixtral

* forgot import

* import order

* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)

* Add integration test with tiny tandom Jamba model on hub

* fix flash attention cache shapes

* bring back forgotten hidden states

* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model

* align integration test after modeling fixes

* bugfix - mamba can use precomputed states only of forward pass is on a single token

* bugfix - mamba can use precomputed states only if they match the batch size

* typo

* remove making _prepare_4d_causal_attention_mask a leaf function

* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-04-18 11:04:02 +02:00
Alexander Visheratin
b65df514d1
Add Flash Attention 2 to M2M100 model (#30256)
* Added flash attention 2.

* Fixes.

* Fix inheritance.

* Fixed init.

* Remove stuff.

* Added documentation.

* Add FA2 to M2M100 documentation.

* Add test.

* Fixed documentation.

* Update src/transformers/models/m2m_100/modeling_m2m_100.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixed variable name.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-18 10:27:58 +02:00
Shane A
e4ea19b958
Add OLMo model family (#29890)
* Add OLMo using add-new-model-like with Llama

* Fix incorrect tokenizer for OLMo

* Copy-paste relevant OLMo methods and their imports

* Add OLMo config

* Modify OLMo config to follow HF conventions

* Remove unneeded Llama code from OLMo model

* Add ability for OLMo model to output attentions

* Add OLMoPreTrainedModel and OLMoModel

* Add OLMoForCausalLM

* Minor fixes to OLMo model for style and missing functions

* Implement OLMo tokenizer

* Implement OLMo to HF conversion script

* Add tests for OLMo model

* Add tests for OLMo fast tokenizer

* Add auto-generated dummy objects

* Remove unimplemented OLMo classes from auto and init classes and re-format

* Add README and associated auto-generated files

* Use OLMo names for common properties

* Run make fixup

* Remove `|` from OLMo typing

* Remove unneeded tokenization_olmo.py

* Revert model, config and converter to add-new-model-like Llama

* Move logic for adding bos/eos token into GPTNeoxTokenizerFast

* Change OLMoConfig defaults to match OLMo-7B

* Use GPTNeoXToknizerFast in OLMo tokenizer tests

* Modify auto-generated OLMoModelTests to work for OLMo

* Add non-parametric layer norm OLMoLayerNorm

* Update weight conversion script for OLMo

* Fix __init__ and auto structure for OLMo

* Fix errors from make fixup

* Remove OLMoTokenizerFast from documentation

* Add missing 'Copied from' for OLMoModel._update_causal_mask

* Run make fix-copies

* Rearrange string replacements in OLMoForCausalLM Copied from

* Move OLMo and Llama CausalLM.forward example into global constants

* Fix OLMO_GENERATION_EXAMPLE doc string typo

* Add option for qkv clipping to OLMo

* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf

* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf

* Fix OLMo tokenization bug using conversion script

* Keep model in full precision after conversion

* Do not add eos token automatically

* Update references to OLMo model in HF Hub

* Do not add eos token during encoding by default

* Fix Llama generation example

* Run make fixup

* OLMo 7B integration test fix

* Remove unneeded special case for OLMoConfig

* OLMo 7B Twin 2T integration test fix

* Fix test_model_7b_greedy_generation

* Remove test_compile_static_cache

* Fix OLMo and Llama generation example

* Run make fixup

* Revert "OLMo 7B integration test fix"

This reverts commit 4df56a4b15.

* Revert "OLMo 7B Twin 2T integration test fix"

This reverts commit 9ff65a4a29.

* Ungate 7B integration tests and fix greedy generation test

* Add retries for flaky test_eager_matches_sdpa_generate

* Fix output of doc example for OLMoForCausalLM.forward

* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model

* Try fix incorrect characters in OLMoForCausalLM.forward doct test

* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes

* Remove pretraining_tp from OLMo config and model

* Add missing 'Copied from' instances

* Remove unneeded causal_mask from OLMoModel

* Revert Llama changes

* Ignore copy for OLMoForCausalLM.forward

* Change 'OLMo' to 'Olmo' in classes

* Move minimal OLMo tokenization tests to model tests

* Add missed 'Copied from' for repeat_kv
2024-04-17 17:59:07 +02:00
st81
8d6b509611
Add token type ids to CodeGenTokenizer (#29265)
* Add create token type ids to CodeGenTokenizer

* Fix inconsistent length of token type ids

* Format source codes

* Fix inconsistent order of methods

* Update docstring

* add test_tokenizer_integration test

* Format source codes

* Add `copied from` comment to CodeGenTokenizerFast

* Add doc of create_token_type_ids_from_sequences

* Make return_token_type_ids False by default

* Make test_tokenizer_integration as slow test

* Add return_token_type_ids to tokenizer init arg

* Add test for tokenizer's init return_token_type_ids

* Format source codes
2024-04-17 12:19:18 +02:00
Yih-Dar
cbc2cc187a
More fixes for doctest (#30265)
* fix

* update

* update

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-16 11:58:55 +02:00
Matt
5be21302ad
Remove incorrect arg in codellama doctest (#30257)
Remove incorrect arg in codellama docstring
2024-04-15 18:31:23 +01:00
Sayak Paul
8127f39624
[Docs] Update recurrent_gemma.md for some minor nits (#30238)
Update recurrent_gemma.md
2024-04-15 18:30:59 +02:00
amyeroberts
6b78360e6d
Add Idefics2 (#30253)
* Initial add model additions

* Test

* All weights loading

* Can perform full forward pass

* Local and remote the same

* Matching local and remote

* Fixup

* Idefics2Model importable; fixup docstrings

* Don't skip by default

* Remove deprecated use_resampler arg

* Remove self.config

* DecoupledLinear takes config

* Tidy up

* Enable eager attention and tidy up

* Most tests passing

* Update for batch of processed images

* Add image processor

* Update doc pages

* Update conversion script

* Remove erroneous breakpoint

* Remove accidendtal spelling change

* Update to reflect changes on hub - make generate work

* Fix up

* Image processor tests

* Update tests

* Add a processor

* Add a processor

* Update convert script

* Update modeling file - remove fixmes

* Bug fix

* Add processing test

* Use processor

* Fix up

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Fix test

* Update config - PR comments and defaults align with checkpoint

* Reviewer comments

* Add copied froms for flahs attention

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove qk_layer_norm and freeze_layers functionality

* Fix

* Remove freeze_layer options from config

* Sync with upstream main

* Fix attention shapes siglip

* Remove Llava-next refs - TO REBASE

* Use AutoModel for text model

* Add comment to explain vision embeddings

* Fix issue with tie_word_embeddings

* Address review comments

* Fix and fix up

* Chat templates for idefics

* Fix copies

* Fix

* Add layer norms to FA2

* Fix tests

* Apply suggestions from code review

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Fix

* Review comments

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update inputs merger

* Merge weights in correct order

* Update convert script

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update template

* Model code examples (fix idefics too)

* More review comments

* Tidy up

* Update processing

* Fix attention mask preparation

* Update inputs_merger inputs

* Vectorize inputs_merger

* Update src/transformers/models/idefics2/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/idefics2/modeling_idefics2.py

* Review comments

* saying bye to the `qk_layer_norms`

* Simplify

* Update latents

* Remove erroneuous readme changes

* Return images when applying chat template

* Fix bug - prompt images are for a single sample

* Update src/transformers/models/idefics2/modeling_idefics2.py

* image splitting

* fix test

* some more comment

* some comment

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics2/image_processing_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update processor

* Update model tests

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Don't add BOS in template

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Remove index in examples

* Update tests to reflect #13

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* PR comment - consistent typing

* Update readme and model doc

* Update docs

* Update checkpoint references

* Update examples

* Fix and update tests

* Small addition

* Update tests - remove copied from as no ignore placement copy could be found

* Update example

* small fixes

* Update docs/source/en/model_doc/idefics2.md

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update docs/source/en/model_doc/idefics2.md

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update README.md

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Connector model as bridge

* Fix up

* Fix up

* Don't pass model inputs for generation kwargs update

* IDEFICS-2 -> Idefics2

* Remove config archive name

* IDEFICS-2 -> Idefics2

* Add back llava-next

* Update readmes

* Add requirements for processor tester

* Use custom convert_to_rgb to avoid possible BC

* Fix doc example

* Fix doc example

* Skip model doc tests - as model to large

* More doc example - account for image splitting

* Update src/transformers/image_transforms.py

* Fix config doctest

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-04-15 17:03:03 +01:00
Yih-Dar
fe2d20d275
Fix doctest more (for docs/source/en) (#30247)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-15 14:10:59 +02:00
lewtun
fbdb978eb5
Fix Llava chat template examples (#30130) 2024-04-11 10:38:24 +02:00
Eduardo Pacheco
b752ad3019
Adding grounding dino (#26087)
* Fixed typo when converting weigths to GroundingDINO vision backbone

* Final modifications on modeling

* Removed unnecessary class

* Fixed convert structure

* Added image processing

* make fixup partially completed

* Now text_backbone_config has its own class

* Modified convert script

* Removed unnecessary config attribute

* Added new function to generate sub sentence mask

* Renamed parameters with gamma in the name as it's currently not allowed

* Removed tokenization and image_processing scripts since we'll map from existing models

* Fixed some issues with configuration

* Just some modifications on conversion script

* Other modifications

* Copied deformable detr

* First commit

* Added bert to model

* Bert validated

* Created Text and Fusion layers for Encoder

* Adapted Encoder layer

* Fixed typos

* Adjusted Encoder

* Converted encoder to hf

* Modified Decoder Layer

* Modified main decoder class

* Removed copy comments

* Fixed forward from GroundingDINOModel and GroundingDINODecoder

* Added all necessary layers, configurations and forward logic up to GroundingDINOModel

* Added all layers to convertion

* Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection

* Fixed mask input to encoders and fixed nn.MultiheadAttention batch first and attn output

* Fixed forward from GroundingDINOTextEnhancerLayer

* Fixed output bug with GroundingDINODeformableLayer

* Fixed bugs that prevent GroundingDINOForObjectDetection to run forward method

* Fixed attentions to be passed correctly

* Passing temperature arg when creating Sine position embedding

* Removed copy comments

* Added temperature argument for position embedding

* Fixed typo when converting weigths to GroundingDINO vision backbone

* Final modifications on modeling

* Removed unnecessary class

* Fixed convert structure

* Added image processing

* make fixup partially completed

* Now text_backbone_config has its own class

* Modified convert script

* Removed unnecessary config attribute

* Added new function to generate sub sentence mask

* Renamed parameters with gamma in the name as it's currently not allowed

* Removed tokenization and image_processing scripts since we'll map from existing models

* Fixed some issues with configuration

* Just some modifications on conversion script

* Other modifications

* Fix style

* Improve fixup

* Improve conversion script

* Improve conversion script

* Add GroundingDINOProcessor

* More improvements

* Return token type ids

* something

* Fix more tests

* More improvements

* More cleanup

* More improvements

* Fixed tests, improved modeling and config

* More improvements and fixing tests

* Improved tests and modeling

* Improved tests and added image processor

* Improved tests inference

* More improvements

* More test improvements

* Fixed last test

* Improved docstrings and comments

* Fix style

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Better naming

* Better naming

* Added Copied statement

* Added Copied statement

* Moved param init from GroundingDINOBiMultiHeadAttention

* Better naming

* Fixing clamp style

* Better naming

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/configuration_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Improving conversion script

* Improved config

* Improved naming

* Improved naming again

* Improved grouding-dino.md

* Moved grounding dino to multimodal

* Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Fixed docstrings and style

* Fix docstrings

* Remove timm attributes

* Reorder imports

* More improvements

* Add Grounding DINO to pipeline

* Remove model from check_repo

* Added grounded post_process to GroundingDINOProcessor

* Fixed style

* Fixed GroundingDINOTextPrenetConfig docstrings

* Aligned inputs.keys() when both image and text are passed with model_input_names

* Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor

* Testing post_process_grounded_object_detection from GroundingDINOProcessor at test_inference_object_detection_head

* Fixed order

* Marked test with require_torch

* Temporarily changed repo_id

* More improvements

* Fix style

* Final improvements

* Improve annotators

* Fix style

* Add is_torch_available

* Remove type hints

* vocab_tokens as one liner

* Removed print statements

* Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig

* remove unnecessary comments

* Removed unnecessary tests on conversion script

* Renamed GroundingDINO to camel case GroundingDino

* Fixed GroundingDinoProcessor docstrings

* loading MSDA kernels in the modeling file

* Fix copies

* Replace nn.multiheadattention

* Replace nn.multiheadattention

* Fixed inputs for GroundingDinoMultiheadAttention & order of modules

* Fixed processing to avoid messing with inputs

* Added more tips for GroundingDino

* Make style

* Chaning name to align with SAM

* Replace final nn.multiheadattention

* Fix model tests

* Update year, remove GenerationTesterMixin

* Address comments

* Address more comments

* Rename TextPrenet to TextModel

* Rename hidden_states

* Address more comments

* Address more comments

* Address comment

* Address more comments

* Address merge

* Address comment

* Address comment

* Address comment

* Make style

* Added layer norm eps to layer norms

* Address more comments

* More fixes

* Fixed equivalence

* Make fixup

* Remove print statements

* Address comments

* Address comments

* Address comments

* Address comments

* Address comments

* Address comments

* Add comment

* Address comment

* Remove overwriting of test

* Fix bbox_embed

* Improve decoder_bbox_embed_share

* Simplify outputs

* Updated post_process_grounded_object_detection

* Renamed sources to feature_maps

* Improved tests for Grounding Dino ImageProcessor and Processor

* Fixed test requirements and imports

* Fixed image_processing

* Fixed processor tests

* Fixed imports for image processing tests

* Fix copies

* Updated modeling

* Fix style

* Moved functions to correct position

* Fixed copy issues

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Keeping consistency custom cuda kernels for MSDA

* Make GroundingDinoProcessor logic clearer

* Updated Grounding DINO checkpoints

* Changed tests to correct structure

* Updated gpu-cpu equivalence test

* fix copies

* Update src/transformers/models/grounding_dino/processing_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/processing_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/configuration_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixed erros and style

* Fix copies

* Removed inheritance from PreTrainedModel from GroundingDinoTextModel

* Fixed GroundingDinoTextModel

* Fixed type of default backbone config

* Fixed missing methods for GroundingDinoTextModel and Added timm support for GroundingDinoConvEncoder

* Addressed comments

* Addressed batched image processing tests

* Addressed zero shot test comment

* Addressed tip comment

* Removed GroundingDinoTextModel from check_repo

* Removed inplace masking

* Addressed comments

* Addressed comments

* Addressed comments

* Fix copies

* Fixing timm test

* Fixed batching equivalence test

* Update docs/source/en/model_doc/grounding-dino.md

Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>

* Update docs/source/en/model_doc/grounding-dino.md

Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>

* Update docs/source/en/model_doc/grounding-dino.md

Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>

* Addressed more comments

* Added a new comment

* Reduced image size

* Addressed more comments

* Nits

* Nits

* Changed the way text_config is initialized

* Update src/transformers/models/grounding_dino/processing_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
2024-04-11 08:32:16 +01:00
Arthur
0fe44059ae
Add recurrent gemma (#30143)
* Fork.

* RecurrentGemma initial commit.

* Updating __init__.py.

* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.

* Reformat code to 4 spaces.
Fixed a few typos.

* Fixed the forward pass.
Still unclear on the cache?

* Fixed the RecurrentGemmaForCausalLM

* Minor comment that we might not need attention_mask and output_attention arguments.

* Now cache should work as well.

* Adding a temporary example to check whether the model generation works.

* Adding the tests and updating imports.

* Adding the example file missing in the previous commit.

* First working example.

* Removing .gitignore and reverting parts of __init__.

* Re-add .gitignore.

* Addressing comments for configuration.

* Move mask creation to `_prepare_inputs_for_generation`.

* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed

* Transfoering between machines.

* Running normal tests.

* Minor fix.

* More fixes.

* Addressing more comments.

* Minor fixes.

* first stab at cleanup

* more refactoring

* fix copies and else

* renaming and get init to work

* fix causal mask creation

* update

* nit

* fix a hell lot of things

* updates

* update conversion script

* make all keys importable

* nits

* add auto mappings

* properly convert ffw_up and down

* add scaling

* fix generations

* for recurrent dtype

* update

* fix going beyong window

* fixup

* add missing files

* current updates to remove last einops

* finish modeling refactor

* TADA

* fix compile

* fix most failing testt ? ?

* update tests

* refactor and update

* update

* nits, fixup and update tests

* more fixup

* nits

* fix imports

* test format

* fixups

* nits

* tuple typing

* fix code quality

* add model card

* fix doc

* skip most generation tests

* nits

* style

* doc fixes

* fix pr and check_copies?

* last nit

* oupsy

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update

* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update based on review

* doc nit

* fix quality

* quality

* fix slow test model path

* update default dype

* ignore attributes that can be safely ignored in check config attributes

* 0lallalala come on

* save nit

* style

* remove to dict update

* make sure we can also run in float16

* style

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-10 16:59:13 +02:00
NielsRogge
505854f78f
[UDOP] Improve docs, add resources (#29571)
* Improve docs

* Add more tips
2024-04-10 16:02:50 +02:00
Bo Zheng
46d636818b
Update model card and link of blog post. (#29928)
* Update qwen2_moe.md

* update link of blogpost.

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-03-30 17:49:03 +01:00
fzyzcjy
5ad7f17002
Super tiny fix 12 typos about "with with" (#29926)
* with with

* style
2024-03-29 14:31:31 +00:00
Eduardo Pacheco
22d159ddf9
Adding Flash Attention 2 Support for GPT2 (#29226)
* First commit to add flash attention 2 for GPT-2

* more improvements

* Make GPT2 pass tests and fixed Decison Transformers copies

* Fixed missing arg

* fix copies

* Added expected speedup

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Added test

* Fixed attn attribute

* Update docs/source/en/model_doc/gpt2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update Decision transformer attentions

* More updates

* Passing tests

* Fix copies

* Fix copies part 2

* Decision transformer updates

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix copies

* Decision transformer not supporting flash attn

* Addressed comments

* Addressed comments

* Addressed comments

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-28 09:31:24 +00:00
Bo Zheng
1c39974a4c
Add Qwen2MoE (#29377)
* add support for qwen2 MoE models

* update docs

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* Update README.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* fixup

* add archive back

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fixup

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* add archive back

* fix integration test

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 02:11:55 +01:00
Johannes Kolbe
7eb3ba8224
remove quotes in code example (#29812)
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
2024-03-25 13:26:54 +00:00
Billy Cao
dafe370255
[DOCS] Fix typo for llava next docs (#29829)
Fix typo for llava next docs
2024-03-23 11:32:31 -07:00
NielsRogge
d91fd7f92c
Add LLaVa-1.6, bis (#29586)
* First draft

* Fix tests, add docs

* Improve docstrings

* Fix test

* Address comments

* Address comments

* Remove vocab_size attribute

* Remove batch_size

* Address comment

* Add image processor tests

* Support fx

* Update docstring

* Add support for 34b

* Convert 34b model

* Add integration tests

* Update checkpoints

* Convert vicuna-13b, remove doc tests

* Remove script

* Remove file

* Address comments

* Improve docstrings

* Deprecate vocab_size

* Remove aspect_ratio_setting

* Address comments

* Update READMEs

* Add tips about chat templates

* Fix tests

* Deprecate vocab_size safely

* Update tests

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-20 15:51:12 +00:00
amyeroberts
3c17c529cc
SuperPointModel -> SuperPointForKeypointDetection (#29757) 2024-03-20 15:41:03 +00:00
StevenBucaille
56baa03380
Implementation of SuperPoint and AutoModelForKeypointDetection (#28966)
* Added SuperPoint docs

* Added tests

* Removed commented part

* Commit to create and fix add_superpoint branch with a new branch

* Fixed dummy_pt_objects

* Committed missing files

* Fixed README.md

* Apply suggestions from code review

Fixed small changes

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py

* Removed AutoModelForKeypointDetection and related stuff

* Fixed inconsistencies in image_processing_superpoint.py

* Moved infer_on_model logic simply in test_inference

* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py

* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale

* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixed from (w, h) to (h, w) as input for tests

* Removed unnecessary condition

* Moved last_hidden_state to be the first returned

* Moved last_hidden_state to be the first returned (bis)

* Moved last_hidden_state to be the first returned (ter)

* Switched image_width and image_height in tests to match recent changes

* Added config as first SuperPointConvBlock init argument

* Reordered README's after merge

* Added missing first config argument to SuperPointConvBlock instantiations

* Removed formatting error

* Added SuperPoint to README's de, pt-br, ru, te and vi

* Checked out README_fr.md

* Fixed README_fr.md

* Test fix README_fr.md

* Test fix README_fr.md

* Last make fix-copies !

* Updated checkpoint path

* Removed unused SuperPoint doc

* Added missing image

* Update src/transformers/models/superpoint/modeling_superpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed unnecessary import

* Update src/transformers/models/superpoint/modeling_superpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added SuperPoint to _toctree.yml

---------

Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
2024-03-19 14:43:02 +00:00
Yoach Lacombe
c43b380e70
Add MusicGen Melody (#28819)
* first modeling code

* make repository

* still WIP

* update model

* add tests

* add latest change

* clean docstrings and copied from

* update docstrings md and readme

* correct chroma function

* correct copied from and remove unreleated test

* add doc to toctree

* correct imports

* add convert script to notdoctested

* Add suggestion from Sanchit

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct get_uncoditional_inputs docstrings

* modify README according to SANCHIT feedback

* add chroma to audio utils

* clean librosa and torchaudio hard dependencies

* fix FE

* refactor audio decoder -> audio encoder for consistency with previous musicgen

* refactor conditional -> encoder

* modify sampling rate logics

* modify license at the beginning

* refactor all_self_attns->all_attentions

* remove ignore copy from causallm generate

* add copied from for from_sub_models

* fix make copies

* add warning if audio is truncated

* add copied from where relevant

* remove artefact

* fix convert script

* fix torchaudio and FE

* modify chroma method according to feedback-> better naming

* refactor input_values->input_features

* refactor input_values->input_features and fix import fe

* add input_features to docstrigs

* correct inputs_embeds logics

* remove dtype conversion

* refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation

* change warning for chroma length

* Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* change way to save wav, using soundfile

* correct docs and change to soundfile

* fix import

* fix init proj layers

* remove line breaks from md

* fix issue with docstrings

* add FE suggestions

* improve is in logics and remove useless imports

* remove custom from_pretrained

* simplify docstring code

* add suggestions for modeling tests

* make style

* update converting script with sanity check

* remove encoder attention mask from conditional generation

* replace musicgen melody checkpoints with official orga

* rename ylacombe->facebook in checkpoints

* fix copies

* remove unecessary warning

* add shape in code docstrings

* add files to slow doc tests

* fix md bug and add md to not_tested

* make fix-copies

* fix hidden states test and batching

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-03-18 13:06:12 +00:00
Saurabh Dash
0e4a1c3401
Cohere Model Release (#29622)
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-03-15 14:29:11 +01:00
Nate Cibik
1fc505b816
Add PvT-v2 Model (#26812)
* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Updated index.md

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Fixed config docstring. Added channels property

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Fixed config backbone compat

* Ran fix-copies

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Fixed issue from rebase

* Fixed issue from rebase

* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported

* Fixed issue from rebase

* Fixed issue from rebase

* Changed model name in docs

* Removed duplicate PvtV2Backbone

* Work around type switching issue in tests

* Fix model name in config comments

* Update docs/source/en/model_doc/pvt_v2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed old code

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Fixed Class names to be more descriptive

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed outdated code

* Moved paper abstract to single line in pvt_v2.md

* Added usage tips to pvt_v2.md

* Simplified module inits by passing layer_idx

* Fixed typing for hidden_act in PvtV2Config

* Removed unusued import

* Add pvt_v2 to docs/source/en/_toctree.yml

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Move function parameters to single line

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Update year of copyright to 2024

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Make code more explicit

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated sr_ratio to be more explicit spatial_reduction_ratio

* Removed excess type hints in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Move params to single line in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed needless comment in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update copyright date in pvt_v2.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved params to single line in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated copyright date in configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Cleaned comments in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Renamed spatial_reduction Conv2D operation

* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
"

This reverts commit c4a04416dd.

* Updated conversion script to reflect module name change

* Deprecated reshape_last_stage option in config

* Removed unused imports

* Code formatting

* Fixed outdated decorators on test_inference_fp16

* Added "Copied from" comments in test_modeling_pvt_v2.py

* Fixed import listing

* Updated model name

* Force empty commit for PR refresh

* Fixed linting issue

* Removed # Copied from comments

* Added PVTv2 to README_fr.md

* Ran make fix-copies

* Replace all FoamoftheSea hub references with OpenGVLab

* Fixed out_indices and out_features logic in configuration_pvt_v2.py

* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py

* Ran code fixup

* Fixed order of parent classes in PvtV2Config to fix the to_dict method override

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-13 19:05:20 +00:00
Dries Verachtert
624788570c
Fix minor typo: infenrece => inference (#29621) 2024-03-13 14:49:09 +00:00
Raushan Turganbay
6cc5411d81
Fix Fuyu doc typos (#29601)
fix fuyu docs
2024-03-12 10:16:21 +00:00
Amrit Gupta
b45c0f55e0
Fixed broken link (#29558)
Fixed broken link for Resources -> Token Classification -> Finetuning BERT for named-entity
2024-03-11 17:26:38 +00:00
Arthur
4f27ee936a
[Mamba doc] Post merge updates (#29472)
* post merge update

* nit

* oups
2024-03-11 09:46:24 +01:00
Younes Belkada
b27aa206dd
[docs] Add starcoder2 docs (#29454)
* add accelerate docs

* Apply suggestions from code review

Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>

* Update starcoder2.md

* add correct generation

---------

Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
2024-03-06 06:58:37 +01:00
Arthur
fb1c62e973
[Add Mamba] Adds support for the Mamba models (#28094)
* initial-commit

* start cleaning

* small nits

* small nits

* current updates

* add kernels

* small refactoring little step

* add comments

* styling

* nit

* nits

* Style

* Small changes

* Push dummy mambda simple slow

* nit

* Use original names

* Use original names and remove norm

* Updates for inference params

* Style nd updates

* nits

* Match logits

* Add a test

* Add expected generated text

* nits doc, imports and styling

* style

* oups

* dont install kernels, invite users to install the required kernels

* let use use the original packages

* styling

* nits

* fix some copieds

* update doc

* fix-copies

* styling done

* nits

* fix import check

* run but wrong cuda ress

* mamba CUDA works :)

* fix the fast path

* config naming nits

* conversion script is not required at this stage

* finish fixing the fast path: generation make sense now!

* nit

* Let's start working on the CIs

* style

* better style

* more nits

* test nit

* quick fix for now

* nits

* nit

* nit

* nit

* nits

* update test rest

* fixup

* update test

* nit

* some fixes

* nits

* update test values

* fix styling

* nit

* support peft

* integrations tests require torchg

* also add slow markers

* styling

* chose forward wisely

* nits

* update tests

* fix gradient checkpointing

* fixup

* nit

* fix doc

* check copies

* fix the docstring

* fix some more tests

* style

* fix beam search

* add init schene

* update

* nit

* fix

* fixup the doc

* fix the doc

* fixup

* tentative update but slow is no longer good

* nit

* should we always use float32?

* nits

* revert wrong changes

* res in float32

* cleanup

* skip fmt for now

* update generation values

* update test values running original model

* fixup

* update tests + rename inference_params to cache_params + make sure training does not use cache_params

* small nits

* more nits

* fix final CIs

* style

* nit doc

* I hope final doc nits

* nit

* 🫠

* final touch!

* fix torch import

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Apply suggestions from code review

* fix fix and fix

* fix base model prefix!

* nit

* Update src/transformers/models/mamba/__init__.py

* Update docs/source/en/model_doc/mamba.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* nit

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-03-05 20:01:06 +09:00
NielsRogge
836921fdeb
Add UDOP (#22940)
* First draft

* More improvements

* More improvements

* More fixes

* Fix copies

* More improvements

* More fixes

* More improvements

* Convert checkpoint

* More improvements, set up tests

* Fix more tests

* Add UdopModel

* More improvements

* Fix equivalence test

* More fixes

* Redesign model

* Extend conversion script

* Use real inputs for conversion script

* Add image processor

* Improve conversion script

* Add UdopTokenizer

* Add fast tokenizer

* Add converter

* Update README's

* Add processor

* Add fully fledged tokenizer

* Add fast tokenizer

* Use processor in conversion script

* Add tokenizer tests

* Fix one more test

* Fix more tests

* Fix tokenizer tests

* Enable fast tokenizer tests

* Fix more tests

* Fix additional_special_tokens of fast tokenizer

* Fix tokenizer tests

* Fix more tests

* Fix equivalence test

* Rename image to pixel_values

* Rename seg_data to bbox

* More renamings

* Remove vis_special_token

* More improvements

* Add docs

* Fix copied from

* Update slow tokenizer

* Update fast tokenizer design

* Make text input optional

* Add first draft of processor tests

* Fix more processor tests

* Fix decoder_start_token_id

* Fix test_initialization

* Add integration test

* More improvements

* Improve processor, add test

* Add more copied from

* Add more copied from

* Add more copied from

* Add more copied from

* Remove print statement

* Update README and auto mapping

* Delete files

* Delete another file

* Remove code

* Fix test

* Fix docs

* Remove asserts

* Add doc tests

* Include UDOP in exotic model tests

* Add expected tesseract decodings

* Add sentencepiece

* Use same design as T5

* Add UdopEncoderModel

* Add UdopEncoderModel to tests

* More fixes

* Fix fast tokenizer

* Fix one more test

* Remove parallelisable attribute

* Fix copies

* Remove legacy file

* Copy from T5Tokenizer

* Fix rebase

* More fixes, copy from T5

* More fixes

* Fix init

* Use ArthurZ/udop for tests

* Make all model tests pass

* Remove UdopForConditionalGeneration from auto mapping

* Fix more tests

* fixups

* more fixups

* fix the tokenizers

* remove un-necessary changes

* nits

* nits

* replace truncate_sequences_boxes with truncate_sequences for fix-copies

* nit current path

* add a test for input ids

* ids that we should get taken from c9f7a32f57

* nits converting

* nits

* apply ruff

* nits

* nits

* style

* fix slow order of addition

* fix udop fast range as well

* fixup

* nits

* Add docstrings

* Fix gradient checkpointing

* Update code examples

* Skip tests

* Update integration test

* Address comment

* Make fixup

* Remove extra ids from tokenizer

* Skip test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update year

* Address comment

* Address more comments

* Address comments

* Add copied from

* Update CI

* Rename script

* Update model id

* Add AddedToken, skip tests

* Update CI

* Fix doc tests

* Do not use Tesseract for the doc tests

* Remove kwargs

* Add original inputs

* Update casting

* Fix doc test

* Update question

* Update question

* Use LayoutLMv3ImageProcessor

* Update organization

* Improve docs

* Update forward signature

* Make images optional

* Remove deprecated device argument

* Add comment, add add_prefix_space

* More improvements

* Remove kwargs

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-04 18:49:02 +01:00