Commit Graph

245 Commits

Author SHA1 Message Date
Younes Belkada
87a351818e
Docs / Quantization: refactor quantization documentation (#30942)
* refactor quant docs

* delete file

* rename to overview

* fix

* fix table

* fix

* add content

* fix library versions

* fix table

* fix table

* fix table

* fix table

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* replace to quantization_config

* fix aqlm snippet

* add DLAI courses

* fix

* fix table

* fix bulet points

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-23 14:31:52 +02:00
Lysandre Debut
a42844955f
Loading GGUF files support (#30391)
* Adds support for loading GGUF files

Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>

* add q2_k q3_k q5_k support from @99991

* fix tests

* Update doc

* Style

* Docs

* fix CI

* Update docs/source/en/gguf.md

* Update docs/source/en/gguf.md

* Compute merges

* change logic

* add comment for clarity

* add comment for clarity

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change logic

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_gguf_pytorch_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* put back comment

* add comment about mistral

* comments and added tests

* fix unconsistent type

* more

* fix tokenizer

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address comments about tests and tokenizer + add added_tokens

* from_gguf -> gguf_file

* replace on docs too

---------

Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 14:28:20 +02:00
Raushan Turganbay
bd9f4d7951
Add Video Llava (#29733)
* add model draft

* update docstring

* add tests

* support image and video as input

* update for better handling of mixed input and clean-up a bit

* bug when mixed inputs & add tests

* Update README.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Merge remote-tracking branch 'upstream/main' into video_llava

* link to abstract of paper in README

* fix test

* fix-copies

* make tests happy

* skip docstest for now

* do not run doctest for now

* Update src/transformers/models/video_llava/processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/image_processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/image_processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/image_processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/image_processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/video_llava/test_modeling_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/image_processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address review comments

* failing tests

* Fix vocab_size in common tests for VLMs

* codestyle

* Update src/transformers/models/video_llava/configuration_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/configuration_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/modeling_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/modeling_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/video_llava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/video_llava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/image_processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/video_llava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/processing_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/video_llava/test_modeling_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/video_llava/test_modeling_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/video_llava/test_modeling_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* PR suggestions

* fix-copies

* Update src/transformers/models/video_llava/configuration_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/video_llava/configuration_video_llava.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add full example in docs

* clean-up with new model-id

* [run-slow] video_llava

* update docstring

* [run-slow] video_llava

* remove all achive maps

* fix some tests

* test was supposed to be skipped for llava :)

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 16:42:29 +05:00
Pablo Montalvo
1360801a69
Add PaliGemma (#30814)
* add new model like

* add state dict slicing + new model config

* update palma config and weights, passes vision activations

* fix

* update

* reorder loading/unpacking

* clean up

* add debug statements

* change device

* fix

* debugging

* fix noncausal mask

* fixup sdpa + causal mask

* fix activation function

* remove debug before changing modeling file

* add variants

* debug attention mask in generate

* revert to non-debug sdpa

* revert gemma modifications

* add custom language modeling

* use Processor

* add language modeling file to init

* try thin wrapper around generate

* Update

* update mask

* breakpoints galore

* remove conflict

* switch to left-padding

* add incomplete model doc

* add paligemma global files

* batch rename paligemma

* make generation match outputs and captioning

* style

* style

* remove copied from + doc

* remove more copied from

* remove copy from projector

* minor fix

* update config and style

* add readme - dummy

* CORRECT image captioning

* moving to args

* add siglip proper + fix merging image + text features

* take update_causal_mask from upstream

* remove breakpoint

* leverage AutoModel

* fix input_ids slicing

* make siglip head conditional

* remove encoder_decoder value

* remove unneeded modeling file

* add commented 4d attention mask

* FIXED generation with 4D mask

* Update src/transformers/models/siglip/modeling_siglip.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix left padding detection

* shuffle order of verifications

* fix missing labels for training

* fix

* vectorize merging of features, improve slicing

* improve testing before conversion

* handle merging in processor

* image token index depends on checkpoint

* add variants, save processor too

* save processors, base tokenizer off spm file

* expand model embeddings due to additional image token

* pass image processing args

* add convert rgb to siglip processor

* add \n token separately

* fix tokenizer and prompts

* fix docstrings

* change to camel

* fix casing

* debug pos_ids and sdpa

* pass and use cache_position

* add flag for newline tokenization

* Update src/transformers/models/paligemma/processing_paligemma.py

Co-authored-by: Merve Noyan <merveenoyan@gmail.com>

* simplify conversion script

* add copied from

* add precision to conversion script

* Update src/transformers/models/paligemma/modeling_paligemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* clean up

* Shift attention mask from `1:`

After discussion with @molbap

* add docs, fix quality

* quality, tied weights inheritance, and logits/label alignment

* fix more tests

* pass attn_implementation to language model correctly

* add SiglipVisionTransformer to no split modules

* skip paligemma test for sdpa dispatch to flash

* skip incompatible tests

* quality

* [broken archive maps]

* Apply suggestions

- remove archive lists
- style
- take shape of inputs_embeds for batch

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/dummy_pt_objects.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* simplify conversion script

* add suggestions

* add suggestions

* add copied from

* fix

* move labels out

* revert

* fix

* remove placeholder labels if None

* use cache_position

* fix quality + docstrings

* fix quality

* fix paligemma 4d gemma mask incompatibility

* fix config docstring

* fix query and attn_mask dtype

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-05-14 22:07:15 +02:00
Yikang Shen
ccdabc5642
Add JetMoE model (#30005)
* init jetmoe code

* update archive maps

* remove flax import

* fix import error

* update README

* ruff fix

* update readme

* fix

* update config

* fix issue

* merge files

* fix model bug

* fix test

* auto fix

* model size

* add comments

* fix form

* add flash attention support

* fix attention head number

* fix init

* fix support list

* sort auto mapping

* fix test

* fix docs

* update test

* fix test

* fix test

* change variable name

* fix config

* fix init

* update format

* clean code

* fix config

* fix config

* change default config

* update config

* fix issues

* update formate

* update config argument

* update format

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* change to mixtral aux loss

* change to cache_position

* debug

* fix bugs

* debug

* fix format

* fix format

* fix copy

* fix format

* fix format

* fix sort

* fix sort

* fix sort

* add copy comment

* add copy from

* remove debug code

* revert readme update

* add copy

* debug

* remove debug code

* fix flash attention

* add comments

* clean code

* clean format

* fix format

* fix format

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* change variable name

* add copied from

* fix variable name

* remove deprecated functinos

* sync to llama implementation

* fix format

* fix copy

* fix format

* update format

* remove repr

* add comment for moe weight

* fix copy

* Update src/transformers/models/jetmoe/configuration_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add comments and reformat config

* fix format

* fix format

* fix format

* update test

* update doc string in config

* Update src/transformers/models/jetmoe/modeling_jetmoe.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update config doc

* update attention cache

* fix format

* fix copy

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-14 16:32:01 +02:00
Aymeric Roucher
0ba15cedbc
Reboot Agents (#30387)
* Create CodeAgent and ReactAgent

* Fix formatting errors

* Update documentation for agents

* Add custom errors, improve logging

* Support variable usage in ReactAgent

* add messages

* Add message passing format

* Create React Code Agent

* Update

* Refactoring

* Fix errors

* Improve python interpreter

* Only non-tensor inputs should be sent to device

* Calculator tool slight refactor

* Improve docstrings

* Refactor

* Fix tests

* Fix more tests

* Fix even more tests

* Fix tests by replacing output and input types

* Fix operand type issue

* two small fixes

* EM TTS

* Fix agent running type errors

* Change text to speech tests to allow changed outputs

* Update doc with new agent types

* Improve code interpreter

* If max iterations reached, provide a real answer instead of an error

* Add edge case in interpreter

* Add safe imports to the interpreter

* Interpreter tweaks: tuples and listcomp

* Make style

* Make quality

* Add dictcomp to interpreter

* Rename ReactJSONAgent to ReactJsonAgent

* Misc changes

* ToolCollection

* Rename agent's logger to self.logger

* Add while loops to interpreter

* Update doc with new tools. still need to mention collections

* Add collections to the doc

* Small fixes on logs and interpretor

* Fix toolbox return type

* Docs + fixup

* Skip doctests

* Correct prompts with improved examples and formatting

* Update prompt

* Remove outdated docs

* Change agent to accept Toolbox object for tools

* Remove calculator tool

* Propagate removal of calculator in doc

* Fix 2 failing workflows

* Simplify additional argument passing

* AgentType audio

* Minor changes: function name, types

* Remove calculator tests

* Fix test

* Fix torch requirement

* Fix final answer tests

* Style fixes

* Fix tests

* Update docstrings with calculator removal

* Small type hint fixes

* Update tests/agents/test_translation.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_python_interpreter.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/default_tools.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/tools.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_agents.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bert/configuration_bert.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/tools.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/speech_to_text.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_speech_to_text.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_tools_common.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* pygments

* Answer comments

* Cleaning up

* Simplifying init for all agents

* Improving prompts and making code nicer

* Style fixes

* Add multiple comparator test in interpreter

* Style fixes

* Improve BERT example in documentation

* Add examples to doc

* Fix python interpreter quality

* Logging improvements

* Change test flag to agents

* Quality fix

* Add example for HfEngine

* Improve conversation example for HfEngine

* typo fix

* Verify doc

* Update docs/source/en/agents.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/agents.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/prompts.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/python_interpreter.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/agents.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix style issues

* local s2t tool

---------

Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-07 12:59:49 +02:00
Matt
f39627125b
Add sidebar tutorial for chat models (#30401)
* Draft tutorial for talking to chat models

* Reformat lists and text snippets

* Cleanups and clarifications

* Finish up remaining TODOs

* Correct section link

* Small fix

* Add proper quantization examples

* Add proper quantization examples

* Add proper quantization examples

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix Text Generation Pipeline link and add a ref to the LLM inference guide

* intelligent -> capable

* Small intro cleanup

* Small text cleanup

* Small text cleanup

* Clarification about system message

* Clarification about system message

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-04-25 19:38:48 +01:00
Gustavo de Rosa
c9693db2fc
Phi-3 (#30423)
* chore(root): Initial commit of Phi-3 files.

* fix(root): Fixes Phi-3 missing on readme.

* fix(root): Ensures files are consistent.

* fix(phi3): Fixes unit tests.

* fix(tests): Fixes style of phi-3 test file.

* chore(tests): Adds integration tests for Phi-3.

* fix(phi3): Removes additional flash-attention usage, .e.g, swiglu and rmsnorm.

* fix(phi3): Fixes incorrect docstrings.

* fix(phi3): Fixes docstring typos.

* fix(phi3): Adds support for Su and Yarn embeddings.

* fix(phi3): Improves according first batch of reviews.

* fix(phi3): Uses up_states instead of y in Phi3MLP.

* fix(phi3): Uses gemma rotary embedding to support torch.compile.

* fix(phi3): Improves how rotary embedding classes are defined.

* fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.

* fix(phi3): Adds last suggestions to modeling file.

* fix(phi3): Splits inv_freq calculation in two lines.
2024-04-24 17:32:09 +02:00
Arthur
89c510d842
Add llama3 (#30334)
* nuke

* add co-author

* add co-author

* update card

* fixup and fix copies to please our ci

* nit fixup

* super small nits

* remove tokenizer_path from call to `write_model`

* always safe serialize by default

---------

Co-authored-by: pcuenca <pcuenca@users.noreply.github.com>
Co-authored-by: xenova <xenova@users.noreply.github.com>
2024-04-24 10:11:19 +02:00
Matt
696ededd2b
Remove old TF port docs (#30426)
* Remove old TF port guide

* repo-consistency

* Remove some translations as well for consistency

* Remove some translations as well for consistency
2024-04-23 16:06:20 +01:00
Steven Liu
e74d793a3c
[docs] LLM inference (#29791)
* first draft

* feedback

* static cache snippet

* feedback

* feedback
2024-04-22 12:41:51 -07:00
Abhi Venigalla
005b957fb8
Add DBRX Model (#29921)
* wip

* fix __init__.py

* add docs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments 1

* work on make fixup

* pass configs down

* add sdpa attention

* remove DbrxBlock

* add to configuration_auto

* docstring now passes formatting test

* fix style

* update READMEs

* add dbrx to modeling_auto

* make fix-copies generated this

* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP

* config docstring passes formatting test

* rename moe_loss_weight to router_aux_loss_coef

* add to flash-attn documentation

* fix model-path in tests

* Explicitly make `"suli"` the default `ffn_act_fn`

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]

* fix _flash_attn_uses_top_left_mask and is_causal

* fix tests path

* don't use token type IDs

* follow Llama and remove token_type_ids from test

* init ConfigTester differently so tests pass

* remove multiple choice test

* remove question + answer test

* remove sequence classification test

* remove token classification test

* copy Llama tests and remove token_type_ids from test inputs

* do not test pruning or headmasking; style code

* add _tied_weights_keys parameter to pass test

* add type hints

* fix type check

* update config tester

* remove masked_lm test

* remove encoder tests

* initialize DbrxModelTester with correct params

* style

* torch_dtype does not rely on torch

* run make fixup, fix-copies

* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py

* add copyright info

* fix imports and DbrxRotaryEmbedding

* update DbrxModel docstring

* use copies

* change model path in docstring

* use config in DbrxFFN

* fix flashattention2, sdpaattention

* input config to DbrXAttention, DbrxNormAttentionNorm

* more fixes

* fix

* fix again!

* add informative comment

* fix ruff?

* remove print statement + style

* change doc-test

* fix doc-test

* fix docstring

* delete commented out text

* make defaults match dbrx-instruct

* replace `router_aux_loss_coef` with `moe_loss_weight`

* is_decoder=True

* remove is_decoder from configtester

* implement sdpa properly

* make is_decoder pass tests

* start on the GenerationTesterMixin tests

* add dbrx to sdpa documentation

* skip weight typing test

* style

* initialize smaller model

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Add DBRX to toctree

* skip test_new_cache_format

* make config defaults smaller again

* add pad_token_id

* remove pad_token_id from config

* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP

* Update src/transformers/models/dbrx/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/dbrx.md

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/models/dbrx/configuration_dbrx.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/dbrx.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix typo

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update docs, fix configuration_auto.py

* address pr comments

* remove is_decoder flag

* slice

* fix requires grad

* remove grad

* disconnect differently

* remove grad

* enable grads

* patch

* detach expert

* nissan al ghaib

* Update modeling_dbrx.py

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* replace "Gemma" with "Dbrx"

* remove # type: ignore

* don't hardcode vocab_size

* remove ToDo

* Re-add removed idefics2 line

* Update test to use tiny-random!

* Remove TODO

* Remove one more case of loading the entire dbrx-instruct in the tests

* Update src/transformers/models/dbrx/modeling_dbrx.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* address some comments

* small model

* add dbrx to tokenization_auto

* More docstrings with add_start_docstrings

* Dbrx for now

* add PipelineTesterMixin

* Update src/transformers/models/dbrx/configuration_dbrx.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove flash-attn2 import error

* fix docstring

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add useage example

* put on one line

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix ffn_act_fn

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change "dbrx" to "DBRX" for display purposes.

* fix __init__.py?

* fix __init__.py

* fix README

* return the aux_loss

* remove extra spaces

* fix configuration_auto.py

* fix format in tokenization_auto

* remove new line

* add more useage examples

---------

Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-18 15:18:52 +02:00
tomeras91
3f20877da9
Add jamba (#29943)
* Add jamba arch

* apply "make fix-copies" changes

* fix link to model in JambaConfig docstring

* Add n_ctx in modeling file because repo-consistency wants that

* Add jamba to flash attention and sdpa documentation

* mamba dt_proj quant fix now works for LoRA as well

* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers

* add jamba to tokenization auto

* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)

* simple PR fixes

* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer

* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)

* Add copied comment on JambaMLP (it's the same as MixtralMLP)

* remove padding_mask warnings. It's not supported anymore

* fix docstring. Float instead of int

* A few more minor PR fixes

* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass

* Return None attention weights from mamba layers. Append to all attentions only if not None.

* remove some leftover jamba archive lists

* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel

* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers

* Add Jamba paper on READMEs

* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)

* Add copied from comment

* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms

* clearer docstring for _convert_to_standard_cache

* style fixes

* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs

* rename test so it still overrides what its meant to override

* draft

* oups

* nit

* remove more complexe logic

* fix names used in config

* fix fix fix

* style

* fix some more failing tests

* generate did not init the cache 🙃

* more small nits

* typo

* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes

* fix init of pkv with torch.tensor()

* empty tensor

* fix some init issues

* stupid changes required by generate because it does not even support it's own DynamicCache class

* more fixes

* fix general assisted gen cache_position bug

* tests passing

* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py

* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache

* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore

* fix docstrings and typehints for past_key_values

* style fixes

* fix docs

* change typehint due to copy from Mixtral

* forgot import

* import order

* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)

* Add integration test with tiny tandom Jamba model on hub

* fix flash attention cache shapes

* bring back forgotten hidden states

* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model

* align integration test after modeling fixes

* bugfix - mamba can use precomputed states only of forward pass is on a single token

* bugfix - mamba can use precomputed states only if they match the batch size

* typo

* remove making _prepare_4d_causal_attention_mask a leaf function

* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-04-18 11:04:02 +02:00
Shane A
e4ea19b958
Add OLMo model family (#29890)
* Add OLMo using add-new-model-like with Llama

* Fix incorrect tokenizer for OLMo

* Copy-paste relevant OLMo methods and their imports

* Add OLMo config

* Modify OLMo config to follow HF conventions

* Remove unneeded Llama code from OLMo model

* Add ability for OLMo model to output attentions

* Add OLMoPreTrainedModel and OLMoModel

* Add OLMoForCausalLM

* Minor fixes to OLMo model for style and missing functions

* Implement OLMo tokenizer

* Implement OLMo to HF conversion script

* Add tests for OLMo model

* Add tests for OLMo fast tokenizer

* Add auto-generated dummy objects

* Remove unimplemented OLMo classes from auto and init classes and re-format

* Add README and associated auto-generated files

* Use OLMo names for common properties

* Run make fixup

* Remove `|` from OLMo typing

* Remove unneeded tokenization_olmo.py

* Revert model, config and converter to add-new-model-like Llama

* Move logic for adding bos/eos token into GPTNeoxTokenizerFast

* Change OLMoConfig defaults to match OLMo-7B

* Use GPTNeoXToknizerFast in OLMo tokenizer tests

* Modify auto-generated OLMoModelTests to work for OLMo

* Add non-parametric layer norm OLMoLayerNorm

* Update weight conversion script for OLMo

* Fix __init__ and auto structure for OLMo

* Fix errors from make fixup

* Remove OLMoTokenizerFast from documentation

* Add missing 'Copied from' for OLMoModel._update_causal_mask

* Run make fix-copies

* Rearrange string replacements in OLMoForCausalLM Copied from

* Move OLMo and Llama CausalLM.forward example into global constants

* Fix OLMO_GENERATION_EXAMPLE doc string typo

* Add option for qkv clipping to OLMo

* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf

* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf

* Fix OLMo tokenization bug using conversion script

* Keep model in full precision after conversion

* Do not add eos token automatically

* Update references to OLMo model in HF Hub

* Do not add eos token during encoding by default

* Fix Llama generation example

* Run make fixup

* OLMo 7B integration test fix

* Remove unneeded special case for OLMoConfig

* OLMo 7B Twin 2T integration test fix

* Fix test_model_7b_greedy_generation

* Remove test_compile_static_cache

* Fix OLMo and Llama generation example

* Run make fixup

* Revert "OLMo 7B integration test fix"

This reverts commit 4df56a4b15.

* Revert "OLMo 7B Twin 2T integration test fix"

This reverts commit 9ff65a4a29.

* Ungate 7B integration tests and fix greedy generation test

* Add retries for flaky test_eager_matches_sdpa_generate

* Fix output of doc example for OLMoForCausalLM.forward

* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model

* Try fix incorrect characters in OLMoForCausalLM.forward doct test

* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes

* Remove pretraining_tp from OLMo config and model

* Add missing 'Copied from' instances

* Remove unneeded causal_mask from OLMoModel

* Revert Llama changes

* Ignore copy for OLMoForCausalLM.forward

* Change 'OLMo' to 'Olmo' in classes

* Move minimal OLMo tokenization tests to model tests

* Add missed 'Copied from' for repeat_kv
2024-04-17 17:59:07 +02:00
amyeroberts
6b78360e6d
Add Idefics2 (#30253)
* Initial add model additions

* Test

* All weights loading

* Can perform full forward pass

* Local and remote the same

* Matching local and remote

* Fixup

* Idefics2Model importable; fixup docstrings

* Don't skip by default

* Remove deprecated use_resampler arg

* Remove self.config

* DecoupledLinear takes config

* Tidy up

* Enable eager attention and tidy up

* Most tests passing

* Update for batch of processed images

* Add image processor

* Update doc pages

* Update conversion script

* Remove erroneous breakpoint

* Remove accidendtal spelling change

* Update to reflect changes on hub - make generate work

* Fix up

* Image processor tests

* Update tests

* Add a processor

* Add a processor

* Update convert script

* Update modeling file - remove fixmes

* Bug fix

* Add processing test

* Use processor

* Fix up

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Fix test

* Update config - PR comments and defaults align with checkpoint

* Reviewer comments

* Add copied froms for flahs attention

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove qk_layer_norm and freeze_layers functionality

* Fix

* Remove freeze_layer options from config

* Sync with upstream main

* Fix attention shapes siglip

* Remove Llava-next refs - TO REBASE

* Use AutoModel for text model

* Add comment to explain vision embeddings

* Fix issue with tie_word_embeddings

* Address review comments

* Fix and fix up

* Chat templates for idefics

* Fix copies

* Fix

* Add layer norms to FA2

* Fix tests

* Apply suggestions from code review

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Fix

* Review comments

* Update src/transformers/models/idefics2/modeling_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update inputs merger

* Merge weights in correct order

* Update convert script

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update template

* Model code examples (fix idefics too)

* More review comments

* Tidy up

* Update processing

* Fix attention mask preparation

* Update inputs_merger inputs

* Vectorize inputs_merger

* Update src/transformers/models/idefics2/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/idefics2/modeling_idefics2.py

* Review comments

* saying bye to the `qk_layer_norms`

* Simplify

* Update latents

* Remove erroneuous readme changes

* Return images when applying chat template

* Fix bug - prompt images are for a single sample

* Update src/transformers/models/idefics2/modeling_idefics2.py

* image splitting

* fix test

* some more comment

* some comment

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics2/image_processing_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update processor

* Update model tests

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Don't add BOS in template

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Remove index in examples

* Update tests to reflect #13

* Update src/transformers/models/idefics2/processing_idefics2.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* PR comment - consistent typing

* Update readme and model doc

* Update docs

* Update checkpoint references

* Update examples

* Fix and update tests

* Small addition

* Update tests - remove copied from as no ignore placement copy could be found

* Update example

* small fixes

* Update docs/source/en/model_doc/idefics2.md

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update docs/source/en/model_doc/idefics2.md

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Update README.md

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* Connector model as bridge

* Fix up

* Fix up

* Don't pass model inputs for generation kwargs update

* IDEFICS-2 -> Idefics2

* Remove config archive name

* IDEFICS-2 -> Idefics2

* Add back llava-next

* Update readmes

* Add requirements for processor tester

* Use custom convert_to_rgb to avoid possible BC

* Fix doc example

* Fix doc example

* Skip model doc tests - as model to large

* More doc example - account for image splitting

* Update src/transformers/image_transforms.py

* Fix config doctest

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-04-15 17:03:03 +01:00
Eduardo Pacheco
b752ad3019
Adding grounding dino (#26087)
* Fixed typo when converting weigths to GroundingDINO vision backbone

* Final modifications on modeling

* Removed unnecessary class

* Fixed convert structure

* Added image processing

* make fixup partially completed

* Now text_backbone_config has its own class

* Modified convert script

* Removed unnecessary config attribute

* Added new function to generate sub sentence mask

* Renamed parameters with gamma in the name as it's currently not allowed

* Removed tokenization and image_processing scripts since we'll map from existing models

* Fixed some issues with configuration

* Just some modifications on conversion script

* Other modifications

* Copied deformable detr

* First commit

* Added bert to model

* Bert validated

* Created Text and Fusion layers for Encoder

* Adapted Encoder layer

* Fixed typos

* Adjusted Encoder

* Converted encoder to hf

* Modified Decoder Layer

* Modified main decoder class

* Removed copy comments

* Fixed forward from GroundingDINOModel and GroundingDINODecoder

* Added all necessary layers, configurations and forward logic up to GroundingDINOModel

* Added all layers to convertion

* Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection

* Fixed mask input to encoders and fixed nn.MultiheadAttention batch first and attn output

* Fixed forward from GroundingDINOTextEnhancerLayer

* Fixed output bug with GroundingDINODeformableLayer

* Fixed bugs that prevent GroundingDINOForObjectDetection to run forward method

* Fixed attentions to be passed correctly

* Passing temperature arg when creating Sine position embedding

* Removed copy comments

* Added temperature argument for position embedding

* Fixed typo when converting weigths to GroundingDINO vision backbone

* Final modifications on modeling

* Removed unnecessary class

* Fixed convert structure

* Added image processing

* make fixup partially completed

* Now text_backbone_config has its own class

* Modified convert script

* Removed unnecessary config attribute

* Added new function to generate sub sentence mask

* Renamed parameters with gamma in the name as it's currently not allowed

* Removed tokenization and image_processing scripts since we'll map from existing models

* Fixed some issues with configuration

* Just some modifications on conversion script

* Other modifications

* Fix style

* Improve fixup

* Improve conversion script

* Improve conversion script

* Add GroundingDINOProcessor

* More improvements

* Return token type ids

* something

* Fix more tests

* More improvements

* More cleanup

* More improvements

* Fixed tests, improved modeling and config

* More improvements and fixing tests

* Improved tests and modeling

* Improved tests and added image processor

* Improved tests inference

* More improvements

* More test improvements

* Fixed last test

* Improved docstrings and comments

* Fix style

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Better naming

* Better naming

* Added Copied statement

* Added Copied statement

* Moved param init from GroundingDINOBiMultiHeadAttention

* Better naming

* Fixing clamp style

* Better naming

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/configuration_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Improving conversion script

* Improved config

* Improved naming

* Improved naming again

* Improved grouding-dino.md

* Moved grounding dino to multimodal

* Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py

Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>

* Fixed docstrings and style

* Fix docstrings

* Remove timm attributes

* Reorder imports

* More improvements

* Add Grounding DINO to pipeline

* Remove model from check_repo

* Added grounded post_process to GroundingDINOProcessor

* Fixed style

* Fixed GroundingDINOTextPrenetConfig docstrings

* Aligned inputs.keys() when both image and text are passed with model_input_names

* Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor

* Testing post_process_grounded_object_detection from GroundingDINOProcessor at test_inference_object_detection_head

* Fixed order

* Marked test with require_torch

* Temporarily changed repo_id

* More improvements

* Fix style

* Final improvements

* Improve annotators

* Fix style

* Add is_torch_available

* Remove type hints

* vocab_tokens as one liner

* Removed print statements

* Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig

* remove unnecessary comments

* Removed unnecessary tests on conversion script

* Renamed GroundingDINO to camel case GroundingDino

* Fixed GroundingDinoProcessor docstrings

* loading MSDA kernels in the modeling file

* Fix copies

* Replace nn.multiheadattention

* Replace nn.multiheadattention

* Fixed inputs for GroundingDinoMultiheadAttention & order of modules

* Fixed processing to avoid messing with inputs

* Added more tips for GroundingDino

* Make style

* Chaning name to align with SAM

* Replace final nn.multiheadattention

* Fix model tests

* Update year, remove GenerationTesterMixin

* Address comments

* Address more comments

* Rename TextPrenet to TextModel

* Rename hidden_states

* Address more comments

* Address more comments

* Address comment

* Address more comments

* Address merge

* Address comment

* Address comment

* Address comment

* Make style

* Added layer norm eps to layer norms

* Address more comments

* More fixes

* Fixed equivalence

* Make fixup

* Remove print statements

* Address comments

* Address comments

* Address comments

* Address comments

* Address comments

* Address comments

* Add comment

* Address comment

* Remove overwriting of test

* Fix bbox_embed

* Improve decoder_bbox_embed_share

* Simplify outputs

* Updated post_process_grounded_object_detection

* Renamed sources to feature_maps

* Improved tests for Grounding Dino ImageProcessor and Processor

* Fixed test requirements and imports

* Fixed image_processing

* Fixed processor tests

* Fixed imports for image processing tests

* Fix copies

* Updated modeling

* Fix style

* Moved functions to correct position

* Fixed copy issues

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Keeping consistency custom cuda kernels for MSDA

* Make GroundingDinoProcessor logic clearer

* Updated Grounding DINO checkpoints

* Changed tests to correct structure

* Updated gpu-cpu equivalence test

* fix copies

* Update src/transformers/models/grounding_dino/processing_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/processing_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/configuration_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixed erros and style

* Fix copies

* Removed inheritance from PreTrainedModel from GroundingDinoTextModel

* Fixed GroundingDinoTextModel

* Fixed type of default backbone config

* Fixed missing methods for GroundingDinoTextModel and Added timm support for GroundingDinoConvEncoder

* Addressed comments

* Addressed batched image processing tests

* Addressed zero shot test comment

* Addressed tip comment

* Removed GroundingDinoTextModel from check_repo

* Removed inplace masking

* Addressed comments

* Addressed comments

* Addressed comments

* Fix copies

* Fixing timm test

* Fixed batching equivalence test

* Update docs/source/en/model_doc/grounding-dino.md

Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>

* Update docs/source/en/model_doc/grounding-dino.md

Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>

* Update docs/source/en/model_doc/grounding-dino.md

Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>

* Addressed more comments

* Added a new comment

* Reduced image size

* Addressed more comments

* Nits

* Nits

* Changed the way text_config is initialized

* Update src/transformers/models/grounding_dino/processing_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
2024-04-11 08:32:16 +01:00
Arthur
0fe44059ae
Add recurrent gemma (#30143)
* Fork.

* RecurrentGemma initial commit.

* Updating __init__.py.

* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.

* Reformat code to 4 spaces.
Fixed a few typos.

* Fixed the forward pass.
Still unclear on the cache?

* Fixed the RecurrentGemmaForCausalLM

* Minor comment that we might not need attention_mask and output_attention arguments.

* Now cache should work as well.

* Adding a temporary example to check whether the model generation works.

* Adding the tests and updating imports.

* Adding the example file missing in the previous commit.

* First working example.

* Removing .gitignore and reverting parts of __init__.

* Re-add .gitignore.

* Addressing comments for configuration.

* Move mask creation to `_prepare_inputs_for_generation`.

* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed

* Transfoering between machines.

* Running normal tests.

* Minor fix.

* More fixes.

* Addressing more comments.

* Minor fixes.

* first stab at cleanup

* more refactoring

* fix copies and else

* renaming and get init to work

* fix causal mask creation

* update

* nit

* fix a hell lot of things

* updates

* update conversion script

* make all keys importable

* nits

* add auto mappings

* properly convert ffw_up and down

* add scaling

* fix generations

* for recurrent dtype

* update

* fix going beyong window

* fixup

* add missing files

* current updates to remove last einops

* finish modeling refactor

* TADA

* fix compile

* fix most failing testt ? ?

* update tests

* refactor and update

* update

* nits, fixup and update tests

* more fixup

* nits

* fix imports

* test format

* fixups

* nits

* tuple typing

* fix code quality

* add model card

* fix doc

* skip most generation tests

* nits

* style

* doc fixes

* fix pr and check_copies?

* last nit

* oupsy

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update

* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update based on review

* doc nit

* fix quality

* quality

* fix slow test model path

* update default dype

* ignore attributes that can be safely ignored in check config attributes

* 0lallalala come on

* save nit

* style

* remove to dict update

* make sure we can also run in float16

* style

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-10 16:59:13 +02:00
Steven Liu
096f304695
[docs] Big model loading (#29920)
* update

* feedback
2024-04-01 18:47:32 -07:00
Bo Zheng
1c39974a4c
Add Qwen2MoE (#29377)
* add support for qwen2 MoE models

* update docs

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* Update README.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* fixup

* add archive back

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fixup

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* add archive back

* fix integration test

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 02:11:55 +01:00
NielsRogge
d91fd7f92c
Add LLaVa-1.6, bis (#29586)
* First draft

* Fix tests, add docs

* Improve docstrings

* Fix test

* Address comments

* Address comments

* Remove vocab_size attribute

* Remove batch_size

* Address comment

* Add image processor tests

* Support fx

* Update docstring

* Add support for 34b

* Convert 34b model

* Add integration tests

* Update checkpoints

* Convert vicuna-13b, remove doc tests

* Remove script

* Remove file

* Address comments

* Improve docstrings

* Deprecate vocab_size

* Remove aspect_ratio_setting

* Address comments

* Update READMEs

* Add tips about chat templates

* Fix tests

* Deprecate vocab_size safely

* Update tests

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-20 15:51:12 +00:00
StevenBucaille
56baa03380
Implementation of SuperPoint and AutoModelForKeypointDetection (#28966)
* Added SuperPoint docs

* Added tests

* Removed commented part

* Commit to create and fix add_superpoint branch with a new branch

* Fixed dummy_pt_objects

* Committed missing files

* Fixed README.md

* Apply suggestions from code review

Fixed small changes

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py

* Removed AutoModelForKeypointDetection and related stuff

* Fixed inconsistencies in image_processing_superpoint.py

* Moved infer_on_model logic simply in test_inference

* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py

* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale

* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixed from (w, h) to (h, w) as input for tests

* Removed unnecessary condition

* Moved last_hidden_state to be the first returned

* Moved last_hidden_state to be the first returned (bis)

* Moved last_hidden_state to be the first returned (ter)

* Switched image_width and image_height in tests to match recent changes

* Added config as first SuperPointConvBlock init argument

* Reordered README's after merge

* Added missing first config argument to SuperPointConvBlock instantiations

* Removed formatting error

* Added SuperPoint to README's de, pt-br, ru, te and vi

* Checked out README_fr.md

* Fixed README_fr.md

* Test fix README_fr.md

* Test fix README_fr.md

* Last make fix-copies !

* Updated checkpoint path

* Removed unused SuperPoint doc

* Added missing image

* Update src/transformers/models/superpoint/modeling_superpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed unnecessary import

* Update src/transformers/models/superpoint/modeling_superpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added SuperPoint to _toctree.yml

---------

Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
2024-03-19 14:43:02 +00:00
Yoach Lacombe
c43b380e70
Add MusicGen Melody (#28819)
* first modeling code

* make repository

* still WIP

* update model

* add tests

* add latest change

* clean docstrings and copied from

* update docstrings md and readme

* correct chroma function

* correct copied from and remove unreleated test

* add doc to toctree

* correct imports

* add convert script to notdoctested

* Add suggestion from Sanchit

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct get_uncoditional_inputs docstrings

* modify README according to SANCHIT feedback

* add chroma to audio utils

* clean librosa and torchaudio hard dependencies

* fix FE

* refactor audio decoder -> audio encoder for consistency with previous musicgen

* refactor conditional -> encoder

* modify sampling rate logics

* modify license at the beginning

* refactor all_self_attns->all_attentions

* remove ignore copy from causallm generate

* add copied from for from_sub_models

* fix make copies

* add warning if audio is truncated

* add copied from where relevant

* remove artefact

* fix convert script

* fix torchaudio and FE

* modify chroma method according to feedback-> better naming

* refactor input_values->input_features

* refactor input_values->input_features and fix import fe

* add input_features to docstrigs

* correct inputs_embeds logics

* remove dtype conversion

* refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation

* change warning for chroma length

* Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* change way to save wav, using soundfile

* correct docs and change to soundfile

* fix import

* fix init proj layers

* remove line breaks from md

* fix issue with docstrings

* add FE suggestions

* improve is in logics and remove useless imports

* remove custom from_pretrained

* simplify docstring code

* add suggestions for modeling tests

* make style

* update converting script with sanity check

* remove encoder attention mask from conditional generation

* replace musicgen melody checkpoints with official orga

* rename ylacombe->facebook in checkpoints

* fix copies

* remove unecessary warning

* add shape in code docstrings

* add files to slow doc tests

* fix md bug and add md to not_tested

* make fix-copies

* fix hidden states test and batching

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-03-18 13:06:12 +00:00
Saurabh Dash
0e4a1c3401
Cohere Model Release (#29622)
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-03-15 14:29:11 +01:00
Nate Cibik
1fc505b816
Add PvT-v2 Model (#26812)
* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Updated index.md

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Fixed config docstring. Added channels property

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Fixed config backbone compat

* Ran fix-copies

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Fixed issue from rebase

* Fixed issue from rebase

* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported

* Fixed issue from rebase

* Fixed issue from rebase

* Changed model name in docs

* Removed duplicate PvtV2Backbone

* Work around type switching issue in tests

* Fix model name in config comments

* Update docs/source/en/model_doc/pvt_v2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed old code

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Fixed Class names to be more descriptive

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed outdated code

* Moved paper abstract to single line in pvt_v2.md

* Added usage tips to pvt_v2.md

* Simplified module inits by passing layer_idx

* Fixed typing for hidden_act in PvtV2Config

* Removed unusued import

* Add pvt_v2 to docs/source/en/_toctree.yml

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Move function parameters to single line

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Update year of copyright to 2024

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Make code more explicit

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated sr_ratio to be more explicit spatial_reduction_ratio

* Removed excess type hints in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Move params to single line in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed needless comment in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update copyright date in pvt_v2.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved params to single line in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated copyright date in configuration_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Cleaned comments in modeling_pvt_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Renamed spatial_reduction Conv2D operation

* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
"

This reverts commit c4a04416dd.

* Updated conversion script to reflect module name change

* Deprecated reshape_last_stage option in config

* Removed unused imports

* Code formatting

* Fixed outdated decorators on test_inference_fp16

* Added "Copied from" comments in test_modeling_pvt_v2.py

* Fixed import listing

* Updated model name

* Force empty commit for PR refresh

* Fixed linting issue

* Removed # Copied from comments

* Added PVTv2 to README_fr.md

* Ran make fix-copies

* Replace all FoamoftheSea hub references with OpenGVLab

* Fixed out_indices and out_features logic in configuration_pvt_v2.py

* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py

* Ran code fixup

* Fixed order of parent classes in PvtV2Config to fix the to_dict method override

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-13 19:05:20 +00:00
Arthur
fb1c62e973
[Add Mamba] Adds support for the Mamba models (#28094)
* initial-commit

* start cleaning

* small nits

* small nits

* current updates

* add kernels

* small refactoring little step

* add comments

* styling

* nit

* nits

* Style

* Small changes

* Push dummy mambda simple slow

* nit

* Use original names

* Use original names and remove norm

* Updates for inference params

* Style nd updates

* nits

* Match logits

* Add a test

* Add expected generated text

* nits doc, imports and styling

* style

* oups

* dont install kernels, invite users to install the required kernels

* let use use the original packages

* styling

* nits

* fix some copieds

* update doc

* fix-copies

* styling done

* nits

* fix import check

* run but wrong cuda ress

* mamba CUDA works :)

* fix the fast path

* config naming nits

* conversion script is not required at this stage

* finish fixing the fast path: generation make sense now!

* nit

* Let's start working on the CIs

* style

* better style

* more nits

* test nit

* quick fix for now

* nits

* nit

* nit

* nit

* nits

* update test rest

* fixup

* update test

* nit

* some fixes

* nits

* update test values

* fix styling

* nit

* support peft

* integrations tests require torchg

* also add slow markers

* styling

* chose forward wisely

* nits

* update tests

* fix gradient checkpointing

* fixup

* nit

* fix doc

* check copies

* fix the docstring

* fix some more tests

* style

* fix beam search

* add init schene

* update

* nit

* fix

* fixup the doc

* fix the doc

* fixup

* tentative update but slow is no longer good

* nit

* should we always use float32?

* nits

* revert wrong changes

* res in float32

* cleanup

* skip fmt for now

* update generation values

* update test values running original model

* fixup

* update tests + rename inference_params to cache_params + make sure training does not use cache_params

* small nits

* more nits

* fix final CIs

* style

* nit doc

* I hope final doc nits

* nit

* 🫠

* final touch!

* fix torch import

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Apply suggestions from code review

* fix fix and fix

* fix base model prefix!

* nit

* Update src/transformers/models/mamba/__init__.py

* Update docs/source/en/model_doc/mamba.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* nit

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-03-05 20:01:06 +09:00
NielsRogge
836921fdeb
Add UDOP (#22940)
* First draft

* More improvements

* More improvements

* More fixes

* Fix copies

* More improvements

* More fixes

* More improvements

* Convert checkpoint

* More improvements, set up tests

* Fix more tests

* Add UdopModel

* More improvements

* Fix equivalence test

* More fixes

* Redesign model

* Extend conversion script

* Use real inputs for conversion script

* Add image processor

* Improve conversion script

* Add UdopTokenizer

* Add fast tokenizer

* Add converter

* Update README's

* Add processor

* Add fully fledged tokenizer

* Add fast tokenizer

* Use processor in conversion script

* Add tokenizer tests

* Fix one more test

* Fix more tests

* Fix tokenizer tests

* Enable fast tokenizer tests

* Fix more tests

* Fix additional_special_tokens of fast tokenizer

* Fix tokenizer tests

* Fix more tests

* Fix equivalence test

* Rename image to pixel_values

* Rename seg_data to bbox

* More renamings

* Remove vis_special_token

* More improvements

* Add docs

* Fix copied from

* Update slow tokenizer

* Update fast tokenizer design

* Make text input optional

* Add first draft of processor tests

* Fix more processor tests

* Fix decoder_start_token_id

* Fix test_initialization

* Add integration test

* More improvements

* Improve processor, add test

* Add more copied from

* Add more copied from

* Add more copied from

* Add more copied from

* Remove print statement

* Update README and auto mapping

* Delete files

* Delete another file

* Remove code

* Fix test

* Fix docs

* Remove asserts

* Add doc tests

* Include UDOP in exotic model tests

* Add expected tesseract decodings

* Add sentencepiece

* Use same design as T5

* Add UdopEncoderModel

* Add UdopEncoderModel to tests

* More fixes

* Fix fast tokenizer

* Fix one more test

* Remove parallelisable attribute

* Fix copies

* Remove legacy file

* Copy from T5Tokenizer

* Fix rebase

* More fixes, copy from T5

* More fixes

* Fix init

* Use ArthurZ/udop for tests

* Make all model tests pass

* Remove UdopForConditionalGeneration from auto mapping

* Fix more tests

* fixups

* more fixups

* fix the tokenizers

* remove un-necessary changes

* nits

* nits

* replace truncate_sequences_boxes with truncate_sequences for fix-copies

* nit current path

* add a test for input ids

* ids that we should get taken from c9f7a32f57

* nits converting

* nits

* apply ruff

* nits

* nits

* style

* fix slow order of addition

* fix udop fast range as well

* fixup

* nits

* Add docstrings

* Fix gradient checkpointing

* Update code examples

* Skip tests

* Update integration test

* Address comment

* Make fixup

* Remove extra ids from tokenizer

* Skip test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update year

* Address comment

* Address more comments

* Address comments

* Add copied from

* Update CI

* Rename script

* Update model id

* Add AddedToken, skip tests

* Update CI

* Fix doc tests

* Do not use Tesseract for the doc tests

* Remove kwargs

* Add original inputs

* Update casting

* Fix doc test

* Update question

* Update question

* Use LayoutLMv3ImageProcessor

* Update organization

* Improve docs

* Update forward signature

* Make images optional

* Remove deprecated device argument

* Add comment, add add_prefix_space

* More improvements

* Remove kwargs

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-04 18:49:02 +01:00
RaymondLi0
63caa370e6
Starcoder2 model - bis (#29215)
* Copy model

* changes

* misc

* fixes

* add embed and residual dropout (#30)

* misc

* remove rms norm and gated MLP

* remove copied mentions where its not a copy anymore

* remove unused _shape

* copied from mistral instead

* fix copies

* fix copies

* add not doctested

* fix

* fix copyright

* Update docs/source/en/model_doc/starcoder2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/starcoder2/configuration_starcoder2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/starcoder2/configuration_starcoder2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix doc

* revert some changes

* add fa2 tests

* fix styling nit

* fix

* push dummy docs

---------

Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-28 01:24:34 +01:00
Merve Noyan
83e366bfd4
Image Feature Extraction docs (#28973)
* Image Feature Extraction docs

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update image_feature_extraction.md

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Address comments

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_feature_extraction.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update image_feature_extraction.md

* Update image_feature_extraction.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Maria Khalusova <kafooster@gmail.com>
2024-02-27 09:39:58 +00:00
Eduardo Pacheco
3fcfbe7549
Adding SegGPT (#27735)
* First commit

* Improvements

* More improvements

* Converted original checkpoint to HF checkpoint

* Fix style

* Fixed forward

* More improvements

* More improvements

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Remove asserts

* Remove unnecessary attributes

* Changed model name to camel case

* Improve forward doc

* Improve tests

* More improvements

* Fix copies

* Fix doc

* Make SegGptImageProcessor more flexible

* Added few-shot test

* Fix style

* Update READMEs and docs

* Update READMEs

* Make inputs required

* Add SegGptForImageSegmentation

* Make tests pass

* Rename to out_indicies

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fixed naming convention

* Copying SegGptMlp from modeling_sam.py

* Some minor improvements

* Remove mlp_ratio

* Fix docstrings

* Fixed docstring match

* Objects defined before use

* Storing only patch_size and beta for SegGptLoss

* removed _prepare_inputs method

* Removed modified from headers

* Renamed to output_indicies

* Removed unnecessary einsums

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixing issues

* Raise error as soon as possible

* More fixes

* Fix merge

* Added palette to SegGptImageProcessor

* Fixed typo

* Fixed shape typo

* Added permute before doing palette to class mapping

* Fixed style

* Fixed and added tests

* Fixed docstrings

* Matching SegFormer API for post_processing_semantic_segmentation

* Fixed copies

* Fixed SegGptImageProcessor to handle both binary and RGB masks

* Updated docstrings of SegGptImageProcessor

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/seggpt.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Object definitions above & fix style

* Renamed output_indices to intermediate_feature_indices

* Removed unnecessary check on bool_masked_pos

* Loss first in the outputs

* Added validation for do_normalize

* Improved SegGptImageProcessor and added new tests

* Added comment

* Added docstrings to SegGptLoss

* Reimplemented ensemble condition logic in SegGptEncoder

* Update src/transformers/models/seggpt/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Updated docstrings to use post_process_semantic_segmentation

* Fixed typo on docstrings

* moved pixel values test to test_image_processing_seggpt

* Addressed comments

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated docstrings for SegGptLoss

* Address comments

* Added SegGpt example to model docs

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* moved patchify and unpatchify

* Rename checkpoint

* Renamed intermediate_features to intermediate_hidden_states for consistency

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Replaced post_process_masks for post_process_semantic_segmentation in the docs

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-26 18:17:19 +00:00
Arthur
594c1277b2
[ gemma] Adds support for Gemma 💎 (#29167)
* inital commit

* update

* update conversion checkpoint

* update conversion script

* nits

* some fixes

* nits

* merge

* fix permute

* nits

* fix

* nits

* nits

* nits

* fix rope

* fix both rope

* nites

* style

* make sure flax works

* fix flax init code

* fix foward

* nits

* print flax generation out

* current code

* nits

* SIIIIIIIIIIIIIIIIIII

* update

* add new tokenizer

* correct fast tokenizer

* fix conversion

* more comments

* fix modeling and conversion

* nits and nits

* nits testing

* add some tokenization tests

* add some edge cases

* add slow tests and fix them

* fixup

* fix copies for modeling

* fix copies

* add 7B slow tests

* fix

* fix

* fix tests

* make tokenizer cis go green

* styling

* last tokenizer nits

* update jax tests

* fix flax for 7b

* add jit testing 🤗

* cleanups

* isolated nit, inv_freq for rotary_emb.inv_freq

* propagate to jax

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adjust test

* fix conversion script

* change name

* correct file names

* update conversion script

* Fix bos and eos token ids in the model configuration (#3)

* update modelling

* update conversion script

* add static cache for gemma

* fix sdpa generate

* fix batched

* multiple fixes

* fix FA2

* final fix

* Rename a few missing strings and filenames (#4)

* merge with upstream main

* fix copies

* fix copies

* fix fixup

* fix fixup

* fix

* fix

* final tests

* fix fx gemma tests

* fix fx bf16/fp16 tests

* update slow fx tests

* fx slow tests: one logits, one generation

* move jit test standalone

* Apply suggestions from code review

* nits

* tokenizer updates

* more tokenization updates: custom GemmaSentencepieceExtrator

* style

* Update src/transformers/cache_utils.py

* Update src/transformers/models/gemma/__init__.py

* Update tests/models/gemma/test_modeling_flax_gemma.py

* small nits

* style

* update tokenization test

* fix the rotary embedding

* with style

* fix slow tests

* WARNING this commit might be very important for precisions

* Update tests/models/gemma/test_modeling_flax_gemma.py

* Update src/transformers/models/gemma/configuration_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/models/gemma/modeling_flax_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* small nits here and there!

* forgotten nit

* remove on the fly computation of inv_freq

* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_flax_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* nit conversion script link

* fix some tests

* add not doctest and pr doctest

* repo consistency

* fix last CIs 🚀

* update all readmes

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-02-21 14:21:28 +01:00
Merve Noyan
3f4e79d29c
Mask Generation Task Guide (#28897)
* Create mask_generation.md

* add h1

* add to toctree

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update mask_generation.md

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update mask_generation.md

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Klaus Hipp <khipp@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Klaus Hipp <khipp@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Klaus Hipp <khipp@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/tasks/mask_generation.md

* Update mask_generation.md

* Update mask_generation.md

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Maria Khalusova <kafooster@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Klaus Hipp <khipp@users.noreply.github.com>
2024-02-14 18:29:49 +00:00
Jonathan Tow
de6029a059
Add StableLM (#28810)
* Add `StableLM`

* fix(model): re-create from `huggingface-cli add-new-model-like persimmon`

* fix: re-add changes to address comments

* fix(readme): add links to paper

* fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref

* fix(tests): re-add `@slow` decorator to integration tests

* fix(tests): import slow...

* fix(readme_hd): remove whitespace edit

* fix(tokenizer): auto tokenizer tuple

* skip doctests for `modeling_stablelm`
2024-02-14 07:15:18 +01:00
Klaus Hipp
d90acc1643
[i18n-de] Translate CONTRIBUTING.md to German (#28954)
* Translate contributing.md to German

* Fix formatting issues in contributing.md

* Address review comments

* Fix capitalization
2024-02-12 13:39:20 -08:00
NielsRogge
78ba9f4617
[Docs] Add video section (#28958)
Add video section
2024-02-12 19:50:31 +01:00
Younes Belkada
866253f85e
[HfQuantizer] Move it to "Developper guides" (#28768)
Update _toctree.yml
2024-01-30 07:20:20 +01:00
Poedator
d78e78a0e4
HfQuantizer class for quantization-related stuff in modeling_utils.py (#26610)
* squashed earlier commits for easier rebase

* rm rebase leftovers

* 4bit save enabled @quantizers

* TMP gptq test use exllama

* fix AwqConfigTest::test_wrong_backend for A100

* quantizers AWQ fixes

* _load_pretrained_model low_cpu_mem_usage branch

* quantizers style

* remove require_low_cpu_mem_usage attr

* rm dtype arg from process_model_before_weight_loading

* rm config_origin from Q-config

* rm inspect from q_config

* fixed docstrings in QuantizationConfigParser

* logger.warning fix

* mv is_loaded_in_4(8)bit to BnbHFQuantizer

* is_accelerate_available error msg fix in quantizer

* split is_model_trainable in bnb quantizer class

* rm llm_int8_skip_modules as separate var in Q

* Q rm todo

* fwd ref to HFQuantizer in type hint

* rm note re optimum.gptq.GPTQQuantizer

* quantization_config in __init__ simplified

* replaced NonImplemented with  create_quantized_param

* rm load_in_4/8_bit deprecation warning

* QuantizationConfigParser refactoring

* awq-related minor changes

* awq-related changes

* awq config.modules_to_not_convert

* raise error if no q-method in q-config in args

* minor cleanup

* awq quantizer docstring

* combine common parts in bnb process_model_before_weight_loading

* revert test_gptq

* .process_model_ cleanup

* restore dict config warning

* removed typevars in quantizers.py

* cleanup post-rebase 16 jan

* QuantizationConfigParser classmethod refactor

* rework of handling of unexpected aux elements of bnb weights

* moved q-related stuff from save_pretrained to quantizers

* refactor v1

* more changes

* fix some tests

* remove it from main init

* ooops

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix awq issues

* fix

* fix

* fix

* fix

* fix

* fix

* add docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/hf_quantizer.md

* address comments

* fix

* fixup

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address final comment

* update

* Update src/transformers/quantizers/base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* add kwargs update

* fixup

* add `optimum_quantizer` attribute

* oops

* rm unneeded file

* fix doctests

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-30 02:48:25 +01:00
NielsRogge
963db81a5a
Add Depth Anything (#28654)
* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* Add docs

* Remove file

* Add copied from

* Address comments

* Address comments

* Address comments

* Fix style

* Update docs

* Convert all checkpoints, add integration test

* Rename checkpoints

* Add pretrained backbone attributes

* Fix default config

* Address comment

* Add figure to docs

* Fix bug thanks to @xenova

* Update conversion script

* Fix integration test
2024-01-25 09:34:50 +01:00
Steven Liu
738ec75c90
[docs] DeepSpeed (#28542)
* config

* optim

* pre deploy

* deploy

* save weights, memory, troubleshoot, non-Trainer

* done
2024-01-24 08:31:28 -08:00
Yoach Lacombe
d2cdefb9ec
Add new meta w2v2-conformer BERT-like model (#28165)
* first commit

* correct default value non causal

* update config and modeling code

* update converting checkpoint

* clean modeling and fix tests

* make style

* add new config parameters to docstring

* fix copied from statements

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* make position_embeddings_type docstrings clearer

* clean converting script

* remove function not used

* clean modeling file

* apply suggestion for test file + add convert script to not_doctested

* modify tests according to review - cleaner logic and more tests

* Apply nit suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add checker of valid position embeddings type

* instantiate new layer norm layer with the right eps

* fix freeze_feature_encoder since it can be None in some cases

* add test same output in convert script

* restore wav2vec2conformer and add new model

* create processor and FE + clean

* add new model code

* fix convert script and set default config parameters

* correct model id paths

* make style

* make fix-copies and cleaning files

* fix copied from statements

* complete .md and fixe copies

* clean convert script argument defaults

* fix config parameters docstrings

* fix config docstring

* add copied from and enrich FE tests

* fix copied from and repo-consistency

* add autotokenizer

* make test input length shorter and change docstring code

* fix docstrings and copied from

* add add_adapter to ASR training example

* make testing of adapters more robust

* adapt to multi adapter layers

* refactor input_values->input_features and remove w2v2-bert feature extractor

* remove pretraining model

* remove depreciated features and useless lines

* add copied from and ignore statements to modeling tests

* remove pretraining model #2

* change import in convert script

* change default in convert script

* update readme and remove useless line

* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* refactor BERT to Bert for consistency

* remove useless ignore copy statement

* add persistent to buffer in rotary

* add eps in LayerNorm init and remove copied from

* add adapter activation parameters and add copied from statements

* Fix copied statements and add unitest.skip reasons

* add copied statement in test_processor

* refactor processor

* make style

* replace numpy random by torch rand

* remove expected output CTC

* improve converting script with processor class

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove gumbel class

* remove tests related to previously deleted class

* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* correct typos

* remove uused parameters

* update processor to takes both text and audio

* update checkpoints

* update expected output and add ctc expected output

* add label_attention_mask

* replace pt with np in processor tests

* fix typo

* revert to behaviour with labels_attention_mask

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-18 13:37:34 +00:00
Junyang Lin
d6ffe74dfa
Add qwen2 (#28436)
* add config, modeling, and tokenization

* add auto and init

* update readme

* update readme

* update team name

* fixup

* fixup

* update config

* update code style

* update for fixup

* update for fixup

* update for fixup

* update for testing

* update for testing

* fix bug for config and tokenization

* fix bug for bos token

* not doctest

* debug tokenizer

* not doctest

* debug tokenization

* debug init for tokenizer

* fix style

* update init

* delete if in token auto

* add tokenizer doc

* add tokenizer in init

* Update dummy_tokenizers_objects.py

* update

* update

* debug

* Update tokenization_qwen2.py

* debug

* Update convert_slow_tokenizer.py

* add copies

* add copied from and make style

* update files map

* update test

* fix style

* fix merge reading and update tests

* fix tests

* fix tests

* fix style

* debug a variable in readme

* Update src/transformers/models/qwen2/configuration_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update test and copied from

* fix style

* update qwen2 tokenization  and tests

* Update tokenization_qwen2.py

* delete the copied from after property

* fix style

* update tests

* update tests

* add copied from

* fix bugs

* update doc

* add warning for sliding window attention

* update qwen2 tokenization

* fix style

* Update src/transformers/models/qwen2/modeling_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer fast

---------

Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-17 16:02:22 +01:00
NielsRogge
3b742ea84c
Add SigLIP (#26522)
* Add first draft

* Use appropriate gelu function

* More improvements

* More improvements

* More improvements

* Convert checkpoint

* More improvements

* Improve docs, remove print statements

* More improvements

* Add link

* remove unused masking function

* begin tokenizer

* do_lower_case

* debug

* set split_special_tokens=True

* Remove script

* Fix style

* Fix rebase

* Use same design as CLIP

* Add fast tokenizer

* Add SiglipTokenizer to init, remove extra_ids

* Improve conversion script

* Use smaller inputs in conversion script

* Update conversion script

* More improvements

* Add processor to conversion script

* Add tests

* Remove print statements

* Add tokenizer tests

* Fix more tests

* More improvements related to weight initialization

* More improvements

* Make more tests pass

* More improvements

* More improvements

* Add copied from

* Add canonicalize_text

* Enable fast tokenizer tests

* More improvements

* Fix most slow tokenizer tests

* Address comments

* Fix style

* Remove script

* Address some comments

* Add copied from to tests

* Add more copied from

* Add more copied from

* Add more copied from

* Remove is_flax_available

* More updates

* Address comment

* Remove SiglipTokenizerFast for now

* Add caching

* Remove umt5 test

* Add canonicalize_text inside _tokenize, thanks Arthur

* Fix image processor tests

* Skip tests which are not applicable

* Skip test_initialization

* More improvements

* Compare pixel values

* Fix doc tests, add integration test

* Add do_normalize

* Remove causal mask and leverage ignore copy

* Fix attention_mask

* Fix remaining tests

* Fix dummies

* Rename temperature and bias

* Address comments

* Add copied from to tokenizer tests

* Add SiglipVisionModel to auto mapping

* Add copied from to image processor tests

* Improve doc

* Remove SiglipVisionModel from index

* Address comments

* Improve docs

* Simplify config

* Add first draft

* Make it like mistral

* More improvements

* Fix attention_mask

* Fix output_attentions

* Add note in docs

* Convert multilingual model

* Convert large checkpoint

* Convert more checkpoints

* Add pipeline support, correct image_mean and image_std

* Use padding=max_length by default

* Make processor like llava

* Add code snippet

* Convert more checkpoints

* Set keep_punctuation_string=None as in OpenCLIP

* Set normalized=False for special tokens

* Fix doc test

* Update integration test

* Add figure

* Update organization

* Happy new year

* Use AutoModel everywhere

---------

Co-authored-by: patil-suraj <surajp815@gmail.com>
2024-01-08 18:17:16 +01:00
Connor Henderson
d83ff5eeff
Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename

* add relevant code from FastSpeech2 draft, have tests pass

* make it an actual conformer, demo ex.

* matching inference with original repo, includes debug code

* refactor nn.Sequentials, start more desc. var names

* more renaming

* more renaming

* vocoder scratchwork

* matching vocoder outputs

* hifigan vocoder conversion script

* convert model script, rename some config vars

* replace postnet with speecht5's implementation

* passing common tests, file cleanup

* expand testing, add output hidden states and attention

* tokenizer + passing tokenizer tests

* variety of updates and tests

* g2p_en pckg setup

* import structure edits

* docstrings and cleanup

* repo consistency

* deps

* small cleanup

* forward signature param order

* address comments except for masks and labels

* address comments on attention_mask and labels

* address second round of comments

* remove old unneeded line

* address comments part 1

* address comments pt 2

* rename auto mapping

* fixes for failing tests

* address comments part 3 (bart-like, train loss)

* make style

* pass config where possible

* add forward method + tests to WithHifiGan model

* make style

* address arg passing and generate_speech comments

* address Arthur comments

* address Arthur comments pt2

* lint  changes

* Sanchit comment

* add g2p-en to doctest deps

* move up self.encoder

* onnx compatible tensor method

* fix is symbolic

* fix paper url

* move models to espnet org

* make style

* make fix-copies

* update docstring

* Arthur comments

* update docstring w/ new updates

* add model architecture images

* header size

* md wording update

* make style
2024-01-03 18:01:06 +00:00
Steven Liu
01c081d138
[docs] Trainer docs (#28145)
* fsdp, debugging, gpu selection

* fix hfoption

* fix
2023-12-20 10:37:23 -08:00
Steven Liu
a52e180a0f
[docs] General doc fixes (#28087)
* doc fix friday

* deprecated objects

* update not_doctested

* update toctree
2023-12-18 10:44:09 -08:00
Steven Liu
ebfdb9ca62
[docs] MPS (#28016)
* mps docs

* toctree
2023-12-15 13:17:29 -08:00
Steven Liu
0d63d17765
[docs] Trainer (#27986)
* first draft

* add to toctree

* edits

* feedback
2023-12-15 12:06:55 -08:00
Younes Belkada
c7f076a00e
Adds VIP-llava to transformers (#27932)
* v1

* add-new-model-like

* revert

* fix forward and conversion script

* revert

* fix copies

* fixup

* fix

* Update docs/source/en/index.md

* Apply suggestions from code review

* push

* fix

* fixes here and there

* up

* fixup and fix tests

* Apply suggestions from code review

* add docs

* fixup

* fixes

* docstring

* add docstring

* fixup

* docstring

* fixup

* nit

* docs

* more copies

* fix copies

* nit

* update test
2023-12-13 10:42:24 +01:00
Merve Noyan
b911c1f10f
Docs for AutoBackbone & Backbone (#27456)
* Initial commit for AutoBackbone & Backbone

* Added timm and clarified out_indices

* Swapped the example to out_indices

* fix toctree

* Update autoclass_tutorial.md

* Update backbones.md

* Update autoclass_tutorial.md

* Add dummy torch input instead

* Add dummy torch input

* Update autoclass_tutorial.md

* Update backbones.md

* minor fix

* Update docs/source/en/main_classes/backbones.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/autoclass_tutorial.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Added illustrations and explained backbone & neck

* Update docs/source/en/main_classes/backbones.md

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update backbones.md

---------

Co-authored-by: Maria Khalusova <kafooster@gmail.com>
2023-12-11 08:22:17 -05:00
Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nites

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's add more verbose

* add correct outputs

* support router ouptus

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied form

* fix weird diff

* skip doctest fast on the config and modeling

* mark that is supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing asssert close

* fixup

* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00
Younes Belkada
44b5506d29
[Llava] Add Llava to transformers (#27662)
* add model like

* logits match

* minor fixes

* fixes

* up

* up

* add todo

* llava processor

* keep the processor simple

* add conversion script

* fixup

* fix copies

* up

* add to index

* fix config + logits

* fix

* refactor

* more refactor

* more refactor

* fix copies

* add authors

* v1 tests

* add `LlavaProcessor` in init

* remove unneeded import

* up

* up

* docs

* up

* fix CI

* fix CI

* add attention  mask in test

* make fixup

* remove the vision model

* that' s the dirty way to do it

* nits

* nits

* updates

* add more tests

* add input tests

* fixup

* more styling

* nits

* updates amd cleanup

* fixup the generation expected results

* fix the testing script

* some cleanup and simplification which does not work yet but almost there!

* make correct dispatch operations

* vectorize works for batch of images and text

* last todos

* nits

* update test and modeling code

* remove useless function for now

* fix few issues

* fix generation

* some nits

* add bakllava

* nits

* remove duplicated code

* finis merge

* cleanup

* missed this line

* fill the todos

* add left padding offset

* add left and rignt padding logic

* bool to properly index

* make sure

* more cleanups

* batch is fixed 😉

* add correct device for tensor creation

* fix some dtype missmatch

* ruff

* update conversion script

* Update src/transformers/__init__.py

* fa 2 support + fix conversion script

* more

* correct reshaping

* fix test dict

* fix copies by ignoring

* fix nit

* skip clip vision model

* fixup

* fixup

* LlavaForVisionText2Text -> LlavaForCausalLM

* update

* fix

* raise correct errors

* fix

* docs

* nuke for now

* nits here and there

* fixup

* fix remaining tests

* update LlavaForConditionalGeneration instead of CausalLM

* fixups

* pipeline support

* slow and piepline tests

* supports batch

* nits

* cleanup

* fix first integration tests

* add pad token where needed

* correct etsts

* fixups

* update pipeline testr

* fix quality

* nits

* revert unneeded change

* nit

* use BatchFeature

* from ...feature_extraction_utils import BatchFeature

* nits

* nits

* properly update

* more f*** nits

* fix copies

* comment

* keep slow test slow

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add piepline example

* add pixel values in docstrign

* update pr doctest

* fix

* fix slow tests

* remove hack

* fixup

* small note

* forward contrib credits from PR25789

* forward contrib credits from original implementation and work

* add arthur

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update docstring

* nit

* move to not doctested because of timeout issues

* fixup

* add description

* more

* fix-copies

* fix docs

* add beam search

* add more comments

* add typehints on processor

* add speedup plot

* update slow tests and docs

* push test

* push batched test

* fix batched generation with different number of images

* remove benchmark due to a bug

* fix test

* fix copies

* add gcolab demo

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-07 09:30:47 +01:00