Commit Graph

669 Commits

Author SHA1 Message Date
Amrit Gupta
b45c0f55e0
Fixed broken link (#29558)
Fixed broken link for Resources -> Token Classification -> Finetuning BERT for named-entity
2024-03-11 17:26:38 +00:00
Arthur
4f27ee936a
[Mamba doc] Post merge updates (#29472)
* post merge update

* nit

* oups
2024-03-11 09:46:24 +01:00
Younes Belkada
b27aa206dd
[docs] Add starcoder2 docs (#29454)
* add accelerate docs

* Apply suggestions from code review

Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>

* Update starcoder2.md

* add correct generation

---------

Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
2024-03-06 06:58:37 +01:00
Arthur
fb1c62e973
[Add Mamba] Adds support for the Mamba models (#28094)
* initial-commit

* start cleaning

* small nits

* small nits

* current updates

* add kernels

* small refactoring little step

* add comments

* styling

* nit

* nits

* Style

* Small changes

* Push dummy mambda simple slow

* nit

* Use original names

* Use original names and remove norm

* Updates for inference params

* Style nd updates

* nits

* Match logits

* Add a test

* Add expected generated text

* nits doc, imports and styling

* style

* oups

* dont install kernels, invite users to install the required kernels

* let use use the original packages

* styling

* nits

* fix some copieds

* update doc

* fix-copies

* styling done

* nits

* fix import check

* run but wrong cuda ress

* mamba CUDA works :)

* fix the fast path

* config naming nits

* conversion script is not required at this stage

* finish fixing the fast path: generation make sense now!

* nit

* Let's start working on the CIs

* style

* better style

* more nits

* test nit

* quick fix for now

* nits

* nit

* nit

* nit

* nits

* update test rest

* fixup

* update test

* nit

* some fixes

* nits

* update test values

* fix styling

* nit

* support peft

* integrations tests require torchg

* also add slow markers

* styling

* chose forward wisely

* nits

* update tests

* fix gradient checkpointing

* fixup

* nit

* fix doc

* check copies

* fix the docstring

* fix some more tests

* style

* fix beam search

* add init schene

* update

* nit

* fix

* fixup the doc

* fix the doc

* fixup

* tentative update but slow is no longer good

* nit

* should we always use float32?

* nits

* revert wrong changes

* res in float32

* cleanup

* skip fmt for now

* update generation values

* update test values running original model

* fixup

* update tests + rename inference_params to cache_params + make sure training does not use cache_params

* small nits

* more nits

* fix final CIs

* style

* nit doc

* I hope final doc nits

* nit

* 🫠

* final touch!

* fix torch import

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Apply suggestions from code review

* fix fix and fix

* fix base model prefix!

* nit

* Update src/transformers/models/mamba/__init__.py

* Update docs/source/en/model_doc/mamba.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* nit

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-03-05 20:01:06 +09:00
NielsRogge
836921fdeb
Add UDOP (#22940)
* First draft

* More improvements

* More improvements

* More fixes

* Fix copies

* More improvements

* More fixes

* More improvements

* Convert checkpoint

* More improvements, set up tests

* Fix more tests

* Add UdopModel

* More improvements

* Fix equivalence test

* More fixes

* Redesign model

* Extend conversion script

* Use real inputs for conversion script

* Add image processor

* Improve conversion script

* Add UdopTokenizer

* Add fast tokenizer

* Add converter

* Update README's

* Add processor

* Add fully fledged tokenizer

* Add fast tokenizer

* Use processor in conversion script

* Add tokenizer tests

* Fix one more test

* Fix more tests

* Fix tokenizer tests

* Enable fast tokenizer tests

* Fix more tests

* Fix additional_special_tokens of fast tokenizer

* Fix tokenizer tests

* Fix more tests

* Fix equivalence test

* Rename image to pixel_values

* Rename seg_data to bbox

* More renamings

* Remove vis_special_token

* More improvements

* Add docs

* Fix copied from

* Update slow tokenizer

* Update fast tokenizer design

* Make text input optional

* Add first draft of processor tests

* Fix more processor tests

* Fix decoder_start_token_id

* Fix test_initialization

* Add integration test

* More improvements

* Improve processor, add test

* Add more copied from

* Add more copied from

* Add more copied from

* Add more copied from

* Remove print statement

* Update README and auto mapping

* Delete files

* Delete another file

* Remove code

* Fix test

* Fix docs

* Remove asserts

* Add doc tests

* Include UDOP in exotic model tests

* Add expected tesseract decodings

* Add sentencepiece

* Use same design as T5

* Add UdopEncoderModel

* Add UdopEncoderModel to tests

* More fixes

* Fix fast tokenizer

* Fix one more test

* Remove parallelisable attribute

* Fix copies

* Remove legacy file

* Copy from T5Tokenizer

* Fix rebase

* More fixes, copy from T5

* More fixes

* Fix init

* Use ArthurZ/udop for tests

* Make all model tests pass

* Remove UdopForConditionalGeneration from auto mapping

* Fix more tests

* fixups

* more fixups

* fix the tokenizers

* remove un-necessary changes

* nits

* nits

* replace truncate_sequences_boxes with truncate_sequences for fix-copies

* nit current path

* add a test for input ids

* ids that we should get taken from c9f7a32f57

* nits converting

* nits

* apply ruff

* nits

* nits

* style

* fix slow order of addition

* fix udop fast range as well

* fixup

* nits

* Add docstrings

* Fix gradient checkpointing

* Update code examples

* Skip tests

* Update integration test

* Address comment

* Make fixup

* Remove extra ids from tokenizer

* Skip test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update year

* Address comment

* Address more comments

* Address comments

* Add copied from

* Update CI

* Rename script

* Update model id

* Add AddedToken, skip tests

* Update CI

* Fix doc tests

* Do not use Tesseract for the doc tests

* Remove kwargs

* Add original inputs

* Update casting

* Fix doc test

* Update question

* Update question

* Use LayoutLMv3ImageProcessor

* Update organization

* Improve docs

* Update forward signature

* Make images optional

* Remove deprecated device argument

* Add comment, add add_prefix_space

* More improvements

* Remove kwargs

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-04 18:49:02 +01:00
RaymondLi0
63caa370e6
Starcoder2 model - bis (#29215)
* Copy model

* changes

* misc

* fixes

* add embed and residual dropout (#30)

* misc

* remove rms norm and gated MLP

* remove copied mentions where its not a copy anymore

* remove unused _shape

* copied from mistral instead

* fix copies

* fix copies

* add not doctested

* fix

* fix copyright

* Update docs/source/en/model_doc/starcoder2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/starcoder2/configuration_starcoder2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/starcoder2/configuration_starcoder2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix doc

* revert some changes

* add fa2 tests

* fix styling nit

* fix

* push dummy docs

---------

Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-28 01:24:34 +01:00
Eduardo Pacheco
3fcfbe7549
Adding SegGPT (#27735)
* First commit

* Improvements

* More improvements

* Converted original checkpoint to HF checkpoint

* Fix style

* Fixed forward

* More improvements

* More improvements

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Remove asserts

* Remove unnecessary attributes

* Changed model name to camel case

* Improve forward doc

* Improve tests

* More improvements

* Fix copies

* Fix doc

* Make SegGptImageProcessor more flexible

* Added few-shot test

* Fix style

* Update READMEs and docs

* Update READMEs

* Make inputs required

* Add SegGptForImageSegmentation

* Make tests pass

* Rename to out_indicies

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fixed naming convention

* Copying SegGptMlp from modeling_sam.py

* Some minor improvements

* Remove mlp_ratio

* Fix docstrings

* Fixed docstring match

* Objects defined before use

* Storing only patch_size and beta for SegGptLoss

* removed _prepare_inputs method

* Removed modified from headers

* Renamed to output_indicies

* Removed unnecessary einsums

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixing issues

* Raise error as soon as possible

* More fixes

* Fix merge

* Added palette to SegGptImageProcessor

* Fixed typo

* Fixed shape typo

* Added permute before doing palette to class mapping

* Fixed style

* Fixed and added tests

* Fixed docstrings

* Matching SegFormer API for post_processing_semantic_segmentation

* Fixed copies

* Fixed SegGptImageProcessor to handle both binary and RGB masks

* Updated docstrings of SegGptImageProcessor

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/seggpt.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Object definitions above & fix style

* Renamed output_indices to intermediate_feature_indices

* Removed unnecessary check on bool_masked_pos

* Loss first in the outputs

* Added validation for do_normalize

* Improved SegGptImageProcessor and added new tests

* Added comment

* Added docstrings to SegGptLoss

* Reimplemented ensemble condition logic in SegGptEncoder

* Update src/transformers/models/seggpt/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Updated docstrings to use post_process_semantic_segmentation

* Fixed typo on docstrings

* moved pixel values test to test_image_processing_seggpt

* Addressed comments

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated docstrings for SegGptLoss

* Address comments

* Added SegGpt example to model docs

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* moved patchify and unpatchify

* Rename checkpoint

* Renamed intermediate_features to intermediate_hidden_states for consistency

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Replaced post_process_masks for post_process_semantic_segmentation in the docs

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-26 18:17:19 +00:00
Ming Xu (徐明)
734eb25476
🌐 [i18n-ZH] Translate chat_templating.md into Chinese (#28790)
* [Pix2struct] Simplify generation (#22527)

* Add model to doc tests

* Remove generate and replace by prepare_inputs_for_generation

* More fixes

* Remove print statements

* Update integration tests

* Fix generate

* Remove model from auto mapping

* Use auto processor

* Fix integration tests

* Fix test

* Add inference code snippet

* Remove is_encoder_decoder

* Update docs

* Remove notebook link

* Release: v4.28.0

* Revert (for now) the change on `Deta` in #22437 (#22750)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Patch release: v4.28.1

* update zh chat template.

* Update docs/source/zh/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/zh/_toctree.yml

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Michael <haifeng.yao@daocloud.io>
2024-02-26 08:42:24 -08:00
Arthur
89c64817ce
[Doc] update model doc qwen2 (#29238)
* update model doc qwen2

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-23 10:43:31 +01:00
NielsRogge
dabe855668
[Mistral, Mixtral] Improve docs (#29084)
* Improve docs

* Improve chat template
2024-02-22 11:48:01 +01:00
Arthur
594c1277b2
[ gemma] Adds support for Gemma 💎 (#29167)
* inital commit

* update

* update conversion checkpoint

* update conversion script

* nits

* some fixes

* nits

* merge

* fix permute

* nits

* fix

* nits

* nits

* nits

* fix rope

* fix both rope

* nites

* style

* make sure flax works

* fix flax init code

* fix foward

* nits

* print flax generation out

* current code

* nits

* SIIIIIIIIIIIIIIIIIII

* update

* add new tokenizer

* correct fast tokenizer

* fix conversion

* more comments

* fix modeling and conversion

* nits and nits

* nits testing

* add some tokenization tests

* add some edge cases

* add slow tests and fix them

* fixup

* fix copies for modeling

* fix copies

* add 7B slow tests

* fix

* fix

* fix tests

* make tokenizer cis go green

* styling

* last tokenizer nits

* update jax tests

* fix flax for 7b

* add jit testing 🤗

* cleanups

* isolated nit, inv_freq for rotary_emb.inv_freq

* propagate to jax

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adjust test

* fix conversion script

* change name

* correct file names

* update conversion script

* Fix bos and eos token ids in the model configuration (#3)

* update modelling

* update conversion script

* add static cache for gemma

* fix sdpa generate

* fix batched

* multiple fixes

* fix FA2

* final fix

* Rename a few missing strings and filenames (#4)

* merge with upstream main

* fix copies

* fix copies

* fix fixup

* fix fixup

* fix

* fix

* final tests

* fix fx gemma tests

* fix fx bf16/fp16 tests

* update slow fx tests

* fx slow tests: one logits, one generation

* move jit test standalone

* Apply suggestions from code review

* nits

* tokenizer updates

* more tokenization updates: custom GemmaSentencepieceExtrator

* style

* Update src/transformers/cache_utils.py

* Update src/transformers/models/gemma/__init__.py

* Update tests/models/gemma/test_modeling_flax_gemma.py

* small nits

* style

* update tokenization test

* fix the rotary embedding

* with style

* fix slow tests

* WARNING this commit might be very important for precisions

* Update tests/models/gemma/test_modeling_flax_gemma.py

* Update src/transformers/models/gemma/configuration_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/models/gemma/modeling_flax_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* small nits here and there!

* forgotten nit

* remove on the fly computation of inv_freq

* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_flax_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* nit conversion script link

* fix some tests

* add not doctest and pr doctest

* repo consistency

* fix last CIs 🚀

* update all readmes

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-02-21 14:21:28 +01:00
NielsRogge
07e3454f03
[Docs] Add resources (#28705)
* Add resource

* Add more resources

* Add resources

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove mention

* Remove pipeline tags

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-19 15:22:29 +01:00
Lysandre Debut
f497f564bb
Update all references to canonical models (#29001)
* Script & Manual edition

* Update
2024-02-16 08:16:58 +01:00
NielsRogge
63ffd56d02
Add SiglipForImageClassification and CLIPForImageClassification (#28952)
* First draft

* Add CLIPForImageClassification

* Remove scripts

* Fix doctests
2024-02-14 08:41:31 +01:00
Jonathan Tow
de6029a059
Add StableLM (#28810)
* Add `StableLM`

* fix(model): re-create from `huggingface-cli add-new-model-like persimmon`

* fix: re-add changes to address comments

* fix(readme): add links to paper

* fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref

* fix(tests): re-add `@slow` decorator to integration tests

* fix(tests): import slow...

* fix(readme_hd): remove whitespace edit

* fix(tokenizer): auto tokenizer tuple

* skip doctests for `modeling_stablelm`
2024-02-14 07:15:18 +01:00
Klaus Hipp
fe3df9d5b3
[Docs] Add language identifiers to fenced code blocks (#28955)
Add language identifiers to code blocks
2024-02-12 10:48:31 -08:00
Klaus Hipp
2749e479f3
[Docs] Fix broken links and syntax issues (#28918)
* Fix model documentation links in attention.md

* Fix external link syntax

* Fix target anchor names of section links

* Fix copyright statement comments

* Fix documentation headings
2024-02-08 14:13:35 -08:00
nakranivaibhav
2e7c942c81
Adds LlamaForQuestionAnswering class in modeling_llama.py along with AutoModel Support (#28777)
* This is a test commit

* testing commit

* final commit with some changes

* Removed copy statement

* Fixed formatting issues

* Fixed error added past_key_values in the forward method

* Fixed a trailing whitespace. Damn the formatting rules are strict

* Added the copy statement
2024-02-06 03:41:42 +01:00
Klaus Hipp
721ee783ca
[Docs] Fix spelling and grammar mistakes (#28825)
* Fix typos and grammar mistakes in docs and examples

* Fix typos in docstrings and comments

* Fix spelling of `tokenizer` in model tests

* Remove erroneous spaces in decorators

* Remove extra spaces in Markdown link texts
2024-02-02 08:45:00 +01:00
JB (Don)
0d26abdd3a
Adding [T5/MT5/UMT5]ForTokenClassification (#28443)
* Adding [T5/MT5/UMT5]ForTokenClassification

* Add auto mappings for T5ForTokenClassification and variants

* Adding ForTokenClassification to the list of models

* Adding attention_mask param to the T5ForTokenClassification test

* Remove outdated comment in test

* Adding EncoderOnly and Token Classification tests for MT5 and UMT5

* Fix typo in umt5 string

* Add tests for all the existing MT5 models

* Fix wrong comment in dependency_versions_table

* Reverting change to common test for _keys_to_ignore_on_load_missing

The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.

* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model

* Add fix-copies to MT5ModelTest
2024-02-01 03:53:49 +01:00
Kian Sierra McGettigan
f7076cd346
Flax mistral (#26943)
* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf22.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187a.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion consideres bfloat16

* added backend

* added docstrings

* added cache

* fixed sliding window causal mask

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test  for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids

* updated checkpoint since from_pt not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed rf after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed Nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialziation

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to allign # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key value groups

* updated copyright year

* split casual_mask

* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat

* applied make style

* empty for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00
Sanchit Gandhi
da3c79b245
[Whisper] Make tokenizer normalization public (#28136)
* [Whisper] Make tokenizer normalization public

* add to docs
2024-01-29 16:07:35 +00:00
Julien Chaumond
26aa03a252
small doc update for CamemBERT (#28644) 2024-01-29 15:46:32 +01:00
Vinyzu
3a08cc485f
[Docs] Fix Typo in English & Japanese CLIP Model Documentation (TMBD -> TMDB) (#28751)
* [Docs] Fix Typo in English CLIP model_doc

* [Docs] Fix Typo in Japanese CLIP model_doc
2024-01-29 10:06:51 +00:00
NielsRogge
963db81a5a
Add Depth Anything (#28654)
* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* Add docs

* Remove file

* Add copied from

* Address comments

* Address comments

* Address comments

* Fix style

* Update docs

* Convert all checkpoints, add integration test

* Rename checkpoints

* Add pretrained backbone attributes

* Fix default config

* Address comment

* Add figure to docs

* Fix bug thanks to @xenova

* Update conversion script

* Fix integration test
2024-01-25 09:34:50 +01:00
amyeroberts
e547458c43
Fix phi model doc checkpoint (#28581)
Co-authored-by: Pashmina Cameron <11311835+pashminacameron@users.noreply.github.com>
2024-01-22 17:15:07 +00:00
NielsRogge
faf03541e2
[SigLIP] Don't pad by default (#28578)
First draft
2024-01-19 13:30:00 +01:00
Yoach Lacombe
d2cdefb9ec
Add new meta w2v2-conformer BERT-like model (#28165)
* first commit

* correct default value non causal

* update config and modeling code

* update converting checkpoint

* clean modeling and fix tests

* make style

* add new config parameters to docstring

* fix copied from statements

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* make position_embeddings_type docstrings clearer

* clean converting script

* remove function not used

* clean modeling file

* apply suggestion for test file + add convert script to not_doctested

* modify tests according to review - cleaner logic and more tests

* Apply nit suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add checker of valid position embeddings type

* instantiate new layer norm layer with the right eps

* fix freeze_feature_encoder since it can be None in some cases

* add test same output in convert script

* restore wav2vec2conformer and add new model

* create processor and FE + clean

* add new model code

* fix convert script and set default config parameters

* correct model id paths

* make style

* make fix-copies and cleaning files

* fix copied from statements

* complete .md and fixe copies

* clean convert script argument defaults

* fix config parameters docstrings

* fix config docstring

* add copied from and enrich FE tests

* fix copied from and repo-consistency

* add autotokenizer

* make test input length shorter and change docstring code

* fix docstrings and copied from

* add add_adapter to ASR training example

* make testing of adapters more robust

* adapt to multi adapter layers

* refactor input_values->input_features and remove w2v2-bert feature extractor

* remove pretraining model

* remove depreciated features and useless lines

* add copied from and ignore statements to modeling tests

* remove pretraining model #2

* change import in convert script

* change default in convert script

* update readme and remove useless line

* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* refactor BERT to Bert for consistency

* remove useless ignore copy statement

* add persistent to buffer in rotary

* add eps in LayerNorm init and remove copied from

* add adapter activation parameters and add copied from statements

* Fix copied statements and add unitest.skip reasons

* add copied statement in test_processor

* refactor processor

* make style

* replace numpy random by torch rand

* remove expected output CTC

* improve converting script with processor class

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove gumbel class

* remove tests related to previously deleted class

* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* correct typos

* remove uused parameters

* update processor to takes both text and audio

* update checkpoints

* update expected output and add ctc expected output

* add label_attention_mask

* replace pt with np in processor tests

* fix typo

* revert to behaviour with labels_attention_mask

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-18 13:37:34 +00:00
Junyang Lin
d6ffe74dfa
Add qwen2 (#28436)
* add config, modeling, and tokenization

* add auto and init

* update readme

* update readme

* update team name

* fixup

* fixup

* update config

* update code style

* update for fixup

* update for fixup

* update for fixup

* update for testing

* update for testing

* fix bug for config and tokenization

* fix bug for bos token

* not doctest

* debug tokenizer

* not doctest

* debug tokenization

* debug init for tokenizer

* fix style

* update init

* delete if in token auto

* add tokenizer doc

* add tokenizer in init

* Update dummy_tokenizers_objects.py

* update

* update

* debug

* Update tokenization_qwen2.py

* debug

* Update convert_slow_tokenizer.py

* add copies

* add copied from and make style

* update files map

* update test

* fix style

* fix merge reading and update tests

* fix tests

* fix tests

* fix style

* debug a variable in readme

* Update src/transformers/models/qwen2/configuration_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update test and copied from

* fix style

* update qwen2 tokenization  and tests

* Update tokenization_qwen2.py

* delete the copied from after property

* fix style

* update tests

* update tests

* add copied from

* fix bugs

* update doc

* add warning for sliding window attention

* update qwen2 tokenization

* fix style

* Update src/transformers/models/qwen2/modeling_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer fast

---------

Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-17 16:02:22 +01:00
Gustavo de Rosa
d93ef7d751
Fixes default value of softmax_scale in PhiFlashAttention2. (#28537)
* fix(phi): Phi does not use softmax_scale in Flash-Attention.

* chore(docs): Update Phi docs.
2024-01-17 14:22:44 +01:00
Francisco Kurucz
121641cab1
Fix paths to AI Sweden Models reference and model loading (#28423)
Fix URL to Ai Sweden Models reference and model loading
2024-01-15 09:09:22 +01:00
Francisco Kurucz
3724156b4d
Fix load correct tokenizer in Mixtral model documentation (#28437) 2024-01-10 18:09:06 +01:00
Susnato Dhar
fff8ca8e59
update docs to add the phi-2 example (#28392)
* update docs

* added Tip
2024-01-10 16:07:47 +01:00
NielsRogge
3b742ea84c
Add SigLIP (#26522)
* Add first draft

* Use appropriate gelu function

* More improvements

* More improvements

* More improvements

* Convert checkpoint

* More improvements

* Improve docs, remove print statements

* More improvements

* Add link

* remove unused masking function

* begin tokenizer

* do_lower_case

* debug

* set split_special_tokens=True

* Remove script

* Fix style

* Fix rebase

* Use same design as CLIP

* Add fast tokenizer

* Add SiglipTokenizer to init, remove extra_ids

* Improve conversion script

* Use smaller inputs in conversion script

* Update conversion script

* More improvements

* Add processor to conversion script

* Add tests

* Remove print statements

* Add tokenizer tests

* Fix more tests

* More improvements related to weight initialization

* More improvements

* Make more tests pass

* More improvements

* More improvements

* Add copied from

* Add canonicalize_text

* Enable fast tokenizer tests

* More improvements

* Fix most slow tokenizer tests

* Address comments

* Fix style

* Remove script

* Address some comments

* Add copied from to tests

* Add more copied from

* Add more copied from

* Add more copied from

* Remove is_flax_available

* More updates

* Address comment

* Remove SiglipTokenizerFast for now

* Add caching

* Remove umt5 test

* Add canonicalize_text inside _tokenize, thanks Arthur

* Fix image processor tests

* Skip tests which are not applicable

* Skip test_initialization

* More improvements

* Compare pixel values

* Fix doc tests, add integration test

* Add do_normalize

* Remove causal mask and leverage ignore copy

* Fix attention_mask

* Fix remaining tests

* Fix dummies

* Rename temperature and bias

* Address comments

* Add copied from to tokenizer tests

* Add SiglipVisionModel to auto mapping

* Add copied from to image processor tests

* Improve doc

* Remove SiglipVisionModel from index

* Address comments

* Improve docs

* Simplify config

* Add first draft

* Make it like mistral

* More improvements

* Fix attention_mask

* Fix output_attentions

* Add note in docs

* Convert multilingual model

* Convert large checkpoint

* Convert more checkpoints

* Add pipeline support, correct image_mean and image_std

* Use padding=max_length by default

* Make processor like llava

* Add code snippet

* Convert more checkpoints

* Set keep_punctuation_string=None as in OpenCLIP

* Set normalized=False for special tokens

* Fix doc test

* Update integration test

* Add figure

* Update organization

* Happy new year

* Use AutoModel everywhere

---------

Co-authored-by: patil-suraj <surajp815@gmail.com>
2024-01-08 18:17:16 +01:00
Rosie Wood
73c88012b7
Add segmentation map processing to SAM Image Processor (#27463)
* add segmentation map processing to sam image processor

* fixup

* add tests

* reshaped_input_size is shape before padding

* update tests for size/shape outputs

* fixup

* add code snippet to docs

* Update docs/source/en/model_doc/sam.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add missing backticks

* add `segmentation_maps` as arg for SamProcessor.__call__()

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-08 16:40:36 +00:00
Connor Henderson
d83ff5eeff
Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename

* add relevant code from FastSpeech2 draft, have tests pass

* make it an actual conformer, demo ex.

* matching inference with original repo, includes debug code

* refactor nn.Sequentials, start more desc. var names

* more renaming

* more renaming

* vocoder scratchwork

* matching vocoder outputs

* hifigan vocoder conversion script

* convert model script, rename some config vars

* replace postnet with speecht5's implementation

* passing common tests, file cleanup

* expand testing, add output hidden states and attention

* tokenizer + passing tokenizer tests

* variety of updates and tests

* g2p_en pckg setup

* import structure edits

* docstrings and cleanup

* repo consistency

* deps

* small cleanup

* forward signature param order

* address comments except for masks and labels

* address comments on attention_mask and labels

* address second round of comments

* remove old unneeded line

* address comments part 1

* address comments pt 2

* rename auto mapping

* fixes for failing tests

* address comments part 3 (bart-like, train loss)

* make style

* pass config where possible

* add forward method + tests to WithHifiGan model

* make style

* address arg passing and generate_speech comments

* address Arthur comments

* address Arthur comments pt2

* lint  changes

* Sanchit comment

* add g2p-en to doctest deps

* move up self.encoder

* onnx compatible tensor method

* fix is symbolic

* fix paper url

* move models to espnet org

* make style

* make fix-copies

* update docstring

* Arthur comments

* update docstring w/ new updates

* add model architecture images

* header size

* md wording update

* make style
2024-01-03 18:01:06 +00:00
Sourab Mangrulkar
def581ef51
Fix FA2 integration (#28142)
* fix fa2

* fix FA2 for popular models

* improve warning and add Younes as co-author

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix the warning

* Add Tip

* typo fix

* nit

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-20 14:25:07 +05:30
Aaron Jimenez
38611086d2
[docs] Fix mistral link in mixtral.md (#28143)
Fix mistral link in mixtral.md
2023-12-19 10:34:14 -08:00
Aeneas Stankowski
7f2a8f92e4
Spelling correction (#28110)
Update mixtral.md

correct minor typo in overview
2023-12-18 14:04:05 +00:00
Younes Belkada
1faeff85ce
Fix Vip-llava docs (#28085)
* Update vipllava.md

* Update modeling_vipllava.py
2023-12-15 20:16:47 +01:00
Sanchit Gandhi
52c37882fb
[Seamless] Fix links in docs (#27905)
* [Seamless] Fix links in docs

* apply suggestions from code review
2023-12-14 15:14:13 +00:00
Younes Belkada
c7f076a00e
Adds VIP-llava to transformers (#27932)
* v1

* add-new-model-like

* revert

* fix forward and conversion script

* revert

* fix copies

* fixup

* fix

* Update docs/source/en/index.md

* Apply suggestions from code review

* push

* fix

* fixes here and there

* up

* fixup and fix tests

* Apply suggestions from code review

* add docs

* fixup

* fixes

* docstring

* add docstring

* fixup

* docstring

* fixup

* nit

* docs

* more copies

* fix copies

* nit

* update test
2023-12-13 10:42:24 +01:00
NielsRogge
67b1335cb9
Update bounding box format everywhere (#27944)
Update formats
2023-12-11 18:03:42 +00:00
Timon Käch
5cec306cdc
Fix parameter count in readme for mixtral 45b (#27945)
fix parameter count in readme
2023-12-11 14:58:48 +00:00
Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nites

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's add more verbose

* add correct outputs

* support router ouptus

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied form

* fix weird diff

* skip doctest fast on the config and modeling

* mark that is supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing asssert close

* fixup

* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00
NielsRogge
7ea21f1f03
[LLaVa] Some improvements (#27895)
* More improvements

* Improve variable names

* Update READMEs, improve docs
2023-12-11 10:22:26 +01:00
fxmarty
80377eb018
F.scaled_dot_product_attention support (#26572)
* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey to config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use if XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurence

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flacky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-09 05:38:14 +09:00
Younes Belkada
44b5506d29
[Llava] Add Llava to transformers (#27662)
* add model like

* logits match

* minor fixes

* fixes

* up

* up

* add todo

* llava processor

* keep the processor simple

* add conversion script

* fixup

* fix copies

* up

* add to index

* fix config + logits

* fix

* refactor

* more refactor

* more refactor

* fix copies

* add authors

* v1 tests

* add `LlavaProcessor` in init

* remove unneeded import

* up

* up

* docs

* up

* fix CI

* fix CI

* add attention  mask in test

* make fixup

* remove the vision model

* that' s the dirty way to do it

* nits

* nits

* updates

* add more tests

* add input tests

* fixup

* more styling

* nits

* updates amd cleanup

* fixup the generation expected results

* fix the testing script

* some cleanup and simplification which does not work yet but almost there!

* make correct dispatch operations

* vectorize works for batch of images and text

* last todos

* nits

* update test and modeling code

* remove useless function for now

* fix few issues

* fix generation

* some nits

* add bakllava

* nits

* remove duplicated code

* finis merge

* cleanup

* missed this line

* fill the todos

* add left padding offset

* add left and rignt padding logic

* bool to properly index

* make sure

* more cleanups

* batch is fixed 😉

* add correct device for tensor creation

* fix some dtype missmatch

* ruff

* update conversion script

* Update src/transformers/__init__.py

* fa 2 support + fix conversion script

* more

* correct reshaping

* fix test dict

* fix copies by ignoring

* fix nit

* skip clip vision model

* fixup

* fixup

* LlavaForVisionText2Text -> LlavaForCausalLM

* update

* fix

* raise correct errors

* fix

* docs

* nuke for now

* nits here and there

* fixup

* fix remaining tests

* update LlavaForConditionalGeneration instead of CausalLM

* fixups

* pipeline support

* slow and piepline tests

* supports batch

* nits

* cleanup

* fix first integration tests

* add pad token where needed

* correct etsts

* fixups

* update pipeline testr

* fix quality

* nits

* revert unneeded change

* nit

* use BatchFeature

* from ...feature_extraction_utils import BatchFeature

* nits

* nits

* properly update

* more f*** nits

* fix copies

* comment

* keep slow test slow

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add piepline example

* add pixel values in docstrign

* update pr doctest

* fix

* fix slow tests

* remove hack

* fixup

* small note

* forward contrib credits from PR25789

* forward contrib credits from original implementation and work

* add arthur

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update docstring

* nit

* move to not doctested because of timeout issues

* fixup

* add description

* more

* fix-copies

* fix docs

* add beam search

* add more comments

* add typehints on processor

* add speedup plot

* update slow tests and docs

* push test

* push batched test

* fix batched generation with different number of images

* remove benchmark due to a bug

* fix test

* fix copies

* add gcolab demo

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-07 09:30:47 +01:00
Susnato Dhar
f84d85ba67
[FA-2] Add Flash Attention to Phi (#27661)
* add FA and modify doc file

* test_flash_attn_2_generate_padding_right test overwritten

* comment

* modify persimmon modeling file

* added speedup graph

* more changes
2023-12-07 07:57:48 +01:00
Alex McKinney
75336c1794
Add Llama Flax Implementation (#24587)
* Copies `modeling_flax_gpt_neo.py` to start

* MLP Block. WIP Attention and Block

* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue

* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.

* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Notice numerically different, but not by a huge amount. Needs
investigating

* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..

* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?

* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now

* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now

* implements `FlaxLlamaBlockCollection`]
tolerance must be higher than expected, kinda disconcerting

* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗

* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later

* start porting pretrained wrappers
realised it probably needs return dict as a prereq

* cleanup, quality, style

* readds `return_dict` and model output named tuples

* (tentatively) pretrained wrappers work 🔥

* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.

* [WIP] debugging numerics

* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.

* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict

* adds missing TYPE_CHECKING import and `make fixup`

* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO

* commenting out equivalence test as can just use common

* debugging

* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥

* cleanup of modeling file

* cleanup of test file

* Resolving simpler review comments

* addresses more minor review comments

* fixing introduced pytest errors from review

* wip additional slow tests

* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay

* `make quality`, `make style`

* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs

* `make fix-copies`

* fix mangled function following `make fix-copies`

* adds missing type checking imports

* fixes missing parameter checkpoint warning

* more finegrained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`

* swaps import guards
??? how did these get swapped initially?

* removing `inv_freq` again as pytorch version has now removed

* attempting to get CI to pass

* adds doc entries for llama flax models

* fixes typo in __init__.py imports

* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version

* overrides tests with dummy to see if CI passes
need to fill in these tests later

* adds my contribution to docs

* `make style; make quality`

* replaces random masking with fixed to work with flax version

* `make quality; make style`

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* updates `x`->`tensor` in `rotate_half`

* addresses smaller review comments

* Update docs/source/en/model_doc/llama.md

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adds integration test class

* adds `dtype` to rotary embedding to cast outputs

* adds type to flax llama rotary layer

* `make style`

* `make fix-copies`

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* applies suggestions from review

* Update modeling_flax_llama.py

* `make fix-copies`

* Update tests/models/llama/test_modeling_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixes shape mismatch in FlaxLlamaMLP

* applies some suggestions from reviews

* casts attn output logits to f32 regardless of dtype

* adds attn bias using `LlamaConfig.attention_bias`

* adds Copied From comments to Flax Llama test

* mistral and persimmon test change -copy from llama

* updates docs index

* removes Copied from in tests

it was preventing `make fix-copies` from succeeding

* quality and style

* ignores FlaxLlama input docstring

* adds revision to `_CHECKPOINT_FOR_DOC`

* repo consistency and quality

* removes unused import

* removes copied from from Phi test

now diverges from llama tests following FlaxLlama changes

* adds `_REAL_CHECKPOINT_FOR_DOC`

* removes refs from pr tests

* reformat to make ruff happy

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-12-07 07:05:00 +01:00
Rockerz
9660e27cd0
Translating en/model_doc folder docs to Japanese(from blip to clap) 🇯🇵 (#27673)
* Add models

* Add models and update `_toctree.yml`

* Update docs/source/ja/model_doc/chinese_clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/camembert.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/bros.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/bros.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/blip-2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/camembert.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* solve merge conflicts and update paper titles

* Update docs/source/ja/model_doc/bridgetower.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/model_doc/chinese_clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update the authons name in bros..md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-12-06 10:38:21 -08:00
Younes Belkada
9270ab0827
[Flash Attention 2] Add flash attention 2 for GPT-Neo-X (#26463)
* add flash-attn-2 support for GPT-neo-x

* fixup

* add comment

* revert

* fixes

* update docs

* comment

* again

* fix copies

* add plot + fix copies

* Update docs/source/en/model_doc/gpt_neox.md
2023-12-06 17:22:32 +01:00
Arindam Jati
b242d0f297
[Time series] Add PatchTSMixer (#26247)
* patchtsmixer initial commit

* x,y->context_values,target_values, unittest addded

* cleanup code

* minor

* return hidden states

* model tests, partial integration tests

* ettm notebook temporary

* minor

* config mask bug fix, tests updated

* final ETT notebooks

* add selfattn

* init

* added docstrings

* PatchTSMixerForPretraining -> PatchTSMixerForMaskPretraining

* functionality tests added

* add start and input docstrings

* docstring edits

* testcase edits

* minor changes

* docstring error fixed

* ran make fixup

* finalize integration tests and docs

* minor

* cleaned gitignore

* added dataclass decorator, ran black formatter

* ran ruff

* formatting

* add slow decorator

* renamed in_Channel to input_size and default to 1

* shorten dataclass names

* use smaller model for testing

* moved the 3 heads to the modeling file

* use scalers instead of revin

* support forecast_channel_indices

* fix regression scaling

* undo reg. scaling

* removed unneeded classes

* forgot missing

* add more layers

* add copied positional_encoding

* use patchmask from patchtst

* removed dependency on layers directory

* formatting

* set seed

* removed unused imports

* fixed forward signature test

* adding distributional head for PatchTSMixerForecasting

* add generate to forecast

* testcases for generate

* add generate and distributional head for regression

* raise Exception for negative values for neg binominal distribution

* formatting changes

* remove copied from patchtst and add TODO for test passing

* make copies

* doc edits

* minor changes

* format issues

* minor changes

* minor changes

* format docstring

* change some class names to PatchTSMixer + class name

Transpose to PatchTSMixerTranspose
GatedAttention to PatchTSMixerGatedAttention

* change NormLayer to PatchTSMixerNormLayer

* change MLP to PatchTSMixerMLP

* change PatchMixer to PatchMixerBlock, FeatureMixer to FeatureMixerBlock

* change ChannelFeatureMixer to ChannelFeatureMixerBlock

* change PatchMasking to PatchTSMixerMasking

* change Patchify to PatchTSMixerPatchify

* list to `list`

* fix docstrings

* formatting

* change bs to batch_size, edit forecast_masking

* edit random_masking

* change variable name and update docstring in PatchTSMixerMasking

* change variable name and update docstring in InjectScalerStatistics4D

* update forward call in PatchTSMixerTranspose

* change variable name and update docstring in PatchTSMixerNormLayer

* change variable name and update docstring in PatchTSMixerMLP

* change variable name and update docstring in ChannelFeatureMixerBlock

* formatting

* formatting issues

* docstring issue

* fixed observed_mask type in docstrings

* use FloatTensor type

* formatting

* fix rescaling issue in forecasting, fixed integration tests

* add docstring from decorator

* fix docstring

* Update README.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PatchTSMixerChannelFeatureMixerBlock

* formatting

* ForPretraining

* use num_labels instead of n_classes

* remove commented out code

* docstring fixed

* nn.functional used instead of one letter F

* x_tmp renamed

* one letter variable x removed from forward calls

* one letter variable y removed

* remove commented code

* rename patch_size, in_channels, PatchTSMixerBackbone

* add config to heads

* add config to heads tests

* code reafactoring to use config instead of passing individual params

* Cdocstring fixes part 1

* docstring fixes part 2

* removed logger.debug

* context_values -> past_values

* formatting changes

* pe -> positional_encoding

* removed unused target variable

* self.mode logic fixed

* formatting change

* edit docstring and var name

* change n_targets to num_targets

* rename input_size to num_input_channels

* add head names with prefix PatchTSMixer

* edit docstring in PatchTSMixerForRegression

* fix var name change in testcases

* add PatchTSMixerAttention

* return dict for all exposed classes, test cases added

* format

* move loss function to forward call

* make style

* adding return dict/tuple

* make repo-consistency

* remove flatten mode

* code refactoring

* rename data

* remove PatchTSMixer and keep only PatchTSMixerEncoder

* docstring fixes

* removed unused code

* format

* format

* remove contiguous and formatting changes

* remove model description from config

* replace asserts with ValueError

* remove nn.Sequential from PatchTSMixerNormLayer

* replace if-else with map

* remove all nn.Sequential

* format

* formatting

* fix gradient_checkpointing error after merge, and formatting

* make fix-copies

* remove comments

* reshape

* doesnt support gradient checkpointing

* corect Patchify

* masking updates

* batchnorm copy from

* format checks

* scaler edits

* remove comments

* format changes

* remove self.config

* correct class PatchTSMixerMLP(nn.Module):

* makr fix

* doc updates

* fix-copies

* scaler class correction

* doc edits

* scaler edits

* update readme with links

* injectstatistics add

* fix-copies

* add norm_eps option to LayerNorm

* format changes

* fix copies

* correct make copies

* use parametrize

* fix doc string

* add docs to toctree

* make style

* doc segmenting

* docstring edit

* change forecast to prediction

* edit doc

* doc edits

* remove PatchTSMixerTranspose

* add PatchTSMixerPositionalEncoding and init position_enc

* remove positional_encoding

* edit forecast_masking, remove forecast_mask_ratios

* fix broken code

* var rename target_values -> future_values

* num_features -> d_model

* fix broken code after master merge

* repo consistency

* use postional embedding

* prediction_logits -> prediction_outputs, make fix-copies

* uncommented @slow

* minor changes

* loss first in tuple

* tuple and dict same ordering

* style edits

* minor changes

* dict/tuple consistent enablement

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix formatting

* formatting

* usage tip

* test on cpu only

* add sample usage

* change PatchTSMixerForClassification to PatchTSMixerForTimeSeriesClassification

* push changes

* fix copies

* std scaling set to default True case

* minor changes

* stylechanges

---------

Co-authored-by: Arindam Jati <arindam.jati@ibm.com>
Co-authored-by: vijaye12 <vijaye12@in.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: vijaye12 <vijaykr.e@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-05 15:31:35 +01:00
Yih-Dar
1d63b0ec36
Disallow pickle.load unless TRUST_REMOTE_CODE=True (#27776)
* fix

* fix

* Use TRUST_REMOTE_CODE

* fix doc

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-04 16:48:37 +01:00
fxmarty
1da1302ec8
Flash Attention 2 support for RoCm (#27611)
* support FA2

* fix typo

* fix broken tests

* fix more test errors

* left/right

* fix bug

* more test

* typo

* fix layout flash attention falcon

* do not support this case

* use allclose instead of equal

* fix various bugs with flash attention

* bump

* fix test

* fix mistral

* use skiptest instead of return that may be misleading

* add fix causal arg flash attention

* fix copies

* more explicit comment

* still use self.is_causal

* fix causal argument

* comment

* fixes

* update documentation

* add link

* wrong test

* simplify FA2 RoCm requirements

* update opt

* make flash_attn_uses_top_left_mask attribute private and precise comment

* better error handling

* fix copy & mistral

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/import_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use is_flash_attn_greater_or_equal_2_10 instead of is_flash_attn_greater_or_equal_210

* fix merge

* simplify

* inline args

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-04 21:52:17 +09:00
Sanchit Gandhi
ede09d671d
[Seamless v1] Link to v2 docs (#27827) 2023-12-04 11:47:54 +00:00
Yoach Lacombe
29f1aee3b6
Add SeamlessM4T v2 (#27779)
* add working convertion script

* first non-working version of modeling code

* update modeling code (working)

* make style

* make fix-copies

* add config docstrings

* add config to ignore docstrings formatage due to unconventional markdown

* fix copies

* fix generation num_return_sequences

* enrich docs

* add and fix tests beside integration tests

* update integration tests

* update repo id

* add tie weights and make style

* correct naming in .md

* fix imports and so on

* correct docstrings

* fix fp16 speech forward

* fix speechencoder attention

* make style

* fix copied from

* rename SeamlessM4Tv2-v2 to SeamlessM4Tv2

* Apply suggestions on configuration

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove useless public models

* fix private models + better naming for T2U models

* clean speech encoder relative position embeddings

* refactor chunk attention

* add docstrings to chunk attention method

* improve naming and docstrings

* rename some attention variables + add temperature sampling in T2U model

* rename DOCSTRINGS variable names

* make style + remove 2 useless config parameters

* enrich model card

* remove any attention_head reference + fix temperature in T2U

* new fmt and make style

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* rename spkr_id->speaker_id and change docstrings of get_char_input_ids

* simplify v2attention

* make style

* Update seamless_m4t_v2.md

* update code and tests with last update

* update repo ids

* fill article name, abstract andauthors

* update not_doctested and slow_doc tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-30 20:24:43 +01:00
Kashif Rasul
af8acc4760
[Time series] Add patchtst (#27581)
* add distribution head to forecasting

* formatting

* Add generate function for forecasting

* Add generate function to prediction task

* formatting

* use argsort

* add past_observed_mask ordering

* fix arguments

* docs

* add back test_model_outputs_equivalence test

* formatting

* cleanup

* formatting

* use ACT2CLS

* formatting

* fix add_start_docstrings decorator

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput,  PatchTSTForRegressionOutput.

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput,  PatchTSTForRegressionOutput.

* fix typos

* add forecast_masking

* fixed tests

* use set_seed

* fix doc test

* formatting

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* better var names

* rename PatchTSTTranspose

* fix argument names and docs string

* remove compute_num_patches and unused class

* remove assert

* renamed to PatchTSTMasking

* use num_labels for classification

* use num_labels

* use default num_labels from super class

* move model_type after docstring

* renamed PatchTSTForMaskPretraining

* bs -> batch_size

* more review fixes

* use hidden_state

* rename encoder layer and block class

* remove commented seed_number

* edit docstring

* Add docstring

* formatting

* use past_observed_mask

* doc suggestion

* make fix-copies

* use Args:

* add docstring

* add docstring

* change some variable names and add PatchTST before some class names

* formatting

* fix argument types

* fix tests

* change x variable to patch_input

* format

* formatting

* fix-copies

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move loss to forward

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* formatting

* fix a bug when pre_norm is set to True

* output_hidden_states is set to False as default

* set pre_norm=True as default

* format docstring

* format

* output_hidden_states is None by default

* add missing docs

* better var names

* docstring: remove default to False in output_hidden_states

* change labels name to target_values in regression task

* format

* fix tests

* change to forecast_mask_ratios and random_mask_ratio

* change mask names

* change future_values to target_values param in the prediction class

* remove nn.Sequential and make PatchTSTBatchNorm class

* black

* fix argument name for prediction

* add output_attentions option

* add output_attentions to PatchTSTEncoder

* formatting

* Add attention output option to all classes

* Remove PatchTSTEncoderBlock

* create PatchTSTEmbedding class

* use config in PatchTSTPatchify

* Use config in PatchTSTMasking class

* add channel_attn_weights

* Add PatchTSTScaler class

* add output_attentions arg to test function

* format

* Update doc with image patchtst.md

* fix-copies

* rename Forecast <-> Prediction

* change name of a few parameters to match with PatchTSMixer.

* Remove *ForForecasting class to match with other time series models.

* make style

* Remove PatchTSTForForecasting in the test

* remove PatchTSTForForecastingOutput class

* change test_forecast_head to test_prediction_head

* style

* fix docs

* fix tests

* change num_labels to num_targets

* Remove PatchTSTTranspose

* remove arguments in PatchTSTMeanScaler

* remove arguments in PatchTSTStdScaler

* add config as an argument to all the scaler classes

* reformat

* Add norm_eps for batchnorm and layernorm

* reformat.

* reformat

* edit docstring

* update docstring

* change variable name pooling to pooling_type

* fix output_hidden_states as tuple

* fix bug when calling PatchTSTBatchNorm

* change stride to patch_stride

* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder

* formatting

* initialize scalers with configs

* edit output_hidden_states

* style

* fix forecast_mask_patches doc string

* doc improvements

* move summary to the start

* typo

* fix docstring

* turn off masking when using prediction, regression, classification

* return scaled output

* adjust output when using distribution head

* remove _num_patches function in the config

* get config.num_patches from patchifier init

* add output_attentions docstring, remove tuple in output_hidden_states

* change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput

* remove print("model_class: ", model_class)

* change encoder_attention_heads to num_attention_heads

* change norm to norm_layer

* change encoder_layers to num_hidden_layers

* change shared_embedding to share_embedding, shared_projection to share_projection

* add output_attentions

* more robust check of norm_type

* change dropout_path to path_dropout

* edit docstring

* remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding

* edit shape of cls_token and initialize it

* add a check on the num_input_channels.

* edit head_dim in the Prediction class to allow the use of cls_token

* remove some positional_encoding_type options, remove learn_pe arg, initalize pe

* change Exception to ValueError

* format

* norm_type is "batchnorm"

* make style

* change cls_token shape

* Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios.

* Bring PatchTSTClassificationHead on top of PatchTSTForClassification

* change encoder_ffn_dim to ffn_dim and edit the docstring.

* update variable names to match with the config

* add generation tests

* change num_mask_patches to num_forecast_mask_patches

* Add examples explaining the use of these models

* make style

* Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)"

This reverts commit 78f6ed6c70.

* make style

* fix default std scaler's minimum_scale

* fix docstring

* close code blocks

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/configuration_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix tests

* add add_start_docstrings

* move examples to the forward's docstrings

* update prepare_batch

* update test

* fix test_prediction_head

* fix generation test

* use seed to create generator

* add output_hidden_states and config.num_patches

* add loc and scale args in PatchTSTForPredictionOutput

* edit outputs if if not return_dict

* use self.share_embedding to check instead checking type.

* remove seed

* make style

* seed is an optional int

* fix test

* generator device

* Fix assertTrue test

* swap order of items in outputs when return_dict=False.

* add mask_type and random_mask_ratio to unittest

* Update modeling_patchtst.py

* add add_start_docstrings for regression model

* make style

* update model path

* Edit the ValueError comment in forecast_masking

* update examples

* make style

* fix commented code

* update examples: remove config from from_pretrained call

* Edit example outputs

* Set default target_values to None

* remove config setting in regression example

* Update configuration_patchtst.py

* Update configuration_patchtst.py

* remove config from examples

* change default d_model and ffn_dim

* norm_eps default

* set has_attentions to Trye and define self.seq_length = self.num_patche

* update docstring

* change variable mask_input to do_mask_input

* fix blank space.

* change logger.debug to logger.warning.

* remove unused PATCHTST_INPUTS_DOCSTRING

* remove all_generative_model_classes

* set test_missing_keys=True

* remove undefined params in the docstring.

---------

Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-29 13:36:38 +01:00
Tom Aarsen
f2ad4b537b
Docs: Fix broken cross-references, i.e. ~transformer. -> ~transformers. (#27740)
~transformer. -> ~transformers.
2023-11-28 08:40:44 -08:00
Juarez Bochi
fdd86eed3b
Add madlad-400 MT models (#27471)
* Add madlad-400 models

* Add madlad-400 to the doc table

* Update docs/source/en/model_doc/madlad-400.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fill missing details in documentation

* Update docs/source/en/model_doc/madlad-400.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Do not doctest madlad-400

Tests are timing out.

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-28 13:19:50 +00:00
fxmarty
c13a43aaf2
Reflect RoCm support in the documentation (#27636)
* reflect RoCm support in the documentation

* Update docs/source/en/main_classes/trainer.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* fix review comments

* use ROCm instead of RoCm

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-11-25 00:59:17 +09:00
yoinked
181f85da24
Docs/Add conversion code to the musicgen docs (#27665)
* Update musicgen.md

please make it less hidden

* Add cleaner formatting
2023-11-24 12:34:24 +01:00
Yih-Dar
7293fdc5b9
Deprecate TransfoXL (#27607)
* fix

* fix

* trigger

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* tic

* revert

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-11-24 11:48:02 +01:00
NielsRogge
fe1c16e95a
[DPT, Dinov2] Add resources (#27655)
* Add resources

* Remove script

* Update docs/source/en/model_doc/dinov2.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-23 17:44:08 +00:00
amyeroberts
b406c4d261
Update TVP arxiv link (#27672)
Update arxiv link
2023-11-23 17:02:16 +00:00
Susnato Dhar
3bc50d81e6
[FA2] Add flash attention for opt (#26414)
* added flash attention for opt

* added to list

* fix use cache (#3)

* style fix

* fix text

* test fix2

* reverted until 689f599

* torch fx tests are working now!

* small fix

* added TODO docstring

* changes

* comments and .md file modification

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-11-23 10:16:51 +00:00
dg845
7f6a804d30
Add UnivNet Vocoder Model for Tortoise TTS Diffusers Integration (#24799)
* initial commit

* Add inital testing files and modify __init__ files to add UnivNet imports.

* Fix some bugs

* Add checkpoint conversion script and add references to transformers pre-trained model.

* Add UnivNet entries for auto.

* Add initial docs for UnivNet.

* Handle input and output shapes in UnivNetGan.forward and add initial docstrings.

* Write tests and make them pass.

* Write docs.

* Add UnivNet doc to _toctree.yml and improve docs.

* fix typo

* make fixup

* make fix-copies

* Add upsample_rates parameter to config and improve config documentation.

* make fixup

* make fix-copies

* Remove unused upsample_rates config parameter.

* apply suggestions from review

* make style

* Verify and add reason for skipped tests inherited from ModelTesterMixin.

* Add initial UnivNetGan integration tests

* make style

* Remove noise_length input to UnivNetGan and improve integration tests.

* Fix bug and make style

* Make UnivNet integration tests pass

* Add initial code for UnivNetFeatureExtractor.

* make style

* Add initial tests for UnivNetFeatureExtractor.

* make style

* Properly initialize weights for UnivNetGan

* Get feature extractor fast tests passing

* make style

* Get feature extractor integration tests passing

* Get UnivNet integration tests passing

* make style

* Add UnivNetGan usage example

* make style and use feature extractor from hub in integration tests

* Update tips in docs

* apply suggestions from review

* make style

* Calculate padding directly instead of using get_padding methods.

* Update UnivNetFeatureExtractor.to_dict to be UnivNet-specific.

* Update feature extractor to support using model(**inputs) and add the ability to generate noise and pad the end of the spectrogram in __call__.

* Perform padding before generating noise to ensure the shapes are correct.

* Rename UnivNetGan.forward's noise_waveform argument to noise_sequence.

* make style

* Add tests to test generating noise and padding the end for UnivNetFeatureExtractor.__call__.

* Add tests for checking batched vs unbatched inputs for UnivNet feature extractor and model.

* Add expected mean and stddev checks to the integration tests and make them pass.

* make style

* Make it possible to use model(**inputs), where inputs is the output of the feature extractor.

* fix typo in UnivNetGanConfig example

* Calculate spectrogram_zero from other config values.

* apply suggestions from review

* make style

* Refactor UnivNet conversion script to use load_state_dict (following persimmon).

* Rename UnivNetFeatureExtractor to UnivNetGanFeatureExtractor.

* make style

* Switch to using torch.tensor and torch.testing.assert_close for testing expected values/slices.

* make style

* Use config in UnivNetGan modeling blocks.

* make style

* Rename the spectrogram argument of UnivNetGan.forward to input_features, following Whisper.

* make style

* Improving padding documentation.

* Add UnivNet usage example to the docs.

* apply suggestions from review

* Move dynamic_range_compression computation into the mel_spectrogram method of the feature extractor.

* Improve UnivNetGan.forward return docstring.

* Update table in docs/source/en/index.md.

* make fix-copies

* Rename UnivNet components to have pattern UnivNet*.

* make style

* make fix-copies

* Update docs

* make style

* Increase tolerance on flaky unbatched integration test.

* Remove torch.no_grad decorators from UnivNet integration tests to try to avoid flax/Tensorflow test errors.

* Add padding_mask argument to UnivNetModel.forward and add batch_decode feature extractor method to remove padding.

* Update documentation and clean up padding code.

* make style

* make style

* Remove torch dependency from UnivNetFeatureExtractor.

* make style

* Fix UnivNetModel usage example

* Clean up feature extractor code/docstrings.

* apply suggestions from review

* make style

* Add comments for tests skipped via ModelTesterMixin flags.

* Add comment for model parallel tests skipped via the test_model_parallel ModelTesterMixin flag.

* Add # Copied from statements to copied UnivNetFeatureExtractionTest tests.

* Simplify UnivNetFeatureExtractorTest.test_batch_decode.

* Add support for unbatched padding_masks in UnivNetModel.forward.

* Refactor unbatched padding_mask support.

* make style
2023-11-22 17:21:36 +01:00
jiqing-feng
c770600fde
TVP model (#25856)
* tvp model for video grounding

add tokenizer auto

fix param in TVPProcessor

add docs

clear comments and enable different torch dtype

add image processor test and model test and fix code style

* fix conflict

* fix model doc

* fix image processing tests

* fix tvp tests

* remove torch in processor

* fix grammar error

* add more details on tvp.md

* fix model arch for loss, grammar, and processor

* add docstring and do not regard TvpTransformer, TvpVisionModel as individual model

* use pad_image

* update copyright

* control first downsample stride

* reduce first only works for ResNetBottleNeckLayer

* fix param name

* fix style

* add testing

* fix style

* rm init_weight

* fix style

* add post init

* fix comments

* do not test TvpTransformer

* fix warning

* fix style

* fix example

* fix config map

* add link in config

* fix comments

* fix style

* rm useless param

* change attention

* change test

* add notes

* fix comments

* fix tvp

* import checkpointing

* fix gradient checkpointing

* Use a more accurate example in readme

* update

* fix copy

* fix style

* update readme

* delete print

* remove tvp test_forward_signature

* remove TvpTransformer

* fix test init model

* merge main and make style

* fix tests and others

* fix image processor

* fix style and model_input_names

* fix tests
2023-11-21 16:41:55 +00:00
amyeroberts
0145c6825e
Fix tracing dinov2 (#27561)
* Enable tracing with DINOv2 model

* ABC

* Add note to model doc
2023-11-21 14:28:38 +00:00
Dmitrii Mukhutdinov
87e217d065
[Whisper] Add large-v3 version support (#27336)
* Enable large-v3 downloading and update language list

* Fix type annotation

* make fixup

* Export Whisper feature extractor

* Fix error after extractor loading

* Do not use pre-computed mel filters

* Save the full preprocessor properly

* Update docs

* Remove comment

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add alignment heads consistent with each Whisper version

* Remove alignment heads calculation

* Save fast tokenizer format as well

* Fix slow to fast conversion

* Fix bos/eos/pad token IDs in the model config

* Add decoder_start_token_id to config

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-20 17:36:48 +01:00
Xabier de Zuazo
ee29261555
Add convert_hf_to_openai.py script to Whisper documentation resources (#27590)
Add `convert_hf_to_openai.py` script to Whisper documentation resources.
2023-11-20 08:08:40 +01:00
Omar Sanseviero
25b0f2033b
Fix broken distilbert url (#27579) 2023-11-18 17:22:52 +00:00
Nathaniel Egwu
93f31e0e78
Updated albert.md doc for ALBERT model (#27223)
* Updated albert.md doc for ALBERT model

* Update docs/source/en/model_doc/albert.md

Fixed Resources heading

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update the ALBERT model doc resources

Fixed resource example for fine-tuning the ALBERT sentence-pair classification.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

Removed resource duplicate

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated albert.md doc with reviewed changes

* Updated albert.md doc for ALBERT

* Update docs/source/en/model_doc/albert.md

Removed duplicates from  updated docs/source/en/model_doc/albert.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-16 11:44:36 -08:00
Yuki-Imajuku
a0633c4483
Translating en/model_doc docs to Japanese. (#27401)
* update _toctree.yml & add albert-autoformer

* Fixed typo in docs/source/ja/model_doc/audio-spectrogram-transformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Delete duplicated sentence docs/source/ja/model_doc/autoformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Reflect reviews

* delete untranslated models from toctree

* delete all comments

* add abstract translation

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-15 10:13:52 -08:00
amyeroberts
78f6ed6c70
Revert "[time series] Add PatchTST (#25927)" (#27486)
The model was merged before final review and approval.

This reverts commit 2ac5b9325e.
2023-11-14 12:24:00 +00:00
Gift Sinthong
2ac5b9325e
[time series] Add PatchTST (#25927)
* Initial commit of PatchTST model classes

Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>

* Add PatchTSTForPretraining

* update to include classification

Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>

* clean up auto files

* Add PatchTSTForPrediction

* Fix relative import

* Replace original PatchTSTEncoder with ChannelAttentionPatchTSTEncoder

* temporary adding absolute path + add PatchTSTForForecasting class

* Update base PatchTSTModel + Unittest

* Update ForecastHead to use the config class

* edit cv_random_masking, add mask to model output

* Update configuration_patchtst.py

* add masked_loss to the pretraining

* add PatchEmbeddings

* Update configuration_patchtst.py

* edit loss which considers mask in the pretraining

* remove patch_last option

* Add commits from internal repo

* Update ForecastHead

* Add model weight initilization + unittest

* Update PatchTST unittest to use local import

* PatchTST integration tests for pretraining and prediction

* Added PatchTSTForRegression + update unittest to include label generation

* Revert unrelated model test file

* Combine similar output classes

* update PredictionHead

* Update configuration_patchtst.py

* Add Revin

* small edit to PatchTSTModelOutputWithNoAttention

* Update modeling_patchtst.py

* Updating integration test for forecasting

* Fix unittest after class structure changed

* docstring updates

* change input_size to num_input_channels

* more formatting

* Remove some unused params

* Add a comment for pretrained models

* add channel_attention option

add channel_attention option and remove unused positional encoders.

* Update PatchTST models to use HF's MultiHeadAttention module

* Update paper + github urls

* Fix hidden_state return value

* Update integration test to use PatchTSTForForecasting

* Adding dataclass decorator for model output classes

* Run fixup script

* Rename model repos for integration test

* edit argument explanation

* change individual option to shared_projection

* style

* Rename integration test + import cleanup

* Fix outpu_hidden_states return value

* removed unused mode

* added std, mean and nops scaler

* add initial distributional loss for predition

* fix typo in docs

* add generate function

* formatting

* add num_parallel_samples

* Fix a typo

* copy weighted_average function, edit PredictionHead

* edit PredictionHead

* add distribution head to forecasting

* formatting

* Add generate function for forecasting

* Add generate function to prediction task

* formatting

* use argsort

* add past_observed_mask ordering

* fix arguments

* docs

* add back test_model_outputs_equivalence test

* formatting

* cleanup

* formatting

* use ACT2CLS

* formatting

* fix add_start_docstrings decorator

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput,  PatchTSTForRegressionOutput.

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput,  PatchTSTForRegressionOutput.

* fix typos

* add forecast_masking

* fixed tests

* use set_seed

* fix doc test

* formatting

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* better var names

* rename PatchTSTTranspose

* fix argument names and docs string

* remove compute_num_patches and unused class

* remove assert

* renamed to PatchTSTMasking

* use num_labels for classification

* use num_labels

* use default num_labels from super class

* move model_type after docstring

* renamed PatchTSTForMaskPretraining

* bs -> batch_size

* more review fixes

* use hidden_state

* rename encoder layer and block class

* remove commented seed_number

* edit docstring

* Add docstring

* formatting

* use past_observed_mask

* doc suggestion

* make fix-copies

* use Args:

* add docstring

* add docstring

* change some variable names and add PatchTST before some class names

* formatting

* fix argument types

* fix tests

* change x variable to patch_input

* format

* formatting

* fix-copies

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move loss to forward

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* formatting

* fix a bug when pre_norm is set to True

* output_hidden_states is set to False as default

* set pre_norm=True as default

* format docstring

* format

* output_hidden_states is None by default

* add missing docs

* better var names

* docstring: remove default to False in output_hidden_states

* change labels name to target_values in regression task

* format

* fix tests

* change to forecast_mask_ratios and random_mask_ratio

* change mask names

* change future_values to target_values param in the prediction class

* remove nn.Sequential and make PatchTSTBatchNorm class

* black

* fix argument name for prediction

* add output_attentions option

* add output_attentions to PatchTSTEncoder

* formatting

* Add attention output option to all classes

* Remove PatchTSTEncoderBlock

* create PatchTSTEmbedding class

* use config in PatchTSTPatchify

* Use config in PatchTSTMasking class

* add channel_attn_weights

* Add PatchTSTScaler class

* add output_attentions arg to test function

* format

* Update doc with image patchtst.md

* fix-copies

* rename Forecast <-> Prediction

* change name of a few parameters to match with PatchTSMixer.

* Remove *ForForecasting class to match with other time series models.

* make style

* Remove PatchTSTForForecasting in the test

* remove PatchTSTForForecastingOutput class

* change test_forecast_head to test_prediction_head

* style

* fix docs

* fix tests

* change num_labels to num_targets

* Remove PatchTSTTranspose

* remove arguments in PatchTSTMeanScaler

* remove arguments in PatchTSTStdScaler

* add config as an argument to all the scaler classes

* reformat

* Add norm_eps for batchnorm and layernorm

* reformat.

* reformat

* edit docstring

* update docstring

* change variable name pooling to pooling_type

* fix output_hidden_states as tuple

* fix bug when calling PatchTSTBatchNorm

* change stride to patch_stride

* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder

* formatting

* initialize scalers with configs

* edit output_hidden_states

* style

* fix forecast_mask_patches doc string

---------

Co-authored-by: Gift Sinthong <gift.sinthong@ibm.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: Ngoc Diep Do <diiepy@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-13 19:06:32 +01:00
Arthur
b97cab7e6d
Remove-auth-token (#27060)
* don't use `use_auth_token`internally

* let's use token everywhere

* fixup
2023-11-13 14:20:54 +01:00
Susnato Dhar
e1c3ac2551
Add Phi-1 and Phi-1_5 (#26170)
* only dir not even init

* init

* tokenizer removed and reference of codegen added

* modeling file updated a lot remaining app_rotary_emb

* conversion script done

* conversion script fixed, a lot of factoring done and most tests pass

* added token_clf and extractive_QA_head

* integration tests pass

* flash attn tests pass!

* config done

* more docs in modeling file

* some style fix

* style and others

* doc test error fix

* more doc fix

* some attention fixes

* most fixes

* style and other fixes

* docs fix and config

* doc fix

* some comments

* conversion script updated

* conversion script updated

* Revert "conversion script updated"

This reverts commit e92378c54084ec0747041b113083d1746ecb6c7f.

* final comments

* add Phi to language_modeling.md

* edit phi.md file

* rebase and fix

* removed phi-1.5 example

* changed model_type from 'phi'->'mixformer-sequential'

* small change

* small change

* revert \small change

* changed mixformer-sequential->phi

* small change

* added phi-1.5 example instead of phi-1

* doc test might pass now

* rebase and small change

* added the dropout layer

* more fixes

* modified .md file

* very very small doc change
2023-11-10 15:28:30 +00:00
Susnato Dhar
7e9f10ac94
Add CLVP (#24745)
* init commit

* attention arch done except rotary emb

* rotary emb done

* text encoder working

* outputs matching

* arch first pass done

* make commands done, tests and docs remaining

* all tests passed, only docs remaining

* docs done

* doc-builder fix

* convert script removed(not relevant)

* minor comments done

* added ckpt conversion script

* tokenizer done

* very minor fix of index.md 2

* mostly make fixup related

* all done except fe and rotary emb

* very small change

* removed unidecode dependency

* style changes

* tokenizer removed require_backends

* added require_inflect to tokenizer tests

* removed VOCAB_FILES in tokenizer test

* inflect dependency removed

* added rotary pos emb cache and simplified the apply method

* style

* little doc change

* more comments

* feature extractor added

* added processor

* auto-regressive config added

* added CLVPConditioningEncoder

* comments done except the test one

* weights added successfull(NOT tested)

* tokenizer fix with numbers

* generate outputs matching

* almost tests passing Integ tests not written

* Integ tests added

* major CUDA error fixed

* docs done

* rebase and multiple fixes

* fixed rebase overwrites

* generate code simplified and tests for AutoRegressive model added

* minor changes

* refectored gpt2 code in clvp file

* weights done and all code refactored

* mostly done except the fast_tokenizer

* doc test fix

* config file's doc fixes

* more config fix

* more comments

* tokenizer comments mostly done

* modeling file mostly refactored and can load modules

* ClvpEncoder tested

* ClvpDecoder, ClvpModel and ClvpForCausalLM tested

* integration and all tests passed

* more fixes

* docs almost done

* ckpt conversion refectored

* style and some failing tests fix

* comments

* temporary output fix but test_assisted_decoding_matches_greedy_search test fails

* majority changes done

* use_cache outputs same now! Along with the asisted_greedy_decoding test fix

* more comments

* more comments

* prepare_inputs_for_generation fixed and _prepare_model_inputs added

* style fix

* clvp.md change

* moved clvpconditionalencoder norms

* add model to new index

* added tokenizer input_ids_with_special_tokens

* small fix

* config mostly done

* added config-tester and changed conversion script

* more comments

* comments

* style fix

* some comments

* tokenizer changed back to prev state

* small commnets

* added output hidden states for the main model

* style fix

* comments

* small change

* revert small change

* .

* Update clvp.md

* Update test_modeling_clvp.py

* :)

* some minor change

* new fixes

* remove to_dict from FE
2023-11-10 13:49:10 +00:00
Yoach Lacombe
9dd58c53dd
update Bark FA2 docs (#27400)
* update Bark FA2 docs

* update benchmark section

* Update bark.md

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* rephrase

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-11-10 13:40:30 +00:00
Sanchit Gandhi
f16ff0f07e
MusicGen Update (#27084)
* [MusicGen] Add stereo model

* safe serialization

* Update src/transformers/models/musicgen/modeling_musicgen.py

* split over 2 lines

* fix slow tests on cuda
2023-11-08 13:26:02 +00:00
Arthur
88832c01c8
[Whisper] Add conversion script for the tokenizer (#27338)
* draft

* updates

* full conversion taken from `https://gist.github.com/xenova/a452a6474428de0182b17605a98631ee`

* psuh

* nits

* updates

* more nits

* Add co author

Co-authored-by: Joshua Lochner <admin@xenova.com>

* fixup

* cleanup

* styling

* add proper path

* update

* nits

* don't  push the exit

* clean

* update whisper doc

* don't error out if tiktoken is not here

* make sure we are BC with conversion

* nit

* Update docs/source/en/model_doc/whisper.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* merge and update

* update markdwon

* Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-07 15:07:55 +01:00
Susnato Dhar
0ded281557
[FA2] Add flash attention for GPT-Neo (#26486)
* added flash attention for gpt-neo

* small change

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* readme updated

* .

* changes

* removed padding_mask

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-07 13:54:01 +00:00
Xabier de Zuazo
606d90845f
Fix Whisper Conversion Script: Correct decoder_attention_heads and _download function (#26834)
* Fix error in convert_openai_to_hf.py: "_download() missing 1 required positional argument: root"

* Fix error in convert_openai_to_hf.py: "TypeError: byte indices must be integers or slices, not str"

* Fix decoder_attention_heads value in convert_openai_to_hf.py.

Correct the assignment for `decoder_attention_heads` in the conversion script for the Whisper model.

* Black reformat convert_openai_to_hf.py file.

* Fix Whisper model configuration defaults (for Tiny).

- Correct encoder/decoder layers and attention heads count.
- Update model width (`d_model`) to 384.

* Add docstring to the convert_openai_to_hf.py script with a doctest

* Add shebang and +x permission to the convert_openai_to_hf.py

* convert_openai_to_hf.py: reuse the read model_bytes in the _download() function

* Move convert_openai_to_hf.py doctest example to whisper.md

* whisper.md: Add an inference example to the Conversion section.

* whisper.md: remove `model.config.forced_decoder_ids` from examples (deprecated)

* whisper.md: Remove "## Format Conversion" section; not used by users

* whisper.md: Use librispeech_asr_dummy dataset and load_dataset()
2023-11-07 13:39:42 +01:00
Maria Khalusova
9beb2737d7
[docs] fixed links with 404 (#27327)
* fixed links with 404

* make style
2023-11-06 19:45:03 +00:00
Susnato Dhar
1ac2463dfe
[FA2] Add flash attention for for DistilBert (#26489)
* flash attention added for DistilBert

* fixes

* removed padding_masks

* Update modeling_distilbert.py

* Update test_modeling_distilbert.py

* style fix
2023-11-03 16:07:54 +00:00
Maria Khalusova
5964f820db
[Docs] Model_doc structure/clarity improvements (#26876)
* first batch of structure improvements for model_docs

* second batch of structure improvements for model_docs

* more structure improvements for model_docs

* more structure improvements for model_docs

* structure improvements for cv model_docs

* more structural refactoring

* addressed feedback about image processors
2023-11-03 10:57:03 -04:00
Younes Belkada
ad8ff96224
[Docs / SAM ] Reflect correct changes to run inference without OOM (#27268)
Update sam.md
2023-11-03 15:23:13 +01:00
Andi Powers Holmes
f8afb2b2ec
Add TensorFlow implementation of ConvNeXTv2 (#25558)
* Add type annotations to TFConvNextDropPath

* Use tf.debugging.assert_equal for TFConvNextEmbeddings shape check

* Add TensorFlow implementation of ConvNeXTV2

* check_docstrings: add TFConvNextV2Model to exclusions

TFConvNextV2Model and TFConvNextV2ForImageClassification have docstrings
which are equivalent to their PyTorch cousins, but a parsing issue prevents them
from passing the test.

Adding exclusions for these two classes as discussed in #25558.
2023-11-01 15:09:55 +00:00
Patrick von Platen
391d14e810
[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding (#27195)
* finish

* add tests

* fix all tests

* [Assistant Decoding] Add test

* fix more

* better

* finish

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* finish

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 16:01:53 +01:00
Susnato Dhar
b5db8ca66f
Add flash attention for gpt_bigcode (#26479)
* added flash attention of gpt_bigcode

* changed docs

* Update src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py

* add FA-2 docs

* oops

* Update docs/source/en/perf_infer_gpu_one.md Last Nit

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* oops

* remove padding_mask

* change getattr->hasattr logic

* changed .md file

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-31 11:21:02 +00:00
Clifford Ressel
b5c8e23f0f
Remove broken links to s-JoL/Open-Llama (#27164) 2023-10-31 10:17:54 +00:00
NielsRogge
8211c59b9a
[KOSMOS-2] Update docs (#27157)
Update docs
2023-10-30 21:42:19 +01:00
Yih-Dar
691fd8fdde
Add Kosmos-2 model (#24709)
* Add KOSMOS-2 model

* update

* update

* update

* address review comment - 001

* address review comment - 002

* address review comment - 003

* style

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix

* address review comment - 004

* address review comment - 005

* address review comment - 006

* address review comment - 007

* address review comment - 008

* address review comment - 009

* address review comment - 010

* address review comment - 011

* update readme

* fix

* fix

* fix

* [skip ci] fix

* revert the change in _decode

* fix docstring

* fix docstring

* Update docs/source/en/model_doc/kosmos-2.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* no more Kosmos2Tokenizer

* style

* remove "returned when being computed by the model"

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* UTM5 Atten

* fix attn mask

* use present_key_value_states instead of next_decoder_cache

* style

* conversion scripts

* conversion scripts

* conversion scripts

* Add _reorder_cache

* fix doctest and copies

* rename 1

* rename 2

* rename 3

* make fixup

* fix table

* fix docstring

* rename 4

* change repo_id

* remove tip

* update md file

* make style

* update md file

* put docs/source/en/model_doc/kosmos-2.md to slow

* update conversion script

* Use CLIPImageProcessor in Kosmos2Processor

* Remove Kosmos2ImageProcessor

* Remove to_dict in Kosmos2Config

* Remove files

* fix import

* Update conversion

* normalized=False

* Not using hardcoded values like <image>

* elt --> element

* Apply suggestion

* Not using hardcoded values like </image>

* No assert

* No nested functions

* Fix md file

* copy

* update doc

* fix docstring

* fix name

* Remove _add_remove_spaces_around_tag_tokens

* Remove dummy docstring of _preprocess_single_example

* Use `BatchEncoding`

* temp

* temp

* temp

* Update

* Update

* Make Kosmos2ProcessorTest a bit pretty

* Update gradient checkpointing

* Fix gradient checkpointing test

* Remove one liner remove_special_fields

* Simplify conversion script

* fix add_eos_token

* update readme

* update tests

* Change to microsoft/kosmos-2-patch14-224

* style

* Fix doc

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-30 13:32:17 +01:00
Yoach Lacombe
cb45f71c4d
Add Seamless M4T model (#25693)
* first raw commit

* still POC

* tentative convert script

* almost working speech encoder conversion scripts

* intermediate code for encoder/decoders

* add modeling code

* first version of speech encoder

* make style

* add new adapter layer architecture

* add adapter block

* add first tentative config

* add working speech encoder conversion

* base model convert works now

* make style

* remove unnecessary classes

* remove unecessary functions

* add modeling code speech encoder

* rework logics

* forward pass of sub components work

* add modeling codes

* some config modifs and modeling code modifs

* save WIP

* new edits

* same output speech encoder

* correct attention mask

* correct attention mask

* fix generation

* new generation logics

* erase comments

* make style

* fix typo

* add some descriptions

* new state

* clean imports

* add tests

* make style

* make beam search and num_return_sequences>1 works

* correct edge case issue

* correct SeamlessM4TConformerSamePadLayer copied from

* replace ACT2FN relu by nn.relu

* remove unecessary return variable

* move back a class

* change name conformer_attention_mask ->conv_attention_mask

* better nit code

* add some Copied from statements

* small nits

* small nit in dict.get

* rename t2u model -> conditionalgeneration

* ongoing refactoring of structure

* update models architecture

* remove SeamlessM4TMultiModal classes

* add tests

* adapt tests

* some non-working code for vocoder

* add seamlessM4T vocoder

* remove buggy line

* fix some hifigan related bugs

* remove hifigan specifc config

* change

* add WIP tokenization

* add seamlessM4T working tokenzier

* update tokenization

* add tentative feature extractor

* Update converting script

* update working FE

* refactor input_values -> input_features

* update FE

* changes in generation, tokenizer and modeling

* make style and add t2u_decoder_input_ids

* add intermediate outputs for ToSpeech models

* add vocoder to speech models

* update valueerror

* update FE with languages

* add vocoder convert

* update config docstrings and names

* update generation code and configuration

* remove todos and update config.pad_token_id to generation_config.pad_token_id

* move block vocoder

* remove unecessary code and uniformize tospeech code

* add feature extractor import

* make style and fix some copies from

* correct consistency + make fix-copies

* add processor code

* remove comments

* add fast tokenizer support

* correct pad_token_id in M4TModel

* correct config

* update tests and codes  + make style

* make some suggested correstion - correct comments and change naming

* rename some attributes

* rename some attributes

* remove unecessary sequential

* remove option to use dur predictor

* nit

* refactor hifigan

* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config

* add tests

* change tgt_lang logic

* update generation ToSpeech

* add support import SeamlessM4TProcessor

* fix generate

* make tests

* update integration tests, add option to only return text and update tokenizer fast

* fix wrong function call

* update import and convert script

* update integration tests + update repo id

* correct paths and add first test

* update how new attention masks are computed

* update tests

* take first care of batching in vocoder code

* add batching with the vocoder

* add waveform lengths to model outputs

* make style

* add generate kwargs + forward kwargs of M4TModel

* add docstrings forward methods

* reformate docstrings

* add docstrings t2u model

* add another round of modeling docstrings + reformate speaker_id -> spkr_id

* make style

* fix check_repo

* make style

* add seamlessm4t to toctree

* correct check_config_attributes

* write config docstrings + some modifs

* make style

* add docstrings tokenizer

* add docstrings to processor, fe and tokenizers

* make style

* write first version of model docs

* fix FE + correct FE test

* fix tokenizer + add correct integration tests

* fix most tokenization tests

* make style

* correct most processor test

* add generation tests and fix num_return_sequences > 1

* correct integration tests -still one left

* make style

* correct position embedding

* change numbeams to 1

* refactor some modeling code and correct one test

* make style

* correct typo

* refactor intermediate fnn

* refactor feedforward conformer

* make style

* remove comments

* make style

* fix tokenizer tests

* make style

* correct processor tests

* make style

* correct S2TT integration

* Apply suggestions from Sanchit code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct typo

* replace torch.nn->nn + make style

* change Output naming (waveforms -> waveform) and ordering

* nit renaming and formating

* remove return None when not necessary

* refactor SeamlessM4TConformerFeedForward

* nit typo

* remove almost copied from comments

* add a copied from comment and remove an unecessary dropout

* remove inputs_embeds from speechencoder

* remove backward compatibiliy function

* reformate class docstrings for a few components

* remove unecessary methods

* split over 2 lines smthg hard to read

* make style

* replace two steps offset by one step as suggested

* nice typo

* move warnings

* remove useless lines from processor

* make generation non-standard test more robusts

* remove torch.inference_mode from tests

* split integration tests

* enrich md

* rename control_symbol_vocoder_offset->vocoder_offset

* clean convert file

* remove tgt_lang and src_lang from FE

* change generate docstring of ToText models

* update generate docstring of tospeech models

* unify how to deal withtext_decoder_input_ids

* add default spkr_id

* unify tgt_lang for t2u_model

* simplify tgt_lang verification

* remove a todo

* change config docstring

* make style

* simplify t2u_tgt_lang_id

* make style

* enrich/correct comments

* enrich .md

* correct typo in docstrings

* add torchaudio dependency

* update tokenizer

* make style and fix copies

* modify SeamlessM4TConverter with new tokenizer behaviour

* make style

* correct small typo docs

* fix import

* update docs and add requirement to tests

* add convert_fairseq2_to_hf in utils/not_doctested.txt

* update FE

* fix imports and make style

* remove torchaudio in FE test

* add seamless_m4t.md to utils/not_doctested.txt

* nits and change the way docstring dataset is loaded

* move checkpoints from ylacombe/ to facebook/ orga

* refactor warning/error to be in the 119 line width limit

* round overly precised floats

* add stereo audio behaviour

* refactor .md and make style

* enrich docs with more precised architecture description

* readd undocumented models

* make fix-copies

* apply some suggestions

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* correct bug from previous commit

* refactor a parameter allowing to clean the code + some small nits

* clean tokenizer

* make style and fix

* make style

* clean tokenizers arguments

* add precisions for some tests

* move docs from not_tested to slow

* modify tokenizer according to last comments

* add copied from statements in tests

* correct convert script

* correct parameter docstring style

* correct tokenization

* correct multi gpus

* make style

* clean modeling code

* make style

* add copied from statements

* add copied statements

* add support with ASR pipeline

* remove file added inadvertently

* fix docstrings seamlessM4TModel

* add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown

* add seamlessm4t to assisted generation ignored models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-23 14:49:48 +02:00
Omar Sanseviero
d33d313192
Nits in Llama2 docstring (#26996)
Update llama2.md
2023-10-23 14:19:59 +02:00
Mohamed Aymane Farhi
73dc23f786
Fix license (#26931) 2023-10-19 15:36:41 +02:00
Patrick von Platen
734dd96e02
[Docs] Make sure important decode and generate method are nicely displayed in Whisper docs (#26927)
better docstrings whisper
2023-10-19 13:01:47 +02:00
Pablo Montalvo
caa0ff0bf1
Add fuyu model (#26911)
* initial commit

* add processor, add fuyu naming

* add draft processor

* fix processor

* remove dropout to fix loading of weights

* add image processing fixes from Pedro

* fix

* fix processor

* add basic processing fuyu test

* add documentation and TODO

* address comments, add tests, add doc

* replace assert with torch asserts

* add Mixins and fix tests

* clean imports

* add model tester, clean imports

* fix embedding test

* add updated tests from pre-release model

* Processor: return input_ids used for inference

* separate processing and model tests

* relax test tolerance for embeddings

* add test for logit comparison

* make sure fuyu image processor is imported in the init

* fix formattingh

* more formatting issues

* and more

* fixups

* remove some stuff

* nits

* update init

* remove the fuyu file

* Update integration test with release model

* Update conversion script.

The projection is not used, as confirmed by the authors.

* improve geenration

* Remove duplicate function

* Trickle down patches to model call

* processing fuyu updates

* remove things

* fix prepare_inputs_for_generation to fix generate()

* remove model_input

* update

* add generation tests

* nits

* draft leverage automodel and autoconfig

* nits

* fix dtype patch

* address comments, update READMEs and doc, include tests

* add working processing test, remove refs to subsequences

* add tests, remove Sequence classification

* processing

* update

* update the conversion script

* more processing cleanup

* safe import

* take out ModelTesterMixin for early release

* more cl;eanup

* more cleanup

* more cleanup

* and more

* register a buffer

* nits

* add postprocessing of generate output

* nits

* updates

* add one working test

* fix test

* make fixup works

* fixup

* Arthur's updates

* nits

* update

* update

* fix processor

* update tests

* passe more fixups

* fix

* nits

* don't import torch

* skip fuyu config for now

* fixup done

* fixup

* update

* oups

* nits

* Use input embeddings

* no buffer

* update

* styling processing fuyu

* fix test

* update licence

* protect torch import

* fixup and update not doctested

* kwargs should be passed

* udpates

* update the impofixuprts in the test

* protect import

* protecting imports

* protect imports in type checking

* add testing decorators

* protect top level import structure

* fix typo

* fix check init

* move requires_backend to functions

* Imports

* Protect types

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-10-18 15:24:11 -07:00
Bingchen Zhao
46092f763d
Fixed a typo in mistral.md (#26879)
Fix a typo in mistral.md
2023-10-17 14:06:37 -07:00
Susheel Thapa
b3961f7291
Chore: Typo fixed in multiple files of docs/source/en/model_doc (#26833)
* Chore: Typo fixed in multiple files of docs/source/en/model_doc

* Update docs/source/en/model_doc/nllb-moe.md

Co-authored-by: Aryan V S <avs050602@gmail.com>

---------

Co-authored-by: Aryan V S <avs050602@gmail.com>
2023-10-17 07:10:08 +02:00
NielsRogge
570b3f9cdd
[OWL-ViT, OWLv2] Add resources (#26822)
Add resources
2023-10-16 15:47:44 +02:00
Injin Paek
d6e5b02ef3
Add CLIP resources (#26534)
* docs: feat: model resources for CLIP

* fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: resolve suggestion

* fix: resolve suggestion

* fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: resolve suggestion

* fix: resolve suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-10-13 11:12:59 -07:00
NielsRogge
762af3e3c7
Add OWLv2, bis (#26668)
* First draft

* Update conversion script

* Update copied from statements

* Fix style

* Add copied from to config

* Add copied from to processor

* Run make fixup

* Add docstring

* Update docstrings

* Add method

* Improve docstrings

* Fix docstrings

* Improve docstrings

* Remove onnx

* Add flag

* Address comments

* Add copied from to model tests

* Add flag to conversion script

* Add code snippet

* Address more comments

* Address comment

* Improve conversion script

* More improvements

* Add expected objectness logits

* Skip test

* Improve conversion script

* Extend conversion script

* Convert large checkpoint

* Fix doc tests

* Convert all checkpoints, update integration tests

* Add checkpoint_path arg

* Fix repo_id
2023-10-13 16:41:24 +02:00
Galland
f9ab07f920
Update mistral.md to update 404 link (#26590) 2023-10-04 17:48:11 +02:00
김준재_T3056
2f3ea08a07
docs: feat: add clip notebook resources from OSSCA community (#26505) 2023-10-03 11:20:22 -07:00
Younes Belkada
ae9a344cce
[Mistral] Add Flash Attention-2 support for mistral (#26464)
* add FA-2 support for mistral

* fixup

* add sliding windows

* fixing few nits

* v1 slicing cache - logits do not match

* add comment

* fix bugs

* more mem efficient

* add warning once

* add warning once

* oops

* fixup

* more comments

* copy

* add safety checker

* fixup

* Update src/transformers/models/mistral/modeling_mistral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* copied from

* up

* raise when padding side is right

* fixup

* add doc + few minor changes

* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-03 13:44:46 +02:00
HelgeS
7d6627d0d9
Fix broken link to video classification task (#26487) 2023-10-02 11:19:11 +02:00
Chris Bamford
72958fcd3c
[Mistral] Mistral-7B-v0.1 support (#26447)
* [Mistral] Mistral-7B-v0.1 support

* fixing names

* slightly longer test

* fixups

* not_doctested

* wrongly formatted references

* make fixuped

---------

Co-authored-by: Timothee Lacroix <t@eugen.ai>
Co-authored-by: timlacroix <t@mistral.ai>
2023-09-27 18:30:46 +02:00
NielsRogge
a09130feee
[ViTMatte] Add resources (#26317)
Add resource
2023-09-26 07:06:38 +02:00
NielsRogge
ace74d16bd
Add Nougat (#25942)
* Add conversion script

* Add NougatImageProcessor

* Add crop margin

* More improvements

* Add docs, READMEs

* Remove print statements

* Include model_max_length

* Add NougatTokenizerFast

* Fix imports

* Improve postprocessing

* Improve image processor

* Fix image processor

* Improve normalize method

* More improvements

* More improvements

* Add processor, improve docs

* Simplify fast tokenizer

* Remove test file

* Fix docstrings

* Use NougatProcessor in conversion script

* Add is_levensthein_available

* Add tokenizer tests

* More improvements

* Use numpy instead of opencv

* Add is_cv2_available

* Fix cv2_available

* Add is_nltk_available

* Add image processor tests, improve crop_margin

* Add integration tests

* Improve integration test

* Use do_rescale instead of hacks, thanks Amy

* Remove random_padding

* Address comments

* Address more comments

* Add import

* Address more comments

* Address more comments

* Address comment

* Address comment

* Set max_model_input_sizes

* Add tests

* Add requires_backends

* Add Nougat to exotic tests

* Use to_pil_image

* Address comment regarding nltk

* Add NLTK

* Improve variable names, integration test

* Add test

* refactor, document, and test regexes

* remove named capture groups, add comments

* format

* add non-markdown fixed tokenization

* format

* correct flakyness of args parse

* add regex comments

* test functionalities for crop_image, align long axis and expected output

* add regex tests

* remove cv2 dependency

* test crop_margin equality between cv2 and python

* refactor table regexes to markdown

add newline

* change print to log, improve doc

* fix high count tables correction

* address PR comments: naming, linting, asserts

* Address comments

* Add copied from

* Update conversion script

* Update conversion script to convert both small and base versions

* Add inference example

* Add more info

* Fix style

* Add require annotators to test

* Define all keyword arguments explicitly

* Move cv2 annotator

* Add tokenizer init method

* Transfer checkpoints

* Add reference to Donut

* Address comments

* Skip test

* Remove cv2 method

* Add copied from statements

* Use cached_property

* Fix docstring

* Add file to not doctested

---------

Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
2023-09-26 07:06:04 +02:00
LeviVasconcelos
576cd45a57
Add image to image pipeline (#25393)
* Add image to image pipeline

Add image to image pipeline

* remove swin2sr from tf auto

* make ImageToImage importable

* make style

make style

make style

make style

* remove tf support

* remove nonused imports

* fix postprocessing

* add important comments; add unit tests

* add documentation

* remove support for TF

* make fixup

* fix typehint Image.Image

* fix documentation code

* address review request; fix unittest type checking

* address review request; fix unittest type checking

* make fixup

* address reviews

* Update src/transformers/pipelines/image_to_image.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* enhance docs

* make style

* make style

* improve docetest time

* improve docetest time

* Update tests/pipelines/test_pipelines_image_to_image.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* Update tests/pipelines/test_pipelines_image_to_image.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* make fixup

* undo faulty merge

* undo faulty merge

* add image-to-image to test pipeline mixin

* Update src/transformers/pipelines/image_to_image.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_image_to_image.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* improve docs

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-09-22 19:53:55 +03:00
NielsRogge
7d6354e047
Add ViTMatte (#25843)
* First draft

* Simplify image processor

* Fix rebase

* Address comments

* Address more comments

* Address more comments

* Address more comments

* Address more comments

* Improve pad_image

* Add tests

* Update integration test

* Fix image processor tests

* Fix model tests

* Convert checkpoints

* Fix doc tests

* Remove file

* Apply suggestions

* Address comments

* Fix typing hint

* Add batch_norm_eps

* Address comments

* Fix style
2023-09-19 10:56:10 -03:00
Jinho Park
17fdd35481
Add BROS (#23190)
* add Bros boilerplate

* copy and pasted modeling_bros.py from official Bros repo

* update copyright of bros files

* copy tokenization_bros.py from official repo and update import path

* copy tokenization_bros_fast.py from official repo and update import path

* copy configuration_bros.py from official repo and update import path

* remove trailing period in copyright line

* copy and paste bros/__init__.py from official repo

* save formatting

* remove unused unnecessary pe_type argument - using only crel type

* resolve import issue

* remove unused model classes

* remove unnecessary tests

* remove unused classes

* fix original code's bug - layer_module's argument order

* clean up modeling auto

* add bbox to prepare_config_and_inputs

* set temporary value to hidden_size (32 is too low because of the of the
Bros' positional embedding)

* remove decoder test, update create_and_check* input arguemnts

* add missing variable to model tests

* do make fixup

* update bros.mdx

* add boilerate plate for no_head inference test

* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)

* add prepare_bros_batch_inputs function

* update modeling_common to add bbox inputs in Bros Model Test

* remove unnecessary model inference

* add test case

* add model_doc

* add test case for token_classification

* apply fixup

* update modeling code

* update BrosForTokenClassification loss calculation logic

* revert logits preprocessing logic to make sure logits have original shape

* - update class name

* - add BrosSpadeOutput
- update BrosConfig arguments

* add boilerate plate for no_head inference test

* add prepare_bros_batch_inputs function

* add test case

* add test case for token_classification

* update modeling code

* update BrosForTokenClassification loss calculation logic

* revert logits preprocessing logic to make sure logits have original shape

* apply masking on the fly

* add BrosSpadeForTokenLinking

* update class name
put docstring to the beginning of the file

* separate the logits calculation logic and loss calculation logic

* update logic for loss calculation so that logits shape doesn't change
when return

* update typo

* update prepare_config_and_inputs

* update dummy node initialization

* update last_hidden_states getting logic to consider when return_dict is False

* update box first token mask param

* bugfix: remove random attention mask generation

* update keys to ignore on load missing

* run make style and quality

* apply make style and quality of other codes

* update box_first_token_mask to bool type

* update index.md

* apply make style and quality

* apply make fix-copies

* pass check_repo

* update bros model doc

* docstring bugfix fix

* add checkpoint for doc, tokenizer for doc

* Update README.md

* Update docs/source/en/model_doc/bros.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update bros.md

* Update src/transformers/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/bros.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply suggestions from code review

* apply suggestions from code review

* revert test_processor_markuplm.py

* Update test_processor_markuplm.py

* apply suggestions from code review

* apply suggestions from code review

* apply suggestions from code review

* update BrosSpadeELForTokenClassification head name to entity linker

* add doc string for config params

* update class, var names to more explicit and apply suggestions from code review

* remove unnecessary keys to ignore

* update relation extractor to be initialized with config

* add bros processor

* apply make style and quality

* update bros.md

* remove bros tokenizer, add bros processor that wraps bert tokenizer

* revert change

* apply make fix-copies

* update processor code, update itc -> initial token, stc -> subsequent token

* add type hint

* remove unnecessary condition branches in embedding forward

* fix auto tokenizer fail

* update docstring for each classes

* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass

* update bros docs

* apply suggestions from code review : update Bros -> BROS in bros.md

* 1. box prefix var -> bbox
2. update variable names to be more explicit

* replace einsum with torch matmul

* apply style and quality

* remove unused argument

* remove unused arguments

* update docstrings

* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations

* revert einsum update

* update bros processor

* apply suggestions from code review

* add conversion script for bros

* Apply suggestions from code review

* fix readme

* apply fix-copies

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-14 18:02:37 +01:00
김준재_T3056
a6ae2bd059
docs: feat: add llama2 notebook resources from OSSCA community (#26076) 2023-09-13 08:27:41 -07:00
Arthur
9cccb3a838
[Persimmon] Add support for persimmon (#26042)
* intiial commit

* updates

* nits

* update conversion script

* update conversion script

* use path to load

* add tips etc

* some modeling logic

* modeling update

* more nits

* nits

* normal layer norm

* update config and doc

* nits

* update doc remove unused

* update

* fix inits and stuff

* fixup

* revert wrong changes

* updates

* more nits

* add default config values to the configuration file

* fixup happy

* update

* 2 tests left

* update readmes

* more nits

* slow test and more documentation

* update readme

* fix licences

* styling

* use fast if possible when saving tokenizer

* remove todo

* remove tokenization tests

* small last nits

* Apply suggestions from code review

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* nits to skip the timout doctest

* fix integration test

* fix test

* update eos token

* update to allow fast tokenization

* styling

* fix codeLlama as well for the update post processor

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add more copied from statements

* update

* doc passes doctest

* remove `# final layer norm?`

* change docstring prompot

* update

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* don't doctest the conversion script as it requires more packages

* don't init a model in the config

* oups

* fix doctest

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-09-12 11:33:27 +02:00
Injin Paek
6206f599e1
Add LLaMA resources (#25859)
* docs: feat: model resources for llama

* fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
2023-09-05 10:50:08 -07:00
raghavanone
1110b565d6
Add TFDebertaV2ForMultipleChoice (#25932)
* Add TFDebertaV2ForMultipleChoice

* Import newer model in main init

* Fix import issues

* Fix copies

* Add doc

* Fix tests

* Fix copies

* Fix docstring
2023-09-05 17:13:06 +01:00
Susnato Dhar
52a46dc57b
Add Pop2Piano space demo. (#25975)
Update pop2piano.md
2023-09-05 11:07:02 +01:00
Matt
034bc5d26a
Add proper Falcon docs and conversion script (#25954)
* Add proper Falcon docs and conversion script

* Autodetect the decoder architecture instead of using an arg

* Update docs now that we can autodetect

* Fix doc error

* Add doc to toctree

* Quick doc update
2023-09-04 17:18:34 +01:00
Sanchit Gandhi
f435003e0c
[MMS] Fix pip install in docs (#25949) 2023-09-04 11:53:41 +01:00
Arthur
a4dd53d88e
Update-llama-code (#25826)
* some bug fixes

* updates

* Update code_llama.md

Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>

* Add co author

Co-authored-by: pcuenca <pedro@latenitesoft.com>

* add a test

* fixup

* nits

* some updates

* fix-coies

* adress comments

* nits

* nits

* fix docsting

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update

* add int for https://huggingface.co/spaces/hf-accelerate/model-memory-usage

---------

Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>
Co-authored-by: pcuenca <pedro@latenitesoft.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-01 20:40:40 +02:00
Sanchit Gandhi
1fa2d89a9b
[MMS] Update docs with HF TTS implementation (#25907)
* [MMS] Update docs with HF TTS implementation

* Update docs/source/en/model_doc/mms.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add uromanise to docs

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-09-01 16:50:59 +01:00
Omar Sanseviero
69c5b8f186
Remove broken docs for MusicGen (#25905)
Update musicgen.md
2023-09-01 15:26:42 +01:00
Matthijs Hollemans
4ece3b9433
add VITS model (#24085)
* add VITS model

* let's vits

* finish TextEncoder (mostly)

* rename VITS to Vits

* add StochasticDurationPredictor

* ads flow model

* add generator

* correctly set vocab size

* add tokenizer

* remove processor & feature extractor

* add PosteriorEncoder

* add missing weights to SDP

* also convert LJSpeech and VCTK checkpoints

* add training stuff in forward

* add placeholder tests for tokenizer

* add placeholder tests for model

* starting cleanup

* let the great renaming begin!

* use config

* global_conditioning

* more cleaning

* renaming variables

* more renaming

* more renaming

* it never ends

* reticulating the splines

* more renaming

* HiFi-GAN

* doc strings for main model

* fixup

* fix-copies

* don't make it a PreTrainedModel

* fixup

* rename config options

* remove training logic from forward pass

* simplify relative position

* use actual checkpoint

* style

* PR review fixes

* more review changes

* fixup

* more unit tests

* fixup

* fix doc test

* add integration test

* improve tokenizer tests

* add tokenizer integration test

* fix tests on GPU (gave OOM)

* conversion script can handle repos from hub

* add conversion script for all MMS-TTS checkpoints

* automatically create a README for the converted checkpoint

* small changes to config

* push README to hub

* only show uroman note for checkpoints that need it

* remove conversion script because code formatting breaks the readme

* make WaveNet layers configurable

* rename variables

* simplifying the math

* output attentions and hidden states

* remove VitsFlip in flow model

* also got rid of the other flip

* fix tests

* rename more variables

* rename tokenizer, add phonemization

* raise error when phonemizer missing

* re-order config docstrings to match method

* change config naming

* remove redundant str -> list

* fix copyright: vits authors -> kakao enterprise

* (mean, log_variances) -> (prior_mean, prior_log_variances)

* if return dict -> if not return dict

* speed -> speaking rate

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update fused tanh sigmoid

* reduce dims in tester

* audio -> output_values

* audio -> output_values in tuple out

* fix return type

* fix return type

* make _unconstrained_rational_quadratic_spline a function

* all nn's to accept a config

* add spectro to output

* move {speaking rate, noise scale, noise scale duration} to config

* path -> attn_path

* idxs -> valid idxs -> padded idxs

* output values -> waveform

* use config for attention

* make generation work

* harden integration test

* add spectrogram to dict output

* tokenizer refactor

* make style

* remove 'fake' padding token

* harden tokenizer tests

* ron norm test

* fprop / save tests deterministic

* move uroman to tokenizer as much as possible

* better logger message

* fix vivit imports

* add uroman integration test

* make style

* up

* matthijs -> sanchit-gandhi

* fix tokenizer test

* make fix-copies

* fix dict comprehension

* fix config tests

* fix model tests

* make outputs consistent with reverse/not reverse

* fix key concat

* more model details

* add author

* return dict

* speaker error

* labels error

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vits/convert_original_checkpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove uromanize

* add docstrings

* add docstrings for tokenizer

* upper-case skip messages

* fix return dict

* style

* finish tests

* update checkpoints

* make style

* remove doctest file

* revert

* fix docstring

* fix tokenizer

* remove uroman integration test

* add sampling rate

* fix docs / docstrings

* style

* add sr to model output

* fix outputs

* style / copies

* fix docstring

* fix copies

* remove sr from model outputs

* Update utils/documentation_tests.txt

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add sr as allowed attr

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-01 10:50:06 +01:00
Omar Sanseviero
9525515cd4
Minor wording changes for Code Llama (#25815)
* Update code_llama.md

* Update code_llama.md
2023-08-29 15:02:57 +02:00
NielsRogge
4c21da5e34
Add ViTDet (#25524)
* First draft

* Fix READMEs

* Update return_dict

* Add more tests

* Fix docstrings

* Address comments

* Address more comments

* Address more comments

* Address more comments, fix test

* Fix test
2023-08-29 10:03:52 +01:00
Arthur
de139702a1
[LlamaFamiliy] add a tip about dtype (#25794)
* add a warning=True tip to the Llama2 doc

* code llama needs a tip too

* doc nit

* build PR doc

* doc nits

Co-authored-by: Lysandre <lysandre@huggingface.co>

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-08-28 12:07:31 +02:00
Arthur
015f8e110d
[CodeLlama] Add support for CodeLlama (#25740)
* add all

* Revert "Delete .github directory"

This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.

* make conversion script backward compatible

* fixup

* more styling

* copy to llama changes

* fix repo consistency

* nits

* document correct classes

* updates

* more fixes

* nits

* update auto mappings

* add readmes

* smallupdates

* llama-code replace with llama_code

* make fixup

* updates to the testsing suite

* fix fast nits

* more small fixes

* fix decode

* fix template processing

* properly reset the normalizer

* nits processor

* tokenization tests pass

* styling

* last tests

* additional nits

* one test is left

* nits

Co-authored-by faabian <faabian@users.noreply.github.com>

* update failing test

* fixup

* remove decode infilling users should handle it on their onw after generation, padding can be a problem

* update

* make test slow and more meaningfull

* fixup

* doc update

* fixup

* Apply suggestions from code review

* add kwargs doc

* tokenizer requires `requires_backend`

* type requires_backends

* CodeLlama instead of LlamaCode

* more name cahnges

* nits

* make doctests happy

* small pipeline nits

* last nit

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update

* add codellama to toctree

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-08-25 18:57:40 +02:00
Pedro Cuenca
cb8e3ee25f
Add FlaxCLIPTextModelWithProjection (#25254)
* Add FlaxClipTextModelWithProjection

This is necessary to support the Flax port of Stable Diffusion XL: fb6d705fb5/text_encoder_2/config.json (L3)

Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>

* Use FlaxCLIPTextModelOutput

* make fix-copies again

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Use `return_dict` for consistency with other uses.

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Fix docstring example.

* Add new model to FlaxCLIPTextModelTest

* Add to IGNORE_NON_AUTO_CONFIGURED list

* Fix naming convention.

---------

Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-08-25 10:58:14 +02:00
Anthony Susevski
8968fface4
fixed typo in speech encoder decoder doc (#25745)
fixed typo in speech encoder decoder blog
2023-08-25 09:20:37 +02:00
Wonhyeong Seo
57943630e2
Add Llama2 resources (#25531)
* docs: feat: model resources for llama2

Co-authored-by: Woojun Jung <hello_984@naver.com>

* fix: add description for dpo and rearrange posts

* docs: feat: add llama2 notebook resources

* style: one liners for each resource

Co-Authored-By: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-Authored-By: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <hello_984@naver.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-08-22 17:14:54 -07:00
Blake Wyatt
6a314ea7cd
[DOCS] MusicGen Docs Update (#25510)
* docs: note token limitations for MusicGen

* docs: note token limitations for MusicGen

* docs: fix token count with token limitations for MusicGen
2023-08-22 08:22:45 +02:00
Susnato Dhar
450a181d8b
Add Pop2Piano (#21785)
* init commit

* config updated also some modeling

* Processor and Model config combined

* extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested

* model loading successful!

* feature extractor done!

* FE can now be called from HF

* postprocessing added in fe file

* same as prev commit

* Pop2PianoConfig doc done

* cfg docs slightly changed

* fe docs done

* batched

* batched working!

* temp

* v1

* checking

* trying to go with generate

* with generate and model tests passed

* before rebasing

* .

* tests done docs done remaining others & nits

* nits

* LogMelSpectogram shifted to FeatureExtractor

* is_tf rmeoved from pop2piano/init

* import solved

* tokenization tests added

* minor fixed regarding modeling_pop2piano

* tokenizer changed to only return midi_object and other changes

* Updated paper abstract(Camera-ready version) (#2)

* more comments and nits

* ruff changes

* code quality fix

* sg comments

* t5 change added and rebased

* comments except batching

* batching done

* comments

* small doc fix

* example removed from modeling

* ckpt

* forward it compatible with fe and generation done

* comments

* comments

* code-quality fix(maybe)

* ckpts changed

* doc file changed from mdx to md

* test fixes

* tokenizer test fix

* changes

* nits done main changes remaining

* code modified

* Pop2PianoProcessor added with tests

* other comments

* added Pop2PianoProcessor to dummy_objects

* added require_onnx to modeling file

* changes

* update .md file

* remove extra line in index.md

* back to the main index

* added pop2piano to index

* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too

* changes

* added return types to 2 tokenizer methods

* the PR build test might work now

* added backends

* PR build fix

* vocab added

* comments

* refactored vocab into 1 file

* added conversion script

* comments

* essentia version changed in .md

* comments

* more tokenizer tests added

* minor fix

* tests extended for outputs acc check

* small fix

---------

Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
2023-08-21 16:35:00 +01:00
Stas Bekman
6c811a322f
new model: IDEFICS via HuggingFaceM4 (#24796)
* rename

* restore

* mappings

* unedited tests+docs

* docs

* fixes

* fix auto-sync breakage

* cleanup

* wip

* wip

* add fetch_images

* remove einops dependency

* update

* fix

* fix

* fix

* fix

* fix

* re-add

* add batching

* rework

* fix

* improve

* add Leo as I am extending his work

* cleanup

* fix

* cleanup

* slow-test

* fix

* fix

* fixes

* deal with warning

* rename modified llama classes

* rework fetch_images

* alternative implementation

* cleanup

* strict version

* cleanup

* [`IDEFICS`] Fix idefics ci (#25056)

* Fix IDEFICS CI

* fix test file

* fixup

* some changes to make tests pass

* fix

* fixup

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* remove compat checks

* style

* explain that Idefics is not for training from scratch

* require pt>=2.0

* fix idefics vision config (#25092)

* fix idefics vision config

* fixup

* clean

* Update src/transformers/models/idefics/configuration_idefics.py

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* cleanup

* style

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* upcase

* sequence of images

* handle the case with no images

* Update src/transformers/image_processing_utils.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* support pure lm take 2

* support tokenizer options

* parameterize num_channels

* fix upcase

* s|IdeficsForCausalLM|IdeficsForVisionText2Text|g

* manual to one line

* addressing review

* unbreak

* remove clip dependency

* fix test

* consistency

* PIL import

* Idefics prefix

* Idefics prefix

* hack to make tests work

* style

* fix

* fix

* revert

* try/finally

* cleanup

* clean up

* move

* [`IDEFICS`] Fix idefics config refactor (#25149)

* refactor config

* nuke init weights

* more refactor

* oops

* remove visual question answering pipeline support

* Update src/transformers/models/idefics/clip.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/models/idefics/modeling_idefics.py

* cleanup

* mv clip.py vision.py

* tidyup

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

* fix

* license

* condition on pt

* fix

* style

* fix

* rm torchvision dependency, allow custom transforms

* address review

* rework device arg

* add_eos_token

* s/transforms/transform/

* fix top level imports

* fix return value

* cleanup

* cleanup

* fix

* style

* license

* license

* Update src/transformers/models/idefics/image_processing_idefics.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add a wrapper to freeze vision layears

* tidyup

* use the correct std/mean settings

* parameterize values from config

* add tests/models/idefics/test_image_processing_idefics.py

* add test_processor_idefics.py

* cleanup

* cleanups

* fix

* fix

* move to the right group

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add perceiver config

* reset

* missing arg docs

* Apply suggestions from code review

Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>

* address review comments

* inject automatic end of utterance tokens (#25218)

* inject automatic end of utterance tokens

* fix

* fix

* fix

* rework to not use the config

* not end_of_utterance_token at the end

* Update src/transformers/models/idefics/processing_idefics.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* address review

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/image_processing_utils.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* [`Idefics`] add image_embeddings option in generate-related methods (#25442)

* add image_embeddings option in generate-related methods

* style

* rename image_embeddings and allow perceiver embeddings precomputation

* compute embeddings within generate

* make is_encoder_decoder= True the default in config

* nested if else fix

* better triple check

* switch if elif order for pixel values / img embeds

* update model_kwargs perceiver only at the end

* use _prepare_model_inputs instead of encoder_decoder logic

* fix comment typo

* fix config default for is_encoder_decoder

* style

* add typehints

* precompute in forward

* doc builder

* style

* pop instead of get image hidden states

* Trigger CI

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix * + indentation + style

* simplify a bit the use_resampler logic using comments

* update diocstrings

* Trigger CI

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix rebase changes

* unbreak #25237 - to be fixed in follow up PRs

* is_composition = False

* no longer needed

---------

Co-authored-by: leot13 <leo.tronchon@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-08-18 14:12:28 -07:00
Omar Sanseviero
6f4424bb08
Make TTS automodels importable (#25595)
* Add auto model for spectrogram/waveform

* Add doc and install

* Add dummy objects

* Did I miss anything?
2023-08-18 22:01:35 +02:00
Younes Belkada
e7e9261a20
[Docs] Fix un-rendered images (#25561)
fix un-rendered images
2023-08-17 12:08:11 +02:00
Joao Gante
f456b4d10b
Generate: generation config validation fixes in docs (#25405) 2023-08-09 13:07:11 +01:00
Howard Huang
33da2db5ea
[small] llama2.md typo (#25295)
`groupe` -> `grouped`
2023-08-03 14:17:06 -07:00
Yoach Lacombe
6d3f9c1e2e
add generate method to SpeechT5ForTextToSpeech (#25233)
* add generate method to SpeechT5ForTextToSpeech

* update speecht5forTTS docstrings

* Remove defaults to None in generate docstrings

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-08-03 14:12:07 +01:00
Yoach Lacombe
8455346c5c
Update bark doc (#25234)
* add mention to optimization in Bark docs

* add offload mention in docs

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update bark docs.

* Update bark.md

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-08-03 14:08:39 +01:00
Sanchit Gandhi
e93103632b
Add bloom flax (#25094)
* First commit

* step 1 working

* add alibi

* placeholder for `scan`

* add matrix mult alibi

* beta scaling factor for bmm

* working v1 - simple forward pass

* move layer_number from attribute to arg in call

* partial functioning scan

* hacky working scan

* add more modifs

* add test

* update scan for new kwarg order

* fix position_ids problem

* fix bug in attention layer

* small fix

- do the alibi broadcasting only once

* prelim refactor

* finish refactor

* alibi shifting

* incorporate dropout_add to attention module

* make style

* make padding work again

* update

* remove bogus file

* up

* get generation to work

* clean code a bit

* added small tests

* adding albii test

* make CI tests pass:

- change init weight
- add correct tuple for output attention
- add scan test
- make CI tests work

* fix few nits

* fix nit onnx

* fix onnx nit

* add missing dtype args to nn.Modules

* remove debugging statements

* fix scan generate

* Update modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* fix small test issue + make style

* clean up

* Update tests/models/bloom/test_modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix function name

* small fix test

* forward contrib credits from PR17761

* Fix failing test

* fix small typo documentation

* fix non passing test

- remove device from build alibi

* refactor call

- refactor `FlaxBloomBlockCollection` module

* make style

* upcast to fp32

* cleaner way to upcast

* remove unused args

* remove layer number

* fix scan test

* make style

* fix i4 casting

* fix slow test

* Update src/transformers/models/bloom/modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove `layer_past`

* refactor a bit

* fix `scan` slow test

* remove useless import

* major changes

- remove unused code
- refactor a bit
- revert import `torch`

* major refactoring

- change build alibi

* remove scan

* fix tests

* make style

* clean-up alibi

* add integration tests

* up

* fix batch norm conversion

* style

* style

* update pt-fx cross tests

* update copyright

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* per-weight check

* style

* line formats

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-27 18:24:56 +01:00
Sebastian Husch Lee
8f36ab3e22
[T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)
* Initial addition of t5forsequenceclassification

* Adding imports and adding tests

* Formatting

* Running make fix-copies

* Adding mt5forseq

* Formatting

* run make fix-copies

* Adding to docs

* Add model_parallel

* Fix bug

* Fix

* Remove TODO

* Fixing tests for T5ForSequenceClassification

* Undo changes to dependency_versions_table.py

* Change classification head to work with T5Config directly

* Change seq length to let tests pass

* PR comments for formatting

* Formatting

* Initial addition of UMT5ForSequenceClassification

* Adding to inits and formatting

* run make fix-copies

* Add doc for UMT5ForSeqClass

* Update UMT5 config

* Fix docs

* Skip torch fx test for SequenceClassification

* Formatting

* Add skip to UMT5 tests as well

* Fix umt5 tests

* Running make fix-copies

* PR comments

* Fix for change to sentence_representation

* Rename seq_len to hidden_size since that's what it is

* Use base_model to follow format of the rest of the library

* Update docs

* Extract the decoder_input_ids changes and make one liner

* Make one-liner
2023-07-25 21:02:49 +02:00
Arthur
dcb183f4bd
[MPT] Add MosaicML's MPT model to transformers (#24629)
* draft add new model like

* some cleaning of the config

* nits

* add nested configs

* nits

* update

* update

* added layer norms + triton kernels

* consider only LPLayerNorm for now.

* update

* all keys match.

* Update

* fixing nits here and there

* working forward pass.

* removed einops dependency

* nits

* format

* add alibi

* byebye head mask

* refactor attention

* nits.

* format

* fix nits.

* nuke ande updates

* nuke tokenizer test

* don't reshape query with kv heads

* added a bit of documentation.

* remove unneeded things

* nuke more stuff

* nit

* logits match - same generations

* rm unneeded methods

* 1 remaining failing CI test

* nit

* fix nits

* fix docs

* fix docs

* rm tokenizer

* fixup

* fixup

* fixup and fix tests

* fixed configuration object.

* use correct activation

* few minor fixes

* clarify docs a bit

* logits match à 1e-12

* skip and unskip a test

* added some slow tests.

* fix readme

* add more details

* Update docs/source/en/model_doc/mpt.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix configuration issues

* more fixes in config

* added more models

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove unneeded position ids

* fix some  comments

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* revert suggestion

* mpt alibi + added batched generation

* Update src/transformers/models/mpt/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove init config

* Update src/transformers/models/mpt/configuration_mpt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix nit

* add another slow test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fits in one line

* some refactor because make fixup doesn't pass

* add ft notebook

* update md

* correct doc path

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-25 14:32:40 +02:00
Arthur
c53a6eae74
[RWKV] Add note in doc on RwkvStoppingCriteria (#25055)
* Add note in doc on `RwkvStoppingCriteria`

* give some breathing space to the code
2023-07-25 10:15:00 +02:00
Rinat
a03d13c83d
Pvt model (#24720)
* pull and push updates

* add docs

* fix modeling

* Add and run test

* make copies

* add task

* fix tests and fix small issues

* Checks on a Pull Request

* fix docs

* add desc pvt.md
2023-07-24 15:34:19 +01:00
Tom Aarsen
79444f370f
Deprecate unused OpenLlama architecture (#24922)
* Resolve typo in check_repo.py

* Specify encoding when opening modeling files

* Deprecate the OpenLlama architecture

* Add disclaimer pointing to Llama

I'm open to different wordings here

* Match the capitalisation of LLaMA
2023-07-20 07:03:24 -04:00
Travis Cline
3a43794dd6
Fix minor llama2.md model doc typos (#24909)
Update llama2.md

 Fix typos in the llama2 model doc
2023-07-19 08:13:14 -04:00
Arthur
07360b6c9c
[Llama2] Add support for Llama 2 (#24891)
* add llama

* add other readmes

* update padding id in readme

* add link to paper

* fix paths and tokenizer

* more nits

* styling

* fit operation in 2 lines when possible

* nits

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add form

* update reademe

* update readme, we don't have a default pad token

* update test and tokenization

* LLaMA instead of Llama

* nits

* add expected text

* add greeedy output

* styling

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* sequential device map

* skip relevant changes

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-18 15:18:31 -04:00
NielsRogge
3ec10e6c76
Add DINOv2 (#24016)
* First draft

* More improvements

* Convert patch embedding layer

* Convert all weights

* Make conversion work

* Improve conversion script

* Fix style

* Make all tests pass

* Add image processor to auto mapping

* Add swiglu ffn

* Add image processor to conversion script

* Fix conversion of giant model

* Fix documentation

* Fix style

* Fix tests

* Address comments

* Address more comments

* Remove unused arguments

* Remove more arguments

* Rename parameters

* Include mask token

* Address comments

* Add docstring

* Transfer checkpoints

* Empty commit
2023-07-18 15:34:06 +01:00
Yoach Lacombe
f42a35e611
Add bark (#24086)
* first raw version of the bark integration

* working code on small models with single run

* add converting script from suno weights 2 hf

* many changes

* correct past_kv output

* working implementation for inference

* update the converting script according to the architecture changes

* add a working end-to-end inference code

* remove some comments and make small changes

* remove unecessary comment

* add docstrings and ensure no unecessary intermediary output during audio generation

* remove done TODOs

* make style + add config docstrings

* modification for batch inference support on the whole model

* add details to .generation_audio method

* add copyright

* convert EncodecModel from original library to transformers implementation

* add two class in order to facilitate model and sub-models loading from the hub

* add support of loading the whole model

* add BarkProcessor

* correct modeling according to processor output

* Add proper __init__ and auto support

* Add up-to-date copyright/license message

* add relative import instead of absolute

* cleaner head_dim computation

* small comment removal or changes

* more verbose LayerNorm init method

* specify eps for clearer comprehension

* more verbose variable naming in the MLP module

* remove unecessary BarkBlock parameter

* clearer code in the forward pass of the BarkBlock

* remove _initialize_modules method for cleaner code

* Remove unnecessary methods from sub-models

* move code to remove unnecessary function

* rename a variable for clarity and change an assert

* move code and change variable name for clarity

* remove unnecessary asserts

* correct small bug

* correct a comment

* change variable names for clarity

* remove asserts

* change import from absolute to relative

* correct small error due to comma missing + correct import

* Add attribute Bark config

* add first version of tests

* update attention_map

* add tie_weights and resize_token_embeddings for fineModel

* correct getting attention_mask in generate_text_semantic

* remove Bark inference trick

* leave more choices in barkProcessor

* remove _no_split_modules

* fixe error in forward of block and introduce clearer notations

* correct converting script with last changes

* make style + add draft bark.mdx

* correct BarkModelTest::test_generate_text_semantic

* add Bark in main README

* add dummy_pt_objects for Bark

* add missing models in the main init

* correct test_decoder_model_past_with_large_inputs

* disable torchscript test

* change docstring of BarkProcessor

* Add test_processor_bark

* make style

* correct copyrights

* add bark.mdx + make style, quality and consistency

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Remove unnecessary test method

* simply logic of a test

* Only check first ids for slow audio generation

* split full end-to-end generation tests

* remove unneccessary comment

* change submodel names for clearer naming

* remove ModuleDict from modeling_bark

* combine two if statements

* ensure that an edge misued won't happen

* modify variable name

* move code snippet to the right place (coarse instead of semantic)

* change BarkSemanticModule -> BarkSemanticModel

* align BarkProcessor with transformers paradigm

* correct BarkProcessor tests with last commit changes

* change _validate_voice_preset to an instance method instead of a class method

* tie_weights already called with post_init

* add codec_model config to configuration

* update bark modeling tests with recent BarkProcessor changes

* remove SubModelPretrainedModel + change speakers embeddings prompt type in BarkModel

* change absolute imports to relative

* remove TODO

* change docstrings

* add examples to docs and docstrings

* make style

* uses BatchFeature in BarkProcessor insteads of dict

* continue improving docstrings and docs + make style

* correct docstrings examples

* more comprehensible speaker_embeddings load/Save

* rename speaker_embeddings_dict -> speaker_embeddings

* correct bark.mdx + add bark to documentation_tests

* correct docstrings configuration_bark

* integrate last nit suggestions

* integrate BarkGeneration configs

* make style

* remove bark tests from documentation_tests.txt because timeout - tested manually

* add proper generation config initialization

* small bark.mdx documentation changes

* rename bark.mdx -> bark.md

* add torch.no_grad behind BarkModel.generate_audio()

* replace assert by ValueError in convert_suno_to_hf.py

* integrate a series of short comments from reviewer

* move SemanticLogitsProcessors and remove .detach() from Bark docs and docstrings

* actually remove SemanticLogitsProcessor from modeling_bark.oy

* BarkProcessor returns a single output instead of tuple + correct docstrings

* make style + correct bug

* add initializer_range to BarkConfig + correct slow modeling tests

* add .clone() to history_prompt.coarse_prompt to avoid modifying input array

* Making sure no extra "`" are present

* remove extra characters in modeling_bark.py

* Correct output if history_prompt is None

* remove TODOs

* remove ravel comment

* completing generation_configuration_bark.py docstrings

* change docstrings - number of audio codebooks instead of Encodec codebooks

* change 'bias' docstrings in configuration_bark.py

* format code

* rename BarkModel.generate_audio -> BarkModel.generate_speech

* modify AutoConfig instead of EncodecConfig in BarkConfig

* correct AutoConfig wrong init

* refactor BarkModel and sub-models generate_coarse, generate_fine, generate_text_semantic

* remove SemanticLogitsProcessor and replace it with SuppressTokensLogitsProcessor

* move nb_codebook related config arguments to BarkFineConfig

* rename bark.mdx -> bark.md

* correcting BarkModelConfig from_pretrained + remove keys_to_ignore

* correct bark.md with correct hub path

* correct code bug in bark.md

* correct list tokens_to_suppress

* modify Processor to load nested speaker embeddings in a safer way

* correct batch sampling in BarkFineModel.generate_fine

* Apply suggestions from code review

Small docstrings correction and code improvements

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* give more details about num_layers in docstrings

* correct indentation mistake

* correct submodelconfig order of docstring variables

* put audio models in alphabetical order in utils/check_repo.my

* remove useless line from test_modeling_bark.py

* makes BarkCoarseModelTest inherits from (ModelTesterMixin, GenerationTesterMixin, unittest.TestCase) instead of BarkSemanticModelTest

* make a Tester class for each sub-model instead of inheriting

* add test_resize_embeddings=True for Bark sub-models

* add Copied from transformers.models.gpt_neo.modeling_gpt_neo.GPTNeoSelfAttention._split_heads

* remove 'Copied fom Bark' comment

* remove unneccessary comment

* change np.min -> min in modeling_bark.py

* refactored all custom layers to have Bark prefix

* add attention_mask as an argument of generate_text_semantic

* refactor sub-models start docstrings to have more precise config class definition

* move _tied_weights_keys overriding

* add docstrings to generate_xxx in modeling_bark.py

* add loading whole BarkModel to convert_suno_to_hf

* refactor attribute and variable names

* make style convert_suno

* update bark checkpoints

* remove never entered if statement

* move bark_modeling docstrings after BarkPretrainedModel class definition

* refactor modeling_bark.py: kv -> key_values

* small nits - code refactoring and removing unecessary lines from _init_weights

* nits - replace inplace method by variable assigning

* remove *optional* when necessary

* remove some lines in generate_speech

* add default value for optional parameter

* Refactor preprocess_histories_before_coarse -> preprocess_histories

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct usage after refactoring

* refactor Bark's generate_xxx -> generate and modify docstrings and tests accordingly

* update docstrings python in configuration_bark.py

* add bark files in utils/documentation_test.txt

* correct docstrings python snippet

* add the ability to use parameters in the form of e.g coarse_temperature

* add semantic_max_new_tokens in python snippet in docstrings for quicker generation

* Reformate sub-models kwargs in BakModel.generate

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* correct kwargs in BarkModel.generate

* correct attention_mask kwarg in BarkModel.generate

* add tests for sub-models args in BarkModel.generate and correct BarkFineModel.test_generate_fp16

* enrich BarkModel.generate docstrings with a description of how to use the kwargs

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-17 17:53:24 +01:00
Matt
c0ca73dc98
Remove Falcon docs for the release until TGI is ready (#24808)
* Remove Falcon docs for the release until TGI is ready

* Update toctree
2023-07-13 17:27:58 +01:00
Sylvain Gugger
9342c8fb82
Deprecate models (#24787)
* Deprecate some models

* Fix imports

* Fix inits too

* Remove tests

* Add deprecated banner to documentation

* Remove from init

* Fix auto classes

* Style

* Remote upgrade strategy 1

* Remove site package cache

* Revert this part

* Fix typo...

* Update utils

* Update docs/source/en/model_doc/bort.md

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* With all files saved

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-07-13 11:46:54 -04:00
Jegor Kitškerkin
8a5e8a9c2a
Add ViViT (#22518)
* Add model

* Add ability to get classification head weights

* Add docs

* Add imports to __init__.py

* Run style

* Fix imports and add mdx doc

* Run style

* Fix copyright

* Fix config docstring

* Remove imports of ViViTLayer and load_tf_weights_in_vivit

* Remove FeatureExtractor and replace with ImageProcessor everywhere

* Remove ViViTForPreTraining from vivit.mdx

* Change ViViT -> Vivit everywhere

* Add model_doc to _toctree.yml

* Replace tuples with lists in arguments of VivitConfig

* Rename patch_size to tubelet_size in TubeletEmbeddings

* Fix checkpoint names

* Add tests

* Remove unused num_frames

* Fix imports for VivitImageProcessor

* Minor fixes

* Decrease number of frames in VivitModelTester from 32 to 16

* Decrease number of frames in VivitModelTester from 16 to 8

* Add initialization for pos embeddings

* Rename Vivit -> ViViT in some places

* Fix docstring and formatting

* Rename TubeletEmbeddings -> VivitTubeletEmbeddings

* Remove load_tf_weights_in_vivit

* Change checkpoint name

* Remove Vivit _TOKENIZER_FOR_DOC

* Fix

* Fix VivitTubeletEmbeddings and pass config object as parameter

* Use image_size and num_frames instead of video_size

* Change conversion script and fix differences with the orig implementation

* Fix docstrings

* Add attention head pruning

* Run style and fixup

* Fix tests

* Add ViViT to video_classification.mdx

* Save processor in conversion script

* Fix

* Add image processor test

* Run fixup and style

* Run fix-copies

* Update tests/models/vivit/test_modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vivit/test_modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use PyAV instead of decord

* Add unittest.skip

* Run style

* Remove unneeded test

* Update docs/source/en/model_doc/vivit.mdx

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/configuration_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add model

* Add docs

* Run style

* Fix imports and add mdx doc

* Remove FeatureExtractor and replace with ImageProcessor everywhere

* Change ViViT -> Vivit everywhere

* Rename Vivit -> ViViT in some places

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Run make style

* Remove inputs save

* Fix image processor

* Fix

* Run `make style`

* Decrease parameters of VivitModelTester

* Decrease tubelet size

* Rename vivit.mdx

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix default values in image_processing_vivit.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-11 14:04:04 +01:00
Matt
b3ab3fac1d
Falcon port (#24523)
* Initial commit

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Cleanup config docstring

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Convert to relative imports

* Remove torch < 1.8 warning

* Restructure cos_sin header

* qkv -> query, key, value

* Refactor attention calculation

* Add a couple of config variables to account for the different checkpoints

* Successful merging of the code paths!

* Fix misplaced line in the non-parallel attention path

* Update config and tests

* Add a pad_token_id when testing

* Support output_attentions when alibi is None

* make fixup

* Skip KV cache shape test

* No more _keys_to_ignore_on_load_missing

* Simplify self attention a bit

* Simplify self attention a bit

* make fixup

* stash commit

* Some more attention mask updates

* Should pass all tests except assisted generation!

* Add big model generation test

* make fixup

* Add temporary workaround for test

* Test overrides for assisted generation

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Test overrides for assisted generation

* Add generation demo

* Update copyright

* Make the docstring model actually small

* Add module-level docstring

* Remove all assertions

* Add copied from bloom

* Reformat the QKV layer

* Add copied from bloom

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused line and reformat

* No single letter variables

* Cleanup return names

* Add copied from line

* Remove the deprecated arguments blocks

* Change the embeddings test to an alibi on/off test

* Remove position_ids from FalconForQA

* Remove old check for token type IDs

* Fix the alibi path when multi_query is False

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update config naming

* Fix typo for new_decoder_architecture

* Add some comments

* Fix docstring

* Fix docstring

* Create range in the right dtype from the start

* Review comment cleanup

* n_head_kv -> num_kv_heads

* self.alibi -> self.use_alibi

* self.num_kv -> self.num_kv_heads

* Reorder config args

* Made alibi arguments Optional

* Add all model docstrings

* Add extra checkpoints

* Add author info for Falcon

* Stop removing token_type_ids because our checkpoints shouldn't return it anymore

* Add one hopeful comment for the future

* Fix typo

* Update tests, fix cache issue for generation

* Use -1e9 instead of -inf to avoid float overflow

* Recompute the rotary embeddings much less often

* Re-enable disabled tests

* One final fix to attention mask calculation, and update tests

* Cleanup targeting falcon-40b equivalency

* Post-rebase docs update

* Update docstrings, especially in the config

* More descriptive variable names, and comments where we can't rename them

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-11 13:36:31 +01:00
novice
30ed3adf47
Add Multi Resolution Analysis (MRA) (New PR) (#24513)
* Add all files

* Update masked_language_modeling.md

* fix mlm models

* fix conflicts

* fix conflicts

* fix copies

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Reduce seq_len and hidden_size in ModelTester

* remove output_attentions

* fix conflicts

* remove copied from statements

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-10 10:50:43 +01:00
Arthur
fb78769b9c
[MT5] Fix CONFIG_MAPPING issue leading it to load umt5 class (#24678)
* update

* add umt5 to auto tokenizer mapping

* nits

* fixup

* fix failing torch test
2023-07-07 11:33:54 +09:00
Rafael Padilla
ea9caf7aba
Update warning messages reffering to post_process_object_detection (#24649)
* including the threshold alert in warning messages.

* Updating doc owlvit.md including post_process_object_detection function with threshold.

* fix
2023-07-04 16:47:57 -03:00
Eli Simhayev
fc7ce2ebc5
[Time-Series] Added blog-post to tips (#24482)
* [Time-Series] Added blog-post to tips

* added Resources to time series models docs

* removed "with Bert"
2023-07-03 10:07:25 +02:00
Arthur
799df10aef
[Umt5] Add google's umt5 to transformers (#24477)
* add tokenization template

* update conversion script

* update modeling code

* update

* update convert checkpoint

* update modeling

* revert changes on convert script

* new conversion script for new format

* correct position bias

* cleaning a bit

* Credit co authors

Co-authored-by: agemagician
<ahmed.elnaggar@tum.de>

Co-authored-by: stefan-it
<>

* styling

* Add docq

* fix copies

* add co author

* Other Author

* Merge branch 'main' of https://github.com/huggingface/transformers into add-umt5

* add testing

* nit

* Update docs/source/en/model_doc/umt5.mdx

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* fix t5

* actual fix?

* revert wrong changes

* remove

* update test

* more fixes

* revert some changes

* add SPIECE_UNDERLINE

* add a commone xample

* upfate

* fix copies

* revert changes on t5 conversion script

* revert bytefallback changes since there was no addition yet

* fixup

* fixup

* ingore umt5 cutom testing folder

* fix readmes

* revertT5 changes

* same outputs

* fixup

* update example

* Apply suggestions from code review

* style

* draft addition of all new files

* current update

* fix attention and stuff

* finish refactoring

* auto config

* fixup

* more nits

* add umt5 to init

* use md format

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* revert changes on mt5

* revert mt4 changes

* update test

* more fixes

* add to mapping

* fix-copies

* fix copies

* foix retain grad

* fix some tests

* nits

* done

* Update src/transformers/models/umt5/modeling_umt5.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/umt5.md

* Update src/transformers/models/umt5/__init__.py

* Update docs/source/en/model_doc/umt5.md

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* Update src/transformers/models/umt5/modeling_umt5.py

* update conversion script + use google checkpoints

* nits

* update test and modelling

* stash slow convert

* update fixupd

* don't change slow

---------

Co-authored-by: stefan-it <>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-03 07:38:21 +02:00
Yih-Dar
c817bc44e2
Check all objects are equally in the main __init__ file (#24573)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-29 17:49:59 +02:00
amyeroberts
b324557aac
Removal of deprecated vision methods and specify deprecation versions (#24570)
* Removal of deprecated methods and specify versions

* Fix tests
2023-06-29 15:09:51 +01:00
Sanchit Gandhi
1c1c90756d
Add Musicgen (#24109)
* Add Audiocraft

* add cross attention

* style

* add for lm

* convert and verify

* introduce t5

* split configs

* load t5 + lm

* clean conversion

* copy from t5

* style

* start pattern provider

* make generation work

* style

* fix pos embs

* propagate shape changes

* propagate shape changes

* style

* delay pattern: pad tokens at end

* audiocraft -> musicgen

* fix inits

* add mdx

* style

* fix pad token in processor

* override generate and add todos

* add init to test

* undo pattern delay mask after gen

* remove cfg logits processor

* remove cfg logits processor

* remove logits processor in favour of mask

* clean pos embs

* make fix copies

* update readmes

* clean pos emb

* refactor encoder/decoder

* make fix copies

* update conversion

* fix config imports

* update config docs

* make style

* send pattern mask to device

* pattern mask with delay

* recover prompted audio tokens

* fix docstrings

* laydown test file

* pattern edge case

* remove t5 ref

* add processing class

* config refactor

* better pattern comment

* check if mask is not present

* check if mask is not present

* refactor to auto class

* remove encoder configs

* fix processor

* processor import

* start updating conversion

* start updating tests

* make style

* convert t5, encodec, lm

* convert as composite

* also convert processor

* run generate

* classifier free gen

* comments and clean up

* make style

* docs for logit proc

* docstring for uncond gen

* start lm tests

* work tests

* let the lm generate

* refactor: reshape inside forward

* undo greedy loop changes

* from_enc_dec -> from_sub_model

* fix input id shapes in docstrings

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* undo generate changes

* from sub model config

* Update src/transformers/models/musicgen/modeling_musicgen.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make generate work again

* generate uncond -> get uncond inputs

* remove prefix allowed tokens fn

* better error message

* logit proc checks

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* make decoder only tests work

* composite fast tests

* make style

* uncond generation

* feat extr padding

* make audio prompt work

* fix inputs docstrings

* unconditional inputs: dict -> model output

* clean up tests

* more clean up tests

* make style

* t5 encoder -> auto text encoder

* remove comments

* deal with frames

* fix auto text

* slow tests

* nice mdx

* remove can generate

* todo - hub id

* convert m/l

* make fix copies

* only import generation with torch

* ignore decoder from tests

* don't wrap uncond inputs

* make style

* cleaner uncond inputs

* add example to musicgen forward

* fix docs

* ignore MusicGen Model/ForConditionalGeneration in auto mapping

* add doc section to toctree

* add to doc tests

* add processor tests

* fix push to hub in conversion

* tips for decoder only loading

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix conversion for s / m / l checkpoints

* import stopping criteria from module

* remove from pipeline tests

* fix uncond docstring

* decode audio method

* fix docs

* org: sanchit-gandhi -> facebook

* fix max pos embeddings

* remove auto doc (not compatible with shapes)

* bump max pos emb

* make style

* fix doc

* fix config doc

* fix config doc

* ignore musicgen config from docstring

* make style

* fix config

* fix config for doctest

* consistent from_sub_models

* don't automap decoder

* fix mdx save audio file

* fix mdx save audio file

* processor batch decode for audio

* remove keys to ignore

* update doc md

* update generation config

* allow changes for default generation config

* update tests

* make style

* fix docstring for uncond

* fix processor test

* fix processor test

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-06-29 14:48:59 +01:00
amyeroberts
ae454f41d4
Update old existing feature extractor references (#24552)
* Update old existing feature extractor references

* Typo

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Address comments from review - update 'feature extractor'
Co-authored by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2023-06-29 10:17:36 +01:00
Sebastian
06910f5a76
[T5] Add T5ForQuestionAnswering and MT5ForQuestionAnswering (#24481)
* Adding T5ForQuestionAnswering

* Changed weight initialization that results in better initial loss when fine-tuning

* Update to class variables

* Running make fixup

* Running make fix-copies

* Remove model_parallel

* Adding MT5ForQuestionAnswering

* Adding docs

* Fix wrong doc

* Update src/transformers/models/mt5/modeling_mt5.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* File formatting

* Undoing change

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-06-27 10:07:06 -04:00
NielsRogge
868363abb9
Add InstructBLIP (#23460)
* Squash 88 commits

* Use markdown

* Remove mdx files due to bad rebase

* Fix modeling files due to bad rebase

* Fix style

* Update comment

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-26 11:23:57 +02:00
Sanchit Gandhi
ea91c2adca
[AutoModel] Add AutoModelForTextEncoding (#24305)
* [AutoModel] Add AutoModelForTextEncoding

* add mt5

* add other models

* add to docs

* fix tf imports

* add tf to docs / init

* up

* fix inits

* add to dummy objects
2023-06-23 10:01:37 +01:00
Steven Liu
ad78d9597b
[docs] Fix NLLB-MoE links (#24388)
fix broken links
2023-06-20 17:34:20 -07:00
Sylvain Gugger
eb849f6604
Migrate doc files to Markdown. (#24376)
* Rename index.mdx to index.md

* With saved modifs

* Address review comment

* Treat all files

* .mdx -> .md

* Remove special char

* Update utils/tests_fetcher.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-06-20 18:07:47 -04:00
Vineel Pratap
7761b1893a
Update MMS integration docs (#24311)
* Update mms.mdx

* Update mms.mdx

* Update docs/source/en/model_doc/mms.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update mms.mdx

* Update docs/source/en/model_doc/mms.mdx

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-06-19 14:49:01 +01:00
hitchhicker
c3ca346b49
[Docs] Fix the paper URL for MMS model (#24302)
Fix the paper URL for MMS model
2023-06-15 15:45:49 +01:00
Patrick von Platen
604a21b1e6
[Docs] Improve docs for MMS loading of other languages (#24292)
* Improve docs

* Apply suggestions from code review

* upload readme

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-06-15 14:29:32 +02:00
Matthijs Hollemans
0c3fdccf2f
[WIP] add EnCodec model (#23655)
* boilerplate stuff

* messing around with the feature extractor

* fix feature extractor

* unit tests for feature extractor

* rename speech to audio

* quick-and-dirty import of Meta's code

* import weights (sort of)

* cleaning up

* more cleaning up

* move encoder/decoder args into config

* cleanup model

* rename EnCodec -> Encodec

* RVQ parameters in config

* add slow test

* add lstm init and test_init

* Add save & load

* finish EncodecModel

* remove decoder_input_values as they are ont used anywhere (not removed from doc yet)

* fix test feature extraction model name

* Add better slow test

* Fix tests

* some fixup and cleaning

* Improve further

* cleaning up quantizer

* fix up conversion script

* test don't pass, _encode_fram does not work

* update tests with output per encode and decode

* more cleanup

* rename _codebook

* remove old config cruft

* ratios & hop_length

* use ModuleList instead of Sequential

* clean up resnet block

* update types

* update tests

* fixup

* quick cleanup

* fix padding

* more styl,ing

* add patrick feedback

* fix copies

* fixup

* fix lstm

* fix shape issues

* fixup

* rename conv layers

* fixup

* fix decoding

* small conv refactoring

* remove norm_params

* simplify conv layers

* rename conv layers

* stuff

* Clean up

* Add padding logic

use padding mask

small conv refactoring

remove norm_params

simplify conv layers

rename conv layers

stuff

add batched test

update

Clean up

merge and update for padding

fix padding

fixup

* clean up more

* clean up more

* More clean ups

* cleanup convolutions

* typo

* fix typos

* fixup

* build PR doc?

* start refactoring docstring

* fix don't pad when no strid and chunk

* update docstring

* update docstring

* nits

* update going to lunch

* update config and model

* fix broken testse (becaue of the config changes)

* fix scale computation

* fixu[

* only return dict if speciefied or if config returns it

* remove todos

* update defaults in config

* update conversion script

* fix doctest

* more docstring + fixup

* nits on batched_tests

* more nits

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update basxed on review

* fix update

* updaet tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fixup

* add overlap and chunl_length_s

* cleanup feature extraction

* teste edge cases truncation and padding

* correct processor values

* update config encodec, nits

* fix tests

* fixup

* fix 24Hz test

* elle tests are green

* fix fixup

* Apply suggestions from code review

* revert readme changes

* fixup

* add example

* use facebook checkpoints

* fix typo

* no pipeline tests

* use slef.pad everywhere we can

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update based on review

* update

* update mdx

* fix bug and tests

* fixup

* fix doctest

* remove comment

* more nits

* add more coverage for `test_truncation_and_padding`

* fixup

* add last test

* fix text

* nits

* Update tests/models/encodec/test_modeling_encodec.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* take care of the last comments

* typo

* fix test

* nits

* fixup

* Update src/transformers/models/encodec/feature_extraction_encodec.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: arthur.zucker@gmail.com <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-06-14 18:57:23 +02:00
Arthur
5af3a1aa48
[lamaTokenizerFast] Update documentation (#24132)
* Update documentation

* nits
2023-06-09 16:30:20 +02:00
Elliott Wang
e2972dffdd
PLAM => PaLM (#24129) 2023-06-09 12:32:16 +01:00
Eli Simhayev
bacaab1629
Added time-series blogs to the models (#23857)
* added blogs to docs

* removed new-line
2023-06-02 12:32:34 -04:00
Patrick von Platen
dcb5e18c9e
add new mms functions to doc (#23954) 2023-06-02 11:35:52 +01:00
Shehan Munasinghe
07c54413ac
Add MobileViTv2 (#22820)
* generated code from add-new-model-like

* Add code for modeling, config, and weight conversion

* add tests for image-classification, update modeling and config

* add code, tests for semantic-segmentation

* make style, make quality, make fix-copies

* make fix-copies

* Update modeling_mobilevitv2.py

fix bugs

* Update _toctree.yml

* update modeling, config

fix bugs

* Edit docs - fix bug MobileViTv2v2 -> MobileViTv2

* Update mobilevitv2.mdx

* update docstrings

* Update configuration_mobilevitv2.py

make style

* Update convert_mlcvnets_to_pytorch.py

remove unused options

* Update convert_mlcvnets_to_pytorch.py

make style

* Add suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make style, make quality

* Add suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add suggestions from code review

Remove MobileViTv2ImageProcessor

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make style

* Add suggestions from code review

Rename MobileViTv2 -> MobileViTV2

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update modeling_mobilevitv2.py

make style

* Update serialization.mdx

* Update modeling_mobilevitv2.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-06-02 10:37:02 +01:00
Patrick von Platen
5dfd407b37
[MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 (#23813)
* add fine-tuned with adapter layer

* Add set_target_lang to tokenizer

* Implement load adapter

* add tests

* make style

* Apply suggestions from code review

* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py

* make fix-copies

* Apply suggestions from code review

* make fix-copies

* make style again

* mkae style again

* fix doc string

* Update tests/models/wav2vec2/test_tokenization_wav2vec2.py

* Apply suggestions from code review

* fix

* Correct wav2vec2 adapter

* mkae style

* Update src/transformers/models/wav2vec2/modeling_wav2vec2.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* add more nice docs

* finish

* finish

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

* all finish

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-06-02 10:30:24 +01:00
Denisa Roberts
88f50a1e89
Add TensorFlow implementation of EfficientFormer (#22620)
* Add tf code for efficientformer

* Fix return dict bug - return last hidden state after last stage

* Fix corresponding return dict bug

* Override test tol

* Change default values of training to False

* Set training to default False X3

* Rm axis from ln

* Set init in dense projection

* Rm debug stuff

* Make style; all tests pass.

* Modify year to 2023

* Fix attention biases codes

* Update the shape list logic

* Add a batch norm eps config

* Remove extract comments in test files

* Add conditional attn and hidden states return for serving output

* Change channel dim checking logic

* Add exception for withteacher model in training mode

* Revert layer count for now

* Add layer count for conditional layer naming

* Transpose for conv happens only in main layer

* Make tests smaller

* Make style

* Update doc

* Rm from_pt

* Change to actual expect image class label

* Remove stray print in tests

* Update image processor test

* Remove the old serving output logic

* Make style

* Make style

* Complete test
2023-05-31 10:43:12 +01:00
Eli Simhayev
4b6a5a7caa
[Time-Series] Autoformer model (#21891)
* ran `transformers-cli add-new-model-like`

* added `AutoformerLayernorm` and `AutoformerSeriesDecomposition`

* added `decomposition_layer` in `init` and `moving_avg` to config

* added `AutoformerAutoCorrelation` to encoder & decoder

* removed caninical self attention `AutoformerAttention`

* added arguments in config and model tester. Init works! 😁

* WIP autoformer attention with autocorrlation

* fixed `attn_weights` size

* wip time_delay_agg_training

* fixing sizes and debug time_delay_agg_training

* aggregation in training works! 😁

* `top_k_delays` -> `top_k_delays_index` and added `contiguous()`

* wip time_delay_agg_inference

* finish time_delay_agg_inference 😎

* added resize to autocorrelation

* bug fix: added the length of the output signal to `irfft`

* `attention_mask = None` in the decoder

* fixed test: changed attention expected size, `test_attention_outputs` works!

* removed unnecessary code

* apply AutoformerLayernorm in final norm in enc & dec

* added series decomposition to the encoder

* added series decomp to decoder, with inputs

* added trend todos

* added autoformer to README

* added to index

* added autoformer.mdx

* remove scaling and init attention_mask in the decoder

* make style

* fix copies

* make fix-copies

* inital fix-copies

* fix from https://github.com/huggingface/transformers/pull/22076

* make style

* fix class names

* added trend

* added d_model and projection layers

* added `trend_projection` source, and decomp layer init

* added trend & seasonal init for decoder input

* AutoformerModel cannot be copied as it has the decomp layer too

* encoder can be copied from time series transformer

* fixed generation and made distrb. out more robust

* use context window to calculate decomposition

* use the context_window for decomposition

* use output_params helper

* clean up AutoformerAttention

* subsequences_length off by 1

* make fix copies

* fix test

* added init for nn.Conv1d

* fix IGNORE_NON_TESTED

* added model_doc

* fix ruff

* ignore tests

* remove dup

* fix SPECIAL_CASES_TO_ALLOW

* do not copy due to conv1d weight init

* remove unused imports

* added short summary

* added label_length and made the model non-autoregressive

* added params docs

* better doc for `factor`

* fix tests

* renamed `moving_avg` to `moving_average`

* renamed `factor` to `autocorrelation_factor`

* make style

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix configurations

* fix integration tests

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixing `lags_sequence` doc

* Revert "fixing `lags_sequence` doc"

This reverts commit 21e34911e3.

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* model layers now take the config

* added `layer_norm_eps` to the config

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* added `config.layer_norm_eps` to AutoformerLayernorm

* added `config.layer_norm_eps` to all layernorm layers

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix variable names

* added inital pretrained model

* added use_cache docstring

* doc strings for trend and use_cache

* fix order of args

* imports on one line

* fixed get_lagged_subsequences docs

* add docstring for create_network_inputs

* get rid of layer_norm_eps config

* add back layernorm

* update fixture location

* fix signature

* use AutoformerModelOutput dataclass

* fix pretrain config

* no need as default exists

* subclass ModelOutput

* remove layer_norm_eps config

* fix test_model_outputs_equivalence test

* test hidden_states_output

* make fix-copies

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* removed unused attr

* Update tests/models/autoformer/test_modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use AutoFormerDecoderOutput

* fix formatting

* fix formatting

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-30 10:23:32 +02:00
Arthur
8d28dba35d
[OPT] Doc nit, using fast is fine (#23789)
small doc nit
2023-05-26 14:30:32 +02:00
Matt
1c460a5273
TF port of the Segment Anything Model (SAM) (#22970)
* First commit

* Add auto-translation with GPT-4

* make fixup

* Add a functional layernorm for TF

* Add all the auxiliary imports etc.

* Add the extra processor and tests

* rebase to main

* Add all the needed fixes to the GPT code

* make fixup

* Make convolutions channels-last so they run on CPU

* make fixup

* Fix final issues

* Fix other models affected by test change

* Clarify comment on the sparse_prompt_embeddings check

* Refactor functional_layernorm, use shape_list in place of .shape in some places

* Remove deprecated torch-alike code

* Update tests/models/sam/test_modeling_tf_sam.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/sam/test_modeling_tf_sam.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Refactor processor with common methods and separated private methods

* make fixup

* Quietly delete the file that didn't do anything (sorry Sylvain)

* Refactor the processor tests into one file

* make fixup

* Clean up some unnecessary indirection

* Fix TF mask postprocessing

* Add more processor equivalence tests

* Refactor generate_crop_boxes to use framework-neutral np code

* Make the serving output correctly conditional

* Fix error message line length

* Use dict keys rather than indices internally in both TF and PT SAM call/forward

* Return dicts internally in the call/forward methods

* Revert changes to common tests and just override check_pt_tf_outputs

* Revert changes to other model tests

* Clarify comments for functional layernorm

* Add missing transpose from PT code

* Removed unused copied from in PT code

* Remove overrides for tests that don't exist in TF

* Fix transpose and update tests for PT and TF to check pred_masks

* Add training flag

* Update tests to use TF checkpoints

* Update index.mdx

* Add missing cross-test decorator

* Remove optional extra asterisks

* Revert return_dict changes in PT code

* Update src/transformers/models/sam/modeling_tf_sam.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove None return annotations on init methods

* Update tests/models/sam/test_processor_sam.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix input_boxes shapes

* make fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-05-19 14:14:13 +01:00
Yih-Dar
21741e8c7e
Update test_batched_inference_image_captioning_conditioned (#23391)
* fix

* fix

* fix test + add more docs

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-05-16 14:49:24 +02:00
richardachen
65b885027a
Typo suggestion (#23360)
Update graphormer.mdx

Typo suggestion
2023-05-15 12:04:16 +01:00
Shehan Munasinghe
c045249049
Add swiftformer (#22686)
* Commit the automatically generated code

using add-new-model-like

* Update description at swiftformer.mdx file

* remove autogenerated code for MaskedImageModeling

* update weight conversion scripts

* Update modeling_swiftformer.py

* update configuration_swiftformer.py

* Update test_modeling_swiftformer.py

* update modeling code - remove einops dependency

* Update _toctree.yml

* update modeling code - remove copied from comments

* update docs

* Revert "update docs"

This reverts commit c2e05e2998.

* update docs

* remove unused reference SwiftFormerImageProcessor

* update dependency_versions_table.py

* update swiftformer.mdx

* update swiftformer.mdx

* change model output type - no attentions

* update model org name

* Fix typo

* fix copies

* Update tests/models/swiftformer/test_modeling_swiftformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/auto/feature_extraction_auto.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/swiftformer.mdx

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/swiftformer/configuration_swiftformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update modeling_swiftformer.py

fix-copies

* make style, make quality, fix-copies

* Apply suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make style

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fix-copies

* Update modeling_swiftformer.py

* Update modeling_swiftformer.py

* Add suggestions from code review

Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-12 11:52:31 +01:00
Sylvain Gugger
b4d4d6fe87
Add RWKV-4 (#22797)
* First draft of RWKV-4

* Add support for generate

* Style post-rebase

* Properly use state

* Write doc

* Fix doc

* More math

* Add model to README, dummies and clean config

* Fix init

* multiple fixes:

- fix common tests
- fix configuraion default values
- add CI test for checking state computation
- fix some CI tests

* correct tokenizer

* some tweaks

- fix config docstring
- fix failing tests

* fix CI tests

- add output_attention / output_hidden_states
- override test_initialization
- fix failing CIs

* fix conversion script

- fix sharded case
- add new arguments

* add slow tests + more fixes on conversion script

* add another test

* final fixes

* change single name variable

* add mock attention mask for pipeline to work

* correct eos token id

* fix nits

* add checkpoints

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add `tie_word_embeddings` in docstring

* change tensor name

* fix final nits

* Trigger CI

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-09 13:04:10 -04:00
NielsRogge
431b04d8c4
[SAM] Add resources (#23224)
Add resources
2023-05-09 08:58:19 -04:00
Ashwin Mathur
ef0c380c12
Update LLaMA docs with arxiv link (#23191)
* Update docs with arxiv link

* Update llama model docs
2023-05-07 18:52:44 -04:00
raghavanone
312b104ff6
Add FlaxWhisperForAudioClassification model (#23173)
* Add FlaxWhisperForAudioClassification model

* Add models to init

* Add models to init

* Fix copies

* Fix automapping

* Fix failing test
2023-05-05 13:23:46 -04:00
Perry Huang
1b9c352e55
Add TrOCR resources (#23142)
* Add TrOCR resources

* Made fixes suggested by stevhliu
2023-05-05 11:29:20 -04:00
Sylvain Gugger
01734dba84
Revert "Add FlaxWhisperForAudioClassification model" (#23154)
Revert "Add FlaxWhisperForAudioClassification model (#22883)"

This reverts commit c8f2c5c56e.
2023-05-04 13:47:07 -04:00
raghavanone
c8f2c5c56e
Add FlaxWhisperForAudioClassification model (#22883)
* Add FlaxWhisperForAudioClassification model

* Add models to init

* Add models to init

* Fix copies

* Fix automapping
2023-05-04 13:00:16 -04:00
peter-sk
83b38fbea8
GPTNeoXForQuestionAnswering (#23059)
* first draft - gives index error in question_answering.py

* maturing

* no labels

* pipeline should know about QA

* fixing checks

* formatting

* fixed docstring

* initial commit

* formatting

* adding the class to many places

* towards less unhappy checks

* nearly there

* and gpt neox for qa

* use right model

* forgot this one

* base_model_prefix is "gpt_neox" for GPTNeoX* models

* unnecessary stuff

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* format

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* removed gpt2 stuff

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-04 10:15:15 -04:00
peter-sk
78b7debf56
GPTNeoForQuestionAnswering (#23057)
* first draft - gives index error in question_answering.py

* maturing

* no labels

* pipeline should know about QA

* fixing checks

* formatting

* fixed docstring

* initial commit

* formatting

* adding the class to many places

* towards less unhappy checks

* nearly there

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* avoid error

* moving to device of star/end_logits

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-03 15:59:19 -04:00
Julien Chaumond
ca7eb27ed5
[doc] Try a few ≠ ways of linking to Papers, users, and org profiles (#22611)
* [doc] Try a few ≠ ways of linking to Papers, users, and org profiles

* Empty commit

* Empty commit now that the backend is fixed

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-05-03 18:23:09 +02:00
Samin Yasar
b53004fdce
Add resources for LayoutLmV2 and reformat documentation resources (#23115)
* add resources for layoutlmv2

* remove 🌎 from some resources
2023-05-03 09:53:00 -04:00
peter-sk
2b0c924568
GPT2ForQuestionAnswering (#23030)
* first draft - gives index error in question_answering.py

* maturing

* no labels

* pipeline should know about QA

* fixing checks

* formatting

* fixed docstring

* make sure legacy code executes

* comment

* like this

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
2023-05-02 09:25:46 -04:00
Ashwin Mathur
487f132a6f
Add BioGPTForSequenceClassification (#22253)
* added BioGptForSequenceClassification

* added source of copied code

* typo

* Format code with black

* Update comments for copied code

* Remove code copy comment

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix failing tests

* Update code copied from comments

* Fix code quality

* Update src/transformers/models/biogpt/modeling_biogpt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix lint error

* Update src/transformers/models/biogpt/modeling_biogpt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Rename model to biogpt for consistency

* Add PipelineTesterMixin to test_modeling_biogpt.py

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Resolve merge confict

---------

Co-authored-by: Guillem García Subies <37592763+GuillemGSubies@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-01 09:17:27 -04:00
s-JoL
c2c99dc7ef
add open-llama model with ckpt (#22795)
* update Open-Llama model

* update

* update format

* update doc

* update

* update stable embedding test

* update test case

* update format

* update readme

* fix typo

* update name

* remove tokenizer and update format

* remove convert_open_llama_weights_to_hf

* update warning and doc_string

---------

Co-authored-by: songliang.bayesian <songliang.bayesian@bytedance.com>
2023-04-28 11:01:32 -04:00
peter-sk
d65b14ed67
added GPTNeoForTokenClassification (#22908)
* added GPTNeoForTokenClassification

* add to top-level init

* fixup

* test

* more fixup

* add to gpt_neo.mdx

* repo consistency

* dummy copy

* fix copies

* optax >= 0.1.5 assumes jax.Array exists - which it doesn't for jax <= 0.3.6

* merge with main made this superfluous

* added classifier_dropout

* remove legacy code

* removed fmt:on/off
removed expected_outputs

* doc style fix

* classifier_dropout is always in config

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
2023-04-27 12:10:03 -04:00
peter-sk
614e191c4d
added GPTNeoXForTokenClassification (#23002)
* initial commit

* added GPTNeoXForTokenClassification

* typo

* doc
fixed extra comma that turned into a tuple

* unifying variable names
fixing forward call

* classifier_dropout is in config

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-04-27 11:08:26 -04:00
Ritik Nandwal
20ac86c6f1
Add TensorFlow Wav2Vec2 for sequence classification (#22073)
* Add initial changes for TF wav2vec2 for sequence classification

* Add suggested changes

* Add serving and serving output methods

* Add serving_output implementation and fix layer_weights

* Add fixes

* Fixed test cases

* Fixing test and adding suggested changes
2023-04-26 13:35:30 +01:00
Daniel Levenson
4e1522d65a
Fix typo in mega.mdx (#22998)
MegaConfiig -> MegaConfig
2023-04-25 17:58:45 -04:00
Arthur
df017c3ccc
[CLAP] Doc nits (#22957)
clap nits
2023-04-24 14:00:29 +02:00
NielsRogge
3d3204c025
Add FocalNet (#21532)
Adds FocalNet by Microsoft to transformers

---------

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: alaradirik <alaradirik@gmail.com>
2023-04-23 20:03:05 +03:00
Connor Henderson
b950c38565
tests: Fix flaky test for NLLB-MoE (#22880)
* add test update and docs edits

* docs edit suggestion
2023-04-21 17:09:40 +01:00
fxmarty
3d852da2db
Expose AutoModelForMaskGeneration (#22910)
* expose

* style

* add dummy object

* amazed by the quality of transformers CI
2023-04-21 10:04:45 -04:00
Arthur
f143037789
Add automatic-mask-generation pipeline for Segment Anything Model (SAM) (#22840)
* cleanup

* updates

* more refactoring

* make style

* update inits

* support other inputs in base

* update based on review

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

* Update tests/pipelines/test_pipelines_automatic_mask_generation.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* update

* fixup

* TODO x and y to refactor, _h _w refactored here

* update docstring

* more nits

* style on these

* more doc fix

* rename variables

* update

* updates

* style

* update

* fix `_mask_to_rle_pytorch`

* styling

* fix ask to rle, wrong outputs

* add device arg

* update

* more updates, fix tets

* udpate

* update docstrings

* styling

* fixup

* add notebook on the docs

* update orginal sizes

* fix docstring

* updat condition on point_per-batch

* updates tests

* fix CI  test

* extend is required, append does not work!

* fixup

* fix CI tests

* whit pixels left

* address doc comments

* fix doc

* slow pipeline tests

* update auto init

* add revision

* make fixup

* update p!ipoeline tag when calling tests

* alphabeitcal order in inits

* fix copies

* last style nits

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* reformat docstring

* more reformat

* address most of the comments

* Update src/transformers/pipelines/mask_generation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* final refactor

* Update src/transformers/models/sam/image_processing_sam.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixup and fix slow tests

* revert

---------

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-04-20 19:27:24 +02:00
fxmarty
4cfe328bae
Fix SAM example in documentation (#22887)
fix sam example
2023-04-20 12:22:42 +02:00
Younes Belkada
2da73f6302
[SAM] Correct arxiv link (#22886)
put correct link
2023-04-20 11:23:12 +02:00
Arthur
474bf508df
Add Segment Anything Model (SAM) (#22654)
* initial commit

* keys match

* update, fix conversion

* fixes, inference working

* fix

* more fixes

* more fixes

* clean up

* more clean up

* fix copies and add convext copied layer norm

* stash

* pretty big upfate

* cleaning

* more cleaning

* fixup stuffs

* fix copies

* fix iinit

* update test removing tokenizer

* nits

* add pretrained

* more nits

* remove tracking of pipeline

* few fixes

* update san and conversion script

* fix mask decoder and prompt encoder conversion

* fixes

* small update

* fix order

* fix

* fix image embeddings

* nites

* few fixes

* fix logits

* clean up

* fixes boxes inference

* v1 AMG

* clean up

* some clean up

* multi points support

* amg working

* fixup

* clean up

* readme

* update toctree

* fix type hint

* multiple fixes

* fixup

* fixes

* updates

* updates

* more tests

* few fixes

* change to `SamForMaskGeneration`

* doc

* fixup

* fix more tests

* multiple fixes

* fix CI tests

* refactor processor

* renamings

* draft the pipeline

* refactor

* fix tests

* fix test

* few cleanings

* fix test

* edit pipelien support chunking

* udate

* add slow tests

* fix nit

* fixup

* fix nit

* current chunk pipleine

* cast boxes in fp32

* nit

* current updates

* piepleine works

* fixup

* clean up config

* fix slow tests

* fix slow tests

* clean up

* update doc and pipeline

* adds more slow tests

* fix slow tests

* cleaning

* tests pass

* add docstring

* fix copies

* clean up

* support batch of images

* style

* dummy is needed, add tests

* fix slow tests

* fix CI

* update

* adds more tests

* fixes

* fixes

* fixup

* fixes

* few fixes

* filter

* few fixes

* some refactor

* touches finales

* fix

* style

* remove pipeline files

* fixes nits

* revert pipeline changes

* fix test

* fixup

* remove automodel for automatic mask generation

* fix failing torch tests

* update mdx

* revert removal of `MODEL_FOR_AUTOMATIC_MASK_GENERATION_MAPPING`

* update sam config based on review

Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* update low_resolution_masks -> pred_masks
inti ln with layer_norm_eps
add_decomposed_rel_pos doc
forward doc of SamForMaskGeneration

* update processor docstring

* remove image processor import empty

* update for testing

* output vision hidden states + clean recomm
also test all iou values

* fixup

* fixup

* remove unused

* Update src/transformers/models/sam/modeling_sam.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/sam/image_processing_sam.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* nits

* fix

* fix CI tests and slow tests

* replace with Amy's processor

* clearer docstring

* add `SamVisionNeck`

* refactor - all CI tests should pass

* fix broken import on Gcolab

* few fixes here and there

* fix another bug

* fix more bugs

* update and merge

* correct ckpt

* address comments

* add tips

* revert

* fix docstring

* replace with `SamModel`

* make fixup

* add support for bathed images and batch ed points

* make fixup this time, really

* make fixup again and again

* few fixes here and there, this should be the touche finale

* Update docs/source/en/model_doc/sam.mdx

* fixup

* correct checkpoints

* correct name

* rm unneeded file

* add notebook

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-04-19 21:01:49 +02:00
Joao Gante
9dfd6a4baa
Generate: handle text conditioning with multimodal encoder-decoder models (#22748) 2023-04-13 19:51:13 +01:00
NielsRogge
8eb38f638d
[Pix2struct] Simplify generation (#22527)
* Add model to doc tests

* Remove generate and replace by prepare_inputs_for_generation

* More fixes

* Remove print statements

* Update integration tests

* Fix generate

* Remove model from auto mapping

* Use auto processor

* Fix integration tests

* Fix test

* Add inference code snippet

* Remove is_encoder_decoder

* Update docs

* Remove notebook link
2023-04-13 09:01:14 -04:00
pioliverse
523ca4e016
add model resources for CPMAnt (new) (#20906)
* resolve conflicts

* rebase and make style

* test

* test

* test

* rebase and make style

* rebase and make style

* tests

* tests

* rewrite some functions

* rebase and make style

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* add models and tests

* solve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* save resolution

* make style

* delete redefinition code

* reformat function

* reformat

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* make style

* fix bugs and refactor

* modify docstrings and make style

* unify import format in __init__.py

* fix import-altclp bug

* fix copies to update index.md

* fix unused config parameters

* fix unused config parameters

* fix unused config parameters

* update README_ja.md

* dummy commit for unit test

* fix attention mask

* add CPMAntTokenizer&-Fast to auto-mapping

* drop redundant changes in README_ko

* fix  defaults in docstring

* fix use_cache and some docstring

* add missing args in tokenizer

* modify tester inheritance

* add is_jieba_available

* fix some bugs

* make style and fix-copies

* add doctests

* skip integration tests

* add is_jieba_available

* fix bugs in common tests

* adjust docstrings and make style

* add argument docstring

* adjust code to some specifications

* make style and fix-copies

* add fast tokenization test

* dummy commit for unit test

* dummy commit for unit test

* dummy commit for unit test

* normalize some comments and names

* Bert->CPMAnt

* camel names and drop redundant codes

* make style and fix-coies

* add CpmTokenizerFast _import_structure

* drop cpmanttokenizerfast in model_doc

* fix some problems

* fix CPMAnt tokenization for common test

* make style and fixup

* fix copies and fixup

* fix bugs in tokenization test

* dummy commit for connection failure in unittest

* fix copies

* drop trailing comma

* fix decorator in tests

* dummy commit for connection failure in unittest

---------

Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>
2023-04-12 07:33:20 -04:00
Arthur
b76e6ebd44
remove wrong doc in readme (#22723) 2023-04-12 07:11:12 -04:00
Sugawara
6daa9cb515
add GPTNeoXForSequenceClassification (#22671)
* add GPTNeoXForSequenceClassification

* move the labels to logits.device (ref: #22561)

* fix
2023-04-10 11:52:23 -04:00
Joel Lamy-Poirier
e0921c6b53
Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)
* Add model with cli tool

* Remove unwanted stuff

* Add new code

* Remove inference runner

* Style

* Fix checks

* Test updates

* make fixup

* fix docs

* fix doc

* fix test

* hopefully fix pipeline tests

* refactor

* fix CIs

* add comment

* rename to `GPTBigCodeForCausalLM`

* correct readme

* make fixup + docs

* make fixup

* fixes

* fixes

* Remove pruning

* Remove import

* Doc updates

* More pruning removal

* Combine copies

* Single MQA implementation, remove kv cache pre-allocation and padding

* Update doc

* Revert refactor to match gpt2 style

* Merge back key and value caches, fix some type hints

* Update doc

* Fix position ids pith padding (PR 21080)

* Add conversion script temporarily

* Update conversion script

* Remove checkpoint conversion

* New model

* Fix MQA test

* Fix copies

* try fix tests

* FIX TEST!!

* remove  `DoubleHeadsModel`

* add MQA tests

* add slow tests

* clean up

* add CPU checker

* final fixes

* fixes

- fix GPU issue
- fixed slow tests
- skip disk offload

* fix final issue

* Simplify and comment baddbmm fix

* Remove unnecessary code

* Transpose tweaks

* Use beta=1 on cpu, improve tests

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-04-10 10:57:21 +02:00
Nicolas Patry
1670be4bde
Adding Llama FastTokenizer support. (#22264)
* Adding Llama FastTokenizer support.

- Requires https://github.com/huggingface/tokenizers/pull/1183 version
- Only support byte_fallback for llama, raise otherwise (safety net).
- Lots of questions are special tokens

How to test:

```python

from transformers.convert_slow_tokenizer import convert_slow_tokenizer
from transformers import AutoTokenizer
from tokenizers import Tokenizer

tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")

if False:
    new_tokenizer = Tokenizer.from_file("tok.json")
else:
    new_tokenizer = convert_slow_tokenizer(tokenizer)
    new_tokenizer.save("tok.json")

strings = [
    "This is a test",
    "生活的真谛是",
    "生活的真谛是[MASK]。",
    # XXX: This one is problematic because of special tokens
    # "<s> Something something",
]

for string in strings:
    encoded = tokenizer(string)["input_ids"]
    encoded2 = new_tokenizer.encode(string).ids

    assert encoded == encoded2, f"{encoded} != {encoded2}"

    decoded = tokenizer.decode(encoded)
    decoded2 = new_tokenizer.decode(encoded2)

    assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```

The converter + some test script.

The test script.

Tmp save.

Adding Fast tokenizer + tests.

Adding the tokenization tests.

Correct combination.

Small fix.

Fixing tests.

Fixing with latest update.

Rebased.

fix copies + normalized added tokens  + copies.

Adding doc.

TMP.

Doc + split files.

Doc.

Versions + try import.

Fix Camembert + warnings -> Error.

Fix by ArthurZucker.

Not a decorator.

* Fixing comments.

* Adding more to docstring.

* Doc rewriting.
2023-04-06 09:53:03 +02:00
Younes Belkada
176ceff91f
Add DePlot + MatCha on transformers (#22528)
* add deplot + matcha on `transformers`

* more docs

* correct path

* Update docs/source/en/model_doc/deplot.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix

* use auto processor

* Update docs/source/en/model_doc/matcha.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make fixup

* Update docs/source/en/model_doc/deplot.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* add correct names

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-04-05 17:43:48 +02:00
Shubhamai
900677487d
Flax Regnet (#21867)
* initial commit

* review changes

* post model PR merge

* updating doc
2023-04-04 12:41:12 -04:00
Matt
5f3ea66bc0
Add TF port of BLIP (#22090)
* Initial commit

* more stash commit

* Yet another stash commit

* yet more stash commit

* Mostly working except for docs / repo consistency

* Stop importing model list from torch file

* Add TF BLIP models to docs

* Add auto classes

* Move get_text_features and get_image_features

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/blip/test_modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/blip/test_modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/models/blip/test_modeling_tf_blip_text.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use channels_last convolutions in TF (better performance + compatibility)

* Remove _shape function

* Move multi-line statement to one line in PT + TF

* Specify tf.keras.layers instead of importing from it

* Remove test_gradient_checkpointing and empty test_training methods

* move some multi-line statements to one line

* Update docstring for generate

* Remove pruned heads set

* Remove self.seq_len_dim

* Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states

* ensure original model follows config in more cases

* Skip the same cross-attention tests in the PT tests - didn't realize we did it twice!

* Add training args throughout the models and layers

* make fixup

* Fix docstring for inputs_embeds

* Add docstring for is_decoder

* Add docstrings to text models

* Remove redundant computation

* Add unpack_inputs / keras_serializable

* Add modeling_tf_blip to doctests

* Add config classes for keras serialization

* Changes to allow model porting with pt-to-tf

* Quick fix to decoder head and test tweaks

* Revert an issue with masking the embeddings outputs

* Allow missing keys in some equivalence tests (for unused layers)

* Add tf-pt equivalence tests back in

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make fixup

* Refactor invert_attention_mask out into tf_utils

* Re-enable cross-tests on the PT side too

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-04-04 16:05:22 +01:00
Arthur
00b5887b94
🚨🚨🚨 [NLLB Tokenizer] Fix the prefix tokens 🚨🚨🚨 (#22313)
* fix the prefix tokens

* update fast and test values

* add legacy behaviour

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* update disclaimer, linkissue PR and behaviral changes

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* styling

* make a quote

* quote this time

---------

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-04-04 14:53:06 +02:00
Kirill
a60010566a
llama docs: fix conversion script url (#22514) 2023-04-03 10:28:40 -04:00
Mohammed Jabir
7d25c9c81e
added biogpt token classifier (#22447)
* added biogpt token classifier

* fix reviews

* Updated modeling_biogpt.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-04-03 09:20:02 -04:00
Arthur
19ade2426a
[WIP]NLLB-MoE Adds the moe model (#22024)
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix impotrs

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix mor common tests

* styke

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but impelemting top2

* update

* local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplificaiton

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer)norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse are tested. Had to change route_tokens a liottle bit

* add support for unslip models when converting

* fixup

* style

* update test s

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styleing

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 19:42:00 +02:00
Shubhamai
a0cbbba31f
Resnet flax (#21472)
* [WIP] flax resnet

* added pretrained flax models, results reproducible

* Added pretrained flax models, results reproducible

* working on tests

* no real code change, just some comments

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* removing loss from modeling flax output class

* fixing classifier tests

* fixing comments, model output

* cleaning comments

* review changes

* review changes

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* renaming Flax to PyTorch

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-24 19:45:57 +00:00
Mitch Naylor
57f25f4b7f
Add Mega: Moving Average Equipped Gated Attention (#21766)
* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that througout

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-24 08:17:27 -04:00
Younes Belkada
0f68a7f408
Add Pix2Struct (#21400)
* v1 all keys match

* clean up

* forward pass ok

* add correct image transform

* generate works, logits matching

* clean up

* more refactor

* revert

* revert

* clean up

* clean ups

* clean up

* refactor

* refactor

* fix doc

* fix tokenizer test

* fix toctree

* revert toctree

* oops

* few fixes

* replace to `pixel_embeds`

* make fixup

* test processing & feat extractor

* fix some tests

* more fixes

* make fixup

* clean up

* more clean up

* add a single slow test

* fix test

* make fixup

* fix

* fix authors

* fix toctree

* update docs

* add docstring

* revert change

* Update src/transformers/models/pix2struct/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer

* fix processor test

* fix test

* make fixup

* refactor

* fix config

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* format

* fix

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make fixup

* add docstring

* fix issues

* fix

* fix

* fix

* add slow test

* fix

* fix

* fix batched issue

* fix training issues

* fix ci test

* fix slow test

* fix conversion script

* remove unneeded classes

* fix slow test

* fix require backends

* fix masked fill

* revert

* fix softmax

* add large models support

* fix conditional generation

* few fixes

* add instructions

* rm unneeded file

* Update src/transformers/models/pix2struct/convert_pix2struct_original_pytorch_to_hf.py

* fix ci test

* fix ci test really

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix nit

* fix nits

* fix image processors nits

* docstring

* clean up

* fix nit

* fix tests

* docstring nit

* fix reshape

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix nit

* fix repetition

* refactor processor

* make patch size consistent

* refactor forward

* fix docstring

* fix max_patches issue

* update docstirng

* update docstring

* fix coped from

* add skip reasons

* few fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* format

* fix doctests

* refactor and fix

* fix doc build issue

* fix processor test

* small fix conversion script

* replace correct weights

* make fixup

* fix some issues

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* revert config and fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more details

* fixes

* fix processor

* fix processor test

* fix

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* fix processor

* Update src/transformers/models/pix2struct/modeling_pix2struct.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add copied

* make fixup

* fix copies

* update docstring

* refactor

* fix docstring

* fix conversion script

* fix vqa issue

* replace to `flattened_patches`

* nit

* fix numpy issue

* fix image processors

* add batched vqa support

* fix vqa conversion

* make fixup

* fix conversion script

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add correct docstring

* update docstring

* fix module level + channel dim

* use `make_list_of_images`

* refactor

* correct docstring

* fix authors

* remove `data_format`

* add header text test

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add checkpoints

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-22 16:53:52 +01:00
Sylvain Gugger
786092a35e
Rework a bit the LLaMA conversion script (#22236)
* Update LLaMA conversion script

* Doc

* Fix the weight size for the 13B checkpoint

* Update src/transformers/models/llama/convert_llama_weights_to_hf.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-03-20 11:30:36 -04:00
Seb0
074490b2c2
fix(docs): fix task guide links in model docs (#22226)
fix(docs): task guide links in model docs
2023-03-17 14:30:17 +00:00
Maria Khalusova
314cdf7c25
Removed .mdx extension in two links (#22230)
removed .mdx extension
2023-03-17 10:27:12 -04:00
lewtun
f251441387
Add LlamaForSequenceClassification (#22209)
* Add LlamaForSequenceClassification

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add docstring

* Add test

* Add input embedding getter and setter

* Remove dead code

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-17 14:39:26 +01:00
Sylvain Gugger
00934026a4
LLaMA house-keeping (#22216)
* LLaMA house-keeping

* Doc links
2023-03-17 08:55:15 -04:00
Maria Khalusova
42f8f76402
Depth estimation task guide (#22205)
* added doc to toc, auto tip with  supported models, mention of task guide in model docs

* make style

* removed "see also"

* minor fix
2023-03-17 08:36:23 -04:00
wangpeng
af1c864cdc
fix code example in mgp-str doc (#22219)
Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-17 09:40:06 +00:00
Kevin Turner
33d033d694
fix typos in llama.mdx (#22223) 2023-03-17 08:43:18 +00:00
Jason Phang
0041be5b3d
LLaMA Implementation (#21955)
* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:00:53 -04:00
Alara Dirik
1485bd9c02
Fix typo in Align docs (#22199)
Fix align docs typo
2023-03-16 13:41:48 +03:00
Alara Dirik
cdddfbffa1
Add ConvNeXT V2 (#21679)
* Add ConvNeXt V2 to transformers
* TF model is separated from the PR to fix issues
2023-03-14 12:08:14 +03:00
wangpeng
102b5ff4a8
add new model of MGP-STR (#21418)
* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* remove representation_size from MGPSTRConfig

* reformat configuration_mgp_str.py

* format test_processor_mgp_str.py

* add test for tokenizer and complete model/processer test and model file

* rm Unnecessary tupple in modeling_mgp_str

* reduce hidden_size/layers/label_size in test_model

* add integration tests and change MGPSTR to Mgpstr

* add test for logit values

* reformat test model file

---------

Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-13 10:11:31 +00:00
Alara Dirik
32e3466d38
Add AutoModelForZeroShotImageClassification (#22087)
Adds AutoModelForZeroShotImageClassification to transformers
2023-03-13 12:46:14 +03:00
Maria Khalusova
bdec2768bd
GPT-J specific half precision on CPU note (#22086)
* re: #21989

* update re: #21989

* removed cpu option

* make style
2023-03-10 14:03:43 -05:00
Kevin Jiang
ade26bf991
Fix small typo in flan-ul2.mdx (#22068)
* Update flan-ul2.mdx

* Update flan-ul2.mdx
2023-03-10 07:44:45 -05:00
Alara Dirik
2055d737ad
Update ALIGN docs (#22025)
* Fix typos and add code examples, resources
2023-03-09 14:12:17 +03:00
Anahita Bhiwandiwalla
de81adf978
[WIP] Add BridgeTowerForContrastiveLearning (#21964)
* Add BridgeTower for ITC

* Fix review feedback

* Rename BridgeTowerForITC, cleanup

* Fix style and quality

* implement tests

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
2023-03-08 09:00:54 -05:00
Eli Simhayev
8abe4930d3
[Time-Series] informer model (#21099)
* added informer to gitignore

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding informeConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding informeConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* moved enc-dec init to InformerEncoder/Decoder init

* added 'init_std' to config, now model init works!

* WIP conversion script, and added code sources

* WIP conversion script: loading original informer pth works

* WIP conversion script: change defaults in the config

* WIP conversion script: supporting Informer input embedding

* WIP conversion script: added parameters for the informer embed

* WIP conversion script: change dim_feedforward=2048

* WIP conversion script: remove unused args for loading checkpoint

* just cleaning up

* DataEmbedding removed, after thinking with Kashif

* working on forward pass

* WIP forward pass: trying to establish working batch for forward pass

* cleaning and finalizing

* adding HF names and docs

* init after cleaning works

* WIP in tests

* added docs for the informer specific args

* fix style

* undo change

* cleaning informer, now need to work only enc-dec

* initial enc-dec classes

* added encoder and decoder

* added todo

* add todos for conv_layers

* added decoder docs from vanilla

* added encoder docs from vanilla

* remove encoder decoder from the original informer

* removed AttentionLayer from the original paper

* removed TriangularCausalMask, same as decoder_attention_mask

* initial sparse attention

* use conv_layers

* fixed test_config test

* fix parenthesis when itearting zip(layers, conv_layers)

* error found in prob attention, added sizes as comments

* fix sizes

* added proposal for q_reduce indexing, and remove unused

* WIP ProbMask, and changed factor=2 for testing

* remove unused libs for this PR for creating the env

* fix checking the attn_weights.size() after bmm

* Q_reduce: changed from torch.gather to simple slicing

* WIP calculate final attn_output

* finish adding v_aggregated, attn_output ready

* changed tgt_len to u in attention_mask, need to fix the size error

* comment attention_mask for encoder, and fix if cond for v_agg

* added ProbMask support (wip), removed old original code

* finished ProbMask 😃

* Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e.

* fixes

* make style

* fix initial tests

* fix more tests

* dry

* make style

* remove unused files

* style

* added integration tests

* fix num_static_real_features

* fix header

* remove unused function

* fix example

* fix docs

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/modeling_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fixes for reviewer

* use prediction_length from model

* fix style

* fixed informer.mdx

* added to index

* updated readme

* undo

* make fix-copies

* typo

* fix copy

* added Informer to toctree

* in order

* fixed comments

* remove unneeded new lines in docs

* make static real and cat optional

* fix use of distil conv layers

* fixed integration test

* added checkpoint for convlayer

* make fix-copies

* updated from time series model

* make fix-copies

* copy decoder

* fix unit tests

* updated scaling config

* fix integration tests

* IGNORE_NON_TESTED

* IGNORE_NON_AUTO_CONFIGURED

* IGNORE_NON_AUTO_CONFIGURED

* updated check configs

* fix formatting

* undo change from time series

* prediction_length should not be None

* aliign with the blog: prettify ProbSparse and change attention_factor  to sampling_factor

* make style

* make fix-copies

* niels CR: update contributed by

* niels CR: update configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: update kashif -> huggingface

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: `sampling_factor` only relevant when `attention_type`=prob

* make style

* fixed U_part: added multiplication by `L_Q`

* fixed bug: remove `is not None` from `if config.distil`

* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check

* fix integration tests

* updated model hub

* do not shift as in training

* undo

* fix make-copies

* make fix-copies

* added `if prediction_length is None`

* changed `ProbSparseAttention` to `InformerProbSparseAttention`

* changed `V_sum` -> `v_mean_dim_time`

* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`

* TimeSeriesTansformer->Informer in decoder's Copied from

* more descriptive in ProbSparse

* make style

* fix coped from

* Revert "added `if prediction_length is None`"

This reverts commit b4cbddfa05.

* fixed indent

* use InformerSinusoidalPositionalEmbedding

* make fix-style

* fix from #21860

* fix name

* make fix-copies

* use time series utils

* fix dec num_heads

* docstring

* added time series util doc

* _import_structure

* formatting

* changes from review

* make style

* fix docs

* fix doc

* removed NegativeLogLikelihood

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-07 21:36:38 +01:00
Sanchit Gandhi
7c39318136
[Whisper] Add model for audio classification (#21754)
* [Whisper] Add model for audio classification

* make fix-copies

* add to docs

* add docstring

* empty returns

* add code example

* switch to fleurs

* stick everything on one line
2023-03-07 16:20:21 +01:00
Arthur
82aac00e0f
[Flan-UL2] Add-flan-ul2 (#21929)
* add doc and readme

* add model docs

* update toctree and fix copies

* update

* update doc file

* fix

* add FLAN-UL2 to configuration mapping

* fixup

* Apply suggestions from code review

* more clarification

---------

Co-authored-by: younesbelakda <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-03 17:57:24 +01:00
Alara Dirik
269b054939
Add ALIGN to transformers (#21741)
Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
2023-03-01 21:23:31 +03:00