Commit Graph

39 Commits

Author SHA1 Message Date
Aymeric Roucher
9ad4c93536
Add Aria (#34157)
* Add Aria
---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-06 12:17:34 +01:00
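A minimal loading sketch for the model added above; the checkpoint id and class name are assumptions based on the upstream Rhymes AI release, not details taken from the commit:

```python
# Hedged sketch: load Aria through the classes this PR introduces.
# "rhymes-ai/Aria" is an assumed checkpoint id, not stated in the commit.
import torch
from transformers import AriaForConditionalGeneration, AutoProcessor

model_id = "rhymes-ai/Aria"
processor = AutoProcessor.from_pretrained(model_id)
model = AriaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)
```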
Arthur
c1c7e89620
Fix Gradient Accumulation issue (#34191)
* quick fix

* 3 losses

* oups

* fix

* nits

* check how it scales for special models

* propagate for conditional detr

* propagate

* propagate

* propagate

* fixes

* propagate changes

* update

* fixup

* nits

* f string

* fixes

* more fixes

* ?

* nit

* arg annoying f string

* nits

* grumble

* update

* nit

* refactor

* fix fetch tests

* nit

* nit

* Update src/transformers/loss/loss_utils.py

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* update

* nit

* fixup

* make pass

* nits

* port code to more models

* fixup

* nits

* arf

* update

* update

* nits

* update

* fix

* update

* nits

* fine

* agjkfslga.jsdlkgjklas

* nits

* fix fx?

* update

* update

* style

* fix imports

* update

* update

* fixup to fix the torch fx?

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-10-17 22:34:40 +02:00
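The fix above centralizes loss computation (see the `src/transformers/loss/loss_utils.py` commit) so that, under gradient accumulation, token losses can be normalized by the token count of the whole effective batch rather than per micro-batch. A minimal sketch of that idea in plain PyTorch; function and argument names are illustrative, not the PR's exact API:

```python
# Hedged sketch: per-micro-batch mean losses weight each micro-batch
# equally, while full-batch training weights each *token* equally. Summing
# token losses and dividing by the total token count across all
# accumulation steps restores the full-batch behaviour.
import torch.nn.functional as F

def causal_lm_loss(logits, labels, num_items_in_batch=None):
    # Shift so position t predicts token t+1; -100 marks ignored positions.
    logits = logits[:, :-1, :].contiguous()
    labels = labels[:, 1:].contiguous()
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
        reduction="sum",
    )
    if num_items_in_batch is not None:
        return loss / num_items_in_batch   # normalize over the full batch
    return loss / (labels != -100).sum()   # fallback: per-micro-batch mean
```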
Raushan Turganbay
fd70464fa7
Fix flaky tests (#34069)
* fix mllama only

* allow image token index
2024-10-11 14:41:46 +01:00
pglorio
f319ba16fa
Add Zamba (#30950)
* Update index.md

* Rebase

* Rebase

* Updates from make fixup

* Update zamba.md

* Batched inference

* Update

* Fix tests

* Fix tests

* Fix tests

* Fix tests

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update configuration_zamba.py

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update configuration_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba

* Update ZambaForCausalLM

* Update ZambaForCausalLM

* Describe diffs with original mamba layer

* Moved mamba init into `_init_weights`

* Update index.md

* Rebase

* Rebase

* Updates from make fixup

* Update zamba.md

* Batched inference

* Update

* Fix tests

* Fix tests

* Fix tests

* Fix tests

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update configuration_zamba.py

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update configuration_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba

* Update ZambaForCausalLM

* Moved mamba init into `_init_weights`

* Update ZambaForCausalLM

* Describe diffs with original mamba layer

* make fixup fixes

* quality test fixes

* Fix Zamba model path

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* Update

* circleci fixes

* fix zamba test from merge

* fix ValueError for disabling mamba kernels

* add HF copyright

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* shared_transf --> shared_transformer

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fixes

* Move attention head dim to config

* Fix circle/ci tests

* Update modeling_zamba.py

* apply GenerationMixin inheritance change from upstream

* apply import ordering

* update needed transformers version for zamba

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add contribution author

* add @slow to avoid CI

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Define attention_hidden_size

* Added doc for attention_head_size

* trigger CI

* Fix doc of attention_hidden_size

* [run-slow] zamba

* Fixed shared layer logic, swapped up<->gate in mlp

* shared_transformer -> shared_transf

* reformat HybridLayer __init__

* fix docstrings in zamba config

* added definition of _get_input_ids_and_config

* fixed formatting of _get_input_ids_and_config

---------

Co-authored-by: root <root@node-4.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: root <root@node-1.us-southcentral1-a.compute.internal>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
2024-10-04 22:28:05 +02:00
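A minimal usage sketch for the hybrid Mamba/transformer model added above; the checkpoint id and dtype are assumptions, not taken from the commit:

```python
# Hedged sketch: Zamba loads through the standard causal-LM auto classes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba-7B-v1"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("A hybrid of Mamba blocks and a shared attention block", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```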
Arthur
19d58d31f1
Add MLLama (#33703)
* current changes

* nit

* Add cross_attention_mask to processor

* multi-image fixed

* Add cross_attention_mask to processor

* cross attn works in all cases

* WIP refactoring function for image processor

* WIP refactoring image processor functions

* Refactor preprocess to use global loops instead of list nested list comps

* Docstrings

* Add channels unification

* fix dtype issues

* Update docstrings and format

* Consistent max_image_tiles

* current script

* updates

* Add convert to rgb

* Add image processor tests

* updates!

* update

* god damn it I am dumb sometimes

* Precompute aspect ratios

* now this works, full match

* fix 😉

* nits

* style

* fix model and conversion

* nit

* nit

* kinda works

* hack for sdpa non-contiguous bias

* nits here and there

* latest changes

* merge?

* run forward

* Add aspect_ratio_mask

* vision attention mask

* update script and config variable names

* nit

* nits

* be able to load

* style

* nits

* there

* nits

* make forward run

* small update

* enable generation multi-turn

* nit

* nit

* Clean up a bit for errors and typos

* A bit more constant fixes

* 90B keys and shapes match

* Fix for 11B model

* Fixup, remove debug part

* Docs

* Make max_aspect_ratio_id minimal

* Update image processing code to match new implementation

* Adjust conversion for final checkpoint state

* Change dim in repeat_interleave (according to meta code)

* tmp fix for num_tiles

* Fix for conversion (gate<->up, q/k_proj rope permute)

* nits

* codestyle

* Vision encoder fixes

* pass cross attn mask further

* Refactor aspect ratio mask

* Disable text-only generation

* Fix cross attention layers order, remove q/k norm rotation for cross attention layers

* Refactor gated position embeddings

* fix bugs but needs test with new weights

* rope scaling should be llama3

* Fix rope scaling name

* Remove debug for linear layer

* fix copies

* Make mask prepare private func

* Remove linear patch embed

* Make precomputed embeddings as nn.Embedding module

* MllamaPrecomputedAspectRatioEmbedding with config init

* Remove unused self.output_dim

* nit, intermediate layers

* Rename ln and pos_embed

* vision_chunk_size -> image_size

* return_intermediate -> intermediate_layers_indices

* vision_input_dim -> hidden_size

* Fix copied from statements

* fix most tests

* Fix more copied from

* layer_id->layer_idx

* Comment

* Fix tests for processor

* Copied from for _prepare_4d_causal_attention_mask_with_cache_position

* Style fix

* Add MllamaForCausalLM

* WIP fixing tests

* Remove duplicated layers

* Remove dummy file

* Fix style

* Fix consistency

* Fix some TODOs

* fix language_model instantiation, add docstring

* Move docstring, remove todos for precomputed embeds (we cannot init them properly)

* Add initial docstrings

* Fix

* fix some tests

* lets skip these

* nits, remove print, style

* Add one more copied from

* Improve test message

* Make validate func private

* Fix dummy objects

* Refactor `data_format` a bit + add comment

* typos/nits

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* fix dummy objects and imports

* Add chat template config json

* remove num_kv_heads from vision attention

* fix

* move some commits and add more tests

* fix test

* Remove `update_key_name` from modeling utils

* remove num-kv-heads again

* some preliminary docs

* Update chat template + tests

* nit, conversion script max_num_tiles from params

* Fix warning for text-only generation

* Update conversion script for instruct models

* Update chat template in conversion + test

* add tests for CausalLM model

* model_max_length, avoid null chat_template

* Refactor conversion script

* Fix forward

* Fix integration tests

* Refactor vision config + docs

* Fix default

* Refactor text config

* Doc fixes

* Remove unused args, fix docs example

* Squashed commit of the following:

commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date:   Wed Sep 18 13:39:15 2024 +0000

    Move model + add output hidden states and output attentions

* Fix num_channels

* Add mllama text and mllama vision models

* Fixing repo consistency

* Style fix

* Fixing repo consistency

* Fixing unused config params

* Fix failed tests after refactoring

* hidden_activation -> hidden_act  for text mlp

* Remove from_pretrained from sub-configs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Reuse lambda in conversion script

* Remove run.py

* Update docs/source/en/model_doc/mllama.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/processing_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused LlamaTokenizerFast

* Fix logging

* Refactor gating

* Remove cycle for collecting intermediate states

* Refactor text-only check, add integration test for text-only

* Revert from pretrained to configs

* Fix example

* Add auto `bos_token` adding in processor

* Fix tips

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Enable supports_gradient_checkpointing model flag

* add eager/sdpa options

* don't skip attn tests and bring back GC skips (did i really remove those?)

* Fix signature, but get error with None gradient

* Fix output attention tests

* Disable GC back

* Change no split modules

* Fix dropout

* Style

* Add Mllama to sdpa list

* Add post init for vision model

* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model

* if skipped, say it, don't pass

* Clean vision tester config

* Doc for args

* Update tests/models/mllama/test_modeling_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add cross_attention_mask to test

* typehint

* Remove todo

* Enable gradient checkpointing

* Docstring

* Style

* Fixing and skipping some tests for new cache

* Mark flaky test

* Skip `test_sdpa_can_compile_dynamic` test

* Fixing some offload tests

* Add direct GenerationMixin inheritance

* Remove unused code

* Add initializer_range to vision config

* update the test to make sure we show if split

* fix gc?

* Fix repo consistency

* Undo modeling utils debug changes

* Fix link

* mllama -> Mllama

* [mllama] -> [Mllama]

* Enable compile test for CausalLM model (text-only)

* Fix TextModel prefix

* Update doc

* Docs for forward, type hints, and vision model prefix

* make sure to reset

* fix init

* small script refactor and styling

* nit

* updates!

* some nits

* Interpolate embeddings for 560 size and update integration tests

* nit

* does not support static cache!

* update

* fix

* nit2

* this?

* Fix conversion

* Style

* 4x memory improvement with image cache AFAIK

* Token decorator for tests

* Skip failing tests

* update processor errors

* fix split issues

* style

* weird

* style

* fix failing tests

* update

* nit fixing the whisper tests

* fix path

* update

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
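A minimal image+text generation sketch for the model added above; the checkpoint id is an assumption, and the `<|image|>` placeholder follows the processor conventions described in the commit log:

```python
# Hedged sketch: Mllama multimodal generation via processor + model.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = MllamaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("photo.jpg")
inputs = processor(images=image, text="<|image|>If I had to describe this image", return_tensors="pt")
print(processor.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```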
Joao Gante
238b13478d
Gemma2: fix config initialization (cache_implementation) (#33684) 2024-09-24 18:22:00 +01:00
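A hedged sketch of what this fix guarantees: `cache_implementation` becomes a real instance attribute on `Gemma2Config`, so it survives serialization round-trips. The `"hybrid"` value shown (for Gemma 2's sliding/global attention layers) is my recollection, not something stated in the commit:

```python
from transformers import Gemma2Config

config = Gemma2Config()
print(config.cache_implementation)              # expected: "hybrid" (assumed)
assert "cache_implementation" in config.to_dict()
```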
Shijie
19e6e80e10
support qwen2-vl (#32318)
* support-qwen2-vl

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* hyphen->underscore

* make style

* add-flash2-tipd

* delete-tokenize=False

* remove-image_processor-in-init-file

* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES

* format-doct

* support-Qwen2VLVisionConfig

* remove-standardize_cache_format

* fix-letter-variables

* remove-torch-in-image-processor

* remove-useless-docstring

* fix-one-letter-variable-name

* change-block-name

* default-quick-gelu-in-vision

* remove-useless-doc

* use-preimplemented-flash-forward

* fix-doc

* fix-image-processing-doc

* fix-apply-rotary-embed

* fix-flash-attn-sliding-window

* refactor

* remove-default_template

* remove-reorder_cache

* simple-get-rope_deltas

* update-prepare_inputs_for_generation

* update-attention-mask

* update-rotary_seq_len

* remove-state

* kv_seq_length

* remove-warning

* _supports_static_cache

* remove-legacy-cache

* refactor

* fix-replace

* mrope-section-doc

* code-quality

* code-quality

* polish-doc

* fix-image-processing-test

* update readme

* Update qwen2_vl.md

* fix-test

* Update qwen2_vl.md

* nit

* processor-kwargs

* hard-code-norm_layer

* code-quality

* discard-pixel-values-in-gen

* fix-inconsistent-error-msg

* unify-image-video

* hidden_act

* add-docstring

* vision-encode-as-PreTrainedModel

* pixel-to-target-dtype

* update doc and low memory vit

* format

* format

* channel-format

* fix vit_flashatt

* format

* inherit-Qwen2VLPreTrainedModel

* simplify

* format-test

* remove-one-line-func-in-image-processing

* avoid-one-line-reshape

* simplify-rotary_seq_len

* avoid-single-letter-variable

* no-for-loop-sdpa

* avoid-single-letter-variable

* remove-one-line-reshape

* remove-one-line-reshape

* remove-no-rope-in-vit-logic

* default-mrope

* add-copied-from

* more-docs-for-mrope

* polish-doc

* comment-and-link

* polish-doc

* single-letter-variables

* simplify-image-processing

* video->images

* kv_seq_len-update

* vision-rope-on-the-fly

* vision-eager-attention

* change-processor-order

---------

Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
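A minimal single-image chat sketch for the model added above (checkpoint id assumed):

```python
# Hedged sketch: Qwen2-VL image+text inference via the chat template.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What is in this picture?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[Image.open("photo.jpg")], return_tensors="pt")
print(processor.batch_decode(model.generate(**inputs, max_new_tokens=30))[0])
```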
Younes Belkada
7c11491208
Add new model (#32615)
* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-12 08:22:47 +02:00
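The "new model" above is FalconMamba; per the commit log it also gained quantization and `torch.compile` support. A minimal loading sketch (checkpoint id assumed):

```python
# Hedged sketch: FalconMamba is a plain causal LM from the user's side.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("State-space models scale linearly with sequence length", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```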
Arthur
e314395277
Refactor flash attention implementation in transformers (#31446)
* dumb commit

* nit

* update

* something like this

* unpack in modeling utils

* safe import

* oups

* update

* nits

* diff convert gemma

* update

* start propagating

* update other modeling code as well

* update for sliding window models

* nits

* more init cleanups

* styling

* fixup

* noice

* pass fixup

* typo typing_extension -> typing_extensions

* torch.nn.functionnal -> torch.nn.functional

* add to import structure

* unpack

* simplify a bit more for this first version

* nut

* update

* update

* nit

* ease the import of `Unpack`

* remove useless `use_sliding_window`

* no qua please

* protect import?

* style

* [run-slow]

* [run slow] llama,gemma,mistral,mixtral

* remove extra kwargs

* fix llama

* address review comments

* apply diff_model_converter to modeling_gemma.py

* remove cache_position 1

* remove cache_position 2

* some cleaning

* refactor gemma2 as well

* apply review comments

* rename file to modeling_flash_attention_utils.py

* siglip refactor

* remove dead code

* is the hub down?

* still down?

* fix siglip

* fix gemma2

* fatal: Could not read from remote repository.

* fix typo in softcap implem

* flacky

* Failed: Timeout >120.0s

---------

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
2024-07-11 20:37:31 +08:00
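After this refactor, models share one flash-attention helper in `modeling_flash_attention_utils.py` (per the rename commit above), and users opt in through `attn_implementation`. A minimal sketch; the checkpoint id is an assumption, and the `flash-attn` package plus a CUDA device are required:

```python
# Hedged sketch: selecting the shared flash-attention code path.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",             # assumed checkpoint id
    attn_implementation="flash_attention_2",  # routes through the new helper
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```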
Arthur
0cf60f13ab
Add gemma 2 (#31659)
* inital commit

* Add doc

* protect?

* fixup stuffs

* update tests

* fix build documentation

* mmmmmmm config attributes

* style

* nit

* update

* nit

* Fix docs

* protect some stuff

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-06-27 17:36:19 +02:00
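A minimal generation sketch for the model added above (checkpoint id assumed):

```python
# Hedged sketch: Gemma 2 text generation through the auto classes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```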
amyeroberts
a564d10afe
Deprecate low use models (#30781)
* Deprecate models
- graphormer
- time_series_transformer
- xlm_prophetnet
- qdqbert
- nat
- ernie_m
- tvlt
- nezha
- mega
- jukebox
- vit_hybrid
- x_clip
- deta
- speech_to_text_2
- efficientformer
- realm
- gptsan_japanese

* Fix up

* Fix speech2text2 imports

* Make sure message isn't indented

* Fix docstrings

* Correctly map for deprecated models from model_type

* Uncomment out

* Add back time series transformer and x-clip

* Import fix and fix-up

* Fix up with updated ruff
2024-05-28 18:07:07 +01:00
tomeras91
3f20877da9
Add jamba (#29943)
* Add jamba arch

* apply "make fix-copies" changes

* fix link to model in JambaConfig docstring

* Add n_ctx in modeling file because repo-consistency wants that

* Add jamba to flash attention and sdpa documentation

* mamba dt_proj quant fix now works for LoRA as well

* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers

* add jamba to tokenization auto

* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)

* simple PR fixes

* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer

* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)

* Add copied comment on JambaMLP (it's the same as MixtralMLP)

* remove padding_mask warnings. It's not supported anymore

* fix docstring. Float instead of int

* A few more minor PR fixes

* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass

* Return None attention weights from mamba layers. Append to all attentions only if not None.

* remove some leftover jamba archive lists

* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel

* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers

* Add Jamba paper on READMEs

* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)

* Add copied from comment

* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms

* clearer docstring for _convert_to_standard_cache

* style fixes

* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs

* rename test so it still overrides what its meant to override

* draft

* oups

* nit

* remove more complexe logic

* fix names used in config

* fix fix fix

* style

* fix some more failing tests

* generate did not init the cache 🙃

* more small nits

* typo

* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes

* fix init of pkv with torch.tensor()

* empty tensor

* fix some init issues

* stupid changes required by generate because it does not even support its own DynamicCache class

* more fixes

* fix general assisted gen cache_position bug

* tests passing

* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py

* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache

* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore

* fix docstrings and typehints for past_key_values

* style fixes

* fix docs

* change typehint due to copy from Mixtral

* forgot import

* import order

* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)

* Add integration test with tiny tandom Jamba model on hub

* fix flash attention cache shapes

* bring back forgotten hidden states

* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model

* align integration test after modeling fixes

* bugfix - mamba can use precomputed states only if forward pass is on a single token

* bugfix - mamba can use precomputed states only if they match the batch size

* typo

* remove making _prepare_4d_causal_attention_mask a leaf function

* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-04-18 11:04:02 +02:00
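A minimal usage sketch for Jamba; the checkpoint id comes from the model-page link in the commit message above, while dtype and device placement are assumptions:

```python
# Hedged sketch: Jamba (hybrid Mamba + mixture-of-experts attention).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("In the recent Super Bowl", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```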
Arthur
0fe44059ae
Add recurrent gemma (#30143)
* Fork.

* RecurrentGemma initial commit.

* Updating __init__.py.

* Minor modification to how we initialize the cache.
Changing how the config specifies the architecture.

* Reformat code to 4 spaces.
Fixed a few typos.

* Fixed the forward pass.
Still unclear on the cache?

* Fixed the RecurrentGemmaForCausalLM

* Minor comment that we might not need attention_mask and output_attention arguments.

* Now cache should work as well.

* Adding a temporary example to check whether the model generation works.

* Adding the tests and updating imports.

* Adding the example file missing in the previous commit.

* First working example.

* Removing .gitignore and reverting parts of __init__.

* Re-add .gitignore.

* Addressing comments for configuration.

* Move mask creation to `_prepare_inputs_for_generation`.

* First try at integration tests:
1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
2. `cache_position` not passed

* Transferring between machines.

* Running normal tests.

* Minor fix.

* More fixes.

* Addressing more comments.

* Minor fixes.

* first stab at cleanup

* more refactoring

* fix copies and else

* renaming and get init to work

* fix causal mask creation

* update

* nit

* fix a hell lot of things

* updates

* update conversion script

* make all keys importable

* nits

* add auto mappings

* properly convert ffw_up and down

* add scaling

* fix generations

* for recurrent dtype

* update

* fix going beyond window

* fixup

* add missing files

* current updates to remove last einops

* finish modeling refactor

* TADA

* fix compile

* fix most failing tests??

* update tests

* refactor and update

* update

* nits, fixup and update tests

* more fixup

* nits

* fix imports

* test format

* fixups

* nits

* tuple typing

* fix code quality

* add model card

* fix doc

* skip most generation tests

* nits

* style

* doc fixes

* fix pr and check_copies?

* last nit

* oupsy

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update

* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update based on review

* doc nit

* fix quality

* quality

* fix slow test model path

* update default dtype

* ignore attributes that can be safely ignored in check config attributes

* 0lallalala come on

* save nit

* style

* remove to dict update

* make sure we can also run in float16

* style

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-10 16:59:13 +02:00
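A minimal generation sketch for the model added above (checkpoint id assumed):

```python
# Hedged sketch: RecurrentGemma (Griffin-style recurrence + local attention).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Recurrent models handle long contexts by", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```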
Arthur
fb1c62e973
[Add Mamba] Adds support for the Mamba models (#28094)
* initial-commit

* start cleaning

* small nits

* small nits

* current updates

* add kernels

* small refactoring little step

* add comments

* styling

* nit

* nits

* Style

* Small changes

* Push dummy mamba simple slow

* nit

* Use original names

* Use original names and remove norm

* Updates for inference params

* Style nd updates

* nits

* Match logits

* Add a test

* Add expected generated text

* nits doc, imports and styling

* style

* oups

* dont install kernels, invite users to install the required kernels

* let us use the original packages

* styling

* nits

* fix some copieds

* update doc

* fix-copies

* styling done

* nits

* fix import check

* run but wrong cuda ress

* mamba CUDA works :)

* fix the fast path

* config naming nits

* conversion script is not required at this stage

* finish fixing the fast path: generation make sense now!

* nit

* Let's start working on the CIs

* style

* better style

* more nits

* test nit

* quick fix for now

* nits

* nit

* nit

* nit

* nits

* update test rest

* fixup

* update test

* nit

* some fixes

* nits

* update test values

* fix styling

* nit

* support peft

* integration tests require torch

* also add slow markers

* styling

* chose forward wisely

* nits

* update tests

* fix gradient checkpointing

* fixup

* nit

* fix doc

* check copies

* fix the docstring

* fix some more tests

* style

* fix beam search

* add init scheme

* update

* nit

* fix

* fixup the doc

* fix the doc

* fixup

* tentative update but slow is no longer good

* nit

* should we always use float32?

* nits

* revert wrong changes

* res in float32

* cleanup

* skip fmt for now

* update generation values

* update test values running original model

* fixup

* update tests + rename inference_params to cache_params + make sure training does not use cache_params

* small nits

* more nits

* fix final CIs

* style

* nit doc

* I hope final doc nits

* nit

* 🫠

* final touch!

* fix torch import

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Apply suggestions from code review

* fix fix and fix

* fix base model prefix!

* nit

* Update src/transformers/models/mamba/__init__.py

* Update docs/source/en/model_doc/mamba.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* nit

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-03-05 20:01:06 +09:00
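A minimal generation sketch; as the commit log notes, the fast CUDA path needs the optional `mamba-ssm`/`causal-conv1d` kernels and otherwise falls back to a slower pure-PyTorch path. The checkpoint id is an assumption:

```python
# Hedged sketch: Mamba text generation.
from transformers import AutoTokenizer, MambaForCausalLM

model_id = "state-spaces/mamba-130m-hf"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MambaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```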
NielsRogge
836921fdeb
Add UDOP (#22940)
* First draft

* More improvements

* More improvements

* More fixes

* Fix copies

* More improvements

* More fixes

* More improvements

* Convert checkpoint

* More improvements, set up tests

* Fix more tests

* Add UdopModel

* More improvements

* Fix equivalence test

* More fixes

* Redesign model

* Extend conversion script

* Use real inputs for conversion script

* Add image processor

* Improve conversion script

* Add UdopTokenizer

* Add fast tokenizer

* Add converter

* Update README's

* Add processor

* Add fully fledged tokenizer

* Add fast tokenizer

* Use processor in conversion script

* Add tokenizer tests

* Fix one more test

* Fix more tests

* Fix tokenizer tests

* Enable fast tokenizer tests

* Fix more tests

* Fix additional_special_tokens of fast tokenizer

* Fix tokenizer tests

* Fix more tests

* Fix equivalence test

* Rename image to pixel_values

* Rename seg_data to bbox

* More renamings

* Remove vis_special_token

* More improvements

* Add docs

* Fix copied from

* Update slow tokenizer

* Update fast tokenizer design

* Make text input optional

* Add first draft of processor tests

* Fix more processor tests

* Fix decoder_start_token_id

* Fix test_initialization

* Add integration test

* More improvements

* Improve processor, add test

* Add more copied from

* Add more copied from

* Add more copied from

* Add more copied from

* Remove print statement

* Update README and auto mapping

* Delete files

* Delete another file

* Remove code

* Fix test

* Fix docs

* Remove asserts

* Add doc tests

* Include UDOP in exotic model tests

* Add expected tesseract decodings

* Add sentencepiece

* Use same design as T5

* Add UdopEncoderModel

* Add UdopEncoderModel to tests

* More fixes

* Fix fast tokenizer

* Fix one more test

* Remove parallelisable attribute

* Fix copies

* Remove legacy file

* Copy from T5Tokenizer

* Fix rebase

* More fixes, copy from T5

* More fixes

* Fix init

* Use ArthurZ/udop for tests

* Make all model tests pass

* Remove UdopForConditionalGeneration from auto mapping

* Fix more tests

* fixups

* more fixups

* fix the tokenizers

* remove un-necessary changes

* nits

* nits

* replace truncate_sequences_boxes with truncate_sequences for fix-copies

* nit current path

* add a test for input ids

* ids that we should get taken from c9f7a32f57

* nits converting

* nits

* apply ruff

* nits

* nits

* style

* fix slow order of addition

* fix udop fast range as well

* fixup

* nits

* Add docstrings

* Fix gradient checkpointing

* Update code examples

* Skip tests

* Update integration test

* Address comment

* Make fixup

* Remove extra ids from tokenizer

* Skip test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update year

* Address comment

* Address more comments

* Address comments

* Add copied from

* Update CI

* Rename script

* Update model id

* Add AddedToken, skip tests

* Update CI

* Fix doc tests

* Do not use Tesseract for the doc tests

* Remove kwargs

* Add original inputs

* Update casting

* Fix doc test

* Update question

* Update question

* Use LayoutLMv3ImageProcessor

* Update organization

* Improve docs

* Update forward signature

* Make images optional

* Remove deprecated device argument

* Add comment, add add_prefix_space

* More improvements

* Remove kwargs

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-04 18:49:02 +01:00
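A minimal document-QA sketch for the model added above; UDOP reuses `LayoutLMv3ImageProcessor` (per the commit log), whose default `apply_ocr=True` extracts words and boxes with Tesseract. The checkpoint id is an assumption:

```python
# Hedged sketch: UDOP generative document question answering.
from PIL import Image
from transformers import UdopForConditionalGeneration, UdopProcessor

model_id = "microsoft/udop-large"  # assumed checkpoint id
processor = UdopProcessor.from_pretrained(model_id)
model = UdopForConditionalGeneration.from_pretrained(model_id)

image = Image.open("document.png").convert("RGB")
inputs = processor(images=image, text="Question answering. What is the date?", return_tensors="pt")
print(processor.batch_decode(model.generate(**inputs, max_new_tokens=20))[0])
```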
amyeroberts
0199a484eb
Backbone kwargs in config (#28784)
* Enable instantiating model with pretrained backbone weights

* Clarify pretrained import

* Use load_backbone instead

* Add backbone_kwargs to config

* Pass kwargs to constructors

* Fix up

* Input verification

* Add tests

* Tidy up

* Update tests/utils/test_backbone_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-14 20:46:44 +00:00
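A hedged sketch of the `backbone_kwargs` argument this PR adds: constructor kwargs are forwarded to the backbone (here timm's `out_indices`) without hand-building a full backbone config. The model and argument values are illustrative assumptions, and `timm` must be installed:

```python
from transformers import DetrConfig, DetrForObjectDetection

config = DetrConfig(
    use_timm_backbone=True,
    backbone="resnet50",
    backbone_kwargs={"out_indices": (1, 2, 3, 4)},  # forwarded to the backbone constructor
)
model = DetrForObjectDetection(config)
```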
amyeroberts
2fa1c808ae
[Backbone] Use load_backbone instead of AutoBackbone.from_config (#28661)
* Enable instantiating model with pretrained backbone weights

* Remove doc updates until changes made in modeling code

* Use load_backbone instead

* Add use_timm_backbone to the model configs

* Add missing imports and arguments

* Update docstrings

* Make sure test is properly configured

* Include recent DPT updates
2024-01-30 16:54:09 +00:00
amyeroberts
27c79a0fb4
Enable instantiating model with pretrained backbone weights (#28214)
* Enable instantiating model with pretrained backbone weights

* Update tests so backbone checkpoint isn't passed in

* Remove doc updates until changes made in modeling code

* Clarify pretrained import

* Update configs - docs and validation check

* Update src/transformers/utils/backbone_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Clarify exception message

* Update config init in tests

* Add test for when use_timm_backbone=True

* Small test updates

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-23 11:01:50 +00:00
Connor Henderson
d83ff5eeff
Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename

* add relevant code from FastSpeech2 draft, have tests pass

* make it an actual conformer, demo ex.

* matching inference with original repo, includes debug code

* refactor nn.Sequentials, start more desc. var names

* more renaming

* more renaming

* vocoder scratchwork

* matching vocoder outputs

* hifigan vocoder conversion script

* convert model script, rename some config vars

* replace postnet with speecht5's implementation

* passing common tests, file cleanup

* expand testing, add output hidden states and attention

* tokenizer + passing tokenizer tests

* variety of updates and tests

* g2p_en pckg setup

* import structure edits

* docstrings and cleanup

* repo consistency

* deps

* small cleanup

* forward signature param order

* address comments except for masks and labels

* address comments on attention_mask and labels

* address second round of comments

* remove old unneeded line

* address comments part 1

* address comments pt 2

* rename auto mapping

* fixes for failing tests

* address comments part 3 (bart-like, train loss)

* make style

* pass config where possible

* add forward method + tests to WithHifiGan model

* make style

* address arg passing and generate_speech comments

* address Arthur comments

* address Arthur comments pt2

* lint  changes

* Sanchit comment

* add g2p-en to doctest deps

* move up self.encoder

* onnx compatible tensor method

* fix is symbolic

* fix paper url

* move models to espnet org

* make style

* make fix-copies

* update docstring

* Arthur comments

* update docstring w/ new updates

* add model architecture images

* header size

* md wording update

* make style
2024-01-03 18:01:06 +00:00
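A minimal text-to-speech sketch using the combined `WithHifiGan` model the commit log mentions; the checkpoint ids follow the "move models to espnet org" commit but are still assumptions, as is the 22.05 kHz output rate:

```python
# Hedged sketch: FastSpeech2Conformer + HiFi-GAN vocoder in one model.
import soundfile as sf
from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan

tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer")
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")

inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
output = model(input_ids=inputs["input_ids"])
sf.write("speech.wav", output.waveform.squeeze().detach().numpy(), samplerate=22050)
```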
Yoach Lacombe
29f1aee3b6
Add SeamlessM4T v2 (#27779)
* add working conversion script

* first non-working version of modeling code

* update modeling code (working)

* make style

* make fix-copies

* add config docstrings

* add config to ignore docstrings formatting due to unconventional markdown

* fix copies

* fix generation num_return_sequences

* enrich docs

* add and fix tests beside integration tests

* update integration tests

* update repo id

* add tie weights and make style

* correct naming in .md

* fix imports and so on

* correct docstrings

* fix fp16 speech forward

* fix speechencoder attention

* make style

* fix copied from

* rename SeamlessM4T-v2 to SeamlessM4Tv2

* Apply suggestions on configuration

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove useless public models

* fix private models + better naming for T2U models

* clean speech encoder relative position embeddings

* refactor chunk attention

* add docstrings to chunk attention method

* improve naming and docstrings

* rename some attention variables + add temperature sampling in T2U model

* rename DOCSTRINGS variable names

* make style + remove 2 useless config parameters

* enrich model card

* remove any attention_head reference + fix temperature in T2U

* new fmt and make style

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* rename spkr_id->speaker_id and change docstrings of get_char_input_ids

* simplify v2attention

* make style

* Update seamless_m4t_v2.md

* update code and tests with last update

* update repo ids

* fill article name, abstract and authors

* update not_doctested and slow_doc tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-30 20:24:43 +01:00
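A minimal text-to-text translation sketch for the v2 model; the checkpoint id is an assumption, and `generate_speech=False` returns translated tokens instead of a waveform:

```python
# Hedged sketch: SeamlessM4T v2 text translation (eng -> fra).
from transformers import AutoProcessor, SeamlessM4Tv2Model

model_id = "facebook/seamless-m4t-v2-large"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4Tv2Model.from_pretrained(model_id)

inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
tokens = model.generate(**inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(tokens[0].tolist()[0], skip_special_tokens=True))
```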
Yih-Dar
7293fdc5b9
Deprecate TransfoXL (#27607)
* fix

* fix

* trigger

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* tic

* revert

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-11-24 11:48:02 +01:00
Yoach Lacombe
cb45f71c4d
Add Seamless M4T model (#25693)
* first raw commit

* still POC

* tentative convert script

* almost working speech encoder conversion scripts

* intermediate code for encoder/decoders

* add modeling code

* first version of speech encoder

* make style

* add new adapter layer architecture

* add adapter block

* add first tentative config

* add working speech encoder conversion

* base model convert works now

* make style

* remove unnecessary classes

* remove unnecessary functions

* add modeling code speech encoder

* rework logics

* forward pass of sub components work

* add modeling codes

* some config modifs and modeling code modifs

* save WIP

* new edits

* same output speech encoder

* correct attention mask

* correct attention mask

* fix generation

* new generation logics

* erase comments

* make style

* fix typo

* add some descriptions

* new state

* clean imports

* add tests

* make style

* make beam search and num_return_sequences>1 work

* correct edge case issue

* correct SeamlessM4TConformerSamePadLayer copied from

* replace ACT2FN relu by nn.relu

* remove unnecessary return variable

* move back a class

* change name conformer_attention_mask ->conv_attention_mask

* better nit code

* add some Copied from statements

* small nits

* small nit in dict.get

* rename t2u model -> conditionalgeneration

* ongoing refactoring of structure

* update models architecture

* remove SeamlessM4TMultiModal classes

* add tests

* adapt tests

* some non-working code for vocoder

* add seamlessM4T vocoder

* remove buggy line

* fix some hifigan related bugs

* remove hifigan specific config

* change

* add WIP tokenization

* add seamlessM4T working tokenzier

* update tokenization

* add tentative feature extractor

* Update converting script

* update working FE

* refactor input_values -> input_features

* update FE

* changes in generation, tokenizer and modeling

* make style and add t2u_decoder_input_ids

* add intermediate outputs for ToSpeech models

* add vocoder to speech models

* update valueerror

* update FE with languages

* add vocoder convert

* update config docstrings and names

* update generation code and configuration

* remove todos and update config.pad_token_id to generation_config.pad_token_id

* move block vocoder

* remove unnecessary code and uniformize tospeech code

* add feature extractor import

* make style and fix some copies from

* correct consistency + make fix-copies

* add processor code

* remove comments

* add fast tokenizer support

* correct pad_token_id in M4TModel

* correct config

* update tests and codes  + make style

* make some suggested corrections - correct comments and change naming

* rename some attributes

* rename some attributes

* remove unnecessary sequential

* remove option to use dur predictor

* nit

* refactor hifigan

* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config

* add tests

* change tgt_lang logic

* update generation ToSpeech

* add support import SeamlessM4TProcessor

* fix generate

* make tests

* update integration tests, add option to only return text and update tokenizer fast

* fix wrong function call

* update import and convert script

* update integration tests + update repo id

* correct paths and add first test

* update how new attention masks are computed

* update tests

* take first care of batching in vocoder code

* add batching with the vocoder

* add waveform lengths to model outputs

* make style

* add generate kwargs + forward kwargs of M4TModel

* add docstrings forward methods

* reformat docstrings

* add docstrings t2u model

* add another round of modeling docstrings + reformat speaker_id -> spkr_id

* make style

* fix check_repo

* make style

* add seamlessm4t to toctree

* correct check_config_attributes

* write config docstrings + some modifs

* make style

* add docstrings tokenizer

* add docstrings to processor, fe and tokenizers

* make style

* write first version of model docs

* fix FE + correct FE test

* fix tokenizer + add correct integration tests

* fix most tokenization tests

* make style

* correct most processor test

* add generation tests and fix num_return_sequences > 1

* correct integration tests -still one left

* make style

* correct position embedding

* change numbeams to 1

* refactor some modeling code and correct one test

* make style

* correct typo

* refactor intermediate fnn

* refactor feedforward conformer

* make style

* remove comments

* make style

* fix tokenizer tests

* make style

* correct processor tests

* make style

* correct S2TT integration

* Apply suggestions from Sanchit code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct typo

* replace torch.nn->nn + make style

* change Output naming (waveforms -> waveform) and ordering

* nit renaming and formating

* remove return None when not necessary

* refactor SeamlessM4TConformerFeedForward

* nit typo

* remove almost copied from comments

* add a copied from comment and remove an unnecessary dropout

* remove inputs_embeds from speechencoder

* remove backward compatibility function

* reformate class docstrings for a few components

* remove unnecessary methods

* split over 2 lines smthg hard to read

* make style

* replace two steps offset by one step as suggested

* nice typo

* move warnings

* remove useless lines from processor

* make non-standard generation test more robust

* remove torch.inference_mode from tests

* split integration tests

* enrich md

* rename control_symbol_vocoder_offset->vocoder_offset

* clean convert file

* remove tgt_lang and src_lang from FE

* change generate docstring of ToText models

* update generate docstring of tospeech models

* unify how to deal with text_decoder_input_ids

* add default spkr_id

* unify tgt_lang for t2u_model

* simplify tgt_lang verification

* remove a todo

* change config docstring

* make style

* simplify t2u_tgt_lang_id

* make style

* enrich/correct comments

* enrich .md

* correct typo in docstrings

* add torchaudio dependency

* update tokenizer

* make style and fix copies

* modify SeamlessM4TConverter with new tokenizer behaviour

* make style

* correct small typo docs

* fix import

* update docs and add requirement to tests

* add convert_fairseq2_to_hf in utils/not_doctested.txt

* update FE

* fix imports and make style

* remove torchaudio in FE test

* add seamless_m4t.md to utils/not_doctested.txt

* nits and change the way docstring dataset is loaded

* move checkpoints from ylacombe/ to facebook/ orga

* refactor warning/error to be in the 119 line width limit

* round overly precise floats

* add stereo audio behaviour

* refactor .md and make style

* enrich docs with more precised architecture description

* readd undocumented models

* make fix-copies

* apply some suggestions

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* correct bug from previous commit

* refactor a parameter allowing to clean the code + some small nits

* clean tokenizer

* make style and fix

* make style

* clean tokenizers arguments

* add precisions for some tests

* move docs from not_tested to slow

* modify tokenizer according to last comments

* add copied from statements in tests

* correct convert script

* correct parameter docstring style

* correct tokenization

* correct multi gpus

* make style

* clean modeling code

* make style

* add copied from statements

* add copied statements

* add support with ASR pipeline

* remove file added inadvertently

* fix docstrings seamlessM4TModel

* add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown

* add seamlessm4t to assisted generation ignored models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-23 14:49:48 +02:00
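A minimal text-to-speech translation sketch for the original model; the checkpoint id is an assumption (the log only says checkpoints moved to the facebook org):

```python
# Hedged sketch: SeamlessM4T text-to-speech translation (eng -> spa).
from transformers import AutoProcessor, SeamlessM4TModel

model_id = "facebook/hf-seamless-m4t-medium"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4TModel.from_pretrained(model_id)

inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
waveform = model.generate(**inputs, tgt_lang="spa")[0].cpu().numpy().squeeze()
```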
Pablo Montalvo
caa0ff0bf1
Add fuyu model (#26911)
* initial commit

* add processor, add fuyu naming

* add draft processor

* fix processor

* remove dropout to fix loading of weights

* add image processing fixes from Pedro

* fix

* fix processor

* add basic processing fuyu test

* add documentation and TODO

* address comments, add tests, add doc

* replace assert with torch asserts

* add Mixins and fix tests

* clean imports

* add model tester, clean imports

* fix embedding test

* add updated tests from pre-release model

* Processor: return input_ids used for inference

* separate processing and model tests

* relax test tolerance for embeddings

* add test for logit comparison

* make sure fuyu image processor is imported in the init

* fix formatting

* more formatting issues

* and more

* fixups

* remove some stuff

* nits

* update init

* remove the fuyu file

* Update integration test with release model

* Update conversion script.

The projection is not used, as confirmed by the authors.

* improve generation

* Remove duplicate function

* Trickle down patches to model call

* processing fuyu updates

* remove things

* fix prepare_inputs_for_generation to fix generate()

* remove model_input

* update

* add generation tests

* nits

* draft leverage automodel and autoconfig

* nits

* fix dtype patch

* address comments, update READMEs and doc, include tests

* add working processing test, remove refs to subsequences

* add tests, remove Sequence classification

* processing

* update

* update the conversion script

* more processing cleanup

* safe import

* take out ModelTesterMixin for early release

* more cleanup

* more cleanup

* more cleanup

* and more

* register a buffer

* nits

* add postprocessing of generate output

* nits

* updates

* add one working test

* fix test

* make fixup works

* fixup

* Arthur's updates

* nits

* update

* update

* fix processor

* update tests

* pass more fixups

* fix

* nits

* don't import torch

* skip fuyu config for now

* fixup done

* fixup

* update

* oups

* nits

* Use input embeddings

* no buffer

* update

* styling processing fuyu

* fix test

* update licence

* protect torch import

* fixup and update not doctested

* kwargs should be passed

* updates

* update the imports in the test

* protect import

* protecting imports

* protect imports in type checking

* add testing decorators

* protect top level import structure

* fix typo

* fix check init

* move requires_backend to functions

* Imports

* Protect types

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-10-18 15:24:11 -07:00
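A minimal captioning sketch for the model added above; Fuyu feeds image patches directly into the decoder, so the processor returns ready-to-use inputs. The checkpoint id is an assumption:

```python
# Hedged sketch: Fuyu image captioning.
from PIL import Image
from transformers import FuyuForCausalLM, FuyuProcessor

model_id = "adept/fuyu-8b"  # assumed checkpoint id
processor = FuyuProcessor.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(model_id)

inputs = processor(text="Generate a coco-style caption.\n", images=Image.open("bus.png"), return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(out[:, -20:], skip_special_tokens=True)[0])
```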
Matthijs Hollemans
4ece3b9433
add VITS model (#24085)
* add VITS model

* let's vits

* finish TextEncoder (mostly)

* rename VITS to Vits

* add StochasticDurationPredictor

* ads flow model

* add generator

* correctly set vocab size

* add tokenizer

* remove processor & feature extractor

* add PosteriorEncoder

* add missing weights to SDP

* also convert LJSpeech and VCTK checkpoints

* add training stuff in forward

* add placeholder tests for tokenizer

* add placeholder tests for model

* starting cleanup

* let the great renaming begin!

* use config

* global_conditioning

* more cleaning

* renaming variables

* more renaming

* more renaming

* it never ends

* reticulating the splines

* more renaming

* HiFi-GAN

* doc strings for main model

* fixup

* fix-copies

* don't make it a PreTrainedModel

* fixup

* rename config options

* remove training logic from forward pass

* simplify relative position

* use actual checkpoint

* style

* PR review fixes

* more review changes

* fixup

* more unit tests

* fixup

* fix doc test

* add integration test

* improve tokenizer tests

* add tokenizer integration test

* fix tests on GPU (gave OOM)

* conversion script can handle repos from hub

* add conversion script for all MMS-TTS checkpoints

* automatically create a README for the converted checkpoint

* small changes to config

* push README to hub

* only show uroman note for checkpoints that need it

* remove conversion script because code formatting breaks the readme

* make WaveNet layers configurable

* rename variables

* simplifying the math

* output attentions and hidden states

* remove VitsFlip in flow model

* also got rid of the other flip

* fix tests

* rename more variables

* rename tokenizer, add phonemization

* raise error when phonemizer missing

* re-order config docstrings to match method

* change config naming

* remove redundant str -> list

* fix copyright: vits authors -> kakao enterprise

* (mean, log_variances) -> (prior_mean, prior_log_variances)

* if return dict -> if not return dict

* speed -> speaking rate

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update fused tanh sigmoid

* reduce dims in tester

* audio -> output_values

* audio -> output_values in tuple out

* fix return type

* fix return type

* make _unconstrained_rational_quadratic_spline a function

* all nn's to accept a config

* add spectro to output

* move {speaking rate, noise scale, noise scale duration} to config

* path -> attn_path

* idxs -> valid idxs -> padded idxs

* output values -> waveform

* use config for attention

* make generation work

* harden integration test

* add spectrogram to dict output

* tokenizer refactor

* make style

* remove 'fake' padding token

* harden tokenizer tests

* ron norm test

* fprop / save tests deterministic

* move uroman to tokenizer as much as possible

* better logger message

* fix vivit imports

* add uroman integration test

* make style

* up

* matthijs -> sanchit-gandhi

* fix tokenizer test

* make fix-copies

* fix dict comprehension

* fix config tests

* fix model tests

* make outputs consistent with reverse/not reverse

* fix key concat

* more model details

* add author

* return dict

* speaker error

* labels error

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vits/convert_original_checkpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove uromanize

* add docstrings

* add docstrings for tokenizer

* upper-case skip messages

* fix return dict

* style

* finish tests

* update checkpoints

* make style

* remove doctest file

* revert

* fix docstring

* fix tokenizer

* remove uroman integration test

* add sampling rate

* fix docs / docstrings

* style

* add sr to model output

* fix outputs

* style / copies

* fix docstring

* fix copies

* remove sr from model outputs

* Update utils/documentation_tests.txt

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add sr as allowed attr

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-01 10:50:06 +01:00
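A minimal speech-synthesis sketch; the MMS-TTS checkpoint id is an assumption, and as the commit log notes, some MMS languages additionally require the uroman romanizer:

```python
# Hedged sketch: VITS / MMS-TTS waveform generation.
import torch
from transformers import VitsModel, VitsTokenizer

model_id = "facebook/mms-tts-eng"  # assumed checkpoint id
tokenizer = VitsTokenizer.from_pretrained(model_id)
model = VitsModel.from_pretrained(model_id)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    waveform = model(**inputs).waveform  # shape (batch, samples)
```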
Susnato Dhar
450a181d8b
Add Pop2Piano (#21785)
* init commit

* config updated also some modeling

* Processor and Model config combined

* extraction pipeline (up to before spectrogram & mel_conditioner) added but not properly tested

* model loading successful!

* feature extractor done!

* FE can now be called from HF

* postprocessing added in fe file

* same as prev commit

* Pop2PianoConfig doc done

* cfg docs slightly changed

* fe docs done

* batched

* batched working!

* temp

* v1

* checking

* trying to go with generate

* with generate and model tests passed

* before rebasing

* .

* tests done docs done remaining others & nits

* nits

* LogMelSpectrogram shifted to FeatureExtractor

* is_tf removed from pop2piano/init

* import solved

* tokenization tests added

* minor fixes regarding modeling_pop2piano

* tokenizer changed to only return midi_object and other changes

* Updated paper abstract(Camera-ready version) (#2)

* more comments and nits

* ruff changes

* code quality fix

* sg comments

* t5 change added and rebased

* comments except batching

* batching done

* comments

* small doc fix

* example removed from modeling

* ckpt

* forward is compatible with fe and generation done

* comments

* comments

* code-quality fix(maybe)

* ckpts changed

* doc file changed from mdx to md

* test fixes

* tokenizer test fix

* changes

* nits done main changes remaining

* code modified

* Pop2PianoProcessor added with tests

* other comments

* added Pop2PianoProcessor to dummy_objects

* added require_onnx to modeling file

* changes

* update .md file

* remove extra line in index.md

* back to the main index

* added pop2piano to index

* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too

* changes

* added return types to 2 tokenizer methods

* the PR build test might work now

* added backends

* PR build fix

* vocab added

* comments

* refactored vocab into 1 file

* added conversion script

* comments

* essentia version changed in .md

* comments

* more tokenizer tests added

* minor fix

* tests extended for outputs acc check

* small fix

---------

Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
2023-08-21 16:35:00 +01:00
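A minimal audio-to-MIDI sketch; the checkpoint id is an assumption, and the feature extractor carries extra audio dependencies (essentia, librosa, pretty_midi per the commit log):

```python
# Hedged sketch: Pop2Piano pop-audio -> piano MIDI.
import librosa
from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor

audio, sr = librosa.load("song.wav", sr=44100)
processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")  # assumed id
model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")

inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")
tokens = model.generate(input_features=inputs["input_features"], composer="composer1")
midi = processor.batch_decode(token_ids=tokens, feature_extractor_output=inputs)["pretty_midi_objects"][0]
midi.write("output.mid")
```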
Stas Bekman
6c811a322f
new model: IDEFICS via HuggingFaceM4 (#24796)
* rename

* restore

* mappings

* unedited tests+docs

* docs

* fixes

* fix auto-sync breakage

* cleanup

* wip

* wip

* add fetch_images

* remove einops dependency

* update

* fix

* fix

* fix

* fix

* fix

* re-add

* add batching

* rework

* fix

* improve

* add Leo as I am extending his work

* cleanup

* fix

* cleanup

* slow-test

* fix

* fix

* fixes

* deal with warning

* rename modified llama classes

* rework fetch_images

* alternative implementation

* cleanup

* strict version

* cleanup

* [`IDEFICS`] Fix idefics ci (#25056)

* Fix IDEFICS CI

* fix test file

* fixup

* some changes to make tests pass

* fix

* fixup

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* remove compat checks

* style

* explain that Idefics is not for training from scratch

* require pt>=2.0

* fix idefics vision config (#25092)

* fix idefics vision config

* fixup

* clean

* Update src/transformers/models/idefics/configuration_idefics.py

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* cleanup

* style

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* upcase

* sequence of images

* handle the case with no images

* Update src/transformers/image_processing_utils.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>

* support pure lm take 2

* support tokenizer options

* parameterize num_channels

* fix upcase

* s|IdeficsForCausalLM|IdeficsForVisionText2Text|g

* manual to one line

* addressing review

* unbreak

* remove clip dependency

* fix test

* consistency

* PIL import

* Idefics prefix

* Idefics prefix

* hack to make tests work

* style

* fix

* fix

* revert

* try/finally

* cleanup

* clean up

* move

* [`IDEFICS`] Fix idefics config refactor (#25149)

* refactor config

* nuke init weights

* more refactor

* oops

* remove visual question answering pipeline support

* Update src/transformers/models/idefics/clip.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/models/idefics/modeling_idefics.py

* cleanup

* mv clip.py vision.py

* tidyup

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

* fix

* license

* condition on pt

* fix

* style

* fix

* rm torchvision dependency, allow custom transforms

* address review

* rework device arg

* add_eos_token

* s/transforms/transform/

* fix top level imports

* fix return value

* cleanup

* cleanup

* fix

* style

* license

* license

* Update src/transformers/models/idefics/image_processing_idefics.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add a wrapper to freeze vision layers

* tidyup

* use the correct std/mean settings

* parameterize values from config

* add tests/models/idefics/test_image_processing_idefics.py

* add test_processor_idefics.py

* cleanup

* cleanups

* fix

* fix

* move to the right group

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add perceiver config

* reset

* missing arg docs

* Apply suggestions from code review

Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>

* address review comments

* inject automatic end of utterance tokens (#25218)

* inject automatic end of utterance tokens

* fix

* fix

* fix

* rework to not use the config

* no end_of_utterance_token at the end

* Update src/transformers/models/idefics/processing_idefics.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* address review

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/image_processing_utils.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* [`Idefics`] add image_embeddings option in generate-related methods (#25442)

* add image_embeddings option in generate-related methods

* style

* rename image_embeddings and allow perceiver embeddings precomputation

* compute embeddings within generate

* make is_encoder_decoder=True the default in config

* nested if else fix

* better triple check

* switch if elif order for pixel values / img embeds

* update model_kwargs perceiver only at the end

* use _prepare_model_inputs instead of encoder_decoder logic

* fix comment typo

* fix config default for is_encoder_decoder

* style

* add typehints

* precompute in forward

* doc builder

* style

* pop instead of get image hidden states

* Trigger CI

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix * + indentation + style

* simplify a bit the use_resampler logic using comments

* update docstrings

* Trigger CI

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix rebase changes

* unbreak #25237 - to be fixed in follow up PRs

* is_composition = False

* no longer needed

---------

Co-authored-by: leot13 <leo.tronchon@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-08-18 14:12:28 -07:00
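Putting the entry above together — interleaved image/text prompts through the processor, then generate — a minimal usage sketch; the checkpoint id, prompt format, and dtype/device settings follow the release-era IDEFICS docs and should be read as assumptions:

```python
import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

# assumed checkpoint (from the IDEFICS docs, not this commit)
checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)
# device_map="auto" requires accelerate; drop it to load on a single device
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# a prompt is a list interleaving text and images (URLs or PIL images),
# per the "sequence of images" work above
prompts = [
    [
        "User:",
        "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG",
        "Describe this image.\nAssistant:",
    ]
]
inputs = processor(prompts, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=40)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```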
Arthur
dcb183f4bd
[MPT] Add MosaicML's MPT model to transformers (#24629)
* draft add new model like

* some cleaning of the config

* nits

* add nested configs

* nits

* update

* update

* added layer norms + triton kernels

* consider only LPLayerNorm for now.

* update

* all keys match.

* Update

* fixing nits here and there

* working forward pass.

* removed einops dependency

* nits

* format

* add alibi

* byebye head mask

* refactor attention

* nits.

* format

* fix nits.

* nuke and updates

* nuke tokenizer test

* don't reshape query with kv heads

* added a bit of documentation.

* remove unneeded things

* nuke more stuff

* nit

* logits match - same generations

* rm unneeded methods

* 1 remaining failing CI test

* nit

* fix nits

* fix docs

* fix docs

* rm tokenizer

* fixup

* fixup

* fixup and fix tests

* fixed configuration object.

* use correct activation

* few minor fixes

* clarify docs a bit

* logits match at 1e-12

* skip and unskip a test

* added some slow tests.

* fix readme

* add more details

* Update docs/source/en/model_doc/mpt.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix configuration issues

* more fixes in config

* added more models

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove unneeded position ids

* fix some comments

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* revert suggestion

* mpt alibi + added batched generation

* Update src/transformers/models/mpt/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove init config

* Update src/transformers/models/mpt/configuration_mpt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix nit

* add another slow test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fits in one line

* some refactor because make fixup doesn't pass

* add ft notebook

* update md

* correct doc path

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-25 14:32:40 +02:00
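Since the port above wires MPT into the standard auto classes, usage reduces to the ordinary causal-LM flow; a minimal sketch (the mosaicml/mpt-7b checkpoint id comes from the MPT docs, not this commit):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# assumed checkpoint (from the MPT docs); needs a transformers version with native MPT support
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b")

# ALiBi (added above) means no learned position embeddings; generation is otherwise standard
inputs = tokenizer("MosaicML's MPT is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```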
Tom Aarsen
79444f370f
Deprecate unused OpenLlama architecture (#24922)
* Resolve typo in check_repo.py

* Specify encoding when opening modeling files

* Deprecate the OpenLlama architecture

* Add disclaimer pointing to Llama

I'm open to different wordings here

* Match the capitalisation of LLaMA
2023-07-20 07:03:24 -04:00
Sylvain Gugger
9342c8fb82
Deprecate models (#24787)
* Deprecate some models

* Fix imports

* Fix inits too

* Remove tests

* Add deprecated banner to documentation

* Remove from init

* Fix auto classes

* Style

* Remove upgrade strategy 1

* Remove site package cache

* Revert this part

* Fix typo...

* Update utils

* Update docs/source/en/model_doc/bort.md

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* With all files saved

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-07-13 11:46:54 -04:00
Arthur
fb78769b9c
[MT5] Fix CONFIG_MAPPING issue leading it to load umt5 class (#24678)
* update

* add umt5 to auto tokenizer mapping

* nits

* fixup

* fix failing torch test
2023-07-07 11:33:54 +09:00
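The bug above had umt5 checkpoints resolving to the wrong class through CONFIG_MAPPING; a quick sanity check of the routing (checkpoint ids assumed from the Google releases, not this commit):

```python
from transformers import AutoConfig

# after the fix, the two model types resolve to distinct config classes
print(type(AutoConfig.from_pretrained("google/mt5-small")).__name__)   # MT5Config
print(type(AutoConfig.from_pretrained("google/umt5-small")).__name__)  # UMT5Config
```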
Yih-Dar
7504be35ab
Fix check_config_attributes: check all configuration classes (#24231)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-15 11:39:20 +02:00
Matthijs Hollemans
0c3fdccf2f
[WIP] add EnCodec model (#23655)
* boilerplate stuff

* messing around with the feature extractor

* fix feature extractor

* unit tests for feature extractor

* rename speech to audio

* quick-and-dirty import of Meta's code

* import weights (sort of)

* cleaning up

* more cleaning up

* move encoder/decoder args into config

* cleanup model

* rename EnCodec -> Encodec

* RVQ parameters in config

* add slow test

* add lstm init and test_init

* Add save & load

* finish EncodecModel

* remove decoder_input_values as they are not used anywhere (not removed from doc yet)

* fix test feature extraction model name

* Add better slow test

* Fix tests

* some fixup and cleaning

* Improve further

* cleaning up quantizer

* fix up conversion script

* tests don't pass, _encode_frame does not work

* update tests with output per encode and decode

* more cleanup

* rename _codebook

* remove old config cruft

* ratios & hop_length

* use ModuleList instead of Sequential

* clean up resnet block

* update types

* update tests

* fixup

* quick cleanup

* fix padding

* more styling

* add patrick feedback

* fix copies

* fixup

* fix lstm

* fix shape issues

* fixup

* rename conv layers

* fixup

* fix decoding

* small conv refactoring

* remove norm_params

* simplify conv layers

* rename conv layers

* stuff

* Clean up

* Add padding logic

use padding mask

small conv refactoring

remove norm_params

simplify conv layers

rename conv layers

stuff

add batched test

update

Clean up

merge and update for padding

fix padding

fixup

* clean up more

* clean up more

* More clean ups

* cleanup convolutions

* typo

* fix typos

* fixup

* build PR doc?

* start refactoring docstring

* fix: don't pad when no stride and chunk

* update docstring

* update docstring

* nits

* update going to lunch

* update config and model

* fix broken tests (because of the config changes)

* fix scale computation

* fixup

* only return dict if specified or if config returns it

* remove todos

* update defaults in config

* update conversion script

* fix doctest

* more docstring + fixup

* nits on batched_tests

* more nits

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update based on review

* fix update

* update tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fixup

* add overlap and chunk_length_s

* cleanup feature extraction

* test edge cases for truncation and padding

* correct processor values

* update config encodec, nits

* fix tests

* fixup

* fix 24kHz test

* all tests are green

* fix fixup

* Apply suggestions from code review

* revert readme changes

* fixup

* add example

* use facebook checkpoints

* fix typo

* no pipeline tests

* use self.pad everywhere we can

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update based on review

* update

* update mdx

* fix bug and tests

* fixup

* fix doctest

* remove comment

* more nits

* add more coverage for `test_truncation_and_padding`

* fixup

* add last test

* fix text

* nits

* Update tests/models/encodec/test_modeling_encodec.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* take care of the last comments

* typo

* fix test

* nits

* fixup

* Update src/transformers/models/encodec/feature_extraction_encodec.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: arthur.zucker@gmail.com <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-06-14 18:57:23 +02:00
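As a reference point for the entry above, the round trip the model implements — encode raw audio into discrete codebook indices, then decode back to a waveform. A minimal sketch; the checkpoint and dataset ids follow the EnCodec model docs and are assumptions here:

```python
from datasets import load_dataset
from transformers import EncodecModel, AutoProcessor

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

# assumed checkpoint: one of the facebook checkpoints mentioned above
model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

inputs = processor(
    raw_audio=ds[0]["audio"]["array"],
    sampling_rate=processor.sampling_rate,
    return_tensors="pt",
)

# encode to discrete codes (RVQ), then reconstruct the waveform
encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])
audio_values = model.decode(
    encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"]
)[0]
```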
amyeroberts
a717e0318c
Add TimmBackbone model (#22619)
* Add test_backbone for convnext

* Add TimmBackbone model

* Add check for backbone type

* Tidying up - config checks

* Update convnextv2

* Tidy up

* Fix indices & clearer comment

* Exceptions for config checks

* Correctly update config for tests

* Safer imports

* Safer safer imports

* Fix where decorators go

* Update import logic and backbone tests

* More import fixes

* Fixup

* Only import all_models if torch available

* Fix kwarg updates in from_pretrained & main rebase

* Tidy up

* Add tests for AutoBackbone

* Tidy up

* Fix import error

* Fix up

* Install natten in doc_test_job

* Revert back to setting self._out_xxx directly

* Bug fix - out_indices mapping from out_features

* Fix tests

* Dont accept output_loading_info for Timm models

* Set out_xxx and don't remap

* Use smaller checkpoint for test

* Don't remap timm indices - check out_indices based on stage names

* Skip test as it's n/a

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Cleaner imports / spelling is hard

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-06-06 17:11:30 +01:00
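In use, the entry above lets a timm checkpoint sit behind the transformers backbone API and return per-stage feature maps. A rough sketch; the from_pretrained kwargs (use_timm_backbone, out_indices) and the resnet50 id follow my reading of the backbone docs and are assumptions:

```python
import torch
from transformers import TimmBackbone

# the checkpoint name is a timm model id, not a hub repo with transformers weights
backbone = TimmBackbone.from_pretrained("resnet50", use_timm_backbone=True, out_indices=(2, 3, 4))

pixel_values = torch.randn(1, 3, 224, 224)
outputs = backbone(pixel_values)
for feature_map in outputs.feature_maps:  # one tensor per requested stage
    print(feature_map.shape)
```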
Eli Simhayev
4b6a5a7caa
[Time-Series] Autoformer model (#21891)
* ran `transformers-cli add-new-model-like`

* added `AutoformerLayernorm` and `AutoformerSeriesDecomposition`

* added `decomposition_layer` in `init` and `moving_avg` to config

* added `AutoformerAutoCorrelation` to encoder & decoder

* removed canonical self-attention `AutoformerAttention`

* added arguments in config and model tester. Init works! 😁

* WIP autoformer attention with autocorrelation

* fixed `attn_weights` size

* wip time_delay_agg_training

* fixing sizes and debug time_delay_agg_training

* aggregation in training works! 😁

* `top_k_delays` -> `top_k_delays_index` and added `contiguous()`

* wip time_delay_agg_inference

* finish time_delay_agg_inference 😎

* added resize to autocorrelation

* bug fix: added the length of the output signal to `irfft`

* `attention_mask = None` in the decoder

* fixed test: changed attention expected size, `test_attention_outputs` works!

* removed unnecessary code

* apply AutoformerLayernorm in final norm in enc & dec

* added series decomposition to the encoder

* added series decomp to decoder, with inputs

* added trend todos

* added autoformer to README

* added to index

* added autoformer.mdx

* remove scaling and init attention_mask in the decoder

* make style

* fix copies

* make fix-copies

* initial fix-copies

* fix from https://github.com/huggingface/transformers/pull/22076

* make style

* fix class names

* added trend

* added d_model and projection layers

* added `trend_projection` source, and decomp layer init

* added trend & seasonal init for decoder input

* AutoformerModel cannot be copied as it has the decomp layer too

* encoder can be copied from time series transformer

* fixed generation and made distribution output more robust

* use context window to calculate decomposition

* use the context_window for decomposition

* use output_params helper

* clean up AutoformerAttention

* subsequences_length off by 1

* make fix copies

* fix test

* added init for nn.Conv1d

* fix IGNORE_NON_TESTED

* added model_doc

* fix ruff

* ignore tests

* remove dup

* fix SPECIAL_CASES_TO_ALLOW

* do not copy due to conv1d weight init

* remove unused imports

* added short summary

* added label_length and made the model non-autoregressive

* added params docs

* better doc for `factor`

* fix tests

* renamed `moving_avg` to `moving_average`

* renamed `factor` to `autocorrelation_factor`

* make style

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix configurations

* fix integration tests

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixing `lags_sequence` doc

* Revert "fixing `lags_sequence` doc"

This reverts commit 21e34911e3.

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* model layers now take the config

* added `layer_norm_eps` to the config

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* added `config.layer_norm_eps` to AutoformerLayernorm

* added `config.layer_norm_eps` to all layernorm layers

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix variable names

* added initial pretrained model

* added use_cache docstring

* doc strings for trend and use_cache

* fix order of args

* imports on one line

* fixed get_lagged_subsequences docs

* add docstring for create_network_inputs

* get rid of layer_norm_eps config

* add back layernorm

* update fixture location

* fix signature

* use AutoformerModelOutput dataclass

* fix pretrain config

* no need as default exists

* subclass ModelOutput

* remove layer_norm_eps config

* fix test_model_outputs_equivalence test

* test hidden_states_output

* make fix-copies

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* removed unused attr

* Update tests/models/autoformer/test_modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use AutoFormerDecoderOutput

* fix formatting

* fix formatting

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-30 10:23:32 +02:00
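To make the entry above concrete, the doc-style supervised forward pass on the tourism-monthly batch; the repo ids and batch-file layout mirror the Autoformer model docs and should be treated as assumptions here:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoformerForPrediction

# assumed test batch and checkpoint (from the model docs, not this commit)
file = hf_hub_download(
    repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
)
batch = torch.load(file)

model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly")

# training-style forward pass: future_values provides the target window
outputs = model(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"],
    static_real_features=batch["static_real_features"],
    future_values=batch["future_values"],
    future_time_features=batch["future_time_features"],
)
print(outputs.loss)
```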
Eli Simhayev
8abe4930d3
[Time-Series] informer model (#21099)
* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* moved enc-dec init to InformerEncoder/Decoder init

* added 'init_std' to config, now model init works!

* WIP conversion script, and added code sources

* WIP conversion script: loading original informer pth works

* WIP conversion script: change defaults in the config

* WIP conversion script: supporting Informer input embedding

* WIP conversion script: added parameters for the informer embed

* WIP conversion script: change dim_feedforward=2048

* WIP conversion script: remove unused args for loading checkpoint

* just cleaning up

* DataEmbedding removed, after thinking with Kashif

* working on forward pass

* WIP forward pass: trying to establish working batch for forward pass

* cleaning and finalizing

* adding HF names and docs

* init after cleaning works

* WIP in tests

* added docs for the informer specific args

* fix style

* undo change

* cleaning informer, now need to work only on enc-dec

* initial enc-dec classes

* added encoder and decoder

* added todo

* add todos for conv_layers

* added decoder docs from vanilla

* added encoder docs from vanilla

* remove encoder decoder from the original informer

* removed AttentionLayer from the original paper

* removed TriangularCausalMask, same as decoder_attention_mask

* initial sparse attention

* use conv_layers

* fixed test_config test

* fix parentheses when iterating zip(layers, conv_layers)

* error found in prob attention, added sizes as comments

* fix sizes

* added proposal for q_reduce indexing, and remove unused

* WIP ProbMask, and changed factor=2 for testing

* remove unused libs for this PR for creating the env

* fix checking the attn_weights.size() after bmm

* Q_reduce: changed from torch.gather to simple slicing

* WIP calculate final attn_output

* finish adding v_aggregated, attn_output ready

* changed tgt_len to u in attention_mask, need to fix the size error

* comment attention_mask for encoder, and fix if cond for v_agg

* added ProbMask support (wip), removed old original code

* finished ProbMask 😃

* Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e.

* fixes

* make style

* fix initial tests

* fix more tests

* dry

* make style

* remove unused files

* style

* added integration tests

* fix num_static_real_features

* fix header

* remove unused function

* fix example

* fix docs

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/modeling_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fixes for reviewer

* use prediction_length from model

* fix style

* fixed informer.mdx

* added to index

* updated readme

* undo

* make fix-copies

* typo

* fix copy

* added Informer to toctree

* in order

* fixed comments

* remove unneeded new lines in docs

* make static real and cat optional

* fix use of distil conv layers

* fixed integration test

* added checkpoint for convlayer

* make fix-copies

* updated from time series model

* make fix-copies

* copy decoder

* fix unit tests

* updated scaling config

* fix integration tests

* IGNORE_NON_TESTED

* IGNORE_NON_AUTO_CONFIGURED

* IGNORE_NON_AUTO_CONFIGURED

* updated check configs

* fix formatting

* undo change from time series

* prediction_length should not be None

* align with the blog: prettify ProbSparse and change attention_factor to sampling_factor

* make style

* make fix-copies

* niels CR: update contributed by

* niels CR: update configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: update kashif -> huggingface

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: `sampling_factor` only relevant when `attention_type`=prob

* make style

* fixed U_part: added multiplication by `L_Q`

* fixed bug: remove `is not None` from `if config.distil`

* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check

* fix integration tests

* updated model hub

* do not shift as in training

* undo

* fix make-copies

* make fix-copies

* added `if prediction_length is None`

* changed `ProbSparseAttention` to `InformerProbSparseAttention`

* changed `V_sum` -> `v_mean_dim_time`

* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`

* TimeSeriesTransformer->Informer in decoder's Copied from

* more descriptive in ProbSparse

* make style

* fix copied from

* Revert "added `if prediction_length is None`"

This reverts commit b4cbddfa05.

* fixed indent

* use InformerSinusoidalPositionalEmbedding

* make fix-style

* fix from #21860

* fix name

* make fix-copies

* use time series utils

* fix dec num_heads

* docstring

* added time series util doc

* _import_structure

* formatting

* changes from review

* make style

* fix docs

* fix doc

* removed NegativeLogLikelihood

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-07 21:36:38 +01:00
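Informer shares the same time-series interface as the Autoformer entry above; at inference time, generate draws sample trajectories and their mean is the point forecast. A sketch under the same assumptions (doc-style repo ids and batch layout):

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import InformerForPrediction

# assumed validation batch and checkpoint (from the model docs, not this commit)
file = hf_hub_download(
    repo_id="hf-internal-testing/tourism-monthly-batch", filename="val-batch.pt", repo_type="dataset"
)
batch = torch.load(file)

model = InformerForPrediction.from_pretrained("huggingface/informer-tourism-monthly")

# inference: draw sample trajectories, then average them into a point forecast
outputs = model.generate(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"],
    static_real_features=batch["static_real_features"],
    future_time_features=batch["future_time_features"],
)
mean_prediction = outputs.sequences.mean(dim=1)  # (batch, prediction_length)
```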
Connor Henderson
0f96c26de6
refactor: Make direct_transformers_import util (#21652)
* refactor: Make direct_import util

* edit direct import fn

* add docstring

* make import function specific to transformers only

* edit doc string
2023-02-16 11:32:32 -05:00
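The utility above exists so repo-consistency scripts can import the local source tree rather than whatever transformers happens to be installed. A minimal stdlib sketch of the same idea; the function name and exact behavior here are illustrative, not the actual implementation:

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def import_package_from_path(path: Path, name: str = "transformers") -> ModuleType:
    """Load a package straight from a source checkout, bypassing site-packages."""
    spec = importlib.util.spec_from_file_location(
        name, path / "__init__.py", submodule_search_locations=[str(path)]
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# e.g. transformers = import_package_from_path(Path("src/transformers"))
```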
Sylvain Gugger
9d1116e995
Update deprecated load_module (#21651) 2023-02-15 15:57:24 -05:00
Yih-Dar
b47a16743b
Remove more unused attributes in config classes (#21543)
* Remove unused decoder_layerdrop

* Update SPECIAL_CASES_TO_ALLOW for MT5Config

* Remove unused position_embedding_init_scale

* Remove unused decoder_max_relative_position

* Use unused decoder_max_relative_position

* Remove unused init_std

* Remove unused forgotten attributes

* Remove unused patch_norm

* Remove unused max_seq_len

* Update SPECIAL_CASES_TO_ALLOW for OneFormerConfig

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-02-10 22:57:28 +01:00
Yih-Dar
479322bfaa
A new test to check config attributes being used (#21453)
* Add a new test to check config attributes being used

* Add a new test to check config attributes being used

* Add a new test to check config attributes being used

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions

* Update allowed cases - part 1

* Update allowed cases - part 2

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-02-07 17:49:30 +01:00
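A rough sketch of the idea behind the test above: treat a config attribute as "used" only if it is referenced somewhere in the model's source, with an allow-list (SPECIAL_CASES_TO_ALLOW) for legitimate exceptions. This is a deliberate simplification of the real check; the heuristic below is illustrative:

```python
import inspect
from transformers import BertConfig
from transformers.models.bert import modeling_bert

# naive heuristic: an attribute is used if "config.<name>" appears in the modeling source
source = inspect.getsource(modeling_bert)
suspects = [name for name in BertConfig().to_dict() if f"config.{name}" not in source]
print(suspects)  # candidates for SPECIAL_CASES_TO_ALLOW or for removal
```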