Raushan Turganbay
fd70464fa7
Fix flaky tests ( #34069 )
...
* fix mllama only
* allow image token index
2024-10-11 14:41:46 +01:00
Lysandre Debut
f052e94bcc
Fix flax failures ( #33912 )
...
* Few fixes here and there
* Remove typos
* Remove typos
2024-10-11 14:38:35 +02:00
Mohamed Mekkouri
24b82f3cd5
Small Fix to modular converter ( #34051 )
...
* small_fix
* supporting both src/transformers and examples/
* make style
2024-10-10 18:43:27 +02:00
Raushan Turganbay
adea67541a
Phi3: fix attn for sliding window ( #33586 )
...
* fix phi3 attn for sliding window
* fix tests
* address most comments
* style
* update after rebase
* add more models
* fix tests
2024-10-10 11:50:39 +02:00
Arthur
e783f12f20
[Patch helper] update to not have to checkout main ( #34006 )
...
add more support
2024-10-09 09:21:46 +02:00
Cyril Vallez
17806d11ba
Improve modular converter ( #33991 )
...
* improve modular
* style
* Update modular_model_converter.py
* pretty print warning
* style
* Support removing unused classes as part of added dependencies as well
* nits
* correct bug
* add example
* style
* Add documentation
2024-10-08 14:53:58 +02:00
Yoni Gozlan
e2001c3413
Add auto model for image-text-to-text ( #32472 )
...
* Add Auto model for image-text-to-text
* Remove donut from processing auto, add chameleon to image text to text models
* add qwen2_vl and llava_onevision
* add pixtral to auto model for image-text-to-text
* add mllama and idefics3
* remove models in IGNORE_NON_AUTO_CONFIGURED
* add AutoModelForImageTextToText to tests and doc
2024-10-08 14:26:43 +02:00
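A minimal usage sketch for the entry above; the checkpoint name is an assumption (any repo registered for the image-text-to-text task should work):

```python
# Hedged sketch: load an image-text-to-text model through the new auto class.
from transformers import AutoModelForImageTextToText, AutoProcessor

checkpoint = "llava-hf/llava-1.5-7b-hf"  # assumption: any image-text-to-text repo
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForImageTextToText.from_pretrained(checkpoint)
```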
Arthur
a3add29097
Add support for __all__ and potentially deleting functions ( #33859 )
...
* Add support for __all__ and potentially deleting functions
* updates
* update
* nits
* remove dummies
* fix warning
* fixup
* style
* update
* fixup
* skip copied from when # skip
* remove log
* bring dummies back
* fixup
* remove copied from
* fixup
* remove warnings from `make fix-copies`
* fix doc issues
* nits
* Better error message !
* add support for more flexible naming!
* style
* breaking style?
* fix super() renaming issues
* del not needed when you don't call super().__init__()
* style
* no more fmt on :)
* properly remove `self`
* fixup
* fix
* doc nits
* add some doc 🫡
2024-10-08 10:19:17 +02:00
pglorio
f319ba16fa
Add Zamba ( #30950 )
...
* Update index.md
* Rebase
* Rebase
* Updates from make fixup
* Update zamba.md
* Batched inference
* Update
* Fix tests
* Fix tests
* Fix tests
* Fix tests
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/zamba.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update configuration_zamba.py
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Update configuration_zamba.py
* Update modeling_zamba.py
* Update modeling_zamba.py
* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba
* Update ZambaForCausalLM
* Update ZambaForCausalLM
* Describe diffs with original mamba layer
* Moved mamba init into `_init_weights`
* make fixup fixes
* quality test fixes
* Fix Zamba model path
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* circleci fixes
* Update
* circleci fixes
* fix zamba test from merge
* fix ValueError for disabling mamba kernels
* add HF copyright
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* shared_transf --> shared_transformer
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fixes
* Move attention head dim to config
* Fix circle/ci tests
* Update modeling_zamba.py
* apply GenerationMixin inheritance change from upstream
* apply import ordering
* update needed transformers version for zamba
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add contribution author
* add @slow to avoid CI
* Update src/transformers/models/zamba/modeling_zamba.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Define attention_hidden_size
* Added doc for attention_head_size
* trigger CI
* Fix doc of attention_hidden_size
* [run-slow] zamba
* Fixed shared layer logic, swapped up<->gate in mlp
* shared_transformer -> shared_transf
* reformat HybridLayer __init__
* fix docstrings in zamba config
* added definition of _get_input_ids_and_config
* fixed formatting of _get_input_ids_and_config
---------
Co-authored-by: root <root@node-4.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: root <root@node-1.us-southcentral1-a.compute.internal>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
2024-10-04 22:28:05 +02:00
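A hedged loading sketch for the Zamba entry above; the repo id is an assumption inferred from the Zyphra org in the merge lines, and `use_mamba_kernels=False` reflects the "fix ValueError for disabling mamba kernels" commit:

```python
# Hedged sketch: load Zamba; falls back to the slow path when the optional
# mamba-ssm kernels are not installed (repo id is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Zyphra/Zamba-7B-v1"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, use_mamba_kernels=False
)

inputs = tokenizer("Hybrid mamba/attention models", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```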
Guillaume LEGENDRE
4df3ccddb7
Migrate the CI runners to the new clusters ( #33849 )
...
* try fixing push-ci
* move to new runners
* move benchmark.yml to new runners
* move doctest_job.yml to new runners
* move doctests.yml to new runners
* move push-important-models.yml to new runners
* move self-pr-slow-ci.yml to new runners
* fix typo
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix working directory
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix working directory
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* improve code
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-10-03 14:39:49 +02:00
Arthur
1dba608df9
[modular] fixes! ( #33820 )
...
* fix converter for function definitions
* small changes
* no prints
* style
2024-09-30 16:43:55 +02:00
Tony Wu
e1b150862e
Fix modular model converter unable to generate Processor classes ( #33737 )
...
fix: fix wrong file type for processor in `modular_model_converter.py`
2024-09-27 00:00:39 +02:00
Andrés Marafioti
f2c388e3f9
Add Idefics 3! ( #32473 )
...
* Add Idefics 3!
* fixes to make both pipelines identical
* fix for quantized models
* First pass at the review
* remove vocab size from the main config (it's still in the text_config)
* hot fix for merve
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* re-add model_type for text_config
* remove support for old_cache
* remove hidden_size from main config
* rename idefics3 HF repo
* few changes suggested in the PR
* fix to input_data_format computation
* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion
* improve example
* few improvements from amy's review
* big change to enable processing input images as numpy arrays
* Changes to the code to uniformize processor kwargs
* image processing tests
* image processing tests fixes and some bugs they discovered
* addressed review comments from Yoni
* fix modeling tests
* remove special tokens that are not special
* fixes tests
* skip failing tests - they also fail for idefics2
* added paper and readded the tests with multi gpu, who knows
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* review amy until image_processing_idefics3
* last comments from Amy
* review amy
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* doc improvement - amy review
* fix runtime error during fine-tuning
* amy's review
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* ruff
* amy's comment on the order
* ruff ruff
* fix copies
* square images when they are not split
* ruff :(
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics3/test_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix small bug introduced in refactor
* amy's image processing changes
* fixes peft tests and ruff
* modify to_pil_image from transformers and address review from Emanuele
* add modified to_pil_image
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-25 21:28:49 +02:00
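A hedged sketch for the Idefics3 entry above; the repo id is an assumption based on the HuggingFaceM4 Idefics line, and the chat-template flow mirrors Idefics2:

```python
# Hedged sketch: Idefics3 image+text generation (repo id is an assumption).
import requests, torch
from PIL import Image
from transformers import AutoProcessor, Idefics3ForConditionalGeneration

repo = "HuggingFaceM4/Idefics3-8B-Llama3"  # assumption
processor = AutoProcessor.from_pretrained(repo)
model = Idefics3ForConditionalGeneration.from_pretrained(repo, torch_dtype=torch.bfloat16)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe the image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
print(processor.decode(model.generate(**inputs, max_new_tokens=40)[0], skip_special_tokens=True))
```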
Arthur
19d58d31f1
Add MLLama ( #33703 )
...
* current changes
* nit
* Add cross_attention_mask to processor
* multi-image fixed
* Add cross_attention_mask to processor
* cross attn works in all cases
* WIP refactoring function for image processor
* WIP refactoring image processor functions
* Refactor preprocess to use global loops instead of list nested list comps
* Docstrings
* Add channels unification
* fix dtype issues
* Update docsrings and format
* Consistent max_image_tiles
* current script
* updates
* Add convert to rgb
* Add image processor tests
* updates!
* update
* god damn it I am dumb sometimes
* Precompute aspect ratios
* now this works, full match
* fix 😉
* nits
* style
* fix model and conversion
* nit
* nit
* kinda works
* hack for sdpa non-contiguous bias
* nits here and there
* latest changes
* merge?
* run forward
* Add aspect_ratio_mask
* vision attention mask
* update script and config variable names
* nit
* nits
* be able to load
* style
* nits
* there
* nits
* make forward run
* small update
* enable generation multi-turn
* nit
* nit
* Clean up a bit for errors and typos
* A bit more constant fixes
* 90B keys and shapes match
* Fix for 11B model
* Fixup, remove debug part
* Docs
* Make max_aspect_ratio_id minimal
* Update image processing code to match new implementation
* Adjust conversion for final checkpoint state
* Change dim in repeat_interleave (according to meta code)
* tmp fix for num_tiles
* Fix for conversion (gate<->up, q/k_proj rope permute)
* nits
* codestyle
* Vision encoder fixes
* pass cross attn mask further
* Refactor aspect ratio mask
* Disable text-only generation
* Fix cross attention layers order, remove q/k norm rotation for cross attention layers
* Refactor gated position embeddings
* fix bugs but needs test with new weights
* rope scaling should be llama3
* Fix rope scaling name
* Remove debug for linear layer
* fix copies
* Make mask prepare private func
* Remove linear patch embed
* Make precomputed embeddings as nn.Embedding module
* MllamaPrecomputedAspectRatioEmbedding with config init
* Remove unused self.output_dim
* nit, intermediate layers
* Rename ln and pos_embed
* vision_chunk_size -> image_size
* return_intermediate -> intermediate_layers_indices
* vision_input_dim -> hidden_size
* Fix copied from statements
* fix most tests
* Fix more copied from
* layer_id->layer_idx
* Comment
* Fix tests for processor
* Copied from for _prepare_4d_causal_attention_mask_with_cache_position
* Style fix
* Add MllamaForCausalLM
* WIP fixing tests
* Remove duplicated layers
* Remove dummy file
* Fix style
* Fix consistency
* Fix some TODOs
* fix language_model instantiation, add docstring
* Move docstring, remove todos for precomputed embeds (we cannot init them properly)
* Add initial docstrings
* Fix
* fix some tests
* lets skip these
* nits, remove print, style
* Add one more copied from
* Improve test message
* Make validate func private
* Fix dummy objects
* Refactor `data_format` a bit + add comment
* typos/nits
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix dummy objects and imports
* Add chat template config json
* remove num_kv_heads from vision attention
* fix
* move some commits and add more tests
* fix test
* Remove `update_key_name` from modeling utils
* remove num-kv-heads again
* some preliminary docs
* Update chat template + tests
* nit, conversion script max_num_tiles from params
* Fix warning for text-only generation
* Update conversion script for instruct models
* Update chat template in conversion + test
* add tests for CausalLM model
* model_max_length, avoid null chat_template
* Refactor conversion script
* Fix forward
* Fix integration tests
* Refactor vision config + docs
* Fix default
* Refactor text config
* Doc fixes
* Remove unused args, fix docs example
* Squashed commit of the following:
commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date: Wed Sep 18 13:39:15 2024 +0000
Move model + add output hidden states and output attentions
* Fix num_channels
* Add mllama text and mllama vision models
* Fixing repo consistency
* Style fix
* Fixing repo consistency
* Fixing unused config params
* Fix failed tests after refactoring
* hidden_activation -> hidden_act for text mlp
* Remove from_pretrained from sub-configs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Reuse lambda in conversion script
* Remove run.py
* Update docs/source/en/model_doc/mllama.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/processing_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused LlamaTokenizerFast
* Fix logging
* Refactor gating
* Remove cycle for collecting intermediate states
* Refactor text-only check, add integration test for text-only
* Revert from pretrained to configs
* Fix example
* Add auto `bos_token` adding in processor
* Fix tips
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Enable supports_gradient_checkpointing model flag
* add eager/sdpa options
* don't skip attn tests and bring back GC skips (did i really remove those?)
* Fix signature, but get error with None gradient
* Fix output attention tests
* Disable GC back
* Change no split modules
* Fix dropout
* Style
* Add Mllama to sdpa list
* Add post init for vision model
* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model
* if skipped, say it, don't pass
* Clean vision tester config
* Doc for args
* Update tests/models/mllama/test_modeling_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add cross_attention_mask to test
* typehint
* Remove todo
* Enable gradient checkpointing
* Docstring
* Style
* Fixing and skipping some tests for new cache
* Mark flaky test
* Skip `test_sdpa_can_compile_dynamic` test
* Fixing some offload tests
* Add direct GenerationMixin inheritance
* Remove unused code
* Add initializer_range to vision config
* update the test to make sure we show if split
* fix gc?
* Fix repo consistency
* Undo modeling utils debug changes
* Fix link
* mllama -> Mllama
* [mllama] -> [Mllama]
* Enable compile test for CausalLM model (text-only)
* Fix TextModel prefix
* Update doc
* Docs for forward, type hints, and vision model prefix
* make sure to reset
* fix init
* small script refactor and styling
* nit
* updates!
* some nits
* Interpolate embeddings for 560 size and update integration tests
* nit
* does not support static cache!
* update
* fix
* nit2
* this?
* Fix conversion
* Style
* 4x memory improvement with image cache AFAIK
* Token decorator for tests
* Skip failing tests
* update processor errors
* fix split issues
* style
* weird
* style
* fix failing tests
* update
* nit fixing the whisper tests
* fix path
* update
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00
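A hedged sketch for the MLLama entry above; the gated repo id is an assumption, and note the commit above stating static cache is not supported at this point:

```python
# Hedged sketch: MLLama image+text generation (repo id is an assumption, gated).
import requests, torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

repo = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumption
processor = AutoProcessor.from_pretrained(repo)
model = MllamaForConditionalGeneration.from_pretrained(repo, torch_dtype=torch.bfloat16)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is in this image?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
print(processor.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```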
Yoni Gozlan
94f18cf23c
Add OmDet-Turbo ( #31843 )
...
* Add template with add-new-model-like
* Add rough OmDetTurboEncoder and OmDetTurboDecoder
* Add working OmDetTurbo convert to hf
* Change OmDetTurbo encoder to RT-DETR encoder
* Add swin timm backbone as default, add always partition fix for swin timm
* Add labels and tasks caching
* Fix make fix-copies
* Format omdet_turbo
* fix Tokenizer tests
* Fix style and quality
* Reformat omdet_turbo
* Fix quality, style, copies
* Standardize processor kwargs
* Fix style
* Add output_hidden_states and output_attentions
* Add personalized multi-head attention, improve docstrings
* Add integrated test and fix copy, style, quality
* Fix unprotected import
* Cleanup comments and fix unprotected imports
* Add fix different prompts in batch (key_padding_mask)
* Add key_padding_mask to custom multi-head attention module
* Replace attention_mask by key_padding_mask
* Remove OmDetTurboModel and refactor
* Refactor processing of classes and abstract use of timm backbone
* Add testing, fix output attentions and hidden states, add cache for anchors generation
* Fix copies, style, quality
* Add documentation, convert key_padding_mask to attention_mask
* revert changes to backbone_utils
* Fix docstrings rst
* Fix unused argument in config
* Fix image link documentation
* Reorder config and cleanup
* Add tokenizer_init_kwargs in merge_kwargs of the processor
* Change AutoTokenizer to CLIPTokenizer in convert
* Fix init_weights
* Add ProcessorMixin tests, Fix convert while waiting on uniform kwargs
* change processor kwargs and make task input optional
* Fix omdet docs
* Remove unnecessary tests for processor kwargs
* Replace nested BatchEncoding output of the processor by a flattened BatchFeature
* Make modifications from Pavel review
* Add changes Amy review
* Remove unused param
* Remove normalize_before param, Modify processor call docstring
* Remove redundant decoder class, add gradient checkpointing for decoder
* Remove commented out code
* Fix inference in fp16 and add fp16 integrated test
* update omdet md doc
* Add OmdetTurboModel
* fix caching and nit
* add OmDetTurboModel to tests
* nit change repeated key test
* Improve inference speed in eager mode
* fix copies
* Fix nit
* remove OmdetTurboModel
* [run-slow] omdet_turbo
* [run-slow] omdet_turbo
* skip dataparallel test
* [run-slow] omdet_turbo
* update weights to new path
* remove unnecessary config in class
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-91-248.ec2.internal>
2024-09-25 13:26:28 -04:00
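A hedged sketch for the OmDet-Turbo entry above; the repo id is an assumption, and per the commits the task string is optional while classes are free-form text:

```python
# Hedged sketch: zero-shot detection with OmDet-Turbo (repo id is an assumption).
import requests
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection

repo = "omlab/omdet-turbo-swin-tiny-hf"  # assumption
processor = AutoProcessor.from_pretrained(repo)
model = OmDetTurboForObjectDetection.from_pretrained(repo)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, text=["cat", "remote"], return_tensors="pt")
outputs = model(**inputs)
```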
Joao Gante
238b13478d
Gemma2: fix config initialization (cache_implementation) ( #33684 )
2024-09-24 18:22:00 +01:00
Arthur
317e069ee7
Modular transformers: modularity and inheritance for new model additions ( #33248 )
...
* update example
* update
* push the converted diff files for testing and ci
* correct one example
* fix class attributes and docstring
* nits
* oups
* fixed config!
* update
* nits
* class attributes are not matched against the other, this is missing
* fixed overwriting self.xxx now onto the attributes I think
* partial fix, now order with docstring
* fix docstring order?
* more fixes
* update
* fix missing docstrings!
* examples don't all work yet
* fixup
* nit
* updated
* hick
* update
* delete
* update
* update
* update
* fix
* all default
* no local import
* fix more diff
* some fix related to "safe imports"
* push fixed
* add helper!
* style
* add a check
* all by default
* add the
* update
* FINALLY!
* nit
* fix config dependencies
* man that is it
* fix fix
* update diffs
* fix the last issue
* re-default to all
* all the fixes
* nice
* fix properties vs setter
* fixup
* updates
* update dependencies
* make sure to install what needs to be installed
* fixup
* quick fix for now
* fix!
* fixup
* update
* update
* updates
* whitespaces
* nit
* fix
* simplify everything, and make it file agnostic (should work for image processors)
* style
* finish fixing all import issues
* fixup
* empty modeling should not be written!
* Add logic to find who depends on what
* update
* cleanup
* update
* update gemma to support positions
* some small nits
* this is the correct docstring for gemma2
* fix merging of docstrings
* update
* fixup
* update
* take doc into account
* styling
* update
* fix hidden activation
* more fixes
* final fixes!
* fixup
* fixup instruct blip video
* update
* fix bugs
* align gemma2 with the rest as well
* updates
* revert
* update
* more reversions
* grind
* more
* arf
* update
* order will matter
* finish del stuff
* update
* rename to modular
* fixup
* nits
* update makefile
* fixup
* update order of the checks!
* fix
* fix docstring that has a call inside
* fix conversion check
* style
* add some initial documentation
* update
* update doc
* some fixup
* updates
* yups
* Mostly todo, gimme a minute
* update
* fixup
* revert some stuff
* Review docs for the modular transformers (#33472 )
Docs
* good update
* fixup
* mmm current updates lead to this code
* okay, this fixes it
* cool
* fixes
* update
* nit
* updates
* nits
* fix doc
* update
* revert bad changes
* update
* updates
* proper update
* update
* update?
* up
* update
* cool
* nits
* nits
* bon bon
* fix
* ?
* minimise changes
* update
* update
* update
* updates?
* fixed gemma2
* kind of a hack
* nits
* update
* remove `diffs` in favor of `modular`
* fix make fix copies
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-09-24 15:54:07 +02:00
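An illustrative sketch of what a modular file looks like under the mechanism this PR introduces; all names here are hypothetical. The converter (see the `modular_model_converter.py` commits above) flattens such a file into a complete standalone modeling file:

```python
# Hypothetical modular_my_model.py: declare a model as a delta against Llama.
# The modular converter inlines all inherited code into modeling_my_model.py.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaMLP


class MyModelConfig(LlamaConfig):
    model_type = "my_model"


class MyModelMLP(LlamaMLP):
    # Override only what differs; everything else is copied in by the converter.
    pass


class MyModelForCausalLM(LlamaForCausalLM):
    pass
```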
Mayank Mishra
e472e077c2
Granitemoe ( #33207 )
...
* first commit
* drop tokenizer
* drop tokenizer
* drop tokenizer
* drop convert
* granite
* drop tokenization test
* mup
* fix
* reformat
* reformat
* reformat
* fix docs
* stop checking for checkpoint
* update support
* attention multiplier
* update model
* tiny drop
* saibo drop
* skip test
* fix test
* fix test
* drop
* drop useless imports
* update docs
* drop flash function
* copied from
* drop pretraining tp
* drop pretraining tp
* drop pretraining tp
* drop unused import
* drop code path
* change name
* softmax scale
* head dim
* drop legacy cache
* rename params
* cleanup
* fix copies
* comments
* add back legacy cache
* multipliers
* multipliers
* multipliers
* text fix
* fix copies
* merge
* multipliers
* attention multiplier
* drop unused imports
* add granitemoe
* add decoration
* remove moe from sequenceclassification
* fix test
* fix
* fix
* fix
* move rope?
* merge
* drop bias
* drop bias
* Update src/transformers/models/granite/configuration_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* Update src/transformers/models/granite/modeling_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix
* fix
* fix
* drop
* drop
* fix
* fix
* cleanup
* cleanup
* fix
* fix granite tests
* fp32 test
* fix
* drop jitter
* fix
* rename
* rename
* fix config
* add gen test
---------
Co-authored-by: Yikang Shen <yikang.shn@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-21 01:43:50 +02:00
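A hedged sketch for the GraniteMoe entry above; the repo id is an assumption:

```python
# Hedged sketch: load a GraniteMoe checkpoint (repo id is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ibm/PowerMoE-3b"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
inputs = tokenizer("Mixture-of-experts models", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```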
Yih-Dar
077b552f07
Fix some missing tests in circleci ( #33559 )
...
* fix
* fix
* fix
* fix
* skip
* skip more
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-09-20 20:58:51 +02:00
Yih-Dar
6dc364616d
Fix CircleCI nightly run ( #33558 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-09-20 10:57:21 +02:00
Lysandre Debut
f24f084329
Import structure & first three model refactors ( #31329 )
...
* Import structure & first three model refactors
* Register -> Export. Export all in __all__. Sensible defaults according to filename.
* Apply most comments from Amy and some comments from Lucain
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* Style
* Add comment
* Clearer .py management
* Raise if not in backend mapping
* More specific type
* More efficient listdir
* Misc fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
2024-09-10 11:10:53 +02:00
Younes Belkada
7f112caac2
Fix import of FalconMambaForCausalLM ( #33381 )
...
* fix build issues with FM kernels
* try another approach
* test
* fix
* add init files
* push fix
* fix
* fixup
* fix duplicate
* fix
* fix
* fix
2024-09-10 09:14:54 +02:00
Younes Belkada
47b096412d
Fix: Fix FalconMamba training issues due to incompatible kernels ( #33195 )
...
* fix FM training kernels
* fix copies
* fix copies
* propagate to slow path
* make it BC
* add comment
* fix test
2024-09-05 11:55:08 +02:00
Joao Gante
d6534f996b
Repo checks: check documented methods exist ( #32320 )
2024-09-03 17:40:27 +01:00
Arthur
979d24e7fd
fix the parallel number of CI nodes when it is smaller than number of tests ( #33276 )
...
* fix the parallel number
* this?
* keep it simple
* woups
* nit
* style
* fix param name
* fix
* fix dtype
* yups
* ???
* ??
* this?
* ????
* no default flow style
* ??
* print config
* ????
* there we go!
* documentation
* update
* remove unwanted file
2024-09-03 16:53:21 +02:00
Joao Gante
746104ba6f
Test fetcher: missing return on filtered tests; don't write empty files ( #33224 )
...
* missing return
* skip files without contents
* test 2
* dbg
* dbg
* how about this?
2024-08-31 00:41:52 +02:00
Arthur
b017a9eb11
Refactor CI: more explicit ( #30674 )
...
* don't run custom when not needed?
* update test fetcher filtering
* fixup and updates
* update
* update
* reduce burden
* nit
* nit
* mising comma
* this?
* this?
* more parallelism
* more
* nit for real parallelism on tf and torch examples
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update
* update
* update
* update
* update
* update
* use correct path
* fix path to test files and examples
* filter-tests
* filter?
* filter?
* filter?
* nits
* fix naming of the artifacts to be pushed
* list vs files
* list vs files
* fixup
* fix list of all tests
* fix the install steps
* fix the install steps
* fix the config
* fix the config
* only split if needed
* only split if needed
* extend should fix it
* extend should fix it
* arg
* arg
* update
* update
* run tests
* run tests
* run tests
* more nits
* update
* update
* update
* update
* update
* update
* update
* simpler way to show the test, reduces the complexity of the generated config
* simpler way to show the test, reduces the complexity of the generated config
* style
* oups
* oups
* fix import errors
* skip some tests for now
* update doctestjob
* more parallelism
* fixup
* test only the test in examples
* test only the test in examples
* nits
* from Arthur
* fix generated config
* update
* update
* show tests
* oups
* oups
* fix torch job for now
* use single upload setp
* oups
* fu**k
* fix
* nit
* update
* nit
* fix
* fixes
* [test-all]
* add generate marker and generate job
* oups
* torch job runs not generate tests
* let repo utils test all utils
* Update
* styling
* fix repo utils test
* more parallel please
* don't test
* update
* bit more verbose sir
* more
* hub were skipped
* split by classname
* revert
* maybe?
* Amazing catch
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix
* update
* update
* maybe non capturing
* manual convert?
* pass artifacts as parameters as otherwise the config is too long
* artifact.json
* store output
* might not be safe?
* my token
* mmm?
* use CI job IS
* can't get a proper id?
* ups
* build num
* update
* echo url
* this?
* this!
* fix
* wget
* ish
* dang
* update
* there we go
* update
* update
* pass all
* not .txt
* update
* fetch
* fix naming
* fix
* up
* update
* update
* ??
* update
* more updates
* update
* more
* skip
* oups
* pr documentation tests are currently created differently
* update
* hmmmm
* oups
* curl -L
* update
* ????
* nit
* mmmm
* ish
* ouf
* update
* ish
* update
* update
* update
* nit
* nit
* up
* oups
* documentation_test fix
* test hub tests everything, just marker
* update
* fix
* test_hub is the only annoying one now
* tf threads?
* oups
* not sure what is happening?
* fix?
* just use folder for stating hub
* I am getting fucking annoyed
* fix the test?
* update
* update
* ?
* fixes
* add comment!
* nit
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-08-30 18:17:25 +02:00
JB (Don)
f1a385b1de
[RoBERTa-based] Add support for sdpa ( #30510 )
...
* Adding SDPA support for RoBERTa-based models
* add not is_cross_attention
* fix copies
* fix test
* add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin
* address some review comments
* use copied from
* style
* consistency
* fix lists
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-28 10:26:00 +02:00
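A minimal sketch for the entry above: after this PR, RoBERTa-family models (including camembert and xlm-roberta, per the commits) accept the SDPA backend:

```python
# Minimal sketch: select the SDPA attention backend for a RoBERTa-based model.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "roberta-base",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
)
```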
Mayank Mishra
c35d2ccf5a
Granite language models ( #31502 )
...
* first commit
* drop tokenizer
* drop tokenizer
* drop tokenizer
* drop convert
* granite
* drop tokenization test
* mup
* fix
* reformat
* reformat
* reformat
* fix docs
* stop checking for checkpoint
* update support
* attention multiplier
* update model
* tiny drop
* saibo drop
* skip test
* fix test
* fix test
* drop
* drop useless imports
* update docs
* drop flash function
* copied from
* drop pretraining tp
* drop pretraining tp
* drop pretraining tp
* drop unused import
* drop code path
* change name
* softmax scale
* head dim
* drop legacy cache
* rename params
* cleanup
* fix copies
* comments
* add back legacy cache
* multipliers
* multipliers
* multipliers
* text fix
* fix copies
* merge
* multipliers
* attention multiplier
* drop unused imports
* fix
* fix
* fix
* move rope?
* Update src/transformers/models/granite/configuration_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* Update src/transformers/models/granite/modeling_granite.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix
* fix
* fix
* fix-copies
* torch rmsnorm
* add authors
* change model path
* fix
* test
* drop static cache test
* update readme
* drop non-causal
* readme
* drop useless imports
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-27 21:27:21 +02:00
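A hedged sketch for the Granite entry above; the repo id is an assumption:

```python
# Hedged sketch: load a Granite language model (repo id is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ibm/PowerLM-3b"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
inputs = tokenizer("Granite models scale logits by", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```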
Juan Pizarro
7591ca5bc5
🚨 Add Blip2ForImageTextRetrieval ( #29261 )
...
* add Blip2ForImageTextRetrieval
* use one line and remove unnecessary space in tests
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* use value from the config, rather than hardcoded
* change order of params in Blip2QFormerModel.forward
* update docstring
* fix style
* update test_inference_opt
* move embeddings out of Blip2QFormerModel
* remove from_vision_qformer_configs
* remove autocast float16 in Blip2QFormerModel
* rename fields into vision_projection, text_projection, use_image_text_matching_head
* use CLIPOutput for Blip2ImageTextMatchingModelOutput
* remove past_key_values_length from Blip2TextEmbeddings
* fix small typo in the CLIPOutput docstring
* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping
* update docstring and add require_torch_fp16
* rollback test_inference_opt
* use use_image_text_matching_head=True in convert
* skip test_model_get_set_embeddings
* fix create_rename_keys error on new itm fields
* revert to do scale after dot product between "query" and "key"
* fix ValueError on convert script for blip2-opt-2.7b
* update org of paths to Salesforce
* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests
* [run_slow] blip_2
* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED
* fix docstring of Blip2ImageTextMatchingModelOutput
* [run_slow] blip_2
* fix multi-gpu tests
* [run_slow] blip_2
* [run_slow] blip_2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 18:50:27 +01:00
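A hedged sketch of the new image-text matching head; the repo id is an assumption based on the Salesforce org mentioned in the commits, and the two-column output follows from the CLIP-style output class noted above:

```python
# Hedged sketch: BLIP-2 image-text matching (repo id is an assumption).
import requests, torch
from PIL import Image
from transformers import AutoProcessor, Blip2ForImageTextRetrieval

repo = "Salesforce/blip2-itm-vit-g"  # assumption
processor = AutoProcessor.from_pretrained(repo)
model = Blip2ForImageTextRetrieval.from_pretrained(repo)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, text="two cats lying on a couch", return_tensors="pt")
out = model(**inputs, use_image_text_matching_head=True)
match_prob = torch.softmax(out.logits_per_image, dim=1)  # assumption: [no-match, match]
```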
Joao Gante
ab0ac3b98f
CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size ( #33123 )
...
* fix param not being passed in tested; add exceptions
* better source of model name
* Update utils/create_dummy_models.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-27 11:58:27 +01:00
Shijie
19e6e80e10
support qwen2-vl ( #32318 )
...
* support-qwen2-vl
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* hyphen->underscore
* make style
* add-flash2-tipd
* delete-tokenize=False
* remove-image_processor-in-init-file
* add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES
* format-doct
* support-Qwen2VLVisionConfig
* remove-standardize_cache_format
* fix-letter-variables
* remove-torch-in-image-processor
* remove-useless-docstring
* fix-one-letter-variable-name
* change-block-name
* default-quick-gelu-in-vision
* remove-useless-doc
* use-preimplemented-flash-forward
* fix-doc
* fix-image-processing-doc
* fix-apply-rotary-embed
* fix-flash-attn-sliding-window
* refactor
* remove-default_template
* remove-reorder_cache
* simple-get-rope_deltas
* update-prepare_inputs_for_generation
* update-attention-mask
* update-rotary_seq_len
* remove-state
* kv_seq_length
* remove-warning
* _supports_static_cache
* remove-legacy-cache
* refactor
* fix-replace
* mrope-section-doc
* code-quality
* code-quality
* polish-doc
* fix-image-processing-test
* update readme
* Update qwen2_vl.md
* fix-test
* Update qwen2_vl.md
* nit
* processor-kwargs
* hard-code-norm_layer
* code-quality
* discard-pixel-values-in-gen
* fix-inconsistent-error-msg
* unify-image-video
* hidden_act
* add-docstring
* vision-encode-as-PreTrainedModel
* pixel-to-target-dtype
* update doc and low memoryvit
* format
* format
* channel-format
* fix vit_flashatt
* format
* inherit-Qwen2VLPreTrainedModel
* simplify
* format-test
* remove-one-line-func-in-image-processing
* avoid-one-line-reshape
* simplify-rotary_seq_len
* avoid-single-letter-variable
* no-for-loop-sdpa
* avoid-single-letter-variable
* remove-one-line-reshape
* remove-one-line-reshape
* remove-no-rope-in-vit-logic
* default-mrope
* add-copied-from
* more-docs-for-mrope
* polish-doc
* comment-and-link
* polish-doc
* single-letter-variables
* simplify-image-processing
* video->images
* kv_seq_len-update
* vision-rope-on-the-fly
* vision-eager-attention
* change-processor-order
---------
Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
2024-08-26 15:16:44 +02:00
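A hedged sketch for the Qwen2-VL entry above; the repo id is an assumption:

```python
# Hedged sketch: Qwen2-VL image+text generation (repo id is an assumption).
import requests, torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

repo = "Qwen/Qwen2-VL-7B-Instruct"  # assumption
processor = AutoProcessor.from_pretrained(repo)
model = Qwen2VLForConditionalGeneration.from_pretrained(repo, torch_dtype=torch.bfloat16)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt")
print(processor.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```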
MAHIR DAIYAN
843e5e20ca
Add Flax Dinov2 ( #31960 )
...
* tfmsenv restored in main
* installed flax
* forward pass done and all tests passed
* make fix-copies and cleaning the scripts
* fixup attempt 1
* fixup attempt 2
* fixup third attempt
* fixup attempt 4
* fixup attempt 5
* dinov2 doc fixed
* FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE
* external pos_encoding layer removed
* fixup attempt 6
* fixed integration test values
* fixup attempt 7
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* comments removed
* comment removed from the test
* fixup
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* new fixes 1
* interpolate_pos_encoding function removed
* droppath rng fixed, pretrained beit copied-from still not working
* modeling_flax_dinov2.py reformatted
* Update tests/models/dinov2/test_modeling_flax_dinov2.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* added Copied from, to the tests
* copied from statements removed from tests
* fixed copied from statements in the tests
* [run_slow] dinov2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-08-19 09:28:13 +01:00
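A hedged sketch for the new Flax port; if no Flax weights are published for the checkpoint, `from_pt=True` converts the PyTorch ones on the fly (an assumption here):

```python
# Hedged sketch: run the Flax Dinov2 port on one image.
import requests
from PIL import Image
from transformers import AutoImageProcessor, FlaxDinov2Model

image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = FlaxDinov2Model.from_pretrained("facebook/dinov2-base", from_pt=True)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = image_processor(images=image, return_tensors="np")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```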
Joao Gante
cf32ee1753
Cache: use batch_size instead of max_batch_size ( #32657 )
...
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-16 11:48:45 +01:00
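A hedged sketch of the renamed argument: `StaticCache` now takes `batch_size` in place of `max_batch_size` (presumably kept as a deprecated alias):

```python
# Hedged sketch: instantiate a StaticCache with the renamed argument.
import torch
from transformers import AutoConfig, StaticCache

config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
cache = StaticCache(
    config=config,
    batch_size=2,        # formerly max_batch_size
    max_cache_len=1024,
    device="cpu",
    dtype=torch.float32,
)
```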
Joao Gante
70d5df6107
Generate: unify LogitsWarper and LogitsProcessor ( #32626 )
2024-08-16 11:20:41 +01:00
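With warpers and processors unified, a single LogitsProcessor subclass now serves both sampling and search; an illustrative sketch:

```python
# Illustrative custom processor using the unified LogitsProcessor interface.
import torch
from transformers import LogitsProcessor, LogitsProcessorList


class BanFirstPromptToken(LogitsProcessor):
    """Toy example: never re-generate each sequence's first prompt token."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        return scores.scatter(1, input_ids[:, :1], float("-inf"))


processors = LogitsProcessorList([BanFirstPromptToken()])
# usage: model.generate(**inputs, logits_processor=processors)
```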
Younes Belkada
7c11491208
Add new model ( #32615 )
...
* v1 - working version
* fix
* fix
* fix
* fix
* rename to correct name
* fix title
* fixup
* rename files
* fix
* add copied from on tests
* rename to `FalconMamba` everywhere and fix bugs
* fix quantization + accelerate
* fix copies
* add `torch.compile` support
* fix tests
* fix tests and add slow tests
* copies on config
* merge the latest changes
* fix tests
* add few lines about instruct
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-12 08:22:47 +02:00
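The "new model" above is FalconMamba; a hedged loading sketch, repo id assumed from the model name:

```python
# Hedged sketch: load FalconMamba (repo id is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tiiuae/falcon-mamba-7b"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
inputs = tokenizer("State-space language models", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```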
Yunfei Chu
16ed0640be
Add Qwen2-Audio ( #32137 )
...
* add qwen2audio
* Update check_repo.py
* fix style
* fix test
* fix style
* add model size
* Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* switch the attention_mask and the feature_attention_mask
* add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py
* fix initialization
* update chat_template
* fix consistency issue after copy
* add docstrings to _merge_input_ids_with_audio_features
* add copied from to prepare_inputs_for_generation
* add more details to docs
* rm comment
* add init_std
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* update
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update tests
* rm ignore_index
* update processor
* rm ffmpeg_read
* Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_audio.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update
* typo
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* fix quality
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* [run_slow] qwen2_audio
* add official model
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-08 15:47:24 +02:00
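A hedged sketch for the Qwen2-Audio entry above; the repo id and the `audios=` processor kwarg are assumptions:

```python
# Hedged sketch: load Qwen2-Audio (repo id is an assumption).
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

repo = "Qwen/Qwen2-Audio-7B-Instruct"  # assumption
processor = AutoProcessor.from_pretrained(repo)
model = Qwen2AudioForConditionalGeneration.from_pretrained(repo)

# `audio` should be a 1-D float array resampled to
# processor.feature_extractor.sampling_rate.
# inputs = processor(text=prompt, audios=audio, return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=50)
```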
Joao Gante
026a173a64
Repo checks: skip docstring checks if not in the diff ( #32328 )
...
* tmp
* skip files not in the diff
* use git.Repo instead of an external subprocess
* add tiny change to confirm that the diff is working on pushed changes
* add make quality task
* more profesh main commit reference
2024-07-30 18:56:10 +01:00
Joao Gante
535fe78b9f
Repo: remove exceptions in check_docstrings ( #32259 )
...
remove exceptions
2024-07-29 11:06:05 +02:00
Yih-Dar
f2122cc6eb
Upload new model failure report to Hub ( #32264 )
...
upload
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 09:42:54 +02:00
Lucain
f2a1e3ca68
Mention model_info.id instead of model_info.modelId ( #32106 )
2024-07-22 14:14:47 +01:00
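A minimal sketch of the attribute rename in huggingface_hub:

```python
# Minimal sketch: the renamed ModelInfo attribute.
from huggingface_hub import model_info

info = model_info("bert-base-uncased")
print(info.id)  # preferred; info.modelId is the deprecated spelling
```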
Raushan Turganbay
24cfcc2114
Chameleon: add model ( #31534 )
...
* Chameleon model integration
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
* fix 7B, again. mask away image tokens
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove pretrained_config_map
* make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file
* remove tokenizer (use llama's); remove codechameleon tests
* a few copied from statements and minor changes
* copied from in ChameleonModel
* some copies in ChameleonForCausalLM
* a few more copies
* VQModel moved to ChameleonModel (as opposed to being in the processor)
* ChameleonProcessor ready
* Fix chameleon weights convert
* update conversion script
* clean-up processing
* update modeling a bit
* update
* update (throws error...)
* correct conversion ready
* fix tests
* fix docs
* docs
* ve swin norm
* fix device for vocab map
* add normalization
* update
* update script with rope rotations
* final fix on model conversion
* add slow tests
* more info in docs
* fix repo consistency tests
* fix repo tests
* fix-copies
* hope this will make CI happy
* fix for 30b model
* Update docs/source/en/index.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/image_processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/chameleon/processing_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/chameleon/test_modeling_chameleon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address comments
* remove assertion in conversion script
* add image processor test
* not copied
* port changes for qk layernorm
* fix-copies
* read token decorator for tests
* [run-slow] chameleon
* one more read-token
* address some comments
* qk norm changes
* tests and repo check
* moved rope permutations to conversion, YAY!
* fix past kv check
* docs
* layernorm done!
* let's be consistent in naming
* fix slow tests
* weird thing with slow CI, but let's see
* once more try
* remove past-kv as tuple following llama
* ignore
* style
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: jacobkahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Leonid Shamis <lshamis@meta.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-17 10:41:43 +05:00
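A hedged sketch for the Chameleon entry above; the repo id is an assumption, and `<image>` is the placeholder token the processor expands:

```python
# Hedged sketch: Chameleon image+text generation (repo id is an assumption).
import requests, torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

repo = "facebook/chameleon-7b"  # assumption
processor = ChameleonProcessor.from_pretrained(repo)
model = ChameleonForConditionalGeneration.from_pretrained(repo, torch_dtype=torch.bfloat16)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(text="What do you see here?<image>", images=image, return_tensors="pt")
print(processor.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```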
Naman Garg
c1e139c2b0
Adding hiera ( #30356 )
...
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initialized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into transformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstrings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* Removed timm dependency
* added HieraBlock
* fixed: Model name
* added tests for HieraModel, HieraBlock
* fixed imports
* fixed quality & copies
* Fixes
* Update docs/source/en/model_doc/hiera.md
Fix name
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed formatting
* Code quality & Import differences
* quality and repo-consistency fix
* fixed no torch error
* Docstring fix
* Docstring fix
* doc string fix
* fixed example usage
* Resolved issues in modeling_hiera
* Removed Hiera MAE
* Added test and resolved bug
* fixed doc string
* First commit
* Finished conversion script and model forward working
* Resolved all issues
* nits
* Improving tests
* Nits
* More nits
* Improving HieraForMaskedImageModeling
* More improvements and nits
* Fixed docstrings of outputs
* More fixes
* More improvements
* Updated conversion script
* Fixed docstrings
* Improved tests
* Fixed attention outputs test
* All tests green
* Removed unnecessary file
* contribution attribution
* Resolved a few issues
* Resolved Comments
* Updated model repo id and fixed bugs
* Removed loss print
* Make tests green
* Updated docstrings
* Fix style
* Fixed num_heads in config
* Removed unnecessary video checkpoint related code in the conversion script
* Fix style
* Changed atol in conversion script
* HieraConfig
* Fix copies
* Fixed typo
* Resolved few issues
* make
* converted conv_nd -> nn.Module
* Removed video complexities
* Removed video complexities
* fix style
* Addressing comments
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix style
* Fixed tests
* Fixed typo
* Fixed interpolate test
* Made torch fx compatible
* Made sure image processor is correct
* Addressed comments
* Noise directly as torch
* Remove unnecessary attr
* Added return_dict
* Update src/transformers/models/hiera/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated checkpoints
* [run_slow] hiera
* Fixed device mismatch
* [run_slow] hiera
* Fixed GPU tests
* [run_slow] hiera
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-11 22:13:56 +01:00
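A hedged sketch for the Hiera entry above; the repo id is an assumption (the commits note it was updated during review):

```python
# Hedged sketch: run Hiera's backbone on one image (repo id is an assumption).
import requests
from PIL import Image
from transformers import AutoImageProcessor, HieraModel

repo = "facebook/hiera-tiny-224-hf"  # assumption
image_processor = AutoImageProcessor.from_pretrained(repo)
model = HieraModel.from_pretrained(repo)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
outputs = model(**image_processor(images=image, return_tensors="pt"))
print(outputs.last_hidden_state.shape)
```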
Arthur
e314395277
Refactor flash attention implementation in transformers ( #31446 )
...
* dumb commit
* nit
* update
* something like this
* unpack in modeling utils
* safe import
* oups
* update
* nits
* diff convert gemma
* update
* start propagating
* update other modeling code as well
* update for sliding window models
* nits
* more init cleanups
* styling
* fixup
* noice
* pass fixup
* typo typing_extension -> typing_extensions
* torch.nn.functionnal -> torch.nn.functional
* add to import structure
* unpack
* simplify a bit more for this first version
* nit
* update
* update
* nit
* ease the import of `Unpack`
* remove useless `use_sliding_window`
* no qua please
* protect import?
* style
* [run-slow]
* [run slow] llama,gemma,mistral,mixtral
* remove extra kwargs
* fix llama
* address review comments
* apply diff_model_converter to modeling_gemma.py
* remove cache_position 1
* remove cache_position 2
* some cleaning
* refactor gemma2 as well
* apply review comments
* rename file to modeling_flash_attention_utils.py
* siglip refactor
* remove dead code
* is the hub down?
* still down?
* fix siglip
* fix gemma2
* fatal: Could not read from remote repository.
* fix typo in softcap implem
* flacky
* Failed: Timeout >120.0s
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
2024-07-11 20:37:31 +08:00
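A hedged sketch for the refactor above: the FA2 code path is centralized in `modeling_flash_attention_utils.py`, while user-facing selection is unchanged:

```python
# Hedged sketch: select flash attention 2 (requires flash-attn installed;
# the checkpoint is illustrative, any supported model works).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
```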
omahs
e5ca9b057c
Fix typos ( #31819 )
...
* fix typo
* fix typo
* fix typos
* fix typo
* fix typos
2024-07-08 11:52:47 +01:00
Arthur
0cf60f13ab
Add gemma 2 ( #31659 )
...
* inital commit
* Add doc
* protect?
* fixup stuffs
* update tests
* fix build documentation
* mmmmmmm config attributes
* style
* nit
* update
* nit
* Fix docs
* protect some stuff
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-06-27 17:36:19 +02:00
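A hedged sketch for the Gemma 2 entry above; the gated repo id is an assumption:

```python
# Hedged sketch: load Gemma 2 (repo id is an assumption, gated on the Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "google/gemma-2-9b"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```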
Raushan Turganbay
e71f2863d7
Add LLaVa NeXT Video ( #31252 )
...
* squash into single commit
* run diff once more
* docstring
* tests
* minor changes and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-26 21:52:28 +05:00
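A hedged sketch for the LLaVa NeXT Video entry above; the repo id, prompt format, and `videos=` kwarg are assumptions:

```python
# Hedged sketch: LLaVa-NeXT-Video (repo id is an assumption). `clip` should be
# a list/array of sampled frames (e.g. 8 frames, H x W x 3 uint8).
from transformers import LlavaNextVideoForConditionalGeneration, LlavaNextVideoProcessor

repo = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # assumption
processor = LlavaNextVideoProcessor.from_pretrained(repo)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(repo)

# prompt = "USER: <video>\nWhy is this video funny? ASSISTANT:"
# inputs = processor(text=prompt, videos=clip, return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=40)
```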
Pablo Montalvo
aab0829790
Improve error message for mismatched copies in code blocks ( #31535 )
...
improve error message for mismatched code blocks
2024-06-25 13:55:11 +02:00
Raushan Turganbay
fc689d75a0
Add video modality for InstructBLIP ( #30182 )
...
* squash in single commit
* add docs
* dummy obj
* more changes in diff converter
* tiny fix
* make docs happy
* skip test
* repo consistency tests
* update docstring
* style
* fix tests
* change diff imports
* [run-slow] instructblipvideo
* [run-slow] instructblipvideo
* fix tests and remove logit check
* [run-slow] instructblipvideo
2024-06-25 15:45:39 +05:00
Yih-Dar
d4564df1d4
Revive Nightly/Past CI ( #31159 )
...
* build
* build
* build
* build
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-20 18:57:24 +02:00