* Correct the implementation of the auxiliary loss of Mixtral
* correct the implementation of the auxiliary loss of Mixtral
* Implement a simpler calculation method
---------
Co-authored-by: zhangliangxu3 <zhangliangxu3@jd.com>
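For context, a load-balancing auxiliary loss of the kind these commits touch can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this change: the function name `load_balancing_loss`, the list-of-lists input, and the choice to normalize the routing fractions by `top_k` are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one token's router logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, num_experts, top_k=2):
    """Switch-Transformer-style auxiliary loss (illustrative sketch).

    router_logits: one list of `num_experts` logits per token.
    Returns num_experts * sum_e(fraction_routed[e] * mean_prob[e]).
    """
    num_tokens = len(router_logits)
    fraction_routed = [0.0] * num_experts  # share of top-k slots sent to each expert
    mean_prob = [0.0] * num_experts        # average router probability per expert
    for logits in router_logits:
        probs = softmax(logits)
        # Indices of the top_k most probable experts for this token.
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in top:
            # Assumption: normalize by top_k so the fractions sum to 1.
            fraction_routed[e] += 1.0 / (num_tokens * top_k)
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens
    return num_experts * sum(f * p for f, p in zip(fraction_routed, mean_prob))
```

With this normalization the loss equals 1 when the router probabilities are uniform and grows as routing collapses onto a few experts.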
* chore(phi): Updates configuration_phi with missing keys.
* chore(phi): Adds first draft of combined modeling_phi.
* fix(phi): Fixes according to latest review.
* fix(phi): Removes pad_vocab_size_multiple to prevent inconsistencies.
* fix(phi): Fixes unit and integration tests.
* fix(phi): Ensures that everything works with microsoft/phi-1 for first integration.
* fix(phi): Fixes output of docstring generation.
* fix(phi): Fixes according to latest review.
* fix(phi): Fixes according to latest review.
* fix(tests): Re-enables Phi-1.5 test.
* fix(phi): Fixes attention overflow on PhiAttention (for Phi-2).
* fix(phi): Improves how queries and keys are upcast.
* fix(phi): Small updates on latest changes.
* optionally preprocess segmentation maps for mobilevit
* changed pretrained model name to that of segmentation model
* removed voc-deeplabv3 from model archive list
* added preprocess_image and preprocess_mask methods for processing images and segmentation masks respectively
* added tests for segmentation masks based on segformer feature extractor
* use crop_size instead of size
* reverting to initial model
* Fix initialization for missing parameters in `from_pretrained` under ZeRO-3
* Test initialization for missing parameters under ZeRO-3
* Add more tests
* Only enable deepspeed context for per-module level parameters
* Enable deepspeed context only once
* Move class definition inside test case body
* Add first draft
* Use appropriate gelu function
* More improvements
* More improvements
* More improvements
* Convert checkpoint
* More improvements
* Improve docs, remove print statements
* More improvements
* Add link
* remove unused masking function
* begin tokenizer
* do_lower_case
* debug
* set split_special_tokens=True
* Remove script
* Fix style
* Fix rebase
* Use same design as CLIP
* Add fast tokenizer
* Add SiglipTokenizer to init, remove extra_ids
* Improve conversion script
* Use smaller inputs in conversion script
* Update conversion script
* More improvements
* Add processor to conversion script
* Add tests
* Remove print statements
* Add tokenizer tests
* Fix more tests
* More improvements related to weight initialization
* More improvements
* Make more tests pass
* More improvements
* More improvements
* Add copied from
* Add canonicalize_text
* Enable fast tokenizer tests
* More improvements
* Fix most slow tokenizer tests
* Address comments
* Fix style
* Remove script
* Address some comments
* Add copied from to tests
* Add more copied from
* Add more copied from
* Add more copied from
* Remove is_flax_available
* More updates
* Address comment
* Remove SiglipTokenizerFast for now
* Add caching
* Remove umt5 test
* Add canonicalize_text inside _tokenize, thanks Arthur
* Fix image processor tests
* Skip tests which are not applicable
* Skip test_initialization
* More improvements
* Compare pixel values
* Fix doc tests, add integration test
* Add do_normalize
* Remove causal mask and leverage ignore copy
* Fix attention_mask
* Fix remaining tests
* Fix dummies
* Rename temperature and bias
* Address comments
* Add copied from to tokenizer tests
* Add SiglipVisionModel to auto mapping
* Add copied from to image processor tests
* Improve doc
* Remove SiglipVisionModel from index
* Address comments
* Improve docs
* Simplify config
* Add first draft
* Make it like mistral
* More improvements
* Fix attention_mask
* Fix output_attentions
* Add note in docs
* Convert multilingual model
* Convert large checkpoint
* Convert more checkpoints
* Add pipeline support, correct image_mean and image_std
* Use padding=max_length by default
* Make processor like llava
* Add code snippet
* Convert more checkpoints
* Set keep_punctuation_string=None as in OpenCLIP
* Set normalized=False for special tokens
* Fix doc test
* Update integration test
* Add figure
* Update organization
* Happy new year
* Use AutoModel everywhere
---------
Co-authored-by: patil-suraj <surajp815@gmail.com>
* [DETA] fix freeze/unfreeze function
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add freeze/unfreeze test case in DETA
* fix type
* fix typo 2
* fix : enable aux and enc loss in training pipeline
* Add unsynced variables from original DETA for training
* modification for passing CI test
* make style
* make fix
* manual make fix
* change deta_modeling_test of configuration 'two_stage' default to TRUE and minor change of dist checking
* remove print
* divide configuration in DetaModel and DetaForObjectDetection
* images smaller than 224 will give a topk error
* pred_boxes and logits should be equivalent to two_stage_num_proposals
* add missing part in DetaConfig
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add docstring in configure and prettify TO DO part
* change distribute related code to accelerate
* Update src/transformers/models/deta/configuration_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/deta/test_modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* protect importing accelerate
* change variable name to specific value
* wrong import
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
When running the test case on a multi-card server with device_map="auto", the model will not always be allocated to device 0,
because other processes may be using those cards. Instead, the devices that can accommodate the model are selected.
Signed-off-by: yuanwu <yuan.wu@intel.com>
* remove token_type_ids from model_input_names (like #24788)
* removed test that assumed token_type_ids should be present and updated a model reference so that it points to an available model
* start - docs, SpeechT5 copy and rename
* add relevant code from FastSpeech2 draft, have tests pass
* make it an actual conformer, demo ex.
* matching inference with original repo, includes debug code
* refactor nn.Sequentials, start more desc. var names
* more renaming
* more renaming
* vocoder scratchwork
* matching vocoder outputs
* hifigan vocoder conversion script
* convert model script, rename some config vars
* replace postnet with speecht5's implementation
* passing common tests, file cleanup
* expand testing, add output hidden states and attention
* tokenizer + passing tokenizer tests
* variety of updates and tests
* g2p_en pckg setup
* import structure edits
* docstrings and cleanup
* repo consistency
* deps
* small cleanup
* forward signature param order
* address comments except for masks and labels
* address comments on attention_mask and labels
* address second round of comments
* remove old unneeded line
* address comments part 1
* address comments pt 2
* rename auto mapping
* fixes for failing tests
* address comments part 3 (bart-like, train loss)
* make style
* pass config where possible
* add forward method + tests to WithHifiGan model
* make style
* address arg passing and generate_speech comments
* address Arthur comments
* address Arthur comments pt2
* lint changes
* Sanchit comment
* add g2p-en to doctest deps
* move up self.encoder
* onnx compatible tensor method
* fix is symbolic
* fix paper url
* move models to espnet org
* make style
* make fix-copies
* update docstring
* Arthur comments
* update docstring w/ new updates
* add model architecture images
* header size
* md wording update
* make style
* First draft
* More improvements
* More improvements
* Make all tests pass
* Remove script
* Update image processor
* Address comments
* Use new gradient checkpointing method
* Convert checkpoints, add integration test
* Do not keep aspect ratio for now
* Set keep_aspect_ratio=False for beit, add integration test
* Remove print statement
* fixes: code fixes on is_batched condition to also check for batched audio data in torch.Tensor format instead of only checking for batched audio data in np.ndarray format
* Update src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* refactor: code refactoring to remove torch framework dependency
* docs: updated docstring to add torch tensor compatibility
* test: add test cases to incorporate torch tensor inputs
* test: ran make fix-copies for code conformity
* test: refactor test to separate the test_call into test_call_numpy and test_call_torch
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Fix vision text dual encoder
* Small cleanup for wav2vec2 (not fixed yet)
* Small fix for vision_encoder_decoder
* Fix SAM builds
* Update TFBertTokenizer test with modern exporting + tokenizer
* Fix DeBERTa
* Fix DeBERTav2
* Try RAG fix but it's impossible to test locally
* Actually fix RAG now that I got FAISS working somehow
* Fix Wav2Vec2, add sermon
* Fix Hubert
* some nits
* update test
* add support
* remove some dummy inputs
* all good
* style
* nits
* fixes
* fix more copies
* nits
* styling
* fix
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add a slow test just to be sure
* fixup
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Iterate over out_features instead of stage_names
* Update for all backbones
* Add tests
* Fix
* Align timm backbone behaviour with other backbones
* Fix tests
* Stricter checks on set out_features and out_indices
* Revert back stage selection logic
* Remove out-of-order logic
* Document restriction in docstrings
* move code to Trainer.evaluate to enable use of that function with multiple datasets
* test
* update doc string
* and a tip
* forgot the type
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
* edits to _prepare_4d_causal_attention_mask()
* initial tests for 4d mask
* attention_mask_for_sdpa support
* added test for inner model hidden
* added autotest decorators
* test mask dtype to torch.int64
* torch.testing.assert_close
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* torch_device and @torch_gpu in tests
* upd tests
* +torch decorators
* torch decorators fixed
* more decorators!
* even more decorators
* fewer decorators
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add a convenience method for building in your own name scope
* Second attempt at auto layer building
* Revert "Second attempt at auto layer building"
This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.
* Attempt #3
* Revert "Attempt #3"
This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.
* Add missing attributes that we're going to need later
* Add some attributes we're going to need later
* A fourth attempt! Feel the power flow through you!
* Revert "A fourth attempt! Feel the power flow through you!"
This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.
* Add more values we'll need later
* TF refactor that we'll need later
* Revert "TF refactor that we'll need later"
This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.
* Revert "Revert "TF refactor that we'll need later""
This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.
* make fixup
* Attempt five!
* Revert "Attempt five!"
This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.
* Attempt six - this time don't add empty methods
* Revert "Attempt six - this time don't add empty methods"
This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.
* Attempt seven - better base model class detection!
* Revert "Attempt seven - better base model class detection!"
This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.
* Another attribute we'll need later
* Try again with the missing attribute!
* Revert "Try again with the missing attribute!"
This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.
* This is the attempt that will pierce the heavens!
* Revert "This is the attempt that will pierce the heavens!"
This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.
* Attempt seven - snag list is steadily decreasing
* Revert "Attempt seven - snag list is steadily decreasing"
This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.
* Attempt eight - will an empty snag list do it?
* Revert "Attempt eight - will an empty snag list do it?"
This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.
* Fixes to Hubert issues that cause problems later
* Trying again with Conv1D/SeparableConv fixes
* Revert "Trying again with Conv1D/SeparableConv fixes"
This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.
* Apply the build shape fixes to Wav2Vec2 as well
* One more attempt!
* Revert "One more attempt!"
This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.
* Another attempt!
* Revert "Another attempt!"
This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.
* Let's see how many failures we get without the internal build method
* Fix OpenAI
* Fix MobileBERT
* (Mostly) fix GroupVIT
* Fix BLIP
* One more BLIP fix
* One more BLIP fix!
* Fix Regnet
* Finally fully fix GroupViT
* Fix Data2Vec and add the new AdaptivePool
* Fix Segformer
* Fix Albert
* Fix Deberta/DebertaV2
* Fix XLM
* Actually fix XLM
* Fix Flaubert
* Fix lxmert
* Fix Resnet
* Fix ConvBERT
* Fix ESM
* Fix Convnext / ConvnextV2
* Fix SAM
* Fix Efficientformer
* Fix LayoutLMv3
* Fix speech_to_text
* Fix mpnet and mobilevit
* Fix Swin
* Fix CTRL
* Fix CVT
* Fix DPR
* Fix Wav2Vec2
* Fix T5
* Fix Hubert
* Fix GPT2
* Fix Whisper
* Fix DeiT
* Fix the encoder-decoder / dual-encoder classes
* make fix-copies
* build in name scope
* Fix summarization test
* Fix tied weight names for BART + Blenderbot
* Fix tied weight name building
* Fix to TFESM weight building
* Update TF SAM
* Expand all the shapes out into Big Boy Shapes
* fix a typo and add an illustrative test
* appease black
* reduce code duplication and add Annotion type back with a pending deprecation warning
* remove unused code
* change warning type
* black formatting fix
* change enum deprecation approach to support 3.8 and earlier
* add stacklevel
* fix black issue
* fix ruff issues
* fix ruff issues
* move tests to own mixin
* include yolos
* fix black formatting issue
* fix black formatting issue
* use logger instead of warnings and include target version for deprecation
* Skip nn.Module.reset_parameters
* Actually skip
* Check quality
* Maybe change all inits
* Fix init issues: only modify public functions
* Add a small test for now
* Style
* test updates
* style
* nice test
* style
* make it even faster
* one more second
* remove fx incompatible
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* skip
* fix quality
* protect the import
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* [DETA] fix freeze/unfreeze function
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add freeze/unfreeze test case in DETA
* fix type
* fix typo 2
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add sdpa
* wip
* cleaning
* add ref
* yet more cleaning
* and more :)
* wip llama
* working llama
* add output_attentions=True support
* bigcode sdpa support
* fixes
* gpt-bigcode support, require torch>=2.1.1
* add falcon support
* fix conflicts falcon
* style
* fix attention_mask definition
* remove output_attentions from attnmaskconverter
* support whisper without removing any Copied from statement
* fix mbart default to eager renaming
* fix typo in falcon
* fix is_causal in SDPA
* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
* add warnings when falling back on the manual implementation
* precise doc
* wip replace _flash_attn_enabled by config.attn_implementation
* fix typo
* add tests
* style
* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
* obey to config.attn_implementation if a config is passed in from_pretrained
* fix is_torch_sdpa_available when torch is not installed
* remove dead code
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove duplicate pretraining_tp code
* add dropout in llama
* precise comment on attn_mask
* add fmt: off for _unmask_unattended docstring
* precise num_masks comment
* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
* cleanup modeling_utils
* backward compatibility
* fix style as requested
* style
* improve documentation
* test pass
* style
* add _unmask_unattended tests
* skip meaningless tests for idefics
* hard_check SDPA requirements when specifically requested
* standardize the use of XXX_ATTENTION_CLASSES
* fix SDPA bug with mem-efficient backend on CUDA when using fp32
* fix test
* rely on SDPA is_causal parameter to handle the causal mask in some cases
* fix FALCON_ATTENTION_CLASSES
* remove _flash_attn_2_enabled occurrences
* fix test
* add OPT to the list of supported flash models
* improve test
* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
* remove remaining _flash_attn_2_enabled occurrence
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove use_attn_implementation
* fix docstring & slight bug
* make attn_implementation internal (_attn_implementation)
* typos
* fix tests
* deprecate use_flash_attention_2=True
* fix test
* add back llama that was removed by mistake
* fix tests
* remove _flash_attn_2_enabled occurrences bis
* add check & test that passed attn_implementation is valid
* fix falcon torchscript export
* fix device of mask in tests
* add tip about torch.jit.trace and move bt doc below sdpa
* fix parameterized.expand order
* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
* update sdpaattention class with the new cache
* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bark/modeling_bark.py
* address review comments
* WIP torch.jit.trace fix. left: test both eager & sdpa
* add test for torch.jit.trace for both eager/sdpa
* fix falcon with torch==2.0 that needs to use sdpa
* fix doc
* hopefully last fix
* fix key_value_length that has no default now in mask converter
* is it flaky?
* fix speculative decoding bug
* tests do pass
* fix following #27907
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fulfill request
* Add test
* Better test
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Better test
* Better test
* More comments
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix issues in add and is_done for BeamHypotheses
* make newly added arguments optional for better compatibility
* Directly use cur_len as generated_len, add note for retrocompatibility
* update test expectation
* make cur_len represent the length of the entire sequence including the decoder prompt
* remove redundant if/else in testing
* Draft version of new KV Caching
This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly
* Address numerous PR suggestions
1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.
Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
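The cache design listed above (separate `key_cache` and `value_cache`, and an `update(key, value, idx)` call that returns the full key and value) can be sketched as a toy class. This is a minimal illustration, not the library's implementation: `SimpleCache` is a hypothetical name, Python lists stand in for tensors, and placing the legacy-tuple conversion helpers on the cache itself is an assumption made only to keep the example self-contained.

```python
class SimpleCache:
    """Toy per-layer key/value cache; lists stand in for tensors (hypothetical sketch)."""

    def __init__(self):
        self.key_cache = []    # one list of key states per layer
        self.value_cache = []  # one list of value states per layer

    def update(self, key, value, layer_idx):
        """Append new states for `layer_idx` and return the full cached key and value."""
        if layer_idx == len(self.key_cache):
            # First call for this layer: create its entries.
            self.key_cache.append(list(key))
            self.value_cache.append(list(value))
        else:
            # Later calls: extend along the sequence dimension.
            # (A layer_idx beyond the next unseen layer raises IndexError.)
            self.key_cache[layer_idx].extend(key)
            self.value_cache[layer_idx].extend(value)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]

    def to_legacy_cache(self):
        """Convert to a tuple of (key, value) tuples, one pair per layer."""
        return tuple(
            (tuple(k), tuple(v)) for k, v in zip(self.key_cache, self.value_cache)
        )

    @classmethod
    def from_legacy_cache(cls, legacy):
        """Build a cache from the tuple-of-tuples format."""
        cache = cls()
        for layer_idx, (k, v) in enumerate(legacy):
            cache.update(k, v, layer_idx)
        return cache
```

Because `update` returns the accumulated states, an attention layer in this sketch never needs to inspect the cache's internals or branch on its type.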
* Implement the SinkCache through backward+forward rotations
* Integrate (Sink)Cache with Llama FA2
* Set use_legacy_cache=True as default, allows for test passes
* Move from/to_legacy_cache to ...Model class
* Undo unnecessary newline change
* Remove copy utility from deprecated OpenLlama
* Match import style
* manual rebase with main
* Cache class working with generate (#1)
* Draft version of new KV Caching
This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly
* Address numerous PR suggestions
1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.
Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
* Integrate (Sink)Cache with Llama FA2
* Move from/to_legacy_cache to ...Model class
* Undo unnecessary newline change
* Match import style
* working generate
* Add tests; Simplify code; Apply changes to Mistral and Persimmon
* fix rebase mess
* a few more manual fixes
* last manual fix
* propagate changes to phi
* upgrade test
* add use_legacy_cache docstring; beef up tests
* reintroduce unwanted deletes
---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
* move import
* add default to model_kwargs.get('use_legacy_cache')
* correct failing test
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply PR suggestions
* fix failing test
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
* PR comments
* tmp commit
* add docstrings
* more tests, more docstrings, add to docs
* derp
* tmp commit
* tmp dbg
* more dbg
* fix beam search bug
* cache can be a list of tuples in some models
* fix group beam search
* all but sinkcache integration tests
* fix sink cache and add hard integration test
* now also compatible with input_embeds input
* PR comments
* add Cache support to Phi+FA2
* make fixup
---------
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Un-skip tests
* Add aliasing support to tf_to_pt_weight_rename
* Refactor tf-to-pt weight rename for simplicity
* Patch mobilebert
* Let us pray that the transfo-xl one works
* Add XGLM rename
* Expand the test to see if we can get more models to break
* Expand the test to see if we can get more models to break
* Fix MPNet (it was actually an unrelated bug)
* Fix MPNet (it was actually an unrelated bug)
* Add speech2text fix
* Update src/transformers/modeling_tf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/mobilebert/modeling_tf_mobilebert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update to always return a tuple from tf_to_pt_weight_rename
* reformat
* Add a couple of missing tuples
* Remove the extra test for tie_word_embeddings since it didn't cause any unexpected failures anyway
* Revert changes to modeling_tf_mpnet.py
* Skip MPNet test and add explanation
* Add weight link for BART
* Add TODO to clean this up a bit
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add model like
* logits match
* minor fixes
* fixes
* up
* up
* add todo
* llava processor
* keep the processor simple
* add conversion script
* fixup
* fix copies
* up
* add to index
* fix config + logits
* fix
* refactor
* more refactor
* more refactor
* fix copies
* add authors
* v1 tests
* add `LlavaProcessor` in init
* remove unneeded import
* up
* up
* docs
* up
* fix CI
* fix CI
* add attention mask in test
* make fixup
* remove the vision model
* that's the dirty way to do it
* nits
* nits
* updates
* add more tests
* add input tests
* fixup
* more styling
* nits
* updates and cleanup
* fixup the generation expected results
* fix the testing script
* some cleanup and simplification which does not work yet but almost there!
* make correct dispatch operations
* vectorize works for batch of images and text
* last todos
* nits
* update test and modeling code
* remove useless function for now
* fix few issues
* fix generation
* some nits
* add bakllava
* nits
* remove duplicated code
* finish merge
* cleanup
* missed this line
* fill the todos
* add left padding offset
* add left and right padding logic
* bool to properly index
* make sure
* more cleanups
* batch is fixed 😉
* add correct device for tensor creation
* fix some dtype mismatch
* ruff
* update conversion script
* Update src/transformers/__init__.py
* fa 2 support + fix conversion script
* more
* correct reshaping
* fix test dict
* fix copies by ignoring
* fix nit
* skip clip vision model
* fixup
* fixup
* LlavaForVisionText2Text -> LlavaForCausalLM
* update
* fix
* raise correct errors
* fix
* docs
* nuke for now
* nits here and there
* fixup
* fix remaining tests
* update LlavaForConditionalGeneration instead of CausalLM
* fixups
* pipeline support
* slow and pipeline tests
* supports batch
* nits
* cleanup
* fix first integration tests
* add pad token where needed
* correct tests
* fixups
* update pipeline test
* fix quality
* nits
* revert unneeded change
* nit
* use BatchFeature
* from ...feature_extraction_utils import BatchFeature
* nits
* nits
* properly update
* more f*** nits
* fix copies
* comment
* keep slow test slow
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add pipeline example
* add pixel values in docstring
* update pr doctest
* fix
* fix slow tests
* remove hack
* fixup
* small note
* forward contrib credits from PR25789
* forward contrib credits from original implementation and work
* add arthur
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update docstring
* nit
* move to not doctested because of timeout issues
* fixup
* add description
* more
* fix-copies
* fix docs
* add beam search
* add more comments
* add typehints on processor
* add speedup plot
* update slow tests and docs
* push test
* push batched test
* fix batched generation with different number of images
* remove benchmark due to a bug
* fix test
* fix copies
* add gcolab demo
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Copies `modeling_flax_gpt_neo.py` to start
* MLP Block. WIP Attention and Block
* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue
* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.
* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Note: outputs are numerically different, but not by a huge amount. Needs
investigating
* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..
* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?
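The layout difference noted above can be illustrated with two small helpers. This is a plain-Python sketch on flat lists rather than the Flax code from this port: the names `rotate_every_two`, `rotate_half`, and `interleaved_to_contiguous` are chosen for illustration, and pairing conventions are stated in the comments.

```python
def rotate_every_two(x):
    """Interleaved layout (GPT-J style): rotation pairs are (x[0], x[1]), (x[2], x[3]), ..."""
    out = []
    for i in range(0, len(x), 2):
        out.extend([-x[i + 1], x[i]])
    return out


def rotate_half(x):
    """Contiguous layout (Llama style): rotation pairs are (x[i], x[i + half])."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def interleaved_to_contiguous(x):
    """Reorder features so interleaved pairs become contiguous halves."""
    return x[0::2] + x[1::2]
```

Applying `rotate_half` after `interleaved_to_contiguous` gives the same result as permuting the output of `rotate_every_two`, so the two layouts agree only once the feature order is accounted for.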
* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now
* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now
* implements `FlaxLlamaBlockCollection`
tolerance must be higher than expected, kinda disconcerting
* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗
* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later
* start porting pretrained wrappers
realised it probably needs return dict as a prereq
* cleanup, quality, style
* readds `return_dict` and model output named tuples
* (tentatively) pretrained wrappers work 🔥
* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.
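As a framework-free illustration of the explicit `1 / sqrt` form mentioned above, RMS normalization can be sketched as follows. This is not the `FlaxLlamaRMSNorm` code: the function name `rms_norm`, the list inputs, and the default `eps` are assumptions made for the example.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale `x` by the reciprocal of its root mean square, then by `weight`."""
    variance = sum(v * v for v in x) / len(x)    # mean of squares
    inv_rms = 1.0 / math.sqrt(variance + eps)    # explicit 1 / sqrt, no fused rsqrt
    return [w * v * inv_rms for v, w in zip(x, weight)]
```

With unit weights and `eps=0.0`, the output has a mean square of 1, which makes a quick sanity check when comparing against another implementation.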
* [WIP] debugging numerics
* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.
* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict
* adds missing TYPE_CHECKING import and `make fixup`
* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO
* commenting out equivalence test as can just use common
* debugging
* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥
* cleanup of modeling file
* cleanup of test file
* Resolving simpler review comments
* addresses more minor review comments
* fixing introduced pytest errors from review
* wip additional slow tests
* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay
* `make quality`, `make style`
* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs
* `make fix-copies`
* fix mangled function following `make fix-copies`
* adds missing type checking imports
* fixes missing parameter checkpoint warning
* more fine-grained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`
* swaps import guards
??? how did these get swapped initially?
* removes `inv_freq` again, as the PyTorch version has now removed it
* attempting to get CI to pass
* adds doc entries for llama flax models
* fixes typo in __init__.py imports
* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version
* overrides tests with dummy to see if CI passes
need to fill in these tests later
* adds my contribution to docs
* `make style; make quality`
* replaces random masking with fixed to work with flax version
* `make quality; make style`
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* updates `x`->`tensor` in `rotate_half`
* addresses smaller review comments
* Update docs/source/en/model_doc/llama.md
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adds integration test class
* adds `dtype` to rotary embedding to cast outputs
* adds type to flax llama rotary layer
* `make style`
* `make fix-copies`
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* applies suggestions from review
* Update modeling_flax_llama.py
* `make fix-copies`
* Update tests/models/llama/test_modeling_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixes shape mismatch in FlaxLlamaMLP
* applies some suggestions from reviews
* casts attn output logits to f32 regardless of dtype
* adds attn bias using `LlamaConfig.attention_bias`
* adds Copied From comments to Flax Llama test
* mistral and persimmon test change -copy from llama
* updates docs index
* removes Copied from in tests
it was preventing `make fix-copies` from succeeding
* quality and style
* ignores FlaxLlama input docstring
* adds revision to `_CHECKPOINT_FOR_DOC`
* repo consistency and quality
* removes unused import
* removes `Copied from` from Phi test
now diverges from llama tests following FlaxLlama changes
* adds `_REAL_CHECKPOINT_FOR_DOC`
* removes refs from pr tests
* reformat to make ruff happy
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
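The rotary mismatch noted above (GPT-J interleaved vs Llama contiguous) can be illustrated with a minimal NumPy sketch. This is not the code from the PR: the function names `rotate_half_contiguous`, `rotate_every_two_interleaved`, `rotary_angles`, `apply_rope_contiguous` and `apply_rope_interleaved` are hypothetical, and the base of 10000 is an assumption. It only shows that the two layouts pair up different feature indices, so they agree only after permuting the feature dimension.

```python
import numpy as np


def rotate_half_contiguous(x):
    # Contiguous layout: feature i is paired with feature i + head_dim // 2.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)


def rotate_every_two_interleaved(x):
    # Interleaved layout: feature 2i is paired with feature 2i + 1.
    x1, x2 = x[..., ::2], x[..., 1::2]
    return np.stack([-x2, x1], axis=-1).reshape(x.shape)


def rotary_angles(seq_len, head_dim, base=10000.0):
    # One angle per (position, feature pair); shape (seq_len, head_dim // 2).
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(np.arange(seq_len), inv_freq)


def apply_rope_contiguous(x, angles):
    # cos/sin are tiled as [a_0..a_k, a_0..a_k] to match the contiguous pairs.
    cos = np.concatenate([np.cos(angles), np.cos(angles)], axis=-1)
    sin = np.concatenate([np.sin(angles), np.sin(angles)], axis=-1)
    return x * cos + rotate_half_contiguous(x) * sin


def apply_rope_interleaved(x, angles):
    # cos/sin are repeated as [a_0, a_0, a_1, a_1, ...] to match interleaved pairs.
    cos = np.repeat(np.cos(angles), 2, axis=-1)
    sin = np.repeat(np.sin(angles), 2, axis=-1)
    return x * cos + rotate_every_two_interleaved(x) * sin


def even_then_odd(head_dim):
    # Permutation that moves even features to the first half, odd to the second.
    return np.concatenate([np.arange(0, head_dim, 2), np.arange(1, head_dim, 2)])
```

Under these assumptions, `apply_rope_interleaved(x, angles)[..., perm]` matches `apply_rope_contiguous(x[..., perm], angles)` with `perm = even_then_odd(head_dim)`, while applying the two functions to the same unpermuted input generally gives different values.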
* v1 fusing modules
* add fused mlp support
* up
* fix CI
* block save_pretrained
* fixup
* small fix
* add new condition
* add v1 docs
* add some comments
* style
* fix nit
* adapt from suggestion
* add check
* change arg names
* change variables name
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* style
* split up into 3 different private methods
* more conditions
* more checks
* add fused tests for custom models
* fix
* fix tests
* final update docs
* final fixes
* fix importlib metadata
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change it to `do_fuse`
* nit
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* few fixes
* revert
* fix test
* fix copies
* raise error if model is not quantized
* add test
* use quantization_config.config when fusing
* Update src/transformers/modeling_utils.py
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Added test cases for rembert, referring to albert and reformer test_tokenization
* removed CURL_CA_BUNDLE='
* Added flag test_sentencepiece_ignore_case and space_between_special_tokens to True
* Overrode test_added_tokens_serialization
* The slow->fast tokenizer conversion failed because [MASK] was initialized differently for slow and fast, so the initialization of the [MASK] token was made uniform between the fast and slow tokenizers
* Added a few more test cases in test_encode_decode_round_trip and modified the slow tokenizer (mask_token) to have an AddedToken instance with lstrip=True
* Added a few test cases in test_encoder_decoder round trip and also modified the slow tokenizer of rembert to have mask_token as AddedToken with lstrip = True
* Cleaned the code and added fmt: skip to avoid line breaks after make style + added comments to indicate the copied test cases
* Corrected a few comments
* Fixed quality issue
* Ran fix-copies
* Fixed a few minor issues, as `make fix-copies` broke a few test cases while stripping the text
* Reverted the changes made by repo-consistency
---------
Co-authored-by: Kokane <kokanen@apac.corpdir.net>