* first try
* codestyle
* idefics2 is happy
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma
* fix-copies
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo
* blip-2 needs to init vision from config
* when was this removed O_o
* minor fix
* tests
* this way?
* tests
* model-agnostic code
* codestyle
* add tests for idefics
* modify general test for VLMs
* no generation test for vlm yet!
* no generation test here also
* warn in ViT-SDPA if output attn
* add more tests
* user can pass dict as attn impl
* repo consistency
* update
* musicgen
* no prints
* forgot speech enc-dec and clip
* how many composite models we have?
* musicgen melody is same as musicgen
* +siglip
* fix tests + add some more
* remove idefics custom overridden code
* make idefics2 automappable
* nits
* skip tests
* doctests
* Update src/transformers/models/idefics2/configuration_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/clip/test_modeling_clip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* major update, no need for automap
* clean up
* add FA2 test
* more tests
* style
* skip tests
* why did these start failing now?
* no attributes for FA2 needed
* one tiny test
* address comment about FA2 false warning
* style
* add new models and resolve conflicts
* fix copies
* let it be this way for now, come back tomorrow to review
* some more fixes
* update
* more updates
* update
* fix copies
* style and tests
* another big update
* fix tests
* fix tests
* update
* another update
* fix tests
* fix copies
* fix tests
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
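The "user can pass dict as attn impl" item above can be pictured with a small toy resolver. This is only a sketch under assumptions: the helper name `resolve_attn_implementation`, the keys `text_config` and `vision_config`, and the fallback to `None` for missing keys are all hypothetical and are not taken from the library code.

```python
def resolve_attn_implementation(attn_implementation, sub_config_key):
    """Pick the attention backend for one sub-model of a composite model.

    attn_implementation may be a string (applied to every sub-model),
    a dict keyed by sub-config name, or None.
    """
    if isinstance(attn_implementation, dict):
        # Assumption: a missing key yields None, meaning "use the default".
        return attn_implementation.get(sub_config_key)
    return attn_implementation


# Hypothetical per-sub-model request for a vision-language model.
requested = {"text_config": "sdpa", "vision_config": "eager"}
vision_backend = resolve_attn_implementation(requested, "vision_config")
text_backend = resolve_attn_implementation(requested, "text_config")
uniform_backend = resolve_attn_implementation("sdpa", "vision_config")
missing_backend = resolve_attn_implementation(requested, "audio_config")
```

A plain string keeps the old behaviour (one backend everywhere), while a dict lets each sub-model choose its own.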
* Trigger UDOP tests
* Try forcing dtype in LayoutLMV3
* Do checks to see where uint8 is getting in
* Do checks to see where uint8 is getting in
* Found it!
* Add .astype(np.float32)
* Remove forced check, make fixup
* Checking where exactly the uint8 creeps in
* More checking on the uint8 issues
* Manually upcast in rescale()
* Remove UDOP trigger
* bookmark
* Bookmark
* Bookmark
* Actually implement
* Pass in kwarg explicitly
* Adjust for if we do or don't have labels
* Bookmark fix for od
* bookmark
* Fin
* closer
* Negate accelerate grad accum div
* Fixup not training long enough
* Add in compute_loss to take full model output
* Document
* compute_loss -> compute_loss_fn
* Add a test
* Refactor
* Refactor
* Uncomment tests
* Update tests/trainer/test_trainer.py
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
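The bullets above do not spell out the motivation. Assuming the series is about how the loss is normalized under gradient accumulation, the following arithmetic sketch (made-up per-token losses, not Trainer code) shows why averaging per-micro-batch means differs from a single mean over all tokens when micro-batches hold different numbers of tokens.

```python
# Per-token losses for two micro-batches of different size (made-up numbers).
micro_batches = [
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # 6 tokens
    [8.0, 8.0],                      # 2 tokens
]

# Naive accumulation: average each micro-batch, then average the averages.
mean_of_means = sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

# Token-level normalization: sum every token loss, divide once by the total count.
total_tokens = sum(len(mb) for mb in micro_batches)
token_mean = sum(sum(mb) for mb in micro_batches) / total_tokens

print(mean_of_means)  # 5.0
print(token_mean)     # 3.5
```

The short micro-batch is over-weighted in the first number, which is the kind of discrepancy a single normalization over the accumulated tokens avoids.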
* Support Llama 3.2 conversion (text models)
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
* Fix rope factor
* Update chat template
Initialize from a well-known template.
The guidance is that the changes should be applied to 3.1 models as
well.
* Remove import
* Support Llama Guard 3 conversion
* Tokenizer details
* Fix eos added token in base models
* Fix generation config for base models
* Specify revision for known tokenizers
* Style
* Reuse chat templates for older models
* Improve error when converting tokenizer < Llama 3
---------
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
* tmp
* all visited
* test all
* Update src/transformers/models/moshi/modeling_moshi.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* delete another one :D
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
There's a bug on M1 Macs with transformers >= 4.43.0 and torch >= 2.1.0: if a model has tied embeddings, the fast loading from #31771 causes a bus error when the model is actually run. This can be solved by disabling `_supports_param_buffer_assignment` for these models.
More info in the comments on #33357
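As a rough illustration of the opt-out pattern, here is a toy sketch. Only the attribute name `_supports_param_buffer_assignment` comes from the message above; the classes `ToyPreTrainedModel` and `ToyTiedEmbeddingModel` and the helper `choose_load_strategy` are hypothetical stand-ins, not the library's loader.

```python
class ToyPreTrainedModel:
    # Hypothetical base class: the fast (assignment) path is on by default.
    _supports_param_buffer_assignment = True


class ToyTiedEmbeddingModel(ToyPreTrainedModel):
    # Opt this model family out of the fast path.
    _supports_param_buffer_assignment = False


def choose_load_strategy(model_cls):
    """Return 'assign' for the fast path or 'copy' for the fallback."""
    if getattr(model_cls, "_supports_param_buffer_assignment", True):
        return "assign"
    return "copy"


default_strategy = choose_load_strategy(ToyPreTrainedModel)
tied_strategy = choose_load_strategy(ToyTiedEmbeddingModel)
```

Because the flag lives on the class, only the affected models lose the fast path and every other model keeps it.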
* fix(Wav2Vec2ForCTC): torch export
Resolves the issue described in #34022 by implementing the
masking of the hidden states using an elementwise multiplication
rather than indexing with assignment.
The torch.export functionality seems to mark the tensor as frozen
even though the update is legal.
This change is a workaround for now to allow the export of the
model as an FxGraph. Further investigation is required to find
the real solution in PyTorch.
* [run-slow] hubert, unispeech, unispeech_sat, wav2vec2
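The masking change described for Wav2Vec2ForCTC can be sketched as follows. This is a minimal illustration with made-up shapes and hypothetical function names, not the model's actual code; it only shows that the two styles give the same result on finite inputs.

```python
import torch


def mask_with_indexed_assignment(hidden_states, attention_mask):
    # Writes zeros through a boolean index (the style being replaced).
    out = hidden_states.clone()
    out[~attention_mask] = 0.0
    return out


def mask_with_multiplication(hidden_states, attention_mask):
    # Broadcasts the mask over the feature dimension and multiplies.
    expanded = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    return hidden_states * expanded


# Batch of 1, sequence length 4, hidden size 3; last two positions are padding.
hidden_states = torch.arange(12, dtype=torch.float32).reshape(1, 4, 3) + 1.0
attention_mask = torch.tensor([[True, True, False, False]])

assigned = mask_with_indexed_assignment(hidden_states, attention_mask)
multiplied = mask_with_multiplication(hidden_states, attention_mask)
```

One caveat: multiplication does not zero out NaN or infinite values at masked positions, whereas assignment does.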
* change cpu offload warning for fp8 quantization
* change cpu offload warning for fp4 quantization
* change cpu offload variable name for fp8 and fp4 quantization
Update 'trainer._get_eval_sampler()' to support the 'group_by_length' argument
Trainer didn't support grouping by length for evaluation, which made evaluation slow with 'eval_batch_size'>1.
The updated 'trainer._get_eval_sampler()' method is based on 'trainer._get_train_sampler()'.
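To see why grouping by length speeds up batched evaluation, here is a simplified sketch that counts padded tokens with and without grouping. The helper names and the example lengths are hypothetical, and the plain sort used here is a stand-in, not the sampler the Trainer uses.

```python
def batches_in_order(lengths, batch_size):
    # Batch examples in their original dataset order.
    indices = list(range(len(lengths)))
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


def batches_grouped_by_length(lengths, batch_size):
    # Sort indices by sequence length (longest first), then batch.
    indices = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


def padded_tokens(lengths, batches):
    # Each batch is padded to its longest sequence.
    return sum(max(lengths[i] for i in batch) * len(batch) for batch in batches)


lengths = [5, 120, 7, 118, 6, 121, 8, 119]
plain = padded_tokens(lengths, batches_in_order(lengths, batch_size=4))
grouped = padded_tokens(lengths, batches_grouped_by_length(lengths, batch_size=4))
print(plain, grouped)  # 964 516
```

With mixed short and long inputs, grouping roughly halves the number of tokens processed in this example, since short sequences are no longer padded up to the longest one.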
* auto-gptq requirement is removed & model is changed & tokenizer pad token is assigned
* values func is changed with extensions & sequence key value bug is fixed
* map key value check is added in ExtensionsTree
* empty trimmed_ids bug is fixed
* tail_id IndexError is fixed
* empty trimmed_ids bug fix is updated for failed test
* overly specific case for one tokenizer is removed
* input_ids check is updated
* require auto-gptq import is removed
* key error check is replaced with empty list check
* empty input_ids check is added
* empty trimmed_ids fix is checked with numel function
* usage change comments are added
* test changes are commented
* comment style and quality bugs are fixed
* test comment style and quality bug is fixed
* Fix FSDP Initialization for resume training
* Added init_fsdp function to work with dummy values
* Fix FSDP initialization for resuming training
* Added CUDA decorator for tests
* Added torch_gpu decorator to FSDP tests
* Fixup for failing code quality tests
* add idefics
* conflicts after merging main
* enable tests but need to fix some
* fix tests
* no print
* fix/skip some slow tests
* continue not skip
* rebasing broke something, this is the fix