* archive_file may not be specified
When loading a pre-trained model from a GGUF file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.
* Remap partial disk offload to cpu for GGUF files
GGUF files don't support disk offload, so attempt to remap them to the CPU when device_map is auto. If device_map is anything other than None, raise a NotImplementedError.
* Don't remap auto device_map and raise RuntimeError
If device_map=auto and modules are selected for disk offload, don't attempt to map them to any other device. Raise a RuntimeError when a GGUF model is configured to map any modules to disk (see the sketch below).
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
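A minimal sketch of the two checks described above, assuming `device_map` has already been resolved to a per-module dict; the function name `check_gguf_loading` is hypothetical and the surrounding loading logic is elided:

```python
# Hypothetical helper illustrating the described checks, not the upstream code.
def check_gguf_loading(resolved_archive_file, gguf_file, device_map):
    # resolved_archive_file may be None when loading from a GGUF file, so
    # guard the safetensors availability check against that case.
    is_safetensors = resolved_archive_file is not None and str(
        resolved_archive_file
    ).endswith(".safetensors")

    # GGUF does not support disk offload: if any module ended up mapped to
    # disk, refuse to load rather than silently remapping it elsewhere.
    if gguf_file is not None and device_map is not None and "disk" in device_map.values():
        raise RuntimeError(
            "One or more modules is configured to be mapped to disk. Disk offload "
            "is not supported for models loaded from GGUF files."
        )
    return is_safetensors
```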
* new flute
* new higgs working
* small adjustments
* progress and quality
* small updates
* style
---------
Co-authored-by: Andrey Panferov <panferov.andrey3@wb.ru>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* allow processor to preprocess conversation + video metadata
* allow callable
* add test
* fix test
* nit: fix
* add metadata frames_indices
* Update src/transformers/processing_utils.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Update src/transformers/processing_utils.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* port updates from Orr and add one more test
* Update src/transformers/processing_utils.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* typo
* as dataclass (see the metadata sketch below)
* style
* docstring + make sure tests are green
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
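A hedged sketch of what the resulting metadata container might look like; `frames_indices` follows the commits above, while the other field names are illustrative assumptions rather than the exact upstream dataclass:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VideoMetadata:
    # Fields other than frames_indices are assumptions for illustration.
    total_num_frames: Optional[int] = None   # frames in the source video
    fps: Optional[float] = None              # source frame rate
    duration: Optional[float] = None         # clip length in seconds
    frames_indices: Optional[list] = None    # indices of the sampled frames
```

The processor can also accept a callable for frame sampling (per "allow callable" above), which would receive metadata like this to decide which frames to keep.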
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* Add implementation for DataCollatorForMultipleChoice based on docs (usage sketched after this list).
* Add DataCollatorForMultipleChoice to import structure.
* Remove custom DataCollatorForMultipleChoice implementations from example scripts.
* Remove custom implementations of DataCollatorForMultipleChoice from docs in English, Spanish, Japanese and Korean.
* Refactor torch version of DataCollatorForMultipleChoice to be more easily understandable.
* Apply suggested changes and run make fixup.
* fix copies, style and fixup
* add missing documentation
* nits
* fix docstring
* style
* nits
* isort
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
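A usage sketch based on the docs recipe this collator was derived from; the nested `input_ids` layout (one list per choice) and the `(batch, num_choices, seq_len)` output shape follow that recipe, and the checkpoint name is only an example:

```python
from transformers import AutoTokenizer, DataCollatorForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)

# Two examples, each with two tokenized candidate sequences and a label.
features = [
    {"input_ids": [[101, 7592, 102], [101, 7592, 2088, 102]], "label": 0},
    {"input_ids": [[101, 2023, 102], [101, 2023, 2003, 102]], "label": 1},
]

# The collator flattens the choice dimension, pads, then reshapes back:
# batch["input_ids"].shape == (2, 2, max_sequence_length_in_batch)
batch = collator(features)
```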
* change order of unmasking of tokens
* library import
* class setup
* test function
* refactor
* add commit message
* test modified
* explicit initialisation of weights + made model smaller
* removed separate testing file
* fixup
* fixup core
* test attention mask with token types (see the sketch below)
* tests fixup
* removed PaliGemmaAttentionMaskTest class
---------
Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
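A rough sketch of the kind of mask such a test exercises, assuming PaliGemma-style prefix-LM masking where prefix tokens (here marked by `token_type_ids == 0`) attend bidirectionally and the remaining tokens attend causally; the function and the token-type convention are illustrative, not the actual test:

```python
import torch

def build_prefix_lm_mask(token_type_ids: torch.Tensor) -> torch.Tensor:
    """Illustrative prefix-LM attention mask, shape (batch, seq, seq)."""
    _, seq = token_type_ids.shape
    # Causal part: every position may attend to itself and earlier positions.
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    # Prefix part: prefix positions may also attend to later prefix positions.
    is_prefix = token_type_ids == 0
    bidirectional = is_prefix[:, :, None] & is_prefix[:, None, :]
    return causal[None] | bidirectional

mask = build_prefix_lm_mask(torch.tensor([[0, 0, 0, 1, 1]]))
assert mask[0, 0, 2]       # prefix token attends to a later prefix token
assert not mask[0, 3, 4]   # suffix token stays causal
```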
* Adding option to save/reload scaler (sketched below)
* Removing duplicate variable
* Adding save/reload test
* Small fixes to the deterministic algorithm call
* Moving LLM test to another file to isolate its environment
* Moving back to old file and using subprocess to run the test in isolation
* Reverting an accidental change
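A minimal sketch of what saving and restoring the gradient scaler involves; the filename and the exact Trainer checkpoint integration are assumptions:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

# On checkpoint save: persist the scaler alongside optimizer/scheduler state.
torch.save(scaler.state_dict(), "scaler.pt")  # filename is illustrative

# On resume: reload it so the dynamic loss scale is not reset to its default.
scaler.load_state_dict(torch.load("scaler.pt"))
```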
* add RAdamScheduleFree optimizer
* revert schedulefree version to the minimum requirement
* refine is_schedulefree_available so that it can take min_version (sketched below)
* refine documentation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
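A hedged sketch of a version-gated availability check of the kind described; the default `min_version` here is illustrative and the real helper in `transformers.utils` may differ:

```python
import importlib.metadata

from packaging import version

def is_schedulefree_available(min_version: str = "1.0.0") -> bool:
    # Returns True only if schedulefree is installed at >= min_version.
    try:
        installed = importlib.metadata.version("schedulefree")
    except importlib.metadata.PackageNotFoundError:
        return False
    return version.parse(installed) >= version.parse(min_version)
```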
* make output_dir optional (see the sketch after this list)
* initiated a basic testing module to validate and verify the changes
* Test that output_dir defaults to 'tmp_trainer' when unspecified.
* test existing functionality of output_dir.
* test that output_dir is only created when needed
* final check
* added docstring and changed tmp_trainer to trainer_output
* make style fixes to the test file.
* another round of fixup
---------
Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
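A short sketch of the resulting behavior, assuming the change landed as described in the commits above:

```python
from transformers import TrainingArguments

# output_dir may now be omitted; a default name is used instead (per the
# commits above it started as "tmp_trainer" and was renamed "trainer_output"),
# and the directory is only created once something is actually written.
args = TrainingArguments()
print(args.output_dir)
```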
* Add is_torch_greater_or_equal test decorator
* Add common test for torch.export (sketched after this list)
* Fix bit
* Fix focalnet
* Fix imagegpt
* Fix seggpt
* Fix swin2sr
* Enable torch.export test for vision models
* Enable test for video models
* Remove json
* Enable for hiera
* Enable for ijepa
* Fix detr
* Fix conditional_detr
* Fix maskformer
* Enable test maskformer
* Fix test for deformable detr
* Fix custom kernels for export in rt-detr and deformable-detr
* Enable test for all DPT
* Remove custom test for deformable detr
* Simplify test to use only kwargs for export
* Add comment
* Move compile_compatible_method_lru_cache to utils
* Fix beit export
* Fix deformable detr
* Fix copies data2vec<->beit
* Fix typos, update test to work with dict
* Add seed to the test
* Enable test for vit_mae
* Fix beit tests
* [run-slow] beit, bit, conditional_detr, data2vec, deformable_detr, detr, focalnet, imagegpt, maskformer, rt_detr, seggpt, swin2sr
* Add vitpose test
* Add textnet test
* Add dinov2 with registers
* Update tests/test_modeling_common.py
* Switch to torch.testing.assert_close
* Fix maskformer
* Remove save-load from test
* Add dab_detr
* Add depth_pro
* Fix and test RT-DETRv2
* Fix dab_detr
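A rough sketch of the shape of such a common test, exporting with kwargs only (per "Simplify test to use only kwargs for export") and comparing against eager outputs with `torch.testing.assert_close`; the helper name is hypothetical and it assumes the model returns an output with `.logits`:

```python
import torch

def check_torch_export(model, inputs: dict, atol=1e-4, rtol=1e-4):
    # Hypothetical helper mirroring the described test, not the upstream code.
    model.eval()
    with torch.no_grad():
        eager_logits = model(**inputs).logits

    # Export with kwargs only, then run the exported graph on the same inputs.
    exported = torch.export.export(model, args=(), kwargs=inputs, strict=True)
    with torch.no_grad():
        exported_logits = exported.module()(**inputs).logits

    torch.testing.assert_close(eager_logits, exported_logits, atol=atol, rtol=rtol)
```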
* Add support for constant learning rate with cooldown
* Add more warmup and cooldown methods to 'get_wsd_schedule'
* Add more warmup and decay methods to 'get_wsd_schedule'
* support num_training_steps and num_stable_steps for get_wsd_schedule (usage sketched below)
* get wsd scheduler before the `num_training_steps` decision
* fix code_quality
* Update stable branch logic
* fix code_quality
* Move stable stage decide to `get_wsd_schedule`
* Update docstring of `get_wsd_schedule`
* Update `num_train_steps` to be optional
* Update docstring of `get_wsd_schedule`
* Update src/transformers/optimization.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
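A usage sketch of the warmup-stable-decay schedule these commits describe; keyword names follow the commit messages, but the exact signature of `get_wsd_schedule` may differ slightly:

```python
import torch
from transformers import get_wsd_schedule  # import path assumed

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Warmup -> stable -> decay. Per the commits, the stable stage can be derived
# from num_training_steps instead of being passed as num_stable_steps.
scheduler = get_wsd_schedule(
    optimizer,
    num_warmup_steps=10,
    num_decay_steps=20,
    num_training_steps=100,  # stable stage inferred as 100 - 10 - 20 = 70
)

for _ in range(100):
    optimizer.step()
    scheduler.step()
```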