* Add files
* Init
* Add TimmWrapperModel
* Fix up
* Some fixes
* Fix up
* Remove old file
* Sort out import orders
* Fix some model loading
* Compatible with pipeline and trainer (see the usage sketch after this commit block)
* Fix up
* Delete test_timm_model_1/config.json
* Remove accidentally committed files
* Delete src/transformers/models/modeling_timm_wrapper.py
* Remove empty imports; fix transformations applied
* Tidy up
* Add image classification model to special cases
* Create pretrained model; enable device_map='auto'
* Enable most tests; fix init order
* Sort imports
* [run-slow] timm_wrapper
* Pass num_classes into timm.create_model
* Remove train transforms from image processor
* Update timm creation with pretrained=False
* Fix gamma/beta issue for timm models
* Fixing gamma and beta renaming for timm models
* Simplify config and model creation
* Remove attn_implementation diff
* Fixup
* Docstrings
* Fix warning message text according to test case
* Fix device_map auto
* Set dtype and device for pixel_values in forward
* Enable output hidden states
* Enable tests for hidden_states and model parallel
* Remove default scriptable arg
* Refactor inner model
* Update timm version
* Fix _find_mismatched_keys function
* Change inheritance for Classification model (fix weights loading with device_map)
* Minor bugfix
* Disable save pretrained for image processor
* Rename hook method for loaded keys correction
* Rename state dict keys on save, remove `timm_model` prefix, make checkpoint compatible with `timm`
* Managing num_labels <-> num_classes attributes
* Enable loading checkpoints in Trainer to resume training
* Update error message for output_hidden_states
* Add output hidden states test
* Decouple base and classification models
* Add more test cases
* Add save-load-to-timm test
* Fix test name
* Fixup
* Add do_pooling
* Add test for do_pooling
* Fix doc
* Add tests for TimmWrapperModel
* Add validation for `num_classes=0` in timm config + test for DINO checkpoint
* Adjust atol for test
* Fix docs
* dev-ci
* dev-ci
* Add tests for image processor
* Update docs
* Update init to new format
* Update docs in configuration
* Fix some docs in image processor
* Improve docs for modeling
* Fix for is_timm_checkpoint
* Update code examples
* Fix header
* Fix typehint
* Increase tolerance a bit
* Fix Path
* Fixing model parallel tests
* Disable "parallel" tests
* Add comment for metadata
* Refactor AutoImageProcessor for timm wrapper loading
* Remove custom test_model_outputs_equivalence
* Add require_timm decorator
* Fix comment
* Make image processor work with older timm versions and tensor input
* Save config instead of whole model in image processor tests
* Add docstring for `image_processor_filename`
* Sanitize kwargs for timm image processor
* Fix doc style
* Update check for tensor input
* Update normalize
* Remove _load_timm_model function
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
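The commits above describe the new timm wrapper. Below is a minimal usage sketch, assuming a timm hub checkpoint resolves to the wrapper classes through the Auto* entry points; the checkpoint name is illustrative and details may differ from the final API:

```python
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Illustrative timm checkpoint; num_labels handling and device_map="auto" are
# covered by the commits above but omitted here to keep the sketch minimal.
checkpoint = "timm/resnet18.a1_in1k"

image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

# Tensor input is supported by the image processor per the commits above.
image = torch.rand(3, 224, 224)
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

print(outputs.logits.shape)        # (1, number of classes in the checkpoint)
print(len(outputs.hidden_states))  # intermediate features exposed by the wrapper
```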
Original issue: https://github.com/huggingface/peft/issues/2256
There is a potential error when using load_best_model_at_end=True with a
prompt learning PEFT method. Trainer uses load_adapter under the hood, but
with some prompt learning methods the saved model is optimized to remove
parameters that are not required for inference, which in turn requires a
change to the model architecture. This is why load_adapter fails in such
cases; users should instead set load_best_model_at_end=False and use
PeftModel.from_pretrained. As this is not obvious, we now intercept the
error and add a helpful error message.
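A minimal sketch of the workaround suggested by the new error message, assuming a causal-LM base model; the model name and checkpoint path below are placeholders:

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import PeftModel

# With prompt learning methods, let Trainer finish without reloading the best
# checkpoint itself...
args = TrainingArguments(output_dir="out", load_best_model_at_end=False)

# ...then restore the desired checkpoint explicitly once training is done.
base_model = AutoModelForCausalLM.from_pretrained("base-model-name")       # placeholder
model = PeftModel.from_pretrained(base_model, "out/best-peft-checkpoint")  # placeholder path
```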
* add "Translating Benchmarks.md to Chinese "
* Removed all the English original text (which was previously kept as comments in the document) and refined some of the Chinese expressions.
* Add docs/source/ar/community.md to Add_docs_source_ar_community.md
* Update community.md
* Update _toctree.yml - add community.md
* Update docs/source/ar/community.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Create how_to_hack_models.md
* Create modular_transformers.md
* Create tiktoken.md
* Update _toctree.yml
* Update docs/source/ar/how_to_hack_models.md
* Update docs/source/ar/modular_transformers.md
* Update docs/source/ar/tiktoken.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
* Correct the new defaults (#34377)
* Correct the new defaults
* CIs
* add check
* Update utils.py
* Update utils.py
* Add the max_length in generate test checking shape without passing length
* style
* CIs
* fix fx CI issue
* [auto. ping] Avoid sending empty info + add more team members (#34383)
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix glm (#34388)
* Fix duplicated
* fix import
* Use non-nested images and batched text in Idefics2/3 (#34222)
* add support for non nested images and add tests
* add tests error scenario
* fix style
* added single and no image to error tests
* Fix onnx non-exportable inplace aten op (#34376)
* fix onnx non-exportable inplace op
* mistral, qwen2, qwen2_vl, starcoder2
* fixup copies
* Fix right padding in LLaVA models (#34305)
* fix right pad llavas
* device mismatch
* no filter (#34391)
* no filter
* no filter
* no filter
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* SynthID: better example (#34372)
* better example
* Update src/transformers/generation/configuration_utils.py
* Update src/transformers/generation/logits_process.py
* nits
* Tests: upgrade `test_eager_matches_sdpa_generate` (#34386)
* Fix bnb training test failure (#34414)
* Fix bnb training test: compatibility with OPTSdpaAttention
* Avoid checking expected exception when it is on CUDA (#34408)
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix typos in agents_advanced.md (#34405)
* [docs] Cache implementations (#34325)
cache
* [run-slow] hubert
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm an explicit variable
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Fix `make fixup` styling changes
* [run-slow] hubert
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
* [run-slow] hubert
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
The method `Trainer#get_batch_samples` should return a list of batch samples
and an integer indicating the number of items in the batch. However, this was
not actually the case: instead of an integer, a tensor with one element was
returned. In a multi-GPU setup, this tensor can be placed on a different
device than the loss tensor, causing the loss function to raise a
`RuntimeError`. The problem arises from
5d7739f15a/src/transformers/trainer.py (L5139-L5144),
where the outer `sum` operates over a list of tensors, which means the final
result is also a tensor. To counter this, a new check (after the accelerator
gathering) has been added to convert a potential tensor to an integer before
returning `num_items_in_batch`.
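A sketch of the idea behind the added check (the helper name and toy tensors are illustrative, not the actual Trainer code):

```python
import torch

def ensure_int_item_count(num_items_in_batch):
    """Convert a gathered item count to a plain int so it lives on no device."""
    if torch.is_tensor(num_items_in_batch):
        return int(num_items_in_batch.item())
    return num_items_in_batch

# Summing per-batch counts that are tensors yields a tensor, not an int:
counts = [torch.tensor(7), torch.tensor(5)]
num_items_in_batch = ensure_int_item_count(sum(counts))
assert isinstance(num_items_in_batch, int)  # safe to combine with a loss on any device
```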
* Option to set 'non_blocking' for the to(device) operation for performance improvements. Defaults to 'False', so there are no behavioral changes (see the usage sketch after this commit block).
* Enable non_blocking in the to() operation of BatchFeature.
* Improve docstring on the use of non_blocking
* Force non_blocking as a keyword argument
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Daniel Bogdoll <dbogdoll@umich.edu>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
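A short usage sketch of the non_blocking flag described above, assuming it is exposed as a keyword-only argument on BatchFeature.to (the tensors are illustrative; the copy only overlaps with compute when the source memory is pinned):

```python
import torch
from transformers import BatchFeature

# Illustrative BatchFeature; in practice it comes from a processor call.
batch = BatchFeature({"pixel_values": torch.rand(1, 3, 224, 224)})

if torch.cuda.is_available():
    # Passed as a keyword argument, per the commits above; the default stays False.
    batch = batch.to("cuda", non_blocking=True)
```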
* fix GA (gradient accumulation) bugs and add unit test
* narrow down model loss unit test diff gap
* format code to make ruff happy
* send num_items_in_batch argument to decoder (see the sketch after this commit block)
* fix GA loss bug in BertLMHeadModel
* use TinyStories-33M to narrow down diff gap
* format code
* missing .config
* avoid adding extra args
---------
Co-authored-by: kangsheng <kangsheng@meituan.com>
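A toy sketch of the principle behind the GA loss fix referenced above (the tensors are illustrative, not the Trainer or BertLMHeadModel code): when accumulating gradients, normalizing by the total item count across micro-batches matches the single-large-batch loss, whereas averaging each micro-batch separately does not.

```python
import torch

# Toy per-token losses for two micro-batches of different sizes.
micro_batches = [torch.rand(7), torch.rand(3)]
num_items_in_batch = sum(mb.numel() for mb in micro_batches)

# Per-micro-batch mean, then averaging over steps, over-weights the smaller batch...
per_step_mean = sum(mb.mean() for mb in micro_batches) / len(micro_batches)

# ...while summing and dividing by the global count is what passing
# num_items_in_batch down to the model's loss makes possible.
global_mean = sum(mb.sum() for mb in micro_batches) / num_items_in_batch
```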
Update bitnet.py: an extremely small change to allow for easier PEFT integration
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Add comment to offloading
* Update docs/source/en/kv_cache.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update modular and add examples
* style
* improve example comments
* style
* fix small logic issue for imports
* fix relative order issue when files do not make sense
* Improve comments
* trigger CIs