* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now initialized in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
* Add check for whether target_sizes is None in post_process_image_guided_detection
* Make sure Owlvit and Owlv2 are in sync
* Fix incorrect indentation; add check for correct size of target_sizes
* don't log base model architecture in wandb if log model is false
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* convert log model setting into an enum
* fix formatting
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
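
The log-model change above (skip logging the architecture when the setting is false, and model the setting as an enum) can be illustrated with a minimal sketch. The class name `LogModel`, its three values, and the helper `should_log_architecture` are assumptions made for this example, not the names used in `integration_utils.py`.

```python
from enum import Enum


class LogModel(str, Enum):
    """Hypothetical enum for a log-model setting (names are illustrative)."""

    FALSE = "false"
    CHECKPOINT = "checkpoint"
    END = "end"

    @property
    def is_enabled(self) -> bool:
        # Only the two "on" modes count as enabled.
        return self in (LogModel.CHECKPOINT, LogModel.END)

    @classmethod
    def _missing_(cls, value):
        # Unrecognised values fall back to FALSE instead of raising.
        return cls.FALSE


def should_log_architecture(setting: str) -> bool:
    """Return True only when model logging is switched on."""
    return LogModel(setting.lower()).is_enabled
```

An enum keeps the valid modes in one place, so callers compare against members rather than raw strings.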
* use currently active microphone on mac for ffmpeg_microphone
* Allow ffmpeg_microphone device to be specified
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Remove default templates again
* fix: default value reflects the runtime environment variables rather than the ones present at import time.
* Fix: Change `deterministic` to None by default; use env var if None
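
The two fixes above describe a common pattern: default the argument to None and read the environment variable at call time, not at import time. A minimal sketch follows, assuming a hypothetical variable name `EXAMPLE_DETERMINISTIC` and a stand-in function `attention_forward`; neither is taken from the library.

```python
import os

# Hypothetical variable name, used only for this illustration.
ENV_VAR = "EXAMPLE_DETERMINISTIC"


def attention_forward(deterministic=None):
    """Return the resolved flag; stands in for a real attention call."""
    if deterministic is None:
        # Read at call time so changes made after import are picked up.
        deterministic = os.environ.get(ENV_VAR, "0") == "1"
    return deterministic
```

Had the default been written as `deterministic=os.environ.get(...) == "1"` in the signature, it would be evaluated once when the function is defined and later changes to the environment would be ignored.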
* Updated ruff version and fixed the required code according to the latest version.
* Added noqa directive to ignore 1 error shown by ruff
* add DataCollatorBatchFlattening
* Update data_collator.py
* change name
* new FA2 flow if position_ids is provided
* add comments
* minor fix
* minor fix data collator
* add test cases for models
* add test case for data collator
* remove extra code
* formatting for ruff check and check_repo.py
* ruff format
ruff format tests src utils
* custom_init_isort.py
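
The batch-flattening idea in the commits above (concatenate examples into one sequence and supply position_ids so attention can tell them apart) can be sketched in plain Python. The function `flatten_batch` and its `separator_id` default of -100 are assumptions made for this illustration, not the collator's actual code.

```python
def flatten_batch(features, separator_id=-100):
    """Concatenate examples into one sequence without padding.

    position_ids restart at 0 for every example, which marks the
    boundaries between the packed sequences.
    """
    batch = {"input_ids": [], "labels": [], "position_ids": []}
    for example in features:
        ids = example["input_ids"]
        # Use the example's own labels if present, else the input ids.
        labels = example.get("labels", ids)
        batch["input_ids"] += ids
        # Mask the first token of each example so no loss crosses a boundary.
        batch["labels"] += [separator_id] + list(labels[1:])
        batch["position_ids"] += list(range(len(ids)))
    return batch
```

For two examples of lengths 3 and 2, the position_ids come out as 0, 1, 2, 0, 1: each reset to 0 marks the start of a new example.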
* feat(cache): StaticCache uses index_copy_ to avoid useless copy
Using index_copy_ makes the in-place update of the tensor explicit.
Some backends (XLA) will otherwise copy the tensor, making the code
slower and using more memory.
The proposed implementation uses less memory and, on XLA, triggers
fewer compilations. The change is generic and does not alter behavior
on the CUDA or CPU backends.
* feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy
Applying the same change done in StaticCache.
* fix(cache): fallback of index_copy_ when not implemented
* fix(cache): in index_copy_ ensure tensors are on same device
* [run slow] llama
* fix(cache): add move of cache_position to same device in SlidingWindowCache
* Revert "[run slow] llama"
This reverts commit 02608dd142.
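
The cache commits above (in-place index_copy_, a fallback where it is not implemented, and moving cache_position to the cache's device) can be sketched as follows. The helper `update_cache` and the tensor shapes are assumptions chosen for illustration; this is not the StaticCache code itself.

```python
import torch


def update_cache(cache, new_states, cache_position):
    """Write new_states into cache along the sequence axis (dim 2), in place."""
    # Keep the index tensor on the same device as the cache.
    cache_position = cache_position.to(cache.device)
    try:
        cache.index_copy_(2, cache_position, new_states)
    except NotImplementedError:
        # Fallback for backends that do not implement index_copy_.
        cache[:, :, cache_position] = new_states
    return cache


# Illustrative shapes: (batch, heads, max_seq_len, head_dim).
cache = torch.zeros(1, 2, 8, 4)
new_states = torch.ones(1, 2, 3, 4)
positions = torch.tensor([2, 3, 4])
buffer_address = cache.data_ptr()
update_cache(cache, new_states, positions)
```

The update writes into the existing buffer, so the cache keeps the same storage address before and after the call.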