Aymeric Roucher
489cbfd6d3
Add visit webpage tool ( #33353 )
...
* Add VisitWebpageTool
2024-09-09 10:32:42 +02:00
Wing Lian
62aecd85ff
schedulefree optimizers ( #30079 )
...
* schedulefree optimizers
* fix train instead of eval for optimizer
* fixes and update docs
* chore: lint
* add tests and drop overly-verbose _32bit suffix
* chore: lint
* fix for docs
* fix code review issues
* use duck-typing to avoid per-optimizer patches
* fixup style
* fixup style
* warn if incorrect accelerate version with schedule free
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2024-09-09 09:51:39 +02:00
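The duck-typing approach mentioned in the bullets above can be sketched as follows. This is a hedged illustration, not the Trainer's actual code: `set_optimizer_mode` is a hypothetical helper name, and the check simply looks for `train()`/`eval()` methods at runtime instead of patching each optimizer class.

```python
def set_optimizer_mode(optimizer, training):
    # Schedule-free optimizers expose train()/eval() methods that must be
    # called before training and evaluation steps respectively. Duck-typing
    # (checking for the method at runtime) avoids a per-optimizer patch:
    # plain optimizers without these methods are silently left alone.
    method = getattr(optimizer, "train" if training else "eval", None)
    if callable(method):
        method()
```

Plain optimizers pass through unchanged, so the same call site works for both kinds.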
Raushan Turganbay
60226fdc1d
Fix quantized cache tests ( #33351 )
...
* fix
* fix
* better fix
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-09-09 09:09:58 +02:00
Nicholas Broad
66bc4def95
add sdpa mbart ( #32033 )
...
* add sdpa mbart
useful for donut
* update sdpa docs
* formatting
* add self._use_sdpa in mbartencoder
* use self.config to check attn
* retrigger checks
* [run-slow] mbart
2024-09-06 17:31:24 -07:00
Daniel Lok
a70286f827
Update author for QLorA/PEFT community notebook ( #33338 )
...
update author
Signed-off-by: Daniel Lok <daniel.lok@databricks.com>
2024-09-06 22:50:26 +02:00
Matt
d7b04ea14d
Fix Prefill docs ( #33352 )
...
last -> final
2024-09-06 17:57:54 +01:00
Joao Gante
6ff6069fa7
RoPE: fix BC warning ( #33331 )
2024-09-06 16:15:11 +01:00
Arthur
2d757002fc
red-ci on main, fix copies ( #33356 )
...
* fix copies
* ???
2024-09-06 17:06:39 +02:00
Ita Zaporozhets
e48e5f1f13
Support reading tiktoken tokenizer.model file ( #31656 )
...
* use existing TikTokenConverter to read tiktoken tokenizer.model file
* del test file
* create tiktoken integration file
* adding tiktoken llama test
* ALTERNATIVE IMPLEMENTATION: supports llama 405B
* fix one char
* remove redundant line
* small fix
* rm unused import
* flag for converting from tiktoken
* remove unneeded file
* ruff
* remove llamatiktokenconverter, stick to general converter
* tiktoken support v2
* update test
* remove stale changes
* update doc
* protect import
* use is_protobuf_available
* add templateprocessor in tiktokenconverter
* reverting templateprocessor from tiktoken support
* update test
* add require_tiktoken
* dev-ci
* trigger build
* trigger build again
* dev-ci
* [build-ci-image] tiktoken
* dev-ci
* dev-ci
* dev-ci
* dev-ci
* change tiktoken file name
* feedback review
* feedback rev
* applying feedback, removing tiktoken converters
* conform test
* adding docs for review
* add doc file for review
* add doc file for review
* add doc file for review
* support loading model without config.json file
* Revert "support loading model without config.json file"
This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb.
* remove dev var
* updating docs
* safely import protobuf
* fix protobuf import error
* fix protobuf import error
* trying isort to fix ruff error
* fix ruff error
* try to fix ruff again
* try to fix ruff again
* try to fix ruff again
* doc table of contents
* add fix for consistency.dockerfile torchaudio
* ruff
* applying feedback
* minor typo
* merging with push-ci-image
* clean up imports
* revert dockerfile consistency
2024-09-06 14:24:02 +02:00
Shiyu
342e800086
support 3D attention mask in bert ( #32105 )
...
* support 3D/4D attention mask in bert
* test cases
* update doc
* fix doc
2024-09-06 14:20:48 +02:00
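The mask handling described above can be sketched in numpy (the actual model works on torch tensors; the function name and the large-negative-fill convention are assumptions based on how BERT-style models typically extend masks): a 2D `(batch, seq_len)` or 3D `(batch, from_seq, to_seq)` mask is broadcast to the 4D `(batch, num_heads_or_1, from_seq, to_seq)` shape attention expects, with masked positions pushed to a large negative value so they vanish under softmax.

```python
import numpy as np

def extend_attention_mask(attention_mask, dtype=np.float32):
    # 3D mask: per-example (from_seq, to_seq) pattern; add a head axis.
    if attention_mask.ndim == 3:
        extended = attention_mask[:, None, :, :]
    # 2D mask: per-token padding mask; add head and query axes.
    elif attention_mask.ndim == 2:
        extended = attention_mask[:, None, None, :]
    else:
        raise ValueError(f"Unsupported mask rank: {attention_mask.ndim}")
    extended = extended.astype(dtype)
    # 1 -> attend (additive 0), 0 -> mask (additive large negative).
    return (1.0 - extended) * np.finfo(dtype).min
```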
GeLee
2b18354106
add self.head_dim for VisionAttention in Qwen2-VL ( #33211 )
...
* add self.head_dim for VisionAttention in Qwen2-VL
* add self.head_dim for VisionAttention in Qwen2-VL
* fix ci
* black the test_modeling_qwen2_vl.py
* use ruff to format test_modeling_qwen2_vl.py
* [run-slow] qwen2_vl
* use typing for python3.8
* fix the import format
* use ruff to fix the ci error I001
* [run-slow] qwen2_vl
* remove unused import
* commit for rebase
* use ruff fix ci
* [run-slow] qwen2_vl
---------
Co-authored-by: root <liji>
2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi
3314fe1760
Add validation for maximum sequence length in modeling_whisper.py ( #33196 )
...
* Add validation for maximum sequence length in modeling_whisper.py
Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message.
This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness.
* Change exception message in src/transformers/models/whisper/modeling_whisper.py
The exception message is for whisper's label's sequence max length.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py
It's for whisper's config.max_target_positions.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change method's documentation in src/transformers/models/whisper/modeling_whisper.py
* Add test for maximum label's sequence length in test_modeling_whisper.py
* Add self to modeling_whisper.py
* Update test_modeling_whisper.py with respect to automatic validations
* Update modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Separate test_labels_sequence_max_length tests in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Remove assert from test_modeling_whisper.py
* Add max_target_positions to WhisperModelTester in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py
* Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py
* Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py
* Add new tests in test_modeling_whisper.py
* Update test_modeling_whisper.py
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-09-06 14:09:49 +02:00
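The check described in this entry can be sketched as follows. The helper name `validate_label_length` and the exact message are illustrative assumptions; the commit's real code lives in `modeling_whisper.py` and reads the limit from `config.max_target_positions` rather than hard-coding 448.

```python
def validate_label_length(labels, max_target_positions=448):
    # Whisper's decoder cannot attend beyond max_target_positions tokens,
    # so overly long label sequences are rejected up front instead of
    # failing with an opaque error mid-training.
    seq_len = len(labels)
    if seq_len > max_target_positions:
        raise ValueError(
            f"Labels' sequence length {seq_len} cannot exceed the maximum "
            f"allowed length of {max_target_positions} tokens."
        )
    return labels
```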
Ita Zaporozhets
363301f221
support loading model without config.json file ( #32356 )
...
* support loading model without config.json file
* fix condition
* update tests
* add test
* ruff
* ruff
* ruff
2024-09-06 13:49:47 +02:00
Xuehai Pan
e1c2b69c34
Load dynamic module (remote code) only once if code isn't changed ( #33162 )
...
* Load remote code only once
* Use hash as load indicator
* Add a new option `force_reload` for old behavior (i.e. always reload)
* Add test for dynamic module is cached
* Add more type annotations to improve code readability
* Address comments from code review
2024-09-06 12:49:35 +01:00
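The hash-as-load-indicator scheme from the bullets above can be sketched like this. It is a simplified illustration under assumptions: `load_dynamic_module` and the `loader` callable are hypothetical names, whereas the real implementation caches imported module files on disk.

```python
import hashlib

_module_cache = {}  # maps source-code hash -> loaded module object

def load_dynamic_module(source_code, loader, force_reload=False):
    # Hash the source so identical remote code is executed only once;
    # force_reload restores the old always-reload behavior.
    key = hashlib.sha256(source_code.encode("utf-8")).hexdigest()
    if force_reload or key not in _module_cache:
        _module_cache[key] = loader(source_code)
    return _module_cache[key]
```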
Shijie
1bd9d1c899
fix qwen2vl vision eager-attention ( #33213 )
...
* fix-qwen2vl-vision-eager-attention
* code-quality
* Update src/transformers/models/qwen2_vl/modeling_qwen2_vl.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* code-quality
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-06 13:42:17 +02:00
Sanchit Gandhi
51d15eb1c1
[whisper] alternative fix for long-form timestamps ( #32131 )
...
* [whisper] alternative fix for long-form timestamps
* update test
2024-09-06 12:57:08 +02:00
Joao Gante
2b789f27f3
Docs: add more cross-references to the KV cache docs ( #33323 )
...
* add more cross-references
* nit
* import guard
* more import guards
* nit
* Update src/transformers/generation/configuration_utils.py
2024-09-06 10:22:00 +01:00
Raushan Turganbay
1759bb9126
Fix: StaticCache & inputs_embeds ( #32932 )
...
squash commit
2024-09-06 12:56:59 +05:00
Daniel Lok
5792c459ed
Add a community notebook for fine-tuning with QLoRA, PEFT, and MLflow ( #33319 )
...
add notebook for finetuning with mlflow
Signed-off-by: Daniel Lok <daniel.lok@databricks.com>
2024-09-06 09:35:01 +02:00
Shijie
21fac7abba
simply align qwen2vl kv_seq_len calculation with qwen2 ( #33161 )
...
* qwen2vl_align_kv_seqlen_to_qwen2
* flash att test
* [run-slow] qwen2_vl
* [run-slow] qwen2_vl fix OOM
* [run-slow] qwen2_vl
* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* code quality
---------
Co-authored-by: baishuai.bs <1051314669@qq.com>
Co-authored-by: ShuaiBai623 <baishuai623@icloud.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2024-09-05 21:19:30 +05:00
Vladislav Bronzov
5d11de4a2f
Add Qwen2Moe GGUF loading support ( #33264 )
...
* update gguf doc, config and tensor mapping
* add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests
* apply code style fixes
* reformat files
* assign GGUFQwen2Converter to qwen2_moe
2024-09-05 17:42:03 +02:00
Michelle Habonneau
132e87500e
Update SECURITY.md ( #32680 )
...
updated reporting a vulnerability section
2024-09-05 16:41:01 +02:00
Joshua Lochner
c6d2848a23
🚨 Fix `torch.jit.trace` for `interpolate_pos_encoding` in all vision models ( #33226 )
...
* Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models
* Apply formatting
* Add missing `self.config = config`
* Fix copies
* Fix hiera interpolation unit test
* Formatting
* Update `_import_structure`
* make style
* Fix docstring
* Use `# Copied from` instead of utils
* DeiT variable renaming (`class_and_dist_pos_embed`)
* Fix Hiera `interpolate_pos_encoding`
2024-09-05 16:17:34 +02:00
Niklas Muennighoff
03164ba14e
Add paper link ( #33305 )
2024-09-05 15:49:28 +02:00
Younes Belkada
47b096412d
Fix: Fix FalconMamba training issues due to incompatible kernels ( #33195 )
...
* fix FM training kernels
* fix copies
* fix copies
* propagate to slow path
* make it BC
* add comment
* fix test
2024-09-05 11:55:08 +02:00
Raushan Turganbay
43df47d8e7
Llava Onevision: add model ( #32673 )
...
* working version
* fix copies
* update
* tests
* update docs
* codestyle
* add more tests
* add returns for docs
* clean up
* Update src/transformers/models/llava_onevision/processing_llava_onevision.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates
* codestyle
* style
* shouldn't be reversed
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* add pooling in videos
* [run-slow] llava_onevision
* num-logits-to-keep
* [run-slow] llava_onevision
* [run-slow] llava_onevision
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* video matched orig impl
* fix tests
* chat template was modified
* Update docs/source/en/model_doc/llava_onevision.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add morer info in the doc page
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-05 14:43:20 +05:00
Yoni Gozlan
9230d78e76
Add validate images and text inputs order util for processors and test_processing_utils ( #33285 )
...
* Add validate images and test processing utils
* Remove encoded text from possible inputs in tests
* Removed encoded inputs as valid in processing_utils
* change text input check to be recursive
* change text check to all element of lists and not just the first one in recursive checks
2024-09-04 13:50:31 -04:00
Matthew Douglas
b3909989d3
Fix excessive CPU memory usage with FSDP and cpu_ram_efficient_loading ( #33154 )
2024-09-04 18:37:54 +02:00
Yoach Lacombe
a1faf22f2c
[BUG] fix upper nltk version ( #33301 )
...
fix upper nltk version
2024-09-04 18:28:08 +02:00
Aymeric Roucher
cfd92c64f5
Add new documentation page for advanced agent usage ( #33265 )
...
* Add new documentation page for advanced agent usage
2024-09-04 18:19:54 +02:00
Matt
01c8c6c419
Add a warning to the chat template docs about the tool_calls format ( #33277 )
...
* Add a warning to the chat template docs
* Add a warning to the chat template docs
* Add a warning to the chat template docs
2024-09-04 17:13:34 +01:00
Aymeric Roucher
2cb543db77
Multi agents with manager ( #32687 )
...
* Add Multi agents with a hierarchical system
2024-09-04 17:30:54 +02:00
amyeroberts
d2dcff96f8
[InstructBLIP] qformer_tokenizer is required input ( #33222 )
...
* [InstructBLIP] qformer_tokenizer is required input
* Bit safer
* Add to instructblipvideo processor
* Fix up
* Use video inputs
* Update tests/models/instructblipvideo/test_processor_instructblipvideo.py
2024-09-04 16:18:06 +01:00
dependabot[bot]
5731dc8dd8
Bump cryptography from 42.0.0 to 43.0.1 in /examples/research_projects/decision_transformer ( #33286 )
...
Bump cryptography in /examples/research_projects/decision_transformer
Bumps [cryptography](https://github.com/pyca/cryptography ) from 42.0.0 to 43.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst )
- [Commits](https://github.com/pyca/cryptography/compare/42.0.0...43.0.1 )
---
updated-dependencies:
- dependency-name: cryptography
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-04 17:13:18 +02:00
Alex Sherstinsky
122ded0a11
Bugfix/alexsherstinsky/fix none check for attention factor in rope scaling 2024 08 28 0 ( #33188 )
...
* Fixing a bug in the way "attention_factor" is validated in ROPE utilities.
* Fixing a bug in the way "attention_factor" is validated in ROPE utilities.
* Fixing a bug in the way "attention_factor" is validated in ROPE utilities.
2024-09-04 17:01:12 +02:00
Yih-Dar
178cb6bb1c
wait 15m before SSH into runner workflow stops ( #33300 )
...
15m
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-09-04 16:20:56 +02:00
laurentd-lunit
d703477265
[fix] LlavaNextProcessor '_get_unpadded_features' method ( #33263 )
...
* [fix] LlavaNextProcessor '_get_unpadded_features' method
* [tests] add test_image_token_filling
* [chore] style + comment
* [minor] improve readability
* [chore] run make fix-copies
2024-09-04 17:41:51 +05:00
Joao Gante
d750b509fc
Config: unified logic to retrieve text config ( #33219 )
2024-09-04 12:03:30 +01:00
Raushan Turganbay
ebbe8d8014
Cache docs: update ( #32929 )
...
* some changes
* more updates
* fix cache copy
* nits
* nits
* add tests
2024-09-04 15:05:31 +05:00
Raushan Turganbay
35f72ebf47
Fix: multigpu training ( #33271 )
...
fix
2024-09-04 15:01:08 +05:00
Niklas Muennighoff
ecd61c6286
Add OLMoE ( #32406 )
...
* Add OLMoE
* Add OLMoE
* Updates
* Make norm optional; add keys
* Add output
* Add
* Fix dtype
* Fix eos config
* Update
* Add OLMoE
* Fix OLMoE path
* Format
* Format
* Rmv copy statement
* Rmv copy statement
* Format
* Add copies
* Cp rotary
* Fix aming
* Fix naming
* Update RoPE integration; num_logits_to_keep; Add copy statements
* Add eps to config
* Format
* Add aux loss
* Adapt router_aux_loss_coef
* Update md
* Adapt
* adapt tests
2024-09-03 18:43:12 +02:00
Joao Gante
d6534f996b
Repo checks: check documented methods exist ( #32320 )
2024-09-03 17:40:27 +01:00
Arthur
979d24e7fd
fix the parallel number of CI nodes when it is smaller than number of tests ( #33276 )
...
* fix the parallel number
* this?
* keep it simple
* woups
* nit
* style
* fix param name
* fix
* fix dtype
* yups
* ???
* ??
* this?
* ????
* no default flow style
* ??
* print config
* ????
* there we go!
* documentation
* update
* remove unwanted file
2024-09-03 16:53:21 +02:00
Zach Mueller
6b7d64ac1c
Only disallow DeepSpeed Zero-3 for auto bs finder ( #31731 )
...
* Only disallow DeepSpeed
* Clean
* DeepSpeed!
* Add a test for deepspeed
2024-09-03 09:16:28 -04:00
Omar Salman
03c12d0d63
Add sdpa support for Albert ( #32092 )
...
* Add sdpa support for Albert
* [run_slow] albert
* Add benchmarks and PR suggestion
* Fix quality
* Fix
* [run_slow] albert
2024-09-03 14:01:00 +01:00
dependabot[bot]
e969d884a6
Bump opencv-python from 4.4.0.42 to 4.8.1.78 in /examples/research_projects/visual_bert ( #33251 )
...
Bump opencv-python in /examples/research_projects/visual_bert
Bumps [opencv-python](https://github.com/opencv/opencv-python ) from 4.4.0.42 to 4.8.1.78.
- [Release notes](https://github.com/opencv/opencv-python/releases )
- [Commits](https://github.com/opencv/opencv-python/commits )
---
updated-dependencies:
- dependency-name: opencv-python
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-03 14:32:23 +02:00
Matt
0d86727354
Update chat template docs to remove Blenderbot ( #33254 )
...
* Update docs to remove obsolete Blenderbot
* Remove another reference to Blenderbot
2024-09-03 12:18:04 +01:00
Isotr0py
edeca4387c
🚨 Support dequantization for most GGML types ( #32625 )
...
* use gguf internal dequantize
* add Q5_0 test
* add iq1 test
* add remained test
* remove duplicated test
* update docs
* add gguf version limit
* make style
* update gguf import catch
* revert vocab_size patch
* make style
* use GGUF_MIN_VERSION everywhere
2024-09-03 12:58:14 +02:00
Yoach Lacombe
979f4774f6
Fix Bark saving ( #33266 )
2024-09-03 10:57:59 +02:00
Raushan Turganbay
7ed9789e21
Fix: `num_logits_to_keep` in composite models ( #33168 )
...
* fix
* paligemma
2024-09-03 13:48:45 +05:00