Commit Graph

541 Commits

Author SHA1 Message Date
Kyle Mylonakis
3542e0b844
build: 📌 Remove upper bound on PyTorch (#38789)
build: 📌 remove upper bound on torch dependency, as the fix for the issue that originally caused the pin has been released in torch 2.7.1
2025-06-12 16:34:13 +02:00
Sai-Suraj-27
88912b8e95
Remove isort from dependencies (#38616)
Removed isort as a dependency
2025-06-05 16:42:49 +00:00
Yih-Dar
8c59cdb3f8
pin pandas (#38605)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 11:33:06 +02:00
Arthur
211f2b0875
Add CB (#38085)
* stash for now

* initial commit

* small updated

* up

* up

* works!

* nits and fixes

* don't loop too much

* finish working example

* update

* fix the small freeblocks issue

* feat: stream inputs to continuous batch

* fix: update attn from `eager` to `sdpa`

* refactor: fmt

* refactor: cleanup unnecessary code

* feat: add `update` fn to `PagedAttentionCache`

* feat: broken optimal block size computation

* fix: debugging invalid cache logic

* fix: attention mask

* refactor: use custom prompts for example

* feat: add streaming output

* fix: prefill split

refactor: add doc strings and unsound/redundant logic
fix: compute optimal blocks logic

* fix: send decoded tokens when `prefilling_split` -> `decoding`

* refactor: move logic to appropriate parent class

* fix: remove truncation as we split prefilling anyways

refactor: early return when we have enough selected requests

* feat: add paged attention forward

* push Ggraoh>

* add paged sdpa

* update

* better mps defaults

* feat: add progress bar for `generate_batch`

* feat: add opentelemetry metrics (ttft + batch fill %age)

* feat: add tracing

* Add cuda graphs (#38059)

* draft cudagraphs addition

* nits

* styling

* update

* fix

* kinda draft of what it should look like

* fixes

* lol

* not sure why inf everywhere

* can generate but output is shit

* some fixes

* we should have a single device synch

* broken outputs but it does run

* refactor

* updates

* updates with some fixes

* fix mask causality

* another commit that casts after

* add error

* simplify example

* update

* updates

* revert llama changes

* fix merge conflicts

* fix: tracing and metrics

* my updates

* update script default values

* fix block allocation issue

* fix prefill split attention mask

* no bugs

* add paged eager

* fix

* update

* style

* feat: add pytorch traces

* fix

* fix

* refactor: remove pytorch profiler data

* style

* nits

* cleanup

* draft test file

* fix

* fix

* fix paged and graphs

* small renamings

* cleanups and push

* refactor: move tracing and metrics logic to utils

* refactor: trace more blocks of code

* nits

* nits

* update

* to profile or not to profile

* refactor: create new output object

* causal by default

* cleanup but generations are still off for IDK what reason

* simplifications but not running still

* this does work.

* small quality of life updates

* nits

* update

* fix the scheduler

* fix warning

* ol

* fully fixed

* nits

* different generation parameters

* nice

* just style

* feat: add cache memory usage

* feat: add kv cache free memory

* feat: add active/waiting count & req latency

* do the sampling

* fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager

* fix on mps

* feat: add dashboard & histogram buckets

* perf: improve waiting reqs data structures

* attempt to compile, but we should only do it on mps AFAIK

* feat: decouple scheduling logic

* just a draft

* cleanup and fixup

* optional

* style

* update

* update

* remove the draft documentation

* fix import as well

* update

* fix the test

* style doomed

---------

Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
2025-05-22 17:43:48 +02:00
Arthur
7b7bb8df97
Protect ParallelInterface (#38262)
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-21 17:45:38 +02:00
Lysandre Debut
711d78d104
Revert parallelism temporarily (#38240)
* Revert "Protect ParallelInterface"

This reverts commit cb513e35f9.

* Revert "parallelism goes brrr (#37877)"

This reverts commit 1c2f36b480.

* Empty commit
2025-05-20 22:43:04 +02:00
Lysandre
cb513e35f9 Protect ParallelInterface 2025-05-20 18:27:50 +02:00
Lysandre
f4ef41c45e v4.53.0.dev0 2025-05-20 18:12:56 +02:00
Yufeng Xu
1c65aef923
Fix incorrect installation instructions (for issue #37476) (#37640)
* debugging issue 36758

* debugging issue 36758

* debugging issue 36758

* updated attn_mask type specification in _flash_attention_forward

* removed pdb

* added a blank line

* removed indentation

* update constants

* remove unnecessary files

* created installation script, modified README

* modified requirements and install.sh

* undo irrelevant changes

* removed blank line

* fixing installation guide

* modified README, python requirements, and install script

* removed tests_otuput

* modified README

* discarded installation script and python<3.13 requirement
2025-05-08 16:32:58 +01:00
youngrok cha
acded47fe7
[llava] one pixel is missing from padding when length is odd (#37819)
* [fix] one pixel should be added when length is odd

* [fix] add vision_aspect_ratio args & typo

* [fix] style

* [fix] do not fix fast file directly

* [fix] convert using modular

* remove duplicate codes

* match unpad logic with pad logic

* test odd-sized images for llava & aria

* test unpad odd-sized padding for llava family

* fix style

* add kwarg to onevision modular

* move vision_aspect_ratio from image_processor to processor
(llava_onevision)
2025-05-06 13:11:26 +02:00
Lysandre Debut
d538293f62
Transformers cli clean command (#37657)
* transformers-cli -> transformers

* Chat command works with positional argument

* update doc references to transformers-cli

* doc headers

* deepspeed

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
2025-04-30 12:15:43 +01:00
Yih-Dar
50d231a806
unpin pytest<8 (#37768)
* pytest 8

* pytest 8

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-25 12:34:33 +02:00
Joao Gante
af6d2756d9
[deps] pin max torch version (#37760)
pin max pt version :(
2025-04-24 16:18:25 +01:00
Joao Gante
be9b0e8521
[CI] add back sacrebleu (and document why) (#37700)
* example test

* add back dep

* dev-ci

* dev-ci
2025-04-23 14:45:00 +01:00
Joao Gante
0f8c34b0a0
[cleanup] remove old scripts in /scripts 🧹 🧹 (#37676)
* rm old files

* not this one
2025-04-22 16:59:03 +01:00
cyyever
1ef64710d2
Remove fsspec dependency which isn't directly used by transformers (#37318)
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-14 12:02:28 +02:00
Joao Gante
aaf129cdae
[agents] remove agents 🧹 (#37368) 2025-04-11 18:42:37 +01:00
Arthur
7e9b57ce62
Update-kernel-pin (#37448)
* update `kernels`

* oups

* new pinned version
2025-04-11 11:19:21 +02:00
Arthur
a2c2fb0108
update kernels to 0.4.3 (#37419)
* update `kernels`

* oups
2025-04-10 12:14:22 +02:00
Yih-Dar
e7ad077012
byebye torch 2.0 (#37277)
* bump Torch 2.1 with broken compatibility `torch.compile`

* dep table

* remove usage of is_torch_greater_or_equal_than_2_1

* remove usage of is_torch_greater_or_equal_than_2_1

* remove if is_torch_greater_or_equal("2.1.0")

* remove torch >= "2.1.0"

* deal with 2.0.0

* PyTorch 2.0+ --> PyTorch 2.1+

* ruff 1

* difficult ruff

* address comment

* address comment

---------

Co-authored-by: Jirka B <j.borovec+github@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-07 15:19:47 +02:00
Lysandre
d1b92369ca v4.52.0.dev0 2025-04-05 22:04:21 +02:00
Lysandre Debut
aa40fda346
Hf Xet extra (#37305)
* Hf Xet extra

* Hf Xet extra
2025-04-05 21:06:05 +02:00
Lysandre Debut
3d40bda30e
Hugging Face Hub pin to v0.30.0 for Xet (#37166) 2025-04-04 14:58:22 +02:00
Nikos Antoniou
f74d7da836
Introduce modular files for speech models (#35902)
* WAV_2_VEC_2 to WAV2VEC2

* added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio

* remove unnecessary definitions in modulars

* added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer

* docstring fix for UniSpeechForCTC

* removed unnecessary re-definition of modular classes

* reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput

* top-level import of deepspeed in seamless_m4t, speecht5

* avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations

* convert modular

* tiny modular typing fixes

* some more modular fixes

* make style

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-04-04 11:46:27 +02:00
cyyever
7613cf1a45
Add py.typed (#37022) 2025-04-02 14:17:27 +01:00
Yih-Dar
b7fc2daf8b
Kenlm (#37091)
* kenlm

* kenlm

* kenlm

* kenlm

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 21:42:54 +01:00
Yih-Dar
c6814b4ee8
Update ruff to 0.11.2 (#36962)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
Arthur Zucker
4542b8fb27 push v4.51.0.dev0 2025-03-21 13:45:25 +01:00
Daniël de Kok
f94b0c59f2
Use deformable_detr kernel from the Hub (#36853)
* Use `deformable_detr` kernel from the Hub

Remove the `deformable_detr` kernel from `kernels/` and use the
pre-built kernel from the Hub instead.

* Add license header

* Add `kernels` as an extra `hub-kernels`

Also add it to `testing`, so that the kernel replacement gets tested
when using CUDA in CI.
2025-03-21 13:08:47 +01:00
Joao Gante
949cca4061
[CI] doc builder without custom image (#36862)
* no image

* test

* revert jax version updates

* make fixup

* update autodoc path for model_addition_debugger

* shieldgemma2

* add missing pages to toctree
2025-03-21 09:10:27 +00:00
Marc Sun
388e6659bf
Update min safetensors bis (#36823)
* update setup.py

* style
2025-03-20 12:50:07 +01:00
rasmi
f0d5b2ff04
Update deprecated Jax calls (#35919)
* Remove deprecated arguments for jax.numpy.clip.

* Remove deprecated arguments for jax.numpy.clip.

* Update jax version to 0.4.27 to 0.4.38.

* Avoid use of deprecated xla_bridge.get_backend().platform

Co-authored-by: Jake Vanderplas <jakevdp@google.com>

---------

Co-authored-by: Jake Vanderplas <jakevdp@google.com>
2025-03-20 11:51:51 +01:00
Joao Gante
a3201cea14
[CI] Automatic rerun of certain test failures (#36694) 2025-03-13 15:40:23 +00:00
Ilyas Moutawwakil
89f6956015
HPU support (#36424)
* test

* fix

* fix

* skip some and run some first

* test fsdp

* fix

* patches for generate

* test distributed

* copy

* don't test distributed loss for hpu

* require fp16 and run first

* changes from marc's PR fixing zero3

* better alternative

* return True when fp16 support on gaudi without creating bridge

* fix

* fix tested dtype in deepspeed inference test

* test

* fix

* test

* fix

* skip

* require fp16

* run first fsdp

* Apply suggestions from code review

* address comments

* address comments and refactor test

* reduce precision

* avoid doing gaudi1 specific stuff in the generation loop

* document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
2025-03-12 09:08:12 +01:00
Joao Gante
27d1707586
[smolvlm] make CI green (#36306)
* add smolvlm to toctree

* add requirements

* dev-ci

* no docker changes

* dev-ci

* update torch-light.dockerfile

* derp

* dev-ci
2025-02-20 18:56:11 +01:00
Joao Gante
99adc74462
[tests] remove flax-pt equivalence and cross tests (#36283) 2025-02-19 15:13:27 +00:00
Joao Gante
0863eef248
[tests] remove pt_tf equivalence tests (#36253) 2025-02-19 11:55:11 +00:00
Arthur Zucker
c877c9fa5b v4.45.0-dev0 2025-02-17 15:21:20 +01:00
Lucain
e60ae0d078
Replace deprecated update_repo_visibility (#35970) 2025-02-13 11:27:55 +01:00
CalOmnie
f4f33a20a2
Remove pyav pin to allow python 3.11 to be used (#35823)
* Remove pyav pin to allow python 3.11 to be used

* Run make fixup

---------

Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-21 20:16:18 +00:00
Arthur Zucker
f63829c87b v4.49.0-dev 2025-01-10 12:31:11 +01:00
Yichen Yan
1a6c1d3a9a
Bump torch requirement to >= 2 (#35479)
Bump torch requirement, follow-up of #35358
2025-01-08 15:59:32 +01:00
nhamanasu
34ad1bd287
update codecarbon (#35243)
* update codecarbon

* replace directly-specified-test-dirs with tmp_dir

* Revert "replace directly-specified-test-dirs with tmp_dir"

This reverts commit 310a6d962e.

* revert the change of .gitignore

* Update .gitignore

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-12-20 15:04:36 +01:00
Sigbjørn Skjæret
eafbb0eca7
Implement AsyncTextIteratorStreamer for asynchronous streaming (#34931)
* Add AsyncTextIteratorStreamer class

* export AsyncTextIteratorStreamer

* export AsyncTextIteratorStreamer

* improve docs

* missing import

* missing import

* doc example fix

* doc example output fix

* add pytest-asyncio

* first attempt at tests

* missing import

* add pytest-asyncio

* fallback to wait_for and raise TimeoutError on timeout

* check for TimeoutError

* autodoc

* reorder imports

* fix style

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-20 12:08:12 +01:00
Matt
e0ae9b5974
🚨🚨🚨 Delete conversion scripts when making release wheels (#35296)
* Delete conversion scripts when making release wheels

* make fixup

* Update docstring
2024-12-17 14:18:42 +00:00
Lysandre
66ab300aaf Dev version 2024-12-05 19:12:22 +01:00
Arthur
93f87d3cf5
[tokenizers] bump to 0.21 (#34972)
bump to 0.21
2024-12-05 15:46:02 +01:00
Pavel Iakubovskii
737f4dc4b6
Update timm version (#35005)
* Bump timm

* dev-ci
2024-11-29 12:46:59 +00:00
Joao Gante
13493215ab
🧼 remove v4.44 deprecations (#34245)
* remove v4.44 deprecations

* PR comments

* deprecations scheduled for v4.50

* hub version update

* make fixup

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-15 23:07:24 +01:00
Arthur Zucker
9643069465 v4.47.0.dev0 2024-10-24 11:23:29 +02:00