transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-03 12:50:06 +06:00

History

Arthur 211f2b0875 Add CB (#38085). Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>		2025-05-22 17:43:48 +02:00
..
bettertransformer	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
deepspeed	🚨 rm already deprecated pad_to_max_length arg (#37617)	2025-05-01 15:21:55 +02:00
extended	Add Optional to remaining types (#37808)	2025-04-28 14:20:45 +01:00
fixtures	Implementation of SuperPoint and AutoModelForKeypointDetection (#28966)	2024-03-19 14:43:02 +00:00
fsdp	Fix the fsdp config cannot work issue. (#37549)	2025-04-28 10:44:51 +02:00
generation	Add CB (#38085)	2025-05-22 17:43:48 +02:00
models	🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models (#38108)	2025-05-22 17:12:58 +02:00
optimization	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
peft_integration	FIX: Faulty PEFT tests (#37757)	2025-04-28 15:10:46 +02:00
pipelines	[whisper] move processor test into processor test file 🧹 (#38266)	2025-05-22 10:07:11 +01:00
quantization	Add tearDown method to Quark to solve OOM issues (#38234)	2025-05-21 14:26:44 +02:00
repo_utils	Simplify soft dependencies and update the dummy-creation process (#36827)	2025-04-11 11:08:36 +02:00
sagemaker	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
tensor_parallel	enable misc cases on XPU & use device agnostic APIs for cases in tests (#38192)	2025-05-20 10:09:01 +02:00
tokenization	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
trainer	enable misc cases on XPU & use device agnostic APIs for cases in tests (#38192)	2025-05-20 10:09:01 +02:00
utils	🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866)	2025-05-22 11:38:26 +02:00
__init__.py	GPU text generation: mMoved the encoded_prompt to correct device	2020-01-06 15:11:12 +01:00
test_backbone_common.py	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
test_configuration_common.py	Update composition flag usage (#36263)	2025-04-09 11:48:49 +02:00
test_feature_extraction_common.py	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
test_image_processing_common.py	fix multi-image case for llava-onevision (#38084)	2025-05-21 11:50:46 +02:00
test_image_transforms.py	Fix `pad` image transform for batched inputs (#37544)	2025-05-08 10:51:15 +01:00
test_modeling_common.py	🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models (#38108)	2025-05-22 17:12:58 +02:00
test_modeling_flax_common.py	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
test_modeling_tf_common.py	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
test_pipeline_mixin.py	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
test_processing_common.py	🔴 Video processors as a separate class (#35206)	2025-05-12 11:55:51 +02:00
test_sequence_feature_extraction_common.py	Use Python 3.9 syntax in tests (#37343)	2025-04-08 14:12:08 +02:00
test_tokenization_common.py	🚨 rm already deprecated pad_to_max_length arg (#37617)	2025-05-01 15:21:55 +02:00
test_training_args.py	Fix `TrainingArguments.torch_empty_cache_steps` post_init check (#36734)	2025-03-17 16:09:46 +01:00
test_video_processing_common.py	🔴 Video processors as a separate class (#35206)	2025-05-12 11:55:51 +02:00