mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-04 21:30:07 +06:00

* stash for now
* initial commit
* small updated
* up
* up
* works!
* nits and fixes
* don't loop too much
* finish working example
* update
* fix the small freeblocks issue
* feat: stream inputs to continuous batch
* fix: update attn from `eager` to `sdpa`
* refactor: fmt
* refactor: cleanup unnecessary code
* feat: add `update` fn to `PagedAttentionCache`
* feat: broken optimal block size computation
* fix: debugging invalid cache logic
* fix: attention mask
* refactor: use custom prompts for example
* feat: add streaming output
* fix: prefill split; refactor: add doc strings and unsound/redundant logic; fix: compute optimal blocks logic
* fix: send decoded tokens when `prefilling_split` -> `decoding`
* refactor: move logic to appropriate parent class
* fix: remove truncation as we split prefilling anyways; refactor: early return when we have enough selected requests
* feat: add paged attention forward
* push Ggraoh>
* add paged sdpa
* update
* btter mps defaults
* feat: add progress bar for `generate_batch`
* feat: add opentelemetry metrics (ttft + batch fill %age)
* feat: add tracing
* Add cuda graphs (#38059)
  * draft cudagraphs addition
  * nits
  * styling
  * update
  * fix
  * kinda draft of what it should look like
  * fixes
  * lol
  * not sure why inf everywhere
  * can generate but output is shit
  * some fixes
  * we should have a single device synch
  * broken outputs but it does run
  * refactor
  * updates
  * updates with some fixes
  * fix mask causality
  * another commit that casts after
  * add error
  * simplify example
  * update
  * updates
  * revert llama changes
  * fix merge conflicts
* fix: tracing and metrics
* my updates
* update script default values
* fix block allocation issue
* fix prefill split attnetion mask
* no bugs
* add paged eager
* fix
* update
* style
* feat: add pytorch traces
* fix
* fix
* refactor: remove pytorch profiler data
* style
* nits
* cleanup
* draft test file
* fix
* fix
* fix paged and graphs
* small renamings
* cleanups and push
* refactor: move tracing and metrics logic to utils
* refactor: trace more blocks of code
* nits
* nits
* update
* to profile or not to profile
* refactor: create new output object
* causal by default
* cleanup but generations are still off for IDK what reason
* simplifications but not running still
* this does work.
* small quality of life updates
* nits
* updaet
* fix the scheduler
* fix warning
* ol
* fully fixed
* nits
* different generation parameters
* nice
* just style
* feat: add cache memory usage
* feat: add kv cache free memory
* feat: add active/waiting count & req latency
* do the sampling
* fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager
* fix on mps
* feat: add dashboard & histogram buckets
* perf: improve waiting reqs data structures
* attempt to compile, but we should only do it on mps AFAIK
* feat: decouple scheduling logic
* just a draft
* c;eanup and fixup
* optional
* style
* update
* update
* remove the draft documentation
* fix import as well
* update
* fix the test
* style doomed

---------

Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
49 lines
1.5 KiB
Python
# Example usage of the traced and attach_tracer decorators

from transformers.utils.metrics import attach_tracer, traced


@attach_tracer()
class ExampleClass:
    def __init__(self, name):
        # The attach_tracer decorator has already created self.tracer for us
        self.name = name

    @traced  # This method will use the tracer from the class instance
    def process_data(self, data):
        # This method is traced and can use self.tracer
        return f"Processed {data} with {self.name}"

    @traced(span_name="custom_operation")  # With custom span name
    def special_operation(self, value):
        # Also traced, with a custom span name
        return value * 2

    @traced(
        additional_attributes=[
            ("name", "object.name", lambda x: x.upper()),  # Using a transform function
            ("name", "object.fixed_value", "static_value"),  # Using a fixed value
        ]
    )
    def operation_with_attributes(self):
        # This will add the specified attributes to the span
        return "Operation completed"


# For functions without a class, the traced decorator still works
@traced
def standalone_function(arg1, arg2):
    # For functions, a tracer is created based on the module name
    return arg1 + arg2


# Usage:
if __name__ == "__main__":
    # With OpenTelemetry configured, these will produce traces
    example = ExampleClass("test_object")
    example.process_data("sample")
    example.special_operation(42)
    example.operation_with_attributes()

    result = standalone_function(1, 2)
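To make the decorator mechanics above concrete without requiring an OpenTelemetry backend, here is a minimal, self-contained sketch of how decorators with these semantics can be built in plain Python. This is an illustrative stand-in only, not the `transformers.utils.metrics` implementation: the stub tracer simply records span names in a list, and the `Worker` class and all names below are hypothetical.

```python
# Illustrative sketch: plain-Python decorators mimicking the attach_tracer /
# traced pattern, with a stub tracer instead of a real OpenTelemetry exporter.
import functools


class StubTracer:
    """Stand-in tracer that records span names instead of exporting them."""

    def __init__(self):
        self.spans = []

    def start_span(self, name):
        self.spans.append(name)


def attach_tracer():
    """Class decorator: give every instance a .tracer attribute at init time."""

    def wrap(cls):
        original_init = cls.__init__

        @functools.wraps(original_init)
        def __init__(self, *args, **kwargs):
            self.tracer = StubTracer()  # created before the user's __init__ runs
            original_init(self, *args, **kwargs)

        cls.__init__ = __init__
        return cls

    return wrap


def traced(func=None, *, span_name=None):
    """Method decorator: record a span around each call.

    Usable both bare (@traced) and with arguments (@traced(span_name=...)).
    """

    def decorate(f):
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            # Use the tracer that attach_tracer installed on the instance
            self.tracer.start_span(span_name or f.__qualname__)
            return f(self, *args, **kwargs)

        return wrapper

    # @traced without parentheses passes the function directly
    return decorate(func) if func is not None else decorate


@attach_tracer()
class Worker:
    def __init__(self, name):
        self.name = name

    @traced
    def process(self, data):
        return f"{self.name}:{data}"

    @traced(span_name="double")
    def double(self, x):
        return x * 2
```

The two-mode `traced` signature (`func=None, *, span_name=None`) is what lets the real decorators be applied both bare and parameterized, as seen on `process_data` versus `special_operation` above.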