What is this?
This benchmark compares vanilla LLM calls vs UAMP protocol-wrapped calls to measure the overhead of the agent message protocol. Both use the same underlying WebLLM engine.
UAMP wraps LLM calls in typed events: ChatRequest → ChatResponse with serialization, routing, and streaming support.
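To make the wrapping concrete, here is a minimal sketch of a request/response event pair and a JSON round trip. The field names (`id`, `messages`, `requestId`, `text`), the helper names, and the use of plain JSON as the wire format are assumptions for illustration; the actual UAMP schema and serializer may differ, and routing and streaming are left out.

```typescript
// Hypothetical event shapes; the real UAMP schema may differ.
interface ChatRequest {
  type: "ChatRequest";
  id: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

interface ChatResponse {
  type: "ChatResponse";
  requestId: string;
  text: string;
}

type UampEvent = ChatRequest | ChatResponse;

// Assumption: events travel as plain JSON strings.
function serializeEvent(event: UampEvent): string {
  return JSON.stringify(event);
}

function deserializeEvent(wire: string): UampEvent {
  const parsed = JSON.parse(wire) as { type?: unknown };
  if (parsed.type !== "ChatRequest" && parsed.type !== "ChatResponse") {
    throw new Error("unknown event type");
  }
  return parsed as UampEvent;
}

// Agent side: unwrap the request, call a stand-in LLM, wrap the reply.
function handleRequest(wire: string, llm: (prompt: string) => string): string {
  const event = deserializeEvent(wire);
  if (event.type !== "ChatRequest") {
    throw new Error("expected ChatRequest");
  }
  const last = event.messages[event.messages.length - 1];
  const response: ChatResponse = {
    type: "ChatResponse",
    requestId: event.id,
    text: llm(last ? last.content : ""),
  };
  return serializeEvent(response);
}
```

Everything in this sketch other than the `llm` call is the kind of work the benchmark counts as protocol overhead.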
Fair comparison: We measure the time to generate the first N tokens common to both runs, which avoids variance from different output lengths.
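One way to read this rule, sketched below, is to take N as the number of tokens both runs produced and compare the time each run needed to reach token N. The `StreamedRun` shape and the `compareAtCommonLength` helper are hypothetical, and treating "common" as the shorter of the two token counts is an assumption about how the benchmark picks N.

```typescript
// Hypothetical record of one streamed run:
// arrival time of each token, in ms since the request started.
interface StreamedRun {
  tokenTimesMs: number[];
}

interface CommonLengthResult {
  n: number;          // number of tokens compared
  vanillaMs: number;  // time for the vanilla run to reach token n
  uampMs: number;     // time for the UAMP run to reach token n
  overheadMs: number; // uampMs - vanillaMs
}

// Compare both runs at the same token count so that a longer
// output in one run does not skew the timing.
function compareAtCommonLength(vanilla: StreamedRun, uamp: StreamedRun): CommonLengthResult {
  const n = Math.min(vanilla.tokenTimesMs.length, uamp.tokenTimesMs.length);
  if (n === 0) {
    return { n: 0, vanillaMs: 0, uampMs: 0, overheadMs: 0 };
  }
  const vanillaMs = vanilla.tokenTimesMs[n - 1] as number;
  const uampMs = uamp.tokenTimesMs[n - 1] as number;
  return { n, vanillaMs, uampMs, overheadMs: uampMs - vanillaMs };
}
```

For example, if the vanilla run streams four tokens and the UAMP run streams three, only the first three tokens of each run are compared.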
UAMP Event Flow (measured):
┌──────────────────────────────────────────────────────────────────┐
│  Client                                                          │
│    └── create ChatRequest event (t0)                             │
│         ↓                                                        │
│  Agent                                                           │
│    ├── receive + deserialize event                               │
│    ├── extract message, call LLM                                 │
│    │   └── [LLM INFERENCE TIME] ←── this is what we compare      │
│    ├── wrap response in ChatResponse event                       │
│    └── serialize + send                                          │
│                                                                  │
│  Client                                                          │
│    ├── receive + deserialize event (t1)                          │
│    └── extract response text                                     │
│                                                                  │
│  Overhead = (t1 - t0) - [LLM INFERENCE TIME]                     │
└──────────────────────────────────────────────────────────────────┘
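The flow above can be sketched as a single timed round trip. This is an illustration under stated assumptions, not the benchmark's actual harness: `measureOverhead`, the injected `Clock`, the synchronous stand-in `llm`, and the simplified JSON event bodies are all hypothetical. The clock is passed in so the function can run against a real timer or a fake one.

```typescript
// Injected time source, in milliseconds (e.g. () => performance.now()).
type Clock = () => number;

interface OverheadResult {
  text: string;
  totalMs: number;        // t1 - t0
  llmInferenceMs: number; // time spent inside the LLM call only
  overheadMs: number;     // (t1 - t0) - llmInferenceMs
}

function measureOverhead(now: Clock, llm: (prompt: string) => string, prompt: string): OverheadResult {
  // Client: create and serialize the request event (t0).
  const t0 = now();
  const requestWire = JSON.stringify({ type: "ChatRequest", prompt });

  // Agent: deserialize, extract the message, call the LLM.
  const request = JSON.parse(requestWire) as { prompt: string };
  const llmStart = now();
  const reply = llm(request.prompt);
  const llmInferenceMs = now() - llmStart;

  // Agent: wrap the reply in a response event and serialize it.
  const responseWire = JSON.stringify({ type: "ChatResponse", text: reply });

  // Client: deserialize and extract the response text (t1).
  const response = JSON.parse(responseWire) as { text: string };
  const t1 = now();

  const totalMs = t1 - t0;
  return { text: response.text, totalMs, llmInferenceMs, overheadMs: totalMs - llmInferenceMs };
}
```

Subtracting the inference time isolates what the protocol itself costs: event creation, serialization, and deserialization on both sides.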