Author here. Got annoyed by Python startup times, frameworks, orchestration layers, and the general state of AI tooling, so we write everything in C/C++. Toast overhead is ~20ms per invocation, which is what makes the loop practical; toastd does HTTPS connection pooling. With Cerebras it can run at ~2000 tok/s. Local toastd gets ~100 tok/s with a 0.6s time-to-first-token. Happy to answer questions.