20 comments

  • CuriouslyC 3 days ago

    Kind of amazes me how many people bitch about agent performance but, instead of hooking their guys up to OTel, cracking open Phoenix, and getting to work, randomly tweak prompts in response to team vibes.

    • chrisweekly 3 days ago

      Good point. Also (tangent), I followed your profile link to https://sibylline.dev and am thoroughly impressed. Stoked to have found your treasure trove of repos and insights.

      • CuriouslyC 3 days ago

        Don't play with them unless you're good at debugging alpha code (claude/codex can do it fine). I haven't ironed out env-specific stuff or clarified the installation/usage, and I'm still doing UI polish/optimization passes (yay async simd rust). I'll do showy releases once I've got the tools one-click-install ready; in the meantime, please feel free to drop an issue on any of my projects if you have feature requests or questions.

        • chrisweekly 3 days ago

          Sounds good, will do. Good luck getting them polished up!

    • yahoozoo 3 days ago

      Could you elaborate? How does knowing numerical usage metrics help?

      • CuriouslyC 3 days ago

        With Phoenix + ClickHouse being fed from OTel, you can run queries over your traces for deep analysis. If I want to see which tool calls are failing and why (or just get tool statistics), or find common patterns in flagged/failure traces ("simpler solution") and their causes, it's one query and some wiring.
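
        A rough sketch of the kind of per-tool failure breakdown described above, in plain Python rather than an actual Phoenix or ClickHouse query. The record shape (`trace_id`, `tool`, `status`, `error`) and the sample values are assumptions made up for illustration; real OTel span exports carry a richer schema.

```python
from collections import Counter, defaultdict

# Hypothetical, flattened span records (assumed shape, not a real export format).
SPANS = [
    {"trace_id": "t1", "tool": "bash", "status": "ERROR", "error": "timeout"},
    {"trace_id": "t1", "tool": "read_file", "status": "OK", "error": None},
    {"trace_id": "t2", "tool": "bash", "status": "ERROR", "error": "timeout"},
    {"trace_id": "t2", "tool": "bash", "status": "OK", "error": None},
    {"trace_id": "t3", "tool": "edit", "status": "ERROR", "error": "no match"},
]


def tool_failure_stats(spans):
    """Count calls, failures, and the most common failure reason per tool."""
    calls = Counter()
    failures = Counter()
    reasons = defaultdict(Counter)
    for span in spans:
        tool = span["tool"]
        calls[tool] += 1
        if span["status"] == "ERROR":
            failures[tool] += 1
            reasons[tool][span["error"]] += 1
    return {
        tool: {
            "calls": calls[tool],
            "failures": failures[tool],
            "failure_rate": failures[tool] / calls[tool],
            # None when the tool never failed in this sample.
            "top_reason": reasons[tool].most_common(1)[0][0] if reasons[tool] else None,
        }
        for tool in calls
    }


stats = tool_failure_stats(SPANS)
for tool, row in stats.items():
    print(tool, row)
```

        In a real setup the same grouping would presumably be a GROUP BY over the trace table; the point is only that failure counts and reasons fall out of span attributes once they are collected.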

        • thewisenerd 3 days ago

          you mention traces

          but the documentation only specifies metrics and logs

          https://docs.claude.com/en/docs/claude-code/monitoring-usage

          am i missing something here? or is this a case where they were too lazy to do traces in the claude code sdk, so it's logs + log attributes?

          • CuriouslyC 3 days ago

            You can get traces out of every agent framework that doesn't suck. You might need to collect from a proxy to get them from Claude because Anthropic is bigtime amateur hour.

  • oefrha 3 days ago

    Collecting detailed per-request traces and calculating user-specific metrics finer than a total cost feels about as intrusive as one of those periodic screenshot programs forced by really shitty remote jobs or freelancing contracts. It's pretty gross.

    • pranay01 3 days ago

      I don't think the primary goal here is "surveillance" but rather a better understanding of where in the team tools like Claude Code are getting adopted, which models are being used, and whether there are best practices in token usage that could make it more efficient.

  • sofia44 3 days ago

    I think this tackles a really important area - nice job. Looking forward to following.

    • pranay01 3 days ago

      great to hear. yes, it can help understand how developers are using Claude Code and also optimise token usage etc.

  • N_Lens 3 days ago

    I’d like to see this leveraged for agent platforms & orchestration rather than for surveillance of human software engineers. Humans don’t perform well in panopticons, but robots do (in my humble opinion).

    • pranay01 2 days ago

      > leveraged for agent platforms & orchestration

      can you share more on what you mean by this?

      • N_Lens 2 days ago

        Claude Code agents can be integrated into existing platforms such as GitHub. I can envision agents automatically handling issues with certain tags, doing pull request reviews, or other similar trigger-based behaviour.

        In that kind of orchestration this observability would be invaluable.

        • pranay01 a day ago

          Interesting. So you mean, say, if an agent is automatically doing a PR review: how many such agent calls are failing, how much time they are taking, etc.?

          A lot of this you can do today with traces that capture AI-specific calls.
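
          As a minimal sketch of those two questions (failure rate and time taken per kind of agent run), assuming each run has already been reduced to a small record. The field names (`task`, `duration_s`, `ok`) and the numbers are hypothetical, not taken from any real trace export.

```python
from statistics import median

# Hypothetical per-run records derived from root spans (assumed shape).
RUNS = [
    {"task": "pr_review", "duration_s": 42.0, "ok": True},
    {"task": "pr_review", "duration_s": 130.5, "ok": False},
    {"task": "pr_review", "duration_s": 58.0, "ok": True},
    {"task": "issue_triage", "duration_s": 12.0, "ok": True},
]


def summarize_runs(runs):
    """Group runs by task and report failure rate plus duration statistics."""
    by_task = {}
    for run in runs:
        by_task.setdefault(run["task"], []).append(run)

    summary = {}
    for task, task_runs in by_task.items():
        durations = [r["duration_s"] for r in task_runs]
        failed = sum(1 for r in task_runs if not r["ok"])
        summary[task] = {
            "runs": len(task_runs),
            "failed": failed,
            "failure_rate": failed / len(task_runs),
            "median_duration_s": median(durations),
            "max_duration_s": max(durations),
        }
    return summary


summary = summarize_runs(RUNS)
for task, row in summary.items():
    print(task, row)
```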

  • pdntspa 3 days ago

    aka let's spy on our devs more than we already are and give their pointy-haired bosses even more leverage to harass them with AI-usage KPI BS

  • tomrod 3 days ago

    Very nice!
