7 comments

  • the_harpia_io 15 hours ago

    Not a full team adoption story, but relevant data point: I run a small engineering org (~40 engineers across teams) and we've been tracking AI coding tool adoption informally.

    The split is roughly: 30% all-in (Claude Code or Cursor for everything), 50% selective users (use it for boilerplate, tests, docs but still hand-write core logic), 20% holdouts.

    What I've noticed on PR velocity: it went up initially, then plateaued. The PRs got bigger, which means reviews take longer. We actually had to introduce a "max diff size" policy because AI-assisted PRs were becoming 800+ line monsters that nobody could review meaningfully.
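
    The enforcement side is nothing clever, just a CI step that counts the diff and fails the build past a line budget. Roughly this shape (simplified Python sketch, not our exact script; the threshold and base branch are placeholders):

        # check_diff_size.py -- fail the build when a PR diff exceeds a line budget.
        import subprocess
        import sys

        MAX_CHANGED_LINES = 800      # placeholder threshold
        BASE_REF = "origin/main"     # placeholder: assumes PRs target main

        def changed_lines(base: str) -> int:
            # --numstat prints "added<TAB>deleted<TAB>path" for each file
            out = subprocess.run(
                ["git", "diff", "--numstat", f"{base}...HEAD"],
                capture_output=True, text=True, check=True,
            ).stdout
            total = 0
            for line in out.splitlines():
                added, deleted, _path = line.split("\t", 2)
                if added == "-":     # binary files report "-" for counts
                    continue
                total += int(added) + int(deleted)
            return total

        if __name__ == "__main__":
            n = changed_lines(BASE_REF)
            if n > MAX_CHANGED_LINES:
                print(f"Diff is {n} changed lines (limit {MAX_CHANGED_LINES}); please split the PR.")
                sys.exit(1)
            print(f"Diff size OK: {n} changed lines.")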

    The quality concern that keeps coming up: security. AI-generated code tends to take shortcuts on auth, input validation, error handling. We've started running dedicated security scans specifically tuned for patterns that AI likes to produce. That's been the biggest process change.

    Net effect: probably 20-30% faster on feature delivery, but we're spending more time on review and security validation than before.

    • boghy8823 7 hours ago

      I have seen the same AI hallucinations that you mentioned: auth, input validation, error handling, non-existent dependencies, etc. It's tricky to catch them all, as LLMs have mastered the art of being "confidently wrong". What tools are you using to catch those issues? I feel current tooling is ill-equipped for this new wave of AI-generated output.

      • the_harpia_io 4 hours ago

        "Confidently wrong" is the perfect description. The code compiles, the tests pass (because the AI also wrote the tests to match), and the auth flow looks reasonable at first glance.

        For catching these we layer a few things:

        - Standard SAST (Semgrep, CodeQL) catches the obvious stuff but misses AI-specific patterns

        - npm audit / pip-audit for dependency issues, especially non-existent packages the AI hallucinates (rough sketch of the registry check below)

        - Custom rules tuned for patterns we keep seeing: overly permissive CORS, missing rate limiting, auth checks that look correct but have subtle logic bugs

        - Manual review with a specific checklist for AI-generated code (different from our normal review checklist)
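
        For the non-existent package case, the check is conceptually trivial: for every dependency the AI added, ask the registry whether the package exists at all. A rough sketch of the Python side (illustrative, not our actual script; the npm side is the same idea against the npm registry):

            # check_phantom_deps.py -- flag requirements that don't exist on PyPI.
            # Crude parsing on purpose; pip-audit covers known-bad versions, this
            # only answers "is this package real?"
            import sys
            import urllib.error
            import urllib.request

            def exists_on_pypi(name: str) -> bool:
                url = f"https://pypi.org/pypi/{name}/json"
                try:
                    with urllib.request.urlopen(url, timeout=10) as resp:
                        return resp.status == 200
                except urllib.error.HTTPError as err:
                    if err.code == 404:
                        return False
                    raise

            def main(path: str = "requirements.txt") -> int:
                missing = []
                for raw in open(path):
                    line = raw.strip()
                    if not line or line.startswith(("#", "-")):
                        continue
                    # "foo[extra]>=1.2 ; python_version..." -> "foo"
                    name = line.split(";")[0].split("[")[0]
                    for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
                        name = name.split(sep)[0]
                    name = name.strip()
                    if name and not exists_on_pypi(name):
                        missing.append(name)
                for name in missing:
                    print(f"Not on PyPI (possible hallucination): {name}")
                return 1 if missing else 0

            if __name__ == "__main__":
                sys.exit(main(*sys.argv[1:]))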

        You're right that current tooling has a gap. Traditional scanners assume human-written code patterns. AI code looks structurally different - it tends to be more verbose but miss edge cases in ways humans wouldn't. We've been experimenting with scanning approaches specifically tuned for AI output.

        The biggest wins have been simple: requiring all AI-generated auth and input validation code to go through a dedicated security reviewer, not just a regular code review.
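
        Mechanically that routing rule is simple; something like the sketch below, though in practice it is closer to a CODEOWNERS entry plus a required-reviewer rule in the PR system (the paths and the SECURITY_APPROVED flag here are placeholders, not our actual setup):

            # require_security_review.py -- block merges that touch sensitive paths
            # unless a dedicated security review has been recorded.
            import os
            import subprocess
            import sys

            SENSITIVE_PREFIXES = ("src/auth/", "src/validation/")  # placeholder paths
            BASE_REF = "origin/main"                                # placeholder base branch

            def changed_files(base: str) -> list[str]:
                out = subprocess.run(
                    ["git", "diff", "--name-only", f"{base}...HEAD"],
                    capture_output=True, text=True, check=True,
                ).stdout
                return [f for f in out.splitlines() if f]

            if __name__ == "__main__":
                touched = [f for f in changed_files(BASE_REF)
                           if f.startswith(SENSITIVE_PREFIXES)]
                if touched and os.environ.get("SECURITY_APPROVED") != "true":
                    print("These files need a dedicated security review before merge:")
                    for f in touched:
                        print(f"  {f}")
                    sys.exit(1)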

    • softwaredoug 7 hours ago

      The joke I hear is Claude Code will double your PRs

      One PR from Claude. The next PR from you fixing Claude’s mistakes.

      • the_harpia_io an hour ago

        Ha, pretty accurate in my experience. Though I'd say it's more like 1.5x the PRs - Claude does the initial PR, then you do half a PR fixing the subtle stuff it got wrong, and then you spend the other half wondering if you missed something.

        The security fixes are the worst because the code looks correct. It's not like a typo you'd catch immediately - it's an auth check that works for 95% of cases but fails on edge cases the model never considered.

  • znq 5 hours ago

    Here are some real examples from our projects in 2025 at SIROC (for context: we are an 18-person venture studio; 140+ projects completed):

    * A task estimated at 4 hours → solved with one well specified prompt

    * A 20 hour engineering effort → executed in about 3 hours

    * A 3 month project → delivered in 1 month

    These are clearly best case scenarios. They are not the norm, yet. But they demonstrate what is possible.

    We have also seen what happens when things go wrong. Companies, including startups, come to us with broken systems and spaghetti code and architecture caused by weak prompts, unclear requirements, and no verification.

    It is important to understand that the efficiency gains we are seeing do not come from the tools alone. They come from a specific combination:

    1) Engineers who have spent 20 years building everything from robotics to enterprise-scale technology. You cannot give a perfect instruction to an AI if you do not know what perfect looks like in a production environment.

    2) A technical prompt should not be treated as a quick input or question. It is a detailed specification that requires experience and deliberate thinking.

    3) Knowing the right combination of tools, workflows, and validation processes.

    That said, some (many?) members of our team are dinosaurs in the software engineering world. They bring a ton of experience but are used to tools from 15 years ago and don't like change. We really had to push AI adoption (mostly Cursor and Claude Code) on them. It’s still an ongoing process, and probably will be for a while.

  • SaberTail 5 hours ago

    I was on a greenfield project late last year with a team that was very enthusiastic about coding agents. I would personally call it a failure, and the project is quietly being wound down after only a few months. It went in a few stages:

    At first, it proceeded very quickly. Using agents, the team were able to generate a lot of code very fast, and so they were checking off requirements at an amazing pace. PRs were rubber stamped, and I found myself arguing with copy/pasted answers from an agent most of the time I tried to offer feedback.

    As the components started to get more integrated, things started breaking. At first these were obvious things with easy fixes, like some code calling other code with the wrong arguments, and the coding agents could handle those. But a lot of the code was written in the overly-defensive style agents were fond of, so there were also a lot of subtler errors. Things like the agent substituting in a default value instead of erroring out on invalid input, far away from where that value was causing other errors.
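
    To make that concrete, the pattern looked roughly like this (illustrative only, not actual project code):

        # Agent-style: invalid input is silently "repaired", and the error
        # surfaces much later in code that trusted the value.
        def parse_timeout(raw: str) -> int:
            try:
                return int(raw)
            except (TypeError, ValueError):
                return 30  # bad input quietly becomes a default

        # What we actually wanted: fail loudly at the boundary.
        def parse_timeout_strict(raw: str) -> int:
            try:
                return int(raw)
            except (TypeError, ValueError) as err:
                raise ValueError(f"invalid timeout value: {raw!r}") from err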

    At this point, the agents started making things strictly worse because they couldn't fit that much code in their context. Instead of actually fixing bugs, they'd catch any exceptions and substitute in more defaults. There was some manual work by some engineers to remove a lot of the defensive code, but they could not keep up with the agents. This is also about when the team discovered that most of the tests were effectively "assert true" because they mocked out so much.
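
    The tests mostly had the same shape: everything mocked, so the assertions hold no matter what the production code does. Something like this (illustrative; sync_orders is a stand-in for the code under test):

        from unittest.mock import MagicMock

        def sync_orders(api, db):
            # stand-in implementation; note it never validates or transforms anything
            for order in api.fetch_orders():
                db.save(order)

        def test_sync_orders():
            api = MagicMock()
            db = MagicMock()
            api.fetch_orders.return_value = [MagicMock()]  # not realistic data

            sync_orders(api, db)

            # "assert true" in disguise: these pass for almost any implementation
            assert api.fetch_orders.called
            assert db.save.call_count >= 0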

    We did ship the project, but it shipped in an incredibly buggy state, and also the performance was terrible. And, as I said, it's now being wound down. That's probably the right thing to do because it would be easier to restart from scratch than try to make sense of the mess we ended up with. Agents were used to write the documentation, and very little of it is comprehensible.

    We did screw some things up. People were so enthusiastic about agents, and they produced so much code so fast, that code reviews were essentially non-existent. Instead of taking action on feedback in the reviews, a lot of the time there was some LLM-generated "won't do" response that sounded plausible enough that it could convince managers that the reviewers were slowing things down. We also didn't explicitly figure out things like how error-handling or logging should work ahead of time, and so what the agents did was all over the place depending on what was in their context.

    Maybe the whole mess was a necessary learning as we figure out these new ways of working. Personally I'm still using the coding agents, but very selectively to "fill-in-the-blanks" on code where I know what it should look like, but don't need to write it all by hand myself.