To AI or not to AI

(antropia.studio)

101 points | by serchinastico 2 days ago

89 comments

  • wintermutestwin 2 days ago

    I recently spent over an hour trying to get ChatGPT to give me some pretty simple rsync commands. It kept giving me command-line parameters that didn't work on the version of rsync on my Mac. In about half of the failures it would go down troubleshooting rabbit holes, and the rest of the time it would "realize" that it was giving responses for the wrong version. I tell it to validate each parameter against my version moving forward and it clearly doesn't do that. I am sure I could have figured it out on my own in 5 mins, but I couldn't stop watching the trainwreck of this zeitgeist tech wasting my time on a simple task.

    I am not a coder (much), but I have to wonder if my experience is common in the coding world? I guess if you are writing code against the version that was the bulk of its training then you wouldn't face this specific issue. Maybe there are ways to avoid this pitfall (and others) with prompting? As it is, I do not see at all how LLMs could really save time on programming tasks without also costing more time dealing with their quirks.

    • goalieca 2 days ago

      > I tell it to validate each parameter against my version moving forward and it clearly doesn't do that.

      I would like an AI expert to weigh in on this point. I run into this a lot. It seems that LLMs, being language models and all, don't actually understand what I'm asking. Whenever I dive into the math, superficially, it kind of makes sense why they don't. But it also seems like transformers or some secret sauce is coded up specially for tasks like counting letters in a word, so that the AI doesn't seem embarrassing. Am I missing something?

      • nerdponx 2 days ago

        LLMs are next-token prediction models. They "understand" in that, if the previous 1000 tokens are such-and-such, then they emit a best guess at the 1001st token.

        It "knows" what rsync is because there is a lot of material about rsync in the training data. However, it has no idea about that particular version, because it doesn't have much training data where the actual version is stated and the differences are elaborated.

        What would probably produce a much better result is including the man page for the specific version you have on your system. Then you're not relying on the model having "memorized" the relationships among the specific tokens you are trying to get the model to focus on; instead you're passing it all in as part of the input sequence to be completed.

        It is absolutely astounding that LLMs work at all, but they're not magic, and some understanding of how they actually work can be helpful when it comes to using them effectively.

        • pglevy 2 days ago

          Our low-code expression language is not well represented in the pre-training data. So as a baseline we get lots of syntax errors and really bad-looking UIs. But we're getting much better results by setting up our design system documentation as an MCP server. Our docs include curated guidance and code samples, so when the LLM uses the server, it's able to more competently search for things and call the relevant tools. With this small but high-quality dataset, it also looks better than some of our experiments with fine-tuning. I imagine this could work for other docs use cases that are more dynamic (i.e., we're actively updating the docs, so having the LLM call APIs for what it needs seems more appropriate than a static RAG setup).

      • bashy 2 days ago

        Words are split into tokens, so it can't really 'see' the letters inside a word; it just knows how the tokens link together.

      • beefnugs 2 days ago

        Did you feed the context a link to, or examples from, the documentation for your exact version?

        I am not an expert, but I assume that if it has any basic knowledge of the tool, it will try to use what it knows, which is chunks of some old version. It won't likely decide to search for the newest docs on its own; you have to tell it to search for your exact version's docs, or feed them into the context yourself.

        • goalieca 2 days ago

          I find context _sometimes_ works, but more often than not I rephrase the question a million times, try to break it down into smaller problems... whatever. It seems like it just doesn't "understand".

    • latexr 2 days ago

      > I am not a coder (much), but I have to wonder if my experience is common in the coding world?

      It is, yes. Surely someone will come and tell you it doesn’t happen to them, but all that tells you is that it isn’t universal; it’s still common enough that you’ll find no end of complaints.

      > Maybe there are ways to avoid this pitfall (and others) with prompting?

      Prompting can’t help you with things not in the training set. For many languages, all LLMs absolutely suck. Even for simple CLI tools, telling an LLM you are on macOS or using the BSD version may not be enough to get it to stop giving you the GNU flags. Furthermore, the rsync change in macOS is fairly recent, so there’s even less data online about it.

      https://derflounder.wordpress.com/2025/04/06/rsync-replaced-...

      > As it is, I do not see at all how LLMs could really save time on programming tasks without also costing more time dealing with its quirks.

      And that’s the best case scenario. It also happens that people blindly commit LLM code and introduce bugs and security flaws they cannot understand or fix.

      https://secondthoughts.ai/p/ai-coding-slowdown

      https://arxiv.org/abs/2211.03622

      • Helmut10001 2 days ago

        Usually, in these edge cases, I go to the documentation page and dump all pages as Markdown into the AI tool (most often Gemini, due to token count). This context engineering has helped a lot to get better answers. However, it also means I am sometimes consuming 1 million tokens on relatively simple problems. Like recently, when I needed to solve a relatively simple but specific MermaidJS issue.

    • erichocean 2 days ago

      Ask Gemini Pro 2.5 to build the rsync command and then give it the man page for your version of rsync. It should succeed the first time.

      Here's a command to copy the man page to the clipboard that you can immediately paste into aistudio (on a Mac):

          man rsync | col -b | pbcopy

      As a general rule, if you would need to look something up to complete a task, the AI needs the same information you do—but it's your job to provide it.

      • wintermutestwin 2 days ago

        So I paste the man page into the LLM and tell it to only give me parameters that are in that page? Even if it obeyed, it would still choke on how to exclude the hidden macOS cruft files from the copy...

        • erichocean 2 days ago

          No, you don't need to "tell it to only give me parameters that are in that page."

          Here's the entire prompt:

              I need the rsync command to copy local files from `/foo/bar` to `~/baz/qux` on my `user@example.com` server.
              Exclude macOS cruft like `.DS_Store`, etc. Here's the man page:
          
              <paste man page for your rsync that you copied earlier, see above>
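
          (For what it's worth, the reply you get back should look something like this; treat it as a sketch, since the exact flags depend on your rsync build:)

              rsync -av --exclude='.DS_Store' --exclude='._*' /foo/bar/ user@example.com:~/baz/qux/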
        • nehal3m 2 days ago

          If you have trouble talking to an AI, how do you ever expect to merge with Neuromancer’s twin?

    • athrowaway3z 2 days ago

      > Maybe there are ways to avoid this pitfall (and others) with prompting?

      Not sure about Codex, but in Claude Code you can run commands. So instead of letting it freestyle / guess, do a:

      `! man rsync` or `! rsync --help`

      This puts the output into context.

    • dangerface 2 days ago

      Yup, even when you tell it your version, it forgets pretty quickly. Or it agrees it messed up and assures you that this time it will give you the correct info for your version number, then gives you the same command.

      JavaScript is a nightmare, as they change everything constantly. PHP has backwards compatibility for everything, so it's not really an issue.

      It also gives outdated info on Salesforce, and I'm not just talking about the latest and greatest; it recommends stuff that was deprecated years ago.

    • chrisweekly 2 days ago

      Why involve an LLM at all, if you're looking up docs for a particular tool like rsync?

      • wintermutestwin 2 days ago

        Lots of reasons! First off: where else do I go to learn this stuff? Man pages are references for people who work in the CLI all the time, not for first-time learners, as they are necessarily packed with the complete lexicon but with barely a thought given to explaining real-world examples of common tasks. There are a million Linux websites with the same versioning issues and inadequate explanations. I guess I could buy an O'Reilly book and learn the topic end to end, even though I will only need to know the syntax of a couple of commands.

        With an LLM, I can get it to tell me what each parameter it suggests actually does and then I can ask it questions about that to further my understanding. That is a massive leg up over the knowledge spaghetti approach...

        • latexr 2 days ago

          > There are a million Linux websites with the same versioning issues and inadequate explanations.

          Where do you think the LLM is getting its data from? At least on the website you can see the surrounding discussion and have a chance of finding someone with the same problem as you, or learning something tangential which will help you further down the line.

          > With an LLM, I can get it to tell me what each parameter it suggests actually does and then I can ask it questions about that to further my understanding.

          If it’s giving you wrong flags, why do you assume the explanations it gives you are accurate? LLMs can make those up just as well.

          What you should do is verify the given flags on the man page. Not only will it clarify if they exist, it will also clarify if they’re what you’re looking for, and will likely even point to other relevant options.

        • rsynnott a day ago

          > There are a million Linux websites with the same versioning issues and inadequate explanations.

          So, instead you ask a magic robot to recite a fuzzily 'remembered' version of those websites?

          Bear in mind that an LLM does not _know_ anything and that, despite some recent marketing, it is not 'reasoning'.

        • rsynnott a day ago

          Learn to read documentation. It's really a very important skill, and many people were becoming _somewhat_ deficient in it even before our good friends the magic robots arrived, due to Stackoverflow et al.

          • wintermutestwin a day ago

            Bah! Coders need to learn to write documentation. The tldr for rsync was hilarious! Actually, this is one place where LLMs are currently useful: instead of getting LLMs to write code, get them to write documentation.

        • joshribakoff 2 days ago

          One pitfall is that the LLM hallucinates: it might sometimes seem to fulfill your requirements, but it can subtly break down. The man pages could be used in conjunction to fact-check your understanding.

        • heckelson 2 days ago

          I like the tldr pages for learning the most common features and use cases of new command-line tools! I think it's great, albeit a bit slow sometimes.

          • wintermutestwin 2 days ago

            tldr pages are a great idea, but the execution is a total fail. Looking at the rsync entry, it fails to cover the most blatantly common requirements:

            I just needed two commands: one to mirror a folder from one drive to another, updating only the changes (and excluding all of the hidden macOS cruft), and another to do a deep validation of the copy. These have to be two of the most commonly used commands, right???
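
            (For reference, the kind of thing I was after looks roughly like this; untested, with placeholder paths, and flag support varies between rsync builds, so check your man page first:)

                # mirror the folder, sending only changes and skipping macOS cruft
                rsync -av --delete --exclude='.DS_Store' --exclude='._*' /Volumes/Source/ /Volumes/Backup/

                # deep validation: dry run comparing full checksums, listing any differences
                rsync -avn --checksum --delete --exclude='.DS_Store' --exclude='._*' /Volumes/Source/ /Volumes/Backup/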

            In the end, I felt that messing around with precious data without being 100% certain of what I am doing just wasn't worth it so I got a GUI app that was intuitive.

      • harvey9 2 days ago

        Not the OP, but I sometimes find the official documentation hard to parse (I haven't looked at rsync specifically in this case). On the other hand, I have the same experience with LLMs as the OP.

        Big thanks to all the doc writers who include vignettes right in the documents.

      • Ancapistani 2 days ago

        I’m using “aichat”, and have all my man pages in a RAG. It’s far faster to query that than it is for me to read through it manually.

      • dist-epoch 2 days ago

        I guess you never used ffmpeg; there is a whole industry of "how to do X with ffmpeg" guides.

    • KronisLV 2 days ago

      > I recently spent over an hour trying to get ChatGPT to give me some pretty simple rsync commands.

      Try N times, adding more context about the environment and error messages along the way. If it doesn't work after those, try other models (Claude, Gemini, ...). If none of those work on whatever number of attempts you've chosen, then LLMs won't be able to help you well enough and you should save yourself some time and look elsewhere.

      A good starting point is trying for 10-20 minutes, after which point an LLM might actually become slower than going the old-fashioned way of digging into docs and reading forum posts and such. There are also problems that are a bit too complex for LLMs, and they'd just take you in circles no matter how long you try.

    • meowface 2 days ago

      As a programmer I have noticed this problem much more with command help than with code. Maybe partly because the training data has way more, and more diverse, code examples than all the relevant permutations and use cases for command argument examples.

    • doug_durham 2 days ago

      I'm a coder and I've never had your experience. It usually does an amazing job. I think that coders have an advantage because there are many questions I would never ask an LLM, out of intuition about what would work well and what wouldn't. In your case I would have dumped the output of `rsync --help` into the context window once I saw it wasn't familiar with my particular version of rsync. That's the way these tools work.

    • BryanLegend 2 days ago

      I've recently been misled by ChatGPT a lot as well. I think it's the router. I'm on the free plan so I assume they're just being tight with the GPU cycles.

    • dimgl 2 days ago

      LLMs are a very specific kind of beast. Getting `rsync --help` output, or any other kind of specific documentation, into context would have unblocked you.

    • BinaryIgor 2 days ago

      Time and again I find that if I know the stack and tools I am working with, it's faster to just write code on my own, manually. If I want to learn, on the other hand, LLMs are quite useful, as long as you understand (or learn to understand) and validate the output.

    • alain94040 2 days ago

      Instead of just saying "rsync on my system is version 3.2", have you tried copy/pasting the output of `rsync --help`? In my experience, that would be enough for the AI to figure out what it needs to do and which arguments to use. I don't treat AI like an oracle; I treat it like an eager CS grad. I must give it the right information for the task.
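
      (On a Mac, for example:

          rsync --help 2>&1 | pbcopy

      then paste that at the top of your prompt.)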

    • righthand 2 days ago

      What is the cost (in tokens/$$$) of spending an hour restating questions to a chat bot vs typing `man rsync`?

      • danielbln 2 days ago

        Telling the agent to execute "man rsync" and synthesize the answer from there is probably the cheapest and most efficient option.

        Letting some detached LLM fumble around for an hour is never the right way to go, and conversely, sifting through the man page of rsync or ffmpeg or (God forbid) jq to figure out some arcane syntax isn't exactly a great use of anyone's time either, all things considered.

        • righthand 2 days ago

          Sifting through just means you don’t know how to use the man interface to search/grep (which, from a discoverability perspective, is fair). However, I think leaning on an LLM (using an agent or not) for a task that could probably take <10 mins at $0 demonstrates enthusiasts’ disregard for a good set of research and reading habits.
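
          (For example, something like this jumps straight to a flag instead of paging through the whole manual:)

              man rsync | col -b | grep -n -A 3 -- '--delete'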

          All of this is an attempt at circumventing RTFM because you’re privileged enough to afford it.

          Just lay yourself down on the WALL-E floating bed and give up already.

          • wintermutestwin 2 days ago

            The “fucking” manual is obtuse, overly verbose, and almost always lacking in super clear real-world examples. That you claim it is a <10 min problem demonstrates experts’ disregard for the degree of arcane crap a beginner needs to synthesize.

            I am backing up and verifying critical data here. This is not a task that should be taken lightly. And as I learned, it is not a task that one can rely on an LLM for.

            • righthand 2 days ago

              I disagree. The topic is rsync, which is well documented. The path forward might not work in every situation, but the manual has good examples too. This is probably true for what LLM users use it for 90% of the time: hacking together things that are well known and well documented into a result. LLMs arguably only work because these tools (ffmpeg, rsync, etc.) were already solving problems and were widely used and documented. So burning energy to have a computer look up commands, because you couldn’t spend 10 minutes reading yourself, could be a waste of time and money, whereas having to spend time researching is likely only a waste of time.

              • wintermutestwin a day ago

                >rsync which is well documented

                Here is the man page entry for the --delete flag:

                --delete is used. This option is mutually exclusive with --delete-during, --delete-delay, and --delete-after.

                Hilarious!

                Reading and understanding the rsync command would take much more than 10 mins and I am not a total newb here.

                • righthand a day ago

                  What command? Just to clarify, there is no example command we’re discussing here. You’re just cherry-picking results exclusively from the man page and then arguing that ChatGPT is better because it gets to use example documentation from the internet. Well, I get to use examples from the internet too.

                  A search query takes a matter of seconds to type in, select a result and read. No doubt still under 10 minutes.

                  But still to my original point it’s insanely more expensive to have chatgpt look it up. This doesn’t bother you because you are privileged enough to waste money there. If time is money then IMO the only valuable time I have with my money is when it’s gaining interest and not being spent.

                  You can abstract away all the “but I had to scroll down the page and click a different result” steps as “time savings” all you want, but no one was wasting a ton of time there for already well established tools. That is a deluded myth.

                  I’m not sure I even grasped your point. The delete flag is pretty self explanatory and gives you options for more granularity. Why does that take greater than 10 mins? What is the issue with that entry?

                  Here is what I get when I type `man rsync`:

                      --delete
                                This tells rsync to delete extraneous files from the
                                receiving side (ones that aren't on the sending side),
                                but only for the directories that are being
                                synchronized. You must have asked rsync to send the
                                whole directory (e.g. "dir" or "dir/") without using a
                                wildcard for the directory's contents (e.g. "dir/*")
                                since the wildcard is expanded by the shell and rsync
                                thus gets a request to transfer individual files, not
                                the files' parent directory. Files that are excluded
                                from the transfer are also excluded from being deleted
                                unless you use the --delete-excluded option or mark the
                                rules as only matching on the sending side (see the
                                include/exclude modifiers in the FILTER RULES section).

                                Prior to rsync 2.6.7, this option would have no effect unless
                                --recursive was enabled.  Beginning with 2.6.7, deletions will
                                also occur when --dirs (-d) is enabled, but only for directories
                                whose contents are being copied.
                  
                                This option can be dangerous if used incorrectly! It is a very
                                good idea to first try a run using the --dry-run (-n) option to
                                see what files are going to be deleted.
                  
                                If the sending side detects any I/O errors, then the deletion of
                                any files at the destination will be automatically disabled.
                                This is to prevent temporary filesystem failures (such as NFS
                                errors) on the sending side from causing a massive deletion of
                                files on the destination.  You can override this with the
                                --ignore-errors option.
                  
                                The --delete option may be combined with one of the --delete-
                                WHEN options without conflict, as well as --delete-excluded.
                                However, if none of the --delete-WHEN options are specified,
                                rsync will choose the --delete-during algorithm when talking to
                                rsync 3.0.0 or newer, or the --delete-before algorithm when
                                talking to an older rsync.  See also --delete-delay and
                                --delete-after.

                  • wintermutestwin 21 hours ago

                    I pasted the results of typing `man rsync` into my MacBook’s terminal. I looked up the `--delete` parameter and pasted the entry. Not sure why your entry was more useful; perhaps a version issue (which is at the root of the painful time I have spent trying to learn how to do something trivial).

                    Later in the man page, it gives examples and totally fails to explain those examples. And yes, someone who is going to be doing this frequently and professionally should understand it deeply and spend the hours required to be fluent in a command with a kitchen sink full of parameters. I, on the other hand, will be executing these commands maybe a few times a year.

                    The more I think about it, the more I think the solution here is to use LLMs to write better documentation, with lenses for different types of users with different needs.

      • pletnes 2 days ago

        Lower than the cost of a meatbag-office with heating, cooling and coffee for the time spent reading the manual.

        • righthand 2 days ago

          How is the cost of an office relevant? LLMs didn’t eliminate offices. You can type `man rsync` without any of that.

    • Yoric 2 days ago

      Pretty much my experience, yes.

  • jmkni 2 days ago

    I can relate to a lot of this.

    Where I find AI most useful is getting it to do tasks I already know how to do, but would take time.

    If you understand the problem you are trying to solve well enough to explain it to the LLM, you can get good results. You can also eyeball the output code and know right away if it's what you are after.

    Getting it to do things you don't know how to do is where it goes off the rails IMO

    • matula 2 days ago

      This post (aside from the title) is fairly nuanced, with the reality that "let an LLM do all the things" is going to be fraught with problems... but "let an LLM do some very specific things that saves me an hour or so from boilerplate code or tests" is very nice. Of course, I doubt a blog post titled "Sometimes AI is ok, sometimes it's not" would get the clicks.

    • serbuvlad 2 days ago

      Exactly. AI is your intern, not your contractor.

      • Yoric 2 days ago

        More precisely, AI is your intern who won't improve during the internship, ever.

        • rhetocj23 2 days ago

          Just to play devil's advocate: why do you think this is so?

          And if you have a compelling thesis, why hasn't this spread to the investing community?

          • Yoric 2 days ago

            > why do you think this is so?

            Well, because nothing in the transformer architecture supports such learning ability. All AI researchers and most serious AI users are aware of this, so I'm not sure I understand the question.

            > And if you have a compelling thesis, why hasn't this spread to the investing community?

            The investing community believes that they can make money. That feels pretty much orthogonal to whether the metaphorical intern can learn, and much more related to whether clients can be made to buy the product, one way or another.

          • ewoodrich 2 days ago

            They mean it doesn't learn from experience/mistakes after spending time on your codebase. There are workarounds like documenting in CLAUDE.md/CODEX.md/.roorules which dump it back into the active context but are hit and miss in my experience. It's definitely better than nothing but Claude still routinely ignores important directives whenever it's in the right mood.

      • jmkni 2 days ago

          And is also a pathological liar lol

        • Yoric 2 days ago

            Yes, that, too.

  • zenmac 2 days ago

    >There is never enough context. We learned quickly that the more context we provided and the smaller the issues, the better the results. However, no matter how much context we provided, the AI would still mess things up because it didn’t ask us for feedback. AI would just not understand if it didn’t have enough information to finish a task, it would assume, a lot, and fail.

    Is it me, or does it feel like the genie-in-the-bottle thing? I remember a TV show where the guy and his friend sat down with the genie, like with a lawyer, to make sure every angle was covered (going to spare you the details here). That is what it feels like interacting with an LLM sometimes.

    • mnky9800n 2 days ago

      I think the AI acts like that shitty coworker who is super smart but never tells you what they are thinking: likely capable of doing whatever you want them to do, but working on a team is asking too much, and they are apparently not capable of doing that. AI promises that you can interact with it like it is human, because of its chat capabilities, but it never, ever does something like, "hey, I don't understand this part, can you tell me more about what you mean here?"

    • rsynnott a day ago

      Well, except that (I think I know the scene you're referring to), it ultimately worked. The LLM, on the other hand, will feel no need to stick to its 'promises'.

      (Really the genie is closer to the traditional sci-fi AI in that it's legalistic and rules-bound; the LLM very much isn't.)

  • richardguerre_ 2 days ago

    I don't like vibe coding as much as actual coding, but the biggest improvement in my workflow was shifting left even more.

    Now I dedicate at least one session to just writing a spec file, and have it ask me clarifying questions about my requirements, based on what it finds in the codebase and online. I ask it to also break down the implementation plan into phases, with a checklist for each phase.

    I then start at least one new session per phase and make sure to nail down that phase before continuing.

    The nice thing is if it gets annoying to vibe code it, I or someone on my team can just use the spec to implement things.

  • avighnay 2 days ago

    I decided to adopt AI-assisted coding for a recent project. Not sure what defines 'vibe coding', but the process I ended up with was an iterative interaction at a measured pace.

    I used Gemini AI Studio for this, and I was very pleased with the result and decided to open source it. I have completely captured and documented the development transcript. Personally, it has given me a considerable productivity boost. My only irritation was the unnecessary over-politeness that the AI adopts. My take is:

    AI yields good ROI when you know exactly what you want at the end of the process and when you want to compare and contrast decision choices during the process.

    I have used it for all artifacts of the project:

    - Core code base
    - Test cases
    - Build scripts
    - Documentation
    - Sample apps
    - Utilities

    Transcript - https://gingerhome.github.io/gingee-docs/docs/ai-transcript/...

    Project - https://github.com/gingerhome/gingee

  • amelius 2 days ago

    Yes, this is how I use AI.

    Indeed, self-invented abstractions are a bridge too far for AI.

    You have to keep it close to the path already walked before by thousands of developers.

    This makes AI more of a search engine on steroids than anything else.

    • rhetocj23 2 days ago

      ChatGPT is literally just the search engine Google should've moved to, but they waited because they didn't want to touch their existing assets.

      • falcor84 2 days ago

        I agree that Google were too slow to move, but entirely disagree with the first part. ChatGPT is very much not a "search engine". Arguably it is an "Answer engine", but more so, it is a conversational partner - I almost never use ChatGPT to just get one response; the real benefit is being able to follow up with it until I'm satisfied. It's an entirely different medium of interaction as compared to search engines.

      • amelius 2 days ago

        GPT is orders of magnitude more expensive to run, though.

  • xpil 2 days ago

    My preferred approach in similar situations is to ask an LLM for an initial solution or code snippet, then take over manually - no endless prompt tweaking, just stop prompting and start coding. Finally (optionally), I let the LLM do a final pass to review my completed solution for bugs, optimizations, etc.

    The key win is skipping the prompt refinement loop, which is (A) tedious and time-consuming, and (B) debilitating in the long run.

  • BinaryIgor 2 days ago

    "We just don’t think we will incorporate AI to do more than that, given the current state of things. We will, however, keep an eye in case the technology changes fundamentally."

    I wonder whether LLMs are capable of doing more; probably we need another paradigm for that. Still, they are very, very useful when used right.

    • falcor84 2 days ago

      > I wonder whether LLMs are capable of doing more

      I don't see how that is a question. I come up with new ideas to improve the LLM-based tools I'm using at least once a day, and the vast majority of these are plain engineering changes that I could do on my own if I wanted to put the effort into it. I think that even if God comes down from heaven to prevent us from further training the LLMs themselves (if God is listening to Yudkowsky's prayers), then we would still have a good few decades of extensively improving the capabilities of LLM-based tools to extract a massive amount of further productivity by just building better agentic wrappers and pipelines, applying proper software development and QA methodology.

  • christoff12 2 days ago

    > However, no matter how much context we provided, the AI would still mess things up because it didn’t ask us for feedback.

    The way it proceeds without clarifying or asking questions really grinds my gears.

  • rel_ic 2 days ago

    Using AI to improve facebook ads... y'all are the breakers from the Dark Tower series.

  • qweiopqweiop 2 days ago

    Not asking for feedback is the killer for me. Even most junior developers will ask for more information if they don't have enough context/confidence to complete a task.

    • hatefulmoron 2 days ago

      I often ask Claude to scan through the code first and then come back with questions related to the task. It sometimes comes back with useful questions, but most of the time it acts like a university student looking for participation marks in a tutorial, choosing questions that signal understanding rather than ones that would actually be helpful.

    • jrexilius 2 days ago

      I have taken to appending "DO NOT START WRITING CODE." to almost every prompt. I try to get it to analyze, ask questions, and summarize what it's going to do first, and even then it will sometimes ignore that and jump into writing (the wrong) code. A big part of the wrangling seems to be getting it to analyze or reason before charging down a wrong path.

      • mattmanser 2 days ago

        If you use Claude Code you can go into plan mode, where it doesn't write code and you can go back and forth.

      • jmkni 2 days ago

        Gemini is terrible for this

    • qudat 2 days ago

      GitHub just released spec-kit, which I think attempts to get the human more involved in the spec/planning/task-building process. You basically instruct the LLM to generate these docs, and you tweak them to flesh them out and fix mistakes. Then you tell the LLM to work on a single task at a time, reviewing in small chunks.

      • mattmanser 2 days ago

        That's how everyone is already using Claude Code; it's not GitHub's idea. You go into plan mode, get it to iterate on the idea, then ask it to make (and save) a to-do list .md. Then you get it to run through the to-do list, checking tasks off as it goes.

  • jwpapi 2 days ago

    This aligns very well with my experience and what I’ve commented on other posts!

  • ASalazarMX 2 days ago

    "AI, but verify" -- Winston Churchill (alternate universe)

  • chrischen 2 days ago

    I don't understand why people take bad coding practices, let AI run with them, and then expect anything other than poor-quality code. Nothing about the AI revolution here changes how good software has always been written. Write tests, use a typed language, review code. If you have good patterns and good procedures, AI fits right in and fills in the blanks perfectly. Poor AI results tend to be a case of the pot calling the kettle black.

    • righthand 2 days ago

      I think bad practices will always be around, as most code on GitHub was probably written with bad practices. The well is poisoned.

      • rhetocj23 2 days ago

        Out of curiosity, how do you think the model producers will/would attempt to discern what information on the web is of high quality vs not so high quality (i.e. poisonous)? Akin to clean/drinkable water vs dirty water/harmful water in the well.

        • righthand 2 days ago

          I don’t think they will. The well will always have some level of poison, since all information has bias and intent. Bad software design is like bad grammar: it’s ubiquitous.

    • rhetocj23 2 days ago

      The real problem is the quality of knowledge and education of engineers. No amount of AI fixes that (until you completely displace labour as an input, that is).

    • dweinus 2 days ago

      I mean, it sounds like reviews and tests are already their standard practice, and explicitly part of their AI practice. So it should have worked, right?

  • esafak 2 days ago

    Filler content.

    > Our marketing director (that’d be me) said that if we don’t write something about it, we will be left behind...

    Write when you have something to say. What was I supposed to learn here?