AI deleted most of my tests, and said "All Tests Pass"

(typia.io)

24 points | by autobe a day ago

14 comments

  • patrakov 15 hours ago

    I confirm the same experience. I tried to port the Dynamic Range Optimization code in OpenCamera from Java to mathematical formulas (i.e., the spec), with the intention to translate that into Python with NumPy, so that I could run it on my own images, not coming from my phone's camera sensor. Tool used: just a ChatGPT chat with the relevant files uploaded, research mode activated, and questions asked wherever I did not understand or doubted something in the response.

    Result: ChatGPT faithfully and correctly reverse-engineered the initial highlight pre-compression step and then said that the rest (the real thing!) is too complex and not important anyway. I did not pursue it further.

  • valunlabs 7 hours ago

    Then, when the monthly payment is due, let's just write "Paid" in the chat and request our tokens.

    AI tools are great, but they require a lot of oversight. Developing a product is a process. People are generating so much code that they can’t keep up with code reviews. What they really need to do is spread the work out over several days. If your project is solid, you’ll get positive results anyway. (I’m not talking about the MVP stage here.)

  • benchwright a day ago

    Tends to be a problem. I've tried to mitigate it by using either external harnesses (e.g., GitHub Actions pinned to a known-good baseline) or some number of witness agents (e.g., Kimi/Qwen/whatever <=> Claude/OpenAI/Google). It generally eats more time and energy (and now tokens/$).

    That being said, I still have a "fix the code, not the test" line somewhere in here...
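The "external harness pinned to known-good" idea above can be sketched as a tiny guard that parses `pytest --collect-only -q` output and fails if the number of collected tests ever drops below a recorded baseline, so an agent that quietly deletes tests can't report "all tests pass". The helper names and the output format assumed here (one `::`-containing node id per line) are illustrative assumptions, not anything described in the thread.

```python
def count_collected(pytest_output: str) -> int:
    # Count test node ids in `pytest --collect-only -q` output;
    # node ids look like "tests/test_foo.py::test_bar".
    return sum(1 for line in pytest_output.splitlines() if "::" in line)


def check_against_baseline(pytest_output: str, baseline: int) -> bool:
    # The harness "pinned" to known-good: pass only if no tests vanished.
    return count_collected(pytest_output) >= baseline


# Example: the baseline was 3 tests, but one was deleted.
output = "\n".join([
    "tests/test_api.py::test_create",
    "tests/test_api.py::test_delete",
    "",
    "2 tests collected in 0.01s",
])
assert not check_against_baseline(output, baseline=3)  # guard catches the deletion
```

The baseline count would be recorded from a trusted, human-reviewed commit and kept outside the agent's reach, which is what makes the harness "external".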

  • GoToRO 3 hours ago

    Very human

  • smrtinsert 6 hours ago

    Did the author say which model and harness handled the first attempts? Codex at the end is clear, but what did they try the rest with?

    • subscribed 2 minutes ago

      They specifically refused to do it, which is very meh and in retrospect reads like Codex astroturfing because of that ("all AI bad except for Codex"?).

    • veunes 6 hours ago

      [dead]

  • OMGWTF 20 hours ago

    8 billion tokens? What did that cost?

    • autobe 18 hours ago

      No additional cost occurred; it just exhausted my weekly limit.

  • a day ago
    [deleted]
  • cyanydeez 21 hours ago

    Believing that a one-shot prompt is enough specification is pretty delusional.

    I keep seeing people talk about the power of these SOTA models, yet keep reading the types of prompts that make no sense to anyone who understands the ludicrous number of decisions that would need to be made.

    • brianwawok 19 hours ago

      Yeah, it's a weird thing to one-shot prompt. I'm porting some code (Python to Go in my case), and I'm pretty specific about doing it session by session: port one file, hit a new wall, break into another session to fix it, then return to my original chat with the problem fixed.

    • veunes 6 hours ago

      [flagged]

  • veunes 6 hours ago

    [flagged]