Do you pay your employer when you introduce bugs? I think you're lucky if you get usable output which you don't consider a mistake. Also, you might be mistaken if you think that you pay for a deterministic service.
edit: typo
I think it would be expensive to check. For a coding task, any reviewer would need to understand programming (these people aren't cheap), the domain context, and cultural differences (e.g. American "cookie" vs British "biscuit"), and then make a determination.
If the AI companies just paid all of that out of the goodness of their pocketbook I'd be fine with it, but in reality I think they'd just pass on the costs, the same way that basically every business passes on spoilage, theft, return rates, etc. So I think the value would be risk mitigation rather than cost savings: you'd know that if you pay for $10 worth of tokens, you'll get $10 worth of good tokens, but the individual token price would need to account for all the tokens that the company doesn't get paid for.
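To put rough numbers on that pass-through, here's a back-of-the-envelope sketch; the cost and refund figures are invented for illustration:

```python
# Back-of-the-envelope: if a provider refunds tokens judged "bad",
# the listed price has to absorb the refund rate, the same way retail
# prices absorb spoilage. Both numbers below are assumptions.
base_cost_per_1k = 0.01   # provider's cost to serve 1k tokens (assumed)
refund_rate = 0.20        # fraction of tokens refunded as mistakes (assumed)

# The provider only gets paid for (1 - refund_rate) of what it serves,
# so the break-even price per *paid* token is inflated accordingly.
price_per_1k = base_cost_per_1k / (1 - refund_rate)
print(f"${price_per_1k:.4f} per 1k good tokens")  # -> $0.0125 per 1k good tokens
```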
I think so. IMO, at this point, AI systems should also be using expert/rule systems to validate their output to avoid bad/obvious mistakes. In ambiguous/complex cases, I don't think so, but in certain circumstances, the output is ridiculous and could have been caught by a relatively simple expert system/rules engine, likely something the AI itself could have helped build.
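As a minimal sketch of what such a rules layer could look like (the rules and names here are invented for illustration, not anything a vendor actually ships):

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    violated: Callable[[str], bool]  # True when the output breaks the rule

# Toy rules for screening obviously-bad coding output (all hypothetical).
RULES = [
    Rule("empty_output", lambda out: out.strip() == ""),
    Rule("placeholder_left_in", lambda out: bool(re.search(r"\bTODO\b|<your code here>", out))),
    Rule("apology_loop", lambda out: out.lower().count("sorry") > 3),
]

def validate(output: str) -> list[str]:
    """Return names of violated rules; an empty list means the output passes."""
    return [rule.name for rule in RULES if rule.violated(output)]

print(validate("def add(a, b):\n    return a + b") or "passes")  # -> passes
```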
Interesting, you have just identified a potential market distinction. First we need a group (a la Consumer Reports) to evaluate different services. Then different services would be motivated to perform the sub-agent verification automatically as a Competitive Advantage.
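Roughly, the verification loop being described might look like this; draft and judge are hypothetical stand-ins for whatever API a given service exposes:

```python
import random
random.seed(0)  # make the toy example reproducible

def draft(prompt: str) -> str:
    # Stand-in for the main model call (hypothetical).
    return f"answer to: {prompt}" if random.random() > 0.3 else "sorry about that"

def judge(prompt: str, candidate: str) -> bool:
    # Stand-in for a verifier sub-agent or rules engine (hypothetical).
    return "sorry" not in candidate.lower()

def answer_with_verification(prompt: str, max_attempts: int = 3) -> str:
    # Only a draft that passes the judge is returned (and billed).
    for _ in range(max_attempts):
        candidate = draft(prompt)
        if judge(prompt, candidate):
            return candidate
    raise RuntimeError("no draft passed verification; don't bill the user")

print(answer_with_verification("sum a list in Python"))
```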
Just parse responses for "sorry about that"!
It's interesting on the grounds of aligning incentives.
It's not interesting because it suggests humans are still in the loop of some slow-cycle improvements. That'd never get past any board. In fact, selection of model modes implies it's your responsibility, so that meal was scraped into your flowerpot years ago.
I'd say fat chance.
I think they are already doing it on a case-by-case basis, but the support experience is worse.
I mean, “mistakes” can be hard to define. IMHO there is a shared area of responsibility between the LLM, the LLM user, and the code itself.
Did it make a mistake because it didn’t follow instructions properly or hallucinated some content?
Did it make a mistake because the prompt was unclear/open to interpretation or plain wrong?
Did it make a mistake because it lacked some context? Or had too much context and started getting confused?
Is not handling edge cases automatically when that was not requested a mistake?
I am not just trying to defend LLMs; in many cases they make obvious mistakes and just don’t follow my arguably clear instructions properly. But sometimes it is not so clear cut. Maybe I didn’t link a relevant file (you can argue it could have looked for it), maybe my prompt just wasn’t that clear, etc.
Probably not, but they should be more explicit about the usage, not just "you've used up 5%".
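For instance, an itemized record per request would be more explicit than a single percentage (field names invented for illustration; no provider necessarily reports this way):

```python
usage_report = {
    "request_id": "req_123",
    "input_tokens": 1842,
    "output_tokens": 676,
    "cached_tokens": 1200,
    "retry_tokens": 310,     # burned on failed/regenerated attempts
    "quota_used_pct": 5.0,   # the only number users typically see today
}
for field, value in usage_report.items():
    print(f"{field:>16}: {value}")
```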
LLMs hallucinate. This is known.
If you choose to use them, you go in knowing they need help to be accurate. You clearly know how to use the tools to reach the accuracy you desire, but asking for that usage to be free seems to be based on a false premise. There has never been an expectation of accuracy in the first place when it comes to LLM output.
No, and you should know that no man or machine writes bug- or mistake-free code. You are paying for tokens (electricity and cooling), not for what those tokens represent. How would you define mistakes in non-code tasks?
Would you give money back to your employer when you make a mistake?