Review: the document starts strong, with a methodology and numbers. It covers three approaches: Copilot code assistance, Llama3 fine-tuning on their codebase, and RAG on documentation. The first is the only one supported by numbers, with 27% of suggested code being accepted by developers. Although they set up a control group, they fail to relate the LLM findings to it.
Fine-tuning is suggested as a way to improve tasks like tooling upgrades, but no concrete numbers are offered.
Lastly, RAG on documentation. The RAG pipeline uses a simple system prompt to improve uncertain responses (a rough sketch of that pattern is below). They're tracking meeting and support requests but don't show any results. They mention frustration with nonsensical answers, and use an RL-from-human-feedback technique to improve responses. No numbers are offered.
Overall, a simple overview of what they tried, but the strong methodological start isn't reflected in the numbers reported later on.
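For anyone curious what that RAG setup looks like in practice, here's a rough sketch of the retrieve-then-prompt pattern being described. Everything in it is my own guess at a minimal version, not the report's actual code: the prompt wording, the naive keyword retrieval, and all names are placeholders.

```python
# Hypothetical sketch of a docs-RAG setup with a system prompt that tells
# the model to admit uncertainty instead of guessing. Not from the report.

SYSTEM_PROMPT = (
    "Answer using only the documentation excerpts provided. "
    "If the excerpts do not contain the answer, say you are unsure "
    "and point to who or what to consult instead of guessing."
)

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Rank doc snippets by crude keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> list[dict]:
    """Assemble the chat messages that would be sent to the model."""
    context = "\n\n".join(retrieve(query, docs))
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Documentation:\n{context}\n\nQuestion: {query}"},
    ]

if __name__ == "__main__":
    docs = [
        "Deploys run through the internal `ship` CLI; see the release runbook.",
        "On-call rotations are managed in the scheduling tool, not in the wiki.",
    ]
    for msg in build_prompt("How do I deploy the billing service?", docs):
        print(msg["role"].upper(), ":", msg["content"])
```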
What about the overall value of the code itself? Are the developers making a similar amount of money from their code as they were before the advent of LLMs?
> We also found good levels of accuracy: the generated documents were 70% accurate, and the generated code was at 60%.
How is accuracy measured here? Is a document a single file? Is the LLM generating code and some separate kind of “document” such that “code” accuracy can be 60% while “document” accuracy can be 70%?
27% code acceptance, and generated documents that are 70% accurate, are 'good' outcomes in the following sense:
> Work at large companies has a tendency to blow up, run far behind schedule, then ultimately limp past the finish line in a maimed state.
> One of my friends talks about how, when faced by his first failed project on a team, a management consultant responded to all critical self-reflection with "But you'd say that, overall, this was a success?" in a desperate bid to generate a misleading quote to put into a presentation to the board.
https://ludic.mataroa.blog/blog/tossed-salads-and-scrumbled-...
> We also found good levels of accuracy: the generated documents were 70% accurate, and the generated code was at 60%.
I am available to work for you at good levels of accuracy, asking mid 6 figures + bonus + stock options.
Indeed. I'm genuinely shocked to discover they consider 60-70% accuracy "good". I call it "awful".
Close only counts in horseshoes, hand grenades, and LLMs apparently.
> We also found good levels of accuracy: the generated documents were 70% accurate, and the generated code was at 60%.
I mean, define 'good'. Yikes.