Not sure if I got the question right, but there are benchmarks like SWE pro and such. There's a whole other debate about whether you can trust them and whether the labs are training on those benchmarks, but that's one way to measure it.
Other than benchmarks, I'd say it's your own test suite.
Why would a metric for code quality be different depending on how the code got into a file? In other words, if there were a good measure, wouldn't it already exist for us? How do we measure the quality of our own code?
I would never trust benchmarks tbh, most of the new model releases do benchmaxxing.