AI recognizes correct code, fixes programming errors
The researchers tested their system on a benchmark set of programming errors, drawn from real open-source applications, that had been compiled to evaluate automatic bug-repair systems. While earlier systems were able to repair one or two of the bugs, the MIT machine-learning system repaired between 15 and 18, depending on whether it settled on the first solution it found or was allowed to run longer.
“One of the most intriguing aspects of this research is that we’ve found that there are indeed universal properties of correct code that you can learn from one set of applications and apply to another set of applications,” explained Martin Rinard, professor of electrical engineering and computer science.
These results could not only lead to automatic bug-repair tools but could also be applied across other engineering domains, according to Rinard.
“If you can recognize correct code, that has enormous implications across all software engineering. This is just the first application of what we hope will be a brand-new, fabulous technique.”
The research was presented in a paper by graduate student Fan Long at the latest Symposium on Principles of Programming Languages. The paper describes how Long wrote a computer script to automatically extract both the uncorrected code and the patches for 777 errors in eight common open-source applications stored in the online repository GitHub.
To initiate their machine-learning system, Long and Rinard first had to select a “feature set” that the system would analyze. The researchers concentrated on values stored in memory: either variables, which can be modified during a program’s execution, or constants, which can’t. They identified 30 prime characteristics of a given value: being involved in an operation (addition, multiplication, comparison); being local or global; being variable or not; and so on.
They then wrote a computer program to evaluate all the possible relationships between these characteristics in successive lines of code, finding over 3,500 such relationships in their feature set. Their machine-learning algorithm then tried to determine what combination of features most consistently predicted the success of a patch.
“All the features we’re trying to look at are relationships between the patch you insert and the code you are trying to patch,” Long explained. “Typically, there will be good connections in the correct patches, corresponding to useful or productive program logic. And there will be bad patterns that mean disconnections in program logic or redundant program logic that are less likely to be successful.”
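As a rough illustration of what such relationship features might look like, the Python sketch below pairs the characteristics a value has inside a patch with the characteristics the same value has in the surrounding code, then scores the result with a set of learned weights. Everything in it is an assumption made for illustration: the dictionary layout, the trait labels, the helper names `traits`, `relationship_features` and `score`, and the weights are hypothetical and are not taken from the researchers’ system.

```python
from itertools import product


def traits(value):
    """Return the set of characteristic labels that hold for one value.

    The labels are illustrative stand-ins for the 30 characteristics
    mentioned in the article, not the actual list.
    """
    labels = {
        "constant" if value["constant"] else "variable",
        "local" if value["local"] else "global",
    }
    labels.update("used_in_" + op for op in value["ops"])
    return labels


def relationship_features(patch_values, context_values):
    """Pair each trait of a value in the patch with each trait the
    same-named value has in the code being patched."""
    context_by_name = {v["name"]: v for v in context_values}
    features = set()
    for patch_value in patch_values:
        context_value = context_by_name.get(patch_value["name"])
        if context_value is None:
            continue  # value is not shared with the surrounding code
        for pair in product(traits(patch_value), traits(context_value)):
            features.add(pair)
    return features


def score(features, weights):
    """Sum the learned weight of every feature present (0 if unseen)."""
    return sum(weights.get(feature, 0.0) for feature in features)


# Toy data: the patch compares `length`, the nearby code adds to it.
patch = [{"name": "length", "constant": False, "local": True,
          "ops": ["comparison"]}]
context = [{"name": "length", "constant": False, "local": True,
            "ops": ["addition"]}]

# Hypothetical weights, standing in for what training would produce.
weights = {
    ("used_in_comparison", "used_in_addition"): 1.5,
    ("variable", "variable"): 0.5,
}

features = relationship_features(patch, context)
patch_score = score(features, weights)
```

In a scheme like this, a patch that reuses values in ways the weights have learned to associate with successful patches gets a higher score than one that introduces disconnected or redundant logic.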
Using such machine learning to learn from ‘big code’ could improve or speed up many other programming tasks that are often built into software development suites, such as code completion or reverse-engineering.
In earlier work, Long had developed an algorithm that attempts to repair program bugs by systematically modifying program code. The modified code is then subjected to a suite of tests designed to elicit the buggy behaviour. This approach may find a modification that passes the tests, but it could take a prohibitively long time. Moreover, the modified code may still contain errors that the tests don’t trigger.
Long and Rinard’s machine-learning system works in conjunction with this earlier algorithm, ranking proposed modifications according to the probability that they are correct before subjecting them to time-consuming tests.
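The idea of ranking before testing can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the Prophet implementation: the function `repair`, the candidate conditions, the scores in `model_score` and the test cases are all hypothetical.

```python
def repair(candidates, score, passes_tests):
    """Try candidate patches from most to least promising and return
    the first one that passes the tests, plus how many were tried."""
    ranked = sorted(candidates, key=score, reverse=True)
    for tried, patch in enumerate(ranked, start=1):
        if passes_tests(patch):
            return patch, tried
    return None, len(ranked)


# Hypothetical candidate replacements for a buggy bounds check `i <= n`.
candidates = {
    "i < n": lambda i, n: i < n,
    "i >= 0 and i < n": lambda i, n: 0 <= i < n,
    "i != n": lambda i, n: i != n,
}

# Hypothetical scores standing in for the learned model's output.
model_score = {"i < n": 0.2, "i >= 0 and i < n": 0.7, "i != n": 0.1}

# A small test suite that elicits the bug: index 3 of a 3-item list.
tests = [((0, 3), True), ((3, 3), False), ((2, 3), True)]


def passes_tests(name):
    """Run every test case against the named candidate condition."""
    check = candidates[name]
    return all(check(*args) == expected for args, expected in tests)


best_patch, attempts = repair(candidates, model_score.get, passes_tests)
```

All three toy candidates pass this small test suite, including ones that would wrongly accept a negative index, which is the weakness of test-only validation described above. Ranking means the candidate the model considers most likely to be correct is tested, and accepted, first.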
The researchers tested their system, which they call Prophet, on a set of 69 program errors that had cropped up in eight popular open-source programs. Of those, 19 are amenable to the type of modifications that Long’s algorithm uses; the other 50 have more complicated problems that involve logical inconsistencies across larger swaths of code.
When Long and Rinard configured their system to settle for the first solution that passed the bug-eliciting tests, it was able to correctly repair 15 of the 19 errors; when they allowed it to run for 12 hours per problem, it repaired 18.
That still leaves the other 50 errors in the test set untouched, but Long is working on a machine-learning system that will look at more coarse-grained manipulation of program values across larger stretches of code, in the hope of producing a bug-repair system that can handle more complex errors.
Visit MIT at https://web.mit.edu