Competition to convert legacy C code to Rust automatically with GenAI
US research agency DARPA has started an ambitious programme to automate the translation of legacy C code to the inherently safer Rust programming language using large language models (LLMs) and generative AI.
The Translating All C to Rust (TRACTOR) programme at DARPA aims to identify LLMs that can produce Rust code of the same quality and style that a skilled Rust developer would write, thereby eliminating the entire class of memory safety vulnerabilities in C programs.
Memory safety vulnerabilities are the most prevalent type of disclosed software vulnerability and arise in two primary ways. First, programming languages like C allow programmers to manipulate memory directly, making it easy to accidentally introduce errors that let a seemingly routine operation corrupt the state of memory.
Second, memory safety issues can arise when a program exhibits undefined behaviour. This happens when the programming language standard provides no specification or guidance on how the program should behave under conditions it does not explicitly define.
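To make the two cases concrete, here is a minimal sketch in Rust of an out-of-range read and a signed integer overflow, both of which are undefined behaviour in C. The function names `read_at` and `add_checked` are hypothetical and chosen only for illustration; the sketch uses standard library calls and is not drawn from the TRACTOR programme.

```rust
/// Reads one element from a slice.
/// In C, `buf[index]` with an out-of-range index is undefined behaviour;
/// here the standard `get` method returns `None` instead of touching memory
/// outside the slice.
fn read_at(buf: &[i32], index: usize) -> Option<i32> {
    buf.get(index).copied()
}

/// Adds two signed integers.
/// Signed overflow is undefined behaviour in C; `checked_add` reports it
/// as `None` so the caller has to decide what to do.
fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}
```

In both functions the failure case is part of the return type, so the compiler will not let a caller use the result without first handling `None`.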
Projects such as CHERI in the UK have been developing a hardware architecture for ARM, RISC-V and x86 chips to address this challenge. While memory-safe programming languages can eliminate memory safety vulnerabilities, the challenge has been rewriting legacy code at a scale that matches the vastness of the problem.
A cultural shift toward the programming language Rust and recent breakthroughs in machine learning techniques, such as LLMs, have created an environment that may lend itself to a new class of solutions.
“You can go to any of the LLM websites, start chatting with one of the AI chatbots, and all you need to say is ‘here’s some C code, please translate it to safe idiomatic Rust code,’ cut, paste, and something comes out, and it’s often very good, but not always,” said Dan Wallach, DARPA programme manager for TRACTOR. “The research challenge is to dramatically improve the automated translation from C to Rust, particularly for program constructs with the most relevance.”
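As a rough illustration of the gap between a translation that merely compiles and one that is "safe idiomatic Rust", consider the sketch below. The C function in the comment and both Rust functions (`sum_raw`, `sum`) are hypothetical examples, not output from any TRACTOR tool or from a particular LLM.

```rust
// Hypothetical C original, shown for comparison:
//
//     int sum(const int *values, size_t len) {
//         int total = 0;
//         for (size_t i = 0; i < len; i++) total += values[i];
//         return total;
//     }

/// Literal transliteration: keeps the raw pointer and length pair, so the
/// caller must still guarantee that `values` points to `len` valid integers.
/// The memory safety burden has moved to Rust but has not gone away.
unsafe fn sum_raw(values: *const i32, len: usize) -> i32 {
    let mut total: i32 = 0;
    for i in 0..len {
        // SAFETY: relies on the caller's promise described above.
        total = total.wrapping_add(unsafe { *values.add(i) });
    }
    total
}

/// Idiomatic version: a slice carries its own length, so no `unsafe` is
/// needed, and overflow is reported as `None` rather than left undefined.
fn sum(values: &[i32]) -> Option<i32> {
    values.iter().try_fold(0i32, |acc, &v| acc.checked_add(v))
}
```

The first function preserves the C interface, while the second changes the signature to express the same intent with types the compiler can check. Choosing when such an interface change is acceptable is one reason the translation is harder than a line-by-line rewrite.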
Wallach anticipates proposals that include novel combinations of software analysis, such as static and dynamic analysis, and large language models. The programme will host public competitions throughout the effort to test the capabilities of the LLM-powered solutions.
“Rust forces the programmer to get things right,” said Wallach. “It can feel constraining to deal with all the rules it forces, but when you acclimate to them, the rules give you freedom. They’re like guardrails; once you realize they’re there to protect you, you’ll become free to focus on more important things.”
DARPA will sponsor a Proposers Day on Aug. 26, 2024, which participants can attend in person or virtually. Participants must register by Aug. 19, 2024. Details and registration information are available at SAM.gov.