In the paper presented at IEDM 2018, the researchers detail a compact RRAM layout that can be integrated directly in the back end of line (BEOL) of a 130 nm CMOS process, on top of the fourth metal layer, slashing silicon area by a factor of 8 compared with conventional 16-transistor ternary content-addressable memories (TCAMs).
TCAM circuits provide a way to search large data sets using masks that indicate ranges, making these circuits useful for complex routing and big data applications, where an exact match is rarely necessary. With TCAMs, stored information can be searched by its content, as opposed to classic memory systems in which a memory cell’s stored information is retrieved by its physical address. This shortens search times dramatically, but because of their relatively cumbersome architecture (16 CMOS transistors per cell), TCAMs’ storage capacity is often limited to tens of Mb in standard memory structures in order to save valuable silicon real estate.
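To make the idea of searching by content with masks concrete, here is a minimal software sketch of ternary matching. It is only an illustration, not the circuit described in the paper: the function names, the "X" wildcard symbol and the 4-bit table are assumptions chosen for the example, and a real TCAM compares all stored words in parallel in hardware rather than in a loop.

```python
from typing import List

WILDCARD = "X"  # hypothetical symbol for the "don't care" state


def ternary_match(key: str, stored: str) -> bool:
    """Return True if every stored digit equals the key bit or is a wildcard."""
    if len(key) != len(stored):
        raise ValueError("key and stored word must have the same length")
    return all(s == WILDCARD or s == k for k, s in zip(key, stored))


def tcam_search(key: str, table: List[str]) -> List[int]:
    """Return the indices of all stored words that match the key.

    Hardware compares every word in parallel; this loop only models the result.
    """
    return [i for i, stored in enumerate(table) if ternary_match(key, stored)]


# Hypothetical 4-bit table: entry 0 covers the whole range 1000 to 1011.
table = ["10XX", "1010", "0XXX"]
print(tcam_search("1011", table))  # [0]
print(tcam_search("1010", table))  # [0, 1]
print(tcam_search("0111", table))  # [2]
```

The lookup returns where a matching word is stored, given its content, which is the reverse of an address-based read. A single masked entry such as "10XX" stands in for four exact entries, which is why masks are convenient for range-style lookups like routing.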
By replacing the SRAM cells in TCAM circuits with resistive RAM (RRAM), the researchers reduced each cell to two transistors (2T) and two RRAM devices (2R). In addition, they were able to fabricate the RRAMs on top of the transistors, which further reduced the required silicon real estate (hence the 8x shrink).
But coming up with an innovative architecture was not enough, as reliability issues were looming and such RRAMs may not be suitable for every application. The authors explain that circuit reliability depends strongly on the ratio between the ON and OFF states of the memory cells. RRAM-based TCAMs have a relatively low ON/OFF ratio (from 10 to 100), whereas the 16-transistor structure has an ON/OFF ratio of about 10⁵. RRAMs also suffer from limited endurance compared with CMOS transistors.
To identify the right trade-offs and overcome these challenges, the researchers clarified the link between RRAM electrical properties and TCAM performance with extensive characterizations of a fabricated RRAM-based circuit. The limited endurance can be overcome by either decreasing the voltage applied during each search, or increasing the power used to program the TCAM cells beforehand. The research showed a trade-off exists between TCAM performance (search speed) and TCAM reliability (match/mismatch detection and search/read endurance).
Reducing performance in order to increase endurance could be a critical limiting factor in standard TCAM-based applications that require many read/write cycles. However, trading speed for better search/read endurance and a better search margin could be sufficient for retaining network configuration data in a multi-core neuromorphic chip, where long match times (from a few tens to hundreds of μs) are required to be compatible with spike length.
So to improve the search margin and search/read endurance, the researchers adopted strong RRAM programming conditions, low search voltage and a limited word length. This came at the expense of lower performance in terms of longer search latencies and lower write endurance, but as the authors emphasized in their conclusion, multi-core neuromorphic computing architectures would not be affected by these problems and could greatly benefit from the RRAM’s high density.
In such an application, search operations are frequent, but write operations are few and idle times are long. The circuit can therefore take full advantage of the zero standby power consumption of RRAMs without being penalized by the longer search latencies and lower write endurance.
One example cited in the paper is the NeuRAM3 DYNAP-SEL neuromorphic chip (EU H2020 Project running until 2019) whose processing cores comprise multiple TCAM cells per neuron to implement memory-optimized source-address routing schemes. These TCAM cells are typically small and are only programmed at network configuration time.
“Assuming many future neuromorphic computing architectures will have thousands of cores, the non-volatility feature of the proposed TCAM circuits will provide an additional crucial benefit, since users will have to upload all the configuration bits only the first time the network is configured,” explains Denys R.B. Ly, a Ph.D. student at Leti and lead author of the paper.
“Users will also be able to skip this potentially time-consuming process every time the chip is reset or power-cycled.”
CEA-Leti – www.leti-cea.com
ETH Zurich – www.ethz.ch