Placing hundreds or thousands of processing elements in DRAM, able to perform work for a controlling server CPU, could have a revolutionary impact on how data centers are designed. Simulation results have shown a performance increase of a factor of 10 to 25 compared with traditional server architectures, the company said. In other words, a single server built using Upmem technology and running certain algorithms could perform the work of 10 to 25 servers while consuming less power.
It has long been understood that local processing and extreme parallelism increase processor-memory bandwidth and reduce power consumption – it is part of the promise of FPGAs – but such architectures have rarely been delivered at the system level.
Upmem thinks it can change that and is developing a RISC processor design codenamed DPU (for DRAM processing unit) that is optimized for intensive data processing and compatible with DRAM manufacturing process constraints. DPUs can be used as simple functional co-processing units or as a network co-operating within a massively parallel environment. Typical applications that Upmem claims are suitable for this approach include real-time analytics, pattern matching, database serving and implementation of artificial neural networks.
The company was co-founded by Fabrice Devaux (CTO) and Jean-Francois Roy (COO) – both formerly of Trango Virtual Processors – and serial entrepreneur Gilles Hamou (CEO). To date the company has been funded by the founders and angel investors.
Building on research work done by Devaux prior to the company’s formation, the startup has now finalized its RISC architecture and micro-architecture and has prototyped the design. “Our proof of concept was done on a 4x-nm process, and we reached 600MHz [clock frequency]. This is very promising and demonstrated that we will reach 1GHz with a more recent process,” Roy told eeNews Europe. The architecture is 32-bit and one processor is attached to each 64Mbyte DRAM bank.
Designing logic to sit in a DRAM process presents several constraints, not least routing with only a few metal layers. This means that the same functions take up much more area in a DRAM process than in a CMOS logic process. However, the Upmem team has designed its DPU specifically around the required functionality and the DRAM process constraints while producing an instruction set that remains general purpose, Roy said.
“The fundamental benefit of processing-in-memory is the combination of DRAM and CPU. We attach 1 DPU per DRAM bank. It means 16 cores per 8Gbit DRAM chip. On a 16Gbyte DIMM, we deliver 256 cores, and 8 of them can be added to a standard CPU socket. We end up with a co-processing system of 2048 cores together with 128Gbytes of DRAM per socket,” said Roy.
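The core counts Roy quotes follow directly from the one-DPU-per-64Mbyte-bank rule. The short Python sketch below simply reproduces that arithmetic; the constant and function names are illustrative and not taken from any Upmem material.

```python
# Figures quoted in the article; names are illustrative only.
BANK_MBYTES = 64        # one DPU is attached to each 64-Mbyte DRAM bank
CHIP_GBITS = 8          # capacity of one DRAM chip, in gigabits
DIMM_GBYTES = 16        # capacity of one DIMM, in gigabytes
DIMMS_PER_SOCKET = 8    # DIMMs attached to one CPU socket


def dpus_per_chip():
    """DPUs on one DRAM chip: chip capacity in Mbytes divided by bank size."""
    chip_mbytes = CHIP_GBITS * 1024 // 8   # gigabits -> megabytes
    return chip_mbytes // BANK_MBYTES


def dpus_per_dimm():
    """DPUs on one DIMM: DIMM capacity in Mbytes divided by bank size."""
    return DIMM_GBYTES * 1024 // BANK_MBYTES


def socket_totals():
    """Return (DPU count, DRAM in Gbytes) for a fully populated socket."""
    return dpus_per_dimm() * DIMMS_PER_SOCKET, DIMM_GBYTES * DIMMS_PER_SOCKET


if __name__ == "__main__":
    print(dpus_per_chip(), dpus_per_dimm(), socket_totals())
```

Under these figures a 16Gbyte DIMM corresponds to sixteen 8Gbit chips, which is how 16 cores per chip becomes 256 cores per DIMM.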
However, such a design, with some 2,000 processing elements each handling maybe a dozen threads on behalf of a controlling CPU, would require a major change in software approach. Roy made the point that although compilers and SDKs will be provided for the DPU, it does not need to directly support an operating system because it is designed to run small programs or routines on behalf of the CPU. “It’s a programmable co-processor optimised for data computing,” said Roy.
“The high level application is distributing tasks to the co-processors easily because it knows what DPU has what data. You can see it as a distributed computing system at the server level. The DPUs are independent from each other in terms of code and data, making the solution scalable. All the communication is done through the x86, under the control of the application,” he added.
Roy said Upmem is in talks with all of the major DRAM manufacturers about making the DPU. “They bring the DRAM building blocks, the PHY interface, the facility and the test infrastructure, while Upmem brings the CPU IP and the software stack.”
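The programming model Roy describes can be pictured with a small simulation. The Python sketch below is not the Upmem SDK and uses no real DPU API; `SimulatedDPU`, `Host`, `count_matches` and `total_matches` are hypothetical names, and the DPUs run one after another here, whereas real hardware would run them in parallel. It only illustrates the three properties in the quote: the host application knows which DPU holds which data, each DPU works on its own data alone, and all results travel back through the host.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedDPU:
    """Stand-in for one DPU: owns a private data shard and runs small routines on it."""
    dpu_id: int
    shard: list = field(default_factory=list)

    def run(self, kernel, *args):
        # A DPU sees only its own shard; it never reads another DPU's memory.
        return kernel(self.shard, *args)


def count_matches(shard, needle):
    """Example routine a host might offload: count records containing `needle`."""
    return sum(1 for record in shard if needle in record)


class Host:
    """Stand-in for the application on the controlling CPU."""

    def __init__(self, num_dpus):
        self.dpus = [SimulatedDPU(i) for i in range(num_dpus)]

    def load(self, records):
        # Round-robin placement, so the host always knows where each record lives.
        for i, record in enumerate(records):
            self.dpus[i % len(self.dpus)].shard.append(record)

    def offload(self, kernel, *args):
        # Each DPU computes independently; partial results return to the host.
        return [dpu.run(kernel, *args) for dpu in self.dpus]


def total_matches(records, needle, num_dpus=4):
    """Distribute records, run the routine on every DPU, combine on the host."""
    host = Host(num_dpus)
    host.load(records)
    return sum(host.offload(count_matches, needle))


if __name__ == "__main__":
    reads = ["GATTACA", "CATTAG", "TTTT", "ACGT", "GATT"]
    print(total_matches(reads, "ATT"))
```

Because no simulated DPU ever touches another's shard, adding more of them changes only how the host partitions the data, which is the scalability argument made in the quote.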
Upmem is aiming to reach mass production in 2018 after conducting a beta program in 2017 with its v1 silicon. Roy didn’t say which of the DRAM companies had manufactured its prototype, but it is notable that Micron Technology Inc. (Boise, Idaho) has long researched the implementation of logic in a DRAM process (see Micron’s Automata Exploits Parallelism to Solve Big Data Problems).
Roy said that such developments were on the roadmap of at least two of the top three DRAM makers. “We are working with academic and industrial partners to evaluate the performance benefit of our solution using the SDK and the cycle-accurate simulator. We see a 10x to 25x speed up for many algorithms including in-memory database, graph analytics or genomics. Not a surprise, they are all memory bounded on x86. However, it’s a game changer if one server equipped with our DIMMs can deliver the workload of 20 servers,” he said.
Upmem partners include CEA-Tech Leti, INRIA and Dolphin Integration in Grenoble.