Soft Machines: promising, not proven
Soft Machines claims it can build a multicore processor in which hardware orchestration logic lets multiple CPU cores act as one, significantly improving instructions-per-cycle (IPC) performance over a single CPU core and allowing multicore processors to run single-threaded code significantly faster.
The company has built a multinational team, raised $175 million and is now close to demonstrating the first real products using its VISC technology. Soft Machines' business model is flexible, offering a mix of chips and licensable CPU IP.
The company’s first test silicon was built in late 2014 in a 28nm process. The details of the original test chip were reported back in 2014. The original demo didn’t silence the skeptics, but was enough to convince a number of investors to put more money in Soft Machines.
This year the company plans to tape out an SoC code named Mojave in a 16nm FinFET process based on a core named Shasta. The company also revealed an ambitious roadmap at The Linley Group Processor Conference in 2015 to deliver a new CPU and SoC every year for the next three years.
While the original 28nm proof-of-concept design was 32-bit, the new Shasta CPU emulates the 64-bit ARMv8 instruction set. Shasta includes two physical cores and can support one to two virtual CPUs. Future versions will support up to four physical cores.
The company's stated goals are to deliver up to 2.5x performance improvement at the same power, or up to 4x energy advantage at the same performance, compared with a standard ARM core. The comparison is between a present-day ARM CPU and a future four-core VISC CPU called Tahoe, scheduled to ship in 2018.
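To put the two stated goals on a common footing, the short sketch below converts each into energy per task using energy = power x time. All values are normalized to a baseline of 1.0, and reading "4x energy advantage" as one quarter of the power at equal performance is an assumption made for illustration, not a definition supplied by Soft Machines.

```python
def energy_per_task(power: float, time: float) -> float:
    """Energy = power x time, in arbitrary normalized units."""
    return power * time


# Baseline core: normalized power 1.0, normalized task time 1.0.
baseline = energy_per_task(power=1.0, time=1.0)

# Goal 1: up to 2.5x performance at the same power,
# so the same task finishes in 1/2.5 of the time.
same_power = energy_per_task(power=1.0, time=1.0 / 2.5)

# Goal 2 (assumed reading): the same performance at one quarter of the power.
same_perf = energy_per_task(power=1.0 / 4.0, time=1.0)

print(f"same power:       {same_power / baseline:.2f}x baseline energy per task")
print(f"same performance: {same_perf / baseline:.2f}x baseline energy per task")
```

Under these assumptions, both goals describe a core that spends well under half the baseline energy on a given task; they differ in whether the saving is taken as speed or as lower power draw.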
Soft Machines recently updated its simulated performance numbers with comparisons to Apple’s A9X and Intel’s Skylake shown in the figure below. The company doesn’t have physical Shasta-based chips for testing, so its data was from simulations. Actual Shasta core RTL is scheduled to be released mid-2016 and the Mojave SoC tape out is expected in Q3 of 2016.
For the sake of an apples-to-apples comparison in its performance charts, Soft Machines normalized the processor configurations (cache sizes) between the ARM Cortex-A72, Apple A9X, Intel Skylake, and its VISC cores. This does not mean that the VISC cores will be configured exactly this way with regard to cache size.
The numbers look impressive, as the single-threaded SPEC benchmark code sees a significant performance jump of roughly 40% even over high-IPC processors like the A9X and Skylake. That said, the SPEC code is statically compiled and easy to optimize for. The tougher task for VISC will be dynamic code like Java. In addition, multicore benchmarks may not benefit as much from the dynamic workload balancing.
The VISC concept is similar to Transmeta's Crusoe processors in that it uses a layer of software that emulates an existing instruction set architecture (ISA). The Transmeta processors emulated x86. The Soft Machines architecture can also emulate multiple ISAs, but the company is most interested in ARMv8.
Transmeta used a complex software layer that was responsible for providing higher performance by optimizing code that runs on simple CPU hardware cores. Soft Machines uses a thinner software layer that does quick ISA conversion for compatibility. The chips get nearly all of their performance gains from the hardware architecture.
The Soft Machines simulations were limited to 2.5 GHz peak rates, which indicates a short pipeline was chosen for VISC. But clock speed is no longer the sole measure of performance, so the 2-2.5 GHz range is considered acceptable for most mobile and power-constrained processors.
The architecture gets its performance lift when a single thread can use the resources of two or more VISC cores, making the cores act like one very wide core. A global front-end inside the VISC cores allocates and packs VISC instructions to each core based on dynamic load balancing. This step inside the hardware is a key ingredient of Soft Machines' secret sauce.
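A toy model can illustrate the idea of a global front-end spreading one thread across two cores. The sketch below is not Soft Machines' algorithm, which has not been published in this detail; the `Core` class, the four-slot issue width and the "most free slots" policy are all hypothetical choices made for illustration, and the model ignores instruction dependencies entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """A toy physical core with a fixed number of issue slots per cycle."""
    name: str
    slots: int  # hypothetical issue width, not a VISC specification
    issued: list = field(default_factory=list)

    def free(self) -> int:
        return self.slots - len(self.issued)


def dispatch_cycle(ready, cores):
    """Assign ready instructions, one cycle's worth, to the least-loaded core.

    Returns the instructions that did not fit and must wait for a later cycle.
    """
    for core in cores:
        core.issued.clear()
    leftover = []
    for instr in ready:
        # Pick the core with the most free slots (ties go to the first core).
        target = max(cores, key=lambda c: c.free())
        if target.free() == 0:
            leftover.append(instr)
        else:
            target.issued.append(instr)
    return leftover


cores = [Core("core0", slots=4), Core("core1", slots=4)]
thread = [f"op{i}" for i in range(6)]  # one thread, six independent ops
leftover = dispatch_cycle(thread, cores)
per_core = {c.name: len(c.issued) for c in cores}
print(per_core, "leftover:", leftover)
```

In this model a single four-slot core would need two cycles for the six operations, while the pair issues them in one. Real code has dependencies between instructions, which is one reason the gain from a second core falls short of 2x.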
This is a big differentiator from other processors, which schedule a thread onto only one CPU core. Dynamically allocating resources from two cores allows single-threaded code to execute faster: not quite 2x faster, but significantly faster.
The VISC architecture is also flexible enough to allow one fat thread to schedule part of the resources from a second VISC core, using roughly 1.5 cores. A second thin thread can still execute using the remaining resources of the second VISC core.
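The fat-thread and thin-thread split can be pictured as a per-core budget of issue slots. The following is a minimal sketch under stated assumptions: the function name `split_cores`, the four slots per core and the idea of measuring a thread's share in "cores" are illustrative inventions, not details of the VISC hardware.

```python
def split_cores(slots_per_core: int, n_cores: int, fat_cores: float):
    """Give a fat thread `fat_cores` worth of slots; the thin thread gets the rest.

    Returns one dict per physical core showing how its slots are divided.
    """
    fat_left = round(slots_per_core * fat_cores)
    plan = []
    for core in range(n_cores):
        fat = min(slots_per_core, fat_left)  # fill whole cores first
        fat_left -= fat
        plan.append({"core": core, "fat": fat, "thin": slots_per_core - fat})
    return plan


# One fat thread using roughly 1.5 cores of a hypothetical 2 x 4-slot design.
plan = split_cores(slots_per_core=4, n_cores=2, fat_cores=1.5)
for row in plan:
    print(row)
```

Here the fat thread owns all of the first core and half of the second, leaving two slots on the second core for the thin thread.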
There is about 5% overhead involved in using a software layer to translate ARMv8 instructions to the native VISC instruction set, Soft Machines previously said. Still, by dynamically using the second core, the VISC architecture gets more performance with less power than an inflexible ultra-long-instruction-word architecture.
Each VISC core has a relatively high IPC, giving each core more performance at lower clock speeds, which saves power. In addition, some power savings come from the simpler microarchitecture and shorter pipelines.
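The trade-off between IPC and clock speed can be put in numbers with the standard relation throughput = IPC x clock frequency. The IPC and clock values below are made-up placeholders chosen for round arithmetic; they are not figures from Soft Machines or from any competing processor.

```python
def throughput_gips(ipc: float, clock_ghz: float) -> float:
    """Billions of instructions per second = IPC x clock in GHz."""
    return ipc * clock_ghz


def clock_needed_ghz(target_gips: float, ipc: float) -> float:
    """Clock a core with the given IPC needs to reach a target throughput."""
    return target_gips / ipc


# Hypothetical narrow core: lower IPC, higher clock.
baseline = throughput_gips(ipc=1.5, clock_ghz=3.0)

# Hypothetical wider core with 50% higher IPC: what clock matches the baseline?
wide_clock = clock_needed_ghz(baseline, ipc=2.25)

print(f"baseline throughput: {baseline} billion instructions/s")
print(f"wider core matches it at {wide_clock} GHz")
```

A core that reaches the same throughput at a lower clock can generally run at a lower supply voltage, and dynamic power falls with both frequency and the square of voltage, which is where the power saving described above would come from.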
Past Soft Machines demos have shown promise, and the new simulation data shows significant IPC performance in the sweet spot for mobile computing. With the quality of the backers and the team, it’s tempting to give them the benefit of the doubt.
However, until the Shasta SoC can be tested running multiple workloads by a third party, it’s hard for the company to convince the doubters. For some, the technology appears to be too good to be true.
I hope Soft Machines turns out to be the real deal. We need some radical new innovation as present CPU designs offer mostly incremental improvements and Moore’s Law is drawing to an end. And it comes just as microprocessor design was getting boring.
About the author:
Kevin Krewell is Principal Analyst at Tirias Research – www.tiriasresearch.com