
Universal processor startup goes for split tape-out
The company’s timetable has slipped from earlier reports in 2018 and 2019, but Tachyum (Santa Clara, Calif.) now reports that the master chip layout has been completed and about 90 percent of the physical design has been verified.
The design targets TSMC’s EUV-enabled 7nm FinFET manufacturing process, and Tachyum CEO Radoslav Danilak said he wants to have an FPGA-board emulation of the design in 3Q20, followed by tape-out before the end of the year. Samples would be with customers in 1Q21, followed by initial production in 2Q21 and volume production in 2H21.
Given that silicon wafers on a leading-edge process can spend three months or more just moving through a wafer fab, the timing still seems breathtakingly fast. “It’s tight, but not impossible. That’s why we will be going for a split tape-out,” said Danilak.
The fab will be able to start on the bottom layers and then receive information about the upper metal connections at a later date, a method used to overlap design engineering and manufacturing cycles. It is probably not something that TSMC would do for every customer, but Danilak has built up a track record in previous roles at the likes of Nvidia, SandForce and Skyera.
The Prodigy processor is aimed at the data center, where it intends to beat the competition by being a Universal Processor: one that can handle general-purpose and high-performance computing workloads in the style of x86 and ARM, as well as artificial intelligence (AI), deep machine learning (ML), explainable AI, biological AI and other AI disciplines, all within a single chip.
Considerable energy consumption, resource utilization and cost benefits accrue from being able to simplify the architecture and network of the data center, Danilak argued, and in addition Prodigy will offer scalability and power-performance-area (PPA) benefits at the device level.
One die, four SKUs
Tachyum has specified four instantiations of Prodigy, from 16 cores at a 2GHz clock frequency up to 128 cores at 4GHz. The specs are based on a single 64-core die that will be fault-sorted and frequency-binned for the lower-spec chips, with the highest spec achieved with a two-die component.
The core architecture is a combination of RISC, CISC and VLIW [very long instruction word], with the multiple cores connected via a synchronous mesh network. The ALU pipeline depth is a relatively modest nine stages, and there is some branch prediction and out-of-order execution, achieved mainly through compiler support, Danilak said. This keeps the hardware simpler and power consumption lower.
In addition to conventional data types (8-, 16- and 32-bit integers and the standard IEEE floating-point formats), Prodigy also supports bfloat16, a truncation of float32 to its first 16 bits, giving 8 bits of exponent and 7 bits of mantissa, as used by Intel in its now-defunct Nervana processors (see Intel to launch “commercial” Nervana NN processor in 2019). Prodigy also has its own 8-bit floating-point data type with 3 bits of mantissa and 5 bits of exponent. This restricts dynamic range, which is not necessarily a problem in neural network calculations, but allows Prodigy to handle multiple FP8 numbers simultaneously using 32-bit registers and data paths for enhanced efficiency. Prodigy also handles a 4-bit integer data type, Danilak said.
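For readers unfamiliar with these reduced-precision formats, the sketch below (a generic Python illustration, not Tachyum code, and making no assumptions about Prodigy’s ISA) shows how bfloat16 is simply the top 16 bits of a float32, and how four 8-bit values can be packed into one 32-bit word in the way the FP8 argument suggests.

# Generic illustration of the reduced-precision formats discussed above.
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # Truncate a float32 to its upper 16 bits: sign, 8-bit exponent, 7-bit mantissa.
    f32_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return f32_bits >> 16

def bfloat16_bits_to_float32(b16: int) -> float:
    # Re-expand a bfloat16 bit pattern to float32 by zero-filling the lost mantissa bits.
    return struct.unpack("<f", struct.pack("<I", b16 << 16))[0]

def pack_fp8x4(a: int, b: int, c: int, d: int) -> int:
    # Pack four 8-bit values into one 32-bit word, as a 32-bit data path could carry.
    return a | (b << 8) | (c << 16) | (d << 24)

pi = 3.14159265
b16 = float32_to_bfloat16_bits(pi)
print(hex(b16), bfloat16_bits_to_float32(b16))   # 0x4049 3.140625
print(hex(pack_fp8x4(0x11, 0x22, 0x33, 0x44)))   # 0x44332211

The round-trip through bfloat16 loses only the low 16 mantissa bits, which is why 3.14159265 comes back as 3.140625: the dynamic range of float32 is preserved while precision is reduced.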
Tachyum is also making its TPU [Tachyum Processing Unit] available for license. In the data center, the processors will be used for training neural networks, but at the edge a licensed core could be included in ASICs to perform inferencing.
Tachyum is including extensive communications links on its chip, and the company said that simulation of multiple cores with DDR4/DDR5 DRAM controllers, PCIe 5.0, 112Gbps SerDes, USB, GPIO, PLLs and various I/Os indicates the die size will be within design goals and the top-level clock frequency could be better than expected.
Flagship
The 64-core flagship product is expected to outperform the fastest Xeon processors on data center workloads while consuming one tenth of the power. In addition, Tachyum claims Prodigy can switch workloads and outperform Nvidia’s fastest GPUs on neural network AI training and inference.
It should be noted that Nvidia has just announced its 7nm A100 GPU, which it claims can unify AI training and inference and boost performance by up to 20x over its predecessors. Nvidia also describes the A100 as a universal workload accelerator in that it can also perform data analytics, scientific computing and cloud graphics. The A100 is in full production and shipping to customers worldwide, Nvidia said, which may explain Tachyum’s desire for haste.
To address data centers’ legacy code in compiled x86 and ARM form, Prodigy will use software emulation of the x86 and ARMv8 instruction sets. The emulator lets Prodigy run x86 and other binaries at about 50 percent of native speed, which might be seen as a problem since it erodes at least part of Prodigy’s claimed advantage over the competition. But Danilak claims that data center purchasers will be savvy enough to realize this is enough to get them going until they recompile applications into native Prodigy code and enjoy a speed-up and, more importantly, a reduction in power.
Danilak co-founded Tachyum Inc. in 2016. The company is headquartered in Santa Clara, California, with a major engineering site in Bratislava, Slovak Republic, where Danilak was born. The staff of about 60 people is roughly split equally between the two sites: Bratislava is responsible for software and AI development, while Santa Clara is where the chip development engineering resides.
Tachyum announced the raising of $25 million in Series A equity investment in July 2019, led by IPM Group and with the participation of the Slovak government. At the time Tachyum said it would bring its chip to market in 2H20, so clearly there has been some slippage over the last few months.
Danilak agreed that Tachyum will need to raise more money in 2020. The cost of a complete mask set at 7nm alone is on the order of tens of millions of dollars.
Related links and articles:
News articles:
Intel to launch “commercial” Nervana NN processor in 2019
ARM invests in cloud processor hopeful
Funds flood hardware startups, SambaNova raises $250 million
Graphcore nears ‘double-unicorn’ status with extra funding
NovuMind benchmarks tensor processor
Intel pays $2 billion for AI chip firm
