
Apple has introduced a scheme it calls dynamic caching to support ray tracing in its latest 3nm M3 microprocessors as it sets up an Ultra battle with Intel in the coming months.
Dynamic caching in the new GPU microarchitecture allocates local memory in hardware in real time. Rather than reserving GPU memory at compile time, each task uses only the exact amount of memory it needs. This increases the average utilisation of the GPU, which significantly increases performance for the most demanding applications and games. Apple says this is an industry first and that it is transparent to developers.
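Apple has not published how dynamic caching is implemented, so the following is only a toy model of the general idea: when each task reserves its worst-case memory up front, fewer tasks fit in local memory than when each reserves only what it uses. The memory size, the per-task figures and the function names are all hypothetical.

```python
# Toy model only: the sizes and the admission policy are illustrative
# assumptions, not a description of Apple's hardware.

def tasks_resident(local_memory_kb, reservations_kb):
    """Count how many tasks fit in local memory, admitting them in order."""
    used = 0
    count = 0
    for need in reservations_kb:
        if used + need > local_memory_kb:
            break
        used += need
        count += 1
    return count


def static_reservations(actual_use_kb):
    """Compile-time style: every task reserves the worst-case amount."""
    worst_case = max(actual_use_kb)
    return [worst_case] * len(actual_use_kb)


def dynamic_reservations(actual_use_kb):
    """Run-time style: every task reserves only what it actually uses."""
    return list(actual_use_kb)


LOCAL_MEMORY_KB = 64                         # hypothetical capacity
ACTUAL_USE_KB = [8, 8, 32, 8, 8, 8, 8, 8]    # hypothetical per-task needs

static_count = tasks_resident(LOCAL_MEMORY_KB, static_reservations(ACTUAL_USE_KB))
dynamic_count = tasks_resident(LOCAL_MEMORY_KB, dynamic_reservations(ACTUAL_USE_KB))
print(static_count, dynamic_count)  # prints "2 5"
```

In this sketch, one memory-hungry task forces every task to reserve 32KB under the static scheme, so only two are resident at once; with per-task allocation, five fit in the same space.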
The chips use unified memory with a capacity of up to 128GB. Apple has not yet specified the bandwidth to local memory or the latency, although the previous generation has a bandwidth of up to 400GB/s.
The GPUs also support hardware-accelerated ray tracing, a first for Apple. Ray tracing models the properties of light as it interacts with a scene, allowing apps to create extremely realistic and physically accurate images, but it requires larger amounts of memory. The new GPU design also brings hardware-accelerated mesh shading for more visually complex scenes.
The move to a 3nm process technology at TSMC has boosted the performance of the devices, with the M3 GPU able to deliver the same performance as M1 using nearly half the power, and up to 65 percent more performance at its peak.
The M3 has 25bn transistors, 5bn more than M2, with a 10-core GPU, an 8-core CPU and up to 24GB of unified memory. The CPU has four performance cores and four efficiency cores and is up to 35 percent faster than M1. This performance lift can mostly be accounted for by the shift from an early 5nm process to 3nm.
The M3 Pro has 37bn transistors, an 18-core GPU, up to 36GB of unified memory and a 12-core CPU with six performance cores and six efficiency cores.
The M3 Max pushes the transistor count up to 92bn with a 40-core GPU. Support for up to 128GB of unified memory allows AI developers to work with even larger transformer models with billions of parameters. The 16-core CPU features 12 performance cores and four efficiency cores and is up to 80 percent faster than M1 Max.
The M3 launch also sets up a showdown with Intel over competing Ultra chips. Apple has previously used 2.5D packaging technology to produce the M2 Ultra, which has 134bn transistors across two dies, and says that its M3 Max is the highest performance chip in the industry.
The anticipated M3 Ultra would have two M3 Max dies connected via the UltraFusion interconnect and a silicon interposer, for 184bn transistors in a package and up to 256GB of local memory.
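The figures for the anticipated M3 Ultra follow from doubling the M3 Max numbers quoted above. A minimal sketch of that arithmetic, assuming the two-die arrangement described here (the variable names are illustrative, and Apple has not announced the part):

```python
# Projected figures for a two-die package built from M3 Max dies.
# These are the article's anticipated numbers, not announced specifications.
M3_MAX_TRANSISTORS_BN = 92   # transistors per M3 Max die, in billions
M3_MAX_MEMORY_GB = 128       # maximum unified memory per M3 Max, in GB
DIES_PER_PACKAGE = 2         # two dies joined over the interposer

ultra_transistors_bn = DIES_PER_PACKAGE * M3_MAX_TRANSISTORS_BN
ultra_memory_gb = DIES_PER_PACKAGE * M3_MAX_MEMORY_GB

print(ultra_transistors_bn, ultra_memory_gb)  # prints "184 256"
```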
Intel is planning to launch its chiplet-based Core Ultra processor, previously called Meteor Lake, on December 14th with a new Arc GPU architecture, a CPU with efficiency and performance cores, and significant local memory in the chip package.
The Apple M3 chips are shipping in the latest generation of the iMac desktop and MacBook Pro laptops.
