Xilinx launches its most powerful FPGA card

The Xilinx Alveo U55C accelerator card uses the XCU55 UltraScale+ FPGA with 16Gbytes of high speed HBM2 memory for data centres and HPC
By Nick Flaherty


Xilinx has launched its most powerful FPGA accelerator card for data centres and high performance computing (HPC) systems.

The Alveo U55C uses the 16nm XCU55 UltraScale+ FPGA system-on-chip with 16Gbytes of high speed HBM2 memory and 16 lanes of Gen3 PCI Express or 8 lanes of Gen4. This is a single-slot full height, half length (FHHL) form factor with a low 150W max power, compared to the previous U280 which took up two slots.

The $4,395 card is purpose-built for HPC and big data workloads and works with the new Xilinx RoCE v2-based clustering technology. This RDMA over Converged Ethernet (RoCE) allows hundreds of cards to be combined in a cluster with a standard software API and no custom hardware.

“Scaling out Alveo compute capabilities to target HPC workloads is now easier, more efficient and more powerful than ever,” said Salil Raje, executive vice president and general manager, Data Centre Group at Xilinx. “Architecturally, FPGA-based accelerators like Alveo cards provide the highest performance at the lowest cost for many compute-intensive workloads. By introducing a standards-based methodology that enables the creation of Alveo HPC clusters using a customer’s existing infrastructure and network, we’re delivering those key advantages at massive scale to any data centre. This is a major leap forward for even broader adoption of Alveo and adaptive computing throughout the data centre.”

By using RoCE v2 and data centre bridging, coupled with 200 Gbit/s bandwidth, the API-driven clustering solution enables an Alveo network that competes with InfiniBand networks in performance and latency, with no vendor lock-in. MPI integration allows HPC developers to scale out Alveo data pipelining from the Xilinx Vitis unified software platform, which can programme the FPGA without using a hardware description language (HDL).

This allows software developers and data scientists to use the adaptive computing of the FPGA through high-level programmability of both the application and cluster. The major AI frameworks like PyTorch and TensorFlow are supported, as well as high-level programming languages like C, C++ and Python, allowing developers to build domain solutions using specific APIs and libraries, or use Xilinx software development kits, to easily accelerate key HPC workloads within an existing data centre.


CSIRO, Australia’s national research organization, is using the Alveo U55C cards for signal processing in the Square Kilometre Array radio telescope, the world’s largest radio astronomy antenna array. Deploying the Alveo cards as network-attached accelerators with HBM allows for massive throughput at scale across the HPC signal processing cluster. The Alveo accelerator-based cluster allows CSIRO to tackle the massive compute task of aggregating, filtering, preparing and processing data from 131,000 antennas in real time. The 460Gbyte/s of HBM2 bandwidth across the signal processing cluster is served by 420 Alveo U55C cards fully networked together across P4-enabled 100Gbit/s switches. The Alveo U55C cluster delivers an overall throughput of 15Tbit/s in a compact, power- and cost-efficient footprint. CSIRO is now completing an example Alveo reference design to help other radio astronomy groups and adjacent industries achieve the same success.
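As a rough sanity check on the CSIRO figures, the short Python sketch below divides the quoted totals by the number of cards. It assumes the load is spread evenly across all 420 cards and that the quoted aggregate throughput is 15 terabits per second; neither assumption is confirmed by CSIRO or Xilinx, and the names used are purely illustrative.

```python
# Back-of-envelope division of the quoted cluster totals.
# Assumption: work is spread evenly over every card (not stated in the article).
CARDS = 420
ANTENNAS = 131_000
TOTAL_THROUGHPUT_TBIT_S = 15  # assumed to mean terabits per second


def per_card_share(total, cards=CARDS):
    """Return the even share of a cluster-wide total for one card."""
    return total / cards


gbit_per_card = per_card_share(TOTAL_THROUGHPUT_TBIT_S * 1000)
antennas_per_card = per_card_share(ANTENNAS)

print(f"{gbit_per_card:.1f} Gbit/s per card")        # 35.7 Gbit/s per card
print(f"{antennas_per_card:.0f} antennas per card")  # 312 antennas per card
```

Under those assumptions each card would handle roughly 36 Gbit/s, comfortably inside a single 100 Gbit/s switch port.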

Ansys LS-DYNA crash simulation software is used by nearly every automotive company in the world. The design of safety and structural systems hinges on the performance of these models, which reduce the cost of physical crash testing through computer-aided finite element method (FEM) simulations. FEM solvers are the primary algorithms driving simulations with hundreds of millions of degrees of freedom. These enormous algorithms can be broken down into more rudimentary solvers such as preconditioned conjugate gradient (PCG), sparse matrix and incomplete Cholesky conjugate gradient (ICCG) solvers. By scaling out across many Alveo cards with hyperparallel data pipelining, LS-DYNA can accelerate performance by more than 5X in comparison to x86 CPUs. This results in more work per clock cycle in an Alveo pipeline, with LS-DYNA customers benefiting from game-changing simulation times.
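For readers unfamiliar with PCG, the following is a minimal textbook sketch of a preconditioned conjugate gradient solver in plain Python. It is not the LS-DYNA or Alveo implementation: the Jacobi (diagonal) preconditioner, the dense list-of-lists matrix and the function name `pcg` are all assumptions chosen for illustration, whereas production FEM codes work on large sparse matrices.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A using
    conjugate gradients with a Jacobi (diagonal) preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


# Small symmetric positive-definite example; the exact solution is [1/11, 7/11].
solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

Each iteration is dominated by one matrix-vector product and a few dot products, which is the kind of regular, data-parallel work that lends itself to pipelining on accelerator hardware.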

TigerGraph, provider of a leading graph analytics platform, is using multiple Alveo U55C cards to cluster and accelerate the two most prolific algorithms that drive graph-based recommendation and clustering engines. Graph databases are a disruptive platform for data scientists. Graphs take data from silos and bring focus to the relationships between data. The next frontier for graph is finding those answers in real time. Alveo U55C accelerates the query times and predictions for recommendation engines from minutes down to milliseconds. By using multiple U55C cards to scale up analytics, the superior computational power and memory bandwidth accelerate graph queries by up to 45X compared to CPU-based clusters. The quality of scores also increases by up to 35 percent, resulting in greater confidence and lowering false positives to the low single digits.

The Alveo U55C card is currently available via Xilinx and its distributors. Clustering is available now for private previews, with general availability expected in the second quarter of next year.

