LAS VEGAS--(BUSINESS WIRE)--Tachyum® today added to its extensive white paper library with the publication of an overview examining its Reliability, Availability and Serviceability (RAS) strategy, including a detailed look at the key RAS features being built into Prodigy®, the world’s first Universal Processor, which will help satisfy the demands of today’s data centers.
RAS is a set of related attributes that must be considered when designing, manufacturing, purchasing and utilizing a computer product or component. Designed from the ground up, Tachyum’s comprehensive RAS strategy encompasses multiple facets at the silicon, platform and system levels to ensure Prodigy deployments provide high performance along with high reliability and availability at all levels.
Prodigy’s RAS strategy comprises Device RAS, which includes advanced error detection and correction in all functional blocks; System RAS, which includes critical features such as machine check and recovery working with the Linux EDAC driver; and Platform RAS, which encompasses features such as redundant power supplies and ease of serviceability.
Prodigy’s memory hierarchy provides robust error detection and correction for all memory subsystems. Both the L1 I-Cache and D-Cache are protected with SECDED (single error correction double error detection), and the L2/L3 block utilizes DECTED (double error correction triple error detection), exceeding Arm’s current parity offerings. In addition to Prodigy’s memory hierarchy, other functional blocks integrate significant amounts of memory that require protection to ensure Prodigy runs error-free and maintains high RAS standards.
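As a rough illustration of what SECDED means in practice, the sketch below uses a textbook extended Hamming(8,4) code: any single flipped bit is corrected, and any two flipped bits are detected but not corrected. This is not Tachyum's implementation, which the announcement does not describe; the function names `encode` and `decode` and the 4-bit data width are assumptions chosen to keep the example small.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming codeword (illustrative only)."""
    data = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4) positions
    bits[3], bits[5], bits[6], bits[7] = data
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # parity over positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # parity over positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # parity over positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1            # makes the parity of all 8 bits even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(bits) & 1  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome names the flipped position
        # (0 means the overall parity bit itself was hit).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, status
```

For example, `decode(encode(0b1011) ^ 0b100)` recovers the original nibble with status `corrected`, while flipping two bits of the same codeword yields status `uncorrectable`. DECTED codes extend the same idea with more check bits so that two errors can be corrected and three detected.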
Additional RAS features built into Prodigy include:
- Error correcting codes (ECC) scrubbing and data poisoning
- Watchdog timer
- RAID for booting
- PCIe 5.0 RAS features
Tachyum has incorporated redundant power supply units (PSUs), fans and network interface cards (NICs), along with provisions for efficient maintenance, into its Prodigy evaluation platform. At launch, Tachyum will offer data center family SKUs with a 5-year warranty/support period and enterprise/telco family SKUs with a 10-year warranty/support period.
“A comprehensive approach to RAS becomes increasingly important as process shrinks drive higher density for components and platforms,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “In addition to the ever-increasing density, manufacturing chips on shrinking process nodes increases the risk of soft errors. Prodigy addresses these increased risks with a thorough approach to device and system reliability, ensuring that Prodigy-based systems function with maximum uptime to address the performance and demands of today’s data centers.”
In its recent GTC 24 keynote, Nvidia stressed the importance of RAS in its latest product introduction, devoting keynote time to RAS features in its overview of new products and features. Nvidia further highlighted the importance of RAS when it showed a large potential data center deployment that would provide 645 EF of AI performance.
The Tachyum RAS paper complements an earlier Tachyum white paper that showcased a large Prodigy lead customer data center designed to deliver 8,000 EF of AI performance, where RAS will be a critical component.
As a Universal Processor offering industry-leading performance for all workloads, Prodigy-powered data center servers can seamlessly and dynamically switch between computational domains (such as AI/ML, HPC, and cloud) with a single homogeneous architecture. By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy reduces CAPEX and OPEX significantly while delivering unprecedented data center performance, power, and economics. Prodigy integrates 192 high-performance custom-designed 64-bit compute cores to deliver up to 4.5x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and 6x for AI applications.
Those interested in learning more about Tachyum’s RAS strategy and how it extends reliability, availability and serviceability in the data center can download the full white paper at https://www.tachyum.com/resources/whitepapers/2024/03/26/tachyum-prodigy-ras-features/.
Follow Tachyum
https://twitter.com/tachyum
https://www.linkedin.com/company/tachyum
https://www.facebook.com/Tachyum/
About Tachyum
Tachyum is transforming the economics of AI, HPC, public and private cloud workloads with Prodigy, the world’s first Universal Processor. Prodigy unifies the functionality of a CPU, a GPU, and a TPU in a single processor to deliver industry-leading performance, cost and power efficiency for both specialty and general-purpose computing. As global data center emissions continue to contribute to a changing climate, with data centers projected to consume 10 percent of the world’s electricity by 2030, the ultra-low-power Prodigy is positioned to help balance the world’s appetite for computing at a lower environmental cost. Tachyum recently received a major purchase order from a US company to build a large-scale system that can deliver more than 50 exaflops of performance, which will far exceed the computational capabilities of the fastest inference or generative AI supercomputers available anywhere in the world today. When complete in 2025, the Prodigy-powered system will deliver a 25x multiplier vs. the world’s fastest conventional supercomputer – built just this year – and will achieve AI capabilities 25,000x larger than models for ChatGPT4. Tachyum has offices in the United States and Slovakia. For more information, visit https://www.tachyum.com/.