Quadric Announces Llama2 LLM Support Immediately Available for Chimera GPNPUs

First Processor IP Core or Consumer Device Silicon Product Known to Run Llama2

BURLINGAME, Calif.--(BUSINESS WIRE)--Quadric^® today announced that support for the Llama 2 large language model (LLM) is immediately available on its Chimera^TM general purpose neural processing unit (GPNPU) intellectual property (IP) core. Unlike other IP and semiconductor application processor suppliers, Quadric was able to add this support with a simple software port with no hardware changes, so existing designs can immediately run this model. Other suppliers have announced plans to change their hardware to offer support in 2024 or beyond.

Quadric Beats Other Timelines for Support

Meta introduced the Llama2 LLM for generative artificial intelligence (AI) on July 18 of this year. Coincident with the unveiling of Llama2, Meta and Qualcomm announced a partnership to port Llama2 to future Qualcomm Snapdragon chips expected in 2024 smartphones and laptops. Use of LLMs had previously been considered to only be viable in cloud datacenters. Meta’s announcement set off a flurry of activity from chip and IP providers to race to capture market attention – and investment dollars – for on-device LLM implementations.

In addition to Qualcomm’s announced timeline of greater than six months to port Llama2, fellow silicon company Mediatek announced four weeks later that it, too, was working on Llama2 support with an expectation that 2024 mobile phones powered by Mediatek flagship silicon would support Llama2. Also in August, IP licensor Ceva announced that a redesigned IP core would soon be delivered to its customers that could support LLMs. Cadence similarly unveiled new IP in September to support LLMs. Those Ceva and Cadance customers will need to license new cores and design new chips for delivery sometime in 2025 or 2026. Notably, all four chip and IP announcements directly or indirectly imply that silicon respins are required to gain this new capability.

“Why would these titans of the semiconductor and IP worlds need to wait until 2024 or 2025 or beyond to support today’s newest, hottest ML model?” noted Quadric CMO, Steve Roddy. “Why would a new machine learning model force a silicon respin – typically costing in excess of $100M – to be able to run new ML code? SoCs with Quadric’s Chimera GPNPU are ready to run Llama2 today!”

Why Not Just Port to the CPU?

Existing silicon chips for consumer devices all have leading-edge applications-class CPUs. Many have eight ostensibly high-performance CPUs. Why wouldn’t SoC vendors simply port Llama2 to the CPU, which can run any ML model? Clearly either the performance won’t meet consumer expectations, or the power consumed will kill device battery life. Hence the need to respin the ML accelerator to run LLMs. Or why not choose a fully programmable GPU from a leading ML vendor such as Nvidia? Perhaps the 10 Watt power dissipation and the need for a cooling fan prohibits use in a smartphone?

Like those easily programmable but power-hungry CPU and GPU solutions, Quadric’s GPNPU is also easily programmable. But only Quadric Chimera GPNPUs deliver programmability with the power-performance profile needed in today’s portable consumer devices.

Quadric’s processor architecture uniquely combines the best attributes of C++ programmability – the ability to run any ML model – with the performance efficiency of NPU accelerators found in many first-generation SoCs in the market today. But unlike inflexible accelerators that force silicon respins when complex new models such as Llama2 are invented, Chimera cores are fully programmable. Chimera GPNPUs run any model. All of the model – all of the layers. No removal of problematic layers. No partitioning. No forcing the data scientist to convert convolutions to adhere to the limited subset of conv types supported in hardware. Any model, any network, any operator.

Fast Porting, Low Effort

Quadric’s team invested a total of 13 engineer-weeks over a total of 4 elapsed weeks to port an INT8 quantized version of Llama2 to the Chimera platform and tune performance. Two new ML operator layers, and two variants of existing operator kernels, were coded in C++ by the Quadric applications team to get the model running. A further two engineer-weeks investment ironed out corner case performance and accuracy tweaks to ensure operation across all three sizes of the Chimera QB series processors (1 TOPs, 4 TOPs and 16 TOPs variants). Other machine learning inference solution providers with much larger teams are still struggling to meet six-month porting targets!

Impressive Performance

Quadric’s Chimera QB4 4 TOPs GPNPU running Llama2 15M delivers 225 Token/Sec/Watt efficiency in a 5nm technology, while occupying only 2.5 mm2.

For comparison, an M1 Pro laptop’s highest performance single CPU delivers only 11 Token/Sec/W running the same Int8 version of Llama2. (Int8 model running ONNX runtime). Quadric delivers 20X improvement in ML inference per Watt compared to a leading-edge CPU!

Visit Quadric.io to learn more.

About Quadric

Quadric Inc. is the leading licensor of general-purpose neural processor IP (GPNPU) that runs both machine learning inference workloads and classic DSP and control algorithms. Quadric’s unified hardware and software architecture is optimized for on-device ML inference. Learn more at www.quadric.io.

Contacts

Steve Roddy – Chief Marketing Officer – steve@quadric.io – 925-321-6744
Paula Jones – paula@paulaejones.com – 650-279-8997

Industry:

More News From Quadric Inc.

Kyocera Licenses Quadric’s Chimera GPNPU AI Processor IP

BURLINGAME, Calif.--(BUSINESS WIRE)--Quadric® today announced that Kyocera Document Solutions Inc. (hereinafter: Kyocera) has licensed the Chimera™ general purpose neural processor (GPNPU) intellectual property (IP) core for use in next generation office automation system on-chip (SoC) designs. “Kyocera wanted to build-in the most flexible and powerful AI processing solution into our market-leading document preparation and printing equipment,” said Michihiro Okada, Deputy Senior General Manager...

Quadric’s 3rd Generation Chimera GPNPU Product Family Expands to 864 TOPs, Adds Automotive-Grade Safety Enhanced Versions

BURLINGAME, Calif.--(BUSINESS WIRE)--Quadric® today introduced the Chimera™ QC Series family of general-purpose neural processors (GPNPUs), a semiconductor intellectual property (IP) offering that blends the machine learning (ML) performance characteristics of a neural processing accelerator with the full C++ programmability of a modern digital signal processor (DSP). The third-generation implementation of the Chimera architecture, the QC family includes both single core and multicore cluster o...

Quadric Announces Vision Transformer Support for Chimera GPNPUs

BURLINGAME, Calif.--(BUSINESS WIRE)--Quadric® today announced that its Chimera™ general purpose neural processing unit (GPNPU) processor intellectual property (IP) supports vision transformer (ViT) machine learning (ML) inference models. These newer ViT models are not supported by almost all NPUs currently in production, making it impractical to run ViT on many existing edge AI system-on-chip (SoC) devices. ViT models are the latest state-of-the-art ML models for image and vision processing in...

Back to Newsroom

Services & Solutions

Services

Solutions For

Resources

Education

Why Business Wire

Quadric Announces Llama2 LLM Support Immediately Available for Chimera GPNPUs

Contacts

Quadric Inc.

Contacts

Kyocera Licenses Quadric’s Chimera GPNPU AI Processor IP

Quadric’s 3rd Generation Chimera GPNPU Product Family Expands to 864 TOPs, Adds Automotive-Grade Safety Enhanced Versions

Quadric Announces Vision Transformer Support for Chimera GPNPUs

Quadric Inc.

Contacts