The Mozart AI chip from SimpleMachines, Inc. is a first-of-its-kind high-performance processor for artificial intelligence applications. Traditional chipmakers struggle to keep up with the rapidly evolving AI software landscape. SimpleMachines has emerged to address that core issue.
Mozart was designed to stay future-proof against AI software evolution at the silicon level by introducing a radically different paradigm.
SimpleMachines calls this new AI chip paradigm “composable computing.” It addresses the fact that AI and machine learning algorithms are constantly evolving, which makes it difficult for chip designers to optimize their designs for software that keeps shifting. AI algorithms move so fast that purpose-built custom AI chips are often obsolete on arrival because better AI algorithms have already been written.
SimpleMachines’ Mozart AI chip is a big departure from the current AI/ML chip options, including Apple’s Neural Engine silicon. It boasts 35 8-bit TOPS at just 4 watts, compared to Qualcomm’s new Snapdragon 888 processor, whose 6th generation AI Engine cranks out 26 8-bit TOPS. The chip sits on a PCIe card. In theory, this card could eventually find its way into Windows or even Mac Pro workstations, but today it targets Linux backend servers in the cloud.
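As a rough back-of-the-envelope check on those figures, the short Python sketch below works out Mozart’s quoted efficiency in TOPS per watt and its raw throughput ratio against the Snapdragon 888. The constant and function names are illustrative only and do not come from SimpleMachines or Qualcomm. Because no power figure is quoted here for the Snapdragon 888, only raw TOPS can be compared, not efficiency.

```python
# Figures as quoted in the article; all names here are illustrative.
MOZART_TOPS = 35.0           # 8-bit TOPS quoted for Mozart
MOZART_WATTS = 4.0           # power quoted for Mozart
SNAPDRAGON_888_TOPS = 26.0   # 8-bit TOPS quoted for the Snapdragon 888 AI Engine


def tops_per_watt(tops: float, watts: float) -> float:
    """Return throughput per watt for a quoted TOPS and power figure."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


mozart_efficiency = tops_per_watt(MOZART_TOPS, MOZART_WATTS)
raw_tops_ratio = MOZART_TOPS / SNAPDRAGON_888_TOPS

print(f"Mozart: {mozart_efficiency:.2f} TOPS per watt")          # 8.75
print(f"Mozart vs. Snapdragon 888 raw TOPS: {raw_tops_ratio:.2f}x")  # 1.35x
```

These are peak, vendor-quoted numbers, so the result says nothing about performance on real workloads.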
At the same time, general-purpose CPUs and GPUs can run AI and machine learning software but are highly inefficient at it. Consider, for example, why Apple has developed a “neural engine” in its own chips: to specifically address machine learning applications for AI.
What the AI industry has needed is a new paradigm that offers the performance benefits of custom silicon optimization while also offering the flexibility of general CPU and GPU processors. This is where Mozart comes in.
SimpleMachines’ breakthrough innovation is the discovery of four “coarse-grained behaviors” that compose any algorithm. By ignoring application semantics at the silicon level and focusing instead on extracting behaviors, SimpleMachines’ Mozart chip compiler, which is embedded into TensorFlow, can compile arbitrary programs into these four coarse-grained behaviors.
SimpleMachines’ AI chip, with its composable behavior execution units, moves source code through SimpleMachines’ behavior decomposition compiler, which sorts the machine code into four algorithmic behavior execution units. The result is application compute at 85 percent with overhead at just 15 percent, a near inverse of how the same code would run on an Intel Skylake CPU.
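To put the 85/15 split in perspective, the Python sketch below compares the share attributed to application compute on Mozart with the share on a Skylake CPU. It assumes, purely for illustration, that “near inverse” means exactly 15 percent compute and 85 percent overhead on Skylake; the article gives no exact Skylake figures. The resulting ratio describes the relative share devoted to application compute, not a measured speedup.

```python
# Shares quoted in the article for Mozart.
MOZART_COMPUTE_SHARE = 0.85
MOZART_OVERHEAD_SHARE = 0.15

# ASSUMPTION: "near inverse" is treated as an exact inverse for illustration.
SKYLAKE_COMPUTE_SHARE_ASSUMED = 0.15


def compute_share_ratio(share_a: float, share_b: float) -> float:
    """Return how many times larger compute share A is than compute share B."""
    if not (0.0 < share_b <= 1.0 and 0.0 <= share_a <= 1.0):
        raise ValueError("shares must be fractions, with share_b above zero")
    return share_a / share_b


ratio = compute_share_ratio(MOZART_COMPUTE_SHARE, SKYLAKE_COMPUTE_SHARE_ASSUMED)
print(f"Compute share ratio under the exact-inverse assumption: {ratio:.2f}x")
```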
“The chip’s design can support very large models today and is capable of running up to 64 different models simultaneously,” said Greg Wright, SMI’s Chief Architect. “Our next-generation 7-nanometer design is expected to be ready to sample by the end of 2021 and will be 20x faster on a diverse set of workloads than current chips.”
“SimpleMachines’s solution is a radically new software-centric approach that deploys a programmable platform with a breakthrough software stack and compiler that enables the programmer to easily optimize the hardware on the fly and get the performance of custom silicon with a platform that supports hundreds of different use cases,” Wright said.
SimpleMachines, Inc. (SMI) is led by its founder, CEO, and CTO, Karu Sankaralingam, who is also a computer science professor at the University of Wisconsin. SMI’s chip team includes leading scientists and industry heavyweights formerly of Qualcomm, Intel, and Sun Microsystems.
“As fast, flexible computing becomes more accessible, AI will be used by more industries for more applications more frequently, so chip design must evolve accordingly,” Sankaralingam said. “We are disrupting the next wave of computing with our breakthrough technology and are excited about the market opportunity, especially for AI chips along the power spectrum.”
Chip Details and Applications
The chip, Mozart, is currently a 16nm design that utilizes HBM2 memory and is sampling as a standard PCIe card. It is manufactured by Taiwan Semiconductor Manufacturing Company (TSMC). The chip’s software interface includes direct support for TensorFlow as well as APIs for C/C++ and Python. Future versions of the chip will leverage its unique architecture to scale up and down the thermal power spectrum, from enterprise-class high-performance systems to 5-watt IoT devices.
Karu Sankaralingam told Architosh that the 16nm design helps cut costs at this stage of the company and the product, but future versions of the chip will move to smaller, more advanced processes, such as the 7nm version due at the end of 2021.
Mozart’s architecture leverages the concept of Composable Computing, which abstracts any software application into a small number of defined behaviors. SMI’s novel compiler integrates into the backend of standard AI frameworks like TensorFlow, translates those programs, and reconfigures the hardware on the fly, resulting in a chip that behaves as if it were originally designed for that application.
SMI is initially targeting companies in the public datacenter, network security, finance, and insurance industries for its Mozart Platform with plans to disrupt the edge and mobile device markets in future product generations. According to Allied Market Research, the global AI chip market will reach $91 billion by 2025, with growth rates of 45% a year until then. Market drivers include a surge in demand for smart homes and smart cities, more investment in AI startups, and the rise of smart robots.
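The Allied Market Research projection can be sanity-checked with simple compound-growth arithmetic. The Python sketch below works backward from $91 billion at 45% annual growth to the starting market size that this would imply. The number of growth years is an assumption, since the forecast’s base year is not stated here, so the loop tries a few hypothetical values; the function name is illustrative.

```python
TARGET_2025_BILLION = 91.0   # projected market size quoted in the article
ANNUAL_GROWTH = 0.45         # growth rate quoted in the article


def implied_base_size(target: float, annual_growth: float, years: int) -> float:
    """Back out the starting size that compounds to `target` after `years`."""
    if annual_growth <= -1.0:
        raise ValueError("annual_growth must be greater than -100%")
    if years < 0:
        raise ValueError("years must be non-negative")
    return target / (1.0 + annual_growth) ** years


# ASSUMPTION: the base year is unknown, so several horizons are shown.
for years in (3, 5, 7):
    base = implied_base_size(TARGET_2025_BILLION, ANNUAL_GROWTH, years)
    print(f"{years} years at 45%: implied starting market of ${base:.1f} billion")
```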
To learn more, visit SimpleMachines online at https://www.simplemachines.ai/
Architosh Analysis and Commentary
We had a chance to speak with Karu Sankaralingam just before this unveiling. The conversation was fascinating, and his new company is offering an exciting direction for AI/ML chip design and application optimization. A very good AI chip roundup article over at Forbes notes that SimpleMachines is angling for the “inference market” in AI. It is not the only AI chip company taking this direction, but its first chip, Mozart, built on a 16nm process and rated at 35 8-bit TOPS (a standardized performance metric), places it near the top of the heap. Of course, there is debate about TOPS as a proxy for AI chip application performance; it carries the same pitfalls as the higher MHz figures Intel promoted back in the 1990s. There are other, more meaningful benchmarks with which to test SimpleMachines’ new Mozart chip. That is for another day and another article. More to come!
About SimpleMachines, Inc.
Founded in 2017, SimpleMachines, Inc. (SMI) is an AI-focused semiconductor startup that is disrupting the next wave of computing. The company’s flagship technology, Mozart Platform, is a first-of-its-kind, easily programmable high-performance silicon chip that can run various classes of AI algorithms. SMI’s first-generation chip (available via a PCIe card called Accelerando, or via Symphony Cloud Service) is designed to provide high-performance execution of software being developed for AI, machine learning, and big data analytics. The product roadmap includes second- and third-generation chip designs that will target the large and rapidly growing mobile and edge computing markets. For more information, please visit https://www.simplemachines.ai/