Faced with slowing traditional markets and the slowdown of Moore’s Law scaling, the semiconductor industry is working hard to reinvent itself and to meet the needs of new markets such as artificial intelligence, autonomous vehicles, and the Internet of Things.
Perhaps the most intriguing of these is artificial intelligence, with compute paradigms that can differ markedly from traditional processor-memory approaches. “For a long time, pattern recognition and cognitive tasks such as recognizing and interpreting images, understanding spoken language, and automatic translation were weak points for computers,” said Damien Querlioz, a French researcher who spoke on “Emerging Device Technologies for Neuromorphic Computing” at the recent International Electron Devices Meeting in San Francisco.
Since about 2012, progress has been accelerating in AI, in both the training and inference stages, but power consumption is still a huge challenge when traditional compute architectures are used. Querlioz, who is based at the French national laboratory CNRS, gave a telling example: the famous Go match played in 2016 between Google’s AlphaGo and Lee Sedol, a world champion at the game. Sedol’s brain consumed about 20 watts during their contest, while AlphaGo required an estimated 250,000+ watts to keep its CPUs and GPUs humming.
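The scale of that gap is worth making concrete. Using the article’s rough figures (AlphaGo’s actual power draw was never officially published, and the five-hour game length below is an illustrative assumption), a quick back-of-the-envelope calculation:

```python
# Rough power comparison using the article's estimates (not official
# figures): a human brain vs. the distributed AlphaGo system.
brain_watts = 20            # approximate human brain power draw
alphago_watts = 250_000     # article's estimate for AlphaGo's CPUs/GPUs

ratio = alphago_watts / brain_watts
print(f"AlphaGo drew roughly {ratio:,.0f}x the power of a human brain")

# Energy over a hypothetical five-hour match, in kilowatt-hours:
hours = 5
print(f"Brain: {brain_watts * hours / 1000} kWh, "
      f"AlphaGo: {alphago_watts * hours / 1000:,.0f} kWh")
```

A four-order-of-magnitude energy gap on a single cognitive task is the motivation behind most of the neuromorphic device research described below.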
While power improvements have been made since then at Google and elsewhere, the effort to come up with new, less power-hungry devices for neuromorphic computing is intensifying.
Ted Letavic, senior fellow for strategic marketing at GlobalFoundries, said he thinks about AI in stages, a timeline moving from ways to improve conventional compute technologies to radically new devices and architectures that consume much less power. All along that timeline, advanced packaging will play a key role.
“AI is upon us now, and we can use existing technology and add derivatives, using DTCO (design technology co-optimization) to optimize down to the bit cell design level,” Letavic said. GF technologists are developing ways to reduce power and boost performance for the 14/12 nm FinFET platform, including dual work function SRAMs, faster and lower power multiply accumulate (MAC) elements, higher bandwidth access to SRAM, and others. The FD-SOI-based FDX processes also consume much less power, especially when back-biasing techniques are deployed. With these technologies in the designer’s toolkit, Letavic said customers can “redesign the elements inherent to AI with a much lower power envelope than if they went right to 7 nm.”
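The multiply-accumulate (MAC) elements Letavic mentions are the workhorse operation of neural-network inference: a weighted sum of inputs, repeated billions of times per forward pass. A minimal software sketch of what a hardware MAC element computes (illustrative only, not GF’s implementation):

```python
def mac(weights, activations, acc=0.0):
    """Multiply-accumulate: the core operation a hardware MAC element
    performs, shown in plain Python for illustration. A dedicated MAC
    unit does one multiply and one add per cycle."""
    for w, a in zip(weights, activations):
        acc += w * a
    return acc

# A single neuron's pre-activation is one MAC over its inputs:
result = mac([0.5, -1.0, 2.0], [1.0, 2.0, 3.0])
print(result)  # 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 = 4.5
```

Because inference is dominated by this one operation, making MAC elements faster and lower-power, and feeding them from SRAM at higher bandwidth, pays off across the whole workload.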
In parallel to these DTCO improvements are research and development efforts underway throughout the world on embedded memory and in-memory compute solutions based on phase-change memory (PCM), resistive RAM (ReRAM), spin-transfer-torque magnetic RAM (STT-MRAM), and ferroelectric FETs (FeFETs). A PCM-based chip developed at the IBM Almaden Research Center, headed by Jeff Welser, has demonstrated great progress, Querlioz said at the IEDM tutorial session, and STT-MRAM- and ReRAM-based AI processors also show great promise. “We now have a huge potential to re-invent electronics for cognitive-type tasks and pattern recognition,” Querlioz said.
Letavic said the long-range need to reduce power consumption, especially for inference processing, is driving a host of startups to develop new AI solutions, and GF is working closely with several of them, as well as with long-time partners AMD and IBM.
“We can only get so far with DTCO improvements to von Neumann computing. The next step beyond disaggregated logic and memory is to move to compute-in-memory and analog-based computing,” Letavic said. Moreover, the instruction set architectures (ISAs) that have served the industry well for 35 years will need to be supplanted by new software stacks and algorithms. “When we go to domain-specific compute, someone has to reinvent the software. IBM has some really good insights about the software stack,” he said.
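The compute-in-memory idea Letavic describes can be sketched in software. In a resistive crossbar array, each memory cell’s conductance G encodes a weight, an input voltage V drives each row, and the cell currents I = G·V sum along each column by Kirchhoff’s current law, so the array performs a matrix-vector multiply in place, with no data movement between logic and memory. A toy model (pure Python; the conductance and voltage values are illustrative):

```python
def crossbar_mvm(conductances, voltages):
    """Model an analog resistive crossbar: conductances[i][j] is the
    cell at row i, column j (in siemens); voltages[i] is the input on
    row i (in volts). Each column's output current is the sum of G*V
    down that column (Kirchhoff's current law), which is exactly a
    matrix-vector multiply performed inside the memory array itself."""
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductances, voltages):
        for j, g in enumerate(row):
            currents[j] += g * v   # Ohm's law per cell, summed per column
    return currents

# A 2x3 weight array and 2 input voltages yield 3 output currents:
G = [[1.0, 0.5, 0.0],
     [2.0, 1.0, 1.0]]
V = [0.1, 0.2]
print(crossbar_mvm(G, V))
```

The appeal is that the multiply and the accumulate both happen in the physics of the array, which is why PCM, ReRAM, and STT-MRAM cells are the candidates of choice for this style of computing.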
“Everyone has to take this turn toward AI together. Foundries will go hand-in-hand with lead customers, and we can’t separate algorithms from the technology,” said Letavic, referring to this close cooperation as STCO, or system technology co-optimization. “STCO is a natural extension of DTCO as we move into the fourth era of computing. As we move to domain-specific compute, that is a shift we will all take together.”
Packaging to Help Reduce Costs
While silicon advances – including dual work function metals in the gate stack, FD-SOI, and STT-MRAM – will improve performance, Letavic said packaging will play an equally large role, as companies move to link heterogeneous devices made with the optimum process for each function. “I think after 20 years of discussion, 2.5D and 3D are going to be mainstream. We will see as much differentiation, if not more, from the packaging as you will from the silicon flows.”
Kevin Krewell, principal analyst at Tirias Research, said work being done with Advanced Micro Devices will give GF an advantage as companies put two or more chiplets in a single package. Earlier, AMD and Intel combined an AMD Radeon graphics processor with an Intel CPU in a single package. Now, AMD is boosting its Epyc server CPU line by using AMD’s Infinity Fabric interconnect technology. The forthcoming “Rome” server processor will feature multiple 7nm chiplets containing the CPU cores and cache memory, linked to a 14nm chiplet fabbed by GF that provides the I/O connections to DRAM and the PCIe bus.
By dividing tasks and using the optimum process for each function, chiplets connected over high-speed links will change how processors for several markets are created, Krewell said, noting that Nvidia, Intel and others are supporting high-speed chip-to-chip links.
“Using a mix of process nodes in a chiplet design, I do expect to see more of that. The I/O especially doesn’t scale well, and those functions take up a lot of space even at 7nm. Sometimes it makes sense to put the I/O functions in an older chip. Historically, PC chip sets were made in an N-minus-1 process, as part of a fab utilization strategy. Putting those functions in the right process node that can handle the I/O, where it is not as expensive per transistor, makes a lot of sense,” Krewell said.
Letavic said systems companies are demanding heterogeneous integration using various forms of advanced packaging, including interposers, vertical through-silicon vias (TSVs), special laminates, fan-outs, and others. The strategy will also provide a boon to photonic connections, as opto-electronics can provide higher bit rates than some electrical connections can support.
Bob O’Donnell, principal analyst at market research firm TECHnalysis, said the chiplet strategy still has a ways to go before industry-wide standards are nailed down. Until then, companies such as AMD and others will use their own internal technologies to link multiple chiplets into SoCs.
“At a certain point, complexity becomes overwhelming and then companies start to look to simplify again. The problem is coming up with a fertile ecosystem among multiple vendors that allows packaging companies to package different parts from multiple companies. Those standards haven’t been nailed down yet.”
O’Donnell said the effort to use the optimum technology for each function is largely motivated by the high cost of designing and fabbing large SoCs in a 7nm process, for example.
“The basic concept with chiplets, ironically, is that we are taking apart things that had been integrated in the past. The industry was able to integrate systems into fewer components, all the way down to SoCs that had almost everything in a single chip. But now, there is a slowdown because it is just so much harder from a technical perspective. The design costs at 7nm are extremely high, and the challenges from a manufacturing perspective are just crazy.”
Letavic said advanced packaging will provide benefits “at the chip level and at the system level. We are seeing it in the data center already. It is here to stay, and it will just get bigger.”