Faced with bandwidth issues, networking companies are turning to interposers, HBM2 DRAM and leading-edge ASIC technology.
When the big networking companies began developing a new class of terabit routers, they reached what Bob Wheeler, networking analyst at The Linley Group, calls “the breaking point.”
These companies — Cisco, Juniper, Nokia, and others — had been watching the pin counts on their router ASICs “explode” as they worked to get enough bandwidth from commodity DDR DRAMs, mounted on laminate printed circuit boards.
Networking customers now have access to a new 14nm ASIC platform (FX-14™) from GLOBALFOUNDRIES® that offers connections to High Bandwidth Memory (HBM2) mounted on a silicon interposer. Engineers from Rambus Inc. (Sunnyvale) and GF collaborated to bring a Rambus PHY to the FX-14 ASIC platform that delivers 2 terabits per second (Tb/s) of bandwidth.
“This is a solution to a problem we’ve seen coming, which is the inability of external memory to keep up with the bandwidth requirements on the buffers of these ASICs,” Wheeler said. “People tried to use commodity DRAM as long as they could, but because of the pin count explosion, that reached a breaking point.”
The market for communications ASICs is roughly a billion dollars, Wheeler said, noting that routers are expensive systems that can support the cost of an interposer-based (2.5D) solution to get the bandwidth required for high-speed packet buffering.
For the incumbent — DDR-type DRAM running on a laminate PCB — Wheeler said “the big problem from the ASIC perspective was the pin count. You could end up with 2,000-plus-pin devices. The beauty of HBM is that it has a wide interface and stays in the package, so you don’t have to go to a serial interface.”
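A back-of-the-envelope sketch (not from the article) shows how the pin-count explosion happens. The per-channel data rate and per-channel signal-pin count below are assumed, typical DDR4 figures; only the 2 Tb/s target and the "2,000-plus-pin" outcome come from the article.

```python
import math

# Hypothetical sketch: how many commodity DDR channels would a router
# ASIC need to buffer packets at 2 Tb/s, and what does that imply for
# pin count? All per-channel figures below are assumed DDR4 values.

TARGET_GBPS = 2048             # 2 Tb/s of buffer bandwidth
DDR4_PIN_RATE_GBPS = 3.2       # assumed DDR4-3200 per-pin data rate
DATA_BITS = 64                 # data bits per DDR channel (ECC excluded)
SIGNAL_PINS_PER_CHANNEL = 115  # rough: data, strobes, command/address, clock

channel_gbps = DDR4_PIN_RATE_GBPS * DATA_BITS      # ~204.8 Gb/s per channel
channels = math.ceil(TARGET_GBPS / channel_gbps)   # 10 channels
signal_pins = channels * SIGNAL_PINS_PER_CHANNEL   # ~1,150 signal pins

print(f"{channels} DDR4 channels, ~{signal_pins} signal pins "
      f"before power and ground")
```

Add power and ground balls for ten parallel DDR interfaces and the package quickly lands in the 2,000-plus-pin territory Wheeler describes.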
Markets Beyond Networking?
Depending on how well costs can be improved, the 2.5D (interposer-based) solutions could find other applications in data processing, high-end graphics, self-driving cars, artificial intelligence, and other bandwidth-hungry domains, said Dave McCann, vice president of packaging R&D and business technical operations at GLOBALFOUNDRIES.
Moving to an interposer brings an enormous improvement in routing density. For laminate PCB-based solutions, lines and spaces were 12 microns, but that wiring density was often not achieved in practice because the vertical 50-micron vias between the layers had to be routed around, wasting a huge amount of space. With a silicon interposer, the lines and spaces are essentially the same as the back end of a logic chip, currently about 0.8 microns, said Walter Kocon, a senior manager of technology development at GF.
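The per-layer density gap implied by those two line-and-space figures can be made concrete with a quick calculation (an illustrative sketch; the via keep-out penalty on laminate, which the article says makes the real gap even larger, is ignored here):

```python
# Illustrative sketch of the routing-density gap between laminate PCB
# and silicon interposer, using the line/space figures from the article.
# Minimum wire pitch is taken as line width plus space.

LAMINATE_LS_UM = 12.0   # laminate lines and spaces, per the article
SILICON_LS_UM = 0.8     # interposer lines and spaces, per the article

laminate_pitch_um = 2 * LAMINATE_LS_UM   # 24 um per wire track
silicon_pitch_um = 2 * SILICON_LS_UM     # 1.6 um per wire track

wires_per_mm_laminate = 1000 / laminate_pitch_um   # ~42 wires/mm per layer
wires_per_mm_silicon = 1000 / silicon_pitch_um     # 625 wires/mm per layer

print(f"laminate: {wires_per_mm_laminate:.0f} wires/mm, "
      f"interposer: {wires_per_mm_silicon:.0f} wires/mm "
      f"({wires_per_mm_silicon / wires_per_mm_laminate:.0f}x denser per layer)")
```

That is a 15x density advantage per layer before accounting for via keep-out on the laminate, which is why roughly 1,700 parallel die-to-die connections become practical on an interposer.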
Using logic-like wiring for routing between the PHY and the HBM2 memory on an interposer involves using fab-level tools, including lithography. Because the interposers are much larger than conventional chips, multiple fields must be stitched together. But Kocon said today’s steppers are very good at switching between reticles, and progress is being made in creating ever-larger interposers.
These fab processing tools are more expensive than conventional laminate-processing tools, but the payback is a massive number of in-package I/Os (roughly 1,700) between the PHY and the HBM2 memory. And as McCann noted, keeping the traces very short keeps power consumption under control compared with the laminate-based serial interfaces used to date.
No Keep-Out Area
“With vias enabled by wafer fab technology (<1 micron) in silicon interposers, multiple layers of 0.8-micron lines and spaces can be utilized, because there is essentially no keep-out area for the vias. That compares with conventional PCBs, where routing had to come down from the ASIC and over to the DIMM card, consuming both power and time,” McCann said. With interposer-based interconnect being orders of magnitude smaller, and devices only hundreds of microns apart, the massively parallel routing density supports multi-terabit levels of bandwidth.
But there are manufacturing challenges associated with interposers. “These are big interposers and big ASICs. First, we have to create an interface between the ASIC and the interposer. Matched expansion properties of the ASIC and silicon interposer are one key to a non-stressed interface. Design and assembly processes that control warpage are critical. Then spreading the stress between the interposer and the laminate below is critical, because there is a big mismatch at that interface,” McCann said.
Controlling warpage is key to getting good interconnect yields with 2.5D. The spacing between the interposer and the ASIC is very tight, with a bump height of about 70 microns. “This means there is very little tolerance for warpage,” McCann said. Solder that is pushed together, or pulled apart, creates connection issues. “We need manufacturing processes to keep all of these surfaces flat, and we believe, along with our OSAT partners, that we can do that,” McCann said.
The PHY was another technical challenge, one that Rambus tackled along with GF. Frank Ferro, senior director of product marketing at Rambus, explained that an HBM2 PHY is a mixed signal function that must be designed very specifically to each process node.
“We did a significant amount of channel modeling and then designed the PHY to meet those channel requirements. And it was a collaboration. We had many discussions over the whole process to ensure a robust design. From Day One, it worked, and that is a strong testament to the Rambus (modeling and signal integrity) tools and the engineers who have a history of designing these PHYs.”
A DDR DIMM interface is 72 bits wide, compared with 1,024 bits for HBM2. With 1,024 bits, controlling signal integrity is challenging, and Ferro tipped his cap to the GF engineers, many of whom brought experience with high-speed signaling from their days at IBM’s Microelectronics Group.
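The bandwidth gap between the two interfaces follows directly from width times per-pin rate. In the sketch below, the per-pin data rates are assumed (typical DDR4-3200 and HBM2 figures); the widths and the 2 Tb/s-per-PHY and four-PHY numbers come from the article:

```python
# Rough interface-bandwidth comparison. Per-pin data rates are assumed,
# typical values; widths are from the article. A 2 Gb/s HBM2 pin rate
# reproduces the article's 2 Tb/s-per-PHY figure.

DDR_DATA_BITS = 64        # 72-bit DIMM interface = 64 data + 8 ECC
DDR_PIN_RATE_GBPS = 3.2   # assumed DDR4-3200 per-pin rate

HBM2_WIDTH_BITS = 1024    # per the article
HBM2_PIN_RATE_GBPS = 2.0  # assumed HBM2 per-pin rate

ddr_gbps = DDR_DATA_BITS * DDR_PIN_RATE_GBPS       # ~204.8 Gb/s per channel
hbm2_gbps = HBM2_WIDTH_BITS * HBM2_PIN_RATE_GBPS   # 2048 Gb/s ~= 2 Tb/s

# Four HBM2 PHYs per ASIC, as the article describes:
aggregate_gbps = 4 * hbm2_gbps                     # 8192 Gb/s ~= 8 Tb/s

print(f"DDR4 channel: {ddr_gbps:.1f} Gb/s, HBM2 stack: {hbm2_gbps:.0f} Gb/s, "
      f"four-PHY aggregate: {aggregate_gbps:.0f} Gb/s")
```

On these assumptions a single HBM2 stack delivers roughly ten times the bandwidth of a DDR4 channel, without ever leaving the package.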
Asked if he thought 2.5D solutions would spread throughout the high-performance part of the industry, Ferro said it depends on manufacturing yields, and bringing down the cost of the HBM2 DRAM. “2.5D needs to be proven out with high-volume manufacturing. It is a fairly big piece of silicon, and you have to really control warpage.”
Tad Wilder, a principal member of the technical staff at GF, said the 2 terabits per second of bandwidth “is quite an impressive amount of bandwidth for a single core. And with the ability to place up to four HBM2 PHYs on a chip, this gives ASIC designers an unprecedented eight terabits per second of low-power, low-latency DRAM access to work with.” He added that the 14nm HBM PHY “is the largest core we’ve produced for an ASIC, with 15,000 internal pins talking to the Memory Controller and 1,700 external pins talking to the base die of the DRAM stack across the interposer.”
Each DRAM stack contains a base die, which communicates with the ASIC’s HBM2 PHY and up to eight stacked DRAM die above, through thousands of vertical Through Silicon Vias (TSVs). The total memory per HBM DRAM stack is up to 32GB. To mitigate the noise of more than 1,000 I/Os potentially switching at once, the ASIC HBM2 PHY can take advantage of the complete independence of the eight 128-bit channels by skewing the timing of each channel with respect to the others.
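A toy model (not the actual PHY design) illustrates why that channel skewing helps with simultaneous-switching noise: if all eight channels launch their edges at the same instant, up to 1,024 I/Os can toggle together, but staggering each channel onto its own clock phase caps the worst-case burst at one channel's width. The one-phase-per-channel assignment below is an assumption for illustration.

```python
# Toy simultaneous-switching model for eight independent 128-bit HBM2
# channels. Assumption: with skew enabled, each channel launches its
# edges on its own distinct clock phase.

CHANNELS = 8
BITS_PER_CHANNEL = 128

def peak_simultaneous_toggles(skewed: bool) -> int:
    """Worst-case number of I/Os switching in the same instant."""
    if not skewed:
        # All channel edges aligned: every I/O can toggle at once.
        return CHANNELS * BITS_PER_CHANNEL       # 1024
    # One channel per phase: only that channel's bits toggle together.
    return BITS_PER_CHANNEL                      # 128

print(peak_simultaneous_toggles(skewed=False),   # 1024
      peak_simultaneous_toggles(skewed=True))    # 128
```

Cutting the worst-case simultaneous edge count by 8x directly reduces the di/dt transient on the power delivery network, which is the noise mechanism the skewing targets.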
Linley Group analyst Wheeler sees momentum building for the HBM2 standard. While Hynix was the initial backer, Wheeler said Samsung has come on strong with its own HBM2 parts. Because so much of the total solution cost is wrapped up in the cost of the HBM2 memories, competition among multiple HBM2 vendors will help drive volumes, reduce costs and improve performance.
Asked if he thought 2.5D solutions would proliferate, McCann said “it is a really great technology that has come of age, with significant revenues. The question is: can we drive down the cost to get it to the next level of volume?”