Narrator:
The University of Texas at Austin’s Texas Advanced Computing Center, or TACC, designs and operates some of the world’s most powerful computing systems. Their mission? Deliver advanced computing technologies that drive discoveries benefiting science and society.
Over 10 years ago, TACC recognized some important trends: that processor power and the heat it generates would grow dramatically, that air cooling would soon hit its limits, and that an alternative technology would be needed. They also saw that sustainability would be critical to the future of data centers. So they began testing liquid cooling, including GRC’s single-phase immersion systems. That testing was very successful, and subsequently led to three GRC ICEraQ deployments to support TACC’s world-class supercomputers, including their newest immersion-cooled Lonestar6 running Dell Technologies’ most powerful servers with AMD 64-core processors.
Watch now as we chat with TACC to discuss their experience with immersion, and gain insights on the future of data center cooling.
Bérengère Anthony:
Hi, I’m Bérengère Anthony with GRC Product Marketing. I’m joined today by Dan Stanzione. Dan is the Associate Vice President for Research at the University of Texas at Austin, and the Executive Director of TACC, the Texas Advanced Computing Center. GRC has had the privilege of working with TACC for more than a decade, starting with the deployment of our first prototype system back in 2009.
TACC’s single-phase liquid immersion cooling deployments with GRC continued from there with the Maverick2 proof of concept deployment in 2012, Frontera compute cluster deployment in 2019, and most recently with their Lonestar6 supercomputer, which includes GRC’s ICEraQ Series 10 Quad system.
Dan, thank you so much for joining us today.
Dan Stanzione:
It’s great to be here. Thanks very much.
Bérengère Anthony:
Can you give us a brief overview of what TACC does and the type of projects you and your team support?
Dan Stanzione:
Our job is to provide large scale advanced computing resources to look at large scale simulation, large scale AI, large scale data analysis. And we do this with support from the National Science Foundation. We also have contributions from Texas Tech and Texas A&M and the University of North Texas. And a lot of our industrial customers end up on there too. So we actually support users all around the country and around the world.
Bérengère Anthony:
As I mentioned, TACC deployed their Lonestar6 supercomputing system earlier this year. Can you tell us a bit about the system and the work it’s used for?
Dan Stanzione:
Yeah. So Lonestar6 is obviously the sixth in a long line of supercomputers we’ve had that mostly focus on our users here in Texas. And we use it for a broad array of science from astronomy problems like processing the James Webb Telescope data, to electronics design, aircraft design. Did a lot of COVID work over the last couple of years. It really gets usage from all sorts of things.
Bérengère Anthony:
There’s no question that the planning and design process for a system like Lonestar6 is exhaustive. Can you share with us why immersion was the ideal choice here?
Dan Stanzione:
It’s that when we do a big parallel simulation with one of these machines, we really want to use the whole thing like one computer. We’d have to run the air at hurricane speeds across them to try and keep that cool. And the only option other than to do that is to just slow the chips down so they use less power. And then we’d get less per hour or less per year out of that machine.
You have a finite life. We’re putting millions of dollars into the compute side. We want to squeeze every bit of performance we can out of it. And it’s really a great solution both from an environmental perspective to give us really efficient cooling, but also being able to run these very high-power chips at very high density really helps us with that.
Bérengère Anthony:
Lonestar6 is a hybrid air- and immersion-cooled computing cluster. How does TACC distribute the compute load across the air- and liquid-cooled components?
Dan Stanzione:
The core dense compute is all in the oil, but we needed some bigger nodes for storage and nodes to put GPUs in and things like that. So we scattered those in air around the tanks. But really we put everything in the tanks that we could because again, that’s where we can get the highest performance at the highest efficiency.
Bérengère Anthony:
So Dan, with the latest deployment being your third GRC immersion-cooled installation, what do you see as unique about the Series 10?
Dan Stanzione:
This new system has a lot of advantages. It’s a lot more compact. We can push them right together. We don’t have to keep an aisle between them for hot air and cold air on either side. And we have the inner containment tank so we don’t have to have containment out on the floor space anymore. That also means we use a lot less fluid. It gives us more space to run power cables and network cables, and that’s been great because that’s always at a premium with running multiple networks into these things.
And then we can put a lot more nodes in a rack, right? We’re running at 84 compute nodes in a rack in the immersion-cooled racks. On the air-cooled side, we’re only running at 24 nodes per rack because we just don’t have the airflow or any other cooling solution to get to that density. And they also just look better. I mean, they’ve evolved and become a little less blocky, and now we have this sleek, sort of spaceship look for the whole thing.
Bérengère Anthony:
So let’s look into the future for a moment. Processors continue to use more power and get hotter, and ESG and sustainability are top of mind for data center operators. With that in mind, what’s next for TACC and immersion?
Dan Stanzione:
The power density per chip’s going to keep going up. That’s the only way we’re going to get performance. Air-cooled is not something we’re probably ever going to go back to. If you use an air-cooled solution, you’re putting more and more power into the fans that we get to completely take out in the immersion-cooled solutions.
And the power cost is significant. I mean, there’s the environmental cost of it, and also just the dollar cost of putting that much energy in. So every bit we can save is worth doing. But yeah, one way or another, we need to bring liquid right to the chips. And we’re going to keep doing that.
Bérengère Anthony:
Did TACC need to make any change to the existing data center infrastructure before deploying the system, such as removing or reinforcing raised floors, adding power or cooling, or so on?
Dan Stanzione:
No. We actually run it on the exact same raised floor we run everything else on. Because we had in-row coolers for a previous system, we already had piping under the floor, which gives us a nice place to just hook the supply and return lines to the heat exchangers. And we’re using the same kind of circuits that we had for other cabinets that were there. So we didn’t have to make many power changes. It’s really just a drop-in replacement for us.
Bérengère Anthony:
Yeah. It looks really sharp in there.
Dan Stanzione:
Thank you.
Bérengère Anthony:
How does immersing servers in coolant affect the equipment? Does it limit your choices of systems, and what impact does immersion have on the lifecycle of these assets?
Dan Stanzione:
It feels like the failure rate’s actually a lot lower. I think because one, it’s constant temperature, fewer moving parts, and also just the oil being a great electrical insulator. We’ve only had the Lonestar6 components in for about a year now, so it’s hard to say too much about lifecycle. But just anecdotally, from the other systems we’ve had … In fact, for a while, we’d pull the motherboards out about one a year, and send them off to Intel to test components. We saw no real degradation of any of the components on there.
So at this point, we can get servers from just about anybody with the fans optional. It hasn’t really limited us in choice of equipment in any way. In this case, everything’s designed to just sit immersed in there. So that becomes a non-factor for us. So reliability’s been great.
Bérengère Anthony:
What about servicing the immersed servers? Does immersion cooling present any unique challenges?
Dan Stanzione:
I mean, you have to let the nodes drip a little before you get into them, which we don’t traditionally have to do, but it’s really not that much different. One of the things I like about the new design is we have some extra cable space on both sides. So we don’t have to take down any of the adjacent servers to do service, which is nice.
And we’re using the four-in-2U blade form factor on these, so the individual nodes are pretty small. So you just reach in, lift one straight out, let it drip for a second. You can service it just like any other system.
Bérengère Anthony:
Dan, I’ve enjoyed our conversation and appreciate you taking the time today to share your thoughts on the new Lonestar6 from planning to deployment. Thank you so much.
Dan Stanzione:
Thank you. It was fun.
Narrator:
Supercomputing operations or not, data centers worldwide are looking for cost-effective, efficient, and sustainability-focused cooling solutions. Immersion delivers on all counts. Contact us to explore the benefits of single-phase immersion with a GRC data center cooling expert. GRC, redefining the efficiency and sustainability of data center cooling.