Gigabyte G292-Z45 GPU Server Review
August 29, 2022
The Gigabyte G292-Z45 GPU server is compatible with AMD's 3rd-generation EPYC Milan processors, including the 3D V-Cache (Milan-X) parts. It's a little different from the G291-Z20 and the G292-Z22 we reviewed a while back, both of which support only a single AMD processor. All three do, however, share the same form factor.
With two 3rd-gen AMD EPYC processors, this system supports up to 128 physical cores with top-of-the-line parts and up to 4TB of memory. At only 2U, it also accommodates an impressive 8x double-wide GPUs for AI training and inference plus high-performance computing applications. 8x drive bays along the front of the chassis give the CPUs and GPUs close access to large data sets. Even with supply issues putting major strain on the microchip market, Gigabyte has still been able to deliver; we can't say that for many of the other major manufacturers. If you would like to see a close single-socket sibling, click here for the Gigabyte G292-Z22, also built around a 3rd-generation AMD EPYC processor.
This design is clearly working for Gigabyte and shows up across a bunch of its systems. Again, we have two large fans on the right and left with 8x SAS/SATA storage bays in the middle and a control panel on the left. The control panel has On/Off and ID buttons, both with LEDs, plus a column of status LEDs for LAN 1 and LAN 2 and a hard disk drive activity LED. The drive trays also have their own status lights.
On the back of the system are two large fans on the right and left, with the power supply units below them. Above those sit a VGA port, a dedicated RJ45 management port, an ID button with LED, a non-maskable interrupt button, and a reset button, plus LEDs for LAN 1 and LAN 2 flanking the two 1Gb Ethernet ports, with dual USB 3.1 ports beside them. Above all that are 2x low-profile PCIe slots.
Popping the lid off, you can see that the dedicated management port connects to the ASPEED AST2500 baseboard management controller (BMC) for remote and at-chassis management of the system. Several management options are available for this system, not to mention third-party possibilities, since the BMC runs AMI MegaRAC SP-X software and firmware, which is compatible across a range of manufacturers.
Gigabyte's Management Console offers a whole list of other capabilities to help you manage your system. The Management Console covers just this one server, but there is also Gigabyte Server Management for fleets of servers; it runs on Windows and Linux and is compatible with IPMI 2.0 and Redfish. You get more functionality with Gigabyte Server Management, and did we mention it's free? The Gigabyte G292-Z45 is also compatible with all the major operating systems, including VMware and Citrix hypervisors. What's really quite interesting is the Nvidia AI Enterprise software for use with the A100 GPUs. It's built specifically for AI and data analytics, with proven compatibility with VMware and Red Hat, so you can get up and running quickly!
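Because the BMC speaks standard Redfish, you can poll it with nothing but stdlib Python. Here's a minimal sketch; the BMC address is hypothetical, and the "Self" system ID is an assumption (common on AMI MegaRAC SP-X firmware, but list /redfish/v1/Systems on your own unit to confirm):

```python
# Minimal sketch of reading system state from the AST2500 BMC over Redfish.
# BMC address and system ID below are assumptions -- adjust for your network.
import json
import urllib.request

BMC = "https://10.0.0.42"  # hypothetical management-port address

def power_state(payload: dict) -> str:
    """Extract the chassis power state from a Redfish Systems resource."""
    return payload.get("PowerState", "Unknown")

def fetch_system(opener, path="/redfish/v1/Systems/Self"):
    # "Self" is a common system ID on MegaRAC SP-X, but the exact ID
    # varies; enumerate /redfish/v1/Systems to discover it.
    with opener.open(BMC + path) as resp:
        return json.loads(resp.read())

# Offline demonstration with a canned Redfish response:
sample = {"Id": "Self", "PowerState": "On", "Model": "G292-Z45"}
print(power_state(sample))  # -> On
```

The same pattern works for /redfish/v1/Chassis resources (thermal and power telemetry), which is handy on a box this dense.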
Inside the Gigabyte G292-Z45 case, starting from the front of the chassis, you can see those two large fans providing direct cooling for the GPU cages to either side. Two more fans positioned behind the drive backplane deliver fresh air right down the middle for the CPUs and memory modules. With 8x GPUs supported on this system, cooling is essential to performance, and Gigabyte has a very clever design: the big front fans drive air over the front GPU cages, each of which can be outfitted with 2x double-wide GPUs. The GPU-scented air from the first 4x GPUs is diverted into the middle of the chassis to merge with the DRAM- and CPU-scented air, then flows over the PCIe slots and out the back.
Now, those rear GPU cages to either side, again with two GPUs each, never see any of the pre-heated air from the front cages, which are blocked off from them. Instead, fresh air is drawn in through perforated slots on the sides, pulled over the GPUs, and exhausted out the back of the chassis by the two big rear fans.
The backplane supports 12Gb/s SAS and 6Gb/s SATA only. To run SAS drives, you will need a SAS HBA/RAID card, which goes into one of the low-profile slots at the back of the chassis.
There are no M.2 or SATA DOM slots, so an add-on card would be needed if you want another option for booting the system. Both low-profile slots are x16 physical length, but one carries an x8 PCIe 4.0 link while the other carries a full x16 PCIe 4.0 link. This is a dual-root system, meaning half the GPUs connect to one CPU and the other half to the second CPU. All slots on this system, including the GPU slots, are PCIe 4.0.
The Gigabyte G292-Z45 is designed for AI training and inference, machine learning, and high-performance computing. For those workloads, you might consider installing Nvidia A100 40GB GPUs, offering simply the best performance and enterprise-ready software for AI. These cards have a TDP of 250W each, so 8 × 250W is 2000W right there. You can see why those 2200W PSUs are not redundant, with both providing some of the juice to power this mini-titan.
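The power math is worth sketching out, since it shows why the dual 2200W supplies end up sharing the load rather than backing each other up. This is a rough nominal-TDP budget only; drives, fans, and memory are ignored:

```python
# Nominal power budget for a fully loaded G292-Z45 (TDPs only;
# drives, fans and DIMMs are left out of this rough sketch).
gpus, gpu_tdp_w = 8, 250      # Nvidia A100 40GB PCIe
cpus, cpu_tdp_w = 2, 240      # per-socket TDP limit on this system

gpu_total = gpus * gpu_tdp_w  # GPU load alone
total = gpu_total + cpus * cpu_tdp_w
print(gpu_total, total)       # -> 2000 2480
```

2480W already exceeds what a single 2200W supply can deliver, so losing one PSU under full load is not an option; both must contribute.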
The A100 is built on the Ampere architecture and offers significantly more performance than the previous generation, up to 20X by Nvidia's own numbers. Not only that, but it can be partitioned into multiple GPU instances. The only other card mentioned in the QVL report, which is just another acronym for "Qualified Vendor List," is the AMD Instinct MI100 with CDNA. CDNA is what AMD calls its compute GPU architecture, an analog to Nvidia's Ampere. It's highly likely more cards will be supported on this system, like the V100, A2, A10, T4 and RTX 6000 on the Nvidia side, and the Instinct MI50 16GB and 32GB on AMD's side. Again, cooling is an important consideration on the Gigabyte G292-Z45, even with the clear steps taken to manage thermal buildup. That may be why the 80GB version of the A100 is not certified on this system: at a 300W TDP versus 250W for the 40GB unit, eight of them would pull an additional 400W.
Third-generation AMD EPYC processors provide a PCIe 4.0 bus, and with both processors installed you get 128 usable lanes. Yes, each CPU has 128 PCIe 4.0 lanes of its own, but in a two-socket configuration a portion of them is repurposed for the socket-to-socket link, so you still get 128 lanes total. Even with 3rd-gen Intel Xeon Scalable processors, named after some LAKE, you only get 80 PCIe 4.0 lanes in a dual-processor configuration, so the EPYCs give you 48 more PCIe lanes than an Intel system. Instead of a SAS controller in back, you could also install some 200Gb-per-port high-speed Mellanox I/O cards. Since each low-profile slot hangs off one of the CPUs, that enables faster remote direct memory access (RDMA) of GPU memory and mitigates latency, since not all GPUs go through the same CPU; the CPUs still talk to each other over AMD's Infinity Fabric.
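A quick tally makes the point, using the lane counts quoted above (the Intel figure is as the article states it; the second line just shows what the GPU slots alone would consume at full width):

```python
# PCIe 4.0 lane tally (figures as quoted in the article).
epyc_dual = 128        # usable lanes, dual 3rd-gen EPYC
xeon_dual = 80         # dual 3rd-gen Xeon Scalable, per the article
print(epyc_dual - xeon_dual)   # extra lanes with EPYC -> 48

# The 8x x16 GPU slots by themselves would consume 128 lanes at full width:
print(8 * 16)          # -> 128
```

In other words, the GPU complement alone soaks up the entire dual-EPYC lane budget, which is why every lane counts in a box like this.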
Back to the CPUs: each provides 8x memory module slots, for a total of 16x active slots with both processors installed. Unsurprisingly, given the density of this system, there is a thermal design power (TDP) limit of 240W per CPU. Even at that limit, CPUs with up to 64 physical cores and 128 threads can still be installed. Milan-X 3D V-Cache versions are also supported, at least the 7473X with 24 cores and the 7373X with 16 cores. Both draw 240W but offer significantly more cache, 768MB versus a maximum of 256MB on the regular Milan processors. With a full core count, the system tops out at 128 physical cores and 256 threads.
Each CPU also provides 8x memory channels, which means every memory module sits in its very own channel for maximum performance. 3rd-gen EPYC processors are designed to support up to 4TB of memory each, which would be 8TB total with 16 slots per CPU. But since this board has only 8x memory slots per CPU, the maximum capacity is 4TB using 256GB modules in all slots. Most people will install something like 512GB, or maybe a terabyte at most, and have done with it. Registered (RDIMM), load-reduced (LRDIMM) and 3DS versions can all be installed, but with standard RDIMMs and LRDIMMs, not the 3DS variety, the system will only support 2TB. 3DS memory modules use die stacking, enabling a greater density of DRAM chips on the module.
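Those capacity ceilings fall straight out of the slot count. A quick sketch, assuming 128GB as the largest standard (non-3DS) module, which is what the 2TB figure implies:

```python
# DIMM capacity math for the G292-Z45's 1-DIMM-per-channel layout.
channels_per_cpu, cpus = 8, 2
slots = channels_per_cpu * cpus   # 16 DIMM slots total
print(slots * 256)   # GB with 256GB 3DS modules   -> 4096 (4TB)
print(slots * 128)   # GB with 128GB standard DIMMs -> 2048 (2TB)
```

One module per channel also means you never drop to a slower memory speed the way 2-DIMM-per-channel boards sometimes do when fully populated.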
So, Gigabyte does it again, and again, with a very familiar form factor. At 2U, this system really packs a punch with support for up to 8x double-wide high-performance GPUs. If you are working on AI training and inference or machine learning, or just need a high-performance GPU-enhanced computing platform, check out the Gigabyte G292-Z45 GPU server. Like Baskin-Robbins, it's available in a bunch of different flavors that all come in a remarkably similar package. And if you are looking for that next server, try IT Creations.