Dell EMC PowerEdge XE8545 Server Review
January 2, 2024
The Dell EMC PowerEdge XE8545 Server is a 4U platform offering AI infrastructure without compromise. It’s designed for artificial intelligence, machine learning, inference, analytics, and a host of other high-performance computing applications. It is powered by dual 3rd generation AMD EPYC Milan processors with up to 64 Zen 3 cores each and supports 4x NVIDIA SXM4 A100 GPUs with NVLink technology.
Back in the day, and by that we mean about two years ago, this system was a CRN 2021 Product of the Year recipient. We will mention there is actually an Intel-based version of this same system, but that one is the XE9680. Aside from AI computing, machine learning, and inference, it can also be used for multi-tenant environments with virtualization. The A100 GPUs supported on this platform can be segmented to provide multiple instances for multiple users. The multi-tenant and virtualization options are built into the GPU architecture through MIG (Multi-Instance GPU) support, and NVIDIA’s virtual GPU software provides a flexible set of virtualization options.
For such a high-performance platform, this system still has a standard rack depth paired with air-cooling to easily integrate with your existing infrastructure. The front bezel has a lock and optional LCD panel.
With the bezel removed, there are 10x 2.5-inch storage bays on the Dell EMC PowerEdge XE8545 Server. Supporting a universal backplane, the system can be configured with SAS and SATA drive types. An alternate configuration includes up to 8x U.2 NVMe SSDs with 8x PCIe NVMe connectors on the system board. Utilizing NVMe drives reduces latency and keeps data close to the CPU when dealing with large data sets. A row of 12x hot-swap GPU fans sits just below the drive bays.
The right server ear has the power button, a USB 2.0 port next to a VGA port, and a micro-AB USB port. The left control panel has the System Health and System ID indicator and, to the right of that, status indicator LEDs for drives, temperature, electrical, memory, and PCIe.
Additional at-chassis management options include QuickSync 2.0, which uses the OpenManage mobile app. The app runs on a smartphone or tablet and is compatible with iOS and Android. The optional bezel with LCD panel can display system information, status, and error messages, and can also be used to configure or view the iDRAC IP address for the Dell EMC PowerEdge XE8545 Server.
The iDRAC port on the back of the system provides remote access to the system. The port on the front also enables a crash cart with monitor, keyboard, and mouse to quickly access iDRAC for at-chassis management of the system. iDRAC with Lifecycle Controller helps keep the system up to date and can also be used to install the OS and to configure, maintain, and diagnose the system. This platform can also use Dell’s OpenManage portfolio to manage the system in physical, virtual, local, and remote environments, in-band or out-of-band. It also provides one-to-many management for Dell PowerEdge servers and can integrate with third-party consoles like Microsoft System Center, VMware vCenter, Ansible Modules, and ServiceNow. The last one seems like it could be useful at restaurants and fast-food places. Service Now! It will also connect with Micro Focus and other HPE tools, plus IBM Tivoli and Nagios Core.
On the back of the server, there is a row of 4x hot-plug PSUs, either 2400W Platinum or 2800W Titanium, redundant or mixed mode. Depending on how the system is configured, there are three redundancy policies: Not Redundant, PSU Redundant, and A/B Grid Redundant. The first two are fairly self-explanatory. In a nutshell, the last one takes two PSUs, say 1 and 3, and places them in grid A, with 2 and 4 in grid B; if one of the PSUs in grid A fails, grid B takes over. Just redundancy of a different kind. This system also has a hot-spare feature to reduce power overhead.
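To make the A/B grid idea concrete, here is a minimal sketch in Python. It is an illustrative model only, not Dell firmware logic or an iDRAC API: the names `GRID_OF_PSU`, `healthy_grids`, and `survives` are hypothetical, and the rule that the system needs at least one fully intact grid is an assumption drawn from the description above.

```python
# Hypothetical model of the A/B Grid Redundant policy described above.
# The grid assignment (PSUs 1 and 3 on grid A, 2 and 4 on grid B) follows
# the example in the text; none of these names come from Dell tooling.
GRID_OF_PSU = {1: "A", 2: "B", 3: "A", 4: "B"}


def healthy_grids(failed_psus):
    """Return the set of grids in which every PSU is still working."""
    failed_grids = {GRID_OF_PSU[psu] for psu in failed_psus}
    return set(GRID_OF_PSU.values()) - failed_grids


def survives(failed_psus):
    """Assumption: the system stays up while one full grid is intact."""
    return len(healthy_grids(failed_psus)) >= 1


print(survives({1}))     # a grid A PSU fails, grid B takes over
print(survives({1, 3}))  # all of grid A is lost, grid B is still intact
print(survives({1, 2}))  # one failure on each grid, no intact grid left
```

Under this assumption, the policy tolerates losing an entire power feed, which is the scenario plain PSU redundancy does not cover.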
Above those, there are two 1GbE NIC ports embedded on the LOM card. That LOM card is removable and connects to the motherboard. Next is an optional OCP 3.0 card slot offering a number of options for port connections and link speeds depending on your choice. After that come the ID button with integrated LED, an mLAN port to access iDRAC for remote management of the system, two USB ports (a 2.0 port on top and a 3.0 port on the bottom), and a VGA port to connect a monitor. Above all of that are the PCIe slot covers and a weird handle thing that is just there to add rigidity to the case. As this system will be quite heavy when loaded up, there are two handles on each side of the chassis.
With the cover removed, you can see several risers in back: risers 1, 3, and 4 on top, and riser 2 placed under riser 3 in the middle. There are either three x16 PCIe 4.0 slots or two PCIe 4.0 x16 slots plus two x8 slots. Those PCIe slots can be used to support additional high-speed I/O devices.
They can also be outfitted with a PCIe-based BOSS card with dual M.2 NVMe drives to boot the system, or a PERC (PowerEdge RAID Controller) card like the H745 or H755. A row of 6x fans pulls cool air from the front of the server and directs it over the CPUs and memory modules, then over the PCIe slots and out the back of the server. That bank of fans can be removed as a single unit, or individual fans can be pulled and replaced if need be.
A black plastic air shroud, when removed, exposes the CPUs and memory modules. Under the mid-cover is the GPU air shroud, which covers the GPUs and NVLink board in the lower portion of the chassis.
Under the panel is the NVLink board with support for 4x A100 SXM4 GPUs. Those are kept cool by that plastic cover, which directs fresh air from a bank of 10x fans on the front through some tall heatsinks. You can see the backplane for the system drives with various connectors, including a few PCIe NVMe connectors, which in turn are cabled to the system board for drive support. That backplane is also easily removable without tools.
Built on the Zen 3 architecture with up to 64 cores, the 3rd generation AMD EPYC processors use a system-on-a-chip design with integrated I/O controllers. They also support 8x memory channels each. Two memory modules can be installed per memory channel for a larger memory footprint, but memory speed will be reduced. With each 8-core chiplet sharing 32MB of L3 cache, the CPUs can have up to 256MB of L3 cache plus up to 4.1GHz turbo. This chassis will support CPUs with a thermal design power rating of up to 280W. Memory speeds are variable depending on a number of factors.
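The core and cache figures above follow from simple arithmetic, sketched here in Python. The constants come from the paragraph above; the variable names are just for illustration.

```python
# Figures quoted in the review for the top-end 3rd generation EPYC part.
CORES_PER_CHIPLET = 8
L3_MB_PER_CHIPLET = 32
MAX_CORES_PER_CPU = 64
SOCKETS = 2

# A 64-core CPU built from 8-core chiplets needs 8 of them.
chiplets_per_cpu = MAX_CORES_PER_CPU // CORES_PER_CHIPLET

# Each chiplet brings its own 32MB of L3, so cache scales with chiplet count.
l3_mb_per_cpu = chiplets_per_cpu * L3_MB_PER_CHIPLET

# Two sockets double the core count available to the system.
total_cores = SOCKETS * MAX_CORES_PER_CPU

print(chiplets_per_cpu)  # 8
print(l3_mb_per_cpu)     # 256
print(total_cores)       # 128
```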
Only DDR4 modules are supported in capacities of 32GB and 64GB for up to 2TB of memory with all 32 DIMM slots loaded with 64GB memory modules. The documentation also says up to 256GB per memory channel, which would provide up to 4TB of memory at capacity.
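The two capacity figures can be checked with a quick calculation. This sketch assumes 1TB = 1024GB and uses the 8 channels per CPU and 2 DIMMs per channel quoted earlier in the review; the 128GB module size at the end is only what the 256GB-per-channel figure would imply, not a module this review confirms as supported.

```python
SOCKETS = 2
CHANNELS_PER_CPU = 8
DIMMS_PER_CHANNEL = 2
GB_PER_TB = 1024  # assumption: binary terabytes

total_channels = SOCKETS * CHANNELS_PER_CPU
dimm_slots = total_channels * DIMMS_PER_CHANNEL


def capacity_tb(dimm_gb):
    """Total memory in TB with every slot holding a module of dimm_gb."""
    return dimm_slots * dimm_gb / GB_PER_TB


print(dimm_slots)       # 32
print(capacity_tb(64))  # 2.0

# The "256GB per memory channel" figure from the documentation:
per_channel_gb = 256
print(total_channels * per_channel_gb / GB_PER_TB)  # 4.0

# With two modules per channel, that figure would imply 128GB modules.
implied_dimm_gb = per_channel_gb // DIMMS_PER_CHANNEL
print(implied_dimm_gb)  # 128
```

So the 2TB figure matches the 64GB modules listed, while the 4TB figure would only be reachable with larger modules than the ones listed here.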
The processors in the Dell EMC PowerEdge XE8545 Server get a little, well a lot, of help from the 4x NVIDIA SXM4 A100 Tensor Core GPUs, as this system is NVIDIA Certified. The A100 GPUs are built on NVIDIA’s Ampere architecture and can cover a large range of processing needs for highly parallel workloads like machine learning and AI inference, not to mention virtualized workloads with the MIG (Multi-Instance GPU) feature.
With MIG, each GPU can be partitioned into 7 isolated GPU instances. The A100 offers up to a 20 times performance improvement over the previous generation Volta-based GPUs. The 80GB version offers the world’s fastest memory bandwidth at over 2TB per second.
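For a sense of scale in a multi-tenant setup, here is a small sketch that counts the MIG instances available across the system and hands them out to tenants. It is a toy model under stated assumptions: `mig_slots` and `assign` are hypothetical helpers, not NVIDIA tooling, and it assumes every GPU is split into the maximum of 7 instances.

```python
GPUS = 4
MIG_INSTANCES_PER_GPU = 7  # maximum per A100, as stated in the review


def mig_slots(gpus=GPUS, per_gpu=MIG_INSTANCES_PER_GPU):
    """List every (gpu_index, instance_index) pair in the system."""
    return [(g, i) for g in range(gpus) for i in range(per_gpu)]


def assign(tenants):
    """Give each tenant one isolated instance, filling GPU 0 first."""
    slots = mig_slots()
    if len(tenants) > len(slots):
        raise ValueError("more tenants than MIG instances")
    return dict(zip(tenants, slots))


print(len(mig_slots()))  # 28
print(assign(["team-a", "team-b"]))
```

With all four GPUs fully partitioned, the system could serve up to 28 isolated GPU instances at once.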
The GPUs are installed on an NVIDIA NVLink board offering GPU-to-GPU communication speeds of up to 600GB/s. That is significantly less than the 2TB per second of memory bandwidth they are capable of, but the two numbers measure different things. I think that 2TB/s memory bandwidth is just the GPU having an internal conversation between its tensor cores and its own GPU memory before it arrives at some conclusion and spits it out, while the 600GB/s is for traffic between GPUs.
Choose from either the NVIDIA 400W A100 40GB GPUs or NVIDIA 500W A100 80GB GPUs. 400W is for the standard configuration, but with a custom thermal solution like this baby has, you’re good to go at up to 500W. That said, with the 500W GPUs installed, the system will also issue a thermal warning at 28 degrees Celsius instead of the normal 38 degrees Celsius to prevent thermal damage to the system.
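To put the two bandwidth figures side by side, this back-of-the-envelope sketch estimates how long it would take to move one GPU’s worth of data at each rate. It treats "over 2TB per second" as a round 2000GB/s and ignores latency and protocol overhead, so these are best-case numbers, not measurements.

```python
NVLINK_GB_PER_S = 600  # GPU-to-GPU, as quoted above
HBM_GB_PER_S = 2000    # assumption: "over 2TB/s" rounded to 2000GB/s
GPU_MEMORY_GB = 80     # the 80GB A100 variant


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Ideal transfer time, ignoring latency and overhead."""
    return size_gb / bandwidth_gb_per_s


nvlink_s = transfer_seconds(GPU_MEMORY_GB, NVLINK_GB_PER_S)
hbm_s = transfer_seconds(GPU_MEMORY_GB, HBM_GB_PER_S)
ratio = HBM_GB_PER_S / NVLINK_GB_PER_S

print(f"NVLink: {nvlink_s * 1000:.0f} ms")        # NVLink: 133 ms
print(f"GPU memory: {hbm_s * 1000:.0f} ms")       # GPU memory: 40 ms
print(f"Memory is {ratio:.1f}x faster")           # Memory is 3.3x faster
```

In other words, a GPU can sweep through its own memory roughly three times faster than it can ship the same data to a neighbor, which is why keeping a working set local to one GPU matters.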
For high-performance computing, AI, AI inference and machine learning applications, the Dell EMC PowerEdge XE8545 GPU server delivers oodles of performance. Seems like just about every server today addresses AI in some way or another but this one is designed to hit it in a big way with 4x SXM4 A100 GPUs. And with dual 3rd gen AMD EPYC CPUs you can throw up to 128 cores of processing power at your workloads and expect results quickly.
If you have any questions about this system or any other, post them in the comments section below. We are an authorized partner with Dell Technologies and an Elite Partner with NVIDIA! Visit our website.