WHEN WE LAST REVIEWED NVIDIA’s Ampere-generation RTX A2000 GPU for workstations, we were impressed with how much performance that entry-level GPU brought to the user. This time, we are reviewing the most powerful single-slot professional GPU on the market, the mid-tier Ada generation RTX 4000.
General Summary
This graphics card is a professional, application-oriented GPU board designed for use in professional workstations. It has been on the market for quite some time, but Nvidia recently sent us a unit to review, and this is our most exhaustive GPU review to date. We should note that the last mid-tier professional GPU we reviewed was AMD’s Radeon Pro W6600, which was designed to compete with this GPU’s immediate predecessor, the Ampere-generation RTX A4000. Because we never reviewed that predecessor, we have relied on some published benchmarks for it below.
Nvidia’s mid-tier 4000-class GPU models are generally ideal for AEC professionals who do a fair amount of real-time and photorealistic rendering, whether as a supplement to BIM or not. The same holds for product design and MCAD professionals; those with heavier rendering workflows will get greater value from this GPU (see Conclusions on the last page for more info).
The RTX 4000 Ada GPU is the successor to the RTX A4000 Ampere generation GPU. For the purposes of our GPU economics charts and calculations, we are using MSRP pricing for the 4000 Ada GPU at USD 1,250, while the initial price for the Ampere generation predecessor GPU was USD 1,000 when first released in April 2021.
The Nvidia Ampere generation GPU we tested and reviewed was the RTX A2000 SFF GPU, and we have carried over benchmarks for non-direct comparison. The Ampere generation’s rival AMD Radeon Pro GPU was the W6600, which we also reviewed a few years ago. Again, we have carried over some comparison benchmarks.
In-house benchmarking is a tough business, especially if working on different systems. That is not our case. Our review of the RTX 4000 Ada GPU ran on the same BOXX workstation as our previously reviewed GPUs under the same Windows 10 Professional operating system.
While we will summarize our conclusions at the end of the review, some highlights about the RTX 4000 Ada GPU to bring up now include the following:
- Most powerful single-slot GPU for professional workstation computing
- Substantial performance gains in apps like V-Ray, Arnold, and Blender
- DLSS, Nvidia’s AI-powered upscaling, is supported in Enscape
- Ada generation brings dual encode/decode engines (2 encode / 2 decode engines)
- AV1 encode support
- Excellent ventilation with blower style thermal design versus gaming card thermals
The RTX 4000 Ada GPU is powered by Nvidia’s AD104 graphics processor, built on TSMC’s 4nm process node. The 4N node is a custom node designed for Nvidia and differs from TSMC’s regular N4 node. The custom node emphasizes power efficiency.
To put the chip into perspective, the die area is around 294 mm² and contains 35.8 billion transistors. The predecessor chip in the A4000 Ampere was the GA104 with 17.4 billion transistors, close to the transistor count of Apple’s M1 chip at 16 billion. We bring up the M1 because we will make some comparisons to SoCs in our review. The important takeaway is that this Ada generation RTX 4000 GPU has roughly twice as many transistors as its predecessor.
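As a quick check on those figures, the short sketch below works out the generation-over-generation transistor ratio and the transistor density of the AD104 die. It uses only the numbers quoted above; the variable names are ours and purely illustrative.

```python
# Figures quoted in this review (transistor counts and AD104 die area).
ADA_TRANSISTORS = 35.8e9      # AD104, RTX 4000 Ada
AMPERE_TRANSISTORS = 17.4e9   # GA104, RTX A4000
AD104_DIE_AREA_MM2 = 294.0    # approximate die area

# How many times more transistors the Ada chip has than its predecessor.
transistor_ratio = ADA_TRANSISTORS / AMPERE_TRANSISTORS

# Transistor density of AD104 in millions of transistors per square millimeter.
density_m_per_mm2 = ADA_TRANSISTORS / AD104_DIE_AREA_MM2 / 1e6

print(f"Ada vs. Ampere transistor ratio: {transistor_ratio:.2f}x")
print(f"AD104 density: {density_m_per_mm2:.1f} million transistors per mm²")
```

The ratio comes out slightly above two, which is why "twice as many" is a fair shorthand.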
GPU Economics
As part of our review work, we like to compare GPUs on a performance-per-dollar basis, but we calculate this a bit differently than the common formula. We divide the cost of the chip or GPU by the benchmark score to obtain the cost of one compute unit (i.e., the cost in dollars to obtain one unit of measure on the benchmark). In our charts, our notation for “compute unit” is “CU.”
- Cost in USD per Compute Unit (CU) = cost of GPU / benchmark score
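The formula above can be sketched in a few lines of Python. The prices are the MSRP figures used in this review; the benchmark scores are made-up placeholders chosen only to show the arithmetic, not measured results, and the function name `cost_per_cu` is our own.

```python
def cost_per_cu(gpu_cost_usd: float, benchmark_score: float) -> float:
    """Return the dollars paid per one benchmark unit (lower is better)."""
    if benchmark_score <= 0:
        raise ValueError("benchmark score must be positive")
    return gpu_cost_usd / benchmark_score


# MSRP figures as used in this review.
PRICES_USD = {"RTX 4000 Ada": 1250.0, "RTX A4000": 1000.0}

# HYPOTHETICAL scores for illustration only; these are not benchmark results.
HYPOTHETICAL_SCORES = {"RTX 4000 Ada": 250.0, "RTX A4000": 160.0}

for name, price in PRICES_USD.items():
    cu_cost = cost_per_cu(price, HYPOTHETICAL_SCORES[name])
    print(f"{name}: ${cu_cost:.2f} per CU")
```

Because the metric is cost divided by score, a lower number means better value. In this made-up case the more expensive card still comes out ahead because its score grows faster than its price.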
You can see this kind of metric in our review of a GPU workstation here. We began this type of benchmark because we have begun research into methods of economically optimizing workstation configurations across both GPU and CPU for specific workflows. Such work is in an early phase and explained in a feature article titled, “BIM Manager: The Economic Value of Workstation Performance,” inside Xpresso newsletter #44.
Our GPU economics charts are partial and selective, but hopefully informative. This kind of metric allows you to compare the value proposition of different chips, especially in isolation, to a specific application or type of workflow. However, one must keep in mind that benchmark scores alone do not tell the whole story of “value” or “performance,” as overall system performance impacts GPU performance. Furthermore, workstation-class GPUs offer other tangible and intangible benefits, including the certification of professional applications.
Since we run so many benchmarks, which ones should we select for the performance-per-dollar calculation and comparison? We have selected three:
- (1) Creo and Solidworks (OpenGL-based) composite tests
- (2) Creo (OpenGL shaded and edge performance) tests
- (3) Cinebench GPU (Redshift Renderer) tests
The first two are SPECviewperf tests and are widely regarded for OpenGL-based 3D CAD applications. Redshift is widely deployed across DCC apps plus all of Nemetschek Group’s AEC BIM solutions as a render option. We review these three performance per dollar metrics at the end of our Benchmarking section below.
GPU Details: RTX 4000 Ada
Nvidia’s RTX 4000 Ada generation GPU is a full-size single-slot GPU that boasts 20 GB of video memory, which is more than the 16 GB that shipped on its predecessor Ampere generation GPU. Details include:
- 6,144 CUDA Cores
- 48 3rd-gen RT (raytracing) cores
- 192 4th-gen Tensor cores
- 26.7 TFLOPS peak single-precision (FP32) performance
- Nvidia AD104 chip
- PCI Express 4.0 x16 for advanced data transfer
- OpenGL 4.6 / Vulkan 1.3 / DirectX 12 Shader Model 6.7
- 20 GB GDDR6 with ECC memory
- 360 GB/s peak memory bandwidth
- 4x DisplayPort 1.4a
- 2x 7680 x 4320 @ 60 Hz max resolution
- 130W max power consumption
- Single-Slot, full height, 4.4″ H x 9.5″ L
- VR Ready
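The peak FP32 figure in the list above can be sanity-checked against the CUDA core count. The sketch below assumes the usual convention that each CUDA core retires one fused multiply-add, counted as two floating-point operations, per clock. The review does not state a boost clock, so the clock computed here is only the value implied by the listed specs.

```python
CUDA_CORES = 6144          # from the spec list
PEAK_FP32_TFLOPS = 26.7    # from the spec list
FLOPS_PER_CORE_PER_CLOCK = 2  # ASSUMPTION: one fused multiply-add = 2 FLOPs

# peak FLOPS = cores * FLOPs per core per clock * clock (Hz), solved for the clock
implied_clock_hz = (PEAK_FP32_TFLOPS * 1e12) / (CUDA_CORES * FLOPS_PER_CORE_PER_CLOCK)
implied_clock_ghz = implied_clock_hz / 1e9

print(f"Implied boost clock: {implied_clock_ghz:.2f} GHz")
```

A peak TFLOPS rating is a theoretical ceiling derived from core count and clock speed, which is one reason application benchmarks matter more than the headline number.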
Supported compute APIs are CUDA 12.2, OpenCL 3.0, and DirectCompute. NVLink is not supported.
Those are the specs at a glance. This is Nvidia’s most powerful single-slot GPU ever and the ideal GPU for demanding professionals across industries like AEC, DCC, and MCAD, as well as science, medicine, energy, finance, and software development.
As generative AI technologies begin to dominate the focus of professional application workflows, the Nvidia RTX 4000 Ada GPU unlocks accelerated AI compute workloads. We have also noted the new DLSS support for Enscape. DLSS stands for Deep Learning Super Sampling and is an AI-powered upscaling technology. It originated in gaming: a frame is rendered at a lower resolution, and deep learning then upscales the image to a higher resolution. This technology is inside the popular AEC rendering software Enscape, starting with version 3.1.