PCI-Express 3.0 and Super Highway
We don’t hear much about the Apple-invented OpenCL these days, but it’s out there and it still has much potential. What is partly lacking is a bus fast enough to saturate the massively parallel computational units that are today’s GPUs.
At the moment high-end workstations and desktops are still pretty much clustered around PCI-Express 2.1 technology, which supports data transfer rates of up to 8 GB/sec over an x16 slot. Yet GPUs are internally capable of so much more: up to 264 GB/sec, and even more with cache. This means that the data highway between CPU and GPU is narrow and slow by comparison. GPUs may be phenomenal parallel execution devices, but there is a massive bottleneck that must be overcome.
This, by the way, is one area where Apple might innovate in the next Mac Pro, possibly by combining CPU and GPU on the same die. The benefits would be felt right away. And even if it didn’t, simply building in support for PCI-Express 3.0 would bring the bandwidth up to 16 GB/sec between each PCI-Express 3.0 x16 card and the main CPU.
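To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the peak rates quoted above (8, 16 and 264 GB/sec). The 4 GB payload and the function name are our own illustrative assumptions, and real-world transfers will fall short of these peak rates.

```python
# Peak rates quoted in this article, in GB/sec.
PEAK_RATES_GB_PER_SEC = {
    "PCI-Express 2.1 x16": 8.0,
    "PCI-Express 3.0 x16": 16.0,
    "GPU internal": 264.0,
}


def transfer_time_ms(payload_gb, rate_gb_per_sec):
    """Ideal time in milliseconds to move payload_gb at the given peak rate."""
    return payload_gb / rate_gb_per_sec * 1000.0


PAYLOAD_GB = 4.0  # hypothetical dataset size, chosen only for illustration

for name, rate in PEAK_RATES_GB_PER_SEC.items():
    print(f"{name}: {transfer_time_ms(PAYLOAD_GB, rate):.1f} ms")
```

Under these assumptions the same 4 GB takes about half a second to cross a PCI-Express 2.1 x16 slot and a quarter of a second over PCI-Express 3.0, while the GPU can move it internally in roughly 15 milliseconds.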
The Competition and PCI-Express 3.0
At the moment the majority of workstations and high-end PCs are still clustered around PCI-Express 2.1, but you can find several with PCI-Express 3.0 support, often in extensive fashion. HP’s Z620 workstation, for example, offers not one but two PCI-Express 3.0 x16 slots plus an array of lower-lane PCI-Express 2 and 3 slots.
An interesting tidbit is that the three new-ish GPUs cited below, while supportive of PCI-Express 3.0, are gaming cards. There is a great irony in this, however: games won’t benefit from faster PCI-Express implementations. As it stands today, games don’t even really benefit when graphics cards are moved from PCI-Express 2.1 x8 slots to x16 slots.
So what would benefit from PCI-Express 3.0 x16 cards, the kind that have now started to ship? The answer is OpenCL- and CUDA-based apps.
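For readers wondering where the x8 and x16 slot figures come from, the sketch below derives them from the per-lane signalling rates and line encodings in the PCI-Express specifications (5 GT/sec with 8b/10b encoding for 2.x, 8 GT/sec with 128b/130b encoding for 3.0). The function and dictionary names are our own, and the results are theoretical one-way maximums that ignore protocol overhead.

```python
# Per-lane signalling rate (gigatransfers/sec) and line-encoding efficiency.
GENERATIONS = {
    "2.x": {"gt_per_sec": 5.0, "encoding": 8 / 10},     # 8b/10b encoding
    "3.0": {"gt_per_sec": 8.0, "encoding": 128 / 130},  # 128b/130b encoding
}


def slot_bandwidth_gb_per_sec(generation, lanes):
    """Theoretical one-way bandwidth of a slot in GB/sec."""
    gen = GENERATIONS[generation]
    usable_gbit_per_lane = gen["gt_per_sec"] * gen["encoding"]
    return usable_gbit_per_lane / 8 * lanes  # bits to bytes, times lane count


for generation, lanes in [("2.x", 8), ("2.x", 16), ("3.0", 8), ("3.0", 16)]:
    bandwidth = slot_bandwidth_gb_per_sec(generation, lanes)
    print(f"PCIe {generation} x{lanes}: {bandwidth:.2f} GB/sec")
```

By this arithmetic a 3.0 x16 slot comes to roughly 15.75 GB/sec, which is usually rounded to 16, and a 3.0 x8 slot lands within a hair of a 2.x x16 slot.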
Superhigh PCI-Express Cards and Parallel Compute
There are currently three new or new-ish graphics cards for Mac that support PCI-Express 3.0 in hardware. At this time it’s not clear to us what is being supported on the driver side, though we may see updates when Mac OS X 10.9 ships. These GPU cards include:
- AMD Radeon Sapphire HD 7950 Mac Edition – supports PCI-Express 3.0, OpenCL 1.2, OpenGL 4.2/3.2 (see our review)
- Nvidia GeForce GTX Titan – Mac version spotted in the wild, supports PCI-Express 3.0, OpenCL, CUDA, OpenGL 4.3/3.2
- Nvidia-based EVGA GTX 680 Mac Edition – supports PCI-Express 3.0, CUDA and OpenGL 4.3/3.2
We may have some minor fact-checking to do, but to the best of our knowledge only these currently or soon-to-be available game-oriented graphics cards for Mac support PCI-Express 3.0, and therefore could increase GPU-PCIe-CPU bandwidth from 8 GB/sec to 16 GB/sec. (Editor’s note: The Nvidia Quadro K5000 for Mac doesn’t list anything on PCI-Express 3.0.)
The benefit would be felt immediately in professional applications that utilize OpenCL, CUDA or both and are designed to benefit from maximum parallel code execution. One such application that fits our Mac 3D users is Octane Render, one of the more exciting GPU-based rendering engines out there. It is strictly CUDA-based at this time, so it only supports Nvidia cards for the Mac. A new Mac Pro with onboard support for PCI-Express 3.0 would theoretically boost Octane Render performance when paired with one of the cards listed above.
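How large that theoretical boost would be depends on how much of a job’s time is actually spent moving data across the bus. The sketch below is a simple Amdahl-style estimate, not a measurement of Octane Render or any other application. The function name and the sample fractions are hypothetical and chosen purely for illustration.

```python
def estimated_speedup(transfer_fraction, bus_speedup=2.0):
    """Amdahl-style estimate of overall speedup when only the bus gets faster.

    transfer_fraction: share of total job time spent moving data over PCIe (0 to 1).
    bus_speedup: how much faster the bus becomes (2.0 for 8 -> 16 GB/sec).
    """
    if not 0.0 <= transfer_fraction <= 1.0:
        raise ValueError("transfer_fraction must be between 0 and 1")
    compute_fraction = 1.0 - transfer_fraction
    return 1.0 / (compute_fraction + transfer_fraction / bus_speedup)


# Hypothetical workloads; these fractions are illustrative, not measured.
for fraction in (0.05, 0.25, 0.50, 0.90):
    speedup = estimated_speedup(fraction)
    print(f"{fraction:.0%} of time on the bus -> {speedup:.2f}x overall")
```

In this model a job that spends only 5 percent of its time on the bus gains about 3 percent from a doubled bus, while one that spends half its time there gains about a third. That is consistent with the observation that games see little benefit while transfer-heavy compute work could see much more.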
CPU vs GPU Rendering: Finding More Cores
On the subject of accelerating rendering, some in the industry, such as Luxology co-founder Brad Peebler, have questioned the GPU-rendering path and see it not as an “if-or” proposition but rather an “if-and” proposition. In our article Peebler says Luxology is always looking for parallelization increases and that the cloud is perhaps a better pathway to finding more rendering cores. Yet he says Luxology is not anti-GPU rendering; it just sees it as adding cores to the CPU side.
Rendering is one area where you can saturate the GPU-PCIe-CPU pipes. But it is not the only area. AnandTech published a superb article back in December of 2011 that discussed the benefits and limitations of PCI-Express 3.0, testing the PCI-Express 3.0 implementation of the AMD Radeon HD 7970 with encryption/decryption software.
Apple’s OpenCL: Implementations
Apple invented OpenCL and then helped turn it into an open standard. The technology taps the massive parallelization advantages of today’s modern graphics processing units (GPUs) and is a core technology in OS X. Apple has utilized OpenCL within some of its key pro apps like Final Cut Pro and there are dozens and dozens of applications that are built on OpenCL. Here are some key ones that apply to Mac OS X pro users:
- Adobe Photoshop CS6 – Adobe’s Mercury Graphics Engine (MGE) in CS6 delivers accelerated previews of many photo filter transformations.
- Bullet Physics Engine – Bullet is the number one physics engine, in use by everything from professional 3D software packages like CINEMA 4D to numerous games.
- Apple Final Cut Pro X – a leading video editing program, it utilizes OpenCL to accelerate rendering on export.
- GIMP – photo-editing tool
- HandBrake – an open source video conversion tool for Mac, Windows and Linux; it uses OpenCL for video scaling, color space conversion and more.
- Indigo Renderer 3.0 – an unbiased, physically based rendering solution that simulates the physics of light to achieve near-perfect image realism. This tool works on OS X and Windows and utilizes OpenCL and CUDA.
- LuxRender – a physically based and unbiased rendering engine built on state-of-the-art algorithms that simulate the flow of light according to physical equations. It utilizes OpenCL and runs both as a plugin and as a standalone program, free under the GPL license.
- LuxMark – an OpenCL benchmark tool that was originally associated with the promotion of LuxRender. LuxMark is a part of the Architosh GPU Test Suite.
- Mathematica 8 – renowned mathematics and science application
This is just a partial list of some key applications that many Mac pros utilize, and this is just OpenCL. There are also key plugins for Adobe’s products like Photoshop that are CUDA-based and might also benefit from a faster PCI-Express pipeline.
Closing Thoughts
Apple’s Mac Pros are old, plain and simple. This demands that they get some attention. Whether that comes in the form of a dramatic new design or simply a significant overhaul of the current tower design, the bottom line is we believe the timing is right at WWDC. This is a developers’ conference. And developers love the Mac Pros as well. The faster the Mac the faster they can compile in Xcode. Even Apple’s current Mac Pro product pages feature Xcode in their own performance benchmarks.
Beyond just modernizing the PCI-Express slots, Apple could also update the processors and, importantly, consider adding a lower-end model that taps the power of Intel’s 4th-generation Core i7 (Haswell) processors. This would bring an affordable, scalable option to the Mac Pro; PC workstation specialist Boxx has already done this with its model 4120.
If some of the rumors hold true, we can expect Apple to go the distance and offer us a dramatically redesigned tower or workstation-class Mac. We may even see them redesign the system architecturally in such a significant way that they change the product name altogether. They may also introduce a new Cinema Retina Display, perhaps at a 24-inch size. Because developers utilize Mac Pros, they need a monitor on which they can natively develop Retina Display capable apps. Though such a monitor would be expensive, it is possible that a whole new set of kit arrives tomorrow that gets the professional crowd really pumped and excited about producing software products for iOS and OS X.
We shall see….
Reader Comments
The June 3rd press announcement from BOXX Technologies shows that Haswell (4th gen Intel Core) processors are noteworthy CPUs for serious entry-level workstations. The press release on the BOXX 4120 is here: http://www.boxxtech.com/news/haswell-is-here
BOXX’s Haswell-based 4120 supports a great array of mid-level cards all the way up to the Quadro K5000. Interestingly, the GeForce Titan in its 6 GB edition is the second most expensive card choice at nearly 1,200 USD. Another nice choice you see with many workstation makers is that you can order a machine sans GPU card. This is useful if you already have a card from another machine that you want to use in the meantime. These are the kinds of options I’d like to see Apple offer.