Heterogeneous computing / GPGPU (INTC, AMD, NVDA)

Heterogeneous computing refers to computing systems that use a mix of different types of computational units.  The type of heterogeneous computing that I am most interested in is the rise of the general-purpose graphics processing unit (GPGPU).  The industry trend has been to use the GPU for purposes other than graphics.  Innovations in GPGPU could change the competitive landscape between Intel, AMD, and Nvidia.  I believe that heterogeneous computing will grow from a small niche into a larger one.

From what I understand (keep in mind that I don’t know how to program GPUs), GPUs:

  1. Are much faster (and/or more power-efficient) than CPUs at certain problems.
  2. Are only good for a small set of problems.

One example of GPU acceleration is Folding@home, a program designed to simulate protein folding.  There is a GPU version of the program, which excels at some proteins (those that need only small amounts of memory to simulate), and a CPU version, which excels at all the other proteins.  GPUs will never be ideal for all problems.

GPGPU has been getting more popular because the shaders/processing units on a graphics card have become more general-purpose as the industry has pushed towards more complex graphics.

The tricky part is that GPU acceleration takes additional programming effort: code has to be written specifically for the GPU.  One area of innovation is developing tools to make programming GPUs easier.  Another area of innovation is getting the software and hardware to work in unison to deliver the best performance.  There may be a tradeoff between making the software easy to develop and getting the most performance possible out of the hardware.
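
To give a feel for that extra effort, here is a minimal sketch in plain Python (no GPU involved) contrasting an ordinary loop with the kernel-style structure that frameworks such as CUDA and OpenCL ask for, where a small function computes one output element from its index.  The names saxpy_loop, saxpy_kernel and launch are hypothetical, and launch only simulates a GPU launch by calling the kernel once per index; a real GPU would run many of those calls at the same time.

```python
def saxpy_loop(a, x, y):
    """CPU style: one loop walks the whole array."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_kernel(i, a, x, y, out):
    """GPU style: compute only element i; no element depends on another."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Hypothetical stand-in for a GPU launch: call the kernel for each index."""
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

The kernel version only works because each output element is independent of the others.  Problems that cannot be broken up this way are the ones that tend not to benefit from a GPU.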

Proprietary versus open approaches

Nvidia has a proprietary approach to programming its GPUs (CUDA) that only works with Nvidia GPUs.  CUDA is currently the most popular GPGPU framework.  ATI/AMD has its own proprietary framework (Stream / Close to Metal).

OpenCL is the open approach to GPGPU and is supported by ATI/AMD, Nvidia, and Intel graphics chips.  The open approach is not necessarily the ultimate solution.  In the history of computing, the proprietary approach has sometimes won out over the open approach.  For programs that are available on both Mac and Windows, many vendors maintain separate user interface (UI) code for each operating system.  An alternative approach to UI, such as using Java, is not always ideal as Java user interfaces tend to be slow.  If you want the highest performance, the proprietary approach is sometimes the best.

Mainstream consumer markets

For mainstream consumer markets, I am not entirely sure what the future of GPU acceleration will be.  Only a few consumer applications really need it (video playback / decoding compressed video, video editing, encoding, and maybe photo editing).  Currently, GPU acceleration can be buggy and many consumers do not have the right hardware (integrated graphics in Intel chips tend to be too slow for GPU acceleration to be of any use).  Applications targeting broad consumer markets will likely need optimized non-GPU-accelerated code alongside GPU-accelerated code.  This will take a lot more work for the application developers, though I think they will step up to the challenge.  There are video editing programs on the market currently that are doing this (e.g. Sony Vegas).
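
The "two code paths" idea can be sketched roughly as follows.  This is only an illustration in plain Python: blur_cpu, blur_gpu and blur are hypothetical names, and the GPU path here simply pretends that no usable GPU was found so that the fallback is exercised.

```python
def blur_cpu(pixels):
    """Reference CPU path: average each pixel with its two neighbors."""
    n = len(pixels)
    return [
        (pixels[max(i - 1, 0)] + pixels[i] + pixels[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def blur_gpu(pixels):
    """Hypothetical GPU path; here it just reports that no GPU is usable."""
    raise RuntimeError("no supported GPU found")


def blur(pixels):
    """Try the accelerated path first and fall back to the CPU path."""
    try:
        return blur_gpu(pixels)
    except RuntimeError:
        return blur_cpu(pixels)


result = blur([0.0, 3.0, 6.0])
print(result)
```

The extra work for the developer is that both paths have to be written, optimized, and tested to give the same results.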

Specialized markets

For more specialized markets (e.g. workstations, high-performance computing, supercomputers, servers), the adoption of GPU acceleration will be higher.  These specialized markets only have to target one set of hardware so they do not need to write code to target different types of hardware like consumer video editing applications.  There is a possibility that Nvidia’s CUDA framework stays popular if enough of a software ecosystem develops around it and the cost of developing for CUDA is lower than other GPU frameworks.  Companies that have already coded their programs to work solely with CUDA are likely to stick with it in the future as switching vendors is an expense that requires porting their code.

Intel Xeon Phi (formerly Larrabee)

One of the ideas behind the Larrabee project is that eventually the processing units in a GPU will be so complex that they will resemble a traditional CPU core.  The Larrabee project put a large number of traditional x86 CPU cores (40+) onto a single chip and sought to figure out how such a chip would perform as a graphics chip.  This experiment seems to have failed, as Intel does not intend to ship a graphics chip based on Larrabee.  My guess is that the underlying premise does not work well in the real world and is unlikely to work well in the next few years.  Presumably the specialized hardware approach has its merits versus using general-purpose x86 cores.  For example, commercial graphics cards have lower-precision floating point units; these processing units take up less space, so a lot more of them can be packed onto a graphics chip.  They also have hardware designed to do certain calculations such as sine very quickly (it is a native instruction on some Nvidia cards).
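
As a small illustration of what lower precision gives up, the sketch below rounds ordinary Python floats (which are double precision) to single precision using the standard struct module.  The helper name to_float32 is my own, and the example says nothing about how any particular graphics card implements its arithmetic.

```python
import struct


def to_float32(value):
    """Round a double-precision Python float to single precision and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


big = 16777217.0  # 2**24 + 1, exactly representable in double precision

# Single precision has a 24-bit significand, so it cannot hold 2**24 + 1.
print(to_float32(big) == big)

# 0.5 is a power of two and survives the round trip; 0.1 does not.
print(to_float32(0.5) == 0.5)
print(to_float32(0.1) == 0.1)
```

For graphics, errors of this size are usually invisible on screen, which is part of why smaller single-precision units are an acceptable trade.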

Currently, the Larrabee project has morphed into something very different.  Intel’s Xeon Phi product is a double-precision math coprocessor for supercomputers with some of the old Larrabee technology along for the ride.  There is a very limited market for it since it targets the supercomputing market (which is small) and only some of those users actually need double-precision math (e.g. scientific modeling).  The users who do need double-precision math are willing to pay a lot for such a coprocessor since a typical consumer GPU contains little double-precision hardware and is very slow at double-precision math.  I don’t believe that the Xeon Phi will have uses outside of its supercomputing niche.  It may not be competitive at all in other GPGPU market segments where single-precision math is sufficient.

Integrated graphics

Both AMD and Intel make silicon chips for PCs that contain both a CPU and a GPU.  Consolidating different chips onto a single chip can lower costs.  In theory, having both the CPU and GPU on a single chip means that they can communicate with each other much faster.  This could prove beneficial to heterogeneous computing.  In practice there is still very little mainstream software that takes advantage of the GPU.  Some manufacturers may have a lot of rhetoric regarding the future of heterogeneous computing but I don’t think that it will be a huge inflection point in the CPU/GPU industry.  One inherent problem is that most applications cannot benefit from GPU acceleration.

The disruptive change that matters more is the trend towards putting a CPU and GPU onto a single chip to reduce cost.  (There is a limit to this integration: a larger die is more likely to contain a manufacturing defect, so yield falls off quickly as die size grows and there is a commercial limit as to how much CPU and GPU can co-exist on a single chip.  High performance will require a high-end CPU and high-end GPU to be on separate chips.)  This can create problems for Nvidia as the market for budget standalone graphics chips is more or less gone and the market just above the low end will shrink.  AMD/ATI could also have problems if Intel starts integrating better GPUs into its chips.  Historically, the GPUs on Intel chips with integrated graphics had terrible hardware, terrible software drivers, did not support the latest versions of DirectX, and could not play most 3-D games.  Despite how awful Intel’s integrated graphics has been, it has the most market share by number of units shipped because the low-end market is so large.  Intel has been slowly improving its integrated graphics offerings.  Eventually AMD/ATI will no longer be alone in offering a mid-range CPU and GPU on a single chip.
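
The yield limit mentioned in the parenthetical note above can be illustrated with the simple Poisson yield model, in which the fraction of defect-free dies is exp(-area x defect density).  This is a textbook simplification rather than how any particular fab models yield, and the defect density below is a made-up number chosen only to show the shape of the curve.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of dies expected to have zero defects under a Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


DEFECT_DENSITY = 0.5  # hypothetical defects per square centimetre

for area in (1.0, 2.0, 4.0):
    print(area, round(poisson_yield(area, DEFECT_DENSITY), 3))
```

Under this model, every doubling of die area squares the yield fraction, so a very large combined CPU/GPU die gets expensive quickly.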

Sony/Toshiba/IBM Cell processor

The Cell processor in the Sony PlayStation 3 is an example of a heterogeneous CPU.  It has a single fast core and several slower CPU cores.  The benefit of slower cores is that they take up significantly less space on the silicon die, so you can cram a lot of them onto the die.

In hindsight, the Cell approach was a mistake and it looks unlikely to see any further development.  It is too costly to force programmers to write parallel code that takes advantage of the Cell’s many cores.  As well, even with parallel code, performance does not scale linearly with the number of cores for most problems.  Outside of the PlayStation 3, the Cell processor gained very little traction.
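
The point about performance not scaling linearly is usually explained with Amdahl's law: if only a fraction p of a program can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n).  The sketch below just evaluates that formula; the 90% parallel fraction is an arbitrary example value, not a measurement of any Cell workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for cores in (1, 2, 4, 8):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Even with 90% of the work parallelized, eight cores give well under an eightfold speedup, and no number of cores can beat a tenfold speedup because the serial 10% always remains.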

Where I think the future is headed

GPGPU:  It’s a small niche that will grow into a larger niche.  My guess is that Nvidia will continue to be the market leader in specialized markets, but I am not very confident about this prediction.  The entire GPGPU business might only be a $1B/year business in revenue.

Nvidia:  I’m not sure.  Its video card market will shrink due to integrated graphics.  A very scary possibility for Nvidia is if Intel starts integrating good enough graphics onto its chips.  This will probably happen eventually as transistor counts go up; using the extra transistors to improve CPU performance is rapidly seeing diminishing returns so budget parts will likely use those transistors for better graphics instead.  This will kill the market for everything but the high-end standalone graphics chip, which is a smaller market than the mid-range.

Nvidia has valuable patents that can be used to patent troll Intel.  It currently receives significant payments from Intel for its patents.

AMD:  It will eventually face competition in mid-range graphics and CPU integrated onto a single chip.

Intel:  Its execution in developing graphics chips and the drivers for them has been dismal.  Some of its integrated graphics are hardware designs licensed from Imagination Technologies.  Its in-house Larrabee project has pretty much been a continual failure in developing a graphics chip.  To be somewhat fair, developing a graphics chip is probably very difficult as it is more of an open-ended problem.  You are allowed to cheat and use computational shortcuts (either in the drivers or in the hardware) that can cause the resulting image to look different from competitors’ graphics.  Unlike designing CPUs, there is no one right answer in graphics.  There is a lot of room for creativity and innovation.  The graphics market has been extremely Darwinian and only the best companies have survived (AMD and Nvidia are very good at what they do).

Despite its many shortcomings in graphics, Intel has been a major beneficiary of the rise of integrated graphics and has dominant market share based on volume.  This suggests to me that Intel has a moat when competing against Nvidia and AMD/ATI.  In an efficient market, I believe that Intel should trade at a higher multiple/valuation than the combination of ATI and Nvidia to reflect its economic advantage (currently it does not).

The rise of the GPGPU could also mean that Intel and AMD sell fewer server CPUs since much of their workload has been offloaded onto more efficient GPUs.  Overall, I do not believe that this will affect Intel and AMD much as most applications cannot be accelerated by GPUs.

*Disclosure: Long Intel.
