Comparison of GPUs

SANKET BENDREY
6 min readJun 24, 2022

What Is a GPU?

With the emergence of extreme-scale computing, modern graphics processing units (GPUs) have been widely used to build powerful supercomputers and data centers. With a large number of processing cores and a high-performance memory subsystem, modern GPUs are perfect candidates to facilitate high performance computing (HPC).

What Does a GPU Do?

The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in creative production and artificial intelligence (AI).

GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more.

Types of GPUs

Broadly, graphics processing units fall into two categories:

(1) Integrated Graphics Processing Unit

The majority of GPUs on the market are actually integrated graphics. So, what are integrated graphics, and how do they work in your computer? A CPU that comes with a fully integrated GPU on the same package allows for thinner and lighter systems, reduced power consumption, and lower system costs.

Intel® Graphics Technology, which includes Intel® Iris® Plus and Intel® Iris® Xe graphics, is at the forefront of integrated graphics technology. With Intel® Graphics, users can experience immersive graphics in systems that run cooler and deliver long battery life.

(2) Discrete Graphics Processing Unit

Many computing applications can run well with integrated GPUs. However, for more resource-intensive applications with extensive performance demands, a discrete GPU (sometimes called a dedicated graphics card) is better suited to the job.

These GPUs add processing power at the cost of additional energy consumption and heat generation. Discrete GPUs generally require dedicated cooling for maximum performance.

Here we concentrate on two GPUs: an Nvidia GeForce GTX 580 (Fermi) and an ATI Radeon HD 5870 (Cypress), and compare their performance and power consumption characteristics.

By running a set of representative general-purpose GPU (GPGPU) programs, we demonstrate the key design differences between the two platforms and illustrate their impact on performance.

The first architectural difference between the target GPUs is that the ATI GPU adopts very long instruction word (VLIW) processors, which carry out multiple operations in a single VLIW instruction to gain an extra level of parallelism within its single-instruction, multiple-data (SIMD) engines.

Typically, in an n-way VLIW processor, up to n independent instructions can be assigned to the slots and executed simultaneously. If all n slots can be filled with valid instructions, the VLIW architecture can execute n operations per VLIW instruction. However, this does not always happen, because the compiler may fail to find enough independent instructions.
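As an illustration, consider the hypothetical kernel below. It is written in CUDA only so that all sketches in this article use one language; on the ATI hardware the slot packing would be done by its VLIW compiler from OpenCL or IL code, not by nvcc. The point is simply that the first group of operations is mutually independent and could in principle fill several slots of one VLIW instruction, while the dependent chain that follows cannot.

```
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (not from the article). On a 5-way VLIW GPU such as the
// HD 5870, a VLIW compiler packs independent operations into the slots of one
// instruction; dependent operations must be issued one after another.
__global__ void vliw_packing_example(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Independent operations: distinct inputs and outputs, so a VLIW compiler
    // could place them in separate slots and issue them in the same cycle.
    float p0 = a[i] * 2.0f;
    float p1 = b[i] * 3.0f;
    float p2 = a[i] - b[i];

    // Dependent chain: each line needs the previous result, so the slots
    // cannot all be filled and some execution units stay idle.
    float q = p0 + p1;
    q *= p2;
    out[i] = q;
}

int main()
{
    const int n = 1024;
    float *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vliw_packing_example<<<(n + 255) / 256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);

    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```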

The second major difference between the two GPUs lies in the memory subsystem. Inherited from graphics applications, both GPUs have separate off-chip global memories for global, private (referred to as local memory on Nvidia GPUs), texture, and constant data. They also have fast on-chip local memory (called shared memory on Nvidia GPUs and local data share on ATI GPUs) and caches for texture and constant data. The Nvidia Fermi introduces new L1 and L2 caches for caching both global and local data, which are not present on the Radeon HD 5870. In the GTX 580, the L1 cache and shared memory can be configured in two different size combinations, and the L1 cache can also be disabled by setting a compiler flag. All off-chip memory accesses go through the L2 in the GTX 580. Given the additional L1 and L2 caches for global and local data, we investigate and compare the performance of the memory systems of the target GPUs.
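As a concrete illustration of the configurable L1/shared-memory split on Fermi, the sketch below uses the CUDA runtime call cudaFuncSetCacheConfig to request the 48 KB L1 / 16 KB shared-memory configuration for a placeholder kernel. The kernel name and workload are invented for the example; only the API call and the preference enums come from the CUDA runtime.

```
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; name and workload are illustrative only.
__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    // On Fermi, the 64 KB of on-chip storage per SM is split between L1 cache
    // and shared memory. The runtime lets a program state a per-kernel
    // preference: PreferL1 requests 48 KB L1 / 16 KB shared, PreferShared the
    // opposite. (Caching of global loads in L1 can be disabled at compile
    // time with -Xptxas -dlcm=cg, which is the compiler flag the article
    // alludes to.)
    cudaFuncSetCacheConfig(scale_kernel, cudaFuncCachePreferL1);

    const int n = 1024;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale_kernel<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();
    printf("data[0] = %f\n", data[0]);

    cudaFree(data);
    return 0;
}
```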

Thirdly, power consumption and energy efficiency stand as a first-order concern in high performance computing. Due to the large number of transistors integrated on chip, a modern GPU is likely to consume more power than a typical CPU. The resultant high power consumption tends to generate substantial heat and increase the cost of system cooling, thus offsetting the benefits gained from the performance boost. Both Nvidia and ATI are well aware of this issue and have introduced effective techniques to trim the power budget of their products. For instance, ATI Radeon graphics cards implement PowerPlay technology, which significantly reduces GPU idle power.

Similarly, Nvidia uses the PowerMizer technique to reduce the power consumption of its mobile GPUs. In this article, we measure and compare the energy efficiency of these two GPUs for further assessment.
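For readers who want a rough power reading on Nvidia hardware, the sketch below queries board power through NVML. This is a hedged example and not necessarily how the measurements behind this article were taken: nvmlDeviceGetPowerUsage reports power only on GPUs and drivers that support it, and older cards such as the GTX 580 may return NVML_ERROR_NOT_SUPPORTED.

```
// Compile with: nvcc power_read.c -lnvidia-ml   (or gcc with the NVML headers)
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    nvmlDevice_t dev;
    unsigned int milliwatts;

    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetPowerUsage(dev, &milliwatts) == NVML_SUCCESS)
        printf("GPU 0 power draw: %.1f W\n", milliwatts / 1000.0);
    else
        fprintf(stderr, "Power reading not supported on this GPU/driver\n");

    nvmlShutdown();
    return 0;
}
```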

Table: System Information

Fermi Architecture

Fermi is a CUDA-capable GPU architecture introduced by Nvidia. Derived from prior families such as G80 and GT200, the Fermi architecture was improved to satisfy the requirements of large-scale computing problems. The GeForce GTX 580 used in this study is a Fermi-generation GPU.

The major component of this device is an array of streaming multiprocessors (SMs), each of which contains 32 streaming processors (SPs, or CUDA cores). There are 16 SMs on the chip, for a total of 512 cores integrated in the GPU. Within a CUDA core, there is a fully pipelined integer ALU and a floating point unit (FPU). In addition, each SM includes four special function units (SFUs), which execute transcendental operations such as sine, cosine, and square root.
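To relate these figures to what the CUDA runtime reports, the short sketch below queries the device properties of GPU 0. It is illustrative rather than taken from the article; on a GTX 580 one would expect multiProcessorCount to be 16 and the compute capability to be 2.0.

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No CUDA device found\n");
        return 1;
    }
    printf("Device               : %s\n", prop.name);
    printf("Streaming MPs (SMs)  : %d\n", prop.multiProcessorCount);   // 16 on a GTX 580
    printf("Shared mem per block : %zu bytes\n", prop.sharedMemPerBlock);
    printf("Warp size            : %d threads\n", prop.warpSize);
    printf("Compute capability   : %d.%d\n", prop.major, prop.minor);  // 2.0 for Fermi
    return 0;
}
```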

Cypress Architecture

Cypress is the codename of the ATI Radeon HD 5800 series GPU. In general, it is composed of 20 compute units (CUs), also referred to as single-instruction, multiple-data (SIMD) computation engines, and the underlying memory hierarchy. Inside a SIMD engine, there are 16 thread processors (TPs) and a 32KB local data share. Basically, a SIMD engine is similar to a streaming multiprocessor (SM) on an Nvidia GPU, while the local data share is equivalent to the shared memory on an SM. Note that on the Radeon HD 5870, there is an 8KB L1 cache on each SIMD engine and a 512KB L2 cache shared among all compute units. However, these components function differently from the caches on the Fermi GPU in that they are mainly used to cache image objects. In this article, we use the terms HD 5870, Cypress GPU, and ATI GPU interchangeably.

Conclusion:

To conclude, discrete graphics cards are standalone graphics processors connected to the motherboard via a PCIe slot. They provide cutting-edge real-time rendering technology and a plethora of other features, including streaming 4K and 8K video, high-end gaming, and VR.

While integrated graphics have displayed major improvement in recent years, they are still better suited for light daily use. Discrete graphics cards have the power to make complex graphical tasks look sleek, making them a better option for gaming, video editing, and game development.

Authors:- Bashir Rahmani, Mohit Mawal, Sanket Bendrey, Sandesh Mankar, Hasibullah Nuristani
