Understanding how a GPU works isn’t necessarily easy, but we’ll walk you through the fundamentals here so that you can refine your knowledge of the field later.
Let’s start at the beginning: what is a GPU? Short for Graphics Processing Unit, it is nothing more and nothing less than your graphics card’s processor. Compared with the central processing unit (CPU), it is much less versatile, but far more powerful at the massively parallel calculations that graphics require.
The basic units
Whatever the microarchitecture of a GPU, one foundation stays the same: its basic units. Allow me to introduce them:
- Stream Processors
- Raster Engines
- Texture Mapping Units
- Render Output Units
Stream Processors
Stream Processors are the best-known components of a graphics card. Nvidia calls them CUDA Cores, and others mistakenly call them “cores”, but they are in fact Stream Processors.
A Stream Processor is nothing more than a unit capable of performing calculations on floats (floating-point numbers). Typically, the more Stream Processors there are, the faster the math gets done. And a graphics card has a lot of math to do during a game: movement, particle effects, the direction of sunlight, shadows, reflections…
Note that Stream Processors (SP) calculate on 32-bit floats (very roughly, numbers stored in 32 binary digits in total) because these are the numbers our programs use most. Some Stream Processors can also calculate on 64-bit floats, which are more precise. These 64-bit floats are very popular in scientific computing and the professional world, where they are widely used. On the other hand, there are always fewer units capable of processing 64-bit floats than units capable of processing 32-bit floats.
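To make the difference in precision concrete, here is a minimal Python sketch. Python’s own floats are 64-bit, so the hypothetical helper `to_float32` (a name invented for this illustration) squeezes a value through a 32-bit representation using the standard `struct` module and shows what gets lost along the way.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Storage size of each format, in bits.
bits_f32 = struct.calcsize("<f") * 8
bits_f64 = struct.calcsize("<d") * 8

# 0.1 cannot be stored exactly in either format, but the 32-bit version
# is noticeably further from the 64-bit one.
error_on_tenth = abs(to_float32(0.1) - 0.1)

# A 32-bit float has a 24-bit significand, so not every integer above 2**24
# can be represented: 16777217 falls back to 16777216.
rounded_int = to_float32(16777217.0)

print(bits_f32, bits_f64)
print(error_on_tenth)
print(rounded_int)
```

Values such as 0.5 survive the round trip unchanged because they are exact powers of two; it is the “awkward” values that expose the smaller format.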
The Raster Engine
Raster Engines (rasterization units) are large units that process polygons, converting them into the pixels you see on screen. As you probably know, your favorite video game is made up of many objects (characters, walls, floors, wooden crates…), each made up of many polygons (triangles). Today, to reach the level of realism players demand, it takes a staggering number of polygons for an object not to look blocky.
Look at the two examples below: Lara Croft with 300 polygons and Lara Croft with 5,000 polygons. Today, a character in a very good-looking game (e.g. Ryse: Son of Rome) can reach or exceed 100,000 polygons, and that number keeps increasing! By 2025, it is estimated that a single character will exceed one million polygons.
Texture Mapping Units
Now let’s move on to Texture Mapping Units (TMU). These are the units that apply textures to the polygons. Since a picture is worth a thousand words, here is an example of a monster in polygon form (on the right), and the same monster on the left once “textured” (I’m calling Larousse to suggest this word to them).
Typically, when you download an HD texture pack (for Skyrim, for example), or more simply when you raise the texture detail in your game, it is the TMUs you are working the hardest, you slave drivers!
The Render Output Units
And finally, we end with the Render Output Units (ROP), which are the units responsible for processing the final image. These are the units you put to work when you push anti-aliasing and anisotropic filtering or increase the display resolution.
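To get a rough feel for why display resolution weighs on pixel output, the sketch below simply counts the pixels that must be written each second. The function name `pixels_per_second`, the two resolutions, and the 60 frames-per-second target are all illustrative assumptions; real games write many pixels more than once per frame, so treat the result as a floor rather than a measurement.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixels to output per second if every pixel is written exactly once per frame."""
    return width * height * fps

# Assumed example resolutions and frame rate (not taken from the article).
full_hd = pixels_per_second(1920, 1080, 60)
ultra_hd = pixels_per_second(3840, 2160, 60)

# Doubling both dimensions quadruples the number of pixels to process.
growth = ultra_hd // full_hd

print(full_hd, ultra_hd, growth)
```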
These four specialized units are an integral part of the theoretical computing power of graphics cards. The general principle is to multiply the number of relevant units by the frequency of the GPU. So:
- FP32 / FMA32 compute power (in GFLOPS) = number of 32-bit SPs x 2 x frequency (in GHz)
- FP64 / FMA64 compute power (in GFLOPS) = number of 64-bit SPs x 2 x frequency (in GHz)
- Triangle rate (in GTriangles/s) = number of Raster Engines x frequency (in GHz)
- Filtering rate (in GTexels/s) = number of TMUs x frequency (in GHz)
- Pixel rate (in GPixels/s) = number of ROPs x frequency (in GHz)
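The formulas above can be sketched as a small Python helper. The function name `theoretical_rates`, its parameter names, and the unit counts in the usage example are hypothetical and chosen only for illustration. The factor of 2 in the compute formulas comes from the fact that a fused multiply-add (FMA) counts as two floating-point operations.

```python
def theoretical_rates(sp32: int, sp64: int, raster_engines: int,
                      tmus: int, rops: int, freq_ghz: float) -> dict:
    """Apply the article's formulas: unit count x frequency (x 2 for FMA compute)."""
    return {
        "fp32_gflops": sp32 * 2 * freq_ghz,              # FP32 / FMA32 compute power
        "fp64_gflops": sp64 * 2 * freq_ghz,              # FP64 / FMA64 compute power
        "gtriangles_per_s": raster_engines * freq_ghz,   # triangle rate
        "gtexels_per_s": tmus * freq_ghz,                # filtering rate
        "gpixels_per_s": rops * freq_ghz,                # pixel rate
    }

# Made-up GPU used purely as an example: 1000 32-bit SPs, 250 64-bit SPs,
# 2 Raster Engines, 64 TMUs, 32 ROPs, running at 1.0 GHz.
example = theoretical_rates(sp32=1000, sp64=250, raster_engines=2,
                            tmus=64, rops=32, freq_ghz=1.0)

for name, value in example.items():
    print(f"{name}: {value}")
```

To use it on a real card, replace the made-up numbers with the unit counts and clock from the manufacturer’s specification sheet.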
Nothing very complicated, really. Be careful, though: there are some rare exceptions where not all of the units are 100% effective on certain models, which skews this calculation a little. But in practice, that is all there is to it. It will no longer sound like gibberish when we talk to you about GTexels/s.
Comparison of two graphics cards:
As an example, we will carry out a theoretical, comparative study of two mid/high-end GPUs: the R9 280X from AMD and the GTX 770 from Nvidia.
These cards perform similarly but do not have the same strengths. The R9 280X has a huge advantage in Stream Processor count (2048 against 1536, a 33% advantage), which lets it calculate much faster, especially in 64 bits! Indeed, 1 SP out of 4 (a ratio of 1/4) can calculate in 64 bits on the Radeon, against 1 SP out of 24 on the GeForce (a ratio of 1/24), hence the monstrous gap.
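The size of that gap can be worked out from the figures quoted above. In this sketch, the Stream Processor counts and 64-bit ratios come from the text, while the clock is an assumption: both cards are given the same hypothetical 1.0 GHz so that only the unit counts differ. The names `CARDS`, `ASSUMED_CLOCK_GHZ`, and `fp64_units` are invented for the illustration.

```python
# From the text: SP count, and how many SPs it takes to get one 64-bit-capable SP.
CARDS = {
    "R9 280X": {"sp": 2048, "sp_per_fp64_unit": 4},
    "GTX 770": {"sp": 1536, "sp_per_fp64_unit": 24},
}

# Assumption for illustration only: identical clock on both cards.
ASSUMED_CLOCK_GHZ = 1.0

def fp64_units(card: dict) -> int:
    """Number of SPs able to calculate on 64-bit floats."""
    return card["sp"] // card["sp_per_fp64_unit"]

summary = {}
for name, card in CARDS.items():
    summary[name] = {
        "fp64_units": fp64_units(card),
        "fp32_gflops": card["sp"] * 2 * ASSUMED_CLOCK_GHZ,
        "fp64_gflops": fp64_units(card) * 2 * ASSUMED_CLOCK_GHZ,
    }

# How many times faster the Radeon is in 64-bit math at equal clocks.
fp64_gap = summary["R9 280X"]["fp64_gflops"] / summary["GTX 770"]["fp64_gflops"]

print(summary)
print(fp64_gap)
```

Under this equal-clock assumption, the 33% lead in 32-bit units turns into an eightfold lead in 64-bit units (512 against 64). The real figure depends on each card’s actual frequency.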
On the other hand, the GTX 770 flatly beats its competitor in triangle rate, since it has twice as many Raster Engines (4 against 2).
In filtering rate and pixel rate, however, the two are almost perfectly equal, since they have the same number of units in each case, with a small advantage for the Nvidia card thanks to its higher frequency.
In the end, we can see that the Radeon’s advantage in computing power outweighs the GeForce’s advantage in triangle rate.
And there you have it: you can now compare your b… cards on their theoretical performance. Who here can calculate one of the theoretical performance figures for their own card?