14.12.2012 Views

ATI CrossFire Technology White Paper

<strong>ATI</strong> <strong>CrossFire</strong> <br />

<strong>Technology</strong> <strong>White</strong> <strong>Paper</strong>


<strong>ATI</strong> <strong>CrossFire</strong><br />

<strong>CrossFire</strong> is an exciting new technology developed by <strong>ATI</strong> that allows the power of multiple Graphics<br />

Processing Units to be combined in a single system.<br />

Key benefits include:<br />

Ultimate Performance<br />

• Up to two times the frame rates of a single GPU.<br />

Ultimate Image Quality<br />

• Up to twice the anti-aliasing quality of a single GPU, at full speed.<br />

Ultimate Compatibility<br />

• Improves the experience in any 3D application.<br />

Ultimate Flexibility<br />

• Simple upgrade for single GPU systems, driving single or multiple displays.<br />

<strong>ATI</strong> <strong>CrossFire</strong> system with dual graphics cards<br />


Background<br />

The world around us possesses an infinite level of detail. The process of creating virtual worlds that approach<br />

this level of detail can consume practically any amount of computing power made available to it. However, by<br />

breaking the task down into a finite set of elements (such as pixels and polygons), parallel processing can be<br />

used to perform calculations on multiple elements at once.<br />

Chart: cost versus transistor count (performance) for single-processor and multi-processor designs.<br />

<strong>ATI</strong> is one of the pioneers of multi-processor technology<br />

for graphics. In 1999, <strong>ATI</strong> introduced MAXX <br />

technology, using a patented Alternate Frame Rendering<br />

technique. The Rage Fury MAXX graphics card<br />

combined two Rage 128 PRO processors on a single<br />

board for high-end gaming. Despite the limitations<br />

imposed by the AGP bus interface, it was able to<br />

achieve over 50% higher frame rates in games like<br />

Quake 3 at high resolutions.<br />

<strong>ATI</strong> Rage Fury MAXX with dual<br />

Rage 128 Pro graphics processors<br />

The most straightforward way to drive processing<br />

power upward is to add more transistors dedicated to<br />

the task. However, the cost of a microprocessor starts<br />

to increase exponentially as its size and transistor<br />

count increase beyond a certain point. Performance,<br />

on the other hand, does not necessarily continue to<br />

increase at the same rate.<br />

The goal of multi-processor technology is to allow<br />

performance increases to scale linearly with cost,<br />

independent of manufacturing process technology or<br />

architectural improvements.<br />


In 2001, <strong>ATI</strong> began collaborating with Evans & Sutherland on multi-GPU workstation products. The<br />

result was the powerful simFUSION ® image generator, which uses up to 4 <strong>ATI</strong> GPUs per board. Versions built<br />

using the Radeon® 9700 and 9800 series GPUs feature <strong>ATI</strong>’s exclusive Supertiling and Super AA technologies,<br />

and can support up to 24-sample anti-aliasing at extremely high resolutions in real time.<br />

For even more demanding applications, multiple simFUSION image generators can be linked together. The<br />

Evans & Sutherland RenderBeast ® visualization system uses up to 16 simFUSION boards in parallel, for a total<br />

of 64 GPUs. The complete system is capable of rendering up to 768 billion pixels per second and supporting up<br />

to 384-sample anti-aliasing. This technology has proven ideal for ultra-high fidelity military simulators,<br />

commercial flight simulators, planetariums, automotive design visualization, and other industrial applications.<br />

Multi-GPU Workstation products developed by Evans & Sutherland.<br />

Left: simFUSION® 6000q board with quad Radeon 9800 GPUs.<br />

Right: RenderBeast ® visualization system with 64 parallel Radeon GPUs.<br />

While these multi-GPU technologies were revolutionary, limitations in supporting platform technology have kept<br />

them from being able to deliver their full benefits to a wide range of PC users. To make that a reality, a fast,<br />

cost-effective, bi-directional interface between PC components was needed. In 2004, the introduction of the PCI<br />

Express® platform fulfilled this requirement, and enabled a new generation of highly efficient multi-GPU<br />

technology for PCs.<br />


<strong>CrossFire</strong> – How it Works<br />

<strong>CrossFire</strong> is the most sophisticated and powerful technology currently available for multi-GPU graphics<br />

rendering. It consists of the following components:<br />

• Radeon <strong>CrossFire</strong> Edition graphics card with Compositing Engine<br />

• Secondary PCI Express graphics card with Radeon X800 or X850 series GPU<br />

• <strong>CrossFire</strong> Ready motherboard with dual PCI Express graphics slots<br />

• Catalyst driver with <strong>CrossFire</strong> support<br />

<strong>CrossFire</strong> Block Diagram<br />

In a <strong>CrossFire</strong> system, each GPU has its own dedicated PCI Express link to the North Bridge of the<br />

motherboard chipset, and is allocated its own command buffer and non-local storage space in system memory.<br />

Allocating separate command buffers allows each GPU to be assigned its own unique set of tasks.<br />

Another portion of system memory is set aside for sharing of data between the GPUs. Shared data includes<br />

synchronization commands, textures, off-screen buffers, and other temporary data generated during the<br />

rendering process. This configuration takes advantage of the high speed bi-directional PCI Express links to<br />

ensure smooth, efficient co-operation between the GPUs.<br />

When each GPU has completed its assigned tasks for a given frame, the resulting outputs are sent to the<br />

<strong>CrossFire</strong> Compositing Engine. This device combines the results from each GPU according to the selected<br />

operating mode, and sends the final frames out to the display device. It is capable of performing advanced<br />

blending operations without burdening either of the GPUs.<br />

<strong>CrossFire</strong> supports four modes of operation: three performance-oriented modes (AFR, Supertile, and<br />

Scissor) and one quality-oriented mode (Super AA). Each mode uses a different method for dividing the<br />

workload required to render a 3D image across multiple GPUs.<br />

Ultimate Performance<br />

The key to maximizing the speed of a multi-GPU system is to divide the rendering workload of each frame as<br />

efficiently as possible. It is also important to minimize any performance overhead caused by additional driver<br />

processing or synchronization between the GPUs, while maintaining compatibility with a wide range of<br />

applications.<br />

To achieve these goals, <strong>CrossFire</strong> employs one of three different techniques to divide the rendering workload.<br />

The optimal technique is determined automatically for each 3D application, using the Catalyst A.I. feature of the<br />

<strong>ATI</strong> display drivers. The three techniques are Alternate Frame Rendering, Supertiling, and Scissor.<br />


Alternate Frame Rendering (AFR) Mode<br />

In this mode, all even frames are rendered on one GPU, while all odd frames are rendered on the other. The<br />

completed frames from both GPUs are sent to the Compositing Engine on the <strong>CrossFire</strong> Edition board, which<br />

then sends them on to the display in turn. By allowing both GPUs to work completely independently, AFR<br />

provides the greatest potential performance improvements of all the available modes. It is also the only mode<br />

that allows the full vertex processing performance of both GPUs to be combined.<br />

Alternate Frame Rendering (AFR) Mode<br />

The main limitation of this mode is that it cannot be used in applications where the appearance of the current<br />

frame is dependent upon data generated in previous frames, since AFR generates successive frames<br />

simultaneously on different GPUs. In these cases, the Supertile or Scissor modes can be used instead.<br />
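As a rough illustration of the frame assignment described above, the following Python sketch maps frame numbers to GPUs by parity. The function name `afr_gpu_for_frame` and the GPU indices 0 and 1 are hypothetical, chosen only for illustration; this is not the Catalyst driver's actual scheduling logic.

```python
def afr_gpu_for_frame(frame_index: int, num_gpus: int = 2) -> int:
    """Return the index of the GPU that renders a given frame.

    With two GPUs, even frames go to GPU 0 and odd frames to GPU 1.
    """
    return frame_index % num_gpus


# Assign the first six frames; completed frames would then be
# passed to the display in frame order.
schedule = {frame: afr_gpu_for_frame(frame) for frame in range(6)}
print(schedule)  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
```

Because neighbouring frames are always on different GPUs, a frame that needs results from the previous frame would have to wait for the other GPU, which is the limitation noted above.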


Supertile Mode<br />

In this mode, each frame to be rendered is divided into a number of tiles in an alternating checkerboard pattern,<br />

such that half of the tiles are assigned to each of the two GPUs. Because the tiles are kept fairly small (32x32<br />

pixels), this method does a good job of balancing the workload across each GPU regardless of what is being<br />

drawn on the screen, and it does so without any software overhead.<br />

Supertile Mode<br />

Supertile Mode has the advantage of being able to work with practically any 3D application. However, there are<br />

a small number of applications where the Supertile workload distribution does not provide optimal performance.<br />

For these special cases, Scissor Mode can be used.<br />
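The checkerboard assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 32x32 tile size comes from the text, while the function names, the GPU indices, and the choice of which GPU owns the top-left tile are hypothetical.

```python
TILE_SIZE = 32  # tile edge length in pixels, as stated in the text


def supertile_gpu_for_pixel(x: int, y: int, tile_size: int = TILE_SIZE) -> int:
    """Return the GPU (0 or 1) that owns the tile containing pixel (x, y)."""
    tile_x = x // tile_size
    tile_y = y // tile_size
    # Alternating checkerboard: neighbouring tiles belong to different GPUs.
    return (tile_x + tile_y) % 2


def pixels_per_gpu(width: int, height: int) -> list:
    """Count how many pixels of a width x height frame each GPU renders."""
    counts = [0, 0]
    for top in range(0, height, TILE_SIZE):
        for left in range(0, width, TILE_SIZE):
            tile_w = min(TILE_SIZE, width - left)   # clip tiles at the right edge
            tile_h = min(TILE_SIZE, height - top)   # clip tiles at the bottom edge
            counts[supertile_gpu_for_pixel(left, top)] += tile_w * tile_h
    return counts


print(pixels_per_gpu(1280, 1024))  # [655360, 655360]
```

At 1280x1024 each GPU receives exactly half of the pixels, and because the tiles are small, a detailed region of the screen is likely to be spread over tiles owned by both GPUs.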


Scissor Mode<br />

In this mode, each frame is split into two sections, with each section being processed by one GPU. The split<br />

can be horizontal or vertical, and it can be even (50/50) or uneven (such as 60/40 or 70/30). The ideal<br />

configuration is determined automatically for each application.<br />

Scissor Mode<br />

Although in general Scissor Mode is a less efficient means of splitting the workload than the Supertile method,<br />

there are a few cases where it can be more efficient. It is supported by <strong>CrossFire</strong> in order to maximize<br />

compatibility and performance.<br />
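The following Python sketch shows one way the two sections could be computed for a given split ratio. It is a minimal sketch with assumptions: the function name, the rectangle convention `(x0, y0, x1, y1)`, and the reading of a "horizontal" split as a horizontal dividing line (top and bottom sections) are illustrative choices, not details taken from the driver.

```python
def scissor_regions(width: int, height: int,
                    first_share: float = 0.5,
                    horizontal: bool = True):
    """Split a frame into two rectangles (x0, y0, x1, y1), one per GPU.

    first_share is the fraction of the frame given to the first GPU,
    for example 0.5 for an even split or 0.6 for a 60/40 split.
    """
    if not 0.0 < first_share < 1.0:
        raise ValueError("first_share must be between 0 and 1")
    if horizontal:
        # Assumed meaning: the dividing line is horizontal (top / bottom).
        cut = round(height * first_share)
        return (0, 0, width, cut), (0, cut, width, height)
    # Otherwise the dividing line is vertical (left / right).
    cut = round(width * first_share)
    return (0, 0, cut, height), (cut, 0, width, height)


top, bottom = scissor_regions(1280, 1024, first_share=0.6)
print(top, bottom)  # (0, 0, 1280, 614) (0, 614, 1280, 1024)
```

An uneven split is useful when one part of the frame is more expensive to render than the other, so that both GPUs finish at about the same time.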


Ultimate Image Quality<br />

One of the key problems with early multi-GPU technologies is how they handle 3D applications limited by CPU<br />

performance rather than GPU performance. A similar situation holds true for applications that can execute so<br />

quickly with a single GPU that they can achieve frame rates well above the refresh rate of the attached display<br />

device. In these cases, the additional power provided by a second GPU is essentially wasted, since the CPU or<br />

display device becomes a bottleneck.<br />

These situations can be quite common on a fast system, even when running at maximum image quality settings.<br />

On systems equipped with less than cutting-edge CPUs or limited resolution displays (such as the common<br />

1280x1024 LCD resolution), the value of adding a second GPU can be severely limited. <strong>ATI</strong> <strong>CrossFire</strong><br />

technology addresses these problems with its new Super AA mode, which takes advantage of a second GPU to<br />

improve image quality, rather than just to increase frame rates.<br />
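The bottleneck argument above can be made concrete with a deliberately simplified model. The sketch below assumes ideal linear GPU scaling and treats the displayed frame rate as the smallest of three limits; the function name and all of the numbers are hypothetical and serve only as an illustration.

```python
def displayed_frame_rate(single_gpu_fps: float, num_gpus: int,
                         cpu_fps_limit: float, refresh_hz: float) -> float:
    """Estimate the frame rate that actually reaches the display.

    Simplified model: the GPUs scale perfectly, and the result is
    capped by whatever the CPU can feed and the display can show.
    """
    ideal_gpu_fps = single_gpu_fps * num_gpus
    return min(ideal_gpu_fps, cpu_fps_limit, refresh_hz)


# CPU-limited case: the second GPU raises the rate from 50 to only 60.
print(displayed_frame_rate(50, 1, cpu_fps_limit=60, refresh_hz=75))   # 50
print(displayed_frame_rate(50, 2, cpu_fps_limit=60, refresh_hz=75))   # 60

# Display-limited case: one GPU already exceeds a 60 Hz display.
print(displayed_frame_rate(90, 1, cpu_fps_limit=200, refresh_hz=60))  # 60
print(displayed_frame_rate(90, 2, cpu_fps_limit=200, refresh_hz=60))  # 60
```

In both cases most of the second GPU's capacity goes unused for frame rate, which is the capacity that a quality-oriented mode can put to work instead.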

Super AA Mode<br />

Anti-Aliasing (AA) is a well known rendering technique designed to remove jagged edges, shimmering, and<br />

pixelation problems that are common in rendered 3D images. Rather than simply determining the color of each<br />

pixel on the screen by sampling a single location at the pixel's center, anti-aliasing works by sampling multiple<br />

locations within each pixel and blending the results together to determine the final color.<br />

Today's graphics processors employ a variety of different anti-aliasing techniques. The latest generation of<br />

<strong>ATI</strong>’s Radeon GPUs with SmoothVision HD technology uses a method known as Multi-Sample Anti-Aliasing<br />

(MSAA). This method takes samples from 2, 4, or 6 programmable locations within each pixel, and uses<br />

gamma-correct sample blending for high quality smoothing of polygon edges. Taking more samples per pixel<br />

increases the quality of the final image.<br />

The new <strong>CrossFire</strong> Super AA modes take advantage of the programmable sample capability of SmoothVision<br />

HD to provide even higher quality anti-aliasing on multi-GPU systems. They work by having each GPU render the<br />

same frame with anti-aliasing enabled, but with different sample locations on each GPU. When both versions of the<br />

frame are completed, they are blended in the <strong>CrossFire</strong> Compositing Engine. The resulting image has<br />

effectively twice the number of samples, so 4x and 6x AA become 8x and 12x Super AA respectively.<br />
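To see why blending two differently sampled frames behaves like doubling the sample count, consider a single pixel crossed by a vertical polygon edge. In the Python sketch below, the two 4-sample patterns are hypothetical positions chosen for illustration (they are not ATI's actual sample locations), and the blend is a plain average that ignores gamma correction.

```python
# Hypothetical sample positions inside one pixel, in the range 0..1.
PATTERN_GPU0 = [(0.125, 0.625), (0.375, 0.125), (0.625, 0.875), (0.875, 0.375)]
PATTERN_GPU1 = [(0.0625, 0.3125), (0.3125, 0.8125), (0.5625, 0.0625), (0.8125, 0.5625)]


def edge_coverage(samples, edge_x: float) -> float:
    """Fraction of samples lying left of a vertical edge at x = edge_x."""
    inside = sum(1 for (x, _y) in samples if x < edge_x)
    return inside / len(samples)


def super_aa_pixel(edge_x: float) -> float:
    """Blend the coverage computed independently by the two GPUs."""
    coverage_gpu0 = edge_coverage(PATTERN_GPU0, edge_x)
    coverage_gpu1 = edge_coverage(PATTERN_GPU1, edge_x)
    return (coverage_gpu0 + coverage_gpu1) / 2


# Sweep the edge across the pixel and count the distinct shades produced.
positions = [i / 16 for i in range(17)]
single_gpu_levels = {edge_coverage(PATTERN_GPU0, e) for e in positions}
blended_levels = {super_aa_pixel(e) for e in positions}
print(len(single_gpu_levels), len(blended_levels))  # 5 9
```

With equal sample counts on both GPUs, the averaged result is identical to what a single pass with all eight sample positions would produce, so the edge gains more intermediate shades.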


Super AA Mode<br />

6x AA 12x SuperAA<br />

Comparison of edge smoothing with 6x AA and 12x Super AA.<br />

Note the smoother gradients along the edges in the Super AA image.<br />

While these new anti-aliasing modes deliver outstanding quality and high performance, they are still limited to<br />

polygon edges. Some types of textures, especially those with transparent portions, can exhibit aliasing that is<br />

not removed by MSAA techniques. Another form of anti-aliasing, known as Super-Sample Anti-Aliasing (SSAA),<br />

can be useful in these cases since it affects every pixel in an image. Although it normally runs quite a bit more<br />

slowly than MSAA, the power of multiple GPUs can make SSAA much more practical to use.<br />

The simplest and most commonly used form of SSAA requires a scene to be first rendered at a higher resolution<br />

than that which is output to the display. Once completed, the image is then downsampled to the display’s<br />

resolution. This method has two main disadvantages. First, it requires rendering many more pixels than<br />

normal, which can have a drastic impact on performance. Second, it results in an ordered grid sample pattern,<br />

which does a poor job of anti-aliasing some types of jagged edges (specifically those that are nearly vertical or<br />

nearly horizontal, which tend to result in the most noticeable problems).<br />

Super AA overcomes both of these problems. It takes advantage of the second GPU to render the additional<br />

pixels required for each frame, so there is little or no performance impact. Also, it can make use of a more<br />

effective sample pattern (known as a rotated grid) that does a better job of anti-aliasing near-horizontal and<br />

near-vertical edges, resulting in better overall image quality.<br />
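The difference between an ordered grid and a rotated grid can be shown with a small Python sketch. The four-sample positions below are common textbook examples and are assumptions made for illustration, not the patterns used by the hardware. The sketch sweeps a vertical edge across one pixel and records how many distinct coverage values each pattern can produce.

```python
# Hypothetical 4-sample patterns inside one pixel (coordinates in 0..1).
ORDERED_GRID = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID = [(0.125, 0.625), (0.375, 0.125), (0.625, 0.875), (0.875, 0.375)]


def coverage_levels(samples, steps: int = 64):
    """Return the distinct coverage values seen as a vertical edge sweeps the pixel."""
    levels = set()
    for i in range(steps + 1):
        edge_x = i / steps
        covered = sum(1 for (x, _y) in samples if x < edge_x)
        levels.add(covered / len(samples))
    return sorted(levels)


print(coverage_levels(ORDERED_GRID))  # [0.0, 0.5, 1.0]
print(coverage_levels(ROTATED_GRID))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

In the ordered grid, samples share the same x positions, so a near-vertical edge can only produce one intermediate shade. In the rotated grid every sample has its own x position (and its own y position), so the same number of samples yields more gradations along such edges.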

Two of the new Super AA modes use a combination of MSAA and SSAA to achieve the ultimate in image<br />

quality. They work not only by using different multi-sample locations on each GPU, but also by<br />

offsetting the pixel centers slightly. In effect, each GPU renders the image from a different viewpoint,<br />

about half a pixel width apart. The new 10x and 14x Super AA modes operate in this way, combining 2x SSAA<br />

with 4x and 6x MSAA respectively.<br />
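The four Super AA modes described so far can be summarized as a small lookup table. The Python sketch below is only a bookkeeping aid: the per-GPU MSAA counts and the 2x SSAA factor come from the text, while the table layout, the function names, and the observation that each mode label equals the combined MSAA samples plus the SSAA factor are assumptions that happen to be numerically consistent with the labels.

```python
NUM_GPUS = 2

# label: (MSAA samples per GPU, SSAA factor from offset pixel centers)
SUPER_AA_MODES = {
    "8x": (4, 1),
    "12x": (6, 1),
    "10x": (4, 2),
    "14x": (6, 2),
}


def combined_msaa_samples(label: str) -> int:
    """Total multi-sample locations per pixel across both GPUs."""
    msaa_per_gpu, _ssaa = SUPER_AA_MODES[label]
    return msaa_per_gpu * NUM_GPUS


def label_from_parts(label: str) -> int:
    """Recompute the number in the mode label (assumed naming rule)."""
    _msaa_per_gpu, ssaa = SUPER_AA_MODES[label]
    extra = ssaa if ssaa > 1 else 0  # only the SSAA modes add to the label
    return combined_msaa_samples(label) + extra


for label in SUPER_AA_MODES:
    print(label, combined_msaa_samples(label), label_from_parts(label))
```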

6x AA 14x SuperAA

Comparison of SmoothVision HD 6x Anti-Aliasing (left) and new 14x Super AA mode (right).<br />

Note how the branches and leaves look smoother and less pixelated in the image on the right.<br />

Images taken from Half-Life 2 by Valve Software.<br />


Anti-Aliasing Sample Patterns<br />

An added benefit of these modes is that they work together with SmoothVision HD Anisotropic Filtering (AF).<br />

This is a high quality filtering technique designed to produce sharper, clearer textures by blending multiple<br />

texture samples (2, 4, 8, or 16) for each pixel. Since Super AA can render each pixel from two slightly different<br />

viewpoints and combine them, the texture samples from each viewpoint get combined as well. This means the<br />

number of texture samples per pixel is effectively doubled, so up to 32x Anisotropic Filtering can be supported.<br />

The new Super AA modes can be enabled by users through the <strong>ATI</strong> Catalyst Control Center interface.<br />

New SmoothVision HD Anti-Aliasing Slider in Catalyst Control Center<br />


Ultimate Compatibility<br />

Another limitation of previous multi-GPU techniques was that they did not always work well with all applications.<br />

This was due to a variety of reasons, including certain rendering techniques that were incompatible with the<br />

particular multi-GPU implementation, or heavy CPU dependence that created performance bottlenecks. In any<br />

case, these issues limited the value of adding a second GPU to a system.<br />

<strong>CrossFire</strong> is the first multi-GPU technology that can provide a benefit to any 3D application. This is made<br />

possible by its support for four different operating modes that maximize compatibility. The <strong>ATI</strong> Catalyst display<br />

driver will automatically select the best of the three performance modes when a 3D application is started,<br />

without requiring user intervention. Alternatively, the user can choose to improve image quality by selecting the<br />

new Super AA modes in the Catalyst Control Center.<br />

Ultimate Flexibility<br />

Earlier multi-GPU technologies were characterized by limited upgradeability. In general, they required that two<br />

identical or near-identical graphics cards be installed in a system in order to allow them to work together.<br />

Furthermore, each of these cards typically had to be a special model designed specifically for multi-GPU<br />

configurations. These special models could be more expensive or harder to find than standard models. Finally,<br />

this approach meant that every compatible GPU produced had to include logic to support multi-GPU operation, thus<br />

increasing the cost of these products unnecessarily for those who had no need for such configurations.<br />

<strong>ATI</strong> <strong>CrossFire</strong> technology takes a novel approach to improving upgradeability. It is designed such that a<br />

Radeon <strong>CrossFire</strong> Edition board can be connected to any existing model, configuration, or brand of PCI Express<br />

graphics card that uses a Radeon X800 or Radeon X850 series GPU. This provides an easy upgrade path to<br />

existing owners of Radeon X800- or Radeon X850-based graphics cards. The dedicated <strong>CrossFire</strong><br />

Compositing Engine ensures that only users interested in multi-GPU configurations need to pay for the circuitry<br />

that makes it possible.<br />

In order to ensure the workload does not become too strongly mismatched between two cards with differing<br />

levels of performance, <strong>ATI</strong> <strong>CrossFire</strong> technology has the capability to throttle the speed of each GPU. This<br />

makes it possible to obtain better performance and image quality from almost any combination of compatible<br />

graphics boards.<br />

Summary<br />

<strong>ATI</strong> <strong>CrossFire</strong> technology offers the most advantageous means available of combining multiple high<br />

performance Graphics Processing Units in a single PC. Combining a range of different operating modes with<br />

intelligent software design and an innovative interconnect mechanism, <strong>CrossFire</strong> enables the highest possible<br />

level of performance and image quality in any 3D application.<br />

Copyright 2005, <strong>ATI</strong> Technologies Inc. <strong>ATI</strong>, Radeon, <strong>CrossFire</strong>, Catalyst, SmoothVision, MAXX, Rage Fury, and Rage 128 PRO are<br />

trademarks and/or registered trademarks of <strong>ATI</strong> Technologies Inc. simFUSION and RenderBeast are registered trademarks of Evans &<br />

Sutherland. All other company and/or product names are trademarks and/or registered trademarks of their respective owners.<br />
