Intel tricks: Optimising for Intel Atom devices

By Orion Granatir

January 7th 2011 at 3:00PM

Intel software engineer Orion Granatir on how to get the most from the small but mighty processor

The Intel® Atom™ processor is the brains behind the recently released Google TV and the core of the popular netbook market.

Since its launch in 2008, over 40 million Intel Atom processor-based devices have shipped, and the momentum shows no signs of slowing. With so many of these rice-sized powerhouses already out there, the opportunities and possibilities are lining up from here to the horizon.

In this Intel-sponsored Develop article, Orion Granatir explores ways to unlock its potential.

Intel Atom Processor: Overview

The Intel Atom processor is unique. It was designed from the ground up with power efficiency in mind. However, unlike other low-power solutions, this processor is fully x86-compatible. So, if you already know how to program for a standard x86 machine, you are ready for the Intel Atom processor. It’s just a lot smaller.

Since the Intel Atom processor is a fully compatible x86 core, it can run all the existing x86 programs, including Microsoft Windows and Linux. However, developers need to understand a few things about the design of the processor to maximize performance. First and foremost, the Intel Atom processor is an “in-order” processor. Unlike other x86 CPUs, the Intel Atom processor doesn’t have a large “out-of-order” execution unit. This means that it will not examine the stream of instructions and actively reorder them to improve instruction level parallelism.

For example:

a = b * 7;
c = d * 7;

In assembly, this could be written:

movl  b, %eax      # load b
imull $7, %eax     # Stall - memory load dependency
movl  %eax, a      # store a
movl  d, %edx      # load d
imull $7, %edx     # Stall - memory load dependency
movl  %edx, c      # store c

An out-of-order engine will notice that the loads of b and d are independent of each other and can both be issued first, which hides the memory load latency. On the in-order Intel Atom processor, however, these stalls remain unless the instructions are reordered ahead of time. In some cases, your compiler will fix this. The Intel C++ Compiler (ICC) has flags to optimize for Intel Atom processor-based devices: /QxL and /QxSSE3_ATOM. These flags will optimize for the in-order instruction scheduler and automatically use Streaming SIMD Extensions 3 (SSE3) instructions where possible.
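The same reordering can be sketched at the source level. The following C++ snippet is only an illustration: the helper name scale_pair is hypothetical and does not come from the article. It writes both independent loads before either multiply, which is the order an out-of-order engine would pick on its own. An optimizing compiler is free to reschedule this code, so treat it as a picture of the idea rather than a guaranteed instruction order.

```cpp
// Hypothetical helper for illustration only.
// Both independent loads are written before either multiply, so neither
// multiply sits directly behind the load it depends on.
void scale_pair(const int* b, const int* d, int* a, int* c) {
    const int tb = *b;  // load b
    const int td = *d;  // load d (independent of the load of b)
    *a = tb * 7;        // multiply and store a
    *c = td * 7;        // multiply and store c
}
```

Checking the generated assembly (or profiling on real hardware) is the only way to confirm that the compiler kept the loads ahead of the multiplies.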

Streaming SIMD Extensions 3

The Intel Atom processor supports SSE3. In fact, it supports Supplemental SSE3, which is a superset of SSE3. All modern PC games should take advantage of SSE3; it’s in 97 percent of all machines according to Valve's Steam Hardware & Software Survey (under the Windows and Mac section, expand Other Settings).
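As a small example of what SSE3 adds, the sketch below sums four floats with the horizontal-add intrinsic _mm_hadd_ps, which was introduced with SSE3. The function name sum4 is hypothetical, and the scalar fallback is an assumption added so the snippet still builds when the compiler is not targeting SSE3 (the __SSE3__ macro is defined by GCC, Clang, and ICC when SSE3 code generation is enabled).

```cpp
#if defined(__SSE3__)
#include <pmmintrin.h>  // SSE3 intrinsics (also pulls in SSE and SSE2 headers)
#endif

// Hypothetical helper: returns v[0] + v[1] + v[2] + v[3].
float sum4(const float* v) {
#if defined(__SSE3__)
    __m128 x = _mm_loadu_ps(v);  // unaligned load of four floats
    x = _mm_hadd_ps(x, x);       // lanes: (v0+v1, v2+v3, v0+v1, v2+v3)
    x = _mm_hadd_ps(x, x);       // every lane now holds the full sum
    return _mm_cvtss_f32(x);     // extract the lowest lane
#else
    // Scalar fallback with the same pairing as the SSE3 path.
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

Horizontal adds are convenient, but whether they are the fastest choice on a given Intel Atom part is something to measure rather than assume.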

To really gain an understanding of the ins and outs of the Intel Atom processor, check out the Intel software developer manuals. For optimizations, you should start with the “Intel 64 and IA-32 Architectures Optimization Reference Manual.”

Hyper-threading

In addition to SSE3, all Atom processors support Intel Hyper-Threading Technology. This allows multiple threads to run simultaneously on the same processor core. Hyper-threading can benefit performance and power efficiency, but it also creates some avoidable pitfalls for game performance.

One such pitfall is silent oversubscription because of graphics driver threading. It’s common for games to take advantage of multiple threads. A game might use threads for the main game loop, audio playback, and the graphics driver. It’s the graphics driver that’s key for most Intel Atom processor platforms.

Since the processor is packaged with various graphics solutions, vertex processing is sometimes handled in software on the CPU. This means that a scene with heavy vertex processing will generate a lot of work in the graphics driver. So, while it’s important to properly thread a game, oversubscription can quickly become a bottleneck on single-core systems. The Intel VTune Amplifier XE 2011 is a useful tool to help you identify areas of oversubscription or underutilization within your application.

In any case, it’s worthwhile to write code that will adapt to the number of available cores on a platform. A year ago, most Intel Atom processors were single-core, but now dual-core SKUs are becoming more common. With hyper-threading, a dual-core Intel Atom processor can support four simultaneous threads. On these dual-core SKUs, it’s paramount to use all cores to achieve maximum performance.
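One way to adapt to the core count is to size a worker pool from the number of hardware threads while leaving room for threads the game does not control. The sketch below is a minimal illustration, not a recommendation from the article: the function names are hypothetical, and reserving two hardware threads (one for the main loop, one for the graphics driver) is an assumption you should tune by profiling.

```cpp
#include <thread>

// Hypothetical helper: how many worker threads to create, given the number
// of hardware threads and how many to leave free for other work.
unsigned worker_count(unsigned hardware_threads, unsigned reserved) {
    if (hardware_threads == 0) {
        hardware_threads = 1;  // hardware_concurrency() may return 0 if unknown
    }
    if (hardware_threads <= reserved) {
        return 1;              // always keep at least one worker
    }
    return hardware_threads - reserved;
}

// Assumption: reserve one hardware thread for the main game loop and one
// for the graphics driver, which may be doing vertex processing in software.
unsigned default_worker_count() {
    return worker_count(std::thread::hardware_concurrency(), 2);
}
```

With these assumptions, a single-core hyper-threaded part (two hardware threads) gets one worker, and a dual-core hyper-threaded part (four hardware threads) gets two.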

Graphics

It’s also important to understand the various graphics solutions packaged with the Intel Atom processor. The Intel Atom processor ships with three integrated graphics solutions: Intel Graphics Media Accelerator (GMA) 945, GMA 3150, and GMA 500.

| Intel Graphics Media Accelerator | Microsoft DirectX | OpenGL (on Microsoft Windows) | Vertex Shader | Pixel Shader | Support for Intel Graphics Performance Analyzers |
|---|---|---|---|---|---|
| 945 | 9.0c | 1.4 | Software | Hardware | Yes |
| 3150 | 9.0c | 2.0 | Software | Hardware | Yes |
| 500 | 9.0c | 1.5 | Hardware | Hardware | No |

For maximum compatibility across Intel Atom processor-based devices, verify acceptable performance of your app on a GMA 945/3150 and a GMA 500. The performance of these parts is very similar, but different enough to warrant independent validation.

Across all the Intel Atom processor’s graphics hardware, there are a few key things to keep in mind. Remember, vertex processing might be done in software, so it’s best to minimize vertex processing as much as possible.

Effects that touch a lot of memory, such as post-processing, can be expensive on integrated graphics processors because there is not enough local cache on the GPU to hold the whole frame buffer. So tasks like post-processing effects often involve round trips to main memory. It’s also important to properly compress textures because of the lack of on-GPU memory for texture storage. And remember to check device caps in DX9. For example, GMA 945 and GMA 3150 don’t support multisample anti-aliasing (MSAA).
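To put rough numbers on the texture compression point, the sketch below compares the size of an uncompressed 32-bit texture with block-compressed versions. The article does not name a compression format; DXT1 and DXT5 are used here as an assumption because they are the common block-compressed formats in DirectX 9. Both store each 4x4 pixel block in a fixed number of bytes (8 for DXT1, 16 for DXT5). The function names are hypothetical, and mipmaps are ignored.

```cpp
#include <cstddef>

// Hypothetical helper: bytes for an uncompressed RGBA texture, 4 bytes per pixel.
std::size_t rgba8_bytes(std::size_t width, std::size_t height) {
    return width * height * 4;
}

// Hypothetical helper: bytes for a block-compressed texture.
// bytes_per_block is 8 for DXT1 and 16 for DXT5; each block covers 4x4 pixels,
// and partial blocks at the edges are rounded up to a full block.
std::size_t dxt_bytes(std::size_t width, std::size_t height,
                      std::size_t bytes_per_block) {
    const std::size_t blocks_x = (width + 3) / 4;
    const std::size_t blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * bytes_per_block;
}
```

For a 1024x1024 texture this gives 4 MiB uncompressed, 1 MiB as DXT5, and 512 KiB as DXT1, which is memory and bandwidth that no longer has to come from main memory on every fetch.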

On Linux-based platforms, the drivers support OpenGL 2.0 for all three graphics solutions (GMA 945, 3150, and 500), so developers can safely target OpenGL 2.0 for all devices.

For more information about Intel’s integrated graphics solutions, check out the Intel Graphics Developer’s Guides.

Linux-Based Devices

Numerous Linux distros support the Intel Atom processor and its associated chipsets; Android and MeeGo are of particular interest. MeeGo is an open source version of Linux championed by Intel and Nokia. It combines Intel’s original mobile Linux project, Moblin, with Nokia’s Maemo. MeeGo is being built to run on netbooks, phones, tablets, and other consumer devices. If you are interested in downloading MeeGo, the associated IDE, and the emulator, check out Meego.com.

Intel provides several tools to help build, optimize, and debug applications targeting the Intel Atom processor. Check out the Intel Application Software Development Tool Suite for Intel Atom Processor. Most of the Intel tools, including the Intel C++ Compiler and Intel VTune Amplifier XE 2011, offer Windows and Linux versions.

An Ongoing Story

The story of the Intel Atom processor has just begun, chapter one being the processor’s popularity in netbooks. As the next chapters unfold, we will see the processor venture into a large array of devices. While the uses for the Intel Atom processor might change, between the covers it’s just the same x86-based processor with which we’re already familiar.

If you want to learn more about optimizing for the Intel Atom processor and netbooks, check out the article Omar Rodriguez and I wrote in the latest issue of Visual Adrenaline magazine. This article is based on a presentation we delivered at GDC 2010.

Refer to http://software.intel.com/en-us/articles/optimization-notice for more information regarding performance and optimization choices in Intel software products.

Other names and brands may be claimed as the property of others.