Author Archives: Martyn Bliss

Codeplay add OpenCL kernel code coverage!

 

AMD’s opinion of LPGPU2

Codeplay recently reached out to AMD, and asked for their thoughts on what we are doing with LPGPU2 and how we are leveraging their open source CodeXL tool.

AMD stated “The LPGPU2 project validates AMD’s decision 2 years ago to create GPUOpen and open up the source code to our tools and libraries.  Seeing CodeXL evolve through 3rd party participation to be used in new markets is very satisfying and a clear endorsement of our strategy.”

We at LPGPU2 are very pleased to base our work on the technologies made freely available by AMD’s forward looking decision to open source CodeXL.

 

Codeplay & Samsung update on remote profiling

In brief, this post describes how Codeplay have modified CodeXL, AMD’s open-source profiling tool, so that for the first time it can capture and display power usage data from a standard Android device. Codeplay have created a video to demonstrate this new capability. It opens up far-reaching possibilities for profiling non-AMD hardware and remote low-power devices running mobile operating systems such as Android and Tizen.

About the CodeXL <-> Android power profiling

LPGPU2’s version of CodeXL can now communicate with, and retrieve power data from, standard Android devices or any other device that implements the DC API (see below for more on this new API). The captured data is made accessible to CodeXL via a custom extension to its remote protocol.

Because CodeXL’s remote protocol has been extended to communicate with the Android device, CodeXL can now receive power data sent from any remote application or library on that device which implements the API. We intend to augment this further by adding API call data as well as other items of interest.

The CodeXL remote device protocol remains backward compatible. CodeXL can update its live visualisations in real time while also recording the new power data to its standard, but extended, database for offline static analysis later (see below for more information). As with the remote protocol, the database layer within CodeXL remains compatible with existing CodeXL projects.

The Android device is a standard phone, which has not been rooted. This ability to allow profiling to take place on any standard device is important for ease of use and to increase the number of possible users.

The phone has a custom application installed on it, which in turn installs a service that listens for CodeXL to attach to it. When a connection is established, the service provides CodeXL with information about the applications it can profile. The user can then start the selected application, which begins sending profiling data back to CodeXL using the DC API.

For this demonstration, the version of CodeXL shown can only communicate with Android devices, but in the future it will be extended to communicate with applications on other operating systems and devices. For the LPGPU2 project, Android is the primary mobile operating system.

The Data Collection (DC) API

As part of LPGPU2’s statement of work and early project planning, it became clear that a standardised performance and power counter API needed to be developed and ready before trying to profile supported devices.

Rather than implement a different mechanism for each supported Khronos API, the DC API was designed to support those APIs in a non-intrusive manner. It is an API-neutral solution to the problem of enumerating, describing, enabling, disabling and collecting data from disparate hardware with equally varied counter implementations.
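To make the five operations listed above concrete, here is a minimal Python sketch of what an API-neutral counter source could look like. It is not the DC API itself: the class, method and field names are all hypothetical, and the sample values are dummies standing in for real hardware reads.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CounterInfo:
    """Description of one counter. All field names are hypothetical."""
    counter_id: int
    name: str
    unit: str
    enabled: bool = False


class CounterProvider:
    """Hypothetical API-neutral counter source (NOT the real DC API)."""

    def __init__(self, counters: List[CounterInfo]) -> None:
        self._counters: Dict[int, CounterInfo] = {c.counter_id: c for c in counters}
        self._tick = 0

    def enumerate(self) -> List[int]:
        """List the ids of every counter the device exposes."""
        return sorted(self._counters)

    def describe(self, counter_id: int) -> CounterInfo:
        """Return the name, unit and state of one counter."""
        return self._counters[counter_id]

    def enable(self, counter_id: int) -> None:
        self._counters[counter_id].enabled = True

    def disable(self, counter_id: int) -> None:
        self._counters[counter_id].enabled = False

    def collect(self) -> Dict[int, float]:
        """Return one sample per enabled counter (dummy values here)."""
        self._tick += 1
        return {
            cid: float(self._tick * cid)
            for cid, info in sorted(self._counters.items())
            if info.enabled
        }


provider = CounterProvider([
    CounterInfo(1, "power", "mW"),
    CounterInfo(2, "gpu_busy", "%"),
])
provider.enable(1)
sample = provider.collect()  # only counter 1 is enabled, so only it is sampled
```

The point of the sketch is that the caller never needs to know which graphics or compute API produced a counter; it only sees ids, descriptions and samples.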

The DC API has been developed by Samsung and implemented by Samsung and Think Silicon. The DC API can be implemented by an application or library on any remote device, not just an Android device.

Static Analysis

The beauty of being able to capture profiling data from a device lies not just in the real-time capture shown in the video but also in CodeXL’s offline mode, which permits static analysis.

Samsung are developing, as part of the LPGPU2 project, a feedback engine which will analyze the captured data and report efficiency anomalies back to the user. Codeplay have extended CodeXL’s data visualisations to highlight regions of interest within the profiling data. From these, the user can choose to examine the source code associated with the areas identified as possible sources of inefficiency, and be presented with advice and suggestions to improve their applications.

For more information on the feedback engine, please see the article “Farewell Borja and whither Feedback?” on LPGPU2’s web site.

Codeplay & Samsung deliver live power profiling from an Android device!

Working closely together, Codeplay and Samsung have delivered remote power profiling from an Android device into the live power profiling view of CodeXL. This is a huge step forward and brings the tool suite much closer to full operation.

 

Farewell Borja and whither Feedback?

During my three-month internship at Samsung I have been working on the feedback engine of the LPGPU2 Tool (part of work package 6 of the LPGPU2 Project). The goal of the project is to develop a tool that helps programmers develop efficient GPU applications (OpenGL, OpenCL and Vulkan are supported). The key to making optimization easier is providing information to programmers, so that they can reason about the behavior of the app, identify issues and fix them. To that end, the LPGPU2 Tool captures data from the execution of an app on a cell phone and writes it to a database. This data includes a function call trace, the shaders used, hardware counter samples and more. Those alone are very helpful assets, but the feedback engine makes optimization even easier. It is responsible for extracting information from the raw data held in the database and reporting it to the user, so that they know where to focus their optimization efforts.

The feedback engine consists of a set of Lua functions that are run by the LPGPU2 tool. These rely on the database access functions offered by the modified version of CodeXL that the tool is built upon. Feedback analysis results are written back to the database, so they can be shown to the user through the GUI. My work for the last three months has been implementing the Lua functions that perform the analysis of the database and produce useful information for optimization. However, the approach I chose for these functions has also had an impact on the tool as a whole. As a result, I have also taken part in some design decisions, such as how the database should be accessed or how feedback results are written back and presented to the user. The following is an overview of the functionality I implemented, which covers several optimization stages, including limitation detection, identification of regions of interest and even specific suggestions on how to improve the code.

First, the engine can help the programmer to decide where to optimize. It analyzes the call trace and identifies regions of frames that are below a certain, user-defined, performance threshold. These are sets of contiguous frames that present identical library call mixes and that are consequently likely to benefit from the same optimizations. Information on these is shown, such as the functions (or type of functions) that take the longest, so efforts can be focused on the areas that will produce the greatest benefits.
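The region detection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine's Lua code: the `Frame` type, the `find_slow_regions` function, and the choice to model a "call mix" as the set of library functions a frame calls are all hypothetical.

```python
from itertools import groupby
from typing import FrozenSet, List, NamedTuple, Tuple


class Frame(NamedTuple):
    """One captured frame (hypothetical representation)."""
    index: int
    duration_ms: float
    calls: FrozenSet[str]  # assumed model of the frame's library call mix


def find_slow_regions(frames: List[Frame], max_frame_ms: float) -> List[Tuple[int, int]]:
    """Return (first, last) frame indices of each run of contiguous frames
    that exceed the time budget and share an identical call mix."""
    regions = []
    # groupby only merges neighbouring items, so each group is contiguous.
    key = lambda f: (f.duration_ms > max_frame_ms, f.calls)
    for (is_slow, _calls), group in groupby(frames, key=key):
        if is_slow:
            run = list(group)
            regions.append((run[0].index, run[-1].index))
    return regions


frames = [
    Frame(0, 10.0, frozenset({"glDrawArrays"})),
    Frame(1, 20.0, frozenset({"glDrawArrays", "glFinish"})),
    Frame(2, 22.0, frozenset({"glDrawArrays", "glFinish"})),
    Frame(3, 25.0, frozenset({"glTexImage2D"})),
    Frame(4, 12.0, frozenset({"glTexImage2D"})),
]
regions = find_slow_regions(frames, max_frame_ms=16.7)
```

With a 16.7 ms budget, frames 1 and 2 form one region because they are both slow and call the same functions, while frame 3 is slow with a different mix and so becomes a region of its own.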

Two different strategies were implemented to identify the causes limiting the performance of an application. One is based on analyzing the function call trace and the other on counters. The first identifies reasons, related to the use of the library, why a certain frame is not meeting its performance target, such as too many blocking calls or too much non-GL work. For a lower-level analysis, hardware counters are analyzed, and two levels of detail are available. First, at a coarse grain, the app is identified as CPU, GPU or temperature limited. Then, considering the output of the previous analysis, a more detailed cause is sought. Some examples are: failing to adequately use multi-threading, too many frame buffer stalls or too many instructions per pixel.
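As a rough illustration of the coarse-grained step, the Python sketch below labels a set of averaged counter readings as CPU, GPU or temperature limited. The post does not say which counters or thresholds the engine uses, so the inputs, the 90% utilisation cut-off and the 80 °C limit are assumptions chosen purely for the example.

```python
def classify_limit(cpu_util: float, gpu_util: float, temperature_c: float,
                   *, busy: float = 0.9, hot_c: float = 80.0) -> str:
    """Coarse classification of what limits an app.

    cpu_util and gpu_util are utilisation fractions in [0, 1].
    The thresholds are illustrative assumptions, not the engine's values.
    """
    if temperature_c >= hot_c:
        return "temperature"  # thermal throttling is assumed to dominate
    if gpu_util >= busy and gpu_util >= cpu_util:
        return "GPU"
    if cpu_util >= busy:
        return "CPU"
    return "none"


label = classify_limit(cpu_util=0.50, gpu_util=0.97, temperature_c=60.0)
```

A finer-grained analysis would then look only at the counters relevant to the label returned here.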

Regarding energy, a model that predicts the impact of counter samples on energy consumption was used. With this model, the engine can determine which counters make the greatest contribution to energy use, so that optimizations can be made in the right places to produce an efficient application.
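The post does not describe the form of the energy model. Assuming, for illustration only, a simple linear model in which each counter has a fixed energy cost per event, ranking the counters by their contribution could look like the following Python sketch. The function name, counter names and weights are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_energy_contributions(weights: Dict[str, float],
                              samples: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank counters by their share of the predicted energy.

    Assumes a linear model: energy = sum(weight[c] * sample[c]).
    Returns (counter, share) pairs, largest share first.
    """
    contributions = {name: weights[name] * samples.get(name, 0.0) for name in weights}
    total = sum(contributions.values())
    if total == 0:
        return [(name, 0.0) for name in sorted(contributions)]
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / total) for name, value in ranked]


# Hypothetical per-event weights and counter totals for one capture.
weights = {"alu_ops": 2.0, "mem_reads": 5.0, "tex_fetches": 1.0}
samples = {"alu_ops": 100.0, "mem_reads": 10.0, "tex_fetches": 30.0}
ranking = rank_energy_contributions(weights, samples)
```

In this made-up capture the ALU operations dominate the predicted energy even though each memory read is individually more expensive, which is the kind of insight that tells a programmer where to look first.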

To make optimizations even easier, the engine can even make suggestions of its own. By analyzing the library call stream, it can detect common patterns that are known to harm performance and recommend ways to avoid them. This feature is available for all the supported libraries and includes suggestions on adequate management of data, setup functions, textures and more. For OpenCL, kernels themselves are also analyzed and issues like incorrect use of vectorization or calculations for constant values are detected.

The feedback engine also supports the Mali Offline Compiler and the compilers included as part of PVRTools. These tools represent an extra source of information that enables the detection of important code issues, such as divergence or spilling. A computing power budget calculation is also performed on the shaders, so the programmer can get an idea of how many resources each one requires.

These features could be improved or extended by adding new capabilities, such as tracking the use of resources to determine if library calls could be reordered to avoid unnecessary state changes. Nevertheless, I consider that my work for the last three months has laid a good foundation on which to build a really useful feedback engine.

How do we test the LPGPU2 Tool?

As we continue to develop the LPGPU2 tool suite, it’s important that we verify the collected data and the analysis generated by the feedback engine. How do we do this? We write targeted test apps that stress the system in different ways. Here’s a short video of some of the apps being written to stress GPUs:

 

What happened at the LPGPU2 Hackathon? Find out here!

In August the LPGPU2 consortium met for a hackathon in Berlin. Codeplay Software Ltd kindly recorded and edited a video with some of the highlights. Find out more about what we achieved at the Hackathon and the LPGPU2 project in general:

New Starter at Samsung

Luke West is a scientific software developer with a background in applied mathematics spanning Engineering Simulation, Oil and Gas applications and Climate Science. He is rarely seen far from graphics, however, as much of his day to day work has been concerned with the visualisation of various types of scientific data: satellite measurements of sea-surface height for storm surge prediction, for example, or particle track animation methods for simulating oil-slicks. Luke also built a tool for 3D visualisation of millennium-scale climate predictions for investigating the ocean’s impact on global warming.

 

Luke also has a long-term interest in High Performance Computing, and says he watched with great interest as the emphasis shifted from flops-per-second to flops per Watt. Now he is looking forward to coding closer to the metal than ever before with the LPGPU2 Tools team at Samsung where he will be working on diverse applications to test and challenge the LPGPU profiler to its limits.

 

Ad-blockers help reduce power consumption

View story at Medium.com

 

F2F Video Interviews #9: Ignacio Aransay Think Silicon

Ignacio talks low-power GPUs, Think Silicon and his hopes for the project:

 
