Farewell Borja and whither Feedback?

The purpose of my three-month internship at Samsung has been to work on the feedback engine of the LPGPU2 Tool (part of work package 6 of the LPGPU2 Project). The goal of the project is to develop a tool that helps programmers develop efficient GPU applications (OpenGL, OpenCL and Vulkan are supported). The key to making optimization easier is providing information to programmers, so they can reason about the behavior of the app, identify issues and fix them. To that end, the LPGPU2 Tool captures data from the execution of an app on a cell phone and writes it to a database. This data includes a function call trace, the shaders used, hardware counter samples and more. Those alone are very helpful assets, but the feedback engine makes optimization even easier. It is responsible for extracting information from the raw data that the database holds and reporting it to the user, so they know where to focus their optimization efforts.

The feedback engine consists of a set of Lua functions that are run by the LPGPU2 Tool. These rely on the database access functions offered by the modified version of CodeXL that the tool is built upon. Feedback analysis results are written back to the database, so they can be shown to the user through the GUI. My work for the last three months has been implementing the Lua functions that analyze the database and produce useful information for optimization. However, the approach I chose for these functions has also had an impact on the tool as a whole. As a result, I have also taken part in some design decisions, such as how the database should be accessed or how feedback results are written back and presented to the user. The following is an overview of the functionality I implemented, which covers several optimization stages, including limitation detection, identification of regions of interest and even specific suggestions on how to improve the code.

First, the engine can help the programmer to decide where to optimize. It analyzes the call trace and identifies regions of frames that are below a certain, user-defined, performance threshold. These are sets of contiguous frames that present identical library call mixes and that are consequently likely to benefit from the same optimizations. Information on these is shown, such as the functions (or type of functions) that take the longest, so efforts can be focused on the areas that will produce the greatest benefits.
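The region search described above can be sketched in a few lines. This is only an illustration in Python, not the Lua code that runs in the tool: the frame tuple layout, the function name `find_slow_regions` and the 16.7 ms budget are all assumptions made for the example.

```python
from itertools import groupby

def find_slow_regions(frames, max_frame_ms):
    """Group contiguous slow frames that share the same call mix.

    frames is a list of (frame_id, duration_ms, call_mix) tuples in
    capture order. call_mix is a tuple of library function names.
    This layout is hypothetical, chosen only for the sketch.
    """
    def region_key(frame):
        _, duration_ms, call_mix = frame
        # Frames that meet the target get the key None and are skipped.
        return call_mix if duration_ms > max_frame_ms else None

    regions = []
    for call_mix, group in groupby(frames, key=region_key):
        if call_mix is None:
            continue
        group = list(group)
        regions.append({
            "first_frame": group[0][0],
            "last_frame": group[-1][0],
            "call_mix": call_mix,
            "total_ms": sum(duration for _, duration, _ in group),
        })
    return regions

# Made-up trace: frames 1-2 share one call mix, frame 3 has another.
example_frames = [
    (0, 10.0, ("glDrawArrays",)),
    (1, 20.0, ("glDrawArrays", "glFinish")),
    (2, 22.0, ("glDrawArrays", "glFinish")),
    (3, 21.0, ("glDrawArrays", "glTexImage2D")),
    (4, 12.0, ("glDrawArrays",)),
]
slow_regions = find_slow_regions(example_frames, max_frame_ms=16.7)
for region in slow_regions:
    print(region)
```

Keying the grouping on the call mix means a change in the mix starts a new region even when the frames are still slow, which matches the idea that each region should benefit from the same optimizations.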

Two different strategies were implemented to identify the causes limiting the performance of an application. One is based on analyzing the function call trace and the other on hardware counters. The first identifies reasons why a certain frame is not meeting its performance target that relate to the use of the library, such as too many blocking calls or too much non-GL work. The second analyzes hardware counters for a lower-level view, and offers two levels of detail. First, at a coarse grain, the app is identified as CPU, GPU or temperature limited. Then, based on the output of that analysis, a more detailed cause is looked for. Examples include failing to use multi-threading adequately, too many frame buffer stalls or too many instructions per pixel.
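To give a feel for the coarse-grain step, here is a minimal Python sketch. The counter names (`cpu_load`, `gpu_load`, `temperature_c`), the thresholds and the decision order are all assumptions made for illustration; the real engine works on whatever counters the target device exposes.

```python
def classify_limit(samples, busy_threshold=0.9, max_temp_c=80.0):
    """Label a capture as CPU, GPU or temperature limited.

    samples is a list of dicts holding hypothetical counter values:
    cpu_load and gpu_load in the range 0..1, temperature_c in Celsius.
    The thresholds are placeholders, not values used by the tool.
    """
    if not samples:
        return "unknown"

    def average(counter):
        return sum(sample[counter] for sample in samples) / len(samples)

    # Check temperature first: a hot device may throttle both CPU and GPU.
    if average("temperature_c") >= max_temp_c:
        return "temperature limited"

    cpu_load = average("cpu_load")
    gpu_load = average("gpu_load")
    if gpu_load >= busy_threshold and gpu_load >= cpu_load:
        return "GPU limited"
    if cpu_load >= busy_threshold:
        return "CPU limited"
    return "not clearly limited"

# Made-up samples: the GPU is saturated while the CPU is mostly idle.
example_samples = [
    {"cpu_load": 0.35, "gpu_load": 0.95, "temperature_c": 61.0},
    {"cpu_load": 0.40, "gpu_load": 0.97, "temperature_c": 63.0},
]
coarse_result = classify_limit(example_samples)
print(coarse_result)
```

The label returned here would then select which of the more detailed checks is worth running.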

Regarding energy, a model was used to predict the impact of counter samples on energy consumption. Using this model, the engine can determine which counters make the greatest contribution to energy use, so optimizations can be applied in the right places to make an application efficient.
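As an illustration of how counters can be ranked by their energy contribution, the Python sketch below assumes a simple linear model in which each counter has a fixed energy cost per event. The actual model is not described here, so the linear form, the counter names and the coefficients are purely hypothetical.

```python
def rank_energy_contributors(counter_totals, energy_per_event):
    """Rank counters by estimated energy contribution.

    Assumes a linear model: energy = sum(count * cost_per_event).
    Both the model and the coefficients are illustrative only.
    Returns (counter, estimated_energy, share_of_total) tuples,
    largest contributor first.
    """
    contributions = {
        name: count * energy_per_event.get(name, 0.0)
        for name, count in counter_totals.items()
    }
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, energy, energy / total if total else 0.0)
        for name, energy in ranked
    ]

# Made-up counter totals and per-event costs in arbitrary energy units.
example_totals = {"alu_ops": 1000, "mem_reads": 200}
example_costs = {"alu_ops": 0.5, "mem_reads": 4.0}
energy_ranking = rank_energy_contributors(example_totals, example_costs)
for name, energy, share in energy_ranking:
    print(name, energy, round(share, 2))
```

In this made-up case memory reads dominate despite being far less frequent, which is the kind of result that tells the programmer where an optimization is likely to pay off.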

To make optimization easier still, the engine can make suggestions of its own. By analyzing the library call stream, it can detect common patterns that are known to harm performance and recommend ways to avoid them. This feature is available for all the supported libraries and includes suggestions on adequate management of data, setup functions, textures and more. For OpenCL, kernels themselves are also analyzed and issues like incorrect use of vectorization or calculations for constant values are detected.
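One simple example of such a pattern is binding a texture that is already bound. The Python sketch below shows the idea on a made-up trace; the trace layout and the function name `find_redundant_binds` are assumptions, and the check is deliberately simplified (it ignores `glActiveTexture`, so it behaves as if there were a single texture unit).

```python
def find_redundant_binds(trace):
    """Report glBindTexture calls that rebind the current texture.

    trace is a list of (function_name, args) tuples in call order.
    This layout is hypothetical. Simplification: texture units are
    not tracked, so glActiveTexture calls are ignored.
    """
    bound = {}      # target -> texture id currently bound
    findings = []
    for index, (name, args) in enumerate(trace):
        if name != "glBindTexture":
            continue
        target, texture = args
        if bound.get(target) == texture:
            findings.append(
                (index, f"glBindTexture({target}, {texture}) rebinds "
                        "the texture that is already bound")
            )
        bound[target] = texture
    return findings

# Made-up call stream: the call at index 2 repeats the bind at index 0.
example_trace = [
    ("glBindTexture", ("GL_TEXTURE_2D", 1)),
    ("glDrawArrays", ("GL_TRIANGLES", 0, 36)),
    ("glBindTexture", ("GL_TEXTURE_2D", 1)),
    ("glBindTexture", ("GL_TEXTURE_2D", 2)),
]
bind_findings = find_redundant_binds(example_trace)
for index, message in bind_findings:
    print(index, message)
```

Each finding carries the position in the call stream, so a suggestion can point the programmer at the exact call that could be dropped.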

The feedback engine also supports the Mali Offline Compiler and the compilers included as part of PVRTools. These tools represent an extra source of information that enables the detection of important code issues, such as divergence or spilling. A computing power budget calculation is also performed on the shaders, so the programmer can get an idea of how many resources each one requires.

These features could be improved or extended by adding new capabilities, such as tracking the use of resources to determine if library calls could be reordered to avoid unnecessary state changes. Nevertheless, I consider that my work for the last three months has laid a good foundation on which to build a really useful feedback engine.

How do we test the LPGPU2 Tool?

As we continue to develop the LPGPU2 tool suite, it’s important that we verify the collected data and the analysis generated by the feedback engine. How do we do this? We write targeted test apps that stress the system in different ways. Here’s a short video of some of the apps being written to stress GPUs:


What happened at the LPGPU2 Hackathon? Find out here!

In August the LPGPU2 consortium met for a hackathon in Berlin. Codeplay Software Ltd kindly recorded and edited a video with some of the highlights. Find out more about what we achieved at the Hackathon and the LPGPU2 project in general:

Ben Juurlink presents LPGPU2 research results at ScalPerf workshop

Ben Juurlink has been invited to the ScalPerf workshop to present recent research results. The ScalPerf’17 workshop is held in Bertinoro, Italy from September 17 to September 22 and this 15th edition focuses on “Storage and Memory Issues in Computing Systems.” Ben Juurlink will present the work “E²MC: Entropy Encoding Based Memory Compression for GPUs” which has been carried out in the context of the LPGPU2 project and which has been previously presented at the IPDPS conference.

New Starter at Samsung

Luke West is a scientific software developer with a background in applied mathematics spanning Engineering Simulation, Oil and Gas applications and Climate Science. He is rarely seen far from graphics, however, as much of his day to day work has been concerned with the visualisation of various types of scientific data: satellite measurements of sea-surface height for storm surge prediction, for example, or particle track animation methods for simulating oil-slicks. Luke also built a tool for 3D visualisation of millennium-scale climate predictions for investigating the ocean’s impact on global warming.


Luke also has a long-term interest in High Performance Computing, and says he watched with great interest as the emphasis shifted from flops-per-second to flops per Watt. Now he is looking forward to coding closer to the metal than ever before with the LPGPU2 Tools team at Samsung where he will be working on diverse applications to test and challenge the LPGPU profiler to its limits.


Spin Digital participating at the QLED and HDR10+ Summit at IFA Berlin

Spin Digital is participating in the QLED & HDR10+ Summit at the IFA 2017 event, which takes place in Berlin on September 1st.

This event is focused on emerging display technologies including Quantum Dot, HDR and Wide Color Gamut. Organized by Insight Media and sponsored by Samsung, the Summit is a meeting for industry specialists working on the latest video technologies.

Spin Digital’s HEVC media player and encoder include support for High Dynamic Range and Wide Color Gamut, and Spin Digital is working with several partners in the media industry to build complete workflows for next-generation UHD video.

Ad-blockers help reduce power consumption

View story at Medium.com


Spin Digital at IBC 2017

Spin Digital Video Technologies GmbH (Spin Digital) will present a new generation of its advanced HEVC/H.265 software solution for ultra-high definition video at IBC 2017. A demonstration will be presented from September 15-19 at the Amsterdam RAI in Hall 1, Booth 1.F11.

Spin Digital is going to present a new version of its media player, capable of processing very high quality HEVC/H.265 video including 8K at 120 fps. Additional software optimizations make it possible to reach even higher performance (higher bitrate) or to process 8K using smaller, lower-power computing systems.

LPGPU2 Hackathon in Berlin

On the 22nd and 23rd of August, members of the LPGPU2 consortium held their first hackathon in Berlin with the aim of integrating different parts of the LPGPU2 tool suite, discussing project progress and refining plans for the upcoming tasks and deliverables. The event alternated coding sessions with several discussion/planning sessions.

F2F Video Interviews #9: Ignacio Aransay Think Silicon

Ignacio talks low-power GPUs, Think Silicon and his hopes for the project:

