My three-month internship at Samsung focused on the feedback engine of the LPGPU2 Tool (part of work package 6 of the LPGPU2 Project). The goal of the project is to develop a tool that helps programmers write efficient GPU applications (OpenGL, OpenCL and Vulkan are supported). The key to making optimization easier is giving the programmer enough information to reason about the behavior of the app, identify issues and fix them. To that end, the LPGPU2 Tool captures data from the execution of an app on a cell phone and writes it to a database. This data includes a function call trace, the shaders used, hardware counter samples and more. These alone are very helpful, but the feedback engine makes optimization even easier. It extracts information from the raw data held in the database and reports it to the user, so they know where to focus their optimization efforts.
The feedback engine consists of a set of Lua functions that are run by the LPGPU2 Tool. These rely on the database access functions offered by the modified version of CodeXL that the tool is built upon. Feedback analysis results are written back to the database, so they can be shown to the user through the GUI. My work over the last three months has been to implement the Lua functions that analyze the database and produce useful information for optimization. However, the approach I chose for these functions has also had an impact on the tool as a whole. As a result, I have also taken part in some design decisions, such as how the database should be accessed and how feedback results are written back and presented to the user. The following is an overview of the functionality I implemented, which covers several optimization stages, including limitation detection, identification of regions of interest and even specific suggestions on how to improve the code.
First, the engine can help the programmer decide where to optimize. It analyzes the call trace and identifies regions of frames that fall below a user-defined performance threshold. Each region is a set of contiguous frames that present identical library call mixes and are consequently likely to benefit from the same optimizations. For each region, the engine reports information such as the functions (or types of functions) that take the longest, so efforts can be focused on the areas that will produce the greatest benefits.
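The region search described above can be illustrated with a minimal Python sketch. It is not the engine's Lua code. The function name `find_slow_regions`, the tuple layout of `frames` and the choice to treat a "call mix" as the set of function names issued in a frame are all assumptions made for the example.

```python
from itertools import groupby


def find_slow_regions(frames, threshold_ms):
    """Group contiguous slow frames that share the same call mix.

    frames: list of (frame_id, duration_ms, calls) tuples in capture order,
    where calls is an iterable of library function names (assumed layout).
    Returns a list of (first_frame_id, last_frame_id, call_mix) tuples.
    """
    def region_key(frame):
        _, duration_ms, calls = frame
        # Frames that meet the target get key None, so they split regions.
        # Assumption: the call mix is the set of distinct function names.
        return frozenset(calls) if duration_ms > threshold_ms else None

    regions = []
    for mix, group in groupby(frames, key=region_key):
        if mix is None:
            continue
        group = list(group)
        regions.append((group[0][0], group[-1][0], mix))
    return regions


frames = [
    (0, 10.0, ["glClear", "glDrawArrays"]),
    (1, 25.0, ["glClear", "glDrawArrays", "glFinish"]),
    (2, 27.0, ["glClear", "glFinish", "glDrawArrays"]),
    (3, 12.0, ["glClear", "glDrawArrays"]),
    (4, 30.0, ["glClear", "glTexImage2D", "glDrawArrays"]),
]
regions = find_slow_regions(frames, threshold_ms=16.7)
```

With a 16.7 ms threshold, frames 1 and 2 form one region because they are adjacent, slow and use the same functions. Frame 4 forms a region of its own.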
Two different strategies were implemented to identify what limits the performance of an application. One is based on analyzing the function call trace and the other on hardware counters. The first one identifies reasons why a certain frame is not meeting its performance target that are related to the use of the library, such as too many blocking calls or too much non-GL work. For a lower-level analysis, hardware counters are analyzed at two levels of detail. First, at a coarse grain, the app is identified as CPU, GPU or Temperature Limited. Then, taking the output of that analysis into account, the engine looks for a more detailed cause. Some examples of these are failing to adequately use multi-threading, too many frame buffer stalls or too many instructions per pixel.
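As an illustration of the coarse-grained step only, here is a small Python sketch. Everything in it is hypothetical: the function name `classify_limitation`, the use of average CPU and GPU load and temperature as inputs, and both threshold values are assumptions. The actual counters and rules used by the engine are not described here.

```python
def classify_limitation(cpu_load, gpu_load, temperature_c,
                        load_threshold=0.9, throttle_temp_c=80.0):
    """Return a coarse label for what limits the app.

    cpu_load and gpu_load are average utilizations in [0, 1] and
    temperature_c is an average device temperature. The inputs and
    thresholds are illustrative assumptions, not values from the tool.
    """
    # Check temperature first: a throttled device can show low load.
    if temperature_c >= throttle_temp_c:
        return "Temperature limited"
    if gpu_load >= load_threshold and gpu_load >= cpu_load:
        return "GPU limited"
    if cpu_load >= load_threshold:
        return "CPU limited"
    return "No clear limitation"


label = classify_limitation(cpu_load=0.5, gpu_load=0.97, temperature_c=60.0)
```

A second, finer-grained pass could then select which detailed checks to run based on the returned label, which mirrors the two-level structure described above.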
Regarding energy, a model was used to predict the impact of counter samples on energy consumption. With this model, the engine can determine which counters contribute most to energy use, so optimizations can be applied in the right places to make the application energy-efficient.
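The form of the energy model is not given here. The Python sketch below assumes, purely for illustration, a linear model in which each counter has a fixed energy cost per count. The function name `rank_energy_contributions`, the counter names and the weights are all made up for the example.

```python
def rank_energy_contributions(samples, weights):
    """Rank counters by their estimated share of total energy.

    samples: list of dicts mapping counter name -> sampled value.
    weights: dict mapping counter name -> energy per count (assumed
    linear model; counters without a weight contribute nothing).
    Returns (name, energy, share) tuples, largest contribution first.
    """
    totals = {}
    for sample in samples:
        for name, value in sample.items():
            energy = weights.get(name, 0.0) * value
            totals[name] = totals.get(name, 0.0) + energy

    total_energy = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, energy, energy / total_energy if total_energy else 0.0)
        for name, energy in ranked
    ]


weights = {"alu_ops": 2.0, "mem_reads": 5.0}  # hypothetical costs
samples = [
    {"alu_ops": 10, "mem_reads": 2},
    {"alu_ops": 10, "mem_reads": 4},
]
ranking = rank_energy_contributions(samples, weights)
```

In this made-up data set `alu_ops` accounts for 40 of the 70 energy units and so ranks first, even though `mem_reads` is more expensive per count.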
To go one step further, the engine can also make suggestions of its own. By analyzing the library call stream, it can detect common patterns that are known to harm performance and recommend ways to avoid them. This feature is available for all the supported libraries and includes suggestions on adequate management of data, setup functions, textures and more. For OpenCL, the kernels themselves are also analyzed, and issues such as incorrect use of vectorization or repeated calculation of constant values are detected.
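One kind of call-stream pattern, chosen here only as an example, is a bind call that sets state to the value it already has. The Python sketch below detects it. The function name `find_redundant_binds`, the `(name, args)` trace layout and the `SLOT_ARGS` table are assumptions. It also ignores `glActiveTexture` and vertex array objects, which a real detector would have to track.

```python
# Number of leading arguments that identify the state slot being set.
# Hypothetical table covering three OpenGL calls for illustration.
SLOT_ARGS = {"glUseProgram": 0, "glBindBuffer": 1, "glBindTexture": 1}


def find_redundant_binds(calls):
    """Return indices of calls that re-set state to its current value.

    calls: list of (function_name, args_tuple) in issue order
    (assumed trace layout).
    """
    current = {}  # (function name, slot arguments) -> last value set
    redundant = []
    for index, (name, args) in enumerate(calls):
        if name not in SLOT_ARGS:
            continue
        split = SLOT_ARGS[name]
        slot = (name, tuple(args[:split]))
        value = tuple(args[split:])
        if current.get(slot) == value:
            redundant.append(index)
        current[slot] = value
    return redundant


calls = [
    ("glUseProgram", (3,)),
    ("glBindTexture", ("GL_TEXTURE_2D", 7)),
    ("glDrawArrays", ("GL_TRIANGLES", 0, 36)),
    ("glUseProgram", (3,)),
    ("glBindTexture", ("GL_TEXTURE_2D", 8)),
    ("glBindTexture", ("GL_TEXTURE_2D", 8)),
]
redundant = find_redundant_binds(calls)
```

Here the calls at indices 3 and 5 are flagged, since program 3 and texture 8 were already bound when they were issued. A suggestion to the user could then point at those calls in the trace.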
The feedback engine also supports the Mali Offline Compiler and the compilers included as part of PVRTools. These tools represent an extra source of information that enables the detection of important code issues, such as divergence or spilling. A computing power budget calculation is also performed on the shaders, so the programmer can get an idea of how many resources each one requires.
These features could be improved or extended by adding new capabilities, such as tracking the use of resources to determine if library calls could be reordered to avoid unnecessary state changes. Nevertheless, I consider that my work for the last three months has laid a good foundation on which to build a really useful feedback engine.