LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2013)
In conjunction with the HiPEAC’13 Conference
Tuesday morning, January 22, 2013, Berlin, Germany
Room location: Turmalin
The recent success of advanced mobile platforms such as iOS, Android and Windows Phone coincides with the rising challenge of ensuring long battery life, and accompanies a broader trend away from increasing processor clock speeds in favour of increasing parallelism. The strong interest of the high performance computing (HPC) community in this area, as witnessed by the recent Green500 List, illustrates the timeliness and ubiquity of topics relating to power-efficient hardware and software design. The LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM), colocated with HiPEAC 2013 in Berlin, intends to foster dialogue and interaction among researchers addressing contemporary issues in low-power GPU and many-core software and hardware design.
Approaches to the design challenges of power-efficient GPU and many-core computing will be addressed and may include topics such as:
- Heterogeneous Many-core Architectures including Mobile and Embedded Platforms
- GPU Programming Models, APIs, Languages, Tools and Compilers
- Low-Power Application Case Studies and Performance Evaluations
- “Green” High Performance Computing
The half-day PEGPUM 2013 workshop is held under the auspices of the LPGPU FP7 project. Further information on the LPGPU project may be found at lpgpu.org.
Paul Keir (Codeplay Software Ltd.) and Mauricio Alvarez Mesa (TU Berlin) “Welcome”
11:00-11:30 Coffee Break
11:30-11:55 Andreas Olofsson (Adapteva) “An in-depth analysis of the Epiphany-IV 28nm 64-core coprocessor”
11:55-12:15 Stefanos Kaxiras (Uppsala University) “Practically Costless Coherence for GPUs and Manycore Accelerators”
12:15-12:40 Theo Drane (Imagination Technologies) “Architectural Numeration”
12:40-13:00 James Pallister (Embecosm/University of Bristol) “The Impact of Different Compiler Options on Energy Consumption”
Speakers, Titles and Abstracts
Speaker: Simon McIntosh-Smith, Head of Microelectronics Group and University of Bristol Business Fellow, University of Bristol
Abstract: Heterogeneous architectures combining multiple different kinds of programmable cores are rapidly becoming mainstream. They are already ubiquitous in mobile devices, are spreading quickly to laptops and desktops, and are expected to penetrate high performance computing in the near future. Initial architectures were “Frankensteins” that simply integrated previously discrete CPUs and GPUs; now, system designers are revisiting and redesigning architectures from the ground up to take heterogeneity into account. In this talk we will introduce some of the most important developments, including the new Heterogeneous System Architecture (HSA) standard and OpenCL’s Standard Portable Intermediate Representation (SPIR).
Speaker: Sohan Lal, AES group, Technische Universität Berlin.
Abstract: While GPU architectures offer general-purpose (GPGPU) compute performance an order of magnitude higher than that of conventional CPUs, they have also been rapidly approaching the “power wall”. The design space of GPGPU micro-architecture therefore needs to take power into account. GPU researchers have relied on cycle-accurate simulators to estimate performance during design cycles, but there are no simulation tools that estimate power as well. In this talk I will present a framework that we are developing at TU Berlin for estimating the power consumption of GPUs running general-purpose workloads. Our methodology combines analytical and empirical power models for regular and irregular hardware components. Initial results show that our framework can be used for microarchitectural power and performance estimation with an acceptable error margin.
Speaker: Prof. Stefanos Kaxiras, Uppsala University.
Abstract: Much of the complexity and overhead (directory, state bits, invalidations, broadcasts) of a typical coherence implementation stems from the effort to make it “invisible” even to the strongest memory consistency model. In this talk, I argue that, given only a data-race-free guarantee from software, a much simpler manycore coherence scheme, with neither a directory nor broadcasts, can outperform a typical protocol without incurring its complexity and overhead. This leads to a new result: virtually costless coherence that outperforms a MESI protocol in both performance and power, with important implications for power, area and performance, especially for GPUs and manycore accelerators.
Speaker: Theo Drane, Principal Design Engineer, Imagination Technologies.
Abstract: Datapath design poses the challenge of translating algorithm, accuracy and performance requirements into hardware. Graphics chips use dozens of different number formats, a host of precisions and a variety of operator accuracies. Architects, RTL designers and verification engineers must understand the complex interplay between these choices and the power, area, frequency, latency, throughput and numerical accuracy of the system. This talk will show what is at stake when considering numeration, demonstrating that seemingly minor changes to accuracy specifications can yield significantly more power-efficient hardware, and will consider some of the general techniques used to meet these challenges.
Speaker: James Pallister, Embecosm/University of Bristol
Abstract: This talk describes an extensive study into how compiler optimization affects the energy usage of benchmarks on different platforms. We use a fractional factorial design to explore the effect on energy consumption of 87 optimizations that GCC performs when compiling 10 benchmarks for five different embedded platforms. Hardware power measurements are taken on each platform to ensure that all architectural effects on energy are captured and that no information is lost due to inaccurate or incomplete models.
We find that in the majority of cases execution time and energy consumption are highly correlated, but the effect of a particular optimization is non-trivial to predict because of its interactions with other optimizations. No single optimization is universally beneficial for run time or energy consumption, as the structure of the benchmark heavily influences the optimization’s effectiveness.
While this research focussed on GCC and embedded systems, in this presentation I shall also discuss the wider applicability of these results. I shall consider how they relate to other compilers such as LLVM, whether they carry over to other low-power platforms such as mobile GPUs, and how a similar approach could benefit programming models such as OpenCL and CUDA.
Organisers and their affiliations
Ben Juurlink and Mauricio Alvarez Mesa, TU Berlin
Simon McIntosh-Smith, University of Bristol
Stefanos Kaxiras, Uppsala University
Paul Keir, Codeplay Software Ltd.
Codeplay Software Ltd.,
45 York Place,
T +44 (0)131 466 0503
F +44 (0)131 557 6600
E paul AT codeplay DOT com