Category Archives: Hardware

LPGPU2 YouTube channel is live!

Everyone check out the LPGPU2 YouTube channel at:

We are now live and featuring a video of the power simulator developed by TUB in the LPGPU project:

A power measurement testbed designed by TUB to accurately measure the power consumption of discrete GPUs under the EU-funded LPGPU project. The testbed was used to validate the GPU power simulator, which was also developed under the LPGPU project.

More content coming soon!

ARM Acquires Geomerics


ARM today acquired Geomerics, one of the LPGPU consortium members and developer of Enlighten. Geomerics and ARM have been collaborating on efficient mobile rendering technologies as part of Task 3.6, and presented their work at SIGGRAPH earlier this year. The acquisition expands ARM’s position at the forefront of the visual computing and graphics industries. Additionally, the agreement enables Geomerics to build on their existing partnerships as well as accelerate their development in mobile.

Read the full press release here, and the FAQ here.

TU Berlin Paper to appear at MTAGS13 workshop, co-located with SC 2013

The paper “FPGA-Based Prototype of Nexus++ Task Manager”, by Tamer Dallou, Ahmed Elhossini and Ben Juurlink, has been accepted to appear at the 6th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers, which is co-located with Supercomputing/SC 2013 and takes place on November 17th, 2013, in Denver, Colorado, USA.

The Nexus++ task manager is designed for task-based programming models. Furthermore, it will be ported to GPGPUSim as an extension to add dependency-awareness to GPUs, at block-level granularity.

Abstract: StarSs is one of several programming models that try to relieve parallel programming. In StarSs, the programmer has to identify pieces of code that can be executed as tasks, as well as their inputs and outputs. Thereafter, the runtime system (RTS) determines the dependencies between tasks and schedules ready tasks onto worker cores. Previous work has shown, however, that the StarSs RTS may constitute a bottleneck that limits the scalability of the system and proposed a hardware task management system called Nexus++ to eliminate this bottleneck. The first prototype of Nexus++ was implemented in SystemC. Its architecture also had a nondeterministic multi-cycle search algorithm in its critical path, potentially limiting its scalability. In this paper, we improved the architecture of Nexus++ and employed a multi-way set-associative cache-like data structures to optimize its search algorithm and increase task throughput. We also modeled the new architecture in VHDL and targeted a Virtex-5 FPGA from Xilinx. Experimental results show that the new architecture is very resource-efficient utilizing only 19% of the target FPGA. It also shows that Nexus++ achieves a speedup of up to 81x using some synthetic benchmarks modeled after H.264 decoding. Hence, Nexus++ significantly enhances the scalability of applications parallelized using StarSs.
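
For readers unfamiliar with task-based models, the sketch below illustrates the general idea the abstract describes: tasks declare their inputs and outputs, and a runtime derives the dependencies and releases tasks once their predecessors have finished. It is a toy software model only, not the Nexus++ hardware design or the StarSs runtime; the class name `TaskGraph`, its methods, and the task and data names are all hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


class TaskGraph:
    """Toy dependency tracker (hypothetical; not the StarSs RTS or Nexus++)."""

    def __init__(self):
        self.last_writer = {}               # data name -> task that last wrote it
        self.readers = defaultdict(list)    # data name -> readers since last write
        self.pending = {}                   # task -> count of unfinished predecessors
        self.successors = defaultdict(set)  # task -> tasks waiting on it
        self.done = set()                   # finished tasks
        self.ready = deque()                # tasks that may be scheduled now

    def submit(self, task_id, inputs=(), outputs=()):
        deps = set()
        for name in inputs:                 # read-after-write dependency
            if name in self.last_writer:
                deps.add(self.last_writer[name])
        for name in outputs:
            if name in self.last_writer:    # write-after-write dependency
                deps.add(self.last_writer[name])
            deps.update(self.readers[name])  # write-after-read dependency
        # Ignore predecessors that already finished, and self-references.
        deps = {d for d in deps if d not in self.done and d != task_id}

        self.pending[task_id] = len(deps)
        for d in deps:
            self.successors[d].add(task_id)

        # Record this task as the newest writer / reader of its data.
        for name in outputs:
            self.last_writer[name] = task_id
            self.readers[name] = []
        for name in inputs:
            if name not in outputs:
                self.readers[name].append(task_id)

        if not deps:
            self.ready.append(task_id)

    def finish(self, task_id):
        """Mark a task finished and release successors with no remaining deps."""
        self.done.add(task_id)
        for succ in sorted(self.successors.pop(task_id, ())):
            self.pending[succ] -= 1
            if self.pending[succ] == 0:
                self.ready.append(succ)


if __name__ == "__main__":
    g = TaskGraph()
    g.submit("t1", outputs=("a",))
    g.submit("t2", inputs=("a",), outputs=("b",))
    g.submit("t3", inputs=("a",), outputs=("c",))
    g.submit("t4", inputs=("b", "c"), outputs=("a",))
    while g.ready:
        task = g.ready.popleft()
        print("running", task)
        g.finish(task)
```

In this model every `submit` call looks up each data name in a table, which is the kind of bookkeeping that can become a bottleneck in software as the number of worker cores grows, and which the paper moves into hardware.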

