Technical Program
- 9:00–9:15 Opening Remarks (Jeremy Singer, Tim Harris, Milind Kulkarni)
- 9:15–10:20 Keynote (Chair: Milind Kulkarni)
Memory Hierarchy Visibility in Parallel Programming Languages
The choice of which levels in a memory hierarchy are exposed within a programming language or API can be critical. Expose too many, and you risk programmability and performance portability.
Heterogeneous computing and GPGPU aim to repurpose the data-parallel capability of graphics and commodity hardware for general calculations. GPGPU APIs, which now include OpenCL SYCL, Apple's Metal, and Qualcomm's MARE, must all decide on a suitable abstraction for hardware memory levels. Established GPGPU APIs such as CUDA, C++ AMP, and OpenCL offer language support for four levels of volatile memory. However, while GPUs are now essentially ubiquitous, the diminished role of discrete graphics cards reinvigorates questions regarding memory abstraction.
The multicore revolutionaries have now ceded mobile computing to CPU-GPU systems-on-chip, firmly established in mainstream options such as the Qualcomm Snapdragon, Samsung Exynos, and the AMD APU series. Meanwhile, the HSA Foundation builds upon a bedrock of uniform memory access; the Android GPGPU API, RenderScript, eschews explicit memory address spaces; and CUDA now offers "unified" memory. Can caché once again mean hidden?
- 10:20–10:50 Coffee break
- 10:50–12:05 Session 1: Hardware, memory technologies and optimization (Chair: Hans Boehm)
- 12:05–13:35 Lunch
- 13:35–14:50 Session 2: Locality and memory allocation (Chair: Dave Grove)
- 14:50–15:20 Coffee break
- 15:20–16:10 Session 3: Semantics (Chair: Richard Jones)
- 16:15–16:50 Session 4: Poster lightning session (Chair: Tim Harris)
- Cache-Conscious Memory Management
- DataProf: Exposing Data Movements Across The Memory Hierarchy
- Field Pinning Garbage Collector
- High-level Portable Programming Language for Optimized Memory Use of Network Processors
- Nesoi: Static checking of transactional coverage in parallel programs
- Optimal Thread-to-Core Mapping for Pipeline Programs
- Precise Memory Allocator for Compressed Swap Cache in Mobile Devices
- 16:50–17:40 Poster session and discussions