Course: Node-Level Performance Engineering

LRZ aktuell publish at lrz.de
Thu Oct 18 11:04:08 CEST 2012


 +----------------------------------------------------------------------+
 |    Date:    |Thursday Dec 6, 2012 10:00 - 18:00                      |
 |             |Friday Dec 7, 2012 09:00 - 17:00                        |
 |-------------+--------------------------------------------------------|
 |             |LRZ Building, University campus Garching, near Munich   |
 |  Location:  |Boltzmannstr. 1                                         |
 |             |Lecture hall H.E.009                                    |
 |-------------+--------------------------------------------------------|
 |             |This course teaches performance engineering approaches  |
 |             |on the compute node level. "Performance Engineering" as |
 |             |we define it is more than employing tools to identify   |
 |             |hotspots and bottlenecks. It is about developing a      |
 |             |thorough understanding of the interactions between      |
 |             |software and hardware. This process must start at the   |
 |             |core, socket, and node level, where the code that does  |
 |             |the actual computational work is executed. Once         |
 |             |the architectural requirements of a code are understood |
 |             |and correlated with performance measurements, the       |
 |             |potential benefit of optimizations can often be         |
 |             |predicted. We introduce a "holistic" node-level         |
 |             |performance engineering strategy, apply it to different |
 |             |algorithms from computational science, and also show how|
 |             |an awareness of the performance features of an          |
 |             |application may lead to notable reductions in power     |
 |             |consumption.                                            |
 |             |                                                        |
 |             |Introduction                                            |
 |             |                                                        |
 |             |  * Intel and AMD x86 architectures                     |
 |             |  * ccNUMA                                              |
 |             |  * Performance modeling & engineering approaches       |
 |             |  * Our approach                                        |
 |             |                                                        |
 |             |Practical performance analysis                          |
 |             |                                                        |
 |             |  * The LIKWID tools                                    |
 |             |  * Typical performance patterns                        |
 |             |                                                        |
 |             |Microbenchmarks and the memory hierarchy                |
 |             |                                                        |
 |             |  * Understanding the memory hierarchy                  |
 |             |      + Data transfer between memory levels             |
 |             |      + Write allocate vs. NT stores                    |
 |             |      + Modeling of cache hierarchies                   |
 |             |      + Contention                                      |
 |             |  * NUMA effects - anisotropy and asymmetry             |
 |             |                                                        |
 |             |Typical node-level software overheads                   |
 |             |                                                        |
 |             |  * Cost of synchronization                             |
 |             |  * Work distribution                                   |
 |  Contents:  |                                                        |
 |             |Example Problem: The 3D Jacobi solver                   |
 |             |                                                        |
 |             |  * Core-level optimizations                            |
 |             |      + Blocking                                        |
 |             |      + Non-temporal stores                             |
 |             |      + SIMD vectorization (SSE, AVX)                   |
 |             |  * Multithreading - contention at different levels of  |
 |             |    the memory hierarchy                                |
 |             |  * Temporal blocking                                   |
 |             |                                                        |
 |             |Example Problem: The Lattice-Boltzmann Method (LBM)     |
 |             |                                                        |
 |             |  * Introduction                                        |
 |             |  * Roofline model                                      |
 |             |  * Data layout                                         |
 |             |  * Non-temporal stores                                 |
 |             |  * Model for in-cache data & multicore scaling         |
 |             |  * Sparse representation and options for propagation   |
 |             |                                                        |
 |             |Example Problem: Sparse Matrix-Vector Multiplication    |
 |             |                                                        |
 |             |  * Data layouts                                        |
 |             |  * Performance model - CPU vs. GPU                     |
 |             |  * Bandwidth reduction                                 |
 |             |                                                        |
 |             |Example Problem: A backprojection algorithm for CT      |
 |             |reconstruction                                          |
 |             |                                                        |
 |             |  * The algorithm                                       |
 |             |  * Naive analysis                                      |
 |             |  * Detailed analysis and performance model             |
 |             |  * Optimizations                                       |
 |             |                                                        |
 |             |Energy & Parallel Scalability                           |
 |             |                                                        |
 |             |  * Energy consumption of modern processors             |
 |             |  * The energy-to-solution metric                       |
 |             |  * Performance engineering = Power engineering and     |
 |             |    energy efficiency                                   |
 |             |  * Case studies                                        |
 |             |                                                        |
 |             |Between modules, there is time for questions and        |
 |             |answers.                                                |
 |-------------+--------------------------------------------------------|
 |Prerequisites|Participants must have basic knowledge of programming   |
 |             |in Fortran or C.                                        |
 |-------------+--------------------------------------------------------|
 |  Language:  |English                                                 |
 |-------------+--------------------------------------------------------|
 |  Teachers:  |Prof. Gerhard Wellein/RRZE, Dr. Georg Hager/RRZE et al. |
 |-------------+--------------------------------------------------------|
 |Registration:|Please register via the LRZ registration form:          |
 |             |http://www.lrz.de/services/schulung/kursanmeldung       |
 |             |(Please choose course HNPF1W12)                         |
 +----------------------------------------------------------------------+


 This information is also available on our web server
 http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali4442/

 Matthias Brehm


