Here’s a flyer for the class.
Instructors:
- Georg Stadler, Warren Weaver Hall Office #1111
- Dhairya Malhotra, Warren Weaver Hall Office #1008
Lectures: Mondays 5:10-7:00pm, class starts on January 28
Location: Warren Weaver Hall #1302
Organization: We use Slack for organization, communication and cooperation. Please email if you want to be added.
Lecture Material
Lec. no. | Date | Topics | Slides/Material | Code | Homework |
---|---|---|---|---|---|
1 | Jan 28 | Theory: HPC Tour, Top 500 lists, applications, examples; Tools: ssh, module | slides computing@CIMS | code | hw #1 [PDF, TEX], due Feb 11 |
2 | Feb 4 | Theory: Memory hierarchies, computational intensity, programming models, scalability, Amdahl’s law; Tools: valgrind, cachegrind | slides | code, video | |
3 | Feb 11 | Theory: Bandwidth, latency and valgrind examples, single core performance (pipelining, vectorization) | slides | code | hw #2 [PDF, TEX], due March 11 |
- | Feb 18 | No class due to NYU holiday | |||
4 | Feb 25 | Theory: Amdahl’s law, parallel scalability, Distributed memory model, OpenMP, shared memory performance and speedup; Tools: local git | slides | code | |
- | Mar 4 | No class due to NYU closure (fake snow storm day) | |||
5 | Mar 8 | Makeup class, Rm 101, 4:10-6:00pm; video of the class is on NYU Classes (login required): HPC19-Panopto; Theory: More OpenMP, atomic operations; Tools: remote git, git merge, Make | slides | code | |
6 | Mar 11 | Theory: More OpenMP, NUMA, review of vectorization + lec3-demo, Libraries: BLAS, LAPACK, FFTW | slides | code | hw #3 [PDF, TEX], due April 1 |
7 | Mar 25 | Theory: computing on GPGPUs, GPU architecture, thread hierarchy, memory spaces, global memory management, launching kernels; Tools: nvcc compiler | slides | code | |
8 | Apr 1 | Theory: computing on GPGPUs, shared memory, thread synchronization; Tools: job schedulers (SLURM), final project examples | slides | code | hw #4 [PDF, TEX], due April 15 |
9 | Apr 8 | Theory: Final project, GPU performance, occupancy, streams; Examples: bitonic sort, scan, image blurring, cuBLAS | slides | code | |
10 | Apr 15 | Theory: Sources of parallelism, distributed memory computing, partitioning I, MPI blocking and non-blocking Send and Recv | slides | code | hw #5 [PDF, TEX], due April 29 |
11 | Apr 22 | Theory: Collective MPI calls (reduce, gather, etc.), more distributed memory examples; Tools: more SLURM | slides | code | |
12 | Apr 29 | Theory: MPI algorithms: parallel Jacobi (blocking and non-blocking), hypercube algorithms; problem partitioning and distribution | slides | code | hw #6 [PDF, TEX], due May 13 |
13 | May 6 | Theory: Space-filling curves and Morton IDs, multigrid; Tools: ParaView | slides | code | |
14 | May 13 | Theory: N-body problems, tree code, Fast Multipole Method | slides | code | |