Spring 2019: Advanced Topics in Numerical Analysis: High Performance Computing

Cross-listed as MATH-GA.2012-001 and CSCI-GA 2945.001

Here’s a flyer for the class.

Instructors:

Lectures: Mondays 5:10-7:00pm, class starts on January 28

Location: Warren Weaver Hall #1302

Organization: We use Slack for organization, communication and cooperation. Please email if you want to be added.

Lecture Material

| Lec. no. | Date | Topics | Slides/Material | Code | Homework |
|---|---|---|---|---|---|
| 1 | Jan 28 | Theory: HPC tour, Top 500 lists, applications, examples; Tools: ssh, module | slides, computing@CIMS | code | hw #1 [PDF, TEX], due Feb 11 |
| 2 | Feb 4 | Theory: Memory hierarchies, computational intensity, programming models, scalability, Amdahl’s law; Tools: valgrind, cachegrind | slides | code, video | |
| 3 | Feb 11 | Theory: Bandwidth, latency and valgrind examples, single-core performance (pipelining, vectorization) | slides | code | hw #2 [PDF, TEX], due March 11 |
| - | Feb 18 | No class due to NYU holiday | | | |
| 4 | Feb 25 | Theory: Amdahl’s law, parallel scalability, distributed memory model, OpenMP, shared memory performance and speedup; Tools: local git | slides | code | |
| - | Mar 4 | No class due to NYU closure (fake snow storm day) | | | |
| 5 | Mar 8 | Makeup class, Rm 101, 4:10-6:00pm; video of the class is on NYU Classes (login required): HPC19-Panopto. Theory: More OpenMP, atomic operations; Tools: remote git, git merge, Make | slides | code | |
| 6 | Mar 11 | Theory: More OpenMP, NUMA, review of vectorization + lec3-demo; Libraries: BLAS, LAPACK, FFTW | slides | code | hw #3 [PDF, TEX], due April 1 |
| 7 | Mar 25 | Theory: Computing on GPGPUs, GPU architecture, thread hierarchy, memory spaces, global memory management, launching kernels; Tools: nvcc compiler | slides | code | |
| 8 | Apr 1 | Theory: Computing on GPGPUs, shared memory, thread synchronization; Tools: job schedulers (SLURM), final project examples | slides | code | hw #4 [PDF, TEX], due April 15 |
| 9 | Apr 8 | Theory: Final project, GPU performance, occupancy, streams; Examples: bitonic sort, scan, image blurring, cuBLAS | slides | code | |
| 10 | Apr 15 | Theory: Sources of parallelism, distributed memory computing, partitioning I, MPI blocking and non-blocking Send and Recv | slides | code | hw #5 [PDF, TEX], due April 29 |
| 11 | Apr 22 | Theory: Collective MPI calls (reduce, gather, etc.), more distributed memory examples; Tools: more SLURM | slides | code | |
| 12 | Apr 29 | Theory: MPI algorithms: parallel Jacobi (blocking and non-blocking), hypercube algorithms; problem partitioning and distribution | slides | code | hw #6 [PDF, TEX], due May 13 |
| 13 | May 6 | Theory: Space-filling curves and Morton IDs, multigrid; Tools: ParaView | slides | code | |
| 14 | May 13 | Theory: N-body problems, tree code, Fast Multipole Method | slides | code | |

Other