CSC447
Parallel Programming for Multicore and Cluster Systems
Course Description
This course provides an introduction to parallel programming with a focus on multicore architectures and cluster programming techniques. Topics include relevant architectural trends and aspects of multicores, writing multicore programs and extracting data parallelism using vectors and SIMD, thread-level parallelism, task-based parallelism, efficient synchronization, program profiling, and performance tuning. Message-passing cluster-based parallel computing is also introduced. The course includes several programming assignments that give students first-hand experience with programming, experimentally analyzing, and tuning parallel software. Prerequisites: CSC310 Algorithms and Data Structures; CSC326 Operating Systems.
Course Learning Outcomes
CLO.1 Students should understand the challenges of, as well as the motivations for, parallel programming.
CLO.2 Students shall demonstrate an ability to analyze the efficiency of a given parallel algorithm.
CLO.3 Students shall demonstrate an ability to design, analyze, and implement programming applications on multicore and manycore systems.
Important Dates
January 20, 2020: Spring classes begin
February 20, 2020: Midterm Examination
February 21, 2020: Last day for early withdrawal (WI)
March 30, 2020: Last day for withdrawal from courses (WP/WF)
May 24, 2020: Spring classes end
Instructor
Professor Haidar M. Harmanani
haidar@lau.edu.lb • http://vlsi.byblos.lau.edu.lb • http://harmanani.github.io
Office Hours:
Block A • Room 810
Tuesday, Thursday • 3:00pm – 4:30pm • 8:00pm – 9:30pm or by appointment
Lectures
Lecture 01: Administrivia and Introduction
Lecture 02: Why Parallel Programming?
Lecture 03: Parallel Architectures
Lecture 04: Performance Analysis
Lecture 05: Designing Parallel Programs
Lecture 06: Shared-Memory Programming: Processes, Threads, Data Races, and False Sharing
Lecture 07: Shared Memory Programming Using POSIX Threads
Lecture 08: Shared Memory Programming Using POSIX Threads: Condition Variables
Lecture 09: Shared-Memory Parallel Programming Using OpenMP
Lecture 10: Shared-Memory Parallel Programming Using OpenMP (Continued)
Lecture 11: Shared-Memory Parallel Programming Using OpenMP (Continued)
Lecture 12: Introduction to OpenACC
Lecture 13: OpenACC Directives
Lecture 14: OpenACC Data Management
Lecture 15: OpenACC Loop Optimization
Lecture 16: Introduction to CUDA C
Lecture 17: Portability and Scalability in Heterogeneous Parallel Computing
Lecture 18: Threads and Kernel Functions
Lecture 19: CUDA Parallelism Model
Lecture 20: CUDA Parallelism Model: Examples
Lecture 20: Memory and Data Locality
Lecture 21: Tiled Matrix Multiplication
Lecture 22: Introduction to Neural Networks
Lecture 23: Introduction to Deep Learning
Lecture 24: Convolutional Neural Networks (if time permits)
Lecture 25: Recurrent Neural Network Basics (if time permits)
Readings
Inside the Kepler Architecture
Assignments
Assignment 1: Introduction to Pthreads Programming
Assignment 2: Parallel Computation using OpenMP
Assignment 3: Parallel Computation using OpenACC
Assignment 4: Advanced CUDA Programming
Labs
Project
TBA
Exams
All students are expected to take exams during the scheduled time slots. With the permission of the instructor, you may be allowed to take an exam at an alternate time. However, you must request this rescheduling at least 2 weeks prior to the exam date. Exceptions will naturally be made for sudden problems such as serious illness or injury. Since the exam schedule is published at the beginning of the semester, scheduling conflicts (e.g., job interviews or GREs) are not legitimate reasons to miss an exam.
Midterm Exam
The midterm exam is scheduled for February 20, 2020, as listed under Important Dates. The midterm exam will be a closed book exam. In principle, all topics discussed in class (whether in the lecture notes or not) and in the assigned readings are a legitimate source for exam questions.
Spring 2011 Midterm Examination.
Final Exam
The final will be comprehensive, with roughly one third of the exam devoted to material covered prior to the midterm. The exam will be on May 15, 2015 from 8:00 am to 11:00 am.
Grades
Midterm Grades
Final Grades
Course Grades
Resources
Programming Tutorials
Self-Paced Lab1
Self-Paced Lab2
Self-Paced Lab3
POSIX Threads Programming
pthreads Functions Guide
OpenMP
JCuda
CUDA Tutorial
CUDA Zone
mpiJava
Intel C++ Compiler: Register as a student and then check the following link
MPI Tutorial - Part I
MPI Tutorial - Part II
MPI Tutorial - Part III
Programming References
OpenGL MPI Implementation of the Mandelbrot Set
MPI Ping Pong
MPI Matrix Multiplication
Readings