CUDA 1 - Basics of GPU, CUDA, Memory Model
Udacity course: Intro to Parallel Programming
- Basics of GPU, CUDA, Memory Model
- Parallel Algorithms (Reduce, Scan, Histogram, Sort)
- Optimizing Parallel GPU Programs
- Others (Libraries, OpenACC, Dynamic Parallelism)
1. GPU Architecture
Hardware -> SM: Streaming Multiprocessor, a highly threaded multi-core stream processor
Software -> Block: a group of threads that can cooperate on a task
One SM -> multiple blocks; threads in different blocks should not cooperate (even when they run on the same SM)
GPU Device Query
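A device query reports the hardware described above, such as how many SMs the GPU has and how many threads a block may contain. A minimal sketch using the CUDA runtime's cudaGetDeviceCount and cudaGetDeviceProperties; the helper smCount is a hypothetical name added only so the result can be checked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of SMs on device 0, or -1 if no CUDA device is usable.
int smCount() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return -1;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        puts("no CUDA device");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:      %d.%d\n", prop.major, prop.minor);
        printf("  SMs:                     %d\n", prop.multiProcessorCount);
        printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
        printf("  Max threads per SM:      %d\n", prop.maxThreadsPerMultiProcessor);
        printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Global memory:           %zu bytes\n", prop.totalGlobalMem);
    }
    return 0;
}
```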
2. Three Ways to Accelerate Applications
Libraries, OpenACC Directives, Programming Languages
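As an illustration of the library route, here is a sketch using Thrust (shipped with the CUDA toolkit): a parallel sum runs on the GPU without writing any kernel. The function name sumOnDevice is hypothetical.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>

// Hypothetical helper: sums 0 + 1 + ... + (n-1) on the GPU using only Thrust calls.
long long sumOnDevice(int n) {
    thrust::device_vector<long long> v(n);            // storage in GPU global memory
    thrust::sequence(v.begin(), v.end());             // v = 0, 1, ..., n-1
    return thrust::reduce(v.begin(), v.end(), 0LL);   // parallel reduction on the device
}

int main() {
    printf("%lld\n", sumOnDevice(1000));  // prints 499500
    return 0;
}
```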
3. CUDA Kernel
Hello world example
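A minimal hello-world sketch: a __global__ kernel launched with the <<<blocks, threads>>> syntax, where each thread prints its indices and records its global index blockIdx.x * blockDim.x + threadIdx.x. The helper runHello is hypothetical and only exists to verify the launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: runs on the GPU; every thread prints its block/thread index
// and writes its global index into out[].
__global__ void hello(int *out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("Hello from block %d, thread %d\n", (int)blockIdx.x, (int)threadIdx.x);
    out[idx] = idx;
}

// Hypothetical helper: launches 2 blocks x 4 threads and returns true
// if every thread wrote its own global index.
bool runHello() {
    const int blocks = 2, threads = 4, n = blocks * threads;
    int h_out[n] = {0};
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    hello<<<blocks, threads>>>(d_out);                  // <<<grid, block>>> launch
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h_out, d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;  // waits for the kernel
    cudaFree(d_out);
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == i);
    return ok;
}

int main() {
    puts(runHello() ? "ok" : "failed");
    return 0;
}
```

The order of the printed lines is not fixed: blocks and threads run in parallel with no guaranteed ordering.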
4. Problem 1
Convert a color image to grayscale:
solution
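A sketch of one possible approach (not necessarily the course's reference solution): one thread per pixel on a 2D grid, assuming RGBA pixels stored as uchar4 in row-major order and the common BT.601 luma weights 0.299, 0.587, 0.114. The names rgbaToGrey and toGrey are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per pixel: grey = 0.299 R + 0.587 G + 0.114 B (alpha ignored).
__global__ void rgbaToGrey(const uchar4 *rgba, unsigned char *grey, int rows, int cols) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;   // column
    int r = blockIdx.y * blockDim.y + threadIdx.y;   // row
    if (r >= rows || c >= cols) return;              // guard partial blocks at the edges
    uchar4 p = rgba[r * cols + c];
    grey[r * cols + c] = (unsigned char)(0.299f * p.x + 0.587f * p.y + 0.114f * p.z + 0.5f);
}

// Hypothetical host wrapper: converts a rows x cols image; returns false on any CUDA error.
bool toGrey(const uchar4 *h_rgba, unsigned char *h_grey, int rows, int cols) {
    size_t n = (size_t)rows * cols;
    uchar4 *d_rgba = nullptr;
    unsigned char *d_grey = nullptr;
    bool ok = cudaMalloc(&d_rgba, n * sizeof(uchar4)) == cudaSuccess &&
              cudaMalloc(&d_grey, n) == cudaSuccess &&
              cudaMemcpy(d_rgba, h_rgba, n * sizeof(uchar4), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);                                   // 256 threads per block
        dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
        rgbaToGrey<<<grid, block>>>(d_rgba, d_grey, rows, cols);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_grey, d_grey, n, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_rgba);
    cudaFree(d_grey);
    return ok;
}

int main() {
    // 2 x 2 test image: white, black, pure red, pure green.
    uchar4 img[4] = {make_uchar4(255, 255, 255, 255), make_uchar4(0, 0, 0, 255),
                     make_uchar4(255, 0, 0, 255), make_uchar4(0, 255, 0, 255)};
    unsigned char g[4];
    if (!toGrey(img, g, 2, 2)) { puts("CUDA error"); return 1; }
    printf("%d %d %d %d\n", g[0], g[1], g[2], g[3]);  // prints 255 0 76 150
    return 0;
}
```

Rounding with + 0.5f keeps pure white at 255 even if the float weights sum to slightly less than 1.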
5. Parallel Communication Patterns
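stencil: each output element reads input from neighbors at fixed relative positions, so neighboring threads reuse the same input data
transpose: reorder data elements in memory, e.g. converting an array of structures (AOS) into a structure of arrays (SOA)
out[i + j*128] = in[j + i*128] ==> transpose operation: for a 128 x 128 matrix stored row-major, element (row i, column j) of in becomes element (row j, column i) of out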
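Both patterns can be sketched as small kernels, assuming float data: a 3-point averaging stencil in 1D, and a transpose of a 128 x 128 row-major matrix using exactly the index formula above. The kernel names and the check helpers are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 128  // matrix is N x N, matching the 128 in out[i + j*128] = in[j + i*128]

// Stencil: out[i] averages in[i-1], in[i], in[i+1]; adjacent threads read overlapping inputs.
__global__ void stencil3(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 1 || i > n - 2) return;                  // boundary elements are skipped
    out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}

// Transpose: thread (i, j) copies row i, column j of in to row j, column i of out.
__global__ void transpose(const float *in, float *out) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column in `in`
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row in `in`
    if (i < N && j < N) out[i + j * N] = in[j + i * N];
}

// Hypothetical check: on a linear ramp, each interior average equals the middle value.
bool checkStencil() {
    const int n = 8;
    float h_in[n], h_out[n];
    for (int k = 0; k < n; ++k) h_in[k] = 3.0f * k;  // 0, 3, 6, ...
    float *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, sizeof h_in) == cudaSuccess &&
              cudaMalloc(&d_out, sizeof h_out) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, sizeof h_in, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        stencil3<<<1, n>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, sizeof h_out, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in); cudaFree(d_out);
    for (int k = 1; ok && k < n - 1; ++k) ok = (h_out[k] == h_in[k]);
    return ok;
}

// Hypothetical check: transposes in[k] = k and compares against the index formula.
bool checkTranspose() {
    static float h_in[N * N], h_out[N * N];
    for (int k = 0; k < N * N; ++k) h_in[k] = (float)k;
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = N * N * sizeof(float);
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16), grid(N / 16, N / 16);    // N is a multiple of 16
        transpose<<<grid, block>>>(d_in, d_out);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in); cudaFree(d_out);
    for (int i = 0; ok && i < N; ++i)
        for (int j = 0; ok && j < N; ++j)
            ok = (h_out[i + j * N] == h_in[j + i * N]);
    return ok;
}

int main() {
    printf("stencil: %s, transpose: %s\n", checkStencil() ? "ok" : "failed",
           checkTranspose() ? "ok" : "failed");
    return 0;
}
```

This naive transpose reads global memory in coalesced order but writes it strided; tiling through shared memory is the usual optimization.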
6. Memory Model
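Access speed: Local (per-thread variables, usually held in registers) > Shared (per block) >> Global Memory (visible to all threads)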
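A sketch showing where each kind of variable lives, assuming a single block of 128 threads: the pointer arguments refer to global memory, a __shared__ array is shared by the block, and ordinary local variables are private to each thread. The names memoryKinds and checkMemoryKinds are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 128

// g_in and g_out point to global memory (visible to all blocks, slowest).
__global__ void memoryKinds(const float *g_in, float *g_out) {
    __shared__ float s_data[THREADS];    // shared: one copy per block, fast on-chip
    int t = threadIdx.x;                 // local variables: private to one thread,
    float x = g_in[t];                   // normally kept in registers (fastest)
    s_data[t] = 2.0f * x;
    __syncthreads();                     // all writes to s_data finish before any read
    g_out[t] = s_data[THREADS - 1 - t];  // read another thread's value from shared memory
}

// Hypothetical check: with in[k] = k, expects out[t] = 2 * (THREADS - 1 - t).
bool checkMemoryKinds() {
    float h_in[THREADS], h_out[THREADS];
    for (int k = 0; k < THREADS; ++k) h_in[k] = (float)k;
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = THREADS * sizeof(float);
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        memoryKinds<<<1, THREADS>>>(d_in, d_out);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in); cudaFree(d_out);
    for (int t = 0; ok && t < THREADS; ++t) ok = (h_out[t] == 2.0f * (THREADS - 1 - t));
    return ok;
}

int main() {
    puts(checkMemoryKinds() ? "ok" : "failed");
    return 0;
}
```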
7. Synchronize & Mutex
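Synchronization: __syncthreads(), a barrier that every thread in the block must reach before any of them continues
Mutual exclusion: atomicAdd() example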
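Both tools appear together in a block-level sum, sketched here assuming int data and a power-of-two block size: __syncthreads() separates the steps of a shared-memory tree reduction, and atomicAdd() lets one thread per block add its partial sum to a single global total without a race. The names sumKernel and gpuSum1toN are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256  // block size; must be a power of two for the tree reduction

// Each block reduces its slice in shared memory, then thread 0 adds the
// block's partial sum to *total atomically.
__global__ void sumKernel(const int *in, int n, int *total) {
    __shared__ int s[THREADS];
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    s[t] = (i < n) ? in[i] : 0;
    __syncthreads();                               // all loads done before anyone reads s[]
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (t < stride) s[t] += s[t + stride];
        __syncthreads();                           // finish this level before the next one
    }
    if (t == 0) atomicAdd(total, s[0]);            // blocks race on *total, so the add is atomic
}

// Hypothetical host wrapper: returns 1 + 2 + ... + n computed on the GPU, or -1 on error.
int gpuSum1toN(int n) {
    int *h_in = new int[n];
    for (int k = 0; k < n; ++k) h_in[k] = k + 1;
    int *d_in = nullptr, *d_total = nullptr, h_total = -1;
    bool ok = cudaMalloc(&d_in, n * sizeof(int)) == cudaSuccess &&
              cudaMalloc(&d_total, sizeof(int)) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemset(d_total, 0, sizeof(int)) == cudaSuccess;
    if (ok) {
        int blocks = (n + THREADS - 1) / THREADS;
        sumKernel<<<blocks, THREADS>>>(d_in, n, d_total);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in); cudaFree(d_total);
    delete[] h_in;
    return ok ? h_total : -1;
}

int main() {
    printf("%d\n", gpuSum1toN(10000));  // prints 50005000
    return 0;
}
```

Without the atomic, two blocks could read the same old value of *total and one update would be lost.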
8. Problem 2
Image blur:
solution
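A sketch of one way to blur (not necessarily the course's reference solution), assuming a single 8-bit channel and a square filterWidth x filterWidth weight array whose weights sum to 1; a color image would be blurred channel by channel. Neighbors that fall outside the image are clamped to the nearest edge pixel. The names blurChannel and blurHost are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each output pixel is the weighted sum of its filterWidth x filterWidth
// neighborhood; coordinates past the border are clamped to the edge.
__global__ void blurChannel(const unsigned char *in, unsigned char *out,
                            int rows, int cols, const float *filter, int filterWidth) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r >= rows || c >= cols) return;
    int half = filterWidth / 2;
    float sum = 0.0f;
    for (int fr = -half; fr <= half; ++fr) {
        for (int fc = -half; fc <= half; ++fc) {
            int rr = min(max(r + fr, 0), rows - 1);  // clamp row to the image
            int cc = min(max(c + fc, 0), cols - 1);  // clamp column to the image
            sum += filter[(fr + half) * filterWidth + (fc + half)] * in[rr * cols + cc];
        }
    }
    out[r * cols + c] = (unsigned char)(sum + 0.5f);  // round to nearest
}

// Hypothetical host wrapper: copies image and filter to the GPU, blurs, copies back.
bool blurHost(const unsigned char *h_in, unsigned char *h_out, int rows, int cols,
              const float *h_filter, int filterWidth) {
    size_t n = (size_t)rows * cols, fbytes = sizeof(float) * filterWidth * filterWidth;
    unsigned char *d_in = nullptr, *d_out = nullptr;
    float *d_filter = nullptr;
    bool ok = cudaMalloc(&d_in, n) == cudaSuccess && cudaMalloc(&d_out, n) == cudaSuccess &&
              cudaMalloc(&d_filter, fbytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, n, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_filter, h_filter, fbytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((cols + 15) / 16, (rows + 15) / 16);
        blurChannel<<<grid, block>>>(d_in, d_out, rows, cols, d_filter, filterWidth);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, n, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_filter);
    return ok;
}

int main() {
    // 3 x 3 image with one bright pixel, blurred by a 3 x 3 box filter.
    unsigned char img[9] = {0, 0, 0, 0, 90, 0, 0, 0, 0}, out[9];
    float box[9];
    for (float &w : box) w = 1.0f / 9.0f;
    if (!blurHost(img, out, 3, 3, box, 3)) { puts("CUDA error"); return 1; }
    for (int k = 0; k < 9; ++k) printf("%d ", out[k]);  // prints 10 nine times
    printf("\n");
    return 0;
}
```

Every pixel of the 3 x 3 image sees the bright center exactly once in its clamped neighborhood, so all outputs equal 90 / 9 = 10.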