CUDA Vault

00-Dashboard

Interview Trap Questions (Exam Traps)

CUDA Learning Map (MOC)

CUDA Quick Reference

01-Introduction-to-CUDA

GPU Computing Foundations

Execution Model and SIMT

Tile Programming

GPU Memory Hierarchy

The CUDA Platform

Practice - Introduction to CUDA

02-Programming-GPUs

CUDA C++ Kernels and Launch

CUDA C++ Memory Management

CPU/GPU Synchronization and Full Workflow

Error Checking and Function/Variable Specifiers

Intro to CUDA Python

SIMT Basics and Thread Hierarchy

SIMT Device Memory Spaces

SIMT Memory Performance

Atomics, Cooperative Groups, and Occupancy

Tile Kernel Structure and Launch

Tile Load/Store and Control Flow

Tile Operations and Primitives

Tile Atomics and Optimization

Asynchronous Execution: Streams and Events

Stream Callbacks, Ordering, and CUDA Graphs

Unified and System Memory

NVCC: The NVIDIA CUDA Compiler

Chapter 2 Practice - Programming GPUs in CUDA

03-Advanced-CUDA

Advanced Launch and Clusters

Advanced Streams and Dependent Launch

Batched Memory Transfers and Environment Variables

Using PTX and the Hardware Model

Thread Scopes and Scoped Atomics

Asynchronous Barriers and Pipelines

Asynchronous Data Copies and L1/Shared Configuration

The CUDA Driver API

Multi-GPU Programming

A Tour of CUDA Features

Chapter 3 Practice - Advanced CUDA

04-CUDA-Features

Unified Memory: Full Support Deep Dive

Unified Memory: Platform Differences and Performance Hints

CUDA Graphs: Structure and Capture

CUDA Graphs: Updating and Conditional Nodes

CUDA Graphs: Memory Nodes and Device-Side Launch

Stream-Ordered Memory Allocator

Cooperative Groups Deep Dive

Programmatic Dependent Launch (PDL) Deep Dive

Lazy Loading and Error Log Management

Asynchronous Barriers Deep Dive

Pipelines Deep Dive

Asynchronous Data Copies: LDGSTS

Asynchronous Data Copies: TMA

Asynchronous Data Copies: STAS

Work Stealing and Cluster Launch Control

L2 Cache Control

Memory Synchronization Domains

Interprocess Communication (IPC)

Virtual Memory Management

Extended GPU Memory (EGM)

CUDA Dynamic Parallelism

Graphics Interoperability (OpenGL/Direct3D/SLI)

External Resource Interoperability (Vulkan/Direct3D/NVSCI)

Driver Entry Point Access

Chapter 4 Practice - CUDA Features

CUDA Vault

Coming soon...
