Intro of CSE234
#CSE234#
1. Workloads
-
What is a model?
- Parameters
- loss
- optimizer
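To make the three ingredients concrete, here is a minimal sketch assuming a 1-D linear model y = w * x + b, a mean-squared-error loss, and plain gradient descent as the optimizer. The function names and the toy data are illustrative, not from the course.

```python
# Parameters: the tuple (w, b).
def predict(params, x):
    """Forward pass: apply the model's architecture to its parameters."""
    w, b = params
    return w * x + b

# Loss: mean squared error over the dataset.
def mse_loss(params, xs, ys):
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_mse(params, xs, ys):
    """Analytic gradient of the MSE loss with respect to (w, b)."""
    w, b = params
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

# Optimizer: one step of plain gradient descent.
def sgd_step(params, grads, lr=0.05):
    return tuple(p - lr * g for p, g in zip(params, grads))

# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
params = (0.0, 0.0)
for _ in range(2000):
    params = sgd_step(params, grad_mse(params, xs, ys))
print(params, mse_loss(params, xs, ys))  # w close to 2, b close to 1
```

Changing any one of the three pieces (a deeper model, a different loss, Adam instead of SGD) leaves the other two untouched, which is why they are listed separately.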
-
CSE234 focuses on three parts
-
Data
- Images, text, audio, tables, etc.
-
Models
- CNN, RNN, Transformer, MoE, etc.
-
Compute
- CPU, GPU/TPU/LPU, M1/M2/M3/M4, FPGA
-
-
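I think this is a very important idea: focus on what matters most.
Study the important models, and build general-purpose systems around them.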
- We will not be able to build systems that can support all models
- What are the most important workloads that solve 80% of the problems?
- System building is the process to reveal the most important factors
- We will keep asking ourselves: what are the most important X in Y?
-
CNN
-
Top 3 models
- AlexNet
- ResNet
- U-Net
-
Top components
- Conv
- Matmul
- Softmax
- Elementwise operations: ReLU, add, sub, normalization, pooling
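The CNN components above compose into a tiny feature extractor. A minimal pure-Python sketch, assuming stride 1, no padding, a single channel and a hypothetical vertical-edge kernel; as in PyTorch and TensorFlow, the "convolution" here is really cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation, stride 1, single channel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Each output is a dot product between the kernel and an image patch.
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

def relu(fmap):
    """Elementwise nonlinearity."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 3, 0, 1],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 2],
         [2, 1, 0, 1, 0],
         [0, 2, 1, 0, 1]]
edge = [[1, -1],
        [1, -1]]  # hypothetical vertical-edge kernel
features = max_pool2x2(relu(conv2d(image, edge)))
print(features)  # [[0, 2], [2, 1]] (some zeros print as 0.0)
```

Real systems rarely run this loop nest directly; convolutions are commonly lowered to Matmul (e.g. via im2col), which is one reason Matmul shows up in every component list.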
-
-
RNN
-
Top 3 models
- Bidirectional RNNs
- LSTM
- GRU
-
Most Important Components in RNNs
- Matmul
- Elementwise nonlinear: ReLU, Tanh, sigmoid
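An LSTM cell shows that a recurrent model decomposes into exactly these two component types: matrix-vector products plus elementwise sigmoid/tanh. A minimal sketch; the per-gate (W, U, b) parameter layout and the function names are illustrative assumptions.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def add(*vs):
    return [sum(xs) for xs in zip(*vs)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def lstm_step(x, h, c, p):
    """p maps gate name ('i', 'f', 'o', 'g') -> (W, U, b); returns new (h, c)."""
    # Matmul part: one pre-activation per gate.
    pre = {name: add(matvec(W, x), matvec(U, h), b) for name, (W, U, b) in p.items()}
    # Elementwise part: gates and cell update.
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    g = tanh(pre["g"])
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def run_sequence(xs, h, c, p):
    # Inherently sequential: step t needs h and c produced by step t-1.
    for x in xs:
        h, c = lstm_step(x, h, c, p)
    return h, c

# With all-zero weights every gate is sigmoid(0) = 0.5 and g = tanh(0) = 0,
# so each step simply halves the cell state.
zero_gate = ([[0.0], [0.0]], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
params = {name: zero_gate for name in "ifog"}
h, c = lstm_step([1.0], [0.0, 0.0], [2.0, -2.0], params)
print(h, c)
```

The `for` loop in `run_sequence` is the bottleneck the next section is about: no step can start before the previous one finishes.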
-
-
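Attention: enables parallelism
Treat the representation at each position as a query that accesses and aggregates information from a set of values.
In effect, every token is handled the same way: temporal relationships are not encoded through a sequential recurrence, so the results for different "time steps" can be computed at the same time.
- Massively parallelizable
- Transformer: Attention + MLP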
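Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, is the standard formulation of the "query over values" idea above. A minimal pure-Python sketch (single head, no masking, both assumptions): the loop over queries has no dependency between iterations, unlike the RNN loop, so all positions could be computed in parallel.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d = len(K[0])
    out = []
    for q in Q:  # iterations are independent: parallelizable across positions
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]  # Matmul
        weights = softmax(scores)                                              # Softmax
        out.append([sum(w * v[j] for w, v in zip(weights, V))                  # Matmul
                    for j in range(len(V[0]))])
    return out

# Identical keys give uniform weights, so the output is the mean of the values.
print(attention([[3.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 4.0]]))
# [[1.0, 2.0]]
```

In self-attention Q, K and V are all (projections of) the same token sequence, e.g. `attention(X, X, X)`.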
-
Transformer
-
Top 3 models
- BERT
- GPT/LLMs
- DiT
-
Attention components
- Matmul
- Softmax
- Normalization
-
MLP components
- Matmul
-
Other components:
- LayerNorm
- GELU
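The remaining Transformer pieces are small elementwise/reduction ops. A minimal sketch of LayerNorm, exact GELU (x times the standard normal CDF, the activation used in BERT and GPT-2) and a two-layer MLP whose only heavy op is Matmul; the function names, argument layout and eps value are assumptions for illustration.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one token's feature vector, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mlp(x, W1, b1, W2, b2):
    """Matmul -> GELU -> Matmul, applied to a single token vector."""
    hidden = [gelu(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]

y = layer_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(y, gelu(1.0))
```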
-
-
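MoE: "three cobblers with their wits combined surpass Zhuge Liang" (a Chinese proverb: several ordinary experts together can outdo one genius)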
-
Novel Components
- Router
-
What constitutes the router
- Matmul
- Softmax
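A router built from exactly these two ops can be sketched as follows. Assumptions, labeled because MoE designs differ: top-k selection with k=2, gates renormalized over the selected experts, and hypothetical experts that just scale their input. Only the chosen experts are evaluated, which is where MoE saves compute.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def route(x, W_router, k=2):
    """Return [(expert_index, gate_weight)] for the top-k experts."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in W_router]  # Matmul
    probs = softmax(logits)                                           # Softmax
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, W_router, experts, k=2):
    out = [0.0] * len(x)
    for i, g in route(x, W_router, k):
        y = experts[i](x)  # only the selected experts run
        out = [o + g * yj for o, yj in zip(out, y)]
    return out

W_router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # 4 experts
experts = [lambda v, s=s: [s * t for t in v] for s in (2.0, 0.0, -1.0, 3.0)]
x = [2.0, 1.0]
print(route(x, W_router))  # experts 0 and 1, gates e/(e+1) and 1/(e+1)
print(moe_layer(x, W_router, experts))
```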
-
-
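Returning to the three parts Data, Model, and Compute, they correspond respectively to:
- Data
- Math primitives: the model's computation flow is expressed in terms of math primitives
- Hardware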
2. Dataflow graph representation
Goal: use a single programming interface to express as many models as possible by connecting math primitives
- Model and architecture
- Objective function
- Optimizer
- Data
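A dataflow graph is just nodes (primitive ops) wired to their inputs, executed in topological order. A minimal sketch expressing a tiny model plus its objective, loss = (relu(w * x + b) - y)^2, as one graph; the primitive set and the `Node`, `placeholder` and `run_graph` names are hypothetical.

```python
# A tiny, illustrative set of math primitives.
PRIMITIVES = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "relu": lambda a: max(0.0, a),
}

class Node:
    def __init__(self, op, *inputs, name=None):
        self.op, self.inputs, self.name = op, inputs, name

def placeholder(name):
    return Node("input", name=name)

def topo_order(output):
    """Depth-first post-order: every node appears after all of its inputs."""
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for i in n.inputs:
            visit(i)
        order.append(n)
    visit(output)
    return order

def run_graph(output, feeds):
    vals = {}
    for n in topo_order(output):
        if n.op == "input":
            vals[n] = feeds[n.name]
        else:
            vals[n] = PRIMITIVES[n.op](*(vals[i] for i in n.inputs))
    return vals[output]

x, w, b, y = (placeholder(n) for n in "xwby")
pred = Node("relu", Node("add", Node("mul", w, x), b))  # model
err = Node("sub", pred, y)
loss = Node("mul", err, err)                           # objective (squared error)
print(run_graph(loss, {"x": 2.0, "w": 3.0, "b": -1.0, "y": 4.0}))  # 1.0
```

The data enters through the placeholders; an optimizer would be expressed as further nodes (gradients and updates), which is what frameworks generate automatically.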
-
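This illustrates how applications and system design shape each other:
- different application requirements give rise to system architectures suited to them;
- and systems, in turn, define "how computation is expressed".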
-
Two styles of expressing computation
-
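Symbolic, represented by TensorFlow
- Define the complete computation graph first, then execute it (define-and-run)
- Easy to optimize
- Unintuitive to program, hard to debug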
-
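Imperative, represented by PyTorch
- Define-by-run: each operation executes as soon as it is called
- Flexible, easy to program and debug
- Hard to optimize, lower efficiency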
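The "easy to optimize" vs "hard to optimize" trade-off can be seen in a toy. Below, the imperative function computes each line immediately, so no one ever sees the whole program; the symbolic version uses operator overloading to record a graph first, which lets a whole-graph pass delete the useless `* 1.0` and `+ 0.0` before anything runs. `Sym`, `simplify` and the rewrite rules are hypothetical illustrations, not TensorFlow's API.

```python
# Imperative (define-by-run): every line executes immediately on real values.
def f_imperative(x):
    y = x * 1.0
    z = y + 0.0
    return z * z

# Symbolic (define-and-run): operations build graph nodes instead of computing.
class Sym:
    def __init__(self, op, args=(), value=None):
        self.op, self.args, self.value = op, args, value
    def __add__(self, other):
        return Sym("add", (self, lift(other)))
    def __mul__(self, other):
        return Sym("mul", (self, lift(other)))

def lift(v):
    return v if isinstance(v, Sym) else Sym("const", value=v)

def simplify(n):
    """Whole-graph rewrite pass: x * 1 -> x and x + 0 -> x."""
    if n.op in ("const", "var"):
        return n
    a, b = (simplify(arg) for arg in n.args)
    if n.op == "mul" and b.op == "const" and b.value == 1.0:
        return a
    if n.op == "add" and b.op == "const" and b.value == 0.0:
        return a
    return Sym(n.op, (a, b))

def count_ops(root):
    """Number of distinct non-leaf nodes reachable from root."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(n.args)
    return sum(1 for n in seen if n.op not in ("const", "var"))

def evaluate(n, env):
    if n.op == "const":
        return n.value
    if n.op == "var":
        return env[n.value]
    a, b = (evaluate(arg, env) for arg in n.args)
    return a + b if n.op == "add" else a * b

x = Sym("var", value="x")
y = x * 1.0
z = y + 0.0
graph = z * z            # nothing computed yet; only a graph was built
opt = simplify(graph)    # the optimizer sees the whole program at once
print(count_ops(graph), count_ops(opt), evaluate(opt, {"x": 3.0}))  # 3 1 9.0
```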
-
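Still, imperative programming is simply more pleasant than declarative programming,
because it integrates better with ordinary Python code.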
-
-
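Just-in-time compilation (JIT)
- Resolves the tension between "flexible" and "efficient"
- During development: define-by-run, as in ordinary PyTorch
- At inference time: compile the model, e.g. with
@torch.compile()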
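The JIT idea (write imperative code, capture a graph on the first call, reuse it afterwards) can be illustrated with a toy tracer. This is a hypothetical sketch: it is closer in spirit to tracing (like `torch.jit.trace`) than to `torch.compile`, which captures graphs by analyzing Python bytecode (TorchDynamo) and hands them to a compiler backend. `toy_jit`, `Tracer` and the cache key are assumptions; tracing also cannot capture Python control flow that depends on tensor values.

```python
import functools
import operator

OPS = {"add": operator.add, "mul": operator.mul}

class Tracer:
    """Stand-in value: instead of computing, each op is appended to a tape."""
    def __init__(self, tape, slot):
        self.tape, self.slot = tape, slot

    def _record(self, op, other):
        rhs = ("slot", other.slot) if isinstance(other, Tracer) else ("const", other)
        self.tape["code"].append((op, ("slot", self.slot), rhs))
        self.tape["n_slots"] += 1
        return Tracer(self.tape, self.tape["n_slots"] - 1)

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)

def toy_jit(fn):
    cache = {}  # a real JIT guards on shapes/dtypes; this toy keys on arg count

    @functools.wraps(fn)
    def wrapper(*args):
        key = len(args)
        if key not in cache:  # first call: trace fn once to capture its graph
            tape = {"code": [], "n_slots": len(args)}
            out = fn(*(Tracer(tape, i) for i in range(len(args))))
            cache[key] = (tape["code"], out.slot)
            wrapper.traces += 1
        code, out_slot = cache[key]
        slots = list(args)  # later calls replay the captured graph, no re-tracing

        def fetch(ref):
            kind, v = ref
            return slots[v] if kind == "slot" else v

        for op, a, b in code:
            slots.append(OPS[op](fetch(a), fetch(b)))
        return slots[out_slot]

    wrapper.traces = 0
    return wrapper

@toy_jit
def f(x, y):
    return (x * y + 1.0) * x

print(f(2.0, 3.0), f(1.0, 5.0), f.traces)  # 14.0 6.0 1
```

A real JIT would also optimize and compile the captured graph (e.g. fuse elementwise ops) before caching it; the toy only records and replays.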
