1. Table of Contents
- Original implementation of SmoothQuant and why it’s not correct: ✅
- The correct way to implement it, IMO: ✅
2. Original way of getting absMax values
```python
import torch


def register_collect_smoothquant_hook(model, data_loader, num_batch=200):
    model.eval()
    act_scales = {}
    weight_scales = {}

    def forward_hook(module, input, name):
        # Flatten the activation to (-1, C) and track a running per-column absolute max.
        hidden_dim_act = input[0].shape[1]
        tensor_act = input[0].view(-1, hidden_dim_act).abs().detach()
        comming_max_act = torch.max(tensor_act, dim=0)[0].float().cpu()
        if name not in act_scales:
            act_scales[name] = comming_max_act
        else:
            act_scales[name] = torch.max(act_scales[name], comming_max_act)

    # (The per-layer hook registration, which supplies `name`, and the
    #  weight-scale collection are not shown here.)
```
- Input shape: [4, 256, 182, 182]
- hidden_dim_act = 256
- tensor_act: [4*182*182, 256]
- torch.max(tensor_act, dim=0): a (values, indices) pair, each of shape [256]
- torch.max(tensor_act, dim=0)[0]: the per-column max values, shape [256]
- Dividing the input by a scaling factor $s$ computed only from these activation max values $\neq$ SmoothQuant, which builds its per-channel smoothing factor from both the activation and the weight maxima and rescales the weights to compensate (see the sketch below).
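For reference, here is a minimal sketch of what a SmoothQuant-style smoothing factor could look like for a Conv2d layer, computed per input channel from both the activation and the weight absolute maxima. The layer sizes, the migration strength $\alpha = 0.5$, and the clamp epsilon are illustrative assumptions, not values from the original code.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(256, 512, kernel_size=3, padding=1)
x = torch.randn(4, 256, 182, 182)              # NCHW activation, as in the shape trace above

# Per-input-channel absolute max of the activation: reduce over batch and spatial dims
# while keeping the channel dim (dim=1).
act_max = x.abs().amax(dim=(0, 2, 3))          # shape [256]

# Per-input-channel absolute max of the weight (Conv2d weight is [C_out, C_in, kH, kW]).
w_max = conv.weight.abs().amax(dim=(0, 2, 3))  # shape [256]

# SmoothQuant smoothing factor: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).
alpha = 0.5
s = (act_max.pow(alpha) / w_max.pow(1 - alpha)).clamp(min=1e-5)

# The activation is divided by s and the weight's input channels are multiplied by s,
# so the convolution's output is unchanged before any quantization is applied.
x_smoothed = x / s.view(1, -1, 1, 1)
w_smoothed = conv.weight * s.view(1, -1, 1, 1)
```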
3. The correct way to implement it
- A convolution can be decomposed into im2col + sgemm.
- Therefore, the two cases should be handled similarly: once the activation has been rearranged by im2col, the SmoothQuant treatment used for a matmul applies to the resulting sgemm (see the sketch below).
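To make the decomposition concrete, a small sketch (with arbitrary example sizes) using `torch.nn.functional.unfold` as the im2col step, checking that the resulting matrix multiply matches `F.conv2d`:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 256, 28, 28)
w = torch.randn(512, 256, 3, 3)

# im2col: extract 3x3 patches -> [N, C_in * kH * kW, L], L = number of output positions.
cols = F.unfold(x, kernel_size=3, padding=1)    # [4, 2304, 784]

# sgemm: flatten the weight to [C_out, C_in * kH * kW] and batch-multiply.
out = w.view(512, -1) @ cols                    # [4, 512, 784]
out = out.view(4, 512, 28, 28)                  # back to NCHW

# Matches the direct convolution up to float accumulation error.
ref = F.conv2d(x, w, padding=1)
print((out - ref).abs().max())                  # close to zero
```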
4. What’s next?
- Implement im2col + sgemm at the PyTorch level.
- Implement quantization between im2col and sgemm.
- Implement the SmoothQuant operation (migrating the quantization difficulty from activations to weights) before applying quantization (see the sketch below).
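Putting the steps together, a rough sketch of the intended ordering: im2col, then SmoothQuant on the GEMM operands exactly as for a linear layer, then quantization, then the sgemm. The symmetric per-tensor int8 fake-quantization helper `fake_quant_int8`, the $\alpha$ value, and the tensor sizes are illustrative assumptions, not the final design.

```python
import torch
import torch.nn.functional as F

def fake_quant_int8(t):
    # Symmetric per-tensor int8 fake quantization (illustrative placeholder).
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().clamp(-127, 127) * scale

x = torch.randn(4, 256, 28, 28)
w = torch.randn(512, 256, 3, 3)

# 1) im2col: the convolution becomes a GEMM with C_in * kH * kW input features.
cols = F.unfold(x, kernel_size=3, padding=1)   # [4, 2304, 784]
w_flat = w.view(512, -1)                       # [512, 2304]

# 2) SmoothQuant on the GEMM operands, as for a linear layer:
#    per-input-feature smoothing factor from activation and weight absolute maxima.
alpha = 0.5
s = (cols.abs().amax(dim=(0, 2)).pow(alpha) /
     w_flat.abs().amax(dim=0).pow(1 - alpha)).clamp(min=1e-5)
cols_s = cols / s.view(1, -1, 1)
w_s = w_flat * s                               # broadcasts over the 2304 input features

# 3) Quantize both operands, run the sgemm, fold back to NCHW.
out = (fake_quant_int8(w_s) @ fake_quant_int8(cols_s)).view(4, 512, 28, 28)

ref = F.conv2d(x, w, padding=1)
print((out - ref).abs().max())                 # difference reflects the int8 fake-quantization error
```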