1. Table of Contents

  • The original implementation of SmoothQuant and why it is not correct: ✅
  • The correct way to implement it (IMO): ✅

2. Original way of getting absMax values

import torch


def register_collect_smoothquant_hook(model, data_loader, num_batch=200):
    model.eval()
    act_scales = {}
    weight_scales = {}

    def forward_hook(module, input, output, name):
        # `name` is bound via functools.partial when the hook is registered.
        # For a conv input of shape [N, C, H, W], shape[1] is the channel dim C.
        hidden_dim_act = input[0].shape[1]
        # Flatten to [-1, C] and take the per-column absolute maximum.
        tensor_act = input[0].view(-1, hidden_dim_act).abs().detach()
        coming_max_act = torch.max(tensor_act, dim=0)[0].float().cpu()
        # Keep a running element-wise max across all calibration batches.
        if name not in act_scales:
            act_scales[name] = coming_max_act
        else:
            act_scales[name] = torch.max(act_scales[name], coming_max_act)
  • Input shape: [4, 256, 182, 182]
  • hidden_dim_act = 256
  • tensor_act: [4*182*182, 256] (note: for a contiguous NCHW tensor, .view(-1, 256) does not put one channel per column; each row is 256 consecutive spatial elements of a single channel, so the 256 columns mix values from all channels)
  • torch.max(tensor_act, dim=0): a (values, indices) pair, each of shape [256]
  • torch.max(tensor_act, dim=0)[0]: the values tensor, shape [256]
  • Dividing the input by a scaling factor $ s $ computed from these max values is not what SmoothQuant prescribes (see the recap below)
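
For reference, the SmoothQuant paper defines the smoothing scale per input channel $j$ from both activation and weight statistics, with $\alpha$ the migration-strength hyperparameter (typically 0.5):

$$
s_j = \frac{\max\left(|\mathbf{X}_j|\right)^{\alpha}}{\max\left(|\mathbf{W}_j|\right)^{1-\alpha}}, \qquad
\hat{\mathbf{X}} = \mathbf{X}\,\operatorname{diag}(s)^{-1}, \qquad
\hat{\mathbf{W}} = \operatorname{diag}(s)\,\mathbf{W},
$$

so that $\hat{\mathbf{X}}\hat{\mathbf{W}} = \mathbf{X}\mathbf{W}$ while the activation outliers are migrated into the weights. For a convolution, $j$ should index the im2col columns ($C_{in} \cdot k_h \cdot k_w$ of them), not the 256 values the hook above produces.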

3. The correct way of implementation


  • A convolution can be decomposed into im2col followed by an SGEMM (a plain matrix multiplication).
  • Therefore, SmoothQuant for a convolution should mirror the linear-layer case: compute the scales on the im2col matrix and fold them into the flattened weights (see the sketch below).
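
The following is a minimal sketch, not the final implementation, of expressing a 2-D convolution as im2col + sgemm at the PyTorch level with torch.nn.functional.unfold; the helper name conv2d_as_im2col_sgemm and the toy shapes are assumptions for illustration.

import torch
import torch.nn.functional as F

def conv2d_as_im2col_sgemm(x, weight, bias=None, stride=1, padding=0):
    # Hypothetical helper: a 2-D convolution written as im2col (unfold) + matmul.
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    # im2col: [N, C_in*kh*kw, L], one column per sliding-window position.
    cols = F.unfold(x, kernel_size=(kh, kw), stride=stride, padding=padding)
    # sgemm: [C_out, C_in*kh*kw] @ [N, C_in*kh*kw, L] -> [N, C_out, L]
    out = weight.view(c_out, -1) @ cols
    if bias is not None:
        out = out + bias.view(1, -1, 1)
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    return out.view(n, c_out, h_out, w_out)

# Sanity check against the built-in conv (toy shapes, not the [4, 256, 182, 182] example).
x = torch.randn(2, 16, 8, 8)
conv = torch.nn.Conv2d(16, 32, kernel_size=3, padding=1)
assert torch.allclose(conv(x), conv2d_as_im2col_sgemm(x, conv.weight, conv.bias, padding=1), atol=1e-5)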

4. What’s next?

  • Implement im2col + sgemm at the PyTorch level.
  • Implement quantization between the im2col and sgemm steps.
  • Implement the SmoothQuant operation (migrating the quantization difficulty from activations to weights) before applying quantization; a sketch follows this list.
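
The sketch below shows how the last two steps might fit together, assuming the scales are computed per im2col column with $\alpha = 0.5$ and that a simple symmetric per-tensor int8 fake quantization stands in for the real quantized kernels; the helper names smoothquant_conv_scales and fake_quant_int8 are hypothetical.

import torch
import torch.nn.functional as F

def smoothquant_conv_scales(cols, weight, alpha=0.5, eps=1e-8):
    # Per-column SmoothQuant scales on the im2col matrix.
    # cols: [N, C_in*kh*kw, L]; weight: [C_out, C_in, kh, kw].
    act_max = cols.abs().amax(dim=(0, 2))                          # [C_in*kh*kw]
    w_max = weight.view(weight.shape[0], -1).abs().amax(dim=0)     # [C_in*kh*kw]
    return (act_max.pow(alpha) / w_max.pow(1 - alpha).clamp(min=eps)).clamp(min=eps)

def fake_quant_int8(t):
    # Symmetric per-tensor int8 fake quantization (quantize-dequantize round trip).
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().clamp(-127, 127) * scale

x = torch.randn(2, 16, 8, 8)
conv = torch.nn.Conv2d(16, 32, kernel_size=3, padding=1)

cols = F.unfold(x, kernel_size=3, padding=1)        # im2col: [2, 144, 64]
w2d = conv.weight.view(32, -1)                      # flattened weights: [32, 144]

s = smoothquant_conv_scales(cols, conv.weight)      # migrate difficulty per im2col column
cols_q = fake_quant_int8(cols / s.view(1, -1, 1))   # smoothed + quantized activations
w_q = fake_quant_int8(w2d * s.view(1, -1))          # smoothed + quantized weights

out = (w_q @ cols_q + conv.bias.view(1, -1, 1)).view(2, 32, 8, 8)   # sgemm
print((out - conv(x)).abs().max())                  # quantization error vs. the FP32 conv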