1. Table of Contents:

  • Final results and comparison: ✅
  • Do L1 loss tests within the model again to see which part (SmoothQuant or the transformation) brings greater benefits: ✅
  • Run multiple tests on 50X-scaled inputs while varying the SmoothQuant scaling factor, to see whether other factors give better results (lower L1 loss): ✅
  • Lots of experiments to analyze the accuracy loss: ✅
  • Modify the model based on the above experiments: ✅
  • Validate the accuracy of the final model:
    • through mAP and NDS: ✅

2. Final results and comparison:

(Figure: final mAP/NDS comparison between the original model, the W8A8 baseline, and our method.)

From the graph, we can clearly see that our method closes the accuracy gap introduced by standard W8A8 quantization by migrating the quantization difficulty from the activations to the weights.
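For reference, the core of this migration for a convolution can be written in a few lines of PyTorch: a per-input-channel scale is divided out of the activation and folded into the conv weight, which leaves the full-precision output unchanged while making the activation easier to quantize. The sketch below assumes a plain nn.Conv2d with groups = 1; the function name migrate_difficulty is ours.

```python
import torch.nn.functional as F

def migrate_difficulty(x, conv, s):
    """Fold a per-input-channel scale s (shape [C_in]) out of the activation and into the
    weight. In full precision this is an identity transform: the result equals conv(x).
    The benefit appears only once the smoothed activation x / s is quantized."""
    x_smooth = x / s.view(1, -1, 1, 1)               # activation outliers are scaled down
    w_smooth = conv.weight * s.view(1, -1, 1, 1)     # the weight absorbs the scale (groups=1)
    return F.conv2d(x_smooth, w_smooth, conv.bias,
                    conv.stride, conv.padding, conv.dilation, conv.groups)
```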

3. Verify our method with example inputs

(Figure: L1 loss of the convolution output with and without SmoothQuant on example inputs.)

The graph clearly shows the benefit of applying SmoothQuant to the convolution operation: the L1 loss against the full-precision output is lower.
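As a rough illustration of how such an L1-loss comparison can be reproduced on a single Conv2d with example inputs (the per-tensor int8 fake quantizer, the tensor shapes, and the injected outlier channel below are our own illustrative choices, so the exact numbers will differ from the figure):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant_int8(t):
    """Per-tensor symmetric signed int8 fake quantization (illustrative only)."""
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().clamp(-128, 127) * scale

conv = nn.Conv2d(64, 64, 3, padding=1)
x = torch.randn(1, 64, 128, 128)
x[:, 0] *= 50                              # one outlier channel, as in the scaled-input tests
ref = conv(x)                              # full-precision reference output

# plain W8A8: quantize the raw activation and weight
plain = F.conv2d(fake_quant_int8(x), fake_quant_int8(conv.weight), conv.bias, padding=1)

# SmoothQuant-style W8A8 (the alpha = 0.5 case): divide the outlier out of the activation
s = (x.abs().amax(dim=(0, 2, 3)).clamp(min=1e-5).sqrt()
     / conv.weight.abs().amax(dim=(0, 2, 3)).clamp(min=1e-5).sqrt()).view(1, -1, 1, 1)
smooth = F.conv2d(fake_quant_int8(x / s), fake_quant_int8(conv.weight * s), conv.bias, padding=1)

print("L1 loss, plain W8A8:   ", (ref - plain).abs().mean().item())
print("L1 loss, smoothed W8A8:", (ref - smooth).abs().mean().item())
```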

4. Multiple tests on 50X-scaled inputs with different scaling factors

(Figure: L1 loss on 50X-scaled example inputs for different SmoothQuant scaling factors.)

After running multiple tests of our method on example inputs scaled by 50X, we found that the larger the scaling factor, the better the results. However, this may not always hold, since these are only example inputs. We will run similar tests on the entire model in the following sections.
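For context, the scaling factor here is SmoothQuant's migration strength α, which sets the per-channel scale to s_j = max|X_j|^α / max|W_j|^(1 - α): a larger α pushes more of the activation outliers into the weights. A minimal sweep over α on a toy Conv2d might look like the sketch below (shapes and the simple fake quantizer are illustrative only):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant_int8(t):
    """Per-tensor symmetric signed int8 fake quantization (illustrative only)."""
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().clamp(-128, 127) * scale

def smooth_scales(x, weight, alpha, eps=1e-5):
    """SmoothQuant per-channel scale: s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    act_max = x.abs().amax(dim=(0, 2, 3)).clamp(min=eps)
    w_max = weight.abs().amax(dim=(0, 2, 3)).clamp(min=eps)
    return act_max.pow(alpha) / w_max.pow(1.0 - alpha)

conv = nn.Conv2d(64, 64, 3, padding=1)
x = torch.randn(4, 64, 64, 64)
x[:, :4] *= 50                                     # stand-in for the 50X-scaled input channels
ref = conv(x)                                      # full-precision reference output

for alpha in (0.25, 0.5, 0.75, 0.9):
    s = smooth_scales(x, conv.weight, alpha).view(1, -1, 1, 1)
    out = F.conv2d(fake_quant_int8(x / s), fake_quant_int8(conv.weight * s),
                   conv.bias, padding=1)
    print(f"alpha = {alpha}: L1 loss = {(ref - out).abs().mean().item():.5f}")
```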

5. Initial status:

  • Baseline methods (W8A8 quantization):
    • which quantizes all Conv2d activations (in both backbone_2d and dense_head) with signed quantization.
| Model | mAP | NDS | Structure |
| --- | --- | --- | --- |
| Original model | 0.5921 - 0.5922 | 0.6648 | No change |
| Baseline methods (W8A8 quantization) | 0.5782 - 0.5804 | 0.6482 - 0.6499 | Signed quantization for all Conv2d activations (in both backbone_2d and dense_head) |
| Our methods | | | |
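For reference, the baseline's signed activation quantizer described above can be approximated by a per-tensor symmetric int8 fake quantizer (a simplified sketch; the actual baseline may use per-channel scales, calibrated observers, or a different range):

```python
import torch

def quant_act_signed(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Signed symmetric fake quantization of a Conv2d input activation (per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale
```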

5.1 Baseline methods: Signed activation quantization -> unsigned activation quantization for all Conv2d layers due to ReLU

| Model | mAP | NDS | Structure |
| --- | --- | --- | --- |
| Original model | 0.5921 - 0.5922 | 0.6648 | No change |
| Baseline methods (W8A8 quantization) | (0.5782 - 0.5804) -> (0.5882 - 0.5888) | (0.6482 - 0.6499) -> 0.6603 | Signed -> unsigned quantization for all Conv2d activations (in both backbone_2d and dense_head) |
| Our methods | | | |
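Because all of these Conv2d inputs come out of a ReLU and are therefore non-negative, a signed quantizer wastes half of its range on values that never occur; an unsigned quantizer spends all 8 bits on [0, max], which matches the accuracy recovery in the table above. A corresponding sketch (same simplified per-tensor style as before):

```python
import torch

def quant_act_unsigned(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Unsigned fake quantization for non-negative (post-ReLU) activations (per-tensor)."""
    qmax = 2 ** num_bits - 1                       # 255 for uint8
    scale = x.max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(0, qmax) * scale
```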

5.2 Our methods: Replace all of the original model’s Conv2d layers with our layers (Conv2d in both “backbone_2d” and “dense_head”)

| Model | mAP | NDS | Structure |
| --- | --- | --- | --- |
| Original model | 0.5921 - 0.5922 | 0.6648 | No change |
| Baseline methods (W8A8 quantization) | 0.5882 - 0.5888 | 0.6603 | Unsigned quantization for all Conv2d activations (in both backbone_2d and dense_head) |
| Our methods | 0.0332 | 0.1086 | Replace all of the original model’s Conv2d layers with our layers (in both backbone_2d and dense_head), scaling factor = 0.5 |

This sharp drop means that the dense_head is very sensitive to both the SmoothQuant operation and quantization. Therefore, in the next section, we only apply our layers to backbone_2d and fall back to the baseline method for dense_head.

5.3 Our methods: Replace the original model’s backbone_2d Conv2d layers with our layers, and all of its dense_head Conv2d layers with the baseline method (standard W8A8)

| Model | mAP | NDS | Structure |
| --- | --- | --- | --- |
| Original model | 0.5921 - 0.5922 | 0.6648 | No change |
| Baseline methods (W8A8 quantization) | 0.5882 - 0.5888 | 0.6603 | Unsigned quantization for all Conv2d activations (in both backbone_2d and dense_head) |
| Our methods | 0.0332 -> 0.5889 | 0.1086 -> 0.6607 | Replace the original model’s backbone_2d Conv2d layers with our layers (scaling factor = 0.5), and all of its dense_head Conv2d layers with the baseline method (standard W8A8) |
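How such a per-module replacement can be wired up is sketched below; replace_conv2d is an illustrative helper, and SmoothQuantConv2d / W8A8Conv2d in the usage comment are hypothetical stand-ins for the actual layer classes:

```python
import torch.nn as nn

def replace_conv2d(model: nn.Module, prefix: str, make_layer):
    """Replace every nn.Conv2d under the submodule named `prefix` (e.g. "backbone_2d")
    with the module returned by make_layer(original_conv)."""
    for name, module in list(model.named_modules()):
        if not name.startswith(prefix):
            continue
        for child_name, child in list(module.named_children()):
            if isinstance(child, nn.Conv2d):
                setattr(module, child_name, make_layer(child))

# Usage sketch (SmoothQuantConv2d / W8A8Conv2d are stand-ins for the actual layer classes):
# replace_conv2d(model, "backbone_2d", lambda c: SmoothQuantConv2d(c, scaling_factor=0.5))
# replace_conv2d(model, "dense_head",  lambda c: W8A8Conv2d(c))
```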

At this stage, the result is good, but so far the SmoothQuant scaling factor has been fixed at 0.5 and we do not know how other values behave. Therefore, we will try different scaling factors to find the best one.

5.4 Our methods: with different scaling factors

  • mAP:

(Figure: mAP for different scaling factors.)

  • NDS:

(Figure: NDS for different scaling factors.)

The results suggest that scaling factor = 0.5 is a good and stable choice. Hence, in the following experiments, we will stick with scaling factor = 0.5.

5.5 Test: Only quantizing the Conv2d in dense_head (standard W8A8) without quantizing the Conv2d in backbone_2d (W16A16)

The goal of this experiment is to test which module still causes the accuracy gap, so we only quantize the Conv2d in dense_head:

  • If the accuracy stays the same compared with our current method’s results, then the accuracy gap is caused by the quantization of Conv2d in dense_head, since only the Conv2d in dense_head is quantized here.
  • If the accuracy doesn’t stay the same, then the accuracy gap is caused by something else.

It turns out:

| Model | mAP | NDS |
| --- | --- | --- |
| Our methods | 0.5889 -> 0.5891 | 0.6607 -> 0.6607 |

which doesn’t make much of a difference. Hence, the accuracy gap is caused by the quantization of Conv2d in dense_head, which further indicates that we should selectively enable quantization of Conv2d in dense_head.

5.6 Compare the dense_head activation L1 loss of two variants against the original model: (a) standard W8A8-quantized Conv2d in backbone_2d with our method’s Conv2d in dense_head, and (b) the fully standard W8A8-quantized model, to get the selective list.

(Figure: layer-by-layer dense_head activation L1 loss comparison.)

In this case, we compare the L1 loss layer by layer within dense_head and find a list of Conv2d layers (by name) that should not be quantized (their L1 loss is either too large or NaN):

  • dense_head.heads_list.0.center.1
  • dense_head.heads_list.0.center_z.1
  • dense_head.heads_list.0.dim.1
  • dense_head.heads_list.0.rot.1
  • dense_head.heads_list.0.vel.1
  • dense_head.heads_list.0.hm.0.0
  • dense_head.heads_list.0.hm.1
  • dense_head.heads_list.1.center.1
  • dense_head.heads_list.1.center_z.1
  • dense_head.heads_list.1.dim.1
  • dense_head.heads_list.1.rot.1
  • dense_head.heads_list.1.vel.1
  • dense_head.heads_list.1.hm.0.0
  • dense_head.heads_list.1.hm.1
  • dense_head.heads_list.2.center.1
  • dense_head.heads_list.2.dim.1
  • dense_head.heads_list.2.rot.1
  • dense_head.heads_list.2.vel.1
  • dense_head.heads_list.2.hm.0.0
  • dense_head.heads_list.2.hm.1
  • dense_head.heads_list.3.center.1
  • dense_head.heads_list.3.dim.1
  • dense_head.heads_list.3.vel.1
  • dense_head.heads_list.3.hm.0.0
  • dense_head.heads_list.3.hm.1
  • dense_head.heads_list.4.center.1
  • dense_head.heads_list.4.dim.1
  • dense_head.heads_list.4.vel.1
  • dense_head.heads_list.4.hm.0.0
  • dense_head.heads_list.4.hm.1
  • dense_head.heads_list.5.center.1
  • dense_head.heads_list.5.dim.1
  • dense_head.heads_list.5.vel.1
  • dense_head.heads_list.5.hm.0.0
  • dense_head.heads_list.5.hm.1

Therefore, in the following experiments, we either apply standard W8A8 quantization to these Conv2d layers or do not quantize them at all.
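A minimal sketch of how such a layer-by-layer comparison and skip list can be collected with forward hooks, assuming the compared models expose the same dense_head module names and keep their convolutions as (subclasses of) nn.Conv2d; the helper names and the threshold are illustrative:

```python
import torch
import torch.nn as nn

def collect_conv_outputs(model, batch, prefix="dense_head"):
    """Run one batch and record every Conv2d output under `prefix`, keyed by module name.
    (If the quantized model wraps its convs in another class, adapt the isinstance check.)"""
    outputs, handles = {}, []
    for name, module in model.named_modules():
        if name.startswith(prefix) and isinstance(module, nn.Conv2d):
            handles.append(module.register_forward_hook(
                lambda m, inp, out, name=name: outputs.__setitem__(name, out.detach())))
    with torch.no_grad():
        model(batch)
    for h in handles:
        h.remove()
    return outputs

def build_skip_list(fp_model, quant_model, batch, threshold=1.0):
    """Layers whose activation L1 loss vs. the FP model is too large or NaN are not quantized."""
    fp_outs = collect_conv_outputs(fp_model, batch)
    q_outs = collect_conv_outputs(quant_model, batch)
    skip = []
    for name, ref in fp_outs.items():
        loss = (ref - q_outs[name]).abs().mean()
        if torch.isnan(loss) or loss.item() > threshold:
            skip.append(name)
    return skip
```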

5.7 Two more experiments, compared with test 5.5.

  • Standard W8A8 quantization for the Conv2d in the above list and in backbone_2d, with the remaining Conv2d (the rest of dense_head) quantized with our method:
    • The mAP is 0.5881.
  • Standard W8A8 quantization for the Conv2d in the above list only, with the remaining Conv2d (the rest of dense_head and all of backbone_2d) quantized with our method:
    • The mAP is 0.5889.
  • Test 5.5: only quantizing the Conv2d in dense_head (standard W8A8), without quantizing the Conv2d in backbone_2d (W16A16):
    • The mAP is 0.5889.

This shows that our method is indeed working and that the remaining issue lies within dense_head. Therefore, to demonstrate that our method works, we will not quantize the Conv2d in the above list.

5.8 Baseline methods: Only quantize Conv2d in backbone_2d without quantization of Conv2d in dense_head

| Model | mAP | NDS | Structure |
| --- | --- | --- | --- |
| Original model | 0.5921 - 0.5922 | 0.6648 | No change |
| Baseline methods (W8A8 quantization) | (0.5882 - 0.5888) -> 0.5912 | 0.6603 -> 0.6630 | Only quantize Conv2d in backbone_2d, without quantizing Conv2d in dense_head |
| Our methods | 0.5889 | 0.6607 | Replace the original model’s backbone_2d Conv2d layers with our layers (scaling factor = 0.5), and all of its dense_head Conv2d layers with the baseline method (standard W8A8) |

5.9 Our methods: Replace all of the original model’s backbone_2d and dense_head Conv2d layers with our layers (scaling factor = 0.5), except for the Conv2d in the above list.

| Model | mAP | NDS | Structure |
| --- | --- | --- | --- |
| Original model | 0.5921 - 0.5922 | 0.6648 | No change |
| Baseline methods (W8A8 quantization) | 0.5912 | 0.6630 | Only quantize Conv2d in backbone_2d, without quantizing Conv2d in dense_head |
| Our methods | 0.5889 -> 0.5914 | 0.6607 -> 0.6638 | Replace all backbone_2d and dense_head Conv2d layers with our layers (scaling factor = 0.5), except for the Conv2d in the above list; their quantizers still exist |

However, the W16A16 quantizers for the Conv2d layers in the above list still exist. We delete them, since they are useless at W16A16 and would also cause accuracy loss once static PTQ is applied.
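What deleting those quantizers can look like in code is sketched below, under the assumption that each replaced layer still keeps the original nn.Conv2d in a .conv attribute; the attribute name and the helper are hypothetical, and the real wrapper may differ:

```python
import torch.nn as nn

def strip_quantizers(model: nn.Module, skip_list):
    """Swap the quantization wrapper back to a plain nn.Conv2d for every layer in the
    skip list, so no observer/fake-quant module is left on the W16A16 path."""
    for name in skip_list:
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        wrapped = getattr(parent, child_name)
        if hasattr(wrapped, "conv"):                   # hypothetical wrapper attribute
            setattr(parent, child_name, wrapped.conv)  # keep only the full-precision conv
```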

5.10 Our methods: Delete the W16A16 Conv2d quantizers for the layers in the above list.

| Model | mAP | NDS | Structure |
| --- | --- | --- | --- |
| Original model | 0.5921 - 0.5922 | 0.6648 | No change |
| Baseline methods (W8A8 quantization) | 0.5912 | 0.6630 | Only quantize Conv2d in backbone_2d, without quantizing Conv2d in dense_head |
| Our methods | 0.5914 -> 0.5921 | 0.6638 -> 0.6641 | Replace all backbone_2d and dense_head Conv2d layers with our layers (scaling factor = 0.5), except for the Conv2d in the above list (quantizers deleted) |

This shows that our method is effective: the final mAP (0.5921) and NDS (0.6641) essentially match the original model (0.5921 - 0.5922 mAP, 0.6648 NDS).

6. What’s next?

  • Check the result of our method using symmetric quantization.
  • Implement quantization of Sparse Conv3d.
  • Work out the math for applying SmoothQuant to Sparse Conv3d.
  • Try different quantization combos on both the baseline method and our method:
    • W16A8, W16A4, W16A3, W16A2
    • W8A16, W4A16, W3A16, W2A16
    • W8A8, W8A4, W4A8, W4A4
    • to show the greater benefits brought by our method
  • Try different datasets:
    • Waymo
  • Try different 3DOD models:
    • such as FSD, etc.
  • Since what we have implemented so far is the dynamic SmoothQuant operation, we can try the static SmoothQuant operation.