1. Table of Contents
- The strcuture of the CenterPoint-Vexel model: ✅
- Inference time of each layer: ✅
- Memory usage of each layer: ✅
- Storage usage of each layer: ✅
- Quantization of the model: ✅
2. The strcuture of the CenterPoint-Vexel model
CenterPoint(
(vfe): MeanVFE()
(backbone_3d): VoxelResBackBone8x(
(conv_input): SparseSequential(
(0): SubMConv3d(5, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
(1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv1): SparseSequential(
(0): SparseBasicBlock(
(conv1): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(1): SparseBasicBlock(
(conv1): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
)
(conv2): SparseSequential(
(0): SparseSequential(
(0): SparseConv3d(16, 32, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
(1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): SparseBasicBlock(
(conv1): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(2): SparseBasicBlock(
(conv1): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
)
(conv3): SparseSequential(
(0): SparseSequential(
(0): SparseConv3d(32, 64, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
(1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): SparseBasicBlock(
(conv1): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(2): SparseBasicBlock(
(conv1): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
)
(conv4): SparseSequential(
(0): SparseSequential(
(0): SparseConv3d(64, 128, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[0, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
(1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): SparseBasicBlock(
(conv1): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
(2): SparseBasicBlock(
(conv1): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(relu): ReLU()
(conv2): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
(bn2): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
)
(conv_out): SparseSequential(
(0): SparseConv3d(128, 128, kernel_size=[3, 1, 1], stride=[2, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
(1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
)
(map_to_bev_module): HeightCompression()
(pfe): None
(backbone_2d): BaseBEVBackbone(
(blocks): ModuleList(
(0): Sequential(
(0): ZeroPad2d((1, 1, 1, 1))
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
(2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
(13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(15): ReLU()
(16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(18): ReLU()
)
(1): Sequential(
(0): ZeroPad2d((1, 1, 1, 1))
(1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
(2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
(13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(15): ReLU()
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(18): ReLU()
)
)
(deblocks): ModuleList(
(0): Sequential(
(0): ConvTranspose2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Sequential(
(0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
)
)
(dense_head): CenterHead(
(shared_conv): Sequential(
(0): Conv2d(512, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(heads_list): ModuleList(
(0): SeparateHead(
(center): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(center_z): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(dim): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(rot): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(vel): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(hm): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(1-2): 2 x SeparateHead(
(center): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(center_z): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(dim): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(rot): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(vel): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(hm): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(3): SeparateHead(
(center): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(center_z): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(dim): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(rot): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(vel): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(hm): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(4-5): 2 x SeparateHead(
(center): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(center_z): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(dim): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(rot): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(vel): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(hm): Sequential(
(0): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
)
(hm_loss_func): FocalLossCenterNet()
(reg_loss_func): RegLossCenterNet()
)
(point_head): None
(roi_head): None
)
Conclusion: compared with normal 2D object detection algorithm’s used layers, LiDAR-based 3D object detection utilize layers like SparseConv3d
and SubMConv3d
to implement 3D backbone. Other layers are regular.
3. Inference time of each layer
- With the use of
Profiler
:
But what we get right here is the layer name of the inner framework, i.e., the function name of C
language. Therefore, I write some simple codes to get the consuming of time of each Pytorch layer.
- With the use of customized code:
We get top 20 time-consuming Pytorch layer:
2024-04-09 09:47:53,138 INFO <class 'torch.nn.modules.conv.Conv2d'>: 0.07607173919677734 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SubMConv3d'>: 0.058701515197753906 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SubMConv3d'>: 0.038175344467163086 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SubMConv3d'>: 0.030804872512817383 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SparseConv3d'>: 0.025530099868774414 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SparseConv3d'>: 0.02077198028564453 seconds
2024-04-09 09:47:53,139 INFO <class 'torch.nn.modules.conv.ConvTranspose2d'>: 0.020176410675048828 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SparseConv3d'>: 0.016696453094482422 seconds
2024-04-09 09:47:53,139 INFO <class 'torch.nn.modules.conv.ConvTranspose2d'>: 0.014063358306884766 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SubMConv3d'>: 0.014019966125488281 seconds
2024-04-09 09:47:53,139 INFO <class 'pcdet.models.backbones_3d.vfe.mean_vfe.MeanVFE'>: 0.012219667434692383 seconds
2024-04-09 09:47:53,139 INFO <class 'pcdet.models.detectors.centerpoint.CenterPoint'>: 0.01217341423034668 seconds
2024-04-09 09:47:53,139 INFO <class 'torch.nn.modules.conv.Conv2d'>: 0.009574413299560547 seconds
2024-04-09 09:47:53,139 INFO <class 'pcdet.models.backbones_2d.map_to_bev.height_compression.HeightCompression'>: 0.006260395050048828 seconds
2024-04-09 09:47:53,139 INFO <class 'spconv.pytorch.conv.SparseConv3d'>: 0.005869150161743164 seconds
2024-04-09 09:47:53,139 INFO <class 'torch.nn.modules.conv.Conv2d'>: 0.003749370574951172 seconds
2024-04-09 09:47:53,139 INFO <class 'torch.nn.modules.conv.Conv2d'>: 0.003445148468017578 seconds
2024-04-09 09:47:53,139 INFO <class 'torch.nn.modules.conv.Conv2d'>: 0.0028769969940185547 seconds
That is to say, the major time-consuming layer should beConv2d
, ConvTransposed2d
, SubMConv3d
, and SparseConv3d
, which means that we can start from quantization of Conv2d
and ConvTranspose2d
(since SubMConv3d
and SparseConv3d
don’t have pre-defined quantization class right now, but we can always customize layers afterwards).
4. Memory usage of each layer
I tried self-written hooks to figure out the GPU memory usage of each layer, but it always seems that there’s some problem with my code. So I instead use profiler
:
As we can see from the right side, the layer which has high GPU memory usage is the Sparse-Conv3d
from the module spconv
(Name: cumm:conv:....
). But I can’t quite understand that why Conv2d
didn’t use any of the GPU memory while has GPU time on them (on the right side).
5. Storage usage of each layer (fvcore)
| name | #elements or shape |
|:----------------------------|:---------------------|
| model | 8.9M |
| backbone_3d | 2.7M |
| backbone_3d.conv_input | 2.2K |
| backbone_3d.conv_input.0 | 2.2K |
| backbone_3d.conv_input.1 | 32 |
| backbone_3d.conv1 | 27.8K |
| backbone_3d.conv1.0 | 13.9K |
| backbone_3d.conv1.1 | 13.9K |
| backbone_3d.conv2 | 0.1M |
| backbone_3d.conv2.0 | 13.9K |
| backbone_3d.conv2.1 | 55.5K |
| backbone_3d.conv2.2 | 55.5K |
| backbone_3d.conv3 | 0.5M |
| backbone_3d.conv3.0 | 55.4K |
| backbone_3d.conv3.1 | 0.2M |
| backbone_3d.conv3.2 | 0.2M |
| backbone_3d.conv4 | 2.0M |
| backbone_3d.conv4.0 | 0.2M |
| backbone_3d.conv4.1 | 0.9M |
| backbone_3d.conv4.2 | 0.9M |
| backbone_3d.conv_out | 49.4K |
| backbone_3d.conv_out.0 | 49.2K |
| backbone_3d.conv_out.1 | 0.3K |
| backbone_2d | 4.6M |
| backbone_2d.blocks | 4.3M |
| backbone_2d.blocks.0 | 1.0M |
| backbone_2d.blocks.1 | 3.2M |
| backbone_2d.deblocks | 0.3M |
| backbone_2d.deblocks.0 | 33.3K |
| backbone_2d.deblocks.1 | 0.3M |
| dense_head | 1.7M |
| dense_head.shared_conv | 0.3M |
| dense_head.shared_conv.0 | 0.3M |
| dense_head.shared_conv.1 | 0.1K |
| dense_head.heads_list | 1.4M |
| dense_head.heads_list.0 | 0.2M |
| dense_head.heads_list.1 | 0.2M |
| dense_head.heads_list.2 | 0.2M |
| dense_head.heads_list.3 | 0.2M |
| dense_head.heads_list.4 | 0.2M |
| dense_head.heads_list.5 | 0.2M |
And below is a storage diagram of a ResNet-50
:
Skipped operation aten::batch_norm 53 time(s)
Skipped operation aten::max_pool2d 1 time(s)
Skipped operation aten::add_ 16 time(s)
Skipped operation aten::adaptive_avg_pool2d 1 time(s)
FLOPs: 4089184256
| name | #elements or shape |
|:-----------------------|:---------------------|
| model | 25.6M |
| conv1 | 9.4K |
| conv1.weight | (64, 3, 7, 7) |
| bn1 | 0.1K |
| bn1.weight | (64,) |
| bn1.bias | (64,) |
| layer1 | 0.2M |
| layer1.0 | 75.0K |
| layer1.0.conv1 | 4.1K |
| layer1.0.bn1 | 0.1K |
| layer1.0.conv2 | 36.9K |
| layer1.0.bn2 | 0.1K |
| layer1.0.conv3 | 16.4K |
| layer1.0.bn3 | 0.5K |
| layer1.0.downsample | 16.9K |
| layer1.1 | 70.4K |
| layer1.1.conv1 | 16.4K |
| layer1.1.bn1 | 0.1K |
| layer1.1.conv2 | 36.9K |
| layer1.1.bn2 | 0.1K |
| layer1.1.conv3 | 16.4K |
| layer1.1.bn3 | 0.5K |
| layer1.2 | 70.4K |
| layer1.2.conv1 | 16.4K |
| layer1.2.bn1 | 0.1K |
| layer1.2.conv2 | 36.9K |
| layer1.2.bn2 | 0.1K |
| layer1.2.conv3 | 16.4K |
| layer1.2.bn3 | 0.5K |
| layer2 | 1.2M |
| layer2.0 | 0.4M |
| layer2.0.conv1 | 32.8K |
| layer2.0.bn1 | 0.3K |
| layer2.0.conv2 | 0.1M |
| layer2.0.bn2 | 0.3K |
| layer2.0.conv3 | 65.5K |
| layer2.0.bn3 | 1.0K |
| layer2.0.downsample | 0.1M |
| layer2.1 | 0.3M |
| layer2.1.conv1 | 65.5K |
| layer2.1.bn1 | 0.3K |
| layer2.1.conv2 | 0.1M |
| layer2.1.bn2 | 0.3K |
| layer2.1.conv3 | 65.5K |
| layer2.1.bn3 | 1.0K |
| layer2.2 | 0.3M |
| layer2.2.conv1 | 65.5K |
| layer2.2.bn1 | 0.3K |
| layer2.2.conv2 | 0.1M |
| layer2.2.bn2 | 0.3K |
| layer2.2.conv3 | 65.5K |
| layer2.2.bn3 | 1.0K |
| layer2.3 | 0.3M |
| layer2.3.conv1 | 65.5K |
| layer2.3.bn1 | 0.3K |
| layer2.3.conv2 | 0.1M |
| layer2.3.bn2 | 0.3K |
| layer2.3.conv3 | 65.5K |
| layer2.3.bn3 | 1.0K |
| layer3 | 7.1M |
| layer3.0 | 1.5M |
| layer3.0.conv1 | 0.1M |
| layer3.0.bn1 | 0.5K |
| layer3.0.conv2 | 0.6M |
| layer3.0.bn2 | 0.5K |
| layer3.0.conv3 | 0.3M |
| layer3.0.bn3 | 2.0K |
| layer3.0.downsample | 0.5M |
| layer3.1 | 1.1M |
| layer3.1.conv1 | 0.3M |
| layer3.1.bn1 | 0.5K |
| layer3.1.conv2 | 0.6M |
| layer3.1.bn2 | 0.5K |
| layer3.1.conv3 | 0.3M |
| layer3.1.bn3 | 2.0K |
| layer3.2 | 1.1M |
| layer3.2.conv1 | 0.3M |
| layer3.2.bn1 | 0.5K |
| layer3.2.conv2 | 0.6M |
| layer3.2.bn2 | 0.5K |
| layer3.2.conv3 | 0.3M |
| layer3.2.bn3 | 2.0K |
| layer3.3 | 1.1M |
| layer3.3.conv1 | 0.3M |
| layer3.3.bn1 | 0.5K |
| layer3.3.conv2 | 0.6M |
| layer3.3.bn2 | 0.5K |
| layer3.3.conv3 | 0.3M |
| layer3.3.bn3 | 2.0K |
| layer3.4 | 1.1M |
| layer3.4.conv1 | 0.3M |
| layer3.4.bn1 | 0.5K |
| layer3.4.conv2 | 0.6M |
| layer3.4.bn2 | 0.5K |
| layer3.4.conv3 | 0.3M |
| layer3.4.bn3 | 2.0K |
| layer3.5 | 1.1M |
| layer3.5.conv1 | 0.3M |
| layer3.5.bn1 | 0.5K |
| layer3.5.conv2 | 0.6M |
| layer3.5.bn2 | 0.5K |
| layer3.5.conv3 | 0.3M |
| layer3.5.bn3 | 2.0K |
| layer4 | 15.0M |
| layer4.0 | 6.0M |
| layer4.0.conv1 | 0.5M |
| layer4.0.bn1 | 1.0K |
| layer4.0.conv2 | 2.4M |
| layer4.0.bn2 | 1.0K |
| layer4.0.conv3 | 1.0M |
| layer4.0.bn3 | 4.1K |
| layer4.0.downsample | 2.1M |
| layer4.1 | 4.5M |
| layer4.1.conv1 | 1.0M |
| layer4.1.bn1 | 1.0K |
| layer4.1.conv2 | 2.4M |
| layer4.1.bn2 | 1.0K |
| layer4.1.conv3 | 1.0M |
| layer4.1.bn3 | 4.1K |
| layer4.2 | 4.5M |
| layer4.2.conv1 | 1.0M |
| layer4.2.bn1 | 1.0K |
| layer4.2.conv2 | 2.4M |
| layer4.2.bn2 | 1.0K |
| layer4.2.conv3 | 1.0M |
| layer4.2.bn3 | 4.1K |
| fc | 2.0M |
| fc.weight | (1000, 2048) |
| fc.bias | (1000,) |
The reason why I also attached a table of ResNet-50
model is that I want to point out the fact that the storage usage of CenterPoint
is less than that of ResNet-50
.
From my perspective, it (may) be common that LiDAR-based 3D object detection model’s storage usage is smaller than traditional 2D model, since LiDAR-based 3D object detection model’s 3D backbone are using spconv
module, a module that take the sparsity nature of Conv3d
into consideration and therefore reduce the model storage usage, GPU memory usage and inference time. So it makes sense that from a perspective of storage usage, 3D < 2D.
Back to our original topic, from the first table, we can clearly see that backbone_2d
is the most storage-consuming layer, which means we can start from quantization of Conv2d
initially.
6. Quantization of the model
Here we try both dynamic and static PTQ only on Conv2d
, i.e., have some calibration data flowed within the model, so that the quantizer can get a good idea of the dynamic range of the activation.
- Original results:
- dynamic calibration-max quantization results:
with model layer’s amax
is dynamic:
- static calibration-max quantization results:
with model layer’s amax
is a specific range or number:
- Weight-only quantization:
with model input quantizer’s num_bits
is 16-bit:
7. Conclusion
We can clearly see that the results are only effected a little bit on the basis that we only quantized 2D conv (But this can be due to the optimization of pytorch-quantization
). So I think our potential next target will be the spconv
layer. Based on the documentation of the pytorch-quantization
, we can customize our own quantized layer just like below: