1. Table of Contents

  • The strcuture of the CenterPoint-Vexel model: ✅
  • Inference time of each layer: ✅
  • Memory usage of each layer: ✅
  • Storage usage of each layer: ✅
  • Quantization of the model: ✅

2. The strcuture of the CenterPoint-Vexel model

CenterPoint(
  (vfe): MeanVFE()
  (backbone_3d): VoxelResBackBone8x(
    (conv_input): SparseSequential(
      (0): SubMConv3d(5, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
      (1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (conv1): SparseSequential(
      (0): SparseBasicBlock(
        (conv1): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
      (1): SparseBasicBlock(
        (conv1): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(16, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
    )
    (conv2): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d(16, 32, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
        (1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseBasicBlock(
        (conv1): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
      (2): SparseBasicBlock(
        (conv1): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(32, 32, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(32, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
    )
    (conv3): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d(32, 64, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
        (1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseBasicBlock(
        (conv1): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
      (2): SparseBasicBlock(
        (conv1): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(64, 64, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
    )
    (conv4): SparseSequential(
      (0): SparseSequential(
        (0): SparseConv3d(64, 128, kernel_size=[3, 3, 3], stride=[2, 2, 2], padding=[0, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
        (1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): SparseBasicBlock(
        (conv1): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
      (2): SparseBasicBlock(
        (conv1): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (relu): ReLU()
        (conv2): SubMConv3d(128, 128, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], algo=ConvAlgo.MaskImplicitGemm)
        (bn2): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
    )
    (conv_out): SparseSequential(
      (0): SparseConv3d(128, 128, kernel_size=[3, 1, 1], stride=[2, 1, 1], padding=[0, 0, 0], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm)
      (1): BatchNorm1d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (map_to_bev_module): HeightCompression()
  (pfe): None
  (backbone_2d): BaseBEVBackbone(
    (blocks): ModuleList(
      (0): Sequential(
        (0): ZeroPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
        (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
      (1): Sequential(
        (0): ZeroPad2d((1, 1, 1, 1))
        (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
    )
    (deblocks): ModuleList(
      (0): Sequential(
        (0): ConvTranspose2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
  )
  (dense_head): CenterHead(
    (shared_conv): Sequential(
      (0): Conv2d(512, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (heads_list): ModuleList(
      (0): SeparateHead(
        (center): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (center_z): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (dim): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (rot): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (vel): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (hm): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
      )
      (1-2): 2 x SeparateHead(
        (center): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (center_z): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (dim): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (rot): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (vel): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (hm): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
      )
      (3): SeparateHead(
        (center): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (center_z): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (dim): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (rot): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (vel): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (hm): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
      )
      (4-5): 2 x SeparateHead(
        (center): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (center_z): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (dim): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (rot): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (vel): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (hm): Sequential(
          (0): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU()
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
      )
    )
    (hm_loss_func): FocalLossCenterNet()
    (reg_loss_func): RegLossCenterNet()
  )
  (point_head): None
  (roi_head): None
)

Conclusion: compared with normal 2D object detection algorithm’s used layers, LiDAR-based 3D object detection utilize layers like SparseConv3d and SubMConv3d to implement 3D backbone. Other layers are regular.

3. Inference time of each layer

  • With the use of Profiler:

image-20240408143118738

But what we get right here is the layer name of the inner framework, i.e., the function name of C language. Therefore, I write some simple codes to get the consuming of time of each Pytorch layer.

  • With the use of customized code:

We get top 20 time-consuming Pytorch layer:

2024-04-09 09:47:53,138   INFO  <class 'torch.nn.modules.conv.Conv2d'>: 0.07607173919677734 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SubMConv3d'>: 0.058701515197753906 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SubMConv3d'>: 0.038175344467163086 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SubMConv3d'>: 0.030804872512817383 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SparseConv3d'>: 0.025530099868774414 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SparseConv3d'>: 0.02077198028564453 seconds
2024-04-09 09:47:53,139   INFO  <class 'torch.nn.modules.conv.ConvTranspose2d'>: 0.020176410675048828 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SparseConv3d'>: 0.016696453094482422 seconds
2024-04-09 09:47:53,139   INFO  <class 'torch.nn.modules.conv.ConvTranspose2d'>: 0.014063358306884766 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SubMConv3d'>: 0.014019966125488281 seconds
2024-04-09 09:47:53,139   INFO  <class 'pcdet.models.backbones_3d.vfe.mean_vfe.MeanVFE'>: 0.012219667434692383 seconds
2024-04-09 09:47:53,139   INFO  <class 'pcdet.models.detectors.centerpoint.CenterPoint'>: 0.01217341423034668 seconds
2024-04-09 09:47:53,139   INFO  <class 'torch.nn.modules.conv.Conv2d'>: 0.009574413299560547 seconds
2024-04-09 09:47:53,139   INFO  <class 'pcdet.models.backbones_2d.map_to_bev.height_compression.HeightCompression'>: 0.006260395050048828 seconds
2024-04-09 09:47:53,139   INFO  <class 'spconv.pytorch.conv.SparseConv3d'>: 0.005869150161743164 seconds
2024-04-09 09:47:53,139   INFO  <class 'torch.nn.modules.conv.Conv2d'>: 0.003749370574951172 seconds
2024-04-09 09:47:53,139   INFO  <class 'torch.nn.modules.conv.Conv2d'>: 0.003445148468017578 seconds
2024-04-09 09:47:53,139   INFO  <class 'torch.nn.modules.conv.Conv2d'>: 0.0028769969940185547 seconds

That is to say, the major time-consuming layer should beConv2d, ConvTransposed2d, SubMConv3d, and SparseConv3d, which means that we can start from quantization of Conv2d and ConvTranspose2d (since SubMConv3d and SparseConv3d don’t have pre-defined quantization class right now, but we can always customize layers afterwards).

4. Memory usage of each layer

I tried self-written hooks to figure out the GPU memory usage of each layer, but it always seems that there’s some problem with my code. So I instead use profiler:

image-20240408143118738

As we can see from the right side, the layer which has high GPU memory usage is the Sparse-Conv3d from the module spconv (Name: cumm:conv:....). But I can’t quite understand that why Conv2d didn’t use any of the GPU memory while has GPU time on them (on the right side).

5. Storage usage of each layer (fvcore)

| name                        | #elements or shape   |
|:----------------------------|:---------------------|
| model                       | 8.9M                 |
|  backbone_3d                |  2.7M                |
|   backbone_3d.conv_input    |   2.2K               |
|    backbone_3d.conv_input.0 |    2.2K              |
|    backbone_3d.conv_input.1 |    32                |
|   backbone_3d.conv1         |   27.8K              |
|    backbone_3d.conv1.0      |    13.9K             |
|    backbone_3d.conv1.1      |    13.9K             |
|   backbone_3d.conv2         |   0.1M               |
|    backbone_3d.conv2.0      |    13.9K             |
|    backbone_3d.conv2.1      |    55.5K             |
|    backbone_3d.conv2.2      |    55.5K             |
|   backbone_3d.conv3         |   0.5M               |
|    backbone_3d.conv3.0      |    55.4K             |
|    backbone_3d.conv3.1      |    0.2M              |
|    backbone_3d.conv3.2      |    0.2M              |
|   backbone_3d.conv4         |   2.0M               |
|    backbone_3d.conv4.0      |    0.2M              |
|    backbone_3d.conv4.1      |    0.9M              |
|    backbone_3d.conv4.2      |    0.9M              |
|   backbone_3d.conv_out      |   49.4K              |
|    backbone_3d.conv_out.0   |    49.2K             |
|    backbone_3d.conv_out.1   |    0.3K              |
|  backbone_2d                |  4.6M                |
|   backbone_2d.blocks        |   4.3M               |
|    backbone_2d.blocks.0     |    1.0M              |
|    backbone_2d.blocks.1     |    3.2M              |
|   backbone_2d.deblocks      |   0.3M               |
|    backbone_2d.deblocks.0   |    33.3K             |
|    backbone_2d.deblocks.1   |    0.3M              |
|  dense_head                 |  1.7M                |
|   dense_head.shared_conv    |   0.3M               |
|    dense_head.shared_conv.0 |    0.3M              |
|    dense_head.shared_conv.1 |    0.1K              |
|   dense_head.heads_list     |   1.4M               |
|    dense_head.heads_list.0  |    0.2M              |
|    dense_head.heads_list.1  |    0.2M              |
|    dense_head.heads_list.2  |    0.2M              |
|    dense_head.heads_list.3  |    0.2M              |
|    dense_head.heads_list.4  |    0.2M              |
|    dense_head.heads_list.5  |    0.2M              |

And below is a storage diagram of a ResNet-50:

Skipped operation aten::batch_norm 53 time(s)
Skipped operation aten::max_pool2d 1 time(s)
Skipped operation aten::add_ 16 time(s)
Skipped operation aten::adaptive_avg_pool2d 1 time(s)
FLOPs:  4089184256
| name                   | #elements or shape   |
|:-----------------------|:---------------------|
| model                  | 25.6M                |
|  conv1                 |  9.4K                |
|   conv1.weight         |   (64, 3, 7, 7)      |
|  bn1                   |  0.1K                |
|   bn1.weight           |   (64,)              |
|   bn1.bias             |   (64,)              |
|  layer1                |  0.2M                |
|   layer1.0             |   75.0K              |
|    layer1.0.conv1      |    4.1K              |
|    layer1.0.bn1        |    0.1K              |
|    layer1.0.conv2      |    36.9K             |
|    layer1.0.bn2        |    0.1K              |
|    layer1.0.conv3      |    16.4K             |
|    layer1.0.bn3        |    0.5K              |
|    layer1.0.downsample |    16.9K             |
|   layer1.1             |   70.4K              |
|    layer1.1.conv1      |    16.4K             |
|    layer1.1.bn1        |    0.1K              |
|    layer1.1.conv2      |    36.9K             |
|    layer1.1.bn2        |    0.1K              |
|    layer1.1.conv3      |    16.4K             |
|    layer1.1.bn3        |    0.5K              |
|   layer1.2             |   70.4K              |
|    layer1.2.conv1      |    16.4K             |
|    layer1.2.bn1        |    0.1K              |
|    layer1.2.conv2      |    36.9K             |
|    layer1.2.bn2        |    0.1K              |
|    layer1.2.conv3      |    16.4K             |
|    layer1.2.bn3        |    0.5K              |
|  layer2                |  1.2M                |
|   layer2.0             |   0.4M               |
|    layer2.0.conv1      |    32.8K             |
|    layer2.0.bn1        |    0.3K              |
|    layer2.0.conv2      |    0.1M              |
|    layer2.0.bn2        |    0.3K              |
|    layer2.0.conv3      |    65.5K             |
|    layer2.0.bn3        |    1.0K              |
|    layer2.0.downsample |    0.1M              |
|   layer2.1             |   0.3M               |
|    layer2.1.conv1      |    65.5K             |
|    layer2.1.bn1        |    0.3K              |
|    layer2.1.conv2      |    0.1M              |
|    layer2.1.bn2        |    0.3K              |
|    layer2.1.conv3      |    65.5K             |
|    layer2.1.bn3        |    1.0K              |
|   layer2.2             |   0.3M               |
|    layer2.2.conv1      |    65.5K             |
|    layer2.2.bn1        |    0.3K              |
|    layer2.2.conv2      |    0.1M              |
|    layer2.2.bn2        |    0.3K              |
|    layer2.2.conv3      |    65.5K             |
|    layer2.2.bn3        |    1.0K              |
|   layer2.3             |   0.3M               |
|    layer2.3.conv1      |    65.5K             |
|    layer2.3.bn1        |    0.3K              |
|    layer2.3.conv2      |    0.1M              |
|    layer2.3.bn2        |    0.3K              |
|    layer2.3.conv3      |    65.5K             |
|    layer2.3.bn3        |    1.0K              |
|  layer3                |  7.1M                |
|   layer3.0             |   1.5M               |
|    layer3.0.conv1      |    0.1M              |
|    layer3.0.bn1        |    0.5K              |
|    layer3.0.conv2      |    0.6M              |
|    layer3.0.bn2        |    0.5K              |
|    layer3.0.conv3      |    0.3M              |
|    layer3.0.bn3        |    2.0K              |
|    layer3.0.downsample |    0.5M              |
|   layer3.1             |   1.1M               |
|    layer3.1.conv1      |    0.3M              |
|    layer3.1.bn1        |    0.5K              |
|    layer3.1.conv2      |    0.6M              |
|    layer3.1.bn2        |    0.5K              |
|    layer3.1.conv3      |    0.3M              |
|    layer3.1.bn3        |    2.0K              |
|   layer3.2             |   1.1M               |
|    layer3.2.conv1      |    0.3M              |
|    layer3.2.bn1        |    0.5K              |
|    layer3.2.conv2      |    0.6M              |
|    layer3.2.bn2        |    0.5K              |
|    layer3.2.conv3      |    0.3M              |
|    layer3.2.bn3        |    2.0K              |
|   layer3.3             |   1.1M               |
|    layer3.3.conv1      |    0.3M              |
|    layer3.3.bn1        |    0.5K              |
|    layer3.3.conv2      |    0.6M              |
|    layer3.3.bn2        |    0.5K              |
|    layer3.3.conv3      |    0.3M              |
|    layer3.3.bn3        |    2.0K              |
|   layer3.4             |   1.1M               |
|    layer3.4.conv1      |    0.3M              |
|    layer3.4.bn1        |    0.5K              |
|    layer3.4.conv2      |    0.6M              |
|    layer3.4.bn2        |    0.5K              |
|    layer3.4.conv3      |    0.3M              |
|    layer3.4.bn3        |    2.0K              |
|   layer3.5             |   1.1M               |
|    layer3.5.conv1      |    0.3M              |
|    layer3.5.bn1        |    0.5K              |
|    layer3.5.conv2      |    0.6M              |
|    layer3.5.bn2        |    0.5K              |
|    layer3.5.conv3      |    0.3M              |
|    layer3.5.bn3        |    2.0K              |
|  layer4                |  15.0M               |
|   layer4.0             |   6.0M               |
|    layer4.0.conv1      |    0.5M              |
|    layer4.0.bn1        |    1.0K              |
|    layer4.0.conv2      |    2.4M              |
|    layer4.0.bn2        |    1.0K              |
|    layer4.0.conv3      |    1.0M              |
|    layer4.0.bn3        |    4.1K              |
|    layer4.0.downsample |    2.1M              |
|   layer4.1             |   4.5M               |
|    layer4.1.conv1      |    1.0M              |
|    layer4.1.bn1        |    1.0K              |
|    layer4.1.conv2      |    2.4M              |
|    layer4.1.bn2        |    1.0K              |
|    layer4.1.conv3      |    1.0M              |
|    layer4.1.bn3        |    4.1K              |
|   layer4.2             |   4.5M               |
|    layer4.2.conv1      |    1.0M              |
|    layer4.2.bn1        |    1.0K              |
|    layer4.2.conv2      |    2.4M              |
|    layer4.2.bn2        |    1.0K              |
|    layer4.2.conv3      |    1.0M              |
|    layer4.2.bn3        |    4.1K              |
|  fc                    |  2.0M                |
|   fc.weight            |   (1000, 2048)       |
|   fc.bias              |   (1000,)            |

The reason why I also attached a table of ResNet-50 model is that I want to point out the fact that the storage usage of CenterPoint is less than that of ResNet-50.

From my perspective, it (may) be common that LiDAR-based 3D object detection model’s storage usage is smaller than traditional 2D model, since LiDAR-based 3D object detection model’s 3D backbone are using spconv module, a module that take the sparsity nature of Conv3d into consideration and therefore reduce the model storage usage, GPU memory usage and inference time. So it makes sense that from a perspective of storage usage, 3D < 2D.

Back to our original topic, from the first table, we can clearly see that backbone_2d is the most storage-consuming layer, which means we can start from quantization of Conv2d initially.

6. Quantization of the model

Here we try both dynamic and static PTQ only on Conv2d, i.e., have some calibration data flowed within the model, so that the quantizer can get a good idea of the dynamic range of the activation.

  • Original results:

image-20240409190524483

  • dynamic calibration-max quantization results:

image-20240409190656395

with model layer’s amax is dynamic:

image-20240409201708976

  • static calibration-max quantization results:

image-20240409201528341

with model layer’s amax is a specific range or number:

image-20240409201614153

  • Weight-only quantization:

image-20240411100939381

with model input quantizer’s num_bits is 16-bit:

image-20240411101044026

7. Conclusion

We can clearly see that the results are only effected a little bit on the basis that we only quantized 2D conv (But this can be due to the optimization of pytorch-quantization). So I think our potential next target will be the spconv layer. Based on the documentation of the pytorch-quantization, we can customize our own quantized layer just like below: image-20240409204640050