1. Table of Contents

  • In-depth Memory Usage Visualization: ✅

  • Ideas about how to implement quantization of sparse conv3d: ✅

  • Ideas about how to implement SmoothQuant operation on conv2d: ✅

2. Large Chunk GPU Memory Usage Overview

  • Data loader
  • Backbone 3d
  • Backbone 3d -> Backbone 2d
  • Backbone 2d
  • Head

img

Below is the structure of each major chunk:

  • Data Loader

img

  • Backbone 3d

img

  • 3d feature to 2d feature

img

  • Backbone 2d

img

  • Head

img
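A breakdown like the one above can be collected by measuring allocated GPU memory around each pipeline stage. Below is a minimal sketch; the stage layout and the injectable `mem_fn` counter are illustrative (in practice `mem_fn` would be `torch.cuda.memory_allocated`), not the exact profiling code behind the figures.

```python
def profile_stage_memory(stages, x, mem_fn):
    """Run `stages` (an ordered mapping of name -> callable) sequentially
    on `x`, recording the change in allocated memory across each stage.

    `mem_fn` is a zero-argument memory counter, e.g.
    torch.cuda.memory_allocated; injecting it keeps the profiler usable
    (and testable) without a GPU."""
    report = {}
    for name, fn in stages.items():
        before = mem_fn()
        x = fn(x)  # e.g. data_loader -> backbone_3d -> 3d-to-2d -> backbone_2d -> head
        report[name] = mem_fn() - before
    return x, report
```

In practice `stages` would be the model's submodules (data loader, backbone 3d, the 3d-to-2d conversion, backbone 2d, head), and the peak counter `torch.cuda.max_memory_allocated` is often more informative than the net delta, since intermediate activations are freed before a stage returns.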

3. How to implement quantization of sparse conv3d?

  • Take a look at NVIDIA’s implementation of a quantized Conv3d layer

img

The process for sparse conv3d should be similar, as we can see:

img
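NVIDIA's quantized conv layers (pytorch-quantization style) fake-quantize the input and the weight before running the original convolution, so the same pattern should transfer to sparse conv3d by fake-quantizing the sparse tensor's `features` and the layer weight. The quantize-dequantize step itself reduces to the sketch below; pure Python is used for clarity, symmetric per-tensor int8 is an assumption, and the commented wrapper class is hypothetical, not spconv API.

```python
def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor fake quantization: quantize to signed
    `num_bits` integers, then dequantize.  This mirrors the
    quantize-dequantize step that a TensorQuantizer-style module applies
    to the input and weight of a quantized conv layer."""
    amax = max(abs(v) for v in values)
    if amax == 0:
        return list(values)
    qmax = 2 ** (num_bits - 1) - 1   # 127 for int8
    scale = amax / qmax
    return [round(v / scale) * scale for v in values]

# Hypothetical wrapper sketch (not runnable here without spconv):
#
# class QuantSubMConv3d(spconv.SubMConv3d):
#     def forward(self, x):  # x: spconv SparseConvTensor
#         # quantize the dense feature matrix of the sparse tensor
#         x = x.replace_feature(self._input_quantizer(x.features))
#         # the weight would be fake-quantized before the native
#         # sparse-conv kernel runs
#         return super().forward(x)
```

The key difference from dense Conv3d is that the activation lives in the sparse tensor's `features` matrix, so only the non-empty voxels are quantized; the conv kernel itself is unchanged.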

4. How to implement SmoothQuant operation on conv2d?

  • Get activation scale

We will implement the process of getting the activation scale similarly to the following process (from SmoothQuant):

img
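Concretely, SmoothQuant calibrates by recording a per-channel running max of |activation| over the calibration batches. A minimal sketch of that running max follows; the torch forward-hook in the comment is illustrative of where this would run for conv2d, not code from the paper.

```python
def update_act_scales(scales, batch):
    """Per-channel running max of |activation|, as collected during
    SmoothQuant calibration.  `batch` is a list of per-sample channel
    vectors; `scales` is the running per-channel max, or None on the
    first call."""
    for sample in batch:
        if scales is None:
            scales = [abs(v) for v in sample]
        else:
            scales = [max(s, abs(v)) for s, v in zip(scales, sample)]
    return scales

# In a PyTorch model this would run inside a forward pre-hook on each
# conv2d, reducing the (N, C, H, W) input over all non-channel dims:
#
#   def hook(module, inputs):
#       ch_max = inputs[0].detach().abs().amax(dim=(0, 2, 3))
#       act_scales[name] = torch.maximum(
#           act_scales.get(name, ch_max), ch_max)
```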

  • Migrating the quantization difficulty

img
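The migration step scales each input channel by s_j = max|X_j|^α / max|W_j|^(1−α): activations are divided by s (folded into the preceding layer) and the conv2d weight's input channels are multiplied by s, moving quantization difficulty from activations to weights. A minimal sketch of the scale computation, assuming the calibrated per-channel maxima are already available:

```python
def smoothing_scales(act_amax, weight_amax, alpha=0.5):
    """SmoothQuant per-(input-)channel smoothing factor:

        s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)

    Dividing activations by s and multiplying the corresponding weight
    input channels by s leaves the layer's output unchanged while
    flattening activation outliers."""
    return [a ** alpha / w ** (1.0 - alpha)
            for a, w in zip(act_amax, weight_amax)]
```

With α = 0.5 the post-smoothing ranges are equalized: for a channel with max|X| = 16 and max|W| = 1, s = 4, so the smoothed activation max (16 / 4) and weight max (1 × 4) both become 4.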

5. What’s next?

  • Implement the SmoothQuant operation first, and then
  • implement quantization of sparse conv3d.