1. Table of Contents
- In-depth Memory Usage Visualization: ✅
- Ideas about how to implement quantization of sparse conv3d: ✅
- Ideas about how to implement SmoothQuant operation on conv2d: ✅
2. Large Chunk GPU Memory Usage Overview
- Data loader
- Backbone 3d
- Backbone 3d -> Backbone 2d
- Backbone 2d
- Head
Below is the structure of each major chunk:
- Data Loader
- Backbone 3d
- 3d feature to 2d feature
- Backbone 2d
- Head
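One way to get the per-chunk memory breakdown above is to attach forward hooks to each stage and record the size of its output activations. The sketch below uses toy `nn.Linear` layers as placeholders for the real detector stages (the stage names mirror the chunks listed, but the modules themselves are assumptions); on a GPU run, the hook body could additionally sample `torch.cuda.memory_allocated()`.

```python
# Sketch: per-stage activation-memory accounting with forward hooks.
# The stage names mirror the chunks above; the toy layers are
# placeholders, not the real detector modules.
import torch
import torch.nn as nn

def attach_memory_hooks(model: nn.Module, log: dict):
    """Record the output-tensor size (in bytes) of every named child."""
    handles = []
    for name, module in model.named_children():
        def hook(mod, inp, out, name=name):
            if isinstance(out, torch.Tensor):
                log[name] = out.numel() * out.element_size()  # bytes
        handles.append(module.register_forward_hook(hook))
    return handles

# Toy stand-ins for the real pipeline stages.
model = nn.Sequential()
model.add_module("backbone3d", nn.Linear(64, 128))
model.add_module("to_bev", nn.Linear(128, 256))      # 3d feature -> 2d feature
model.add_module("backbone2d", nn.Linear(256, 256))
model.add_module("head", nn.Linear(256, 10))

log = {}
handles = attach_memory_hooks(model, log)
with torch.no_grad():
    model(torch.randn(32, 64))
for h in handles:
    h.remove()

for name, nbytes in log.items():
    print(f"{name:>10}: {nbytes / 1024:.1f} KiB")
```

Removing the handles after calibration keeps the hooks from slowing down normal training or inference.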
3. How to implement quantization of sparse conv3d?
- Take a look at Nvidia’s implementation of a quantized Conv3d layer
The process for sparse conv3d should be similar:
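As a starting point, the usual pattern in NVIDIA's pytorch-quantization toolkit is to wrap the layer with input and weight quantizers and then run the convolution. The sketch below approximates that with fake quantization (int8 quantize-dequantize in float), and uses a dense `nn.Conv3d` as a stand-in since we do not assume any particular sparse-conv API here; a sparse version would quantize the feature tensor of the sparse format instead.

```python
# Sketch: fake-quantized Conv3d. A dense nn.Conv3d stands in for the
# sparse conv; the class and function names are illustrative.
import torch
import torch.nn as nn

def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor int8 quantize-dequantize."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    return torch.round(x / scale).clamp(-127, 127) * scale

class FakeQuantConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, k)

    def forward(self, x):
        xq = fake_quant_int8(x)                  # input quantizer
        wq = fake_quant_int8(self.conv.weight)   # weight quantizer
        return nn.functional.conv3d(
            xq, wq, self.conv.bias,
            self.conv.stride, self.conv.padding)

torch.manual_seed(0)
layer = FakeQuantConv3d(4, 8, 3)
x = torch.randn(1, 4, 8, 8, 8)
y_q = layer(x)
y_fp = layer.conv(x)
print("max abs error vs fp32:", (y_q - y_fp).abs().max().item())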
4. How to implement SmoothQuant operation on conv2d?
- Get activation scale
We will implement the process of getting the activation scale similarly to the following process (from SmoothQuant):
- Migration difficulty
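The two steps above can be sketched together for a conv2d: first collect the per-input-channel activation scale (in practice via a calibration hook, here taken directly from a sample batch), then migrate quantization difficulty from activations to weights using the SmoothQuant factor `s_j = amax_j**alpha / wmax_j**(1 - alpha)`, where `alpha` is the migration strength. The names below are illustrative, not from an existing codebase.

```python
# Sketch of the SmoothQuant steps for conv2d, under the usual
# formulation s_j = amax_j**alpha / wmax_j**(1 - alpha).
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
x = torch.randn(2, 8, 16, 16)

# Step 1: per-input-channel activation scale (calibration statistic).
amax = x.abs().amax(dim=(0, 2, 3))                # shape [8]

# Step 2: migrate quantization difficulty with strength alpha.
alpha = 0.5
wmax = conv.weight.abs().amax(dim=(0, 2, 3))      # per input channel
s = amax.pow(alpha) / wmax.pow(1 - alpha)

# Fold: scale activations down, weights up; the output is unchanged,
# but activation outliers shrink, making them easier to quantize.
x_smooth = x / s.view(1, -1, 1, 1)
w_smooth = conv.weight * s.view(1, -1, 1, 1)
y_ref = conv(x)
y_smooth = nn.functional.conv2d(x_smooth, w_smooth, padding=1)
print("max diff:", (y_ref - y_smooth).abs().max().item())
```

The equivalence holds because the convolution is linear in each input channel: dividing channel `j` of the input by `s_j` and multiplying the matching weight slice by `s_j` cancels exactly.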
5. What’s next?
- Implement the SmoothQuant operation first, and then
- Implement quantization of sparse conv3d.