Meeting Discussion (4)

1. Table of Contents: The structure of the CenterPoint-Voxel model: ✅ Inference time of each layer: ✅ Memory usage of each layer: ✅ Storage usage of each layer: ✅ Quantization of the model: ✅ 2. The structure of the CenterPoint-Voxel model: CenterPoint( (vfe): MeanVFE() (backbone_3d): VoxelResBackBone8x( (conv_input): SparseSequential( (0): SubMConv3d(5, 16, kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1], dilation=[1, 1, 1], output_padding=[0, 0, 0], bias=False, algo=ConvAlgo.MaskImplicitGemm) (1): BatchNorm1d(16, eps=0....

April 26, 2024 · 16 min · Banghao Chi

Meeting Discussion (3)

What has been done? This week’s work: Set up the nuScenes training and validation datasets for the MMDetection3D framework: ✅ Set up the Waymo training and validation datasets for the MMDetection3D framework: ✅ Wrote an API for PTQ under the pytorch-quantization framework (now only the model and dataloader definitions are needed): ✅ Completed a walkthrough of CAT-Seg, a SOTA Open Vocabulary Segmentation (OVS) model: ✅ What to discuss? Is this quantization approach appropriate? Any advice on changing the model structure (CAT-Seg)?...

April 2, 2024 · 2 min · Banghao Chi

Quantization on CenterPoint

Take mmdetection as an example. First, find the Runner class, which is where the model is built: class Runner: def __init__(...): ... ... self.model = self.build_model(model) # wrap model self.model = self.wrap_model( self.cfg.get('model_wrapper_cfg'), self.model) # get model name from the model class if hasattr(self.model, 'module'): self._model_name = self.model.module.__class__.__name__ else: self._model_name = self.model.__class__.__name__ ... ... Learn how pytorch-quantization works by diving into its source code. Code for the quantization function that takes a specific PyTorch model as input: quant_utils....

April 1, 2024 · 7 min · Banghao Chi
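The `hasattr(self.model, 'module')` check in the Runner excerpt above exists because wrappers such as PyTorch's `DistributedDataParallel` expose the wrapped network as `.module`. The same unwrapping matters when quantizing, since the layers to replace live on the inner model. Below is a minimal sketch of that pattern in plain Python; `ToyModel`, `ToyWrapper`, `unwrap`, and `model_name` are hypothetical names used only for illustration and are not part of mmdetection or pytorch-quantization.

```python
class ToyModel:
    """Hypothetical stand-in for a detector such as CenterPoint."""


class ToyWrapper:
    """Hypothetical stand-in for a wrapper that stores the real model as .module."""

    def __init__(self, module):
        self.module = module


def unwrap(model):
    # Return the inner model if it is wrapped, otherwise the model itself.
    return model.module if hasattr(model, "module") else model


def model_name(model):
    # Mirrors the name lookup in the Runner excerpt: use the inner class name.
    return unwrap(model).__class__.__name__


if __name__ == "__main__":
    print(model_name(ToyModel()))
    print(model_name(ToyWrapper(ToyModel())))
```

Both calls report the inner class name, so downstream code can treat wrapped and unwrapped models the same way.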

Daily Log

3.12 Managed to understand the whole code base of the CLIP repo from OpenAI. Planned to take a look at CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation, to understand how to implement Open-Vocabulary Segmentation (OVS) using CLIP. 3.13 1. DETR Got a basic understanding of DETR, an end-to-end 2D object detection architecture. Its downsides are: a long training period; difficulty detecting small objects. Its advantages are:...

March 20, 2024 · 21 min · Banghao Chi

CAT-Seg: A complete walkthrough of the model architecture

1. Model architecture setup and evaluation data flow (for ade150k) CATSeg setup: backbone: D2SwinTransformer -> Swintransformer -> BasicLayer(2) -> SwinTransformerBlock -> WindowAttention sem_seg_head: CATSegHead.from_config -> CATSegPredictor -> Load CLIP model -> Load text templates -> class_embeddings(self.class_texts, prompt_templates, clip_model) -> for each class: BPE-encode the class name in each template and save the results in the variable texts (80 (number of templates), 77 (sentence length)). CLIP encodes texts: texts go through token_embedding (nn.Embedding) (80, 77, 768 (hidden_dim)); texts go through 12 layers of ResidualAttentionBlock (80, 77, 768); take the text features at the eot_token (80, 768); do the above for all classes (150 (number of test classes), 80, 768)...

March 15, 2024 · 22 min · Banghao Chi

Argparse: a user-friendly tool for writing command-line interfaces

1. Introduction Hello fellows! Today I’m excited to share insights about the argparse module, a robust and intuitive tool for creating command-line interfaces in Python. What makes argparse particularly fascinating to me is that it lets users run Python scripts with custom configurations and functionality without diving into the underlying source code. This feature has captured my interest and, again, showcases the module's value in making Python files reusable and accessible for diverse applications....

March 8, 2024 · 10 min · Banghao Chi
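As a small illustration of the point in the argparse entry above (running a script with custom settings without touching its source), here is a minimal sketch. The script's purpose, the `values` argument, and the `--factor` and `--verbose` options are made up for this example and are not taken from the post.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Describe the command-line interface once; argparse generates --help for free.
    parser = argparse.ArgumentParser(description="Scale a list of numbers.")
    parser.add_argument("values", nargs="+", type=float, help="numbers to scale")
    parser.add_argument("--factor", type=float, default=2.0, help="multiplier to apply")
    parser.add_argument("--verbose", action="store_true", help="print each step")
    return parser


def scale(values, factor):
    # Core logic stays independent of the CLI so it can be reused or tested directly.
    return [v * factor for v in values]


if __name__ == "__main__":
    args = build_parser().parse_args()
    result = scale(args.values, args.factor)
    if args.verbose:
        for before, after in zip(args.values, result):
            print(f"{before} * {args.factor} = {after}")
    print(result)
```

Saved under a hypothetical name such as `scale.py`, it could be run as `python scale.py 1 2 --factor 3`, with `python scale.py --help` listing the available options.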

Real-time Object Recognition in Chess: Personalized Tuning and Hardware Acceleration

1. Selected and customized the YOLOv5 model for Chinese chess annotation data. 2. Conducted testing and analysis of the model. The results indicated high recognition accuracy. However, efficiency was a significant shortfall: the model took approximately 6 seconds to process a single image. 3. Implemented model optimization. We replaced the YOLOv5 model with a more lightweight variant, YOLOv5-lite, and converted the model to the ONNX format to leverage hardware acceleration, thereby improving computational efficiency....

August 5, 2023 · 2 min · Banghao Chi