Meeting Discussion (3)

What has been done? This week’s work: Set up the nuScenes training and validation datasets for the MMDetection3D framework: ✅ Set up the Waymo training and validation datasets for the MMDetection3D framework: ✅ Wrote an API for post-training quantization (PTQ) on top of the pytorch-quantization framework (only the model and dataloader definitions are still needed): ✅ Completed a walkthrough of CAT-Seg, a SOTA open-vocabulary segmentation (OVS) model: ✅ What to discuss? Is this quantization approach appropriate? Any advice on changing the model structure (CAT-Seg)?...

April 2, 2024 · 2 min · Banghao Chi
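
The PTQ item in the notes above boils down to two steps: calibrate a range from sample data, then map floats to integers with that range. Below is a minimal pure-Python sketch of symmetric int8 max calibration. It is a generic illustration under stated assumptions, not the pytorch-quantization API and not the API written for the meeting; the function names `calibrate_amax`, `quantize`, and `dequantize` are hypothetical.

```python
def calibrate_amax(batches):
    """Max calibration: largest absolute value seen over all calibration batches."""
    return max(abs(x) for batch in batches for x in batch)


def quantize(values, amax, num_bits=8):
    """Symmetric quantization: map [-amax, amax] onto [-qmax, qmax] integers."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    scale = amax / qmax             # float step size per integer level
    # Round to the nearest level and clamp anything outside the calibrated range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return [x * scale for x in q]


# Hypothetical calibration data standing in for activations from a dataloader.
calib_batches = [[0.5, -1.0, 0.25], [2.0, -0.75, 1.5]]
amax = calibrate_amax(calib_batches)

# 3.0 lies outside the calibrated range, so it saturates at the top level.
q, scale = quantize([2.0, -0.5, 0.0, 3.0], amax)
restored = dequantize(q, scale)
```

In a real PTQ flow the calibration batches would come from the dataloader and the ranges would be collected per layer (or per channel); the sketch uses one global range to keep the arithmetic visible.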

CAT-Seg: A complete walkthrough of the model architecture

1. Model Architecture setup and evaluation data flow (for ade150k) CATSeg setup: backbone: D2SwinTransformer -> SwinTransformer -> BasicLayer(2) -> SwinTransformerBlock -> WindowAttention sem_seg_head: CATSegHead.from_config -> CATSegPredictor -> Load CLIP model -> Load text templates -> class_embeddings(self.class_texts, prompt_templates, clip_model) -> for each class: BPE-encode the class name in each template and save the results in the variable texts (80 (number of templates), 77 (context length in tokens)). CLIP encodes texts: texts go through token_embedding (nn.Embedding) (80, 77, 768 (hidden_dim)); texts go through 12 layers of ResidualAttentionBlock (80, 77, 768); take the features of texts at the eot_token (80, 768); do the above for all classes (150 (number of test classes), 80, 768)...

March 15, 2024 · 22 min · Banghao Chi
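
The tensor shapes in the CAT-Seg text-encoding summary above can be traced with a small NumPy sketch. This is a shape walkthrough only, under several assumptions: the vocabulary size, the random token ids, and the helper names `make_toy_tokens` and `encode_class` are hypothetical; the 12 ResidualAttentionBlock layers are replaced by an identity stand-in because they preserve shape; and the EOT position is taken as the argmax of the token ids, which relies on the EOT token having the highest id in each sequence.

```python
import numpy as np

NUM_CLASSES, NUM_TEMPLATES, CONTEXT_LEN, HIDDEN_DIM = 150, 80, 77, 768
VOCAB_SIZE = 1000          # toy vocabulary size (assumption, not CLIP's real one)
EOT_ID = VOCAB_SIZE - 1    # assume the EOT token has the highest id

rng = np.random.default_rng(0)
# Stand-in for token_embedding (nn.Embedding): one row per token id.
token_embedding = rng.standard_normal((VOCAB_SIZE, HIDDEN_DIM)).astype(np.float32)


def make_toy_tokens(rng):
    """Fake BPE output for one class: (80, 77) ids, zero-padded after EOT."""
    tokens = np.zeros((NUM_TEMPLATES, CONTEXT_LEN), dtype=np.int64)
    eot_positions = rng.integers(3, CONTEXT_LEN, size=NUM_TEMPLATES)
    for i, n in enumerate(eot_positions):
        tokens[i, :n] = rng.integers(1, EOT_ID, size=n)  # ordinary tokens
        tokens[i, n] = EOT_ID                            # end-of-text marker
    return tokens


def encode_class(tokens):
    """(80, 77) token ids -> (80, 768) per-template text features."""
    x = token_embedding[tokens]            # embedding lookup: (80, 77, 768)
    # The transformer layers would run here; they keep the (80, 77, 768) shape.
    eot_pos = tokens.argmax(axis=-1)       # index of EOT in each template
    return x[np.arange(tokens.shape[0]), eot_pos]  # gather EOT feature: (80, 768)


# Repeat for every class and stack: (150, 80, 768).
class_embeddings = np.stack(
    [encode_class(make_toy_tokens(rng)) for _ in range(NUM_CLASSES)]
)
print(class_embeddings.shape)
```

Because the transformer is an identity here, every gathered feature is just the embedding row of the EOT token; in the real model the EOT feature summarizes the whole prompt through attention.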