CATSeg: A complete walkthrough of the model architecture

1. Model architecture setup and evaluation data flow (for ade150k)

CATSeg setup:
backbone: D2SwinTransformer -> Swintransformer -> BasicLayer(2) -> SwinTransformerBlock -> WindowAttention
sem_seg_head: CATSegHead.from_config -> CATSegPredictor -> Load CLIP model -> Load text templates -> class_embeddings(self.class_texts, prompt_templates, clip_model) -> for each class: BPE-encode the class name in each template and save the results in the variable texts (80 (number of templates), 77 (sentence length)).
CLIP encodes texts:
    texts go through token_embedding (nn.Embedding): (80, 77, 768 (hidden_dim))
    texts go through 12 layers of ResidualAttentionBlock: (80, 77, 768)
    take the features of texts at the eot_token: (80, 768)
    do the above for all classes: (150 (number of test classes), 80, 768)...
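The shape flow in this excerpt can be traced with a small NumPy sketch. It is only an illustration of the tensor shapes, not the CAT-Seg or CLIP code: the tokenizer is faked with random ids, the vocabulary size is a made-up toy value, the 12 ResidualAttentionBlock layers are skipped because they preserve shape, and any final normalization or projection is left out. The names `fake_tokenize`, `encode_class`, `VOCAB_SIZE`, and `EOT_ID` are hypothetical. The one CLIP detail it relies on is that the end-of-text token has the largest id in each sequence, so `argmax` over the token ids finds its position.

```python
import numpy as np

# Sizes taken from the walkthrough excerpt.
NUM_CLASSES, NUM_TEMPLATES, CONTEXT_LEN, HIDDEN_DIM = 150, 80, 77, 768
VOCAB_SIZE = 1000          # hypothetical toy vocabulary, not CLIP's real size
EOT_ID = VOCAB_SIZE - 1    # assumption: EOT is the largest token id, as in CLIP

rng = np.random.default_rng(0)
# Stand-in for token_embedding (nn.Embedding): one row per token id.
embedding_table = rng.standard_normal((VOCAB_SIZE, HIDDEN_DIM)).astype(np.float32)


def fake_tokenize(num_templates, context_len):
    """Hypothetical tokenizer: random word ids, then EOT, then zero padding."""
    tokens = np.zeros((num_templates, context_len), dtype=np.int64)
    for i in range(num_templates):
        length = rng.integers(3, context_len - 1)        # number of word tokens
        tokens[i, :length] = rng.integers(1, EOT_ID, size=length)
        tokens[i, length] = EOT_ID                        # end-of-text marker
    return tokens                                         # (80, 77)


def encode_class(tokens):
    """Embed tokens and keep the feature at the EOT position of each template."""
    x = embedding_table[tokens]                           # (80, 77, 768)
    # The transformer layers would run here; they keep the (80, 77, 768) shape.
    eot_pos = tokens.argmax(axis=-1)                      # (80,)
    return x[np.arange(tokens.shape[0]), eot_pos]         # (80, 768)


# Repeat for every class and stack the results.
class_embeddings = np.stack(
    [encode_class(fake_tokenize(NUM_TEMPLATES, CONTEXT_LEN)) for _ in range(NUM_CLASSES)]
)
print(class_embeddings.shape)  # (150, 80, 768)
```

Because the transformer is skipped, every selected row here is just the EOT embedding; in the real model the attention layers mix in the class-name tokens before that position is read out.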

March 15, 2024 · 22 min · Banghao Chi

Real-time Object Recognition in Chess: Personalized Tuning and Hardware Acceleration

1. Selected and customized the YOLOv5 model for Chinese chess annotation data. 2. Tested and analyzed the model. The results indicated exceptional recognition accuracy, but also a significant efficiency shortfall: the model took approximately 6 seconds to process a single image. 3. Optimized the model. We replaced YOLOv5 with a more lightweight variant, YOLOv5-lite, and converted the model to the ONNX format to leverage hardware acceleration, thereby improving computational efficiency....

August 5, 2023 · 2 min · Banghao Chi