👋 Welcome to Banghao’s Blog

Hi! I’m Banghao Chi, an undergraduate student at the University of Illinois Urbana-Champaign and a Research Assistant advised by Prof. Minjia Zhang.

  • I’m documenting my learning notes in this blog 😄
  • This is a space where I will mostly be sharing about Computer Vision & NLP 🙂
  • I also work on full-stack development with React and Spring Boot (^▽^)

Meeting Discussion (11)

1. Table of Contents
  • The feasibility of quantization and SQ on SparseConv3d: ✅
  • The discussion of sparse conv3d:
    • Find where the computation for sparse conv3d is done: ✅
      • Inside ops.implicit_gemm: ✅
    • How can we implement it:
      • Quantization of the activation and weight (see the sketch below): ✅
      • Application of SQ during the quantization process: ✅
    • Evaluation of the current method:
      • Whether SQ is effective or not depends on the values of the inputs and weights: ✅
    • Actual implementation: In progress…
2....
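The "quantization of the activation and weight" step above is presumably a standard symmetric INT8 scheme; here is a minimal fake-quantization sketch under that assumption (names are mine, not the post's code):

```python
import torch

def int8_fake_quant(t: torch.Tensor):
    """Symmetric per-tensor INT8 fake-quantization: round to int8 levels,
    then dequantize, so the rounding error can be inspected in floating point."""
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(t / scale).clamp(-128, 127)
    return q * scale, scale
```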

June 16, 2024 · 4 min · Banghao Chi

LLMarking

This is the official repo for the Automatic Short Answer Grading (ASAG) project LLMarking, from Xi’an Jiaotong-Liverpool University (XJTLU). Using vLLM as the Large Language Model (LLM) inference framework and FastAPI as the HTTP service framework, the project achieves high throughput in both LLM token generation and request handling. Features: this project aims to build a high-concurrency automatic short answer grading (ASAG) system and ship it as a service....
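As a sketch of how vLLM and FastAPI fit together in such a service (the endpoint, model choice, and prompt format here are my assumptions, not necessarily LLMarking's):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams
from vllm.utils import random_uuid

app = FastAPI()
# Hypothetical model choice; the actual model is not stated in the excerpt.
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="Qwen/Qwen2-7B-Instruct"))

class GradeRequest(BaseModel):
    question: str
    reference_answer: str
    student_answer: str

@app.post("/grade")
async def grade(req: GradeRequest):
    prompt = (
        f"Question: {req.question}\n"
        f"Reference answer: {req.reference_answer}\n"
        f"Student answer: {req.student_answer}\n"
        "Grade the student answer out of 10 and justify briefly."
    )
    params = SamplingParams(temperature=0.0, max_tokens=256)
    final = None
    # vLLM streams partial generations; keep the last (complete) one.
    async for out in engine.generate(prompt, params, request_id=random_uuid()):
        final = out
    return {"feedback": final.outputs[0].text}
```

vLLM's continuous batching is what lets many concurrent /grade requests share the GPU efficiently, which matches the high-throughput claim.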

June 12, 2024 · 3 min · Banghao Chi

Meeting Discussion (10)

1. Table of Contents
  • Final results and comparison: ✅
  • Do L1-loss tests within the model again to see which part (SmoothQuant or the transformation) brings greater benefits (see the harness sketch below): ✅
  • Do multiple tests on the 50X input scale while varying SmoothQuant's scaling factor, to see if other factors can provide better results (lower L1 loss): ✅
  • Lots of experiments to analyze the accuracy loss: ✅
  • Modify the model based on the above experiments: ✅
  • Validate the accuracy of the final model through mAP and NDS: ✅
2....
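A minimal harness for the L1-loss comparisons mentioned above might look like this (a hypothetical helper, assuming a reference layer and its quantized counterpart are run on the same input):

```python
import torch

@torch.no_grad()
def l1_gap(ref_layer: torch.nn.Module, quant_layer: torch.nn.Module, x: torch.Tensor) -> float:
    """Mean absolute difference between the reference output and the
    quantized layer's output on the same input batch."""
    return (ref_layer(x) - quant_layer(x)).abs().mean().item()
```

Sweeping the SmoothQuant factor and re-measuring l1_gap at each setting is one way to run the "scaling factor changing" experiment the list describes.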

May 21, 2024 · 7 min · Banghao Chi

Meeting Discussion (9)

1. Table of Contents
  • Implementation of the im2col+GEMM operation (see the sketch below): ✅
  • Add an INT8 quantizer to the first operation: ✅
  • Add SmoothQuant to the second operation: ✅
  • Verify the operation on different example inputs: ✅
  • Integrate the im2col+GEMM SmoothQuant INT8-quantized layer into the model: ✅
  • Validate the accuracy of the layer:
    • through different example inputs: ✅
    • through the actual data flow of the model: ✅
    • through accuracy metrics: In progress… (since Delta only comes back online quite late)
2....
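For reference, a Conv2d can be decomposed into im2col followed by a single GEMM like this (a generic PyTorch sketch of the decomposition, not the post's exact code):

```python
import torch
import torch.nn.functional as F

def conv2d_im2col_gemm(x, weight, bias=None, stride=1, padding=0):
    """Conv2d as im2col (unfold) + one GEMM; each stage can then be
    quantized separately, as the post describes."""
    n, _, h, w = x.shape
    out_c, _, kh, kw = weight.shape
    oh = (h + 2 * padding - kh) // stride + 1
    ow = (w + 2 * padding - kw) // stride + 1
    cols = F.unfold(x, (kh, kw), stride=stride, padding=padding)  # [N, C*kh*kw, OH*OW]
    out = weight.view(out_c, -1) @ cols                           # [N, out_c, OH*OW]
    if bias is not None:
        out += bias.view(1, -1, 1)
    return out.view(n, out_c, oh, ow)

# Sanity check against the built-in convolution:
# torch.allclose(conv2d_im2col_gemm(x, w, b, 1, 1), F.conv2d(x, w, b, 1, 1), atol=1e-5)
```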

May 17, 2024 · 4 min · Banghao Chi

Meeting Discussion (8)

1. Table of Contents
  • Original implementation of SmoothQuant and why it's not correct: ✅
  • The correct way of implementation IMO: ✅

2. Original way of getting absMax values

```python
def register_collect_smoothquant_hook(model, data_loader, num_batch=200):
    model.eval()
    act_scales = {}
    weight_scales = {}

    def forward_hook(module, input, name):
        hidden_dim_act = input[0].shape[1]
        tensor_act = input[0].view(-1, hidden_dim_act).abs().detach()
        comming_max_act = torch.max(tensor_act, dim=0)[0].float().cpu()
        if name not in act_scales:
            act_scales[name] = comming_max_act
        else:
            act_scales[name] = torch.max(act_scales[name], comming_max_act)
```

Input shape: [4, 256, 182, 182]; hidden_dim_act = 256; tensor_act: [4*182*182, 256]; torch....
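The TOC flags this version as incorrect: for an NCHW conv activation, view(-1, C) interleaves channels and spatial positions, because C is not the last dimension. A hedged reconstruction of the fix (my reading of the excerpt, not the post's verbatim code) permutes channels to the end before flattening:

```python
import torch

def channelwise_abs_max(act: torch.Tensor) -> torch.Tensor:
    """Per-channel abs-max for an NCHW activation, e.g. [4, 256, 182, 182].

    permute moves channels last, so every row of the flattened tensor is one
    spatial position and every column one channel; view(-1, C) on raw NCHW
    data would mix the two.
    """
    n, c, h, w = act.shape
    cols = act.permute(0, 2, 3, 1).reshape(-1, c)  # [N*H*W, C]
    return cols.abs().amax(dim=0).float().cpu()
```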

May 14, 2024 · 1 min · Banghao Chi

Meeting Discussion (7)

1. Table of Contents
  • Implementation of SmoothQuant on Conv2d (see the sketch below): ✅
  • Validation of the above implementation (for $\alpha = 0.5$): ✅

2. Implementation of the SmoothQuant operation on Conv2d
  • Get the activation scale
  • Get the weight scale
  • Compute the smoothing factor $s$ from the two scales above
  • Apply the scaling: $\text{input} \mathrel{{/}{=}} s$, $\text{weight} \mathrel{{*}{=}} s$

2.1 Get activation & weight scale

Take a look at the shapes of the activation, output, and weight in Conv2d, using one layer as an example: Input shape: torch....
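Assuming the standard SmoothQuant formula $s_j = \max|X_j|^{\alpha} / \max|W_j|^{1-\alpha}$, the four steps above might be folded into a Conv2d like this (a sketch; function and variable names are mine):

```python
import torch

def smooth_conv2d_(conv: torch.nn.Conv2d, act_scale: torch.Tensor, alpha: float = 0.5):
    """Fold a SmoothQuant smoothing factor into a Conv2d in place.

    act_scale: per-input-channel abs-max of the activation, shape [in_c],
    collected offline. Dividing the activation by s and multiplying the
    weight by s leaves the layer's output unchanged in exact arithmetic,
    while shifting quantization difficulty from activations to weights.
    """
    # Per-input-channel abs-max of the weight: reduce over out_c, kh, kw.
    w_max = conv.weight.detach().abs().amax(dim=(0, 2, 3)).clamp(min=1e-5)
    s = (act_scale.clamp(min=1e-5) ** alpha) / (w_max ** (1 - alpha))
    conv.weight.data *= s.view(1, -1, 1, 1)   # weight *= s
    return s  # at runtime, divide the incoming activation by s.view(1, -1, 1, 1)
```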

May 10, 2024 · 3 min · Banghao Chi

Let's build GPT from scratch with BPE!

1. Workshop Description Quick question: have you ever wondered how a string is transformed into a word vector so that it can be fed into a machine learning algorithm? In this workshop, we dive into the fascinating world of Natural Language Processing (NLP), focusing on the Byte Pair Encoding (BPE) algorithm. We will discover how this powerful technique segments text into subword units, enabling efficient representation of words as vectors....
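The core of BPE is small enough to sketch in a few lines: repeatedly count adjacent symbol pairs and merge the most frequent one (a toy illustration, not the workshop's full tokenizer):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merge rules from a list of words (toy version)."""
    # Represent each word as a tuple of symbols plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # merge the pair
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# bpe_merges(["low", "lower", "lowest"], 3) first merges a pair such as ("l", "o").
```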

May 8, 2024 · 32 min · Banghao Chi

IoT-Enabled Home Security Camera

Video GitHub 1. Motivation Facial recognition technology has become prevalent in many areas of life. Whether you work in security or law enforcement, or manufacture personal devices, facial recognition is used for a wide range of purposes. Our project dives into this increasingly common technology and applies it to a setting that needs upgrades, such as banks. Many banks run on old applications or outdated technology, which limits the effectiveness of their work....

May 5, 2024 · 4 min · Banghao Chi

Meeting Discussion (6)

1. Table of Contents
  • In-depth memory usage visualization (see the sketch below): ✅
  • Ideas about how to implement quantization of sparse conv3d: ✅
  • Ideas about how to implement the SmoothQuant operation on Conv2d: ✅

2. Large-chunk GPU memory usage overview

The major chunks are the data loader, the 3D backbone, the 3D-to-2D feature transition, the 2D backbone, and the head; the post breaks down the structure of each.

3. How to implement quantization of sparse conv3d?...
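For the per-stage numbers behind such a visualization, one common approach (an assumption on my part; the post's actual tooling isn't shown in the excerpt) is to checkpoint CUDA memory between stages:

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print a coarse GPU-memory checkpoint; call between stages
    (data loader, 3D backbone, 2D backbone, head) to see which
    chunk dominates usage."""
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")
```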

May 2, 2024 · 1 min · Banghao Chi

Meeting Discussion (5)

1. Table of Contents
  • Accuracy graph under different quantization metrics: ✅
  • Max value within the layers: ✅

2. Accuracy graph under different quantization metrics

As we can observe from both graphs, the activations are clearly influenced more by quantization than the weights.

3. Max value within the layers

In the first graph, the max value within the weights ranges from 0.1 to 2.94, while in the second graph we find an interesting max-value pattern, with values ranging from 8 to 53....
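Collecting the per-layer max values behind these graphs can be done with a few lines (a hypothetical helper; the post's own script isn't shown in the excerpt):

```python
import torch

def layer_weight_abs_max(model: torch.nn.Module) -> dict:
    """Per-layer abs-max of the weights, the statistic plotted in the
    'max value within the layers' graphs."""
    return {
        name: module.weight.detach().abs().max().item()
        for name, module in model.named_modules()
        if isinstance(getattr(module, "weight", None), torch.Tensor)
    }
```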

May 1, 2024 · 1 min · Banghao Chi