I like machine learning and systems. Above all, I like being useful.
I'm a master's student in ECE at UIUC, learning how to make ML accelerators go brrr, starting from the hardware level.
Most recently, I was on the Inference team at Annapurna Labs, writing NKI kernels and hacking compilers to achieve a 2M-token context length on AWS NeuronX.
[Full Resume]
Projects
- minimal-flash-attention is Flash Attention in ~100 lines of CUDA.
- cuda-1brc is my CUDA solution to the One Billion Row Challenge; see also my related blog post.
- mixed-precision-from-scratch shows all the details of mixed precision training applied to a simple 2-layer MLP. Read more here.
- paged-attention-minimal is a simple cache manager for PagedAttention, built on top of Llama 3.
Posts
- I came back to school to study hardware after 5 years of doing ML
- Mixed Precision Training from Scratch
- How to set up Nsight Compute Locally to profile Remote GPUs
- The One Billion Row Challenge in CUDA: from 17m to 17s
- Growing up in six different countries
- A Quick Summary of "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
- The Introspectiveness of Neural Networks
Patents
- Taeksang Kim. Method and system for personalized content recommendation, Korean Pat. No. 10-2727573, 2024.
- Taeksang Kim. Tagging method and system for content, Korean Pat. No. 10-2705765, 2024.
- Jina Hwang and Taeksang Kim. Content provision service method, device, and recording medium, Korean Pat. No. 10-2640214, 2024.
- Taeksang Kim. Machine learning–based recommendation method and system, Korean Pat. No. 10-2619044, 2023.
- Taeksang Kim. Content recommendation method and system, Korean Pat. No. 10-2679131, 2024.
Miscellaneous
- I designed, and was interviewed for, the first introductory ML course for employees at Buzzvil.
- I wrote a tech blog post on optimizing item-to-item collaborative filtering with sparse vectors and Ray.
- I made an Instagram automation tool that went viral.