I like machine learning and systems. Above all, I like being useful.
I'm a master's student in ECE at UIUC, learning everything I can to make GPUs go brrr.
Previously, I worked at Coupang, training video recommendation models and building artwork generators.
[Full Resume]
Projects
- minimal-flash-attention is Flash Attention in ~100 lines of CUDA.
- cuda-1brc is my CUDA solution to the One Billion Row Challenge. I also wrote a blog post about it.
- mixed-precision-from-scratch shows all the details of mixed precision training applied to a simple 2-layer MLP. Read more here.
- paged-attention-minimal is a simple cache manager for PagedAttention, built on top of Llama 3.
Posts
Mixed Precision Training from Scratch
How to Set Up Nsight Compute Locally to Profile Remote GPUs
The One Billion Row Challenge in CUDA: from 17m to 17s
Growing up in six different countries
A Quick Summary of "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
The Introspectiveness of Neural Networks
Patents
- Taeksang Kim. Personalized video DJ with text-to-video, Korean Pat. No. 10-0123761, 2023.
- Taeksang Kim. Video keyword tagging with large language models, Korean Pat. No. 10-0098584, 2023.
- Jina Hwang and Taeksang Kim. Sports awareness, mobile feed video player, Korean Pat. No. 10-0071581, 2023.
- Taeksang Kim. Next watch prediction using GPT, Korean Pat. No. 10-0036443, 2023.
- Taeksang Kim. First hero optimization through reinforcement learning, Korean Pat. No. 10-0184572, 2022.
Miscellaneous
- I designed the first introductory ML course for employees at Buzzvil, and was interviewed about it.
- I wrote a tech blog post on optimizing item-to-item collaborative filtering with sparse vectors and Ray.
- I made an Instagram automation tool that went viral.