I like machine learning and systems. Above all, I like being useful.
I'm a master's student in ECE at UIUC, learning how to make ML accelerators go brrr, starting from the hardware level.
Most recently, I was on the Inference team at Annapurna Labs, writing NKI kernels and hacking compilers to achieve a 2M-token context length on AWS NeuronX.
[Full Resume]
Projects
- minimal-flash-attention is Flash Attention in ~100 lines of CUDA.
- cuda-1brc is my CUDA solution to the One Billion Row Challenge; see also my related blog post.
- mixed-precision-from-scratch shows all the details of mixed precision training applied to a simple 2-layer MLP. Read more here.
- paged-attention-minimal is a simple cache manager for PagedAttention, built on top of Llama 3.
Posts
- I came back to school to study hardware after 5 years of doing ML
- Mixed Precision Training from Scratch
- How to set up Nsight Compute Locally to profile Remote GPUs
- The One Billion Row Challenge in CUDA: from 17m to 17s
- Growing up in six different countries
- A Quick Summary of "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
- The Introspectiveness of Neural Networks
Patents
- Taeksang Kim. Method and system for personalized content recommendation, Korean Pat. No. 10-2727573, 2024.
- Taeksang Kim. Tagging method and system for content, Korean Pat. No. 10-2705765, 2024.
- Jina Hwang and Taeksang Kim. Content provision service method, device, and recording medium, Korean Pat. No. 10-2640214, 2024.
- Taeksang Kim. Machine learning–based recommendation method and system, Korean Pat. No. 10-2619044, 2023.
- Taeksang Kim. Content recommendation method and system, Korean Pat. No. 10-2679131, 2024.
Miscellaneous
- I designed, and was interviewed for, the first introductory ML course for employees at Buzzvil.
- I wrote a tech blog post on optimizing item-to-item collaborative filtering with sparse vectors and Ray.
- I made an Instagram automation tool that went viral.