Heungsub Lee

Contact
[email protected]
Web Sites
subl.ee, github.com/sublee, linkedin.com/in/sublee

Interests

Problem solving, AI services and platforms, real-time communication in distributed systems, cost and performance optimization, and developer experience.

Skills

Programming Languages
Go, Python, TypeScript
Service Development
Linux, AWS, Kubernetes, Pulumi, gRPC, React, Redis, PostgreSQL, ZeroMQ, OAuth, OpenTelemetry, Concurrency, API design & documentation
AI Engineering
MCP, PyTorch, Gradio, NCCL, NVIDIA Nsight Systems

Work Experience

Lead Software Engineer
Global AI Platform, Sep 2023 – Present

Led the development of Aster, a personal AI agent service that helps users solve problems by planning task sequences and suggesting alternatives, integrating data and functions from multiple MCP servers. Prototyped telephony integration to enable a voice-based assistant experience.

Directed LangDiff, an open-source library bridging structured LLM outputs with progressive UI rendering, which hid model latency and improved responsiveness, enabling a smoother Aster experience.

Software Engineering Manager
NAVER, Aug 2020 – Jul 2023

Supervised 25 engineers on MLOps platforms to boost inference performance by 2–3x and improve developer productivity for HyperCLOVA, a Korean-focused LLM.

Developed NSMLv2, a large-scale ML research platform at CLOVA. Designed a multi-tenant, economics-driven architecture that enabled diverse organizations to share GPU clusters efficiently, reducing idle time and maximizing utilization. This platform institutionalized distributed training to address growing demand for scalable training workflows.

Software Engineer
Kakao Brain, Dec 2018 – Aug 2020

Developed torchgpipe, an open-source pipeline parallelism library for PyTorch that scaled large AI models across multiple GPUs with minimal code changes and low overhead.

Developed a serverless training framework and distributed hyperparameter search pipelines on an on-premises GPU cluster, improving resource utilization and automation for model training.

Game Server Engineer
NEXON, Mar 2011 – Dec 2018

Developed cloud-native distributed MMORPG servers for Durango using pub/sub over a spatial grid system, supporting up to 70k concurrent users per game world.

Developed online racing game servers and matchmaking for KartRider Dash and KartRider Coin Rush.

Back-end Web Developer
nPine, Dec 2008 – Feb 2011
Developed e-commerce web services for stock image platforms.

Open Source Experience

torchgpipe, Feb 2019 – Apr 2020
Implemented GPipe, a multi-GPU pipeline parallelism technique for large-scale model training, as a PyTorch library with CUDA, autograd, and long skip connection optimizations; later upstreamed into PyTorch as the official Pipe APIs.
Hangulize, Oct 2010 – Present
Designed a Hangul transcription algorithm and released it as a free web tool widely used by professional Korean translators.
TrueSkill, Jan 2012 – Dec 2015
Implemented TrueSkill™, the rating algorithm behind Xbox Live, as a Python library; presented at PyData Berlin 2019.
Contributions
Contributed upstream patches improving GPU safety (#27371) and API consistency (#21006, #25985) in PyTorch. Fixed a subdomain URL bug (#108) in Flask.

Publications

*Contributed equally

Public Speeches


Languages

Education

Computer Software, Kwangwoon University, 2008. Completed the first year only.