Heungsub Lee
- Contact
 - [email protected]
 - Web Sites
 - subl.ee, github.com/sublee, linkedin.com/in/sublee
 
Interest
Problem solving, AI services and platforms, real-time communication in distributed systems, cost and performance optimization, and developer experience.
Skills
- Programming Languages
 - Go, Python, TypeScript
 - Service Development
 - Linux, AWS, Kubernetes, Pulumi, gRPC, React, Redis, PostgreSQL, ZeroMQ, OAuth, OpenTelemetry, Concurrency, API design & documentation
 - AI Engineering
 - MCP, PyTorch, Gradio, NCCL, NVIDIA Nsight Systems
 
Work Experience
- Lead Software Engineer
 - Global AI Platform, Sep 2023 – Present
 - Led the development of Aster, a personal AI agent service that helps users solve problems by planning task sequences and suggesting alternatives, integrating data and functions from multiple MCP servers. Prototyped telephony integration to enable a voice-based assistant experience.
 - Directed LangDiff, an open-source library bridging structured LLM outputs with progressive UI rendering, which hid model latency and improved responsiveness, enabling a smoother Aster experience.
 - Software Engineering Manager
 - NAVER, Aug 2020 – Jul 2023
 - Supervised 25 engineers on MLOps platforms to boost inference performance by 2-3x and improve developer productivity for HyperCLOVA, a Korean-focused LLM.
 - Developed NSMLv2, a large-scale ML research platform at CLOVA. Designed a multi-tenant, economics-driven architecture that enabled diverse organizations to share GPU clusters efficiently, reducing idle time and maximizing utilization. This platform institutionalized distributed training to address growing demand for scalable training workflows.
 - Software Engineer
 - Kakao Brain, Dec 2018 – Aug 2020
 - Developed torchgpipe, an open-source pipeline parallelism library for PyTorch that scaled large AI models across multiple GPUs with minimal code changes and low overhead.
 - Developed a serverless training framework and distributed hyperparameter search pipelines on an on-premise GPU cluster, improving resource utilization and automation for model training.
 - Game Server Engineer
 - NEXON, Mar 2011 – Dec 2018
 - Developed cloud-native distributed MMORPG servers for Durango using pub/sub over a spatial grid system, supporting up to 70k concurrent users per game world.
 - Developed online racing game servers and matchmaking for KartRider Dash and KartRider Coin Rush.
 - Back-end Web Developer
 - nPine, Dec 2008 – Feb 2011
 - Developed e-commerce web services for stock image platforms.
 
Open Source Experience
- torchgpipe, Feb 2019 – Apr 2020
 - Implemented GPipe, a multi-GPU pipeline parallelism technique for large-scale model training, as a PyTorch library with CUDA, autograd, and long skip connection optimizations; later upstreamed into PyTorch as the official Pipe APIs.
 - Hangulize, Oct 2010 – Present
 - Designed a Hangul transcription algorithm and released it as a free web tool widely used by professional Korean translators.
 - TrueSkill, Jan 2012 – Dec 2015
 - Implemented TrueSkill™, the rating algorithm behind Xbox Live, as a Python library; presented at PyData Berlin 2019.
 - Contributions
 - Contributed upstream patches improving GPU safety (#27371) and API consistency (#21006, #25985) in PyTorch. Fixed a subdomain URL bug (#108) in Flask.
 
Publications
- H. Park et al., “HPCClusterScape: Increasing Transparency and Efficiency of Shared High-Performance Computing Clusters for Large-scale AI Models,” arXiv:2310.02120, Oct 2023.
 - B. Kim et al., “What changes can large-scale language models bring? Intensive study on HyperCLOVA: Billions-scale Korean generative pretrained Transformers,” arXiv:2109.04650, Sep 2021.
 - C. Kim*, Heungsub Lee* et al., “torchgpipe: On-the-fly pipeline parallelism for training giant models,” arXiv:2004.09910, Apr 2020.
 
*Contributed equally
Public Speeches
- “NSML, the hyper-scale ML training platform,” KRnet, Jun 2022.
 - “Remake of Hangulize,” Golang Korea Meetup, Aug 2018.
 - “Profiling,” PyCon Korea, Aug 2015.
 - “The server architecture of Durango,” NDC, 2014, 2016, and 2018.
 
Languages
- Korean — Native
 - English — Conversant in reading and writing
 
Education
Computer Software, Kwangwoon University, 2008 (completed the first year only).