I'm interested in computer vision, multimodal representation learning, and foundation models. Most of my research focuses on analyzing existing multimodal models and improving their capabilities in downstream applications.
We propose a simple but powerful data augmentation method which augments a training image into a mosaic with three other negative images carefully curated by a pretrained multimodal alignment model, e.g., CLIP, to make the sample more challenging.
We conduct an extensive benchmark study that measures the performance of representative methods on seven widely used datasets, and we pose additional research questions and verify them empirically.
Work Experience
Oct. 2024 – Present
Seoul, Korea
Samsung Electronics · Software Engineer
Samsung Research → AX Team (Management Diagnosis Office)
DoXA: Document Extraction and Analysis (Internal Service, 2025.01–2025.12)
Developed a document QA benchmark dataset targeting image-heavy document understanding, supporting evaluation of multimodal parsing capabilities
Owned end-to-end production deployment of an internal AI service, including Kubernetes (K8s) setup, PostgreSQL HA configuration, and backend/frontend development
Building an agentic translation service for product manuals across the Mobile eXperience and Digital Appliances businesses, handling both backend and frontend development
Sep. 2021 – Dec. 2021
Seoul, Korea
SETsystem · AI Research Intern
Surveyed SOTA models for object detection and semantic segmentation to detect ships and waves on radar maps
Replaced Mask R-CNN with a simpler U-Net architecture, improving mIoU by over 1%
This page is a fork of Jon Barron's. Thank you for sharing :)