Projects

ORCHID-Notes: Organ Procurement Note De-identification

Developing ORCHID-Notes, the first large-scale dataset of millions of de-identified clinical notes from the U.S. organ procurement system, enabling research into transplant equity and advancing LLM-based de-identification at scale.

MIMIC-IID: Probing and Mitigating Bias in the MIMIC-CXR Dataset at Scale

Creating MIMIC-IID, a scalable analysis pipeline that tests the IID (independent and identically distributed) assumptions underlying MIMIC-CXR, improving dataset transparency and fostering equitable AI development in critical care imaging.

PrivateNLG: Synthetic Data Generation for Mixed-Type Datasets

Engineering a high-throughput deep generative framework that synthesizes mixed-type (numeric, categorical, and text) tables while preserving joint distributions and feature dependencies. The framework was validated in collaboration with Liberty Mutual to accelerate reliable, scalable machine learning development.
