
Resume

Work Experience

October 2021 - Present

Graduate Research Assistant

BCHSI - UCSF

Leveraged real-world evidence from longitudinal EHR data, claims, and clinical notes to model disease progression, identify early diagnostic signals, and quantify risk factors in neurodegenerative diseases. Applied statistical modeling (Cox regression, causal mediation), machine learning (Naive Bayes, logistic regression, random forests, SVMs), and NLP (BERT, cTAKES) to build interpretable, end-to-end pipelines for patient phenotyping, trajectory modeling, and risk stratification. Built workflows spanning study design, temporal modeling, feature engineering, quality assurance, and model validation, ensuring clinical relevance and reproducibility. Collaborated with clinicians and data engineers to integrate multi-modal data sources and deliver scalable, actionable insights for healthcare research.

April 2021 - Sept 2021

Staff Research Associate

Memory and Aging Center - UCSF

Developed, trained, and deployed a convolutional neural network (CNN) for automated segmentation of post-mortem brain tissue images, reducing manual annotation time by up to 80% in neuropathology studies. Implemented the pipeline on AWS (EC2, S3, JupyterHub) for scalable, cloud-based image analysis, enabling large-scale processing across male and female aging cohorts. Designed image preprocessing, model training, and evaluation workflows to ensure accuracy, reproducibility, and adaptability for downstream biomarker discovery and drug target validation in Alzheimer’s disease research.

Aug 2020 - Jan 2021

Computer Systems Engineer

Lawrence Berkeley National Lab

Developed an end-to-end deep learning imaging pipeline using U-Net segmentation to detect neurons in immunofluorescent brain tissue, reducing manual analysis time and improving reproducibility. Designed and implemented image preprocessing, augmentation, model training, and evaluation workflows, enabling scalable application across multiple research groups. Led the deployment and user adoption of the tool in two laboratories, supporting high-throughput, reproducible image analysis for neuroscience research.

June 2019 - Aug 2020

Graduate Research Assistant

Lawrence Berkeley National Lab

Designed and optimized deep learning-based image analysis frameworks for materials science applications, focusing on 3D imaging of fibrous materials. Leveraged convolutional neural networks (CNNs) and high-performance computing (multi-core CPUs, GPUs) to accelerate large-scale imaging pipelines and improve scalability for high-throughput experiments. Developed data preprocessing, volumetric segmentation, and classification workflows that reduced processing time by over 50%, enabling faster experimental cycles and more reproducible results.

Jan 2019 - May 2019

Undergraduate Research Intern

Lawrence Berkeley National Lab

Developed image analysis models for automated cell segmentation in Pap smear microscopy images, prototyping a tool to assist pathologists by reducing the inspection area by ~70%. Designed image preprocessing, segmentation, and evaluation pipelines to accelerate pathology review, improve workflow efficiency, and demonstrate the potential for scalable, automated screening in cytology.

Education

Sept 2021 - Aug 2025

Doctorate in Biological and Medical Informatics

University of California, San Francisco

Advisors

Marina Sirota, PhD, and M. Maria Glymour, ScD

May 2019 - Aug 2020

Master of Information and Data Science

University of California, Berkeley 

Aug 2016 - Dec 2018

Bachelor of Arts in Applied Mathematics: Data Science

University of California, Berkeley

Skills & Expertise

  • Programming & Data Analysis: Python (pandas, NumPy, scikit-learn, SciPy), R, SQL, PySpark, Bash

  • Machine Learning & AI: Random Forests, Support Vector Machines (SVMs), Logistic Regression, Naive Bayes, Transformers, Clustering, CNNs, U-Net, BERT (fine-tuned), TensorFlow, PyTorch, Keras

  • Natural Language Processing (NLP): Named Entity Recognition (NER), Text Classification, Sentiment Analysis, Summarization, cTAKES, spaCy, Hugging Face Transformers, OCR (AWS Rekognition, Tesseract), Tokenization

  • Statistical Modeling & Epidemiology: Cox proportional hazards models, Causal mediation analysis, Linear and logistic regression, Longitudinal mixed models, Splines, Multiple imputation

  • Data Modalities: Structured & unstructured EHR, Clinical notes, Radiology reports, Biomarker time series, Neuroimaging, Microscopy and 3D imaging

  • Cloud & Infrastructure: AWS (EC2, S3, Lambda, SageMaker, DynamoDB), Google Cloud Platform, High-performance computing (multi-core CPUs, GPUs), Docker, JupyterHub

  • Tools & Visualization: Git, GitHub, matplotlib, seaborn

  • Research Strengths: Longitudinal data analysis, Disease trajectory modeling, Diagnostic prediction, Real-world evidence (RWE), Interpretable ML, Reproducible pipelines

  • Collaborative Experience: Cross-functional work with clinicians, neuropsychologists, imaging scientists, and epidemiologists across academic and clinical research settings.
