Resume

Work Experience

2021 - Present

Graduate Research Assistant

I use real-world evidence, longitudinal EHR data, and clinical notes to model disease progression, identify early diagnostic signals, and support evidence-based healthcare research. My work spans statistical modeling (Cox regression, causal mediation), machine learning (Random Forests, SVMs, logistic regression), and NLP (BERT, cTAKES), with a strong focus on interpretable, clinically relevant solutions developed in close collaboration with clinicians and domain experts.

2021

Staff Research Associate

Developed and deployed a CNN-based tool for automated cell segmentation in post-mortem brain tissue, implementing the pipeline on AWS (EC2, S3) with JupyterHub to enable scalable, cloud-based image analysis.

2020 - 2021

Computer Systems Engineer

Developed an end-to-end imaging pipeline that detects neurons in immunofluorescent brain tissue using U-Net segmentation, speeding up previously manual workflows. Led the initial deployment of the tool across two labs to support scalable, reproducible image analysis.

2019 - 2020

Graduate Research Assistant

Designed and optimized deep learning-based image analysis frameworks for materials science, leveraging CNNs and high-performance computing (multi-core CPUs and GPUs) to improve the efficiency and scalability of large imaging pipelines.

2019

Undergraduate Research Intern

Developed CNN models for automated material phase classification, streamlining large-scale imaging workflows and reducing manual inspection time in experimental microscopy datasets.

Education

2021 - 2025

Doctorate in Biological and Medical Informatics

University of California, San Francisco

Advisors

Marina Sirota, PhD and M. Maria Glymour, ScD

2019 - 2020

Master of Information and Data Science

University of California, Berkeley

2016 - 2018

Bachelor of Arts in Applied Mathematics: Data Science

University of California, Berkeley

Skills & Expertise

  • Programming & Data Analysis: Python (pandas, NumPy, scikit-learn, SciPy), R, SQL, PySpark, Bash

  • Machine Learning & AI: Random Forests, Support Vector Machines (SVMs), Logistic Regression, Naive Bayes, CNNs, U-Net, BERT (fine-tuned), TensorFlow, PyTorch, Keras

  • Natural Language Processing (NLP): Named Entity Recognition (NER), Text Classification, Sentiment Analysis, Summarization, cTAKES, OCR (AWS Rekognition, Tesseract), Tokenization

  • Statistical Modeling & Epidemiology: Cox proportional hazards models, Causal mediation analysis, Linear and logistic regression, Longitudinal mixed models, Splines, Multiple imputation

  • Data Modalities: Structured & unstructured EHR, Clinical notes, Radiology reports, Biomarker time series, Neuroimaging, Microscopy, 3D imaging

  • Cloud & Infrastructure: AWS (EC2, S3, Lambda, SageMaker, DynamoDB), Google Cloud Platform (multi-core CPUs), Docker, JupyterHub

  • Tools & Visualization: Git, GitHub, matplotlib, seaborn

  • Research Strengths: Longitudinal data analysis, Disease trajectory modeling, Diagnostic prediction, Real-world evidence (RWE), Interpretable ML, Reproducible pipelines

  • Collaborative Experience: Cross-functional work with clinicians, neuropsychologists, imaging scientists, and epidemiologists across academic and clinical research settings
