Resume
Work Experience
2021 - Present
Graduate Research Assistant
Leveraging real-world evidence, longitudinal EHR data, and clinical notes to model disease progression, identify early diagnostic signals, and support evidence-based healthcare research. My work spans statistical modeling (Cox regression, causal mediation analysis), machine learning (random forests, SVMs, logistic regression), and NLP (BERT, cTAKES), with a strong focus on interpretable, clinically relevant solutions developed in close collaboration with clinicians and domain experts.
2021
Staff Research Associate
Developed and deployed a CNN-based tool for automated cell segmentation in post-mortem brain tissue, implementing the pipeline on AWS (EC2, S3, JupyterHub) to enable scalable, cloud-based image analysis.
2020 - 2021
Computer Systems Engineer
Developed an end-to-end imaging pipeline to detect neurons in immunofluorescent brain tissue using U-Net segmentation, accelerating manual workflows. Led initial deployment of the tool across two labs to support scalable, reproducible image analysis.
2019 - 2020
Graduate Research Assistant
Designed and optimized deep learning-based image analysis frameworks for material science, leveraging CNNs and high-performance computing (multi-core CPUs and GPUs) to improve efficiency and scalability of large imaging pipelines.
2019
Undergraduate Research Intern
Developed CNN models for automated material phase classification, streamlining large-scale imaging workflows and reducing manual inspection time in experimental microscopy datasets.
Education
2021 - 2025
Doctorate in Biological and Medical Informatics
University of California, San Francisco
Advisors
Marina Sirota, PhD, and M. Maria Glymour, ScD
2019 - 2020
Master of Information and Data Science
University of California, Berkeley
2016 - 2018
Bachelor of Arts in Applied Mathematics: Data Science
University of California, Berkeley
Skills & Expertise
- Programming & Data Analysis: Python (pandas, NumPy, scikit-learn, SciPy), R, SQL, PySpark, Bash
- Machine Learning & AI: Random Forests, Support Vector Machines (SVMs), Logistic Regression, Naive Bayes, CNNs, U-Net, BERT (fine-tuned), TensorFlow, PyTorch, Keras
- Natural Language Processing (NLP): Named Entity Recognition (NER), Text Classification, Sentiment Analysis, Summarization, cTAKES, OCR (AWS Rekognition, Tesseract), Tokenization
- Statistical Modeling & Epidemiology: Cox proportional hazards models, Causal mediation analysis, Linear and logistic regression, Longitudinal mixed models, Splines, Multiple imputation
- Data Modalities: Structured & unstructured EHR, Clinical notes, Radiology reports, Biomarker time series, Neuroimaging, Microscopy and 3D imaging
- Cloud & Infrastructure: AWS (EC2, S3, Lambda, SageMaker, DynamoDB), Google Cloud Platform (multi-core CPUs), Docker, JupyterHub
- Tools & Visualization: Git, GitHub, matplotlib, seaborn
- Research Strengths: Longitudinal data analysis, Disease trajectory modeling, Diagnostic prediction, Real-world evidence (RWE), Interpretable ML, Reproducible pipelines
- Collaborative Experience: Cross-functional work with clinicians, neuropsychologists, imaging scientists, and epidemiologists across academic and clinical research settings