Current Research Areas

Measurement Error

Compositional Data Analysis

Selective Inference

Past Projects


Meta-analyses have become the gold standard for synthesizing evidence from multiple clinical trials. They are especially useful when trial sample sizes are small and the outcome is rare or adverse, since individual trials often lack sufficient power to detect a treatment effect. However, when zero events are observed in one or both treatment arms of a trial, commonly used meta-analysis methods fail. My group investigated the impact of these zero-event trials on random-effects meta-analysis, focusing on data-pooling methods, heterogeneity variance estimation, and continuity corrections. We assessed how these choices affect meta-analysis results via simulation using normal-binomial hierarchical models.
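To make the zero-event problem concrete, here is a minimal sketch (not the code from this project) of one common workflow: a constant 0.5 continuity correction followed by DerSimonian-Laird random-effects pooling of log odds ratios. The choice of estimator, the correction value, and the trial counts are all assumptions made for illustration.

```python
import math

def log_odds_ratio(events_t, n_t, events_c, n_c, correction=0.5):
    """Log odds ratio and its variance for one two-arm trial.

    If any cell of the 2x2 table is zero, `correction` is added to every
    cell (a constant continuity correction; other schemes exist).
    """
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, var

def dersimonian_laird(effects, variances):
    """Pooled effect, its standard error, and tau^2 (DerSimonian-Laird)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-trial variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical trials: (events_treated, n_treated, events_control, n_control).
trials = [(0, 50, 2, 50), (1, 40, 3, 40), (0, 30, 0, 30), (2, 60, 5, 60)]
effects, variances = zip(*(log_odds_ratio(*t) for t in trials))
pooled, se, tau2 = dersimonian_laird(effects, variances)
print(f"pooled log OR = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

Note that the double-zero trial contributes a log odds ratio of exactly zero with a large variance after correction, which is one way the choice of correction can pull the pooled estimate.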

Mediation Analysis (Link to preprint)

Multiple expression quantitative trait loci (eQTLs) in protein-coding genes (pc-genes) have been shown to have a significant effect on the development of breast cancer, among other cancers. However, the overwhelming majority of these eQTLs locally regulate breast cancer-related genes, and the distal effects of genetic variants on pc-genes have not been well characterized. We investigated the distal effects of eQTLs in non-coding RNA (ncRNA) and miRNA within solid-state breast carcinoma tumors. Using publicly available data from European-ancestry patients in The Cancer Genome Atlas (TCGA), we optimized a statistical model for detecting cis/trans-eQTLs for both ncRNA and pc-genes. After mediation analysis using MOSTWAS, we detected 3,017 distal eQTLs of pc-genes significantly mediated through local eQTLs of ncRNAs at FDR-adjusted P < 0.05. Additional analyses, such as colocalization and GWAS, could help verify the causal relationship between these ncRNAs and pc-genes in breast cancer development and progression.
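For readers unfamiliar with FDR adjustment, the sketch below shows the Benjamini-Hochberg procedure, which is one standard way to obtain FDR-adjusted P-values. Whether this exact procedure was the one applied in the analysis above is an assumption, and the P-values in the example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values from four mediation tests.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

In this toy example only the smallest P-value remains below 0.05 after adjustment, even though three raw P-values were below 0.05.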

Electronic Health Records (Multiple Myeloma)

Multiple myeloma is a form of cancer with no cure and few empirically proven treatment options. With the non-profit HealthTree, I leveraged electronic health records to perform non-parametric survival analysis, logistic regression, and data engineering on observational data provided by multiple myeloma patients. My work directly contributed to two posters that were accepted and presented at the 2021 American Society of Clinical Oncology meeting.
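As an illustration of what non-parametric survival analysis involves, here is a minimal sketch of the Kaplan-Meier estimator. The text above does not name the specific estimator used, so Kaplan-Meier is an assumption, and the follow-up times and event indicators are invented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  : follow-up time for each patient
    events : 1 if the event was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve

# Hypothetical follow-up data (months) for five patients.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)
```

Censored patients never lower the curve directly; they only shrink the risk set used at later event times.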


Identifying the epigenetic mechanisms at play within cell types is one of the early steps in understanding crucial differences between healthy and diseased cells. Peripheral blood monocytes play a crucial role in both the innate and adaptive immune systems. We explore differences that exist in the epigenetic profiles of human monocytes based on sex. To determine whether chromatin accessibility is sexually dimorphic, we collected blood samples from six adult volunteers (three men and three women) and isolated monocytes from each volunteer. The ATAC-seq protocol was followed, and data analysis is being performed. The results of this analysis will determine to what extent chromatin accessibility in human monocytes is affected by sex. These differences could have implications for therapeutics targeting the immune system.


As a visiting Undergraduate Research Fellow in the Department of Biomedical Informatics at Harvard Medical School with Dr. Peter Kharchenko, I designed and programmed a method to simulate slide-seqV2 spatial transcriptomics data. Spatial transcriptomic methods have made it possible to model tissue heterogeneity and cell-cell interactions in vivo, affording computational researchers the power to investigate many fascinating questions. The computational problem is to reconstruct in vivo cell positions and predict cell types given the spatial locations of the molecular barcodes and their expression profiles. To simulate this, I used Poisson sampling processes to model cellular expression before spatially positioning these simulated "cells" via a random graph. Using exponential decay as a distance-dependent metric, I leveraged multinomial sampling to reproduce the dispersion of genetic material present in the slide-seqV2 pipeline. Various regression models and nonnegative matrix factorization were then applied within my simulation framework to reconstruct the positions of cells and cell types in human tissue.
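The simulation steps described above can be sketched roughly as follows. This is a simplified illustration rather than the original framework: cells are placed at uniformly random 2D positions instead of via a random graph, and every name and parameter value (`decay_rate`, the gamma-distributed expression rates, the numbers of cells, genes, and beads) is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells, n_genes, n_beads = 20, 5, 100
decay_rate = 5.0  # hypothetical strength of the distance-dependent decay

# Step 1: Poisson sampling of per-cell expression counts.
# The gamma-distributed rates are an illustrative assumption.
rates = rng.gamma(shape=2.0, scale=2.0, size=(n_cells, n_genes))
counts = rng.poisson(rates)  # shape (n_cells, n_genes)

# Step 2: place cells and barcoded beads in the unit square.
# (Simplification: uniform positions stand in for the random graph.)
cell_xy = rng.uniform(0, 1, size=(n_cells, 2))
bead_xy = rng.uniform(0, 1, size=(n_beads, 2))

# Step 3: exponential decay turns cell-to-bead distances into
# capture probabilities that sum to 1 for each cell.
dist = np.linalg.norm(cell_xy[:, None, :] - bead_xy[None, :, :], axis=2)
weights = np.exp(-decay_rate * dist)
probs = weights / weights.sum(axis=1, keepdims=True)

# Step 4: multinomial sampling disperses each cell's transcripts
# across beads, so nearby beads capture most of the material.
bead_counts = np.zeros((n_beads, n_genes), dtype=np.int64)
for c in range(n_cells):
    for g in range(n_genes):
        bead_counts[:, g] += rng.multinomial(counts[c, g], probs[c])

print(bead_counts.shape, int(bead_counts.sum()), int(counts.sum()))
```

Because the multinomial step only redistributes transcripts, the total count per gene is preserved between the cell-level and bead-level matrices, which gives a convenient sanity check for a simulation of this kind.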