Publications and Working Papers
Manuscripts under review
1. Nonlinear Time Series Modeling Using Bernstein Polynomials and Bayesian Inference
Journal/Status: Submitted to Transactions on Machine Learning Research (TMLR).
Authors: Li Yuan and Raanju R. Sundararajan.
Current advisor: Raanju R. Sundararajan.
Summary:
- This paper introduces Bernstein Polynomial Autoregressive (BPAR) models and an AR-BPAR extension to capture smooth nonlinear temporal dependence while preserving lag-level interpretability.
- The framework uses Bayesian inference, including exact and variational approaches, to provide uncertainty quantification for model parameters and forecasts; shrinkage priors and credible intervals support model pruning.
- Empirical studies across simulated nonlinear systems, chaotic benchmarks, and real time series show strong forecasting performance, especially for chaotic and highly nonlinear data, compared with classical statistical models and neural network baselines.
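As a rough illustration of the Bernstein-polynomial idea summarized above (not the paper's implementation), each lagged value can be rescaled to [0, 1] and expanded in a degree-n Bernstein basis, so that every lag gets its own smooth effect with a separate block of coefficients. The function names, the min-max rescaling, and the chosen degree are assumptions made for this sketch.

```python
import numpy as np
from math import comb

def bernstein_basis(x, degree):
    """Evaluate the Bernstein basis B_{k,n}(x), k = 0..n, for x in [0, 1].

    Returns an array of shape (len(x), degree + 1).
    """
    x = np.asarray(x, dtype=float)
    ks = np.arange(degree + 1)
    binom = np.array([comb(degree, k) for k in ks], dtype=float)
    return binom * x[:, None] ** ks * (1.0 - x[:, None]) ** (degree - ks)

def lagged_design(series, n_lags, degree):
    """Build a design matrix with one Bernstein block per lag (hypothetical helper).

    Columns [j*(degree+1) : (j+1)*(degree+1)] hold the basis for lag j+1,
    which keeps the contribution of each lag separately inspectable.
    """
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    scaled = (series - lo) / (hi - lo)  # assumed rescaling to [0, 1]
    T = len(series)
    blocks = [
        bernstein_basis(scaled[n_lags - j : T - j], degree)
        for j in range(1, n_lags + 1)
    ]
    return np.hstack(blocks), series[n_lags:]

X, y = lagged_design(np.sin(0.3 * np.arange(200)), n_lags=2, degree=3)
```

The coefficients for such a design could then be given priors and fit with any Bayesian regression routine; that step is omitted here because the summary does not specify it.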
Manuscripts in progress
1. Novel Point-of-Care Diagnostic Risk Model and Clinical Calculator for Biliary Atresia
Journal/Status: Manuscript in preparation.
Authors: Sandra Rios-Melendez, Li Yuan, Song Zhang, Reena Mourya, Pranavkumar Shivakumar, Estella Alonso, Stephen Guthery, Sanjiv Harpavat, Simon Horslen, Binita Kamath, Saul Karpen, Rohit Kohli, Kathleen Loomes, John Magee, Alexander Miethke, Benjamin Shneider, Ron Sokol, Pamela L. Valentino, Jorge Bezerra, Sindhu Pandurangi, and the Childhood Liver Disease Research Network (ChiLDReN).
Summary:
- This study develops a multivariable logistic risk prediction model for biliary atresia using clinical and biomarker data from cholestatic infants in the ChiLDReN cohort and an external validation cohort from Children’s Medical Center Dallas.
- The final model combines age, stool color, MMP-7, and GGT, achieving excellent discrimination for biliary atresia and improving risk classification compared with MMP-7 alone.
- The model is translated into a web-based clinical calculator that provides individualized, real-time risk estimates to support earlier referral and diagnostic evaluation for infants with neonatal cholestasis.
2. AMIC Uncertainty: Bayesian Variational Inference for Sentiment Analysis
Journal/Status: Manuscript in preparation.
Authors: Li Yuan.
Advisor: Jing Cao, Ph.D.
Summary:
- This ongoing research extends the Attention-Based Multiple Instance Classification (AMIC) model from point-estimate sentiment prediction toward uncertainty-aware Bayesian prediction.
- The proposed Bayesian AMIC framework adds a Bayesian variational layer to pooled document-level embeddings produced by AMIC’s attention and sentiment blocks.
- The method uses variational posterior distributions over neural-network weights, KL regularization against Gaussian priors, binary cross-entropy, and AMIC sparsity and peakness penalties in a unified training loss.
- Posterior predictive sampling is used to estimate predictive means, credible intervals, variance, entropy, and class-probability uncertainty for document-level sentiment predictions.
- The framework also proposes a way to derive word-level sentiment contributions from the Bayesian head, supporting interpretable positive and negative word scoring.
- Experiments are being developed across benchmark sentiment datasets including wine reviews, Twitter sentiment, Amazon reviews, and IMDB movie reviews.
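The uncertainty summaries listed above can be sketched for the simplest possible case. The example below assumes a single mean-field Gaussian linear head with no bias acting on one pooled document embedding; the names `gaussian_kl` and `predictive_summary`, the prior scale, and the sample count are hypothetical, and this is not the AMIC code itself.

```python
import numpy as np

def gaussian_kl(mu, sigma, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over all weights."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    per_weight = (
        np.log(prior_sigma / sigma)
        + (sigma**2 + mu**2) / (2.0 * prior_sigma**2)
        - 0.5
    )
    return float(per_weight.sum())

def predictive_summary(embedding, mu, sigma, n_samples=1000, seed=0):
    """Monte Carlo posterior predictive summary for one document.

    Draws weight vectors from the variational posterior, pushes the pooled
    embedding through a sigmoid, and summarizes the sampled probabilities.
    """
    rng = np.random.default_rng(seed)
    embedding = np.asarray(embedding, dtype=float)
    weights = rng.normal(mu, sigma, size=(n_samples, len(embedding)))
    probs = 1.0 / (1.0 + np.exp(-(weights @ embedding)))
    p_mean = float(probs.mean())
    lower, upper = np.quantile(probs, [0.025, 0.975])
    eps = 1e-12  # guards log(0) when the mean probability saturates
    entropy = -(p_mean * np.log(p_mean + eps)
                + (1.0 - p_mean) * np.log(1.0 - p_mean + eps))
    return {
        "mean": p_mean,
        "variance": float(probs.var()),
        "interval_95": (float(lower), float(upper)),
        "entropy": float(entropy),
    }

summary = predictive_summary([0.5, -1.2, 0.8], mu=[0.3, 0.1, -0.4], sigma=[0.2, 0.2, 0.2])
```

In a training loop, a KL term of this kind would typically be added to the binary cross-entropy and the AMIC penalties with some weighting; the weighting is left out because the summary does not state it.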
Publications
1. Critical Illness Outcomes of Hospitalized Pregnant Women Following a Texas Abortion Ban
Journal: American Journal of Respiratory and Critical Care Medicine.
Authors: Catherine Chen, Deepshikha C. Ashana, Kevin Callison, Courtney Baker, R. Nicholas Burns, Jing Cao, Li Yuan, and Hayley B. Gershengorn.
Summary:
- This study used an interrupted time series design to examine 2,344,135 hospitalized pregnant women in Texas from January 2018 through March 2024.
- In the full cohort, Texas SB8 was not associated with changes in adjusted rates of critical illness, sepsis, mechanical ventilation, or infection; however, the proportion of hospitalized pregnant patients with high-risk features increased after SB8.
- Among high-risk patients, SB8 was associated with sustained increases in critical illness, sepsis, and infection, highlighting potential implications for infection prevention and critical care planning in abortion-restrictive states.
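For readers unfamiliar with interrupted time series designs, a generic segmented-regression sketch is shown below. It is not the study's adjusted model: the function names, the ordinary least squares fit, and the synthetic series are assumptions for illustration only, and none of the numbers come from the paper.

```python
import numpy as np

def its_design(n_periods, change_point):
    """Design matrix for a basic segmented regression.

    Columns: intercept, time, post-policy indicator, time since the policy.
    """
    t = np.arange(n_periods, dtype=float)
    post = (t >= change_point).astype(float)
    since = (t - change_point) * post
    return np.column_stack([np.ones(n_periods), t, post, since])

def fit_segmented(y, change_point):
    """Fit the four segmented-regression coefficients by least squares."""
    X = its_design(len(y), change_point)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    names = ["baseline_level", "baseline_trend", "level_change", "trend_change"]
    return dict(zip(names, beta))

# Synthetic monthly rates with a made-up level and slope change at period 36.
rng = np.random.default_rng(0)
true_beta = np.array([10.0, 0.05, 1.5, 0.10])
y = its_design(60, 36) @ true_beta + rng.normal(0.0, 0.2, size=60)
estimates = fit_segmented(y, change_point=36)
```

A sustained increase of the kind described above corresponds to a positive `level_change` or `trend_change` that persists through the post-policy periods.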
2. Rate-Perturbing Single Amino Acid Mutation for Hydrolases: A Statistical Profiling
Journal: The Journal of Physical Chemistry B.
Authors: Bailu Yan, Xinchun Ran, Yaoyukun Jiang, Sarah K. Torrence, Li Yuan, Qianzhen Shao, and Zhongyue J. Yang.
Summary:
- This paper introduced IntEnzyDB, an integrated structure-kinetics database for hydrolases containing thousands of curated kinetic measurements and protein structures for statistical analysis and machine learning.
- The analysis found that only about 10% of single amino acid substitutions were rate-enhancing, and mutations to bulky nonpolar residues such as Val, Ile, Phe, and Leu were significantly more likely to improve hydrolase turnover or catalytic efficiency.
- Structure-kinetics profiling showed that mutation effects depend on distance from the active site: nearby and midrange mutations can strongly change catalytic performance, while distal mutations are more likely to be efficiency-neutral and less likely to be deleterious.