PhD Defense by Li-Hsiang Lin

Title: Computer and Biological Experiments: Modeling, Estimation, and Uncertainty Quantification

 

Advisors: Dr. C. F. Jeff Wu and Dr. V. Roshan Joseph

 

Committee Members:

Dr. Cheng Zhu (Dept. of Biomedical Engineering, GT)

Dr. Xiaoming Huo (School of Industrial and Systems Engineering, GT)

Dr. Ying Hung (Dept. of Statistics, Rutgers University)

 

Date and Time: Tuesday, April 14, 2020, 12:00

 

Meeting URL (for BlueJeans):

https://bluejeans.com/960540777

 

Meeting ID (for BlueJeans):

960 540 777

 

Location: Groseclose 202

 

Abstract:

Statistical experimental analysis is an indispensable tool in engineering, science, biomedicine, and technology innovation. There are generally two types of experiments: computer experiments and physical experiments. Computer experiments are simulations using complex mathematical models and numerical tools, while physical experiments are actual experiments performed in a laboratory or observed in the field. Analyzing these experiments helps us understand real-world phenomena and motivates interesting statistical questions and challenges. This thesis presents new methodologies for applications in computer experiments and biomedical studies.

 

In Chapter 1, we propose a new method based on Gaussian processes (GPs) for analyzing computer experiments. GPs are a popular choice for approximating a deterministic function in computer experiments, but the role of transformation in GP modeling is not well understood. We propose using a transformation in GP modeling to improve additivity: a transformation of the response is chosen so that the deterministic function becomes approximately additive and can then be easily estimated using an additive GP. We call this GP a Transformed Additive Gaussian (TAG) process. To capture possible interactions that are unaccounted for in the additive model, we further propose an extension of the TAG process called the Transformed Approximately Additive Gaussian (TAAG) process. We develop efficient techniques for fitting a TAAG process and show that it can be fitted to high-dimensional data much more efficiently than a standard GP. Compared with a standard GP, TAG also produces better estimation, interpretation, visualization, and prediction. The proposed methods are implemented in the R package TAG.
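The core idea of transforming a response to improve additivity can be seen in a toy example (an illustration only, not the TAG method itself): a multiplicative function is far from additive, but becomes exactly additive after a log transformation.

```python
import math

def f(x1, x2):
    # deterministic "computer model" output (multiplicative, not additive)
    return x1 * x2

def g(x1, x2):
    # transformed response: log(x1 * x2) = log(x1) + log(x2), which is additive
    return math.log(f(x1, x2))

def additivity_gap(h, a, b, c, d):
    # a two-argument function h is additive iff h(a,b) + h(c,d) == h(a,d) + h(c,b);
    # the absolute difference measures the departure from additivity
    return abs(h(a, b) + h(c, d) - (h(a, d) + h(c, b)))

print(additivity_gap(f, 1.0, 2.0, 3.0, 4.0))  # 4.0: f is not additive
print(additivity_gap(g, 1.0, 2.0, 3.0, 4.0))  # ~0: the transformed g is additive
```

An approximately additive function can then be estimated one input dimension at a time, which is what makes the additive GP fit so much cheaper.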

 

In Chapter 2, we further show that using a transformation to improve the additivity of a target function is beneficial in big-data modeling. Once additivity is improved, the target function is easier to approximate and is expected to be well approximated using fewer data points. This implies that a subset of the big data suffices, reducing the computational burden of approximating the target function well. Building on this subset-selection technique, we propose a new method for estimating a target function in large-scale experiments. Several numerical comparisons show that our methods outperform recently proposed methods in the literature for large-scale computer experiments in terms of both prediction accuracy and computational time.
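The subset idea can be sketched with a deliberately simple one-dimensional selection rule (a hypothetical illustration; the thesis's actual selection scheme is not specified here): rather than fitting to all of the big data, keep a small, well-spread subset of the design points.

```python
def space_filling_subset(points, m):
    # crude 1-D "space-filling" selection for illustration: sort the data
    # and keep m points spaced evenly across the sorted range
    pts = sorted(points)
    n = len(pts)
    idx = [round(i * (n - 1) / (m - 1)) for i in range(m)]
    return [pts[i] for i in idx]

data = [0.93, 0.12, 0.55, 0.71, 0.03, 0.38, 0.99, 0.26, 0.64, 0.47]
print(space_filling_subset(data, 4))  # [0.03, 0.38, 0.64, 0.99]
```

The easier the (transformed, approximately additive) target function is to approximate, the smaller the subset that still yields an accurate fit, which is the source of the computational savings.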

 

In Chapter 3, motivated by a biological experiment, we propose a new method for quantifying uncertainty in biological studies. Uncertainty quantification aims to appraise and quantify uncertainty in physical systems, but for some systems no existing model can serve as a basis for doing so. Our motivation comes from single-molecule experiments in the study of T cell signaling, for which no models were available to quantify the experiments' features. To address this, we develop a novel model, the varying coefficient frailty model, to quantify the uncertainty in single-molecule experiments. The fitted model provides a rigorous quantification of the early and rapid impact of accumulated bond lifetime on T cell signaling, shedding new light on the fundamental understanding of how T cells initiate immune responses. Theoretical properties of the estimators, including their unbiasedness near the boundary, are derived, along with a discussion of the asymptotic bias-variance trade-off. The model applies not only to single-molecule experiments but also to survival analysis and reliability, where it can explore time-varying effects of covariates with random effects.
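A schematic form of a varying coefficient frailty hazard helps fix ideas (this is an assumed, standard formulation offered for illustration; the thesis's exact specification may differ):

```latex
\lambda_{ij}(t \mid Z_i, X_{ij}) \;=\; Z_i \,\lambda_0(t)\, \exp\!\left\{ \beta(t)^{\top} X_{ij} \right\}
```

Here $\lambda_{ij}$ is the hazard of an event (e.g., bond dissociation) for observation $j$ in cluster $i$, $Z_i$ is an unobserved frailty (random effect) shared within the cluster, $\lambda_0(t)$ is a baseline hazard, and $\beta(t)$ is a vector of time-varying coefficients, so the effect of the covariates $X_{ij}$ is allowed to change over time.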

 

In Chapter 4, we address the problem of identifying an optimal computer simulator for observed physical experiments. In many applications, experimenters have several computer models with different scientific implications for the same physical phenomenon, but they may not know which model best describes the observed physics. In cell biology, for example, biologists have several models of cell adhesion between T lymphocytes and other cells, yet do not know which one is most desirable for real lab data. To find the optimal model for such data, we propose a selection criterion based on leave-one-out cross-validation. We show that this criterion can be decomposed into a goodness-of-fit measure and a generalized degrees of freedom that captures the complexity of the computer simulator. Asymptotic properties of the selected optimal simulator are discussed. Additionally, we show that the proposed procedure includes a conventional calibration method as a special case. In the cell biology application, the selected simulator gives new insight into the T cell recognition mechanism in the human immune system.
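Leave-one-out cross-validation for choosing among candidate models can be sketched as follows (a minimal illustration with two made-up candidates, not the thesis's simulators or its decomposed criterion): each candidate is refit with one observation held out, and the candidate with the smallest average held-out squared error is selected.

```python
def loocv_score(model_fit, xs, ys):
    # leave-one-out CV: refit with point i held out, then accumulate the
    # squared prediction error on the held-out point
    err = 0.0
    for i in range(len(xs)):
        xtr, ytr = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        pred = model_fit(xtr, ytr)
        err += (ys[i] - pred(xs[i])) ** 2
    return err / len(xs)

def fit_constant(xs, ys):
    # candidate 1: predict the training mean everywhere
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    # candidate 2: simple least-squares line
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return lambda x, a=ybar - (sxy / sxx) * xbar, b=b: a + b * x

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.1, 1.9, 3.2, 3.9]  # nearly linear synthetic data
scores = {"constant": loocv_score(fit_constant, xs, ys),
          "linear": loocv_score(fit_linear, xs, ys)}
best = min(scores, key=scores.get)
print(best)  # the linear candidate wins on near-linear data
```

The thesis's criterion goes further than this raw LOOCV error by decomposing it into a goodness-of-fit term plus a generalized-degrees-of-freedom penalty for simulator complexity.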

 

Event Details

Date/Time:

  • Tuesday, April 14, 2020
    1:00 pm - 3:00 pm
