Design and Analysis of Experiments 2021

Table of Contents

1 <2021-10-05 Tue> Opening Remarks

10:45-11:00 AM Eastern Time

  • Ryan Lekivetz, JMP
  • Adam Lane, Cincinnati Children's Hospital

2 Scientific Sessions

2.1 <2021-10-05 Tue> Tradeoffs Between Computational Costs and Statistical Efficiency

11-12:30 PM Eastern Time

Organizer: Min Yang, University of Illinois-Chicago

  • William Li, Shanghai Advanced Institute of Finance
    • Title: On optimal designs in information-based optimal subdata - A systematic view of a data reduction strategy with application to second-order model
    • Abstract: With the urgent need of analyzing an extraordinary amount of data, the information-based optimal subdata selection (IBOSS) approach has gained considerable attention in the recent literature due to its ability to maintain rich information from the full dataset with a limited subdata size. On the other hand, the framework still lacks systematic exploration, especially regarding the characterization of the optimal subset, a key step in developing the associated algorithm. Motivated by a real finance case study concerning the impact of corporate attributes on firm value, we systematically explore a framework consisting of the exact steps one can follow when employing the idea of IBOSS for data reduction. Considering the second-order model that contains main effects, quadratic effects, and interaction effects, we develop a novel algorithm for selecting informative subdata. Empirical studies including a real example demonstrate that the new algorithm adequately addresses the trade-off between computational complexity and statistical efficiency, one of six core research directions for theoretical data science research proposed by the US National Science Foundation.
  • Yanxi Liu, University of Illinois at Chicago
    • Title: Information-based Optimal Subdata Selection for Clusterwise Linear Regression
    • Abstract: Technological advancements have accelerated in recent years. The amount of data being collected and the size of the data are increasing exponentially. Over time, it becomes more challenging to deal with not just massive amounts of data but also their complexity. The relationship between input and output variables may no longer be homogeneous. Conventional statistical models such as generalized linear models (GLMs) may not be well-suited to heterogeneous relationships. Mixture-of-experts models offer a good solution: they combine different models and can detect heterogeneous patterns while maintaining the benefits of conventional statistical modeling techniques. They do, however, require a considerable amount of computational resources, particularly when working with huge quantities of data. The subdata approach is a technique for resolving this issue. Inspired by Wang, Yang, and Stufken (2019), the purpose of this project is to develop an algorithm for clusterwise linear regression, a type of mixture-of-experts model, to select optimal subdata from the full data set, which preserves the maximum amount of information while requiring minimal computing resources. In this project, the proposed subdata selection is proved to be asymptotically optimal, i.e., no other method is statistically more efficient than the proposed one when the full data size is large.
  • Roshan Joseph, Georgia Institute of Technology
    • Title: Supervised compression of big data
    • Abstract: The phenomenon of big data has become ubiquitous in nearly all disciplines, from science to engineering. A key challenge is the use of such data for fitting statistical and machine learning models, which can incur high computational and storage costs. One solution is to perform model fitting on a carefully selected subset of the data. Various data reduction methods have been proposed in the literature, ranging from random subsampling to optimal experimental design-based methods. However, when the goal is to learn the underlying input-output relationship, such reduction methods may not be ideal, since they do not make use of the information contained in the output. To this end, we propose a supervised data compression method called supercompress, which integrates output information by sampling data from regions most important for modeling the desired input-output relationship. An advantage of supercompress is that it is nonparametric – the compression method does not rely on parametric modeling assumptions between inputs and output. As a result, the proposed method is robust to a wide range of modeling choices. We demonstrate the usefulness of supercompress over existing data reduction methods, in both simulations and a taxicab predictive modeling application.
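
The first two talks above build on the basic IBOSS idea of Wang, Yang, and Stufken (2019): for a first-order linear model, an informative subdata set is obtained by keeping, for each covariate, the rows with the most extreme values. The sketch below is a minimal illustration of that selection rule; the function name and the toy data are ours, and the talks extend the idea to second-order models and to clusterwise linear regression.

  import numpy as np

  def iboss_subdata(X, k):
      """Select roughly k rows of the covariate matrix X with the basic IBOSS rule.

      For each covariate, keep the k/(2p) smallest and k/(2p) largest values
      among the rows not yet selected (first-order linear model version).
      """
      n, p = X.shape
      r = k // (2 * p)                      # points per tail, per covariate
      selected = np.zeros(n, dtype=bool)
      for j in range(p):
          remaining = np.where(~selected)[0]
          order = remaining[np.argsort(X[remaining, j])]
          selected[order[:r]] = True        # smallest values of covariate j
          selected[order[-r:]] = True       # largest values of covariate j
      return np.where(selected)[0]

  # Toy example: 100,000 rows, 5 covariates, keep roughly 1,000 of them.
  rng = np.random.default_rng(1)
  X = rng.normal(size=(100_000, 5))
  idx = iboss_subdata(X, 1_000)
  print(len(idx), "rows selected")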

2.2 <2021-10-06 Wed> Optimal designs

12-1:30 PM Eastern Time

Organizer: John Stufken, University of North Carolina Greensboro

  • Jesús López Fidalgo, University of Navarra
    • Title: Active Learning considering the marginal distribution of the covariates
    • Abstract: The Big Data sample size introduces statistical and computational challenges for extracting useful information from data sets. Subsampling is widely used to downsize the data volume and allows computing estimators in regression models. Usually, subsampling is performed by defining a weight for each point and selecting a subset according to these weights. The subsample can be chosen at random (Passive Learning), but in order to obtain better estimators, optimal experimental design theory can be used to search for an “influential” subsample (Active Learning). This has been developed in the literature for linear and logistic regression, obtaining algorithms based on D-optimality and A-optimality. To the authors' knowledge, the distribution of the explanatory variables has never been considered for obtaining a subsample. We study the effect of the distribution of the explanatory variables on the estimation as well as on the optimal design. We first assume normality of the covariates, and later we measure the impact of skewness and kurtosis on the estimation and the optimal designs. Then, we propose a novel method to obtain optimal subsampling through D-optimality, taking into account the marginal distribution of the covariates. The D-optimal design is computed by an exchange algorithm to obtain the subsample.
  • Kalliopi Mylona, King's College London
    • Title: Optimal split-plot designs for precise pure-error estimation of the variance components
    • Abstract: In this work, we present a novel approach to design split-plot experiments which ensures that the two variance components can be estimated from pure error and guarantees a precise estimation of the response surface model. Our novel approach involves a new Bayesian compound D-optimal design criterion which pays attention to both the variance components and the fixed treatment effects. One part of the compound criterion (the part concerned with the treatment effects) is based on the response surface model of interest, while the other part (which is concerned with pure-error estimates of the variance components) is based on the full treatment model. We demonstrate that our new criterion yields split-plot designs that outperform existing designs from the literature both in terms of the precision of the pure-error estimates and the precision of the estimates of the factor effects.

This is joint work with Steven G. Gilmour (King’s College London) and Peter Goos (KU Leuven).

  • Rakhi Singh, UNC Greensboro
    • Title: Design selection for 2-level supersaturated designs
    • Abstract: The commonly used design optimality criteria are inadequate for selecting supersaturated designs. As a result, there is extensive literature on alternative optimality criteria within this context. Most of these criteria are rather ad hoc and are not directly related to the primary goal of experiments that use supersaturated designs, which is factor screening. In particular, unlike in almost any other optimal design problem, the criteria are not directly related to the method of analysis. The analysis of supersaturated designs requires the assumption of effect sparsity. Under this assumption, a popular method of analysis for 2-level supersaturated designs is the Gauss-Dantzig Selector (GDS), which shrinks many estimates to 0. We develop new design selection criteria inspired by the GDS and establish that designs that are better under these criteria tend to perform better as screening designs than designs obtained using existing criteria. This presentation is based on joint work with John Stufken, University of North Carolina at Greensboro.
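
López Fidalgo's talk mentions that the D-optimal subsample is computed with an exchange algorithm. The sketch below shows a deliberately simplified point-exchange loop of that kind: start from a random subsample and accept swaps that increase the log-determinant of the subsample information matrix. It ignores the covariate-distribution adjustments that are the subject of the talk, and all names and the toy data are ours.

  import numpy as np

  def d_optimal_exchange(X, k, n_pass=5, rng=None):
      """Greedy point-exchange search for a k-row subsample with large det(Xs'Xs)."""
      rng = np.random.default_rng(rng)
      n = X.shape[0]
      idx = rng.choice(n, size=k, replace=False)       # random starting subsample
      best = np.linalg.slogdet(X[idx].T @ X[idx])[1]
      for _ in range(n_pass):
          improved = False
          for pos in range(k):
              cand = rng.integers(n)                   # candidate point from the full data
              if cand in idx:
                  continue
              trial = idx.copy()
              trial[pos] = cand                        # try swapping it in
              val = np.linalg.slogdet(X[trial].T @ X[trial])[1]
              if val > best:
                  idx, best = trial, val
                  improved = True
          if not improved:
              break
      return idx

  rng = np.random.default_rng(0)
  X = np.column_stack([np.ones(5_000), rng.normal(size=(5_000, 3))])  # intercept + 3 covariates
  sub = d_optimal_exchange(X, k=50, rng=1)
  print("log det of subsample information:", np.linalg.slogdet(X[sub].T @ X[sub])[1])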

2.3 <2021-10-12 Tue> New Developments in Factorial Designs and Orthogonal Arrays

11-12:30 PM Eastern Time

Organizer: Hongquan Xu, University of California, Los Angeles

  • Jessica Jaynes, California State University
    • Title: Orthogonal Array Composite Designs for Drug Combination Experiments with Applications for Tuberculosis
    • Abstract: The aim of this research is to provide an overview of the orthogonal array composite design (OACD) methodology, show that these designs can be robust to missing data under practical scenarios, and provide an application to tuberculosis. We compare the D-efficiencies of OACDs to those of the commonly used central composite designs (CCDs) when there are a few missing observations and demonstrate that OACDs are more robust than the popular CCDs to missing observations in two scenarios. The first scenario assumes one observation is missing, either from one factorial point or from one additional point. The second scenario assumes two observations are missing, either from two factorial points, from two additional points, or from one factorial point and one additional point. Two real-world applications of OACDs pertaining to tuberculosis are provided: a 155-run OACD with nine drugs and a 50-run OACD with six drugs.
  • Robert Mee, University of Tennessee
    • Title: Two-level parallel flats designs
    • Abstract: Regular \(2^{n-p}\) designs are also known as single flat designs. Parallel flats designs (PFDs) consisting of three parallel flats (3-PFDs) are the most frequently utilized PFDs, due to their simple structure. Generalizing to \(f\)-PFDs with \(f>3\) is more challenging. This talk summarizes recent work on a general theory for the \(f\)-PFD for any \(f\geq 3\). We propose a method for obtaining the confounding frequency vectors for all nonequivalent \(f\)-PFDs, and for finding the least \(G\)-aberration (or highest D-efficiency) \(f\)-PFD constructed from any single flat. We also characterize the quaternary code design series as PFDs. Finally, we show how designs constructed by concatenating regular fractions from different families may also have a parallel flats structure. Examples are given throughout to illustrate the results.
  • Lin Wang, George Washington University
    • Title: Orthogonal subsampling for big data linear regression
    • Abstract: The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal subsampling (OSS) approach for big data with a focus on linear regression models. The approach is inspired by the fact that an orthogonal array of two levels provides the best experimental design for linear regression models in the sense that it minimizes the average variance of the estimated parameters and provides the best predictions. The merits of OSS are three-fold: (i) it is easy to implement and fast; (ii) it is suitable for distributed parallel computing and ensures the subsamples selected in different batches have no common data points; and (iii) it outperforms existing methods in minimizing the mean squared errors of the estimated parameters and maximizing the efficiencies of the selected subsamples. Theoretical results and extensive numerical results show that the OSS approach is superior to existing subsampling approaches. It is also more robust to the presence of interactions among covariates and, when they do exist, OSS provides more precise estimates of the interaction effects than existing methods. The advantages of OSS are also illustrated through analysis of real data.
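
The fact motivating OSS in the last talk, namely that a two-level orthogonal array minimizes the average variance of the least-squares estimates in a first-order model, can be checked numerically in a few lines. The snippet below is only an illustration of that property under -1/+1 coding, not of the OSS algorithm itself; the comparison design is an arbitrary non-orthogonal competitor of ours.

  import itertools
  import numpy as np

  # 2^3 full factorial in -1/+1 coding, plus an intercept column: an orthogonal array.
  runs = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
  X_orth = np.column_stack([np.ones(len(runs)), runs])

  # A non-orthogonal competitor of the same size: replace the last run by a copy of the first.
  X_bad = X_orth.copy()
  X_bad[-1] = X_bad[0]

  def avg_variance(X):
      """Average variance of the OLS coefficients (up to sigma^2): trace((X'X)^{-1}) / p."""
      p = X.shape[1]
      return np.trace(np.linalg.inv(X.T @ X)) / p

  print("orthogonal design :", avg_variance(X_orth))   # equals 1/n = 0.125 here
  print("non-orthogonal one:", avg_variance(X_bad))    # strictly larger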

2.4 <2021-10-13 Wed> Online experiments

12-1:30 PM Eastern Time

Organizer: David Steinberg, Tel Aviv University

  • Susan Murphy, Harvard University
    • Title: Micro-Randomized Trials & Online Decision-Making Algorithms
    • Abstract: A formidable challenge in designing sequential treatments in health is to determine when and in which context it is best to deliver treatments to individuals. Operationally, designing the sequential treatments involves the construction of decision rules that input the current context of an individual and output a recommended treatment. Micro-randomized experiments, in which each individual is randomized many times, can be used to provide data for constructing these decision rules. Further, there is much interest in personalization during the experiment, that is, in real time as the individual experiences sequences of treatment. Here we discuss our work in designing online "bandit" learning algorithms for use in personalizing mobile health interventions. Reinforcement learning provides an attractive suite of online learning methods for personalizing interventions in digital health.
  • Julie Beckley, Etsy
    • Title: Improving Internet Experiment Validity
    • Abstract: Large scale internet experiments are critical for making the best decisions for customers. While randomized controlled experiments are the cleanest way to measure treatment effects, ensuring experiments are behaving as expected is much harder. Based on my experience at Etsy and Netflix, I will go over several case studies of unexpected challenges and the methodological solutions we implemented to resolve them.
  • Weitao Duan, LinkedIn
    • Title: Experiment Velocity and Trustworthiness at LinkedIn
    • Abstract: Controlled experiments, or A/B tests, have been the gold standard for testing a product feature and making launch decisions. Many technology companies, such as Google, Facebook, LinkedIn, and Microsoft, have built large-scale in-house experimentation platforms and fully adopted A/B testing in their decision-making process. Despite the increasing engagement on the site, many of our experiments at LinkedIn suffer from low sample size and low experiment power. To be able to make inferences, we adopted a different randomization unit to greatly increase experiment power. In this talk, we will share the lessons we have learned from this. In the second half of the talk, we will discuss the topic of experimentation within an advertising marketplace. We will show that the typical A/B test could lead to severe bias and introduce the budget split design to remove the cannibalization bias.

2.5 <2021-10-19 Tue> Bayesian Optimization and Active Learning

11-12:30 PM Eastern Time

Organizer: Robert Gramacy, Virginia Tech

  • Nathan Wycoff, Georgetown University
    • Title: Learning And Deploying Active Subspaces On Black Box Simulators
    • Abstract: Surrogate modeling of computer experiments via local models, which induce sparsity by only considering short range interactions, can tackle huge analyses of complicated input-output relationships. However, narrowing focus to local scale means that global trends must be relearned over and over again. We first demonstrate how to use Gaussian processes to efficiently perform a global sensitivity analysis on an expensive black box simulator. We next propose a framework for incorporating information from this global sensitivity analysis into the surrogate model as an input rotation and rescaling preprocessing step. We further discuss applications to derivative free optimization via locally defined subspaces. Numerical experiments on observational data and benchmark test functions provide empirical validation.
  • Max Balandat, Facebook
    • Title: Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces
    • Abstract: The ability to optimize multiple competing objective functions with high sample efficiency is imperative in many applied problems across science and industry. Multi-objective Bayesian optimization (BO) achieves strong empirical performance on such problems, but even with recent methodological advances, it has been restricted to simple, low-dimensional domains. Most existing BO methods exhibit poor performance on search spaces with more than a few dozen parameters. In this work we propose MORBO, a method for multi-objective Bayesian optimization over high-dimensional search spaces. MORBO performs local Bayesian optimization within multiple trust regions simultaneously, allowing it to explore and identify diverse solutions even when the objective functions are difficult to model globally. We show that MORBO significantly advances the state-of-the-art in sample-efficiency for several high-dimensional synthetic and real-world multi-objective problems, including a vehicle design problem with 222 parameters, demonstrating that MORBO is a practical approach for challenging and important problems that were previously out of reach for BO methods.
  • Matthias Poloczek, Amazon
    • Title: Scalable High-dimensional Bayesian Optimization
    • Abstract: Bayesian optimization has become a powerful method for the sample-efficient optimization of expensive black-box functions. These functions do not have a closed form and are evaluated, for example, by running a complex simulation of a marketplace, by a physical experiment in the lab or in a market, or by a CFD simulation. Use cases arise in machine learning, e.g., when tuning the configuration of an ML model or when optimizing a reinforcement learning policy. Many of these applications are high-dimensional, i.e., the number of tunable parameters exceeds 20, and thus difficult for current approaches due to the curse of dimensionality and the heterogeneity of the underlying functions. Of particular interest are constrained settings, where we are looking for a solution that satisfies inequality constraints of the form c(x) <= 0 and is globally optimal for the objective function among all feasible solutions. These constrained problems are particularly challenging because the sets of feasible points are often small and non-convex. Due to the lack of sample-efficient methods, practitioners usually fall back to evolutionary strategies or heuristics.

      In this talk I will start with a brief introduction to Bayesian optimization and then present the trust region Bayesian optimization algorithm (TuRBO) that addresses the above challenges via a local surrogate and a suitable sampling strategy. Then we will turn our attention to optimization under expensive black-box constraints and introduce the scalable constrained Bayesian optimization algorithm (SCBO). I will show comprehensive experimental results that demonstrate that TuRBO and SCBO achieve excellent results and outperform the state-of-the-art methods.

      References:

      • A Tutorial on Bayesian Optimization
      • A Framework for Bayesian Optimization in Embedded Subspaces
      • Scalable Global Optimization via Local Bayesian Optimization
      • Scalable Constrained Bayesian Optimization
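
As background to this session, the sketch below shows the generic Bayesian optimization loop that methods such as TuRBO, SCBO, and MORBO build on: fit a Gaussian process surrogate to the evaluations collected so far, maximize an acquisition function (here expected improvement over a candidate grid) to choose the next point, and repeat. It is a bare-bones one-dimensional illustration using scikit-learn with a toy objective of ours, not an implementation of any of the methods presented.

  import numpy as np
  from scipy.stats import norm
  from sklearn.gaussian_process import GaussianProcessRegressor
  from sklearn.gaussian_process.kernels import Matern

  def objective(x):                       # toy "expensive" black box (1-d, minimized)
      return np.sin(3 * x) + 0.1 * x**2

  rng = np.random.default_rng(0)
  X = rng.uniform(-3, 3, size=(4, 1))     # small initial design
  y = objective(X).ravel()

  for it in range(15):
      gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True).fit(X, y)
      cand = np.linspace(-3, 3, 500).reshape(-1, 1)       # candidate grid
      mu, sd = gp.predict(cand, return_std=True)
      best = y.min()
      z = (best - mu) / np.maximum(sd, 1e-9)
      ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
      x_next = cand[np.argmax(ei)]                        # next evaluation point
      X = np.vstack([X, x_next])
      y = np.append(y, objective(x_next)[0])

  print("best point found:", X[np.argmin(y)].ravel(), "value:", y.min())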

2.6 <2021-10-20 Wed> Bayesian adaptive clinical trial designs: using uncertainty and information

12-1:30 PM Eastern Time

Organizer: Peter Muller, the University of Texas at Austin

  • Tianjian Zhou, Colorado State University
    • Title: Probability-of-Decision Designs to Accelerate Dose-Finding Trials
    • Abstract: Cohort-based enrollment can slow down phase I dose-finding trials since the outcomes of the previous cohort must be fully evaluated before the next cohort can be enrolled. This results in frequent suspension of patient enrollment. We propose a class of probability-of-decision (POD) designs to accelerate dose-finding trials, which enable dose assignments in real-time in the presence of pending toxicity outcomes. With uncertain outcomes, the dose assignment decisions are treated as random variables, and we calculate the posterior distribution of the decisions. The posterior distribution reflects the variability in the pending outcomes and allows a direct and intuitive evaluation of the confidence of all possible decisions. Optimal decisions are calculated based on the 0-1 loss, and extra safety rules are constructed to enforce sufficient protection from exposing patients to risky doses. A new and useful feature of POD designs is that they allow investigators and regulators to balance the trade-off between enrollment speed and making risky decisions by tuning a pair of intuitive design parameters. The performances of POD designs are evaluated through numerical studies.
  • Daniel Schwartz, University of Chicago
    • Title: Bayesian Uncertainty-Directed Designs with Model Averaging for Faster and More Informative Dose-Ranging Trials
    • Abstract: In this paper we make three contributions to the design and analysis of Phase 2b non-oncology dose-ranging trials, which are critical for drug developers to find the optimal dose to carry forward to Phase 3. First, we use a Bayesian "uncertainty-directed" design (Ventz et al. 2018) that adaptively randomizes patients to doses in a way that explicitly maximizes information about which dose is optimal. This typically means assigning new patients to doses that have been previously understudied relative to how strongly the data suggest they could be the optimal dose. Second, we efficiently and robustly incorporate pharmacological knowledge through Bayesian model averaging of parametric dose-response curves. And third, we provide very fast posterior computation for this Bayesian adaptive design using a Sequential Monte Carlo algorithm that makes it easier for trialists to conduct extensive simulation studies to reliably check Frequentist error. These practical designs show promise to accelerate Phase 2b trials and produce higher quality evidence before Phase 3.
  • Meizi Liu, University of Chicago
    • Title: PoD-BIN: A Probability of Decision Bayesian Interval Design for Time-to-Event Dose-Finding Trials with Multiple Toxicity Grades
    • Abstract: We consider a Bayesian framework based on “probability of decision” for dose-finding trial designs. The proposed PoD-BIN design evaluates the posterior predictive probabilities of up-and-down decisions. In PoD-BIN, multiple grades of toxicity, categorized as mild toxicity (MT) and dose-limiting toxicity (DLT), are modeled simultaneously, and the primary outcome of interest is time-to-toxicity for both MT and DLT. This allows the possibility of enrolling new patients when previously enrolled patients are still being followed for toxicity, thus potentially shortening the trial length. The Bayesian decision rules in PoD-BIN utilize the probability of decisions to balance the trade-off between the need to speed up the trial and the risk of exposing patients to overly toxic doses. We demonstrate via numerical examples the resulting trade-off between speed and safety of PoD-BIN and compare it to existing designs. PoD-BIN appears to be able to control the frequency of making risky decisions and, at the same time, shorten the trial duration in the simulation.
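
The “probability of decision” idea shared by the first and third talks can be illustrated with a small calculation: with pending toxicity outcomes, the up-and-down decision is a random variable whose distribution follows from the posterior predictive distribution of the pending outcomes. The interval boundaries, the Beta prior, and the numbers below are illustrative choices of ours, not the rules of the POD or PoD-BIN designs.

  from scipy.stats import betabinom

  def decision(n_dlt, n_total, lo=0.236, hi=0.358):
      """Illustrative interval rule: escalate / stay / de-escalate from the observed DLT rate."""
      phat = n_dlt / n_total
      return "escalate" if phat <= lo else ("de-escalate" if phat >= hi else "stay")

  def decision_probabilities(n_obs, x_obs, m_pending, a=1.0, b=1.0):
      """Posterior probability of each decision once the m pending outcomes resolve.

      Pending DLT counts follow a beta-binomial posterior predictive distribution
      with a Beta(a, b) prior updated by the x_obs DLTs observed in n_obs patients.
      """
      pred = betabinom(m_pending, a + x_obs, b + n_obs - x_obs)
      probs = {"escalate": 0.0, "stay": 0.0, "de-escalate": 0.0}
      for x_pend in range(m_pending + 1):
          d = decision(x_obs + x_pend, n_obs + m_pending)
          probs[d] += pred.pmf(x_pend)
      return probs

  # 9 patients fully observed with 2 DLTs, 3 more patients still being followed.
  print(decision_probabilities(n_obs=9, x_obs=2, m_pending=3))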

2.7 <2021-10-26 Tue> Analyzing Clinical Trials Disrupted by COVID

11-12:30 PM Eastern Time

Organizer: Nancy Flournoy, University of Missouri

  • Richard Emsley, King's College London.
    • Title: Frequentist and Bayesian approaches to rescuing disrupted trials
    • Abstract: There is a severe threat to the validity of clinical trials that were underway before the COVID-19 pandemic and potentially huge research waste. Many studies were paused and recruitment restarted without due consideration of whether all studies should restart or of the required changes to sample sizes. There is also a need for solutions to practical and statistical issues (e.g. increased missing data) that have arisen from the virus and its sequelae. This talk will present some of these challenges, and discuss frequentist and Bayesian approaches to rescuing disrupted trials. The contents are drawn from a report on this topic from the NISS Ingram Olkin Forum Series on Unplanned Clinical Trial Disruptions.
  • Kelly Van Lancker, Ghent University.
    • Title: Potential estimands and estimators for clinical trials impacted by COVID-19
    • Abstract: The COVID-19 pandemic continues to affect the conduct of clinical trials of medical products globally. Complications may arise from pandemic-related operational challenges such as site closure, travel limitations and interruptions to the supply chain for the investigational product, or from health-related challenges such as COVID-19 infected trial participants. Some of these complications lead to unforeseen intercurrent events in the sense that they affect either the interpretation or the existence of the measurements associated with the clinical question of interest. The ICH E9(R1) Addendum on estimands provides a rigorous basis to discuss potential pandemic-related trial disruptions and embed them in the context of study objectives and design elements. In this talk, we focus on the use of the hypothetical strategy and to a lesser extent the treatment-policy strategy to frame clinical questions in the presence of the unforeseen intercurrent events due to the COVID-19 pandemic. It should be noted that different hypothetical strategies could be considered and care has to be taken that the envisaged scenario, in which the intercurrent event would not occur, is precisely described. For their estimation, we will consider different causal inference and missing data methods such as multiple imputation and (augmented) inverse probability weighting. To clarify, we describe the features of a stylized trial in neuroscience, and how it may have been impacted by the pandemic. This stylized trial will then be re-visited by discussing the changes to the estimand and the estimator to account for pandemic disruptions.
  • Diane Uschner, George Washington University.
    • Title: Randomization tests to address disruptions in clinical trials
    • Abstract: In early 2020, the World Health Organization declared the novel coronavirus disease (COVID-19) a pandemic. On top of prompting various trials to study treatments and vaccines for COVID-19, the pandemic also had numerous consequences for ongoing clinical trials. People around the globe restricted their daily activities to minimize contagion, which led to missed visits and the cancellation or postponement of elective medical treatments. For some clinical indications, COVID-19 may lead to a change in the patient population or to treatment effect heterogeneity. We will measure the effect of the disruption on randomization tests and derive a methodological framework for randomization tests that allows for the assessment of clinical trial disruptions. We show that randomization tests are robust against clinical trial disruptions in certain scenarios, namely if the disruption can be considered an ancillary statistic to the treatment effect. As a consequence, randomization tests maintain type I error probability and power at their nominal levels.
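
The randomization test at the heart of the last talk can be sketched in a few lines: re-randomize the treatment labels according to the original allocation scheme and compare the observed difference in means with its re-randomization distribution. The sketch below assumes simple 1:1 complete randomization and simulated data of ours; assessing robustness to a trial disruption would amount to checking whether the disruption acts as an ancillary statistic under this reference distribution, which is the subject of the talk and is not shown here.

  import numpy as np

  def randomization_test(y, treat, n_perm=10_000, rng=None):
      """Two-sided randomization test for a difference in means under 1:1 complete randomization."""
      rng = np.random.default_rng(rng)
      observed = y[treat == 1].mean() - y[treat == 0].mean()
      count = 0
      for _ in range(n_perm):
          perm = rng.permutation(treat)            # re-randomize the labels
          stat = y[perm == 1].mean() - y[perm == 0].mean()
          count += abs(stat) >= abs(observed)
      return observed, count / n_perm              # estimate and randomization p-value

  rng = np.random.default_rng(42)
  treat = rng.permutation(np.repeat([0, 1], 30))   # 30 control, 30 treated
  y = 0.5 * treat + rng.normal(size=60)            # outcomes with a true effect of 0.5
  print(randomization_test(y, treat, rng=0))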

2.8 <2021-10-27 Wed> Bayesian and model-robust design

12-1:30 PM Eastern Time

Organizer: Dave Woods, University of Southampton

  • Tim Waite, University of Manchester, UK
    • Title: Minimax efficient random experimental design strategies with application to model-robust design for prediction
    • Abstract: Fisher stressed the importance of randomizing an experiment via random permutation of the allocation of treatments to experimental units; in an industrial context this usually amounts to randomizing the run order of the design. In this talk we take the idea of experimental randomization much further by introducing flexible new random design strategies in which the design to be applied is chosen at random from a distribution of possible designs. We discuss the philosophical justification for doing so from a game-theoretic perspective and it is shown that the new strategies give stronger bounds on both the expectation and survivor function of the loss distribution. The consequences of this approach are explored in several problems, including global prediction from a linear model contaminated by a discrepancy function from an L2-class. In this problem the performance improvement is dramatic: the new approach gives bounded expected loss, in contrast to previous designs for which the expected loss was unbounded.
  • Lida Mavrogonatou, University of Cambridge
    • Title: Optimal Bayesian experimental design for model selection through minimisation of f-divergences
    • Abstract: A systematic understanding of studied phenomena has been embraced in a range of scientific disciplines where a collection of components are studied as parts of a system rather than as isolated processes. As direct observation of the studied system is often not possible, observable information is collected through experiments and subsequently used for inference of unobservable components. Given a predefined budget, Bayesian optimal experimental design methods are often employed to identify the most useful (in terms of a targeted objective) experimental conditions while accounting for potential sources of uncertainty. Unfortunately, currently adopted methods fail to address challenges arising within a modern scientific framework, due to the increased computational complexity of models that can realistically capture the studied structures. In this talk, I will present an efficient estimation framework that is shown to overcome ongoing challenges through the use of variational approximation methods. The proposed approach is applicable to optimal experimental design problems for model selection. A suitable class of metrics that are used to quantify the benefit from each experimental condition (commonly known as utility functions) is established in which the benefit is expressed as an f-divergence between predictive distributions of the competing models.
  • Lulu Kang, Illinois Institute of Technology
    • Title: A Maximin Φp-Efficient Design for Multivariate Generalized Linear Models
    • Abstract: Experimental designs for a generalized linear model (GLM) often depend on the specification of the model, including the link function, the predictors, and unknown parameters, such as the regression coefficients. To deal with the uncertainties of these model specifications, it is important to construct optimal designs with high efficiency under such uncertainties. Existing methods such as Bayesian experimental designs often use prior distributions of model specifications to incorporate model uncertainties into the design criterion. Alternatively, one can obtain the design by optimizing the worst-case design efficiency with respect to the uncertainties of model specifications. In this work, we propose a new Maximin \(\Phi_p\)-Efficient (or Mm-\(\Phi_p\) for short) design which aims at maximizing the minimum \(\Phi_p\)-efficiency under model uncertainties. Based on the theoretical properties of the proposed criterion, we develop an efficient algorithm with sound convergence properties to construct the Mm-\(\Phi_p\) design. The performance of the proposed Mm-\(\Phi_p\) design is assessed through several numerical examples.
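
In the notation of the abstract above, the criterion can be written as follows; this is a sketch of our own, assuming Kiefer's \(\Phi_p\) criterion (smaller is better) and a locally optimal reference design for each parameter value:

  \[
    \xi^{*} = \arg\max_{\xi}\; \min_{\theta \in \Theta}\; \mathrm{Eff}_{\Phi_p}(\xi;\theta),
    \qquad
    \mathrm{Eff}_{\Phi_p}(\xi;\theta) = \frac{\Phi_p\{M(\xi^{*}_{\theta},\theta)\}}{\Phi_p\{M(\xi,\theta)\}},
  \]

  where \(M(\xi,\theta)\) is the information matrix of design \(\xi\) at \(\theta\), \(\Phi_p\{M\} = \{\mathrm{tr}(M^{-p})/v\}^{1/p}\) with \(v\) the number of parameters, and \(\xi^{*}_{\theta}\) is the locally \(\Phi_p\)-optimal design for \(\theta\).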

3 <2021-10-07 Thu> Roundtable Discussion

11AM - 12PM Eastern

Career in Academia

Career in Industry

Publishing Papers

4 <2021-10-14 Thu> JMP Poster Session

11AM - 12:30PM Eastern

Judges:

Student Best Poster Competition winners:

  1. Nicholas Alfredo Larsen, North Carolina State University, Department of Statistics, for the poster: HODOR: A two-stage Hold-Out Design for Online Randomized experiments
  2. Torsten Reuter, Otto von Guericke University Magdeburg, for the poster: Optimal Subsampling Design for Big Data Regression
  3. Mohammed Saif Ismail Hameed, KU Leuven, for the poster: A tailored analysis of data from OMARS designs
  4. (Honorable mention) Gautham Sunder, Carlson School of Management, for the poster: Hyperparameter Optimization of Deep Neural Networks with Application to Medical Device Manufacturing

Posters

Mario Becerra, KU Leuven
  • Abstract: Discrete choice experiments are frequently used to quantify consumer preferences by having respondents choose between different alternatives. Choice experiments involving mixtures of ingredients have been largely overlooked in the literature, even though many products and services can be described as mixtures of ingredients. As a consequence, little research has been done on the optimal design of choice experiments involving mixtures. The only existing research has focused on D-optimal designs, which means that an estimation-based approach was adopted. However, in experiments with mixtures, it is crucial to obtain models that yield precise predictions for any combination of ingredient proportions. This is because the goal of mixture experiments generally is to find the mixture that optimizes the respondents' utility. As a result, the I-optimality criterion is more suitable for designing choice experiments with mixtures than the D-optimality criterion because the I-optimality criterion focuses on getting precise predictions with the estimated statistical model. In this talk, I will review Bayesian I-optimal designs, compare them with their Bayesian D-optimal counterparts, and show that the former designs perform substantially better than the latter in terms of the variance of the predicted utility.
Alexandre Bohyn, KU Leuven
  • Abstract: A protocol for a bio-assay involves a substantial number of steps that may affect the end result. To identify the influential steps, screening experiments can be employed, with each step corresponding to a factor and different versions of the step corresponding to factor levels. The designs for such experiments usually include factors with two levels only. Adding a few four-level factors would allow inclusion of multi-level categorical factors or quantitative factors that may show quadratic or even higher-order effects. However, while a reliable investigation of the vast number of different factors requires designs with larger run sizes, catalogs of designs with both two-level and four-level factors are only available for up to 32 runs. In this presentation, we discuss the generation of such designs. We use the principles of extension (adding columns to an existing design to form candidate designs) and reduction (removing equivalent designs from the set of candidates). More specifically, we select three algorithms from the current literature for the generation of complete sets of two-level designs, adapt them to enumerate designs with both two-level and four-level factors, and compare the efficiency of the adapted algorithms for generating complete sets of non-equivalent designs. Finally, we use the most efficient method to generate a complete catalog of designs with both two-level and four-level factors for run sizes 32, 64, 128 and 256.
Carlos de la Calle-Arroyo, University of Castilla-La Mancha
  • Abstract: Vapor pressure is a temperature-dependent characteristic of pure liquids, and also of their mixtures. This thermodynamic property can be characterized through a wide range of models. Antoine's equation stands out among them for its simplicity and precision. Its parameters are estimated via maximum likelihood with experimental data. Once the parameters of the equation have been estimated, vapor pressures between known values of the curve can be interpolated. Other physical properties such as heat of vaporization can be predicted as well. The probability distribution of a physical phenomenon is often hard to know in advance, as it depends on the phenomenon itself as well as on the procedures used to carry out the experiments and the measurements. Hence, assuming a probability distribution for such events has to be done with caution, as it affects the Fisher Information Matrix and consequently the optimal designs. This work presents D-, Ds-, A- and I-optimal designs to estimate the unknown parameters of Antoine's equation as accurately as possible for homoscedastic and heteroscedastic normal distributions of the response, with the characteristic objectives of the different criteria. An online tool to calculate optimal designs for Antoine's equation for the criteria included in this work has been developed.
DUHAMEL, Université Grenoble Alpes, INRIA, IFPEN
  • Abstract: Nowadays, many model-inversion problems (where the model is, e.g., a calculation code) arise in industry. These problems consist of finding all sets of parameters such that a certain quantity of interest remains in a certain region, for example below a threshold. In the field of floating wind, for instance, a pre-calibration step consists in estimating model parameters that fit the measured data (e.g. accelerations) with a given accuracy.

    An effective way to solve this problem is to use Gaussian process meta-modeling (Kriging) with a sequential experimental design and an inversion-adapted enrichment criterion, such as the well-known Bichon criterion (also known as the Expected Feasibility Function) and the deviation number (denoted U). It is also possible to use a more elaborate class of criteria: the SUR (Stepwise Uncertainty Reduction) criteria, which, in addition to taking into account the evaluation points and the available model evaluations, quantify the uncertainty reduction that can be achieved by the addition of the new point.

    We propose here a SUR version of the Bichon criterion, with both theoretical aspects (explicit formulation of the criterion) and numerical aspects (implementation issues and comparisons with other criteria on classical test functions).

    The part on theoretical aspects therefore presents the proposed SUR strategy, defined from a measure of uncertainty related to the Bichon criterion (integral of the Bichon criterion on the design space), as well as an explicit formulation of the SUR Bichon criterion allowing an efficient implementation. The part on numerical aspects presents the first results concerning the performance associated with this new criterion, compared to other classic criteria, and on common test functions.

    Future prospects for this work include adapting this criterion to more complex data, such as functional uncertain input variables. In this particular framework, the design of experiments will have to be adapted.

Subhadra Dasgupta, IITB-Monash Research Academy
  • Abstract: This work is focused on finding the best possible retrospective designs for kriging models with two-dimensional inputs. Models with separable exponential covariance structures are studied. The retrospective designs are constructed by adding or deleting points from an already existing design. The best possible designs are found by minimizing the supremum of the mean squared prediction error. Deterministic algorithms are developed to find the best possible retrospective designs. We develop the notion of evenness of two-dimensional grid designs to compare them with each other, using the concept of majorization. For the case of addition of points, we develop two methods for finding the best possible design: one adds one point at a time and the other adds all the points simultaneously. For the case of deletion of points, we develop a method for deleting all points simultaneously. The results show that a more evenly spread design is the best possible design and is close to regularly spaced grid designs in terms of efficiency. To address scenarios where the covariance parameters are unknown, a pseudo-Bayesian technique is used to determine the best possible designs.

    Keywords: Kriging, G-optimality, grid designs, retrospective designs, regularly spaced grids, separable covariance, Ornstein-Uhlenbeck process

Nick Doudchenko, Google
  • Abstract: We investigate the optimal design of experimental studies that have pre-treatment outcome data available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. A number of commonly used approaches fit this formulation, including the difference-in-means estimator and a variety of synthetic-control techniques. We propose several different novel estimators and motivate the choice between them depending on the underlying assumptions the researcher is willing to make. Observing the NP-hardness of the problem, we introduce a mixed-integer programming formulation which selects both the treatment and control sets and unit weightings. We prove that these proposed estimators lead to qualitatively different experimental units being selected for treatment. We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in terms of the mean squared error of the estimates and statistical power when compared to simple and commonly used alternatives such as randomized trials.
Mohammed Saif Ismail Hameed, KU Leuven
  • Abstract: Experimental data are often highly structured due to the use of experimental designs. This not only simplifies the analysis, but also allows for tailored methods of analysis that extract more information from the data than generic methods. One group of experimental designs that are suitable for such methods are the orthogonal minimally aliased response surface (OMARS) designs (Núñez Ares and Goos, 2020), in which all main effects are orthogonal to each other and to all second-order effects. The design-based analysis method of Jones and Nachtsheim (2017) has shown significant improvement over existing methods in power to detect active effects. However, the application of their method is limited to only a small subgroup of OMARS designs that are commonly known as definitive screening designs (DSDs). In our work, we not only improve upon the Jones and Nachtsheim method for DSDs, but we also generalize their analysis framework to the entire family of OMARS designs. Using extensive simulations, we show that our customized method for analyzing data from OMARS designs is highly effective in selecting the true effects when compared to other modern (non-design-based) analysis methods, especially in cases where the true model is complex and involves many second-order effects.

    References:

    Jones, Bradley, and Christopher J. Nachtsheim. 2017. “Effective Design-Based Model Selection for Definitive Screening Designs.” Technometrics 59(3):319–29.

    Núñez Ares, José, and Peter Goos. 2020. “Enumeration and Multicriteria Selection of Orthogonal Minimally Aliased Response Surface Designs.” Technometrics 62(1):21–36.

Chaofan Huang, Georgia Institute of Technology
  • Abstract: Space-filling designs are important in computer experiments, as they are critical for building a cheap surrogate model that adequately approximates an expensive computer code. Many design construction techniques in the existing literature are only applicable to rectangular bounded spaces, but in real-world applications the input space can often be non-rectangular because of constraints on the input variables. One solution for generating designs in a constrained space is to first generate uniformly distributed samples in the feasible region, and then use them as the candidate set to construct the designs. Sequentially Constrained Monte Carlo (SCMC) is the state-of-the-art technique for candidate generation, but it still requires a large number of constraint evaluations, which is problematic especially when the constraints are expensive to evaluate. Thus, to reduce constraint evaluations and improve efficiency, we propose the Constrained Minimum Energy Design (CoMinED), which utilizes recent advances in deterministic sampling methods. Extensive simulation results on 15 benchmark problems with dimensions ranging from 2 to 13 are provided to demonstrate the improved performance of CoMinED over the existing methods.
Jeevan Jankar, University of Georgia
  • Abstract: With the help of Generalized Estimating Equations, we identify locally \(D\)-optimal crossover designs for generalized linear models. We adopt the variance of parameters of interest as the objective function, which is minimized using constrained optimization to obtain optimal crossover designs. In this case, the traditional general equivalence theorem could not be used directly to check the optimality of obtained designs. In this manuscript we derive a corresponding general equivalence theorem for crossover designs under generalized linear models.
Nicholas Alfredo Larsen, North Carolina State University, Department of Statistics
  • Abstract: A/B tests are standard tools for estimating the average treatment effect (ATE) in online controlled experiments (OCEs), and are key to how online businesses use data to improve products and services. The majority of OCE theory makes the Stable Unit Treatment Value Assumption, which presumes the response of individual users depends only on the assigned treatment, not the treatments of others. Violations of this assumption occur when users are subjected to network interference. Standard methods for estimating the ATE typically ignore this, producing heavily biased results that limit statistical analysts’ ability to improve product quality. Additionally, user covariates that are not observed, but influence both user response and network structure, also bias current ATE estimators. This fact has so far been almost completely overlooked in the network A/B testing literature. In this paper, we demonstrate that the network-influential lurking variables can heavily bias popular network clustering-based methods, thereby making them unreliable. To address this problem, we propose a two-stage design and estimation technique called HODOR: Hold-Out Design for Online Randomized experiments. The proposed method not only outperforms existing techniques, it provides reliable estimation even when the underlying network is unknown or uncertain.
JooChul Lee, University of Pennsylvania
  • Abstract: This paper proposes a nonuniform subsampling method for finite mixtures of regression models to reduce large data computational tasks. A general estimator based on a subsample is investigated, and its asymptotic normality is established. We assign optimal subsampling probabilities to data points that minimize the asymptotic mean squared errors of the general estimator and linearly transformed estimators. Since the proposed probabilities depend on unknown parameters, an implementable algorithm is developed. We first approximate the optimal subsampling probabilities using a pilot sample. After that, we select a subsample using the approximated subsampling probabilities and compute estimates using the subsample. We evaluate the proposed method in a simulation study and present a real data example using appliance energy data.
Abhyuday Mandal, University of Georgia
  • Abstract: A new type of experiment that targets finding the optimal quantities of a sequence of factors is drawing much attention in medical science, bio-engineering and many other disciplines. Such studies require simultaneous optimization of both the quantities and the sequence orders of several components, which are defined as a new type of factor: quantitative-sequence (QS) factors. Due to the large and semi-discrete solution spaces in such experiments, it is non-trivial to efficiently identify the optimal (or near-optimal) solutions using only a few experimental trials. To address this challenge, we propose a novel active learning approach, named QS-learning, to enable effective modeling and efficient optimization for experiments with QS factors. QS-learning consists of three parts: a novel mapping-based additive Gaussian process (MaGP) model, an efficient global optimization scheme (QS-EGO), and a new class of optimal designs (QS-design) for collecting initial data. Theoretical properties of the proposed method are investigated and techniques for optimization using analytical gradients are developed. The performance of the proposed method is demonstrated via a real drug experiment on lymphoma treatment and several simulation studies.
Parisa Parsamaram, Otto-von-Guericke University
  • Abstract: In the present work we want to determine optimum designs in the situation of ordinal outcomes with individual subject effects. To describe this situation we use a mixed ordinal regression model where, on the individual level, cumulative ordinal response is assumed based on a logit or probit link. To measure the quality of the design, usually the Fisher information matrix is used. However, in the case of mixed ordinal regression models, there is no closed form of the marginal likelihood and, hence, no closed form of the Fisher information. To avoid this problem, we consider the quasi Fisher information related to the concept of quasi-likelihood estimation. For the quasi Fisher information matrix only the first and second order moments of the model equations are needed which is much simpler than the full likelihood. But even these moments are not readily accessible because of the missing closed form of the corresponding integrals. To solve this, we propose two new concurring approximations for the quasi Fisher information which both show a quite similar performance. Based on these approximations, D-optimum designs are calculated for the specific case of a mixed binary regression model. These results can readily be extended to more complicated model situations.
Sergio Pozuelo-Campos, University of Castilla-La Mancha
  • Abstract: Toxicological tests are widely used to study toxicity in aquatic environments. Reproduction is a possible endpoint of this type of experiment, and in this case the response variable is a count. There is literature on which probability distribution should be considered suitable for analysing these data. In the theory of optimal experimental design, the assumption of this probability distribution is essential, and when this assumption is not adequate, there may be a loss of efficiency in the design obtained. The main objective of this work is to propose robust designs when there is uncertainty about the probability distribution of the response variable. The results have been applied to toxicological tests based on Ceriodaphnia dubia and Lemna minor; in addition, a simulation study is performed to test the properties of the designs obtained.
Torsten Reuter, Otto von Guericke University Magdeburg
  • Abstract: Data reduction is a fundamental challenge of modern technology, where classical statistical methods are not applicable because of computational limitations. We consider a general linear model for an extraordinarily large amount of observations, but only a few covariates. Subsampling aims at the selection of a given percentage of the existing original data. Under distributional assumptions on the covariates, we derive subsampling designs for various settings of the linear model, which are based on the design criterion of D-optimality and study their theoretical properties. We make use of fundamental concepts of optimal design theory and an equivalence theorem from convex optimization. The thus obtained subsampling designs provide simple rules on whether to accept or to reject a data point and therefore allow for an easy algorithmic implementation.
Mitchell Aaron Schepps, UCLA
  • Abstract: When there are a few candidate designs for implementation in pharmacometrics, a common method to select the design is to adopt a model-based approach and determine the design with the best value of a pre-selected design criterion among the candidate designs. The design criterion is formulated as a scalar function of the Fisher information matrix, which can be challenging to evaluate for non-linear mixed effects models. We propose using nature-inspired metaheuristic algorithms to search for efficient model-based designs with a user-selected number of time points to optimize the design criterion. We discuss the use of metaheuristics as a general-purpose optimization tool and apply it to design efficient longitudinal studies for bipolar patients with and without a genetic covariate and treated with lithium.
Yao Shi, Arizona State University
  • Abstract: While generalized linear mixed models are useful, optimal design questions for such models are challenging due to the complexity of the information matrices. For longitudinal data, after considering three approximations of the information matrices, we propose an approximation based on the penalized quasi-likelihood method. As an illustration, optimal designs are derived for a study on self-reported disability in older women. We also study the robustness of these optimal designs to misspecification of the covariance matrix for random effects.
Gautham Sunder, Carlson School of Management
  • Abstract: The prediction performance of Deep Neural Networks (DNNs) is highly sensitive to the choice of hyperparameters. Hyperparameter optimization (HPO), the process of identifying the optimal hyperparameter values that maximize the model performance, is a critical step in training DNNs. Typically, Bayesian Optimization (BO), a class of Response Surface Optimization (RSO) methods for optimizing nonlinear functions, is a commonly adopted strategy for HPO. In this study, we empirically illustrate that the validation loss in HPO problems, in some cases, can be well-approximated by a second-order polynomial function. When this is the case, Classical RSO (C-RSO) methods are demonstrably more efficient in estimating the optimal response when compared with BO, especially under constraints on run size. In this study we propose Compound-RSO, a three-staged batch sequential RSO strategy for optimizing continuous experimental factors. The proposed Compound-RSO strategy estimates the complexity of the response function and appropriately chooses between C-RSO and BO. For estimating the complexity of the unknown response surface, we propose a robust design which is supersaturated for the full polynomial model. Additionally, when the second-order approximation is adequate, we propose Adaptive-RSO, an adaptive experimentation strategy for optimizing the second-order response surface. In our simulation studies on test functions of varying complexity and noise levels, we illustrate that the Compound-RSO strategy is more efficient than BO when the true response function is second-order and performs comparably to BO when the true response function is complex. A case study on HPO of DNNs for quality inspection at a medical device manufacturer is used to illustrate the usefulness of the proposed Compound-RSO strategy in a business application.
Hongzhi Wang, University of Georgia
  • Abstract: Design of experiments plays important roles in all fields of modern science and engineering. Efficient designs are to be used in order to extract maximum information from the data. However, identifying optimal designs is not necessarily easy for complex real-life applications, which are becoming increasingly common in practice. Theoretical results are not widely available for such applications, and those that are available only exist for special cases. Several optimization algorithms are used to identify optimal designs, while each algorithm usually targets one design type only. Here we propose a new nature-inspired evolutionary optimization algorithm which works efficiently on several different types of design problems. Simulation studies establish its superiority over different competing algorithms, in terms of both precision and CPU time.
Jing Wang, University of Connecticut
  • Abstract: Subsampling is a practical approach to extracting information from massive data. However, when responses are expensive to measure, developing subsampling schemes is challenging. The estimation efficiency of the existing method under this scenario can be improved, because it uses a reweighted estimator. We propose an unweighted estimator that is more efficient. Asymptotic results obtained via martingale techniques and numerical experiments verify the better performance of our method.
Yaqiong Yao, Department of Biostatistics, Columbia University
  • Abstract: A prevailing method to alleviate the computational cost is to perform analysis on a subsample of the full data. Optimal subsampling algorithm utilizes non-uniform subsampling probabilities, derived through minimizing the asymptotic mean squared error of the subsample estimator, to acquire a higher estimation efficiency for a given subsample size. The optimal subsampling probabilities for softmax regression have been studied under the baseline constraint which treats one dimension of the multivariate response differently from other dimensions. Here, we construct optimal subsampling probabilities for summation constraint where all dimensions are handled equally. For parameter estimation, these two model constraints give the same mean responses and only lead to different interpretations of the parameter, so they always produce the same conclusions. For selecting subsamples, however, we show that they lead to different optimal subsampling probabilities and thus produce different results. The summation constraint corresponds to a better subsampling strategy. Furthermore, we derive the asymptotic distribution of the mean squared prediction error, and minimize its asymptotic mean to define the optimal subsampling probabilities that are invariant to model constraints. Simulations and a real data example are provided to show the effectiveness of the proposed optimal subsampling probabilities.
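
Several of the subsampling posters above (Lee, Wang, Yao) refine the same baseline: draw a subsample with non-uniform probabilities and correct for the unequal inclusion probabilities when estimating. The sketch below shows that baseline for ordinary linear regression with an inverse-probability-weighted least-squares estimator; the crude row-norm scores stand in for the optimal probabilities derived in the posters, and the simulated data are ours.

  import numpy as np

  rng = np.random.default_rng(7)
  n, p, k = 100_000, 5, 2_000
  X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
  beta = np.arange(1, p + 1, dtype=float)
  y = X @ beta + rng.normal(size=n)

  # Subsampling probabilities proportional to a crude influence score (here, row norms);
  # the posters derive probabilities that minimize the asymptotic MSE instead.
  score = np.linalg.norm(X, axis=1)
  prob = score / score.sum()
  idx = rng.choice(n, size=k, replace=True, p=prob)

  # Weighted least squares with weights 1/(k * prob): an (approximately) unbiased
  # inverse-probability-weighted estimator based on the subsample.
  w = 1.0 / (k * prob[idx])
  Xs, ys = X[idx], y[idx]
  XtWX = Xs.T @ (w[:, None] * Xs)
  XtWy = Xs.T @ (w * ys)
  beta_hat = np.linalg.solve(XtWX, XtWy)
  print("true beta     :", beta)
  print("subsample beta:", beta_hat.round(3))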

5 <2021-10-21 Thu> Panel Discussion

11 AM - 12 PM Eastern Time

Moderator:

Panelists:

6 Individual Sponsors

  • Angela Dean, The Ohio State University
  • Xinwei Deng, Department of Statistics, Virginia Tech
  • Weitao Duan, LinkedIn Corporation
  • Nancy Flournoy, University of Missouri
  • Fritjof Freise, University of Veterinary Medicine Hannover
  • Robert Gramacy, Virginia Tech
  • Jessica Jaynes, California State University Fullerton
  • Roshan Joseph, Georgia Institute of Technology
  • Lulu Kang, Illinois Institute of Technology
  • Adam Lane, Cincinnati Children's Hospital Medical Center
  • Ryan Lekivetz, SAS / JMP
  • William Li, Shanghai Advanced Institute of Finance, Shanghai Jiao Tong University
  • Jesús Lopez-Fidalgo, University of Navarra
  • Dibyen Majumdar, University of Illinois at Chicago
  • Abhyuday Mandal, University of Georgia
  • Caterina May, Università del Piemonte Orientale
  • JP Morgan, Virginia Tech
  • Max Morris, Iowa State University
  • Werner Mueller, JKU Linz
  • Kalliopi Mylona, King's College London
  • Haojun Ouyang, AVROBIO
  • Frederick Kin Hing Phoa, Academia Sinica
  • Rainer Schwabe, Otto-von-Guericke University Magdeburg
  • Jonathan Stallrich, North Carolina State University
  • David Steinberg, Tel Aviv University
  • John Stufken, University of North Carolina Greensboro
  • HaiYing Wang, University of Connecticut
  • Min Yang, University of Illinois at Chicago

7 Participants

  • Adetola Adedamola Adediran, University of Southampton
  • Shroug Alzahrani, University of Southampton
  • Gabriel Olusola Adebayo, University of Ilorin, Ilorin, Nigeria
  • Sasanka Adikari, Old Dominion University
  • Rachael Caelie Aikens, Stanford University
  • Yasmeen S. Akhtar, Birla Institute of Technology and Science, Pilani – Goa Campus, India.
  • Jose Nunez Ares, KU Leuven
  • Oluchukwu C Asogwa, Alex Ekwueme Federal University Ndufu Alike Ikwo
  • Alex Atayev, Student at Georgia Tech
  • Kupolusi Joseph Ayodele, Federal University of Technology Akure, Nigeria
  • Max Balandat, Facebook
  • Mario Becerra, KU Leuven
  • Julie (Novak) Beckley, Etsy
  • Derek Bingham, Simon Fraser University
  • Alexandre Bohyn, KU Leuven
  • Carlos de la Calle-Arroyo, University of Castilla-La Mancha
  • Henry Chacon, PhD student
  • Ming-Chung Chang, Academia Sinica
  • Yu-Wei Chen, National Tsing Hua University
  • Alvaro Cia, University of Navarre
  • DUHAMEL, Université Grenoble Alpes, INRIA, IFPEN
  • Subhadra Dasgupta, IITB-Monash Research Academy
  • Angela Dean, The Ohio State University
  • Xinwei Deng, Department of Statistics, Virginia Tech
  • Chris Dong, UCLA
  • Nick Doudchenko, Google
  • Weitao Duan, LinkedIn Corporation
  • Xinyuan Duan, University of Connecticut
  • Olga Egorova, King's College London
  • Hamel Elhadj, Hassiba Ben Bouali University of Chlef, Algeria
  • Richard Emsley, King's College London, UK
  • Nancy Flournoy, University of Missouri
  • Fritjof Freise, University of Veterinary Medicine Hannover
  • Rosamarie Frieri, University of Bologna
  • Robert Gramacy, Virginia Tech
  • Suman Guha, Assistant Professor, Presidency University
  • Irene García-Camacha Gutiérrez, University of Castilla-La Mancha
  • Mohammed Saif Ismail Hameed, KU Leuven
  • Chao-hui Huang, National Tsing Hua University, Institute of Statistics
  • Chaofan Huang, Georgia Institute of Technology
  • Jiangeng Huang, Genentech, Inc.
  • Jing-Wen Huang, National Tsing Hua University, Taiwan
  • Ying Hung, Rutgers University
  • Samuel Jackson, Durham University
  • Omri Jan, TAU
  • Jeevan Jankar, University of Georgia
  • Jessica Jaynes, California State University Fullerton
  • Bradley Jones, SAS Institute
  • Roshan Joseph, Georgia Institute of Technology
  • Lulu Kang, Illinois Institute of Technology
  • Allon Korem, Tel Aviv University
  • Vasiliki Koutra, King's College London
  • Nilesh Kumar, Department of Statistics, University of Delhi, Delhi
  • Kelly Van Lancker, Johns Hopkins University, Bloomberg School of Public Health, US
  • Adam Lane, Cincinnati Children's Hospital Medical Center
  • Nicholas Alfredo Larsen, North Carolina State University, Department of Statistics
  • JooChul Lee, University of Pennsylvania
  • Ryan Lekivetz, SAS / JMP
  • William Li, Shanghai Advanced Institute of Finance
  • Dennis K.J. Lin, Purdue University
  • Meizi Liu, University of Chicago
  • Yanxi Liu, University of Illinois at Chicago
  • Jesús Lopez-Fidalgo, University of Navarra
  • Jose Toledo Luna, UCLA
  • Dibyen Majumdar, University of Illinois at Chicago
  • Abhyuday Mandal, University of Georgia
  • Mart Andrew Maravillas, Georgia Institute of Technology
  • Lida Mavrogonatou, University of Cambridge
  • Caterina May, Università del Piemonte Orientale
  • Robert Mee, University of Tennessee
  • Hendriico Merila, University of Southampton
  • Luca Merlo, Sapienza University of Rome
  • Damianos Michaelides, University of Southampton
  • Zefang Min, University of Connecticut
  • JP Morgan, Virginia Tech
  • Max Morris, Iowa State University
  • Werner Mueller, JKU Linz
  • Susan Murphy, Harvard University
  • Kalliopi Mylona, King's College London
  • Theodora Nearchou, University of Southampton
  • Jordania Furtado de Oliveira, Universidade Federal de Pernambuco
  • Winnie Onsongo, University of Ghana
  • Haojun Ouyang, AVROBIO
  • Soyun Park, University at Buffalo
  • Parmod, MDU, Rohtak, India
  • Parisa Parsamaram, Otto-von-Guericke University
  • Dipika Patra, West Bengal State University
  • Frederick Kin Hing Phoa, Academia Sinica
  • Matthias Poloczek, Amazon
  • Jean Pouget-Abadie, Google
  • Sergio Pozuelo-Campos, University of Castilla-La Mancha
  • Peter Rankel, University of Maryland
  • David Refaeli, Tel Aviv University - Master in Statistics
  • Joseph Resch, University of California - Los Angeles
  • Torsten Reuter, Otto von Guericke University Magdeburg
  • Emma Rowlinson, The University of Manchester
  • Gokul Satish, Student
  • Mitchell Aaron Schepps, UCLA
  • Rainer Schwabe, Otto-von-Guericke University Magdeburg
  • Daniel Schwartz, University of Chicago
  • Rashmi Sharma, University of Delhi, Delhi
  • Chenlu Shi, University of California, Los Angeles
  • Yao Shi, Arizona State University
  • Rakhi Singh, UNC Greensboro
  • Difan Song, Georgia Institute of Technology
  • Jonathan Stallrich, North Carolina State University
  • David Steinberg, Tel Aviv University
  • Zack Stokes, UCLA/Amazon
  • John Stufken, University of North Carolina Greensboro
  • Cheng-Yu Sun, National Tsing Hua University
  • Gautham Sunder, Carlson School of Management
  • Mia Tackney, London School of Hygiene and Tropical Medicine
  • Yike Tang, University of Illinois at Chicago
  • Ye Tian, University of California, Los Angeles
  • Natee Ting, Boehringer-Ingelheim Pharmaceuticals Inc.
  • Carlos Alejandro Diaz Tufinio, Tecnologico de Monterrey
  • Diane Uschner, George Washington University
  • Alan Vazquez, UCLA
  • Nha Vo-Thanh, University of Hohenheim
  • Tim Waite, University of Manchester
  • HaiYing Wang, University of Connecticut
  • Hongzhi Wang, University of Georgia
  • Jing Wang, University of Connecticut
  • Lin Wang, George Washington University
  • Ziyang Wang, University of Connecticut
  • Yanran Wei, Virginia Tech
  • Katherine Wellington, UMass Amherst
  • Lauren Rose Wilkes, University Of Georgia
  • Weng Kee Wong, University of California, Los Angeles
  • Nathan Wycoff, Georgetown University
  • Qian Xiao, University of Georgia
  • Hongquan Xu, University of California, Los Angeles
  • Yi-Hua Liao, Institute of Statistics, National Tsing Hua University, Hsinchu City, Taiwan
  • Yuhao Yin, UCLA
  • Ching-Chi Yang, University of Memphis
  • Min Yang, University of Illinois at Chicago
  • Xin Yang, University of Connecticut
  • Yaqiong Yao, Department of Biostatistics, Columbia University
  • Kade Young, North Carolina State University
  • Yi Zhang, George Washington University
  • Boya Zhang, Lawrence Livermore National Laboratory
  • Xueru Zhang, University of Tennessee
  • Tianjian Zhou, Colorado State University
  • Xiner Zhou, UC Davis Statistics
  • Yachen Zhu, University of California, Irvine
  • Zhaihui Li, Georgia Institute of Technology
  • Chunyan Wang, Purdue University
  • Kelly Yuan, University of Missouri
  • Wenlin Yuan, University of Connecticut
  • Fan Zhang, Arizona State University
  • Muzi Zhang, Penn State University