Wang, HaiYing   王海鹰

Department of Statistics
University of Connecticut
W319 Philip E. Austin Building
215 Glenbrook Rd. U-4120
Storrs, CT 06269-4120
Phone: (860) 486-6142
Email: \(\textrm{W}\)@\(\textrm{H}\).\(\textrm{Y}\),
  where \(\begin{cases} \textrm{W} &= \textrm{haiying.wang}\\ \textrm{H} &= \textrm{uconn}\\ \textrm{Y} &= \textrm{edu} \end{cases}\)

About Me

Research Interests

  • Incomplete data analysis
  • Model selection and model averaging
  • Nonparametric and semi-parametric regression
  • Optimum experimental design
  • Sub-sample methods for big data

Publications

  1. Zhang, H., Zuo, L., Wang, H., & Sun, L. (2023). Approximating partial likelihood estimators via optimal subsampling. Journal of Computational and Graphical Statistics, accepted. pdf
  2. Wu, C. O., Chen, M.-H., Xie, M.-G., Wang, H., & Wu, J. (2023). Inaugural editorial: Can we achieve our mission: Fast, accessible, cutting-edge, and top-quality? The New England Journal of Statistics in Data Science, 1(1), 1–3. https://doi.org/10.51387/23-NEJSDS11EDI pdf
  3. Yu, J., Liu, J., & Wang, H. (2023). Information-based optimal subdata selection for non-linear models. Statistical Papers, accepted. https://doi.org/10.1007/s00362-023-01430-3 pdf
  4. Wang, Z., Wang, H., & Ravishanker, N. (2023). Subsampling in longitudinal models. Methodology and Computing in Applied Probability, 25(1), 35. https://doi.org/10.1007/s11009-023-10015-4 pdf
  5. Kim, J. K., & Wang, H. (2023). A note on weight smoothing in survey sampling. Survey Methodology, accepted. pdf
  6. Yao, Y., Zou, J., & Wang, H. (2023). Model constraints independent optimal subsampling probabilities for softmax regression. Journal of Statistical Planning and Inference, 225, 188–201. https://doi.org/10.1016/j.jspi.2022.12.004 pdf
  7. Wang, H. (2022). A note on centering in subsample selection for linear regression. Stat, 11(1), e525. https://doi.org/10.1002/sta4.525 pdf
  8. Wang, H., & Kim, J. K. (2022). Maximum sampled conditional likelihood for informative subsampling. Journal of Machine Learning Research, 23(332), 1–50. http://jmlr.org/papers/v23/21-0506.html pdf
  9. Yang, Z., Wang, H., & Yan, J. (2022). Optimal subsampling for parametric accelerated failure time models with massive survival data. Statistics in Medicine, 41(27), 5421–5431. https://doi.org/10.1002/sim.9576 pdf
  10. Zhu, R., Wang, H., Zhang, X., & Liang, H. (2022). A scalable frequentist model averaging method. Journal of Business & Economic Statistics, advance online publication. https://doi.org/10.1080/07350015.2022.2116442 pdf
  11. Wang, J., Wang, H., & Xiong, S. (2022). Unweighted estimation based on optimal sample under measurement constraints. Canadian Journal of Statistics, advance online publication. https://doi.org/10.1002/cjs.11753 pdf
  12. Lee, J., Schifano, E., & Wang, H. (2023). Sampling-based Gaussian mixture regression for big data. Journal of Data Science, 21(1), 158–172. https://doi.org/10.6339/22-JDS1057 pdf
  13. Jakositz, S., Ghasemi, R., McGreavy, B., Wang, H., Greenwood, S., & Mo, W. (2022). Tap water lead monitoring through citizen science: The influence of socioeconomics and participation on environmental literacy, behavior, and communication. Journal of Environmental Engineering, 148(10), 04022060. https://doi.org/10.1061/(ASCE)EE.1943-7870.0002055
  14. Wang, F., Wang, H., & Yan, J. (2023). Diagnostic tests for the necessity of weight in regression with survey data. International Statistical Review, 91(1), 55–71. https://doi.org/10.1111/insr.12509 pdf
  15. Wang, J., Zou, J., & Wang, H. (2022). Sampling with replacement vs Poisson sampling: A comparative study in optimal subsampling. IEEE Transactions on Information Theory, 68(10), 6605–6630. https://doi.org/10.1109/TIT.2022.3176955 pdf; code
  16. Yu, J., & Wang, H. (2022). Subdata selection algorithm for linear model discrimination. Statistical Papers, 63(6), 1883–1906. https://doi.org/10.1007/s00362-022-01299-8 pdf
  17. Wang, H., Zhang, A., & Wang, C. (2021). Nonuniform negative sampling and log odds correction with rare events data. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). pdf; code
  18. Yao, Y., Zou, J., & Wang, H. (2021). Optimal Poisson subsampling for softmax regression. Journal of Systems Science and Complexity, accepted. pdf
  19. Wang, H., Zhang, D., Liang, H., & Ruppert, D. (2021). Iterative likelihood: A unified inference tool. Journal of Computational and Graphical Statistics, 30(4), 920–933. https://doi.org/10.1080/10618600.2021.1904961 pdf
  20. Lee, J., Schifano, E., & Wang, H. (2021). Fast optimal subsampling probability approximation for generalized linear models. Econometrics and Statistics. https://doi.org/10.1016/j.ecosta.2021.02.007 pdf
  21. Wang, H., & Zou, J. (2021). A comparative study on sampling with replacement vs Poisson sampling in optimal subsampling. In A. Banerjee & K. Fukumizu (Eds.), Proceedings of the 24th international conference on artificial intelligence and statistics (Vol. 130, pp. 289–297). PMLR. http://proceedings.mlr.press/v130/wang21a.html pdf
  22. Zuo, L., Zhang, H., Wang, H., & Sun, L. (2021). Optimal subsample selection for massive logistic regression with distributed data. Computational Statistics, 36(4), 2535–2562. https://doi.org/10.1007/s00180-021-01089-0 pdf
  23. Bar, H., & Wang, H. (2021). Reproducible science with LaTeX. Journal of Data Science, 19(1), 111–125. pdf; code
  24. Yao, Y., & Wang, H. (2021). A review on optimal subsampling methods for massive datasets. Journal of Data Science, 19(1), 151–172. pdf
  25. Zuo, L., Zhang, H., Wang, H., & Liu, L. (2021). Sampling-based estimation for massive survival data with additive hazards model. Statistics in Medicine, 40(2), 441–450. pdf
  26. Zhang, H., & Wang, H. (2021). Distributed subdata selection for big data via sampling-based approach. Computational Statistics & Data Analysis, 153, 107072. https://doi.org/10.1016/j.csda.2020.107072 pdf
  27. Pronzato, L., & Wang, H. (2021). Sequential online subsampling for thinning experimental designs. Journal of Statistical Planning and Inference, 212, 169–193. https://doi.org/10.1016/j.jspi.2020.08.001 pdf
  28. Yao, Y., & Wang, H. (2021). A selective review on statistical techniques for big data. In Y. Zhao & D.-G. Chen (Eds.), Modern statistical methods for health research (pp. 223–245). Springer International Publishing. https://doi.org/10.1007/978-3-030-72437-5_11 pdf
  29. Wang, H. (2020). Logistic regression for massive data with rare events. In H. Daumé III & A. Singh (Eds.), Proceedings of the 37th international conference on machine learning (Vol. 119, pp. 9829–9836). PMLR. http://proceedings.mlr.press/v119/wang20a.html pdf; code
  30. Yu, J., Wang, H., Ai, M., & Zhang, H. (2022). Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. Journal of the American Statistical Association, 117(537), 265–276. https://doi.org/10.1080/01621459.2020.1773832 pdf
  31. Cheng, Q., Wang, H., & Yang, M. (2020). Information-based optimal subdata selection for big data logistic regression. Journal of Statistical Planning and Inference, 209, 112–122. https://doi.org/10.1016/j.jspi.2020.03.004 pdf
  32. Lee, J., Wang, H., & Schifano, E. D. (2020). Online updating method to correct for measurement error in big data streams. Computational Statistics & Data Analysis, 149, 106976. pdf
  33. Hu, G., & Wang, H. (2021). Most likely optimal subsampled Markov chain Monte Carlo. Journal of Systems Science and Complexity, 34(3), 1121–1134. pdf
  34. Wang, H., & Ma, Y. (2021). Optimal subsampling for quantile regression in big data. Biometrika, 108(1), 99–112. https://doi.org/10.1093/biomet/asaa043 pdf; code
  35. Zhou, Y., Qiu, L., Wang, H., & Chen, X. (2020). Induction of activity synchronization among primed hippocampal neurons out of random dynamics is key for trace memory formation and retrieval. The FASEB Journal, 34(3), 3658–3676. https://doi.org/10.1096/fj.201902274R
  36. Wang, H. (2019). More efficient estimation for logistic regression with optimal subsamples. Journal of Machine Learning Research, 20(132), 1–59. pdf; code
  37. Xue, Y., Wang, H., Yan, J., & Schifano, E. D. (2020). An online updating approach for testing the proportional hazards assumption with streams of survival data. Biometrics, 76(1), 171–182. https://doi.org/10.1111/biom.13137 pdf
  38. Wang, H. (2019). Divide-and-conquer information-based optimal subdata selection algorithm. Journal of Statistical Theory and Practice, 13(3), 46. https://doi.org/10.1007/s42519-019-0048-5 pdf; code
  39. Ai, M., Yu, J., Zhang, H., & Wang, H. (2021). Optimal subsampling algorithms for big data regressions. Statistica Sinica, 31(2), 749–772. https://doi.org/10.5705/ss.202018.0439 pdf
  40. Zhou, Y., Qiu, L., Sterpka, A., Wang, H., Chu, F., & Chen, X. (2019). Comparative phosphoproteomic profiling of type III adenylyl cyclase knockout and control, male, and female mice. Frontiers in Cellular Neuroscience, 13, 34.
  41. Yao, Y., & Wang, H. (2019). Optimal subsampling for softmax regression. Statistical Papers, 60(2), 235–249. pdf
  42. Wang, H., Yang, M., & Stufken, J. (2019). Information-based optimal subdata selection for big data linear regression. Journal of the American Statistical Association, 114(525), 393–405. pdf; R Package; R code
  43. Wang, H., Zhu, R., & Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522), 829–844. pdf; R Package; R code
  44. Stang, S., Wang, H., Gardner, K. H., & Mo, W. (2018). Influences of water quality and climate on the water-energy nexus: A spatial comparison of two water systems. Journal of Environmental Management, 218, 613–621.
  45. Zhang, X., Wang, H., Ma, Y., & Carroll, R. J. (2017). Linear model selection when covariates contain errors. Journal of the American Statistical Association, 112(520), 1553–1561. pdf; Supplementary
  46. Li, Y., He, X., Wang, H., & Sun, J. (2016). Joint analysis of longitudinal data and informative observation times with time-dependent random effects. In New developments in statistical modeling, inference and application (pp. 37–51). Springer. pdf
  47. Lane, A., Wang, H., & Flournoy, N. (2016). Conditional inference in two-stage adaptive experiments via the bootstrap. In mODa 11 - Advances in model-oriented design and analysis (pp. 173–181). Springer. pdf
  48. Mo, W., Wang, H., & Jacobs, J. M. (2016). Understanding the influence of climate change on the embodied energy of water supply. Water Research, 95, 220–229.
  49. Li, Y., He, X., Wang, H., & Sun, J. (2016). Regression analysis of longitudinal data with correlated censoring and observation times. Lifetime Data Analysis, 22(3), 343–362. pdf
  50. Wang, H., Chen, X., & Flournoy, N. (2016). The focused information criterion for varying-coefficient partially linear measurement error models. Statistical Papers, 57(1), 99–113. pdf
  51. Li, Y., He, X., Wang, H., Zhang, B., & Sun, J. (2015). Semiparametric regression of multivariate panel count data with informative observation times. Journal of Multivariate Analysis, 140, 209–219. pdf
  52. Wang, H., Schaeben, H., & Keidel, F. (2015). Optimized subsampling for logistic regression with imbalanced large datasets. Proceedings of the 17th Annual Conference of the International Association for Mathematical Geosciences, 1113–1119.
  53. Wang, H., & Flournoy, N. (2015). On the consistency of the maximum likelihood estimator for the three parameter lognormal distribution. Statistics & Probability Letters, 105, 57–64. pdf
  54. Wang, H., Li, Y., & Sun, J. (2015). Focused and model average estimation for regression analysis of panel count data. Scandinavian Journal of Statistics, 42(3), 732–745. pdf
  55. Wang, H., Flournoy, N., & Kpamegan, E. (2014). A new bounded log-linear regression model. Metrika, 77(5), 695–720. pdf
  56. Wang, H., & Zhou, S. Z. (2013). Interval estimation by frequentist model averaging. Communications in Statistics - Theory and Methods, 42(23), 4342–4356. pdf
  57. Wang, H., Zou, G., & Wan, A. T. (2013). Adaptive lasso for varying-coefficient partially linear measurement error models. Journal of Statistical Planning and Inference, 143(1), 40–54. pdf
  58. Wang, H., Pepelyshev, A., & Flournoy, N. (2013). Optimal design for the bounded log-linear regression model. In mODa 10 - Advances in model-oriented design and analysis (pp. 237–245). Springer. pdf
  59. Wang, H., Zou, G., & Wan, A. T. (2012). Model averaging for varying-coefficient partially linear measurement error models. Electronic Journal of Statistics, 6, 1017–1039. pdf
  60. Wang, H., & Sun, D. (2012). Objective Bayesian analysis for a truncated model. Statistics & Probability Letters, 82(12), 2125–2135. pdf
  61. Wang, H., & Zou, G. (2012). Frequentist model averaging estimation for linear errors-in-variables models (in Chinese). Journal of Systems Science and Mathematical Sciences, 32(2), 1–14. pdf
  62. Kozak, M., & Wang, H. (2010). On stochastic optimization in sample allocation among strata. Metron, 68(1), 95–103. pdf
  63. Wang, H., Zhang, X., & Zou, G. (2009). Frequentist model averaging estimation: A review. Journal of Systems Science and Complexity, 22(4), 732–748. pdf
  64. Feng, S., Ding, W., Wang, H., Yu, Z., Chen, Y., Zhang, Y., & Xiao, H. (2008). Sampling procedures for inspection by attributes - Part 3: Skip-lot sampling procedures (in Chinese). Chinese National Standard, GB/T 2828.3-2008.

Teaching

  • At the University of Missouri
    • Statistics 1200 - Introductory Statistical Reasoning (3cr.), Fall 2010, Spring 2011, Fall 2011
    • Statistics 2500 - Introduction to Probability and Statistics I (3cr.), Spring 2012
    • Statistics 3500 - Introduction to Probability and Statistics II (3cr.), Fall 2012, Spring 2013
  • At the University of New Hampshire
    • Math 539 - Introduction to Statistical Analysis (4cr.), Fall 2014
    • Math 644 - Statistics for Engineers and Scientists (4cr.), Fall 2013, Spring 2014, Fall 2014
    • Math 736/836 - Advanced Statistical Methods for Research (4cr.), Spring 2014, Spring 2015, Spring 2016
    • Math 739/839 - Applied Regression Analysis (4cr.), Fall 2016
    • Math 755/855 - Probability with Applications (4cr.), Fall 2015, Fall 2016
    • Math 756/856 - Principles of Statistical Inference (4cr.), Spring 2016, Spring 2017
    • Math 969 - Topics in Probability and Statistics (3cr.), Spring 2017
  • At the University of Connecticut
    • STAT 3115Q - Analysis of Experiments (3cr.), Spring 2018
    • STAT 5125 - Computing for Statistical Data Science (3cr., in Julia), Fall 2021, Spring 2022
    • BIST/STAT 5535 - Nonparametric Methods (3cr., using Julia), Fall 2018, 2020, 2021
    • BIST/STAT 5505 - Applied Statistics I (3cr.), Fall 2017, 2018, 2019
    • BIST/STAT 5605 - Applied Statistics II (3cr.), Spring 2019, 2020
    • BIST/STAT 6494 - Statistical Inference for Big Data (3cr.), Spring 2018

Professional Service