

Wang, HaiYing   王海鹰

Department of Statistics
University of Connecticut
W319 Philip E. Austin Building
215 Glenbrook Rd. U-4120
Storrs, CT 06269-4120
Phone: (860) 486-6142
Email: \(\textrm{W}\)@\(\textrm{H}\).\(\textrm{Y}\),
  where \(\begin{cases} \textrm{W} &= \textrm{haiying.wang}\\ \textrm{H} &= \textrm{uconn}\\ \textrm{Y} &= \textrm{edu} \end{cases}\)

About Me

Research Interests

  • Incomplete data analysis
  • Model selection and model averaging
  • Nonparametric and semi-parametric regression
  • Optimum experimental design
  • Sub-sample methods for big data

Work in progress

  1. Lee, J., Schifano, E., & Wang, H. (2021). Sampling-based Gaussian mixture regression for big data.
  2. Wang, F., Wang, H., & Yan, J. (2021). Can we lose weight? A revisit on diagnostic tests for informative weight in regression with survey data.
  3. Wang, H., & Kim, J. K. (2020). Maximum sampled conditional likelihood for informative subsampling. https://arxiv.org/abs/2011.05988. pdf
  4. Wang, J., Wang, H., & Xiong, S. (2020). Unweighted estimation based on optimal sample under measurement constraints.
  5. Zhu, R., Zhang, X., Wang, H., & Liang, H. (2020). A scalable frequentist model averaging method.

Publications

  1. Yu, J., & Wang, H. (2022). Subdata selection algorithm for linear model discrimination. Statistical Papers. https://doi.org/10.1007/s00362-022-01299-8 pdf
  2. Wang, H., Zhang, A., & Wang, C. (2021). Nonuniform negative sampling and log odds correction with rare events data. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). pdf
  3. Yao, Y., Zou, J., & Wang, H. (2021). Optimal Poisson subsampling for softmax regression. Journal of Systems Science and Complexity, accepted. pdf
  4. Wang, H., Zhang, D., Liang, H., & Ruppert, D. (2021). Iterative likelihood: A unified inference tool. Journal of Computational and Graphical Statistics, 30(4), 920–933. https://doi.org/10.1080/10618600.2021.1904961 pdf
  5. Lee, J., Schifano, E., & Wang, H. (2021). Fast optimal subsampling probability approximation for generalized linear models. Econometrics and Statistics. https://doi.org/10.1016/j.ecosta.2021.02.007 pdf
  6. Wang, H., & Zou, J. (2021). A comparative study on sampling with replacement vs Poisson sampling in optimal subsampling. In A. Banerjee & K. Fukumizu (Eds.), Proceedings of the 24th international conference on artificial intelligence and statistics (Vol. 130, pp. 289–297). PMLR. http://proceedings.mlr.press/v130/wang21a.html pdf
  7. Zuo, L., Zhang, H., Wang, H., & Sun, L. (2021). Optimal subsample selection for massive logistic regression with distributed data. Computational Statistics, 36(4), 2535–2562. https://doi.org/10.1007/s00180-021-01089-0
  8. Bar, H., & Wang, H. (2021). Reproducible science with LaTeX. Journal of Data Science, 19(1), 111–125. pdf; code
  9. Yao, Y., & Wang, H. (2021). A review on optimal subsampling methods for massive datasets. Journal of Data Science, 19(1), 151–172. pdf
  10. Zuo, L., Zhang, H., Wang, H., & Liu, L. (2021). Sampling-based estimation for massive survival data with additive hazards model. Statistics in Medicine, 40(2), 441–450. pdf
  11. Zhang, H., & Wang, H. (2021). Distributed subdata selection for big data via sampling-based approach. Computational Statistics & Data Analysis, 153, 107072. https://doi.org/10.1016/j.csda.2020.107072 pdf
  12. Pronzato, L., & Wang, H. (2021). Sequential online subsampling for thinning experimental designs. Journal of Statistical Planning and Inference, 212, 169–193. https://doi.org/10.1016/j.jspi.2020.08.001 pdf
  13. Yao, Y., & Wang, H. (2021). A selective review on statistical techniques for big data. In Y. Zhao & D.-G. Chen (Eds.), Modern statistical methods for health research (pp. 223–245). Springer International Publishing. https://doi.org/10.1007/978-3-030-72437-5_11 pdf
  14. Wang, H. (2020). Logistic regression for massive data with rare events. In H. D. III & A. Singh (Eds.), Proceedings of the 37th international conference on machine learning (Vol. 119, pp. 9829–9836). PMLR. http://proceedings.mlr.press/v119/wang20a.html pdf; code
  15. Yu, J., Wang, H., Ai, M., & Zhang, H. (2022). Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. Journal of the American Statistical Association, 117(537), 265–276. https://doi.org/10.1080/01621459.2020.1773832 pdf
  16. Cheng, Q., Wang, H., & Yang, M. (2020). Information-based optimal subdata selection for big data logistic regression. Journal of Statistical Planning and Inference, 209, 112–122. https://doi.org/10.1016/j.jspi.2020.03.004 pdf
  17. Lee, J., Wang, H., & Schifano, E. D. (2020). Online updating method to correct for measurement error in big data streams. Computational Statistics & Data Analysis, 149, 106976. pdf
  18. Hu, G., & Wang, H. (2021). Most likely optimal subsampled Markov chain Monte Carlo. Journal of Systems Science and Complexity, 34(3), 1121–1134. pdf
  19. Wang, H., & Ma, Y. (2021). Optimal subsampling for quantile regression in big data. Biometrika, 108(1), 99–112. https://doi.org/10.1093/biomet/asaa043 pdf; code
  20. Zhou, Y., Qiu, L., Wang, H., & Chen, X. (2020). Induction of activity synchronization among primed hippocampal neurons out of random dynamics is key for trace memory formation and retrieval. The FASEB Journal, 34(3), 3658–3676. https://doi.org/10.1096/fj.201902274R
  21. Wang, H. (2019). More efficient estimation for logistic regression with optimal subsamples. Journal of Machine Learning Research, 20(132), 1–59. pdf; code
  22. Xue, Y., Wang, H., Yan, J., & Schifano, E. D. (2020). An online updating approach for testing the proportional hazards assumption with streams of survival data. Biometrics, 76(1), 171–182. https://doi.org/10.1111/biom.13137 pdf
  23. Wang, H. (2019). Divide-and-conquer information-based optimal subdata selection algorithm. Journal of Statistical Theory and Practice, 13(3), 46. https://doi.org/10.1007/s42519-019-0048-5 pdf; code
  24. Ai, M., Yu, J., Zhang, H., & Wang, H. (2021). Optimal subsampling algorithms for big data regressions. Statistica Sinica, 31(2), 749–772. https://doi.org/10.5705/ss.202018.0439 pdf
  25. Zhou, Y., Qiu, L., Sterpka, A., Wang, H., Chu, F., & Chen, X. (2019). Comparative phosphoproteomic profiling of type III adenylyl cyclase knockout and control, male, and female mice. Frontiers in Cellular Neuroscience, 13, 34.
  26. Yao, Y., & Wang, H. (2019). Optimal subsampling for softmax regression. Statistical Papers, 60(2), 235–249. pdf
  27. Wang, H., Yang, M., & Stufken, J. (2019). Information-based optimal subdata selection for big data linear regression. Journal of the American Statistical Association, 114(525), 393–405. pdf; R Package; R code
  28. Wang, H., Zhu, R., & Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522), 829–844. pdf; R Package; R code
  29. Stang, S., Wang, H., Gardner, K. H., & Mo, W. (2018). Influences of water quality and climate on the water-energy nexus: A spatial comparison of two water systems. Journal of Environmental Management, 218, 613–621.
  30. Zhang, X., Wang, H., Ma, Y., & Carroll, R. J. (2017). Linear model selection when covariates contain errors. Journal of the American Statistical Association, 112(520), 1553–1561. pdf; Supplementary
  31. Li, Y., He, X., Wang, H., & Sun, J. (2016). Joint analysis of longitudinal data and informative observation times with time-dependent random effects. In New developments in statistical modeling, inference and application (pp. 37–51). Springer. pdf
  32. Lane, A., Wang, H., & Flournoy, N. (2016). Conditional inference in two-stage adaptive experiments via the bootstrap. In mODa 11-advances in model-oriented design and analysis (pp. 173–181). Springer. pdf
  33. Mo, W., Wang, H., & Jacobs, J. M. (2016). Understanding the influence of climate change on the embodied energy of water supply. Water Research, 95, 220–229.
  34. Li, Y., He, X., Wang, H., & Sun, J. (2016). Regression analysis of longitudinal data with correlated censoring and observation times. Lifetime Data Analysis, 22(3), 343–362. pdf
  35. Wang, H., Chen, X., & Flournoy, N. (2016). The focused information criterion for varying-coefficient partially linear measurement error models. Statistical Papers, 57(1), 99–113. pdf
  36. Li, Y., He, X., Wang, H., Zhang, B., & Sun, J. (2015). Semiparametric regression of multivariate panel count data with informative observation times. Journal of Multivariate Analysis, 140, 209–219. pdf
  37. Wang, H., Schaeben, H., & Keidel, F. (2015). Optimized subsampling for logistic regression with imbalanced large datasets. Proceedings of the 17th Annual Conference of the International Association for Mathematical Geosciences, 1113–1119.
  38. Wang, H., & Flournoy, N. (2015). On the consistency of the maximum likelihood estimator for the three parameter lognormal distribution. Statistics & Probability Letters, 105, 57–64. pdf
  39. Wang, H., Li, Y., & Sun, J. (2015). Focused and model average estimation for regression analysis of panel count data. Scandinavian Journal of Statistics, 42(3), 732–745. pdf
  40. Wang, H., Flournoy, N., & Kpamegan, E. (2014). A new bounded log-linear regression model. Metrika, 77(5), 695–720. pdf
  41. Wang, H., & Zhou, S. Z. (2013). Interval estimation by frequentist model averaging. Communications in Statistics-Theory and Methods, 42(23), 4342–4356. pdf
  42. Wang, H., Zou, G., & Wan, A. T. (2013). Adaptive lasso for varying-coefficient partially linear measurement error models. Journal of Statistical Planning and Inference, 143(1), 40–54. pdf
  43. Wang, H., Pepelyshev, A., & Flournoy, N. (2013). Optimal design for the bounded log-linear regression model. In mODa 10–advances in model-oriented design and analysis (pp. 237–245). Springer. pdf
  44. Wang, H., Zou, G., Wan, A. T., & others. (2012). Model averaging for varying-coefficient partially linear measurement error models. Electronic Journal of Statistics, 6, 1017–1039. pdf
  45. Wang, H., & Sun, D. (2012). Objective Bayesian analysis for a truncated model. Statistics & Probability Letters, 82(12), 2125–2135. pdf
  46. Wang, H., & Zou, G. (2012). Frequentist model averaging estimation for linear errors-in-variables models (in Chinese). Journal of Systems Science and Mathematical Science, 32(2), 1–14. pdf
  47. Kozak, M., & Wang, H. (2010). On stochastic optimization in sample allocation among strata. Metron, 68(1), 95–103. pdf
  48. Wang, H., Zhang, X., & Zou, G. (2009). Frequentist model averaging estimation: a review. Journal of Systems Science and Complexity, 22(4), 732–748. pdf
  49. Feng, S., Ding, W., Wang, H., Yu, Z., Chen, Y., Zhang, Y., & Xiao, H. (2008). Sampling procedures for inspection by attributes — part 3: Skip-lot sampling procedures (in Chinese). Chinese National Standard, GB/T 2828.3-2008.

Teaching

  • At the University of Missouri
    • Statistics 1200 - Introductory Statistical Reasoning (3cr.), Fall 2010, Spring 2011, Fall 2011
    • Statistics 2500 - Introduction to Probability and Statistics I (3cr.), Spring 2012
    • Statistics 3500 - Introduction to Probability and Statistics II (3cr.), Fall 2012, Spring 2013
  • At the University of New Hampshire
    • Math 539 - Introduction to Statistical Analysis (4cr.), Fall 2014
    • Math 644 - Statistics for Engineers and Scientists (4cr.), Fall 2013, Spring 2014, Fall 2014
    • Math 736/836 - Advanced Statistical Methods for Research (4cr.), Spring 2014, Spring 2015, Spring 2016
    • Math 739/839 - Applied Regression Analysis (4cr.), Fall 2016
    • Math 755/855 - Probability with Applications (4cr.), Fall 2015, Fall 2016
    • Math 756/856 - Principles of Statistical Inference (4cr.), Spring 2016, Spring 2017
    • Math 969 - Topics in Probability and Statistics (3cr.), Spring 2017
  • At the University of Connecticut
    • STAT 3115Q - Analysis of Experiments (3cr.), Spring 2018
    • STAT 5125 - Computing for Statistical Data Science (3cr., in Julia), Fall 2021, Spring 2022
    • BIST/STAT 5535 - Nonparametric Methods (3cr., using Julia), Fall 2018, 2020, 2021
    • BIST/STAT 5505 - Applied Statistics I (3cr.), Fall 2017, 2018, 2019
    • BIST/STAT 5605 - Applied Statistics II (3cr.), Spring 2019, 2020
    • BIST/STAT 6494 - Statistical Inference for Big Data (3cr.), Spring 2018

Membership