 Siyu Zhou      

 

Postdoctoral Fellow

Department of Biostatistics and Bioinformatics

Rollins School of Public Health

Emory University

 

356 Grace Crum Rollins Building

1518 Clifton Rd NE

Atlanta, GA 30033

siyu.zhou@emory.edu

[Google Scholar]

 


About

I am a Postdoctoral Fellow in the Department of Biostatistics and Bioinformatics at Emory University, under the supervision of Dr. Limin Peng. Previously, I obtained my Ph.D. in Statistics from the University of Pittsburgh under the supervision of Dr. Lucas Mentch. I earned my M.Phil. in Mathematics from The Hong Kong University of Science and Technology, advised by Dr. Man-Yu Wong, and a B.S. in Mathematics from the same institution.

My work investigates the statistical and mathematical properties of machine learning algorithms and uses those insights to improve model performance and to develop accurate, robust methodologies for the diverse types of data emerging from new technologies and scientific needs. I also seek to develop feature importance measures that aid interpretability.

 

Research

o   Siyu Zhou and Limin Peng (2024+), Approximate Global Censored Quantile Random Forests

In preparation

o   Siyu Zhou and Limin Peng (2024+), Global Quantile Learning with Censored Data Based on Random Forests

Submitted

o   Meredith Wallace, Lucas Mentch, Bradley Wheeler, Amanda Tapia, Marc Richards, Siyu Zhou, Lixia Yi, Susan Redline, Daniel Buysse (2023), Use and misuse of random forest variable importance metrics in medicine: demonstrations through incident stroke prediction

BMC Medical Research Methodology 23 (1), 144

o   Siyu Zhou and Lucas Mentch (2023), Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest

Statistical Analysis and Data Mining: The ASA Data Science Journal 16 (1), 45-64

o   Lucas Mentch and Siyu Zhou (2022), Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance

Journal of Machine Learning Research 23 (224), 1-32

o   Giles Hooker, Lucas Mentch and Siyu Zhou (2021), Unrestricted permutation forces extrapolation: variable importance requires at least one more model, or there is no free variable importance

Statistics and Computing 31, 82

o   Lucas Mentch and Siyu Zhou (2020), Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

Journal of Machine Learning Research 21 (171), 1-36

 

Talks

o   Why Random Forests Work and Why That's a Problem

Conference on Statistical Learning and Data Science

Newport Beach, CA, Nov 5 - 8, 2024

o   Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest

Joint Statistical Meetings

Portland, OR, Aug 3 - 8, 2024

o   Regularization on Ensembles of Trees and Variable Importance

ICSA Applied Statistics Symposium

University of Florida, Gainesville, FL, Jun 19 - 22, 2022

o   Random Forests: Why They Work and Why That's a Problem

Joint Statistical Meetings

Virtual, Aug 3 - 8, 2021

o   Augmented Bagging as an Alternative to Random Forests and Implications on Variable Importance

Symposium on Data Science and Statistics

Virtual, Jun 2 - 4, 2021

o   Augmented Bagging as an Alternative to Random Forests

Joint Statistical Meetings

Virtual, Aug 2 - 6, 2020

o   Explaining the Practical Success of Random Forests

Symposium on Data Science and Statistics

Virtual, Jun 3 - 5, 2020

 

Teaching

o   University of Pittsburgh

o   Stat 800   -   Statistics in the Modern World

Instructor (2021 SUM)

o   Stat 1000 -   Applied Statistical Methods

Instructor (2020 SUM), Teaching Assistant (2018 SP, 2019 SUM, 2019 FA, 2021 FA)

o   Stat 1100 -   Statistics and Probability for Business Management

Teaching Assistant (2018 FA, 2020 SP)

o   Stat 1221 -   Applied Regression

Teaching Assistant (2019 SP)

o   Stat 1361 -   Statistical Learning and Data Science

Teaching Assistant (2022 SP)

o   Stat 2270 -   Data Mining

Teaching Assistant (2019 FA, 2021 FA)

o   Stat 2360 -   Statistical Learning and Data Science

Teaching Assistant (2022 SP)

o   The Hong Kong University of Science and Technology

o   Math 3423 - Statistical Inference

Teaching Assistant (2015 FA, 2016 FA)

o   Math 3424 - Regression Analysis

Teaching Assistant (2015 SP, 2016 SP)