The annual American Statistical Association DataFest held at Duke each spring gives hundreds of students a glimpse of statistics and data science in practice, bringing real challenges to life in a collegial, competitive setting.
Due to COVID, this year’s virtual version took place over several weeks – rather than a couple of days – and focused on emerging data sets related to the pandemic and its effects. Two of those projects caught the attention of another organization whose members were impressed by what Statistical Science students created.
A Duke undergraduate who examined the effects of pandemic-related YouTube videos and a team of students who examined the effect of stay-at-home orders on motor vehicle accidents received honorable mentions from the global Undergraduate Class Project Competition in its Intermediate Statistics category.
“(Competitions) provide an opportunity for students to apply what they’re learning in the classroom to analyze interesting and relevant questions using real-world data that is often complex and messy,” said Maria Tackett, an assistant professor of the practice in Statistical Science who leads the DataFest competition. “Most importantly, students can enjoy the fun and creative aspects of data science without the pressure of grades.”
Ruixin (Edna) Zhang – mentored by new Assistant Professor of the Practice Yue Jiang – drew an honorable mention for her project, “COVID-19 Pandemic-Related YouTube Videos Effects on Mental Health of Viewers.”
“I love to contemplate statistical principles and apply them in real-life settings,” said Zhang. “When I typed the last word on my document, the sense of accomplishment went all over me. I was so delighted to find the answer that YouTube videos indeed put more mental toll about the pandemic on viewers, and also proud of myself for completing a decent analysis independently.”
A team consisting of Duke undergraduates Joe Choo, Glen Morgenstern, Carrie Wang and Zhixue (Mary) Wang also won an honorable mention for exploring the impact of government stay-at-home orders on motor vehicle accidents in New York City in their project, “Socioeconomic Disparities in Response to COVID-19 Lockdown Orders: Analyzing New York City Motor Vehicle Collisions.” Their team was mentored by Tackett.
“As they gained each new insight from the data, they kept exploring to better understand the results and nuances in the data,” she said. “The depth of their exploration and their understanding of the nuances and complexities in the data made their project one of the top in the competition.”
Team member Morgenstern said he first became interested in statistics by reading box scores of St. Louis Cardinals games in the newspaper every morning before school.
“Eventually I learned that I could apply statistics to so many more problems than just figuring out my favorite statistical baseball player,” he said, “and I got hooked on data science once I started my coursework at Duke. I haven’t stopped reading those box scores, though.”
For other students, the appeal lies in problem solving and in the insights that might be discovered just under the surface of a data set.
“I've always loved the lightbulb moment when you figure something out,” said Mary Wang, “whether that's for a puzzle, a mystery novel, or a math problem. Data science is fun for me in that way because it's that same problem-solving but at scale.”
“I think my most important takeaway (from this project) is to always dig a little deeper,” Mary Wang continued. “Data that looks normal at the surface could be hiding interesting trends underneath. For example, it was straightforward to see that mandatory stay-at-home guidelines meant fewer cars which meant fewer collisions in general. However, by separating that data out by the change in boroughs over time, we found a very different picture. This new data revealed socioeconomic factors, leading us to dig a little deeper on who still traveled, why, how, and their impact.”
Choo said he became invested in data science through early co-curricular experiences at Duke – specifically through involvement in the Data+ program.
“There is so much sifting and playing around with the data before any real findings can take place,” he noted. “I really enjoyed using Tableau for quick visualizations that can provide a lens into some interesting niches or problems our team could pursue! It was also incredibly rewarding to be working with a team and produce a cohesive project together.”
“This project taught me that data science is for everyone,” Morgenstern said. “Our team members' backgrounds were in economics, public policy, computer science and English. These different perspectives helped make our analysis more well-rounded. I’m incredibly proud of our team for achieving this recognition, especially since all our work was done virtually.”