Statistics in the Workplace: Lessons from Harvard Alumni

Harvard GUSH
Aug 13, 2020

By: Asher Noel and Alicia Wu

Graphic by Alicia Wu and Angie Shin

On May 6, the Group for Undergraduates in Statistics at Harvard College (GUSH) spoke with six alumni about their experiences since graduating from Harvard Statistics. Below are their thoughts.

We would like to thank our amazing panelists for their participation: Kathy Evans AB ’08 PhD ’17 now at the Toronto Raptors, David Robinson AB ’10 Princeton PhD ’15 now at Heap, Jason Rosenfeld AB ’12 now at NYU, Yannik Pitcan AB ’11 Berkeley PhD ’20 now at Y9 Solutions, Raj Bhuptani AB/AM ’13 now at Two Sigma, and Michele Zemplenyi AB ’13 PhD ’20 now at the Bloomberg Harvard City Leadership Initiative. Special thanks also go to Joe Blitzstein, GUSH’s faculty advisor and Professor of the Practice in Statistics, for connecting GUSH with the panelists.

How does Statistics impact your work?

David: As a Principal Data Scientist, I work a lot with web analytics. I ask questions like, “How do people get into websites? What makes them likely to convert to being a user? How does usage change over time?” I then develop features for automated insights, testing many hypotheses to identify features for every client. Multiple hypothesis testing means that I cannot just compute a p-value for every test. As always, I need to look out for confounding factors.
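To make David's point about multiple testing concrete, here is a minimal sketch of one common correction, the Benjamini-Hochberg procedure, which controls the false discovery rate when many hypotheses are tested at once. The choice of this particular procedure, the function name `benjamini_hochberg`, and the p-values below are illustrative assumptions, not something the panelists described using.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Made-up p-values, one per tested feature (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [p <= 0.05 for p in p_values]
bh = benjamini_hochberg(p_values, alpha=0.05)

print(sum(naive), "rejections with a naive 0.05 cutoff on every test")
print(sum(bh), "rejections after the Benjamini-Hochberg correction")
```

With these made-up numbers, thresholding each p-value separately flags more features than the corrected procedure does, which is the kind of gap that matters when insights are generated automatically for every client.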

Kathy: When I worked in public health, I found that people often misuse data. I asked myself, “How do I communicate ideas to people who don’t understand statistics?” Talking about multiple hypothesis testing with physicians is one thing. Talking to basketball coaches is far different. I focused on how I can visualize data, and how I can simplify ideas to focus on the core problems. A lot of audiences are not going to care about a p-value. They want a range of interesting values. Tools like ggplot did not exist when I was in undergrad, but libraries like it are great for expressing data. Now, I do a lot of exploratory data analysis (EDA), where I look at distributions, skews, and other summaries.

Jason: All the problems I work on fall into one of two buckets. The first is 1) “Moneyball-type analytics”: How do we use statistics to evaluate players? How much should we pay players? Every year, the NBA draft is in June, and all 30 NBA teams have analytics groups that build models to try to forecast the future careers of 18-year-old basketball players. Most teams still rely on scouts to go watch and see who “looks” like a good player, but teams are placing increasing weight on different types of models when deciding on players in the draft. Instead of talking in terms of regression, I have to put ideas in ways other people can understand. For example, “high floor” means that someone has a decent chance of spending time in the NBA, but they likely will not be a star. The other bucket is 2) “Statistics for the betterment of the NBA.” These problems are harder. We need to be creative. It’s difficult to run experiments, as we can’t just create an NBA game for a month. Instead, we use focus groups. Eventually, we tune questions like, “How long is the game? What are the rules? Where are the lines?”

Yannik: Statistics helps tell stories. With a statistics background, it is easier to interpret and then communicate data to stakeholders and clients, translating rigor into something others can understand. Explaining concepts like time series or concept drift detection requires more statistical sophistication than you might expect. To explain concepts intuitively, you need to understand them intuitively yourself.

Raj: At Two Sigma, pretty much everyone is a statistician in some way. I have not directly used anything I’ve learned in STAT 210 and above, but I would not be able to do anything I currently do without having taken those classes. Learning statistics at a very deep level, visualizing randomness, helps you answer questions in a productive way. In industry, what matters is answering the question, not using the coolest model. Statistics can make methods complicated, but it is all about thinking: How do I capture what I want to capture, measure what I want to measure? What expresses what I need to express? It’s important to develop an ‘orthogonal practitioner’s knowledge’ to answer questions in practical, industrial settings. Oftentimes, by the time you get to part q on problem sets, it’s 3am and you hope your teaching fellows won’t read your response closely. But part q is the question that matters; everything else is just implementation: “When deploying in real life, how will this thing behave?” These are the most important questions. Don’t ignore part q of your problem sets.

Michele: The role of a statistician with people who do not have strong stats backgrounds is to be there for common sense. People without statistics backgrounds take things like a p-value threshold very seriously, but we know that results can be sensitive to outliers and assumptions. You should be a sounding board to help them guide decisions. Experience in undergrad analyzing data will help build up that sense and intuition.

How do you differentiate Statistics and Applied Math?

David: The most important skill that separates people with statistics from people with applied math, physics, or economics backgrounds is programming with data. I have met many brilliant physicists, but they often still use Excel when they want to graph something. Stat undergrad programs have built more R and Python into their curriculum. I learned R in STAT 111 and STAT 135.

Kathy: In statistics, people think of things in terms of dataframes and flat data; at Google, software engineers putting statistics into production frameworks thought about data very differently. The way you picture data in your head is very different.

Yannik: The “edge” as a statistician compared to applied math or computer science is stark: it’s how we understand uncertainty. With statistics, you will stand out among data scientists. Most come straight from a computer science background. More than just p-values, can you intuitively understand ordinary least squares (OLS) regression? Leverage? Can you derive the OLS normal equations? That said, an understanding of algorithms and data structures is also invaluable in industry.
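For readers who want to check their own intuition on the questions Yannik raises, here is a minimal sketch, assuming simple linear regression with an intercept and one predictor. The normal equations are (XᵀX)β = Xᵀy, and the leverage of observation i is the i-th diagonal entry of the hat matrix, which in this one-predictor case reduces to 1/n + (xᵢ − x̄)² / Σⱼ(xⱼ − x̄)². The function names and the data are hypothetical and chosen only for illustration.

```python
def ols_simple(x, y):
    """Fit y = b0 + b1 * x by solving the 2x2 normal equations
    (X^T X) beta = X^T y, where X has a column of ones and a column x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x has no variation, so the slope is not identified")
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


def leverages(x):
    """Diagonal of the hat matrix for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum((v - x_bar) ** 2 for v in x)
    return [1 / n + (v - x_bar) ** 2 / ss_x for v in x]


# Made-up data: the last point sits far from the other x values.
x = [1, 2, 3, 4, 10]
y = [3.1, 4.8, 7.2, 9.1, 20.9]

b0, b1 = ols_simple(x, y)
h = leverages(x)

print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("leverages:", [round(v, 2) for v in h])
```

The point far from the mean of x has by far the largest leverage, meaning it has the most potential to pull the fitted line toward itself. The leverages also sum to the number of fitted coefficients (two here), a quick sanity check that follows from the trace of the hat matrix.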

Raj: I am generalizing a bit, but there are way more computer science people who think they can do some stat than there are stat people who know how to code. You should be a stat person who knows how to code: that’s much more powerful. I interview people who have a much stronger computer science than statistics background. They tend not to do as well.

Michele: If you want to pursue graduate school, it’s still good to have a strong theoretical background by taking proof-based math courses.

How do you see and deal with Gender Bias?

Kathy: In my experience in academia, biostatistics has a lot more women than statistics, so there are women in strong leadership positions as teaching assistants. But there are micro-aggressions: one time, Natalie Dean, my former teaching assistant who is now a professor at the University of Florida, was not referred to as “doctor” or “professor” on CNN, while her male counterpart was. Similar experiences have happened to me a couple of times at conferences.

Michele: The places where I have worked have been pretty equal and fair, but people tend to look to women to “organize dinners and organize department picnics.” Be aware that you do not naturally fall into those roles just because people think you do.

What classes do you recommend?

Jason: STAT 110, which “forced me to think in a way other classes did not,” and STAT 149.

Raj: STAT 139 was the “most applicable class.” People need to understand material at the STAT 110 level, not STAT 104 level. Understanding linear modeling from a geometric perspective, by taking classes such as STAT 139, 149, and 244, is important.

Kathy: STAT 244 is the “best class I’ve taken in anything ever.”

Yannik: STAT 110, STAT 210, and stochastic processes if interested in quantitative finance.

David: STAT 135 on “Statistical Computing Processes,” CS 50, and CS 61.

Michele: STAT 139 and 149.

The Group for Undergraduates in Statistics at Harvard College (GUSH) is committed to creating a unique and open space on campus for students interested in statistics. Join our mailing list here! Thanks to our panelist sourcers Rachel Li and Ginnie Ma, and Ben Chiu for his questions.
