February 2, 2021

How Can We Prepare Students for Data Science Careers?

What skills do students need to succeed in data science? Four data scientists share their insights.

All students must be data literate to navigate a world that is driven by big data. But how can teachers foster data literacy, and what skills will students need to thrive in data science careers?

Since 2013, EDC has led education R&D efforts—including Oceans of Data Institute, Strengthening Data Across the Curriculum, and Innovation Pathways to Data Science Careers—to answer these questions. Now a new online panel series, designed and led by EDC, MassBioEd, and Massachusetts Life Sciences Center (MLSC), is engaging educators, industry leaders, and policymakers in surfacing and sharing strategies to improve access to data science careers.

“Data science is a rapidly evolving field,” says Joyce Malyn-Smith, an expert in STEM workforce development and a distinguished scholar at EDC. “Education and workforce development systems are struggling to keep pace and prepare learners to pursue jobs that require data literacy and experience working with data.”

These struggles are affecting employers. In Massachusetts and nationwide, data science job openings abound but go unfilled because of a lack of qualified candidates. The first panel in the series, “The Data Science Workforce Challenge,” tackled this issue, with a focus on the life sciences.

During the panel, which was moderated by Karla Talanian of MassBioEd, four data scientists—Drs. Sudeshna Das, Shuba Gopal, Iain McFadyen, and Sally Trabucco—shared their insights into the skills needed to succeed in their field.

Q: What skills do people need to get into the field?

Trabucco: A core set of critical thinking skills and communication skills is key. Those are not the first things people think about when you mention data science. If you can do those two things, many times an employer can teach you the rest. Some experience coding is usually necessary, but it doesn’t have to be in a specific language. You need some experience in a language, you need to understand the logic, and you need to be able to troubleshoot and apply those critical skills. . . . Real world experience working with data is also valuable because data is messy in the real world.

McFadyen: Data visualization is also a critical skill and means of communicating with collaborators. It speaks to a broader requirement for being a data scientist, which is data competency: “Are you able to find, collect, organize, clean, annotate, inspect, and visualize key data sets?” You need to be able to live and breathe your data set. You need to dig into it like a gardener digging into the soil . . . it’s the soil from which you’re going to grow your results. That mindset can come from many different fields and backgrounds, but the key to data science is wanting to live inside a data set. Visualization is part of that.

Gopal: There’s a phrase that I personally hate but will mention here: “data storytelling.” To me as a scientist, the idea of storytelling feels like fictionalizing something, but I think a key idea is, “Can you create a narrative that is compelling enough that people who are not as deeply into the soil of your data as you are will be able to understand and appreciate the consequences?” It’s a combination of effective visualization, understanding your audience, and then being able to craft a narrative that helps explain why your recommendation is a good recommendation. That set of skills is really important. So visualization is a piece, digging deep into your data is a piece, and then there’s bringing it all together into a narrative that is compelling to others.

Q: Given the huge demand for data workers, would it help to have a high school or community college program to give students real-world experience?

McFadyen: I think anything that increases the pipeline of talent will be helpful. It would be great if we could build in kids the mindset I mentioned earlier, the one a data scientist needs to have. If there’s a way to effectively teach data fluency and data communication to kids, so they get interested in data and data science and follow that interest, it would be very helpful.

Das: A pipeline program sounds wonderful. It could start at the high school level, definitely, but the focus should not be on learning a particular technology or language but on developing fundamental skills and then using those skills in an internship to get exposure to solving a real problem. Massachusetts General Hospital has a diversity internship program where they hire students from Boston area high schools. I’ve had some interns from the program, and for the interns, exposure to the work and real-life role models are more important than the training in a particular technology or program.

The next panel, “Enhanced Strategies to Increase Diversity,” will take place on February 10 and focus on the role of diversity initiatives in growing the data science workforce. Moderated by Kenneth Turner, MLSC director, the panel will feature Brenda Darden Wilkerson, president and CEO of AnitaB.org; Pam Eddinger, president of Bunker Hill Community College; Ginette Saimprevil, interim executive director of Bottom Line; and Ron Walker, executive director of the Coalition of Schools Educating Boys of Color.

“As colleges and universities create new data science programs and pathways to address the workforce shortage, we need to keep equity foremost and make sure no students are left out,” said EDC vice president Sarita Pillai. “We are excited about this panel’s potential to deepen understanding of how education and industry leaders can work together to make that happen.”