
Wednesday, March 18, 2020

Survey Results: What Degree is Best for Data Science?

The Survey

The survey What Degree is Best for Data Science? (still open) collected responses from February 9 through March 12, 2020. It asked participants 4 questions:

  • About the participant's own education:
    • Q1: What is the highest level of school degree you have completed?
    • Q2: Which of the following best describes the field in which you received your highest degree?
  • About the best education for data science:
    • Q3: What level of school degree you consider optimal for successful career in data science?
    • Q4: Which field of study you consider optimal for successful career in data science?

During that period, 289 respondents participated and 285 completed all 4 questions; the 4 participants with partial answers were excluded from the analysis below.
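The exclusion step can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the record format (one dict per respondent keyed "Q1" to "Q4", with None for an unanswered question) and the names `Response`, `QUESTIONS` and `complete_responses` are hypothetical, and the two sample records are made up.

```python
from typing import Dict, List, Optional

# Assumed record format (hypothetical): one dict per respondent, keyed "Q1".."Q4",
# with None (or an empty string) for an unanswered question.
Response = Dict[str, Optional[str]]

QUESTIONS = ("Q1", "Q2", "Q3", "Q4")


def complete_responses(responses: List[Response]) -> List[Response]:
    """Keep only respondents who answered all four questions."""
    return [r for r in responses if all(r.get(q) for q in QUESTIONS)]


responses = [
    {"Q1": "Master's", "Q2": "Statistics", "Q3": "Master's", "Q4": "Statistics"},
    {"Q1": "PhD", "Q2": "Physics", "Q3": None, "Q4": None},  # partial answer
]
complete = complete_responses(responses)
print(len(responses) - len(complete), "partial response(s) removed")  # 1 partial response(s) removed
```

Applied to the full data set, the same filter would leave 285 of the 289 records.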

Though short and simple (the average completion time was 55 seconds after removing 6 outliers who took over 500 seconds), the survey's questions have an internal structure in both time and subject. By time, they form 2 groups: one about the education a participant has already completed (Q1, Q2) and the other about the participant's recommendation for the best education (Q3, Q4). By subject, they form 2 alternative groups: Q1 and Q3 concern the degree, while Q2 and Q4 concern the field of study.
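The average completion time above excludes respondents who took longer than 500 seconds. A minimal Python sketch of that calculation follows; the function name `mean_duration` is hypothetical, and the timings in the example are invented for illustration, not taken from the survey.

```python
from typing import List, Tuple


def mean_duration(durations: List[float], cutoff: float = 500.0) -> Tuple[float, int]:
    """Mean completion time after dropping outliers that took longer than `cutoff` seconds.

    Returns the mean of the kept durations and the number of outliers dropped.
    """
    kept = [d for d in durations if d <= cutoff]
    if not kept:
        raise ValueError("every duration exceeds the cutoff")
    return sum(kept) / len(kept), len(durations) - len(kept)


# Illustrative timings in seconds (made up, not the survey's data).
mean, dropped = mean_duration([30, 45, 60, 900])
print(mean, dropped)  # 45.0 1
```

A fixed cutoff is the simplest rule and matches the one described here; a trimmed mean or the median would be alternatives that do not require choosing a threshold.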

Answers to Each Question


Bird's-Eye View


Sankey Diagrams: How Data Flows

Sankey diagrams help visualize how answers flow through the questions. We start with pairs of related questions and finish with all 4 questions together. 
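Behind each Sankey diagram is a table of flows: how many respondents gave answer A to one question and answer B to the next. The Python sketch below builds that table under stated assumptions. Nodes are (question, answer) pairs, links connect consecutive questions in the given order, and the function name `sankey_links` and the record format are hypothetical. The post does not say which tool drew the diagrams, and the order Q1, Q2, Q3, Q4 for the complete flow is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Response = Dict[str, str]  # hypothetical: one dict per respondent, keyed "Q1".."Q4"


def sankey_links(
    responses: Sequence[Response], questions: Sequence[str]
) -> Tuple[List[str], List[int], List[int], List[int]]:
    """Build node labels and (source, target, value) link lists for a Sankey diagram.

    Each node is a (question, answer) pair; each link counts the respondents
    who gave one answer to a question and another answer to the next
    question in `questions`.
    """
    node_ids: Dict[Tuple[str, str], int] = {}

    def node_id(question: str, answer: str) -> int:
        # Assign ids in order of first appearance.
        return node_ids.setdefault((question, answer), len(node_ids))

    counts: Counter = Counter()
    for r in responses:
        for q_from, q_to in zip(questions, questions[1:]):
            counts[(node_id(q_from, r[q_from]), node_id(q_to, r[q_to]))] += 1

    labels = [f"{q}: {a}" for q, a in node_ids]  # dicts keep insertion order
    source = [s for s, _ in counts]
    target = [t for _, t in counts]
    value = list(counts.values())
    return labels, source, target, value


# Made-up answers to Q1 (completed degree) and Q3 (best degree).
responses = [
    {"Q1": "Master's", "Q3": "Master's"},
    {"Q1": "Master's", "Q3": "PhD"},
    {"Q1": "PhD", "Q3": "PhD"},
]
labels, source, target, value = sankey_links(responses, ["Q1", "Q3"])
print(source, target, value)  # [0, 0, 3] [1, 2, 2] [1, 1, 1]
```

Passing a pair such as `["Q2", "Q4"]` gives one of the two-question diagrams, and `["Q1", "Q2", "Q3", "Q4"]` gives a complete flow. The four lists have the shape that, for example, plotly's `go.Sankey` accepts as `node=dict(label=...)` and `link=dict(source=..., target=..., value=...)`.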

Completed Degree and Field of Study (Q1, Q2)

Best Degree and Field of Study (Q3, Q4)

Completed Degree vs. Best Degree (Q1, Q3)

Completed Field vs. Best Field (Q2, Q4)

Complete Flow of Answers For All 4 Questions

Concluding comments

The results speak for themselves. The survey is still open, so anyone who hasn't participated yet can still do so and share it with others.

If you haven't noticed yet, there is a certain bias towards statistics in the answers. It might stem from the fact that a significant share of respondents reached the survey via R-bloggers, a distribution popular among R users (who often have a background in statistics).

Finally, there is another implicit bias: people with a degree in Math are likely to suggest Math as the best field, and likewise for other fields and degrees. This bias is evident from the (Q1, Q3) and (Q2, Q4) Sankey diagrams above. Removing it from the results could be useful, and I attempted to do so. However, my DIY approach proved too naive, and the methods in the resources I found were too involved to work through in a short time. If you have pointers, or better yet a method for removing such bias from the answers, I'd love to hear from you.
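One way to at least quantify (not remove) the own-field effect is to compare how often respondents recommend their own field with how often that would happen by chance if the two answers were independent. Cohen's kappa summarizes this comparison. This is a standard agreement statistic swapped in for illustration, not the author's method. The function name `agreement_vs_chance` and the pair format are hypothetical, and the sample pairs are invented.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_vs_chance(pairs: Sequence[Tuple[str, str]]) -> Tuple[float, float, float]:
    """Compare how often two answers coincide with the rate expected by chance.

    Returns (observed, expected, kappa): `expected` assumes the two answers are
    independent, and kappa = (observed - expected) / (1 - expected) is Cohen's
    kappa. Kappa near 0 means no more matching than chance; 1 means every pair matches.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair")
    observed = sum(a == b for a, b in pairs) / n
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    # Chance agreement: sum over categories of the product of the two marginal shares.
    expected = sum(first[c] * second[c] for c in first) / (n * n)
    if expected == 1:
        # Every answer falls in a single category; kappa is undefined.
        return observed, expected, float("nan")
    return observed, expected, (observed - expected) / (1 - expected)


# Hypothetical (Q2 own field, Q4 recommended field) pairs, not survey data.
pairs = [
    ("Statistics", "Statistics"),
    ("Statistics", "Statistics"),
    ("Computer Science", "Computer Science"),
    ("Math", "Statistics"),
]
observed, expected, kappa = agreement_vs_chance(pairs)
print(observed, expected, round(kappa, 3))  # 0.75 0.4375 0.556
```

The same function applies to (Q1, Q3) pairs for degrees. Agreement above chance is consistent with own-field bias, but it does not separate that bias from genuinely held views, so this measures the pattern rather than correcting for it.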
