Oh, the Places You'll Go!

Posted Aug 14, 2017 by Samhita Karnati

Exploring Student Career Trajectories

When I was a freshman I thought I knew what I wanted to do post-graduation and the steps that I needed to take to get there. Growing up in the Seattle area, surrounded by family and friends working at Amazon and Microsoft, I became interested in tech, decided to be a computer science major, and believed that success in this field was working at a big company. I focused in on this trajectory and didn’t really think about any other careers.

As a rising senior with a much larger and more varied group of friends and mentors, I now know that computer science is not the only route to tech, not every computer science major interns at a big company, and not everyone gets an internship every summer. My experience is not unique – many freshmen commit themselves to one path, from consulting to finance to engineering, without exploring all their options, often basing their decisions on incomplete information and rumors. Similarly, students stress out about choosing a major in their freshman or sophomore years, believing that this one decision will define their post-graduation prospects. The root of the problem is a severe disparity in access to information about possible career options. At Handshake, we are leveling the playing field for college students, where talent is distributed evenly but opportunity is not; access to information is one facet of that opportunity equity.

Handshake is in a unique position to leverage rich student data to enhance the intersection of the student, career, and employer experiences on the platform. In this exploration of the data, we will look at trends in students’ internships, jobs, majors, and graduate studies to test the hypothesis that non-linear internship trajectories are in fact the norm.

Lay of the Land

Let’s first look at the top industries that students worked in across their summers and post-graduation. All data used for this project are aggregated and anonymized, coming from the class of 2017 at two mid-sized East Coast schools that typically lean strongly towards finance, consulting, and accounting. This group of students interned and chose jobs in 52 industries, working in 38 job functions. The following table shows the most popular industries for students across their summer internships and their first post-graduation job.

| Rank | Freshman Summer | Sophomore Summer | Junior Summer | Post-Graduation |
| --- | --- | --- | --- | --- |
| 1 | Research | Research | Internet & Software | Consulting |
| 2 | Internet & Software | Internet & Software | Consulting | Internet & Software |
| 3 | Investment/Portfolio Management | Investment Banking | Healthcare | Investment Banking |
| 4 | Investment Banking | Healthcare | Investment Banking | Non-Profit |

As expected, we see Consulting and Investment Banking have a strong presence in all summers and post-graduation. Also somewhat expected, we see that (on-campus) Research is very popular for freshmen and sophomores, as this is a way for many underclassmen to build up experience when companies are not yet actively recruiting them. More surprising is that Healthcare was a very popular industry for sophomore and junior internships and that Non-Profit was the fourth most popular industry for students after they graduate. Judging from the stories I’ve heard and the conversations I’ve had with my friends, I was completely unaware of how popular these industries are; their popularity could be attributed to the fact that newer grads increasingly prioritize mission and impact in their careers.

Paths on Paths on Paths

Now let’s look at the paths that link these industries and positions. In the interactive visualization below, choose a post-graduation outcome (either industry or position) to see the different ways students got there. The nodes represent population sizes based on a log scale, and the solid nodes are clickable to expand a particular path backwards to the class of 2017’s freshman summer. Connected nodes on a path represent the cohort of students who followed that particular trajectory to the final post-graduation node. Below the tree diagram, there’s a bubble chart with all the majors that fed into the selected post-graduation outcome.

[Interactive visualization: career paths]
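To make the tree structure concrete, here is a minimal Python sketch of how the node populations and log-scaled node sizes could be computed. It is not the code behind the visualization: the sample records, the function names `node_counts` and `node_radius`, and the radius constants are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical records: one tuple per student, ordered as
# (freshman summer, sophomore summer, junior summer, post-graduation).
trajectories = [
    ("Research", "Healthcare", "Consulting", "Consulting"),
    ("Research", "Healthcare", "Consulting", "Consulting"),
    ("Research", "Internet & Software", "Consulting", "Consulting"),
    ("Biotech & Life Sciences", "International Affairs", "Non-Profit", "Non-Profit"),
]


def node_counts(outcome, records):
    """Count students at each node of the tree rooted at `outcome`.

    A node is identified by the tail of a path, e.g. (junior, post-grad),
    because the tree expands backwards from the post-graduation outcome.
    """
    counts = Counter()
    for path in records:
        if path[-1] != outcome:
            continue
        for start in range(len(path)):
            counts[path[start:]] += 1
    return counts


def node_radius(count, base=4.0, scale=6.0):
    """Log-scaled radius (assumed constants) so big cohorts do not dwarf small ones."""
    return base + scale * math.log10(count)


counts = node_counts("Consulting", trajectories)
for node, count in counts.most_common():
    # Print each node from the outcome backwards, with its cohort size and radius.
    print(" <- ".join(reversed(node)), count, round(node_radius(count), 2))
```

Keying each node by the tail of the path keeps two cohorts separate when they share an earlier internship but diverge later, which is what lets a path be expanded one summer at a time.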

It is important to note that neither industry nor position entirely captures what type of work a student is doing in their internship. While an industry like consulting indicates that the student is a consultant, other industries like Food & Retail could mean that the student was in sales, marketing, technology, supply chain management, or a number of other functions. Similarly for positions, an analyst could work in a variety of industries.

The sheer quantity of paths is a strong sign that students are exploring in their internships. In the consulting diagram, we see one group of students go from Environmental Services to Legal & Law Enforcement to Healthcare, and then to Consulting after graduation. In Non-Profit, there is a group of students who interned in Biotech and Life Sciences, then in International Affairs, and finally worked in Non-Profit. Across all outcomes explored and shown here, the only two that had a group of students work in the same area/position across all three summers and post-graduation were Non-Profit and Data & Analytics. This again supports the idea that there is no single way to achieve a desired outcome, that one internship does not define your post-graduation job, and that internship exploration is actually the norm.

In comparing the industry outcomes (Consulting, Non-Profit, Communications, and Education) with the position outcomes (Data & Analytics and Product Management), the nodes in the paths for position outcomes are more homogeneous than those for industry outcomes. This suggests a pattern where students are exploring how to use particular skillsets in a variety of different fields that might interest them.

Changing Directions

Another outcome we can explore is continuing education. The data shows that in this group of students, 9.87% reported that they were going to graduate school. But do these students typically stay in the same area of study? Or are there changes from undergrad to grad school? This Sankey diagram shows once again how many paths students take and that there are many students who change concentrations in graduate school. The left- and right-hand side labels are undergraduate and graduate school majors, respectively.

[Interactive visualization: continuing education]

Immediately, we see that law school and medical school attracted the most students of any single graduate school concentration. Some graduate school majors, like physics, drew exclusively from one undergraduate major (in this case physics), while others, like public health and education, had students from a wide variety of undergraduate majors. The large number of outgoing and incoming paths from and to the nodes is really exciting because it shows that there are so many ways students can use their education after they graduate. For example, sociology majors went to graduate school for everything from medicine to business to global affairs. It is important to note that this diagram does not show what students minored in (and the data for minors is rather sparse), so some concentration changes may reflect students pursuing their undergraduate minor in their graduate studies. Even so, this is consistent with the initial hypothesis, as minors are another way for students to explore more of their interests.
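As a rough illustration of the data behind a diagram like this, the sketch below turns (undergraduate major, graduate field) pairs into weighted links and lists which undergraduate majors feed each graduate field. The sample pairs and the function names are invented for this example, and the source/target/value link shape is only an assumption about what a Sankey layout would consume, not a description of the code used here.

```python
from collections import Counter

# Hypothetical (undergraduate major, graduate field) pairs, one per student.
pairs = [
    ("Sociology", "Medicine"),
    ("Sociology", "Business"),
    ("Sociology", "Global Affairs"),
    ("Physics", "Physics"),
    ("Physics", "Physics"),
    ("Biology", "Medicine"),
]


def sankey_links(records):
    """Collapse identical pairs into weighted links, sorted for stable output."""
    counts = Counter(records)
    return [
        {"source": source, "target": target, "value": value}
        for (source, target), value in sorted(counts.items())
    ]


def sources_by_target(records):
    """Map each graduate field to the set of undergraduate majors feeding it."""
    feeders = {}
    for source, target in records:
        feeders.setdefault(target, set()).add(source)
    return feeders


links = sankey_links(pairs)
feeders = sources_by_target(pairs)
for target, sources in sorted(feeders.items()):
    print(target, "<-", sorted(sources))
```

A graduate field fed by a single undergraduate major (like physics in the sample data) shows up as a set of size one, while a field such as medicine collects several sources.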

How We Got Here

Let’s dive into the process of exploring the data and generating these visualizations. We chose to pull data from the class and schools used as they had very complete data for both their summer work experiences and what they were going to do post-graduation, allowing for meaningful exploration.

Knowing that we would explore majors, internships, and post-graduation outcomes, we pulled anonymized and aggregated data from three sources:

  1. User information: School year and major
  2. Work experiences: Job title, job description, and the year of the experience
  3. First Destination Survey (FDS): Employer and job position, or graduate degree

While there is a lot of data, it does not give us complete information on all students in the class of 2017. As mentioned, these two schools had incredibly complete data about their students’ post-graduation plans. With any survey, there’s a design tradeoff between getting specificity and adding too much friction, and so students going through FDS can skip various fields. Thus, there are expected holes in post-graduation information. Additionally, since Handshake is a fairly young product, many students have not had the chance to completely fill out their Handshake profiles yet, and different cohorts exhibit significantly different engagement levels. For example, the class of 2017 listed nearly twice as many work experiences for their sophomore year summer as for their freshman year summer, and about 10% fewer for their junior year summer than for their sophomore year summer. This does not necessarily imply that there were twice as many students with jobs or internships in their sophomore year summers as in their freshman year summers, or that more students found internships when they were sophomores than when they were juniors. This is just the information that students reported on their profiles.

As expected, this data needed a lot of cleaning up.

First, we wanted to map each experience to an industry and position type. When students enter their employer information, they have the option of choosing from a suggested list or ignoring the suggestions and typing their own in, which is then included in the suggestions for future entries. This means that there are lots of duplicate employers and employers we do not have in our system. To get an experience-industry mapping, we first tried to explore Latent Dirichlet Allocation (LDA) for topic extraction, but realized that it wasn’t the right fit to get a mapping. We then decided to try out some text-based classification algorithms (implemented with scikit-learn), which ended up working quite well.

About 40% of the experiences from the class of 2017 referenced an employer that was in our database with an associated industry category, so these could be used as the training set. The features were built from concatenated strings of the description + position title + employer name. We used scikit-learn’s tokenizer and stopwords, together with a stemmer, to get term occurrences, and then tf-idf (term frequency times inverse document frequency) to get frequency vectors. At first, experiences were getting classified as Investment Banking and Internet & Software more than any other category because the training data was skewed towards these industries and only had a few entries for industries like Forestry. To make the training data more useful, we manually classified some experiences from the less-represented industries. Of the three classifiers we tried (Naive Bayes, Support Vector Machine, and Random Forest), SVM proved to be the most accurate, at 92.4%. We used the same approach for classifying positions, where 52% of the data was labeled, and achieved a classification accuracy of 93.1% with an SVM.
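For readers who want to try something similar, here is a small sketch of this kind of pipeline in scikit-learn. It is not our production code. The sample experiences, employer names, and the `build_text` helper are made up, stemming is left out, and `LinearSVC` is just one assumed choice of SVM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_text(description, title, employer):
    """Concatenate the three text fields into one feature string."""
    return " ".join(part.strip() for part in (description, title, employer) if part)


# Made-up labeled experiences standing in for those with a known industry.
labeled = [
    (("Built valuation models for mergers", "Summer Analyst", "Example Bank"), "Investment Banking"),
    (("Prepared pitch books for equity offerings", "Analyst Intern", "Sample Capital"), "Investment Banking"),
    (("Supported deal teams on acquisitions", "Banking Intern", "Example Bank"), "Investment Banking"),
    (("Developed features for a web application", "Software Engineering Intern", "Example Apps"), "Internet & Software"),
    (("Wrote backend services and unit tests", "Developer Intern", "Sample Software"), "Internet & Software"),
    (("Fixed bugs in a mobile application", "Engineering Intern", "Example Apps"), "Internet & Software"),
]
texts = [build_text(*fields) for fields, _ in labeled]
labels = [label for _, label in labeled]

# Tokenize, drop English stopwords, weight terms with tf-idf, then fit a linear SVM.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
model.fit(texts, labels)

# An experience whose employer has no industry on record (hypothetical).
unlabeled = build_text("Built a web application for scheduling", "Software Intern", "Unknown Startup")
predicted = model.predict([unlabeled])
print(predicted[0])
```

Wrapping the vectorizer and classifier in one pipeline means the same tokenization and tf-idf weights learned from the labeled experiences are applied to the unlabeled ones, and swapping in a different classifier only changes the last step.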

Aside from experiences, the other data we had to sanitize were students’ undergraduate and graduate majors. Different schools have completely different sets of undergraduate majors, so in combining the data from two schools, we needed to merge the majors. This is a problem that Handshake has been tackling for various parts of the product, but for this project, we did the merging almost entirely manually. The other task was deduping the graduate school majors. This data is collected in the FDS, which allows students to describe their graduate school degree however they choose – there was even a response in German. Thus, apart from doing simple corrections like removing stopwords and ignoring case, much of this task also had to be done manually.
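The simple corrections mentioned above (ignoring case and removing stopwords) can be sketched in a few lines of Python. The stopword list, the sample responses, and the function names below are assumptions for illustration. A pass like this only groups responses that differ in case, punctuation, or filler words, which is why much of the work still had to be done by hand.

```python
import re
from collections import defaultdict

# Assumed stopword list; a real one would be tuned to the survey responses.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the"}


def normalize_major(raw):
    """Lowercase, keep only runs of letters, and drop stopwords."""
    words = re.findall(r"[^\W\d_]+", raw.lower())
    return " ".join(word for word in words if word not in STOPWORDS)


def group_majors(raw_majors):
    """Group free-text responses that normalize to the same key."""
    groups = defaultdict(list)
    for raw in raw_majors:
        groups[normalize_major(raw)].append(raw)
    return dict(groups)


responses = ["Master of Public Health", "MASTER in public health ", "Law"]
groups = group_majors(responses)
for key, originals in groups.items():
    print(key, originals)
```

The letter-matching pattern keeps non-ASCII letters, so a response written in another language is normalized rather than discarded, though it would still need manual mapping to an English major.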

One of the major goals of this project was to create interactive visualizations explaining the data, laying the foundations for potentially integrating such visualizations into various parts of the product. We explored a variety of tools – including Tableau, Vega, and dygraphs – and ended up using D3 (short for Data-Driven Documents). We chose D3 for its control and configurability, and for the selfish reason that we had been wanting to experiment with D3 and this project gave us the opportunity to do so.

“You can steer yourself // Any direction you choose”

This project was just a first stab at using the incredible data that Handshake has to help students make more informed decisions. The data supports the hypothesis that non-linear career paths are the norm and that students are in fact exploring in their undergraduate internships. The number of paths to get to any one outcome, the varied majors associated with each outcome, and all the different graduate school majors show that neither your major nor your first few internships define your post-graduation options.

From a technical standpoint, there are lots of ways that we can make the code more robust and usable in-product. As a first step, we could work on making the data and classification better. By removing duplicate employers created by students through their profiles and making sure that all employers have associated industries, we could be more confident in the trends we are analyzing. It would also be interesting to add more schools and/or generate these visualizations for every school on Handshake. This is dependent on schools having more students with complete profiles and higher FDS completion rates so that there is sufficient data to work with. With more and better data, we could also build out visualizations based on specific majors and locations.

At Handshake, we are constantly trying to use data to close the opportunity gap for students. This fall, we are rolling out the new Campus Profiles feature, where students who share their profiles with other students at their university will also be able to search and view their peers’ profiles. In doing so, students will be able to learn about what students outside of their networks are up to, hopefully inspiring them to explore options they had not previously considered. We’re also introducing collections on the new student dashboard with recommended jobs based on a student’s reported interests, major, and school to further aid in exploration. We’re just getting started with using data to help students navigate their careers and post-graduation options. There are lots of challenges ahead, but we strongly believe that these problems need to be solved, and are excited to move forward towards our mission.

If you’re interested in helping us leverage data to bridge the opportunity gap, check out our open engineering jobs! We’re especially excited about hiring a Machine Learning Engineer so we can better use analytics and ML to build out our products.