Using Neural Networks to Enable Novel Higher Education Analytics

In the fall of 2015, President Obama unveiled the College Scorecard, an online tool aimed at bringing much-desired transparency to higher education. The College Scorecard consists of thousands of variables for thousands of schools going back almost two decades. This provides ample opportunity to discover insights about higher education, yet it can be difficult to imagine how to make sense of so much data. Traditional analyses tend to focus on a few hand-selected platinum variables and apply routine statistical methods, leaving much of the data’s potential unrealized.

In this new work, featured in e-Literate, we apply neural networks to the College Scorecard in novel ways, allowing us to grapple with the size of the dataset, the extent of missing data (more than half of the possible data is missing), the various classes of data (continuous, categorical, etc.), and complex nonlinear relationships. Leveraging specific neural network architectures, we perform various novel analyses such as

using unsupervised learning with deep neural networks to develop an objective, holistic, quantitative college ranking,
discovering colleges that are most similar to each other across the available data,
discovering “hidden” Ivy League schools, that is, schools that possess a sufficiently accurate signature of an Ivy League school even though they are not labeled as an Ivy, and
developing a new metric called Value Over Replacement School (VORS), inspired by the Moneyball culture of sports analytics.
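
As a rough illustration of the similarity idea in the list above, one could represent each school as a numeric vector and pair it with its nearest neighbor. The sketch below is not the method used in this work: the school names and vectors are invented, the helper names `cosine` and `most_similar` are hypothetical, and cosine similarity is an assumed stand-in for whatever measure the neural network actually produces.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(vectors):
    """Map each school name to the name of its most similar other school."""
    result = {}
    for name, vec in vectors.items():
        candidates = [
            (cosine(vec, other_vec), other_name)
            for other_name, other_vec in vectors.items()
            if other_name != name  # a school is never its own neighbor
        ]
        result[name] = max(candidates)[1]
    return result

# Invented vectors standing in for learned representations of schools.
school_vectors = {
    "School A": [1.0, 0.0, 0.0],
    "School B": [0.9, 0.1, 0.0],
    "School C": [0.0, 1.0, 0.0],
    "School D": [0.0, 0.9, 0.1],
}

pairs = most_similar(school_vectors)
```

With these made-up vectors, Schools A and B pair with each other, as do Schools C and D.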

Below is an interactive table where you can explore the results of these new analytics. The values in the table are derived from the most recent sufficiently complete data in the College Scorecard.

When using this table:

A “Hidden Ivy” is a school sufficiently similar to the set of Ivy League schools. A “Hidden Ivy Prospect” is a school forecast to soon become a “Hidden Ivy.” A “Nearly Hidden Ivy” is a school that is not forecast to become a “Hidden Ivy” yet is still reasonably close to being considered one.

VORS stands for Value Over Replacement School, which is defined as the amount by which observed 10-year mean earnings exceed the expected earnings for students matriculating in a given year. The expected earnings come from a neural network model that considers many pieces of data related to the characteristics of the students, tuition, etc.
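
Read this way, VORS reduces to a subtraction once the model's expected earnings are in hand. The sketch below shows only that final step. The dollar figures and school names are invented, the function name `vors` is hypothetical, and the expected earnings are hard-coded where the neural network's predictions would go.

```python
def vors(observed_earnings, expected_earnings):
    """Value Over Replacement School: observed minus expected earnings."""
    return observed_earnings - expected_earnings

# Invented (observed, expected) 10-year mean earnings in dollars.
# In the actual analysis, the expected value would come from the model.
earnings = {
    "School A": (62000, 55000),
    "School B": (48000, 52000),
}

scores = {name: vors(obs, exp) for name, (obs, exp) in earnings.items()}
```

A positive score means a school's students earned more than the model expected given their characteristics, and a negative score means they earned less.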

Note that only colleges with sufficient 2014 data are included in the table.

Some things you may notice about the results of this analysis are that

while the overall college rankings mostly align with other sources of college rankings across the entire list of schools, the rankings among top-tier schools make it clear that what is being measured is something other than mere prestige,
colleges labeled as “hidden” Ivies are surprisingly consistent with published lists, such as those found in The Hidden Ivies, which are curated based on human expertise,
the schools with the highest VORS scores include many well-known top-tier schools, but also many specialized schools, such as business, engineering (specifically marine engineering), and pharmaceutical schools; however,
many of the schools with the lowest VORS scores are also prestigious, but are liberal arts (or outright arts) schools whose talented students are less concerned with choosing careers that will maximize their monetary earnings.

For a more detailed explanation of the methodology, please read our Research Brief, A New School of Thought for Our Thoughts on Schools: Using Neural Networks to Enable Novel Higher Education Analytics.

Meet the Author

Steve Lattanzio

Research Engineer

slattanzio@lexile.com