In the fall of 2015, President Obama unveiled the College Scorecard, an online tool aimed at bringing much-desired transparency to higher education. The College Scorecard contains thousands of variables for thousands of schools, going back almost two decades. This offers ample opportunity to discover insights about higher education, yet it can be difficult to see how to make sense of so much data. Traditional analyses tend to focus on a few hand-selected platinum variables and apply routine statistical methods, leaving much of the data’s potential unrealized. In this new work, we apply neural networks to the College Scorecard in novel ways that allow us to grapple with the size of the dataset, the extent of missing data (more than half of the possible values are missing), the various classes of data (continuous, categorical, etc.), and complex nonlinear relationships. Leveraging specific neural network architectures, we perform several novel analyses, such as:
- Using unsupervised learning with deep neural networks to develop an objective, holistic, quantitative college ranking;
- Discovering colleges that are most similar to each other across the available data;
- Discovering “hidden” Ivy League schools: schools that possess a sufficiently accurate signature of an Ivy League school even though they are not labeled as Ivies; and
- Developing a new metric called Value Over Replacement School (VORS), inspired by the Moneyball culture of sports analytics.