to sift efficiently through a broad range of potential variables, identifying the relationships, thresholds, and interactions that are most reliably and robustly informative. The Essence of Machine Learning: Overfitting vs. Underfitting But the use of complex, flexible models often comes at a cost—they can work too well. Fitting is easy, prediction is hard. And a key danger of using a complex model is that it will almost always fit the existing sample well. Indeed, a sufficiently complex model should be able to fit the data perfectly. But that is no guarantee
. As a general guide, the field has its origins in computational statistics, and is chiefly concerned with the use of algorithms to identify patterns within a dataset (Kuhn and Johnson, 2016). The actual algorithms can range from the simplest OLS regression to the most-complex “deep learning” neural network; but ML is distinguished by its often single-minded focus on predictive performance—indeed, the essence of machine learning is the design of experiments to assess how well a model trained on one dataset will predict new data ( Box 2 ). In this regard, the