*In this post I cover a few tricks for diagnosing problems with a statistical learning algorithm. First we discuss tricks for uncovering high bias and high variance. Then we discuss a method that can tell whether the problem lies in your objective function or in the optimization algorithm you use. These are lecture notes from Andrew Ng’s online machine learning lecture series.*

I’ve been listening to Andrew Ng’s machine learning lecture series on my commute to work and came across a gem of a video in lecture 11. The lecture deals with practical advice on how to get your machine learning algorithms to “just work.” Some of the insights that Andrew provides are quite useful, I think, and I thought I’d share my lecture notes.

The story is typical: you spend hours coding up your favorite ML algorithm, sure that it will work. Then, when you apply it, you find that it fails miserably. What to do now?

At this point you have many options available; so many, in fact, that it’s difficult to tell what to do next. You can collect more training data. Maybe some parameters are not set correctly. Maybe you need to tweak your input features or collect new features altogether. Or maybe your algorithm is just plain wrong.

The following tips can guide you to your next step.

## Diagnosis Tip #1 Bias vs. Variance

When you have problems with a machine learning algorithm (barring any bugs in your code), two general issues tend to come up: high bias or high variance. High bias arises when your model is not rich enough to capture the trends in your training data. Think, for example, of trying to fit a linear model to training data that is inherently quadratic.

High variance comes up when your model is “too precise for its own good” and ends up modeling intricacies in the noise. Think of modeling n points with a polynomial of degree n-1: you’ll always fit your training data perfectly, at the expense of artificial oscillations in your model.
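To make the two failure modes concrete, here is a minimal sketch in plain Python. It is an illustration, not code from the lecture. It assumes noisy quadratic data (the noise level, sample sizes, and seed are arbitrary choices) and fits it two ways: with a straight line (too simple) and with the degree n-1 polynomial that passes through every training point (too flexible).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_interpolant(xs, ys):
    """Degree n-1 polynomial through all n points (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_targets(xs, rng, noise=0.1):
    """Assumed ground truth: y = x^2 plus Gaussian noise."""
    return [x * x + rng.gauss(0.0, noise) for x in xs]


rng = random.Random(0)
x_train = [-1.0 + 2.0 * i / 7 for i in range(8)]     # 8 training points
x_test = [-0.95 + 1.9 * i / 49 for i in range(50)]   # 50 held-out points
y_train = make_targets(x_train, rng)
y_test = make_targets(x_test, rng)

line = fit_line(x_train, y_train)
wiggle = fit_interpolant(x_train, y_train)

# (training error, test error) for each model
errors = {
    "line": (mse(line, x_train, y_train), mse(line, x_test, y_test)),
    "interpolant": (mse(wiggle, x_train, y_train), mse(wiggle, x_test, y_test)),
}
for name, (train_err, test_err) in errors.items():
    print(f"{name}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

The interpolant reproduces the training data exactly, so its training error is zero while its test error is not. The line cannot bend, so it misses the quadratic trend even on the data it was trained on.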

When you have an improperly tuned statistical model, you should diagnose which of these two errors you have. One trick is to compare your training and test errors:

- If your training error is much lower than your test error, you are dealing with high variance.
- If your training error and your test error are about the same, and both are higher than you would like, you are dealing with high bias.

(You can view single point estimates of your training and test errors, or, to get a bigger picture, view the training and test errors as a function of sample size.)

In the first case, the divergence between training and test errors is caused by over-fitting: you’re able to fit your training data artificially well, which drives your training error down. In the second case, your training and test errors are similar because your model is actually working as designed. The problem is that the model just isn’t good enough to yield any more performance points from either the training or the test data.
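Viewing the errors as a function of sample size can be sketched as follows. This is a rough illustration under assumed conditions (synthetic quadratic data, a straight-line model, arbitrary sample sizes and seed), not a recipe from the lecture. Because the model is too simple for the data, you would expect the training and test errors to approach each other as the sample grows, while both stay well above the noise level.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def mse(predict, xs, ys):
    """Mean squared error of a predictor on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(n, rng, noise=0.1):
    """Assumed ground truth: y = x^2 plus Gaussian noise, x uniform on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


rng = random.Random(1)
x_test, y_test = sample(2000, rng)  # fixed held-out set

# One (n, training error, test error) row per training-set size.
learning_curve = []
for n in (5, 10, 20, 50, 100, 200, 500):
    x_train, y_train = sample(n, rng)
    model = fit_line(x_train, y_train)
    learning_curve.append(
        (n, mse(model, x_train, y_train), mse(model, x_test, y_test))
    )

for n, train_err, test_err in learning_curve:
    print(f"n={n:4d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

For a high variance model the picture would differ: the gap between the two columns would stay wide at small n and only close slowly as more data arrives, which is why adding training data helps in that case.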

Once you’ve recognized the type of error, you can take appropriate action. If you are addressing high variance, you can:

- Add more training data.
- Add regularization, or strengthen it.
- Use fewer features.

If you are addressing high bias, you can:

- Use more features.
- Relax your regularization parameters.
- Try a different algorithm or tweak your algorithm.
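The rule of thumb and the two lists of remedies can be folded into a small helper. This is only a sketch: the function names and the `gap_ratio` threshold are hypothetical choices for illustration, and the helper assumes you already know that the model is underperforming.

```python
# Remedies as listed above, keyed by diagnosis.
REMEDIES = {
    "high variance": [
        "Add more training data.",
        "Add regularization, or strengthen it.",
        "Use fewer features.",
    ],
    "high bias": [
        "Use more features.",
        "Relax your regularization parameters.",
        "Try a different algorithm or tweak your algorithm.",
    ],
}


def diagnose_fit(train_err, test_err, gap_ratio=0.5):
    """Label an underperforming model as high variance or high bias.

    gap_ratio is an illustrative threshold (an assumption, not from the
    lecture): the model is called high variance when the train/test gap
    is more than this fraction of the test error.
    """
    gap = test_err - train_err
    if gap > gap_ratio * test_err:
        return "high variance"
    return "high bias"


def next_steps(train_err, test_err, gap_ratio=0.5):
    """Return the diagnosis together with the matching list of remedies."""
    label = diagnose_fit(train_err, test_err, gap_ratio)
    return label, REMEDIES[label]


# Example: near-zero training error but a noticeable test error.
label, actions = next_steps(train_err=0.001, test_err=0.02)
print(label)
for action in actions:
    print(" -", action)
```

In practice the threshold depends on your error metric and how noisy your estimates are, so treat the label as a starting point and not as a verdict.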

## Diagnosis Tip #2 Problems with your objective function

A successful machine learning solution has two steps at its core (and really this applies to just about all engineering problems):

1. Create an optimization function $latex J(\boldsymbol\phi)$ that “describes” your real-world problem.
2. Create an algorithm that optimizes $latex J(\boldsymbol\phi)$.

(Let’s say that we’re maximizing $latex J(\boldsymbol\phi)$ for this discussion.)

When dealing with a failed statistical algorithm, realize that the error may lurk in step 1 or in step 2. Figuring out which one early can save you lots of time down the line. Here is a trick to discover these errors.

Generate an external estimate of the optimal solution, call it $latex \boldsymbol\phi_{EXT}$, using some alternate method. Importantly, this method must give you a better *test error* than your current algorithm. If possible, this estimate should come from a human source or perhaps from another, more advanced machine learning method. Now plug your external estimate into your optimization function to get $latex J(\boldsymbol\phi_{EXT})$.

Now, you have one of two possible outcomes. First, assume that $latex \hat{\boldsymbol\phi}$ is the optimal parameter estimate that you obtained in step 2 above (i.e. from the algorithm you are troubleshooting). Then you could have the following result,

$latex J(\boldsymbol\phi_{EXT}) > J(\hat{\boldsymbol\phi})$

and

$latex TEST\_ERROR( \boldsymbol\phi_{EXT} ) < TEST\_ERROR( \hat{\boldsymbol\phi} )$. (Remember, this second part is by design.)

What this says is that you’ve gone to a superior algorithm (superior due to its lower test error) and that this algorithm generates a parameter $latex \boldsymbol\phi_{EXT}$ that *beats* your best parameter estimate $latex \hat{\boldsymbol\phi}$ (it beats it because it achieves a higher value of $latex J(\boldsymbol\phi)$). Here a higher value of $latex J(\boldsymbol\phi)$ goes along with a lower test error, so your optimization function $latex J(\boldsymbol\phi)$ appears to be sound (step 1). The problem, importantly, is in your optimization algorithm (step 2), since it fails to find the best solution for $latex J(\boldsymbol\phi)$.

The other possibility is that you have

$latex J(\boldsymbol\phi_{EXT}) < J( \hat{\boldsymbol\phi} )$

and

$latex TEST\_ERROR( \boldsymbol\phi_{EXT} ) < TEST\_ERROR( \hat{\boldsymbol\phi} )$. (Again, this is by design.)

What this says is that you have gone to a superior algorithm and that this algorithm fails to generate a parameter estimate $latex \boldsymbol\phi_{EXT}$ that beats your current estimate $latex \hat{\boldsymbol\phi}$. However, the superior algorithm still gives you a better test error. In other words, a higher value of $latex J(\boldsymbol\phi)$ does not translate into a lower test error, which implies that there is something wrong with the optimization function (step 1).

This test tells you where to focus your energy. If there is an error in your optimization function (step 1), then it must be redesigned. If your optimization algorithm is at fault (step 2), then you could have a problem in your numerics, or convergence or stability issues, so go spend more time troubleshooting that.
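The whole test boils down to two comparisons, which can be sketched in a few lines of Python. The function name and return strings are hypothetical, and the sketch assumes, as in the discussion above, that $latex J(\boldsymbol\phi)$ is being maximized and that you have already evaluated $latex J$ and the test error for both parameter estimates.

```python
def locate_fault(j_ext, j_hat, test_err_ext, test_err_hat):
    """Decide whether to blame the optimizer or the objective.

    j_ext, j_hat: values of J at the external estimate and at your own
        estimate (J is assumed to be maximized).
    test_err_ext, test_err_hat: the corresponding test errors.
    """
    # The test only makes sense if the external estimate really is better.
    if not test_err_ext < test_err_hat:
        raise ValueError("the external estimate must have a lower test error")

    if j_ext > j_hat:
        # A better solution exists that your optimizer did not find.
        return "optimization algorithm (step 2)"

    # Your estimate scores at least as well on J yet tests worse, so J does
    # not track what you care about. Treating a tie this way is an
    # assumption; the discussion above only covers strict inequalities.
    return "optimization function (step 1)"


# Example with made-up numbers: the external estimate scores higher on J.
print(locate_fault(j_ext=0.9, j_hat=0.7, test_err_ext=0.05, test_err_hat=0.20))
```

The guard at the top matters: if the external estimate does not actually have a lower test error, neither conclusion follows.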

Granted, actually finding a $latex \boldsymbol\phi_{EXT}$ via a “superior algorithm” that improves the test error can be tricky. But in many cases it’s possible. For example, in his lecture, Andrew recounted a time when he was debugging an on-line machine learning algorithm applied to autonomously flying a helicopter. To employ this diagnosis trick, Andrew took measurements from human-guided flight and extrapolated parameter estimates for his optimization function. From that point he was able to run this test and diagnose a problem.