I was trying to figure out how hard it would be to tap into the Facebook APIs to do some data mining on social data. Doing this couldn’t be easier: you don’t even have to register an application with Facebook. All you need is your web browser and an environment where you can consume JSON […]
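As a rough illustration of "an environment where you can consume JSON", here is a minimal sketch that assumes Python is that environment. The helper names `fetch_json` and `summarize` and the sample payload are made up for illustration; the payload is not a real Facebook response, and whatever URL you pass to `fetch_json` is up to you.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a URL and decode the response body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(payload):
    """Return the top-level (field, value) pairs of a JSON object, sorted by field name."""
    return sorted(payload.items())


# Made-up payload for illustration only; not a real API response.
sample_text = '{"id": "123", "name": "Example Page", "likes": 42}'
sample = json.loads(sample_text)

for field, value in summarize(sample):
    print(f"{field}: {value}")
```

Once the response is a plain dictionary, the data mining itself is ordinary Python: filter, count, and aggregate the fields you care about.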

## What I’ve been reading this week: Low level links, Legos for adults and a free AI course by Peter Norvig.

Lots of low level this week. For some reason, this was the week of low-level programming. I came across a great article titled “The Demise of the Low Level Programmer” on #AltDevBlogADay, a graphics- and gaming-heavy blog. At the end of the article there is a veritable goldmine of links to articles on […]

## Image Processing with Open Source Tools

My goal is to assemble an open source environment where I can prototype image processing algorithms in a fashion similar to Matlab. There are two reasons for this. First, while Matlab is a great tool, it’s simply much too costly to obtain a commercial license for it, especially in light of the fact that open source […]

## “Is your optimization function correct?” Revisited

I think I have a cleaner way to explain Diagnosis Tip #2 in my post on diagnosing problems with your machine learning algorithm. Many machine learning solutions boil down to defining a model that is specified by some parameter $\theta$ and then creating an optimization function or error function $J(\theta)$ that […]
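To make the idea of a model parameterized by θ and an error function J(θ) concrete, here is a minimal sketch. The specific choices are assumptions for illustration and are not taken from the post: a straight-line model, a half mean squared error for J(θ), plain gradient descent as the optimizer, and toy data generated from y = 1 + 2x.

```python
def predict(theta, x):
    """Linear model: theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def cost(theta, xs, ys):
    """Error function J(theta): half the mean squared error over the data."""
    n = len(xs)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def gradient(theta, xs, ys):
    """Partial derivatives of J with respect to theta[0] and theta[1]."""
    n = len(xs)
    errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
    return [sum(errors) / n, sum(e * x for e, x in zip(errors, xs)) / n]


def minimize(xs, ys, learning_rate=0.1, steps=2000):
    """Plain gradient descent on J(theta), starting from theta = [0, 0]."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad = gradient(theta, xs, ys)
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta


# Toy data generated from y = 1 + 2x (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta = minimize(xs, ys)
print("theta:", theta)
print("J(theta):", cost(theta, xs, ys))
```

Keeping `cost` separate from `minimize` mirrors the distinction the diagnosis relies on: one function defines what "good" means, the other searches for it, and either one can be the source of a problem.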

## Cheat Sheet: Properties of Probability Distributions

Here is a probability distribution cheat sheet that I like to keep around for reference. This focuses on the “big picture” properties of some well-known PDFs. The goal is to collect some properties that can help me decide when it’s appropriate to use a particular distribution. Beta Distribution: Used in task duration modeling (e.g. […]

## Adobe AIR + AWS Cloud

In this mini-tutorial we make use of Adobe AIR and Amazon Web Services to create a bare-bones desktop visualization application which is connected to the cloud. This combines two technologies that I’m very excited about: Adobe Flex and AWS. Adobe Flex allows you to create professional dynamic visualizations and GUIs while AWS lends serious computational […]

## Yahoo! Key Scientific Challenges Program

The old grad school mailing lists are abuzz with the Yahoo! Labs Key Scientific Challenges Program. The site, besides being a call to arms for researchers, describes some major areas of research that Yahoo! is focusing on. Good brain-food.

## How to diagnose problems with your statistical learning algorithm.

In this post I cover a few tricks to diagnose problems with a statistical learning algorithm. We discuss tricks for uncovering high bias and high variance. Then we discuss a method that can tell if there are problems with your objective function or the optimization algorithm employed. These are lecture notes from Andrew Ng’s on […]

## Evidence approximation in linear regression: A method that produces “automatically regularized” solutions.

Summary: In this post, I look at a Bayesian treatment of the linear regression problem. Making use of basis functions allows you to model non-linear patterns in data; however, taking this route usually requires that you regularize your solution. Finding the best regularization parameter often requires cross-validation, but by looking at a framework […]
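For orientation, the quantity usually maximized in the evidence approximation is the log marginal likelihood of the data. The sketch below uses the standard textbook notation, which is an assumption here since the post's own symbols may differ: $\alpha$ is the precision of a zero-mean Gaussian prior on the weights, $\beta$ is the noise precision, $\mathbf{\Phi}$ is the $N \times M$ design matrix of basis function values, and $\mathbf{t}$ is the vector of targets.

```latex
\ln p(\mathbf{t} \mid \alpha, \beta)
  = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta
    - E(\mathbf{m}_N) - \frac{1}{2}\ln\lvert\mathbf{A}\rvert
    - \frac{N}{2}\ln(2\pi)

\mathbf{A} = \alpha\mathbf{I} + \beta\,\mathbf{\Phi}^{\mathsf{T}}\mathbf{\Phi},
\qquad
\mathbf{m}_N = \beta\,\mathbf{A}^{-1}\mathbf{\Phi}^{\mathsf{T}}\mathbf{t}

E(\mathbf{m}_N)
  = \frac{\beta}{2}\,\lVert \mathbf{t} - \mathbf{\Phi}\mathbf{m}_N \rVert^{2}
    + \frac{\alpha}{2}\,\mathbf{m}_N^{\mathsf{T}}\mathbf{m}_N
```

Maximizing this expression over $\alpha$ and $\beta$ sets the effective regularization strength $\alpha/\beta$ from the training data alone, which is the sense in which the solution is “automatically regularized”.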

## Some time series analysis resources

I have been looking into time series analysis topics recently. Here’s a list of resources that I have found useful so far. I’ll update this if I find more stuff:

* Free PDF Text: A First Course on Time Series Analysis – From the University of Wurzburg.
* http://www.statsoft.com/textbook/time-series-analysis/ – This is actually a website […]