Home > Classification, Domain Adaptation, Estimation, Logistic Regression > Regularized Minimax on Synthetic Data

Regularized Minimax on Synthetic Data

First I would like to mention that, since my last post, I came across the paper from 2005 on Robust Supervised Learning by J. Andrew Bagnell that proposed almost exactly the same regularized minimax algorithm as the one I derived. He motivates the problem slightly differently and weights each example separately and not based on types, but the details are essentially identical.

Experiments on Synthetic Data

I tried the algorithm on some synthetic data and a linear logistic regression model. The results are shown in the figures below.

In both examples, there are examples from two classes (red and blue). Each class is a drawn from a  mixture of two normal distributions (i.e., there are two types per class).

The types are shown as red squares and red circles, and blue diamonds and blue triangles. Class-conditionally the types have a skewed distribution. There are 9 times as many red squares as red circles, and 9 times as many blue diamonds as triangles.

We would expect a plain logistic regression classifier will minimize the overall “error” on the training data.

However since an adversary may assign a different set of costs to the various types (than those given by the type frequencies) a minimax classifier will hopefully try to avoid incurring a large number of errors on the most confusable types.

Example 1

Example1. Original training data set. Both the red and blue classes have two types in 9:1 ratio.

Example 1. Plain logistic regression. No minimax. Almost all of the red circles are misclassified.

Example1. Minimax with gamma = 0.1

Recall that as gamma decreases to zero, the adversary has more cost vectors at his disposal, meaning that the algorithm optimizes for a worse assignment of costs.

Example 2

Example2. Original training data set.

Example1. Logistic regression. No minimax.

Example2. Minimax with gamma = 0.5


1. Notice that the minimax classifier trades off more errors on more frequent types for lower error on the less frequent ones. As we said before, this may be desirable if the type distribution in the training data is not representative of what is expected in the test data.

2. Unfortunately we didn’t quite get it to help on the named-entity recognition problem that motivated the work.

  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: