Saturday, June 23, 2012

Intro to Octave for Coursera Students

Octave as a programming language has a lot to offer. To give you a taste this post attempts to showcase some of the cooler features of the language. But it also serves a purpose of introduction to Octave (or Matlab) for those who are taking or considering taking Coursera Machine Learning class by Professor Andrew Ng (great great idea). Not incidentally most of the examples were inspired by the homework assignments for the course.


Disclaimer: this post contains no concrete references, examples, excerpts or solutions for any of the Coursera courses, exercises, or homework assignments.


Matrix Basics

Suppose we have a 3 by 5 matrix A like this:
octave:> A = [1 2 3 4 5; 2 3 4 5 6; 3 4 5 6 7]

A =

   1 2 3 4 5
   2 3 4 5 6
   3 4 5 6 7

Then to extract a single element from A is just like in most other languages:
octave:> A(2,3)

ans = 4 
Just remember that following mathematical conventions Octave indexes start with 1.

Almost everything in Octave is array (vector or matrix or similar) and index is no exception. Let's take a range for example. Range is a row vector with evenly spaced elements, e.g.:

octave:> 1:5
ans =

   1   2   3   4   5

octave:> A(1:3,1:5)
ans =

   1   2   3   4   5
   2   3   4   5   6
   3   4   5   6   7
Operation A(1:3, 1:5) returns whole A again because it selects all of its rows and columns. Ranges can exist by themselves as vectors but there is special type which is available only in the context of matrix index:

octave:> A(2:end,3:end)
ans =

   4   5   6
   5   6   7
Ranges 2:end and 3:end are defined only within concrete matrix context as keyword end indicates last row or column position within matrix. You can even select elements for last 2 rows and columns like this:
octave:> A(end-1:end,end-1:end)
ans =

   5   6
   6   7

Logical Operations on Matrices

Logical arrays in Octave contain all logical elements and are usually results of relational operators with vectors and matrices like this:

octave:> A != 3
ans =

   1   1   0   1   1
   1   0   1   1   1
   0   1   1   1   1
One cool application of this is inverting identity matrix:
octave:> I = eye(3)
I =

Diagonal Matrix

   1   0   0
   0   1   0
   0   0   1

octave:> I == 0
ans =

   0   1   1
   1   0   1
   1   1   0

Euclidean Distance

Given two vectors (same size) find Euclidean distance between them.

a = [0 0 0];
b = [1 2 2];
distance = sqrt(sumsq(a-b));
If you find yourself writing in Octave more complex solutions for similar problems  with vectors  then stop and review your vectorization approach.

Vectorizing indexes

Suppose you have a collection of m vectors in n-dimensional space stored as m x n matrix X.  Suppose that the value of last (n-th) coordinate of these vectors is always 0 or 1. Then we want to produce 2 subsets of X - one subset of vectors with last coordinate equal to 0 and the other subset where vectors have last coordinate equal to 1:
n = size(X, 2);
X0 = X( find( X(:, n) == 0 ), :);
X1 = X( find( X(:, n) == 1 ), :);

Randomness

Random numbers appear in many problems. Basic approach is a matrix filled with random numbers from uniform distribution on the interval (0, 1):
octave:> rand(3,4)
ans =

   0.347937   0.317482   0.630678   0.245148
   0.917634   0.649125   0.634592   0.837635
   0.994745   0.092818   0.154936   0.966380
But Octave offers a few shortcuts. For one, such common distributions as normal, exponential, Poisson, and gamma each receive their own function randn, rande, randp, and randg.

Function randperm produces a row vector of randomly permuted integers from 1 to n:
octave:> randperm(5)
ans =

   3   1   4   5   2
If you are looking for an arbitrary vector with values between 0 and N of size n (n<=N) then randperm gets id done (in this case N = 100 and n = 10):
octave:> x = randperm(100)(1:10)
x =

   88    1   89   25   76   19   78   38   99   34

This combined with vector indexing accomplishes rather elaborate task in short one-liner: suppose we have m n-dimensional vectors stored as m x n matrix X and we need to pick k vectors (k < m) randomly. The following gets this done:

X(randperm(size(X, 1))(1:k), :);

Newer releases of Octave (I use 3.2.4) added functions randi and randperm(n, m) that offer even nice features.

Binary Singleton Expansion Function (bsxfun)

This function reminds me of Python map function, but having it in Octave is necessity (unlike in Python). When vectorizing in Octave we have a few options: 1) both parameters are of the same dimensions for element-wise application; 2) parameters are of compatible sizes for matrix operations like multiplication; 3) one parameter is a matrix and the other is a scalar.

And then we use bsxfun for vectorizing everything else, for example applying a vector to each row:

X = rand(5, 3);
mu = mean(X);
sigma = std(X);
X_norm = bsxfun(@minus, X, mu);
X_norm = bsxfun(@divide, X_norm, sigma);

The operations above resulted in normalizing all 5 vectors (rows) from XX_norm contains vectors with 0 means and standard deviations 1 for all 3 dimensions. In just 2 lines bsxfun applied mu (means) and sigma (standard deviations) to each row of X and X_norm.

Timing operations

Use functions tic and toc to measure execution time in Octave to tune performance when necessary:
octave:> tic; A * A'; toc
Elapsed time is 2.58e-008 seconds.

Monday, June 4, 2012

Gentle Intro to Octave or Matlab

I began using Octave for homework assignments from the online Machine Learning class. Having worked with languages like Python, Groovy and JavaScript I never expected a system designed for numerical computations to include such a complete and unique programming language. But it does and I can't resist sharing some examples.

There are two important things one should know about Octave (or Matlab as Octave is usually portable to Matlab):
  • Octave is a high level language just like Python or Groovy
  • Using Octave without matrices or vectors is like using Java without objects
Just these by themselves are worth a whole book on Octave but instead I go on with few cool  examples (leaving the book for later :-).

Matrices and Vectors

Creating vector or matrix in Octave is simple:
octave:> A = [1 2 3; 4 5 6; 7 8 9]
A =

   1   2   3
   4   5   6
   7   8   9
defines 3x3 matrix of integers.

Use special functions to define special matrices, e.g. identity:
octave:> I = eye(3)
I =

Diagonal Matrix

   1   0   0
   0   1   0
   0   0   1

all zeros:

octave:> allZeros = zeros(2,4)
allZeros =

   0   0   0   0
   0   0   0   0
vector (number of columns is 1) of all ones:

octave:> allOnes = ones(3,1)
allOnes =

   1
   1
   1
or matrix with random values:

octave:> X = rand(3, 5)
X =

   0.400801   0.091597   0.951333   0.063074   0.018309
   0.690633   0.194094   0.417911   0.658953   0.624323
   0.848887   0.696741   0.213559   0.363656   0.632738
And finally getting a vector of values from 1 to N (row vector):
octave:> 1:N
ans =

    1    2    3    4    5    6    7    8    9   10
 and column (vector above transposed):
octave:> (1:N)'
ans =

    1
    2
    3
    4
    5
    6
    7
    8
    9
   10

To stir things up a bit I make the following claim:
In Octave for any given problem there is higher than 50% chance that using matrices alone solves the problem with less code and more efficiently than when using loop and condition statements. 
Being a high level language Octave has control statements if, switch, loops for and while but using them in Octave is often your second choice. The reason are many matrix operators and functions Octave offers may accomplish a task without ever invoking a single control statement in a fraction of time.

Suppose you have a matrix X and you need to insert a column of 1s in front. Then I just concatenate a vector of 1s of proper size and X:

 X = [ ones(size(X, 1), 1),  X ];

What just happened: function size returned 1st dimension of array X (number of rows); then function ones generated a vector (2d dimension is 1) of 1s, and finally we concatenated column and X. But this is only beginning.

Matrix magic

This example illustrates why and how things may work out better without control statements in Octave. Suppose I have a row vector (we may call it also an array but ultimately it is a single row matrix) of numbers from 1 to 10:
octave:> y = randperm(10)
y =

    4    3    1    8    6   10    9    2    7    5

octave:> y = repmat(y, 1, 10);

Function randperm returned a row vector containing random permutation of numbers 1 through 10. Then I used function repmat to make vector y 10 times its original length by repeating it 10 times (which results in 1 by 100 matrix y).

Now, to the problem we will solve. Let's say each number from 1 to 10 corresponds to 10-dimensional vector of 0s and 1 with all of its elements set to 0 except the position of the number. Thus, 1 corresponds to vector (1 0 0 0 0 0 0 0 0 0), 2 corresponds to vector (0 1 0 0 0 0 0 0 0 0) and so on until 10 that corresponds to (0 0 0 0 0 0 0 0 0 1). Then given some vector y like above (where each element is a number from 1 to 10) we want to produce series of vectors that correspond to elements in y.

If you never worked with Octave before then solution may amaze you, if you did then this might be your normal routine:

A = eye(10)

A =

Diagonal Matrix

   1   0   0   0   0   0   0   0   0   0
   0   1   0   0   0   0   0   0   0   0
   0   0   1   0   0   0   0   0   0   0
   0   0   0   1   0   0   0   0   0   0
   0   0   0   0   1   0   0   0   0   0
   0   0   0   0   0   1   0   0   0   0
   0   0   0   0   0   0   1   0   0   0
   0   0   0   0   0   0   0   1   0   0
   0   0   0   0   0   0   0   0   1   0
   0   0   0   0   0   0   0   0   0   1

result = A(:, y);

What just happened: first we created an identity matrix A of size 10 - note that it consists of exactly 10 vectors we are mapping to and each is in right column position. Now, we just plug our original vector y into column index of matrix A. This will extract elements from A: all for rows and precisely right columns. Thus A(:, 2) gives us matrix which is 2nd column of A, A(:, 2:4) gives us matrix with columns of A from 2 to 4, same is accomplished with A(:, [2:4]), and A(:, [1 4 9]) selects 1st, 4th and 9th columns of A. Finally, we can plug an arbitrary vector in column index - in our case vector y just what we need and result becomes 10 by 100 matrix where each column corresponds to element of y.

As unconventional as it may sound I keep thinking of this solution in terms of ranges when indexing arrays. Indexing array could be via an integer, range, or array of numbers. The latter is just a vector and that is all to it.