Thursday, June 28, 2012

A Realistic Test

There was an interesting post on the mailing list ( The problem is, given some logical array, A = [1 0 0 1 0 0 1 1 1 ...], and minimum length of consecutive ones, K, all sequences of ones less than K should be filtered out.

The exciting part for me is that the JIT compiler is currently far enough along to compile the loopy solution.

Input generation

I used a simple script to generate some random input data. A double matrix is used, because JIT does not yet work for logical matrices.
function result = gen_test (n)
  result = double (rand (1, n) > .01);


Vectorized code (based off of code from the mailing list)
function z = vectorized (A, K)
  temp = ones (1, K);
  z = conv (A, temp);
  z = z > K-1;
  z = conv (z, temp);
  z = z(K:end-K+1);
  z = z >= 1;


I didn't do anything fancy here.
function z = loopy (A, K)
  z = A;
  n = numel (A);
  counter = 0;
  for ii=1:n
    if z(ii)
      counter = counter + 1;
      if counter > 0 && counter < K
        z(ii-counter:ii-1) = 0;
      counter = 0;

  if counter > 0 && counter < K
    z(end-counter+1:end) = 0;


These numbers were taken from a AMD FX(tm)-4100 Quad-Core Processor with 8 GB of RAM. I just ran each test once. For each test, the number of elements in A was 1,000,000.

Vectorized Loopy JIT Loopy JIT (no overhead) Loopy (No JIT)
K=3 0.078s 0.059s 0.028s 5.27s
K=100 0.43s 0.063s 0.028s 5.66s
K=500 1.58s 0.082s 0.033s 5.73s

These results are expected. The efficiency of the vectorized approach depends on K, while the loopy version does not. While JIT support is not complete or stable yet1, I think this shows that the current JIT implementation is able to handle a few practical examples, not just interpreter benchmarks.

hg id for regular octave branch: 52cb71787cd1
hg id for jit branch: f649b66ef1af

1 Occasionally I break functionality like assignment, addition, and function calls.

Monday, June 18, 2012

Matrix Support

The Octave Language

An interesting feature of the Octave language is that matrices are treated as values, not references. For example, the following code
A = [1 2 3];
B = A;
A(1) = 100;
disp (B(1));
will display the value 1, not 100. Matrices are also passed by value in function calls.

Interpreter Implementation

Octave currently uses a really cool system to minimize the number of times matrices. It combines Copy on Write (COW) with reference counting to reduce the number of copies.

The idea is to delay copies until a matrix is mutated. Then the copy is only made if the reference count is greater than one. Take the previous example,
A = [1 2 3];
B = A; # No copy
A(1) = 100; # Copy, as both A and B refer to [1 2 3]
This algorithm has two nice properties. First, it is simple. Second, copies are only made when required.


I plan on using the same algorithm in JIT. This decision means that every assignment of the form
A(idx) = x; # Where idx and x are scalar
requires a check to see if a copy is required. This actually is not such a big issue, because we already require a check to ensure the index is legal. In order to speed up the normal case, we can directly inline the condition check.

Which leads us to the pseudo code for the inlined function.
if (isint (idx) && idx >= 1 && i < nelem (A) && count (A) == 1)
  A(idx) = x;
  A = handle_odd_assign (A, idx, x);
Hopefully, this will allow the normal case to be quick, but still provide a correct implementation for more complicated cases.

Monday, June 11, 2012

Errors are annoying

I realized that I need to qualify that this issue isn't a problem with Octave's implementation of error handling. Instead, the issue lies in the language, and the fact that almost every operation can cause an error.

In Octave correctly handling errors is annoying. For an example, let's look at a simplified implementation of binary operators in the interpreter.
octave_value retval;
octave_value lhs = // compute lhs
if (! error_state && lhs.is_defined ())
  octave_value rhs = // compute rhs
  if (! error_state && rhs.is_defined ())
    retval = ::do_binary_op (op_type, lhs, rhs);
return retval;
Notice that after every operand is computed, the error state must be checked. Take the follow statement
a = (a + b) / c;
When converted into a linear SSA form we get
  temp0 = a0 + b0
  check_error block1, done
  temp2 = temp0 / c0
  check_error block2, done
  c1 = temp2
  goto done
  c2 = phi (block0: c0, block1: c0, block2: c1)
  # yield a0, b0, and c2 as results
Notice that the phi merge function requires an entry for every operation. Furthermore, each variable that is defined must be represented in the done block. At the worse case, each statement could define a new variable leading to O(n^2) space complexity. (where n is the statement count in Octave)

However, in practice the number of variable should be low compared to the number of lines. As a test, I automatically generated a 500 line program consisting of scalar addition statements like
i1 = i0 + i0;
It looks like the compile time stays within reasonable bounds.

Monday, June 4, 2012


Progress last week was a little slower than I had hoped. Originally, I had planned on adding support for break and continue quickly, then moving onto other issues. However, I realized the hard way that my simple inline SSA construction algorithm was not very extensible. I have now implemented a textbook SSA construction algorithm.

I also took another look at my modifications to (with help from John W. Eaton and Jordi). The configure check was fixed and LLVM_CONFIG can be set to the location of llvm-config. This allows building with LLVM in a nonstandard path.

Next, I think I will focus ensuring error and warning conditions are handled correctly (not talking about try/catch, just error and warning). In theory error checking is simple, for each operation that may error I need to check to see if it errored, then if it has exit the JIT function. There are a few annoying details. For example, I need to keep track of and return the values of each variable when an error occurs. Additionally, the ability to change warnings into errors adds further complexity.