Oct file
The oct file is mostly equivalent to the loopy version in my previous post.
    #include <algorithm>          // std::fill
    #include <octave/oct.h>
    #include <octave/parse.h>

    DEFUN_DLD (oct_loopy, args, , "TODO")
    {
      feval ("tic");
      octave_value ret;

      int nargin = args.length ();
      if (nargin != 2)
        print_usage ();
      else
        {
          NDArray data = args(0).array_value ();
          octave_idx_type nconsec;
          nconsec = static_cast<octave_idx_type> (args(1).double_value ());

          if (!error_state)
            {
              double *vec = data.fortran_vec ();
              octave_idx_type counter = 0;   // length of the current nonzero run
              octave_idx_type n = data.nelem ();

              for (octave_idx_type i = 0; i < n; ++i)
                {
                  if (vec[i])
                    ++counter;
                  else
                    {
                      // The run just ended; zero it if it is shorter than nconsec.
                      if (counter > 0 && counter < nconsec)
                        std::fill (vec + i - counter, vec + i, 0);
                      counter = 0;
                    }
                }

              // Handle a short run that reaches the end of the data.
              if (counter > 0 && counter < nconsec)
                std::fill (vec + n - counter, vec + n, 0);

              ret = octave_value (data);
            }
        }

      feval ("toc");
      return ret;
    }
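For readers who want to experiment with the loop without building against Octave, here is a minimal sketch of the same logic on a plain `std::vector<double>`. The function name `zero_short_runs` is hypothetical and not part of the oct file; the sketch also takes and returns the vector by value rather than working through `fortran_vec`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical standalone helper: zero every run of consecutive nonzero
// elements that is shorter than nconsec, mirroring the loop in oct_loopy.
std::vector<double> zero_short_runs (std::vector<double> data, std::size_t nconsec)
{
  double *vec = data.data ();
  const std::size_t n = data.size ();
  std::size_t counter = 0;   // length of the current nonzero run

  for (std::size_t i = 0; i < n; ++i)
    {
      if (vec[i] != 0.0)
        ++counter;
      else
        {
          // The run just ended; clear it if it was too short.
          if (counter > 0 && counter < nconsec)
            std::fill (vec + (i - counter), vec + i, 0.0);
          counter = 0;
        }
    }

  // A short run may also reach the end of the data.
  if (counter > 0 && counter < nconsec)
    std::fill (vec + (n - counter), vec + n, 0.0);

  return data;
}
```

Calling `zero_short_runs ({1, 1, 0, 1, 1, 1, 0, 1}, 3)` clears the leading run of two and the trailing run of one, and keeps the run of three.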
Results
I ran each test five times, taking the lowest time. I have also separated out the compile/link time from the run time. For JIT, compile time was determined by running the function twice and subtracting the second run time from the first, since only the first run includes compilation. The compile time for the oct file was determined by timing mkoctfile. The initial parameters were a random vector, A, of size 1,000,000 and K = 3.
| | Compile time | Run time |
|---|---|---|
| JIT | 14 ms | 21 ms |
| OCT | 2400 ms | 3.3 ms |
When using JIT, the compile time is part of the run time for the first execution of the loop. This means that for this example, a first run with JIT is currently about 10 times slower than the oct file. However, if we were to execute the function 50 times on 1,000,000 element vectors, the compile time would be paid only once, and JIT would be about 6 times slower.
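The two ratios can be reproduced from the table. This is only a sketch of the arithmetic, assuming the JIT compile time is paid once, the per-run times stay constant, and the oct file's mkoctfile time is left out of the comparison:

```latex
\[
\text{first run: }\quad
\frac{14\ \text{ms} + 21\ \text{ms}}{3.3\ \text{ms}} = \frac{35}{3.3} \approx 10.6
\]
\[
\text{50 runs: }\quad
\frac{14\ \text{ms} + 50 \times 21\ \text{ms}}{50 \times 3.3\ \text{ms}}
  = \frac{1064}{165} \approx 6.4
\]
```

As the number of runs grows, the ratio approaches the pure run-time ratio, 21 / 3.3, which is also roughly 6.4.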
After looking at the assembly, it looks like JIT is slowed down by checks for matrix index validity and by loop variables being doubles (in a loop like `for ii=1:5`, `ii` is a double). It should be possible to fix these issues in JIT, but it will result in a larger compile time.
"It should be possible to fix these issues in JIT, but it will result in a larger compile time."
Can you share any thoughts on the trade-off between JIT compile time and the performance of the generated code?
Will there be user-accessible configuration options for the JIT compiler? If so, maybe include an option for the user to pick a compiler optimization level?
Thanks!
For now, I'm not worrying too much about the trade-off. I would rather spend my time compiling more of the language.
I think giving the user control over the optimization level is a good idea, but we need multiple optimization levels first :)