tag:blogger.com,1999:blog-79784759844872386722014-10-03T00:41:18.211-07:00JIT OctaveThe development of a JIT compiler for GNU OctaveMax Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.comBlogger11125tag:blogger.com,1999:blog-7978475984487238672.post-48179686740933078222012-11-05T00:10:00.001-08:002012-11-05T00:10:59.273-08:00JIT, Debugging, and InterruptsI finally found some time to work on Octave last weekend. There has been some talk on the mailing list recently about releasing a new version of Octave, so I figured I should clean up a few loose ends in JIT.<br /><br /><h3>Breakpoints</h3><div>Up until now JIT has been skipping breakpoints. While there are several issues with supporting breakpoints in JIT, the biggest one is that the Octave debugger allows for the execution of arbitrary statements. Arbitrary statement execution is a very powerful and useful feature of the Octave debugger, but it allows users to change the type of a variable in the middle of code execution.<br /><br />JIT gets its performance improvement by making assumptions about variable types, so entering debug mode means we need to exit JIT. I took the simple way out here and do not start JIT if there are any breakpoints in the code (see <a href="http://hg.savannah.gnu.org/hgweb/octave/file/44272909d926/libinterp/interp-core/pt-jit.cc#l1941">hg</a>).<br /><br /></div><h3>Interrupts</h3><div>In Octave if you hit control-c Octave will stop execution and return to the Octave prompt (unless debug on interrupt is set). The interpreter does this by calling <span style="font-family: Courier New, Courier, monospace;">octave_quit</span>, which checks the interrupt state and throws an <span style="font-family: Courier New, Courier, monospace;">octave_interrupt_exception</span> if an interrupt has occured.</div><div><br /></div><div>Ideally, to support interrupts in JIT a call to <span style="font-family: Courier New, Courier, monospace;">octave_quit</span><span style="font-family: inherit;"> should be inserted once per loop iteration</span>. Unfortunately, it is not that simple. After an interrupt occurs the current variable values need to be reported to the interpreter. For example,<br /><pre>i = 0;<br />while 1<br /> i += 1;<br />endwhile</pre>If the user interrupts the loop the interpreter needs to save the current value of <span style="font-family: Courier New, Courier, monospace;">i</span>. This means JIT needs a way to catch and rethrow the <span style="font-family: Courier New, Courier, monospace;">octave_interrupt_exception</span>. While LLVM does have a way of handling exceptions, the infrastructure in Octave does not yet exist to support the LLVM feature.<br /><br />Instead, I inserted a check to <span style="font-family: Courier New, Courier, monospace;">octave_interrupt_state</span>. If <span style="font-family: Courier New, Courier, monospace;">octave_interrupt_state</span> is greater than 0, we need to exit to the Octave prompt. I reused the code for checking <span style="font-family: Courier New, Courier, monospace;">error_state</span><span style="font-family: inherit;"> to achieve this.</span><br /><br />Now that JIT handles interrupts and breakpoints in a manner consistent with the interpreter, I can't think of ways in which JIT and the interpreter differ (besides speed). The amount of code which JIT can compile is still fairly limited. Hopefully, I will get some time over winter break to make it easier to extend JIT and improve what JIT can compile. In its current state JIT should be ready to include as an experimental feature in the next Octave release.</div>Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com0tag:blogger.com,1999:blog-7978475984487238672.post-85499029415371057212012-08-18T20:45:00.001-07:002012-08-18T20:45:38.047-07:00GSoC ReportI have just finished writing my Google Summer of Code <a href="https://sites.google.com/site/2bass2/report.pdf?attredirects=0&d=1">final report</a>.Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com2tag:blogger.com,1999:blog-7978475984487238672.post-80956783242750359722012-08-07T10:46:00.003-07:002012-08-07T10:46:38.681-07:00Multidimensional indexing and endThis week I added support for multidimensional matrix indexing and using end in jit. Support for the end keyword is interesting. Take the following for example,<br /><pre>y = A(1, sin (end));</pre>From that line alone, it is not clear what end refers to. If sin is a matrix, then end will be the end of sin. If sin is a function, then end will be the end of A.<br /><br />I have solved this problem by keeping track of the full context information for each end. Lets take a look at a slightly more complicated example<br /><pre>y = A(1, 2, B(sin (end), 5));</pre>then during type inference our context might look something like this<br /><pre>type identifier index count<br />function sin 0 1<br />matrix B 0 2<br />matrix A 2 4</pre>In this context end then refers to the end of matrix B at index 0.Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com0tag:blogger.com,1999:blog-7978475984487238672.post-69322607959397244222012-07-27T10:49:00.000-07:002012-07-27T10:49:30.472-07:00OctConf Reflection and Project PlanI had a great time at OctConf 2012. There were a lot of interesting people there, and it is nice to be able to put faces to names. I definitively hope I will be able to make it next year.<br /><br />I recently realized that there are only two weeks left before the GSoC suggested pencils down date (8/13/2012). In the remaining time I plan on focusing my effort on better matrix support and supporting more builtin functions. I should be able to make significant progress on this in the remaining two weeks.<br /><br />After GSoC is done, I plan on working on compiling user functions and function handles. I think adding support for user functions in JIT is important, but I'm not sure if I will be able to complete it in two weeks.Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com0tag:blogger.com,1999:blog-7978475984487238672.post-84873599656120380722012-07-03T15:40:00.001-07:002012-07-03T15:40:32.464-07:00Comparison of JIT with Oct filesIn my last <a href="http://jit-octave.blogspot.com/2012/06/realistic-test.html">post</a> I tested the Octave JIT compiler on a problem presented on the <a href="https://mailman.cae.wisc.edu/pipermail/help-octave/2012-June/052642.html">mailing list</a>. I got a request for a comparison with oct files. I think this is an interesting comparison, because ideally the JIT compiler should reduce the need to rewrite Octave scripts as oct files.<br /><br /><h4> Oct file</h4><div>The oct file is mostly equivalent to the loopy version in my previous post.</div><pre>#include <octave/oct.h><br />#include <octave/parse.h><br /><br />DEFUN_DLD (oct_loopy, args, , "TODO")<br />{<br /> feval ("tic");<br /><br /> octave_value ret;<br /> int nargin = args.length ();<br /> if (nargin != 2)<br /> print_usage ();<br /> else<br /> {<br /> NDArray data = args(0).array_value ();<br /> octave_idx_type nconsec;<br /> nconsec = static_cast<octave_idx_type> (args(1).double_value ());<br /><br /> if (!error_state)<br /> {<br /> double *vec = data.fortran_vec ();<br /> octave_idx_type counter = 0;<br /> octave_idx_type n = data.nelem ();<br /> for (octave_idx_type i = 0; i < n; ++i)<br /> {<br /> if (vec[i])<br /> ++counter;<br /> else<br /> {<br /> if (counter > 0 && counter < nconsec)<br /> std::fill (vec + i - counter, vec + i, 0);<br /><br /> counter = 0;<br /> }<br /> }<br /><br /> if (counter > 0 && counter < nconsec)<br /> std::fill (vec + n - counter, vec + n, 0);<br /><br /> ret = octave_value (data);<br /> }<br /> }<br /><br /> feval ("toc");<br /> return ret;<br />}</pre><pre><br /></pre><h4> Results</h4><div>I ran each test five times, taking the lowest time. I have also separated out the compile/link time from the run time. For JIT, compile time was determined by running the function twice, and subtracting the first run time from the second run time. The compile time for the oct file was determined by timing mkoctfile. The initial parameters were a random vector, A, of size 1,000,000 and a K = 3.<br /><br /><table border="2" cellpadding="3" cellspacing="3"><tbody><tr><th></th><th>Compile time</th><th>Run time</th></tr><tr><th>JIT</th><td>14ms</td><td>21ms</td></tr><tr><th>OCT</th><td>2400ms</td><td>3.3ms</td></tr></tbody></table></div><div><br />When using JIT, the compile time is part of the run time for the first execution of the loop. This means that for this example, JIT is currently about 10 times slower than the oct file. However, if we were to execute the function 50 times on 1,000,000 element vectors, then JIT would be 6 times slower.<br /><br />After looking at the assembly, it looks like JIT runs into issues with checks for matrix index validity and that loop variables are doubles (in loops like `for ii=1:5' ii is a double). It should be possible to fix these issues in JIT, but it will result in a larger compile time.</div>Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com2tag:blogger.com,1999:blog-7978475984487238672.post-78123595142623399292012-06-28T22:30:00.002-07:002012-06-29T15:52:02.738-07:00A Realistic TestThere was an interesting post on the mailing list (<a href="https://mailman.cae.wisc.edu/pipermail/help-octave/2012-June/052642.html">https://mailman.cae.wisc.edu/pipermail/help-octave/2012-June/052642.html</a>). The problem is, given some logical array, A = [1 0 0 1 0 0 1 1 1 ...], and minimum length of consecutive ones, K, all sequences of ones less than K should be filtered out.<br /><br />The exciting part for me is that the JIT compiler is currently far enough along to compile the loopy solution.<br /><br /><h4> Input generation</h4>I used a simple script to generate some random input data. A double matrix is used, because JIT does not yet work for logical matrices.<br /><pre>function result = gen_test (n)<br /> result = double (rand (1, n) > .01);<br />endfunction</pre><h4> Vectorized</h4>Vectorized code (based off of code from the mailing list)<br /><pre>function z = vectorized (A, K)<br /> tic;<br /> temp = ones (1, K);<br /> z = conv (A, temp);<br /> z = z > K-1;<br /> z = conv (z, temp);<br /> z = z(K:end-K+1);<br /> z = z >= 1;<br /> toc;<br />endfunction</pre><h4> Loopy</h4><div>I didn't do anything fancy here.</div><pre>function z = loopy (A, K)<br /> tic;</pre><pre> z = A;<br /> n = numel (A);<br /> counter = 0;<br /> for ii=1:n<br /> if z(ii)<br /> counter = counter + 1;<br /> else<br /> if counter > 0 && counter < K<br /> z(ii-counter:ii-1) = 0;<br /> endif<br /> counter = 0;<br /> endif<br /> endfor<br /><br /> if counter > 0 && counter < K<br /> z(end-counter+1:end) = 0;<br /> endif<br /> toc;<br />endfunction<br /></pre><h4> Results</h4>These numbers were taken from a AMD FX(tm)-4100 Quad-Core Processor with 8 GB of RAM. I just ran each test once. For each test, the number of elements in A was 1,000,000.<br /><br /><table border="2" cellpadding="3" cellspacing="3"><tbody><tr><th></th><th>Vectorized</th><th>Loopy JIT</th><th>Loopy JIT (no overhead)</th><th>Loopy (No JIT)</th></tr><tr><th>K=3</th><td>0.078s</td><td>0.059s</td><td>0.028s</td><td>5.27s</td></tr><tr><th>K=100</th><td>0.43s</td><td>0.063s</td><td>0.028s</td><td>5.66s</td></tr><tr><th>K=500</th><td>1.58s</td><td>0.082s</td><td>0.033s</td><td>5.73s</td></tr></tbody></table><br />These results are expected. The efficiency of the vectorized approach depends on K, while the loopy version does not. While JIT support is not complete or stable yet<sup>1</sup>, I think this shows that the current JIT implementation is able to handle a few practical examples, not just interpreter benchmarks.<br /><br />hg id for regular octave branch: 52cb71787cd1<br />hg id for jit branch: f649b66ef1af<br /><br /><span style="font-size: x-small;"><sup>1</sup> Occasionally I break functionality like assignment, addition, and function calls.</span>Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com6tag:blogger.com,1999:blog-7978475984487238672.post-20925777727959855312012-06-18T21:14:00.000-07:002012-06-18T21:14:48.684-07:00Matrix Support<h3> The Octave Language</h3>An interesting feature of the Octave language is that matrices are treated as values, not references. For example, the following code<br /><pre>A = [1 2 3];<br />B = A;<br />A(1) = 100;<br />disp (B(1));</pre>will display the value 1, not 100. Matrices are also passed by value in function calls.<br /><br /><h3> Interpreter Implementation</h3>Octave currently uses a really cool system to minimize the number of times matrices. It combines Copy on Write (COW) with reference counting to reduce the number of copies.<br /><br />The idea is to delay copies until a matrix is mutated. Then the copy is only made if the reference count is greater than one. Take the previous example,<br /><pre>A = [1 2 3];<br />B = A; # No copy<br />A(1) = 100; # Copy, as both A and B refer to [1 2 3]</pre>This algorithm has two nice properties. First, it is simple. Second, copies are only made when required.<br /><br /><h3> JIT</h3>I plan on using the same algorithm in JIT. This decision means that every assignment of the form<br /><pre>A(idx) = x; # Where idx and x are scalar</pre>requires a check to see if a copy is required. This actually is not such a big issue, because we already require a check to ensure the index is legal. In order to speed up the normal case, we can directly inline the condition check.<br /><br />Which leads us to the pseudo code for the inlined function.<br /><pre>if (isint (idx) && idx >= 1 && i < nelem (A) && count (A) == 1)<br /> A(idx) = x;<br />else<br /> A = handle_odd_assign (A, idx, x);</pre>Hopefully, this will allow the normal case to be quick, but still provide a correct implementation for more complicated cases.Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com0tag:blogger.com,1999:blog-7978475984487238672.post-16070683954671089792012-06-11T13:53:00.000-07:002012-06-12T20:54:45.215-07:00Errors are annoying<i>Edit</i><br /><i>I realized that I need to qualify that this issue isn't a problem with Octave's implementation of error handling. Instead, the issue lies in the language, and the fact that almost every operation can cause an error.</i><br /><br />In Octave correctly handling errors is annoying. For an example, let's look at a simplified implementation of binary operators in the interpreter.<br /><pre>octave_value retval;<br />octave_value lhs = // compute lhs<br />if (! error_state && lhs.is_defined ())<br />{<br /> octave_value rhs = // compute rhs<br /> if (! error_state && rhs.is_defined ())<br /> retval = ::do_binary_op (op_type, lhs, rhs);<br />}<br />return retval;<br /></pre>Notice that after every operand is computed, the error state must be checked. Take the follow statement<br /><pre>a = (a + b) / c;</pre>When converted into a linear SSA form we get<br /><pre>block0:<br /> temp0 = a0 + b0<br /> check_error block1, done<br />block1:<br /> temp2 = temp0 / c0<br /> check_error block2, done<br />block2:<br /> c1 = temp2<br /> goto done<br />done:<br /> c2 = phi (block0: c0, block1: c0, block2: c1)<br /> # yield a0, b0, and c2 as results</pre>Notice that the phi merge function requires an entry for every operation. Furthermore, each variable that is defined must be represented in the done block. At the worse case, each statement could define a new variable leading to O(n^2) space complexity. (where n is the statement count in Octave)<br /><br />However, in practice the number of variable should be low compared to the number of lines. As a test, I automatically generated a 500 line program consisting of scalar addition statements like<br /><pre>i1 = i0 + i0;</pre>It looks like the compile time stays within reasonable bounds.<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-yp9SZuhdGiM/T9ZK8qSLNgI/AAAAAAAAADw/OIjngJgTbBk/s1600/foo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-yp9SZuhdGiM/T9ZK8qSLNgI/AAAAAAAAADw/OIjngJgTbBk/s400/foo.png" width="400" /></a></div>Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com0tag:blogger.com,1999:blog-7978475984487238672.post-21918179311510875312012-06-04T13:35:00.001-07:002012-06-04T13:35:08.436-07:00Progress?Progress last week was a little slower than I had hoped. Originally, I had planned on adding support for break and continue quickly, then moving onto other issues. However, I realized the hard way that my simple inline SSA construction algorithm was not very extensible. I have now implemented a textbook SSA construction algorithm.<br /><br />I also took another look at my modifications to configure.ac (with help from John W. Eaton and Jordi). The configure check was fixed and LLVM_CONFIG can be set to the location of llvm-config. This allows building with LLVM in a nonstandard path.<br /><br />Next, I think I will focus ensuring error and warning conditions are handled correctly (not talking about try/catch, just error and warning). In theory error checking is simple, for each operation that may error I need to check to see if it errored, then if it has exit the JIT function. There are a few annoying details. For example, I need to keep track of and return the values of each variable when an error occurs. Additionally, the ability to change warnings into errors adds further complexity.Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com0tag:blogger.com,1999:blog-7978475984487238672.post-9246969108593055002012-05-28T09:23:00.000-07:002012-05-28T09:23:01.721-07:00Type InferenceI just finished implementing a type inference algorithm based off of FALCON [1]. I like this algorithm for several reasons<br /><ol><li>The efficiency is <i>O(n)</i>: This is important because we should minimize JIT overhead.</li><li>Decouples type inference from code generation: This will make it easier if we decide to switch from LLVM to a GNU equivalent.</li><li>Type bounds are good: The algorithm is able to assign different types to the same variable when the variable is used in different contexts.</li></ol>The algorithm works by introducing a type lattice (also similar to the MaJIC and McVM MATLAB JIT compilers[2][3]). For example, the simple lattice I am currently using<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-k9JAiN9ojAM/T8OCabaexHI/AAAAAAAAADA/oRDF_eWFCJs/s1600/lattice.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="http://1.bp.blogspot.com/-k9JAiN9ojAM/T8OCabaexHI/AAAAAAAAADA/oRDF_eWFCJs/s320/lattice.png" width="320" /></a></div><div>Each variable in the <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA</a> is then assigned a type from our lattice. For example, in the code<br /><pre>b = 1;<br />for i=0:5<br /> temp = 5;<br /> b = b + temp;<br /><br /> temp = [5 5 5];<br /> b = b + temp;<br />endfor<br /></pre>The algorithm is able to differentiate between the two occurrences of <i>temp</i>. The algorithm will also realizes that <i>b</i> is a matrix inside of the loop (once matrices are added to type type lattice). Then before the start of the loop, <i>b</i>, will be converted to a single element matrix. This means we can generate matrix addition operations inside of the loop.<br /><br />[1] L. D. Rose and D. Padua. Techniques for the Translation of MATLAB Programs into Fortran 90. ACM Transactions on Programming Languages and Systems, 21(2):286–323, Mar. 1999. <br />[2]George Almási and David Padua. 2002. MaJIC: compiling MATLAB for speed and responsiveness. SIGPLAN Not. 37, 5 (May 2002), 294-303.<br />[3] <a href="http://www.sable.mcgill.ca/mclab/mcvm/mcvmthesis.pdf" style="color: #830000; text-decoration: none;">M.Sc. Thesis - <i>McVM: an Optimizing Virtual Machine for the MATLAB Programming Language</i></a></div>Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com0tag:blogger.com,1999:blog-7978475984487238672.post-74322830805473763842012-05-22T19:15:00.000-07:002012-05-23T10:22:54.093-07:00Initial WorkHi, I am Max Brister, a student working on JIT compilation in GNU Octave for google summer of code. My project proposal can be found <a href="https://google-melange.appspot.com/gsoc/project/google/gsoc2012/max_brister/23001">on melange</a>. In this blog I plan to explore implementation details, mark my progress, and present intermediary results.<br /><br />I have already implemented a simple proof of concept which shows some promise. The simple script<br /><pre>a = 1;<br />b = 1;</pre><pre>tic;<br />for i=1:10000<br /> for j=1:10000<br /> a = a + b;<br /> endfor<br />endfor<br />toc;</pre>executes on my computer in 0.178s using my jit branch, and in 253.124s using Octave 3.6.1. A speedup of 1422 is not bad.<br /><br />There is still quite a ways to go before the code is ready for users though. I need to take another look at type inference, ensure error cases are handled correctly, and the current implementation requires the inner loop bounds to be constant for type inference.<br /><br />You can view my current progress by checking out my code from<br /><pre>hg clone http://inversethought.com/hg/octave-max</pre>The specific revision this post refers to is cba58541954c.<br /><br />*edit*<br />I realized that I should mention that a speedup of 1422x is really a best cast speedup. Code which is already vectorized will see a much smaller speedup (if any). This is because Octave already efficiently implements vectorization.Max Bristerhttp://www.blogger.com/profile/01065466254954513693noreply@blogger.com8