Tracing GMP/MPFR applications using DTrace

I recently dug out some old half finished work about using DTrace to study numerical algorithms complexity and thought that it would make a nice post. Although the technique of tracing function calls to identify an algorithm complexity class empirically is not new, I've never seen it done using DTrace. In this entry I show how, by using DTrace, to analyze algorithms written using MPFR (or GMP). I also included an example application for a simple well-known algorithm acting as a proof of concept. After all, a blog post about numerical computing without some graphs would be a shame.

When looking at an algorithm performance two measures are of prime interests. The operations counts and execution time. DTrace enables us to measure both of those with a rather high degree of granularity. It is for example possible to measure the number of numerical operations required to perform a task as well as execution time statistics broken down by the type of operations. However, like with everything else the probes effect should be taken into account as well as a some pitfalls discussed later in this post.

DTrace

DTrace is a dynamic tracing framework initially developed for Solaris enabling users to observe most aspects of a running system using tracing probes. These probes are dynamically enabled at run time providing no runtime impact when de-activated making DTrace a very attractive tool to provide deep insight of a production environment. For more information about DTrace see the end of this entry for a selection of external links.

The PID provider

The simplest and least intrusive way of tracing calls in a user application is by using the PID probes. The pid provider allows us to trace an arbitrary function's entry and return points as well as individual instructions within a user process or it's shared libraries. The prime advantage of the pid provider is that it does not require modification or recompilation of the target binaries. As such, an untraced program will behave normally with no performance impact, moreover, untraced sections of the application won't be affected by DTrace.

For example let's look at a simple PID probe tracing the entry point of the mpfr_add() function :

pid123:a.out:mpfr_add:entry

Here it is assumed that the function mpfr_add is located the main binary of a process with PID 123. Let's now take a closer look at the four parts of a PID probe.

Provider : "pid123:a.out:mpfr_add:entry

The first part of the probe description is the provider, in this example pid123. In the general case the provider's name is pid followed directly without any spaces by the PID of the process of interest. In practise, the $target variable is used instead of the numerical PID. At runtime, DTrace is going to replace $target with the PID of the process specified to DTrace with the -p or -c option. Using this trick, the previous example would become :

pid$target:a.out:mpfr_add:entry

Failure to provide a target to DTrace when using the $target method or specifying an inexistent PID will result in an “invalid probe specifier” error message meaning that no matching probe was found.

Module : "pid123:a.out:mpfr_add:entry"

For the pid provider, the module field specifies in which binary object the function to be traced reside. These objects include the process main binary and any of it's loaded shared libraries. In the case of the main binary it can be referenced either by the process name, e.g. “bash” or by the generic name a.out. The later is preferred as it will work even if the name of the target process change. Shared libraries can be specified by their name only and optionally any of the suffixes normally found on a shared library. For example in case of the MPFR shared library all of the following are equivalent.

pid123:libmpfr.so.1.2.0:mpfr_add:entry
pid123:libmpfr.so.1:mpfr_add:entry
pid123:libmpfr.so:mpfr_add:entry
pid123:libmpfr:mpfr_add:entry

Function : "pid123:a.out:mpfr_add:entry"

Intuitively, the function part of the probe definition is the name of the specific function to be traced. It must correspond exactly to the name inside the target binary. Be aware that some functions of MPFR also have a macro version available. These functions, while they will appear in the shared library, will most of the time never be called.

Name : "pid123:a.out:mpfr_add:entry"

The fourth and final part of the probe description is the probe name. For the pid provider three choices exist. The two more common are the entry and return probes. These two probes trace respectively the entry or return point of a function. The probe name can also be replaced by a number, in which case it represents an offset in byte from the start of the function. The last case is useful, for example, if a function is known to return early in some situations and we do not want to include those occurrences in our results.

Example

To demonstrate probe enumeration and basic tracing using the pid provider we will use the following example program ex1.c. This program simply initialize three mpfr_t variables and perform three mpfr_add() operations with a getchar() call to pause the program before the operations of interest. Note that generally mpfr_t variables should be cleared using the mpfr_clear() function when they are no longer needed. In this case, the process terminates soon after, thus clearing the allocated memory is not required.

#include  
#include "mpfr.h" 

int main() 
{ 
  printf( "Program start!\n" ); 

  mpfr_t op1, op2, op3; 

  mpfr_init_set_d( op1, 1.12, GMP_RNDN ); 
  mpfr_init_set_d( op2, 2.83, GMP_RNDN ); 
  mpfr_init_set_d( op3, 4.37, GMP_RNDN ); 

  printf( "Press any key to continue\n" ); 
  getchar(); 

  mpfr_add( op3, op1, op2, GMP_RNDN ); 
  mpfr_add( op1, op2, op3, GMP_RNDN ); 
  mpfr_add( op2, op3, op1, GMP_RNDN ); 

  return 0; 
}

Assuming we have a file named ex1.c containing the previous code it can be compiled and linked manually by invoking the following commands :

$ gcc -c -m32 -I/opt/csw/include ex1.c
$ gcc -o ex1 -m32 -L/opt/csw/lib -R/opt/csw/lib -lmpfr -lgmp ex1.o

The first command will compile the file ex1.c while linking is done in the second step. We must take care to change the /opt/csw/include and /opt/csw/lib directories when not using the Blastwave distribution of MPFR and GMP.

Tracing the calls to mpfr_add() is achieved by the command :

# dtrace -n 'pid$target:libmpfr.so:mpfr_add:entry' -c ./ex1

Running DTrace requires additional privileges so it should be run by an account with the necessary privileges or with pfexec (Similar to sudo for Linux users). The output should look similar to this :

# dtrace -n 'pid$target:libmpfr.so:mpfr_add:entry' -c ./ex1 
dtrace: description 'pid$target:libmpfr.so:mpfr_add:entry' matched 1 probe 
Program start! 
Press any key to continue 

dtrace: pid 1154 has exited 
CPU     ID                    FUNCTION:NAME 
  0  64780                   mpfr_add:entry 
  0  64780                   mpfr_add:entry 
  0  64780                   mpfr_add:entry 

Optionally, it is possible to invoke DTrace on an already running process. For example, if the same program ex1 is currently paused, waiting for user input with PID 1234 the previous DTrace command line can be modified to :

dtrace -n 'pid$target:libmpfr.so:mpfr_add:entry' -p 1234

This is but one method of tracing function calls to MPFR in a user process. Another method not discussed here would to add user made probes to the MPFR code, those probes are under a provider named User-level Statically Defined Tracing (USDT). However, they require non-trivial modifications to the code and are not discussed in this post. It is also possible to do some really funky stuff like defining probes in an interposition library if modification to the library's code is impractical.

D scripts for tracing MPFR applications

The simple command line presented earlier, being a DTrace one-liner is a bit limited in functionality. The full power of DTrace really happens when using custom scripts named D-scripts. A complete discussion of the features of D-scripts is beyond this post and could fill quite a few blog posts. Excellent tutorials exists on the subject, see some recommended ones at the end of this entry. I will show here how to use DTrace to measure operation count and execution time from D-scripts as well as an important issue to note when tracing MPFR applications.

Bellow is a simple script designed to trace calls to the mpfr_mul() function. If run on an MPFR application it will count the number of time mpfr_mul was called and also display a message when this occur.

#!/usr/sbin/dtrace -s 

#pragma D option quiet 

BEGIN 
{ 
  printf( "Ready! Press CTRL-C to abort.\n" ); 
} 

pid$target:libmpfr.so:mpfr_mul:entry 
{ 
  @add[probefunc] = count(); 
  printf( "%s called!\n", probefunc ); 
}

Before we try this script a few things should be noted about it. First as described in the various guides DTrace can be used as a shell script interpreter. This is the first line starting with the well known #!, it is optional if you run DTrace explicitly but does no harm otherwise. Next, I specified the D option quiet pragma that simply tells DTrace not to output anything unless we explicitly use an output action like trace or printf. While not entirely necessary the script also use the BEGIN probe in order to print a message once tracing started. This practise is very useful, otherwise there is no simple way to know tracing has started. Finally, one pid probe is activated in the script, tracing the mpfr_mul function. Upon firing it will count the number of time it fired and emit a message when this happens.

To test this script let's use a slightly modified version of ex1.

#include  
#include "mpfr.h" 

int main() 
{ 
  printf( "Program start!\n" ); 

  mpfr_t op1, op2, op3; 

  mpfr_set_default_prec( 2000 ); 
  mpfr_init_set_d( op1, 10.124, GMP_RNDN ); 
  mpfr_init_set_d( op2, 2.836, GMP_RNDN ); 
  mpfr_init_set_d( op3, 4.373, GMP_RNDN ); 

  printf( "Press any key to continue\n" ); 
  getchar(); 

  mpfr_mul( op3, op1, op2, GMP_RNDN ); 
  mpfr_mul( op1, op2, op3, GMP_RNDN ); 
  mpfr_mul( op2, op3, op1, GMP_RNDN ); 

  return 0; 
}

In ex2.c, MPFR additions has been replaced by multiplications and the precision got increased to 2000 bits. This new example can be compiled in exactly the same way as the previous one, also assuming we have our script in a file named ex2.d we ca try it out. First the ex2 program should be run preferably in another shell waiting for input, then DTrace can be invoked as follow :

# dtrace -s ex3.d -p `pgrep -n ex3`

The results will look like :

# ./ex3.d -p `pgrep -n ex3` 
Ready! Press CTRL-C to abort. 
mpfr_mul called! 
mpfr_mul called! 
mpfr_mul called! 
mpfr_mul called!
mpfr_mul called! 
mpfr_mul called! 

  mpfr_mul                                                          6

While the probe fired correctly, one quickly see that it fired six times instead of three. Also confirmed by the aggregation result of 6 at the last line. Checking the MPFR source code for mpfr_mul reveals that under certain conditions mpfr_mul can be called recursively. This situation is common when tracing function calls and can happen also when another function we are not interested in tracing calls some of the functions that are instrumented.

There is many different way to isolate the calls we are interested into. One of the solution is to implement a lock variable that is set at the entry point of mpfr_mul, preventing any further tracing until the function return. For this to guard against other function, every MPFR functions used in the application should be instrumented, or at least those who are known to call other MPFR functions. This is shown in the script ex2a.d also including mpfr_sub() to the traced function as example :

#!/usr/sbin/dtrace -s 

#pragma D option quiet 

BEGIN 
{ 
  printf( "Ready! Press CTRL-C to abort.\n" ); 
  self->lock = 0; 
} 

pid$target:libmpfr.so:mpfr_mul:entry, 
pid$target:libmpfr.so:mpfr_sub:entry 
/ self->lock == 0 / 
{ 
  @add[probefunc] = count(); 
  printf( "%s called!\n", probefunc ); 
} 

pid$target:libmpfr.so:mpfr_mul:entry, 
pid$target:libmpfr.so:mpfr_sub:entry 
{ 
  self->lock++; 
} 

pid$target:libmpfr.so:mpfr_mul:return, 
pid$target:libmpfr.so:mpfr_sub:return 
{ 
  self->lock--; 
} 

The predicate / self->lock == 0 / instruct DTrace to skip tracing when lock is more than 0. The use of self-> in front of the variable enables the script to work in a multi-threaded program and will use a unique instance of lock for each thread. It is also possible to instrument every functions of MPFR at once by using the following probe :

pid$target:libmpfr.so::entry

Which will match every functions in the libmpfr.so library and would probably take a bit longer to initiate tracing as well as slow down the target application. Either way, running the previous script would count the number of calls correctly.

Test case : Matrix multiplication

Now let's try it out on a very simple algorithm like matrix multiplication. I selected this algorithm for it's simplicity and constant operation count for a given input size. That way we know what the result should be. It is well known that the “classic” multiplication of two square matrices of size n requires n3 additions and n3 multiplications giving this algorithm an asymptotic complexity of O(n3). Obviously we can expect a cubic growth in operation count and total runtime. To verify the use of DTrace to analyze such a simple algorithm I used this tracing script.

#!/usr/sbin/dtrace -s 

#pragma D option quiet 

BEGIN 
{ 
  printf( "Ready! Press CTRL-C to abort.\n" ); 
  self->lock = 0; 
} 

pid$target:libmpfr.so:mpfr_mul:entry, 
pid$target:libmpfr.so:mpfr_add:entry 
/ self->lock == 0 / 
{ 
  @add[probefunc] = count(); 
  self->in = timestamp; 
} 

pid$target:libmpfr.so:mpfr_mul:entry 
{ 
  self->lock++; 
} 

pid$target:libmpfr.so:mpfr_mul:return 
{ 
  self->lock--; 
} 

pid$target:libmpfr.so:mpfr_mul:return, 
pid$target:libmpfr.so:mpfr_add:return 
/ self->lock == 0 / 
{ 
  @sum[probefunc] = sum(timestamp - self->in); 
} 

END 
{ 
  printf( "\n\nCount by operation type :" ); 
  printa(@add); 
  printf( "\nTotal time by operation type :" ); 
  printa(@sum); 
}

It is very simple on purpose to minimize the tracing impact on the running time. MPFR and GMP operations, while taking much longer than equivalent native operations are still impressively fast. As such, DTrace probes make a relatively big impact on total run time when using low precision. This must be taken into account when looking at results from DTrace analysis.

For this test I used code based on a toolbox currently in development for Scilab started as a GSOC 2009 project.

Now to the results since this entry is getting way too long. Plotting operations count for additions or multiplications gives the following result.


Operations count versus matrix size.
Operations count versus matrix size. (Click for full resolution.)

Unsurprisingly showing a cubic growth in the number of operations. Running time when using 100000 bits of precision for the elements shows more interesting results.


Runtime versus matrix size.
Runtime count versus matrix size. (Click for full resolution.)

Time taken doing additions and multiplications is shown as measured by DTrace. The yellow section is the dry run time without probe effect while the green portion is effectively the DTrace overhead being roughly a 25% increase in this case. This sadly makes DTrace unsuitable for operations profiling in many cases unless using large precision arithmetic (>100000). It can however be used for other measurements like operations count or to discover scaling and/or parallelism issues in advanced algorithms.

Well that's it, for now.

External links

GNU GMP Library : www.gmp.org
GNU MPFR Library : www.mpfr.org
Sun BigAdmin DTrace : GNU GMP Library : http://www.sun.com/bigadmin/content/dtrace/
Solaris Internals DTrace Topics : www.solarisinternals.com/wiki/index.php/DTrace_Topics
OpenSolaris DTrace Community : opensolaris.org/os/community/dtrace/

References and recommended readings

GNU MPFR User Manual : www.mpfr.org/mpfr-current/mpfr.html
Solaris Dynamic Tracing Guide : docs.sun.com/app/docs/doc/817-6223/6mlkidlf1?a=view

Comments

Post new comment

The content of this field is kept private and will not be shown publicly.