Using decomp and solve from C++

The goal of this section is basically to repeat the rules from the section Using Fortran Subroutines from C++ with the example of the Fortran subroutines decomp and solve from the book Computer Methods for Mathematical Computations.

I will use a simple matrix class described in matrix.h (see also Class Matrix). The class is based on vector<double> and it stores a matrix column-wise as it is accepted in Fortran. The header also contains a simple timing class (define USECLOCK with Microsoft Visual C++). Let us look at the code in main.cc where the code determines the matrix dimension from the command line argument, then it initializes the matrix with random number, preserves the copy of matrices, as they will be destroyed by decomp and solve, then calls decomp and solve and print information about timing and the residual.

  • Fortran usually uses extra underscore, so the names of Fortran subroutines are decomp_ and solve_. Well, Intel Fortran on Windows does not follow this rule, so use nm (Using nm to troubleshoot linking problems) to make sure what names to use in your C++ code.
  • The Fortran function must be declared and we need to use extern C to prevent name mangling.
  • Fortran uses pointers in function calls, so the declaration must reflect this.
  • When one uses the vector<double> class from the standard library, function begin() not necessarily returns a pointer to double. The trick is to employ then &* to get the pointer to double.

To compile the code with gcc

$ g++ -c -s -O3 main.cc
$ gfortran -c -s -O3 decomp.f
$ gfortran -c -s -O3 solve.f
$ g++ main.o decomp.o solve.o -o main.exe

or use the supplied make file

$ make -f fmm_c++.make

After that

$ ./main 500
time to factor A(500,500) is 0.096 s
cond is 22349.3
time for back substitution is 1.16299e-07 s
residual is 5.7989e-15

Please note that timing is much slower as compared with functions lu_factor/lu_solve in SciPy. The main reason is that SciPy calls LAPACK functions which in turn use the optimized BLAS and this is the main point. If you need to achieve a good performance, you cannot afford not using the optimized BLAS. The table below compares results of main.cc and 02lu.py from Linear Solve in Python (NumPy and SciPy) on my notebook (time is in second).

Matrix dimension decomp solve lu_factor lu_solve
500 0.096 0.001 0.019 0.0005
1000 0.69 0.002 0.12 0.002
2000 9.3 0.007 0.82 0.01

Previous:

Overview
Linear Solve in Python (NumPy and SciPy)
Using decomp and solve from Fortran

Next:

Using LAPACK from C++


Comments are closed.