The goal of this section is to apply the rules from the section Using Fortran Subroutines from C++ to the example of the Fortran subroutines decomp and solve from the book Computer Methods for Mathematical Computations.

I will use a simple matrix class described in matrix.h (see also Class Matrix). The class is based on `vector<double>` and it stores a matrix column-wise, as Fortran expects. The header also contains a simple timing class (define USECLOCK with Microsoft Visual C++).

Let us look at the code in main.cc. It determines the matrix dimension from a command-line argument, initializes the matrix with random numbers, preserves copies of the matrices (they will be destroyed by decomp and solve), then calls decomp and solve and prints the timing information and the residual.
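A minimal sketch of such a column-major matrix class might look as follows (the names here are hypothetical; the actual class is in matrix.h):

```cpp
#include <cstddef>
#include <vector>

// Sketch of a column-major matrix backed by vector<double>. Element (i, j)
// of an m-by-n matrix lives at index i + j*m, which is exactly Fortran's
// storage order, so the raw buffer can be handed to a Fortran subroutine
// unchanged.
class Matrix {
  std::vector<double> data_;
  int m_; // number of rows
public:
  Matrix(int m, int n) : data_(static_cast<std::size_t>(m) * n), m_(m) {}
  double& operator()(int i, int j) {
    return data_[i + static_cast<std::size_t>(j) * m_];
  }
  double* data() { return &data_[0]; } // raw pointer to pass to Fortran
};
```

The only design point that matters for interoperability is the `i + j*m` indexing; everything else is a convenience.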

- Fortran compilers usually append an underscore, so the names of the Fortran subroutines are `decomp_` and `solve_`. Well, Intel Fortran on Windows does not follow this rule, so use `nm` (Using nm to troubleshoot linking problems) to make sure what names to use in your C++ code.
- The Fortran subroutines must be declared, and we need to use `extern "C"` to prevent name mangling.
- Fortran passes arguments by reference, so the declarations must use pointers.
- When one uses the `vector<double>` class from the standard library, `begin()` does not necessarily return a pointer to double. The trick is then to employ `&*` to get the pointer to double.
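Putting these rules together, the declarations in main.cc can look like the sketch below. The argument lists are an assumption based on the FMM interfaces, so verify them against decomp.f and solve.f; the actual calls are shown as comments because the Fortran object files are needed to link them.

```cpp
#include <vector>

// extern "C" turns off C++ name mangling; the trailing underscore is the
// name gfortran emits (check with nm if in doubt). Every argument is a
// pointer because Fortran passes arguments by reference.
extern "C" {
  // Assumed FMM interfaces -- verify against decomp.f and solve.f:
  void decomp_(const int* ndim, const int* n, double* a,
               double* cond, int* ipvt, double* work);
  void solve_(const int* ndim, const int* n, const double* a,
              double* b, const int* ipvt);
}

// begin() need not return a raw double*, but dereferencing the iterator
// and taking the address always yields one.
inline double* ptr(std::vector<double>& v) { return &*v.begin(); }

// With decomp.o and solve.o linked in, the calls read:
//   decomp_(&n, &n, ptr(a), &cond, &*ipvt.begin(), ptr(work));
//   solve_(&n, &n, ptr(a), ptr(b), &*ipvt.begin());
```

Since C++11 one can write `v.data()` instead of `&*v.begin()`, but the `&*` trick works with older compilers as well.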

To compile the code with gcc:

`$ g++ -c -s -O3 main.cc`

`$ gfortran -c -s -O3 decomp.f`

`$ gfortran -c -s -O3 solve.f`

`$ g++ main.o decomp.o solve.o -o main.exe`

or use the supplied makefile

`$ make -f fmm_c++.make`

After that

`$ ./main 500`

`time to factor A(500,500) is 0.096 s`

`cond is 22349.3`

`time for back substitution is 1.16299e-07 s`

`residual is 5.7989e-15`

Please note that the timings are much slower than those of `lu_factor`/`lu_solve` in SciPy. The main reason is that SciPy calls LAPACK functions, which in turn use an optimized BLAS, and this is the main point: if you need to achieve good performance, you cannot afford not to use an optimized BLAS. The table below compares results of main.cc and 02lu.py from Linear Solve in Python (NumPy and SciPy) on my notebook (time in seconds).

| Matrix dimension | decomp | solve | lu_factor | lu_solve |
| --- | --- | --- | --- | --- |
| 500 | 0.096 | 0.001 | 0.019 | 0.0005 |
| 1000 | 0.69 | 0.002 | 0.12 | 0.002 |
| 2000 | 9.3 | 0.007 | 0.82 | 0.01 |
