Linear algebra algorithms can be written in terms of standard matrix-vector operations. These operations can be optimized for particular hardware, so you can increase performance by using an optimized BLAS library. In practice this means that it is a **must** to use an optimized BLAS in your work. A nice introduction to BLAS is in Wikipedia

http://en.wikipedia.org/wiki/BLAS

A quick overview of the available functions is given in the LAPACK Users' Guide

http://www.netlib.org/lapack/lug/node145.html

On Netlib there is the reference BLAS implementation

http://www.netlib.org/blas/

but it is slow: its goal is just to demonstrate the BLAS functions. It can be a quick solution to start with, when you do not have an optimized BLAS library yet or when you are having problems with one. The BLAS interface was originally developed in Fortran, but a C interface (CBLAS) is also available

http://www.netlib.org/blas/blast-forum/cblas.tgz

Usually you do not need this code either, as the C interface is already included in the optimized BLAS libraries.

## Optimized BLAS libraries

I have been working for quite a while with ATLAS 3.6

http://math-atlas.sourceforge.net/

It is free, but you have to compile it yourself. This can be a good exercise to test your software engineering skills.

Currently I am using Intel MKL

http://software.intel.com/en-us/intel-mkl/

It is good, but it is a commercial product and costs money.

I have also once tried AMD ACML

http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx

Unlike Intel MKL, it is free. Well, you need to sign up to download it, and if you want to distribute it with your code, you have to fill in the license agreement.

Another popular optimized BLAS library is Goto BLAS

http://www.tacc.utexas.edu/tacc-projects/gotoblas2/

## See also

- Matrix Multiplication
- Solving System of Linear Equations
