To answer the second part of your question, a Gaussian blur is simply a 2-D Gaussian function (the familiar bell-shaped surface) sampled into a convolution kernel and applied over the image. Wikipedia has a great reference on the algorithm itself, but basically, you take the values of a Gaussian curve, convert them into a square matrix, and convolve that matrix with the image: at every single pixel you compute a weighted sum of the surrounding neighborhood using the kernel weights, e.g.:
Kernel:
[0  1  2  1  0
 1  4  6  4  1
 2  6 10  6  2     <- slide this over every single pixel in the image
 1  4  6  4  1
 0  1  2  1  0]
(Note that this is just a sample kernel; the actual weights come from the Gaussian equation, so depending on your sigma and kernel size you'll get different values.)
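Here's a rough sketch of that brute-force convolution loop, just to make the "weighted sum at every pixel" part concrete. The Image struct and convolve function are made-up names for illustration, the image is assumed to be a row-major grayscale float buffer, and edge handling simply clamps coordinates to the border:

    #include <cstddef>
    #include <vector>

    // Hypothetical grayscale image: width * height floats, row-major.
    struct Image {
        int width = 0, height = 0;
        std::vector<float> pixels;
        float at(int x, int y) const {
            if (x < 0) x = 0; if (x >= width)  x = width - 1;   // clamp to border
            if (y < 0) y = 0; if (y >= height) y = height - 1;
            return pixels[static_cast<std::size_t>(y) * width + x];
        }
    };

    // Brute-force P x P convolution: for each pixel, accumulate the weighted
    // sum of its neighborhood. Assumes the kernel is already normalized so
    // its entries sum to 1, otherwise the output brightness shifts.
    Image convolve(const Image& src, const std::vector<float>& kernel, int P) {
        Image dst = src;
        const int r = P / 2;                       // kernel radius
        for (int y = 0; y < src.height; ++y) {
            for (int x = 0; x < src.width; ++x) {
                float acc = 0.0f;
                for (int ky = 0; ky < P; ++ky)
                    for (int kx = 0; kx < P; ++kx)
                        acc += kernel[ky * P + kx] * src.at(x + kx - r, y + ky - r);
                dst.pixels[static_cast<std::size_t>(y) * src.width + x] = acc;
            }
        }
        return dst;
    }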
To answer the performance part of your question, the overall speed of this algorithm depends on a few things. Let's say the image is NxM pixels and the convolution kernel is PxP pixels: you're going to have to do on the order of P*P*N*M operations. The greater P is, the more work you do for a given image. You can get crafty here, though: a Gaussian kernel is separable, so you can do a 1-D pass over the rows followed by a 1-D pass over the columns, which cuts the cost to roughly 2*P*N*M operations.
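A sketch of that row/column trick, reusing the hypothetical Image struct from the earlier snippet (k1d is the 1-D Gaussian kernel whose outer product would give the 2-D kernel):

    // Separable blur: horizontal 1-D pass, then vertical 1-D pass.
    Image separable_blur(const Image& src, const std::vector<float>& k1d) {
        const int P = static_cast<int>(k1d.size());
        const int r = P / 2;

        Image tmp = src;                           // horizontal pass
        for (int y = 0; y < src.height; ++y)
            for (int x = 0; x < src.width; ++x) {
                float acc = 0.0f;
                for (int i = 0; i < P; ++i)
                    acc += k1d[i] * src.at(x + i - r, y);
                tmp.pixels[static_cast<std::size_t>(y) * src.width + x] = acc;
            }

        Image dst = tmp;                           // vertical pass
        for (int y = 0; y < src.height; ++y)
            for (int x = 0; x < src.width; ++x) {
                float acc = 0.0f;
                for (int i = 0; i < P; ++i)
                    acc += k1d[i] * tmp.at(x, y + i - r);
                dst.pixels[static_cast<std::size_t>(y) * src.width + x] = acc;
            }
        return dst;
    }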
Implementation is also very important. If you want to be extremely efficient, you'll probably want to use the most advanced instructions your architecture offers. If you're on an Intel x86 chip, you'll probably want to look at getting a license for Intel Performance Primitives (IPP) and calling those routines directly. IIRC, OpenCV does make use of IPP when it's available.
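If you just want a fast, ready-made blur rather than rolling your own, OpenCV exposes this directly. A minimal example, assuming OpenCV 3+ headers and placeholder file names:

    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>

    int main() {
        // Read an image, blur it with a 5x5 Gaussian kernel (sigma = 1.5), save it.
        cv::Mat src = cv::imread("input.png", cv::IMREAD_COLOR);
        cv::Mat dst;
        cv::GaussianBlur(src, dst, cv::Size(5, 5), 1.5);
        cv::imwrite("blurred.png", dst);
        return 0;
    }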
You could also do something very smart and work entirely with scaled (fixed-point) integers if the floating-point performance on your architecture is poor. This would probably speed things up a bit, but I would look at the other options first before going down this road.
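For what that looks like, here's a tiny sketch of the scaled-integer idea for one horizontal 1-D pass on 8-bit pixels. The weights are illustrative (not exact Gaussian values) and are pre-scaled so they sum to 256, letting the divide become a shift; border pixels are skipped for brevity:

    #include <cstdint>

    // Fixed-point horizontal blur of one row: integer multiply-accumulate,
    // then a single shift instead of any floating-point math.
    void blur_row_fixed_point(const std::uint8_t* src, std::uint8_t* dst, int width) {
        static const int kWeights[5] = {16, 64, 96, 64, 16};   // sums to 256
        for (int x = 2; x < width - 2; ++x) {
            int acc = 0;
            for (int i = -2; i <= 2; ++i)
                acc += kWeights[i + 2] * src[x + i];
            dst[x] = static_cast<std::uint8_t>(acc >> 8);       // divide by 256
        }
    }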