2012/02/28

How to get the type of an object in Python

Sometimes we need to know the type of an object in Python. For instance, when you get an object from a method and you no longer know whether it's a list, a dict, or something else, how can you find out the type of the object? To solve this, you can use either type(object) or object.__class__, as demonstrated below.
d1={1:'airplane', 2:'car'}
d2= ['airplane', 'car']
print type(d1)
print d1.__class__
print type(d2)
print d2.__class__
Results in:

<type 'dict'>
<type 'dict'>
<type 'list'>
<type 'list'>
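If you need to act on the type instead of just printing it, a common alternative (not shown above) is isinstance. A minimal sketch, reusing d1 and d2 from the example:
# Branch on the type with isinstance (d1 and d2 are the objects defined above)
for obj in (d1, d2):
    if isinstance(obj, dict):
        print "got a dict with", len(obj), "entries"
    elif isinstance(obj, list):
        print "got a list with", len(obj), "items"
Results in:

got a dict with 2 entries
got a list with 2 items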

2012/02/23

Linear Algebra in Python with numpy - getting ready to Machine Learning implementation

In this post I will present how Linear Algebra can be handled in Python in order to implement Machine Learning algorithms. This will be important not only to represent the 'X' matrix and 'y' vector as the input and output, respectively, of the algorithms, but also to perform the vectorized implementation of the algorithms, which turns out to perform much better than the iterative alternatives.
For this we shall use the numpy Python library, particularly its version 1.6.1.

For the representation of both matrices and vectors we shall use the numpy array object, which allows us to perform all the needed operations. What follows is a set of examples that covers all (I hope) the operations we may need to perform when implementing Machine Learning algorithms in Python.

So firstly we must import the library:
>>> from numpy import *

By importing the library this way there is no need to use a prefix such as np when referring to a numpy object like array.
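For comparison, the more usual namespaced import (not used in this post) would look like this, with every numpy object accessed through the np prefix:
>>> import numpy as np
>>> print np.array([1, 2, 3])
[1 2 3]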

So then we create some matrices and a vector and print them to the screen:
>>> # Matrix M1 3x3 (3 rows, 3 columns)
>>> M1= array([[1,0,5],[2,5,3],[6,2,8]])
>>> print M1
[[1 0 5]
 [2 5 3]
 [6 2 8]]

>>> # Matrix M2 3x3 (3 rows, 3 columns)
>>> M2= array([[4,0,2],[2,5,8],[2,4,1]])
>>> print M2
[[4 0 2]
 [2 5 8]
 [2 4 1]]

>>> # Matrix M3 2x3 (2 rows, 3 columns)
>>> M3= array([[1,3,2],[4,0,1]])
>>> print M3
[[1 3 2]
 [4 0 1]]

>>> # Matrix M4 3x2 (3 rows, 2 columns)
>>> M4= array([[1,3],[0,1],[5,2]])
>>> print M4
[[1 3]
 [0 1]
 [5 2]]

>>> # Vector 1x3 (3-dimensional vector)
>>> v= array([[2,5,4]])
>>> print v
[[2 5 4]]

>>> # Transpose of the vector to convert it from a horizontal to a vertical vector
>>> v= v.transpose()
>>> print v
[[2]
 [5]
 [4]]

Notice that above we defined the vector as a matrix with 1 row (getting a horizontal vector). This is necessary because the transpose only works well on a matrix (a 2-dimensional array). If we defined v as we define u below (notice that below we have only one '[' instead of two '[['), we would not be able to apply the transpose and get a vertical vector:
>>> u= array([2,5,4])
>>> print u
[2 5 4]

>>> print u.transpose()
[2 5 4]
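If you do end up with a 1-dimensional array like u, one possible way (among others) to turn it into a vertical vector is to reshape it into a matrix with 1 column:
>>> print u.reshape((3,1))
[[2]
 [5]
 [4]]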
And how can we create a matrix or a vector initializing all elements to 0?
>>> # Matrix (3 rows, 4 columns) of float 0's
>>> print zeros((3,4))
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
>>> # Vector (6 rows, 1 column) of float 0's
>>> print zeros((6,1))
[[ 0.]
 [ 0.]
 [ 0.]
 [ 0.]
 [ 0.]
 [ 0.]]
And to 1?
>>> # Matrix (4 rows, 3 columns) of float 1's
>>> print ones((4,3))
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
>>> # Vector (1 row, 6 columns) of float 1's
>>> print ones((1,6))
[[ 1.  1.  1.  1.  1.  1.]]
Now that we have some matrices and a vector let's perform some operations on them:
>>> # Size of the matrix
>>> print "size:", M1.shape, "nr. of rows:", M1.shape[0], "nr. of columns:", M1.shape[1]
size: (3, 3) nr. of rows: 3 nr. of columns: 3

>>> # Access cells of a matrix 
>>> print "element in 1st row and 1st column of M1:", M1[0,0]
element in 1st row and 1st column of M1: 1
Now let's add two matrices, which results in a new matrix where each element is the sum of the corresponding elements of the original matrices. Both matrices must have the same size (rows and columns):
>>> # Matrix addition
>>> print M1+M2
[[ 5  0  7]
 [ 4 10 11]
 [ 8  6  9]]
And now let's multiply a scalar and a matrix. The result is a matrix with the same dimensions (rows and columns) where each cell of the original matrix is multiplied by the scalar:
>>> # Scalar multiplication
>>> print 3*M1
[[ 3  0 15]
 [ 6 15  9]
 [18  6 24]]
Next follows the multiplication of a matrix by a vector. If we multiply A (nrOfRows-A, nrOfCols-A) by B (nrOfRows-B, nrOfCols-B), without worrying about which one is a vector and which one is a matrix, the multiplication is only possible when nrOfCols-A == nrOfRows-B. The result of the multiplication is a matrix with size (nrOfRows-A, nrOfCols-B).
>>> # Multiplication of a matrix with a vector
>>> print dot(M1,v)
[[22]
 [41]
 [54]]
To explain how the multiplication above works, see below:
$$\begin{bmatrix}
1&0&5\\
2&5&3\\
6&2&8
\end{bmatrix}*\begin{bmatrix}
2\\
5\\
4
\end{bmatrix}=\begin{bmatrix}
1*2+0*5+5*4\\
2*2+5*5+3*4\\
6*2+2*5+8*4
\end{bmatrix}=\begin{bmatrix}
22\\
41\\
54
\end{bmatrix}$$
It is also possible to multiply two vectors. What we get as a result depends on the dimensions of the multiplied vectors. If we multiply a 1 by 3 vector by a 3 by 1 vector we get a 1 by 1 matrix (which we can easily convert to a scalar), obtained by multiplying corresponding entries and then summing those products. If we multiply a 3 by 1 vector by a 1 by 3 vector we get a 3 by 3 matrix, obtained by multiplying each cell in the first vector by each cell in the corresponding row of the second vector. Below we show two examples by multiplying a vector by its transpose.
>>> # Vector multiplication
>>> print dot(v.transpose(), v)
[[45]]
>>> print dot(v.transpose(), v)[0][0]
45
>>> print dot(v, v.transpose())
[[ 4 10  8]
 [10 25 20]
 [ 8 20 16]]
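As a side note, if you multiply two 1-dimensional arrays like the u defined earlier, dot returns a plain scalar directly instead of a 1 by 1 matrix:
>>> print dot(u, u)
45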
And now we will multiply two matrices. This is the general case of the cases above, so to multiply two matrices we need the number of columns of the first matrix to be equal to the number of rows of the second matrix. And, as above, the result is a matrix with size (nrOfRows-A, nrOfCols-B).
>>> # Matrix multiplication
>>> print dot(M3, M4)
[[11 10]
 [ 9 14]]
Creating an identity matrix is also pretty straightforward:
>>> # Identity matrix of 3*3
>>> print eye(3)
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
As can be seen below, any matrix multiplied by the identity results in that same matrix. For the multiplication to work on both sides with the same 3x3 identity, M1 must be a square matrix (same number of rows and columns). If you remember from above, M1 is a 3x3 square matrix:
>>> print "Checking identity: M1xI"
Checking identity: M1xI
>>> print dot(M1, eye(3))
[[ 1.  0.  5.]
 [ 2.  5.  3.]
 [ 6.  2.  8.]]
>>> print "Checking identity: IxM1"
Checking identity: IxM1
>>> print dot(eye(3), M1)
[[ 1.  0.  5.]
 [ 2.  5.  3.]
 [ 6.  2.  8.]]
Now let's see how to get the inverse of a matrix. By definition matrix*(inverse(matrix)) == (inverse(matrix))*matrix == identity. Also be aware that not all matrices have an inverse: a matrix whose determinant is 0 (a singular matrix) has no inverse, and matrices whose determinant is very close to 0 may give numerically poor results.
>>> # Inverse of M1
>>> print linalg.inv(M1)
[[-0.35416667 -0.10416667  0.26041667]
 [-0.02083333  0.22916667 -0.07291667]
 [ 0.27083333  0.02083333 -0.05208333]]
>>> # Inverse of M1 multiplied by M1 - which results in the identity matrix
>>> print dot(linalg.inv(M1), M1)
[[  1.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   1.00000000e+00  -1.11022302e-16]
 [  5.55111512e-17   1.38777878e-17   1.00000000e+00]]
>>> # M1 multiplied by inverse of M1 - which results in the identity matrix
>>> print dot(M1, linalg.inv(M1))
[[  1.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   1.00000000e+00  -8.32667268e-17]
 [  0.00000000e+00   2.77555756e-17   1.00000000e+00]]
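As an extra, not strictly needed for the rest of this post: when the inverse is only wanted to solve a linear system M1*x = v, linalg.solve is usually preferred over computing the inverse explicitly:
>>> # Solve M1*x = v (v is the vertical vector defined above)
>>> print linalg.solve(M1, v)
[[-0.1875]
 [ 0.8125]
 [ 0.4375]]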
We have already seen it above for the case of a vector but, in any case, the following shows how you can get the transpose of a matrix.
>>> # Matrix transpose
>>> print M1.transpose()
[[1 2 6]
 [0 5 2]
 [5 3 8]]
>>> print M3.transpose()
[[1 4]
 [3 0]
 [2 1]]
And how can we get the maximum value present in a given matrix or vector, and how can we do this per row or per column of a matrix?
>>> # Maximum value of all cells of a matrix
>>> print M1.max()
8
>>> print M2.max()
8
>>> print M3.max()
4
>>> print M4.max()
5
>>> # Maximum value of all cells of a vector
>>> print v.max()
5
>>> # Maximum value of each column of a matrix
>>> print M1.max(0)
[6 5 8]
>>> # Maximum value of each row of a matrix
>>> print M1.max(1)
[5 5 8]
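Closely related, and often useful in Machine Learning (for instance to pick the most likely class), is argmax, which returns the index of the maximum instead of the maximum itself:
>>> # Index of the maximum value of each row of a matrix
>>> print M1.argmax(1)
[2 1 2]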
Another task important for Machine Learning is being able to concatenate matrices and vectors. Below you can find some examples.
>>> # CONCATENATE rows of M1 with rows of M2 
>>> print concatenate((M1,M2))
[[1 0 5]
 [2 5 3]
 [6 2 8]
 [4 0 2]
 [2 5 8]
 [2 4 1]]
>>> # CONCATENATE columns of M1 with columns of M2
>>> print concatenate((M1,M2),1)
[[1 0 5 4 0 2]
 [2 5 3 2 5 8]
 [6 2 8 2 4 1]]
>>> # CONCATENATE a vector with a matrix
>>> print concatenate((v,M1),1)
[[2 1 0 5]
 [5 2 5 3]
 [4 6 2 8]]
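To make the link with the Machine Learning posts explicit, here is a small sketch of how concatenation can be used to prepend a column of ones (the x0 = 1 term of the linear regression post) to a matrix of training examples; Xdata and X are just illustrative names, not something defined earlier:
>>> # Hypothetical matrix with 3 training examples and 2 features each
>>> Xdata = array([[2.0, 3.0], [1.0, 5.0], [4.0, 0.0]])
>>> X = concatenate((ones((3,1)), Xdata), 1)
>>> print X
[[ 1.  2.  3.]
 [ 1.  1.  5.]
 [ 1.  4.  0.]]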

2012/02/21

Machine learning - linear regression - part I - theory overview


Imagine that you have a data set composed of x and y values, where y is conditioned by x. A simple, and not so interesting, example could be a data set where y = 2 * x:

(x,y)
(1,2)
(2,4)
(3,6)
(4,8)

Now the challenge is: if you provide the computer only the data set of (x,y) pair values, but not the formula that these values follow, how can you teach the computer to automatically learn the relation between 'x' and 'y', so that when an 'x' value is provided the computer is able to automatically predict the corresponding 'y' value? This can be achieved by supervised Machine Learning algorithms like linear regression, logistic regression, neural networks and others. In a supervised learning problem we provide the algorithm with a training data set containing several examples, and for each example we say what the value of 'y' is for a given 'x'.

In this post I'll present the linear regression algorithm, which is a supervised machine learning algorithm. The term regression is used because the “answer” for each example is not discrete but a real-valued output.

As stated before, the supervised learning algorithms that we will refer to will be: linear regression, logistic regression and neural networks. For all of them there are some common terms and concepts, which are briefly presented below:

Data set
Set of (x,y) tuples, where each tuple is called an example. E.g.:

(1,2)
(2,4)
(3,6)
(4,8)

Training data set:
Data set which is used for training purposes by a supervised machine learning algorithm.


Input "variables" / features
These are the input values of each example in the data set. It can be a single value 'x' or several 'x' values. In the latter case they are named x1, x2, …, xn. The input "variables" are also called features.

Output "variable" / Target "variable"
This is the output value of each example in the data set. It can also be called the target "variable".


m= number of training examples (in the example above m = 4)
n= number of features (in the example above n = 1 because we have only 1 x variable per example)

(x,y)= one specific example

(x(i),y(i))= training example in row i - the ith row of the training set
The following block diagram explains how the supervised learning algorithms work:





We provide a training set to the learning algorithm, which will then compute a hypothesis function ‘h’ that, given a new feature vector (x1, x2, …, xn), will estimate the 'y' value. So the hypothesis ‘h’ is a function that maps between x’s and y’s.



Hypothesis for linear regression with multiple variables, also called linear regression with multiple features, or multivariate linear regression.


$$h_{\theta}(x)= \theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+...+\theta_{n}x_{n}$$
$$\theta_{i's}= parameters$$


For convenience of notation we define $$x_{0}=1$$ for all training examples $$(x_{0}^{(i)}= 1)$$

We can then have, for each training example, the variables/features and the parameters represented by vectors: one vector containing the variables/features and another vector containing the parameters:

$$x=\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2}\\ ...\\ x_{n}\\ \end{bmatrix} \in \mathbb{R}^{n+1} \qquad \theta=\begin{bmatrix} \theta_{0}\\ \theta_{1}\\ \theta_{2}\\ ...\\ \theta_{n}\\ \end{bmatrix} \in \mathbb{R}^{n+1}$$

We can re-write the hypothesis function as:
$$(1)\quad h_{\theta}(x)=\theta_{0}x_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+...+\theta_{n}x_{n}=\sum_{j=0}^{n}\theta_{j}x_{j}$$  (as x0 = 1, this equation is equivalent to the one above). Another way to write this equation is through linear algebra, with vector multiplication: $$h_{\theta}(x)= \theta^{T}x$$ which is equivalent to:
$$(2)\quad h_{\theta}(x)=\begin{bmatrix} \theta_{0}&\theta_{1}&\theta_{2}&...&\theta_{n} \end{bmatrix} \begin{bmatrix} x_{0}\\ x_{1}\\ x_{2}\\ ...\\ x_{n}\\ \end{bmatrix}$$
If you don't know why equations (1) and (2) represent exactly the same thing, you probably need a refresher on Linear Algebra, for which I recommend visiting the Khan Academy and searching for the Linear Algebra video tutorials.
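As a small preview of the Python implementation (using numpy as in the Linear Algebra post above; theta and x here are just illustrative values), the hypothesis for one training example can be sketched as:
from numpy import *

theta = array([[1.0], [2.0], [3.0]])   # illustrative parameters theta0, theta1, theta2
x = array([[1.0], [5.0], [2.0]])       # x0 = 1 by convention, then x1, x2
h = dot(theta.transpose(), x)[0][0]    # h(x) = theta' * x
print h                                # prints 17.0 = 1*1 + 2*5 + 3*2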



So the learning algorithm's goal is to learn the values of the parameters θ that make the hypothesis function ‘h’ best fit the training data, that is, the learning algorithm will choose the values of θ0, θ1, θ2, ..., θn so that hθ(x) is close to 'y' for our training examples (x,y).

So what we want is to minimise the difference between the estimated output hθ(x) and the real output ‘y’, for all examples in the training set, by varying the parameters θ0, θ1, θ2, ..., θn. So we want to minimise the error or, as usually done for regression problems, we want to minimise the squared error:

$$\text{Cost function: } J(\theta)= \frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^{2}$$ $$\text{Objective: } \underset{\theta}{\text{minimise }} J(\theta)$$

So what the expression above means is: find the values of θ0, θ1, θ2, ..., θn such that the squared error cost function is minimised. The multiplication term 1/2m is used only for mathematical convenience.
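A minimal vectorized sketch of this cost function with numpy (the X matrix already contains the x0 = 1 column, and the data is the y = 2x example from the beginning of the post):
from numpy import *

X = array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])   # x0 = 1, then x1
y = array([[2.0], [4.0], [6.0], [8.0]])

def cost(X, y, theta):
    m = X.shape[0]
    errors = dot(X, theta) - y
    return (1.0 / (2 * m)) * dot(errors.transpose(), errors)[0][0]

print cost(X, y, array([[0.0], [0.0]]))   # 15.0
print cost(X, y, array([[0.0], [2.0]]))   # 0.0 - a perfect fit for y = 2x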
In order to minimise the cost function we shall use an algorithm called gradient descent, which is able to minimise arbitrary functions.
Gradient descent
  1. Start with some θ0, θ1, θ2, ..., θn (vector θ) - a common choice is starting with θ0=0, θ1=0, θ2=0, ..., θn= 0
  2. Change θ0, θ1, θ2, ..., θn until we minimise the function J
Gradient descent algorithm

repeat until convergence {
$$\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta) \qquad (\text{for } j=0, 1, 2, ..., n)$$
}


The α is a positive number called the learning rate and it controls how big the steps are when changing θ0, θ1, θ2, ..., θn. If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
In order to implement this algorithm it is necessary to update θ0, θ1, θ2, ..., θn simultaneously. First the updated values for all j (for j=0, j=1, …, j=n) are calculated:
$$temp0= \theta_{0}-\alpha\frac{\partial}{\partial\theta_{0}}J(\theta)$$
$$temp1= \theta_{1}-\alpha\frac{\partial}{\partial\theta_{1}}J(\theta)$$
$$temp2= \theta_{2}-\alpha\frac{\partial}{\partial\theta_{2}}J(\theta)$$
$$...$$
$$tempn= \theta_{n}-\alpha\frac{\partial}{\partial\theta_{n}}J(\theta)$$
and only after all the values are calculated can you update the values of θj:


θ0= temp0
θ1= temp1
θ2= temp2
...
θn= tempn

If after calculating temp0 we immediately updated θ0, then when calculating temp1 we would be using the updated value of θ0 in the derivative of the cost function (remember that the cost function includes all values of θ). That would be incorrect.


Gradient descent finds a local optimum (minimum) and stays there.

Gradient descent algorithm with the derivative terms solved

repeat until convergence {
$$\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta)=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)} \qquad (\text{for } j=0, 1, 2, ..., n)$$
}
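A possible numpy sketch of this update rule, vectorized over all j at once (so the simultaneous update comes for free); X is assumed to be the m x (n+1) matrix with the x0 = 1 column and y the m x 1 output vector, as in the cost function sketch above:
from numpy import *

def gradient_descent(X, y, theta, alpha, num_iters):
    m = X.shape[0]
    for i in range(num_iters):
        # the whole theta vector is recomputed from the old theta,
        # so all theta_j are updated simultaneously
        gradient = (1.0 / m) * dot(X.transpose(), dot(X, theta) - y)
        theta = theta - alpha * gradient
    return theta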


The cost function for linear regression will always be a “bowl” shaped function or, better put, a convex function. This type of function does not have local optima, only a global optimum, which implies that gradient descent applied to the linear regression cost function is guaranteed to reach the global minimum.
Feature scaling

The idea here is to make all the features (x1, x2, x3, ...) be on a similar scale, that is, all features being in a similar range of values. The consequence, as can be proved mathematically, is that gradient descent will converge quicker. The goal is to have: $$-1\leq x_{i}\leq 1$$

Feature scaling with mean normalization

When doing feature scaling, mean normalization is sometimes also performed. Here the strategy is to make the features have approximately zero mean by replacing xi with xi - μi (this should not be applied to x0):
$$x_{i}=\frac{x_{i}-\mu_{i}}{S_{i}}$$, where:
  • μi= average value of  xi in the training set
  • Si= range of values of xi (meaning the maximum value minus the minimum value); the standard deviation of xi can also be used.
This strategy will result in approximately: $$-0.5\leq x_{i}\leq 0.5$$
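A small sketch of feature scaling with mean normalization in numpy, applied column by column (here Si is taken as the standard deviation; the max - min range would also work):
from numpy import *

def feature_normalize(X):
    mu = X.mean(0)       # mean of each feature (column)
    sigma = X.std(0)     # standard deviation of each feature (column)
    return (X - mu) / sigma, mu, sigma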
By applying feature scaling we can make gradient descent run much faster and converge in a lot fewer iterations.

When should the gradient descent algorithm be stopped?

Manually: a plot should be made with the number of iterations of the gradient descent algorithm on the x-axis and the cost on the y-axis:

The cost must always decrease after each iteration.


Automatically: it is possible to have an automatic convergence test - declare convergence if the cost J(θ) decreases by less than a small value ε (e.g.: 10^-3) in one iteration. As it may be difficult to choose the value of ε, it is often better to plot the cost function as explained above.
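A minimal sketch of such an automatic test, assuming we keep the value of the cost after each iteration in a list (J_history is just an illustrative name):
def has_converged(J_history, epsilon=1e-3):
    # declare convergence when the last decrease of the cost is below epsilon
    return len(J_history) >= 2 and (J_history[-2] - J_history[-1]) < epsilon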


Learning rate - α


The cost function J(θ) should decrease after each iteration of gradient descent; if it doesn't, two things may be happening:

  • there is a bug in the code - correct it
  • the learning rate α is too high - decrease α

For an α which is small enough, gradient descent should always decrease the cost, but if α is too small, gradient descent can be slow to converge.
To choose α we can try consecutively different values of α until we are satisfied with the result of the cost function: 0.001, 0.003, 0.01, 0.03, 0.1, 0.3 (each value roughly 3 times the previous one).

Machine learning - python implementations

In 1959 Arthur Samuel defined Machine Learning as "the field of study that gives computers the ability to learn without being explicitly programmed". This post will be the first of a series of posts covering different machine learning algorithms. For each algorithm I'll first present a theoretical overview and then a practical implementation in Python, with some examples to demonstrate its usage. The ultimate goal will be to implement a Pong game in Python and then use Machine Learning techniques to teach each paddle of the Pong game how to play. The programming will be done in the Windows environment with Python 2.6, numpy 1.6.1, pygame 1.9.1 and, finally, matplotlib 1.1.0. This work was inspired by the excellent 2011 Stanford online course on Machine Learning given by Professor Andrew Ng. You can find his course available online on the OpenClassroom.

2012/02/20

The Python paradox revisited

Seven years later, the Python Paradox by Paul Graham is still a very good read:


    "if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job."

Python isn't an esoteric language anymore, but I think people today still use Python not because they think it will help them get a job, but because the language gets the job done in a very elegant and quick way... and the code ends up so good looking and readable...


2012/02/16

The Zen of Python

In the Python shell try the following:

    import this
    

You will be surprised by the result of the command above... the Zen of Python:

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!
    

which is a set of guiding principles for Python's design, by Tim Peters, divided into 20 aphorisms, only 19 of which have been written down. I've posted this not only because it's an interesting curiosity but because I feel that these are almost always good principles to follow when programming/coding/hacking... let's keep them present in our lives...

But there is also a mystery here... what is the 20th, unwritten, aphorism? I've seen many jokes around this matter on the Internet but no credible answer... can someone enlighten me on this?

2012/02/05

MathJax and SyntaxHighlighter

In my first post on the ProgrammingAI blog I would like to mention two JavaScript engines that I have embedded in the blog.

MathJax - mathematical formulas in Latex and MathML

MathJax is an open source JavaScript display engine for mathematics that can be integrated into different blogs, like Google Blogger, to permit the display of mathematical formulas written in Latex or MathML. I've chosen this engine because it not only displays the equations with high quality, but also allows copy-paste from the blog to external applications like Word and Latex documents. Additionally it provides accessibility features like zoom.

For the installation of the engine I followed these steps:
1. In Blogger I pressed the 'Design' link
2. Then, in the links presented in the left frame of the blog management, I clicked the 'layout' link
3. I chose 'Edit HTML'
4. And then inserted after the tag <head>:
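The script itself did not survive in this page but, for reference, the usual MathJax include at that time looked something like the snippet below (the exact configuration used in this blog may have been different):
<!-- typical MathJax include of that era; the blog's actual snippet may differ -->
<script type="text/javascript"
  src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>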
    
    

Then it's just a question of testing the concept. Below I show the cost function for Linear Regression:

    $$J\left (\theta _{0},\theta _{1}  \right )=\frac{1}{2m}\sum_{i=1}^{m}\left ( h_{\theta }\left ( x^{i} \right ) -y^{i}\right )^2$$

This formula was generated by placing in the post the equation's Latex equivalent, enclosed between double dollar signs:

$"$ J\left (\theta _{0},\theta _{1}  \right )=\frac{1}{2m}\sum_{i=1}^{m}\left ( h_{\theta }\left ( x^{i} \right ) -y^{i}\right )^2 $"$ (the " symbols must be removed from the previous expression - I've placed them just to avoid the conversion of the Latex into the rendered formula). If you do not know Latex you can use an equation editor like the one from CODECOGS.

Finally, I would just like to ask you to right-click on the formula above. You'll see that a context menu opens with several options:

  • Copy-paste: if you select 'Format' and then 'MathML', and then choose 'Show source', you can copy all the MathML text and just paste it into applications like Mathematica, or do a paste-special (keep text only option) into Word - this should work with any MathML-enabled application
  • Accessible Math: where, for instance, you can zoom on the formula by right-clicking on the formula, selecting 'Settings', then 'Zoom trigger', and then selecting 'Click'. The next time you click on the equation you'll see it zoomed
But MathJax isn't perfect. One problem (feature) is that the formulas are centred on the screen, with one (maybe more) newlines before the formula. This means that if you are writing a sentence and just want to include a simple formula inline, kept on the same line as the sentence, you just can't do it. For this case an effective alternative is simply to write the formula in plain HTML. HTML allows you to write Greek letters, relations (greater than, greater than or equal, etc.), logical operators (there exists, for all, etc.), set symbols (contains, element of, etc.), infinity, square root, etc. A more complete list of the available possibilities can be found here. You can also add subscript or superscript text, which is usually needed for mathematical formulas. And if you want a nice place to test your HTML, find it here.

SyntaxHighlighter - code syntax highlighter

The next challenge was to be able to insert programming code snippets into the blog with language highlighting, numbered lines (to be able to comment on the code by referring to a line) and easy copy-paste. After some investigation the choice was SyntaxHighlighter.

For the installation of the engine I followed these steps:
1. In Blogger I pressed the 'Design' link
2. Then, in the links presented in the left frame of the blog management, I clicked the 'layout' link
3. I chose 'Edit HTML'
4. And then inserted before the tag </head> (in the link above it is stated before <head>, which is wrong):
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    
    
We can change the style of the code snippet by changing the theme in line 2 to one of the following:

  • shThemeDefault.css
  • shThemeDjango.css
  • shThemeEclipse.css
  • shThemeEmacs.css
  • shThemeFadeToGrey.css
  • shThemeMidnight.css
  • shThemeRDark.css
You can test the themes in run-time here.

In lines 5 to 28 you find all the programming languages that will be supported for code highlighting. Notice that this is only a subset of all the available languages - you can check the other available languages here. You may delete the lines corresponding to languages that do not interest you.

Finally, how can a code snippet be inserted into the blog? Well, as described on the author's site, there are two ways to insert the code. In both you must be editing the HTML of the post. The method that I prefer implies enclosing the code snippet in an HTML 'pre' tag, like the example below:
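The example itself was lost in this page, but a minimal sketch of the 'pre' method looks like the following (the Python snippet inside is just illustrative):
<pre class="brush: python">
d1 = {1: 'airplane', 2: 'car'}
print type(d1)
</pre>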
Notice that in line 1 you must state the programming language of the code snippet. In this case it's python.

The other method implies enclosing the code snippet in a special <script> tag with CDATA around it:
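Again the original example was lost, but a minimal sketch of this second method looks like the following (the Python snippet inside is just illustrative):
<script type="syntaxhighlighter" class="brush: python"><![CDATA[
d1 = {1: 'airplane', 2: 'car'}
print type(d1)
]]></script>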
    
    

Again, in line 1 you must state the programming language of the code snippet. In this case it's python again.