This series of posts were not a Python tutorial, just some tips for those Fortran programmers who are learning python. Once you know the basics of Python, you can focus on extensions useful to scientists.
The most essential one is Numpy, which gives python the ability to work efficiently with arrays. If you install Numpy, it is worth also installing Scipy. Scipy includes a wealth of algorithms that you probably need but you don’t want to code. From Fourier transforms and splines to minimizations and numerical integrations. There are excellent tutorials for both Numpy and Scipy and a good place to start is at http://www.scipy.org/.
Remember that although you can translate a Fortran code almost line by line into Python, the resulting code will not be optimal, neither for clarity nor efficiency. Learn to be Pythonic:
Use dictionaries, use sets, use list comprehension, and even consider using classes! Remember that almost everything is iterable in Python.
This is my last post for the series Python for Fortran programmers, but I will continue writing about Python tools that I find useful for my research. I hope they will also help other computational chemists and biophysicists.
Python arguments can have have a default value, in which case, they become optional. Its similar to Fortran optional arguments, but the implementation and the syntax is pretty different. For immutable objects, everything works as a Fortran programmer would expect. But for mutable objects we tend to be surprised.
Type the following function:
def function(data=): data.append(1) return data
and call the function several times, without arguments (
function()). What is going on?
As usual, you can better visualize this process if you execute the code in: http://www.pythontutor.com
But it’s easy to understand, and if you do, you won’t be confused again. The first thing to remember is that lists are mutable. The second one is that the
def statement is an executable statement, which is executed when the function is defined in that line. As an example, will
Hello from f1! be printed if you execute only these lines of code?
def f1(): print("Hello from f1!") return 1 def f2(x=f1()): pass
See? You are defining functions but you are actually producing output!
So once the function is called and the object data is created, the same object is called all the time, pointing to the same data mutable object.
If you have understood everything, you should be able to predict the output of these lines:
def function(data=): data.append(1) return data print(function()) print(function()) print(function()) f1 = function def function(data=): data.append(1) print("Hello!") return data print(function()) print(function()) print(function()) print("f1 is function:", f1 is function)
You can expand this fascinating subject here: http://effbot.org/zone/default-values.htm from where I stole the example and some ideas, or here:
Before we deal with the “problem” of default values, we need to clearly understand what mutable and immutable objects are, because this is a concept that does not appear in Fortran.
Objects in Python can be mutable or immutable. This may seem pretty irrelevant. After all, what is the difference between:
a = 5 #immutable a = [1, 2, 3 ] #mutable
True, mutable objects have some methods such as
a.pop() that immutable objects don’t have. But again, you can think of workarounds:
b = (1, 2, 3) #an immutable tuple b = b[:-1] #all elements except the last one.
However it is important to realize that
b is now a new object (it has a different memory address) having the same name as the previous one, whereas
a remains the same object (same memory address) after the
pop method is called. You can check that by calling
id(b) before and after the assignments. When we modify a mutable object, the object remains the same, but when we “modify”
b, we create a new object:
a = [1, 2, 3] >>> id(a) 139878335741320 >>> a.pop() 3 >>> a [1, 2] >>> id(a) #the same as before 139878335741320 >>> b = (1, 2, 3) >>> id(b) 139878160519888 >>> b = b[:-1] >>> b (1, 2) >>> id(b) # a NEW object with the same name 139878160530536
Now we can deal with the surprising behaviour of default arguments.
For a Fortran programmer, reading that “everything in python is an object” does not help. It mostly frightens. Here I’ll try to give some oversimplified definitions so that you can read Python literature without being blocked by the language.
At the beginning you can forget about how objects work, just remember that anything like a Fortran variable (integers, arrays, etc) is an object.
Objects are class instances. Somehow the class is like the definition of the object, whereas the objects are the actual variables. In
REAL :: x,y,z,
REAL would be the class and
x,y,z would be the objects.
Objects have methods and attributes. Again, for simplicity, methods are like Fortran functions, but associated with a certain object. And attributes are variables associated with that object. In Fortran, the closest thing to an attribute would be the fields in a
As attributes and methods are associated with an object, its syntax is special:
In fact, some modules such as numpy (we’ll see more about numpy in the future) have methods implemented also as “normal” module functions. For example, if x is a numpy array we can call its method to get the mean:
or call the module function:
It is not easy to see the utility of objects, because many of the tutorial examples are too trivial. But here is one more example. There are many diagonalization subroutines in lapack. One for real matrices, one for complex, another one for symmetric, etc. They all need different names, and you have to check which one to apply to your particular matrix.
In Python, you could have all these different algorithms as methods in different classes, so that they could have the same name without collisions. Then:
would call the right method depending on what the class of
Python only has functions, whereas Fortran has two types of procedures: functions and subroutines. Functions do something with the arguments and return a value. The analogy with mathematical functions is obvious: f(x)=cos2(x) is asking to be coded as a function. Subroutines are supposed to perform some tasks and/or return several arguments than cannot be returned with a function. A typical example could be a diagonalization subroutine, which returns the eigenvalues and the eigenvectors.
Fortran programmers find no problem in understanding Python functions, but how can we make a function that returns eigenvalues and eigenvectors?
Python is not the only language having only functions, in fact its pretty common, the exception being Fortran… But in Fortran we can code everything as a subroutine or everything as a function.
subroutine cos2(x, res) could return the “function” value as an argument. And
function diag(m, eigvec, eigval) could have the same arguments as the analogous subroutine and have a return value that just informs if the diagonalization succeeded.
This diagonalization function could be easily translated into a python function. Done? Not yet…
As Fortran programmers we tend to overuse the arguments to output results, and in Python the return value is much more flexible than in Fortran, because what the function returns is not fixed in size or type. So in Python it would make more sense to return the eigenvalues and the eigenvectors in the return statement and leave the arguments only as inputs to the function.
Let’s see two examples:
The Lapack diagonalization subroutine
DSYEV has these arguments:
SUBROUTINE dsyev( JOBZ, UPLO, N, A, LDA, W, WORK, LWORK, INFO )
A contains the matrix to be diagonalized on input and the eigenvectors on output. The eigenvalues are returned in
INFO returns 0 if successful. How is does the interface look like in Python?
Scipy contains wrappers to many Lapack subroutines. Although with Python we should call the higher-level routine scipy.linalg.eigh we can also call directly the Lapack routine:
>>> import numpy as np >>> import scipy.linalg as la >>> a = np.array(([[1,2,3], [2,4,5], [3,5,6]])) >>> la.lapack.dsyev(a) (array([ -0.51572947, 0.17091519, 11.34481428]), array([[ 0.73697623, 0.59100905, 0.32798528], [ 0.32798528, -0.73697623, 0.59100905], [-0.59100905, 0.32798528, 0.73697623]]), 0)
So it returns the eigenvalues, the eigenvectors and INFO. These arrays are created in the subroutine and can be assigned to variables which need not be defined. For example:
>>> vals, vecs, out = la.lapack.dsyev(a)
If the interface of
dsyev was something like:
w should be defined before the call (with arbitrary values), something very unnatural in Python. I will discuss in the next post why I didn’t include
info as an argument…
The second example is a function that concatenates two lists, only if the last element of the first list is the same as the first element of the second list. Otherwise it returns nothing (
def concat(l1, l2): if l1[-1]==l2: return l1+l2 else: return None
Because Python is dynamically typed, it is ok to return different types of variables in the same function depending on the case. Assigning a result to a variable will not produce an error:
l3 = concat([1,2,3], [4,5])
The code could continue by checking what is l3:
if l3 is None: do something else: do something_else
So do not be afraid of returning complex lists from a function and avoid changing the arguments values as much as possible, as this is a more clear coding style.
Learning Python is easier than learning Fortran. First, because the language has a simpler syntax and its interactive behaviour makes errors easier to catch and more understandable. Second, because there are lots of resources, many of them free: tutorials, books, etc.
These two interactive web-pages are a good example of the type of help you cannot find with Fortran:
That said, at some point you might feel dismayed by the size of the language. Python is much larger than Fortran. Both the core language and the available extensions. You’ll have to assume that, unless you become too hooked on it, there’ll be many areas of the language that you will – and can – ignore. With the risk of blasphemy, I dare say you can write hundreds of useful scripts completely ignoring Object Oriented Programming.
The number of available packages is also overwhelming. Take, for example, the calculation of the square root. In most tutorials you are told to import the
math module which contains a
math.sqrt will give an error for negative arguments.
math.sqrt only returns floats. If you want deal with complex numbers, you need to
import cmath. But
cmath.sqrt always returns a complex number, even if its imaginary part is zero. All this is similar to the Fortran
sqrt function (which implicitly calls two different functions for real and complex arguments). But with python you can both have your cake and eat it. There is a third
sqrt function which returns a complex number only when the argument is negative. It can be found in the
numpy.lib.scimath module. This third function can be useful to you, but you can also live without it. Getting used to ignoring all these potentially useful features is part of the learning process. Eventually you will remember and look for the ones you really need.
One last confusing issue is which Python version to learn. As with Fortran, the newest versions are better, but the problem with Python is that versions 3.x are not backward compatible with the wide-spread versions 2.x. My recommendation is that you learn python 3.x. Mainly because it is the future. Most packages have already been translated into Python 3. I think installing python3, numpy, scipy, sympy, and ipython is enough to start using python in scientific problems. Make sure you install these packages for python3. Each python version can host its own modules completely independently. If you are using an older distribution, you can also install them quite easily with
pip. Of course, things can get more complex, as Konrad Hinsen reports in his interesting blog.
Despite their incompatibility, Python3 and Python2 are not that different. Probably, they are less different than FORTRAN77 and Fortran 2000. The most significant differences that you will find are that
print() is a function in Python3 (it uses parenthesis) and that
input() works differently.
And if you start programming, please do it in a nice way.
Python is a general-purpose language, used in extremely different fields. Take a look at http://wiki.python.org/moin/PythonProjects Many of the projects are available at the Python repository PyPI. That means the language is active and adequate for many applications. But of course, we want it to be also good at number crunching and data visualization.
For that you need some packages. Packages are extensions of the core language, a kind of library in Fortran. They need to be imported before they are used. Some packages are a must for scientists: numpy, matplotlib and possible, scipy. Installing python packages is easy. I will explain that in the future, but these three packages are on most Linux repositories (certainly in Ubuntu) and that is the simplest way to install them.
Because python is an interpreted language (It’s gonna be very slow!! Wait, wait…) you can use different ‘shells’. I recommend iPython. That, together with the previous packages, turns python into a powerful scientific development tool. If you have time (I promise to keep this post short) watch this amazing talk by Fernando Perez, the author of ipython:
If you are still not convinced, take a look at this survey which compares Python to Fortran:
Convinced? Then start by typing
import this and start absorbing the Zen of python. Then impress your colleagues by defying gravity with
import antigravity (only in Python 3). Aha! You look more pythonic now…
If you are new to Python and want to install it you will have to decide whether to use Python 2 or Python 3. In the next post we will see how to make this decision. The short answer is ‘use Python 3’.
Being trained as a scientist (as I am), you have a certain probability of having been taught Fortran. Are there reasons to try to learn another language? And why Python?
I use computers all the time. But most of the computationally intensive tasks that run in my cluster are performed by highly optimized codes that I have not written: Gromacs, Orca, Gaussian, etc. What I usually need to do is to play with the data these codes generate. Here is where I frequently need to code something. But playing with the data rarely involves only number crunching. Okay, sometimes I may need to do an interpolation, or a least square fitting, but much more often I need to graph the results, or parse software outputs to get these data. So, should I use Fortran?
For the number crunching, a naïve answer would be ‘yes’. I disagree. If what you want to do is a common task, such as the ones mentioned (or a diagonalization, or a minimization, or…) you can probably find packages or libraries where that is already implemented. So why reinvent the wheel? You can find that in Fortran, but you can find that in Python too. And the installation, the documentation and the ease of use will be better in Python. Do you want an example? I need the eigenvalues of a matrix stored in a text file ‘a.dat’. And you only have python installed. Nothing else.
$ sudo apt-get install python-numpy $ python >>> import numpy >>> a=numpy.loadtxt('a.dat') >>> numpy.linalg.eigvals(a) array([ 16.13476875, -8.29479695, -0.3399718 ])
Now, beat that with Fortran… Of course, sometimes you’ll have to code your computationally intensive task –we are scientists, we are supposed to be doing new things–, and Fortran is an efficient language for that.
But most of the time I am not solving linear equations. Most of the time I am struggling to get the energies of a set of outputs and correlate that with distances that are in a set of trajectory files. Or to modify tens of input files in a systematic way. Or to plot these results automatically, without manual copy&paste to a spreadsheet. And for that, Fortran is almost useless.
As John D. Cook puts it: “I’d rather do math in a general-purpose language than try to do general-purpose programming in a math language”
I am not saying you should abandon Fortran. I am just advocating for another tool that can ease many of your common tasks.
This is my first entry in my brand new blog. I’m a bit afraid of this new adventure…
The idea of this post is to share some of my research experience and results. I’ll start with several posts on python tips for Fortran programmers. This is because I am now preparing a course on that, and because I haven’t found that much information on this subject on the web. There are plenty of great Python tutorials, but few assume you are an experienced Fortran programmer. This is what I was (or I am) and there are several computational chemists like me, who could profit from learning python. So, let get started…