I have been asked this many times by students and would like to have your opinion. Which is the best programming language to learn if I want to do bioinformatics?
Going back a few years, Perl was the language of choice for bioinformatics. Perl is a great programming language: it’s easy to learn, has a powerful regular expression implementation for pattern matching, and has low overhead. That is, “quick and dirty” scripts can be written to read files, parse or filter them, and write the results out again with relative ease. Furthermore, the BioPerl package is quite comprehensive. However, when it comes to doing things in a more structured way, Perl begins to fall short. The syntax can be very confusing, both to write and to read, and the object-oriented implementation is more of an afterthought than a central feature.
Then along came Python, which has essentially taken over the bioinformatics world. Why, you ask? Python has all of the great features of Perl: it’s intuitive, great with regular expressions, and the like. But where it differs is key. Python is an extremely elegant language. So much so that a special term has been coined to describe it: the “pythonic” way of programming. It scores high on readability and ease of use, and has a vast number of packages, a package manager (called pip), not to mention tools like virtualenv and Jupyter notebooks.
So, if you need to recommend a programming language for bioinformatics, I’d say Python is the clear choice.
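As a small illustration of the “quick and dirty” read, parse, and filter style mentioned above, here is a minimal sketch in Python. The FASTA text, the record names, and the helper functions parse_fasta and find_motif are all made up for this example; it assumes well-formed headers and uses only the standard library re module.

```python
import re

# Hypothetical FASTA text; in practice this would be read from a file.
FASTA = """>seq1 example record
ATGCGTACGTTAGC
GGATCCATGAAA
>seq2 another record
TTGACCGGATCCGGTA
"""

def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)

def find_motif(sequence, pattern):
    """Return 0-based start positions of non-overlapping regex matches."""
    return [m.start() for m in re.finditer(pattern, sequence)]

# GGATCC is the BamHI recognition site, used here only as a sample pattern.
hits = {name: find_motif(seq, "GGATCC") for name, seq in parse_fasta(FASTA)}
print(hits)  # {'seq1': [14], 'seq2': [6]}
```

For real work, a library such as Biopython would normally handle the parsing; the point here is only how little code a one-off script needs.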
I agree with Jamie.
It depends on whether you’re intending to run pre-existing analyses or want to write your own tools. For running existing analyses, R is probably the better choice, because you’ll have to use Bioconductor packages for a lot of them. It’s also significantly easier to learn.
That said, Python is a much better language for programming and for writing your own toolsets. (I personally like Python more, but think R is more useful if you’re just starting out.)
One really problematic thing about Python is that it’s tremendously inefficient. You can write efficient code using Python, but the way you do that is to take the hard part out of Python. For instance, NumPy is commonly used to give Python efficient handling of matrices, not only so that the operations are performed in an external library written in C and assembly (typically an optimized BLAS such as OpenBLAS or Intel MKL), but also to avoid Python’s very high-overhead data layout. That data layout overhead applies to string handling as well, not just matrices. And Python does not easily take advantage of multiple cores (i.e., it is relatively hostile to threads).
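On the multi-core point: in the standard CPython interpreter, the global interpreter lock keeps threads from running Python bytecode in parallel, so the usual workaround is separate processes. Below is a minimal sketch using the standard library's concurrent.futures. The function names, the sample sequences, and the worker and chunk sizes are assumptions for illustration only, and for work this cheap the cost of shipping data between processes may well outweigh the gain.

```python
from concurrent.futures import ProcessPoolExecutor

def gc_count(sequence):
    """Count G and C bases in one sequence (a stand-in for real per-record work)."""
    return sequence.count("G") + sequence.count("C")

def gc_counts_parallel(sequences, workers=2):
    """Run gc_count over the sequences in separate worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches records to reduce inter-process overhead.
        return list(pool.map(gc_count, sequences, chunksize=64))

if __name__ == "__main__":
    # Hypothetical input; real data would come from a FASTA or FASTQ file.
    sequences = ["GGCATTA", "ATATAT", "CCCGGG"] * 1000
    totals = gc_counts_parallel(sequences)
    print(totals[:3])  # [3, 0, 6]
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module.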
In short, Python is excellent, unless you’re writing your innermost loops in Python. Then it’s terrible. But it’s a great language for scripting (higher-level composition of programs or libraries that are written in a more efficient language).
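To make the “innermost loop” point concrete, here is a rough sketch that computes the same GC fraction twice: once with the per-base loop written in Python, and once with the loop pushed down into str.count, which runs as C code inside CPython. This is the same idea NumPy applies to matrices, shown with the standard library only. The function names and the test sequence are invented for the example, and the timings will vary by machine, so none are quoted.

```python
import timeit

def gc_fraction_loop(sequence):
    """Inner loop written in Python: one interpreter step per base."""
    gc = 0
    for base in sequence:
        if base == "G" or base == "C":
            gc += 1
    return gc / len(sequence)

def gc_fraction_builtin(sequence):
    """Same result, but the per-base loop runs inside str.count (C code in CPython)."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

# Hypothetical test sequence of 1,000,000 bases.
sequence = "ACGTTGCA" * 125_000

# Both versions must agree before comparing their speed.
assert gc_fraction_loop(sequence) == gc_fraction_builtin(sequence)

for fn in (gc_fraction_loop, gc_fraction_builtin):
    seconds = timeit.timeit(lambda: fn(sequence), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

The second version is still Python from the caller's point of view; only the hot loop has moved out of the interpreter.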