Today, we can’t talk about data science without mentioning `Python` and `R`. Both have become the go-to choice for data scientists thanks to their robustness and the range of libraries that can be leveraged to get almost any job done without much hassle, not forgetting that both are open source. Both languages have been in existence for over 25 years (since the early 90s), with Python inheriting the computer science DNA and R inheriting the statistical DNA.
I’m new to data science. Should I learn Python or R?
I asked myself this question 2 years back and searched the internet for answers, but I could hardly find a definitive one. So I decided to try both interchangeably. This way, I have been able to learn the strengths and weaknesses of each of them. Overall, I must confess that neither is superior to the other. So you just need to work out where your own interests lie before you get biased by online comparisons.
Python first appeared in 1991 and was created by Guido van Rossum, a Dutch programmer, with a design philosophy that prioritizes code readability. Python has often been labelled the `easiest language` to learn, with a gentle learning curve. Python is general purpose and is heavily used in machine learning, artificial intelligence and web development. The beauty of Python is that it’s easily deployable and can thus be leveraged for real-time data mining tasks. With its computer science DNA, Python is mostly preferred by computer scientists and software engineers.
Common libraries used in Python are numpy (numerical arrays), pandas (dataframes and data wrangling), matplotlib (visualization) and seaborn (visualization). Libraries can easily be loaded into a project with a single import statement, e.g. `import pandas as pd`. You can also import only the specific functions you need from a library; Python still loads the whole module, so this is more about keeping your code tidy than about performance.
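To make the two import styles concrete, here is a minimal sketch. I am assuming standard library modules (`statistics` and `math`) instead of pandas so that it runs without installing anything, and the sample values are made up for illustration. The pattern is the same one you would use for `import pandas as pd`.

```python
# Style 1: import a whole module under a short alias
# (same pattern as `import pandas as pd`).
import statistics as st

# Style 2: import only the name you need from a module.
from math import sqrt

# Made-up sample data, purely for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Functions from an aliased module are reached through the alias.
mean = st.mean(values)

# A directly imported function is called without the `math.` prefix.
spread = sqrt(st.pvariance(values))  # population standard deviation

print(mean, spread)
```

The alias style makes it obvious where a function comes from, while the `from ... import ...` style keeps lines shorter. Which one you pick is mostly a matter of taste.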
The Anaconda distribution is very useful for managing Python dependencies, and it bundles a number of IDEs, including the web-based Jupyter Notebook, which lets you mix Markdown and Python code in notebooks that can be exported as .py, .html and other file formats.
R first appeared in 1993 as a statistical computing language and is widely used by statisticians (like myself) and data miners. R is a highly functional programming language, and most of its statistical functions are native (base) to the language and can be used without loading any libraries. R’s ggplot2 library has very powerful data visualization functions that can be used to generate intuitive visualizations that even a non-specialist can interpret. R’s dplyr library, which is installed as part of the tidyverse collection of packages, is a very powerful data wrangling tool that is instrumental in data cleaning.
RStudio is the most widely used IDE for R, and I must admit that it’s one of the easiest IDEs to use, with everything on one dashboard. It also comes with R Markdown support that can be used to generate web, PDF and PowerPoint documents.
R is popular among researchers and people in academia, and is less commonly chosen for software development.
Python is an all-rounder and easily deployable. It can be used to develop stock and forex trading applications, robotics, etc., and if you are in a highly dynamic industry, Python will be the most convenient.
On the other hand, R is a powerful statistical language that comes in handy when you have analytical research reports to produce.
I love both languages, but I will nudge you to learn Python and perfect it. You won’t regret it.