Many times I hear people ask: “Which programming language should I learn for Data Science?” or “What language is the best for Machine Learning?”
It’s a classic beginner’s question. I had the same thoughts when I started out, so I understand the fear and uncertainty of the unknown.
Here we will take a quick look at languages that can be used for data manipulation and for building machine learning algorithms. Ready, set? Let’s go.
When you first encounter a problem, the first thing you do, of course, is understand it. What type of problem is it: supervised or unsupervised, classification or regression? Knowing this helps you determine the expected solution, which in turn guides your problem-solving process. After that, you dive headfirst into your data and try to make some sense of it. This is data exploration.
Are there missing values? No? Your job has just been made easier. Yes? Well, there is no problem there. You can read about how to handle missing data in our previous post.
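If you want a quick first look before reaching for a library, a minimal sketch like the one below counts the gaps in each column. It assumes missing values show up as empty cells in a CSV file. The sample data, the column names and the `count_missing` helper are all made up for illustration.

```python
import csv
import io

# Hypothetical sample data: empty cells stand in for missing values.
SAMPLE_CSV = """age,income,city
34,52000,Lagos
,48000,Abuja
29,,
41,61000,Accra
"""


def count_missing(csv_text):
    """Return a dict mapping each column name to its number of empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            value = row[name]
            # DictReader yields None when a row is shorter than the header.
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


missing = count_missing(SAMPLE_CSV)
print(missing)
```

In practice you would probably read the file from disk and use a data analysis library, but the idea is the same: find out where the gaps are before deciding how to handle them.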
Machine learning libraries are numerous, and so are the languages employed in this field, although some have a larger piece of the pie than others.
Programming languages that support machine learning libraries and algorithms include:
- R programming language
I may have missed your favorite language — I apologize. But as you can see, our options are not limited.
If you are a beginner, here are some factors to consider before learning any language.
- The language community: If you get stuck on a problem, can you get help? As a beginner still trying to grasp the concepts of machine learning, you need all available resources at your service. Thankfully, help is never too far away. Documentation, blogs and communities are always there to help. You just have to look.
- Is the language open source or paid: While you often have to pay for the good things in life, in machine learning they often come free. Use them. Of course, there are circumstances when you must pay, such as for cloud computing (and even these services have free-tier options) or for some programming languages, e.g. MATLAB.
- Check job postings: Which languages and skills are most sought after by recruiters? Some companies have their language and system already in place. Just as in any other field, a good understanding of the job market will give you an advantage.
The most important thing, however, is skill. Languages are tools, a means to an end. If you can get the work done, nobody cares what language you use (mostly). This may sound cliché, but build your skill and let your skill speak for you.
I remember my very first coding ‘job’. It was a simple one, at least for anyone with some coding experience.
My task? To write a simple program that could find a particular phrase in a webpage. If the phrase occurred, the program would make a beep sound. If the phrase did not occur, it would print a simple “phrase does not exist”. I could use any language I liked. It did not matter as long as I got the work done.
I had just started learning Python at that time (I made sure to mention that). In fact, that served as my introduction to urllib and BeautifulSoup.
After one or two breakdowns, I finally did it. Yay me!
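My original script is long gone, but a task like that can be sketched roughly as follows. This is not what I wrote back then: to keep it runnable without extra installs, it swaps BeautifulSoup for the standard library's `html.parser`, and the names `TextExtractor`, `phrase_in_html` and `check_page` are made up for illustration. The "beep" here is just the terminal bell character, which may or may not make a sound on your system.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def phrase_in_html(html, phrase):
    """Return True if the phrase occurs in the page text (case-insensitive)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so line breaks in the markup do not matter.
    text = " ".join(" ".join(parser.chunks).split())
    return phrase.lower() in text.lower()


def check_page(url, phrase):
    """Fetch a page, then beep if the phrase is found or print a message if not."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    if phrase_in_html(html, phrase):
        print("\a", end="")  # terminal bell character
        return True
    print("phrase does not exist")
    return False
```

Calling `check_page` with a URL of your choice and a phrase would run the whole thing. Keeping the search in `phrase_in_html` separate from the download makes it easy to try out on a plain HTML string first.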
It’s still like that sometimes for me. I come across a problem and before you know it I have 10 tabs open, googling this or that and I’m going through machine learning publications and losing my mind going “How am I just coming across this?”
That is one thing I love about Python, the 'un-lack' of information. Python has a large community, and if you experience a problem, you are probably not the first person to have experienced it. A careful (sometimes random) web search will provide you with answers and even sample code. Sample code has proved useful to me right from my first ‘job’. It provides insight and reduces the need to start coding from scratch. Over the years, Python has become the major language for machine learning because of its simplicity and its support for machine learning libraries and algorithms. As a beginner, you can never go wrong with Python.
Python is also open source. You can download it for free from the official Python site.
MATLAB is a proprietary programming language and computing environment widely used by engineers and scientists. It is especially useful for linear algebra and other mathematical problems, developing algorithms, data analysis and data visualization. If you ever get stuck, you can refer to the help menu. MathWorks also has a blog where you can find useful information.
R is a free programming language and software environment for statistical computing, commonly used with the RStudio IDE. It is relatively easy to learn and has a built-in data frame structure, which is useful for data analysis. R also has a wide variety of libraries for data manipulation and visualization.
To reiterate, knowledge of a language does not guarantee you a job. You should work to build your expertise in areas such as data mining, data exploration, feature engineering and feature selection. Good knowledge of machine learning libraries and algorithms is also very useful, as is knowledge of statistics, which will help you better understand your models and even build your own algorithms.
Low-code and no-code machine learning options are also becoming popular. You can build machine learning models without having to write code. Such platforms include the designer in Microsoft Azure Machine Learning and Google Cloud’s AutoML.
Machine Learning solutions are as diverse as the problems they solve.
Find out what suits you and run with it — not literally of course but if you want to, it’s perfectly fine.
Hopefully this has helped you with your dilemma. Go forth and machine learn!