AI+ Community Kaggle Competition
Understanding Students’ Test Performance Using Machine Learning
At Data Science Nigeria (DSN) Port Harcourt, we are committed to the skill growth of our community, which is why we held an InClass Kaggle competition earlier this week. Our choice problem? Using machine learning to understand the influence of ethnicity, gender, parents’ background, test preparation and even lunch on students’ test performance.
MEET OUR TOP WINNERS
- Omotosho Olamilekan is an engineering student passionate about exploring technologies and solving problems.
- Blessing Agadagba is an intelligent, purpose-driven person. She is passionate about providing world-class data driven solutions to better humanity.
- Adeoluwa Adeboye is an AI enthusiast who loves solving problems with Python.
- Babatunde Hameed is a beginner in data science and machine learning. We are impressed!
- John Godday is a graduate of Statistics from Ladoke Akintola University of Technology. He has just completed his youth service in Port Harcourt and is a machine learning enthusiast with interest in finance and health.
Our winners cut across beginners, intermediate and skilled data scientists!
What is the first thing you do when faced with a machine learning problem?
Babatunde takes his time to understand the problem statement and explore the data. As John Godday puts it, ‘I understand my data before anything’. This includes checking for data quality issues such as missing values, spelling errors in categorical variables and understanding the data distribution.
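The checks our winners describe can be sketched in a few lines of pandas. This is only a minimal illustration: the toy table, its column names (`gender`, `lunch`, `test_score`) and its values are made up for the example and are not the competition data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions,
# not the actual competition dataset.
df = pd.DataFrame({
    "gender": ["female", "male", "Female", "male", None],
    "lunch": ["standard", "free/reduced", "standard", "standard", "free/reduced"],
    "test_score": [72.0, 65.0, None, 88.0, 54.0],
})

# 1. Missing values: count them per column.
missing_counts = df.isna().sum()

# 2. Spelling issues: inconsistent labels show up as extra categories.
raw_categories = df["gender"].value_counts()

# Lower-casing is one simple fix; real data may need more careful cleaning.
df["gender"] = df["gender"].str.lower()
clean_categories = df["gender"].value_counts()

# 3. Distribution: summary statistics for a numeric column.
score_summary = df["test_score"].describe()

print(missing_counts)
print(raw_categories)
print(clean_categories)
print(score_summary)
```

Here "female" and "Female" are counted as two different categories until the labels are cleaned, which is exactly the sort of issue that is easy to miss without exploring the data first.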
Exploring the data helps one determine the sort of machine learning problem — supervised or unsupervised — and for Adeoluwa this is a crucial step. Understanding the problem type will determine the algorithms used to fit the data.
What is your first step to solving a machine learning problem?
Dealing with Challenges
No good [model] fit comes without its own challenges and for most of our winners, it was encoding their categorical data. However, after many trials and trips to the World Wide Web, this problem did not stand a chance against our champs. For Olamilekan, One-Hot Encoding did the trick.
Categorical data is data that can be divided into distinct groups. The values are labels and often must be transformed into numeric values before they can be ‘fed’ into the algorithms. For some algorithms, such as the decision tree, this step is not strictly necessary, as a decision tree can in principle be fit with categorical data, although some library implementations still expect numeric input.
There are a number of ways to transform categorical data, such as Label Encoding and One-Hot Encoding. In label encoding, each category is assigned an integer value. This is useful for ordinal categorical variables, whose categories have a natural order. Another common way, better suited to categories with no natural order, is One-Hot Encoding, which creates a binary (0 or 1) column for each category.
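To make the difference concrete, here is a small sketch of both encodings using pandas. The column names, the categories and the order assumed for `parent_education` are hypothetical and chosen only for illustration; they are not taken from the winners' notebooks.

```python
import pandas as pd

# Hypothetical columns for illustration only.
df = pd.DataFrame({
    "parent_education": ["high school", "bachelor", "master", "bachelor"],
    "lunch": ["standard", "free/reduced", "standard", "standard"],
})

# Label encoding for an ordinal variable: each category gets an integer
# that respects an assumed order (high school < bachelor < master).
education_order = {"high school": 0, "bachelor": 1, "master": 2}
df["parent_education_encoded"] = df["parent_education"].map(education_order)

# One-hot encoding for a nominal variable: one 0/1 column per category.
lunch_dummies = pd.get_dummies(df["lunch"], prefix="lunch", dtype=int)

# Combine the encoded columns into a purely numeric table.
encoded = pd.concat([df[["parent_education_encoded"]], lunch_dummies], axis=1)
print(encoded)
```

With label encoding, the integers carry an order, so it is worth using only where that order means something. With one-hot encoding, each row has exactly one 1 among the new columns, and no order is implied.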
We will look in-depth into categorical data much later.
How do you handle categorical data?
Finding the algorithm that best fits your data can be quite an ordeal. John had to try three different algorithms before obtaining his optimal fit with XGBoost and Blessing says she tried to fit her data with all the machine learning algorithms she knows (that can be quite daunting). For Adeoluwa it was an easy decision, “From my little experience in Machine Learning, I have learnt that in most cases you cannot go wrong with the XGBoost algorithm,” he says. We can tell that someone has a favorite algorithm here.
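One common way to compare several algorithms on the same data is cross-validation. The sketch below is an assumption-laden illustration, not what John or Blessing actually ran: it uses synthetic data from scikit-learn, treats the task as regression, and picks three scikit-learn models as stand-ins. A model such as XGBoost could be added to the `candidates` dictionary in the same way if it is installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the (encoded) competition features and scores.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate algorithms to try; the choice of models here is an assumption.
candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean R^2 over 5 folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

# The candidate with the highest mean score is the current best fit.
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```

Scoring every candidate with the same folds and the same metric keeps the comparison fair, which matters more than which algorithm happens to come out on top.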
Olamilekan’s method is quite intriguing. After fitting his data with several algorithms, he decided to stack them, that is, to take the mean of the predictions from the algorithms that had a good fit. There is no hard and fast rule in machine learning; often, to find the best fit, you have to try different algorithms and even tweak their parameters. It is good to see something different every now and then.
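Averaging the predictions of several models can be sketched as follows. The model names and predicted scores are invented for the example; we are assuming three already-fitted models that each predict a score for the same four students.

```python
from statistics import mean

# Hypothetical predictions from three fitted models for the same four students.
predictions = {
    "model_a": [70.0, 55.0, 82.0, 64.0],
    "model_b": [74.0, 53.0, 79.0, 66.0],
    "model_c": [72.0, 57.0, 85.0, 62.0],
}

# zip(*...) groups the predictions student by student;
# the final prediction is the mean across models.
averaged = [mean(per_student) for per_student in zip(*predictions.values())]

print(averaged)
```

This simple mean gives every model equal weight. In the stricter sense of the term, stacking trains a further model on the base models' predictions rather than averaging them, so what is shown here is the averaging variant.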
Need I ask how you arrive at your best fit?
Our winners did not leave without dropping some words of advice: Learn the basics; practice more with platforms such as Kaggle and Zindi; and yes, follow Data Science Nigeria tutorials to gain hands-on knowledge. Did I leave anything out?
If you have read this far, you must be a data science enthusiast. We are kindred spirits. Let’s not lose touch.