Filtering Data with Queries

SQL (Structured Query Language, often pronounced "sequel") is the standard language for communicating with relational databases. Creating and modifying databases with queries is an important skill for data scientists and analysts, since databases are one of the most common sources of data.

Any well-run company or organization keeps its own data. This data may include customer information, purchases and sales, inventory and so on. A database is simply a collection of related data. A database is made up of tables, which in turn are made up of fields (columns) and records (rows).
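
To make the idea of tables and fields concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns and its rows are hypothetical, made up purely for illustration; they are not taken from the sample database used in the article.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with one hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    # Each column below is a field; each inserted tuple is a record.
    conn.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount REAL)"
    )
    rows = [
        (1, "Laptop", "North", 1200.0),
        (2, "Phone", "South", 650.0),
        (3, "Tablet", "North", 300.0),
        (4, "Phone", "North", 700.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def sales_above(conn, region, min_amount):
    """Filter rows with a WHERE clause; `?` placeholders pass values safely."""
    query = (
        "SELECT product, amount FROM sales "
        "WHERE region = ? AND amount >= ? "
        "ORDER BY amount DESC"
    )
    return conn.execute(query, (region, min_amount)).fetchall()

conn = build_demo_db()
print(sales_above(conn, "North", 500))
# prints [('Laptop', 1200.0), ('Phone', 700.0)]
```

The WHERE clause does the filtering: only records whose fields satisfy both conditions are returned.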

We will be working with a sample SALES database of an…

A beginner’s guide to NLP and extracting insight from text data

Before we had computers, paper, ink or pencils, human communication was oral. Languages evolved, and with them structure and semantics. They went from a strictly oral form to both oral and written forms. NLP is the process of making machines understand human language, whether text or speech.

Natural Language Processing (NLP) has grown in popularity and capability over the years. The ability to draw insight from text data has opened a world of opportunities. …

Who are the data professionals and do we need them?

Data science is a broad field that combines domain knowledge, programming skills, mathematics and statistics in order to draw insight from data. As you may have guessed from the name — Data Science — the work done revolves around data, lots of data.

I have to admit, when I first heard about Data Science, I thought all individuals who worked in the field were data scientists. …

How to reduce Bias and Variance error in your model

While building a predictive machine learning model, we run into bias and variance errors. The bias-variance tradeoff is one of the best-known tradeoffs in machine learning. Here, we will go over what bias error and variance error are, where they come from, and how you can reduce them in your model.

How does Machine Learning differ from traditional programming?

The high school definition of a program was simple. A program is a set of rules that tells the computer what to do…

Implementing functions in your data analysis process.

Functions are not a new idea in programming. In fact, most programming languages have built-in functions that allow programmers to carry out different tasks.

A function is a sequence of statements grouped into a block in order to perform an action or a series of actions.

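Some built-in functions in Python include:

  • min()
  • max()
  • str()
  • float()

and many more. Trigonometric functions such as cos(x), sin(x) and tan(x) are not built-ins; they come from the standard math module (for example, math.cos(x)).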

We can also define our own functions. To define a function, we use the keyword def followed by a function name and parentheses. …
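
As a minimal sketch of the def syntax, the function below combines the built-ins min() and max() with a computed mean. The name `describe` and the shape of its return value are assumptions chosen for illustration, not something taken from the article.

```python
def describe(values):
    """Return the minimum, maximum and mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "min": min(values),                 # built-in function
        "max": max(values),                 # built-in function
        "mean": sum(values) / len(values),  # computed from two more built-ins
    }

summary = describe([4, 8, 15, 16, 23, 42])
print(summary)
# prints {'min': 4, 'max': 42, 'mean': 18.0}
```

Wrapping the logic in a function means the same summary can be reused on any column of numbers during analysis instead of being rewritten each time.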

Tools and skills you need in Data Science

There has been a debate on whether Data Science is the ‘sexiest job of the 21st century’ or not. We will not be taking part in that debate today. Rather, for those looking to get a start in data science and, by extension, Machine Learning, or for those just looking to get ahead in the field, we have curated the tools you need as a data scientist.

These tools will be broken down into different categories for convenience — more mine than yours. …

How do you get information across through visualization?

Data visualization is the graphic representation of data and information. Data visualization makes use of charts, graphs, software or other visualization tools to provide a quick overview of data and show trends and relationships that exist.

Data visualization is not just about plotting charts or making colorful images. The goal is to pass information to the end users, as well as to:

  • Visualize trends in a dataset
  • Easily recognize outliers
  • Recognize data patterns
  • Understand relationships in the data

Data Visualization for Data Scientists

Data science has found application in various industries leading to the employment of data…

Working in the fast-paced world of data science? Here are some habits that are bound to increase your productivity.

Data science is quite the trend right now. The pay is high and great opportunities abound. Its applications are numerous and many industries, tech or not, are beginning to see the importance of making sense of their data.

Since it is such a hot topic, here are 8 habits that will make you a better data scientist.

Know the Job Roles

Before you apply for any job, make sure you know what you are getting yourself into. There are many job roles/titles that…

Using data to predict the probability of customer churn

In our last two posts, we spent some time going through evaluation metrics for classification models.

In this post, we will go through a classification problem: customer churn. The dataset comes from a Data Science Nigeria (DSN) Pre-Bootcamp hackathon hosted on Zindi.

The goal is to predict which customers are likely to churn, that is, stop using the network.


Expresso is an African telecommunications company that provides customers with airtime and mobile data bundles. The objective of this challenge is to develop a machine learning model to predict the likelihood of each Expresso customer “churning,” i.e. …

In our previous post, we discussed evaluation metrics for classification models, with emphasis on the confusion matrix, accuracy, precision, recall and log loss. You can read the post here to understand the basic evaluation metrics.

By evaluating our models, we assess their performance using well-defined metrics. Model evaluation is an integral step in the machine learning pipeline, and we evaluate our models for a number of reasons. Most importantly, we evaluate models to check that they generalize to unseen data and therefore produce accurate predictions. …
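
The metrics named above can be sketched in plain Python. This is an illustrative implementation for binary labels (1 = positive, 0 = negative), not the code from the posts; the function names, the 0.5 decision threshold and the toy labels and probabilities are all assumptions made for the example.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Share of predicted positives that were truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of true positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)               # prints 3 1 1 3
print(accuracy(tp, fp, fn, tn))     # prints 0.75
print(precision(tp, fp), recall(tp, fn))  # prints 0.75 0.75
print(round(log_loss(y_true, y_prob), 3))
```

Accuracy, precision and recall depend on the chosen threshold, while log loss is computed from the predicted probabilities directly, which is why it penalizes confident wrong predictions more heavily.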

