What types of data exist in data science and how do we classify them?
In data science, there are two main types of data: categorical data and numerical data. This is the most common way of classifying data, and you’ll encounter both types frequently, so it’s important that you clearly understand the distinction between the two.
Categorical data represent named qualities of an observed phenomenon. This includes using words to describe the names or properties of objects, like their color, shape, and texture. For example, the color of an apple is red. The word “red” describes the quality of the color of the apple.
In data science, we refer to categorical data as “qualitative data” since they describe the quality of the thing they represent.
However, most beginners more intuitively understand the term “categorical” rather than “qualitative”, so I recommend that you conceptualize this type of data as “categorical” data.
Numerical data represent measured quantities of an observed phenomenon. This includes using numbers to describe measurements of objects, like their size, weight, and velocity. For example, the price of 6 apples is $2.00. “Six” represents the quantity of apples and “$2.00” represents their price.
In data science, we refer to numerical data as “quantitative data” since they describe the quantity of the thing they represent.
However, because most beginners confuse the terms “qualitative” and “quantitative”, I recommend you refer to this type of data as “numerical” data when you’re just getting started.
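To make the distinction concrete, here is a minimal Python sketch that labels each value from the apple examples as categorical or numerical. The function name classify_value and the field names color, count, and price_usd are made up for illustration, and the rule it applies (numbers are numerical, everything else is categorical) is a simplifying assumption rather than a complete test.

```python
from numbers import Number


def classify_value(value):
    """Label a value as 'numerical' (a measured quantity) or 'categorical' (a named quality)."""
    # In Python, bool is a subclass of int, but True/False name a quality
    # rather than measure a quantity, so treat booleans as categorical.
    if isinstance(value, bool):
        return "categorical"
    if isinstance(value, Number):
        return "numerical"
    # Assumption: anything that is not a number (e.g. a word) is a named quality.
    return "categorical"


# Hypothetical record built from the apple examples in this article.
apple = {"color": "red", "count": 6, "price_usd": 2.00}

kinds = {name: classify_value(value) for name, value in apple.items()}
print(kinds)
# {'color': 'categorical', 'count': 'numerical', 'price_usd': 'numerical'}
```

Keep in mind that this rule only looks at how a value is stored. Whether a value is numerical depends on whether it measures a quantity, so a check like this is a starting point rather than a final answer.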
Categorical and numerical data can be further divided into a total of four subtypes. Categorical data can be divided into nominal and ordinal data. Numerical data can be divided into interval and ratio data. We’ll take a look at each of these four subtypes of data in our next article.
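If it helps to see that grouping written down, the sketch below records which main type each subtype belongs to. The names SUBTYPES and parent_type are hypothetical and exist only for this illustration.

```python
# Each of the four subtypes mapped to the main type it belongs to.
SUBTYPES = {
    "nominal": "categorical",
    "ordinal": "categorical",
    "interval": "numerical",
    "ratio": "numerical",
}


def parent_type(subtype):
    """Return the main data type ('categorical' or 'numerical') for a subtype name."""
    return SUBTYPES[subtype.lower()]


print(parent_type("ordinal"))  # categorical
print(parent_type("ratio"))    # numerical
```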
In the meantime, to learn more about the types of data in data science, please see my latest course Intro to Data for Data Science.