In data science, we commonly encounter several scalar data types. By scalar, we mean that the data type contains only a single value. This is unlike composite data types, which contain multiple values in the same data structure.
Here are the most common scalar data types you will likely encounter in data science.
A character represents a single letter, digit, or symbol. For example, “A”, “1”, and “%” can all be represented as characters. We can string together a sequence of characters (called a character string) to represent words, numbers, and bodies of text. Character strings are very flexible for storing data, but they are not very efficient in terms of data processing and storage space.
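As a minimal sketch of the ideas above, here is how characters and character strings might look in Python (chosen only for illustration; Python has no separate character type, so a character is simply a string of length one). The variable names are hypothetical.

```python
# Single characters are length-1 strings in Python.
letter, digit, symbol = "A", "1", "%"

# Each character corresponds to a numeric code point.
code_point = ord(letter)

# Stringing characters together yields a character string.
word = "".join(["d", "a", "t", "a"])

# The number 12345 stored as text needs one byte per digit in UTF-8...
as_text = "12345".encode("utf-8")
# ...while the same value fits in two bytes as a binary integer.
as_binary = (12345).to_bytes(2, byteorder="big")

print(code_point, word, len(as_text), len(as_binary))  # 65 data 5 2
```

The last two lines hint at why strings are flexible but not storage-efficient: the text form of a number usually takes more bytes than its binary form.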
A Boolean represents either a true or a false value (and only a true or a false value). Booleans allow us to efficiently store and process data composed of either yes or no answers.
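A small sketch of this in Python, assuming a made-up list of yes/no survey answers (the data and names are hypothetical):

```python
# Hypothetical survey answers stored as text.
responses = ["yes", "no", "yes", "yes"]

# Convert each answer to a Boolean: True for "yes", False otherwise.
subscribed = [answer == "yes" for answer in responses]

# In Python, True counts as 1 and False as 0, so sum() counts the "yes" answers.
yes_count = sum(subscribed)

# Logical operations work directly on Booleans.
any_no = not all(subscribed)

print(subscribed, yes_count, any_no)  # [True, False, True, True] 3 True
```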
An enumeration represents a set of named categories. For example, the list of fruit “Apple”, “Banana”, and “Orange” can be represented as an enumeration. Enumerations allow us to efficiently store and process lists of named categories that contain a high degree of duplication.
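One way to sketch the fruit example is with Python's standard `enum` module. The `Fruit` class, the numeric codes, and the `basket` list are illustrative assumptions, not a prescribed design.

```python
from collections import Counter
from enum import Enum


class Fruit(Enum):
    """A fixed set of named categories, each backed by a small integer."""
    APPLE = 1
    BANANA = 2
    ORANGE = 3


# A hypothetical column of data with heavy duplication.
basket = [Fruit.APPLE, Fruit.BANANA, Fruit.APPLE, Fruit.ORANGE, Fruit.APPLE]

# Each entry can be stored as a compact code rather than a repeated string.
codes = [fruit.value for fruit in basket]

# Counting categories is straightforward because members are hashable.
counts = Counter(basket)

print(codes)                # [1, 2, 1, 3, 1]
print(counts[Fruit.APPLE])  # 3
```

Storing the small code once per row, and the name only once in the definition, is what makes enumerations efficient when the same categories repeat many times.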
An integer represents a whole number. For example, the numbers -1, 0, 1, 2, 10, 100, and 12345 can all be represented as integers. Integer data types work well for storing and processing numbers that do not contain fractional values.
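A brief Python sketch of integer behavior follows. Note one assumption specific to this language: Python integers can grow arbitrarily large, whereas many other languages and databases use fixed-width integers (for example 32 or 64 bits). The values are made up.

```python
# Parse a whole number from text, then do exact arithmetic on it.
count = int("12345")
total = count + 100

# Integer division keeps everything whole: quotient and remainder.
boxes, leftover = divmod(total, 12)

# Python integers do not overflow at 64 bits.
big = 2 ** 64 + 1

print(total, boxes, leftover)  # 12445 1037 1
```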
A decimal represents a decimal fraction. For example, the numbers -0.1, 0.01, and 1.23 can all be represented as decimals. Decimals work well when we’re dealing with fractional values (like money) that require perfectly accurate decimal arithmetic.
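The money use case can be sketched with Python's standard `decimal` module. The price, quantity, and tax rate below are hypothetical, and rounding half-up to the cent is just one possible policy.

```python
from decimal import Decimal, ROUND_HALF_UP

# Build decimals from strings so the values are exactly what was written.
price = Decimal("19.99")
quantity = 3
subtotal = price * quantity

# Hypothetical tax rate, rounded to whole cents.
tax_rate = Decimal("0.0825")
tax = (subtotal * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total = subtotal + tax

print(subtotal, tax, total)  # 59.97 4.95 64.92

# Decimal fractions add up exactly.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```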
A float represents numbers using a binary equivalent of scientific notation. For example, the number 1.2 x 10^3 can be represented as a float. Floats work well when we’re dealing with very large or very small values but perfectly accurate measurements and arithmetic are not required.
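Here is a short Python sketch of float behavior, assuming the usual double-precision floats that Python uses. It shows scientific notation, the binary form behind it, and the small rounding errors that make floats a poor fit for exact arithmetic. The variable names are illustrative.

```python
import math

# Scientific notation: 1.2 x 10^3.
value = 1.2e3

# Internally the number is stored as mantissa * 2**exponent.
mantissa, exponent = math.frexp(value)
rebuilt = mantissa * 2 ** exponent

# Floats cover a huge range of magnitudes.
very_large = 6.022e23
very_small = 1.6e-19

# Many decimal fractions have no exact binary form, so tiny errors appear.
exactly_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)

print(value, rebuilt == value)      # 1200.0 True
print(exactly_equal, close_enough)  # False True
```

Comparing floats with a tolerance, as `math.isclose` does, is a common way to work with this limitation.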
A date represents time as a calendar day. For example, “July 21, 1969” can be represented as a date. A date data type works well when we just need to specify a year, a month, and a day but nothing more.
A time data type represents a time of day. For example, “2:56:17 AM” can be represented as a time. A time data type is used when we just need to represent an hour, a minute, a second, and milliseconds but not a date.
A date-time data type represents both a date and a time of day. For example, we can represent “July 21, 1969 at 2:56:17 AM” using a date-time data type. This data type is used when we need to pinpoint a moment in time or work with durations that span more than one day.
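The three temporal types can be sketched with Python's standard `datetime` module, using the example values from the text. The 22-hour offset is an arbitrary assumption, chosen only to show a calculation that crosses midnight.

```python
from datetime import date, datetime, time, timedelta

# A date: year, month, and day only.
day = date(1969, 7, 21)

# A time: hour, minute, and second, with no date attached.
clock = time(2, 56, 17)

# A date-time: both combined into a single value.
moment = datetime.combine(day, clock)

# Arithmetic that crosses midnight needs the date part as well.
later = moment + timedelta(hours=22)

print(day.isoformat())     # 1969-07-21
print(clock.isoformat())   # 02:56:17
print(moment.isoformat())  # 1969-07-21T02:56:17
print(later.isoformat())   # 1969-07-22T00:56:17
```

A time value alone could not express the last result, because it has no way to record that the day changed.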
To learn more about scalar data types used in data science, please see my latest course Intro to Data for Data Science.