July 15, 2019
Author: Matthew Renze

How do we represent data in a computer?

All data inside modern computers are stored as a series of ones and zeros; we call this binary data. The ones and zeros are called binary digits (or “bits” for short).

In modern computers, data are stored in small blocks of eight bits called “bytes”. We can combine two, four, eight, or more bytes to create larger blocks of binary data.
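To make this concrete, here is a minimal Python sketch of how many bytes a whole number needs. The helper name bytes_needed and the sample values are our own choices for illustration; they are not part of any standard library.

```python
def bytes_needed(n: int) -> int:
    """Smallest number of bytes that can hold the non-negative integer n."""
    # bit_length() counts the bits required; round up to whole bytes.
    return max(1, (n.bit_length() + 7) // 8)

print(bytes_needed(255))   # 1
print(bytes_needed(256))   # 2

# The number 256 written out as two bytes (sixteen bits).
two_bytes = (256).to_bytes(2, "big")
bits = format(256, "016b")
print(len(two_bytes))      # 2
print(bits)                # 0000000100000000
```

The largest value that fits in a single byte is 255, so 256 is the first number that needs a second byte.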

However, the computer needs to understand what each of these blocks of ones and zeros represents — is it a word, a number, a date and time, or something else? This is where data types come into play.

A data type is an attribute of data that tells the computer what a group of binary data represents. Data types tell the computer how to interpret the bits of data: as a character, a number, a date, or something else. They determine what operations can be performed on the data, like addition, subtraction, and multiplication. They specify how the data are stored and how many bytes the data require. And they instruct the computer on how to display the data in a human-readable format.
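As a rough illustration, the sketch below takes the same four bytes and asks Python to interpret them as three different data types. The byte values, the variable names, and the choice of big-endian byte order are all assumptions made for this example.

```python
import struct

# Four bytes of raw binary data (values chosen for illustration).
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Interpreted as ASCII text, the bytes are four characters.
as_text = raw.decode("ascii")

# Interpreted as a 32-bit unsigned integer (big-endian byte order).
as_int = struct.unpack(">I", raw)[0]

# Interpreted as a 32-bit floating-point number (big-endian byte order).
as_float = struct.unpack(">f", raw)[0]

print(as_text)   # ABCD
print(as_int)    # 1094861636
print(as_float)  # a value a little above 12
```

The bits never change. Only the data type we apply to them changes, and with it the meaning of the data.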

For example, in the ASCII character encoding, we represent the letter “A” as a byte of binary digits using the binary sequence “01000001”. We represent the character “1” as “00110001” and the “%” symbol as “00100101”.
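We can check these bit patterns with a short Python sketch. It assumes the ASCII character encoding, and the helper name to_bits is hypothetical.

```python
def to_bits(ch: str) -> str:
    """Return the 8-bit binary string for a single ASCII character."""
    code = ch.encode("ascii")[0]   # numeric code of the character (0-127)
    return format(code, "08b")     # zero-padded to eight binary digits

for ch in ["A", "1", "%"]:
    print(ch, to_bits(ch))

# Going the other way: from bits back to a character.
decoded = chr(int("01000001", 2))
print(decoded)  # A
```

Note that the character “1” and the number 1 are stored differently: the character uses the bits “00110001”, while the integer value 1 stored in a single byte is “00000001”.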

Essentially, we can represent anything that can be typed into a computer as a sequence of ones and zeros using data types.

In data science, there are two main divisions of data types: scalar and composite data types.

First, we have scalar data types, also known as primitive data types, basic data types, or built-in data types. Scalar data types represent the most basic building blocks of data by storing letters, numbers, and symbols in a computer as binary data.

Scalar data types store a single unit of data. This can be a letter, a number, a date, a time, or something else. We refer to them as scalar data types because a scalar variable in mathematics can hold one and only one value at a time.

Scalar data types are also the most basic unit of storage for data in a computer. Everything from a small text document to a giant distributed database is composed of these single units of storage.

Scalar data types provide a set of operations that can be performed on the data they contain. All of the processing that occurs in a computer is essentially the result of these operations being executed on scalar data types.
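A small Python sketch can show how the data type decides which operations are available and what they mean. The variable names and values here are made up for illustration.

```python
# The same "+" symbol means different things for different scalar types.
number_sum = 1 + 2        # integer addition
text_sum = "1" + "2"      # string concatenation

print(number_sum)  # 3
print(text_sum)    # 12

# Mixing types without saying how to convert them is rejected.
try:
    mixed = 1 + "2"
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(error_name)  # TypeError
```

Adding two numbers produces a sum, adding two pieces of text joins them together, and adding a number to a piece of text is not a defined operation in Python.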

Next, we have composite data types, also known as aggregate data types, compound data types, or more commonly, data structures. Composite data types are composed of a set of scalar data types. They organize the scalar data types and provide them with structure so that they can be worked with as a collection of values.

A composite data type is a logical container used to organize related data. It contains a set of scalar data types organized in a specific way.

Composite data types allow us to store and access information effectively. They provide methods for accessing individual scalar values and performing operations on groups of scalar values.
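As one possible illustration, Python's built-in list and dictionary are composite data types that hold scalar values. The names and values below are hypothetical sample data.

```python
# A list groups scalar values in order.
temperatures = [20.0, 22.0, 24.0]

first = temperatures[0]                          # access a single scalar value
average = sum(temperatures) / len(temperatures)  # operate on the whole group

# A dictionary groups scalar values under descriptive names.
person = {"name": "Ada", "age": 36}
name = person["name"]

print(first)    # 20.0
print(average)  # 22.0
print(name)     # Ada
```

The list lets us work with the temperatures one at a time or all together, and the dictionary keys tell us what each scalar value means.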

In addition, composite data types provide context for related data, which (as we discussed previously) is used to create information.

You can think of a composite data type as a container that holds a collection of related data in a specific way.

We’re going to discuss scalar data types, followed by composite data types, in more depth in our next two articles.

To learn more about data types used in data science, please see my latest course Intro to Data for Data Science.