March 15, 2019 Author: Matthew Renze

Data, information, and knowledge are the key building blocks of data science. However, most people don’t really understand what they are or how they are related to one another. So, to help you take your first steps on your data science journey, I’d like to explain to you what they are and how they are related.

## Data

When we think of data, we might imagine a bunch of ones and zeros sitting inside a computer, the stats from our favorite sports team, or the medical records at our local hospital. But what exactly is data? Or technically, the more grammatically correct question is, “what are data?”

Data are a collection of symbols that describe observations of the world around us. They record facts about the natural world that we live in. These include descriptions of the qualities of things in our world, for example, colors, shapes, and textures. In addition, they include measurements of quantities of things in our world, for example, size, weight, and velocity.

Data are represented using symbols. This includes representing qualities of things using words, for example, the color of an apple is red. In addition, this includes representing quantities of things using numbers, for example, the apple has a mass of 100 grams.
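To make this concrete, here is a minimal sketch of how the apple observation might be recorded in Python. The class name `AppleObservation` and its field names are hypothetical, chosen only for illustration; the point is simply that a quality is stored as a word and a quantity as a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppleObservation:
    """One recorded observation of an apple (hypothetical structure)."""
    color: str         # a quality, represented with a word
    mass_grams: float  # a quantity, represented with a number


# The two facts from the text: the apple is red and has a mass of 100 grams.
apple = AppleObservation(color="red", mass_grams=100)
print(apple)
```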

Imagine we’re feeling sick and we go to see our doctor. Our doctor takes a measurement of our body temperature using a thermometer. The thermometer reads 39°C (which is about 102°F, for those of us who’ve yet to switch to the metric system). Based on this temperature, it’s clear that we’re running a fever.

The temperature value of 39°C is what we refer to as a “datum” (i.e. a single piece of data). The word “data” is actually the plural form of the word “datum”. So when we have more than one “datum”, we have “data”. However, most people now use the term “data” in both the singular and plural form, interchangeably. The term “datum” is now rarely, if ever, used.

Essentially, data is information encoded as symbols. It’s how we record information about the world around us. However, not only is data created from information, data can be used to create new information as well.

## Information

Information is everywhere. We have information on the menus at our restaurants, in the books in our libraries, and on street signs while we’re driving. But what exactly is information in the context of data science?

Information is something that reduces uncertainty about our world. It is the answer to questions like who, what, where, how many, or how much. Essentially, information provides clarity about the world we live in.

Information can be created from data. We create information by organizing, analyzing, and interpreting data. Organizing, analyzing, and interpreting data gives it context and meaning. This additional context and meaning are essentially what distinguishes data from information.

For example, imagine that our doctor has recorded a history of our normal body temperature over the past few years. They analyze the historical data and compute that our average (or normal) body temperature is 37°C. This average temperature of 37°C is what we call information.
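The step from data to information in this example can be sketched in a few lines of Python. The readings below are hypothetical values made up for illustration (they are not real patient data), and the function name `normal_temperature` is an assumption of this sketch.

```python
from statistics import mean

# Hypothetical past readings in °C (illustrative values only).
past_readings_c = [36.8, 37.1, 37.0, 36.9, 37.2]


def normal_temperature(readings):
    """Summarize many data points as one average, rounded to 0.1 °C."""
    return round(mean(readings), 1)


# Many individual data points become one piece of information.
normal_c = normal_temperature(past_readings_c)
print(f"Average (normal) body temperature: {normal_c} °C")
```

Each reading on its own says little, but the average answers a "how much" question about what is normal for this particular patient.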

Information is more meaningful than the individual data points that were used to create it. This makes information quite useful on its own. However, information can also be used to create something more powerful: knowledge.

## Knowledge

Sir Francis Bacon famously said that “knowledge is power”, but what makes knowledge so powerful? And what exactly is knowledge in the first place?

Knowledge is a theoretical or practical understanding of the natural world around us. It explains the observations that we see and why things behave the way they do. In addition, it allows us to predict the behavior of phenomena in our world. Both of these abilities are quite beneficial to our survival and our ability to thrive in the world.

Knowledge is created from information. It is essentially a collection of information that has been organized to provide a consistent and cohesive understanding of a specific topic. From a more pragmatic standpoint, knowledge is used to solve problems within a specific domain. Knowledge allows us to make decisions so that we can take action that leads us to a goal of some kind.

For example, imagine that our doctor knows that when a person’s body temperature rises to 39°C, they are likely fighting an infection. This relationship between an increase in body temperature and the presence of an infection is what we refer to as knowledge. In the case of our example, our doctor would use this knowledge to decide that we are most likely fighting an infection. As a result, they would likely recommend that we get plenty of rest, drink lots of fluids, and potentially get further testing if necessary.

Knowledge alone isn’t enough to solve problems. In order to solve real-world problems, we need to use knowledge in combination with new information. It’s this combination of existing knowledge and new information that leads to a solution to a problem. For example, our doctor used a combination of their existing medical knowledge and new information about our current body temperature to decide that we were likely fighting an infection and recommend a course of treatment.
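As a rough sketch, this combination of existing knowledge and new information could be modeled as a rule applied to a new measurement. The threshold constant, the function name `assess`, and the "at or above" comparison are simplifying assumptions made for this illustration; real medical reasoning is far richer than a single rule.

```python
# Knowledge: the relationship from the example, encoded as a simple rule.
# Treating 39 °C as an "at or above" threshold is an assumption of this sketch.
FEVER_THRESHOLD_C = 39.0


def assess(temperature_c):
    """Combine existing knowledge (the rule) with new information (a reading)."""
    if temperature_c >= FEVER_THRESHOLD_C:
        return "likely infection: rest, fluids, further testing if necessary"
    return "no fever-based sign of infection"


# New information: today's thermometer reading from the example.
current_reading_c = 39.0
decision = assess(current_reading_c)
print(decision)
```

The rule by itself decides nothing, and neither does the reading by itself. The decision only appears once the two are combined.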

Understanding the relationship between data, information, and knowledge is a critical first step to understanding data science. Now that you’ve taken that first step, you’re ready to begin your data science journey. To learn more, please see my latest course Intro to Data for Data Science.