The Data Life Cycle

November 15, 2019 Author: Matthew Renze

How do we get from raw data to action in data science?

To understand the data-driven decision-making process, we need to look at the various stages in the data life cycle. These six stages are collection, storage, processing, analysis, action, and repeat.

Collection

Data are created when we observe and record a phenomenon that exists in the natural world. For example, we might observe and record the outdoor temperature each hour. We can collect data about our world from a variety of sources, including sensor readings, business transactions, human interactions, and experiments.

Storage

Data are first recorded temporarily in a type of memory called volatile storage (e.g. computer memory). However, we quickly move our data into a more permanent type of storage called persistent storage (e.g. a hard drive). There are many ways we can store data in a computer system including file-based formats, web-based formats, transactional databases, analytical databases, and Big Data platforms.
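
As a minimal sketch of the move from volatile to persistent storage, the Python example below holds a few hourly temperature readings in memory and then writes them to a CSV file, one of the file-based formats mentioned above. The readings, the file name, and the helper names save_readings and load_readings are assumptions made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical hourly temperature readings held in memory (volatile storage).
readings = [
    {"hour": 0, "temp_c": 11.5},
    {"hour": 1, "temp_c": 11.1},
    {"hour": 2, "temp_c": 10.8},
]


def save_readings(rows, path):
    """Write in-memory rows to a CSV file (persistent storage)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["hour", "temp_c"])
        writer.writeheader()
        writer.writerows(rows)


def load_readings(path):
    """Read rows back from disk, converting text fields to numbers."""
    with open(path, newline="") as f:
        return [
            {"hour": int(row["hour"]), "temp_c": float(row["temp_c"])}
            for row in csv.DictReader(f)
        ]


# Illustrative file location in the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), "readings.csv")
save_readings(readings, path)
restored = load_readings(path)
```

Once the file is written, the readings survive after the program exits, and load_readings can bring them back into memory for the next stage.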

Processing

Before we can analyze our data, we need to process them to prepare them for analysis. This stage often involves steps like transforming, cleaning, and querying the data. Depending on the situation, we can perform these data-processing steps manually, with a scripting language, or with an automated data-processing pipeline (often called an ETL pipeline, short for extract, transform, load).
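
To make the scripting option concrete, here is a small Python sketch of the three steps named above: cleaning, transforming, and querying. The sample rows, the field names, and the function names clean, transform, and query are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical raw rows, as they might arrive from a file: everything is text,
# and some readings are missing or unparseable.
raw_rows = [
    {"hour": "0", "temp_f": "52.7"},
    {"hour": "1", "temp_f": ""},      # missing reading
    {"hour": "2", "temp_f": "51.8"},
    {"hour": "3", "temp_f": "n/a"},   # unparseable reading
    {"hour": "4", "temp_f": "50.0"},
]


def clean(rows):
    """Cleaning: drop rows whose fields cannot be parsed as numbers."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append(
                {"hour": int(row["hour"]), "temp_f": float(row["temp_f"])}
            )
        except ValueError:
            continue
    return cleaned


def transform(rows):
    """Transforming: convert Fahrenheit to Celsius, rounded to one decimal."""
    return [
        {"hour": r["hour"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
        for r in rows
    ]


def query(rows, max_temp_c):
    """Querying: select rows at or below a temperature threshold."""
    return [r for r in rows if r["temp_c"] <= max_temp_c]


processed = transform(clean(raw_rows))
cool_hours = query(processed, max_temp_c=11.2)
```

Chaining small functions like this is roughly what an automated pipeline does at a larger scale, with each step taking the output of the previous one.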

Analysis

Once we’ve processed our data, we want to analyze them to create new information that we can act upon. We analyze data in order to support decisions, explain observations, and discover new information. This stage typically involves various tools and techniques including reports, dashboards, business-intelligence tools, data mining, machine learning, and artificial intelligence.
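
As a very simple illustration of creating new information from data, the Python sketch below summarizes a handful of temperature readings using the standard library. The values are made up, and a real analysis would usually go well beyond summary statistics.

```python
import statistics

# Hypothetical processed temperature readings in degrees Celsius.
temps_c = [11.5, 11.1, 10.8, 10.4, 10.9, 12.3]

# A small summary that a report or dashboard might display.
summary = {
    "count": len(temps_c),
    "mean": round(statistics.mean(temps_c), 2),
    "min": min(temps_c),
    "max": max(temps_c),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```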

Action

After we’ve analyzed our data, we need to take action. This stage begins with making a decision, then involves taking appropriate action, and ends with an outcome that is either positive, negative, or neutral. Action in this stage can take many forms, including communicating our findings to others to encourage them to take action or automating a decision-making process using artificial intelligence.

Repeat

The final stage in the data life cycle is to repeat the process. Data science is an incremental and highly iterative process based on feedback. So, we want to use feedback from the outcome of our actions to drive the next iteration of the process. This allows us to continuously improve the process and our data-science practices over time.
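
One way to picture this feedback loop is the toy Python sketch below. It assumes a made-up frost-warning scenario: each iteration makes a decision, observes the outcome, and uses that outcome to adjust the rule for the next iteration. The scenario, the numbers, and the names decide and update_threshold are all hypothetical.

```python
def decide(forecast_c, threshold_c):
    """Decision: issue a frost warning when the forecast low is at or below the threshold."""
    return forecast_c <= threshold_c


def update_threshold(threshold_c, warned, frost_occurred, step=0.5):
    """Feedback: adjust the threshold based on the outcome of the last decision."""
    if frost_occurred and not warned:
        return threshold_c + step  # missed frost: warn more readily next time
    if warned and not frost_occurred:
        return threshold_c - step  # false alarm: warn less readily next time
    return threshold_c             # correct call: keep the rule as it is


# Hypothetical history of (forecast low in Celsius, whether frost occurred).
history = [(1.5, True), (1.2, True), (1.4, False), (3.0, False)]

threshold = 1.0
decisions = []
for forecast, frost in history:
    warned = decide(forecast, threshold)                     # decide and act
    decisions.append(warned)
    threshold = update_threshold(threshold, warned, frost)   # learn from outcome
```

The rule after the loop is not necessarily better than the one before it. The point is only that each outcome feeds into the next iteration.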

However, it’s very important to note that the success of this data-driven approach depends on all of the steps that came before it. This is why it’s important to learn the rest of the details of the data life cycle, so that you can always choose the best possible action given the data.

To learn more, please watch my free online course Intro to Data for Data Science.
