September 15, 2019 Author: Matthew Renze

In data science, we commonly encounter several composite data types. By composite, we mean that the data type contains multiple values. This is unlike scalar data types, which contain only a single value.

Here are the most common composite data types you will likely encounter in data science.

Homogeneous Data Types

A vector, also known as a one-dimensional array, is a one-dimensional sequence of homogeneous data. Vectors are used to store a list of elements that are all of the same data type.
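
As a rough illustration, here is a minimal sketch of a vector using the array module from Python's standard library. The variable names and values are made up for this example; the point is only that every element must share one data type.

```python
from array import array

# A vector of floating-point numbers. The typecode "d" means that
# every element is stored as a double-precision float.
temperatures = array("d", [20.5, 21.0, 19.8])

# Elements are retrieved by position, starting at index 0.
first = temperatures[0]

# Appending a value of a different type is rejected, which is what
# "homogeneous" means in practice.
try:
    temperatures.append("warm")
    rejected = False
except TypeError:
    rejected = True
```

A plain Python list would accept the string, so it is a sequence but not a strictly homogeneous vector.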

A matrix is a two-dimensional grid of homogeneous data arranged in rows and columns. Matrices are typically used to store and process groups of related numbers using a set of mathematical operations known as matrix algebra.
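
To make matrix algebra a little more concrete, the sketch below multiplies two small matrices stored as lists of rows. The matmul helper and the sample values are assumptions for illustration only; in practice you would likely rely on a numerical library rather than hand-written loops.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Every row of a must have as many columns as b has rows.
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    # Entry (i, j) is the dot product of row i of a and column j of b.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
identity = [[1, 0],
            [0, 1]]

product = matmul(a, b)            # [[19, 22], [43, 50]]
unchanged = matmul(a, identity)   # multiplying by the identity returns a
```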

A tensor is a three-dimensional cube (or, more generally, an n-dimensional hypercube) of homogeneous data. In this sense, vectors and matrices are simply one- and two-dimensional tensors. Tensors are the main data structure used to build deep neural networks in machine learning, which is where the deep learning framework TensorFlow gets its name.
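
One way to see how these three types relate is to count their dimensions. The sketch below uses nested Python lists and a hypothetical shape helper written for this example; the tiny "image" is made-up sample data, and a real project would typically use a tensor library instead.

```python
def shape(nested):
    """Return the size of each dimension of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        # Step one level deeper; stop if this level is empty.
        nested = nested[0] if nested else None
    return tuple(dims)


vector = [1, 2, 3]                       # one dimension
matrix = [[1, 2, 3],
          [4, 5, 6]]                     # two dimensions
# A tiny 2 x 2 image with 3 color values per pixel: three dimensions.
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [255, 255, 255]]]

vector_shape = shape(vector)   # (3,)
matrix_shape = shape(matrix)   # (2, 3)
image_shape = shape(image)     # (2, 2, 3)
```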

Tabular Data Types

A dictionary, also known as a look-up table, is a two-column table that stores a set of key-value pairs. It is used to quickly retrieve a value by its unique key.
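
Python has a built-in dictionary type, so a short sketch is enough to show the idea. The airport codes below are sample data chosen for this example.

```python
# Keys are unique identifiers; values are the data stored under them.
airports = {
    "ORD": "Chicago O'Hare",
    "LAX": "Los Angeles",
    "DSM": "Des Moines",
}

# Look up a value directly by its key.
name = airports["LAX"]

# .get() returns a fallback instead of raising an error
# when the key is not present.
missing = airports.get("XXX", "unknown")

# Assigning to an existing key replaces its value rather than adding
# a second entry, so keys stay unique.
airports["ORD"] = "Chicago O'Hare International"
entry_count = len(airports)
```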

A table stores data as a set of rows and columns. Tables are the most common way you will encounter structured data in data science.
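
A small table can be sketched with nothing more than lists and dictionaries. The column names and rows below are invented for illustration; data science projects would more commonly load a table with a dedicated data-frame library.

```python
# Column names describe what each position in a row means.
columns = ("name", "age", "city")

# Each row is one record.
rows = [
    ("Ana", 34, "Lisbon"),
    ("Ben", 28, "Oslo"),
    ("Chen", 41, "Lisbon"),
]

# Pair every row with the column names so cells can be read by name.
table = [dict(zip(columns, row)) for row in rows]

# Select one column.
ages = [record["age"] for record in table]

# Filter rows by a condition on a column.
lisbon_names = [r["name"] for r in table if r["city"] == "Lisbon"]
```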

Semi-structured Data Types

A tree organizes data as a set of nodes and branches. Trees are used to represent hierarchical data (i.e. data that are organized into parent-child relationships).
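
The parent-child structure can be sketched with a small node class. The Node class, the helper functions, and the animal categories are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # child Node objects


def depth(node):
    """Number of levels from this node down to its deepest descendant."""
    return 1 + max((depth(child) for child in node.children), default=0)


def preorder(node):
    """List node names, visiting each parent before its children."""
    names = [node.name]
    for child in node.children:
        names.extend(preorder(child))
    return names


root = Node("animals", [
    Node("mammals", [Node("dogs"), Node("cats")]),
    Node("birds"),
])

levels = depth(root)     # 3
order = preorder(root)   # animals, mammals, dogs, cats, birds
```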

A graph organizes data as a set of nodes and edges. Graphs are used to represent a network of data. They represent each item as a node and each relationship as an edge.
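
A common way to store a graph is an adjacency list, which maps each node to the set of nodes it shares an edge with. The sketch below builds one from a made-up list of friendships and uses a breadth-first search to count the fewest edges between two people; the hops function is a hypothetical helper written for this example.

```python
from collections import deque

# Each pair is one undirected edge (a relationship between two nodes).
edges = [("Ann", "Bob"), ("Bob", "Cara"), ("Cara", "Dev"), ("Eve", "Fay")]

# Build the adjacency list: node -> set of neighboring nodes.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def hops(graph, start, goal):
    """Fewest edges between two nodes, or None if they are not connected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


ann_to_dev = hops(graph, "Ann", "Dev")   # 3
ann_to_eve = hops(graph, "Ann", "Eve")   # None: no path connects them
```

Unlike a tree, a graph has no single root and may contain cycles or disconnected groups, which is why the search tracks the nodes it has already seen.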

Multimedia Data Types

There are also a variety of multimedia data types that we encounter in data science. These include text documents, images, audio, video, and shape data.

To learn more about composite data types used in data science, please see my latest course Intro to Data for Data Science.
