Lists are a common part of life. Every morning, I jot down a quick set of tasks I want to accomplish, whether they're related to work, personal interests, or a list of destinations I'm visiting for a road trip. Lists provide clarity and help keep track of important points.

Just as lists are a key piece of our lives, they're also a crucial component in programming and web development. Lists keep track of elements on a webpage, data points in a chart, user-provided information, and many other use cases. Python supports lists and provides different methods to extract value from them.

pandas is a Python library that simplifies the process of manipulating and updating relational or labeled data. Just as regular Python supports lists, pandas provides a special data structure for working with lists, arrays, and dictionaries: the Series.


In this post, we'll cover everything you need to know to start using Series in pandas, including how to create a Series and the Series methods you will likely use the most.

You can think of a Series as a column in a table. It holds one or more rows of data values all grouped under a common name — in this case, the variable name you store the Series under when first creating it. Below is an example Series printed to the terminal.

Screenshot of Series in pandas showing series name first_series and printout of series with a column of integers

In first_series, you have a list of integers that all increment by 25. You can also see the Series labels on the left-hand side. By default, pandas Series are zero-indexed, so the first row of a Series will have an integer label of 0.

We'll discuss how to update the default index as part of creating a Series in the next section.

How to Create a Series in Pandas

In most scenarios, the data you use in pandas will come from an external source (i.e. outside your Python environment). For simplicity, we will focus on data that originates from within the file.

To create a pandas Series, use the syntax below:

 

import pandas as pd

your_series = pd.Series("Hello")

Since pandas is an external library, you first need to import it into your Python file with the import pandas statement. The as pd clause then aliases pandas to "pd".

Aliasing is another word for abbreviating. It allows you to shorten the length of your statements, which decreases the file size and makes the code easier to read. It is not a requirement, but it is considered a best practice to alias library names when importing them, so you will see the pandas library called by the "pd" alias in most files.

Now that you have access to the pandas library, you can create your Series. The code starts by declaring a new variable your_series and setting it equal to the result of calling pd.Series(), the constructor for the Series class. Whatever data you want to build the Series with is placed between the parentheses and is referred to as the argument. In this case, your argument is a single string, "Hello".

Once you run the file, you have your first series. You can confirm everything worked by printing the result to the terminal with the print function:

 

print(your_series)

The result of running print() is below.

Screenshot of your_series printed to the terminal with one row containing string "Hello"

You now have a Series containing one row at index position 0 with the string "Hello" as your data value.
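As a quick check that does not depend on reading the printout, the sketch below inspects the same one-row Series directly. The helper variable names (row_count, labels, first_value) are only for illustration.

```python
import pandas as pd

your_series = pd.Series("Hello")

# A single scalar argument produces a Series with exactly one row.
row_count = len(your_series)

# The default index starts at 0, so the only label is 0.
labels = list(your_series.index)

# .iloc[0] returns the value stored in the first (and only) row.
first_value = your_series.iloc[0]

print(row_count, labels, first_value)
```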

Obviously, building a Series for one value isn't the primary use case. Why not just store the string in a variable, which would remove the need to import pandas in the first place? Now that you understand the base syntax, we'll look at more practical use cases for Series in the next section.

Convert a list to a Series in pandas.

Let's return to the first_series example. This Series was built from a Python list, which is declared with the square brackets ([ ]).

 

dataset = [25, 50, 75, 100]

first_series = pd.Series(dataset)

After declaring the list and assigning it to the variable dataset, you pass dataset as the argument to the .Series() method of the pandas library. The output is below.
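To see what the conversion produced without a screenshot, here is a minimal sketch that pulls the values and index labels back out of the Series. It also illustrates that the Series holds its own copy of the data; the variable names values and labels are illustrative.

```python
import pandas as pd

dataset = [25, 50, 75, 100]
first_series = pd.Series(dataset)

# .tolist() converts the Series values back into a plain Python list.
values = first_series.tolist()

# The default index is a zero-based range with one label per row.
labels = list(first_series.index)

# Updating the Series afterwards leaves the original list untouched.
first_series.iloc[0] = 0

print(values, labels, dataset)
```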

List converted to Series in pandas printed to the terminal

The next two methods for creating Series follow this base workflow with a few additional considerations.

Convert an array to a Series in pandas.

NumPy is another open-source library in Python built to support scientific computing. One of its main offerings is the NumPy array, which improves on the Python list by optimizing storage and speed.

These differences sit under the hood. On the surface, NumPy arrays and Python lists look nearly identical in structure, as you can see in the screenshot below.

Screenshot showing printouts of Python list and NumPy array holding same values

This also means that the workflow for creating a Series from an array is essentially the same as a list, except you need to import the NumPy library as well as the pandas library:

 

import pandas as pd

import numpy as np

num_arr = np.array([25, 50, 75, 100])

num_arr_series = pd.Series(num_arr)

The only additional step from the list workflow is that, on top of importing NumPy, you also call the NumPy array() function to generate an array from a Python list. Once that operation is completed, you pass the array (num_arr) as the argument to pd.Series().

The output is similar to what you'd expect from the first_series printout:

Screenshot of Series printed to terminal from NumPy array with values incremented by 25

The only notable difference between the num_arr_series and first_series printouts is the data type (dtype). When creating a Series of the same data values from a list, the data type was int64 (64-bit integers). In this example, the data type is int32 (32-bit integers).

The difference does not come from the size of the values. pandas defaults to int64 when it builds a Series from a Python list of integers, while a Series built from a NumPy array keeps the array's data type. NumPy's default integer type depends on the platform and the NumPy version: it has historically been 32-bit on Windows and 64-bit on Linux and macOS, so you may see int64 here instead. A 32-bit integer takes half the memory of a 64-bit integer but tops out at roughly 2.1 billion, while a 64-bit integer can hold values up to roughly 9.2 quintillion.
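If you want the same data type no matter where the code runs, one option is to set it explicitly. The sketch below is a minimal illustration using the dtype parameter of np.array() and pd.Series(); the variable names small_series and wide_series are made up for this example.

```python
import numpy as np
import pandas as pd

# Request a specific integer width when building the array...
small_arr = np.array([25, 50, 75, 100], dtype=np.int32)
small_series = pd.Series(small_arr)

# ...or when building the Series from a plain list.
wide_series = pd.Series([25, 50, 75, 100], dtype="int64")

print(small_series.dtype)  # int32
print(wide_series.dtype)   # int64

# np.iinfo reports the largest value each integer type can hold.
print(np.iinfo(np.int32).max)  # 2147483647
print(np.iinfo(np.int64).max)  # 9223372036854775807
```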

Now that you understand the process for converting lists and arrays into Series, let's review a final Python object that you will often capture in a Series: the dictionary.

Convert a dictionary to a Series in pandas.

Dictionaries in Python are collections of data formatted in key-value pairs. Each key is unique and corresponds to one data value. You can view an example in the Series workflow below.

 

first_dict = {

    "name1": "Stephen",

    "name2": "Isabelle",

    "name3": "Habib",

    "name4": "Athena"

}

dict_series = pd.Series(first_dict)

Similar to when you create a list, you declare a dictionary with the curly brackets ({ }). Here you have assigned the variable first_dict to reference the object. Inside first_dict, you have key-value pairs. For example, name1 is the key for the "Stephen" value. The dictionary is then passed to the .Series() method as the argument.

When you print the new Series to the terminal, you may notice something distinct from our other examples:

Series in pandas created from dictionary object where keys are now a custom index

Did you see it? The keys in your dictionary have replaced the default index labels. This means you can now pull values from your Series either by their string label (e.g. "name1" with .loc) or by their integer position (e.g. 0 with .iloc). Converting dictionaries into Series is a simple way to define a custom index and preserve the key values. We'll review the process for extracting values by their index in the next section.
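A short sketch of both lookups on the dictionary-based Series follows. It uses the .loc and .iloc indexers that are covered in more detail later in this post; the variable names by_label and by_position are illustrative.

```python
import pandas as pd

first_dict = {
    "name1": "Stephen",
    "name2": "Isabelle",
    "name3": "Habib",
    "name4": "Athena",
}
dict_series = pd.Series(first_dict)

# The dictionary keys become the index labels, in insertion order.
labels = list(dict_series.index)

# Look up the same row by its string label and by its integer position.
by_label = dict_series.loc["name1"]
by_position = dict_series.iloc[0]

print(labels, by_label, by_position)
```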

Before continuing, it's worth noting that there is an additional way to define a custom index. To do this, add an index argument to the Series method:

 

cust_index_series = pd.Series(dataset, index = ["pos1", "pos2", "pos3", "pos4"])

Here you provide a custom list (["pos1", "pos2", "pos3", "pos4"]) to index dataset by passing it as the argument for the index parameter. You can now see your custom index deployed in the screenshot below.

Series in pandas with custom index of string labels

The default zero-indexed labels (0, 1, 2, 3) have now been overridden by the list you defined when creating the Series. You can also create a custom index with integer or even boolean labels, but strings are the most common label since they open up a whole separate search method. Searching by string labels will be reviewed in the next section.
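One detail worth knowing when you pass your own index: it needs one label per data value. The sketch below, with the illustrative variable mismatch_error, shows the custom index in place and what happens when the lengths differ.

```python
import pandas as pd

dataset = [25, 50, 75, 100]
cust_index_series = pd.Series(dataset, index=["pos1", "pos2", "pos3", "pos4"])

# The custom labels replace the default 0..3 index.
labels = list(cust_index_series.index)

# An index with too few (or too many) labels raises a ValueError.
try:
    pd.Series(dataset, index=["pos1", "pos2"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(labels)
print(mismatch_error)
```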

The three methods for creating pandas Series covered here are not the only way you can generate a Series, but they are the approaches you will most often use.


Pandas Series Methods

Now that you have your Series, there are many built-in methods for interacting with the data and producing insights. The methods examined below are by no means all the operations available; these have been selected as the methods you will likely use the most.

1. Display Methods

Best for: creating a quick snapshot of the values in your Series

With large data sets, you have different options for viewing the Series rows. The simplest approach is to print the entire Series to the terminal with the Python print function:

 

print(long_num_series)

However, this is not always practical for large data sets. For example, the long_num_series has 24 rows. Since the data inside is an ordered list of integers, you can get a good idea of the data values inside using the .head and .tail methods.

To start, .head() captures the first five rows of the Series:

 

long_num_series.head()

The output is below. Note that you still need to print the result of calling .head() to see it in the terminal.

Screenshot of pandas Series with first five rows displaying

This gives you a quick "snapshot" of the start of your Series. You can also provide an integer as an argument if you would like to see more (or fewer) than five rows:

 

long_num_series.head(10)

Here you retrieved the first 10 rows. The result is below.

Screenshot of pandas Series with first 10 rows displaying

The .tail method follows the same principles but displays the end of the Series. Like with the .head method, .tail() displays five rows by default:

 

long_num_series.tail()

The results are printed below.

Screenshot of pandas Series with last five rows displaying

You can also specify a different number of rows than the default by providing an integer as an argument. The .head and .tail methods are useful because they are more efficient than printing the entire Series to the terminal each time you want to confirm a change.
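The display calls can be tried end to end with the sketch below. The contents of long_num_series are an assumption here (24 integers increasing by 25), since the post only shows the Series in screenshots.

```python
import pandas as pd

# Assumed stand-in for long_num_series: 24 integers increasing by 25.
long_num_series = pd.Series(range(25, 625, 25))

first_five = long_num_series.head()    # default: first 5 rows
first_ten = long_num_series.head(10)   # first 10 rows
last_five = long_num_series.tail()     # default: last 5 rows
last_three = long_num_series.tail(3)   # last 3 rows

# Each result is itself a Series, so it still needs print() to be displayed.
print(first_five)
print(last_three)
```

Because .head() and .tail() return new Series, you can also store the result in a variable and keep working with it.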

Next, let's inspect the methods for indexing the Series.

2. Index Methods

Best for: retrieving subsets of your Series to analyze or store in new Series

Indexing is the process through which you retrieve rows or sets of rows from your Series based on their positions or labels. Series have two main indexers for retrieving data values: .iloc and .loc. Both are used with square brackets rather than parentheses.

.iloc (short for integer location) takes one or more integers as its argument. Because Series are zero-indexed, these integers refer to the position of each data value. For example, integer 0 corresponds to the first value, integer 1 to the second value, and so on.

In this example, you use .iloc[] to call the value at index 2 (third position):

 

num_series.iloc[2]

The output is below:

Third row of Series printed to the terminal

With .iloc, you can also use a segmenting method known as slicing. Slicing specifies a range of rows to return. For example, this code calls for the first three rows:

 

num_series.iloc[0:3]

The result is below.

First three rows from Series printed to the terminal

Even though the slice includes the index value 3, the row at position 3 is not returned. That is because the range is exclusive, which means that the value after the colon (:) specifies the index to end the slice at. In other words, the range includes everything up to — but not including — the fourth row.
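Here is a compact sketch of the positional lookups described above, assuming num_series holds the same four integers used earlier. The negative position is an extra illustration of standard .iloc behavior.

```python
import pandas as pd

num_series = pd.Series([25, 50, 75, 100])

# A single position returns one value.
third_value = num_series.iloc[2]

# A slice returns a Series; the end position (3) is excluded.
first_three = num_series.iloc[0:3]

# Negative positions count from the end of the Series.
last_value = num_series.iloc[-1]

print(third_value, first_three.tolist(), last_value)
```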

.loc (short for location) functions in a similar way, but it looks rows up by their index labels instead of their positions. It is most useful with a custom index of string labels, although it also works with the default integer labels. Below, the code calls for the row at the index "pos2".

 

num_series.loc["pos2"]

.loc[] retrieves the row matching this string in the output:

Second row of Series matching index "pos2" printed to terminal

You can retrieve more than one row at a time by passing a list of named indexes to .loc[]:

 

num_series.loc[["pos4", "pos2"]]

The result is below.

Two Series rows matching indexes "pos4" and "pos2" printed to the terminal

Note that even though the "pos2" row comes before "pos4" in the actual Series, the output follows the order you specify in the list.

You can also use slices in .loc calls and lists of integer positions in .iloc calls, but these use cases are not as common. Note that, unlike .iloc slices, .loc slices include the end label.
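The less common combinations can be sketched as follows, assuming num_series uses the "pos1" to "pos4" labels from the custom index example. The variable names are illustrative.

```python
import pandas as pd

num_series = pd.Series([25, 50, 75, 100], index=["pos1", "pos2", "pos3", "pos4"])

# A label slice with .loc includes both endpoints.
label_slice = num_series.loc["pos1":"pos3"]

# A position slice with .iloc excludes the end position.
position_slice = num_series.iloc[0:2]

# A list of positions with .iloc returns rows in the order requested.
picked = num_series.iloc[[3, 1]]

print(label_slice.tolist(), position_slice.tolist(), picked.tolist())
```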

Next, we'll examine two methods to sort a Series.

3. Sort Methods

Best for: improving the uniformity of your Series and discovering trends within your data set

pandas Series have two methods for sorting rows: .sort_index and .sort_values. As you may guess from the name, .sort_index focuses on your Series' index.

In this scenario, you created a Series with an unordered index and now want to see what your Series looks like if the index is sorted in ascending order.

 

num_series = pd.Series(dataset, index = ["pos4", "pos2", "pos3", "pos1"])

num_series.sort_index(inplace = True)

Here you are calling .sort_index() with the parameter inplace. The default behavior for this method is to return a new copy of the Series. By setting inplace to True, you are reversing this behavior so that the method modifies the original Series. The output is below.

Series with a now sorted index in alphabetical and numerical order

This example sorts string labels, which .sort_index orders alphabetically; an integer index would be sorted numerically. The .sort_values method shares this capability.
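To make the difference between the default behavior and inplace concrete, the sketch below sorts without inplace and compares the result with the original. The variable names sorted_copy, original_labels, and sorted_labels are illustrative.

```python
import pandas as pd

dataset = [25, 50, 75, 100]
num_series = pd.Series(dataset, index=["pos4", "pos2", "pos3", "pos1"])

# Without inplace, sort_index returns a new, sorted Series...
sorted_copy = num_series.sort_index()

# ...and the original keeps its unordered index.
original_labels = list(num_series.index)
sorted_labels = list(sorted_copy.index)

# Each value travels with its label when the rows are reordered.
sorted_values = sorted_copy.tolist()

print(original_labels, sorted_labels, sorted_values)
```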

In this scenario, you have a Series of names you want to sort in descending order. Remember that the sort methods provide ascending order by default, so you'll need to reverse the default behavior in .sort_values by setting the ascending parameter to False:

 

name_series = pd.Series(first_dict)

name_series.sort_values(ascending = False, inplace = True)

Once again, .sort_values() will provide numerical and/or alphabetical order in the output. This time, the result displays the data value that starts with the letter that’s last in the alphabet and works its way to the start:

Series with sorted rows in reverse alphabetical order

Of the two methods, you'll probably find .sort_values more useful, but you have the option to sort by either the index or the data values in your Series.
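The value sort can be sketched the same way. This version skips inplace and stores each result in a new variable (descending and ascending are illustrative names), which makes the two orders easy to compare.

```python
import pandas as pd

first_dict = {
    "name1": "Stephen",
    "name2": "Isabelle",
    "name3": "Habib",
    "name4": "Athena",
}
name_series = pd.Series(first_dict)

# Reverse alphabetical order by value.
descending = name_series.sort_values(ascending=False)

# Default ascending order by value; index labels move with their rows.
ascending = name_series.sort_values()

print(descending.tolist())
print(list(ascending.index))
```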

Sorting is one approach to discovering trends in your Series. The next method allows you to dive deeper into the analytics of numerical data sets.

4. Aggregation Methods

Best for: building an analytical picture of your numerical data set for deeper insights

You can aggregate numerical data in your Series with the .sum, .product, .mean, .median, .max, and .min methods. These calculations provide a higher-level picture of your data and help you measure its quality by checking for outliers, skewed data, and so on. These methods and their corresponding concepts are broken down below.

  • .sum: Returns the result of adding all values in a Series together.
  • .product: Returns the result of multiplying all values in a Series.
  • .mean: Calculates the average value by adding all values and dividing by the number of values.
  • .median: Returns the midpoint in a numerical data set.
  • .max: Finds the largest number in a Series.
  • .min: Finds the smallest number in a Series.

You can call any of these methods on their own:

 

num_series = pd.Series([25, 50, 75, 100])

num_series.median()

The result of calling .median() is below.

The calculated median of a series printed to the terminal

Instead of performing each calculation individually, you can perform multiple calculations with the .agg (short for aggregate) method:

 

num_series.agg(['sum','product','mean','median','max','min'])

The output even formats each of these results into a Series with a corresponding index value so it's easy to read:

Aggregation method results organized into a Series with custom index

Note that these methods are intended for numerical data sets. Even one string value mixed into a numerical Series will typically cause the aggregation operations to fail with a TypeError.
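The following sketch runs a few of the aggregations together and reads individual results back out by name. The second Series (with_gap) is an added illustration of how pandas skips null values by default when aggregating.

```python
import pandas as pd

num_series = pd.Series([25, 50, 75, 100])

# .agg returns a Series whose index labels are the method names.
summary = num_series.agg(["sum", "mean", "median", "max", "min"])

total = summary["sum"]
average = summary["mean"]

# Null values are skipped by default, so the gap does not affect the sum.
with_gap = pd.Series([25, None, 75])
gap_total = with_gap.sum()

print(summary)
print(gap_total)
```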

5. Null Value Methods

Best for: removing null values to improve the data integrity of your Series

Your Series may not always have a complete data set or correct values. In this case, you can use the null value methods to either remove rows with invalid values or fill them with new values.

In this example, you have a numerical dataset that has two null values. NaN stands for "not a number."

Series with two null values printed to the terminal

If you want to improve the data integrity of your Series, you can use the .dropna() method to remove — or drop — any null values, including NaNs:

 

num_series.dropna(inplace = True)

The result is below.

Series with two null values removed

Note that the index has not been renumbered even though the Series is now only four rows long: the remaining rows keep their original labels. This ensures any string labels remain with their corresponding rows if you are using a custom index.

If you would rather replace null values with placeholder data (e.g. a base number), you can use the .fillna() method to overwrite — or fill — null values:

 

num_series.fillna(0, inplace = True)

Now you have replaced any NaNs with zeros:

Series with two rows updated to 0 values

Whether it's better to drop rows with null values or fill them with placeholder values depends on your particular use case and goals.
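Both options can be compared side by side with the sketch below. The data values are assumed for illustration (six rows, two of them null), and the calls skip inplace so the original Series stays available for both operations.

```python
import numpy as np
import pandas as pd

# Assumed example data: six rows, two of them null.
num_series = pd.Series([25, np.nan, 75, 100, np.nan, 150])

# Count the null values before deciding how to handle them.
null_count = int(num_series.isna().sum())

# Option 1: drop the null rows; surviving rows keep their labels.
dropped = num_series.dropna()

# Option 2: replace each null with a placeholder value.
filled = num_series.fillna(0)

print(null_count)
print(list(dropped.index))
print(filled.tolist())
```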

Unlock new insights with Series in Pandas.

Moving your data into Series in pandas opens up whole new possibilities for data analysis and decision making. Series accept lists, NumPy arrays, and dictionaries, so you can work with one consistent structure instead of conforming to the requirements of three different objects. Series also provide a wealth of built-in methods to display, index, sort, aggregate, and clean your data sets so you can find the insights you need to solve your business's biggest challenges.


Originally published Feb 24, 2022 7:00:00 AM, updated March 21 2022
