Applying filters to a Pandas DataFrame is central to most data manipulation and transformation tasks.
The ability to filter a dataset and extract specific information from it is fundamental to data analysis. Pandas, the go-to data manipulation library in Python, provides powerful tools for filtering data within a DataFrame. In this guide, we’ll explore several methods for applying filters to a Pandas DataFrame: single conditions, multiple conditions, isin(), and negation.
Pandas DataFrame - Understanding the Basics
A Pandas DataFrame is a two-dimensional tabular data structure, akin to a spreadsheet. Filtering, in this context, refers to the process of selecting a subset of rows or columns based on certain conditions. This allows you to focus on the data that is relevant to your analysis.
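To make the idea concrete, here is a minimal sketch of what a filter looks like in practice: a comparison on a column yields a boolean Series (often called a mask), and indexing the DataFrame with that mask keeps only the rows marked True. The small `scores` table and its column names are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
scores = pd.DataFrame({'Name': ['Ann', 'Ben', 'Cara'],
                       'Score': [70, 88, 93]})

# A comparison on a column returns a boolean Series (the mask)
mask = scores['Score'] > 80

# Indexing with the mask keeps the rows where the mask is True
passed = scores[mask]

# .loc can filter rows and select a column in one step
passed_names = scores.loc[mask, 'Name']
```

Every filter in the rest of this guide follows the same pattern: build a boolean mask, then index with it.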
Applying Filters on Pandas DataFrame - Filtering Rows Based on Conditions
Let’s go through a few use cases that will give you a better understanding of filtering Pandas DataFrame rows based on conditions.
Use Case 1 - Pandas DataFrame Filtering Rows Based on Conditions
In this example, we have a DataFrame of student names with their ages and grades.
import pandas as pd
data = {'Name': ['Bob', 'Harry', 'Ellie', 'David'],
        'Age': [25, 30, 22, 28],
        'Grade': [85, 92, 78, 95]}
df = pd.DataFrame(data)
Now, if we want the rows where the grade is more than 90, the line of code below performs that operation.
high_scorers = df[df['Grade'] > 90]
Use Case 2 - Pandas DataFrame Filtering Rows Based on Multiple Conditions
To get the rows where the grade is more than 90 and the age is more than 25, combine the two conditions with &, wrapping each condition in parentheses.
filtered_students = df[(df['Grade'] > 90) & (df['Age'] > 25)]
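The same pattern extends to other combinations. The sketch below rebuilds the sample DataFrame so that it runs on its own, then shows an OR filter with | next to the AND filter. Each condition needs its own parentheses because & and | bind more tightly than comparison operators, and Python's `and`/`or` keywords cannot be applied to a Series. The variable names `both` and `either` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Harry', 'Ellie', 'David'],
                   'Age': [25, 30, 22, 28],
                   'Grade': [85, 92, 78, 95]})

# AND: both conditions must hold for a row to be kept
both = df[(df['Grade'] > 90) & (df['Age'] > 25)]

# OR: at least one condition must hold
either = df[(df['Grade'] > 90) | (df['Age'] < 25)]

print(both['Name'].tolist())    # ['Harry', 'David']
print(either['Name'].tolist())  # ['Harry', 'Ellie', 'David']
```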
Use Case 3 - Pandas DataFrame Filtering Rows Using the isin Method
If you want to select rows that match any of several values, the following approach can be used:
selected_students = df[df['Name'].isin(['Bob', 'David'])]
For a longer list of values, assign the list to a variable first and then pass that variable to isin().
names = ['Bob', 'David']
selected_students = df[df['Name'].isin(names)]
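As a quick check of how isin() behaves, the sketch below prints the boolean mask it produces on the sample data. The extra name 'Zoe' is a made-up value added here to show that entries in the list with no match in the column are simply ignored.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Harry', 'Ellie', 'David'],
                   'Age': [25, 30, 22, 28],
                   'Grade': [85, 92, 78, 95]})

# 'Zoe' does not appear in the DataFrame; isin finds no match for it
names = ['Bob', 'David', 'Zoe']

# isin returns True for each row whose Name is in the list
mask = df['Name'].isin(names)
selected_students = df[mask]

print(mask.tolist())                       # [True, False, False, True]
print(selected_students['Name'].tolist())  # ['Bob', 'David']
```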
Use Case 4 - Pandas DataFrame Filtering Rows Using Negation (~)
If you want to keep everything except a few values, the negation operator (~) comes in handy. It inverts a boolean condition, so it works with almost any filter when you need the opposite result. Below is one use case:
selected_students = df[~df['Name'].isin(['Bob', 'David'])]
As before, a longer list of values can be assigned to a variable first and then passed to isin().
names = ['Bob', 'David']
selected_students = df[~df['Name'].isin(names)]
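Negation is not limited to isin(). The sketch below, which rebuilds the sample DataFrame so that it runs on its own, applies ~ to a comparison as well. The comparison is wrapped in parentheses because ~ binds more tightly than >, so without them the ~ would be applied to the column values rather than to the condition. The variable names `others` and `not_high` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Harry', 'Ellie', 'David'],
                   'Age': [25, 30, 22, 28],
                   'Grade': [85, 92, 78, 95]})

# Everyone except Bob and David
others = df[~df['Name'].isin(['Bob', 'David'])]

# Negating a comparison: grades that are NOT above 90
not_high = df[~(df['Grade'] > 90)]

print(others['Name'].tolist())    # ['Harry', 'Ellie']
print(not_high['Name'].tolist())  # ['Bob', 'Ellie']
```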