3 Effective Methods of Dealing With Missing Values in Python

df.dropna(), df.replace(), df.fillna()

The absence of a data value in an observation is called a null (or missing) value. Missing values are common in data analysis, and they can significantly affect the analysis process and skew the conclusions drawn from a dataset.

Missing values can arise for various reasons, such as incomplete data entry, equipment malfunctions and lost files.

There are different types of missing data, and it is necessary to consider why values are missing, because the reason informs how to handle them. This article considers two methodologies: omission and imputation.

Omission involves removing the rows (samples) that contain missing data, ideally when dropping them has little impact on the analysis. Imputation involves filling in the missing data with a substitute value, such as the mean or the median of the variable.
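
As a rough illustration of the two ideas, the sketch below uses a small hypothetical Series called readings (it is not part of this article's dataset). It drops the missing entry for omission and fills it with the median for imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing entry (illustration only)
readings = pd.Series([10.0, np.nan, 30.0, 100.0])

# Omission: discard the observation that has no value
omitted = readings.dropna()

# Imputation: keep the observation and substitute the median of the observed values
imputed = readings.fillna(readings.median())

print(len(omitted))
print(imputed.tolist())
```

Omission shrinks the data, while imputation keeps every row at the cost of introducing an estimated value.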

Approach 1 df.dropna()

Dropping missing values:

Use the dropna() function to remove rows containing missing values from a DataFrame.

#import libraries

import pandas as pd
import numpy as np

#create a DataFrame

# np.nan must be unquoted; a quoted " np.nan " is an ordinary string, not a missing value
data = {
    "Fruits": ["Apple", "Banana", "Orange", "Strawberry", "Pineapple", "Watermelon"],
    "Number_Of_Fruits": [66, np.nan, 99, np.nan, 23, np.nan],
}

df = pd.DataFrame(data)

#display the first five rows of the DataFrame

df.head(5)

#Display the sum of null values using isna()
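
A minimal sketch of this step, assuming the fruit DataFrame from above with real np.nan entries (it is rebuilt here so the snippet runs on its own). The name null_counts is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fruits": ["Apple", "Banana", "Orange", "Strawberry", "Pineapple", "Watermelon"],
    "Number_Of_Fruits": [66, np.nan, 99, np.nan, 23, np.nan],
})

# isna() marks each missing cell as True; sum() then counts the True values per column
null_counts = df.isna().sum()
print(null_counts)
```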

#drop all the null values using dropna()
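
One way this step might look, again rebuilding the example DataFrame so the snippet is self-contained. The name df_dropped is an assumption for illustration; dropna() returns a new DataFrame and leaves df unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fruits": ["Apple", "Banana", "Orange", "Strawberry", "Pineapple", "Watermelon"],
    "Number_Of_Fruits": [66, np.nan, 99, np.nan, 23, np.nan],
})

# Remove every row that contains at least one missing value
df_dropped = df.dropna()
print(df_dropped)
```

Here half of the rows are discarded, which is worth keeping in mind before choosing omission on a small dataset.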

Approach 2 df.replace()

Replacing missing values:

Use df.replace() to replace null values with the mean value of the column.

# Find the mean of the "Number_Of_Fruits" variable
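
A possible version of this step, with the example DataFrame rebuilt so the snippet stands alone. The variable name mean_value is an assumption; mean() skips missing values by default, so only 66, 99 and 23 contribute.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fruits": ["Apple", "Banana", "Orange", "Strawberry", "Pineapple", "Watermelon"],
    "Number_Of_Fruits": [66, np.nan, 99, np.nan, 23, np.nan],
})

# Mean of the observed values only; NaN entries are ignored by default
mean_value = df["Number_Of_Fruits"].mean()
print(mean_value)
```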

# Replace the null values with the mean value and display the values.
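
A minimal sketch of the replacement, assuming the same example data. It passes np.nan as the value to replace and the column mean as its substitute; mean_value is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fruits": ["Apple", "Banana", "Orange", "Strawberry", "Pineapple", "Watermelon"],
    "Number_Of_Fruits": [66, np.nan, 99, np.nan, 23, np.nan],
})

mean_value = df["Number_Of_Fruits"].mean()

# Swap every NaN in the column for the mean of the observed values
df["Number_Of_Fruits"] = df["Number_Of_Fruits"].replace(np.nan, mean_value)
print(df)
```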

Approach 3 df.fillna()

Filling missing values:

Use fillna() to fill null values with the mean value of the column.

#import two libraries

import pandas as pd

import numpy as np

#create a DataFrame

# np.nan must be unquoted; a quoted " np.nan " is an ordinary string, not a missing value
data = {
    "Fruits": ["Apple", "Banana", "Orange", "Strawberry", "Pineapple", "Watermelon"],
    "Number_Of_Fruits": [66, np.nan, 99, np.nan, 23, np.nan],
}

df = pd.DataFrame(data)

#Find the mean value and fill the null values with fillna() and display the result.
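
This step could be sketched as follows, with the DataFrame rebuilt so the snippet runs independently. The name mean_value is an assumption for illustration; fillna() returns a new Series, so the result is assigned back to the column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fruits": ["Apple", "Banana", "Orange", "Strawberry", "Pineapple", "Watermelon"],
    "Number_Of_Fruits": [66, np.nan, 99, np.nan, 23, np.nan],
})

mean_value = df["Number_Of_Fruits"].mean()

# Fill each missing entry with the column mean and keep all six rows
df["Number_Of_Fruits"] = df["Number_Of_Fruits"].fillna(mean_value)
print(df)
```

For this simple case fillna() and replace() give the same result; fillna() is the more direct choice when the only goal is to fill missing entries.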

Conclusion

To summarise, missing values refer to the absence of data values for a variable in an observation. They are a common occurrence when analysing data, and when not dealt with, they can significantly impact the outcome of an analysis. There are various reasons why data may be missing from an observation, and it is essential to determine the importance of each missing value to the analysis process in order to take the right action.