US Unemployment Rate Analysis with Python

Oluwawemimo Folayan
Mar 25, 2021

Unemployment refers to people who are employable and actively seeking work but unable to find a job. Some definitions also include people who are working but do not have an appropriate job. It is usually measured by the unemployment rate, which is the number of unemployed people divided by the total number of people in the workforce, and it serves as one indicator of a country’s economic status. (reference)
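To make the definition concrete, here is a minimal sketch of the rate calculation. The function name `unemployment_rate` and the numbers are made up for illustration; they are not taken from the dataset.

```python
def unemployment_rate(unemployed: int, labor_force: int) -> float:
    """Return the unemployment rate as a percentage of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical county: 450 unemployed people in a labor force of 9,000.
example_rate = unemployment_rate(450, 9_000)
print(f"{example_rate:.1f}%")  # 5.0%
```

The Rate column analysed below appears to hold values of this kind, already expressed in percent.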

In the United States, job creation and unemployment are affected by factors such as economic conditions, global competition, education, automation, and demographics. These factors can affect the number of workers, the duration of unemployment, and wage levels. (reference)

Now, to the analysis.

Import necessary libraries, import data, set options.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use("ggplot")

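NumPy is for scientific calculations, Pandas is for data analysis, and Matplotlib and Seaborn are for data visualization. The %matplotlib inline line is a Jupyter/Colab magic command, so it only works inside a notebook.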

The data is available on Kaggle.

data = pd.read_csv('/content/drive/MyDrive/Week 3 task/output.csv')
data.head()

Output:

Data Exploration

Here, the data will be explored for insights.

The pandas DataFrame.describe method is a good way to start.

data.describe(include="all")

Let’s explore the results we have here.

  • There are 885548 rows and 5 columns.
  • The Year column holds the year of each unemployment record. The minimum year is 1990 and the maximum is 2016, so the data covers 27 calendar years (a span of 26 years).
  • The Month column contains 12 distinct values, as expected since there are 12 months in a year, and March is the most frequent month.
  • The State column has 47 distinct values. There are 50 states in the US, so three states are not present in the data. Texas appears most often, with a frequency of 57658.
  • A county in the US is an administrative or political subdivision of a state: a geographic region with specific boundaries and usually some level of governmental authority, much like a local government area. The US has about 3,144 counties, while the County column has 1752 distinct values. Because county names repeat across states (many states have a Washington County, for example), the number of distinct names understates the number of counties in the data, so this figure alone does not show how many counties are covered.
  • The Rate column holds the unemployment rate, in percent, for each county and month in the 47 states. The minimum is 0.00, which means at least one county recorded a zero unemployment rate in some month, and the maximum is 58.4, which means at least one county recorded an unemployment rate above 50%.
  • Lastly, the count row shows the same number of values in every column, which suggests there are no missing values, but let’s cross-check.
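Two of the points above can be checked directly. This is a minimal sketch on a small made-up frame: the column names match the dataset, but the rows are hypothetical. It counts missing values per column and compares the number of distinct county names with the number of distinct (State, County) pairs.

```python
import pandas as pd

# Hypothetical miniature of the dataset, using the same column names.
sample = pd.DataFrame({
    "Year": [1990, 1990, 1991, 1991],
    "Month": ["January", "January", "March", "March"],
    "State": ["Texas", "Ohio", "Texas", "Ohio"],
    "County": ["Washington County", "Washington County",
               "Starr County", "Washington County"],
    "Rate": [4.2, 5.1, 6.3, None],
})

# Missing values per column (a non-zero entry means that column has gaps).
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# County names repeat across states, so count (State, County) pairs instead.
distinct_names = sample["County"].nunique()
distinct_counties = sample[["State", "County"]].drop_duplicates().shape[0]
print(distinct_names, distinct_counties)
```

On the real data, the same two checks would be data.isnull().sum() and data[['State', 'County']].drop_duplicates().shape[0].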

Let’s check for the missing states.

state_ = data.State.unique()
state_

Output:

From the data, the missing states are Alaska, Florida, and Georgia. These are notable omissions: Florida is the 3rd most populous state in the US, Alaska has the largest area, and Georgia is the 8th most populous state.
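Rather than reading the missing states off the printed array by eye, a set difference against the full list of 50 states does the job. This is a sketch under two assumptions: the State column spells out full state names, and the `present` variable here is a stand-in built to mirror the finding above rather than read from the file.

```python
ALL_STATES = {
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
}


def missing_states(present):
    """Return the states in ALL_STATES that do not occur in `present`, sorted."""
    return sorted(ALL_STATES - set(present))


# Stand-in for data.State.unique(); constructed for illustration only.
present = ALL_STATES - {"Alaska", "Florida", "Georgia"}
print(missing_states(present))  # ['Alaska', 'Florida', 'Georgia']
```

With the real data loaded, the call would be missing_states(data.State.unique()).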

# Countplot on the states
plt.figure(figsize=(12, 6))
# Pass the column by keyword; recent seaborn versions reject positional data
g = sns.countplot(x=data['State'])
g.set_xticklabels(g.get_xticklabels(), rotation=90, ha="right")
plt.show()

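Texas appears far more often than any other state, and Delaware appears least often. This likely reflects the number of counties in each state: Texas has the most counties of any state (254) and Delaware the fewest (3).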

Let’s check for States and Rate.

# Average unemployment rate per state
state_rate = data[['State', 'Rate']]
state_rate_ = state_rate.groupby(['State'], as_index=False).mean()
state_rate_ = state_rate_.sort_values(['Rate'], ascending=False)
state_rate_

Output:

# Visualize the average unemployment rate
fig, ax = plt.subplots(figsize=(14, 6))
sns.barplot(x='State', y='Rate', data=state_rate_, ax=ax)
plt.title('The Average Rate Per State')
plt.xticks(rotation='vertical')
plt.show()

Some of the states with the highest average unemployment rates here also rank among the 15 poorest states in the US. (source)

# Rows where the unemployment rate is above 50%
max_data = pd.DataFrame(data, columns=['Year', 'Month', 'State', 'County', 'Rate'])
max_rate = max_data[max_data['Rate'] > 50]
max_rate

  • Rates above 50% were recorded in Texas in 1990–1991 and in Colorado in 1992.
  • San Juan County in Colorado experienced the highest unemployment rate (58.4) in January 1992.

# Rows where the unemployment rate is effectively zero
min_data = pd.DataFrame(data, columns=['Year', 'Month', 'State', 'County', 'Rate'])
min_rate = min_data[min_data['Rate'] < 0.1]
min_rate

Output:

The only state with zero unemployment rates is Texas, between 1990 and 1993. Interestingly, within Texas in 1990–1991, Starr County experienced a high unemployment rate while Loving County and McMullen County experienced a zero unemployment rate: same state, different counties. Note that the US was just recovering from the early 1990s recession, which lasted eight months, from July 1990 to March 1991.
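The "same state, different counties" contrast can be measured as the spread of rates within each state. This is a minimal sketch on made-up rows: the county labels and values are hypothetical, and only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset.
sample = pd.DataFrame({
    "State": ["Texas", "Texas", "Texas", "Colorado", "Colorado"],
    "County": ["County A", "County B", "County C", "County D", "County E"],
    "Rate": [0.0, 6.5, 52.0, 3.1, 7.4],
})

# Lowest and highest recorded rate per state, plus the gap between them.
spread = sample.groupby("State")["Rate"].agg(["min", "max"])
spread["range"] = spread["max"] - spread["min"]
print(spread.sort_values("range", ascending=False))
```

A large range for a state means its counties had very different experiences, which a state-level average hides.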

Rate Column Analysis

plt.figure(figsize=(12, 5))
# histplot with kde=True replaces the deprecated sns.distplot
sns.histplot(data['Rate'], kde=True)
plt.show()

Output:

The Rate column is positively skewed, meaning the tail on the right side of the distribution is longer or fatter than the tail on the left. Skewness is the degree of distortion from the symmetrical bell curve of the normal distribution; it measures the lack of symmetry in a distribution, and a symmetrical distribution has a skewness of 0. In a positively skewed distribution, the mean and median are typically greater than the mode.

print('MODE:', data.Rate.mode())
print('-' * 10)
print('median:', data.Rate.median())
print('-' * 10)
print('mean:', data.Rate.mean())

Output:

This is consistent with the Rate column being positively skewed: the mean and the median are both greater than the mode.
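Comparing mean, median, and mode is a rule of thumb; pandas can also compute the skewness coefficient directly with Series.skew(). This is a small sketch on a made-up right-skewed sample, not the real Rate column.

```python
import pandas as pd

# Hypothetical right-skewed sample: mostly small values, one large outlier.
rates = pd.Series([2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 25.0])

skewness = rates.skew()        # greater than 0 for a right-skewed sample
mode = rates.mode().iloc[0]    # most frequent value
median = rates.median()
mean = rates.mean()

print(f"skewness: {skewness:.2f}")
print(f"mode: {mode}, median: {median}, mean: {mean:.2f}")
```

On the real data, the equivalent call is data.Rate.skew(), and a positive result would back up the reading of the plot.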

Data Exploration by year

# Average unemployment rate per year
year = data[['Year', 'Rate']]
year_ = year.groupby(['Year'], as_index=False).mean()
year_ = year_.sort_values(['Year'], ascending=False)

# Check the trend
fig, ax = plt.subplots(figsize=(12, 5))
# year_ has one value per year, so no confidence band is needed;
# marker='o' draws a point at each year
sns.lineplot(x='Year', y='Rate', data=year_, marker='o', ax=ax)
ax.set_xticks(year_['Year'])
plt.xticks(rotation='vertical')
plt.show()

The spike that starts in 2008 and lasts until about 2013 was a result of the Great Recession in the US. This article provides more insight into the recession period.
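One way to locate a spike numerically, rather than by eye, is to take the year-over-year change in the average rate. The values below are hypothetical and only illustrate the technique; they are not the dataset's yearly averages.

```python
import pandas as pd

# Hypothetical yearly averages, for illustration only.
yearly = pd.DataFrame({
    "Year": [2006, 2007, 2008, 2009, 2010],
    "Rate": [4.5, 4.5, 5.5, 9.0, 9.5],
})

# diff() needs the rows in chronological order.
yearly = yearly.sort_values("Year")
yearly["Change"] = yearly["Rate"].diff()

# Year with the largest increase over the previous year.
largest_jump_year = int(yearly.loc[yearly["Change"].idxmax(), "Year"])
print(yearly)
print("Largest year-over-year increase:", largest_jump_year)
```

Applied to the year_ frame above, the same steps would show which years drove the rise.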

Note: this analysis is purely for learning. I hope you had a great time; give a clap if you did 🤗.
