Google Trends gives us an estimate of search volume. Let's explore if search popularity relates to other kinds of data. Perhaps there are patterns in Google's search volume and the price of Bitcoin or a hot stock like Tesla. Perhaps search volume for the term "Unemployment Benefits" can tell us something about the actual unemployment rate?
Data Sources:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
Download and add the .csv files to the same folder as your notebook.
df_tesla = pd.read_csv('TESLA Search Trend vs Price.csv')
df_btc_search = pd.read_csv('Bitcoin Search Trend.csv')
df_btc_price = pd.read_csv('Daily Bitcoin Price.csv')
df_unemployment = pd.read_csv('UE Benefits Search vs UE Rate 2004-19.csv')
Challenge:
.describe() function to see some useful descriptive statisticsdf_tesla.head()
| MONTH | TSLA_WEB_SEARCH | TSLA_USD_CLOSE | |
|---|---|---|---|
| 0 | 2010-06-01 | 3 | 4.766 |
| 1 | 2010-07-01 | 3 | 3.988 |
| 2 | 2010-08-01 | 2 | 3.896 |
| 3 | 2010-09-01 | 2 | 4.082 |
| 4 | 2010-10-01 | 2 | 4.368 |
df_tesla.tail()
| MONTH | TSLA_WEB_SEARCH | TSLA_USD_CLOSE | |
|---|---|---|---|
| 119 | 2020-05-01 | 16 | 167.000000 |
| 120 | 2020-06-01 | 17 | 215.962006 |
| 121 | 2020-07-01 | 24 | 286.152008 |
| 122 | 2020-08-01 | 23 | 498.320007 |
| 123 | 2020-09-01 | 31 | 407.339996 |
df_tesla.shape
(124, 3)
print(f'Largest value for Tesla in Web Search: {df_tesla["TSLA_WEB_SEARCH"].max()}')
print(f'Smallest value for Tesla in Web Search: {df_tesla["TSLA_WEB_SEARCH"].min()}')
Largest value for Tesla in Web Search: 31 Smallest value for Tesla in Web Search: 2
df_tesla.describe()
| TSLA_WEB_SEARCH | TSLA_USD_CLOSE | |
|---|---|---|
| count | 124.000000 | 124.000000 |
| mean | 8.725806 | 50.962145 |
| std | 5.870332 | 65.908389 |
| min | 2.000000 | 3.896000 |
| 25% | 3.750000 | 7.352500 |
| 50% | 8.000000 | 44.653000 |
| 75% | 12.000000 | 58.991999 |
| max | 31.000000 | 498.320007 |
print(df_unemployment.shape)
df_unemployment.head()
(181, 3)
| MONTH | UE_BENEFITS_WEB_SEARCH | UNRATE | |
|---|---|---|---|
| 0 | 2004-01 | 34 | 5.7 |
| 1 | 2004-02 | 33 | 5.6 |
| 2 | 2004-03 | 25 | 5.8 |
| 3 | 2004-04 | 29 | 5.6 |
| 4 | 2004-05 | 23 | 5.6 |
print('Largest value for "Unemployemnt Benefits" '
f'in Web Search: {df_unemployment.UE_BENEFITS_WEB_SEARCH.max()}')
Largest value for "Unemployemnt Benefits" in Web Search: 100
print(df_btc_price.shape)
df_btc_price.head()
(2204, 3)
| DATE | CLOSE | VOLUME | |
|---|---|---|---|
| 0 | 2014-09-17 | 457.334015 | 21056800.0 |
| 1 | 2014-09-18 | 424.440002 | 34483200.0 |
| 2 | 2014-09-19 | 394.795990 | 37919700.0 |
| 3 | 2014-09-20 | 408.903992 | 36863600.0 |
| 4 | 2014-09-21 | 398.821014 | 26580100.0 |
print(df_btc_search.shape)
df_btc_search.head()
(73, 2)
| MONTH | BTC_NEWS_SEARCH | |
|---|---|---|
| 0 | 2014-09 | 5 |
| 1 | 2014-10 | 4 |
| 2 | 2014-11 | 4 |
| 3 | 2014-12 | 4 |
| 4 | 2015-01 | 5 |
print(f'largest BTC News Search: {df_btc_search["BTC_NEWS_SEARCH"].max()}')
largest BTC News Search: 100
Challenge: Are there any missing values in any of the dataframes? If so, which row/rows have missing values? How many missing values are there?
print(f'Missing values for Tesla?: {df_tesla.isna().values.any()}')
print(f'Missing values for U/E?: {df_unemployment.isna().values.any()}')
print(f'Missing values for BTC Search?: {df_btc_search.isna().values.any()}')
Missing values for Tesla?: False Missing values for U/E?: False Missing values for BTC Search?: False
print(f'Missing values for BTC price?: {df_btc_price.isna().values.any()}')
Missing values for BTC price?: True
print(f'Number of missing values: {df_btc_price.isna().values.sum()}')
df_btc_price[df_btc_price.CLOSE.isna()]
Number of missing values: 2
| DATE | CLOSE | VOLUME | |
|---|---|---|---|
| 2148 | 2020-08-04 | NaN | NaN |
Challenge: Remove any missing values that you found.
df_btc_price.dropna(inplace=True)
Challenge: Check the data type of the entries in the DataFrame MONTH or DATE columns. Convert any strings in to Datetime objects. Do this for all 4 DataFrames. Double check if your type conversion was successful.
df_btc_price.DATE = pd.to_datetime(df_btc_price.DATE, yearfirst=True)
df_btc_search.MONTH = pd.to_datetime(df_btc_search.MONTH, yearfirst=True)
df_btc_price.DATE.head()
0 2014-09-17 1 2014-09-18 2 2014-09-19 3 2014-09-20 4 2014-09-21 Name: DATE, dtype: datetime64[ns]
df_tesla.MONTH = pd.to_datetime(df_tesla.MONTH, yearfirst=True)
df_tesla.head()
| MONTH | TSLA_WEB_SEARCH | TSLA_USD_CLOSE | |
|---|---|---|---|
| 0 | 2010-06-01 | 3 | 4.766 |
| 1 | 2010-07-01 | 3 | 3.988 |
| 2 | 2010-08-01 | 2 | 3.896 |
| 3 | 2010-09-01 | 2 | 4.082 |
| 4 | 2010-10-01 | 2 | 4.368 |
df_unemployment.MONTH = pd.to_datetime(df_unemployment.MONTH)
df_unemployment.head()
| MONTH | UE_BENEFITS_WEB_SEARCH | UNRATE | |
|---|---|---|---|
| 0 | 2004-01-01 | 34 | 5.7 |
| 1 | 2004-02-01 | 33 | 5.6 |
| 2 | 2004-03-01 | 25 | 5.8 |
| 3 | 2004-04-01 | 29 | 5.6 |
| 4 | 2004-05-01 | 23 | 5.6 |
df_btc_monthly = df_btc_price.resample('M', on="DATE").mean()
print(df_btc_monthly.shape)
df_btc_monthly.head()
(73, 2)
| CLOSE | VOLUME | |
|---|---|---|
| DATE | ||
| 2014-09-30 | 407.182428 | 2.934864e+07 |
| 2014-10-31 | 364.148873 | 2.912885e+07 |
| 2014-11-30 | 366.099799 | 2.199111e+07 |
| 2014-12-31 | 341.267871 | 1.784201e+07 |
| 2015-01-31 | 248.782547 | 3.544555e+07 |
df_btc_monthly = df_btc_price.resample('M', on="DATE").last() # Month End Data
df_btc_monthly.head()
| DATE | CLOSE | VOLUME | |
|---|---|---|---|
| DATE | |||
| 2014-09-30 | 2014-09-30 | 386.944000 | 34707300.0 |
| 2014-10-31 | 2014-10-31 | 338.321014 | 12545400.0 |
| 2014-11-30 | 2014-11-30 | 378.046997 | 9194440.0 |
| 2014-12-31 | 2014-12-31 | 320.192993 | 13942900.0 |
| 2015-01-31 | 2015-01-31 | 217.464005 | 23348200.0 |
# Create locators for ticks on the time axis
years = mdates.YearLocator()
months = mdates.MonthLocator()
years_fmt = mdates.DateFormatter('%Y')
# Register date converters to avoid warning messages
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
Challenge: Plot the Tesla stock price against the Tesla search volume using a line chart and two different axes. Label one axis 'TSLA Stock Price' and the other 'Search Trend'.
ax1 = plt.gca()
ax2 = plt.twinx()
ax2.set_ylabel("Web Search")
ax1.set_xlabel("Month")
ax1.set_ylabel("TSLA Stock Price")
ax2.plot(df_tesla.MONTH, df_tesla.TSLA_WEB_SEARCH)
ax1.plot(df_tesla.MONTH, df_tesla.TSLA_USD_CLOSE)
[<matplotlib.lines.Line2D at 0x7fba08511ee0>]
Challenge: Add colours to style the chart. This will help differentiate the two lines and the axis labels. Try using one of the blue colour names for the search volume and a HEX code for a red colour for the stock price.
Hint: you can colour both the axis labels and the lines on the chart using keyword arguments (kwargs).
ax1 = plt.gca()
ax2 = plt.twinx()
ax2.set_ylabel("Web Search", color="blue")
ax1.set_xlabel("Month")
ax1.set_ylabel("TSLA Stock Price", color="#FD0707")
ax2.plot(df_tesla.MONTH, df_tesla.TSLA_WEB_SEARCH, color="blue")
ax1.plot(df_tesla.MONTH, df_tesla.TSLA_USD_CLOSE, color="#FD0707")
# plt.show()
[<matplotlib.lines.Line2D at 0x7fba08534dc0>]
Challenge: Make the chart larger and easier to read.
# Plot Edits
plt.figure(figsize=(14,8), dpi=150)
plt.title("Tesla Web Search vs Price", fontsize=18)
plt.xticks(fontsize=14, rotation=45)
plt.yticks(fontsize=14)
ax1 = plt.gca()
ax2 = plt.twinx()
# Labels
ax1.set_xlabel("Month", fontsize=14)
ax1.set_ylabel("TSLA Stock Price", fontsize=14, color="#FD0707")
ax2.set_ylabel("Web Search", fontsize=14, color="blue")
# Set the minimum and maximum values on the axes
ax1.set_ylim([0, 600])
ax1.set_xlim([df_tesla.MONTH.min(), df_tesla.MONTH.max()])
# Ticks
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(years_fmt)
ax1.xaxis.set_minor_locator(months)
ax1.plot(df_tesla.MONTH, df_tesla.TSLA_USD_CLOSE, color="#FD0707")
ax2.plot(df_tesla.MONTH, df_tesla.TSLA_WEB_SEARCH, color="blue")
plt.show()
How to add tick formatting for dates on the x-axis.
# Ticks
# ax1.xaxis.set_major_locator(years)
# ax1.xaxis.set_major_formatter(years_fmt)
# ax1.xaxis.set_minor_locator(months)
Challenge: Create the same chart for the Bitcoin Prices vs. Search volumes.
plt.figure(figsize=(14,8), dpi=120)
plt.title("Bitcoin News Search vs Resampled Price", fontsize=18)
plt.xticks(fontsize=14, rotation=45)
plt.yticks(fontsize=14)
ax1 = plt.gca()
ax2 = plt.twinx()
ax1.set_xlabel("Month")
ax1.set_ylabel("BTC Price", color="orange", fontsize=14)
ax2.set_ylabel("Search Trend", color="skyblue", fontsize=14)
ax1.set_xlim([df_btc_monthly.index.min(), df_btc_monthly.index.max()])
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(years_fmt)
ax1.xaxis.set_minor_locator(months)
ax1.plot(df_btc_monthly.index, df_btc_monthly.CLOSE, color="orange", linestyle="--")
ax2.plot(df_btc_monthly.index, df_btc_search.BTC_NEWS_SEARCH, color="skyblue", linewidth=3, marker="o")
[<matplotlib.lines.Line2D at 0x7fba3a12baf0>]
Challenge Plot the search for "unemployment benefits" against the unemployment rate.
plt.figure(figsize=(14,8), dpi=120)
plt.title("Monthly Search of Unemployment Benefits in the US v/s the U/E Rate", fontsize=18)
plt.xticks(fontsize=14, rotation=45)
plt.yticks(fontsize=14)
ax1 = plt.gca()
ax2 = plt.twinx()
ax1.set_xlabel("Month", fontsize=14)
ax1.set_ylabel("FRED U/E Rate", color="purple", fontsize=14)
ax2.set_ylabel("Search Trend", color="skyblue", fontsize=14)
ax1.set_xlim([df_unemployment.MONTH.min(), df_unemployment.MONTH.max()])
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(years_fmt)
ax1.xaxis.set_minor_locator(months)
ax1.grid(color="grey", linestyle="--")
ax1.plot(df_unemployment.MONTH, df_unemployment.UE_BENEFITS_WEB_SEARCH, color="purple", linestyle="--", linewidth=2)
ax2.plot(df_unemployment.MONTH, df_unemployment.UNRATE, color="skyblue", linewidth=2)
[<matplotlib.lines.Line2D at 0x7fba29ef6880>]
Challenge: Calculate the 3-month or 6-month rolling average for the web searches. Plot the 6-month rolling average search data against the actual unemployment. What do you see in the chart? Which line moves first?
plt.figure(figsize=(14,8), dpi=120)
plt.title('Rolling Monthly US "Unemployment Benefits" Web Searches vs UNRATE', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(fontsize=14, rotation=45)
ax1 = plt.gca()
ax2 = ax1.twinx()
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(years_fmt)
ax1.xaxis.set_minor_locator(months)
ax1.set_ylabel('FRED U/E Rate', color='purple', fontsize=16)
ax2.set_ylabel('Search Trend', color='skyblue', fontsize=16)
ax1.set_ylim(bottom=3, top=10.5)
ax1.set_xlim([df_unemployment.MONTH[0], df_unemployment.MONTH.max()])
ax1.grid(color="grey", linestyle="--")
# Calculate the rolling average over a 6 month window
roll_df = df_unemployment[['UE_BENEFITS_WEB_SEARCH', 'UNRATE']].rolling(window=6).mean()
ax1.plot(df_unemployment.MONTH, roll_df.UNRATE, 'purple', linewidth=3, linestyle='-.')
ax2.plot(df_unemployment.MONTH, roll_df.UE_BENEFITS_WEB_SEARCH, 'skyblue', linewidth=3)
plt.show()
Challenge: Read the data in the 'UE Benefits Search vs UE Rate 2004-20.csv' into a DataFrame. Convert the MONTH column to Pandas Datetime objects and then plot the chart. What do you see?
df_unemployment_latest = pd.read_csv("UE Benefits Search vs UE Rate 2004-20.csv")
df_unemployment_latest.head()
| MONTH | UE_BENEFITS_WEB_SEARCH | UNRATE | |
|---|---|---|---|
| 0 | 2004-01 | 9 | 5.7 |
| 1 | 2004-02 | 8 | 5.6 |
| 2 | 2004-03 | 7 | 5.8 |
| 3 | 2004-04 | 8 | 5.6 |
| 4 | 2004-05 | 6 | 5.6 |
df_unemployment_latest.MONTH = pd.to_datetime(df_unemployment_latest.MONTH)
df_unemployment_latest.MONTH
0 2004-01-01
1 2004-02-01
2 2004-03-01
3 2004-04-01
4 2004-05-01
...
195 2020-04-01
196 2020-05-01
197 2020-06-01
198 2020-07-01
199 2020-08-01
Name: MONTH, Length: 200, dtype: datetime64[ns]