Chapter 4 Categorical Data
import pandas as pd
Exercise 1
Using Jupyter, create a series with the temperature values for the last seven days. Filter out the values below the mean.
The US National Weather Service has some basic charts and data that are readily available. I’ll use the page dedicated to Salt Lake City, the capital of my state, and manually transcribe some data from it.
# creating a series with temperature values from the last seven days
slc_temps_hi = pd.Series(
    [34, 40, 46, 55, 56, 55, 47],
    name='slc_temps_hi'
)
#creating a boolean array to use as a filtration mask
mask = slc_temps_hi > slc_temps_hi.mean()
#we'll display the bool array for good measure
mask
0    False
1    False
2    False
3     True
4     True
5     True
6    False
Name: slc_temps_hi, dtype: bool
#now we'll use the mask as a filter to display only values from `slc_temps_hi` that are above the average value of the series
slc_temps_hi[mask]
3    55
4    56
5    55
Name: slc_temps_hi, dtype: int64
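As a small aside, the mask doesn't have to live in its own variable. The sketch below rebuilds the same series so it runs on its own, applies the comparison inline, and then uses `~` to invert the mask and show the values that were filtered out. The names `above_mean` and `at_or_below_mean` are mine, picked just for illustration.

```python
import pandas as pd

# same hand-transcribed temperatures as above
slc_temps_hi = pd.Series(
    [34, 40, 46, 55, 56, 55, 47],
    name='slc_temps_hi'
)

# apply the comparison inline instead of storing the mask first
above_mean = slc_temps_hi[slc_temps_hi > slc_temps_hi.mean()]

# ~ flips every True/False in the mask, so this selects the values that were dropped
at_or_below_mean = slc_temps_hi[~(slc_temps_hi > slc_temps_hi.mean())]

print(above_mean.tolist())        # [55, 56, 55]
print(at_or_below_mean.tolist())  # [34, 40, 46, 47]
```

Note that `>` also drops any value exactly equal to the mean. That can't happen with this data (the mean is about 47.57), but `>=` would be the safer choice if the goal is strictly to remove only the values below the mean.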
Exercise 2
Using Jupyter, create a series with your favorite colors. Use the categorical type.
I’ve always been partial to purple, black, red, and pink, so I’ll build a series to reflect that. There are (at least) two very simple ways to do this, and I’ll include both below.
# method 1: set the dtype when the series is created
fav_colors = pd.Series(
    ['purple', 'black', 'red', 'pink'],
    name='favorite_colors',
    dtype='category'
)
fav_colors
0    purple
1     black
2       red
3      pink
Name: favorite_colors, dtype: category
Categories (4, object): ['black', 'pink', 'purple', 'red']
# method 2: create a regular series, then convert it with .astype
fav_colors2 = pd.Series(
    ['purple', 'black', 'red', 'pink'],
    name='favorite_colors'
).astype('category')
fav_colors2
0    purple
1     black
2       red
3      pink
Name: favorite_colors, dtype: category
Categories (4, object): ['black', 'pink', 'purple', 'red']
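For a quick sanity check, the two methods can be compared directly. The sketch below recreates both series so it runs on its own, then looks at the dtype, the categories, and the integer codes pandas stores behind the scenes. It only uses the `.cat` accessor and isn't part of the exercise itself.

```python
import pandas as pd

colors = ['purple', 'black', 'red', 'pink']

# method 1: dtype set at creation
fav_colors = pd.Series(colors, name='favorite_colors', dtype='category')

# method 2: converted after creation
fav_colors2 = pd.Series(colors, name='favorite_colors').astype('category')

# both approaches end up with the same categorical dtype
print(fav_colors.dtype == fav_colors2.dtype)  # True

# categories are inferred from the data and stored in sorted order
print(list(fav_colors.cat.categories))        # ['black', 'pink', 'purple', 'red']

# each value is stored as an integer position into the categories
print(fav_colors.cat.codes.tolist())          # [2, 0, 3, 1]
```

This also explains why the `Categories` line in the output above is alphabetical even though the series itself keeps the order I typed the colors in.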