Chapter 8 Conversion Methods
import pandas as pd
import numpy as np
url = 'https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip'
df = pd.read_csv(url, usecols=['city08', 'fuelType1'])
city_mpg1 = df.city08
fuel_type1 = df.fuelType1
Exercise 1
Convert a numeric column to a smaller data type
city_mpg1.dtype
dtype('int64')
city_mpg2 = city_mpg1.astype('int16')
city_mpg2.dtype
dtype('int16')
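Casting to a smaller integer type is only safe when every value fits in the target range. The following is a minimal sketch of that check, using a small made-up series (`sample_mpg`, not the real `city08` data) and `pd.to_numeric` with `downcast='integer'` as one alternative way to pick the type.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the city08 column (values invented for illustration).
sample_mpg = pd.Series([19, 9, 23, 10, 17, 150], dtype='int64')

# int16 holds -32768..32767; confirm the data fits before casting.
limits = np.iinfo('int16')
fits_int16 = limits.min <= sample_mpg.min() and sample_mpg.max() <= limits.max

# Alternatively, let pandas pick the smallest signed integer type that fits.
downcast_mpg = pd.to_numeric(sample_mpg, downcast='integer')

print(fits_int16, downcast_mpg.dtype)
```

Because the largest sample value (150) exceeds the int8 maximum of 127, the downcast settles on int16 here.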
Exercise 2
Calculate the memory savings from converting to a smaller numeric type.
mem_delta_cty_mpg = city_mpg2.memory_usage(deep=True) - city_mpg1.memory_usage(deep=True)
print('Converting the `city_mpg1` series to int16 changed the memory required to store this information by {} bytes.'.format(mem_delta_cty_mpg))
Converting the `city_mpg1` series to int16 changed the memory required to store this information by -246864 bytes.
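An int64 value takes 8 bytes and an int16 value takes 2, so each row should shrink by 6 bytes while the index stays the same size. The reported difference of 246864 bytes is consistent with that (246864 / 6 = 41144 rows), assuming only the values changed. The sketch below checks the per-row figure on a synthetic series; `n_rows` and the values are arbitrary assumptions, not the vehicles data.

```python
import numpy as np
import pandas as pd

n_rows = 1_000  # arbitrary size chosen for illustration

# Synthetic values that comfortably fit in int16.
wide = pd.Series(np.arange(n_rows) % 50, dtype='int64')
narrow = wide.astype('int16')

# The index is identical in both series, so only the values differ in size.
saved = wide.memory_usage(deep=True) - narrow.memory_usage(deep=True)
per_row = saved / n_rows

# Savings relative to the original values buffer, as a percentage.
pct_saved = 100 * (wide.nbytes - narrow.nbytes) / wide.nbytes

print(saved, per_row, pct_saved)
```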
Exercise 3
Convert a string column into a categorical type
fuel_type1.dtype
dtype('O')
fuel_type2 = fuel_type1.astype('category')
fuel_type2.dtype
CategoricalDtype(categories=['Diesel', 'Electricity', 'Midgrade Gasoline', 'Natural Gas',
'Premium Gasoline', 'Regular Gasoline'],
, ordered=False)
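A categorical series stores each distinct label once and represents every row as a small integer code pointing into that list. A minimal sketch on an invented four-row series (`sample_fuel` is hypothetical, not the real `fuelType1` column) shows both pieces through the `.cat` accessor:

```python
import pandas as pd

# Hypothetical sample of fuel types (invented for illustration).
sample_fuel = pd.Series(
    ['Regular Gasoline', 'Diesel', 'Regular Gasoline', 'Electricity']
)
sample_cat = sample_fuel.astype('category')

# Distinct labels are stored once, in sorted order.
categories = list(sample_cat.cat.categories)

# Each row holds only an integer code that indexes into the categories.
codes = sample_cat.cat.codes

print(categories)
print(codes.tolist(), codes.dtype)
```

With so few distinct labels, pandas stores the codes as int8, one byte per row.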
Exercise 4
Calculate the memory savings by converting to a categorical type.
mem_delta_fuel_type = fuel_type2.memory_usage(deep=True) - fuel_type1.memory_usage(deep=True)
print('Converting the `fuel_type1` series to a categorical changed the memory required to store this information by {} bytes.'.format(mem_delta_fuel_type))
Converting the `fuel_type1` series to a categorical changed the memory required to store this information by -2948768 bytes.
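The savings are much larger here than for the integer column because an object series holds a pointer per row plus the Python string objects themselves, and `deep=True` is what counts those strings. The sketch below compares shallow, deep, and categorical memory use on a synthetic object series; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Synthetic object-dtype column with three repeated labels.
sample = pd.Series(
    ['Regular Gasoline', 'Premium Gasoline', 'Diesel'] * 1_000,
    dtype='object',
)

# deep=False counts only the 8-byte pointers; deep=True adds the string objects.
shallow = sample.memory_usage(deep=False)
deep = sample.memory_usage(deep=True)

# The categorical keeps one small integer code per row plus the three labels.
as_cat = sample.astype('category').memory_usage(deep=True)

print(shallow, deep, as_cat)
```

Without `deep=True`, the object series would look far smaller than it is and the comparison would understate the savings.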