Chapter 8 Conversion Methods

import pandas as pd
import numpy as np
url = 'https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip'

df = pd.read_csv(url, usecols=['city08', 'fuelType1'])
city_mpg1 = df.city08
fuel_type1 = df.fuelType1

Exercise 1

Convert a numeric column to a smaller data type

city_mpg1.dtype
dtype('int64')
city_mpg2 = city_mpg1.astype('int16')
city_mpg2.dtype
dtype('int16')
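
Before downcasting, it helps to confirm that every value fits in the target type, since `astype` does not guard against overflow here. The following is a minimal sketch on made-up values (the `mpg` series is a hypothetical stand-in, not the real `city08` column). It checks the range with `np.iinfo` and, as an alternative, lets `pd.to_numeric` choose the smallest signed integer type.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the city08 column (values are made up).
mpg = pd.Series([19, 9, 23, 10, 17, 150], dtype='int64')

# Ranges that int8 and int16 can represent.
int8_info = np.iinfo('int8')
int16_info = np.iinfo('int16')

# True only if every value lies inside the target type's range.
fits_int8 = mpg.between(int8_info.min, int8_info.max).all()
fits_int16 = mpg.between(int16_info.min, int16_info.max).all()

# Alternative: let pandas pick the smallest signed integer type that fits.
auto = pd.to_numeric(mpg, downcast='integer')

print(fits_int8, fits_int16, auto.dtype)
```

Because 150 exceeds the int8 maximum of 127, int16 is the smallest safe choice for this sample.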

Exercise 2

Calculate the memory savings from converting to a smaller numeric type.

mem_delta_cty_mpg = city_mpg2.memory_usage(deep=True) - city_mpg1.memory_usage(deep=True)

print('Converting the `city_mpg1` series to a smaller data type changed the memory required to store this information by {} bytes.'.format(mem_delta_cty_mpg))
Converting the `city_mpg1` series to a smaller data type changed the memory required to store this information by -246864 bytes.
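
The reported difference can be sanity-checked by hand. An int64 value takes 8 bytes and an int16 value takes 2, so each row saves 6 bytes, and the index cost is the same in both series and cancels out. A difference of -246864 bytes therefore implies 246864 / 6 = 41144 rows. The sketch below reproduces that arithmetic on a series of zeros; the row count is inferred from the output above rather than read from the dataset.

```python
import numpy as np
import pandas as pd

n_rows = 41_144  # assumption: inferred from 246864 / 6, not read from the file

# Placeholder data; only the dtype and length matter for memory usage.
s64 = pd.Series(np.zeros(n_rows, dtype='int64'))
s16 = s64.astype('int16')

# Measured difference, as in the exercise.
delta = s16.memory_usage(deep=True) - s64.memory_usage(deep=True)

# Expected difference: rows * (bytes per int16 - bytes per int64).
expected = n_rows * (np.dtype('int16').itemsize - np.dtype('int64').itemsize)

print(delta, expected)  # -246864 -246864
```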

Exercise 3

Convert a string column into a categorical type

fuel_type1.dtype
dtype('O')
fuel_type2 = fuel_type1.astype('category')
fuel_type2.dtype
CategoricalDtype(categories=['Diesel', 'Electricity', 'Midgrade Gasoline', 'Natural Gas',
                  'Premium Gasoline', 'Regular Gasoline'],
                 ordered=False)
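
A categorical series stores each distinct label once and replaces the values with small integer codes that point into that list. The sketch below shows this on a few made-up rows (the `fuel` series is hypothetical, not the real `fuelType1` column), using the `.cat.categories` and `.cat.codes` accessors.

```python
import pandas as pd

# Hypothetical sample of fuel types (not the real column).
fuel = pd.Series(['Regular Gasoline', 'Diesel', 'Regular Gasoline', 'Electricity'])
fuel_cat = fuel.astype('category')

# The distinct labels, stored once and sorted.
categories = list(fuel_cat.cat.categories)

# One small integer per row, indexing into the categories.
codes = fuel_cat.cat.codes

print(categories)     # ['Diesel', 'Electricity', 'Regular Gasoline']
print(list(codes))    # [2, 0, 2, 1]
print(codes.dtype)    # int8
```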

Exercise 4

Calculate the memory savings by converting to a categorical type.

mem_delta_fuel_type = fuel_type2.memory_usage(deep=True) - fuel_type1.memory_usage(deep=True)

print('Converting the `fuel_type1` series to a categorical type changed the memory required to store this information by {} bytes.'.format(mem_delta_fuel_type))
Converting the `fuel_type1` series to a categorical type changed the memory required to store this information by -2948768 bytes.
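
The saving is large here because the column holds only six distinct labels repeated across many rows. As a rough illustration, the sketch below compares a low-cardinality series with one where every value is distinct. Both series and the `category_delta` helper are hypothetical, and the exact byte counts depend on the pandas version.

```python
import pandas as pd

n = 10_000

# Hypothetical data: two repeated labels vs. every value distinct.
low_card = pd.Series(['Regular Gasoline', 'Diesel'] * (n // 2), dtype='object')
high_card = pd.Series(['vehicle-{}'.format(i) for i in range(n)], dtype='object')

def category_delta(s):
    """Bytes gained (+) or saved (-) by converting s to a categorical type."""
    return s.astype('category').memory_usage(deep=True) - s.memory_usage(deep=True)

low_delta = category_delta(low_card)
high_delta = category_delta(high_card)

print(low_delta, high_delta)
```

With few distinct labels the difference should be strongly negative. When nearly every value is distinct, the categorical type still has to store every label plus the codes, so the conversion typically saves nothing.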