A quick and dirty look at the causes of workplace injuries. The data comes from Kaggle.com, and while it is not a very comprehensive list (the sample size is only 537), I'm using it mostly as an experiment with Jupyter Notebook and Pelican.
```python
import csv
import pprint

injury_collection = {}

with open('../data/injuries.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip the header row
    for row_count, row in enumerate(reader, start=1):
        injury_type = row[2]  # third column (index 2) holds the injury type
        # tally how many times each raw injury type appears
        injury_collection[injury_type] = injury_collection.get(injury_type, 0) + 1
        if row_count == 20:  # only sample the first 20 rows
            break

pprint.pprint(injury_collection)
print('Total injury types in first 20 lines:', len(injury_collection))
```
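For reference, the same first-20-rows tally can be written with `collections.Counter` and `itertools.islice`, which take care of the counting and the early stop without a manual `row_count`. This is a minimal sketch: the `tally_first_rows` helper and the inline sample CSV (including its column names) are hypothetical stand-ins for `../data/injuries.csv`, and it assumes, as above, that the injury type sits in the third column.

```python
import csv
import io
from collections import Counter
from itertools import islice


def tally_first_rows(csvfile, limit=20, column=2):
    """Count raw values in `column` over the first `limit` data rows (hypothetical helper)."""
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    # islice stops after `limit` rows, replacing the manual counter + break
    return Counter(row[column] for row in islice(reader, limit))


# Made-up sample rows for illustration only; not taken from the Kaggle dataset.
sample = io.StringIO(
    "id,date,event\n"
    "1,2015-01-01,Struck by object\n"
    "2,2015-01-02,Fall from ladder\n"
    "3,2015-01-03,Struck by object\n"
)

counts = tally_first_rows(sample, limit=20)
print(counts.most_common())
print('Total injury types:', len(counts))
```

In the notebook this would be called with the opened file, e.g. `tally_first_rows(csvfile)` inside the `with open(...)` block.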
I cut this off after the first 20 rows. The dataset clearly needs a lot of cleaning: pretty much every entry is listed as its own unique type. The next step is to aggregate similar categories together:
```python
import csv
import pprint

# Keyword rules for collapsing free-text injury types into broad categories.
# Rules are checked in order and the first match wins, so order matters
# (e.g. 'other person' must be tested before the generic 'intentional').
CATEGORY_RULES = [
    ('aircraft', ('aircraft', 'in-flight')),
    ('animal', ('animal',)),
    ('movement', ('bending', 'walking', 'running', 'kneeling', 'jump', 'sitting',
                  'standing', 'boarding', 'climbing', 'slipping')),
    ('insect', ('bites', 'bitten')),
    ('bomb/arson', ('bomb', 'arson')),
    ('equipment', ('caught', 'machinery', 'compressed', 'structure', 'equipment')),
    ('collision', ('collision',)),
    ('dangerous substance', ('contact', 'exposure', 'ignition', 'substance')),
    ('electric', ('electricity',)),
    ('drug', ('drug',)),
    ('falling', ('fall', 'collapsing')),  # 'fall' also matches 'falls'
    ('explosion', ('explosion',)),
    ('fire', ('fire',)),
    ('choking', ('drowning', 'choking on')),
    ('vehicle', ('vehicle', 'overturned', 'oncoming', 'vehicular')),
    ('unknown', ('unknown',)),
    ('intentional by other', ('other person', 'by other')),
    ('self-harm', ('self-harm', 'intentional')),
    ('friction', ('rubbed', 'repetitive')),
    ('struck', ('struck',)),
    ('transportation', ('transportation', 'cycle')),
    ('overexertion', ('overexertion', 'exertions')),
]


def categorize(description):
    """Map a raw injury description to the first matching category, else 'other'."""
    text = description.lower()
    for category, keywords in CATEGORY_RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return 'other'


injury_collection = {}

with open('../data/injuries.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        injury_type = categorize(row[2])  # third column holds the injury type
        injury_collection[injury_type] = injury_collection.get(injury_type, 0) + 1

pprint.pprint(injury_collection)
print(len(injury_collection))
```
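Since the keyword rules were put together by eye, it helps to list which raw descriptions still fall through to `'other'` so that new keywords can be added. A minimal sketch, assuming hypothetical helpers `match_category` and `unmatched_descriptions` and a cut-down two-category rule list (the classification above uses 22 keyword categories); the sample descriptions are invented for illustration.

```python
from collections import Counter


def match_category(description, rules):
    """Return the first category whose keywords appear in the description, else 'other'."""
    text = description.lower()
    for category, keywords in rules:
        if any(keyword in text for keyword in keywords):
            return category
    return 'other'


def unmatched_descriptions(descriptions, rules):
    """Count the raw descriptions that no rule matched (hypothetical helper)."""
    return Counter(d for d in descriptions if match_category(d, rules) == 'other')


# Cut-down rule list for illustration, in the same order as the full rules.
rules = [
    ('falling', ('fall', 'collapsing')),
    ('struck', ('struck',)),
]

# Invented descriptions, not taken from the dataset.
descriptions = [
    'Fall to lower level',
    'Struck by falling object',
    'Needle stick',
    'Needle stick',
    'Heat stroke',
]

for description, count in unmatched_descriptions(descriptions, rules).most_common():
    print(count, description)
```

Substring matching combined with first-match-wins has a side effect worth checking: a description like "Struck by falling object" would land in `'falling'` rather than `'struck'`, because `'fall'` is tested first.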
The keyword grouping knocks the injury type list down from over 500 entries to 22 specific injury types. Let's plot it real quick with matplotlib!
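```python
import matplotlib.pyplot as plt

# take keys and values from the same dict so labels and bar heights line up
labels = list(injury_collection.keys())
counts = list(injury_collection.values())

plt.figure(figsize=(30, 12))
plt.bar(range(len(labels)), counts, align='center')
plt.xticks(range(len(labels)), labels)
plt.show()
```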
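The bars above come out in whatever order the categories were first counted, not by size. Sorting by count before plotting makes the ranking easier to read. A sketch assuming hypothetical helper names `sorted_counts` and `plot_sorted`; matplotlib is imported inside `plot_sorted` so the sorting step can be used on its own.

```python
def sorted_counts(counts):
    """Return (labels, values) ordered from most to least frequent (hypothetical helper)."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    labels = [label for label, _ in ordered]
    values = [value for _, value in ordered]
    return labels, values


def plot_sorted(counts):
    """Draw a bar chart of `counts` sorted by frequency (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so sorted_counts works without matplotlib

    labels, values = sorted_counts(counts)
    positions = range(len(labels))
    plt.figure(figsize=(30, 12))
    plt.bar(positions, values, align='center')
    # rotate the category names so long labels don't overlap
    plt.xticks(positions, labels, rotation=45, ha='right')
    plt.ylabel('Number of injuries')
    plt.tight_layout()
    plt.show()


# Made-up counts for illustration only.
example = {'fire': 3, 'movement': 10, 'equipment': 7}
print(sorted_counts(example))
```

In the notebook, `plot_sorted(injury_collection)` would draw the same chart as above with the tallest bars first.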
A nice dataset for testing Jupyter and Pelican, though the information itself is of limited use, since I could not determine which company the data originated from. Aside from injuries caused by movement (bending, twisting, running, etc.), equipment/machinery incidents and exposure to dangerous substances appear to be the leading causes of workplace injuries.
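To put numbers on which causes look like the leading ones, each category's share of the total can be computed from the same tally. A minimal sketch with a hypothetical `top_shares` helper; the counts below are invented and are not the results from this dataset.

```python
from collections import Counter


def top_shares(counts, n=3):
    """Return the n most common categories with their percentage of all injuries (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        return []  # avoid dividing by zero on an empty tally
    return [
        (category, count, 100.0 * count / total)
        for category, count in Counter(counts).most_common(n)
    ]


# Invented counts for illustration only.
sample_counts = {'movement': 50, 'equipment': 30, 'fire': 12, 'other': 8}
for category, count, share in top_shares(sample_counts):
    print(f'{category}: {count} ({share:.1f}%)')
```

Running `top_shares(injury_collection)` in the notebook would report the real top three with their percentages.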