Forecasting Chicago Crime Rates with SARIMA

My last project, automating a data import pipeline for Chicago's crime data, created the perfect setup for using past crime rates to predict future ones. I use SARIMA time-series forecasting to predict the city's weekly crime rates six months out, and I create a heatmap of location by time to further identify crime trends in Chicago.

more ...

Automating an ETL Pipeline w/ Python and MySQL

I often find myself working with data that is updated on a regular basis. Rather than manually run through the ETL process every time I want to update my locally stored data, I thought it would be beneficial to work out a system that updates the data through an automated script. I use Python and MySQL to automate this ETL process using the City of Chicago's crime data.
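The incremental-load idea can be sketched as follows. To keep the example self-contained it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (crimes, case_id, occurred_on, primary_type) are hypothetical, not taken from the post.

```python
import sqlite3


def load_new_rows(conn, rows):
    """Insert only rows at or after the latest stored date; return how many were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crimes ("
        "case_id TEXT PRIMARY KEY, occurred_on TEXT, primary_type TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM crimes").fetchone()[0]
    latest = conn.execute("SELECT MAX(occurred_on) FROM crimes").fetchone()[0]

    # Keep rows on or after the latest stored date; the primary key plus
    # INSERT OR IGNORE drops any that were already loaded.
    fresh = [r for r in rows if latest is None or r[1] >= latest]
    conn.executemany("INSERT OR IGNORE INTO crimes VALUES (?, ?, ?)", fresh)
    conn.commit()

    after = conn.execute("SELECT COUNT(*) FROM crimes").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
first = load_new_rows(conn, [("A1", "2020-01-01", "THEFT"), ("A2", "2020-01-02", "BATTERY")])
second = load_new_rows(conn, [("A2", "2020-01-02", "BATTERY"), ("A3", "2020-01-03", "ASSAULT")])
```

Running the same batch twice adds nothing the second time, which is what makes a script like this safe to schedule.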

more ...

Character Co-occurrence Network Diagram w/ NetworkX in Python

Being a big fan of fantasy novels, I've always been interested in how the characters in books with massive character lists interweave and connect. I've also been interested for a while now in visualizing some type of complex network with NetworkX in Python. Oathbringer is the most recent book in Brandon Sanderson's Stormlight Archive series, and it was a perfect opportunity to combine both of these interests. The following is the code I used to parse the e-text of the novel and create a character co-occurrence network diagram. Although certain decisions made throughout the process may not be perfect for representing direct 'co-occurrences', I found the resulting visual to be an interesting look at the relationships seen throughout the book.
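One simple way to define a co-occurrence, sketched below, is to count how often two character names appear in the same paragraph. This is an assumption for illustration and may differ from the rule used in the post; the sample paragraphs are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(paragraphs, characters):
    """Count how many paragraphs mention each pair of characters together."""
    counts = Counter()
    for text in paragraphs:
        # Sort so each pair is always stored in the same order.
        present = sorted(name for name in characters if name in text)
        counts.update(combinations(present, 2))
    return counts


characters = ["Kaladin", "Shallan", "Dalinar"]
paragraphs = [  # invented sample text, not from the novel
    "Kaladin spoke with Dalinar.",
    "Shallan sketched while Kaladin and Dalinar argued.",
    "Dalinar stood alone.",
]
pairs = cooccurrence_counts(paragraphs, characters)
```

Each pair and its count can then become a weighted edge in a NetworkX graph, for example with `Graph.add_edge(a, b, weight=n)`.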

more ...

Practice at Dimensionality Reduction with SKlearn

I found some free time and thought I'd finally get some more practice at dimensionality reduction. With this goal in mind, I went onto Kaggle and found a competition (estimating house prices) that looked appropriate for practicing these skills. Throughout this post I walk through the steps I took, from cleaning and standardizing the data to performing PCA and fitting a simple linear regression to the top five principal components. It's not the most accurate regression ever, but it was great practice, and it is surprisingly efficient given that it reduces 81 variables to only 5.
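The standardize, project, and regress sequence can be sketched with NumPy alone. This is a stand-in for scikit-learn's PCA and LinearRegression, run on random data rather than the Kaggle set, so the shapes and variable names are assumptions.

```python
import numpy as np

# Random stand-in data: 200 rows, 20 features (the real set has far more columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: the rows of Vt are the principal directions, ordered by variance.
_, _, Vt = np.linalg.svd(X_std, full_matrices=False)
k = 5
components = Vt[:k]
scores = X_std @ components.T  # data projected onto the top k components

# Ordinary least squares on the k component scores plus an intercept.
design = np.column_stack([np.ones(len(scores)), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
predictions = design @ coef
```

Because the components are orthogonal, the regression avoids the multicollinearity that the original correlated features would introduce.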

more ...

2018 March Madness Bracket Predictor (part II)

Part two of working towards a model to predict the 2018 NCAA tourney results. This segment takes the cleaned CSV file from part I and further manipulates it into a format appropriate for fitting models. The primary problem with the current format is that each row revolves around a game. This code splits each row into two: one for the winner and one for the loser.
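The row split can be sketched in plain Python as follows. The field names (w_team, w_score, l_team, l_score) are hypothetical placeholders for whatever columns the cleaned file actually contains.

```python
def split_game_row(game):
    """Turn one game record into two team-level records: winner first, then loser."""
    winner = {
        "team": game["w_team"],
        "opponent": game["l_team"],
        "points_for": game["w_score"],
        "points_against": game["l_score"],
        "won": 1,
    }
    loser = {
        "team": game["l_team"],
        "opponent": game["w_team"],
        "points_for": game["l_score"],
        "points_against": game["w_score"],
        "won": 0,
    }
    return [winner, loser]


games = [{"w_team": "A", "w_score": 70, "l_team": "B", "l_score": 65}]  # made-up game
team_rows = [row for game in games for row in split_game_row(game)]
```

With one row per team per game, the `won` flag becomes a natural target variable for a classifier.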

more ...

2018 March Madness Bracket Predictor (part I)

The first step in creating a 2018 March Madness bracket predictor, both for personal practice with scikit-learn and to compete in this year's Kaggle competition. Part I revolves around cleaning the data and creating new variables to use in modeling.

more ...