Automating an ETL Pipeline with Python and MySQL

I often find myself working with data that is updated on a regular basis. Rather than manually running through the ETL process every time I want to refresh my locally stored data, I decided it would be worthwhile to build an automated script to handle the updates. In this post I use Python and MySQL to automate the ETL process with the City of Chicago's crime data (a rough sketch of such a script is included below).

more ...
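As a rough illustration of the idea, here is a minimal extract-transform-load sketch, assuming the crime records are pulled from the city's Socrata API and appended to a local MySQL table via SQLAlchemy. The endpoint, column list, credentials, and table name are placeholders, not the exact ones used in the post.

```python
"""Minimal ETL sketch: pull recent Chicago crime records from the city's
Socrata API and load them into a local MySQL table.

Assumptions (not from the original post): the endpoint URL, the columns
kept, the MySQL credentials, and the table name are all placeholders.
"""
import pandas as pd
import requests
from sqlalchemy import create_engine

# Hypothetical Socrata endpoint for the City of Chicago crime dataset.
API_URL = "https://data.cityofchicago.org/resource/ijzp-q8t2.json"


def extract(limit=1000):
    """Pull the most recent records as a DataFrame."""
    params = {"$limit": limit, "$order": "date DESC"}
    response = requests.get(API_URL, params=params)
    response.raise_for_status()
    return pd.DataFrame(response.json())


def transform(df):
    """Light cleaning: parse dates and keep a few columns of interest."""
    df["date"] = pd.to_datetime(df["date"])
    keep = ["id", "date", "primary_type", "description", "arrest", "community_area"]
    return df[[c for c in keep if c in df.columns]]


def load(df, table="chicago_crimes"):
    """Append the cleaned records to a MySQL table (placeholder credentials)."""
    engine = create_engine("mysql+pymysql://user:password@localhost/crime_db")
    df.to_sql(table, engine, if_exists="append", index=False)


if __name__ == "__main__":
    load(transform(extract()))
```

Scheduling this script with cron (or Windows Task Scheduler) is what turns it into a hands-off refresh of the local copy.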

2019 TUN Data Challenge

A brief overview of the TUN Data Challenge I completed with a team as part of an SNHU experiential learning course. We worked with client, marketing, and interaction data from the non-profit Hire Heroes USA to answer business questions specified by the organization. Our process involved defining goals, cleaning data, exploratory data analysis, statistical analysis, data visualization, and communicating results. Final results from the contest are still pending and are expected in late July.

more ...

Practice with Dimensionality Reduction in scikit-learn

I found some free time and thought I'd finally get more practice with dimensionality reduction. With that goal in mind, I went to Kaggle and found a competition (estimating house prices) that looked like a good fit for practicing these skills. Throughout this post I walk through the steps I took, from cleaning and standardizing the data to performing PCA and fitting a simple linear regression on the five most influential principal components. Not the most accurate regression ever, but great practice, and surprisingly efficient given that it reduces 81 variables down to only 5.
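A minimal sketch of that workflow with scikit-learn is below. The file name, target column, and simple median imputation are assumptions based on the typical Kaggle house-prices training set, not the exact cleaning steps from the post.

```python
"""Rough sketch of the standardize -> PCA -> linear regression workflow.
The file name and 'SalePrice' target are assumptions about the Kaggle data."""
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical path to the Kaggle training data.
train = pd.read_csv("train.csv")

# Simple cleaning: numeric columns only, missing values filled with the median.
X = train.drop(columns=["SalePrice"]).select_dtypes("number")
X = X.fillna(X.median())
y = train["SalePrice"]

# Standardize, project onto the first five principal components,
# then fit an ordinary least squares regression on those components.
model = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
model.fit(X, y)

print("Explained variance ratio:", model.named_steps["pca"].explained_variance_ratio_)
print("Training R^2:", model.score(X, y))
```

Wrapping the scaler, PCA, and regression in a single pipeline keeps the projection consistent between training and any later predictions.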


Nigerian Airline Frequency Analysis

A summary report of work I did for a Data Analytics II course at SNHU. The fictional company Austin Air is looking to enter the international airline industry with the goal of capturing a significant share of the business market centered on Nigeria. The analysis provides a look at the current industry and suggests strategies for implementing a pilot program the company can introduce to the market to achieve that goal.

more ...

2018 March Madness Bracket Predictor (part I)

The first step in creating a 2018 March Madness bracket predictor, both for personal practice with scikit-learn and to compete in this year's Kaggle competition. Part I revolves around cleaning the data and creating new variables to use in modeling.
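As an illustration of that kind of feature engineering (not the post's exact code), here is a short pandas sketch that turns game-level results into team-level variables. The file name and column names (Season, WTeamID, LTeamID, WScore, LScore) are assumptions based on the usual Kaggle March Madness data format.

```python
"""Illustrative sketch: build team-level features from a regular-season
results file. File and column names are assumed, not taken from the post."""
import pandas as pd

games = pd.read_csv("RegularSeasonCompactResults.csv")

# Reshape so each game appears once per team, with points for/against.
wins = games.rename(columns={"WTeamID": "TeamID", "WScore": "PointsFor",
                             "LScore": "PointsAgainst"}).assign(Won=1)
losses = games.rename(columns={"LTeamID": "TeamID", "LScore": "PointsFor",
                               "WScore": "PointsAgainst"}).assign(Won=0)
cols = ["Season", "TeamID", "PointsFor", "PointsAgainst", "Won"]
team_games = pd.concat([wins[cols], losses[cols]], ignore_index=True)

# New variables for modeling: win percentage and average scoring margin.
team_games["Margin"] = team_games["PointsFor"] - team_games["PointsAgainst"]
features = (team_games
            .groupby(["Season", "TeamID"])
            .agg(WinPct=("Won", "mean"), AvgMargin=("Margin", "mean"))
            .reset_index())

print(features.head())
```

Features like these can then be joined onto the tournament matchups to form the modeling dataset.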


Injured Worker Causes

A quick and dirty look at the causes of workplace injuries. The data was obtained from Kaggle.com and, while not very comprehensive (the sample size is only 537), I'm using it more as an experiment with Jupyter Notebook and Pelican.