2019 TUN Data Challenge

Brief overview of the TUN Data Challenge I conducted with a team as part of an SNHU experiential learning course. We worked with client, marketing, and interaction data from the non-profit Hire Heroes USA to answer business questions specified by the organization. Our process involved defining goals, cleaning data, exploratory data analysis, statistical analysis, data visualization, and communicating results. Final results from the contest are still pending and expected in late July.

more ...

Parallel Coordinates Plot using Plotly

Over the holiday season I heard several discussions about which charities are best to donate to and why some are better than others. With this in mind, I thought it would be interesting to examine the statistics that set one charity apart from another and find an effective way to visualize them. With some help from Charity Navigator, I was able to source and collect the appropriate information, and it seemed like a great time to finally give Plotly a go.
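
Below is a minimal sketch of the kind of parallel coordinates plot this post builds. The column names and values are hypothetical stand-ins for the Charity Navigator metrics, not the actual fields from the post.

```python
import pandas as pd
import plotly.graph_objects as go

# Hypothetical charity metrics standing in for the Charity Navigator data
charities = pd.DataFrame({
    "financial_score": [92.1, 85.4, 97.0, 78.8],
    "accountability_score": [96.0, 89.0, 100.0, 82.0],
    "overall_score": [93.5, 86.7, 98.2, 80.1],
})

# Each column becomes one vertical axis; each row draws one line across them
fig = go.Figure(data=go.Parcoords(
    line=dict(color=charities["overall_score"], colorscale="Viridis"),
    dimensions=[
        dict(label=col.replace("_", " ").title(), values=charities[col])
        for col in charities.columns
    ],
))
fig.show()
```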

more ...

Character Co-occurrence Network Diagram w/ NetworkX in Python

Being a big fan of fantasy novels, I've always had an interest in how the characters in books with massive character lists all interweave and connect. I've also been interested for a while now in visualizing some type of complex network with NetworkX in Python. Oathbringer, the most recent book in Brandon Sanderson's Stormlight Archive series, was a perfect opportunity to combine both interests. The following is the code I used to parse the etext of the novel and create a character co-occurrence network diagram. Although certain decisions made throughout the process may not be perfect for representing direct 'co-occurrences', I found the resulting visual to be an interesting look at the relationships throughout the book.
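
The full code is in the post itself; as a rough sketch of the idea, the following treats two characters as co-occurring whenever both names appear in the same paragraph and weights each edge by how often that happens. The character list and file path here are hypothetical stand-ins.

```python
import itertools
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical subset of the book's character list
characters = ["Kaladin", "Shallan", "Dalinar", "Adolin"]

def cooccurrence_graph(paragraphs, names):
    G = nx.Graph()
    G.add_nodes_from(names)
    for para in paragraphs:
        present = sorted({n for n in names if n in para})
        for a, b in itertools.combinations(present, 2):
            # Increment the edge weight for every shared paragraph
            weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
            G.add_edge(a, b, weight=weight + 1)
    return G

with open("oathbringer.txt", encoding="utf-8") as f:  # hypothetical etext path
    paragraphs = f.read().split("\n\n")

G = cooccurrence_graph(paragraphs, characters)
nx.draw_spring(G, with_labels=True)
plt.show()
```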


Classifying Comment Toxicity w/ NLTK and NB

I've wanted to get more practice with natural language processing, so I grabbed a dataset of Wikipedia comments from a past Kaggle challenge to try to classify the toxicity of each comment. The training data was a collection of over 150k comments, each tagged with user-defined classifications of 'toxic', 'severe toxic', 'obscene', 'insult', 'threat', and 'hate'. Here's the process I took to create a model that identifies these classifications in future comments!

First, import the libraries we'll be using.
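
A plausible set is sketched below, assuming NLTK for text preprocessing and scikit-learn for the TF-IDF features and the Multinomial Naive Bayes model; the exact stack isn't shown in this excerpt.

```python
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

# One-time downloads for the NLTK resources used above
nltk.download("stopwords")
nltk.download("wordnet")
```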


Titanic Survival Classification w/ XGBoost

Summary: It's been a little over three years since I attempted my first ML project of classifying survival on the Titanic. A pretty famous dataset and task for this field, I'd say. I achieved an accuracy of around 75% in my first attempt (I believe with a random forest model). I thought it would be nice to revisit this and give it another go now that I'm a little more experienced. Additionally, I've wanted to explore ensemble learning some more, particularly XGBoost. Let's dive in!
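
As a minimal sketch of the XGBoost setup, assuming the standard Kaggle Titanic train.csv layout; the feature handling here is illustrative, not necessarily what this post ends up doing.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("train.csv")  # standard Kaggle Titanic training file
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Illustrative feature set with simple median imputation for missing values
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]
X = df[features].fillna(df[features].median())
y = df["Survived"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print(f"Validation accuracy: {model.score(X_val, y_val):.3f}")
```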


Lyric Analysis

An assortment of word clouds I made from lyrics I scraped for a school project. The size of each word is relative to its frequency of use by the artist in (up to) 150 of their most recent songs. Mount Eerie references nature a lot, while Aesop Rock just references A LOT of things. Harry and the Potters, well...
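
As a rough sketch of how one of these could be generated, assuming the scraped lyrics were saved to a plain text file per artist; the filename is hypothetical, and the wordcloud library sizes words by frequency, matching the description above.

```python
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Hypothetical file of scraped lyrics for one artist
lyrics = open("mount_eerie_lyrics.txt", encoding="utf-8").read()

wc = WordCloud(width=800, height=400, background_color="white",
               stopwords=STOPWORDS).generate(lyrics)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```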

more ...