Classifying Comment Toxicity w/ NLTK and NB

I've wanted to get more practice with natural language processing, so I grabbed a dataset of Wikipedia comments from a past Kaggle challenge to try classifying the toxicity of each comment. The training data was a collection of over 150k comments labeled with user-defined classifications of 'toxic', 'severe toxic', 'obscene', 'insult', 'threat', 'hate'. Here's the process I took to create a model that identifies these particular classifications for future comments!

The first step is to import the libraries we'll be using

Nightingale Rose in R

Summary: I recently tracked my daily coffee consumption and thought it would be interesting to find a fun way to visualize it. After exploring some options in R, I decided to give a Nightingale Rose Diagram a try.

Titanic Survival Classification w/ XGBoost

Summary: It's been a little over three years since I attempted my first ML project, classifying survival on the Titanic. A pretty famous dataset and task for this field, I'd say. I achieved an accuracy of around 75% in my first attempt at this (I believe with a random forest model). I thought it would be nice to revisit this and give it another go now that I'm a little more experienced. Additionally, I've wanted to explore ensemble learning some more, particularly XGBoost. Let's dive in!

Lyric Analysis

An assortment of word clouds I made from lyrics I scraped for a school project. The size of each word is relative to how frequently the artist uses it in (up to) 150 of their most recent songs. Mount Eerie references nature a lot while Aesop Rock just references A LOT of things. Harry and the Potters, well...

Quick Life-Expectancy Visualizations

A few visuals I created in Tableau displaying some trends in life expectancy by country. Data was sourced from the World Bank Open Data website and includes life expectancy at birth for around 200 countries between 1968 and 2014. Originally created for an SNHU Data Visualization homework assignment that asked us to express data using five different classes of method: comparison, hierarchical, temporal, correlation, and geo-spatial.

A Baby on the Way

My wife and I have a child on the way! With this on my mind lately, I thought it would be interesting to grab some data and visualize tendencies seen throughout pregnancy and birth. I dug into datasets from the CDC's National Center for Health Statistics and from Social Security and found a few interesting trends to visualize.
