Identifying Advertisements with ANNs

I pull data from the UCI Machine Learning Repository and use it to train a model that identifies advertisements based on their image size and URL terminology. I work through cleaning the data, try a few different fitting algorithms, and finish with some parameter tuning of an ANN. My final model achieves over 97% accuracy in classifying advertisements on the test set. Originally conducted for a Machine Learning course at SHHU, focused on the R language.
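For a quick taste of the modeling step, here's a rough scikit-learn sketch of the same idea. The post itself works in R, so this is an analogous illustration only, and the file and column names are hypothetical.

    import pandas as pd
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical cleaned layout: numeric image-size features plus binary
    # URL-term indicators, with an 'ad' label column (mirroring the UCI
    # Internet Advertisements dataset)
    df = pd.read_csv('ad_data_clean.csv')
    X = df.drop(columns=['ad'])
    y = df['ad']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=1)

    # A small single-hidden-layer ANN; layer size and iteration cap are the
    # sort of parameters the post tunes
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=1)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))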

more ...

Multiple Linear Regression to Predict Consumer Spending

As in the last post, here's some more work in Excel with economic variables. This time I use value forecasts of the 30-year mortgage, unemployment, and personal income rates, figured in a similar manner as before (annual growth/change rates - 10-year moving averages), to predict future levels of personal consumption expenditures (PCE). I run a multiple linear regression to forecast PCE from the three independent variables and end up with some pretty strong results, including an adjusted R-squared of 0.974.
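The post does everything in Excel, but the underlying model is plain OLS on three predictors. Here's a rough statsmodels sketch of the same fit; the series and column names are hypothetical.

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical frame holding the transformed series described above
    df = pd.read_csv('econ_series.csv')
    X = sm.add_constant(df[['mortgage_30y', 'unemployment', 'personal_income']])
    y = df['pce']

    # Ordinary least squares; summary() reports the adjusted R-squared
    model = sm.OLS(y, X).fit()
    print(model.summary())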

more ...

Forecasting Economic Time-Series Variables in Excel

I grabbed some economic data from FRED and attempted to forecast future values for each variable using Excel. The economic variables of focus for this project are U.S. GDP, CPI, and the unemployment rate. This was originally done for my Macro-econometrics class and gave me some welcome time with Excel, which I admittedly use too little.
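For a rough sense of the approach outside the spreadsheet, here's a pandas sketch of projecting a series forward from a 10-year moving average of its annual growth; the method detail is my reading of the setup, and the file and column names are hypothetical.

    import pandas as pd

    # Hypothetical annual GDP series pulled from FRED
    gdp = pd.read_csv('gdp_annual.csv', index_col='year')['gdp']

    growth = gdp.pct_change()             # annual growth rate
    smoothed = growth.rolling(10).mean()  # 10-year moving average of growth

    # Naive one-step forecast: grow the latest value at the smoothed rate
    forecast = gdp.iloc[-1] * (1 + smoothed.iloc[-1])
    print(forecast)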

more ...

Classifying Comment Toxicity w/ NLTK and NB

I've wanted to get more practice with natural language processing, so I grabbed a dataset of Wikipedia comments from a past Kaggle challenge and attempted to classify the toxicity of each comment. The training data was a collection of over 150k comments, each tagged with user-defined classifications of 'toxic', 'severe toxic', 'obscene', 'insult', 'threat', and 'hate'. Here's the process I took to create a Naive Bayes model which identifies these classifications in future comments!

First up is importing the libraries we'll be using:
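The exact import list here is a reasonable guess for an NLTK plus scikit-learn Naive Bayes workflow, not necessarily the original set.

    import pandas as pd
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import train_test_split

    # One-time NLTK resource downloads for stopword removal and lemmatizing
    nltk.download('stopwords')
    nltk.download('wordnet')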


Titanic Survival Classification w/ XGBoost

Summary: It's been a little over three years since I attempted my first ML project: classifying survival on the Titanic. A pretty famous dataset and task for this field, I'd say. I achieved an accuracy of around 75% in my first attempt (I believe with a random forest model). I thought it would be nice to revisit this and give it another go now that I'm a little more experienced. Additionally, I've wanted to explore ensemble learning some more, particularly XGBoost. Let's dive in!
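For a preview of where this heads, here's a minimal XGBoost sketch against the Kaggle train.csv. The feature handling and parameters are illustrative choices, not necessarily what the post settles on.

    import pandas as pd
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Basic preprocessing of a few standard Titanic columns
    df = pd.read_csv('train.csv')
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
    features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
    X = df[features].fillna(df[features].median())
    y = df['Survived']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Gradient-boosted trees; depth, learning rate, and tree count are
    # typical starting values to tune from
    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)
    print(accuracy_score(y_test, model.predict(X_test)))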


Time Series Analysis in R

Lately I've been looking to explore time-series modeling and to get more practice with manipulating data in R. Luckily, a dataset on Kaggle gave me the opportunity to do both. Attempting to predict future sales volumes of items in stores (from the 1C Russian store sales dataset) let me apply the theoretical knowledge of time-series analysis I've been studying (ARIMA in particular). Additionally, I got more comfortable with dplyr and lubridate in the process.

more ...
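The post does its modeling in R, but as a closing illustration of the core ARIMA step, here's a minimal statsmodels sketch; it assumes a hypothetical pre-aggregated monthly series, much like the post builds with dplyr and lubridate.

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Hypothetical monthly sales for one item/store pair, already aggregated
    sales = pd.read_csv('sales_monthly.csv', index_col='month',
                        parse_dates=True)['item_cnt']

    # (p, d, q) chosen purely for illustration; the post selects its own order
    fit = ARIMA(sales, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=3))  # next three months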