Being a big fan of fantasy novels, I've always had an interest in how characters within books with massive character lists all interweave and connect together. I've also had interest for awhile now in visualizing some type of complex network with networkX in Python. Oathbringer is the most recent book from Brandon Sanderson's Stormlight Archive series and it was a perfect option for me to combine both of these interests. The following is the code I used to parse through the etext of the novel and create a character co-occurence network diagram. Although certain decisions made throughout the process may not be perfect for representing direct 'co-occurrences', I found the resulting visual to be an interesting look at relationships seen throughout the book.
I found some free time and thought I'd finally get some more practice at dimensionality reduction. With this goal in mind, I went onto Kaggle and found a competition(Estimate house prices) which looked appropriate to practice these skill with. Throughout this post I walk through the steps I took from cleaning and standardizing the data, to finally performing PCA and fitting a simple linear regression to the top five most influential eigenvectors! Not the most accurate regression ever, but great practice and surprisingly efficient given it drops 81 variables into only 5.
Section two working towards a finding a model in which to predict the 2018 NCAA tourney results. This segment takes the cleaned csv file from part I and further manipulates it into a format appropriate for fitting to models. The primary problem with the current format is the fact each row revolves around a game. This code will split the rows into two- One for the winner and one for the loser.
First step in creating a 2018 March Madness Bracket predictor for both personal practice with scikit-learn and to compete in the Kaggle competition this year. Part 1 revolves around cleaning the data and creating new variables to use in modeling.