Popularity of Mashable Articles

This project uses machine learning techniques to predict the popularity of articles on mashable.com. It is the final project for the University of Minnesota's Data Visualization and Analytics Cohort 10. The group members are Steven Gaetz, Natalia Mendoza-Orr, Nate Witte, and Sam Ziegler. The group used Decision Tree, K Nearest Neighbors, Random Forest, SVM, and Neural Network algorithms to build models that predict which of three categories an article falls into, based on its number of shares on social media. The categories are Popular, Neutral, and Unpopular.
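The three categories can be derived from raw share counts by binning. The sketch below is illustrative only: this README does not state the cut points the group used, so the tercile thresholds, the `make_labeler` helper, and the sample share counts are all assumptions.

```python
from statistics import quantiles


def make_labeler(shares):
    """Return a function that maps a share count to one of three labels.

    Assumption: the cut points are the terciles of the observed share
    counts. The project's real thresholds are not given in this README.
    """
    low, high = quantiles(shares, n=3)  # two cut points give three bins

    def label(count):
        if count <= low:
            return "Unpopular"
        if count <= high:
            return "Neutral"
        return "Popular"

    return label


# Made-up share counts, for illustration only.
sample_shares = [120, 450, 800, 1100, 1400, 2300, 5600, 9000, 43000]
label = make_labeler(sample_shares)
labels = [label(s) for s in sample_shares]
```

Tercile cut points give roughly balanced classes, which makes plain accuracy easier to interpret. Fixed thresholds would be an equally reasonable choice.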

Data

The dataset contains features describing articles published by Mashable.com over a period of two years. It includes 39,797 rows and 61 attributes (58 predictive attributes, 2 non-predictive attributes, and 1 target field), such as the URL of the article, the number of days between the article's publication and the dataset acquisition, and the number of words in the title. The dataset, along with a full list of features, can be downloaded here: https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity.
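As a rough sketch of how the attribute groups might be separated before modeling, the snippet below parses a tiny made-up CSV with the standard library. The column names (`url`, `timedelta`, `n_tokens_title`, `shares`) and the `load_rows` helper are assumptions based on the attribute descriptions above, and the sample rows are not real data.

```python
import csv
import io

# Tiny made-up stand-in for the downloaded CSV; values are illustrative only.
SAMPLE_CSV = """url, timedelta, n_tokens_title, shares
http://example.com/a, 731, 12, 593
http://example.com/b, 730, 9, 1500
"""

NON_PREDICTIVE = {"url", "timedelta"}  # assumed names of the 2 non-predictive fields
TARGET = "shares"                      # assumed name of the target field


def load_rows(text):
    """Split each row into a dict of predictive features and a target value."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    features, targets = [], []
    for row in reader:
        targets.append(int(row[TARGET]))
        features.append({
            name: float(value)
            for name, value in row.items()
            if name not in NON_PREDICTIVE and name != TARGET
        })
    return features, targets


features, targets = load_rows(SAMPLE_CSV)
```

Dropping the non-predictive fields up front keeps identifiers such as the URL from leaking into the model inputs.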

Technologies

This project was created with:

HTML/CSS/Bootstrap
scikit-learn
Pandas
Matplotlib
Tableau

The results of the project are shown below:

Algorithm	Accuracy (%)
Decision Tree	42.0
Random Forest	50.9
SVM	32.0
KNN	47.0
Neural Network	47.3

As shown above, the random forest model achieved the highest accuracy of all the models, at 50.9%. However, none of the models was particularly effective. This leads to our primary conclusion: the attributes collected in this dataset have little predictive power for the number of shares an article receives.
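A common sanity check when reading accuracy figures like these is to compare them against a baseline that always predicts the most frequent class. The sketch below shows how such a comparison might be computed. The labels are made up, the `accuracy` and `majority_baseline` helpers are hypothetical, and none of it is taken from the project's code or results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def majority_baseline(y_true):
    """Label and accuracy obtained by always predicting the most common class."""
    most_common_label, count = Counter(y_true).most_common(1)[0]
    return most_common_label, 100.0 * count / len(y_true)


# Made-up labels, for illustration only.
y_true = ["Popular", "Neutral", "Unpopular", "Neutral", "Popular", "Neutral"]
y_pred = ["Popular", "Unpopular", "Unpopular", "Neutral", "Neutral", "Neutral"]

model_acc = accuracy(y_true, y_pred)
baseline_label, baseline_acc = majority_baseline(y_true)
```

A model is only adding value to the extent that its accuracy exceeds the baseline on the same test set.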