Popularity of Mashable Articles


This project uses machine learning to predict the popularity of articles on mashable.com. It is the final project for the University of Minnesota's Data Visualization and Analytics Cohort 10. The group members are Steven Gaetz, Natalia Mendoza-Orr, Nate Witte, and Sam Ziegler. The group used Decision Tree, K Nearest Neighbors, Random Forest, SVM, and Neural Network models to predict which of three categories an article falls into, based on its number of shares on social media. The categories are Popular, Neutral, and Unpopular.
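As a rough illustration of how share counts could be turned into the three categories, here is a minimal sketch using pandas. The README does not state the thresholds the group used, so the tercile split (`pd.qcut` with `q=3`), the helper name `label_popularity`, and the sample share counts are all assumptions made for this example.

```python
import pandas as pd


def label_popularity(shares: pd.Series) -> pd.Series:
    """Split share counts into three classes of roughly equal size.

    Assumption: a tercile cut. The project's real thresholds are not documented here.
    """
    # qcut bins by quantile, so each class receives about one third of the articles.
    return pd.qcut(shares, q=3, labels=["Unpopular", "Neutral", "Popular"])


# Made-up share counts, for illustration only.
shares = pd.Series([120, 450, 900, 1400, 2300, 15000])
labels = label_popularity(shares)
print(labels.tolist())
```

A quantile-based split keeps the classes balanced, which makes plain accuracy easier to interpret. Fixed share thresholds would be an equally reasonable choice.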

Data

The dataset contains features of articles published by Mashable.com over a period of two years. It includes 39,797 rows and 61 attributes (58 predictive attributes, 2 non-predictive attributes, and 1 target field), such as the URL of the article, the number of days between the article's publication and the dataset acquisition, and the number of words in the title. The dataset, along with a full list of features, is available here: https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity.
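The split into predictive attributes, non-predictive attributes, and target could be sketched as follows. The column names `url`, `timedelta`, `shares`, and `n_tokens_title` are assumptions about the CSV header and should be checked against the downloaded file. The helper `split_features_target` and the tiny demo frame are hypothetical stand-ins, not the project's code or data.

```python
import pandas as pd

# Assumed column names; verify them against the header of the downloaded CSV.
NON_PREDICTIVE = ["url", "timedelta"]
TARGET = "shares"


def split_features_target(df: pd.DataFrame):
    """Return (X, y): the predictive attributes and the share count."""
    # Strip stray whitespace from header names before matching on them.
    df = df.rename(columns=str.strip)
    X = df.drop(columns=NON_PREDICTIVE + [TARGET])
    y = df[TARGET]
    return X, y


# Tiny made-up frame standing in for the real file.
demo = pd.DataFrame({
    "url": ["a", "b"],
    " timedelta": [731, 730],
    " n_tokens_title": [12, 9],
    " shares": [593, 711],
})
X, y = split_features_target(demo)
print(list(X.columns), y.tolist())
```

With the real file, `demo` would be replaced by the result of `pd.read_csv(...)` on the downloaded CSV.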

Technologies

This project was created with:

  • HTML/CSS/Bootstrap
  • scikit-learn
  • Pandas
  • Matplotlib
  • Tableau

The results of the project are shown below:

Algorithm         Accuracy (%)
Decision Tree     42.0
Random Forest     50.9
SVM               32.0
KNN               47.0
Neural Network    47.3

As shown above, the Random Forest model had the highest accuracy of all the models. However, none of the models was particularly effective: the best accuracy was 50.9% on a three-class problem. Our primary conclusion is that the attributes collected in this dataset have little predictive power for the number of shares.
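One way to put accuracy figures like these in context is to compare each model against a majority-class baseline under the same cross-validation. The sketch below does this with scikit-learn on synthetic three-class data from `make_classification`. It is not the project's code or data, the hyperparameters are arbitrary assumptions, and SVM and the neural network are left out to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the article features (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5,
    n_classes=3, random_state=0,
)

models = {
    "Baseline (most frequent)": DummyClassifier(strategy="most_frequent"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 stratified cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

for name, acc in scores.items():
    print(f"{name}: {acc * 100:.1f}%")
```

A model that barely beats the baseline row is learning little from its features, which is the kind of evidence the conclusion above relies on.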