BBC News classification algorithm comparison

The BBC News dataset (available for download from the Insight Project Resources website) is made up of 2225 news-lines classified into 5 categories (Politics, Sport, Entertainment, Tech, Business) and, similarly to Reuters-21578, it can be used to test both the efficacy and the efficiency of different classification strategies. In the repository: https://github.com/giuseppebonaccorso/bbc_news_classification_comparison,…

Reuters-21578 text classification with Gensim and Keras

Reuters-21578 is a collection of about 20K news-lines (see reference for more information, downloads and copyright notice), structured using SGML and categorized with 672 labels. The labels are divided into five main categories: Topics, Places, People, Organizations, and Exchanges. However, most of them are unused and, looking at the distribution, it’s…