ARTICLE
TITLE

Hate Speech Detection on Twitter in Indonesia with Feature Expansion Using GloVe

SUMMARY

Twitter is a popular social media platform for voicing opinions in the form of criticism and suggestions. Criticism can become hate speech when it attacks a target (an individual, a race, or a group). Because a tweet is limited to 280 characters, users often abbreviate words, which causes vocabulary mismatch; word embeddings can mitigate this problem. This study applies feature expansion with Global Vectors (GloVe) to reduce vocabulary mismatch in Indonesian-language hate speech on Twitter. The best-performing model and feature configuration are selected by comparing the Logistic Regression (LR), Random Forest (RF), and Artificial Neural Network (ANN) algorithms. The results show that the Random Forest model with 5,000 features, combining TF-IDF with a tweet corpus built using GloVe, achieves the highest accuracy among the models, with an average accuracy of 88.59%, which is 1.25% higher than the predetermined baseline. The number of features used is shown to improve the performance of the system.
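The summary does not spell out how the expansion step works, so the following is only a minimal sketch of one common reading of feature expansion: when a vocabulary feature is absent from a tweet but a similar word is present, the feature is switched on. The similarity table `SIMILAR`, the example words, the binary weighting (used here in place of TF-IDF for brevity), and the helper names `vectorize` and `expand` are all assumptions made for illustration; in the study, similar words would come from GloVe embeddings rather than a hand-written table.

```python
# Hypothetical similarity table standing in for GloVe nearest neighbours.
# The word pairs are invented for illustration, not taken from the study.
SIMILAR = {
    "benci": ["bnci", "sebel"],
    "bodoh": ["bego", "goblok"],
}


def vectorize(tokens, vocabulary):
    """Return a binary bag-of-words vector over a fixed vocabulary."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def expand(tokens, vocabulary, similar):
    """Switch on a zero-valued feature when the tweet contains a similar word."""
    present = set(tokens)
    vector = vectorize(tokens, vocabulary)
    for i, word in enumerate(vocabulary):
        if vector[i] == 0 and any(s in present for s in similar.get(word, [])):
            vector[i] = 1
    return vector


vocabulary = ["benci", "bodoh", "suka"]
tweet = "gue bnci bgt sama org bego".split()

baseline = vectorize(tweet, vocabulary)          # abbreviations miss the vocabulary
expanded = expand(tweet, vocabulary, SIMILAR)    # similar words fill the gaps

print(baseline)
print(expanded)
```

In this toy case the abbreviated tweet matches no vocabulary word before expansion, while after expansion the features for "benci" and "bodoh" are active, which is the kind of vocabulary mismatch the study aims to reduce.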

RELATED ARTICLES

Aniq Noviciatie Ulfah, M Khairul Anam    

Hate speech is a crime for which the violator is threatened with punishment under the ITE law. Yet netizens in Indonesia still use many hate speech words when commenting on news in online media. The impact of this situation is many netizens ...


Auliya Rahman Isnain, Agus Sihabuddin, Yohanes Suyanto

Hate speech is currently a widely discussed topic in Indonesia, primarily on social media. Hate speech is communication that disparages a person or group based on characteristics such as (race, ethnicity, gender, citizenship, religion and organiza...


Aini Suri Talita, Aristiawan Wiguna    

Research involving Artificial Neural Networks (ANN) or their derivatives has been published all around the world, specifically to solve data mining, classification, clustering, or detection problems. Recurrent Neural Network is a class of ANN with...


Junanda Patihullah, Edi Winarko

Social media has changed how people express their thoughts and moods. As the activity of social media users increases, so does the possibility that hate speech spreads quickly and widely. So that it is not possible ...


Oryza Habibie Rahman, Gunawan Abdillah, Agus Komarudin

Nowadays social media has become a place for people to express their opinions, and there are many ways to express both positive and negative opinions. Hate speech is one of the problems that we find quite often in cyberspace, that things ca...