Classification of texts on emergency situations in Almaty

M.Y. Andirov; Zh.Zh. Assan; S. Nopembri; A.M. Seilkhan; D.E. Myrzakhmetov

doi:10.31643/2023/6445.36

Authors

M.Y. Andirov Al-Farabi Kazakh National University
Zh.Zh. Assan Al-Farabi Kazakh National University
S. Nopembri Universitas Negeri Yogyakarta
A.M. Seilkhan Aktobe RSU named after K.K. Zhubanov
D.E. Myrzakhmetov Al-Farabi Kazakh National University

DOI:

https://doi.org/10.31643/2023/6445.36

Keywords:

machine learning, text classification, support vector machine, logistic regression, KNN, NLP, preprocessing, emergencies.

Abstract

Text classification comprises a set of stages and approaches for effectively classifying texts that vary widely in structure. In this article, three machine learning algorithms, namely the support vector machine (SVM), logistic regression, and the k-nearest neighbors (KNN) method, are implemented to classify texts collected from emergency news sites in Almaty. The data collection stage and the subsequent processing of the data played a key role in the experiment. The data were gathered automatically from open sources using a script. Before classification, the data set was preprocessed: stop words were removed, and tokenization, stemming, lemmatization, feature extraction, and construction of feature vectors were performed. Experimental results show that the logistic regression classifier outperforms the other algorithms. Performance indicators were obtained for each algorithm, enabling a comparative analysis between them.
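The abstract names the pipeline stages without showing how they connect. Below is a minimal, dependency-free Python sketch of one way to chain them. It covers tokenization, stop-word removal, and crude suffix stripping in place of stemming/lemmatization. It then builds TF-IDF feature vectors and fits the three classifier families compared in the article. Several choices are assumptions, not details from the source. TF-IDF stands in for the unspecified "feature extraction", and the SVM is a linear SVM trained by subgradient descent on the hinge loss. The toy English headlines, the `accident`/`fire` labels, the stop-word list, the suffix rules, and the hyperparameters (`epochs`, `lr`, `lam`, `k`) are all hypothetical. This is not the authors' code, and its toy accuracies say nothing about their results.

```python
import math
import re
from collections import Counter

# Hypothetical toy stop-word list (English); a Russian or Kazakh news corpus
# would need a language-appropriate list.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "and", "into",
              "with", "near", "out", "through", "is", "was", "were"}

def preprocess(text):
    """Tokenize, drop stop words, then strip common suffixes (crude stemming)."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        # Hypothetical suffix rules standing in for a real stemmer or lemmatizer.
        for suffix in ("ing", "ed", "es", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems

def fit_idf(docs):
    """Smoothed inverse document frequency of every term in the training docs."""
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def knn_predict(x, vecs, labels, k=3):
    """Majority vote of the k nearest training vectors by cosine similarity."""
    # All vectors are unit length, so the dot product is the cosine similarity.
    ranked = sorted(zip(vecs, labels), key=lambda p: dot(p[0], x), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def train_linear(vecs, labels, positive, loss, epochs=300, lr=0.5, lam=0.01):
    """Batch (sub)gradient descent for an L2-regularised binary linear model.
    loss="log" gives logistic regression; loss="hinge" gives a linear SVM.
    The default hyperparameters are illustrative, not tuned."""
    w, b, n = {}, 0.0, len(vecs)
    for _ in range(epochs):
        grad, grad_b = {t: lam * v for t, v in w.items()}, 0.0
        for x, label in zip(vecs, labels):
            y = 1.0 if label == positive else -1.0
            m = y * (dot(w, x) + b)  # signed margin
            if loss == "log":
                # Derivative of log(1 + e^-m) w.r.t. the score, computed stably.
                s = 1.0 / (1.0 + math.exp(m)) if m < 0 else math.exp(-m) / (1.0 + math.exp(-m))
                g = -y * s
            else:
                g = -y if m < 1.0 else 0.0  # hinge-loss subgradient
            for t, v in x.items():
                grad[t] = grad.get(t, 0.0) + g * v / n
            grad_b += g / n
        for t, gt in grad.items():
            w[t] = w.get(t, 0.0) - lr * gt
        b -= lr * grad_b
    return w, b

# Hypothetical toy corpus: two emergency categories, English headlines.
TRAIN = [("A car crashed into a truck on the highway", "accident"),
         ("Two cars collided at the crossroads", "accident"),
         ("A driver hit a pedestrian near the bridge", "accident"),
         ("Fire broke out in a residential building", "fire"),
         ("Firefighters extinguished a burning warehouse", "fire"),
         ("Smoke and flames engulfed the market", "fire")]
TEST = [("A truck collided with two cars", "accident"),
        ("Flames spread through the building", "fire")]

def evaluate():
    """Fit all three classifiers on TRAIN and return their accuracy on TEST."""
    docs = [preprocess(text) for text, _ in TRAIN]
    idf = fit_idf(docs)
    X, y = [tfidf(d, idf) for d in docs], [label for _, label in TRAIN]
    X_test = [tfidf(preprocess(text), idf) for text, _ in TEST]
    logreg = train_linear(X, y, "fire", "log")
    svm = train_linear(X, y, "fire", "hinge")
    def linear(model):
        w, b = model
        return lambda x: "fire" if dot(w, x) + b > 0 else "accident"
    predictors = {"logistic regression": linear(logreg), "linear SVM": linear(svm),
                  "KNN (k=3)": lambda x: knn_predict(x, X, y, k=3)}
    return {name: sum(f(x) == label for x, (_, label) in zip(X_test, TEST)) / len(TEST)
            for name, f in predictors.items()}

print(evaluate())
```

Logistic regression and the linear SVM share one trainer because they differ only in the loss whose gradient is applied to the same weight vector. KNN needs no training beyond storing the vectors. An experiment on real Almaty news texts would replace the suffix rules with a proper Russian or Kazakh stemmer or lemmatizer. It would also evaluate on a held-out split or with cross-validation. Library implementations such as scikit-learn's `TfidfVectorizer`, `LogisticRegression`, `LinearSVC`, and `KNeighborsClassifier` provide tested versions of the same components.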


Author Biographies

M.Y. Andirov, Al-Farabi Kazakh National University

2nd year Master's student, Computer Science, Faculty of Information Technology, Al-Farabi Kazakh National University, Almaty, Kazakhstan.

Zh.Zh. Assan, Al-Farabi Kazakh National University

2nd year Master's student, Computer Science, Faculty of Information Technology, Al-Farabi Kazakh National University, Almaty, Kazakhstan.

S. Nopembri, Universitas Negeri Yogyakarta

Professor, Universitas Negeri Yogyakarta, Yogyakarta, Indonesia.

A.M. Seilkhan, Aktobe RSU named after K.K. Zhubanov

2nd year Master's student, Computer science and information technology, Faculty of Physics and Mathematics, K. Zhubanov Aktobe Regional University, Aktobe, Kazakhstan.

D.E. Myrzakhmetov, Al-Farabi Kazakh National University

2nd year Master's student, Computer Science, Faculty of Information Technology, Al-Farabi Kazakh National University, Almaty, Kazakhstan.
