Exploring the Applicability of Topic Modeling in SARS-CoV-2 Literature and Impact on Agriculture

Lakshmi Sonkusale, K.K. Chaturvedi, S.B. Lal, M. S. Farooqi, Anu Sharma, Pratibha Joshi, Achal Lama and D.C. Mishra


For the last two years, countries around the globe have been severely affected by the Covid-19 pandemic caused by the novel coronavirus. Researchers from various disciplines are studying the virus and its effects and publishing a large number of articles on them. Material related to Covid-19 also appears continuously as research papers, popular articles, blogs, surveys, short stories, etc. These sources contain useful information that can be processed with text mining techniques to infer important knowledge. The Latent Dirichlet Allocation (LDA) technique provides an efficient way to organise unclassified text into useful sets of terms, called topics or "themes". A theme is a group of semantically related terms that frequently appear together. The objective of the present study is to explore the applicability of topic modeling in identifying the hidden themes or topics in published research articles related to Covid-19 and agriculture, retrieved through Google Scholar. After pre-processing of titles and abstracts, two approaches, namely LDA with Bag of Words (LDAB) and LDA with Term Frequency-Inverse Document Frequency (LDAT), were applied to find the hidden themes. Thirteen and seven topics were identified by applying LDAB and LDAT, respectively. These identified topics, each comprising a different set of words or features, will play an important role in developing an information retrieval system for specific searches related to agricultural production, supply chain mechanisms in agriculture, health and agri-tourism.

Keywords: Topic modeling; Covid-19; Latent Dirichlet allocation; Machine learning; Text analytics.
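
The two approaches named in the abstract differ in how each document is represented before LDA is applied: raw term counts (Bag of Words) versus counts reweighted by TF-IDF. The following is a minimal sketch of that difference only, not the authors' pipeline. The toy corpus, the function names and the particular TF-IDF variant (raw count times log(N / df)) are assumptions made for illustration; the source does not state which weighting scheme or library was used.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for pre-processed titles and abstracts.
docs = [
    ["covid", "lockdown", "supply", "chain", "farm"],
    ["covid", "farm", "labour", "shortage", "farm"],
    ["covid", "tourism", "rural", "income"],
]

def bag_of_words(doc):
    """Raw term counts for one tokenized document."""
    return dict(Counter(doc))

def tf_idf(corpus):
    """TF-IDF weights per document: raw count * log(N / df).

    This is one common variant, assumed here for illustration.
    """
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weighted = []
    for doc in corpus:
        counts = Counter(doc)
        weighted.append(
            {term: count * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return weighted

bow = [bag_of_words(doc) for doc in docs]
tfidf = tf_idf(docs)

# Compare the two representations of the second document.
for term in sorted(bow[1]):
    print(f"{term:10s} count={bow[1][term]}  tfidf={tfidf[1][term]:.3f}")
```

In this sketch, a term that occurs in every document (here "covid") keeps its count under Bag of Words but receives a TF-IDF weight of zero, while rarer terms are weighted up. Differences of this kind in the input may be one reason the two representations can lead a topic model to different themes.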
