Abstract:
Natural Language Processing (NLP) encompasses a multitude of practical applications, 
including Information Retrieval, Information Extraction, Machine Translation, Text 
Simplification, Sentiment Analysis, Text Summarization, Spam Filtering, Auto-prediction, 
Auto-correction, Speech Recognition, Question Answering, and Natural Language Generation. 
Many of these applications are essentially classification tasks, which can be performed by 
machine learning models. Ensemble techniques in machine learning combine 
multiple models to achieve better predictive performance than any individual model alone. 
This thesis explores the application of ensemble learning techniques to improve 
classification performance in NLP tasks.
Various ensemble learning techniques, including bagging, boosting, random forest, and voting, 
are investigated experimentally. Each ensemble method is built on common base models, such 
as Support Vector Machines (SVM), Naive Bayes, Decision Trees, and K-Nearest Neighbors 
(KNN). Performance is assessed with evaluation metrics commonly used in NLP classification 
tasks: accuracy, precision, recall, F1-score, and the time complexity of the algorithms.
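As a rough illustration of this kind of experimental setup, the sketch below compares the four ensemble methods on a small text-classification task using scikit-learn. The dataset (20 Newsgroups), TF-IDF features, and hyperparameters are illustrative assumptions, not the configuration used in the thesis.

```python
# Minimal sketch: comparing ensemble methods with common base learners on a
# text-classification task. Dataset, features, and hyperparameters here are
# illustrative assumptions, not those used in the thesis.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = TfidfVectorizer().fit_transform(data.data)
X_tr, X_te, y_tr, y_te = train_test_split(X, data.target, random_state=0)

ensembles = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    "boosting": AdaBoostClassifier(n_estimators=50),
    "random_forest": RandomForestClassifier(n_estimators=50),
    "voting": VotingClassifier([("nb", MultinomialNB()),
                                ("svm", LinearSVC()),
                                ("dt", DecisionTreeClassifier())]),
}
for name, model in ensembles.items():
    model.fit(X_tr, y_tr)
    print(name)
    # classification_report covers accuracy, precision, recall, and F1-score.
    print(classification_report(y_te, model.predict(X_te)))
```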
The findings of the thesis suggest that ensemble methods, especially boosting, generally 
outperform traditional machine learning methods on NLP classification tasks. The thesis also 
introduces modifications to two ensemble methods: majority voting is extended with a 
tie-breaking rule for the case where votes are split evenly, and bagging is adapted to use a 
different sampling scheme. Both modifications improve performance on the evaluated datasets.
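The abstract does not specify the thesis's actual tie-breaking rule, so the following sketch illustrates one plausible variant as an assumption: when the vote is tied, defer to the tied base model with the highest validation accuracy.

```python
# Hypothetical sketch of tie-aware majority voting. The tie-breaking rule here
# (defer to the most accurate base model among those in the tie) is an
# illustrative assumption, not necessarily the rule used in the thesis.
from collections import Counter

def majority_vote(predictions, model_accuracies):
    """predictions: one label per base model, aligned with
    model_accuracies (each base model's validation accuracy)."""
    counts = Counter(predictions)
    top = counts.most_common()
    # Ordinary case: a unique majority label exists.
    if len(top) == 1 or top[0][1] > top[1][1]:
        return top[0][0]
    # Tie: among models that voted for a tied label, pick the most accurate.
    tied = {label for label, c in top if c == top[0][1]}
    best = max(
        (i for i, p in enumerate(predictions) if p in tied),
        key=lambda i: model_accuracies[i],
    )
    return predictions[best]

print(majority_vote(["spam", "ham", "ham"], [0.9, 0.8, 0.7]))  # "ham" (majority)
print(majority_vote(["spam", "ham"], [0.9, 0.8]))              # tie -> "spam"
```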
Overall, this thesis provides a comprehensive overview of ensemble learning 
algorithms and their application to improving classification performance in NLP tasks, 
supported by theoretical discussion, case studies, and experimental results.