Abstract:
Multi-Task Learning (MTL) involves learning multiple tasks jointly. It seeks to improve
the generalization performance of each task by leveraging the relationships among
the different tasks. It extends Single-Task Learning (STL), which is most widely
used in classification. In STL, each task is considered independent and is learnt independently,
whereas in MTL, multiple tasks are learnt simultaneously by exploiting task
relatedness. The main intuition is that the training signal present in related tasks can help
each task learn a better model. MTL also makes it possible to learn better models with fewer
labeled examples.
In this thesis, our focus is on improving classification performance for a database
that is organized as a hierarchy and archives a large number of documents. We improve
the classification performance of this (source) database by developing an MTL-based model.
In this model, we use an external database to facilitate the classification process for the
source database. We use logistic regression for the multiple classification
tasks and a k-nearest neighbor (kNN) approach to find similarities between the classes of the
two hierarchical databases; the kNN step allows us to define the task relationships. Experiments
on a sampled DMOZ dataset evaluate the performance of MTL against STL, Semi-Supervised Learning (SSL),
and Transfer Learning (TL). We also use random projections to achieve better runtime
performance with minimal effect on classification accuracy.
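The following is a minimal, illustrative sketch of this kind of pipeline, not the thesis implementation itself: it assumes synthetic document features, uses scikit-learn's GaussianRandomProjection, NearestNeighbors, and LogisticRegression, and treats kNN over class centroids of an external hierarchy as a simple stand-in for task relatedness. All names, sizes, and parameters here are hypothetical.

# Sketch only (assumed setup): per-class logistic regression tasks on a "source"
# hierarchy, kNN over class centroids of an "external" hierarchy as a proxy for
# task relatedness, and a random projection applied before training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
n_docs, n_feats, n_src_classes, n_ext_classes = 600, 2000, 5, 8

# Synthetic bag-of-words-like features and class labels for both databases.
X_src = rng.poisson(0.05, size=(n_docs, n_feats)).astype(float)
y_src = rng.integers(0, n_src_classes, size=n_docs)
X_ext = rng.poisson(0.05, size=(n_docs, n_feats)).astype(float)
y_ext = rng.integers(0, n_ext_classes, size=n_docs)

# Random projection: cheaper training at a small cost in accuracy.
proj = GaussianRandomProjection(n_components=200, random_state=0)
Z_src = proj.fit_transform(X_src)
Z_ext = proj.transform(X_ext)

# kNN over external-class centroids identifies, for each source class, the
# external classes that are most similar (a simple notion of task relatedness).
src_centroids = np.vstack([Z_src[y_src == c].mean(axis=0) for c in range(n_src_classes)])
ext_centroids = np.vstack([Z_ext[y_ext == c].mean(axis=0) for c in range(n_ext_classes)])
knn = NearestNeighbors(n_neighbors=3).fit(ext_centroids)
_, related = knn.kneighbors(src_centroids)  # related[c] = external classes near source class c

# One one-vs-rest logistic regression task per source class, with documents from
# the related external classes added as extra positive training signal.
models = {}
for c in range(n_src_classes):
    extra = np.isin(y_ext, related[c])
    X_task = np.vstack([Z_src, Z_ext[extra]])
    y_task = np.concatenate([(y_src == c).astype(int), np.ones(extra.sum(), dtype=int)])
    models[c] = LogisticRegression(max_iter=1000).fit(X_task, y_task)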