A quick review of Transfer Learning

Teepika R M
Apr 26, 2021


What is Transfer Learning?

It’s a machine learning method in which a model learns from a related source domain through training and applies the acquired knowledge (the parameters of the chosen model) to a different but related target domain or problem.

Technical Definition:

For a given source domain Ds and corresponding task Ts, transfer learning aims to transfer the knowledge obtained to the target domain Dt with corresponding task Tt, to boost the performance of the target predictive function ft(.), where Ds ≠ Dt or Ts ≠ Tt.

Ds (Source Domain) = {X, P(X)}: a feature space X and a marginal probability distribution P(X).

Ts (Source Task) = {Y, f(.)}: a label space Y and an objective function f(.), which is learned from the sample data to predict the corresponding labels for new instances.

The same holds for Dt (Target Domain) and Tt (Target Task).

The above definition describes single-source transfer learning, since a single source is involved in the training process. There are also studies on multi-source transfer learning, where multiple source domains and tasks contribute to improving the target predictive function ft(.).

How is it helpful?

One might wonder why not train on the same domain. The answer is that not all domains have sufficient annotated (labelled) data for training; medical data, for example, is mostly private and sensitive. The shortage of annotated data is overcome by training on related domains and applying the inferred knowledge to the data-scarce domain.
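As a toy sketch of that idea: below, a fixed random projection stands in for a feature extractor pretrained on an abundant source domain, and only a small linear head is trained on the scarce labelled target data. Everything here is illustrative (no real pretrained network is loaded):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a feature extractor pretrained on an abundant source
# domain (illustrative; a real pipeline would load pretrained weights).
W_pretrained = 0.2 * rng.normal(size=(20, 8))

def extract_features(x):
    # Frozen extractor: its weights are NOT updated on the target task.
    return np.tanh(x @ W_pretrained)

# Scarce labelled target data (think of a small medical dataset).
x_target = rng.normal(size=(30, 20))
y_target = (x_target[:, 0] > 0).astype(float)

# Train only a small linear head on top of the frozen features.
feats = extract_features(x_target)
w_head, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(feats @ w_head + b)))      # sigmoid output
    w_head -= 0.5 * feats.T @ (p - y_target) / len(y_target)
    b -= 0.5 * np.mean(p - y_target)

acc = np.mean(((feats @ w_head + b) > 0) == (y_target == 1))
print(f"target accuracy with frozen source features: {acc:.2f}")
```

Only the 9 head parameters are fitted on the 30 target examples; the "knowledge" lives in the frozen extractor.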

What sort of differences can exist between the source and target domains? They can lie in the:

  • Domain = {X, P(X)}: either Xs ≠ Xt, or P(Xs) ≠ P(Xt), or both (the domain adaptation setting).
  • Task = {Y, P(Y|X)}: Ts ≠ Tt indicates that either the label spaces or the conditional distributions differ.
  • Methodology used for learning.
  • Label distribution, where P(Ys) ≠ P(Yt).
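The marginal-distribution case, P(Xs) ≠ P(Xt) over the same feature space with the same labelling rule, can be made concrete with a small numeric sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same feature space X, same labelling rule P(Y|X), but shifted
# marginals P(Xs) != P(Xt): the domain adaptation case above.
x_source = rng.normal(loc=0.0, scale=1.0, size=5000)   # P(Xs) = N(0, 1)
x_target = rng.normal(loc=2.0, scale=1.0, size=5000)   # P(Xt) = N(2, 1)

label = lambda x: (x > 1.0).astype(int)                # shared P(Y|X)

print("source mean:", round(x_source.mean(), 2))
print("target mean:", round(x_target.mean(), 2))
print("positive rate, source:", label(x_source).mean())
print("positive rate, target:", label(x_target).mean())
```

Even though the labelling rule is identical, a classifier fitted only on source samples sees mostly negatives and will be poorly calibrated on the target.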

What is close to transfer learning, but actually is not?

Semi-supervised learning- It involves abundant unlabelled examples and a small amount of labelled examples for training. It assumes the training and test datasets are drawn independently from an identical distribution, whereas in transfer learning the training and test datasets are allowed to be drawn from distinct domains, tasks and distributions.

Multi-view learning- Multi-view data contains additional useful information that enhances learning. Given the growing availability of multi-view data in recent years, applications like natural language processing, video analysis, cross-media retrieval etc. benefit from this type of learning. For instance, audio and video signals, or images and text, may be available for a single problem statement. It helps in gaining information from distinctive features.

Multitask learning- It enhances the performance of the model by knowledge transfer. It simultaneously trains multiple related tasks to improve generalization. The difference is that multitask learning tries to improve performance on all the related tasks, whereas transfer learning aims at improving only the target learner.

Categorization of Transfer learning:

Based on the availability of labelled data in the source and target domains:

Inductive — labelled data is available in the target domain.

Transductive — the source domain contains a large amount of labelled data, while there is no labelled data in the target domain.

Unsupervised — no labelled data in either the source or the target domain.

Based on the feature and label spaces of the source and target domains, transfer learning problems are divided into homogeneous and heterogeneous. Let’s see a short overview of the approaches followed to solve each.

Homogeneous transfer learning problem - the source and target domains share the same feature and label spaces, while the difference lies in the marginal (P(Xt) ≠ P(Xs)) and/or conditional (P(Yt|Xt) ≠ P(Ys|Xs)) distributions between domains.

Heterogeneous transfer learning problem - the source and target domains’ feature and/or label spaces differ.

Both can be solved by shallow and deep transfer learning approaches. Shallow approaches include instance-based, feature-based, parameter-based, and relational-based methods, whereas deep approaches exploit the underlying structure of the massive data accumulated over the past decade for knowledge transfer.

Homogeneous transfer learning approaches:

  1. Instance-based methods try to minimize the marginal and/or conditional distribution difference between domains using importance sampling or re-weighting. They aim to find proper weights for the labelled source data so that the source task is learned with minimum expected risk when applied to the target domain.
  2. Feature-based methods try to transform the original examples into a new space to learn the underlying pattern across domains. The feature transformation involves feature augmentation, feature mapping, feature clustering, feature alignment, encoding, etc.
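The importance re-weighting idea in (1) can be sketched with a toy one-dimensional example. Purely for illustration, both densities are assumed to be known Gaussians; in practice the density ratio itself must be estimated:

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Source samples from N(0, 1); the target distribution is N(1, 1).
x_src = rng.normal(0.0, 1.0, size=10000)

# Importance weights w(x) = p_target(x) / p_source(x): they up-weight
# source instances that look like target instances.
w = normal_pdf(x_src, 1.0, 1.0) / normal_pdf(x_src, 0.0, 1.0)

# The weighted source mean approximates the target mean (1.0),
# even though the raw source mean is near 0.
print("raw source mean:    ", round(x_src.mean(), 2))
print("reweighted estimate:", round(np.average(x_src, weights=w), 2))
```

The same weights would multiply the per-example loss when training a source classifier meant to be deployed on the target.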

2a) The symmetric feature-based approach learns the common features between the source and target domains. These common features are fed to the learner to improve the performance of the target task Tt. The main aim is to find domain-independent common features that minimize the difference between the source and target marginal distributions.
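A deliberately simplified sketch of the symmetric idea: both domains are mapped with the same projection into a shared subspace learned from the pooled data. Real symmetric methods such as Transfer Component Analysis additionally minimize the distribution discrepancy inside that subspace; plain pooled PCA below is only a stand-in:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy source/target data living in the same 5-d feature space.
x_src = rng.normal(size=(200, 5)) + np.array([1.0, 0, 0, 0, 0])
x_tgt = rng.normal(size=(200, 5)) + np.array([-1.0, 0, 0, 0, 0])

# Learn ONE shared subspace from the pooled, centered data and map
# BOTH domains into it with the same (symmetric) projection.
pooled = np.vstack([x_src, x_tgt])
mean = pooled.mean(axis=0)
_, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
project = lambda x: (x - mean) @ vt[:2].T    # top-2 principal directions

z_src, z_tgt = project(x_src), project(x_tgt)
print("shared-space shapes:", z_src.shape, z_tgt.shape)
```

A learner for Tt is then trained on the shared 2-d representation rather than on the raw domain-specific features.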

2b) The asymmetric feature-based approach reduces the discrepancy by transforming the source domain features into the target domain space. Weights are computed for the source instances such that their distribution is aligned in the target space. This form of transformation is only applicable for balancing the marginal distribution difference, not the conditional distribution difference, where it would harm performance. A conditional distribution difference between the source and target domains is known as context feature bias; for instance, the two domains may cover different topics, so a word with one meaning in the source has a different meaning in the target. In such a case, the Feature Augmentation Method can be used to address the conditional distribution difference by creating two additional copies of the feature set for the source and target domains (three times its original size).
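The feature augmentation trick mentioned above can be sketched as follows: each instance is mapped into a feature space three times the original width, with a shared copy plus a domain-specific copy (the layout follows Daumé III’s “frustratingly easy” scheme, used here as an illustrative choice):

```python
import numpy as np

def augment_source(x):
    # (shared copy, source-specific copy, zeros) -> 3x the original width
    return np.hstack([x, x, np.zeros_like(x)])

def augment_target(x):
    # (shared copy, zeros, target-specific copy)
    return np.hstack([x, np.zeros_like(x), x])

x_s = np.array([[1.0, 2.0]])   # one toy source instance
x_t = np.array([[3.0, 4.0]])   # one toy target instance
print(augment_source(x_s))     # [[1. 2. 1. 2. 0. 0.]]
print(augment_target(x_t))     # [[3. 4. 0. 0. 3. 4.]]
```

A single classifier trained on the augmented data can then learn a shared weight for features whose meaning agrees across domains and domain-specific weights for those whose meaning differs.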

3) The parameter-based approach learns parameters from the labelled source data and shares them with the target prediction function. There are single-model and multi-model knowledge transfer, where the transfer happens from one model (single) or from a chosen subset of labelled classes (multi), controlling the amount of information transferred from each class by finding the right parameters for the model.
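A minimal sketch of single-model parameter transfer, assuming a toy linear regression: the parameters learned on the abundant source data warm-start the target learner, which then adapts using only a handful of target examples:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_linear(x, y, w_init=None, steps=200, lr=0.1):
    # Plain gradient descent on squared error; w_init allows warm-starting.
    w = np.zeros(x.shape[1]) if w_init is None else w_init.copy()
    for _ in range(steps):
        w -= lr * x.T @ (x @ w - y) / len(y)
    return w

# Abundant source data; a related target task with few examples.
w_true = np.array([1.0, -2.0, 0.5])
x_src = rng.normal(size=(1000, 3)); y_src = x_src @ w_true
x_tgt = rng.normal(size=(10, 3));   y_tgt = x_tgt @ (w_true + 0.1)

# Single-model parameter transfer: source parameters become the
# starting point for the target learner.
w_src = fit_linear(x_src, y_src)
w_tgt = fit_linear(x_tgt, y_tgt, w_init=w_src, steps=20)
print("source params:", np.round(w_src, 2))
print("target params:", np.round(w_tgt, 2))
```

Starting from w_src, the target learner begins close to the target solution and needs far fewer updates than training from scratch on 10 examples.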

4) The relational-based approach is especially helpful when sample data are not independently and identically distributed. It exploits the relationship between the source and target domains and transfers the relational knowledge to the target domain.

Heterogeneous transfer learning approaches:

It mainly involves feature-based approaches for knowledge transfer between domains where Xs ≠ Xt and/or Ys ≠ Yt.

1) The symmetric feature-based approach transforms both source and target features into a common subspace, using Heterogeneous Feature Augmentation, for instance.

2) The asymmetric feature-based approach transforms one feature space into the other for knowledge transfer.

Transfer learning is an exciting topic, and it has been applied in a variety of tasks and applications like cross-domain text classification, detection of muscle fatigue from surface electromyogram data collected from different sensors, image classification, object detection, etc. Wish you all happy learning!!

Reference

A Concise Review of Transfer Learning — Abolfazl Farahani, Behrouz Pourshojae, Khaled Rasheed, and Hamid R. Arabnia (Department of Computer Science, University of Georgia, Athens, GA, USA; Road and Urban Development Organization, Arak, Iran).
