Anil Batra

I am a CDT Ph.D. scholar at the School of Informatics, University of Edinburgh, supervised by Prof. Frank Keller and Dr. Laura Sevilla-Lara. I work at the intersection of machine learning, natural language processing, and computer vision. I am interested in developing multimodal video-text models that understand and generate video clips or text. My recent work explores procedure segmentation and the summarization of the steps needed to perform a task in instructional videos. Additionally, I am exploring algorithms to learn multi-modal representations from video and text.

Previously, I completed my Master's in Computer Science (by research) at IIIT Hyderabad, under the supervision of Prof. C.V. Jawahar and Facebook mentors Dr. Guan Pang and Dr. Saikat Basu. During my Master's, I was part of the Center for Visual Information Technology (CVIT) and developed models to detect roads under occlusion in satellite imagery.

I worked as a Research Engineer at Facebook in the Spatial Computing team, where I designed, trained, and evaluated models that extract connected road networks from a limited set of labels and large-scale noisy labels.

Earlier, I did my Bachelor's in Electronics and Communications at RIMT, affiliated with Punjab Technical University, Jalandhar.

Email: anilbatra2185@gmail.com | CV | Google Scholar | GitHub | LinkedIn

  • [Sep 2022]: Our new work "A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos" has been accepted at BMVC 2022.
  • [Dec 2021]: Volunteered as session chair at the LXAI workshop, NeurIPS 2021.
  • [Nov 2021]: Will be serving as a CVPR 2022 reviewer.
  • [Jun 2021]: Served as ICCV 2021 Reviewer.
  • [Sep 2020]: Joined the CDT-NLP Ph.D. programme at the University of Edinburgh under the supervision of Dr. Laura Sevilla-Lara and Prof. Frank Keller.
  • [Jun 2019]: Successfully defended my Master's thesis. Panel: Prof. C.V. Jawahar, Prof. K. Madhava Krishna, Dr. Girish Varma.
  • [Jun 2019]: Poster presentation at CVPR 2019, Long Beach (image).
  • [May 2019]: Received travel sponsorship from the Facebook Spatial Computing team to attend CVPR 2019.
  • [Apr 2019]: Presented our CVPR work on improving road connectivity to the Facebook Spatial Computing team.
  • [Mar 2019]: Paper accepted at CVPR 2019 on Improved Road Connectivity.
  • [Jan 2019]: Joined the Facebook Spatial Computing team as a Research Engineer (Contingent Worker).
  • [Dec 2018]: Submitted my Master's thesis, "Road Topology Extraction from Satellite Images by Knowledge Sharing".
  • [Jun 2018]: Paper accepted at BMVC 2018 on Self-Supervised Learning.

[NEW] A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra, Shreyank N Gowda, Frank Keller, Laura Sevilla-Lara
British Machine Vision Conference (BMVC), 2022

pdf suppl abstract bibtex

Understanding the steps required to perform a task is an important skill for AI systems. Learning these steps from instructional videos involves two subproblems: (i) identifying the temporal boundary of sequentially occurring segments and (ii) summarizing these steps in natural language. We refer to this task as Procedure Segmentation and Summarization (PSS). In this paper, we take a closer look at PSS and propose three fundamental improvements over current methods. The segmentation task is critical, as generating a correct summary requires each step of the procedure to be correctly identified. However, current segmentation metrics often overestimate the segmentation quality because they do not consider the temporal order of segments. In our first contribution, we propose a new segmentation metric that takes into account the order of segments, giving a more reliable measure of the accuracy of a given predicted segmentation. Current PSS methods are typically trained by proposing segments, matching them with the ground truth and computing a loss. However, much like segmentation metrics, existing matching algorithms do not consider the temporal order of the mapping between candidate segments and the ground truth. In our second contribution, we propose a matching algorithm that constrains the temporal order of segment mapping, and is also differentiable. Lastly, we introduce multi-modal feature training for PSS, which further improves segmentation. We evaluate our approach on two instructional video datasets (YouCook2 and Tasty) and observe an improvement over the state-of-the-art of ∼ 7% and ∼ 2.5% for procedure segmentation and summarization, respectively.

author = {Batra, Anil and Gowda, Shreyank N and Keller, Frank and Sevilla-Lara, Laura},
title = {A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos},
booktitle = {BMVC},
year = {2022}
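
The idea of constraining the temporal order of the mapping between candidate and ground-truth segments can be illustrated with a small sketch. This is not the paper's algorithm (which is differentiable); it is a plain dynamic program, written under the assumption that segments are (start, end) pairs and that match quality is temporal IoU. The names `tiou` and `ordered_match` are hypothetical.

```python
def tiou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def ordered_match(pred, gt):
    """Order-preserving one-to-one matching that maximises total tIoU.

    pred and gt are lists of (start, end) segments in their sequence order.
    If pred[i] is matched to gt[j], later predictions may only be matched
    to later ground-truth segments. Returns (total_score, matched_pairs).
    """
    n, m = len(pred), len(gt)
    # best[i][j]: best total score using the first i predictions
    # and the first j ground-truth segments.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],                      # leave pred[i-1] unmatched
                best[i][j - 1],                      # leave gt[j-1] unmatched
                best[i - 1][j - 1] + tiou(pred[i - 1], gt[j - 1]),
            )
    # Backtrack to recover which pairs were matched.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Predictions emitted in the wrong order: an order-agnostic matcher could
# pair both segments perfectly, but the ordered matcher keeps only one pair.
score, pairs = ordered_match([(20, 30), (0, 10)], [(0, 10), (20, 30)])
print(score, pairs)
```

In this toy case the two predicted segments each overlap a ground-truth segment perfectly, yet only one pair survives the order constraint, which is the kind of overestimation an order-agnostic metric or matcher would miss.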


Improved Road Connectivity by Joint Learning of Orientation and Segmentation
Anil Batra*, Suriya Singh*, Guan Pang, Saikat Basu, C.V. Jawahar and Manohar Paluri (* equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2019

pdf suppl poster abstract bibtex code

Road network extraction from satellite images often produces fragmented road segments leading to road maps unfit for real applications. Pixel-wise classification fails to predict topologically correct and connected road masks due to the absence of connectivity supervision and difficulty in enforcing topological constraints. In this paper, we propose a connectivity task called Orientation Learning, motivated by the human behavior of annotating roads by tracing them at a specific orientation. We also develop a stacked multi-branch convolutional module to effectively utilize the mutual information between orientation learning and segmentation tasks. These contributions ensure that the model predicts topologically correct and connected road masks. We also propose a Connectivity Refinement approach to further enhance the estimated road networks. The refinement model is pre-trained to connect and refine the corrupted ground-truth masks and later fine-tuned to enhance the predicted road masks. We demonstrate the advantages of our approach on two diverse road extraction datasets, SpaceNet and DeepGlobe. Our approach improves over the state-of-the-art techniques by 9% and 7.5% in road topology metric on SpaceNet and DeepGlobe, respectively.

author = {Batra, Anil and Singh, Suriya and Pang, Guan and Basu, Saikat and Jawahar, C.V. and Paluri, Manohar},
title = {Improved Road Connectivity by Joint Learning of Orientation and Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
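
As a rough illustration of what an orientation target for a traced road could look like, the sketch below quantises the direction of each segment of a road polyline into angular bins. It is only a sketch under stated assumptions, not the paper's implementation: the 10-degree bin width, the (x, y) point format, and the names `orientation_bin` and `polyline_bins` are all hypothetical.

```python
import math


def orientation_bin(p, q, bin_deg=10):
    """Quantise the direction from point p to point q into an angular bin.

    p and q are (x, y) points; the angle is measured in degrees in [0, 360).
    bin_deg is an assumed bin width, chosen here only for illustration.
    """
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0
    return int(angle // bin_deg)


def polyline_bins(points, bin_deg=10):
    """Orientation bin of every consecutive segment along a traced road."""
    return [orientation_bin(a, b, bin_deg) for a, b in zip(points, points[1:])]


# A road traced to the right, then diagonally, then diagonally back.
print(polyline_bins([(0, 0), (5, 0), (8, 3), (6, 1)]))
```

Each pixel along a segment could then carry that segment's bin as a classification label next to the road/background label, which is one simple way to picture joint learning of orientation and segmentation.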


Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery
Suriya Singh*, Anil Batra*, Guan Pang, Lorenzo Torresani, Saikat Basu, C.V. Jawahar and Manohar Paluri (* equal contribution)
British Machine Vision Conference (BMVC), 2018

pdf suppl abstract bibtex

Overhead imageries play a crucial role in many applications such as urban planning, crop yield forecasting, mapping, and policy making. Semantic segmentation could enable automatic, efficient, and large-scale understanding of overhead imageries for these applications. However, semantic segmentation of overhead imageries is a challenging task, primarily due to the large domain gap from existing research in ground imageries, unavailability of large-scale dataset with pixel-level annotations, and inherent complexity in the task. Readily available vast amount of unlabeled overhead imageries share more common structures and patterns compared to the ground imageries, therefore, its large-scale analysis could benefit from unsupervised feature learning techniques.
In this work, we study various self-supervised feature learning techniques for semantic segmentation of overhead imageries. We choose image semantic inpainting as a self-supervised task for our experiments due to its proximity to the semantic segmentation task. We (i) show that existing approaches are inefficient for semantic segmentation, (ii) propose architectural changes towards self-supervised learning for semantic segmentation, (iii) propose an adversarial training scheme for self-supervised learning by increasing the pretext task's difficulty gradually and show that it leads to learning better features, and (iv) propose a unified approach for overhead scene parsing, road network extraction, and land cover estimation. Our approach improves over training from scratch by more than 10% and ImageNet pre-trained network by more than 5% mIoU.

author = {Singh, Suriya and Batra, Anil and Pang, Guan and Torresani, Lorenzo and Basu, Saikat and Paluri, Manohar and Jawahar, C. V.},
title = {Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery},
booktitle = {BMVC},
year = {2018}


Multimodal Procedural Knowledge Learning using WikiHow articles
CDT-NLP Project (2020)

pdf abstract

Procedural learning with multi-modal 'how-to' articles is beneficial for giving AI systems the ability to perform goal-oriented tasks. Learning the temporal event structure in procedures from text-only datasets fails to capture the implicit information among events, e.g. the missing object of an action. We hypothesize that visual data is adequate to augment the missing information and extend the text based dataset (Zhang with visual data. Towards our goal, we study pairwise event ordering with architectures pre-trained on uni- and multi-modal data. Surprisingly, we find that joining the features from architectures pre-trained on uni-modal data (ResNet-50 + BERT) is superior to state-of-the-art multi-modal architectures (LXMERT and UNITER) for temporal structure learning. Furthermore, we enhance event relation learning with an attention mechanism. Our experiments on the extended pairwise step-order dataset show that our approach improves learning of the perfect order by 1.67% in comparison to text-only datasets.


Motion Field Estimation using MRF
DIP Course Project

pdf abstract

Study and analysis of edge flow in images to estimate the motion of different objects. In this course project, I used a discrete Markov Random Field (MRF) to extract the motion of each object from synthetic and natural image sequences.

Data Mining & Exploration

INFR11007: Data Mining and Exploration (Spring 2021)
Instructors: Dr. Michael Gutmann, Dr. Siddharth N.

Digital Image Processing

CS478: Digital Image Processing (Monsoon 2018)
Instructors: Dr. Ravi Kiran Sarvadevabhatla, Rajvi Shah


Mentor in 1st foundations course on Artificial Intelligence and Machine Learning
Instructor: Prof. C.V. Jawahar

Professional Activities
Selected Awards
  • Awarded funding for 4 years by School of Informatics and UKRI for Ph.D.
  • Facebook Travel Support to attend CVPR 2019
  • Qualified GATE (EC) with rank 254 in 2009.
  • Gold Medal in Electronics and Communication at RIMT-Institute of Engineering and Technology, affiliated to PTU-Jalandhar (2007)
