I am a PhD student supervised by Prof. John K. Tsotsos and a member of the Lab for Active and Attentive Vision at York University.

I study human visual attention and its applications in saliency and human gaze prediction. One aspect of this is integrating attention and vision with other cognitive abilities in AI systems, e.g., understanding driver-pedestrian interaction for safer autonomous driving. In addition, I work on analyzing gaps between human and machine performance in saliency and object detection.

Recent Work

Driving and Attention

A survey and curated database of over 400 papers (published since 2010) on various aspects of attention during driving, focusing on studies where human gaze was recorded and analyzed. We first give an overview of human gaze, the pros and cons of using it as a proxy for attention, and procedures for recording gaze data. We then review behavioral research on drivers' attention and identify multiple factors, external and internal, that affect gaze allocation. The second half of the report is dedicated to analytical models of attention and practical solutions that rely on drivers' gaze.
[Survey][Report][Database]

Pedestrian Action Benchmark

This is the first benchmark for pedestrian action prediction algorithms: it ranks a number of baselines and state-of-the-art approaches on two public datasets for studying pedestrian behavior in traffic, JAAD and PIE. We analyze the performance of the evaluated models with respect to various properties of the data and, based on this analysis, propose a new model, PCPA, which combines explicit and implicit features via temporal and modality attention mechanisms and achieves state-of-the-art results.
[Paper][Code]
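
To illustrate the modality attention idea, below is a minimal sketch of attention-weighted fusion over per-modality embeddings. All names, shapes, and dimensions here are assumptions for illustration; the released code contains the actual PCPA architecture.

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Toy sketch: attention-weighted fusion of per-modality embeddings."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Scores each modality embedding; a softmax over modalities lets
        # fusion emphasize the most informative source per sample.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, feat_dim), one embedding per
        # modality after per-modality temporal encoding.
        weights = torch.softmax(self.score(feats), dim=1)  # (B, M, 1)
        return (weights * feats).sum(dim=1)                # (B, feat_dim)

# Example: fuse three 128-d modality embeddings into one vector.
fusion = ModalityAttentionFusion(feat_dim=128)
fused = fusion(torch.randn(4, 3, 128))  # -> (4, 128)
```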

Pedestrian Intention Estimation

Pedestrian intention is an early predictor of future action. For example, a pedestrian standing at an intersection may be waiting for a safe moment to cross, or may instead be engaged in conversation or waiting for a cab. To study these behaviors, we collected a publicly available dataset covering a variety of traffic scenarios and conducted a large-scale human experiment to gather reference data on pedestrian intention. We also show that trajectory estimation can be improved by adding pedestrian intention to other context features.
[PIE Dataset][Paper][Code]
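
As a toy illustration of feeding intention into trajectory prediction, the sketch below conditions a simple GRU-based predictor on an intention probability alongside other context features. Module names and dimensions are hypothetical and are not the PIE models.

```python
import torch
import torch.nn as nn

class TrajectoryWithIntention(nn.Module):
    """Sketch: condition a trajectory decoder on an intention estimate."""

    def __init__(self, traj_dim=4, ctx_dim=32, hidden=64):
        super().__init__()
        # +1 input feature: scalar probability that the pedestrian intends
        # to cross, estimated by a separate intention model.
        self.rnn = nn.GRU(traj_dim + ctx_dim + 1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, traj_dim)

    def forward(self, past_traj, context, intention_prob):
        # past_traj: (B, T, 4) bounding boxes; context: (B, T, 32);
        # intention_prob: (B, 1), broadcast over the observed time steps.
        p = intention_prob[:, None, :].expand(-1, past_traj.size(1), -1)
        h, _ = self.rnn(torch.cat([past_traj, context, p], dim=-1))
        return self.out(h[:, -1])  # predict the next bounding box

# Example with a 15-step observation window.
model = TrajectoryWithIntention()
next_box = model(torch.randn(2, 15, 4), torch.randn(2, 15, 32),
                 torch.rand(2, 1))  # -> (2, 4)
```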

Driver-Pedestrian Interaction

Many challenges still have to be resolved before autonomous vehicles can be safely deployed. One of them is interaction with vulnerable road users, such as pedestrians, particularly at the point of crossing. We collected a large-scale naturalistic dataset and used it to conduct a number of studies investigating how human drivers interact with pedestrians and resolve potential road conflicts.
[JAAD Dataset][Paper]

Pedestrian Action Anticipation

We propose SF-GRU, a novel stacked RNN architecture for pedestrian crossing action prediction, in which multimodal sources of information are gradually fused at different levels of processing. We show how the length of observation, the time to event, and the type and order of fusion affect the performance of the model.
[Paper][Code]
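
A minimal sketch of the gradual-fusion idea is given below: each GRU level receives the hidden states of the level beneath it concatenated with one new modality. The dimensions and the classification head are assumptions for illustration, not the published SF-GRU implementation.

```python
import torch
import torch.nn as nn

class StackedFusionGRU(nn.Module):
    """Stacked GRUs with gradual fusion: each level receives the previous
    level's hidden states concatenated with one new modality."""

    def __init__(self, modality_dims, hidden=64):
        super().__init__()
        # The first level sees only its modality; later levels also see
        # the hidden states (width `hidden`) of the level below.
        extra = [0] + [hidden] * (len(modality_dims) - 1)
        self.levels = nn.ModuleList(
            nn.GRU(m + e, hidden, batch_first=True)
            for m, e in zip(modality_dims, extra)
        )
        self.classifier = nn.Linear(hidden, 1)  # crossing vs. not crossing

    def forward(self, modalities):
        # modalities: list of (batch, time, dim_i) tensors, one per level.
        h = None
        for gru, x in zip(self.levels, modalities):
            inp = x if h is None else torch.cat([h, x], dim=-1)
            h, _ = gru(inp)
        return torch.sigmoid(self.classifier(h[:, -1]))

# Example with three toy modalities observed for 15 time steps.
model = StackedFusionGRU(modality_dims=[4, 36, 512])
prob = model([torch.randn(2, 15, d) for d in (4, 36, 512)])  # -> (2, 1)
```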

40 Years of Cognitive Architectures

We summarized and catalogued approaches to cognitive architecture design from multiple disciplines, spanning engineering to neuroscience. The result is a large-scale survey of 84 cognitive architectures developed over the past 40 years. We also assessed the practical viability of these approaches by aggregating information on over 900 practical applications implemented using the cognitive architectures on our list.
[Project page][Paper][Data]

Do Saliency Models Detect Odd-One-Out Targets?

We evaluate the behavior of 20 saliency models on two new datasets: synthetic psychophysical images (P3) and natural odd-one-out images (O3). We show that the majority of the models cannot discriminate targets that differ from surrounding items in color, orientation, or size, the features that most strongly guide human attention.
[P3 and O3 datasets][Paper][Supplementary Material][Code]
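
For intuition about the stimuli involved, the toy snippet below draws a simple color-singleton search array, loosely in the spirit of the P3 images. It is purely illustrative and is not the generator used to build the datasets.

```python
import numpy as np
from PIL import Image, ImageDraw

def color_singleton(grid=5, cell=60, seed=0):
    """Draw a grid of green disks with one red odd-one-out target."""
    rng = np.random.default_rng(seed)
    target = (int(rng.integers(grid)), int(rng.integers(grid)))
    img = Image.new("RGB", (grid * cell, grid * cell), "white")
    draw = ImageDraw.Draw(img)
    for r in range(grid):
        for c in range(grid):
            fill = "red" if (r, c) == target else "green"
            x0, y0 = c * cell + cell // 4, r * cell + cell // 4
            draw.ellipse([x0, y0, x0 + cell // 2, y0 + cell // 2], fill=fill)
    return img, target  # stimulus and the target's grid location

img, loc = color_singleton()
img.save("singleton.png")  # feed to a saliency model; check its map at `loc`
```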

Saccade Generator for Static Images

Modeling human fixations on images is a large subfield of visual attention. Many computational saliency models focus on generating saliency maps, which highlight areas that are more likely to attract human attention. However, most practical applications require a sequence of fixations rather than areas of interest. To address this, we developed a flexible and customizable framework for direct saccade sequence generation.
[Code][Paper]
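
A common baseline for turning a saliency map into a fixation sequence is winner-take-all selection with inhibition of return: repeatedly pick the map's maximum and suppress a region around it. The sketch below shows this generic strategy under assumed parameters; it is not the framework itself, which is considerably more flexible.

```python
import numpy as np

def saccade_sequence(saliency, n_fixations=5, ior_radius=30):
    """Winner-take-all fixation selection with inhibition of return."""
    sal = saliency.astype(float).copy()
    ys, xs = np.mgrid[:sal.shape[0], :sal.shape[1]]
    fixations = []
    for _ in range(n_fixations):
        # Select the most salient location as the next fixation.
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: zero out the just-attended region so the
        # next fixation moves elsewhere.
        sal[(ys - y) ** 2 + (xs - x) ** 2 <= ior_radius ** 2] = 0.0
    return fixations

# Example on a random 240x320 saliency map.
print(saccade_sequence(np.random.rand(240, 320)))
```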

Video Game Playing

For my M.Sc. thesis I developed an algorithm that plays browser video games of the endless runner genre, such as Canabalt and Robot Unicorn Attack, in real time using only visual input. The goal of the project was to implement a set of attentional mechanisms and a visual processing pipeline to test the biologically inspired concept of Cognitive Programs on a complex dynamic visual task.
[Paper]

Publications

Below is a list of selected publications. Please see my Google Scholar profile for the full list.

2023

  • A. Rasouli, I. Kotseruba, “PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning”, in International Conference on Robotics and Automation (ICRA), 2023.
  • I. Kotseruba, A. Rasouli, J. K. Tsotsos, “Intend-Wait-Perceive-Cross: Exploring the Effects of Perceptual Limitations on Pedestrian Decision-Making”, in Intelligent Vehicles Symposium, 2023 (oral).

2022

  • I. Kotseruba, J. K. Tsotsos, “Attention for Vision-Based Assistive and Automated Driving: A Review of Algorithms and Datasets”, IEEE Transactions on Intelligent Transportation Systems, 2022.
  • I. Kotseruba, A. Rasouli, “Intend-Wait-Cross: Towards Modeling Realistic Pedestrian Crossing Behavior”, in Intelligent Vehicles Symposium, 2022.

2021

  • I. Kotseruba, J. K. Tsotsos, “Behavioral Research and Practical Models of Drivers' Attention”, arXiv:2104.05677, 2021.
  • I. Kotseruba, A. Rasouli, J. K. Tsotsos, “Benchmark for Evaluating Pedestrian Action Prediction”, in Winter Conference on Applications of Computer Vision (WACV), 2021.

2020

  • I. Kotseruba, A. Rasouli, J. K. Tsotsos, “Do They Want to Cross? Understanding Pedestrian Intention for Behavior Prediction”, in Intelligent Vehicles Symposium (IV), 2020.

2019

  • A. Rasouli, I. Kotseruba, T. Kunic, J. K. Tsotsos, “PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction”, in International Conference on Computer Vision (ICCV), 2019 (oral).
  • I. Kotseruba, C. Wloka, A. Rasouli, J. K. Tsotsos, “Do Saliency Models Detect Odd-One-Out Targets? New Datasets and Evaluations”, in British Machine Vision Conference (BMVC), 2019 (oral).
  • A. Rasouli, I. Kotseruba, J. K. Tsotsos, “Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs”, in British Machine Vision Conference (BMVC), 2019.
  • J. K. Tsotsos, I. Kotseruba, A. Andreopoulos, Y. Wu, “Why Data-Driven Beats Theory-Driven Computer Vision”, in International Conference on Computer Vision (ICCV) Workshops, 2019.
  • J. K. Tsotsos, I. Kotseruba, C. Wloka, “Early Salient Region Selection Does Not Drive Rapid Visual Categorization”, PLOS ONE, 14(10): e0224306, 2019.

2018

  • C. Wloka, T. Kunic, I. Kotseruba, R. Fahimi, N. Frosst, N. Bruce, J. K. Tsotsos, “SMILER: Saliency Model Implementation Library for Experimental Research”, arXiv:1812.08848, 2018.
  • I. Kotseruba, J. K. Tsotsos, “40 Years of Cognitive Architectures: Core Cognitive Abilities and Applications”, Artificial Intelligence Review, 2018.
  • A. Rasouli, I. Kotseruba, J. K. Tsotsos, “It's Not All About Size: On the Role of Data Properties in Pedestrian Detection”, in European Conference on Computer Vision (ECCV) Workshop, pp. 210-225, 2018.
  • A. Rasouli, I. Kotseruba, J. K. Tsotsos, “Towards Social Autonomous Vehicles: Understanding Pedestrian-Driver Interactions”, in International Conference on Intelligent Transportation Systems (ITSC), pp. 729-734, 2018.
  • C. Wloka, I. Kotseruba, J. K. Tsotsos, “Active Fixation Control to Predict Saccade Sequences”, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • J. K. Tsotsos, I. Kotseruba, A. Rasouli, M. D. Solbach, “Visual Attention and Its Intimate Links to Spatial Cognition”, Cognitive Processing, pp. 1–10, 2018.
  • A. Rasouli, I. Kotseruba, J. K. Tsotsos, “Understanding Pedestrian Behavior in Complex Traffic Scenes”, IEEE Transactions on Intelligent Vehicles, vol. 3, no. 1, pp. 61–70, 2018.

2017

  • A. Rasouli, I. Kotseruba, J. K. Tsotsos, “Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior”, in International Conference on Computer Vision (ICCV) Workshop, 2017, pp. 206–213.
  • A. Rasouli, I. Kotseruba, J. K. Tsotsos, “Agreeing to Cross: How Drivers and Pedestrians Communicate”, in Intelligent Vehicles Symposium (IV), 2017, pp. 264–269.

2016

  • I. Kotseruba, A. Rasouli, J. K. Tsotsos, “Joint Attention in Autonomous Driving (JAAD)”, arXiv:1609.04741, 2016.
  • J. K. Tsotsos, I. Kotseruba, C. Wloka, “A Focus on Selection for Fixation”, Journal of Eye Movement Research, vol. 9, no. 5, pp. 1-34, 2016.
  • I. Kotseruba, “Visual Attention in Dynamic Environments and Its Application to Playing Online Games”, M.Sc. thesis, York University, 2016.

2012

  • I. Kotseruba, C. A. Cumbaa, I. Jurisica, “High-Throughput Protein Crystallization on the World Community Grid and the GPU”, Journal of Physics: Conference Series, vol. 341, no. 1, pp. 12-27, 2012.
Datasets

Pedestrian Intention Estimation (PIE) Dataset

A collection of videos recorded in Toronto, Canada, with extensive spatial and behavioral annotations to study pedestrian intention estimation and crossing behavior.

Joint Attention in Autonomous Driving (JAAD)

A collection of 346 clips recorded in Canada and Eastern Europe to study driver-pedestrian interaction and pedestrian behavior before crossing the street.

Psychophysical Patterns (P3) and Odd-One-Out (O3) Datasets

Annotated datasets of synthetic psychophysical patterns (P3) and natural odd-one-out images (O3) for testing properties of saliency models with respect to features that guide human attention.

CV

Follow this link to my CV.