Abstract
Real-world visual data rarely presents as isolated, static instances. Instead, it often evolves gradually over time through variations in pose, lighting, object state, or scene context. However, conventional classifiers are typically trained under the assumption of temporal independence, limiting their ability to capture such dynamics. We propose a simple yet effective framework that equips standard feedforward classifiers with temporal reasoning, all without modifying model architectures or introducing recurrent modules. At the heart of our approach is a novel Support-Exemplar-Query (SEQ) learning paradigm, which structures training data into temporally coherent trajectories. These trajectories enable the model to learn class-specific temporal prototypes and align prediction sequences via a differentiable soft-DTW loss. A multi-term objective further promotes semantic consistency and temporal smoothness. By interpreting input sequences as evolving feature trajectories, our method introduces a strong temporal inductive bias through loss design alone. This proves highly effective in both static and temporal tasks: it enhances performance on fine-grained and ultra-fine-grained image classification, and delivers precise, temporally consistent predictions in video anomaly detection. Despite its simplicity, our approach bridges static and temporal learning in a modular and data-efficient manner, requiring only a simple classifier on top of pre-extracted features.
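For readers unfamiliar with the alignment loss named above, the following is a minimal NumPy sketch of the standard soft-DTW recursion (Cuturi & Blondel, 2017) between two feature trajectories. It is an illustration of the generic soft-DTW divergence only, not the paper's full multi-term objective; the sequence shapes, squared-Euclidean cost, and `gamma` smoothing value are assumptions for this example.

```python
import numpy as np

def softmin(a, b, c, gamma):
    # Smooth minimum: -gamma * log(sum_i exp(-x_i / gamma)),
    # computed with the max-shift trick for numerical stability.
    vals = np.array([a, b, c]) / -gamma
    m = vals.max()
    return -gamma * (m + np.log(np.exp(vals - m).sum()))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between trajectories x (n, d) and y (m, d).

    Replaces the hard min in classic DTW with a differentiable
    soft minimum, so the alignment cost can serve as a training loss.
    """
    n, m = len(x), len(y)
    # Pairwise squared-Euclidean cost matrix between the two sequences.
    D = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Accumulate cost along the best (softly chosen) warping path.
            R[i, j] = D[i - 1, j - 1] + softmin(
                R[i, j - 1], R[i - 1, j], R[i - 1, j - 1], gamma
            )
    return R[n, m]

# Identical trajectories align at near-zero cost; dissimilar ones do not.
x = np.array([[0.0], [1.0], [2.0]])
print(soft_dtw(x, x, gamma=0.01))        # near 0
print(soft_dtw(x, x + 5.0, gamma=0.01))  # large positive cost
```

In practice one would use a batched, autodiff-ready implementation (e.g. a PyTorch soft-DTW) so gradients flow back into the classifier producing the prediction sequences.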
Additional Results on Video Anomaly Detection
Anomaly prediction comparison. Grey regions indicate ground-truth anomalies; the blue and red curves show the baseline and our method, respectively. Our approach detects anomalies more accurately and earlier, with scores crossing the 0.5 threshold in closer alignment with the ground truth.
Poster
BibTeX
@inproceedings{ding2026learning,
title={Learning Time in Static Classifiers},
author={Ding, Xi and Wang, Lei and Koniusz, Piotr and Gao, Yongsheng},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}
Acknowledgement
Xi Ding, a visiting scholar at the ARC Research Hub for Driving Farming Productivity and Disease Prevention, Griffith University, conducted this work under the supervision of Lei Wang.
We sincerely thank the anonymous reviewers for their constructive feedback, which has greatly helped improve this work.
This work was supported by the Australian Research Council (ARC) under Industrial Transformation Research Hub Grant IH180100002.
This work was also supported by computational resources provided by the Australian Government through the National Computational Infrastructure (NCI) under both the ANU Merit Allocation Scheme and the CSIRO Allocation Scheme.