Towards Predicting Any Human Trajectory In Context

Keio University¹, NVIDIA²

Banner Image

We introduce TrajICL, an In-Context Learning (ICL) framework for pedestrian trajectory prediction that enables rapid adaptation without fine-tuning on scenario-specific data.

Abstract

Accurately predicting the future trajectories of pedestrians is essential for autonomous systems, but it remains challenging because models must adapt across diverse environments and domains. A common approach is to collect scenario-specific data and fine-tune via backpropagation; however, this process is often impractical on edge devices due to constrained computational resources. To address this challenge, we introduce TrajICL, an In-Context Learning (ICL) framework for pedestrian trajectory prediction that enables rapid adaptation without fine-tuning on scenario-specific data. We propose a spatio-temporal similarity-based example selection (STES) method that selects relevant examples from previously observed trajectories within the same scene by identifying similar motion patterns at corresponding locations. To further refine this selection, we introduce prediction-guided example selection (PG-ES), which selects examples based on both the past trajectory and the predicted future trajectory, rather than relying solely on the past trajectory. This allows the model to account for long-term dynamics when selecting examples. Finally, instead of relying on small real-world datasets with limited scenario diversity, we train our model on a large-scale synthetic dataset, strengthening its ability to exploit in-context examples at prediction time. Extensive experiments demonstrate that TrajICL achieves strong adaptation in both in-domain and cross-domain settings, outperforming even fine-tuned approaches across multiple public benchmarks.
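To make the STES idea concrete, the following is a minimal sketch of spatio-temporal similarity-based example selection. It is illustrative only: the function name `stes_select`, the weighting parameter `alpha`, and the specific distance measures (mean point-wise location distance plus mean per-step displacement distance) are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def stes_select(query_past, candidates, k=3, alpha=0.5):
    """Illustrative spatio-temporal example selection (not the official code).

    query_past : (T, 2) array of past (x, y) positions of the query pedestrian.
    candidates : list of (T, 2) arrays of previously observed past trajectories
                 from the same scene.
    Scores each candidate by a weighted sum of a spatial term (are the
    trajectories at similar locations?) and a motion term (do they move with
    similar per-step displacements?), then returns the indices of the k
    lowest-scoring (most similar) examples.
    """
    q_motion = np.diff(query_past, axis=0)  # per-step displacement vectors
    scores = []
    for cand in candidates:
        spatial = np.linalg.norm(query_past - cand, axis=1).mean()
        motion = np.linalg.norm(q_motion - np.diff(cand, axis=0), axis=1).mean()
        scores.append(alpha * spatial + (1 - alpha) * motion)
    return np.argsort(scores)[:k]
```

A candidate that walks the same way through the same part of the scene scores low on both terms and is picked first; `alpha` trades off location against motion-pattern similarity.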

Approach

(Figure)
An illustration of our TrajICL framework. (a) The overall architecture includes an embedding layer, a trajectory encoder, an in-context-aware trajectory predictor, and a multi-modal decoder. (b) Rather than relying solely on past trajectories for example selection, we introduce prediction-guided example selection, which leverages both past and predicted future trajectories to identify more relevant examples.
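The two-stage selection in (b) can be sketched as follows. This is a hedged illustration under our own assumptions: `predict_fn` stands in for the in-context trajectory predictor, the distance is a simple mean point-wise distance, and the names `pg_es`, `example_pasts`, and `example_futures` are hypothetical.

```python
import numpy as np

def pg_es(query_past, example_pasts, example_futures, predict_fn, k=3):
    """Illustrative prediction-guided example selection (not the official code).

    1) Select initial examples by past-trajectory similarity.
    2) Predict a future trajectory using those examples.
    3) Re-rank examples by comparing the full (past + predicted future)
       query trajectory against each example's full (past + future) one,
       so selection also reflects long-term dynamics.
    """
    def dist(a, b):
        return np.linalg.norm(a - b, axis=1).mean()

    # Stage 1: past-only selection.
    init = np.argsort([dist(query_past, p) for p in example_pasts])[:k]
    # Stage 2: predict with the initial examples, then re-rank with the
    # predicted future appended to the query's past trajectory.
    pred_future = predict_fn(
        query_past, [(example_pasts[i], example_futures[i]) for i in init]
    )
    query_full = np.concatenate([query_past, pred_future], axis=0)
    fulls = [np.concatenate([p, f], axis=0)
             for p, f in zip(example_pasts, example_futures)]
    return np.argsort([dist(query_full, t) for t in fulls])[:k]
```

Two examples with identical pasts but diverging futures are indistinguishable in stage 1; the predicted future in stage 2 breaks the tie in favor of the example whose long-term motion matches.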

Qualitative Results

(Figure)
Qualitative results on MotSynth, JRDB, WildTrack, and SDD.

Proposed Example Selection Method

(Figure)
Qualitative comparison between random example selection and our proposed PG-STES.