Video Understanding

  • OVIS
    OVIS is a large-scale dataset for occluded video instance segmentation. It contains 296k high-quality instance masks across 25 semantic categories, in scenes where heavy object occlusion is common.
    Project Page
    1st Occluded Video Instance Segmentation Challenge in ICCV 2021
    2nd Occluded Video Instance Segmentation Challenge in ECCV 2022

  • DanceTrack
    DanceTrack is a multi-human tracking dataset that emphasizes 1) uniform appearance: the people look highly similar and are almost indistinguishable, and 2) diverse motion: they move in complicated patterns and frequently exchange relative positions.
    Project Page
    1st Multiple People Tracking in Group Dance Challenge in ECCV 2022

  • MUSES
    MUSES is a large-scale video dataset designed to spur research on a new task called multi-shot temporal event localization. MUSES has 31,477 event instances across a total of 716 video hours. Its defining characteristic is frequent shot cuts, averaging 19 shots per instance and 176 shots per video, which induce large intra-instance variations.
    Project Page

  • YouMVOS
    YouMVOS is a dataset for multi-shot video object segmentation, consisting of 431K segmentation masks and 200 YouTube videos.
    Project Page


  • WarpDoc
    WarpDoc is a warped document image dataset for document restoration. It consists of 1,020 camera images of documents collected from scientific papers, magazines, envelopes, etc., covering a variety of paper materials, page layouts, and contents.
    Project Page