We show example annotated bounding boxes for each model (labeled on the right). Bounding boxes are drawn in red, with their artifact category and user-annotated description shown below the frames; video quality (“Overall”) and video-prompt alignment (“Prompt”) ratings are shown to the left. The radar plots summarize, for each artifact category, the category count, the average video-prompt alignment, and the average video quality rating conditioned on the user-selected artifact categories. Statistics are reported for the three models in our dataset: Sora, VideoCrafter2, and Pika. See additional examples in the Appendix.
Abstract
Videos generated by current state-of-the-art generative models contain undesirable artifacts. We introduce GeneVA, the first large-scale dataset of human-annotated artifact bounding boxes in AI-generated videos. The dataset consists of 16,356 AI-generated videos, each labeled by a human annotator with per-frame artifact bounding boxes, artifact category labels and descriptions, and video quality ratings. To build it, we developed a custom data collection pipeline in Prolific and defined a novel taxonomy of spatio-temporal artifacts in AI-generated videos. The videos are drawn from the VidProM [41] dataset, and the text prompts from that dataset were also used to generate an additional subset of videos with Sora. We further train an artifact detector and caption generator that combines a pre-trained image-based model with a custom temporal fusion module. The dataset can be found at dummylink.com. We hope that datasets like GeneVA will encourage improvements in artifact detection in AI-generated video, towards applications such as deepfake detection.
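To make the described annotation structure concrete, the sketch below shows one plausible per-video record layout, assuming illustrative field and class names (e.g. `VideoAnnotation`, `bbox`, `overall_quality`); it is not the released schema, only a reading aid for the abstract's description of per-frame boxes, labels, descriptions, and quality ratings.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema sketch for a GeneVA annotation record.
# Field names are assumptions for illustration, not the dataset's actual format.

@dataclass
class ArtifactBox:
    frame_index: int        # frame on which the box is drawn
    bbox: List[float]       # [x_min, y_min, x_max, y_max] in pixel coordinates
    category: str           # artifact category from the taxonomy
    description: str        # free-form annotator description

@dataclass
class VideoAnnotation:
    video_id: str           # identifier of the generated video
    model: str              # e.g. "Sora", "VideoCrafter2", or "Pika"
    prompt: str             # text prompt used to generate the video
    overall_quality: int    # "Overall" video quality rating
    prompt_alignment: int   # "Prompt" video-prompt alignment rating
    artifacts: List[ArtifactBox] = field(default_factory=list)
```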