Computer Vision · Audio Processing · Team Project · 3rd Place, ACM Nexus 2026

Sentinel

Real-time piracy detection using dual-mode content fingerprinting. Video perceptual hashing combined with audio mel-spectrogram analysis achieves 95% accuracy with an enforcement window under 90 seconds.

By the Numbers

95% — Detection Accuracy
<90s — Enforcement Window
114 — Commits
3rd — Place — ACM Nexus

The Problem

Pirated content spreads faster than manual takedown processes can handle. Traditional detection relies on exact file matching or metadata comparison — both trivially defeated by re-encoding, cropping, or speed changes.

Sentinel uses perceptual fingerprinting: it identifies content by what it looks and sounds like, not by its file signature. Re-encode, crop, speed up — the fingerprint persists.

Architecture

The two detection modes run in parallel. The video pipeline extracts frames, computes a perceptual hash (pHash) for each, and compares the hashes against a fingerprint database using Hamming distance. The audio pipeline converts the soundtrack to mel-spectrograms and matches them against reference fingerprints by the similarity of their frequency patterns.
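
To make the video path concrete, here is a minimal NumPy-only sketch of pHash matching with a Hamming distance cut-off. It is an illustration under stated assumptions, not Sentinel's actual code: the function names, the 64-bit hash size, and the threshold of 10 bits are all hypothetical, and a real pipeline would likely read frames with OpenCV and resize them there instead of using the block-averaging helper below.

```python
import numpy as np

HASH_SIZE = 8          # 8 x 8 low-frequency block -> 64-bit hash (assumed size)
MATCH_THRESHOLD = 10   # assumed cut-off in bits, not taken from the project


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis


def shrink(gray, n):
    """Downsample a 2-D grayscale frame (at least n x n) by block averaging."""
    h, w = gray.shape
    bh, bw = h // n, w // n
    cropped = np.asarray(gray, dtype=np.float64)[: bh * n, : bw * n]
    return cropped.reshape(n, bh, n, bw).mean(axis=(1, 3))


def phash(gray, hash_size=HASH_SIZE, factor=4):
    """Return the perceptual hash of a frame as a flat boolean array."""
    n = hash_size * factor
    basis = dct_matrix(n)
    coeffs = basis @ shrink(gray, n) @ basis.T   # 2-D DCT of the small frame
    low = coeffs[:hash_size, :hash_size]         # keep low frequencies only
    return (low > np.median(low)).ravel()


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))


def best_match(query, database, threshold=MATCH_THRESHOLD):
    """Return (name, distance) of the closest stored hash, or None if too far."""
    name, ref = min(database.items(), key=lambda kv: hamming(query, kv[1]))
    dist = hamming(query, ref)
    return (name, dist) if dist <= threshold else None


# Synthetic stand-ins for decoded frames (hypothetical data).
rng = np.random.default_rng(0)
frame = np.kron(rng.uniform(0, 255, (32, 32)), np.ones((8, 8)))   # 256 x 256
reencoded = 0.9 * frame + 5 + rng.normal(0, 2, frame.shape)       # dimmer, noisy copy
other = np.kron(rng.uniform(0, 255, (32, 32)), np.ones((8, 8)))   # unrelated frame

database = {"reference_clip": phash(frame)}
print(best_match(phash(reencoded), database))
print(best_match(phash(other), database))
```

Because the hash keeps only whether each low-frequency DCT coefficient sits above the median, a uniform brightness change or light noise leaves most bits untouched, while an unrelated frame differs in roughly half of them.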

A fusion layer combines both signals — content must match on both channels to trigger enforcement, reducing false positives. The React dashboard provides real-time monitoring, and Groq handles natural language reporting.
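
The both-channels rule can be sketched as a simple AND gate over the two scores. This is a hedged illustration only: the `ChannelScores` type, the `should_enforce` function, and both threshold values are assumptions made for the example, and the project may combine the signals differently.

```python
from dataclasses import dataclass

VIDEO_MAX_HAMMING = 10        # assumed: max differing hash bits for a video match
AUDIO_MIN_SIMILARITY = 0.85   # assumed: min similarity score for an audio match


@dataclass(frozen=True)
class ChannelScores:
    """Per-channel results for one piece of candidate content."""
    video_hamming: int        # Hamming distance to the closest reference hash
    audio_similarity: float   # similarity to the closest reference audio


def should_enforce(scores,
                   video_max=VIDEO_MAX_HAMMING,
                   audio_min=AUDIO_MIN_SIMILARITY):
    """Trigger enforcement only when the video AND audio channels both match."""
    video_match = scores.video_hamming <= video_max
    audio_match = scores.audio_similarity >= audio_min
    return video_match and audio_match


# Hypothetical detections: only the first one matches on both channels.
print(should_enforce(ChannelScores(video_hamming=4, audio_similarity=0.93)))
print(should_enforce(ChannelScores(video_hamming=4, audio_similarity=0.40)))
print(should_enforce(ChannelScores(video_hamming=30, audio_similarity=0.93)))
```

Requiring agreement trades some recall for precision: a coincidental match on one channel alone never triggers an action.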

Key Features

Video pHash Detection: Perceptual hashing on extracted frames, matched against reference hashes with a Hamming distance threshold. Resistant to re-encoding, resolution changes, cropping, and minor edits.
Audio Mel-Spectrogram: Frequency-domain audio analysis via mel-spectrograms. Catches audio even when the video is altered. Handles speed changes and compression artifacts.
Dual-Mode Fusion: Both video and audio must match to trigger enforcement. Dramatically reduces false positives while maintaining a 95% true positive rate.
Real-Time Dashboard: React frontend showing detection events, confidence scores, and enforcement actions in real time. Visual fingerprint comparison for manual review.
<90 Second Response: From content upload to enforcement action in under 90 seconds. Fast enough for live streaming platforms and rapid-distribution channels.
AI-Powered Reporting: Groq generates a natural language report for each detection event, with a human-readable summary of what was detected, the confidence levels, and recommended actions.
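
For the audio feature above, the following is a minimal sketch of mel-spectrogram comparison written in plain NumPy so that it runs without extra dependencies. Everything here is an assumption made for illustration: the function names, the 40 mel bands, the 512-sample window, and the choice of Pearson correlation between time-averaged band profiles as the similarity score. In a librosa-based pipeline, `librosa.feature.melspectrogram` and `librosa.power_to_db` would typically replace the hand-rolled spectrogram. The sketch covers level changes and added noise only; it does not attempt speed-change handling.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1), rows summing to 1."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (fft_freqs - lo) / (mid - lo)
    falling = (hi - fft_freqs) / (hi - mid)
    weights = np.clip(np.minimum(rising, falling), 0.0, None)
    return weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)


def log_mel_spectrogram(y, sr, n_fft=512, hop=256, n_mels=40):
    """Log-power mel-spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel_power + 1e-10)


def band_profile(y, sr):
    """Average energy per mel band over time: a crude audio fingerprint."""
    return log_mel_spectrogram(y, sr).mean(axis=1)


def profile_similarity(a, b):
    """Pearson correlation between two band profiles, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


# Synthetic stand-ins for decoded audio tracks (hypothetical data).
sr = 16000
t = np.arange(2 * sr) / sr
rng = np.random.default_rng(0)


def tones(freqs):
    return sum(np.sin(2 * np.pi * f * t) for f in freqs)


original = tones([440, 880]) + 0.01 * rng.standard_normal(t.size)
variant = 0.5 * original + 0.01 * rng.standard_normal(t.size)     # quieter, noisier
unrelated = tones([3000, 5000]) + 0.01 * rng.standard_normal(t.size)

ref = band_profile(original, sr)
print(profile_similarity(ref, band_profile(variant, sr)))
print(profile_similarity(ref, band_profile(unrelated, sr)))
```

Averaging over time discards ordering, so this profile is far weaker than a frame-by-frame comparison; it is kept here only because it makes the idea of matching on frequency patterns easy to see.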

Tech Stack

Python · Flask · OpenCV · librosa · React · Groq · NumPy · FFmpeg