Computer Vision | Audio Processing | Team Project | 3rd Place, ACM Nexus 2026

Sentinel

Real-time piracy detection using dual-mode content fingerprinting. Video perceptual hashing combined with audio mel-spectrogram analysis achieves 95% detection accuracy, with an enforcement window under 90 seconds.

95% Detection Accuracy
<90s Enforcement Window
114 Commits
3rd Place — ACM Nexus

The Problem

Pirated content spreads faster than manual takedown processes can handle. Traditional detection relies on exact file matching or metadata comparison — both trivially defeated by re-encoding, cropping, or speed changes.

Sentinel uses perceptual fingerprinting: it identifies content by what it looks and sounds like, not by its file signature. Re-encode, crop, speed up — the fingerprint persists.

Architecture

Dual-mode detection runs in parallel. The video pipeline extracts frames, computes a perceptual hash (pHash) for each one, and compares the hashes against a fingerprint database using Hamming distance. The audio pipeline converts audio to mel-spectrograms and runs similarity matching on frequency patterns.

A fusion layer combines both signals — content must match on both channels to trigger enforcement, reducing false positives. The React dashboard provides real-time monitoring, and Groq handles natural language reporting.

Key Features

Video pHash Detection

Perceptual hashing on extracted frames. Resistant to re-encoding, resolution changes, cropping, and minor edits. Hamming distance threshold matching.
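As a rough illustration of the idea, the sketch below computes a DCT-based perceptual hash for one grayscale frame and compares two hashes by Hamming distance. It is a NumPy-only approximation written for this page, not Sentinel's actual code: the real pipeline presumably uses OpenCV for frame extraction and resizing, and the function names, the 8x8 hash size, and the threshold of 10 bits are all assumptions.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def shrink(gray: np.ndarray, size: int) -> np.ndarray:
    """Downscale a 2-D grayscale frame to (size, size) by block averaging."""
    h, w = gray.shape
    bh, bw = h // size, w // size
    if bh == 0 or bw == 0:
        raise ValueError("frame is smaller than the target size")
    cropped = gray[: bh * size, : bw * size].astype(np.float64)
    return cropped.reshape(size, bh, size, bw).mean(axis=(1, 3))


def phash(gray: np.ndarray, hash_size: int = 8, factor: int = 4) -> np.ndarray:
    """Return a boolean vector of hash_size**2 bits for one frame."""
    size = hash_size * factor              # 32x32 working resolution by default
    small = shrink(gray, size)
    d = dct_matrix(size)
    coeffs = d @ small @ d.T               # 2-D DCT of the shrunken frame
    low = coeffs[:hash_size, :hash_size]   # keep only the low frequencies
    return (low > np.median(low)).flatten()


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of bit positions where two hashes differ."""
    return int(np.count_nonzero(a != b))


def is_match(a: np.ndarray, b: np.ndarray, max_distance: int = 10) -> bool:
    """Threshold test; max_distance=10 is an assumed value, not Sentinel's."""
    return hamming_distance(a, b) <= max_distance
```

Because the hash keeps only low-frequency structure and compares coefficients to their median, uniform brightness or contrast shifts and mild re-encoding noise tend to flip few bits, while unrelated frames differ in roughly half of them.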

Audio Mel-Spectrogram

Frequency-domain audio analysis via mel-spectrograms. Catches matching audio even when the video is altered. Handles speed changes and compression artifacts.
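The following sketch shows one way such a comparison could work: build a log-mel spectrogram for each clip and score the pair with a mean-centered cosine similarity. It is a NumPy-only stand-in so the steps are visible; the project itself lists librosa, whose `librosa.feature.melspectrogram` would normally do the spectrogram step. All function names and parameter values here are assumptions, and this frame-aligned comparison does not attempt the speed-change handling described above.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(y: np.ndarray, sr: int, n_fft: int = 1024,
                        hop: int = 512, n_mels: int = 40) -> np.ndarray:
    """Log-scaled mel power spectrogram, shape (n_frames, n_mels)."""
    if len(y) < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log10(mel + 1e-10)  # small floor avoids log(0) in silent bins


def spectrogram_similarity(spec_a: np.ndarray, spec_b: np.ndarray) -> float:
    """Mean-centered cosine similarity over the overlapping frames, in [-1, 1]."""
    n = min(len(spec_a), len(spec_b))
    a = spec_a[:n] - spec_a[:n].mean()
    b = spec_b[:n] - spec_b[:n].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Mean-centering the log spectrogram removes a uniform gain change, so a quieter copy of the same clip still scores close to 1, while clips with energy in different frequency bands score much lower.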

Dual-Mode Fusion

Both video and audio must match to trigger enforcement. Dramatically reduces false positives while maintaining a 95% true positive rate.
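The both-must-match rule can be sketched as a simple AND over two threshold tests. The class name, function name, and threshold values below are hypothetical placeholders chosen for illustration; the source does not state the actual thresholds or how the two scores are represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionThresholds:
    # Both values are assumed for illustration, not taken from Sentinel.
    max_video_distance: int = 10        # max pHash Hamming distance for a video match
    min_audio_similarity: float = 0.85  # min spectrogram similarity for an audio match


def should_enforce(video_distance: int, audio_similarity: float,
                   thresholds: FusionThresholds = FusionThresholds()) -> bool:
    """Trigger enforcement only when the video AND audio channels both match."""
    video_match = video_distance <= thresholds.max_video_distance
    audio_match = audio_similarity >= thresholds.min_audio_similarity
    return video_match and audio_match
```

Requiring agreement between two independent channels trades a little recall for precision: a chance collision on one channel is not enough to trigger enforcement.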

Real-Time Dashboard

React frontend showing detection events, confidence scores, and enforcement actions in real time. Visual fingerprint comparison for manual review.

<90 Second Response

From content upload to enforcement action in under 90 seconds. Fast enough for live streaming platforms and rapid-distribution channels.

AI-Powered Reporting

Groq generates natural language reports for each detection event — human-readable summaries of what was detected, confidence levels, and recommended actions.

Tech Stack

Python, Flask, OpenCV, librosa, React, Groq, NumPy, FFmpeg