AI-Powered Video Interviewing & Candidate Analysis Platform
We built an end-to-end video interviewing platform with real-time speech-to-text transcription (IBM Watson), automated resume parsing, and semantic search, so recruiters can find key candidate responses in seconds instead of watching hours of unstructured video.
The Problem
Recruiters drowning in hours of unstructured video content
An HR technology company wanted to move beyond basic video recording and playback. Their existing platform let recruiters conduct video interviews, but finding specific information required watching entire recordings — sometimes hours of footage across multiple candidates.
The pain points:
- Linear review process: recruiters had to watch full interviews to find relevant answers, with no way to skip to key moments
- Manual data entry: candidate information from resumes and LinkedIn profiles was entered by hand into the system
- Keyword-only search: existing search could find exact words but missed conceptual matches (searching “leadership experience” wouldn’t find a candidate discussing “managing a team of 12”)
- No cross-interview comparison: comparing how different candidates answered the same question required watching each interview separately and taking manual notes
- Scalability concerns: the platform needed to support high-volume concurrent interviews for enterprise clients
Our Approach
Three-pillar architecture: video, data ingestion, and AI pipeline
We designed the platform around three integrated pillars, each handling a distinct aspect of the interview workflow.
Pillar 1: Full-Stack Web Application
The core platform was built with Django (backend) and Vue.js (frontend), providing the recruiter dashboard, candidate management, interview scheduling, and review workflows. Real-time video streaming and recording used Kurento Media Server, handling WebRTC connections for live interviews and FFmpeg for post-processing recorded sessions.
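As one illustration of the post-processing step, a recorded session can be reduced to a mono audio track before it is handed to transcription. The sketch below builds the FFmpeg argument list and optionally runs it. The file names, the WAV output, and the 16 kHz sample rate are assumptions for illustration, not details taken from the project.

```python
import subprocess
from pathlib import Path


def build_audio_extract_cmd(recording: Path, out_dir: Path, sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that drops the video stream and writes mono audio."""
    out_file = out_dir / (recording.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(recording),      # input recording
        "-vn",                     # discard the video stream
        "-ac", "1",                # downmix to a single audio channel
        "-ar", str(sample_rate),   # resample to the requested rate (assumed 16 kHz)
        str(out_file),
    ]


def extract_audio(recording: Path, out_dir: Path) -> Path:
    """Run the command; requires the ffmpeg binary to be available on PATH."""
    cmd = build_audio_extract_cmd(recording, out_dir)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


# Hypothetical paths, shown only to illustrate the resulting command.
example_cmd = build_audio_extract_cmd(Path("interview_42.webm"), Path("audio"))
```

Keeping command construction separate from execution makes the step easy to unit-test without invoking FFmpeg.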
The multi-tenant architecture ensured enterprise clients had isolated data environments while sharing the same infrastructure for cost efficiency.
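The case study does not say how tenant isolation was implemented (separate schemas, row-level filtering, or something else). As a minimal sketch of one common option, row-level scoping, the in-memory store below forces every read through a tenant filter. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interview:
    tenant_id: str
    candidate: str


class TenantScopedStore:
    """In-memory stand-in for a shared table; every read is filtered by tenant."""

    def __init__(self) -> None:
        self._rows: list[Interview] = []

    def add(self, row: Interview) -> None:
        self._rows.append(row)

    def for_tenant(self, tenant_id: str) -> list[Interview]:
        # The filter lives inside the store, so callers cannot forget to apply it.
        return [r for r in self._rows if r.tenant_id == tenant_id]


store = TenantScopedStore()
store.add(Interview("acme", "A. Rivera"))
store.add(Interview("globex", "J. Chen"))
acme_candidates = [r.candidate for r in store.for_tenant("acme")]
```

Both tenants share one store, but a query for one tenant never sees the other tenant's rows.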
Pillar 2: Intelligent Data Ingestion
We automated candidate data entry through two integrations:
- Sovren CV Parsing: resumes uploaded in any format (PDF, DOCX, images) were automatically parsed into structured candidate profiles. Education, work history, skills, and contact information were extracted and populated without manual input.
- Custom LinkedIn Profile Parser: a purpose-built parser extracted structured data from LinkedIn profiles, normalizing job titles, company names, and skill endorsements into the candidate database schema.
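The normalization step described above can be sketched as follows. This is an illustration under stated assumptions: the input dictionary keys, the alias table, and the `CandidateProfile` fields are all hypothetical and do not reflect Sovren's actual output format or the project's real schema.

```python
import re
from dataclasses import dataclass, field

# Hypothetical alias table; a production mapping would be far larger.
TITLE_ALIASES = {
    "sr software engineer": "Senior Software Engineer",
    "sr swe": "Senior Software Engineer",
    "vp engineering": "VP of Engineering",
}


@dataclass
class CandidateProfile:
    full_name: str
    current_title: str
    skills: list[str] = field(default_factory=list)


def normalize_title(raw: str) -> str:
    """Strip punctuation and extra whitespace, then map known variants to one title."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    # Unknown titles pass through unchanged apart from trimming.
    return TITLE_ALIASES.get(key, raw.strip())


def to_profile(parsed: dict) -> CandidateProfile:
    """Map a parser's output (assumed keys: name, title, skills) onto the schema."""
    skills = sorted({s.strip().lower() for s in parsed.get("skills", []) if s.strip()})
    return CandidateProfile(
        full_name=parsed["name"].strip(),
        current_title=normalize_title(parsed.get("title", "")),
        skills=skills,
    )


profile = to_profile(
    {"name": " Dana Lee ", "title": "VP, Engineering", "skills": ["Python", "python ", "SQL"]}
)
```

Routing both the resume parser and the LinkedIn parser through one function like `to_profile` keeps the two sources consistent in the candidate database.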
Pillar 3: AI-Powered Search & Transcription
The competitive advantage of the platform was its AI pipeline:
- IBM Watson Speech-to-Text provided real-time transcription of video interviews. As candidates spoke, their words were converted to searchable text with timestamps, enabling precise navigation within recordings.
- Custom NLP Semantic Search Model went beyond keyword matching. We trained a model that understood contextual relationships — so searching for “budget management” would surface responses where candidates discussed “controlling a $2M annual spend” even without using the exact search terms. This model indexed both transcribed interview content and parsed resume data.
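To make the idea concrete, here is a toy sketch of concept-level search over timestamped transcript segments. The trained model is not described in the source, so a tiny hand-written concept lexicon stands in for it here. The lexicon, the sample segments, and all names are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Toy lexicon standing in for a trained embedding model (assumption).
CONCEPTS = {
    "budget": "finance", "spend": "finance", "cost": "finance",
    "leadership": "leadership", "managing": "leadership", "team": "leadership",
}


@dataclass(frozen=True)
class Segment:
    start_s: float  # offset into the recording, taken from transcript timestamps
    text: str


def embed(text: str) -> Counter:
    """Stand-in embedding: counts of the concepts mentioned in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(CONCEPTS[t] for t in tokens if t in CONCEPTS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, segments: list[Segment], top_k: int = 3) -> list[Segment]:
    """Rank segments by concept similarity to the query; drop non-matches."""
    q = embed(query)
    scored = [(cosine(q, embed(s.text)), s) for s in segments]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


SEGMENTS = [
    Segment(20.1, "I studied physics before moving into software"),
    Segment(95.0, "I spent two years managing a team of 12 engineers"),
    Segment(312.4, "I was controlling a $2M annual spend across three vendors"),
]

budget_hits = search("budget management", SEGMENTS)
leadership_hits = search("leadership experience", SEGMENTS)
```

Each hit carries `start_s`, so a player could seek straight to the matching moment in the recording. In a real system, `embed` would be replaced by the trained model's vectors; the ranking logic stays the same.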
Results
From hours of video review to seconds of targeted search
- Targeted review instead of linear viewing: recruiters could jump to a key candidate response in seconds instead of watching entire interviews
- Automated data entry: resume and LinkedIn parsing eliminated manual profile creation, reducing recruiter admin time
- Semantic search as competitive advantage: understanding candidate responses at a conceptual level, not just keyword level, became a key differentiator in the HR-tech market
- Scalable multi-tenant platform: architecture supported high-volume concurrent interviews for enterprise clients
- Integrated workflow: single platform for scheduling, conducting, recording, transcribing, and reviewing interviews
Architecture Trade-offs
- Benefit: screening time dropped from hours of video review to seconds of semantic search. A search for "budget management" finds a candidate discussing "controlling a $2M annual spend" because matching is semantic, not keyword-based.
- Cost: IBM Watson Speech-to-Text sets the accuracy ceiling. Transcription quality bounds search quality, and Watson is a third-party dependency outside the team's control.
- Benefit: automated data entry via Sovren CV parsing (PDF, DOCX, images) plus the custom LinkedIn parser, so recruiters create no profiles by hand.
- Cost: large operational surface. Django, Vue, Kurento Media Server, Watson, Sovren, Node.js, and FFmpeg make seven distinct components to operate for a multi-tenant platform.
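Because transcription quality bounds search quality, one possible mitigation is to keep the recognizer's confidence score next to each indexed segment and flag weak ones for review. The sketch below assumes a response shaped like Watson's recognition output with word timestamps enabled (`results`, `alternatives`, `transcript`, `confidence`, and `timestamps` as `[word, start, end]` triples). Verify that shape against the current API reference before relying on it. The sample payload and the 0.6 threshold are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptSegment:
    start_s: float
    end_s: float
    text: str
    confidence: float


def segments_from_response(response: dict) -> list[TranscriptSegment]:
    """Flatten a recognition response into timestamped segments (assumed shape)."""
    segments = []
    for result in response.get("results", []):
        if not result.get("final", True):
            continue  # skip interim hypotheses
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]
        stamps = best.get("timestamps", [])
        if not stamps:
            continue
        segments.append(TranscriptSegment(
            start_s=stamps[0][1],   # start time of the first word
            end_s=stamps[-1][2],    # end time of the last word
            text=best["transcript"].strip(),
            confidence=best.get("confidence", 0.0),
        ))
    return segments


def needs_review(segments: list[TranscriptSegment], threshold: float = 0.6) -> list[TranscriptSegment]:
    """Return segments whose confidence falls below the (assumed) threshold."""
    return [s for s in segments if s.confidence < threshold]


# Invented sample payload for illustration.
sample = {
    "results": [
        {"final": True, "alternatives": [{
            "transcript": "I managed the budget ",
            "confidence": 0.93,
            "timestamps": [["I", 1.0, 1.2], ["managed", 1.2, 1.7], ["the", 1.7, 1.8], ["budget", 1.8, 2.3]],
        }]},
        {"final": True, "alternatives": [{
            "transcript": "across three vendors ",
            "confidence": 0.41,
            "timestamps": [["across", 2.5, 2.9], ["three", 2.9, 3.2], ["vendors", 3.2, 3.8]],
        }]},
    ]
}

parsed_segments = segments_from_response(sample)
flagged = needs_review(parsed_segments)
```

Flagged segments could be down-weighted in search results or queued for manual correction, which limits how far a transcription error propagates into search.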
Technology Stack
- Backend: Python, Django, Django REST Framework
- Frontend: Vue.js, HTML5, CSS3, jQuery, SASS
- Video: Kurento Media Server, FFmpeg, WebRTC
- AI/NLP: IBM Watson Speech-to-Text, custom semantic search model
- Data Ingestion: Sovren CV Parsing, custom LinkedIn parser
- Database: PostgreSQL
- Infrastructure: Node.js (auxiliary services)
From the team behind Production-Ready AI Agents (Amazon, 2025)