Overview
Spoken communication remains one of the most critical yet overlooked aspects of workforce readiness. For HR teams, screening candidates for verbal fluency is time-consuming and subjective. For students and job seekers, gaining confidence in spoken English often requires costly coaching and comes with inconsistent feedback.
Recognizing the dual need for scalable interview automation and personalized language assessment, Optimus Information developed VocaNova, a voice-first AI platform that transforms how organizations evaluate and develop verbal communication skills. Powered entirely by Microsoft Azure, VocaNova brings together advanced speech processing, generative AI, and enterprise-ready cloud architecture to deliver consistent, unbiased, and highly accessible spoken assessments at scale.
The Challenge
Recruiters were struggling to keep pace with growing candidate volumes, particularly for roles where strong verbal communication is essential. Traditional interview processes required manual scheduling, coordination, and evaluation, often leading to inconsistent experiences and unconscious bias.
At the same time, many learners lacked access to structured tools for improving spoken English. Existing resources offered limited interactivity, little real-time feedback, and no tailored progression paths. The result was inefficiency for HR and a missed opportunity for talent development.
Optimus set out to build a solution that could:
- Automate voice-based interviews without compromising quality
- Deliver real-time feedback on spoken English fluency, confidence, and coherence
- Eliminate human bias in communication assessments
- Scale seamlessly across industries and use cases
The Solution
Optimus developed VocaNova Interview and VocaNova Learn, two modular applications designed to address both recruiting and learning challenges using the latest in Azure AI, communication, and containerization technologies.
VocaNova Interview enables HR teams to automate the screening process through voice-first, AI-led interviews. Candidates respond to dynamic questions generated by GPT-4o, with their speech transcribed, analyzed, and scored in real-time. Evaluations are consistent across all users, enabling fair comparisons and faster decisions.
VocaNova Learn provides a guided, interactive environment for users to practice spoken English. The platform simulates natural conversations, offers tailored feedback, and helps users improve fluency, pronunciation, and verbal confidence through repeated, low-friction engagement.
Both solutions are delivered through a secure, Azure-native environment that supports real-time communication, analytics, and integration with Microsoft Teams and Outlook.
Key Capabilities
VocaNova was designed to support two core workflows: automated interviewing and conversational language learning. Both use cases are delivered through a shared foundation of advanced Azure services and GPT-4o.
- Automated Interviewing: Candidates receive a personalized interview link and complete a voice-first assessment. The system evaluates fluency, clarity, and confidence in real time using Azure Speech Services and custom scoring logic powered by GPT-4o.
- Conversational Learning: Learners engage in guided, natural dialogues with the AI assistant. The system delivers feedback on pronunciation, pacing, and vocabulary usage, simulating real-world English communication scenarios.
- Real-Time Feedback: Whether in an interview or a learning session, users receive feedback immediately, eliminating the lag between effort and improvement.
- Integrated Communication: Azure Communication Services ensures seamless, low-latency delivery of voice-based interactions across browsers and devices.
Multi Agent Architecture
At the core of VocaNova is a modular, voice-first architecture that uses a system of specialized agents to manage interviews, assess speech, and generate personalized feedback in real time. Here’s how the architecture works:
AGENT | ROLE |
---|---|
Conversation Orchestrator Agent | Manages session flow, selects questions based on role profile, and routes input to analysis agents |
Speech Transcription Agent | Converts spoken responses into text using Azure Speech-to-Text, with fallback accuracy tuning |
Fluency Scoring Agent | Evaluates pace, clarity, and pronunciation using custom GPT-4o scoring models |
Content Coherence Agent | Assesses how well responses match the question prompt, identifying logic and structure gaps |
Confidence Analysis Agent | Detects vocal tone, filler words, and speaking hesitation to infer speaker confidence |
Feedback Generator Agent | Summarizes results and delivers tailored insights for learners or HR reviewers |
Deployment
The entire solution was delivered in phases between June and October 2025. The core infrastructure was set up using Microsoft’s Cloud Adoption Framework (CAF) principles to ensure governance, scalability, and compliance.
Deployment highlights include:
- Development and production environments built on Azure Container Apps
- CI/CD pipelines established via GitHub Actions
- Security and secret management handled via Azure Key Vault
- Real-time observability implemented using Azure Monitor and Log Analytics
- 99.95% uptime achieved via Azure’s serverless consumption tier
The frontend was developed using React.js, with full responsiveness across mobile and desktop devices.
The Results
VocaNova has delivered measurable, high-impact results for both HR and learning stakeholders:
- Interview screening time reduced by over 70%, freeing recruiters to focus on deeper evaluation and onboarding
- Student engagement increased by 50%, driven by accessible, real-time feedback
- Bias eliminated in scoring, with every response evaluated using a consistent AI model
- Deployment speed accelerated, with serverless infrastructure allowing for rapid iteration and scale
- Cost efficiency achieved, leveraging Azure Container Apps’ 200K free monthly requests
Tangible Business Outcomes:
VocaNova delivered measurable results across both recruitment and education use cases:
- Significantly reduced manual screening time by enabling asynchronous, AI-led interviews that required no human coordination or follow-up.
- Improved candidate evaluation consistency through standardized, unbiased scoring of fluency, coherence, and confidence.
- Enabled scalable learning access by offering students and professionals 24/7 availability to practice spoken English in a low-pressure, voice-first environment.
- Increased feedback quality and speed with real-time, automated insights, eliminating the delay and subjectivity associated with human reviewers.
- Supported enterprise-scale adoption by integrating with Microsoft Teams and Outlook, allowing seamless rollout in both corporate and academic settings.
VocaNova allowed teams to do more than just automate, it redefined what’s possible in voice-based screening and learning.
Technologies Used
- Azure OpenAI (GPT-4o)
- Azure AI Speech Services (STT, TTS)
- Azure Container Apps
- Azure Cosmos DB
- Azure SQL Database
- Azure Blob Storage
- Azure Communication Services
- Azure Monitor & Log Analytics
- Azure Key Vault
- React.js frontend
- Microsoft Teams and Outlook integration
- GitHub Actions for CI/CD
What’s Next
VocaNova’s next phase is focused on deepening personalization and expanding its reach. Optimus is enhancing the platform’s adaptive capabilities, enabling it to better tailor feedback and learning paths based on a user’s individual progress and communication goals. On the enterprise side, integration with HR platforms and applicant tracking systems is underway to streamline workflows and unify hiring data. Additional language models are also being explored to support multilingual assessment and coaching, opening the door for global deployment across new markets and industries. VocaNova is evolving from a single solution into a scalable ecosystem for voice-based intelligence.
Contact Optimus to learn how we can help you leverage Agentic AI & Cloud services.