Getting Started
Introduction
voxANN Align provides industry-leading quality assurance tools for audio dubbing workflows. It takes a source script of required phrases, in any number of our 32 supported languages, and compares that with uploaded audio to identify recorded dialogue and match it to phrases in the script.
This dramatically accelerates the assessment of recorded dialogue by identifying dubbing gaps, spotting errors and providing rich conformed exports. Audio can be uploaded in any order, and practically any format, allowing easy management of multiple takes, patching and long-term projects that need continuous assessment.
voxANN Align is a web-based application available on a pay-as-you-go (PAYG) basis or by subscription with bundled usage, with support for any number of users and collaborators within a workspace. Visit our website or contact our team to learn more.
Features
Align has more than 100 features to accelerate your dubbing workflows.
- Automatically align audio to source scripts.
- Extract scripts from Word, PDF or CSV files.
- Attach references at a phrase level for syncing outputs.
- Upload audio in almost any format.
- Support for content in 32 regional languages.
- Quickly audition matched audio without worrying about the source media.
- One-click filter for errors and warnings.
- See project and script status at a glance.
- Edit transcripts and timing for granular refinements.
- Export transcripts and timecode.
- Export transcoded, segmented audio – a single audio file for every phrase.
- …and more!
Workflow
To get started in Align, you first create a Project, which provides a context to begin work, manage requirements and work towards completion – where all requirements are fulfilled and approved.
Each Project can contain any number of Scripts. A Script is a collection of phrases in a defined language – the content to produce for a given language. Projects can be mono-lingual, contain multiple scripts in the same language, or be multi-lingual projects to manage the localisation of content.
While Phrases within a Script start with just the original source text, we can now upload audio targeting a specific Script – and therefore its language. This introduces matched audio – a transcript, along with the start and end time position from the strongest match within the uploaded audio.
The relevant audio can now be auditioned, in context, playing just the matched phrase in a virtual audio segment. The transcript and timing can be adjusted to fine-tune the match, before approving the match. Approvals push the Script towards completion, and with all Phrases in all Scripts approved, the Project is complete.
An Export can be requested at any time – you don't have to wait until all requirements are fulfilled, or for a Project to be complete. All data is aligned to relevant references – including timing, content, transcripts and status. This enables and improves many downstream workflows, and provides excellent delivery and archive options.
Example Use Cases
While Align has an opinionated workflow for Project progression, steps can be completed in almost any order, and it can serve many use cases that require or benefit from an automated comparison of text and audio.
Just a few examples:
- Localisation projects assessing and aligning translated dubbing with source languages.
- Non-linear voiceover sessions requiring content selection and alignment post-recording.
- ADR outcomes needing rapid QA and organised delivery.
- Open-ended video game voiceover production, with highly dynamic content requirements, and per-phrase tracking.
- Public address announcements which might add languages over time.
- Early-stage content validation for large-scale projects like audiobooks.
Let us know how Align is powering your workflow!
Matching Audio
When your content is evaluated, every possible match is considered and scored. These confidence scores are continuously re-checked as the project changes, presenting better or alternative matches whenever they become available. Approving a match locks it as accepted; that audio will no longer be offered for another Phrase, and that script content is no longer considered for future matching.
Scoring mechanisms vary based on context, language and many other factors. High confidence allows us to automate the match: very high confidence matches are not flagged for checking, while high confidence matches are flagged as a warning – a likely match, but user oversight is recommended.
Where multiple matches are available for a given source script phrase, the most confident match is offered by default, but the remaining close matches will be offered as alternatives. The stack of options can be quickly auditioned and the preferred match selected.
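The confidence tiers and the stack of alternatives described above can be pictured with a minimal sketch. This is an illustration only, not how Align scores matches: the threshold values, the "close match" margin, the 0 to 1 score scale, the "manual" tier and all names (Candidate, classify, rank) are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical values; real scoring varies by context, language and other factors.
AUTO_THRESHOLD = 0.95      # at or above: matched automatically, not flagged
WARNING_THRESHOLD = 0.80   # at or above: matched, but flagged as a warning
ALTERNATIVE_MARGIN = 0.10  # how far below the best score still counts as "close"


@dataclass(frozen=True)
class Candidate:
    """One possible match for a Phrase within the uploaded audio."""
    audio_id: str
    start: float   # seconds
    end: float     # seconds
    score: float   # confidence, assumed to lie between 0 and 1


def classify(score: float) -> str:
    """Map a confidence score to one of the illustrative tiers."""
    if score >= AUTO_THRESHOLD:
        return "auto"
    if score >= WARNING_THRESHOLD:
        return "warning"
    return "manual"  # assumption: anything lower is left for manual review


def rank(candidates):
    """Return (best, alternatives): the top score and the close runners-up."""
    ordered = sorted(candidates, key=lambda c: c.score, reverse=True)
    if not ordered:
        return None, []
    best = ordered[0]
    alternatives = [c for c in ordered[1:]
                    if best.score - c.score <= ALTERNATIVE_MARGIN]
    return best, alternatives


candidates = [
    Candidate("take_1.wav", 0.0, 1.0, 0.82),
    Candidate("take_2.wav", 2.0, 3.0, 0.97),
    Candidate("take_3.wav", 4.0, 5.0, 0.90),
    Candidate("take_4.wav", 6.0, 7.0, 0.50),
]
best, alternatives = rank(candidates)
```

Here take_2.wav is offered by default and take_3.wav is kept as an alternative, while the two weaker candidates fall outside the assumed margin.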
Production Statuses
All production statuses are derived from the Phrases. A Phrase can be in any one of these five statuses:
A Script will provide an aggregate status based on the lowest quality status of any one of its Phrases. For example: if all Phrases are matched and approved, the Script status will indicate it is approved (complete); if all but one of the Phrases in the same Script are approved and that one is missing audio, the Script status would be missing (error).
Project status aggregates status in the same way, bubbling up the lowest status from all Scripts, allowing an at-a-glance indication of when a Project needs attention.
Both Projects and Scripts also track completion – the % of Phrases which have been approved. As requirements are fulfilled and approved, the relevant status wheel will fill, with a whole wheel indicating completion.
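As a rough sketch of the aggregation described above, the "lowest status wins" rule and the completion percentage could be modelled as follows. The Status members are placeholders that cover only the statuses named in this section, not the full set of five, and the function names are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    """Placeholder statuses ordered from worst to best (illustrative only)."""
    MISSING = 0    # error
    WARNING = 1
    APPROVED = 2   # complete


def aggregate(statuses):
    """Return the lowest status in a non-empty collection."""
    return min(statuses)


def completion(statuses):
    """Percentage of Phrases that are approved (0.0 for an empty list)."""
    if not statuses:
        return 0.0
    approved = sum(1 for s in statuses if s == Status.APPROVED)
    return 100.0 * approved / len(statuses)


# Two Scripts in one Project, each a list of Phrase statuses.
script_a = [Status.APPROVED, Status.APPROVED, Status.APPROVED]
script_b = [Status.APPROVED, Status.MISSING, Status.WARNING]

script_a_status = aggregate(script_a)
script_b_status = aggregate(script_b)
project_status = aggregate([script_a_status, script_b_status])
project_completion = completion(script_a + script_b)
```

In this example the first Script is approved, the second is missing because of a single Phrase, and the Project inherits the missing status while reporting four of six Phrases approved.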
Unmatched Audio
When audio is processed, content may be identified which cannot be matched to any available Phrase. These unmatched audio segments are deposited in the Audio Bin. Each Script has its own bin, collecting available media to be reviewed, auditioned and selected for a match. The content can also be edited, split and merged to fine-tune any automated discovery.
See Audio Bin for full details of the tools available to manage unmatched audio.
Audio Processing
When an audio file is uploaded it is processed in the background to build the knowledge profile we need to perform alignment and optimise for production.
In simple terms, we create a high-quality streaming proxy, transcribe the speech and perform content alignment. This all happens in parallel, providing live feedback within the Project view, and automatically updating when complete.
Processing time can vary depending on format, language and length, but is highly optimised and can run more than 15x faster than real time (audio playback length). You can upload multiple files at once, or in sequence, and keep working within any Project or Script while the files are processed.
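To put the speed figure in perspective, here is a small back-of-the-envelope calculation. It assumes processing runs at exactly 15x real time, which is only an illustrative figure; actual times vary with format, language and length, and the function name is hypothetical.

```python
def estimated_processing_seconds(audio_seconds: float,
                                 speed_factor: float = 15.0) -> float:
    """Rough processing time for audio of the given playback length.

    speed_factor is how many times faster than real time processing runs;
    15.0 is an assumed value taken from the figure quoted above.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# A 60-minute recording at an assumed 15x real time.
one_hour = 60 * 60
estimate = estimated_processing_seconds(one_hour)
```

Under that assumption, an hour of audio would take roughly four minutes (240 seconds) to process.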
See Supported Files for accepted file formats and sizes.