Nuix, an information protection and security intelligence provider, has announced its partnership with voice analytics vendor Voci Technologies. Under the terms of the agreement, the two companies will integrate Voci’s V-Discovery engine with the Nuix platform and its products for eDiscovery, investigation, information governance, cybersecurity and intelligence. The service has been dubbed Nuix Voice.
The cornerstone of the service is Voci’s accelerated speech recognition technology, which was developed at Carnegie Mellon University and has been used in call centre operations for about a decade. In essence, the technology lets users perform rapid and accurate speech transcription and voice analytics on audio and video files – an area that has been a particular sticking point in eDiscovery-related activities.
Voci applies a technique called Large Vocabulary Continuous Speech Recognition (LVCSR) that uses a series of language and acoustic models, as well as advanced algorithms, to break audio streams into sounds and words, then ultimately form sentences. Voci not only forms words based on those sounds, it then checks to make sure they make sense in the context of the sentence, thus offering the ability to apply real-time text analytics against streaming audio and generate a fully punctuated transcript.
According to Voci president and CEO Anthony Gadient, “The power of the speech-to-text element means that you can perform multi-channel analytics across all touch points – for example, you could correlate click events on a website, emails sent to support, questions posted on forums and now fully punctuated transcripts of calls. Basically call [or video] data is now just data.”
Transitioning the Voci V-Spark engine used in call centres to eDiscovery applications was a natural fit. “After call centres, we started looking around for use cases where they had audio data but were more text driven. eDiscovery was an easy step,” Gadient added.
While transcription technologies have been available, what distinguishes Voci is the speed of its automated transcription of the human language. The technology listens and applies machine learning algorithms around word adjacencies. It can quickly and easily take a 911 transcript, a broker/dealer conversation or voicemail and apply textual analytics techniques to generate a transcript. Voci claims that a single appliance can get through 200 hours of audio clock time in one hour of machine time.
Explaining Nuix’s interest in the Voci technology, Nuix CTO Stephen Stewart said the exponential rise in audio and video content has made it impossible for investigators to manage, whether they’re analyzing a company’s audio-visual content or conducting a police investigation. “They’re accumulating audio for surveillance and compliance, voicemail and interviews, on a huge range of devices, including smartphones and police body cameras. By combining our technologies, we enable customers to now search and analyze all this human speech alongside communication patterns, emails, text messages, documents, chats, and many other sources.”
The eDiscovery journey
Accurate, automated audio transcription is the latest in a series of significant breakthroughs in eDiscovery over the years. Initially, the focus was on incorporating unstructured data in the form of emails into the investigative process. As social media activity grew, so did the need to integrate even more disparate sources of unstructured data.
“Organizations started email archiving in 1999 in response to SEC [Securities and Exchange Commission] regulations. We subsequently introduced an investigation tool for email that sent us down the path to helping organizations understand unstructured data,” Stewart explained. “Once we got a good handle on that, we got into social media. Now we are moving into a new era of cybersecurity, where it’s not just investigating breaches but also activities on the defensive side.”
Incorporating audio and video is in some respects the last frontier in eDiscovery.
Key challenges for organizations included the excessive man-hours required to transcribe files, which led to exorbitant costs, and the fact that audio searches could not be integrated into the conventional workflow because they did not use traditional analytics. “Typically you have to break out and search audio in a totally different fashion from email, text, SMS and fax,” Stewart said.
For example, if an enterprise had 10,000 hours of audio content, the only option was to create a separate workflow, either using an automated transcription service or hiring staff to listen to and transcribe the audio files. “Often the costs of that are so unreasonable, it is either handled as an exception along with the tens of thousands – or in some cases, millions – of dollars in costs that go with it, or treated as an unhandled exception,” Stewart said.
The result is that eDiscovery processes (either internal or outsourced) have often excluded audio data.
Enterprises can now embed voice and voice transcription into that workflow, Stewart said. “Audio no longer needs to be treated as an exception. Rather you have searchable text that can flow downstream to the rest of the eDiscovery process.”
The audience awaits
This tightly integrated workflow holds particular appeal for a number of entities, he added. There are the obvious candidates who require transcription of digital evidence, such as criminal investigators, attorneys, regulators, governance practitioners, human resource professionals, auditors and cybersecurity advisory community members (e.g. KPMG, PWC, Deloitte, E&Y to name a few). Call centre and financial services organizations are among those entities that record tremendous volumes of audio data. “That’s not easy to deal with using brute force mechanisms,” Stewart said.
Beyond conventional forensics, Stewart described other areas where audio search and transcription might support investigative work, such as HR-related incidents. One case he cited involved an employee who used extremely inappropriate language during a huge number of recorded calls. “It was not comfortable for investigators to listen to,” he noted. “But a machine doesn’t have emotion so it doesn’t matter what’s being said.” Other potential applications include 911 transcriptions and video content from police body cameras (a particularly significant area of growth in both the US and Canada).
Stewart also claimed that other transcription solutions available in the marketplace can cost considerably more than Nuix Voice powered by Voci, owing to the sheer speed of the solution. Based on the $0.89 per minute that one alternative transcription provider charges, transcription fees would total $53.40 for one hour of service; with Voci, the fee is $10 an hour. “That same [alternative] transcription service also has a 48-hour turnaround. In that time period, Voci can transcribe 9,600 hours of audio content.”
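The figures quoted above can be checked with some simple arithmetic. The sketch below is illustrative only: the function and constant names are invented for this example, it assumes the $10 fee applies per hour of audio, and it assumes a single appliance running at the 200-to-1 throughput Voci claims. The 10,000-hour archive is the hypothetical volume mentioned earlier in the article, not a real customer case.

```python
from decimal import Decimal

# Figures quoted in the article
ALT_RATE_PER_MINUTE = Decimal("0.89")    # alternative provider, dollars per audio minute
VOCI_RATE_PER_AUDIO_HOUR = Decimal("10") # assumption: $10 per hour of audio
AUDIO_HOURS_PER_MACHINE_HOUR = 200       # Voci's claimed single-appliance throughput
ALT_TURNAROUND_HOURS = 48                # alternative provider's turnaround


def alt_cost_per_audio_hour(rate_per_minute=ALT_RATE_PER_MINUTE):
    """Convert a per-minute fee into a fee per hour of audio."""
    return rate_per_minute * 60


def audio_hours_in_window(machine_hours, throughput=AUDIO_HOURS_PER_MACHINE_HOUR):
    """Hours of audio one appliance could transcribe in a given wall-clock window."""
    return machine_hours * throughput


def machine_hours_needed(audio_hours, throughput=AUDIO_HOURS_PER_MACHINE_HOUR):
    """Wall-clock hours one appliance would need for a given volume of audio."""
    return Decimal(audio_hours) / throughput


def total_cost(audio_hours, rate_per_audio_hour):
    """Total fee for a volume of audio at a flat hourly rate."""
    return Decimal(audio_hours) * rate_per_audio_hour


if __name__ == "__main__":
    archive_hours = 10_000  # hypothetical archive size from the article's example
    print("Alternative, per audio hour: $", alt_cost_per_audio_hour())
    print("Audio hours in 48 machine hours:", audio_hours_in_window(ALT_TURNAROUND_HOURS))
    print("Machine hours for archive:", machine_hours_needed(archive_hours))
    print("Alternative, whole archive: $", total_cost(archive_hours, alt_cost_per_audio_hour()))
    print("Voci, whole archive: $", total_cost(archive_hours, VOCI_RATE_PER_AUDIO_HOUR))
```

Under those assumptions, the per-hour fee and the 9,600-hour figure match the numbers Stewart cites, and the hypothetical 10,000-hour archive would take about 50 machine hours on one appliance.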
While it’s still early days, Stewart believes there’s a bright future for advanced audio analysis. “Let’s not forget that in the early days, email searches were thought to be unreasonable. But technology smashed that expectation. This is simply the next step.”