Speechmatics review ~ Ciwani Tech2

Transcription services fall into two camps: those that are essentially a marketplace connecting people who need transcription with the humans who do the work, and those that use software, typically based on AI. Some companies offer both, with a large price gap between the human-powered service and the automated one.

Speechmatics is of the AI variety, and they pride themselves on the quality of the code that converts audio into written text, even providing real-time transcription.

To be clear, plenty of companies talk about their AI technology and how accurate it is at transcribing the noises that humans make into documents. From our analysis, Speechmatics might be one of the few that don’t exaggerate the performance of their code.

Speechmatics: Plans and pricing

Speechmatics take pride in being different, and to that end, they don’t play the cost-per-minute game with upfront pricing.

This is typical of a business selling to enterprises: the scope of each agreement defines the price, which might not be based entirely on the amount of transcription performed.

Speechmatics website: Speechmatics is custom priced based on your requirements and the volume of transcription you need. (Image credit: Speechmatics)

That said, as a rule, the cost of transcription falls as volume rises, and the Speechmatics sales team is best placed to explain the economics of partnering with the service.

The exact price you pay will depend on your specific requirements, including any special tools and features needed, along with the volume of transcription you expect to perform.

If circumstances change, Speechmatics can adjust the costs to match and tailor the solution precisely to the client.

Speechmatics: Features

For skilled developers, it’s relatively easy to build transcription for a single language spoken with a common dialect, accent and known regional vocabulary, but Speechmatics isn’t limited in that way.

At the time of writing this review, Speechmatics can transcribe Arabic, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Mandarin (Traditional and Simplified), Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish and Turkish.

It also supports multiple dialects within many of these languages. For example, it can understand English as spoken by Americans, Australians and the Irish as effectively as English spoken by someone from England.

Transcription can run either as a batch operation, where recorded audio is submitted and documents are produced, or in real time, where live audio is analysed to provide subtitles for a TV channel or streamed broadcast.

When it transcribes audio, it applies the correct punctuation to the text, using full stops, commas and other appropriate symbols.

For the enterprise customer that wants complete control, Speechmatics is highly configurable. This flexibility might be critical if the business uses terminology specific to its sector. Special words and meanings, a custom interface and even how the engine handles ‘taboo words’ are all customisable.

Speechmatics: Setup

Where Speechmatics departs from what most people might expect of a transcription system is that it doesn’t come ready to use out of the box.

Setup is part of the learning curve, and how complicated it gets depends heavily on how the customer intends to use Speechmatics.

For most customers, this will involve building a bespoke interface that connects to Speechmatics through its API, then handling the processing and delivery of the finished transcripts back to the user.
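For readers wondering what connecting to Speechmatics through its API might involve, here is a minimal Python sketch that assembles (but does not send) a batch transcription job. The endpoint URL, the `Bearer` authorisation header and the `config` JSON fields (`type`, `transcription_config`, `language`, `diarization`) reflect our understanding of Speechmatics’ public v2 batch API rather than anything we tested for this review, so treat them as assumptions and check the current API reference. The function name `build_batch_job` is our own.

```python
import json

# Assumption: Speechmatics' v2 batch jobs endpoint as we understand it.
# Verify against the official API reference before relying on it.
BATCH_JOBS_URL = "https://asr.api.speechmatics.com/v2/jobs/"


def build_batch_job(api_key: str, language: str = "en", diarization: bool = False) -> dict:
    """Assemble the pieces of a batch transcription request without sending it.

    Returns the target URL, the auth header and the JSON 'config' form field.
    The audio itself would travel as a separate multipart field.
    """
    transcription_config = {"language": language}
    if diarization:
        # Assumed option value asking the engine to label different speakers.
        transcription_config["diarization"] = "speaker"
    config = {"type": "transcription", "transcription_config": transcription_config}
    return {
        "url": BATCH_JOBS_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "config": json.dumps(config),
    }


if __name__ == "__main__":
    # Placeholder key; a real one comes from your Speechmatics account.
    job = build_batch_job("YOUR_API_KEY", language="en", diarization=True)
    print(job["url"])
    print(job["config"])
```

In a real integration, the in-house software would POST these values as a multipart form together with the audio file (which we understand goes in a field named `data_file`, again to be verified), then retrieve the finished transcript once the job completes. Keeping request-building separate from sending makes it easy to check the configuration without an account.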

As part of the package, Speechmatics has a deployment team that can help the customer decide exactly how they want to use this technology and where the mission-critical parts of the process will reside.

Deployment options include Speechmatics’ own cloud, a public cloud, an on-premises installation, or a combination of the three.

Speechmatics: Interface

What’s mildly disappointing is that Speechmatics doesn’t provide a general-purpose interface for customers to modify. Instead, companies are expected to build in-house software and then use the Speechmatics API to bolt the transcription technology into their workflow.

This requirement assumes a certain level of IT resources on the client’s side and also limits how quickly transcription can be made available to those who need it. To smooth and accelerate this stage, Speechmatics can connect customers with a local partner in their region that can provide prebuilt user interfaces and the know-how to modify them.

However, how useful these options are depends on the systems the customer uses and whether the prebuilt interfaces are designed to work in that environment.

Speechmatics: Performance

Plenty of companies claim their transcription software can deliver near-perfect work at the click of a button (or the entry of a command line), but Speechmatics genuinely lives up to that claim.

It isn’t easy to test, since the demo involves typing in long command lines with embedded encryption keys twice, but the results are remarkable.

Along with other classic recordings, we use the opening paragraphs of the first Harry Potter book as read by national treasure Stephen Fry.

Across the many transcription tools we’ve tested, we’ve seen every imaginable error and butchering of ‘Dursleys’ and ‘Dudley’, but not here. In the entire passage, it got only one thing wrong, rendering the company name Grunnings as ‘Runnings’.

Even so, this is the best result we’ve ever seen for this test, by a considerable margin.

In a test with more challenging audio, the AI showed a remarkable ability to identify words accurately, even when the speaker was muffled or distorted by the outdoor recording location.

The accuracy is impressive, but what also stood out was the turnaround speed: it handled a couple of minutes of audio in just a few seconds.

We didn’t explore this feature, but applying a custom word dictionary to the account can improve speed and accuracy further, making for rapid turnarounds and transcripts that need only minor tweaking.
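We didn’t test the custom dictionary, but the idea is simple: you supply words the engine might not know, optionally with hints about how they sound. The minimal Python sketch below builds such a list for our Harry Potter test passage (the one that tripped over ‘Grunnings’) and wraps it in a job configuration. The `additional_vocab`, `content` and `sounds_like` field names are our assumption about how Speechmatics exposes this feature, and the helper functions and the sound-alike spelling are hypothetical; verify against the official documentation before use.

```python
import json
from typing import Dict, List, Optional


def build_custom_vocab(terms: Dict[str, Optional[List[str]]]) -> List[dict]:
    """Turn {word: [sound-alike spellings] or None} into vocabulary entries.

    Assumption: each entry uses the {"content": ..., "sounds_like": [...]}
    shape we understand Speechmatics' 'additional_vocab' option to expect.
    """
    vocab = []
    for word, sounds_like in terms.items():
        entry = {"content": word}
        if sounds_like:
            # Optional pronunciation hints for unusual spellings.
            entry["sounds_like"] = list(sounds_like)
        vocab.append(entry)
    return vocab


def transcription_config_with_vocab(language: str, terms: Dict[str, Optional[List[str]]]) -> str:
    """Hypothetical helper: a batch job config (as JSON) carrying the custom words."""
    config = {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "additional_vocab": build_custom_vocab(terms),
        },
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Hypothetical dictionary for the Harry Potter test passage.
    terms = {"Grunnings": ["grun nings"], "Dursley": None, "Dudley": None}
    print(transcription_config_with_vocab("en", terms))
```

Keeping the dictionary as plain data makes it easy to maintain a sector-specific word list in one place and reuse it across every job an integration submits.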

The engine also has an uncanny knack for distinguishing between multiple speakers, even when they share a common accent.

For AI transcription quality, it doesn’t get much better than this, and it even beats some human-produced transcripts we’ve encountered.

Speechmatics: Final verdict

Sometimes getting the best solution requires an investment in time and money, and that has never been so true as with Speechmatics.

While it doesn’t offer an out-of-the-box solution in the conventional sense, that restriction ensures customers don’t simply throw money at a problem and expect the transcription company to solve all the workflow issues associated with the technology.

All things considered, Speechmatics is one of the best speech-to-text transcription programs we’ve used. There’s no free trial or free plan, but you can request a demo. The transcription engine is extremely powerful and delivers fast, accurate real-time and batch transcriptions.

Speechmatics also comes with a selection of advanced features, including the ability to recognise a wide range of accents. You can also add custom words to your personal dictionary, and the punctuation tools are truly impressive.

In short, Speechmatics is a powerful option worth considering for larger businesses with high transcription volumes. For accurate prices, your best option is to speak directly with the company’s sales team.

The competition

Speechmatics is primarily designed for large-scale transcription needs, and it isn’t a great fit for personal or small-business users. It could be adapted to appeal to them, but Speechmatics prefers to focus on enterprise customers.

Looking at the wider field of AI transcription tools, the only comparable options are Otter and Braina Pro.

Otter offers 600 minutes of free voice-to-text transcription for occasional users, and paid plans start from $8.33 per month.

Braina Pro (a year costs $79, and a lifetime license is $199 for a single user) offers plenty of management and editing tools alongside its core speech-to-text functions.

Find out more about Speechmatics' competitors in our Best speech-to-text software guide.


Monday, May 23, 2022