Mashable - The Social Media Guide: 2006/03/14/castingwords-podcast-transcriptions/

  • Bart Claeys · 3 years ago
    I've been wondering why speech-to-text software isn't being used to get transcripts from a podcast. Probably because speech-to-text engines need time to get to know a voice. Suppose this "voice fingerprint" were a file that could be distributed? A technology could be developed to learn a voice, and that knowledge could be saved into a file. Anyone who gets this file could then convert that voice to text. That would be neat!
  • Lindsay Donaghe · 3 years ago
    Bart,
    I worked on a voice transcription automation project for about 3 months as a proof of concept. The project was trying to evaluate whether the available voice-to-text tools could take a recorded business meeting and transcribe it so that it would be readable and searchable on the corporate intranet portal. After much mucking around with the tools, trying to train them and testing different scenarios, we had to conclude that it wasn't a viable option given the current abilities of the technology.

    Even with a highly trained transcriber and only a single person speaking, the percentage of correctly transcribed words just wasn't good enough.
    Having it figure out which of several speakers was talking was even further beyond the technology. And the many industry-specific terms (acronyms, or even common words that mean different things in the context of the industry and so were easily mis-transcribed) complicated things even more.

    Voice recognition works fine for a very limited vocabulary, for instance, when you call the bank and it asks you to say things to choose options. Those are single words or short phrases that it's already expecting. Transcription is fast-paced and not predefined.

    Yes, things have probably changed in 2.5 years, but I'd be surprised if these issues have been solved. If they had been, I would have expected a lot more buzz about voice-to-text. Instead, that area seems pretty dead.

    A human transcribing a podcast, especially one with several people speaking and specialized terminology, is really the only viable option, I think. Casting Words seems to have been relatively successful so far with the accuracy of its transcriptions (at least according to their presentation at ETech).

    Too bad it's not easier and automatable, but it's a nice test case for Mechanical Turk for sure.
  • David G · 3 years ago
    FANTASTIC product (asynchronous voice is the next big channel)

    HORRIBLE business model

    Hopefully they hire Mashable for some strategy consulting before it's too late.
  • Nathan McFarland · 3 years ago
    That quote is 4 months old now, which is a long time in a startup. We've moved on from that business model, based mainly on the success of the store, which suggested that perhaps, as David G says, "asynchronous voice is the next big channel" :)

    We hope so.
  • Pete Cashmore · 3 years ago
    Nathan,

    Yeah, sorry - I shouldn't have used such an old quote. I love the idea of using MTurk to power a company - I hope it works out well!
  • James · 3 years ago
    von Kempelen has clearly borrowed the idea, language, business model, website wording, and patent-pending status from Nativetext. I was wondering who was visiting from Santa Cruz.

    Interesting.
  • steve · 2 years ago
    Hope you are impressed. I will be glad to furnish more details on request.

    Regards,
    Steve.