The results with AI are exceptional: very accurate and very fast. This topic discusses what AI can do for the LazyTown community.
Speech to text
First, I would like to mention speech-to-text transcription. I have quite a few videos and audio recordings in Icelandic, and for many of them I don't know what is being said. I had a guess about some of them, but it turned out not to be quite right. I use OpenAI's Whisper models. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. There is now a large-v2 model, which is accurate and incredibly fast (on an NVIDIA GPU). The output is plain TXT, and it also automatically creates subtitle files in SRT and VTT formats.
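For anyone who wants to try this, here is a rough sketch of how it could look in Python, assuming the openai-whisper package and ffmpeg are installed. The file names (episode.mp3, episode.srt) and the helper functions are my own placeholders, not part of Whisper. It loads the large-v2 model, transcribes an Icelandic recording, and writes the timed segments as an SRT subtitle file.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Subtitle blocks are separated by one blank line.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, srt_path: str,
                      model_name: str = "large-v2") -> None:
    """Hypothetical helper: transcribe one file and save the subtitles."""
    # Imported here so the formatting helpers work without Whisper installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)
    # "is" is the language code for Icelandic; task="translate" would
    # produce English text instead of an Icelandic transcript.
    result = model.transcribe(audio_path, language="is")
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))


# Example call (placeholder file names):
# transcribe_to_srt("episode.mp3", "episode.srt")
```

The same thing should also be possible without any Python through the command-line tool that ships with the package, something like `whisper episode.mp3 --model large-v2 --language Icelandic`, which writes the TXT, SRT and VTT files on its own.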
More here:
https://github.com/openai/whisper
https://huggingface.co/openai/whisper-large