The results I am getting with AI are exceptional: very accurate and very fast. This topic discusses what AI can do for the LazyTown community.
Speech to text
First I would like to mention speech-to-text transcription. I have quite a few videos and audio files in Icelandic, and for many of them I don't know what is being said. I had a guess about some of them, but it turned out not to be quite right. I use OpenAI's Whisper models. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. There is now a large-v2 model, which is accurate and incredibly fast (on an NVIDIA GPU). The output is plain TXT, and it also automatically creates subtitle files: SRT and VTT.
[Image: whisper-model-transcriptions.jpg]
More here:
https://github.com/openai/whisper
https://huggingface.co/openai/whisper-large
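For anyone who would rather script this in Python, here is a minimal sketch of one way to do it. It is not necessarily how the transcriptions above were produced. It assumes the openai-whisper package and ffmpeg are installed. The helper names (srt_timestamp, segments_to_srt, transcribe_to_srt) and the file name in the usage note are made up for illustration. The language code "is" tells Whisper the audio is Icelandic.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Turn segments with 'start', 'end' and 'text' keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def transcribe_to_srt(media_path: str, model_name: str = "large-v2") -> str:
    """Hypothetical helper: transcribe Icelandic audio and return SRT text."""
    # Imported here so the formatting helpers work without Whisper installed.
    import whisper  # assumption: installed via "pip install openai-whisper"

    model = whisper.load_model(model_name)
    # task defaults to "transcribe"; task="translate" would give English text.
    result = model.transcribe(media_path, language="is")
    return segments_to_srt(result["segments"])
```

Calling transcribe_to_srt("episode.mp4") (the file name is only a placeholder) returns the subtitle text, which you can then save as an .srt file. The whisper command-line tool can also write TXT, SRT and VTT directly, so the script is only worth it if you want to process many files or post-process the text.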
Note