We made AudioKit open-source because we believe that clear, powerful audio code is best developed and maintained by a large, active base of developers and users. Our core code, tests, examples, and website are all open to contributions. | It is more than just a fast and accurate audio-to-text converter. It goes beyond audio transcription to help you get the most out of your content. |
Well-Named Classes and Parameters; Sensible Defaults; Tight Xcode Integration; Easy Installation; Clear Documentation and Common File Templates; Powerful Sequences and Phrases | Speech-to-text; Makes audio and video searchable, editable, and shareable |
Statistics | |
GitHub Stars 11.2K | GitHub Stars - |
GitHub Forks 1.6K | GitHub Forks - |
Stacks 19 | Stacks 2 |
Followers 32 | Followers 1 |
Votes 0 | Votes 0 |
Integrations | |
| No integrations available | |

Amazon Polly is a service that turns text into lifelike speech. Polly lets you create applications that talk, enabling you to build entirely new categories of speech-enabled products. Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.

Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 30 voices, available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver the highest fidelity possible.

It is a unified, developer-friendly API to the best available Speech-To-Text and Text-To-Speech services.

It is an on-device speech-to-text engine. By processing voice data locally on the device, it offers private, reliable, fully customizable, and cost-effective audio transcription. It achieves accuracy on par with big tech offerings at a fraction of the cost.

It is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Produce high-quality recordings without having to shell out thousands of dollars for equipment. All you need is your guitar, your computer, and a digital audio workstation.

It is a library for advanced Text-to-Speech generation. Built on the latest research, it was designed to achieve the best trade-off among ease of training, speed, and quality. It comes with pre-trained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects.

It is fully automated software that can turn any text into a natural, lifelike voice-over in just a few clicks. It can accommodate any business and is well suited to creating voice-overs for video sales letters, educational videos, marketing videos, animated videos, podcasts, audiobooks, and more.

It is a high-quality multi-lingual text-to-speech library by MyShell.ai. It supports English, Spanish, French, Chinese, Japanese and Korean.

Take full ownership of the professional audio creation workflow: from creating and versioning text content, to generating speech, to sound design and mastering. Create and integrate audio experiences into your mobile applications, IoT projects, websites, or social channels without learning specialized audio tools.