
It is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction.
Compare MMOCR to these popular alternatives based on real-world usage and developer feedback.

Creating safe artificial general intelligence that benefits all of humanity. Our work to create safe and beneficial AI requires a deep understanding of the potential risks and benefits, as well as careful consideration of the impact.

It is a framework built around LLMs. It can be used for chatbots, generative question-answering, summarization, and much more. The core idea of the library is that we can “chain” together different components to create more advanced use cases around LLMs.

It is a trained model which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.

It is an open-source library designed to help developers build conversational streaming user interfaces in JavaScript and TypeScript. The SDK supports React/Next.js, Svelte/SvelteKit, and Vue/Nuxt as well as Node.js, Serverless, and the Edge Runtime.

A fully-managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale.

Azure Machine Learning is a fully-managed cloud service that enables data scientists and developers to efficiently embed predictive analytics into their applications, helping organizations use massive data sets and bring all the benefits of the cloud to machine learning.

It is a cloud-based voice service and the brain behind tens of millions of devices including the Echo family of devices, FireTV, Fire Tablet, and third-party devices. You can build voice experiences, or skills, that make everyday tasks faster, easier, and more delightful for customers.

It is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. It comes with pre-trained statistical models and word vectors, and currently supports tokenization for 49+ languages.

It provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.

This new AWS service helps you to use all of that data you’ve been collecting to improve the quality of your decisions. You can build and fine-tune predictive models using large amounts of data, and then use Amazon Machine Learning to make predictions (in batch mode or in real-time) at scale. You can benefit from machine learning even if you don’t have an advanced degree in statistics or the desire to setup, run, and maintain your own processing and storage infrastructure.

It is a next-generation AI assistant. It is accessible through chat interface and API. It is capable of a wide variety of conversational and text-processing tasks while maintaining a high degree of reliability and predictability.

Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy to use REST API.

rasa NLU (Natural Language Understanding) is a tool for intent classification and entity extraction. You can think of rasa NLU as a set of high level APIs for building your own language parser using existing NLP and ML libraries.

Build, train, and deploy state of the art models powered by the reference open source in machine learning.

Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.

Amazon Rekognition is a service that makes it easy to add image analysis to your applications. With Rekognition, you can detect objects, scenes, and faces in images. You can also search and compare faces. Rekognition’s API enables you to quickly add sophisticated deep learning-based visual search and image classification to your applications.

It is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

Embed- Get the world’s most powerful tool for embedding videos, photos, and rich media into websites. Extract- Use the elements—colors, text, keywords, and entities—that you want from articles. Discard the rest automatically. Display- Use the elements—colors, text, keywords, and entities—that you want from articles. Discard the rest automatically.Make the images you use look great—and display quickly—on any screen, every time.

It is Google’s largest and most capable AI model. It is built to be multimodal, it can generalize, understand, operate across, and combine different types of info — like text, images, audio, video, and code.

It is a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI.

It allows you to run open-source large language models, such as Llama 2, locally.

It is a large multimodal model (accepting text inputs and emitting text outputs today, with image inputs coming in the future) that can solve difficult problems with greater accuracy than any of our previous models, thanks to its broader general knowledge and advanced reasoning capabilities.

Amazon Polly is a service that turns text into lifelike speech. Polly lets you create applications that talk, enabling you to build entirely new categories of speech-enabled products. Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.

It is a project that provides a central interface to connect your LLMs with external data. It offers you a comprehensive toolset trading off cost and performance.

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. Amazon Comprehend provides Keyphrase Extraction, Sentiment Analysis, Entity Recognition, Topic Modeling, and Language Detection APIs so you can easily integrate natural language processing into your applications.

Build And Run Predictive Applications For Streaming Data From Applications, Devices, Machines and Wearables

You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app. You can analyze text uploaded in your request or integrate with your document storage on Google Cloud Storage.

Makes it easy for machine learning developers, data scientists, and data engineers to take their ML projects from ideation to production and deployment, quickly and cost-effectively.

It is a Rust ecosystem of libraries for running inference on large language models, inspired by llama.cpp. On top of llm, there is a CLI application, llm-cli, which provides a convenient interface for running inference on supported models.

Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75%. Amazon Elastic Inference supports TensorFlow, Apache MXNet, and ONNX models, with more frameworks coming soon.

It is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.

It provides an easy method to compute dense vector representations for sentences, paragraphs, and images. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa etc. and achieve state-of-the-art performance in various tasks.

This library supports over 60 languages, automatic text orientation and script detection, a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser and on a server with NodeJS.

Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base.

OpenFace is a Python and Torch implementation of face recognition with deep neural networks and is based on the CVPR 2015 paper FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google.

It is the official Portable Network Graphics (PNG) reference library. It is a platform-independent library that contains C functions for handling PNG images. It supports almost all of PNG's features, is extensible, and has been widely used and tested.

It is an open-source, drag & drop UI to build your customized LLM flow. It is built on top of LangChainJS, with the aim to make it easy for people to visualize and build LLM apps.

It is a Natural Language Processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. It comes with 160+ pretrained pipelines and models in more than 20+ languages.

Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 30 voices, available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver the highest fidelity possible.

It is a state-of-the-art automatic speech recognition toolkit. It is intended for use by speech recognition researchers and professionals.

It is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

It offers an API to add cutting-edge language processing to any system. Through training, users can create massive models customized to their use case and trained on their data.

It is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set.

Machine learning service that makes it easy for developers to add individualized recommendations to customers using their applications.

AlchemyLanguageTM is the world’s most popular natural language processing service. AlchemyVisionTM is the world’s first computer vision service for understanding complex scenes. AlchemyAPI is used by more than 40,000 developers across 36 countries and a wide variety of industries to process over 3 billion texts and images every month.

It is a small, yet powerful model adaptable to many use cases. It is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. We made it easy to deploy on any cloud.

It is a deep learning, text-to-image model. It is primarily used to generate detailed images conditioned on text descriptions.

It is the easiest way for customers to build and scale generative AI-based applications using FMs, democratizing access for all builders.

Transcribe phone calls or build voice powered apps. Recognize unlimited industry specific words and phrases without any training required. All at simple, affordable pricing.

Build a custom machine learning model without expertise or large amount of data. Just go to nanonets, upload images, wait for few minutes and integrate nanonets API to your application.