Dub your content into 70+ languages at a click of a button, and reach millions of new fans. — Read More
Monthly Archives: August 2023
Machine Learning Libraries For Any Project
There are many libraries out there that can be used in machine learning projects. Of course, some of them have gained considerable reputations over the years. Such libraries are the go-to picks for anyone starting a new project that uses machine learning algorithms. However, choosing the correct set (or stack) may be quite challenging.
In this post, I would like to give you a general overview of the machine learning libraries landscape and share some of my thoughts about working with them. — Read More
VALL-E-X: Multilingual Text-to-Speech Synthesis and Voice Cloning
An open source implementation of Microsoft’s VALL-E X zero-shot TTS model.
VALL-E X is an amazing multilingual text-to-speech (TTS) model proposed by Microsoft. While Microsoft initially published it in their research paper, they did not release any code or pretrained models. Recognizing the potential and value of this technology, our team took on the challenge to reproduce the results and train our own model. We are glad to share our trained VALL-E X model with the community, allowing everyone to experience the power of next-generation TTS! — Read More
Grubhub is bringing Amazon’s cashierless tech to colleges this fall
Grubhub’s bringing Amazon’s cashierless Just Walk Out technology to some colleges, the company announced today. The food delivery service will first focus on rolling out the tech to colleges, starting with Loyola University Maryland next week before expanding nationwide.
The tech is capable of identifying items taken from and returned to shelves so students and staff can buy food from on-campus stores without waiting in line. After they scan a QR code in the Grubhub app, the company will automatically charge their Grubhub-linked meal plans or other stored payment methods once they leave the store. — Read More
Watch out, Midjourney! Ideogram launches AI image generator with impressive typography
Earlier this week, a new generative AI image startup called Ideogram, founded by former Google Brain researchers, launched with $16.5 million in seed funding led by a16z and Index Ventures.
Another image generator? Don’t we have enough to choose from between Midjourney, OpenAI’s Dall-E 2, and Stability AI’s Stable Diffusion? Well, Ideogram has a major selling point, as it may have finally solved a problem plaguing most other popular AI image generators to date: reliable text generation within the image, such as lettering on signs and for company logos. — Read More
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
We [Alibaba] introduce the Qwen-VL series, a set of large-scale vision-language models designed to perceive and understand both text and images. Comprising Qwen-VL and Qwen-VL-Chat, these models exhibit remarkable performance in tasks like image captioning, question answering, visual localization, and flexible interaction. The evaluation covers a wide range of tasks including zero-shot captioning, visual or document visual question answering, and grounding. We demonstrate that Qwen-VL outperforms existing Large Vision Language Models (LVLMs). We present their architecture, training, capabilities, and performance, highlighting their contributions to advancing multimodal artificial intelligence. Code, demo and models are available at https://github.com/QwenLM/Qwen-VL. — Read More
AIColor: Colorize your old Photos with the power of AI
If you’re looking to colorize old black and white photos, our AI photo colorizer can help you bring your memories to life. — Read More
Introducing Code Llama, a state-of-the-art large language model for coding
Today, we are releasing Code Llama, a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software. — Read More
LLMStack
LLMStack is a no-code platform for building generative AI applications, chatbots, agents and connecting them to your data and business processes.
Build tailor-made generative AI applications, chatbots and agents that cater to your unique needs by chaining multiple LLMs. Seamlessly integrate your own data and GPT-powered models without any coding experience using LLMStack’s no-code builder. Trigger your AI chains from Slack or Discord. Deploy to the cloud or on-premise. — Read More
An analog-AI chip for energy-efficient speech recognition and transcription
Models of artificial intelligence (AI) that have billions of parameters can achieve high accuracy across a range of tasks [1,2], but they exacerbate the poor energy efficiency of conventional general-purpose processors, such as graphics processing units or central processing units. Analog in-memory computing (analog-AI) [3,4,5,6,7] can provide better energy efficiency by performing matrix–vector multiplications in parallel on ‘memory tiles’. However, analog-AI has yet to demonstrate software-equivalent (SWeq) accuracy on models that require many such tiles and efficient communication of neural-network activations between the tiles. Here we present an analog-AI chip that combines 35 million phase-change memory devices across 34 tiles, massively parallel inter-tile communication and analog, low-power peripheral circuitry that can achieve up to 12.4 tera-operations per second per watt (TOPS/W) chip-sustained performance. We demonstrate fully end-to-end SWeq accuracy for a small keyword-spotting network and near-SWeq accuracy on the much larger MLPerf [8] recurrent neural-network transducer (RNNT), with more than 45 million weights mapped onto more than 140 million phase-change memory devices across five chips. — Read More