Open Interpreter: An Interesting AI Tool to Locally Run ChatGPT-Like Code Interpreter

After Auto-GPT and the Code Interpreter API, a new open-source project is making waves in the AI community. Open Interpreter, developed by Killian Lucas and a team of open-source contributors, combines ChatGPT plugin functionality, Code Interpreter, and something like Windows Copilot to make AI available on any platform. From a friendly terminal interface you can interact with the operating system, files, folders, programs, and the internet. If you are interested, here is how to set up and use Open Interpreter locally on your PC. — Read More

Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.

This provides a natural-language interface to your computer’s general-purpose capabilities:

Create and edit photos, videos, PDFs, etc.
Control a Chrome browser to perform research
Plot, clean, and analyze large datasets
…etc.
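
As a quick illustration of the workflow described above, here is a minimal sketch using the project's Python API. It assumes the package is installed (for example via pip install open-interpreter); the import path and the example prompt are illustrative and may differ between releases.

```python
# Minimal sketch, assuming open-interpreter is installed (pip install open-interpreter).
# The import below matches recent releases; older versions used `import interpreter`.
from interpreter import interpreter

# Keep the default behavior of asking for confirmation before executing code locally.
interpreter.auto_run = False

# Ask Open Interpreter to write and run code on your machine; the prompt is illustrative.
interpreter.chat("List the ten largest files in my home directory")
```

The same conversation is available directly from the terminal by running interpreter after installation, as noted above.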

GitHub

The 01 Project is building an open-source ecosystem for AI devices.

Our flagship operating system can power conversational devices like the Rabbit R1, Humane Pin, or Star Trek computer.

We intend to become the GNU/Linux of this space by staying open, modular, and free.

GitHub

#devops

Nvidia is now powering AI nurses

On Monday, Nvidia announced a collaboration with Hippocratic AI, a healthcare company that offers generative AI nurses who work for just $9 an hour. Hippocratic promotes how it can undercut real human nurses, who can cost $90 an hour, with cheap AI agents that offer medical advice to patients over video calls in real time. — Read More

Watch Video

#augmented-intelligence, #nvidia

AI on Trial: Bot Bharara Steals Stay Tuned

How might AI infringe on intellectual property and personality rights? And could AI replace Preet as the host of Stay Tuned?  

This is the final episode of a Stay Tuned miniseries, “AI on Trial,” featuring Preet Bharara in conversation with Nita Farahany, professor of law and philosophy at Duke University.

Preet and Nita discuss the hypothetical case of an artificial intelligence chatbot that impersonates Preet as the host of a copycat podcast, Stay Tuned with Bot Bharara. The unauthorized chatbot was trained on everything Preet has ever said or written online. Can Preet protect his intellectual property rights? Is the law on the real Preet’s side, or is it time to surrender to an AI-dominated world and collaborate with the bot? — Read More

#legal, #podcasts

Nvidia’s NEW Humanoid Robots STUN The ENTIRE INDUSTRY! (Nvidia Project GROOT)

Read More

#nvidia, #robotics, #videos

Elon Musk’s Neuralink brain-chip enables paralysed man to play chess

Read More

#human, #videos

Reward-Free Curricula for Training Robust World Models

There has been a recent surge of interest in developing generally-capable agents that can adapt to new tasks without additional training in the environment. Learning world models from reward-free exploration is a promising approach, and enables policies to be trained using imagined experience for new tasks. However, achieving a general agent requires robustness across different environments. In this work, we address the novel problem of generating curricula in the reward-free setting to train robust world models. We consider robustness in terms of minimax regret over all environment instantiations and show that the minimax regret can be connected to minimising the maximum error in the world model across environment instances. This result informs our algorithm, WAKER: Weighted Acquisition of Knowledge across Environments for Robustness. WAKER selects environments for data collection based on the estimated error of the world model for each environment. Our experiments demonstrate that WAKER outperforms several baselines, resulting in improved robustness, efficiency, and generalisation. — Read More
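
To make the selection step concrete, here is a hedged sketch of the idea stated in the abstract: bias data collection toward the environments where the world model's estimated error is largest. The function names, the sampling scheme, and the surrounding loop are illustrative placeholders, not the authors' implementation.

```python
# Hedged sketch of WAKER's selection idea: sample environments for data
# collection in proportion to the world model's estimated error there.
# All names below are illustrative placeholders, not the paper's code.
import random

def select_environment(candidate_envs, estimated_errors):
    """Sample one environment, weighting by the world model's estimated error."""
    # Guard against all-zero errors so random.choices always gets valid weights.
    weights = [max(err, 1e-8) for err in estimated_errors]
    return random.choices(candidate_envs, weights=weights, k=1)[0]

# Illustrative reward-free training loop (the components are placeholders):
# for step in range(num_steps):
#     env = select_environment(candidate_envs, error_estimates)
#     trajectory = explore(env, exploration_policy)      # reward-free data collection
#     world_model.update(trajectory)
#     error_estimates = [world_model.estimate_error(e) for e in candidate_envs]
```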

#multi-modal, #reinforcement-learning

Covariant Introduces RFM-1 to Give Robots the Human-like Ability to Reason

The key challenge with traditional robotic automation, whether based on manual programming or specialized learned models, is the lack of reliability and flexibility in real-world scenarios. To create value at scale, robots must be able to manipulate an unlimited array of items and handle new scenarios autonomously.

By starting with warehouse pick and place operations, Covariant’s RFM-1 showcases the power of Robotics Foundation Models. In warehouse environments, the technology company’s approach of combining the largest real-world robot production dataset with a massive collection of Internet data is unlocking new levels of robotic productivity and shows a path to broader industry applications ranging from hospitals and homes to factories, stores, restaurants, and more. — Read More

#robotics

Introducing Devin, the first AI software engineer

Read More

#devops

Introducing Stable Video 3D: Quality Novel View Synthesis and 3D Generation from Single Images

Today we are releasing Stable Video 3D (SV3D), a generative model based on Stable Video Diffusion, advancing the field of 3D technology and delivering greatly improved quality and view-consistency.

This release features two variants: SV3D_u and SV3D_p. SV3D_u generates orbital videos based on single image inputs without camera conditioning. SV3D_p extends the capability by accommodating both single images and orbital views, allowing for the creation of 3D video along specified camera paths. 

Stable Video 3D can be used now for commercial purposes with a Stability AI Membership. For non-commercial use, you can download the model weights on Hugging Face and view our research paper here. — Read More
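
For non-commercial experimentation, fetching the weights from Hugging Face might look roughly like the sketch below. The repository id and filename are assumptions based on Stability AI's usual naming, so check the model card for the actual paths and the accompanying inference scripts; gated models also require logging in with an access token first.

```python
# Hedged sketch: download SV3D weights from Hugging Face for local, non-commercial use.
# The repo_id and filename are assumptions; consult the model card for the real paths.
# Gated repos require `huggingface-cli login` (or an HF token) beforehand.
from huggingface_hub import hf_hub_download

weights_path = hf_hub_download(
    repo_id="stabilityai/sv3d",       # assumed repository name
    filename="sv3d_u.safetensors",    # assumed filename for the SV3D_u variant
)
print("SV3D_u weights saved to:", weights_path)
```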

#vfx

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that, for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting. — Read More
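
As a toy illustration of the data-mixture point, the snippet below samples a source for each pre-training batch from a weighted mix of image-caption, interleaved image-text, and text-only data. The weights are placeholders for illustration only, not the ratios reported in the paper.

```python
# Toy sketch of weighted sampling over pre-training data sources.
# The weights are illustrative placeholders, not MM1's reported mixture.
import random

SOURCE_WEIGHTS = {
    "image_caption": 0.45,            # placeholder weight
    "interleaved_image_text": 0.45,   # placeholder weight
    "text_only": 0.10,                # placeholder weight
}

def sample_batch_source(weights=SOURCE_WEIGHTS):
    """Pick which data source the next pre-training batch is drawn from."""
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Example: source assignments for a handful of batches.
print([sample_batch_source() for _ in range(5)])
```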

#multi-modal