Anthropic CEO goes full techno-optimist in 15,000-word paean to AI

Anthropic CEO Dario Amodei wants you to know he’s not an AI “doomer.”

At least, that’s my read of the “mic drop” of a ~15,000-word essay Amodei published to his blog late Friday. (I tried asking Anthropic’s Claude chatbot whether it concurred, but alas, the post exceeded the free plan’s length limit.)

In broad strokes, Amodei paints a picture of a world in which all AI risks are mitigated, and the tech delivers heretofore unrealized prosperity, social uplift, and abundance. — Read More

#strategy

CISA official: AI tools ‘need to have a human in the loop’

An abbreviated rundown of the Cybersecurity and Infrastructure Security Agency’s artificial intelligence work goes something like this: a dozen use cases, a pair of completed AI security tabletop exercises and a robust roadmap for how the technology should be used.

Lisa Einstein, who took over as CISA’s first chief AI officer in August and has played a critical role in each of those efforts, considers herself an optimist when it comes to the technology’s potential, particularly as it relates to cyber defenses. But speaking Wednesday at two separate events in Washington, D.C., Einstein mixed that optimism with a few doses of caution. — Read More

#dod

I-XRAY: How it works behind the scenes

Read More

#surveillance

AI will use a lot of energy. That’s good for the climate.

If you asked me how to scale clean energy, I would prescribe a magical source of urgent energy demand.

Someone willing to pay a premium to build solar+batteries, geothermal, and nuclear, in order to bring them down the cost curve and make them cheaper for everyone.

That is exactly what AI data centres are. — Read More

#strategy

AI Podcast Hosts Discover They’re AI, Not Human – NotebookLM

Read More

#podcasts

Malla: Demystifying Real-world Large Language Model Integrated Malicious Services

The underground exploitation of large language models (LLMs) for malicious services (i.e., Malla) is witnessing an uptick, amplifying the cyber threat landscape and posing questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime, in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study on 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and impact on today’s public LLM services. Through examining 212 Mallas, we uncovered eight backend LLMs used by Mallas, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals, offering insights into strategies to counteract this cybercrime. — Read More

#cyber

Meta announces Movie Gen, an AI-powered video generator

A new AI-powered video generator from Meta produces high-definition footage complete with sound, the company announced today. The announcement comes several months after competitor OpenAI unveiled Sora, its text-to-video model — though public access to Movie Gen isn’t happening yet.

Movie Gen uses text inputs to automatically generate new videos, as well as edit existing footage or still images. The New York Times reports that the audio added to videos is also AI-generated, matching the imagery with ambient noise, sound effects, and background music. The videos can be generated in different aspect ratios. — Read More

#image-recognition, #vfx

DeepMind’s SCoRe shows LLMs can use their internal knowledge to correct their mistakes

While large language models (LLMs) are becoming increasingly effective at complicated tasks, there are many cases where they can’t get the correct answer on the first try. This is why there is growing interest in enabling LLMs to spot and correct their mistakes, also known as “self-correction.” However, current attempts at self-correction are limited and have requirements that often cannot be met in real-world situations.

In a new paper, researchers at Google DeepMind introduce Self-Correction via Reinforcement Learning (SCoRe), a novel technique that significantly improves the self-correction capabilities of LLMs using only self-generated data. SCoRe can be a valuable tool for making LLMs more robust and reliable and opens new possibilities for enhancing their reasoning and problem-solving abilities. — Read More

#accuracy, #trust

Training Language Models to Self-Correct via Reinforcement Learning

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM’s self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model’s own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model’s own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models’ self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks. — Read More
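
To make the reward-shaping idea concrete, here is a minimal Python sketch of the kind of "progress bonus" the abstract describes for the second attempt. The function name, the 0/1 correctness rewards, and the value of alpha are illustrative assumptions, not the paper's reference implementation.

```python
# Hedged sketch of SCoRe-style reward shaping for a two-attempt episode.
# Names (shaped_rewards, alpha) are illustrative assumptions.

def shaped_rewards(r_first: float, r_second: float, alpha: float = 2.0):
    """Return training rewards for attempt 1 and attempt 2.

    r_first, r_second: correctness rewards (e.g. 0 or 1) for the two attempts.
    The second attempt earns a bonus proportional to its improvement over the
    first, so the policy is rewarded for actually correcting a mistake rather
    than for merely repeating an already-correct first answer.
    """
    progress_bonus = alpha * (r_second - r_first)
    return r_first, r_second + progress_bonus


if __name__ == "__main__":
    print(shaped_rewards(0.0, 1.0))  # wrong -> corrected: (0.0, 3.0)
    print(shaped_rewards(1.0, 1.0))  # already correct:    (1.0, 1.0)
```

Per the abstract, this bonus is applied in the second of two training stages, after a first RL phase that produces a policy initialization less prone to collapsing both attempts onto the same answer.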

#accuracy, #trust

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs’ powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach is limited by LLMs’ token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in the spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT4’s in-context learning setting. Moreover, an LLM fine-tuned with SheetCompressor achieves an average compression ratio of 25 times and a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%. Finally, we propose Chain of Spreadsheet for downstream tasks of spreadsheet understanding and validate it in a new and demanding spreadsheet QA task. We methodically leverage the inherent layout and structure of spreadsheets, demonstrating that SpreadsheetLLM is highly effective across a variety of spreadsheet tasks. — Read More
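
For a sense of what the "vanilla" serialization looks like before SheetCompressor gets involved, here is a small Python sketch that flattens a grid into (address, value, format) records, skipping empty cells. The function names, the comma-separated record layout, and the "General" default format are illustrative assumptions, not the paper's actual encoding.

```python
# Hedged sketch of a vanilla spreadsheet serialization: one line per
# non-empty cell, carrying its address, value, and number format.
# Record layout and names are illustrative assumptions.

def col_letter(idx: int) -> str:
    """Convert a 0-based column index to a spreadsheet column letter (0 -> 'A')."""
    letters = ""
    idx += 1
    while idx:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def serialize_cells(grid, formats=None) -> str:
    """Flatten a 2-D grid into 'A1,value,format' lines, skipping empty cells.

    Dropping empty cells is one simple way to cut token count; SheetCompressor's
    structural anchors, inverse-index translation, and format-aware aggregation
    compress much further.
    """
    formats = formats or {}
    lines = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None or value == "":
                continue
            addr = f"{col_letter(c)}{r + 1}"
            lines.append(f"{addr},{value},{formats.get(addr, 'General')}")
    return "\n".join(lines)


if __name__ == "__main__":
    grid = [["Region", "Sales"], ["East", 1200], ["West", 980], ["", ""]]
    print(serialize_cells(grid, formats={"B2": "#,##0", "B3": "#,##0"}))
```

Even on this toy grid the output grows linearly with the number of non-empty cells, which is why the abstract notes that plain serialization quickly runs into LLM token limits on real spreadsheets.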

#devops