The AI Trillion-Dollar Product

In a recent interview, Satya Nadella, Microsoft’s CEO, claimed that current business applications will “collapse in the agent era.” Notably, he is referring to the very same apps his company currently sells; in effect, he is predicting the death of his own company’s business model in favor of AI agents.

But this vision implies a much more powerful change that Satya is less keen on mentioning, because it directly impacts Microsoft’s raison d’être: the introduction of AI as a structural part of general-purpose computing. This is the endgame of ChatGPT: the LLM Operating System, or LLM OS.

This vision is so powerful that it is unequivocally OpenAI’s grand plan. Today, we are distilling their vision into simple words. I believe this is one of my most didactic articles on the future of AI. — Read More

#strategy

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas. Despite this, no evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas, let alone perform the entire research process. We address this by establishing an experimental design that evaluates research idea generation while controlling for confounders and performs the first head-to-head comparison between expert NLP researchers and an LLM ideation agent. By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome. — Read More
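
The headline result rests on a two-sample comparison of blinded review scores. As a minimal illustrative sketch (the novelty ratings below are invented, and Welch’s t-test is an assumed stand-in; the paper’s actual statistical analysis may differ), the core comparison might look like this:

```python
# Illustrative sketch: comparing blinded novelty ratings for ideas written
# by human experts vs. an LLM ideation agent. All numbers are hypothetical.
from scipy import stats

# Hypothetical 1-10 novelty scores assigned by blind reviewers.
human_novelty = [5.2, 6.1, 4.8, 5.5, 6.0, 5.1, 4.9, 5.7]
llm_novelty   = [6.3, 5.9, 6.8, 6.1, 7.0, 5.8, 6.4, 6.6]

# Welch's t-test (no equal-variance assumption) for a difference in means.
t_stat, p_value = stats.ttest_ind(llm_novelty, human_novelty, equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below 0.05 would mirror the paper's finding that LLM ideas
# were rated significantly more novel than human expert ideas.
```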

#strategy

The phony comforts of AI skepticism

At the end of last month, I attended an inaugural conference in Berkeley named the Curve. The idea was to bring together engineers at big tech companies, independent safety researchers, academics, nonprofit leaders, and people who have worked in government to discuss the biggest questions of the day in artificial intelligence:

Does AI pose an existential threat? How should we weigh the risks and benefits of open weights? When, if ever, should AI be regulated? How? Should AI development be slowed down or accelerated? Should AI be handled as an issue of national security? When should we expect AGI?

If the idea was to produce thoughtful collisions between e/accs and decels, the Curve came up a bit short: the conference was long on existential dread, and I don’t think I heard anyone say that AI development should speed up. 

… At the moment, no one knows for sure whether the large language models that are now under development will achieve superintelligence and transform the world. And in that uncertainty, two primary camps of criticism have emerged. 

The first camp, which I associate with the external critics, holds that AI is fake and sucks. The second camp, which I associate more with the internal critics, believes that AI is real and dangerous. — Read More

#strategy

Over ½ of Long Posts on LinkedIn are Likely AI-Generated Since ChatGPT Launched

Have you seen a thought leadership LinkedIn post and wondered if it was AI-generated or human-written? In this study, we looked at the impact of ChatGPT and generative AI tools on the volume of AI content that is being published on LinkedIn.

We have likely all experienced the same feeling on LinkedIn within the last couple of years… seeing a long-form post and suspecting it is AI-generated, even as the author passes it off as their own thought leadership. — Read More

#strategy

AI and the 2024 Elections

It’s been the biggest year for elections in human history: 2024 is a “super-cycle” year in which 3.7 billion eligible voters in 72 countries had the chance to go to the polls. These were also the first AI elections, in which many feared that deepfakes and AI-generated misinformation would overwhelm the democratic processes. As 2024 draws to a close, it’s instructive to take stock of how democracy did.

In a Pew survey of Americans from earlier this fall, nearly eight times as many respondents expected AI to be used for mostly bad purposes in the 2024 election as those who thought it would be used mostly for good. There are real concerns and risks in using AI in electoral politics, but it definitely has not been all bad.

The dreaded “death of truth” has not materialized—at least, not due to AI. And candidates are eagerly adopting AI in many places where it can be constructive, if used responsibly. But because this all happens inside a campaign, and largely in secret, the public often doesn’t see all the details. — Read More

#strategy

Friend or Faux?

Millions of people are turning to AI for companionship. They are finding the experience surprisingly meaningful, unexpectedly heartbreaking, and profoundly confusing, leaving them to wonder, ‘Is this real? And does that matter?’

… The world is rapidly becoming populated with human-seeming machines. They use human language, even speaking in human voices. They have names and distinct personalities. There are assistants like Anthropic’s Claude, which has gone through “character training” to become more “open-minded and thoughtful,” and Microsoft’s Copilot, which has more of a “hype man” persona and is always there to provide “emotional support.” Together, they represent a new sort of relationship with technology: less instrumental, more interpersonal.

Few people have grappled as explicitly with the unique benefits, dangers, and confusions of these relationships as the customers of “AI companion” companies. These companies have raced ahead of the tech giants in embracing the technology’s full anthropomorphic potential, giving their AI agents human faces, simulated emotions, and customizable backstories. The more human AI seems, the founders argue, the better it will be at meeting our most important human needs, like supporting our mental health and alleviating our loneliness. Many of these companies are new and run by just a few people, but already, they collectively claim tens of millions of users. — Read More

#strategy

The Gen AI Bridge to the Future

In 1945 the U.S. government built ENIAC, an acronym for Electronic Numerical Integrator and Computer, to do ballistics trajectory calculations for the military; World War 2 was nearing its conclusion, however, so ENIAC’s first major job was to do calculations that undergirded the development of the hydrogen bomb. Six years later, J. Presper Eckert and John Mauchly, who led the development of ENIAC, launched UNIVAC, the Universal Automatic Computer, for broader government and commercial applications. Early use cases included calculating the U.S. census and assisting with calculation-intensive back office operations like payroll and bookkeeping.

These were hardly computers as we know them today, but rather calculation machines that took in reams of data (via punch cards or magnetic tape) and returned results according to hardwired calculation routines; the “operating system” was the humans actually inputting the data, scheduling jobs, and giving explicit hardware instructions. Originally this instruction also happened via punch cards and magnetic tape, but later models added consoles to both provide status and allow for register-level control; these consoles evolved into terminals, but the first versions of these terminals, like the one that was available for the original version of the IBM System/360, were used to initiate batch programs.

Any recounting of computing history usually focuses on the bottom two levels of that stack — the device and the input method — because they tend to evolve in parallel. … What stands out to me, however, is the top level of the initial stack: the application layer on one paradigm provides the bridge to the next one. This, more than anything, is why generative AI is a big deal in terms of realizing the future. — Read More

#strategy

What the departing White House chief tech advisor has to say on AI

President Biden’s administration will end within two months, and likely to depart with him is Arati Prabhakar, the top mind for science and technology in his cabinet. She has served as Director of the White House Office of Science and Technology Policy since 2022 and was the first to demonstrate ChatGPT to the president in the Oval Office. Prabhakar was instrumental in passing the president’s executive order on AI in 2023, which sets guidelines for tech companies to make AI safer and more transparent (though it relies on voluntary participation).

The incoming Trump administration has not presented a clear thesis of how it will handle AI, but plenty of people in it will want to see that executive order nullified. Trump said as much in July, endorsing the 2024 Republican Party Platform that says the executive order “hinders AI innovation and imposes Radical Leftwing ideas on the development of this technology.” Venture capitalist Marc Andreessen has said he would support such a move.

However, complicating that narrative will be Elon Musk, who for years has expressed fears about doomsday AI scenarios, and has been supportive of some regulations aiming to promote AI safety.

As she prepares for the end of the administration, I sat down with Prabhakar and asked her to reflect on President Biden’s AI accomplishments, and how AI risks, immigration policies, the CHIPS Act and more could change under Trump. — Read More

#strategy

2024: The State of Generative AI in the Enterprise

The enterprise AI landscape is being rewritten in real time. As pilots give way to production, we surveyed 600 U.S. enterprise IT decision-makers to reveal the emerging winners and losers.

2024 marks the year that generative AI became a mission-critical imperative for the enterprise. The numbers tell a dramatic story: AI spending surged to $13.8 billion this year, more than 6x the $2.3 billion spent in 2023—a clear signal that enterprises are shifting from experimentation to execution, embedding AI at the core of their business strategies.

This spike in spending reflects a wave of organizational optimism; 72% of decision-makers anticipate broader adoption of generative AI tools in the near future. This confidence isn’t just speculative—generative AI tools are already deeply embedded in the daily work of professionals, from programmers to healthcare providers.

Despite this positive outlook and increasing investment, many decision-makers are still figuring out what will and won’t work for their businesses.  — Read More

#strategy

The Anti-LLM Revolution Begins

If you lift your head over the media funnel of AI outlets and influencers that simply echo Sam Altman’s thoughts every time he speaks, you will realize that, despite the recent emergence of OpenAI’s new o1 models, sentiment against Large Language Models (LLMs) is at an all-time high.

The reason?

Despite the alleged increase in ‘intelligence’ that o1 models represent, they still suffer from the same issues previous generations had. In crucial aspects, we have made no progress in the last six years, despite all the hype. — Read More

#strategy