Textbooks are a cornerstone of education, but they have a fundamental limitation: they are a one-size-fits-all medium. Any new material or alternative representation requires arduous human effort, so that textbooks cannot be adapted in a scalable manner. We present an approach for transforming and augmenting textbooks using generative AI, adding layers of multiple representations and personalization while maintaining content integrity and quality. We refer to the system built with this approach as Learn Your Way. We report pedagogical evaluations of the different transformations and augmentations, and present the results of a randomized control trial, highlighting the advantages of learning with Learn Your Way over regular textbook usage. — Read More
Tag Archives: Augmented Intelligence
Writers vs. AI: Microsoft Study Reveals How GPT-4 Impacts Creativity and Voice
Rather than fear AI, writers should learn how to use it properly. The technology is transforming many sectors, and creative writing is no exception, but what matters in the end is how unique a piece of writing is.
To this end, the Microsoft research team teamed up with the University of Southern California to test whether generative AI boosts or weakens a writer’s uniqueness.
The study, titled “It Was 80% Me, 20% AI”, included 19 fiction writers, 30 readers, and AI-generated suggestions using OpenAI’s GPT-4. … Lead researcher Angel Hsing-Chi Hwang explained that, for a writer, the value of a work lies in its authenticity. In this regard, co-writing with AI might defeat that purpose. — Read More
Superhuman performance of a large language model on the reasoning tasks of a physician
Performance of large language models (LLMs) on medical tasks has traditionally been evaluated using multiple choice question benchmarks. However, such benchmarks are highly constrained, saturated with repeated impressive performance by LLMs, and have an unclear relationship to performance in real clinical scenarios. Clinical reasoning, the process by which physicians employ critical thinking to gather and synthesize clinical data to diagnose and manage medical problems, remains an attractive benchmark for model performance. Prior LLMs have shown promise in outperforming clinicians in routine and complex diagnostic scenarios. We sought to evaluate OpenAI’s o1-preview model, a model developed to increase run-time via chain of thought processes prior to generating a response. We characterize the performance of o1-preview with five experiments including differential diagnosis generation, display of diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning, adjudicated by physician experts with validated psychometrics. Our primary outcome was comparison of the o1-preview output to identical prior experiments that have historical human controls and benchmarks of previous LLMs. Significant improvements were observed with differential diagnosis generation and quality of diagnostic and management reasoning. No improvements were observed with probabilistic reasoning or triage differential diagnosis. This study highlights o1-preview’s ability to perform strongly on tasks that require complex critical thinking such as diagnosis and management while its performance on probabilistic reasoning tasks was similar to past models. New robust benchmarks and scalable evaluation of LLM capabilities compared to human physicians are needed along with trials evaluating AI in real clinical settings. — Read More
The Internet Creator’s Guide to the Future
TL;DR: Today we’re releasing a new episode of our podcast AI & I. I go in depth with Steph Smith, a16z Podcast host and internet creator. We dive into how AI is reshaping the world that internet creators live in. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.
Steph Smith is the ultimate internet explorer.
I spent an hour talking to her about the future of creating on the internet in the age of AI. We had a wide-ranging discussion about:
— How AI narrows the gap between ideas and execution
— How AI changes what humans perceive as valuable in art and creativity
— The type of AI tools that are poised for success — Read More
AI can make you more creative—but it has limits
Generative AI models have made it simpler and quicker to produce everything from text passages and images to video clips and audio tracks. Texts and media that might have taken years for humans to create can now be generated in seconds.
But while AI’s output can certainly seem creative, do these models actually boost human creativity?
That’s what two researchers set out to explore in new research published today in Science Advances, studying how people used OpenAI’s large language model GPT-4 to write short stories.
The model was helpful—but only to an extent. They found that while AI improved the output of less creative writers, it made little difference to the quality of the stories produced by writers who were already creative. The stories in which AI had played a part were also more similar to each other than those dreamed up entirely by humans. — Read More
Towards Conversational Diagnostic AI
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians’ expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE’s performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI. — Read More
Communicative Agents for Software Development
Software engineering is a domain characterized by intricate decision-making processes, often relying on nuanced intuition and consultation. Recent advancements in deep learning have started to revolutionize software engineering practices through elaborate designs implemented at various stages of software development. In this paper, we present an innovative paradigm that leverages large language models (LLMs) throughout the entire software development process, streamlining and unifying key processes through natural language communication, thereby eliminating the need for specialized models at each phase. At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting. Each stage engages a team of “software agents”, such as programmers, code reviewers, and test engineers, fostering collaborative dialogue and facilitating a seamless workflow. The chat chain acts as a facilitator, breaking down each stage into atomic subtasks. This enables dual roles, allowing for proposing and validating solutions through context-aware communication, leading to efficient resolution of specific subtasks. The instrumental analysis of ChatDev highlights its remarkable efficacy in software generation, enabling the completion of the entire software development process in under seven minutes at a cost of less than one dollar. It not only identifies and alleviates potential vulnerabilities but also rectifies potential hallucinations while maintaining commendable efficiency and cost-effectiveness. The potential of ChatDev unveils fresh possibilities for integrating LLMs into the realm of software development. Our code is available at this https URL. – Read More
Klarna CEO says AI can do the job of 700 workers. But job replacement isn’t the biggest issue.
Fintech company Klarna, which powers e-commerce transactions for some of the world’s most recognizable brands, including Expedia, Macy’s and Nike, is at the forefront of AI adoption. It has integrated artificial intelligence across the company, most notably with an AI chatbot that it recently said does the equivalent work of 700 customer service agents. Klarna, which employs roughly 4,000 people, recently released statistics that show how efficient and effective the tool has been, wading into the thick of sensitive and high-stakes debates about the role of generative AI in business, how humans interact with it and its implications for the future of work. CEO Sebastian Siemiatkowski explains why he is so transparent about AI’s capabilities, and what concerns him most about the new technology. This interview has been edited for length and clarity. — Read More
Nvidia is now powering AI nurses
On Monday, Nvidia announced a collaboration with Hippocratic AI, a healthcare company that offers generative AI nurses who work for just $9 an hour. Hippocratic promotes how it can undercut real human nurses, who can cost $90 an hour, with its cheap AI agents that offer medical advice to patients over video calls in real time. — Read More
I spent a week using AI tools in my daily life. Here’s how it went.
Every tech company you can think of is jumping on the generative AI bandwagon and touting new features promising to make our lives easier, increase productivity, and unlock some dormant cache of hidden potential within all of us.
But “promise” is the operative word here. Despite all the AI hype and billions of dollars of investment, generative AI is still very new to the average person and has yet to transform from being a fascinating novelty into an indispensable mainstay.
…I spent a little over a week using generative AI tools that fit within my daily life and work schedule. To do this, I made an outline of what my typical week looks like and identified ways in which generative AI could help and which tools to use. — Read More