GPT detectors frequently misclassify non-native English writing as AI-generated, raising concerns about fairness and robustness. Addressing the biases in these detectors is crucial to prevent the marginalization of non-native English speakers in evaluative and educational settings and to create a more equitable digital landscape.
… GPT detectors exhibit significant bias against non-native English authors, as demonstrated by their high misclassification of TOEFL essays written by non-native speakers. In our study, we evaluated the performance of seven widely used GPT detectors on 91 TOEFL (Test of English as a Foreign Language) essays from a Chinese forum and 88 US eighth-grade essays from the Hewlett Foundation’s ASAP dataset. While the detectors accurately classified the US student essays, they incorrectly labeled more than half of the TOEFL essays as “AI-generated” (average false-positive rate: 61.3%). — Read More
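To make the headline metric concrete, here is a minimal Python sketch of how a false-positive rate of this kind can be computed. The function names and the toy flags are assumptions made up for illustration; they are not the study's data or code, and the unweighted average across detectors is likewise an assumption about how such a figure might be aggregated.

```python
def false_positive_rate(flags):
    """Share of human-written essays that a detector flagged as AI-generated."""
    if not flags:
        raise ValueError("need at least one human-written essay")
    return sum(flags) / len(flags)


def average_fpr(flags_by_detector):
    """Unweighted mean of the per-detector false-positive rates."""
    rates = [false_positive_rate(flags) for flags in flags_by_detector.values()]
    return sum(rates) / len(rates)


# Toy data, invented for illustration: True means "flagged as AI-generated".
# Every essay here is human-written, so each True is a false positive.
toy_flags = {
    "detector_a": [True, False, True, True],
    "detector_b": [False, False, True, False],
}

print(false_positive_rate(toy_flags["detector_a"]))  # 0.75
print(average_fpr(toy_flags))  # 0.5
```

Because every detector in this sketch sees the same set of essays, the unweighted mean equals the pooled rate; with unequal set sizes the two would differ.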
Monthly Archives: July 2023
Emerging Architectures for LLM Applications
Large language models are a powerful new primitive for building software. But since they are so new—and behave so differently from normal computing resources—it’s not always obvious how to use them.
In this post, we’re sharing a reference architecture for the emerging LLM app stack. It shows the most common systems, tools, and design patterns we’ve seen used by AI startups and sophisticated tech companies. This stack is still very early and may change substantially as the underlying technology advances, but we hope it will be a useful reference for developers working with LLMs now. — Read More
GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE
OpenAI is keeping the architecture of GPT-4 closed not because of some existential risk to humanity but because what they’ve built is replicable. In fact, we expect Google, Meta, Anthropic, Inflection, Character, Tencent, ByteDance, Baidu, and more to all have models as capable as GPT-4 if not more capable in the near term.
Don’t get us wrong, OpenAI has amazing engineering, and what they built is incredible, but the solution they arrived at is not magic. It is an elegant solution with many complex tradeoffs. Going big is only a portion of the battle. OpenAI’s most durable moat is that they have the most real-world usage, leading engineering talent, and can continue to race ahead of others with future models. — Read More
Yam Peleg posted the details. Yam’s post is here … at least for now
Inside Google’s big AI shuffle — and how it plans to stay competitive, with Google DeepMind CEO Demis Hassabis
Today, I’m talking to Demis Hassabis, the CEO of Google DeepMind, the newly created division of Google responsible for AI efforts across the company. Google DeepMind is the result of an internal merger: Google acquired Demis’ DeepMind startup in 2014 and ran it as a separate company inside its parent company, Alphabet, while Google itself had an AI team called Google Brain.
Google has been showing off AI demos for years now, but with the explosion of ChatGPT and a renewed threat from Microsoft in search, Google and Alphabet CEO Sundar Pichai made the decision to bring DeepMind into Google itself earlier this year to create… Google DeepMind.
What’s interesting is that Google Brain and DeepMind were not necessarily compatible or even focused on the same things: DeepMind was famous for applying AI to things like games and protein-folding simulations. The AI that beat world champions at Go, the ancient board game? That was DeepMind’s AlphaGo. Meanwhile, Google Brain was more focused on what’s come to be the familiar generative AI toolset: large language models for chatbots, editing features in Google Photos, and so on. This was a culture clash and a big structure decision with the goal of being more competitive and faster to market with AI products. Read More
Med-PaLM
Med-PaLM is a large language model (LLM) designed to provide high-quality answers to medical questions.
Med-PaLM harnesses the power of Google’s large language models, which we have aligned to the medical domain and evaluated using medical exams, medical research, and consumer queries. Our first version of Med-PaLM, preprinted in late 2022, was the first AI system to surpass the pass mark on US Medical Licensing Examination (USMLE)-style questions. Med-PaLM also generates accurate, helpful long-form answers to consumer health questions, as judged by panels of physicians and users.
We introduced our latest model, Med-PaLM 2, at our annual health event The Check Up in Q1 2023. Med-PaLM 2 achieves an accuracy of 86.5% on USMLE-style questions, a 19-percentage-point leap over our own state-of-the-art results from Med-PaLM. — Read More
The UN holds a robot press conference about the state of AI
The AI for Good global summit hosted by the U.N. tech agency invited a panel of robots and their creators to a press conference to answer questions from reporters.
At the AI for Good 2023 global summit, a panel of robots and their creators sat in front of the press to answer journalists’ questions on topics such as job automation, artificial intelligence (AI) leadership and collaboration with humans for a better future.
… Altogether nine robots were in attendance, including Sophia, who serves as the U.N. Development Program’s first robot innovation ambassador, a robot healthcare service provider named Grace and a rock star robot called Desdemona. — Read More
Train Your AI Model Once and Deploy on Any Cloud with NVIDIA and Run:ai
Organizations are increasingly adopting hybrid and multi-cloud strategies to access the latest compute resources, consistently support worldwide customers, and optimize cost. However, a major challenge that engineering teams face is operationalizing AI applications across different platforms as the stack changes. This requires MLOps teams to familiarize themselves with different environments and developers to customize applications to run across target platforms.
NVIDIA offers a consistent, full stack to develop on a GPU-powered on-premises or on-cloud instance. You can then deploy that AI application on any GPU-powered platform without code changes.
The NVIDIA Cloud Native Stack Virtual Machine Image (VMI) is GPU-accelerated. It comes pre-installed with Cloud Native Stack, which is a reference architecture that includes upstream Kubernetes and the NVIDIA GPU Operator. NVIDIA Cloud Native Stack VMI enables you to build, test, and run GPU-accelerated containerized applications orchestrated by Kubernetes. — Read More
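As a rough illustration of what "GPU-accelerated containerized applications orchestrated by Kubernetes" looks like in practice, the Python sketch below builds a minimal Pod manifest that requests one GPU through the `nvidia.com/gpu` resource exposed by the NVIDIA device plugin. The helper name `gpu_pod_manifest`, the pod name, and the image URL are hypothetical placeholders, not taken from the article.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested as an extended resource under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical pod name and image, used only for illustration.
manifest = gpu_pod_manifest("gpu-test", "example.com/my-cuda-app:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`; since the manifest names no cloud provider, the same file should work on any cluster whose nodes advertise the GPU resource.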
MetaGPT: Multi-Agent Meta Programming Framework
MetaGPT takes a one-line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc.
Internally, MetaGPT includes product managers / architects / project managers / engineers. It provides the entire process of a software company along with carefully orchestrated SOPs. — Read More
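The role hand-off described above can be pictured as a fixed pipeline in which each role reads the shared workspace and adds its own artifact. The Python sketch below is a toy model of that idea and does not use MetaGPT's actual API: the `Workspace` class, the role functions, and the `SOP` list are all hypothetical, and each role returns a canned string where a real system would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Shared state handed from one role to the next."""
    requirement: str
    artifacts: dict = field(default_factory=dict)


def product_manager(ws):
    ws.artifacts["user_stories"] = f"User stories for: {ws.requirement}"


def architect(ws):
    ws.artifacts["api_design"] = f"API design from: {ws.artifacts['user_stories']}"


def project_manager(ws):
    ws.artifacts["task_list"] = f"Tasks from: {ws.artifacts['api_design']}"


def engineer(ws):
    ws.artifacts["code"] = f"Code for: {ws.artifacts['task_list']}"


# The "SOP" here is simply the fixed order in which the roles act.
SOP = [product_manager, architect, project_manager, engineer]


def run_sop(requirement):
    """Run every role in order and return the populated workspace."""
    ws = Workspace(requirement)
    for role in SOP:
        role(ws)
    return ws


result = run_sop("Build a command-line to-do app")
for name, artifact in result.artifacts.items():
    print(f"{name}: {artifact}")
```

Keeping the order in a plain list makes the procedure explicit: swapping or adding a role changes one line, and each role depends only on artifacts produced earlier in the list.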
I Wore the Future With a Brain-Connected AR-VR Headset
The next frontier might be neurotech: OpenBCI’s Galea headset, along with advances in assistive controls, points to a wild, wearable road ahead.
A few weeks ago, I saw the best quality mixed reality headset with an interface controlled using my fingers and eyes: Apple’s Vision Pro. But a few months before its announcement, I saw something perhaps even wilder. Clips on my ears, a crown of rubbery-tipped sensors nestled into my hair and a face mask lowered in front of my eyes. Suddenly I was looking at my own brain waves in VR and moving things around with only tiny movements of my facial muscles. I was test driving OpenBCI’s Galea.
The future of VR and AR is advancing steadily, but inputs remain a challenge. For now, it’s a territory moving from physical controllers to hand- and eye-tracking. But there are deeper possibilities beyond that, and they’re neural. — Read More
Gizmodo Editor Slams ‘Shameful’ AI-Written Article: ‘It’s F–king Dogs–t’
Gizmodo’s io9 section, which focuses on science fiction, published an error-riddled article written by “Gizmodo Bot,” which deputy editor James Whitbrook said on Wednesday was foisted on the site’s editorial team with little notice.
The AI-generated article, “A Chronological List of Star Wars Movies & TV Shows,” was riddled with factual errors. G/O Media, the owner of Gizmodo, said last week it was starting to use artificial intelligence on its sites, including Gizmodo, The Onion, Deadspin and The Root. — Read More