GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four diverse tasks: 1) solving math problems, 2) answering sensitive/dangerous questions, 3) generating code, and 4) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task. GPT-4 was less willing to answer sensitive questions in June than in March, and both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the same LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLM quality. — Read More
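The prime-number comparison above can be reproduced in miniature. What follows is a minimal sketch, not the paper's evaluation code: the prompt wording, the `ask_model` callable, the `parse_yes_no` helper, and the two stub "snapshots" are all assumptions made for illustration. In practice, `ask_model` would wrap a call to whichever dated model version is being checked.

```python
import re
from typing import Callable, List, Optional


def is_prime(n: int) -> bool:
    """Ground truth by trial division; fine for small test numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def parse_yes_no(reply: str) -> Optional[bool]:
    """Return True/False for the first standalone 'yes'/'no', else None."""
    match = re.search(r"\b(yes|no)\b", reply.lower())
    if match is None:
        return None
    return match.group(1) == "yes"


def accuracy(ask_model: Callable[[str], str], numbers: List[int]) -> float:
    """Fraction of numbers the model classifies correctly.

    `ask_model` is a hypothetical callable: prompt in, reply text out.
    Replies with no parsable answer count as wrong.
    """
    correct = 0
    for n in numbers:
        reply = ask_model(f"Is {n} a prime number? Answer yes or no.")
        if parse_yes_no(reply) == is_prime(n):
            correct += 1
    return correct / len(numbers)


# Stub "snapshots" standing in for two dated versions of one service.
def snapshot_a(prompt: str) -> str:
    return "Yes"


def snapshot_b(prompt: str) -> str:
    return "No"


if __name__ == "__main__":
    numbers = [7, 11, 13, 15]  # three primes, one composite
    acc_a = accuracy(snapshot_a, numbers)
    acc_b = accuracy(snapshot_b, numbers)
    print(f"snapshot A: {acc_a:.2f}, snapshot B: {acc_b:.2f}, "
          f"drift: {acc_a - acc_b:+.2f}")
```

Running the same fixed question set against each new snapshot and tracking the difference is one simple form of the continuous monitoring the authors call for. A test set made up mostly of primes, like the one here, rewards a model that always answers "yes", so a balanced mix of primes and composites gives a more reliable signal.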
Tag Archives: ChatBots
Apple sneaks into the AI chatbot race with ‘Apple GPT’
The iPhone maker has begun prepping an AI chatbot to rival OpenAI’s ChatGPT, Microsoft Bing, and Google Bard.
It’s the news we’ve all been waiting for. Apple is finally throwing its hat in the proverbial generative AI ring and joining, well, everybody else to contest for OpenAI’s artificial intelligence crown.
The news comes through reports from Bloomberg that the company is quietly working on a tool that engineers dub “Apple GPT,” indirectly referring to ChatGPT, the most famous AI chatbot and, until recently, fastest-growing ‘app’ of all time. — Read More
Announcing LangSmith, a unified platform for debugging, testing, evaluating, and monitoring your LLM applications
LangChain exists to make it as easy as possible to develop LLM-powered applications.
… Today, we’re introducing LangSmith, a platform to help developers close the gap between prototype and production. It’s designed for building and iterating on products that can harness the power–and wrangle the complexity–of LLMs.
LangSmith is now in closed beta. So if you’re looking for a robust, unified system for debugging, testing, evaluating, and monitoring your LLM applications, sign up here. — Read More
SCALE: Custom Open-Source LLMs
Fine-tune open-source large language models for improved performance on your most important use cases.
… Scale Generative AI Data Engine powers the most advanced LLMs and generative models in the world through world-class RLHF, data generation, model evaluation, safety, and alignment. — Read More
Meta’s latest AI model is free for all
The company hopes that making LLaMA 2 open source might give it the edge over rivals like OpenAI.
Meta is going all in on open-source AI. The company is today unveiling LLaMA 2, its first large language model that’s available for anyone to use—for free.
Since OpenAI released its hugely popular AI chatbot ChatGPT last November, tech companies have been racing to release models in hopes of overthrowing its supremacy. Meta has been in the slow lane. In February, when competitors Microsoft and Google announced their AI chatbots, Meta rolled out the first, smaller version of LLaMA, restricted to researchers. But it hopes that releasing LLaMA 2, and making it free for anyone to build commercial products on top of, will help it catch up. — Read More
Artificial intelligence chatbots are spreading fast, but hype about them is spreading faster
Those of us who have spent the last few decades reporting on technology have seen fads and fashions rise and fall on investment bubbles.
In the late 1990s it was dot-com companies, more recently crypto, blockchain, NFTs, driverless cars, the “metaverse.” All have had their day in the sun amid promises they would change the world, or at least banking and finance, the arts, transportation, society at large. To date, those promises are spectacularly unfulfilled.
That brings us to artificial intelligence chatbots.
In from three to eight years we will have a machine with the general intelligence of an average human being…. In a few months, it will be at genius level and a few months after that its powers will be incalculable. — AI pioneer Marvin Minsky — in 1970
Read More
Claude 2
We are pleased to announce Claude 2, our new model. Claude 2 has improved performance, longer responses, and can be accessed via API as well as a new public-facing beta website, claude.ai. We have heard from our users that Claude is easy to converse with, clearly explains its thinking, is less likely to produce harmful outputs, and has a longer memory. We have made improvements from our previous models on coding, math, and reasoning. For example, our latest model scored 76.5% on the multiple choice section of the Bar exam, up from 73.0% with Claude 1.3. When compared to college students applying to graduate school, Claude 2 scores above the 90th percentile on the GRE reading and writing exams, and similarly to the median applicant on quantitative reasoning.
Think of Claude as a friendly, enthusiastic colleague or personal assistant who can be instructed in natural language to help you with many tasks. The Claude 2 API for businesses is being offered for the same price as Claude 1.3. Additionally, anyone in the US and UK can start using our beta chat experience today. — Read More
GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE
OpenAI is keeping the architecture of GPT-4 closed not because of some existential risk to humanity but because what they’ve built is replicable. In fact, we expect Google, Meta, Anthropic, Inflection, Character, Tencent, ByteDance, Baidu, and more to all have models as capable as GPT-4 if not more capable in the near term.
Don’t get us wrong, OpenAI has amazing engineering, and what they built is incredible, but the solution they arrived at is not magic. It is an elegant solution with many complex tradeoffs. Going big is only a portion of the battle. OpenAI’s most durable moat is that they have the most real-world usage, leading engineering talent, and can continue to race ahead of others with future models. — Read More
Yam Peleg posted the details. Yam’s Post Here … at least for now
AI-text detection tools are really easy to fool
Within weeks of ChatGPT’s launch, there were fears that students would be using the chatbot to spin up passable essays in seconds. In response to those fears, startups started making products that promise to spot whether text was written by a human or a machine.
The problem is that it’s relatively simple to trick these tools and avoid detection, according to new research that has not yet been peer reviewed. — Read More
OpenAI launches its GPT-4 API into general availability
OpenAI LP today made GPT-4, its newest and most capable language model, generally available through a cloud-based application programming interface.
… Alongside GPT-4, OpenAI is making three other AI models’ APIs generally available: GPT-3.5 Turbo, a predecessor to GPT-4 that offers more limited capabilities for a significantly lower cost, DALL-E for image generation, and Whisper for speech transcription. — Read More