Language models like GPT-4 and Claude are powerful and useful, but the data on which they are trained is a closely guarded secret. The Allen Institute for AI (AI2) aims to reverse this trend with a new, huge text dataset that’s free to use and open to inspection.
Dolma, as the dataset is called, is intended to be the basis for the research group’s planned open language model, or OLMo (Dolma is short for “Data to feed OLMo’s Appetite”). As the model is intended to be free to use and modify by the AI research community, so too (argue AI2 researchers) should be the dataset they use to create it. — Read More
Daily Archives: August 21, 2023
Exactly the Wrong AI Copyrightability Case
Creativity Machine guy assumed away the debate and lost
Friday’s trial-court decision in Thaler v. Perlmutter, case 22-1564 in the DC district court, epitomizes the sad fact that just the wrong situation can make bad headlines easy, well before the real work in a legal debate.
I’m sure there will be links like “Court Rules AI Art Can’t Be Copyrighted” aplenty. They will be wrong. The court didn’t rule that AI art can’t be copyrighted. It ruled that copyright requires human authorship, surprising approximately zero copyright lawyers…or people who have read the Wikipedia page. — Read More
Shah Rukh Khan endorsing local businesses – with AI advertising
Does ChatGPT have a liberal bias?
A new paper making this claim has many flaws. But the question merits research.
Previous research has shown that many pre-ChatGPT language models express left-leaning opinions when asked about partisan topics. But OpenAI said in February that the workers who fine-tune ChatGPT train it to refuse to express opinions when asked controversial political questions. So it was interesting to see a new paper claim that ChatGPT expresses liberal opinions, agreeing with Democrats the vast majority of the time. — Read More