Amazon’s New AI Models ‘NOVA’ Stun the Entire Industry!

Read More

#big7

Pixtral 12B

We introduce Pixtral-12B, a 12-billion-parameter multimodal language model. Pixtral-12B is trained to understand both natural images and documents, achieving leading performance on various multimodal benchmarks, surpassing a number of larger models. Unlike many open-source models, Pixtral is also a cutting-edge text model for its size, and does not compromise on natural language performance to excel in multimodal tasks. Pixtral uses a new vision encoder trained from scratch, which allows it to ingest images at their natural resolution and aspect ratio. This gives users flexibility in the number of tokens used to process an image. Pixtral is also able to process any number of images in its long context window of 128K tokens. Pixtral-12B substantially outperforms other open models of similar size (Llama-3.2 11B and Qwen2-VL 7B). It also outperforms much larger open models like Llama-3.2 90B while being 7x smaller. We further contribute an open-source benchmark, MM-MT-Bench, for evaluating vision-language models in practical scenarios, and provide detailed analysis and code for standardized evaluation protocols for multimodal LLMs. Pixtral-12B is released under the Apache 2.0 license. — Read More

Webpage: https://mistral.ai/news/pixtral-12b/
Inference code: https://github.com/mistralai/mistral-inference/
Evaluation code: https://github.com/mistralai/mistral-evals/
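
For readers who want to try it, here is a minimal sketch of asking Pixtral to describe an image via Mistral’s hosted chat-completions API. The model id (`pixtral-12b-2409`), the payload shape, and the image URL are assumptions based on Mistral’s public API docs, not something taken from the paper itself:

```python
# Minimal sketch: querying Pixtral with an image over Mistral's hosted
# chat-completions API. Model id and payload shape are assumptions drawn
# from Mistral's public docs; the image URL is a placeholder.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"

payload = {
    "model": "pixtral-12b-2409",  # assumed hosted model id
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart in two sentences."},
                {"type": "image_url", "image_url": "https://example.com/chart.png"},
            ],
        }
    ],
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because Pixtral ingests images at their native resolution, token cost scales with image size; self-hosting via the inference repo above is the alternative to the hosted API.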

#image-recognition

AI and the 2024 Elections

It’s been the biggest year for elections in human history: 2024 is a “super-cycle” year in which 3.7 billion eligible voters in 72 countries had the chance to go to the polls. These are also the first AI elections, in which many feared that deepfakes and AI-generated misinformation would overwhelm democratic processes. As 2024 draws to a close, it’s instructive to take stock of how democracy did.

In a Pew survey of Americans from earlier this fall, nearly eight times as many respondents expected AI to be used for mostly bad purposes in the 2024 election as those who thought it would be used mostly for good. There are real concerns and risks in using AI in electoral politics, but it definitely has not been all bad.

The dreaded “death of truth” has not materialized—at least, not due to AI. And candidates are eagerly adopting AI in many places where it can be constructive, if used responsibly. But because this all happens inside a campaign, and largely in secret, the public often doesn’t see all the details. — Read More

#strategy

Friend or Faux?

Millions of people are turning to AI for companionship. They are finding the experience surprisingly meaningful, unexpectedly heartbreaking, and profoundly confusing, leaving them to wonder, ‘Is this real? And does that matter?’

… The world is rapidly becoming populated with human-seeming machines. They use human language, even speaking in human voices. They have names and distinct personalities. There are assistants like Anthropic’s Claude, which has gone through “character training” to become more “open-minded and thoughtful,” and Microsoft’s Copilot, which has more of a “hype man” persona and is always there to provide “emotional support.” Together, they represent a new sort of relationship with technology: less instrumental, more interpersonal.

Few people have grappled as explicitly with the unique benefits, dangers, and confusions of these relationships as the customers of “AI companion” companies. These companies have raced ahead of the tech giants in embracing the technology’s full anthropomorphic potential, giving their AI agents human faces, simulated emotions, and customizable backstories. The more human AI seems, the founders argue, the better it will be at meeting our most important human needs, like supporting our mental health and alleviating our loneliness. Many of these companies are new and run by just a few people, but already, they collectively claim tens of millions of users. — Read More

#strategy

Tencent Hunyuan-Video: Best text-to-video generation model

Since OpenAI announced Sora, Chinese tech companies have accelerated rapidly, releasing many text-to-video models, notably CogVideoX, MiniMax, and Kling.

The latest release in the text-to-video space is Tencent’s Hunyuan-Video, which is not only open-source but has also claimed the top rank among text-to-video models, beating Gen3 and Luma.

The model’s output looks remarkably polished, and it can even generate audio for its videos (no more voiceless video generation). — Read More
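
As a rough sketch of how one might run the open-sourced model locally, here is the text-to-video path through Hugging Face’s diffusers integration (video frames only; the audio capability mentioned above is not covered). The checkpoint id, dtypes, and generation parameters are assumptions based on the diffusers docs, and the model needs a very large GPU:

```python
# Hedged sketch: generating a short clip with HunyuanVideo via the
# diffusers integration (v0.32+). Checkpoint id and settings are
# assumptions from the library docs; expect very high VRAM needs.
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"  # assumed converted checkpoint

# Load the transformer separately so it can run in bfloat16.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()  # decode video latents in tiles to save memory
pipe.to("cuda")

frames = pipe(
    prompt="A red panda climbing a snowy pine tree at dawn",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "panda.mp4", fps=15)
```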

#china-ai, #vfx

The Gen AI Bridge to the Future

In 1945 the U.S. government built ENIAC, an acronym for Electronic Numerical Integrator and Computer, to do ballistics trajectory calculations for the military; World War 2 was nearing its conclusion, however, so ENIAC’s first major job was to do calculations that undergirded the development of the hydrogen bomb. Six years later, J. Presper Eckert and John Mauchly, who led the development of ENIAC, launched UNIVAC, the Universal Automatic Computer, for broader government and commercial applications. Early use cases included calculating the U.S. census and assisting with calculation-intensive back office operations like payroll and bookkeeping.

These were hardly computers as we know them today, but rather calculation machines that took in reams of data (via punch cards or magnetic tape) and returned results according to hardwired calculation routines; the “operating system” was the humans actually inputting the data, scheduling jobs, and giving explicit hardware instructions. Originally this instruction also happened via punch cards and magnetic tape, but later models added consoles to both provide status and allow for register-level control; these consoles evolved into terminals, but the first versions of these terminals, like the one available for the original IBM System/360, were used to initiate batch programs.

Any recounting of computing history usually focuses on the bottom two levels of that stack — the device and the input method — because they tend to evolve in parallel. … What stands out to me, however, is the top level of the initial stack: the application layer of one paradigm provides the bridge to the next one. This, more than anything, is why generative AI is a big deal in terms of realizing the future. — Read More

#strategy

What the departing White House chief tech advisor has to say on AI

President Biden’s administration will end within two months, and likely to depart with him is Arati Prabhakar, the top mind for science and technology in his cabinet. She has served as Director of the White House Office of Science and Technology Policy since 2022 and was the first to demonstrate ChatGPT to the president in the Oval Office. Prabhakar was instrumental in shaping the president’s 2023 executive order on AI, which sets guidelines for tech companies to make AI safer and more transparent (though it relies on voluntary participation).

The incoming Trump administration has not presented a clear thesis of how it will handle AI, but plenty of people in it will want to see that executive order nullified. Trump said as much in July, endorsing the 2024 Republican Party Platform that says the executive order “hinders AI innovation and imposes Radical Leftwing ideas on the development of this technology.” Venture capitalist Marc Andreessen has said he would support such a move.

However, complicating that narrative will be Elon Musk, who for years has expressed fears about doomsday AI scenarios, and has been supportive of some regulations aiming to promote AI safety.

As she prepares for the end of the administration, I sat down with Prabhakar and asked her to reflect on President Biden’s AI accomplishments, and how AI risks, immigration policies, the CHIPS Act and more could change under Trump. — Read More

#strategy

This “Lollipop” Brings Taste to Virtual Reality

Virtual- and augmented-reality setups already modify the way users see and hear the world around them. Add in haptic feedback for a sense of touch and a VR version of Smell-O-Vision, and only one major sense remains: taste.

To fill the gap, researchers at the City University of Hong Kong have developed a new interface to simulate taste in virtual and other extended reality (XR). The group previously worked on other systems for wearable interfaces, such as haptic and olfactory feedback. To create a more “immersive VR experience,” they turned to adding taste sensations, says Yiming Liu, a coauthor of the group’s research paper published today in the Proceedings of the National Academy of Sciences. — Read More

#human

The AI War Was Never Just About AI

For almost two years now, the world’s biggest tech companies have been at war over generative AI. Meta may be known for social media, Google for search, and Amazon for online shopping, but since the release of ChatGPT, each has made tremendous investments in an attempt to dominate in this new era. Along with start-ups such as OpenAI, Anthropic, and Perplexity, their spending on data centers and chatbots is on track to eclipse the costs of sending the first astronauts to the moon.

To be successful, these companies will have to do more than build the most “intelligent” software: They will need people to use, and return to, their products. Everyone wants to be Facebook, and nobody wants to be Friendster. To that end, the best strategy in tech hasn’t changed: build an ecosystem that users can’t help but live in. Billions of people use Google Search every day, so Google built a generative-AI product known as “AI Overviews” right into the results page, granting it an immediate advantage over competitors. — Read More

#big7

Suicide Bot: New AI Attack Causes LLM to Provide Potential “Self-Harm” Instructions

In this blog, we release two attacks against LLM systems, one of which successfully demonstrates how a widely used LLM can potentially instruct a girl on matters of “self-harm”. We also argue that these attacks should be recognized as a new class of attacks, named Flowbreaking, affecting the AI/ML system architecture of LLM applications and agents. They are conceptually similar to race-condition vulnerabilities in traditional software security.

By attacking the application architecture components surrounding the model, and specifically the guardrails, we manipulate or disrupt the system’s logical chain: taking these components out of sync with the intended data flow, exploiting them directly, or manipulating the interactions between them within the application’s implementation. — Read More
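
To make the race-condition analogy concrete, here is a toy illustration (ours, not the authors’ code): a guardrail that runs concurrently with the token stream can only retract a response after part of it has already reached the user. All names and timings are hypothetical:

```python
# Toy illustration of the Flowbreaking idea (hypothetical, not the
# authors' code): the guardrail runs concurrently with the token stream,
# so tokens that stream out before the check finishes are already seen.
import asyncio

async def stream_tokens(tokens: list[str], shown: list[str]) -> None:
    """Simulate the model streaming tokens straight to the user."""
    for tok in tokens:
        shown.append(tok)
        print(tok, end=" ", flush=True)
        await asyncio.sleep(0.01)  # streaming cadence

async def guardrail(shown: list[str]) -> None:
    """Simulate a moderation component that lags behind the stream."""
    await asyncio.sleep(0.05)  # classification takes time
    if "UNSAFE" in shown:
        shown.clear()
        print("\n[guardrail] response retracted -- but the user already saw it")

async def main() -> None:
    shown: list[str] = []
    # The stream and the guardrail run out of sync: a race on shared output.
    await asyncio.gather(
        stream_tokens(["fine", "fine", "UNSAFE", "fine", "fine"], shown),
        guardrail(shown),
    )

asyncio.run(main())
```

The point of the toy is the ordering, not the moderation logic: because the two components are not synchronized, the retraction is a cleanup step rather than a gate, which is exactly the out-of-sync data flow the post describes.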

#cyber