Google expects no change in its relationship with AI chip supplier Broadcom

Alphabet’s (GOOGL.O) Google said on Thursday it does not see any change in its relationship with Broadcom (AVGO.O), following a media report that the tech giant had considered dropping the chipmaker as a supplier of artificial intelligence chips as early as 2027. — Read More

#big7, #nvidia

What Can AI Decode From Human Brain Activity?

Research exploring the capabilities of artificial intelligence (AI) to interpret and translate brain activity has been popping up more and more lately.

By using neuroimaging data and AI models, recent studies have explored AI’s ability to decode brain activity and reconstruct the images seen by individuals, the sounds heard, or even the stories imagined, by generating comparable images, streams of text, and even tunes.  — Read More

#human

OpenAI releases third version of DALL-E

OpenAI announced the third version of its generative AI visual art platform DALL-E, which now lets users create prompts with ChatGPT and includes more safety options.

DALL-E converts text prompts to images. But even DALL-E 2 got things wrong, often ignoring specific wording. The latest version, OpenAI researchers said, understands context much better.

A new feature of DALL-E 3 is integration with ChatGPT. Rather than coming up with a detailed prompt on their own, users can ask ChatGPT to write one, and the chatbot will produce a paragraph (DALL-E works better with longer sentences) for DALL-E 3 to follow. Users who have specific ideas for DALL-E can still write their own prompts. — Read More
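For readers who want to try this from code, here is a minimal sketch using the OpenAI Python SDK. The model identifier "dall-e-3" and the parameters shown are assumptions based on the announcement, not confirmed API details.

```python
# Minimal sketch: generating an image with DALL-E 3 via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment and that the model is
# exposed under the (assumed) identifier "dall-e-3".
from openai import OpenAI

client = OpenAI()

# A longer, descriptive prompt -- the kind ChatGPT would draft on your behalf.
prompt = (
    "A watercolor painting of a lighthouse on a rocky coast at dusk, "
    "warm light spilling from the lantern room onto the waves below."
)

response = client.images.generate(
    model="dall-e-3",   # assumed model name
    prompt=prompt,
    size="1024x1024",
    n=1,
)

print(response.data[0].url)  # URL of the generated image
```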

#image-recognition

Contrastive Decoding Improves Reasoning in Large Language Models

We demonstrate that Contrastive Decoding — a simple, computationally light, and training-free text generation method proposed by Li et al., 2022 — achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between strong and weak models. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler modes such as copying sections of the input during chain-of-thought. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models. — Read More
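The mechanics are easy to sketch: at each step, keep only the tokens the strong ("expert") model finds plausible, then rank them by the weighted gap between expert and weak ("amateur") log-probabilities. Below is a minimal, illustrative implementation with Hugging Face transformers; the model pairing and the alpha/beta values are placeholders, not the paper's exact setup.

```python
# Illustrative contrastive-decoding step (after Li et al., 2022); not the authors' code.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")        # shared vocabulary
expert = AutoModelForCausalLM.from_pretrained("gpt2-xl")    # strong model (placeholder)
amateur = AutoModelForCausalLM.from_pretrained("gpt2")      # weak model (placeholder)

def contrastive_step(input_ids, alpha=0.1, beta=0.5):
    with torch.no_grad():
        log_p_exp = F.log_softmax(expert(input_ids).logits[:, -1, :], dim=-1)
        log_p_ama = F.log_softmax(amateur(input_ids).logits[:, -1, :], dim=-1)
    # Plausibility constraint: keep tokens within a factor alpha of the expert's best.
    cutoff = torch.log(torch.tensor(alpha)) + log_p_exp.max(dim=-1, keepdim=True).values
    # Weighted difference in likelihood between the strong and weak models.
    score = (1 + beta) * log_p_exp - beta * log_p_ama
    score = score.masked_fill(log_p_exp < cutoff, float("-inf"))
    return score.argmax(dim=-1, keepdim=True)  # greedy pick of the contrastive score

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
print(tokenizer.decode(contrastive_step(ids)[0]))
```

Intuitively, tokens the weak model also rates highly (generic continuations, verbatim copying) are down-weighted, while tokens only the strong model favors are promoted; the plausibility cutoff keeps the difference from surfacing tokens the expert itself considers unlikely.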

#nlp

ChatGPT diagnoses ER patients ‘like human doctor’

Artificial intelligence chatbot ChatGPT diagnosed patients rushed to emergency at least as well as doctors and in some cases outperformed them, Dutch researchers have found, saying AI could “revolutionize the medical field.”

But the report published on Sept. 13 also stressed ER doctors needn’t hang up their scrubs just yet, with the chatbot potentially able to speed up diagnosis but not replace human medical judgement and experience. — Read More

#chatbots

GPT 3.5 vs Llama 2 fine-tuning: A Comprehensive Comparison

In this post, I document my experiments benchmarking the fine-tuning of GPT 3.5 against Llama 2 in an SQL task and a functional representation task. Overall:

  • GPT 3.5 performs marginally better than a LoRA fine-tuned CodeLlama 34B (the model I found to work best) on both datasets
  • GPT 3.5 costs 4-6x more to train (and even more to deploy)

Code and data for the SQL task are here. Code and data for the functional representation task are here. — Read More
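For context on what "LoRA fine-tuned" means above, here is a minimal sketch of attaching low-rank adapters with Hugging Face peft; the rank, target modules, and hyperparameters are illustrative guesses, not the author's exact configuration.

```python
# Sketch: LoRA adapters on CodeLlama 34B with peft (illustrative config, not the post's).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-34b-hf")

lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters train; the base model stays frozen
```

Because only the small adapter matrices are updated, fine-tuning a 34B model this way is far cheaper than full fine-tuning, which helps explain the cost gap reported above.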

#training

The Technology Facebook and Google Didn’t Dare Release

One afternoon in early 2017, at Facebook’s headquarters in Menlo Park, Calif., an engineer named Tommer Leyvand sat in a conference room with a smartphone standing on the brim of his baseball cap. Rubber bands helped anchor it in place with the camera facing out. The absurd hat-phone, a particularly uncool version of the future, contained a secret tool known only to a small group of employees. What it could do was remarkable.

… Mr. Leyvand turned toward a man across the table from him. The smartphone’s camera lens — round, black, unblinking — hovered above Mr. Leyvand’s forehead like a Cyclops eye as it took in the face before it. Two seconds later, a robotic female voice declared, “Zach Howard.”

“That’s me,” confirmed Mr. Howard, a mechanical engineer. — Read More

#surveillance

Why Open Source AI Will Win

There’s a popular theory floating around the Internet that a combination of the existing foundation model companies will be the end game for AI.

In the near future, every company will rent a “brain” from a model provider, such as OpenAI or Anthropic, and build applications on top of its cognitive capabilities.

In other words, AI is shaping up to be an oligopoly of sorts, with only a small set of serious large language model (LLM) providers.

I think this couldn’t be farther from the truth. I truly believe that open source will have more of an impact on the future of LLMs and image models than the broad public expects. — Read More

#strategy

Blockchain and Web3

— Read More

#blockchain, #metaverse, #videos

15 Times Faster than Llama 2: Introducing DeciLM – NAS-Generated LLM with Variable GQA

As the deep learning community continues to push the boundaries of Large Language Models (LLMs), the computational demands of these models have surged exponentially for both training and inference. This escalation has not only led to increased costs and energy consumption but also introduced barriers to their deployment and scalability. Achieving a balance between model performance, computational efficiency, and latency has thus become a focal point in recent LLM development.

Within this landscape, we are thrilled to introduce DeciLM 6B, a permissively licensed foundation LLM, and DeciLM 6B-Instruct, fine-tuned from DeciLM 6B for instruction-following use cases. With 5.7 billion parameters, DeciLM 6B delivers a throughput that’s 15 times higher than Llama 2 7B while maintaining comparable quality. Impressively, despite having significantly fewer parameters, DeciLM 6B and DeciLM 6B-Instruct consistently rank among the top-performing LLMs in the 7 billion parameter category across various LLM evaluation tasks. Our models thus establish a new benchmark for inference efficiency and speed. The hallmark of DeciLM 6B lies in its unique architecture, generated using AutoNAC, Deci’s cutting-edge Neural Architecture Search engine, to push the efficient frontier. Moreover, coupling DeciLM 6B with Deci’s inference SDK results in a substantial throughput enhancement. — Read More
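Grouped-query attention (GQA) lets several query heads share one key/value head, shrinking the KV cache and speeding up inference; DeciLM's variation is that the number of KV heads can differ from layer to layer. Here is a minimal PyTorch sketch of the general mechanism, not Deci's AutoNAC-generated architecture; all dimensions are illustrative.

```python
# Sketch of grouped-query attention with a per-layer kv-head count (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQAttention(nn.Module):
    def __init__(self, dim=4096, num_heads=32, num_kv_heads=4):
        super().__init__()
        assert num_heads % num_kv_heads == 0
        self.num_heads, self.num_kv_heads = num_heads, num_kv_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Each kv head serves a group of query heads: replicate kv to match.
        group = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

# "Variable GQA": different layers can use different kv-head counts.
layers = [GQAttention(num_kv_heads=n) for n in (4, 8, 16)]
```

Fewer KV heads mean a smaller cache and higher throughput; per the post, AutoNAC's role is to choose these per-layer group sizes so speed rises without sacrificing quality.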

#nlp