With Effort you can adjust smoothly, and in real time, how much computation an LLM performs during inference.
At 50% effort it is as fast as regular matrix multiplications on Apple Silicon chips; at 25% effort it is twice as fast while still retaining most of the quality.
You can also freely choose to skip loading the least important weights.
It is currently implemented for Mistral, and it should work just as well for other models. No retraining is needed, only conversion to a different format and some precomputation. — Read More
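The idea of dialing down "calculations" can be illustrated with a toy version of effort-based matrix multiplication: keep only the largest-magnitude weights and skip the rest. This is a simplified sketch, not the actual Effort implementation (which precomputes weight orderings and runs on Apple Silicon GPUs); the function name and thresholding scheme here are illustrative assumptions.

```python
import numpy as np

def effort_matvec(W, x, effort=0.5):
    """Approximate W @ x using only the largest-magnitude entries of W.

    Toy illustration of effort-style inference: at effort=1.0 this is a
    full matmul; lower effort skips the least important weights.
    (The real Effort project precomputes importance orderings; this
    recomputes a magnitude threshold on the fly for clarity.)
    """
    k = max(1, int(effort * W.size))
    # Threshold chosen so roughly an `effort` fraction of weights remain.
    thresh = np.partition(np.abs(W).ravel(), W.size - k)[W.size - k]
    mask = np.abs(W) >= thresh
    return (W * mask) @ x

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
full = W @ x                              # exact result
approx = effort_matvec(W, x, effort=0.5)  # half the weights participate
```

At `effort=1.0` the mask keeps every weight and the result matches the exact product; lowering `effort` trades accuracy for fewer multiply-adds.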
Daily Archives: April 18, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs. — Read More
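The core mechanism described in the abstract, a compressive memory combined with local attention in one block, can be sketched in a few lines. This is a simplified single-head reading of the paper's equations, with assumptions: the gate between memory and local attention is a learned scalar in the paper but fixed at 0.5 here, and the delta-rule memory variant is omitted.

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, keeps the linear-attention features positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(Q, K, V, M, z):
    """One segment of simplified, single-head Infini-attention.

    M (d_k x d_v) and z (d_k,) form the bounded compressive memory
    carried across segments; local causal softmax attention handles
    the current segment.
    """
    d = Q.shape[-1]
    sq, sk = elu_plus_one(Q), elu_plus_one(K)

    # Retrieve long-term context from the compressive memory.
    A_mem = (sq @ M) / (sq @ z[:, None] + 1e-6)

    # Standard causal softmax attention within the segment.
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(np.tril(np.ones_like(scores)) > 0, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A_local = (weights / weights.sum(axis=-1, keepdims=True)) @ V

    # Update memory and normalizer with this segment's keys and values.
    M = M + sk.T @ V
    z = z + sk.sum(axis=0)

    # The paper gates A_mem against A_local with a learned scalar;
    # fixed at 0.5 here for illustration.
    return 0.5 * A_mem + 0.5 * A_local, M, z
```

Because `M` and `z` have fixed shape regardless of how many segments have been processed, memory stays bounded while context grows, which is what enables the streaming inference the abstract mentions.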
#nlp

How AI adds to human potential
Generative AI is advancing at a breakneck pace, prompting questions on risk and opportunity, from content creation to personal data management. In a special live recording, we delve into the ways AI can augment human work and spur innovation, instead of simply using AI to cut costs or replace jobs. Host Jeff Berman joined a seasoned AI researcher, Intel’s Lama Nachman, and a young start-up founder, Scale AI’s Alexandr Wang, on stage at the Intel Vision event in April 2024. They explore topics like AI’s disruption of creative industries, mitigating its biggest risks (like deep fakes), and why human critical thinking will be even more vital as AI technology spreads. — Read More
UMD-LinkUp AI Maps Transforms AI Job Tracking
UMD-LinkUp, a collaboration between the Robert H. Smith School of Business at the University of Maryland, LinkUp Job Market Data, and Outrigger Group, introduced the world’s first tool for mapping the creation of jobs requiring artificial intelligence skills: UMD-LinkUp AI Maps.
AI Maps leverages LinkUp’s industry-leading job data to visualize the spread of jobs requiring skills in AI across the country – by sector, state and more granular geographic levels. The resulting interactive map allows users to track the creation of U.S.-based AI jobs each month; rank states by their share of those jobs; do a deeper dive across economic sectors, metropolitan areas, and counties; and determine a region’s AI Intensity: the ratio of its AI jobs to all other postings. — Read More
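The AI Intensity ratio described above is straightforward to compute. A minimal sketch, with the caveat that the function name and example figures are hypothetical and the actual AI Maps methodology may differ in detail:

```python
def ai_intensity(ai_jobs, total_postings):
    """AI Intensity as described: a region's AI job postings divided by
    all of its other (non-AI) postings. Illustrative, not the official
    AI Maps formula."""
    return ai_jobs / (total_postings - ai_jobs)

# Hypothetical region: 1,200 AI postings out of 50,000 total postings.
intensity = ai_intensity(1200, 50000)  # 1200 / 48800
```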