Key Trends in Data Lakes

Data lakes have become a key tool for mining competitive insight from large repositories of data.

The term data lake has been with us for many years. Its origin is attributed to James Dixon, who coined the term while writing, “If you think of a data mart as a store of bottled water – cleansed, packaged, and structured for easy consumption – the data lake is a large body of water in a more natural state.”

Many a subsequent writer has questioned whether organizations were creating data lakes with business value or data swamps with limited or no value. Given this, Marco Iansiti and Karim Lakhani have suggested that the data lake, which holds data in its original source form, is part of a data platform with “data flowing from bottom to top…And the data platform aggregates, cleans, refines, and processes data” captured in the data lake.

Given this more refined view, the question is: where is the data lake within its hype cycle? To answer this question, I asked CIOs and industry experts for their opinions. Read More

#data-lake

Meet GPT-3. It Has Learned to Code (and Blog and Argue).

The latest natural-language system generates tweets, pens poetry, summarizes emails, answers trivia questions, translates languages and even writes its own computer programs.

This summer, an artificial intelligence lab in San Francisco called OpenAI unveiled a technology several months in the making. This new system, GPT-3, had spent those months learning the ins and outs of natural language by analyzing thousands of digital books, the length and breadth of Wikipedia, and nearly a trillion words posted to blogs, social media and the rest of the internet.

Mckay Wrigley, a 23-year-old computer programmer from Salt Lake City, was one of the few invited to tinker with the system, which uses everything it has learned from that vast sea of digital text to generate new language on its own. Mr. Wrigley wondered if it could imitate public figures — write like them, perhaps even chat like them. Read More

#nlp

Ethical AI isn’t the same as trustworthy AI, and that matters

Artificial intelligence (AI) solutions are facing increased scrutiny due to their aptitude for amplifying both good and bad decisions and, more specifically, for their propensity to expose and heighten existing societal biases and inequalities. It is only right, then, that discussions of ethics are taking center stage as AI adoption increases.

In lockstep with ethics comes the topic of trust. Ethics are the guiding rules for the decisions we make and actions we take. These rules of conduct reflect our core beliefs about what is right and fair. Trust, on the other hand, reflects our belief that another person — or company — is reliable, has integrity and will behave in the manner we expect. Ethics and trust are discrete, but often mutually reinforcing, concepts.

So is an ethical AI solution inherently trustworthy? Read More

#ethics, #trust

Teachable Machine From Google Makes It Easy To Train And Deploy ML Models

Teachable Machine is an experiment from Google to bring a no-code and low-code approach to training AI models. Anyone with a modern browser and webcam can quickly train a model with no prior knowledge or experience with AI. Read More

#big7, #transfer-learning

Artificial intelligence model detects asymptomatic Covid-19 infections through cellphone-recorded coughs

Results might provide a convenient screening tool for people who may not suspect they are infected.

Asymptomatic people who are infected with Covid-19 exhibit, by definition, no discernible physical symptoms of the disease. They are thus less likely to seek out testing for the virus, and could unknowingly spread the infection to others.

But it seems those who are asymptomatic may not be entirely free of changes wrought by the virus. MIT researchers have now found that people who are asymptomatic may differ from healthy individuals in the way that they cough. These differences are not decipherable to the human ear. But it turns out that they can be picked up by artificial intelligence. Read More

#voice

When A.I. Falls in Love

https://www.youtube.com/watch?v=8qkCJeC-mco
Read More

#nlp, #videos

Remote Works

We’re going through a major transformation in the way we work. The workforce is moving out of the office in never-before-seen numbers. The pace has been dizzying and most of us are still trying to figure out how it’s done. Remote Works is a show to help you understand and embrace this rapid evolution. We’ll dig deep to find out how companies can securely and successfully move to remote work en masse. We’ll look at the mindset that’s required for high-performing distributed teams. You’ll meet people whose work depends on the tightest of connections and most dependable of tech. Experts share their insight on what the lasting changes will be to the places we work and what we need to do to adjust, stay productive and thrive. Read More

#podcasts

TLDR: Extreme Summarization of Scientific Documents

We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study on this task, we introduce SCITLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers. SCITLDR contains both author-written and expert-derived TLDRs, where the latter are collected using a novel annotation protocol that produces high-quality summaries while minimizing annotation burden. We propose CATTS, a simple yet effective learning strategy for generating TLDRs that exploits titles as an auxiliary training signal. CATTS improves upon strong baselines under both automated metrics and human evaluations. Data and code are publicly available at https://github.com/allenai/scitldr. Read More
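The abstract does not spell out how titles act as an auxiliary training signal. One plausible reading, sketched below, is that title generation and TLDR generation are mixed into a single training set and told apart by a control code appended to the source text. The control-code strings, the Paper type, and the build_training_pairs helper are illustrative assumptions and are not taken from the released code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed control-code strings; the real tokens may differ.
TITLE_CODE = "<|TITLE|>"
TLDR_CODE = "<|TLDR|>"


@dataclass
class Paper:
    """Hypothetical container for one paper's text and its summaries."""
    title: str
    source: str        # e.g. the abstract text fed to the model
    tldrs: List[str]   # one or more reference TLDRs (multi-target)


def build_training_pairs(papers: List[Paper]) -> List[Tuple[str, str]]:
    """Return (input, target) pairs for a sequence-to-sequence model.

    Each paper contributes one title-generation pair and one
    TLDR-generation pair per reference TLDR. The control code at the
    end of the input tells the model which kind of output is wanted.
    """
    pairs = []
    for paper in papers:
        pairs.append((f"{paper.source} {TITLE_CODE}", paper.title))
        for tldr in paper.tldrs:
            pairs.append((f"{paper.source} {TLDR_CODE}", tldr))
    return pairs
```

At inference time, under the same assumption, only the TLDR control code would be appended to the input. This sketch draws titles from the same papers for simplicity; a separate, larger corpus of paper titles could serve the same auxiliary role.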

#nlp

Five Most Controversial Moments Of AI In 2020

Artificial Intelligence has to be one of the most impactful technologies that the world has seen in recent years. It is no longer limited to the quaint research and development labs of academies and bigger institutions but has successfully penetrated the normal, day-to-day functioning of society.

Like any other technology, AI also comes with its set of challenges. However, the stakes are slightly higher, considering the impact AI-technology-gone-rogue can have. Below we list some of the most controversial moments of the AI industry in 2020. If nothing else, this may be considered a cautionary alarm moving forward. Read More

#artificial-intelligence

Deep Evidential Regression

Deterministic neural networks (NNs) are increasingly being deployed in safety-critical domains, where calibrated, robust, and efficient measures of uncertainty are crucial. In this paper, we propose a novel method for training non-Bayesian NNs to estimate a continuous target as well as its associated evidence in order to learn both aleatoric and epistemic uncertainty. We accomplish this by placing evidential priors over the original Gaussian likelihood function and training the NN to infer the hyperparameters of the evidential distribution. We additionally impose priors during training such that the model is regularized when its predicted evidence is not aligned with the correct output. Our method does not rely on sampling during inference or on out-of-distribution (OOD) examples for training, thus enabling efficient and scalable uncertainty learning. We demonstrate learning well-calibrated measures of uncertainty on various benchmarks, scaling to complex computer vision tasks, as well as robustness to adversarial and OOD test samples. Read More
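As a rough illustration of how a single forward pass can yield both kinds of uncertainty, the sketch below assumes the evidential distribution is a Normal-Inverse-Gamma with parameters (gamma, nu, alpha, beta), which the abstract does not name explicitly. Under that assumption the prediction and the two uncertainties follow from closed-form moments, with no sampling. The function name evidential_summary is hypothetical.

```python
from typing import Tuple


def evidential_summary(
    gamma: float, nu: float, alpha: float, beta: float
) -> Tuple[float, float, float]:
    """Summarize one Normal-Inverse-Gamma output of a network.

    Assumed parameterization (not stated in the abstract):
      mu      ~ Normal(gamma, sigma^2 / nu)
      sigma^2 ~ Inverse-Gamma(alpha, beta)

    Returns (prediction, aleatoric, epistemic):
      prediction = E[mu]       = gamma
      aleatoric  = E[sigma^2]  = beta / (alpha - 1)
      epistemic  = Var[mu]     = beta / (nu * (alpha - 1))
    """
    # The moments above exist only for nu > 0, alpha > 1, beta > 0.
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("requires nu > 0, alpha > 1, beta > 0")
    prediction = gamma
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return prediction, aleatoric, epistemic


# Example: more evidence (larger nu) shrinks epistemic uncertainty
# while leaving the aleatoric estimate unchanged.
low_evidence = evidential_summary(gamma=2.0, nu=1.0, alpha=3.0, beta=1.0)
high_evidence = evidential_summary(gamma=2.0, nu=4.0, alpha=3.0, beta=1.0)
```

In this reading, the network would emit the four parameters per target, with activations that keep nu, alpha and beta in their valid ranges.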

#trust