Google AI Introduces ‘WIT’, A Wikipedia-Based Image Text Dataset For Multimodal Multilingual Machine Learning

Image and text datasets are widely used in many machine learning applications. To model the relationship between images and text, most multimodal visio-linguistic models today rely on large datasets. Historically, these datasets were created either by manually captioning images or by crawling the web and extracting the alt-text as the caption. While the former method produces higher-quality data, the intensive manual annotation process limits the amount of data produced. The automated extraction method can yield larger datasets. However, it requires either heuristics and careful filtering to ensure data quality, or scaled-up models to achieve robust performance.

To overcome these limitations, a Google research team created a large, high-quality, multilingual dataset called the Wikipedia-Based Image Text (WIT) Dataset. It is created by extracting multiple text selections associated with an image from Wikipedia articles and Wikimedia image links. Read More

#big7, #image-recognition

Greece used AI to curb COVID: what other nations can learn

Governments are hungry to deploy big data in health emergencies. Scientists must help to lay the legal, ethical and logistical groundwork.

…Between August and November 2020 — with input from Drakopoulos and his colleagues — Greece launched a system that uses a machine-learning algorithm to determine which travellers entering the country should be tested for COVID-19. …The machine-learning system, which is among the first of its kind, is called Eva and is described in Nature this week (H. Bastani et al. Nature https://doi.org/10.1038/s41586-021-04014-z; 2021). It’s an example of how data analysis can contribute to effective COVID-19 policies. But it also presents challenges, from ensuring that individuals’ privacy is protected to the need to independently verify its accuracy. Moreover, Eva is a reminder of why proposals for a pandemic treaty (see Nature 594, 8; 2021) must consider rules and protocols on the proper use of AI and big data. These need to be drawn up in advance so that such analyses can be used quickly and safely in an emergency. Read More

#surveillance