Rick's Cafe AI 9:34 am on June 24, 2021
Tags: GANS ( 71 ), Image Recognition

NVIDIA’s Canvas app turns doodles into AI-generated ‘photos’

NVIDIA has launched a new app you can use to paint life-like landscape images — even if you have zero artistic skills and a first grader can draw better than you. The new application is called Canvas, and it can turn childlike doodles and sketches into photorealistic landscape images in real time. It’s now available for download as a free beta, though you can only use it if your machine is equipped with an NVIDIA RTX GPU.

Canvas is powered by the GauGAN AI painting tool, which NVIDIA Research developed and trained using 5 million images.

#gans, #image-recognition

Rick's Cafe AI 9:27 am on June 24, 2021
Tags: Image Recognition

Rembrandt’s The Night Watch painting restored by AI

The missing edges of Rembrandt’s painting The Night Watch have been restored using artificial intelligence.

The canvas, created in 1642, was trimmed in 1715 to fit between two doors at Amsterdam’s city hall.

Since then, 60cm (2ft) from the left, 22cm from the top, 12cm from the bottom and 7cm from the right have been missing.

But computer software has now restored the full painting for the first time in 300 years.

#image-recognition

Rick's Cafe AI 10:03 am on June 17, 2021
Tags: Image Recognition, NLP ( 486 )

Full Page Handwriting Recognition via Image to Sequence Extraction

We present a Neural Network based Handwritten Text Recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Being based on an Image to Sequence architecture, it can extract text present in an image and then sequence it correctly without imposing any constraints regarding orientation, layout and size of text and non-text. Further, it can also be trained to generate auxiliary markup related to formatting, layout and content. We use a character-level vocabulary, thereby supporting the language and terminology of any subject. The model achieves a new state of the art in paragraph-level recognition on the IAM dataset. When evaluated on scans of real-world handwritten free-form test answers (beset with curved and slanted lines, drawings, tables, math, chemistry and other symbols), it performs better than all commercially available HTR cloud APIs. It is deployed in production as part of a commercial web application.

#image-recognition, #nlp

Rick's Cafe AI 9:31 am on June 15, 2021
Tags: GANS ( 71 ), Image Recognition

TextStyleBrush: Transfer of text aesthetics from a single example

We present a novel approach for disentangling the content of a text image from all aspects of its appearance. The appearance representation we derive can then be applied to new content, for one-shot transfer of the source style to new content. We learn this disentanglement in a self-supervised manner. Our method processes entire word boxes, without requiring segmentation of text from background, per-character processing, or making assumptions on string lengths. We show results in different text domains which were previously handled by specialized methods, e.g., scene text, handwritten text. To these ends, we make a number of technical contributions: (1) We disentangle the style and content of a textual image into a non-parametric, fixed-dimensional vector. (2) We propose a novel approach inspired by StyleGAN but conditioned over the example style at different resolution and content. (3) We present novel self-supervised training criteria which preserve both source style and target content using a pre-trained font classifier and text recognizer. Finally, (4) we also introduce Imgur5K, a new challenging dataset for handwritten word images. We offer numerous qualitative photo-realistic results of our method. We further show that our method surpasses previous work in quantitative tests on scene text and handwriting datasets, as well as in a user study.

#image-recognition, #gans

Rick's Cafe AI 12:20 pm on June 12, 2021
Tags: Image Recognition

Voilà

Voilà lets you turn selfies into caricatures, cartoons, and 18th-century paintings. This AI-powered photo editor is one of the top free apps on Google Play and is now also available on the App Store.

#image-recognition

Rick's Cafe AI 10:38 am on June 11, 2021
Tags: Adversarial ( 67 ), Image Recognition

Markpainting: Adversarial Machine Learning meets Inpainting

Inpainting is a learned interpolation technique that is based on generative modeling and used to populate masked or missing pieces in an image; it has wide applications in picture editing and retouching. Recently, inpainting started being used for watermark removal, raising concerns. In this paper we study how to manipulate it using our markpainting technique. First, we show how an image owner with access to an inpainting model can augment their image in such a way that any attempt to edit it using that model will add arbitrary visible information. We find that we can target multiple different models simultaneously with our technique. This can be designed to reconstitute a watermark if the editor had been trying to remove it. Second, we show that our markpainting technique is transferable to models that have different architectures or were trained on different datasets, so watermarks created using it are difficult for adversaries to remove. Markpainting is novel and can be used as a manipulation alarm that becomes visible in the event of inpainting.

#adversarial, #image-recognition

Rick's Cafe AI 9:35 am on June 11, 2021
Tags: Image Recognition

BARF: Bundle-Adjusting Neural Radiance Fields

Neural Radiance Fields (NeRF) [30] have recently gained a surge of interest within the computer vision community for their power to synthesize photorealistic novel views of real-world scenes. One limitation of NeRF, however, is its requirement of accurate camera poses to learn the scene representations. In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses, that is, the joint problem of learning neural 3D representations and registering camera frames. We establish a theoretical connection to classical image alignment and show that coarse-to-fine registration is also applicable to NeRF. Furthermore, we show that naïvely applying positional encoding in NeRF has a negative impact on registration with a synthesis-based objective. Experiments on synthetic and real-world data show that BARF can effectively optimize the neural scene representations and resolve large camera pose misalignment at the same time. This enables view synthesis and localization of video sequences from unknown camera poses, opening up new avenues for visual localization systems (e.g. SLAM) and potential applications for dense 3D mapping and reconstruction.
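
One way to picture coarse-to-fine registration with positional encoding is to switch the frequency bands on gradually, low frequencies first, as training progresses. The sketch below is only an illustration under that assumption: the smooth cosine window, the function names `band_weight` and `coarse_to_fine_encoding`, and the progress parameter `alpha` are hypothetical and are not taken from the abstract.

```python
import math


def band_weight(alpha, k):
    """Weight in [0, 1] for frequency band k at training progress alpha.

    Assumed schedule: band k is off while alpha < k, ramps up smoothly
    while k <= alpha < k + 1, and is fully on afterwards.
    """
    t = alpha - k
    if t < 0:
        return 0.0
    if t < 1:
        return (1.0 - math.cos(t * math.pi)) / 2.0
    return 1.0


def coarse_to_fine_encoding(x, num_bands, alpha):
    """Positional encoding of scalar x with per-band weights.

    Returns [x, w_0*cos(pi*x), w_0*sin(pi*x), w_1*cos(2*pi*x), ...].
    With alpha = 0 only the raw coordinate survives; with
    alpha >= num_bands this is the ordinary full encoding.
    """
    features = [x]
    for k in range(num_bands):
        w = band_weight(alpha, k)
        freq = (2 ** k) * math.pi
        features.append(w * math.cos(freq * x))
        features.append(w * math.sin(freq * x))
    return features
```

Raising `alpha` from 0 to `num_bands` over the course of optimization would let pose estimates settle on smooth, low-frequency signal before the high-frequency bands start to contribute.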

#image-recognition

Rick's Cafe AI 10:57 am on June 3, 2021
Tags: Image Recognition, NLP ( 486 )

Learning Transferable Visual Models From Natural Language Supervision

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset-specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on.
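
The caption-matching task described above can be sketched as a symmetric cross-entropy over an image-text similarity matrix, with zero-shot classification falling out as a nearest-caption lookup. This is a minimal pure-Python sketch, not the authors' implementation: the function names, the use of cosine similarity, and the temperature value of 0.07 are illustrative assumptions, and the embeddings are assumed to come from encoders that are not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy; image i and caption i form the positive pair."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity of image i and caption j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Image -> caption direction: softmax over each row.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Caption -> image direction: softmax over each column.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2.0


def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class caption most similar to the image."""
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in class_text_embs
    ]
    return max(range(len(sims)), key=sims.__getitem__)
```

In use, `class_text_embs` would hold one text embedding per candidate label (for example, captions built from class names), so no labeled training images are needed for the downstream task.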

#image-recognition, #nlp

Rick's Cafe AI 10:52 am on June 3, 2021
Tags: Image Recognition

Towards General Purpose Vision Systems

A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question answering, captioning, and more. We evaluate the system’s ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting.

#image-recognition

Rick's Cafe AI 11:04 am on May 27, 2021
Tags: Image Recognition

Projected Distribution Loss for Image Enhancement

Features obtained from object recognition CNNs have been widely used for measuring perceptual similarities between images. Such differentiable metrics can be used as perceptual learning losses to train image enhancement models. However, the choice of the distance function between input and target features may have a consequential impact on the performance of the trained model. While using the norm of the difference between extracted features leads to limited hallucination of details, measuring the distance between distributions of features may generate more textures; yet also more unrealistic details and artifacts. In this paper, we demonstrate that aggregating 1D-Wasserstein distances between CNN activations is more reliable than the existing approaches, and it can significantly improve the perceptual performance of enhancement models. More explicitly, we show that in imaging applications such as denoising, super-resolution, demosaicing, deblurring and JPEG artifact removal, the proposed learning loss outperforms the current state-of-the-art on reference-based perceptual losses. This means that the proposed learning loss can be plugged into different imaging frameworks and produce perceptually realistic results.
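
For two equally sized sets of samples, the 1D Wasserstein-1 distance reduces to the mean absolute difference between the sorted values, which makes "aggregating 1D-Wasserstein distances between CNN activations" easy to sketch. The snippet below is a minimal pure-Python illustration under stated assumptions: each channel is given as a flat list of activations, the aggregation is a plain average over channels, and the function names are hypothetical. The paper's exact projection and aggregation choices may differ.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D samples.

    Sorting both samples and pairing them in order gives the
    optimal transport plan in one dimension.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def projected_distribution_loss(feats_a, feats_b):
    """Average per-channel 1D Wasserstein distance.

    feats_a, feats_b: lists of channels; each channel is a flat list of
    activations (spatial positions flattened). Averaging over channels
    is an assumption of this sketch.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("feature maps must have the same channel count")
    per_channel = [wasserstein_1d(ca, cb) for ca, cb in zip(feats_a, feats_b)]
    return sum(per_channel) / len(per_channel)
```

Because the distance compares sorted values, it ignores where in the image an activation occurs and only compares the distributions, which is what separates it from a norm of the feature difference.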

#image-recognition

Rick's Cafe AI

The latest in Artificial Intelligence carefully curated into its own special blend

Tag Archives: Image Recognition
