Rick's Cafe AI 10:44 am on January 21, 2021
Tags: Adversarial

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning. VILLA consists of two training stages: (i) task-agnostic adversarial pretraining, followed by (ii) task-specific adversarial fine-tuning. Instead of adding adversarial perturbations to image pixels and textual tokens, we propose to perform adversarial training in the embedding space of each modality. To enable large-scale training, we adopt the “free” adversarial training strategy and combine it with KL-divergence-based regularization to promote higher invariance in the embedding space. We apply VILLA to current best-performing V+L models and achieve new state-of-the-art results on a wide range of tasks, including Visual Question Answering, Visual Commonsense Reasoning, Image-Text Retrieval, Referring Expression Comprehension, Visual Entailment, and NLVR.
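The abstract names the ingredients without showing how they fit together. Below is a toy sketch, not VILLA's implementation, of the two core ideas: perturbing an embedding rather than raw pixels or tokens, and "free" adversarial training, where each replay of a minibatch reuses one gradient computation to update both the model parameters (descent) and the perturbation (ascent). Everything here is an assumption for illustration: a linear softmax classifier over a single embedding vector, an L2-bounded perturbation with a normalized ascent step, a one-sided KL term with the clean prediction treated as a constant, and the hypothetical names `free_adversarial_step`, `eps`, `adv_lr`, `replays`, and `alpha`.

```python
import math

# Hypothetical toy setup for illustration only; not VILLA's actual code.


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def logits(W, b, x):
    # Toy linear "model": one logit per class from the (perturbed) embedding x.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def kl(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def project_l2(delta, eps):
    # Project the perturbation back onto the L2 ball of radius eps.
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return delta
    return [d * eps / norm for d in delta]


def free_adversarial_step(W, b, e, y, eps=0.5, adv_lr=0.1, lr=0.1, replays=3, alpha=1.0):
    """Replay one example `replays` times. Each replay computes gradients once
    and uses them to update both the parameters and the embedding perturbation."""
    W = [row[:] for row in W]  # copy so the caller's parameters are not mutated
    b = b[:]
    delta = [0.0] * len(e)
    for _ in range(replays):
        x_adv = [ei + di for ei, di in zip(e, delta)]
        p_clean = softmax(logits(W, b, e))
        p_adv = softmax(logits(W, b, x_adv))
        onehot = [1.0 if c == y else 0.0 for c in range(len(b))]
        # Gradient of cross-entropy w.r.t. logits is p - onehot.
        g_clean = [p - t for p, t in zip(p_clean, onehot)]
        # Adversarial CE + alpha * KL(p_clean || p_adv), with p_clean held fixed,
        # has gradient (p_adv - onehot) + alpha * (p_adv - p_clean) w.r.t. adv logits.
        g_adv = [(pa - t) + alpha * (pa - pc) for pa, t, pc in zip(p_adv, onehot, p_clean)]
        # Gradient w.r.t. delta from the same pass, taken before W changes.
        g_delta = [sum(W[c][j] * g_adv[c] for c in range(len(b))) for j in range(len(e))]
        # Parameter descent: clean term uses e, adversarial term uses x_adv.
        for c in range(len(b)):
            for j in range(len(e)):
                W[c][j] -= lr * (g_clean[c] * e[j] + g_adv[c] * x_adv[j])
            b[c] -= lr * (g_clean[c] + g_adv[c])
        # Normalized gradient ascent on delta, then projection onto the eps-ball.
        norm = math.sqrt(sum(g * g for g in g_delta))
        if norm > 0:
            delta = [d + adv_lr * g / norm for d, g in zip(delta, g_delta)]
            delta = project_l2(delta, eps)
    return W, b, delta


# Usage: two classes, a 2-dimensional embedding, true label 0.
W0 = [[0.0, 0.0], [0.0, 0.0]]
b0 = [0.0, 0.0]
e = [1.0, 0.5]
W1, b1, delta = free_adversarial_step(W0, b0, e, y=0)
p_clean = softmax(logits(W1, b1, e))
p_adv = softmax(logits(W1, b1, [x + d for x, d in zip(e, delta)]))
print("clean loss:", -math.log(p_clean[0]))
print("adversarial loss:", -math.log(p_adv[0]))
print("KL(clean || adv):", kl(p_clean, p_adv))
```

In VILLA the perturbations live in the embedding space of each modality of a large V+L model; here a single vector stands in for that space, and the manual gradients replace what an autodiff framework would compute.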

