Summarizing text is a task at which machine learning algorithms are improving, as evidenced by a recent paper published by Microsoft. That’s good news — automatic summarization systems promise to cut down on the amount of message-reading enterprise workers do, which one survey estimates amounts to 2.6 hours each day.
Not to be outdone, a Google Brain and Imperial College London team built a system called Pegasus (Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence) that combines Google’s Transformer architecture with pretraining objectives tailored for abstractive text generation. They say it achieves state-of-the-art results in 12 summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills, and that it shows “surprising” performance on low-resource summarization, surpassing previous top results on six data sets with only 1,000 examples.
Daily Archives: December 30, 2019
Relative contributions of Shakespeare and Fletcher in Henry VIII: An Analysis Based on Most Frequent Words and Most Frequent Rhythmic Patterns
The versified play Henry VIII is nowadays widely recognized to be a collaborative work not written solely by William Shakespeare. We employ combined analysis of vocabulary and versification together with machine learning techniques to determine which authors also took part in the writing of the play and what their relative contributions were. Unlike most previous studies, we go beyond the attribution of particular scenes and use the rolling attribution approach to determine the probabilities of authorship of pieces of text, without respecting the scene boundaries. Our results strongly support the canonical division of the play between William Shakespeare and John Fletcher proposed by James Spedding, but also bring new evidence supporting the modifications proposed later by Thomas Merriam.
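To make the idea of rolling attribution concrete, here is a minimal Python sketch. It slides a fixed-size window over a token stream, ignoring scene boundaries, and gives each window a normalized score per candidate author. Everything in it is an assumption made for illustration: the function names, the window size and step, and the toy word profiles are hypothetical, and the scoring is a simple add-one-smoothed unigram likelihood swapped in for the paper's actual classifier and its vocabulary and rhythm features.

```python
import math
from collections import Counter


def rolling_windows(tokens, size, step):
    """Yield (start_index, window) pairs; windows ignore scene boundaries."""
    for start in range(0, max(len(tokens) - size, 0) + 1, step):
        yield start, tokens[start:start + size]


def author_log_likelihood(window, profile, vocab_size):
    """Log-likelihood of a window under an add-one-smoothed unigram profile."""
    total = sum(profile.values())
    return sum(
        math.log((profile.get(tok, 0) + 1) / (total + vocab_size))
        for tok in window
    )


def rolling_attribution(tokens, profiles, size=5, step=1):
    """Return a list of (start_index, {author: normalized score}) per window."""
    vocab = set(tokens)
    for profile in profiles.values():
        vocab.update(profile)
    results = []
    for start, window in rolling_windows(tokens, size, step):
        logs = {
            author: author_log_likelihood(window, profile, len(vocab))
            for author, profile in profiles.items()
        }
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(logs.values())
        weights = {author: math.exp(v - top) for author, v in logs.items()}
        norm = sum(weights.values())
        results.append((start, {a: w / norm for a, w in weights.items()}))
    return results


# Toy, made-up word profiles; NOT real Shakespeare or Fletcher statistics.
profiles = {
    "author_a": Counter("the and of the and the".split()),
    "author_b": Counter("ye em ye them ye em".split()),
}
tokens = "the and of the and ye em ye them ye".split()
scores = rolling_attribution(tokens, profiles, size=5, step=5)
```

With a step smaller than the window size, consecutive windows overlap, so the per-author scores change gradually along the text and a change of hand can show up in the middle of a scene rather than only at its edges.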