
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead, it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-aligned data. This is achieved using a novel sampled bag-of-words cross-lingual objective, which is used to regularize two noise-contrastive language models for efficient cross-lingual feature learning. We show that bilingual embeddings learned using the proposed model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data.
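The abstract does not spell out the cross-lingual objective, so the following is only a minimal sketch under stated assumptions. It assumes the bag-of-words term is the squared Euclidean distance between the mean word embeddings of an aligned sentence pair, and that "sampled" means sentence pairs are drawn at random from the parallel data. The names `mean_vector`, `bow_sgd_step`, and `train` are hypothetical, and the two monolingual noise-contrastive language-model objectives that this term regularizes are omitted entirely.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]


def mean_vector(words: Sequence[str], emb: Embeddings) -> Vector:
    """Bag-of-words sentence vector: the average of the word embeddings."""
    dim = len(emb[words[0]])
    total = [0.0] * dim
    for w in words:
        for k, x in enumerate(emb[w]):
            total[k] += x
    return [t / len(words) for t in total]


def bow_sgd_step(src: Sequence[str], tgt: Sequence[str],
                 emb_src: Embeddings, emb_tgt: Embeddings,
                 lr: float = 0.1) -> float:
    """One SGD step on the (assumed) squared-distance cross-lingual term.

    Returns the loss measured before the update. Embeddings are updated in place.
    """
    a = mean_vector(src, emb_src)
    b = mean_vector(tgt, emb_tgt)
    diff = [x - y for x, y in zip(a, b)]
    loss = sum(d * d for d in diff)
    # d(loss)/d(source word vector) = 2 * diff / m, once per occurrence.
    for w in src:
        v = emb_src[w]
        for k in range(len(v)):
            v[k] -= lr * 2.0 * diff[k] / len(src)
    # d(loss)/d(target word vector) = -2 * diff / n, once per occurrence.
    for w in tgt:
        v = emb_tgt[w]
        for k in range(len(v)):
            v[k] += lr * 2.0 * diff[k] / len(tgt)
    return loss


def train(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]],
          emb_src: Embeddings, emb_tgt: Embeddings,
          steps: int, lr: float = 0.1, seed: int = 0) -> None:
    """Sample a random aligned sentence pair per step and apply the update."""
    rng = random.Random(seed)
    for _ in range(steps):
        src, tgt = rng.choice(pairs)
        bow_sgd_step(src, tgt, emb_src, emb_tgt, lr)
```

Because only sentence-level means are compared, no word alignments are needed: every word in a sentence receives the same gradient share, scaled by the sentence length. In a full model this step would be interleaved with updates from the monolingual objectives on each language.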

arXiv (Cornell University) · Published 2014-10-09

Authors: Gouws, Stephan · Bengio, Yoshua · Corrado, Greg
