Sharing Multilingual Word Embeddings between High and Low Resource Languages

Authors: Assist. Prof. Onur Guzey, Gihad Sohsah

Data: TR-EN-WordVectors

Find the code on GitHub

Abstract:

In this project, a multilingual representation space is used to enable Turkish natural language processing tools to utilize foreign-language resources. In applications such as document classification, such a space has been shown to support resource sharing among the languages it covers. The project can therefore increase the amount of data available to Turkish language tools. It focuses on Turkish and other morphologically complex languages and aims to improve on the state-of-the-art results for these languages. At the end of the project, the multilingual embedding space and supporting software, which can be used for applications such as document classification, were shared with other researchers.
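
As a rough illustration of how a shared embedding space might let a Turkish tool reuse English training data, the sketch below trains a nearest-centroid document classifier on English documents and applies it to Turkish ones. It is a minimal sketch under stated assumptions, not the project's implementation: the 2-D vectors, the vocabulary, the labels, and the helper names (`doc_vector`, `train_centroids`, `classify`) are all hypothetical and are not taken from TR-EN-WordVectors or the project's code.

```python
# Toy, hand-made 2-D vectors standing in for a shared Turkish-English
# embedding space. All values are hypothetical, chosen only so that
# translation pairs lie close together.
EMBEDDINGS = {
    # English
    "football": (0.90, 0.10), "match": (0.80, 0.20),
    "bank": (0.10, 0.90), "loan": (0.20, 0.80),
    # Turkish
    "futbol": (0.88, 0.12), "maç": (0.82, 0.18),
    "banka": (0.12, 0.90), "kredi": (0.18, 0.82),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def doc_vector(tokens):
    """Represent a document as the average of its known word vectors."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        raise ValueError("no token of the document is in the embedding space")
    return mean_vector(known)


def train_centroids(labeled_docs):
    """Compute one mean document vector (centroid) per label."""
    by_label = {}
    for tokens, label in labeled_docs:
        by_label.setdefault(label, []).append(doc_vector(tokens))
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}


def classify(tokens, centroids):
    """Return the label whose centroid is closest to the document vector."""
    vec = doc_vector(tokens)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(vec, centroids[label]))


# Train on English documents only.
english_train = [
    (["football", "match"], "sports"),
    (["bank", "loan"], "finance"),
]
centroids = train_centroids(english_train)

# Classify Turkish documents with the English-trained centroids.
print(classify(["futbol", "maç"], centroids))
print(classify(["banka", "kredi"], centroids))
```

Because both languages live in the same vector space, no Turkish labels are needed at training time in this toy setting. With real embeddings, the dictionary would be replaced by vectors loaded from the shared data, and a stronger classifier could stand in for the nearest-centroid rule.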