In this project, we evaluate the effectiveness of Random Shuffling in the Cross Lingual Information Retrieval (CLIR) process. We extended the monolingual Word2Vec model to a multilingual one via the random shuffling process. We then evaluate the cross-lingual word embeddings (CLE) in terms of retrieving parallel sentences, whereby the query sentence is in a source language and the parallel sentence is in some targeted language. Our experiments on three language pairs showed that models trained on a randomly shuffled dataset outperforms randomly initialized word embeddings substantially despite its simplicity. We also explored Smart Shuffling, a more sophisticated CLIR technique which makes use of word alignment and bilingual dictionaries to guide the shuffling process, making preliminary comparisons between the two. Due to the complexity of the implementation and unavailability of open source codes, we defer experimental comparisons to future work.
「為配合國家發展委員會「推動ODF-CNS15251為政府為文件標準格式實施計畫」,以及 提供使用者有文書軟體選擇的權利,本館檔案下載部分文件將公布ODF開放文件格式, 免費開源軟體可至LibreOffice 下載安裝使用,或依貴慣用的軟體開啟文件。」