臺灣國際科展

Cross-lingual Information Retrieval

科展類別
臺灣國際科展作品
屆次
2022年
科別
電腦科學與資訊工程
學校名稱
Raffles Girls' School
指導老師
Shaun de Souza
作者
Alysa Lee Mynn
關鍵字
Cross Lingual Information Retrieval CLIR)、Cross-lingual word embeddings、 random shuffling

摘要或動機

In this project, we evaluate the effectiveness of Random Shuffling in the Cross Lingual Information Retrieval (CLIR) process. We extended the monolingual Word2Vec model to a multilingual one via the random shuffling process. We then evaluate the cross-lingual word embeddings (CLE) in terms of retrieving parallel sentences, whereby the query sentence is in a source language and the parallel sentence is in some targeted language. Our experiments on three language pairs showed that models trained on a randomly shuffled dataset outperforms randomly initialized word embeddings substantially despite its simplicity. We also explored Smart Shuffling, a more sophisticated CLIR technique which makes use of word alignment and bilingual dictionaries to guide the shuffling process, making preliminary comparisons between the two. Due to the complexity of the implementation and unavailability of open source codes, we defer experimental comparisons to future work.


「為配合國家發展委員會「推動ODF-CNS15251為政府為文件標準格式實施計畫」,以及 提供使用者有文書軟體選擇的權利,本館檔案下載部分文件將公布ODF開放文件格式, 免費開源軟體可至LibreOffice 下載安裝使用,或依貴慣用的軟體開啟文件。」

檔案名稱 檔案大小 格式
190039.pdf 538 KB Adobe Reader(Pdf)檔案