National Primary and Secondary School Science Fair

Cross-lingual Information Retrieval

Fair Category

Taiwan International Science Fair entry

Edition

2022

Field

Computer Science and Information Engineering

School

Raffles Girls' School

Advisor

Shaun de Souza

Author

Alysa Lee Mynn

Keywords

Cross-lingual Information Retrieval (CLIR), Cross-lingual word embeddings, Random shuffling

Abstract / Motivation

In this project, we evaluate the effectiveness of Random Shuffling in the Cross-Lingual Information Retrieval (CLIR) process. We extended the monolingual Word2Vec model to a multilingual one via random shuffling. We then evaluated the resulting cross-lingual word embeddings (CLEs) on parallel sentence retrieval, where the query sentence is in a source language and its parallel sentence is in a target language. Our experiments on three language pairs showed that, despite its simplicity, a model trained on a randomly shuffled dataset substantially outperforms randomly initialized word embeddings. We also explored Smart Shuffling, a more sophisticated CLIR technique that uses word alignments and bilingual dictionaries to guide the shuffling process, and made preliminary comparisons between the two. Because of the complexity of its implementation and the unavailability of open-source code, we defer experimental comparisons to future work.
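The pipeline described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes random shuffling means merging each aligned sentence pair and randomly permuting the merged tokens (a common way to build a pseudo-bilingual corpus for a skip-gram model), that tokens carry hypothetical `src:`/`tgt:` language tags, and that a sentence is represented by the mean of its word vectors and retrieved by cosine similarity. All function names are hypothetical, and the embeddings in the usage example are toy values, not trained ones.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def pseudo_bilingual_corpus(
    pairs: Sequence[Tuple[List[str], List[str]]],
    src_tag: str = "src",
    tgt_tag: str = "tgt",
    seed: int = 0,
) -> List[List[str]]:
    """Merge each aligned sentence pair, then randomly shuffle the merged tokens.

    Language tags (an assumption of this sketch) stop identical surface forms
    in the two languages from collapsing into a single vocabulary entry.
    """
    rng = random.Random(seed)
    corpus = []
    for src_tokens, tgt_tokens in pairs:
        merged = [f"{src_tag}:{t}" for t in src_tokens]
        merged += [f"{tgt_tag}:{t}" for t in tgt_tokens]
        rng.shuffle(merged)  # source and target words now share context windows
        corpus.append(merged)
    return corpus


def sentence_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens (zero vector if none are known)."""
    dim = len(next(iter(emb.values())))  # assumes a non-empty embedding table
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def precision_at_1(
    queries: Sequence[List[str]],
    candidates: Sequence[List[str]],
    emb: Dict[str, Vector],
) -> float:
    """Fraction of queries whose top-ranked candidate is the gold parallel sentence.

    Assumes queries[i] is parallel to candidates[i].
    """
    cand_vecs = [sentence_vector(c, emb) for c in candidates]
    hits = 0
    for i, query in enumerate(queries):
        q_vec = sentence_vector(query, emb)
        best = max(range(len(cand_vecs)), key=lambda j: cosine(q_vec, cand_vecs[j]))
        hits += int(best == i)
    return hits / len(queries)


if __name__ == "__main__":
    pairs = [(["the", "cat"], ["le", "chat"])]
    print(pseudo_bilingual_corpus(pairs)[0])  # a permutation of the four tagged tokens

    # Toy vectors standing in for embeddings trained on the shuffled corpus.
    emb = {
        "src:cat": [1.0, 0.0], "tgt:chat": [0.9, 0.1],
        "src:dog": [0.0, 1.0], "tgt:chien": [0.1, 0.9],
    }
    print(precision_at_1([["src:cat"], ["src:dog"]], [["tgt:chat"], ["tgt:chien"]], emb))  # 1.0
```

In practice, the shuffled corpus would be fed to a skip-gram trainer (for example, gensim's `Word2Vec`) in place of the toy table, and the same `precision_at_1` routine could score both the trained embeddings and a randomly initialized baseline.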

190039.pdf