National Primary and Secondary School Science Fair

Singapore

Cross-lingual Information Retrieval

In this project, we evaluate the effectiveness of Random Shuffling in the Cross-Lingual Information Retrieval (CLIR) process. We extend the monolingual Word2Vec model to a multilingual one via random shuffling, and we evaluate the resulting cross-lingual word embeddings (CLE) on the task of retrieving parallel sentences, where the query sentence is in a source language and the parallel sentence is in a target language. Our experiments on three language pairs show that, despite its simplicity, a model trained on the randomly shuffled dataset substantially outperforms randomly initialized word embeddings. We also explore Smart Shuffling, a more sophisticated CLIR technique that uses word alignments and bilingual dictionaries to guide the shuffling process, and make a preliminary comparison between the two. Because of the complexity of the implementation and the lack of open-source code, we defer an experimental comparison to future work.
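For context, the sketch below illustrates one common reading of the random-shuffling step: each pair of aligned sentences is merged and its tokens are shuffled, so that a single Word2Vec model learns a shared embedding space for both languages. The toy corpus, function names, and hyperparameters are illustrative assumptions, not the project's actual implementation.

```python
# A minimal sketch, assuming a small sentence-aligned English-German corpus;
# the data, names, and hyperparameters here are illustrative only.
import random
from gensim.models import Word2Vec

def shuffle_merge(src_tokens, tgt_tokens, rng):
    """Merge an aligned sentence pair into one pseudo-bilingual sentence
    by randomly interleaving the tokens of both languages."""
    merged = list(src_tokens) + list(tgt_tokens)
    rng.shuffle(merged)
    return merged

rng = random.Random(0)
src_corpus = [["the", "cat", "sleeps"], ["the", "dogs", "bark"]]
tgt_corpus = [["die", "katze", "schläft"], ["die", "hunde", "bellen"]]

pseudo_bilingual = [shuffle_merge(s, t, rng) for s, t in zip(src_corpus, tgt_corpus)]

# One Word2Vec model over the shuffled corpus: words in a translation pair now
# share contexts, so their vectors end up close in the same embedding space.
model = Word2Vec(sentences=pseudo_bilingual, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=100)
print(model.wv.most_similar("cat", topn=3))
```

Given such embeddings, parallel-sentence retrieval can then be approximated by representing each sentence as the average of its word vectors and ranking target-language sentences by cosine similarity to the query.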

Adversarial Attacks Against Detecting Bot Generated Text

With the introduction of the transformer architecture by Vaswani et al. (2017), contemporary Text Generation Models (TGMs) have shown incredible capabilities in generating neural text that, for humans, is nearly indistinguishable from human text (Radford et al., 2019; Zellers et al., 2019; Keskar et al., 2019). Although TGMs have many potential positive uses in writing, entertainment and software development (Solaiman et al., 2019), there is also a significant threat of these models being misused by malicious actors to generate fake news (Uchendu et al., 2020; Zellers et al., 2019), fake product reviews (Adelani et al., 2020), or extremist content (McGuffie & Newhouse, 2020). TGMs like GPT-2 generate text based on a given prompt, which limits the degree of control over the topic and sentiment of the neural text (Radford et al., 2019). However, other TGMs like GROVER and CTRL allow for greater control over the content and style of generated text, which increases their potential for misuse by malicious actors (Zellers et al., 2019; Keskar et al., 2019). Additionally, many state-of-the-art pre-trained TGMs are freely available online and can be deployed by low-skilled individuals with minimal resources (Solaiman et al., 2019). There is therefore an immediate and substantial need to develop methods that can detect misuse of TGMs on vulnerable platforms like social media or e-commerce websites.

Several methods have been explored for detecting neural text. Gehrmann et al. (2019) developed the GLTR tool, which highlights distributional differences between GPT-2 generated text and human text and assists humans in identifying a piece of neural text. Another approach is to formulate the problem as a classification task distinguishing neural text from human text and to train a classifier model (henceforth a ‘detector’). Simple linear classifiers on TF-IDF vectors or on the topology of attention maps have also achieved moderate performance (Solaiman et al., 2019; Kushnareva et al., 2021). Zellers et al. (2019) propose a detector of GROVER-generated text based on a linear classifier on top of the GROVER model and argue that the best TGMs are also the best detectors; however, later results by Uchendu et al. (2020) and Solaiman et al. (2019) show that this claim does not hold for all TGMs. Consistent across most research thus far is that fine-tuning the BERT or RoBERTa language model for the detection task achieves state-of-the-art performance (Radford et al., 2019; Uchendu et al., 2020; Adelani et al., 2020; Fagni et al., 2021). I will therefore be focussing on attacks against a fine-tuned RoBERTa model, sketched below.

Although extensive research has been conducted on detecting generated text, there is a significant lack of research on adversarial attacks against such detectors (Jawahar et al., 2020). The research that does exist preliminarily suggests that neural text detectors are not robust: their output can change drastically even for small changes in the input text, and they are therefore vulnerable to adversarial attacks (Wolff, 2020). In this paper, I extend Wolff’s (2020) work on adversarial attacks against neural text detectors by proposing a series of attacks designed to counter detectors, as well as an algorithm to optimally select among these attacks without compromising the fluency of the generated text.
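For context, the sketch below shows how a detector of this kind is typically obtained: RoBERTa is fine-tuned as a binary classifier on human-written and machine-generated examples. The checkpoint, toy data, and hyperparameters are illustrative assumptions, not the exact configuration used in my experiments.

```python
# A minimal sketch of fine-tuning RoBERTa as a neural-text detector; the toy
# data and hyperparameters are placeholders, not the paper's actual setup.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Label 0 = human-written, label 1 = machine-generated (toy examples).
texts = ["A sentence written by a person.", "A sentence sampled from a TGM."]
labels = torch.tensor([0, 1])

enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
                    batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy data
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# The trained detector scores new text as human- or machine-written.
model.eval()
with torch.no_grad():
    probe = tokenizer("Some text to classify.", return_tensors="pt")
    probs = torch.softmax(model(**probe).logits, dim=-1)[0]
print({"human": probs[0].item(), "machine": probs[1].item()})
```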
I evaluate these attacks and the selection algorithm against this fine-tuned RoBERTa detector on two datasets: (1) the GPT-2 WebText dataset (Radford et al., 2019) and (2) the Tweepfake dataset (Fagni et al., 2021). Additionally, I experiment with possible defences against these attacks, including (1) count-based features, (2) stylometric features and (3) adversarial training.
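For concreteness, the following is a sketch of the kind of character-level perturbation studied by Wolff (2020): a small fraction of Latin letters is replaced with visually similar Unicode homoglyphs, so the text still reads naturally to a human but is encoded differently for the detector. The character map and substitution rate are arbitrary choices for this illustration; this is not one of the attacks proposed in this paper.

```python
# Illustrative homoglyph perturbation in the spirit of Wolff (2020); the
# mapping and rate below are assumptions made only for this sketch.
import random

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic lookalikes

def homoglyph_attack(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace a random fraction of mapped characters with homoglyphs."""
    rng = random.Random(seed)
    chars = [HOMOGLYPHS[c] if c in HOMOGLYPHS and rng.random() < rate else c
             for c in text]
    return "".join(chars)

generated = "The economy is expected to grow steadily over the next quarter."
perturbed = homoglyph_attack(generated, rate=0.3)
print(perturbed)  # visually near-identical, but the underlying bytes differ
```

Passing such perturbed text back through a detector like the one sketched above is one simple way to probe its robustness.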