National Primary and High School Science Fair

Singapore

Cross-lingual Information Retrieval

In this project, we evaluate the effectiveness of Random Shuffling in the cross-lingual information retrieval (CLIR) process. We extend the monolingual Word2Vec model to a multilingual one via the random shuffling process. We then evaluate the resulting cross-lingual word embeddings (CLE) on parallel sentence retrieval, where the query sentence is in a source language and the parallel sentence to be retrieved is in a target language. Our experiments on three language pairs show that, despite its simplicity, a model trained on a randomly shuffled dataset substantially outperforms randomly initialized word embeddings. We also explore Smart Shuffling, a more sophisticated CLIR technique that uses word alignment and bilingual dictionaries to guide the shuffling process, and make preliminary comparisons between the two. Because of the complexity of the implementation and the unavailability of open-source code, we defer experimental comparisons to future work.
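The abstract does not spell out the shuffling procedure, so the following is only a minimal sketch under one common reading: each aligned sentence pair is concatenated and its tokens are randomly permuted, producing mixed-language pseudo-sentences. The function names, the toy English-French pairs, and the whitespace tokenisation are all assumptions made for illustration.

```python
import random


def merge_and_shuffle(src_tokens, tgt_tokens, rng):
    """Concatenate one aligned sentence pair and randomly permute its tokens."""
    mixed = list(src_tokens) + list(tgt_tokens)
    rng.shuffle(mixed)  # in-place random permutation
    return mixed


def build_shuffled_corpus(parallel_pairs, seed=0):
    """Turn (source, target) sentence pairs into mixed-language pseudo-sentences."""
    rng = random.Random(seed)  # fixed seed so the corpus is reproducible
    return [
        merge_and_shuffle(src.split(), tgt.split(), rng)
        for src, tgt in parallel_pairs
    ]


# Hypothetical toy data, not taken from the project.
pairs = [("the cat sleeps", "le chat dort"), ("good morning", "bonjour")]
corpus = build_shuffled_corpus(pairs)
```

Each pseudo-sentence could then be passed to any Word2Vec trainer, so that words from both languages fall inside the same context windows and end up in a shared embedding space.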

Adversarial Attacks Against Detecting Bot Generated Text

With the introduction of the transformer architecture by Vaswani et al. (2017), contemporary Text Generation Models (TGMs) have shown incredible capabilities in generating neural text that, for humans, is nearly indistinguishable from human text (Radford et al., 2019; Zellers et al., 2019; Keskar et al., 2019). Although TGMs have many potential positive uses in writing, entertainment and software development (Solaiman et al., 2019), there is also a significant threat of these models being misused by malicious actors to generate fake news (Uchendu et al., 2020; Zellers et al., 2019), fake product reviews (Adelani et al., 2020), or extremist content (McGuffie & Newhouse, 2020). TGMs like GPT-2 generate text based on a given prompt, which limits the degree of control over the topic and sentiment of the neural text (Radford et al., 2019). However, other TGMs like GROVER and CTRL allow for greater control of the content and style of generated text, which increases their potential for misuse by malicious actors (Zellers et al., 2019; Keskar et al., 2019). Additionally, many state-of-the-art pre-trained TGMs are freely available online and can be deployed by low-skilled individuals with minimal resources (Solaiman et al., 2019). There is therefore an immediate and substantial need to develop methods that can detect misuse of TGMs on vulnerable platforms like social media or e-commerce websites. Several methods have been explored for detecting neural text. Gehrmann et al. (2019) developed the GLTR tool, which highlights distributional differences between GPT-2-generated text and human text and assists humans in identifying a piece of neural text. Another approach is to formulate the problem as a classification task to distinguish between neural text and human text and train a classifier model (henceforth a ‘detector’).
Simple linear classifiers on TF-IDF vectors or on the topology of attention maps have also achieved moderate performance (Solaiman et al., 2019; Kushnareva et al., 2021). Zellers et al. (2019) propose a detector of GROVER-generated text based on a linear classifier on top of the GROVER model and argue that the best TGMs are also the best detectors. However, later results by Uchendu et al. (2020) and Solaiman et al. (2019) show that this claim does not hold true for all TGMs. A consistent finding across most research so far is that fine-tuning the BERT or RoBERTa language model for the detection task achieves state-of-the-art performance (Radford et al., 2019; Uchendu et al., 2020; Adelani et al., 2020; Fagni et al., 2021). I will therefore focus on attacks against a fine-tuned RoBERTa model. Although extensive research has been conducted on detecting generated text, there is a significant lack of research on adversarial attacks against such detectors (Jawahar et al., 2020). The research that does exist preliminarily suggests that neural text detectors are not robust, meaning that the output can change drastically even for small changes in the text input, and thus that these detectors are vulnerable to adversarial attacks (Wolff, 2020). In this paper, I extend Wolff’s (2020) work on adversarial attacks on neural text detectors by proposing a series of attacks designed to counter detectors, as well as an algorithm that optimally selects among these attacks without compromising the fluency of the generated text. I do this with reference to a fine-tuned RoBERTa detector and on two datasets: (1) the GPT-2 WebText dataset (Radford et al., 2019) and (2) the Tweepfake dataset (Fagni et al., 2021). Additionally, I experiment with possible defences against these attacks, including (1) count-based features, (2) stylometric features and (3) adversarial training.
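To make "small changes in the text input" concrete, here is a minimal sketch of one kind of character-level perturbation: swapping some Latin letters for visually similar Cyrillic ones. The abstract does not say which attacks the paper proposes, so this is an illustrative assumption rather than the author's method; the mapping table, the `perturb` function and the `rate` parameter are all hypothetical.

```python
import random

# Assumed Latin -> Cyrillic look-alike table, chosen for illustration only.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
}


def perturb(text, rate=0.1, seed=0):
    """Replace each mappable character with its look-alike with probability `rate`."""
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


original = "a peaceful protest took place"
attacked = perturb(original, rate=0.3)
```

The perturbed string looks almost identical to a human reader but maps to different tokens, which is the sort of input change a non-robust detector might be sensitive to.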

Efficient Modelling of Aeroacoustic Phenomena in Seebeck Sirens: A Simplified Approach for Real-World Applications

This paper presents a simplified but mostly accurate model for the acoustic mechanism of Seebeck sirens. We investigate the impact of key parameters, including the number and size of holes and the angular speed of the disk, on the characteristics of the produced sound. The disk was fabricated using fused deposition modelling 3D printing, and the experimental setup used a brushless motor, an air compressor, and a shotgun microphone to capture the generated sound. An order-of-magnitude analysis was conducted on the Navier-Stokes equation to formulate a simplified version. These simplifications allowed for a model of low computational intensity relating volume flow rate to sound pressure level, which is used to predict the waveform of the sound produced. Our findings reveal that the fundamental frequency of the sound can be precisely predicted from only the rotational frequency of the disk and the number of holes, a relationship validated experimentally. Notably, the observed asymmetry in the waveform was attributed to skin drag effects, and this hypothesis was experimentally verified. Our model computes a solution in less than half a second on average, far less than the 21 h 47 min needed for a k−ω turbulence model to compute the same phenomenon. The research presents and verifies a simplified model of the acoustic mechanics of sound generated by rotating systems. The model requires few computational resources, which makes it useful in situations where some precision can be traded for ease of computation. For systems that demand more precision, the model serves as a foundation for quickly generating an initial design, paving the way for subsequent iterations using more comprehensive models. It also contributes insights into the intersection of fluid dynamics and sound production.
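The abstract states that the fundamental frequency depends only on the rotational frequency and the number of holes, without giving the formula. A minimal sketch, assuming the standard siren relation for a single ring of evenly spaced holes (fundamental = number of holes × rotations per second), is shown here; the function name and the example values are hypothetical.

```python
def siren_fundamental_hz(num_holes, rpm):
    """Fundamental frequency of a siren disk with one ring of evenly spaced holes.

    Assumes the air jet is interrupted once per hole per revolution,
    so f = num_holes * (revolutions per second).
    """
    rotations_per_second = rpm / 60.0
    return num_holes * rotations_per_second


# Hypothetical example: a 12-hole disk spinning at 3000 rpm.
example_hz = siren_fundamental_hz(12, 3000)  # 600.0 Hz
```

This covers only the pitch; the waveform shape and sound pressure level are what the paper's simplified flow model addresses.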

Modal frequencies in a nonlinear beam-magnet coupled oscillator system

In this paper, I investigated the motion of a nonlinear coupled oscillator system consisting of two leaf springs secured to a non-magnetic base, with magnets attached to the upper ends such that they repel each other and are free to move. My results showed that the system exhibits the beats phenomenon and, interestingly, that the frequencies depend on the initial conditions. I hence hypothesized that this dependence arises from two sources of nonlinearity: geometric nonlinearity during large deflections of the leaf springs and the nonlinearity of the magnetic force. To test this hypothesis, a nonlinear mathematical model was developed, accounting for nonlinear beam effects up to third order and fully solving the nonlinear magnetic force using a current cylinder model that accounts for the tilting of the magnets. An approximate linear model was also developed for comparison. The theoretical models were validated experimentally by investigating the dynamic motion of the springs through time, as well as how the modal frequencies of the system depend on the initial displacement, the length of the springs, and the distance between the springs. The more accurate nonlinear model I derived shows good agreement with experimental results while the linear theory does not, highlighting the importance of nonlinearities in this system. An improved understanding of these nonlinear systems could lead to advancements in design, efficiency, and safety in various applications such as energy harvesting.
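The beats phenomenon mentioned above can be illustrated with a minimal sketch in the linear approximation, where the motion of one spring is the superposition of two modes with fixed frequencies. This is an assumption-laden toy and not the paper's nonlinear model: here the modal frequencies are constants, whereas the paper finds that they depend on the initial displacement. The function names and the example frequencies are hypothetical.

```python
import math


def beat_frequency_hz(f1, f2):
    """Beat frequency of two superposed modes: the difference of their frequencies."""
    return abs(f1 - f2)


def beating_displacement(t, f1, f2, amplitude=1.0):
    """Displacement of one spring when both modes are excited with equal weight.

    Equivalent to amplitude * cos(pi*(f1+f2)*t) * cos(pi*(f1-f2)*t):
    a fast oscillation inside a slowly varying envelope.
    """
    return 0.5 * amplitude * (
        math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t)
    )


# Hypothetical modal frequencies in Hz.
f_high, f_low = 5.0, 4.5
beat_hz = beat_frequency_hz(f_high, f_low)  # 0.5 Hz, so the envelope repeats every 2 s
```

With these values the spring starts at full amplitude and is momentarily at rest one second later, when the motion has transferred to the other spring.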