National Primary and High School Science Fair

Singapore

FAT10 Haplotypes as a Potential Biomarker for Cancer

Cancer is the second leading cause of death today[1], accounting for nearly 1 in 6 deaths worldwide. Despite this, diagnostic and treatment options for cancer remain limited, and new methods to identify and treat susceptible patients are urgently required. HLA-F adjacent transcript 10 (FAT10) is an oncogene that is strongly implicated in the development of inflammation-associated cancers[2]. Previous research on this highly polymorphic gene has identified 2 haplotypes: the reference haplotype, which is found in both cancer patients and healthy individuals, and an additional haplotype that occurs at higher frequency in cancer patients and is associated with higher odds of cancer. In this study, we hypothesised that the cancer-associated FAT10 haplotype promotes tumourigenicity more strongly and could thereby serve as a useful biomarker for cancer. Here, we functionally characterise the 2 FAT10 haplotypes to understand how they influence some of the hallmarks of cancer. Compared with the reference haplotype, the cancer-associated haplotype was observed to enhance several hallmarks of cancer, namely uncontrolled cell growth, resistance to cell death and anchorage-independent growth. Moreover, we uncovered the differential gene expression patterns induced by each haplotype. Molecules involved in cell adhesion, proliferation and transcription were upregulated by the cancer-associated haplotype and hence could have contributed to its increased tumourigenic potential.

Synthesis of Biodegradable Plastic From Food Waste

According to the NEA Waste Statistics and Overall Recycling Rate for 2017, 809,800 tonnes of food waste and 815,200 tonnes of plastic waste were generated in Singapore that year. Food waste and plastic waste each accounted for more than 10% of the total waste generated in 2017. However, only 16% of the food waste and 6% of the plastic waste was recycled; the rest was sent to incineration plants and then to landfill. This practice will eventually lead to 2 major environmental issues that Singapore will face in the near future: 1) Semakau Landfill is our only remaining landfill and is expected to run out of space in the near future; 2) the burning of food waste results in the release of methane (CH4), a greenhouse gas that has over 25 times the impact of carbon dioxide (CO2) in trapping excess heat in the atmosphere. This will increase our carbon footprint and contribute to the greenhouse effect and global warming in due course. According to the Sustainable Singapore Blueprint 2015, Singapore is working towards becoming a Zero Waste Nation by reducing consumption and by reusing and recycling all materials. A national recycling rate target of 70% has been set for 2030, with the aim of increasing the domestic recycling rate from 20% in 2013 to 30% by 2030 and the non-domestic recycling rate from 77% in 2013 to 81% by 2030. As part of our commitment to waste management and sustainability, this research project investigates whether food waste can be recycled and made into biodegradable plastics. As a first step, chitosan will be derived from shrimp shells and dissolved in acetic acid and lactic acid, produced by probiotic fermentation of fruit and/or vegetable waste, to synthesise biodegradable plastics.

Hydrogen Functionalization of Graphene Using RF Plasma for Photodetection

The growth of the internet is driving an ever-increasing need for faster communication. Modern telecommunication data is mainly carried through fibre-optic cables, with pulses of light representing bits of data; the main factor limiting data transfer speed is the rate at which the optical receiver at the far end of the cable can detect light pulses. Graphene-silicon Schottky photodiodes are a promising alternative to the germanium photodiodes traditionally used, offering higher detection frequency and better contrast between light and dark. Because graphene's low band gap makes such a photodiode susceptible to erroneous measurements, hydrogen functionalisation was used to increase the barrier potential of the Schottky diode, so that a higher forward-bias voltage is required for current to pass through and trigger the sensor. This study seeks to determine the optimal conditions (physical proximity, duration of exposure and plasma power) for hydrogen functionalisation using radio-frequency plasma. Graphene was synthesised using low-pressure chemical vapour deposition, then transferred onto p-type silicon to create a photodiode. The graphene-silicon photodiode was then treated with hydrogen plasma to introduce defects in the graphene layer and so increase the barrier potential of the photodiode. To assess the effectiveness of hydrogen functionalisation, photocurrent measurements were conducted while light was shone onto the photodiode in pulses of increasing frequency to find the magnitude and speed of the response. Light was first shone in pulses of 100 ms, which were successfully detected by the photodiode. The pulse spacing was then gradually decreased, and the diode was found to detect pulse spacings as low as 1 µs, significantly better than germanium photodetectors. The sample demonstrated a clear optoelectronic response and was sensitive to changes in frequency.
Results show that the intensity of the optoelectronic response in graphene-silicon diodes is inversely related to the diode's physical proximity to the plasma source during hydrogen functionalisation, and directly related to the plasma power and the duration of exposure up to a point, after which the response deteriorates. Thus, it can be concluded that graphene-silicon Schottky diodes offer much promise in electronic communication.
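The link between barrier potential and the forward voltage needed to pass current can be illustrated with the textbook thermionic-emission model of a Schottky diode. This is a standard idealisation and not a model fitted in this study; the symbols below (contact area A, effective Richardson constant A*, temperature T, barrier height \phi_B in volts, ideality factor n) are assumptions introduced for illustration.

```latex
I(V) = A\,A^{*}\,T^{2}
       \exp\!\left(-\frac{q\,\phi_B}{k_B T}\right)
       \left[\exp\!\left(\frac{q V}{n\,k_B T}\right) - 1\right]
```

Under this model the current at a given forward bias V falls exponentially as \phi_B rises, which is consistent with the reasoning that a higher barrier requires a higher forward voltage before the sensor is triggered.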

HoneySurfer: Intelligent Web-Surfing Honeypots

In Singapore’s evolving cyber landscape, 96% of organisations have suffered at least one cyber attack and 95% have reported more sophisticated attacks within the span of one year, according to a 2019 report[1] by Carbon Black. As such, more tools must be utilised to counter the increasingly refined attacks performed by malicious actors. Honeypots are effective tools for studying and mitigating these attacks. They work as decoy systems, typically deployed alongside real systems, to capture and log the activities of the attacker. These systems are useful as they can actively detect potential attacks, help cybersecurity specialists study an attacker’s tactics and even misdirect attackers from their intended targets. Honeypots can be classified into two main categories: 1. Low-interaction honeypots merely emulate network services and internet protocols, allowing for limited interaction with the attacker. 2. High-interaction honeypots emulate operating systems, allowing for much more interaction with the attacker. Although honeypots are powerful tools, their value diminishes once attackers uncover their true identity. This is especially so as attackers become more skilled at system fingerprinting and at analysing network traffic from targets, which hinders honeypots from capturing more experienced attackers. While substantial research has been done to defend against system fingerprinting scans (see 1.1 Related Work), not much has been done to defend against network traffic analysis. As pointed out by Symantec[2][3], when attackers attempt to sniff the network traffic of the system in question, the lack of network traffic raises a red flag, increasing the likelihood of the honeypot’s true identity being discovered. In addition, since the main concern in honeypot deployment is the ability to attract and engage attackers for a substantial period of time, an increased ability to interest malicious actors is invaluable.
Producing human-like network activity on a honeypot would appeal to more malicious actors. Hence, this research aims to build an intelligent web-surfer which can learn and thus simulate human web-surfing behaviour, creating evidence of human network activities to disguise the identity of honeypots as production systems and luring in more attackers interested in packet sniffing for malicious purposes.
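One simple way to picture "learning and then simulating" surfing behaviour is a first-order Markov chain over pages, sketched below. This is an illustrative assumption and not the HoneySurfer design: the function names, the page labels, and the exponential dwell-time model are all hypothetical, and the sketch only generates a browsing trace without sending any network traffic.

```python
import random
from collections import defaultdict


def learn_transitions(sessions):
    """Count page-to-page transitions in recorded browsing sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for here, nxt in zip(session, session[1:]):
            counts[here][nxt] += 1
    return counts


def simulate_session(counts, start, steps, rng, mean_dwell_s=20.0):
    """Sample a (page, dwell_seconds) trace that follows the learned transitions.

    mean_dwell_s is a hypothetical parameter: dwell times are drawn from an
    exponential distribution purely for illustration.
    """
    page = start
    trace = [(page, rng.expovariate(1.0 / mean_dwell_s))]
    for _ in range(steps):
        options = counts.get(page)
        if not options:  # no recorded way out of this page: end the session
            break
        pages = list(options)
        weights = [options[p] for p in pages]
        page = rng.choices(pages, weights=weights, k=1)[0]
        trace.append((page, rng.expovariate(1.0 / mean_dwell_s)))
    return trace


if __name__ == "__main__":
    recorded = [["news", "sport", "news", "weather"], ["news", "weather", "mail"]]
    model = learn_transitions(recorded)
    for page, dwell in simulate_session(model, "news", 5, random.Random(0)):
        print(f"{page}: stay {dwell:.1f} s")
```

A real surfer would also need to fetch the pages and vary its behaviour over time; the sketch only shows how observed sessions can drive the choice of the next page and the pause before it.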

Host Target Proteins of the Spike Protein of SARS-CoV-2

Coronavirus Disease 2019 (COVID-19) is a newly emerged infectious disease caused by the new severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV-2). In less than one year, the virus has spread around the entire world, killing millions of people and disrupting travel and business worldwide. During infection, the virus uses its Spike protein to dock onto the Ace2 protein on the surface of its human host cell. Spike is 1273 amino acids long, and only a short fragment of Spike (residues 319-541) is sufficient to bind Ace2. We hypothesized that the remaining protein sequences of Spike might have functions in viral replication beyond the binding of Ace2. We performed Split-Ubiquitin protein-protein interaction screens to isolate human proteins by their ability to bind to Spike, and we identified Annexin2A2 and Cytochrome b as novel human protein interaction partners of Spike. Annexin2A2 is involved in both endocytosis and exocytosis, and its interaction with Spike might help the virus to enter and exit its host cell. The presence of the mitochondrial Cytochrome b protein inside the cytosol promotes apoptosis, and its interaction with Spike could speed up apoptosis of the infected human cell. The Nub cDNA libraries that we generated also allowed us to screen for synthetic peptides that interact with Spike. We isolated two synthetic peptides, FL1a and FL7a, derived from the non-coding parts of human mRNAs, by their ability to interact with Spike. We found that both FL1a and FL7a interact with the C-terminal half of the Spike protein. We also found that FL7a is able to block the Spike-Spike self-interaction at the C-terminal half of the Spike protein, and we think that this could block the reassembly of the Spike protein in the host cell during viral reassembly.
We hope that these synthetic peptides could be used as drugs, owing to their ability to block protein-protein interactions between Spike and human host proteins that are essential for viral replication.

Cross-lingual Information Retrieval

In this project, we evaluate the effectiveness of Random Shuffling in the Cross-Lingual Information Retrieval (CLIR) process. We extended the monolingual Word2Vec model to a multilingual one via the random shuffling process. We then evaluated the resulting cross-lingual word embeddings (CLE) on the task of retrieving parallel sentences, in which the query sentence is in a source language and the parallel sentence is in a target language. Our experiments on three language pairs showed that models trained on a randomly shuffled dataset substantially outperform randomly initialized word embeddings, despite the simplicity of the method. We also explored Smart Shuffling, a more sophisticated CLIR technique that uses word alignment and bilingual dictionaries to guide the shuffling process, and made preliminary comparisons between the two. Owing to the complexity of the implementation and the unavailability of open-source code, we defer experimental comparisons to future work.
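The random shuffling step can be sketched as follows, under the common reading that each aligned sentence pair is merged into one mixed-language pseudo-sentence whose tokens are shuffled, so that words from both languages share context windows when a monolingual Word2Vec model is trained on the result. The function names, the language-tag prefixes and the example sentences are assumptions for illustration, not details taken from the project.

```python
import random


def shuffle_parallel(src_tokens, tgt_tokens, rng, src_tag="en", tgt_tag="de"):
    """Merge one aligned sentence pair into a shuffled mixed-language sentence.

    Tokens are prefixed with a language tag (a hypothetical convention) so that
    identically spelled words in the two languages stay distinct.
    """
    merged = [f"{src_tag}:{t}" for t in src_tokens]
    merged += [f"{tgt_tag}:{t}" for t in tgt_tokens]
    rng.shuffle(merged)
    return merged


def build_shuffled_corpus(pairs, seed=0):
    """Turn (source sentence, target sentence) pairs into a training corpus."""
    rng = random.Random(seed)
    return [shuffle_parallel(s.split(), t.split(), rng) for s, t in pairs]


if __name__ == "__main__":
    pairs = [
        ("the cat sleeps", "die Katze schläft"),
        ("the dog barks", "der Hund bellt"),
    ]
    for sentence in build_shuffled_corpus(pairs):
        print(" ".join(sentence))
```

The resulting token lists could then be passed to any Word2Vec implementation as ordinary sentences; that training step is omitted here.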

Crossing Number of Join Product of Some Graphs

A drawing of a graph G is a representation of G on a plane, with its vertices represented by distinct points, and its edges by arcs connecting the corresponding points. The crossing number of G is the minimum number of intersections between arcs across all possible drawings of G.
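In symbols, the definition above and the join product named in the title can be written as follows. The notation (cr for the crossing number, D for a drawing, G + H for the join of two vertex-disjoint graphs) follows common usage and is assumed here rather than taken from the project.

```latex
\begin{align*}
\operatorname{cr}(G) &= \min_{D \text{ a drawing of } G} \operatorname{cr}(D),\\
V(G + H) &= V(G) \cup V(H),\\
E(G + H) &= E(G) \cup E(H) \cup \{\, uv : u \in V(G),\ v \in V(H) \,\}.
\end{align*}
```

Here cr(D) denotes the number of crossings between arcs in the drawing D, and the join G + H adds an edge between every vertex of G and every vertex of H.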

Adversarial Attacks Against Detecting Bot Generated Text

With the introduction of the transformer architecture by Vaswani et al. (2017), contemporary Text Generation Models (TGMs) have shown incredible capabilities in generating neural text that, for humans, is nearly indistinguishable from human text (Radford et al., 2019; Zellers et al., 2019; Keskar et al., 2019). Although TGMs have many potential positive uses in writing, entertainment and software development (Solaiman et al., 2019), there is also a significant threat of these models being misused by malicious actors to generate fake news (Uchendu et al., 2020; Zellers et al., 2019), fake product reviews (Adelani et al., 2020), or extremist content (McGuffie & Newhouse, 2020). TGMs like GPT-2 generate text based on a given prompt, which limits the degree of control over the topic and sentiment of the neural text (Radford et al., 2019). However, other TGMs like GROVER and CTRL allow for greater control of the content and style of generated text, which increases their potential for misuse by malicious actors (Zellers et al., 2019; Keskar et al., 2019). Additionally, many state-of-the-art pre-trained TGMs are freely available online and can be deployed by low-skilled individuals with minimal resources (Solaiman et al., 2019). There is therefore an immediate and substantial need to develop methods that can detect misuse of TGMs on vulnerable platforms like social media or e-commerce websites. Several methods for detecting neural text have been explored. Gehrmann et al. (2019) developed the GLTR tool, which highlights distributional differences between GPT-2 generated text and human text and assists humans in identifying a piece of neural text. Another approach is to formulate the problem as a classification task, distinguishing between neural text and human text, and to train a classifier model (henceforth a ‘detector’).
Simple linear classifiers on TF-IDF vectors or on the topology of attention maps have also achieved moderate performance (Solaiman et al., 2019; Kushnareva et al., 2021). Zellers et al. (2019) propose a detector of GROVER generated text based on a linear classifier on top of the GROVER model and argue that the best TGMs are also the best detectors. However, later results by Uchendu et al. (2020) and Solaiman et al. (2019) show that this claim does not hold true for all TGMs. A consistent finding across most research thus far is that fine-tuning the BERT or RoBERTa language model for the detection task achieves state-of-the-art performance (Radford et al., 2019; Uchendu et al., 2020; Adelani et al., 2020; Fagni et al., 2021). I will therefore focus on attacks against a fine-tuned RoBERTa model. Although extensive research has been conducted on detecting generated text, there is a significant lack of research on adversarial attacks against such detectors (Jawahar et al., 2020). However, the research that does exist preliminarily suggests that neural text detectors are not robust, meaning that the output can change drastically even for small changes in the text input, and thus that these detectors are vulnerable to adversarial attacks (Wolff, 2020). In this paper, I build on Wolff’s (2020) work on adversarial attacks on neural text detectors by proposing a series of attacks designed to counter detectors, as well as an algorithm to optimally select among these attacks without compromising the fluency of the generated text. I do this with reference to a fine-tuned RoBERTa detector and on two datasets: (1) the GPT-2 WebText dataset (Radford et al., 2019) and (2) the Tweepfake dataset (Fagni et al., 2021). Additionally, I experiment with possible defences against these attacks, including (1) count-based features, (2) stylometric features and (3) adversarial training.
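To make "small changes in the text input" concrete, the sketch below shows one well-known kind of character-level perturbation: swapping some Latin letters for visually similar Cyrillic ones. It is offered only as an illustration of what such an attack can look like; it is not claimed to be one of the attacks proposed in this paper, and the mapping table, function name and substitution rate are hypothetical.

```python
import random

# Hypothetical mapping from Latin letters to look-alike Cyrillic letters.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
    "p": "\u0440",  # Cyrillic small er
}


def homoglyph_attack(text, rate, rng):
    """Replace each swappable character with its look-alike with probability `rate`."""
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    original = "a peace concept"
    perturbed = homoglyph_attack(original, 0.5, random.Random(0))
    # The two strings look alike to a reader but differ at the code-point level.
    print(original == perturbed, len(original) == len(perturbed))
```

Because the perturbed text reads the same to a human while tokenising differently, a perturbation of this kind probes exactly the lack of robustness described above; a defence such as adversarial training would expose the detector to perturbed examples during fine-tuning.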