Taiwan Online Science Education Center (臺灣網路科教館)

Enhancement of Online Stochastic Gradient Descent using Backward Queried Images

Stochastic gradient descent (SGD) is one of the preferred online optimization algorithms. One of its major drawbacks, however, is its tendency to forget previous data when optimizing over a data stream, a problem known as catastrophic interference. In this project, we attempt to mitigate this drawback with a new low-cost approach that incorporates backward queried images into SGD during online training: for every new training sample in the data stream, the neural network is optimized using the corresponding backward queried image from the initial dataset. We compared the accuracy of the proposed method with that of SGD on a data stream of 50,000 training cases with 10,000 test cases, using two MNIST-style datasets (Fashion and Kuzushiji). The proposed method substantially improves the performance of the neural network, classifying both datasets at a high accuracy for the mean, minimum, lower quartile, median, and upper quartile while maintaining a lower standard deviation in performance. This demonstrates that our proposed algorithm can be a potential alternative to online SGD.
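
The interleaving idea can be illustrated with a minimal sketch. The abstract does not say how the backward queried images are computed, so this sketch substitutes a stored reference vector per label (the hypothetical `references` mapping) and a tiny logistic regression in place of the neural network. It shows one possible reading, in which each streamed sample is followed by one extra SGD step on the reference for the same label. It is not the authors' implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(w, x, y, lr):
    """One SGD step of logistic regression on a single (x, y) pair, in place."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    for i, xi in enumerate(x):
        w[i] -= lr * (p - y) * xi


def train_online(stream, references, lr=0.1, dim=3):
    """Online SGD with an optional interleaved step on a stored reference.

    `references` maps label -> stored input vector. It is an assumption
    standing in for the backward queried images of the abstract.
    Pass None to get plain online SGD for comparison.
    """
    w = [0.0] * dim
    for x, y in stream:
        sgd_step(w, x, y, lr)                  # ordinary online update
        if references is not None:
            sgd_step(w, references[y], y, lr)  # interleaved reference update
    return w
```

Calling `train_online(stream, None)` gives the plain online SGD baseline, so the two variants can be compared on the same stream.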

A Study of Speech Emotion Recognition

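Emotion recognition is an important skill for improving interpersonal communication. In settings such as crisis hotlines and telephone customer service, where cues such as facial expressions and body language are unavailable, recognizing emotion from speech alone has great practical value. This study compares two machine learning methods, support vector machines (SVM) and convolutional neural networks (CNN), for training an "AI speech emotion recognition" classifier. We use two English speech databases, SAVEE and RAVDESS, and also built and annotated our own 逼逼 Chinese emotion corpus (逼逼中文情緒語料庫). The results show that the SVM recognizes individual emotions in the SAVEE database with 84% to 94% accuracy and reaches 75% accuracy for individual speakers, exceeding the 73.7% reported on the official website. The experiments also show that deep learning models perform comparatively worse when training data is insufficient.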

Deep learning on Covid-19 prediction and X-ray severity grading system

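Using deep learning to solve medical problems has long been a closely watched research topic. With the recent rise of the COVID-19 pandemic, COVID-19 detection has become a popular area of research. The most effective detection method at present is polymerase chain reaction (PCR); however, PCR is time-consuming and subject to human error. Diagnosing and grading the disease from X-ray images with deep learning is therefore an efficient and safe approach. In this study, we use deep learning for disease diagnosis, achieving fairly high accuracy on five-class classification (84.91%) and very high accuracy when identifying COVID-19 alone (99.35%). We also generate disease heat maps and design a new grading system, the X-ray Severity Grading System (XSGS), which we use for severity classification; the different grades show distinguishable differences.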

Limited Query Black-box Adversarial Attacks in the Real World

We study the creation of physical adversarial examples, which are robust to real-world transformations, using a limited number of queries to the target black-box neural networks. We observe that robust models tend to be especially susceptible to foreground manipulations, which motivates our novel Foreground attack. We demonstrate that gradient priors are a useful signal for black-box attacks and therefore introduce an improved version of the popular SimBA. We also propose an algorithm for transferable attacks that selects the surrogates most similar to the target model. Our black-box attacks outperform the state-of-the-art approaches on which they are based and support our belief that the concept of model similarity could be leveraged to build strong attacks in a limited-information setting.
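
For context, the baseline SimBA that the abstract builds on can be sketched in a few lines. This is the plain pixel-basis version with a query budget, not the authors' improved variant with gradient priors. The model `toy_prob_true_class` and its weights are hypothetical stand-ins for a real black-box classifier, which would be queried only for its output probability.

```python
import math
import random

TOY_WEIGHTS = [1.0, -2.0, 0.5, 0.0]  # hypothetical black-box model parameters


def toy_prob_true_class(x):
    """Stand-in for a black-box query: probability of the true class."""
    z = sum(w * xi for w, xi in zip(TOY_WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def simba(prob_true_class, x, eps=0.2, max_queries=200, seed=0):
    """Baseline SimBA in the pixel basis.

    Tries +eps and -eps along one randomly ordered coordinate at a time and
    keeps a step only if it lowers the true-class probability. A full attack
    would also stop as soon as the predicted label changes; that check is
    omitted here because the toy model exposes a single probability.
    """
    rng = random.Random(seed)
    x_adv = list(x)
    best = prob_true_class(x_adv)
    queries = 1
    coords = list(range(len(x)))
    rng.shuffle(coords)  # sample directions without replacement
    for i in coords:
        for step in (eps, -eps):
            if queries >= max_queries:
                return x_adv, best, queries
            candidate = list(x_adv)
            candidate[i] += step
            p = prob_true_class(candidate)
            queries += 1
            if p < best:
                x_adv, best = candidate, p
                break
    return x_adv, best, queries
```

Each coordinate costs at most two queries, which is what makes the method attractive in the limited-query setting the abstract describes.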

A Person Re-identification based Misidentification-proof Person Following Service Robot

Two years ago, I entered a robot contest in which one of the missions required the robot to follow a pedestrian. At that time, I used the contest's demo program to complete the task. Not long after, I found two main issues: 1. The program follows the closest point read by the depth camera, so if I walk close to a wall, the robot is likely to "follow" the wall. 2. The same failure occurs when another pedestrian crosses between the robot and the target. To address these two issues, I decided to improve the program. We designed a procedure that uses YOLO object detection and person re-identification to re-identify the target so that the robot can follow it continuously.
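
The re-identification step can be illustrated with a minimal sketch. It assumes that an object detector such as YOLO has already produced person detections and that a re-identification network has turned each cropped person into an embedding vector; the dictionary layout, the `pick_target` name, and the 0.7 threshold are all hypothetical choices, not details from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_target(target_embedding, detections, threshold=0.7):
    """Return the detection most similar to the target person.

    Each detection is a dict with an "embedding" key (assumed format).
    Returns None when no detection reaches the threshold, for example when
    the target is occluded by a passer-by, so the robot can wait instead
    of following the wrong person or a wall.
    """
    best, best_score = None, threshold
    for det in detections:
        score = cosine_similarity(target_embedding, det["embedding"])
        if score >= best_score:
            best, best_score = det, score
    return best
```

Returning `None` rather than the nearest object is the design choice that separates this from the closest-point behavior described above.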

Method of prosthetic vision

This work addresses the problem of spatial orientation for visually impaired people. In the course of the project, we invented a new way of transmitting visual information through an acoustic channel and developed a device that uses distance sensors to analyze the surroundings of the user. Using the invented algorithm, which transforms information about the position of an obstacle into a sound of a certain tone and intensity, the device conveys subject-spatial information to the user in real time. In its current design, the device uses a faceted locator made of 36 ultrasonic locators grouped into 12 sectors by azimuth and 3 spatial cones by elevation angle. Each reading is converted into its own note according to the following pattern: the elevation angle corresponds to the octave, the azimuth corresponds to the note, and the distance corresponds to the volume. Musical notes are not the only possible choice of signal. We chose them because, over the centuries, notes have come to be laid out favorably across the frequency range on a logarithmic scale, so a new note appearing in the combined signal will not be muffled by a combination of other notes. Consequently, a blind person moving around a room will be able to use the tone and volume of the sound signals to assess the presence and location of all dangerous obstacles. After substantiating the hypothesis theoretically and analyzing the available information, we began producing prototypes of devices that implement the idea of transmitting information via the acoustic channel.
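
The mapping described above can be sketched as follows. With 12 azimuth sectors and 3 elevation cones, each sensor falls naturally on one of the 12 semitones in one of 3 octaves of the equal-tempered scale. The base frequency, the sensor range, and the rule that nearer obstacles sound louder are assumptions for illustration; the abstract states only that distance corresponds to volume.

```python
BASE_FREQ_HZ = 261.63  # C4; assumed lowest note, not given in the abstract
MAX_RANGE_M = 4.0      # assumed sensor range, not given in the abstract


def sensor_to_tone(azimuth_sector, elevation_cone, distance_m):
    """Map one ultrasonic reading to (frequency in Hz, volume in 0..1).

    azimuth_sector: 0..11, selects the note within the octave
    elevation_cone: 0..2, selects the octave
    distance_m: measured distance, clamped to 0..MAX_RANGE_M
    """
    if not 0 <= azimuth_sector < 12 or not 0 <= elevation_cone < 3:
        raise ValueError("sector must be 0..11 and cone must be 0..2")
    semitones = 12 * elevation_cone + azimuth_sector
    # Equal temperament: each semitone multiplies frequency by 2**(1/12).
    frequency = BASE_FREQ_HZ * 2 ** (semitones / 12)
    clamped = min(max(distance_m, 0.0), MAX_RANGE_M)
    volume = 1.0 - clamped / MAX_RANGE_M  # nearer obstacle is louder (assumed)
    return frequency, volume
```

Because the 36 resulting frequencies are evenly spaced on a logarithmic scale, no two sensors share a pitch, which matches the rationale the abstract gives for choosing notes.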

Improving Touch Accuracy for Older Adults Using Touch Trajectory and Device Acceleration Data

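This study uses machine learning to improve how well a phone's touch system determines where older adults tap. We first designed an experiment to compare the accuracy of tap-position and gesture recognition when older adults use a phone. We then collected older users' touch trajectories and device-related data and trained models to reduce the system's error rate and error magnitude. Next, we compared and analyzed how well different machine learning models suit this data and how much tap-position accuracy improves after calibration, in order to select the model that most effectively improves tap-position accuracy for tap-position prediction. After these experiments, we selected the Random Forest Regressor, which improved accuracy the most, for further calibration experiments and analysis. The prediction accuracy of users' tap positions improved effectively, by 32.3%. Finally, when the trained model was applied back to the phone app used in the experiment, the system's accuracy in determining participants' tap positions rose from 63.7% to 97.5%.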

A Study on Applying Web Crawlers to Social Media to Build an Audience Interaction Platform

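Today's large-scale events, such as school anniversary celebrations and New Taipei Christmasland (新北耶誕城), lack interactivity and a sense of participation, largely because audience members are often absorbed in the social media apps on their phones. Our research takes advantage of this behavior. We examine the public's views on events and use a web crawler to collect users' posts, so that audience members need only post on social media such as Instagram or Twitter for the system to push the post in real time to the big screen at the event. Combined with image recognition to review posts quickly, this gives a solution for improving low interactivity. In the study, we examine different web crawling algorithms and image recognition techniques and use questionnaire surveys to refine the work. Features such as a Line Bot, back-end management, and pinned posts tailor the system to various large-scale events. It can also be used for government policy announcements or promotional advertising, greatly increasing the interactivity and appeal of events.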

Computer Science and Information Engineering