Accepted Papers: 13th International Conference on Computer Science and Information Technology (CCSIT 2023)

Accepted Papers

Challenges Performing IoT Forensic Investigations and Frameworks Addressing These Challenges: a Systematic Literature Review

Boitumelo Nkwe and Michael Kyobe, Department of Information Systems, University of Cape Town, Cape Town, South Africa

ABSTRACT

The increasing adoption of the Internet of Things (IoT) has introduced unique challenges for both users and the cybersecurity domain. As IoT evolves, the cybersecurity threats and vulnerabilities targeting IoT devices have also increased. Because IoT devices are susceptible to breaches, forensic investigations focusing on IoT technologies need to be improved. This study aims to provide an understanding of the challenges in IoT forensic investigations identified in the literature since 2017. Furthermore, the article examines solutions, in the form of frameworks and methodologies, that have been developed to address these challenges, as well as the gaps in the existing literature. The researchers adopted a systematic review methodology to guide the synthesis of the literature. The key issues highlighted in this study include the heterogeneous nature of IoT, the lack of proper investigative tools and frameworks that encompass all levels of IoT forensics, the lack of privacy, and the lack of standardization in the investigation process.

KEYWORDS

Internet of Things (IoT), IoT forensics, Cybersecurity, Challenges

Automatic Discovery of Multiword Nouns Based on Syntactic-semantic Representations

Xiaoqin HU, Beijing Language and Culture University, China

ABSTRACT

This research aims to explore a deeper representation of the internal structure and semantic relationships of multiword nouns (MWNs) in order to improve MWN discovery. This representation focuses on how MWNs are formed, a process that follows a series of categorical and semantic constraints. The internal semantic relations of MWNs are represented by combinations of the semantic classes of their constituents, and the internal structures are represented by a hierarchical set of categorical combinations. These linguistically motivated semantic features are combined with statistically motivated semantic features, and the results show an improvement in MWN discovery.

KEYWORDS

Multiword nouns, automatic discovery, internal structure, internal semantic relation, semantic class combination, linguistic knowledge

A Deep Learning System for Domain-specific Speech Recognition

Yanan Jia, Businessolver, USA

ABSTRACT

As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems have been proposed. However, commercial ASR systems usually perform poorly on domain-specific speech, especially in low-resource settings. The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems. The domain-specific data are collected using a proposed semi-supervised learning annotation method that requires little human intervention. The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model with an external KenLM language model, which surpasses the Google and AWS ASR systems on benefit-specific speech. The viability of using error-prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated. Results of a benefit-specific natural language understanding (NLU) task show that the domain-specific fine-tuned ASR system can outperform the commercial ASR systems even when its transcriptions have a higher word error rate (WER), and that the results obtained with the fine-tuned ASR transcriptions are similar to those obtained with human transcriptions.

KEYWORDS

Automatic Speech Recognition, DeepSpeech2, Wav2Vec2, Semi-supervised learning annotation, Spoken language understanding
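The abstract compares systems by word error rate (WER). As a reference point, the conventional definition is WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment of the hypothesis against a reference transcript of N words. The following is a minimal sketch that computes it with a word-level Levenshtein distance; it is a generic illustration, not the author's evaluation code, and the function name `word_error_rate` is hypothetical. Practical evaluations usually normalize casing and punctuation first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[j] holds the word-level edit distance between ref[:i] and hyp[:j];
    # only one row of the dynamic-programming table is kept at a time.
    dist = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dist[0] = dist[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, dist[j] = dist[j], min(
                dist[j] + 1,       # deletion of a reference word
                dist[j - 1] + 1,   # insertion of a hypothesis word
                prev_diag + cost,  # substitution (or exact match)
            )
    return dist[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 4))  # → 0.1667
```

Because insertions are counted against the reference length, WER can exceed 1.0 for very noisy hypotheses; this is one reason the abstract's observation that a higher-WER system can still serve the downstream NLU task better is worth testing directly.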

Acoustic Characteristics and Related Influencing Factors on the Acquisition of Retroflex Vowels in Putonghua by Learners of Different Native Language

HongLi Deng¹, XinZhong Liu² and XianMing Bei³, ¹School of Liberal Arts, Jinan University, Guangzhou, Guangdong, China; College of Culture and Communication, Guangxi Science and Technology Normal University, Laibin, Guangxi, China, ²School of Liberal Arts, Jinan University, Guangzhou, Guangdong, China, ³School of Chinese Language and Culture, Guangdong University of Foreign Studies, Guangzhou, Guangdong, China

ABSTRACT

Based on the theory of second language acquisition, the article analyzes the pronunciation of “er”, meaning “two” in Chinese, by learners from different native language backgrounds, and explores the key acoustic characteristics of, and related factors influencing, the acquisition of retroflex vowels in S&P. The study has four findings: (1) In S&P, F2 of the retroflex vowel rises and F3 falls, and the difference between the F3 endpoint and the F2 endpoint is small, which means that F3 and F2 move close to each other. The key characteristics of a learner's pronunciation of “two” are the slope and the value of F3: the more steeply F3 falls and the lower its value, the closer the learner's pronunciation of “two” is to S&P. (2) Retroflex vowels are highly marked phonemes, which makes them difficult to acquire. (3) Factors such as the acquisition environment and the length of time spent learning the second language influence the acquisition of retroflex vowels. (4) An early learning environment promotes the acquisition of retroflex vowels in Putonghua.

KEYWORDS

retroflex vowels; slope of F3; acoustic characteristics; influencing factors; acquisition theory

GPT-3 Models Are Few-shot Financial Reasoners

Raul Salles de Padua, Imran Qureshi and Mustafa U. Karakaplan, Stanford University, University of Texas at Austin, University of South Carolina

ABSTRACT

Financial analysis is an important tool for evaluating company performance. Practitioners answer financial questions in order to make profitable investment decisions, and they use advanced quantitative analyses to do so. Financial Question Answering (QA) is therefore a question answering task that requires deep reasoning about numbers, yet it is unknown how well pre-trained language models can reason in the financial domain. The current state of the art requires a retriever to collect relevant facts about the financial question from the text and a generator to produce a valid financial program and a final answer. Meanwhile, large language models such as GPT-3 [3] have recently achieved state-of-the-art performance on a wide variety of tasks with just a few examples (few-shot). We run several experiments with GPT-3 and find that a separate retrieval model and logic engine remain essential components for achieving SOTA performance in this task, particularly because of the precise nature of financial questions and the complex information stored in financial documents. With this understanding, our refined prompt engineering approach on GPT-3 achieves near-SOTA accuracy without any fine-tuning.

KEYWORDS

Question Answering, GPT-3, Financial Question Answering, Large Language Models, Information Retrieval, BERT, RoBERTa, F

Deduplicating Highly Similar News in Large News Corpora

Wu Zhang, Miotech, 69 Jervois St, Sheung Wan, Hong Kong

ABSTRACT

Duplicated training data usually degrades the performance of machine learning models. This paper presents a practical algorithm for efficiently deduplicating highly similar news articles in large datasets. Our algorithm comprises three components (document embedding, similarity computation, and clustering), each using specific algorithms and tools to optimize both speed and performance. We demonstrate the efficacy of our approach by accurately deduplicating over 7 million news articles in less than 4 hours.

KEYWORDS

News deduplication, natural language processing
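The abstract names the three stages (document embedding, similarity computation, clustering) but not the specific models or tools. The following is a minimal sketch of how such stages can be wired together, under loudly stated assumptions: a bag-of-words term-count vector stands in for the document embedding, cosine similarity against a hypothetical threshold of 0.9 stands in for the similarity computation, and union-find over above-threshold pairs stands in for the clustering step. None of these choices is taken from the paper, and the names `embed`, `cosine` and `deduplicate` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a document embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(docs: list[str], threshold: float = 0.9) -> list[int]:
    """Return the index of one representative per cluster of near-duplicate documents."""
    vectors = [embed(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest, one node per document

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Brute-force pairwise comparison, for illustration only; a corpus of
    # millions of articles would need an approximate nearest-neighbour index.
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    # Keep the lowest index in each cluster as its representative.
    representatives: dict[int, int] = {}
    for i in range(len(docs)):
        representatives.setdefault(find(i), i)
    return sorted(representatives.values())


if __name__ == "__main__":
    headlines = [
        "Central bank raises interest rates by 25 basis points",
        "Central bank raises interest rates by 25 basis points today",
        "Tech company reports record quarterly revenue",
    ]
    print(deduplicate(headlines))  # → [0, 2]
```

The first two headlines share nine of their words (cosine similarity of about 0.95), so they fall into one cluster and only the first is kept. Union-find is used here because it makes near-duplicate relations transitive cheaply; whether that transitivity is desirable for news (it can chain loosely related articles together) is a design question the sketch does not settle.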

Synthetic Source Low-resource Indonesian Augmentation for Colloquial Neural Machine Translation

Asrul Sani Ariesandy¹, Mukhlis Amien², Alham Fikri Aji³, Radityo Eko Prasojo⁴, ¹Sekolah Tinggi Informatika & Komputer Indonesia (STIKI), Malang, Indonesia, ²Kata.ai Research Team, Jakarta, Indonesia, ³Beijing Institute of Technology, China, ⁴Faculty of Computer Science, Universitas Indonesia

ABSTRACT

Neural Machine Translation (NMT) works better for Indonesian when it takes into account local dialects, geographical context, and regional culture (colloquialism). NMT is typically domain-dependent and style-dependent, and it requires large amounts of training data. State-of-the-art NMT models often fall short in handling colloquial variations of their source language, and the lack of parallel data for such variations is a challenging hurdle to systematically improving existing models, even though Indonesians frequently use colloquial language. In this work, we develop a colloquial Indonesian-English test set collected from YouTube transcripts and Twitter. We perform synthetic style augmentation on the formal Indonesian source language and show that it improves the baseline Id-En models (in BLEU) on the new test data.

KEYWORDS

Neural Machine Translation, NMT, Natural Language Processing, NLP, Low-Resource Language, Indonesian, Artificial Intelligence

13th International Conference on Computer Science and Information Technology (CCSIT 2023)

July 22–23, 2023, Toronto, Canada