Incorporating Language Identification in Digital Forensics Investigation Framework

Authors: Ali Selamat, Nicholas Akosu
Publication year: 2014
Subject:
Source: Studies in Computational Intelligence ISBN: 9783319058849
Computational Intelligence in Digital Forensics
DOI: 10.1007/978-3-319-05885-6_4
Description: In current business practices, the majority of organizations rely heavily on digital devices such as computers, generic media, cell phones, network systems, and the internet to operate and improve their business. Thus, a large amount of information is produced, accumulated, and distributed via electronic means. Consequently, government and company interests in cyberspace and private networks become vulnerable to cyberspace threats. The investigation of crimes involving the use of digital devices is classified under digital forensics, which involves the adoption of practical frameworks and methods to recover data for analysis that can serve as evidence in court. However, cybercrime has advanced to the stage where criminals try to cover their tracks through the use of anti-forensic strategies such as data overwriting and data hiding. Research into anti-forensics has given rise to the concept of ‘live’ forensics, which comprises proactive forensics approaches capable of digitally investigating an incident as it occurs. However, information exchange using ICT facilities has reduced the world to a global village without eliminating the linguistic diversity on the planet. Moreover, existing digital forensics frameworks have assumed the language of stored information. If such an assumption turns out to be wrong, semantic interpretation of extracted text would also be wrong, leading to wrong conclusions. We propose the incorporation of language identification (LID) in digital forensics investigation (DFI) models in order to help law enforcement stay a step ahead of criminals. In this chapter, we outline issues of language identification in DFI frameworks and propose a new framework with a language identification component. The LID component carries out digital surveillance by scrutinizing emails, SMS, and text file transfers in and out of the system of interest. The collected text is then subjected to language identification. Determining the language of the text helps to decide whether the communication is regular and safe, or suspicious and in need of further forensic analysis. Finally, we discuss results from a simple language identification scheme that can be easily and quickly integrated into a DFI model, yielding very high accuracy without compromising speed.
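The triage step described in the abstract (identify the language of collected text, then decide whether the message is regular or needs further analysis) might be sketched as follows. This record does not describe the chapter's actual LID scheme, so the sketch substitutes a plain stopword-overlap heuristic. The word lists, the function names `identify_language` and `triage`, and the `expected` parameter are all illustrative assumptions, not the authors' method.

```python
import re
from typing import Dict, Set

# Tiny illustrative stopword lists (assumption: not taken from the chapter).
STOPWORDS: Dict[str, Set[str]] = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it", "for", "with"},
    "spanish": {"el", "la", "y", "es", "de", "en", "que", "los", "para", "con"},
    "german": {"der", "die", "und", "ist", "von", "zu", "das", "nicht", "mit", "ein"},
}


def identify_language(text: str) -> str:
    """Return the language whose stopwords occur most often, or 'unknown'."""
    # Keep runs of letters only (Unicode-aware); digits and punctuation are dropped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=lambda lang: scores[lang])
    return best if scores[best] > 0 else "unknown"


def triage(text: str, expected: Set[str]) -> str:
    """Hypothetical triage rule: flag text that is not in an expected language."""
    lang = identify_language(text)
    return "regular" if lang in expected else "flag for further analysis"
```

For example, `triage(message, {"english"})` would pass English text through and flag text identified as another language, or as unknown, for further forensic analysis. A deployed component would need far larger language profiles and a policy for short or mixed-language messages.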
Database: OpenAIRE