Identifying causality and contributory factors of pipeline incidents by employing natural language processing and text mining techniques
Author: | Mason Boyd, Guanyang Liu, Noor Quddus, Mengxi Yu, S. Zohra Halim |
---|---|
Year of publication: | 2021 |
Subject: | 021110 strategic, defence & security studies; Environmental Engineering; Dependency (UML); business.industry; Computer science; General Chemical Engineering; 0211 other engineering and technologies; 02 engineering and technology; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Causality; Pipeline (software); Resource (project management); Text mining; Workflow; Risk analysis (business); Environmental Chemistry; Artificial intelligence; Safety, Risk, Reliability and Quality; business; Cluster analysis; computer; Natural language processing; 0105 earth and related environmental sciences |
Source: | Process Safety and Environmental Protection. 152:37-46 |
ISSN: | 0957-5820 |
Description: | The key to learning from past incidents is identifying their underlying causes and contributory factors. A large body of incident narratives has accumulated over the years and can be a good learning source if properly utilized. However, the volume and unstructured nature of this text impede the extraction of insights into recurring incident patterns. This research applies natural language processing (NLP) and text mining techniques to this resource to understand the contributing factors and causation behind incidents, with the pipeline industry as an illustrative example. It analyzes 3587 incident narratives from the ‘comment’ section of the incident database of the Pipeline and Hazardous Materials Safety Administration (PHMSA). Two text analytics methods, K-means clustering and co-occurrence networks, are employed to infer the latent causality of incidents. The results demonstrate that both methods can identify contributing factors under specific failure types. The co-occurrence network approach has an advantage in extracting dependencies among the contributory factors, whereas K-means clustering can only indicate general correlations. The proposed workflow offers a new perspective on identifying contributing factors and their causal dependencies from incident text data, with promising applications in risk analysis and accident modeling. |
Database: | OpenAIRE |
External link: |
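
The abstract names co-occurrence networks as one of the two text analytics methods. As a rough illustration of the general idea only, and not the authors' actual workflow, the sketch below counts how often pairs of terms appear in the same narrative; each counted pair can be read as a weighted edge in a network. The narratives, the stop-word list, and the function names are all hypothetical, invented for this example, and no PHMSA data is used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical narratives, invented for illustration; not PHMSA records.
NARRATIVES = [
    "external corrosion caused a leak on the pipe",
    "corrosion under the coating caused a leak",
    "excavation damage caused a rupture of the pipe",
]

# Small assumed stop-word list; a real workflow would use a fuller one.
STOP_WORDS = {"a", "an", "the", "on", "of", "under"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def cooccurrence_edges(docs):
    """Count, for each unordered term pair, the documents containing both."""
    edges = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair has one canonical ordering.
        terms = sorted(set(tokenize(doc)))
        edges.update(combinations(terms, 2))
    return edges


edges = cooccurrence_edges(NARRATIVES)

# Show a few of the heaviest edges (term, term, shared-document count).
for (term_a, term_b), weight in edges.most_common(4):
    print(term_a, term_b, weight)
```

Edge weights here are plain document-level counts. Weighting schemes, stemming, and thresholds for pruning weak edges are design choices the abstract does not specify, so they are left out of this sketch.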