Popis: |
Processes are an integral part of nearly all organizations, driving their daily operations and support activities. Increasingly, these business processes are supported by information systems such as Workflow Management Systems (WfMSs) or Enterprise Resource Planning (ERP) systems. Once a process is supported by an information system, it becomes possible to observe and record its execution, in the form of event logs. The field of process mining is concerned with the analysis of event log data from a process perspective. Process mining techniques can aim at the discovery of process models; for example, some techniques can extract a Petri net of the control flow of a process, or a social network describing the handover of work among the people involved in this process. Process mining also includes analyzing conformance, i.e. how well a previously available process model describes the actual observations in the event log. Finally, process mining can be used for extension, i.e. to augment previously available models with additional information extracted from event logs. This thesis presents our research into the application of process mining in the context of flexible environments. Thus, in contrast to traditional approaches, which expect processes to be well-structured, limited in scope, and tightly controlled by an information system, our research aims to extend the applicability of process mining to a considerably wider range of processes. Many information systems for supporting processes nowadays allow their users to deviate significantly from prescribed process definitions, or to change these on the fly. Furthermore, there are many processes which are not strictly enforced, but merely observed by an information system. Finally, every sufficiently complex activity or piece of machinery implements a process, be it explicitly encoded or prescribed, or emerging implicitly from patterns of use or external constraints.
The objective of our research is to develop process mining techniques that are also suitable for the analysis of these less-structured, flexible processes. In this context, the results of process mining have the potential to be far more useful and beneficial, since the actual behavior of flexible processes is often not well understood in practice, but is captured in event logs. Event logs are the general starting point for any process mining analysis. We present a comprehensive methodological framework for handling and analyzing event logs in the context of process mining. Our general process and event log taxonomy provides a reference to better align event logs from diverse processes to the generic, fixed expectations of process mining algorithms. We also provide a set of structural log metrics that can be used to obtain an abstract characterization of an event log. Further, this thesis presents fundamental guidelines for the elicitation of event logs from existing sources, and for their transformation, so that they fit our general taxonomy and meta-model. These guidelines are complemented by an architecture of a generic framework for log elicitation and transformation. To create suitable data for testing a process mining approach, we present a framework for artificial log synthesis, which enables the creation of event logs under controlled conditions. The storage and management of realistically sized event log data is a non-trivial task, typically facing performance problems because logs may contain terabytes of event data. Since the performance, and thus the applicability, of process mining algorithms is directly related to this problem, we have designed a framework for the efficient storage and management of event log data. Another methodological contribution of this thesis is our analysis of the problems typically faced when applying process mining to flexible environments.
When event logs from flexible processes are analyzed by traditional process mining algorithms, they typically yield large, highly unstructured, and essentially useless "spaghetti" models. We have traced these undesirable results back to a number of implicitly held assumptions. Most traditional approaches either assume noise-free event logs, or regard noise only as the result of errors in the logging functionality. We extend this notion of noise with other, commonly found artifact types. Another assumption of traditional approaches is that mining results should strive for precision. We distinguish between precision of behavior and precision of scope, both of which result in large and overly complex process models. Further, we have identified an attitude of entitlement in traditional approaches, which manifests in their singularity (i.e., only one singular mining result), their immutability (i.e., static results which cannot be modified), and their non-interactivity (i.e., no ability to focus and explore the result). Finally, the purity of traditional approaches, i.e. their sole reliance on traditional process representations, fails to communicate mining results in an appropriately efficient manner. Based on our methodological analysis of real-life event logs, and the problems faced by process mining in flexible environments, we have developed a number of approaches. An event log that has been extracted from a source system, and has been properly transformed to the general taxonomy and meta-model, can be analyzed in a number of ways. We introduce a set of techniques for event log schema transformation, which aim to re-align the information found in event logs for specific analysis purposes. Event class projection is a straightforward technique to cluster subsequences of low-level events into higher-level entities, such that they better correspond to the perceived process.
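As a minimal illustration of the idea behind event class projection, the following Python sketch maps low-level event names to higher-level classes through an explicit mapping and collapses each run of consecutive events of the same class into one event. The function name `project_trace`, the mapping, the event names, and the choice to keep unmapped events unchanged are all hypothetical assumptions made for this example; they are not taken from the thesis, whose technique may differ in detail.

```python
from itertools import groupby


def project_trace(trace, mapping):
    """Project a trace of low-level events onto higher-level classes.

    trace   -- list of low-level event names (hypothetical example data)
    mapping -- dict from low-level event name to higher-level class;
               unmapped events keep their own name (an assumption)
    """
    projected = [mapping.get(event, event) for event in trace]
    # Collapse each run of identical consecutive classes into one event.
    return [name for name, _ in groupby(projected)]


# Hypothetical mapping from user-interface events to perceived activities.
mapping = {
    "open_form": "Register order",
    "fill_field": "Register order",
    "submit_form": "Register order",
    "check_stock": "Check availability",
}
trace = ["open_form", "fill_field", "fill_field",
         "submit_form", "check_stock", "ship"]

print(project_trace(trace, mapping))
# → ['Register order', 'Check availability', 'ship']
```

The projected trace is shorter and uses the vocabulary of the perceived process, so it can be passed to any mining technique that accepts a plain sequence of event names.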
While event class projection relies on explicit mappings, another approach, trace segmentation, can discover coherent subsequences of lower-level events automatically. These subsequences can either be collapsed into higher-level events (activity discovery), or they can be regarded as traces of an implicitly-contained subprocess (trace discovery). We introduce both a local (i.e., bottom-up) and a global (i.e., top-down) approach for trace segmentation. Another technique is process type discovery, which can discover tacit process types from a set of traces, by clustering these into more homogeneous subsets. A notable strength of event log schema transformation approaches is that they can be applied independently of the analysis goal and method, and thus leverage existing and future process mining techniques. One major contribution of this thesis is the approach of adaptive process simplification, which is directly based on the problems identified with traditional algorithms. This approach explicitly abandons the goals of precision and entitlement, and also introduces a novel type of interactive result visualization, which departs from the purity identified in earlier approaches. Adaptive process simplification has been inspired by (road) maps, as simple and intuitive representations of large, complex topologies. We have identified a number of concepts and visual metaphors from maps, which can be adapted for the description of flexible processes. Another novelty of this approach is that the event log is analyzed from multiple perspectives, as opposed to only focusing on the sequence of event names. From this extensive information we derive the significance and correlation metrics, which more appropriately describe the observed behavior. For representing mining results, we introduce fuzzy models, whose relaxed executional semantics allow us to describe complex behavior in a compact fashion.
A set of visual metaphors derived from maps is used to increase the density of information in fuzzy models. Our algorithm for adaptive graph visualization can be used to derive a fuzzy model from the significance and correlation metrics, on an arbitrary level of abstraction. Therefore, the user can generate a map of the observed process, which can be as complex or as compact as desired. To evaluate the usefulness of fuzzy models, we introduce two quality and authority metrics, namely detail and conformance. These metrics provide the analyst with quick and reliable feedback, indicating how representative the current model is with respect to the actual, observed behavior. To leverage the results of adaptive process simplification with other analysis methods as well, we show how fuzzy models can be converted into other modeling formalisms. We also show how these models can be projected onto a log, simplifying its events onto the current level of abstraction. We argue that a large part of the problems experienced by process mining in flexible environments is due not only to algorithms, or the intelligence of the analysis, but also to results not being communicated efficiently. The presentation of complex knowledge, most importantly its visualization, has a large impact on the understandability and clarity of analysis tools. We introduce two approaches for process mining analysis, relying on new ways of information visualization. For the efficient exploration and characterization of event log data, we have applied and adapted the dotplot visualization, as known from bioinformatics. In a second approach, we introduce fuzzy model animation, which projects the behavior of a process over time onto a static process model, thereby making it more intuitive to understand and analyze. The work presented in this thesis is supported and accompanied by concrete implementations, which have been integrated in the ProM and ProMimport frameworks.
These implementations have been crucial in enabling a number of real-life case studies with major corporations, of which four are discussed in this thesis. The results presented in this thesis have been published in more than ten peer-reviewed scientific publications. Furthermore, the process mining techniques developed in the context of this thesis have been adopted by, and are actively used in, a number of large commercial enterprises. |