Popis: |
A good description of a class should be reasonably accurate and interpretable. Previous work addresses this class-description problem either by analyzing the correlation of each attribute with the class or by producing rules, as when building a classifier. These solutions suffer from limitations in both accuracy and interpretability. A sentence is usually defined as a disjunction or conjunction of several terms, each of which specifies a constraint (a range or set of values) on an attribute. From a data analysis point of view, a sentence specifies a subspace of the database. In this paper, we propose a richer yet still interpretable form of sentence: a sentence describes an object if any k attributes of that object satisfy the specified constraints; in other words, the object is partially covered by the subspace. Since this simple enhancement subsumes the rules used in previous solutions, descriptions based on such sentences are provably better. We then design Pub, an algorithm that produces descriptions with our form of sentence. Theoretically, while constructing a sentence within a description, Pub finds the optimal range or set of values for each attribute in linear time. Empirically, we show that Pub is efficient and produces more accurate, concise, and interpretable descriptions than current approaches on various real datasets. We also perform an illustrative case study on the Glass dataset, which provides some useful insights. |
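
The k-of-n coverage test for a sentence can be illustrated with a minimal Python sketch. This is not the Pub algorithm: it only checks whether one sentence describes one object. It assumes that "any k attributes" means "at least k of the constrained attributes". The names `in_range`, `in_set`, and `covers`, the attribute names, and the values are all hypothetical and chosen for illustration.

```python
from typing import Any, Callable, Dict

# A term is a constraint on one attribute: a predicate over that attribute's value.
Constraint = Callable[[Any], bool]


def in_range(lo: float, hi: float) -> Constraint:
    """Range constraint for a numeric attribute (closed interval)."""
    return lambda v: lo <= v <= hi


def in_set(*values: Any) -> Constraint:
    """Set constraint for a categorical attribute."""
    allowed = frozenset(values)
    return lambda v: v in allowed


def covers(sentence: Dict[str, Constraint], k: int, obj: Dict[str, Any]) -> bool:
    """Return True if at least k terms of the sentence hold for the object.

    Assumption: "any k attributes" is read as "at least k".
    Attributes missing from the object count as not satisfied.
    """
    satisfied = sum(
        1 for attr, ok in sentence.items() if attr in obj and ok(obj[attr])
    )
    return satisfied >= k


# Hypothetical sentence with three terms and a made-up object.
sentence = {
    "a": in_range(0, 10),
    "b": in_set("x", "y"),
    "c": in_range(5, 6),
}
obj = {"a": 3, "b": "z", "c": 5.5}  # satisfies the terms on "a" and "c" only

disjunction = covers(sentence, 1, obj)              # k = 1: classic disjunction
partial = covers(sentence, 2, obj)                  # k = 2: partial coverage
conjunction = covers(sentence, len(sentence), obj)  # k = n: classic conjunction
```

Under this reading, setting k to 1 gives a disjunction and setting k to the number of terms gives a conjunction, which is one way to see why the new form subsumes both kinds of rule.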