Abstract: |
Selecting a few important representatives that reveal the intrinsic structure of a data set containing massive numbers of samples, i.e., subset selection, is useful for many applications in machine learning and information retrieval. In this paper, we propose a cost-sensitive sparse regression-based subset selection method, termed cost-sensitive sparse subset selection (CS4). CS4 considers the cost incurred by different subsets when predicting all the samples in a given data set and chooses the subset with minimal prediction cost. Hence, compared with related sparse regression-based methods, CS4 is able to select the most informative representatives for characterizing the structure of a data set. Moreover, we present an optimization algorithm for solving the CS4 problem, analyze its convergence and computational complexity, and discuss the relationships between CS4 and related algorithms. Finally, experiments on representative selection and classification demonstrate the effectiveness and advantages of CS4.
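For readers unfamiliar with this family of methods, the sketch below illustrates a generic row-sparse self-representation objective for representative selection, with an optional per-sample weight standing in for a "prediction cost". The function name, the weighting scheme, and the proximal-gradient solver are illustrative assumptions only; this is not the paper's CS4 formulation or algorithm.

import numpy as np

def sparse_subset_selection(X, lam=1.0, cost=None, n_iter=200, step=None):
    # Generic row-sparse self-representation (in the spirit of the sparse
    # regression-based methods the abstract refers to; NOT the paper's CS4).
    # Approximately solves, by proximal gradient descent,
    #     min_Z 0.5 * || (X - X Z) diag(w) ||_F^2 + lam * sum_i ||Z[i, :]||_2
    # where w is an optional nonnegative per-sample weight (an illustrative
    # stand-in for cost sensitivity). Rows of Z with large norms correspond
    # to selected representatives.
    d, n = X.shape
    w = np.ones(n) if cost is None else np.asarray(cost, dtype=float)
    W2 = np.diag(w ** 2)
    Z = np.zeros((n, n))
    G = X.T @ X  # Gram matrix used in the gradient
    if step is None:
        # Lipschitz constant of the smooth term: ||X||_2^2 * max(w)^2
        L = np.linalg.norm(X, 2) ** 2 * np.max(w) ** 2
        step = 1.0 / L
    for _ in range(n_iter):
        # Gradient of 0.5 * ||(X - X Z) diag(w)||_F^2 with respect to Z
        grad = -G @ (np.eye(n) - Z) @ W2
        Zh = Z - step * grad
        # Row-wise soft thresholding: proximal operator of the l2,1 norm
        norms = np.linalg.norm(Zh, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0)
        Z = shrink * Zh
    return Z

# Usage: representatives are the samples whose coefficient rows have the
# largest norms (hypothetical data, for illustration).
X = np.random.randn(20, 100)          # 100 samples of dimension 20, one per column
Z = sparse_subset_selection(X, lam=5.0)
representatives = np.argsort(-np.linalg.norm(Z, axis=1))[:10]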