Abstract: |
Cloud computing provides large-scale storage services that maintain vast amounts of data on centralized servers, allowing users to store and retrieve data under a pay-as-you-use service model. Because duplicate copies accumulate across different scenarios, storage size grows and cost grows with it. To address this problem, efficient deduplication techniques are applied to reduce storage. This work proposes Redundant Sparse Look-up Indexing (RSLI) based file deduplication using Semantic Cluster Content Chunking (SCCC) to optimize cloud storage. Contiguous fragmentation is applied to reduce the number of storage blocks. Initially, preprocessing tracks content availability by checking file name, size, and type patterns using file space optimization techniques. Each file is then split into hash-indexed chunks using predefined chunking methods based on these attributes. The chunked content is compared against other files through Sparse Lookup Content Source Duplication (SLCSD), which identifies the number of overlapping chunks between files under different transformations. Distance Vector Weightage Correlation (DWWC) computes document similarity weights based on presence counts to group related documents into clusters. Finally, the RCB stage compares documents within each cluster using coefficient-matched content and the similarity weights to detect duplicate content. Simulation results demonstrate high performance in terms of precision, recall, storage efficiency, and time complexity across content file types. The proposed method enhances deduplication performance by efficiently locating duplicate files, improving indexing, and reducing storage space in the cloud environment, achieving 96.9% precision, 97.8% recall, a 2.6% false rate, and 36.9% storage redundancy reduction compared with other methods.
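The chunking and sparse-lookup steps summarized above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example rather than the paper's RSLI/SCCC implementation: it splits files into fixed-size, hash-indexed chunks and uses a sparse index to count chunk overlaps between files, which is the general idea behind the SLCSD overlap comparison. The chunk size, the fixed-size chunking choice, and the names `chunk_hashes` and `SparseChunkIndex` are assumptions for illustration only.

```python
# Minimal illustrative sketch (not the authors' implementation): fixed-size,
# hash-indexed chunking plus a sparse in-memory index used to count chunk
# overlaps between files. Chunk size and all names here are assumptions.
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes


def chunk_hashes(path: str) -> list[str]:
    """Split a file into fixed-size chunks and return their SHA-256 digests."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


class SparseChunkIndex:
    """Sparse lookup table: chunk digest -> set of file ids containing it."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = defaultdict(set)

    def add_file(self, file_id: str, path: str) -> None:
        # Register every chunk digest of the file in the sparse index.
        for digest in chunk_hashes(path):
            self.index[digest].add(file_id)

    def overlap_counts(self, path: str) -> dict[str, int]:
        """Count how many chunks of `path` also appear in each indexed file."""
        counts: dict[str, int] = defaultdict(int)
        for digest in chunk_hashes(path):
            for file_id in self.index.get(digest, ()):
                counts[file_id] += 1
        return dict(counts)


# Usage: index stored files, then check a new upload for duplicate content.
# idx = SparseChunkIndex()
# idx.add_file("fileA", "/data/fileA.bin")
# print(idx.overlap_counts("/data/upload.bin"))
```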