data filtering; web information processing; data recovery
The excessive volume of Web data transmitted over Internet network channels causes data processing problems, one of which is selecting the useful information to retain for various data applications. In this paper, we present an approach for filtering less-informative attribute data from a source Website. Filtering attributes, rather than tuples (records), is imperative because discarding a complete tuple removes its informative attribute data along with its less-informative attribute data. Since data filtered at the source Website may still be of interest to the user at the destination Website, we design a data recovery approach that retains the minimal amount of information needed for recovery while imposing minimal overhead on the source Website. Our data filtering and recovery approach (i) handles a wide range of Web data in different application domains (such as weather, stock exchanges, and Internet traffic), (ii) is dynamic, since each filtering scheme adjusts the amount of data to be filtered as needed, and (iii) is adaptive, which is appealing in an ever-changing Internet environment.
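To make the distinction between attribute-level and tuple-level filtering concrete, the following Python sketch drops individual attributes from a record at the source and keeps only the dropped attributes for later recovery. It is an illustration under stated assumptions, not the scheme developed in this paper: the `SourceSite` class, the `is_informative` predicate, the `weather_rule` function, and the sample weather fields are all hypothetical, and the abstract does not specify how informativeness is measured or how recovery data is stored.

```python
from typing import Any, Callable, Dict, Tuple

Record = Dict[str, Any]
# Hypothetical predicate: decides whether one attribute value is worth sending.
Predicate = Callable[[str, Any], bool]


def filter_attributes(record: Record, is_informative: Predicate) -> Tuple[Record, Record]:
    """Split a record into kept and dropped attributes (not whole tuples)."""
    kept: Record = {}
    dropped: Record = {}
    for name, value in record.items():
        if is_informative(name, value):
            kept[name] = value
        else:
            dropped[name] = value
    return kept, dropped


class SourceSite:
    """Assumed source-side component: filters on send, stores only what was dropped."""

    def __init__(self, is_informative: Predicate) -> None:
        self.is_informative = is_informative
        self.recovery_store: Dict[int, Record] = {}

    def send(self, record_id: int, record: Record) -> Record:
        kept, dropped = filter_attributes(record, self.is_informative)
        if dropped:
            # Keep only the filtered attributes, not the full record.
            self.recovery_store[record_id] = dropped
        return kept

    def recover(self, record_id: int) -> Record:
        """Return the attributes that were filtered out of a given record."""
        return dict(self.recovery_store.get(record_id, {}))


def weather_rule(name: str, value: Any) -> bool:
    # Illustrative rule only: treat two fields as informative.
    return name in {"station", "temperature_c"}


source = SourceSite(weather_rule)
reading = {"station": "KSLC", "temperature_c": 21.5,
           "sensor_firmware": "v2", "comment": ""}
sent = source.send(1, reading)                    # attributes transmitted
restored = {**sent, **source.recover(1)}          # destination requests the rest
```

In this sketch the transmitted record still carries its informative attributes, whereas a tuple-level filter would have had to either send or discard the whole reading. Storing only the dropped attributes is one simple way to keep the recovery data small; other designs are possible.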