Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

ReLiC: entity profiling using random forest and trustworthiness of a source

dc.contributor.author: Varma S.; Sameer N.; Chowdary C.R.
dc.date.accessioned: 2025-05-24T09:39:55Z
dc.description.abstract: The digital revolution has brought most of the world's data to the World Wide Web, and the volume of data available on the web has increased manyfold in the past decade. Social networks, online clubs, and similar venues have come into existence. Expert systems are required to extract information from these venues about a real-world entity such as a person, organisation, or event. However, this information may change over time, so the data must be maintained. It is therefore desirable to have an intelligent model that extracts relevant data items from different sources and merges them to build a complete profile of an entity (entity profiling). This model should also be able to handle incorrect or obsolete data items. In this paper, we propose a novel two-phase method for completing a profile. (1) The first phase (resolution phase) links records to the queries. We studied the performance of various classifiers for this purpose and observed that the random forest is best suited for entity resolution. We also proposed and used “trustworthiness of a source” as a feature for the random forest. (2) The second phase selects the appropriate values from the records to complete a profile, based on our proposed selection criteria. We used the concept of assigning authority to a reliable source in entity profiling, and our results establish that the use of an authoritative source significantly improves the performance of the proposed system. Experimental results show that our proposed system, ReLiC, outperforms COMET. © 2019, Indian Academy of Sciences.
dc.identifier.doi: https://doi.org/10.1007/s12046-019-1178-x
dc.identifier.uri: http://172.23.0.11:4000/handle/123456789/18609
dc.relation.ispartofseries: Sadhana - Academy Proceedings in Engineering Sciences
dc.title: ReLiC: entity profiling using random forest and trustworthiness of a source
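
The second phase described in the abstract (selecting values from linked records to complete a profile) can be pictured with a small sketch. The abstract does not state the paper's actual selection criteria, so the rule below is an assumption made only for illustration: each candidate value is scored by the summed trust of the sources that report it, and an optional authoritative source overrides the vote. The function name `build_profile`, the source names, and the trust scores are all hypothetical.

```python
from collections import defaultdict


def build_profile(linked_records, source_trust, authoritative_source=None):
    """Merge records already linked to one entity into a single profile.

    linked_records: list of (source_name, {attribute: value}) pairs.
    source_trust: {source_name: trust score in [0, 1]} (hypothetical scores).
    authoritative_source: optional source whose values always win.

    NOTE: trust-weighted voting is an illustrative assumption, not the
    selection criteria of the ReLiC paper.
    """
    votes = defaultdict(lambda: defaultdict(float))
    authoritative = {}
    for source, record in linked_records:
        for attribute, value in record.items():
            if value is None:
                continue  # skip missing values
            if source == authoritative_source:
                authoritative[attribute] = value
            # Each source adds its trust score to the value it reports.
            votes[attribute][value] += source_trust.get(source, 0.0)

    profile = {}
    for attribute, candidates in votes.items():
        if attribute in authoritative:
            # The authoritative source overrides the weighted vote.
            profile[attribute] = authoritative[attribute]
        else:
            # Otherwise keep the value with the highest summed trust.
            profile[attribute] = max(candidates, key=candidates.get)
    return profile


# Toy records, assumed to have been linked to the same entity in phase one.
records = [
    ("source_a", {"employer": "Acme", "city": "Pune"}),
    ("source_b", {"employer": "Globex", "city": "Pune"}),
    ("source_c", {"employer": "Globex"}),
]
trust = {"source_a": 0.9, "source_b": 0.4, "source_c": 0.3}

profile = build_profile(records, trust)
print(profile)
```

In this toy data, one highly trusted source outweighs two weakly trusted ones for the `employer` attribute, and passing `authoritative_source="source_c"` would instead force that source's value. A trained classifier such as a random forest would supply the linking step that this sketch takes as given.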
