7a.003.UL - Ontology-based Fast Semantic Indexing for Structured and Unstructured Data in Health Care

Project - Summary

In the current big data environment, most data is gathered from multiple sources. Duplicate records, and the entity resolution needed to reconcile them, are a major problem in this setting. Duplication is especially pronounced in health care patient data: recent studies indicate that about 15% of the entries in the Master Patient Index of major hospitals are duplicates. Heterogeneous data, incomplete information, constantly changing properties associated with entities, and temporal information all make duplicate entities hard to identify. To address this problem, we propose a sketch-based indexing (SBI) technique that combines the demographic and clinical information associated with a patient. The technique creates a clinical sketch of the patient from the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM, hereafter ICD) codes associated with the patient profile. The sketch is generated from the distances between the patient's ICD codes and a set of core nodes within the semantic graph extracted from the ontology.

We evaluate the performance of SBI on a simulated set of patient records that includes incomplete and partial information. The results show that SBI can detect changes to a patient profile even when the information is partial or incomplete, and that it can identify a unique patient with an accuracy of 97.3% when 20% of the ICD codes are missing from the patient information. The results also show that the codes generated by SBI can be updated to incorporate changes in the ontology, and that SBI can maintain a unique sketch when the underlying ICD codes are updated for a patient.
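The sketch construction described above can be illustrated with a small example. This is only a sketch under stated assumptions, not the project's implementation: the graph fragment, the choice of core nodes, the use of hop counts as the distance, and the aggregation rule (minimum distance from any patient code to each core node) are all hypothetical, since the summary does not specify them.

```python
from collections import deque

# Hypothetical toy fragment of an ICD-like hierarchy (undirected edges).
EDGES = [
    ("ROOT", "E00-E89"), ("ROOT", "I00-I99"), ("ROOT", "J00-J99"),
    ("E00-E89", "E11"), ("E11", "E11.9"), ("E11", "E11.65"),
    ("I00-I99", "I10"), ("I00-I99", "I25"), ("I25", "I25.10"),
    ("J00-J99", "J45"), ("J45", "J45.909"),
]

def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distances_from(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def clinical_sketch(graph, core_nodes, patient_codes):
    """One entry per core node: the minimum hop distance from any patient code.

    The min aggregation is an assumption made for this illustration.
    """
    sketch = []
    for core in core_nodes:
        dist = distances_from(graph, core)
        hops = [dist[code] for code in patient_codes if code in dist]
        sketch.append(min(hops) if hops else None)
    return sketch

graph = build_graph(EDGES)
core_nodes = ["E11", "I25", "J45"]  # hypothetical core nodes

full = clinical_sketch(graph, core_nodes, ["E11.9", "I25.10", "I10"])
partial = clinical_sketch(graph, core_nodes, ["E11.9", "I25.10"])  # one code missing
print(full, partial)
```

In this toy case the sketch built from the incomplete code list differs from the full one in a single position, which is the kind of robustness to missing codes that the summary attributes to SBI.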

Project - Team

Team Member | Role | Email | Phone Number | Academic Site/IAB
Satya Katragadda | PI | | (337) 482-0625 | UL Lafayette
Raju Gottumukkala | Researcher | | (337) 482-0632 | UL Lafayette
Vijay Raghavan | Researcher | | (337) 482-6603 | UL Lafayette
Adeola Siwoku | Graduate Student | | (337) 735-0219 - Cell | UL Lafayette
Mike Lucito | Project Mentor | | Not available | Schumacher Clinical Partners
Nicholas Ruiz | Project Mentor | Not available | Not available | Schumacher Clinical Partners

Project - Novelty of Approach

  1. Ontology-based approach to (a) understand the schema and connect objects for faster indexing, and (b) convert unstructured data to structured data via the connected objects (e.g., records without an SSN)
  2. Dynamic weighted approach for calculating match probability
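
The page does not define the dynamic weighting scheme. As one possible reading, offered purely as an assumption, the weights could be renormalized over whichever fields are present in both records, so that a missing field (such as an SSN) neither helps nor hurts the score. The function name, field names, and weights below are hypothetical, and the result is an agreement score in [0, 1] rather than a calibrated probability.

```python
def match_score(rec_a, rec_b, weights):
    """Weighted agreement over the fields present in both records.

    Fields missing from either record are skipped and the remaining
    weights are renormalized (the "dynamic" part, as assumed here).
    """
    total = 0.0
    agree = 0.0
    for field, weight in weights.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing field: drop its weight entirely
        total += weight
        if a == b:
            agree += weight
    return agree / total if total else 0.0

# Hypothetical field weights and records.
weights = {"ssn": 5, "name": 3, "dob": 2}
rec_a = {"ssn": None, "name": "Ann Lee", "dob": "1980-01-01"}
rec_b = {"ssn": "123-45-6789", "name": "Ann Lee", "dob": "1980-01-02"}

# SSN is missing in rec_a, so only name (agrees) and dob (differs) count.
score = match_score(rec_a, rec_b, weights)
print(score)
```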

Project - Deliverables

  1. Investigate instance matching approaches in the current literature, including those based on entity recognition, record linkage, and entity co-reference
  2. Develop a global identifier for each instance based on the properties or features associated with that instance
  3. Design a blocking technique that identifies matches between two instances based on the global identifier
  4. Build a prototype system for health care data that identifies duplicate entries in a Master Patient Index
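
The blocking step in the deliverables can be illustrated as follows. This is a minimal sketch, assuming the global identifier is a short tuple and that the blocking key is a prefix of it; both choices, and all names in the code, are hypothetical. The point is only that records are compared within a block rather than against the whole index.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, key_fn):
    """Group records by blocking key and return only within-block pairs."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[key_fn(rec)].append(rec_id)
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)

# Hypothetical records, each carrying a made-up global identifier.
records = {
    "p1": {"global_id": (1, 1, 4)},
    "p2": {"global_id": (1, 1, 5)},
    "p3": {"global_id": (4, 1, 1)},
}

# Assumed blocking key: the first two components of the identifier.
pairs = candidate_pairs(records, key_fn=lambda rec: rec["global_id"][:2])
print(pairs)
```

Here only p1 and p2 share a block, so one pair is compared instead of all three possible pairs.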

Project - Benefits to IAB

Deduplication techniques are important for identifying duplicate records or profiles in a database or data warehouse. Duplicate records arise primarily from missing data, erroneous data, and changes to a record over time. As the number of duplicate records increases, so do the inefficiencies they cause. A deduplication technique based on the properties associated with a record can therefore help identify similar records or profiles. Our sketch-based technique can uniquely identify a profile despite minor changes that accumulate over time. The ontology-based semantic graph can also be used to generate sketches across multiple standards by mapping the semantic graphs of those standards to one another.

Project - Presentation Video

Project - Documents

projects/year7/7a.003.ul.txt · Last modified: 2021/06/02 15:54 by sally.johnson