
7a.004.DU - Large-scale Probabilistic Anomaly Detection from Text and Its Application to Medical Records

Project - Summary

Anomaly detection based on the frequencies of structured features is widely studied for cybersecurity, fraud detection, event detection, and similar tasks. In contrast, anomaly detection from rich-text records has not been sufficiently investigated in recent years, even though it has valuable applications. Anomalies found in rich-text datasets are more meaningful for decision-making because the algorithm can report which word occurrences in a record deviate from the normal pattern. This is particularly useful for rich-text healthcare records: beyond insurance fraud, it can help identify potentially mismatched diagnoses, recording errors, unknown drug effects, extreme medical cases, abuse of medical resources, and so on.
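As a rough illustration of what "reporting which word occurrences deviate" can mean, the following Python sketch uses a smoothed unigram (bag-of-words) model fitted to a corpus as a stand-in for the normal pattern. It flags words whose surprisal (negative log-probability) exceeds a threshold. This is not the project's model: the function names, the add-alpha smoothing, and the threshold are illustrative assumptions.

```python
import math
from collections import Counter

def fit_unigram(records):
    """Count word frequencies over a corpus of tokenized records."""
    counts = Counter()
    for rec in records:
        counts.update(rec)
    return counts

def word_surprisal(word, counts, total, vocab_size, alpha=1.0):
    """Negative log-probability of `word` under an add-alpha smoothed unigram model."""
    p = (counts[word] + alpha) / (total + alpha * vocab_size)
    return -math.log(p)

def anomalous_words(record, counts, threshold, alpha=1.0):
    """Return (word, surprisal) pairs in `record` whose surprisal exceeds `threshold`."""
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    flagged = []
    for w in record:
        s = word_surprisal(w, counts, total, vocab_size, alpha)
        if s > threshold:
            flagged.append((w, s))
    return flagged
```

Because this baseline ignores the rest of the record, a word that is common overall but out of place in a particular record goes unflagged; that gap is what conditional, context-aware detection is meant to close.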

Project - Team

| Team Member | Role | Email | Phone Number | Academic Site/IAB |
| Tony Hu | PI | | (215) 895-0551 | Drexel University |
| Zheng Chen | Student | | (215) 939-5997 | Drexel University |
| Chris Page | Project Mentor | Not available | Not available | GlaxoSmithKline (GSK) |

Project - Novelty of Approach

  • Our proposed approach takes a generative view of rich-text records. It applies to rich-text records from any domain and requires no domain knowledge.
  • Our approach performs conditional, context-aware anomaly detection.
  • Our algorithm is designed for big data: well-studied techniques such as mean-field approximation can reduce the time complexity of inference in probabilistic generative models.
  • Our approach can be converted into an online algorithm, which better suits industrial needs.
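The conditional and online properties can be illustrated with a deliberately simple Python sketch. It assumes that each record arrives with a context label (for example, a hypothetical department or diagnosis category). It estimates p(word | context) from running counts blended with a smoothed global distribution, and it scores each incoming record before adding that record to the counts, so a stream is processed in a single pass. This is a count-based stand-in, not the project's generative model or its mean-field inference. The class name, the interpolation weight `lam`, and the smoothing constant `alpha` are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

class OnlineConditionalScorer:
    """Hypothetical streaming scorer: p(word | context) interpolated with a
    smoothed global distribution, updated one record at a time."""

    def __init__(self, alpha=1.0, lam=0.5):
        # Requires alpha > 0 and 0 <= lam < 1 so every probability stays positive.
        self.alpha = alpha                        # add-alpha smoothing for the global model
        self.lam = lam                            # weight on the context-specific estimate
        self.global_counts = Counter()
        self.global_total = 0
        self.context_counts = defaultdict(Counter)
        self.context_totals = Counter()           # running totals keep scoring O(1) per word

    def prob(self, word, context):
        vocab = len(self.global_counts) + 1       # +1 reserves mass for unseen words
        p_global = (self.global_counts[word] + self.alpha) / (
            self.global_total + self.alpha * vocab)
        c_total = self.context_totals[context]
        if c_total == 0:
            return p_global                       # no history for this context: back off fully
        p_ctx = self.context_counts[context][word] / c_total
        return self.lam * p_ctx + (1 - self.lam) * p_global

    def score_and_update(self, record, context):
        """Return per-word surprisal for `record` given `context`, then learn from it."""
        scores = [(w, -math.log(self.prob(w, context))) for w in record]
        self.global_counts.update(record)
        self.global_total += len(record)
        self.context_counts[context].update(record)
        self.context_totals[context] += len(record)
        return scores
```

For example, after two "respiratory" records mentioning "fever" and "cough" and one "orthopedic" record mentioning "cast", a later respiratory record containing both "fever" and "cast" gives "cast" the higher surprisal, because "cast" has never been observed in the respiratory context.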

Project - Deliverables

1. Develop novel methods and prototype APIs for discovering anomalies in rich-text datasets, including both offline batch-processing algorithms and online algorithms.
2. Work with IAB members to evaluate the prototype APIs on real-world log datasets and transfer them to the IAB members.

Project - Benefits to IAB

Our paper entitled “Correlated Anomaly Detection from Large Streaming Data” was accepted and presented at the 2018 IEEE International Conference on Big Data.

Project - Presentation Video

Project - Documents

projects/year7/7a.004.du.txt · Last modified: 2019/08/20 10:46 by sally.johnson