Modelling of Cancer Patient Records: A Structured Approach to Data Mining and Visual Analytics

Jing Lu, Alan Hales, David Rew

Research output: Contribution to journal › Article

Abstract

This research presents a methodology for health data analytics through a case study for modelling cancer patient records. Timeline-structured clinical data systems represent a new approach to the understanding of the relationship between clinical activity, disease pathologies and health outcomes. The novel Southampton Breast Cancer Data System contains episode and timeline-structured records on >17,000 patients who have been treated in University Hospital Southampton and affiliated hospitals since the late 1970s. The system is under continuous development and validation. Modern data mining software and visual analytics tools permit new insights into temporally-structured clinical data. The challenges and outcomes of the application of such software-based systems to this complex data environment are reported here. The core data was anonymised and put through a series of pre-processing exercises to identify and exclude anomalous and erroneous data, before restructuring within a remote data warehouse. A range of approaches was tested on the resulting dataset including multi-dimensional modelling, sequential patterns mining and classification. Visual analytics software has enabled the comparison of survival times and surgical treatments. The systems tested proved to be powerful in identifying episode sequencing patterns which were consistent with real-world clinical outcomes. It is concluded that, subject to further refinement and selection, modern data mining techniques can be applied to large and heterogeneous clinical datasets to inform decision making.
Original language: English
Pages (from-to): 30-51
Journal: International Conference on Information Technology in Bio- and Medical Informatics. ITBAM 2017. Lecture Notes in Computer Science
Volume: 10443
Publication status: Published - 26 Jul 2017

Keywords

  • Clinical data environment, electronic patient records, health information systems, data mining, visual analytics, decision support
