Abstract
The development and investigation of medical applications require patient data from various
Electronic Health Records (EHR) or Clinical Records (CR). In practice, however, patient data is, and
should be, protected and monitored to prevent unauthorized access or disclosure, for reasons
including privacy, security, ethics, and confidentiality. As a result, many researchers and developers
have difficulty accessing the patient data they need for their research, or making patient data
available, for example to demonstrate the reproducibility of their results. In this talk, we propose
a knowledge-based approach to synthesizing large-scale patient data. Our main goal is to make the
generated patient data as realistic as possible, by using domain knowledge to control the data
generation process. Such domain knowledge can be collected from biomedical publications (e.g. those
indexed in PubMed), from medical textbooks, or from web resources (e.g. Wikipedia and medical
websites). The collected knowledge is formalized in the Patient Data Definition Language (PDDL) for
use in patient data generation. We have implemented the proposed approach in our Advanced Patient
Data Generator (APDG).
We have used APDG to generate large-scale data for breast cancer patients in experiments with
SemanticCT, a semantically enabled system for clinical trials. The results show that the generated
patient data is useful for various tests in the system.