Description

Title Diagnosis and Knowledge Discovery of Turner Syndrome Based on Facial Images Using Machine Learning Methods
Abstract Turner Syndrome (TS) is a chromosomal disorder that affects females and brings many difficulties to their lives. Because TS patients may share distinctive facial features, we propose using machine learning methods to diagnose TS early from face images and to help physicians discover underlying knowledge about TS patients. Face alignment, multi-task cascaded convolutional networks and grayscale image approaches are employed to obtain aligned, complete and detailed facial images. Five feature sets containing meaningful facial features are then defined and extracted from the preprocessed images: Rough Energy Features (REF), Rough Ratio Features, Finer Energy Features (FEF), Finer Ratio Features (FRF) and FRF2. To the best of our knowledge, this is the first time these five feature sets have been used to diagnose TS. TS prediction models are built by applying Support Vector Machine (SVM) to the five feature sets. To improve predictive performance, Principal Component Analysis (PCA) and Kernel PCA are used as feature selection methods, and nonweighted and weighted voting are used as ensemble methods. The weighted voting method achieves the highest accuracy, 0.9172, and FEF always outperforms the other four feature sets. In addition, SVM and PCA are used, also for the first time, to identify potentially important facial features in TS. Using SVM, we found that feature 23 of REF indicates that a larger nasal root height is more likely in TS, and that features 35 and 31 of REF indicate the same for a wide jaw. Other findings, such as more shadow in the external canthus areas and a low ratio of forehead height to width, also seem to occur in TS patients. By analyzing features based on PCA, we observed that the left zygoma area is valuable. More informative variables are also detected in our work.
These discoveries, derived from TS facial images, indicate that the five proposed feature sets give physicians a more interpretable way to discover underlying knowledge about TS.
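
The difference between the nonweighted and weighted voting ensembles mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen, so this example assumes each per-feature-set classifier is weighted by a validation accuracy. The function name `weighted_vote`, the shorthand `RRF` for Rough Ratio Features, and all numbers below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted label per model into a single label.

    predictions: dict mapping feature-set name -> predicted label (1 = TS, 0 = non-TS)
    weights:     dict mapping feature-set name -> non-negative weight
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    # Return the label with the largest total weight; ties go to the smaller label.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical per-feature-set SVM outputs for one face image.
predictions = {"REF": 1, "RRF": 0, "FEF": 1, "FRF": 0, "FRF2": 0}

# Hypothetical weights (assumed here to be validation accuracies).
weights = {"REF": 0.90, "RRF": 0.60, "FEF": 0.95, "FRF": 0.60, "FRF2": 0.60}

# Nonweighted voting is the special case where every model has weight 1.
equal_weights = {name: 1.0 for name in predictions}

print("nonweighted:", weighted_vote(predictions, equal_weights))
print("weighted:   ", weighted_vote(predictions, weights))
```

With these made-up values, the three weaker models win a plain majority vote, while the two stronger models win the weighted vote, which is the kind of case where the two ensemble methods can disagree.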