Comprehensive Summary
Demand for cosmetic skin procedures (such as laser interventions) is increasing, and many non-dermatologists now perform skin assessments and treatments. Machine learning (ML) and AI have shown promise in dermatology image-analysis tasks, but simultaneous prediction of multiple skin characteristics from facial photos has not been thoroughly investigated. This study set out to develop a high-performing machine learning model that simultaneously predicts Fitzpatrick skin type, hyperpigmentation (severity), and redness (severity). The authors curated a dataset named “SkinAnalysis” of 3,662 facial photographs drawn from publicly available sources. A board-certified dermatologist labeled each photograph on three scales: Fitzpatrick skin type, the Kesty hyperpigmentation scale, and the Kesty redness scale. The dataset was split into training, validation, and test subsets. The authors then trained and compared 15 model configurations across three neural-network architectures: VGG-16, ResNet-50, and EfficientNet. The best-performing model was an EfficientNet-V2M architecture combined with a custom SkinCELoss loss function. The model tended to perform better at the extremes of the scales, for example on very light or very dark skin types.
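To make the idea of simultaneous prediction concrete, the following is a minimal sketch of one common way to train a single network on several labels at once: give the network one output head per scale and sum the per-head cross-entropy losses. It is not the authors' SkinCELoss, whose definition is not given in this summary. The names HEADS, multitask_loss, and weights are hypothetical, and the class counts for the two Kesty scales are assumed purely for illustration (only the six Fitzpatrick types are standard).

```python
import math

# Number of classes per output head. Six Fitzpatrick types is standard;
# the sizes of the two Kesty scales are ASSUMED here for illustration.
HEADS = {"fitzpatrick": 6, "hyperpigmentation": 4, "redness": 4}


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(logits_by_head, targets_by_head, weights=None):
    """Weighted sum of per-head cross-entropy losses for one photograph.

    `weights` is a hypothetical per-head weighting; equal weights by default.
    """
    if weights is None:
        weights = {name: 1.0 for name in logits_by_head}
    return sum(
        weights[name] * cross_entropy(logits_by_head[name], targets_by_head[name])
        for name in logits_by_head
    )


# Example: an untrained model that outputs identical scores for every class.
uniform_logits = {name: [0.0] * n for name, n in HEADS.items()}
targets = {"fitzpatrick": 2, "hyperpigmentation": 0, "redness": 3}
uniform_loss = multitask_loss(uniform_logits, targets)

# Example: a model that is confident and correct on every head.
confident_logits = {
    name: [10.0 if i == targets[name] else 0.0 for i in range(n)]
    for name, n in HEADS.items()
}
confident_loss = multitask_loss(confident_logits, targets)
```

In this setup a shared backbone (such as one of the architectures named above) would feed all three heads, so a single forward pass yields all three predictions. Summing the losses is the simplest design choice; per-head weights can be used if one scale proves harder to learn than the others.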
Outcomes and Implications
The study demonstrates that machine-learning models can simultaneously predict multiple skin characteristics from color facial photographs with reasonably high accuracy. In practice, such AI tools could assist non-dermatologists in performing skin evaluations and thereby improve treatment planning and safety. The authors propose that future work should involve larger and more diverse datasets, external validation, prospective clinical deployment, and integration into clinical workflows. The study has limitations. Although relatively large, the dataset was drawn from publicly available internet images and may therefore not cover the full range of clinical diversity. In addition, because a single dermatologist did all the labeling, inter-rater variability was not explored in depth, and the performance of non-dermatologist assessors was not compared.