We thank Dr. Daungsupawong and Dr. Wiwanitkit for their important comments1 on our study.2 We agree that inter-operator agreement is crucial when evaluating the results of the presented study, which tests the application of lung ultrasound (LUS) in robust multicenter pragmatic research. In a previous paper, Lerchbaumer et al. documented moderate inter-observer agreement (κ = 0.41) for the overall LUS score in a small cohort of COVID-19 patients. Interestingly, there was a significant discrepancy for score 1, whereas higher inter- and intra-operator concordance coefficients were reported for the other LUS scores (2 and 3), as well as for subpleural consolidations and air bronchograms.3
The generalizability of the results therefore inevitably reflects the quality of the LUS assessment. Consequently, we included only expert clinicians trained according to the recommendations of the Italian Society of Ultrasonography in Medicine and Biology. Additionally, doubtful LUS clips were collected and debated collegially, as reported in our manuscript. We also recognize that a wide variety of LUS protocols has been reported, which limits the interpretation of published data. In our study, we selected a 12-field protocol, which was shown to be an adequate trade-off in a previous paper including COVID-19 and post-COVID-19 patients.4 This protocol was used to validate the previous results with robust statistical methods in pragmatic multicenter research, ultimately offering proof of concept for the validation of LUS data. These data also documented that the baseline LUS score is a simple and reproducible tool that remains independently associated with adverse outcomes after adjustment for clinical and laboratory parameters. This suggests that, although COVID-19 pneumonia exhibits different clinical and radiological phenotypes, LUS evaluation could allow prognostic quantification in COVID-19 patients.
However, these findings should not limit future developments involving integrated systems, including artificial intelligence (AI), to handle large datasets. A previous study using a machine learning algorithm documented that urea, lymphocytes, glucose, basophils, and age are predictors of poor survival among COVID-19 patients.5 Similarly, AI-based imaging interpretation has shown significant potential for LUS evaluation in identifying LUS signs and artifacts (B-lines, airspace consolidations, pleural effusion), evaluating the LUS score, grading COVID-19 severity, and differentiating COVID-19 from other infectious or non-infectious disorders.6 Despite these potential advantages, the quality of data collection for machine learning also depends on the standardization of LUS. In this context, novel scenarios combining robotic ultrasound with cloud data storage could enhance the quality of assessment and of AI interpretation through the accumulation of the large, high-quality datasets essential for AI-based LUS evaluation. Nevertheless, deep learning algorithms have not yet achieved consistent performance in clinical research. Data interpretability and medico-legal implications remain barriers to the widespread use of AI in clinical practice, limiting the translational impact of the current scientific literature.7