Introduction

The influence of spine fractures on patients’ functioning, including social and financial situation, is considered very significant compared to other injuries [1]. Currently, the decision-making between non-operative management and surgical care is far from settled for various types of spine fractures. In this perspective, measurement of outcomes is relevant in order to compare different treatment options, and thereby develop more rational choices for treatment strategies [2].

To address the need for such outcome measurement, the AO Spine Knowledge Forum Trauma developed the first disease-specific outcome measure for spine trauma patients, the Patient Reported Outcome Spine Trauma (AO Spine PROST) [3]. Importantly, patients and clinicians may differ in what they consider a good outcome of a specific treatment [4, 5]. It is therefore imperative to also capture the perspective of the clinicians in a simple, reliable, and quick-to-administer tool. By including the most relevant clinical and radiological parameters, such a tool would be able to evaluate and predict clinical outcomes of spine trauma patients. This led to the development of a separate, unique tool that is rated by clinicians: the Clinician Reported Outcome Spine Trauma (AO Spine CROST) [6].

An initial reliability study, using anonymized clinical cases from daily clinical practice through an online system, showed moderate results [6]. It was hypothesized that a more adequate evaluation of the CROST would be possible when patients were seen and assessed by the clinician in a true clinical setting. Therefore, the aim of the current study was to evaluate the feasibility, internal consistency, inter-rater reliability, and prospective validity of the CROST in the clinical setting. In addition, the correlation between the clinician-reported CROST and the patient-reported PROST was investigated.

Materials and methods

Study design

An international multicenter cross-sectional study with prospective follow-up until 1 year post-trauma was performed in four centers, recruited through the AO Spine Knowledge Forum (KF) Trauma. The participating centers included trauma hospitals from Australia (The Alfred Hospital, National Trauma Research Institute, Monash University, Clayton), the Netherlands (University Medical Center, Utrecht), Slovakia (Slovak Medical University, F. D. Roosevelt University General Hospital, Banska Bystrica), and Switzerland (Inselspital, University of Bern). Data were gathered through the online system REDCap, using study identification codes. According to the Medical Ethics Committee of the participating centers, this protocol did not need ethical approval under the scope of the Medical Research Involving Human Subjects Act because participants were not subjected to procedures, nor were they required to follow any specific protocol.

Surgeons

Two spine surgeons with at least 3 years of experience in spine trauma care participated from each center. Surgeon 1 was a member of the AO Spine KF Trauma and was considered the more experienced of the two surgeons. Surgeon 2 was recruited by Surgeon 1 at each center.

Patients

Adult patients (≥ 18 years) who had sustained a traumatic spine fracture and were within 3 months post-trauma were included. They had to have mild or no neurological deficit (American Spinal Injury Association (ASIA) Impairment Scale (AIS) grade C, D or E) at the time of discharge from hospital. In line with the target patient population in previous validation studies of PROST, patients with motor complete paralysis (AIS A or B) and hospitalized patients were excluded [3]. The desired sample size was 100 patients (25 per center), based on recommendations for this type of study [7].

Instruments

Two separate questionnaires were administered: one to the surgeons and another to the patients.

Surgeons completed CROST for each patient at their center. As shown in Appendix 1, this tool consists of 10 parameters. Eight parameters are rated for both surgically and nonsurgically treated patients, while 2 parameters are only applicable to surgically treated patients (‘Wound healing’ and ‘Implants’). Each parameter is rated both for the short-term (<12 months) and long-term (≥12 months). A ‘yes’ answer scores 1 point and indicates that a problem or adverse event is expected for that parameter. The total recorded score is the sum of the ‘yes’ answers, with a maximum achievable score of 8 points for nonsurgically and 10 points for surgically treated patients. A higher score indicates a worse expected outcome.
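The scoring rule described above can be illustrated with a minimal Python sketch. The function name `crost_total` and the placeholder names `Parameter 1` to `Parameter 8` are assumptions made for illustration only; the actual parameter names are those listed in Appendix 1, of which only ‘Wound healing’ and ‘Implants’ are taken from the text.

```python
from typing import Dict

# The two surgery-only parameters named in the text. The eight shared
# parameters are represented by hypothetical placeholder names below.
SURGERY_ONLY = {"Wound healing", "Implants"}


def crost_total(answers: Dict[str, bool], surgically_treated: bool) -> int:
    """Sum the 'yes' answers (True = problem or adverse event expected).

    Surgery-only parameters are skipped for nonsurgically treated patients,
    so the maximum is 8 (nonsurgical) or 10 (surgical).
    """
    total = 0
    for parameter, problem_expected in answers.items():
        if parameter in SURGERY_ONLY and not surgically_treated:
            continue  # not applicable to this patient
        total += int(problem_expected)
    return total


# Hypothetical rating: one shared parameter and 'Wound healing' flagged.
answers = {f"Parameter {i}": False for i in range(1, 9)}
answers["Parameter 3"] = True
answers["Wound healing"] = True
answers["Implants"] = False

print(crost_total(answers, surgically_treated=True))
print(crost_total(answers, surgically_treated=False))
```

In this sketch the same function would be applied twice per patient, once to the short-term and once to the long-term ratings.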

Additionally, surgeons were asked to complete patients’ background data, as well as evaluation questions to assess the feasibility: the time needed to complete CROST, whether it was considered an easy and useful tool, whether any difficulties were encountered when filling it out, and whether there were any redundant or missing parameters. Finally, the AO Spine KF Trauma surgeon was asked to assess the overall patient outcome at various prospective time points.

The patient part of the questionnaire consisted of PROST, which includes 19 questions on a broad range of aspects of functioning [3, 8, 9, 10, 11, 12]. Each item has a 0–100 Numeric Rating Scale, with 0 indicating no function at all and 100 the pre-injury level of function. The item “Work/Study” is optional. The total score is calculated as the mean of the answered questions. A higher score indicates a better outcome.
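As an illustration of the PROST total score described above, the following Python sketch averages the answered items and ignores unanswered ones (represented here as `None`). The function name `prost_total` and the example values are hypothetical and do not come from the study data.

```python
from typing import Optional, Sequence


def prost_total(items: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the answered items, each on a 0-100 Numeric Rating Scale.

    Unanswered items (None), such as the optional "Work/Study" item,
    are left out of the mean. Returns None if nothing was answered.
    """
    answered = [score for score in items if score is not None]
    for score in answered:
        if not 0 <= score <= 100:
            raise ValueError(f"item score out of range 0-100: {score}")
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical patient: 18 answered items, optional item left blank.
scores = [80.0] * 9 + [60.0] * 9 + [None]
print(prost_total(scores))
```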

Study procedures

Eligible patients were identified and screened either just before discharge from hospital or at their first outpatient clinic appointment. Patients were enrolled in the study after informed consent was given. They were seen at three time points: baseline (i.e., the first outpatient clinic visit), 6 months, and 1 year after the trauma that caused their spine injury. At all these time points, patients were asked to complete PROST.

In order to assess the reliability of CROST, the two surgeons located at the same center independently made clinical assessments, and completed the tool for the same patient at the baseline visit.

Concerning the prospective evaluation, CROST was also scored at the 6-month and 1-year visits. At these time points, the questionnaire was only completed by Surgeon 1 (i.e., the AO Spine KF Trauma member). This surgeon was also asked to judge the overall outcome of the patient at 6 months and 1 year with a binary definition: ‘same or better outcome than expected’ or ‘worse outcome than expected’. A ‘same or better outcome than expected’ was scored if the treatment goals were achieved, and ‘worse outcome than expected’ if they were not. For example, conversion of a conservatively treated patient to a surgical case, a surgically treated patient who undergoes a re-operation, or a patient who is highly dysfunctional in daily activities could be considered as ‘worse outcome than expected’.

Statistical analysis

Descriptive statistics were used to analyze patient characteristics and the feasibility of CROST. The internal consistency of the tool was analyzed by calculating Cronbach’s α. An α > 0.70 was accepted as a satisfactory result [7].
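For readers less familiar with Cronbach’s α, the computation can be sketched as follows. This is a minimal illustration using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score); the function name `cronbach_alpha` and the example ratings are assumptions and are not taken from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one patient's item scores."""
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two patients")
    # Sample variance of every item (column) and of the per-patient totals.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical yes/no (1/0) ratings on three items for four patients.
ratings = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2), "satisfactory" if alpha > 0.70 else "below threshold")
```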

Inter-rater reliability analysis was performed both for individual CROST items and for the total score. Kappa statistics were used for the individual CROST items, with values < 0 indicating poor agreement, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement [13]. The Intraclass Correlation Coefficient (ICC) was used for the total CROST score, with an ICC of 0.70–0.85 and >0.85 indicating good and excellent reliability, respectively [7].
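The two agreement statistics can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the kappa shown is Cohen’s kappa for two raters with binary ratings, and the ICC shown is the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)), since the ICC model is not specified above. Function names and example ratings are hypothetical.

```python
from typing import Sequence


def cohen_kappa(rater1: Sequence[int], rater2: Sequence[int]) -> float:
    """Cohen's kappa for two raters giving binary (0 = no, 1 = yes) ratings."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    p1, p2 = sum(rater1) / n, sum(rater2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters give one constant answer")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map kappa to the agreement categories listed in the text [13]."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from a patients x raters table (assumed ICC form)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)                      # between patients
    ms_cols = ss_cols / (k - 1)                      # between raters
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical item ratings by Surgeon 1 and Surgeon 2 for ten patients.
s1 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
s2 = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
kappa = cohen_kappa(s1, s2)
print(round(kappa, 2), agreement_label(kappa))

# Hypothetical total scores (Surgeon 1, Surgeon 2) for four patients.
print(round(icc_2_1([[0, 0], [1, 2], [2, 2], [3, 4]]), 2))
```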

The prospective analysis was performed by comparing outcomes as assessed at baseline to the outcomes at 6-months and 1-year follow-up. The CROST scores at baseline were compared to the actual outcomes (same/better versus worse outcome) at 6-months and 1-year follow-up. Also, Spearman correlation coefficients (rs) between CROST scores at baseline and the scores at 6-months and 1-year follow-up were analyzed. The rs ranges from +1 to −1, with +1 indicating a perfect positive association, 0 no association, and −1 a perfect negative association [7].
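To make the Spearman coefficient concrete, the sketch below computes rs as the Pearson correlation of the ranked values, using average ranks for ties (ties are common with small integer scores such as CROST totals). The function names and the example scores are hypothetical and are not study data.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical CROST totals at baseline and at 6-months follow-up.
baseline = [0, 0, 1, 2]
follow_up = [0, 1, 1, 3]
print(round(spearman_rs(baseline, follow_up), 2))
```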

Finally, correlations between the clinician-reported CROST scores and patient-reported PROST scores were explored. Descriptive statistics were used to relate CROST scores at baseline to PROST scores at different prospective time points. The change in CROST and PROST scores over time was analyzed using Spearman correlations. Also, the ‘actual’ binary outcome (same/better versus worse outcome) was compared with PROST scores at 6-months and 1-year follow-up.

Results

Patient characteristics

A total of 92 patients were included in the study: 24 (26.1%) from Australia, 27 (29.3%) Dutch patients, 15 (16.3%) from Slovakia, and 26 (28.3%) Swiss patients. Table 1 shows the overall patient characteristics, as well as stratified for the provided treatment and per participating center.

Table 1 Patient characteristics. Also stratified per type of treatment as well as per center

Feasibility

The questions concerning the feasibility of the CROST were completed by 7 surgeons. Five surgeons stated that it took less than 5 min to complete the tool, whereas two surgeons reported 5–10 min. All agreed the tool was easy to use, and no difficulties were experienced in completing it. No parameter was deemed difficult, redundant or missing. All surgeons expected that the CROST would be a useful tool in the clinical setting.

Internal consistency

As shown in Table 2, the internal consistency of the CROST total score was moderate, with Cronbach’s α ranging from 0.58 to 0.70.

Table 2 Internal consistency results (Cronbach’s α), shown for the AO Spine CROST scores at different study time points (baseline, 6-months, and 1-year follow-up), stratified for the short-term (<12 months) and long-term (≥12 months) anticipated CROST scores, as well as stratified for surgeons (Surgeon 1 and Surgeon 2). Results are shown for all patients (conservative and surgically treated patients, i.e., 8 CROST items scored) and only surgically treated patients (i.e., 10 CROST items scored)

Inter-rater reliability

The inter-rater reliability results for the total CROST scores as well as for each item are shown in Tables 3 and 4, respectively.

Table 3 Descriptive and agreement statistics for AO Spine CROST total score, between Surgeon 1 and Surgeon 2 at baseline study time point
Table 4 Descriptive and agreement statistics for each AO Spine CROST item, between Surgeon 1 and Surgeon 2 at baseline study time point (n = 92)

Moderate reliability results were found for the total scores, both for the short-term anticipated scores (ICC = 0.55) and long-term anticipated scores (ICC = 0.52). Subanalysis showed better reliability results for conservatively treated patients (ICC = 0.59–0.81) compared with surgically treated patients (ICC = 0.34–0.39).

As shown in Table 4, analyses of the mean scores per CROST item showed very good exact agreement results ranging from 73.9% (‘Range of motion impairment’) to 98.9% (‘Sagittal alignment problems’) for the short-term anticipated scores. Comparable results were seen for the long-term anticipated scores: 81.5% (‘Range of motion impairment’) to 100.0% (‘Wound healing problems’). Additional analysis including Kappa values showed somewhat varying results. Except for poor agreement for ‘Implants adverse events’ (κ = −0.4 both for the short-term and long-term anticipated scores), most other CROST items showed moderate agreement, whereas ‘Sagittal alignment problems’ showed almost perfect agreement (κ = 0.85).

Prospective analysis

The CROST scores at baseline were divided into 3 scoring subcategories: 0, 1, and ≥ 2. As shown in Table 5, none of those subcategories showed a specific correlation to the actual assessed outcomes at the follow-up. Nevertheless, a trend was seen when CROST was scored 0 (indicating no concerns at all), in which the vast majority of patient outcomes (87.0–93.8%) were classified as ‘same or better than expected’. Moderate to strong positive Spearman correlations were found between CROST scores at baseline and the scores at 6-months and 1-year follow-up, with significant rs values ranging from 0.41 to 0.64 (Table 6).

Table 5 AO Spine CROST scores for the short-term (<12-months) as scored at baseline study time point compared to assessed outcomes (‘same/better’ versus ‘worse’ outcome than expected) at the follow-up: 6-months (n = 89) and at 1-year (n = 75) study time points. Results are shown for the total patient sample as well as for conservatively and surgically treated patients
Table 6 Spearman correlations (rs) between AO Spine CROST scores at baseline and 6-months and 1-year follow-up study time points. Results are shown for the total patient sample as well as for conservatively and surgically treated patients

Correlation AO Spine CROST and PROST

No specific correlation was observed between the clinician-reported CROST scores at baseline and the patient-reported PROST scores at different time points (baseline, 6-months, and 1-year follow-up). Higher CROST scores (i.e., more concern from the clinical perspective) did not result in worse PROST scores, nor were the differences statistically significant (Table 7). As shown in Table 8, no Spearman correlations were found between the change in CROST scores and the change in PROST scores from baseline to 6-months and 1-year follow-up (rs = −0.33 to 0.07). Finally, there seemed to be a statistically significant correlation between the PROST score and the outcome as assessed by the surgeon (same/better versus worse outcome than expected). Table 9 reflects this with worse patient-reported PROST scores when the overall outcome is assessed as worse than expected.

Table 7 Relationships between short-term (<12 months) AO Spine CROST as scored at baseline study time point, in comparison to AO Spine PROST scores at baseline and follow-up study time points (6-months and 1-year). Results are shown for the total patient sample as well as for conservatively and surgically treated patients
Table 8 Spearman correlations between change in AO Spine CROST scores and change in AO Spine PROST scores as compared between baseline to 6-months and 1-year follow-up study time points. Results are shown for the total patient sample as well as for conservatively and surgically treated patients
Table 9 Relationships between assessed outcomes (‘same/better’ versus ‘worse’ outcome than expected) in comparison to mean AO Spine PROST scores (SD) at 6-months and at 1-year follow-up study time points. Results are shown for the total patient sample as well as for conservatively and surgically treated patients

Discussion

This study investigated the validation of the AO Spine CROST (Clinician Reported Outcome Spine Trauma) in the clinical setting. In contrast to a previous validation study that included online cases [6], the current study was performed in an actual clinical setting including patients from daily clinical practice. Excellent feasibility and acceptable internal consistency results were found. This indicates that the tool is deemed useful in the clinical setting and that its content measures the intended concept of assessing clinical outcomes from the perspective of the clinicians.

The inter-rater reliability analysis showed moderate results. Although only minor differences were found for the total CROST scores between Surgeon 1 and Surgeon 2 (0.2–0.9 difference), the agreement percentages were relatively low (48.9–57.6%). This may be explained by the wide range of possible total scores (0 to 10), which makes it less likely that two raters arrive at exactly the same total. Additional subanalysis per CROST item showed very good exact agreement results (73.9–100.0%). On the other hand, varying Kappa values were found, with most items showing moderate agreement. These Kappa results may be skewed, and not fully representative, due to the very high number of CROST items that were answered with ‘no’ (i.e., no concerns were expected with those items).
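The sensitivity of Kappa to a very high proportion of ‘no’ answers can be shown with a small constructed example. The ratings below are hypothetical and are not the study data; they only illustrate, under the assumption of Cohen’s kappa for two raters, how near-complete exact agreement can coincide with a Kappa close to or below zero.

```python
from typing import Sequence


def cohen_kappa(rater1: Sequence[int], rater2: Sequence[int]) -> float:
    """Cohen's kappa for two raters giving binary (0 = no, 1 = yes) ratings."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    p1, p2 = sum(rater1) / n, sum(rater2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    return (observed - expected) / (1 - expected)


# Constructed case: 92 patients, each rater answers 'yes' exactly once,
# but for a different patient. All other answers are 'no'.
n_patients = 92
rater_a = [0] * n_patients
rater_b = [0] * n_patients
rater_a[0] = 1
rater_b[1] = 1

exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / n_patients
kappa = cohen_kappa(rater_a, rater_b)
print(f"exact agreement = {exact_agreement:.1%}, kappa = {kappa:.3f}")
```

Because almost all answers are ‘no’, the agreement expected by chance is already about as high as the observed agreement, so the two disagreements are enough to push Kappa just below zero.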

Prospective evaluation analysis of the CROST scores did not show a specific correlation to the overall outcomes as assessed by the surgeon at follow-up time points (same/better versus worse than expected). It is interesting to explore the clinicians’ perspective relative to the patients’ perspective on health and functioning. In the case of the treatment of spinal trauma patients, several clinical and radiological parameters are generally used by treating surgeons to evaluate treatment results. The most relevant parameters among spine trauma patients were identified in two preparatory studies in the developmental process of CROST [14, 15]. An estimation of any expected problems with respect to those parameters is made by the treating surgeons in order to determine the further course of treatment. The surgeon’s assessment may differ substantially from the patient’s perception [16, 17]. These discrepant views have also been addressed for a variety of other diseases, including metastatic diseases [18], multiple sclerosis [19], rheumatoid arthritis [20], and peripheral artery diseases [21]. The current study substantiates the discrepant views, and therefore the need for the clinician-reported CROST.

The patient-reported PROST analysis was not the main focus of the current study and was, therefore, not further detailed in the Results section. Nevertheless, it is worth mentioning that a gradual increase was seen in the mean PROST scores during follow-up, indicating gradual recovery of the patients over time. This is in line with previous validation studies in which the PROST was cross-culturally translated and validated in the Dutch, English, German, Nepali and Slovak versions [8, 9, 10, 11, 12]. A very recent publication states that translations have been, or are being, performed in a total of 17 languages [22]. This facilitates worldwide use of the patient-reported outcome measure. As the clinician-reported CROST is assessed by the treating surgeons or clinicians, the authors recommend no additional translations besides the original English version.

This study has several limitations. First, the intra-rater reliability was not assessed due to the study procedures, as it was considered very challenging to see patients back at multiple additional time points across 4 different centers. Second, the number of included patients was lower than initially anticipated, and the contribution of included patients from the 4 centers was not equal. Differences in spine trauma exposure and local practical difficulties at the centers contributed to this limitation. Also, the patient population was somewhat heterogeneous. Finally, the binary outcome as assessed by the treating surgeon may be somewhat arbitrary. However, we believe this is a valid strategy to assess clinical outcomes, as judged by a highly experienced spine trauma surgeon.

In conclusion, the AO Spine CROST showed moderate results in the current validation study in a true clinical setting including patients from the daily clinical practice. In future studies, the validation will be further investigated among larger patient and clinician samples. With its unique approach as a clinician-rated outcome measure, this tool has the potential to be valuable for use in clinics and research.