The Chester Step Test (CST) is a simple and inexpensive field test that requires minimal physical space to assess exercise capacity. These characteristics make the CST suitable for use in different settings; however, its measurement properties in patients with interstitial lung diseases (ILD) are unknown.
Methods
A cross-sectional study was conducted in patients with ILD. First, a CST-1 and a 6-minute walk test (6MWT) were performed. After 48-72 hours, a CST-2 was performed. A second rater was present in one of the sessions. Relative reliability was measured using the intraclass correlation coefficient (ICC1,1 and ICC2,1). Absolute reliability was determined using the standard error of measurement (SEM), the minimal detectable change at 95% confidence interval (MDC95) and the Bland-Altman method. The values of SEM and MDC95 were also expressed as a percentage of the mean. Construct validity was explored using the Spearman correlation coefficient (rs) between the number of steps taken in the best CST and the distance covered in the 6MWT.
Results
Sixty-six patients with ILD (65.5±12.9 years; 48.5% men; FVC 79.4±18.8% predicted; DLCO 49.0±18.3% predicted) participated in the study. Relative (ICC 0.95-1.0) and absolute reliability were excellent without evidence of systematic bias. The SEM and MDC95 were 11.8 (14.7%) and 32.6 steps (40.7%), respectively. The correlation between CST and 6MWT was significant, positive, and high (rs=0.85, p=0.001).
Conclusion
The CST is a reliable and valid test and might be especially useful to assess exercise capacity in patients with ILD in limited space environments.
Interstitial lung diseases (ILD) are a heterogeneous group of diffuse parenchymal lung disorders which, despite presenting diverse etiologies, share several clinical features.1,2 Although comprising various degrees of inflammation and/or fibrosis, a percentage of ILD patients can develop a progressive self-sustaining fibrosis, namely those with idiopathic nonspecific interstitial pneumonia, unclassifiable idiopathic interstitial pneumonia, connective tissue disease-associated ILDs, hypersensitivity pneumonitis, sarcoidosis and ILDs related to other occupational exposures (e.g. silicosis, asbestosis).3 This progressive-fibrosing phenotype may lead to worsening of symptoms (mainly exertional dyspnea, dry cough and fatigue), progressive impairment in gas exchange and lung function decline, reduced exercise capacity, muscle dysfunction and reduced quality of life.4,5 Some of these patients have a median survival below 5 years after diagnosis and generally incur high socioeconomic costs, depend heavily on others to perform activities of daily living and place a massive burden on healthcare systems.6-9
In patients with ILD, reduced exercise capacity is associated with poor health-related quality of life and increased hospital admissions.10,11 Finding valid, feasible and standardized tests which enable comparison with standard values and provide agreement between healthcare professionals is, thus, a priority in order to provide the best quality of care to this population.11 The gold standard to evaluate exercise capacity is cardiopulmonary exercise testing (CPET).12 However, CPET is not easily available in clinical practice as it requires expensive equipment, the presence of specialized human resources and is time-consuming.13 To overcome these limitations, field tests such as the 6-minute walk test (6MWT), the incremental shuttle walk test (ISWT) and, more recently, the Chester Step Test (CST) have been used to assess exercise capacity in patients with chronic respiratory diseases.13-15 Field tests are more affordable and simpler to apply than CPET and are better related to patients’ demands during activities of daily living.13,16 Particularly, the CST requires less space than the other field tests, which allows it to be easily applied in different settings, including inpatient, outpatient and home-based settings.16
The CST is an externally paced, incremental and multistage test, designed to assess exercise capacity in healthy individuals.15,16 Recently, it has been validated to assess exercise capacity in patients with chronic obstructive pulmonary disease (COPD).15 However, the physiological mechanisms of exercise limitation in patients with COPD differ significantly from those in patients with ILD, in whom exercise intolerance is mostly due to impaired gas exchange and circulation limitation.10,17 Thus, it is imperative to test the measurement properties of the CST in ILD to assure that the selection of this instrument for research and clinical practice is evidence-based and its results can be reliably interpreted.18
The main purpose of this study was to assess the reliability (relative and absolute) and construct validity of the CST in patients with ILD. The authors hypothesized that the number of steps in the CST would present: (1) excellent intra- and inter-rater reliability; (2) a significant, positive, and high correlation with the distance covered in the 6MWT (6MWD).
Materials and methods
Study design and population
This cross-sectional study was part of a larger trial (POCI-01-0145-FEDER-007628, POCI-01-0145-FEDER-028806 and PTDC/SAU-SER/28806/2017), with ethical approval from the Unidade de Investigação em Ciências da Saúde: Enfermagem (UICISA: E) of the Escola Superior de Enfermagem de Coimbra, Portugal (N°P517-08/2018), from the Ethics Committee for Health of the Centro Hospitalar do Baixo Vouga, EPE, Aveiro, Portugal (N/Ref 0863926) and from the Hospital Distrital da Figueira da Foz, EPE, Leiria, Portugal (March 15th 2019). All participants signed an informed consent form.
The study began in January 2019 and was completed in November 2020. Patients were considered eligible if they had been diagnosed by their pulmonologist with any ILD, according to internationally accepted guidelines,1,19,20 and had been clinically stable over the past month (i.e., no hospital admissions, no exacerbations and no changes in their pharmacological treatment strategy). An exacerbation was defined as an acute, clinically significant respiratory deterioration, typically less than 1 month in duration, with new bilateral ground-glass opacity and/or consolidation superimposed on a background pattern consistent with fibrosing ILD.21 Patients were excluded if they had other lung diseases, signs of cognitive impairment or substance abuse, or a significant cardiovascular, neurological, or musculoskeletal disease that precluded their participation in data collection.
This study was conducted and reported according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative, and we aimed to include a minimum of fifty participants in order to achieve a good sample size for the assessment of measurement properties.18,22
Data collection
Participants were asked to attend two assessment sessions, 48-72 hours apart. Sociodemographic, anthropometric, and clinical data were first obtained to characterize the sample. Lung function tests (spirometry and diffusion capacity for carbon monoxide, DLCO) were conducted during patients' routine medical appointments and collected from their most recent clinical records. The Self-Administered Comorbidities Questionnaire (SCQ) was used to score the severity of comorbidities.23,24 This questionnaire is composed of 12 medical and 3 optional conditions and attributes a maximum of 3 points to each condition (1 point for the presence of the problem, 1 point if receiving treatment for it and 1 point if the medical condition limits the person's activities). Scores range from 0 to 45, with higher scores indicating more severe comorbidities.23,24 Participants then performed a CST-1 and a 6MWT, in this specific order. A resting period of at least 30 minutes between tests was given to allow vital signs, fatigue, and dyspnea to return to their baseline values (a longer period was given if participants had not returned to their baseline levels after 30 minutes). In the second session, a CST-2 was performed. A second rater was present in one of the sessions to assess the CST.
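The SCQ scoring rule described above can be illustrated with a minimal Python sketch. The class and function names are hypothetical, and the sketch assumes that the treatment and limitation points are only counted when the condition itself is reported as present; it is an illustration of the arithmetic, not the questionnaire's official scoring software.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_CONDITIONS = 15  # 12 listed medical conditions + 3 optional ones (from the text)


@dataclass
class ConditionAnswer:
    """Answers for one SCQ condition (hypothetical structure)."""
    present: bool
    treated: bool = False
    limits_activities: bool = False


def scq_score(answers: Iterable[ConditionAnswer]) -> int:
    """Sum up to 3 points per condition; the total ranges from 0 to 45."""
    answers = list(answers)
    if len(answers) > MAX_CONDITIONS:
        raise ValueError("the SCQ covers at most 15 conditions")
    total = 0
    for answer in answers:
        # Assumption: follow-up points only count if the condition is present.
        if answer.present:
            total += 1 + int(answer.treated) + int(answer.limits_activities)
    return total


# Illustrative answers, not study data.
example = [
    ConditionAnswer(present=True, treated=True, limits_activities=False),
    ConditionAnswer(present=True, treated=True, limits_activities=True),
    ConditionAnswer(present=False),
]
print(scq_score(example))
```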
The CST was performed using a digital recording with timed metronome rhythms and a 20 cm tall single-step device.16 The digital recording also gives the standardized test instructions and the chance to practice the test briefly.16 The CST has 5 stages, lasting 2 minutes each. The timed metronome sets the step cadence, which starts at 15 steps/minute and increases by 5 steps/minute every 2 minutes: stage 1 (15 steps/minute); stage 2 (20 steps/minute); stage 3 (25 steps/minute); stage 4 (30 steps/minute); stage 5 (35 steps/minute). The maximum test duration is 10 minutes, corresponding to completion of stage 5. Heart rate (HR) and peripheral arterial oxygen saturation (SpO2) were monitored continuously with a pulse oximeter (Konica Minolta, Pulsox-300i, Japan); HR was recorded on paper by the rater every minute and SpO2 at the end of each 2-minute stage.25 Perceived dyspnea and fatigue were recorded using the modified Borg scale every minute.25 The CST ended when the participant reached 80% of the HR reserve,25 when the SpO2 dropped below 85%, or when the participant was unable to maintain the step cadence for 15 seconds. Moreover, if the participant showed signs of intolerable dyspnea, excessive tiredness or dizziness, the CST was immediately terminated. The main outcome measure of the CST was the total number of steps taken. The better of the two CSTs, i.e., the one in which the participant performed the higher number of steps, was selected for validity analysis.
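To make the stage structure concrete, the following Python sketch computes how many steps a participant would have accumulated after a given test duration. It assumes perfect adherence to the metronome cadence, which real participants may not achieve; the function name is hypothetical.

```python
# Stage cadences (steps/minute) and stage duration, as described in the text.
CADENCES = [15, 20, 25, 30, 35]
STAGE_SECONDS = 120


def expected_steps(elapsed_seconds: float) -> float:
    """Steps accumulated after `elapsed_seconds`, assuming the cadence is followed exactly."""
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    # The test cannot last longer than 5 stages of 2 minutes each.
    remaining = min(elapsed_seconds, STAGE_SECONDS * len(CADENCES))
    steps = 0.0
    for cadence in CADENCES:
        time_in_stage = min(remaining, STAGE_SECONDS)
        steps += cadence * time_in_stage / 60
        remaining -= time_in_stage
        if remaining <= 0:
            break
    return steps


print(expected_steps(240))  # end of stage 2
print(expected_steps(600))  # full 10-minute test
```

Under this idealized assumption, a completed test corresponds to 250 steps, which gives a sense of scale for the step counts reported in the Results.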
The 6MWT was performed on a flat, straight, 30-meter-long corridor with a hard surface, according to the European Respiratory Society/American Thoracic Society guidelines.26 Before each 6MWT, participants rested on a chair, located near the starting position, for at least 10 minutes. Then, participants were instructed to walk as fast as possible, without running or jogging, for 6 minutes.26 If participants requested a pause during the test or their SpO2 dropped below 85%, they could sit on the chairs placed along the corridor.26 Participants were encouraged to resume walking as soon as they could or when their SpO2 reached at least 88%. Criteria for immediately terminating the test included chest pain, intolerable dyspnea, leg cramps, staggering, diaphoresis and pale or ashen appearance.26 Standard encouragement was given each minute.26 The 6MWD was the main outcome measure.
Statistical analysis
Data analysis was performed using IBM SPSS Statistics (version 25.0, IBM Corporation, Armonk, NY, USA).
Descriptive statistics, i.e., relative frequencies (percentage), mean ± standard deviation (SD) or median [interquartile range], were used to describe the sample. The Kolmogorov-Smirnov test (KS) was used to assess the normality of the data distribution.27 Outliers were identified by inspecting extreme points on the plots of the study variables, and analyses were performed with and without them. We decided not to remove outliers since their presence did not affect the results significantly. Results with p<0.05 were considered statistically significant.
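Relative reliability was measured using the intraclass correlation coefficient (ICC).29 ICC1,1 and ICC2,1 models were used to determine intra-rater and inter-rater reliability,29 respectively, according to the following equations:
\[ ICC_{1,1} = \frac{BMS - WMS}{BMS + (k-1)\,WMS} \]
\[ ICC_{2,1} = \frac{BMS - EMS}{BMS + (k-1)\,EMS + \dfrac{k\,(RMS - EMS)}{n}} \]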
where BMS is the between-subjects mean square, WMS is the within-subjects mean square, EMS is the error (residual) mean square, RMS is the between-raters mean square, k is the number of measurements/raters (k=1), and n is the number of participants. An ICC lower than 0.50 was considered to indicate poor reliability, 0.50-0.75 moderate, 0.75-0.90 good and greater than 0.90 excellent reliability.18,28
Absolute reliability was determined by calculating the standard error of measurement (SEM) and the minimal detectable change at 95% confidence interval (MDC95).29 The SEM was calculated according to the following equation: $SEM = SD_{difference}/\sqrt{2}$, where $SD_{difference}$ is the SD of the differences between CST-1 and CST-2.18,30 The MDC95 was calculated as follows: $MDC_{95} = 1.96 \times \sqrt{2} \times SEM$.18 The values of SEM and MDC95 were also expressed as a percentage of the mean and calculated as follows: $SEM\% = (SEM/mean) \times 100$ and $MDC_{95}\% = (MDC_{95}/mean) \times 100$, where mean is the mean of the number of steps taken in CST-1 and CST-2. An MDC95% of less than 30% was considered acceptable.31
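The ICC models named above can be sketched in plain Python from the ANOVA mean squares. This is a minimal illustration, not the SPSS procedure used in the study. The function names are hypothetical, the input is assumed to be a complete table with one row per participant and one column per session or rater, and k is taken here as the number of columns (two in a test-retest or two-rater design), which is an assumption of the sketch.

```python
from typing import List, Tuple


def anova_mean_squares(scores: List[List[float]]) -> Tuple[float, float, float, float]:
    """Return (BMS, WMS, RMS, EMS) for an n-participants x k-measurements table."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_between = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)   # between raters/sessions
    ss_within = ss_total - ss_between                           # within subjects
    ss_error = ss_within - ss_raters                            # residual
    bms = ss_between / (n - 1)
    wms = ss_within / (n * (k - 1))
    rms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return bms, wms, rms, ems


def icc_1_1(scores: List[List[float]]) -> float:
    """One-way random-effects, single-measurement ICC."""
    bms, wms, _, _ = anova_mean_squares(scores)
    k = len(scores[0])
    return (bms - wms) / (bms + (k - 1) * wms)


def icc_2_1(scores: List[List[float]]) -> float:
    """Two-way random-effects, single-measurement ICC."""
    bms, _, rms, ems = anova_mean_squares(scores)
    n, k = len(scores), len(scores[0])
    return (bms - ems) / (bms + (k - 1) * ems + k * (rms - ems) / n)


# Illustrative step counts for two sessions (not study data).
steps = [[60, 64], [95, 90], [120, 131], [40, 42], [150, 148]]
print(round(icc_1_1(steps), 3), round(icc_2_1(steps), 3))
```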
The Bland-Altman method was also applied to assess absolute reliability.18,22 First, we plotted the difference between the number of steps taken in CST-1 and CST-2 against the mean of the number of steps taken in CST-1 and CST-2.32 Then, we calculated the mean and SD of the differences between CST-1 and CST-2; the closer the mean difference is to zero and the smaller the SD of the differences, the more reliable the measure.32 Finally, we calculated the 95% limits of agreement (LoA95) as follows: $LoA_{95} = mean_{difference} \pm 1.96 \times SD_{difference}$.32
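The absolute reliability quantities described above (SEM, MDC95 and the Bland-Altman limits of agreement) all derive from the paired differences between the two tests. The following Python sketch computes them together. The function name and the example data are hypothetical, and the sample SD (n-1 denominator) is assumed for the differences.

```python
import math
import statistics
from typing import Dict, Sequence


def absolute_reliability(test1: Sequence[float], test2: Sequence[float]) -> Dict[str, float]:
    """SEM, MDC95 (absolute and % of mean) and Bland-Altman limits for paired tests."""
    if len(test1) != len(test2) or len(test1) < 2:
        raise ValueError("need two paired series with at least 2 participants")
    diffs = [a - b for a, b in zip(test1, test2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # assumption: sample SD of the differences
    sem = sd_diff / math.sqrt(2)
    mdc95 = 1.96 * math.sqrt(2) * sem
    grand_mean = statistics.mean(list(test1) + list(test2))
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "sem": sem,
        "mdc95": mdc95,
        "sem_pct": sem / grand_mean * 100,
        "mdc95_pct": mdc95 / grand_mean * 100,
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
    }


# Illustrative step counts (not study data).
cst1 = [10, 20, 30, 40]
cst2 = [12, 18, 33, 41]
for name, value in absolute_reliability(cst1, cst2).items():
    print(f"{name}: {value:.2f}")
```

Because the SEM is defined from the SD of the differences, the MDC95 reduces algebraically to 1.96 times that SD, so it equals the half-width of the limits of agreement.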
Construct validity was assessed by analysing the relationship between the number of steps taken in the best CST and the 6MWD using the Spearman correlation coefficient (rs).18 A correlation of 0-0.3 was considered poor, 0.3-0.5 weak, 0.5-0.7 moderate, 0.7-0.9 strong, and 0.9-1.0 excellent.18,27
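As an illustration of this analysis, the sketch below computes the Spearman coefficient as the Pearson correlation of average ranks and maps it onto the interpretation bands listed above. The function names are hypothetical. Because the published bands share their boundary values, the sketch assumes that a boundary value belongs to the higher band. No p-value is computed.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series with at least 2 observations")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def interpret(rs: float) -> str:
    """Label |rs| using the bands in the text (boundary goes to the higher band)."""
    magnitude = abs(rs)
    if magnitude < 0.3:
        return "poor"
    if magnitude < 0.5:
        return "weak"
    if magnitude < 0.7:
        return "moderate"
    if magnitude < 0.9:
        return "strong"
    return "excellent"


# Illustrative paired values (not study data): steps in the CST and 6MWD in meters.
cst_steps = [40, 62, 75, 90, 130, 155]
six_mwd = [250, 310, 300, 420, 480, 510]
rs = spearman_rs(cst_steps, six_mwd)
print(round(rs, 3), interpret(rs))
```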
Results
Seventy-eight patients with ILD were screened for inclusion in the study. Sixty-six were eligible to participate and twelve were excluded for the following reasons: declined to participate (n=4), dropped out with no reason given (n=1), presence of a significant cardiovascular disease (ischemic cardiomyopathy associated with myocardial infarction, n=4), presence of a significant musculoskeletal disease (severe gonarthrosis, n=2), presence of cognitive impairment (Alzheimer's disease, n=1). Sixty-six participants were included in the construct validity study and fifty-three in the reliability study, since thirteen individuals did not attend the second assessment session due to unavailability. A flow diagram of recruitment is provided in Fig. 1.
Eligible participants were on average 65.5±12.9 years old, slightly overweight (body mass index=28.9±5.2 kg/m2) and 48.5% were male (n=32). The most prevalent type of ILD was chronic hypersensitivity pneumonitis (n=29, 43.9%), followed by idiopathic pulmonary fibrosis (n=16, 24.2%) and sarcoidosis (n=6, 9.1%). Participants presented a mean forced expiratory volume in one second (FEV1) of 81.9±19.9% of predicted and a mean forced vital capacity (FVC) of 79.4±18.8% of predicted. Nineteen participants (28.8%) presented mild ILD (DLCO >60% predicted), 18 participants (27.3%) moderate ILD (40%≤DLCO≤60% predicted) and 19 participants (28.8%) severe ILD (DLCO<40% predicted). We were unable to access DLCO values for 10 patients (15.1%). Thirty-one participants used long-term oxygen therapy (47%) and five participants used non-invasive ventilation during sleep (7.6%). The mean numbers of steps taken in CST-1 and CST-2 were 77.7±50.2 and 82.4±55.7, respectively. The main reason for CST termination was the inability to maintain the required step cadence. The mean 6MWD was 399.4±128.2 meters (83.1±26.4% of predicted). A detailed sample characterization is provided in Table 1.
Table 1. Sample characterization (n=66).
Characteristics | Eligible participants (n=66) |
---|---|
Age, years | 65.5±12.9 |
Gender, male n (%) | 32 (48.5) |
BMI, kg/m2 | 28.9±5.2 |
Smoking status, n (%) | |
Current | 2 (3) |
Former | 24 (36.4) |
Never | 42 (63.6) |
Pack-years | 35.0 [7.0-55.8] |
Exacerbations/year, n (%) | |
0 | 48 (72.7) |
1 | 14 (21.2) |
≥2 | 4 (6) |
Lung function | |
FEV1, L | 2.1±0.7 |
FEV1, %predicted*1 | 83.7±20.8 |
FVC, L | 2.5±0.8 |
FVC, %predicted*1 | 79.4±18.8 |
FEV1/FVC, % | 82.7±9.2 |
DLCO, %predicted*1 | 49.0±18.3 |
DLCO >60%predicted, n (%) | 19 (28.8) |
40%≤DLCO ≤60%predicted, n (%) | 18 (27.3) |
DLCO <40%predicted, n (%) | 19 (28.8) |
Long-term oxygen therapy, n (%) | 31 (47) |
Non-invasive ventilation, n (%) | 5 (7.6) |
Pharmacological treatment for ILD, n (%) | |
Glucocorticoids | 43 (65.2) |
Immunosuppressant | 30 (45.5) |
Antifibrotics | 5 (7.6) |
ILD types, n (%) | |
IPF | 16 (24.2) |
Sarcoidosis | 6 (9.1) |
Chronic hypersensitivity pneumonitis | 29 (43.9) |
NSIP secondary to systemic sclerosis | 2 (3) |
UIP secondary to systemic sclerosis | 4 (6.1) |
UIP secondary to rheumatoid arthritis | 2 (3) |
Anti-synthetase syndrome | 2 (3) |
Desquamative interstitial pneumonia | 1 (1.5) |
LIP related to Sjogren's syndrome | 1 (1.5) |
Silicosis | 1 (1.5) |
Respiratory bronchiolitis ILD | 1 (1.5) |
Follicular bronchiolitis related to Sjogren's syndrome | 1 (1.5) |
SCQ | 9±3.9 |
CST-1, steps | 77.7±50.2 |
CST-2, steps | 82.4±55.7 |
6MWT, m | 399.4±128.2 |
6MWT, % of predicted*2 | 83.1±26.4 |
Notes: Values are presented as mean±standard deviation or median [interquartile range].
Legend: BMI, body mass index; CST, Chester Step Test; FEV1, forced expiratory volume in one second; FVC, forced vital capacity; DLCO, diffusion capacity for carbon monoxide; ILD, interstitial lung disease; IPF, idiopathic pulmonary fibrosis; LIP, lymphocytic interstitial pneumonia; NSIP, non‐specific interstitial pneumonia; UIP, usual interstitial pneumonia; SCQ, self-administered comorbidities questionnaire; 6MWT, 6-minute walk test.
The CST demonstrated excellent relative reliability, for both intra-rater reliability (ICC1,1=0.95; 95%CI 0.91-0.97) and inter-rater reliability (ICC2,1=1.0; 95%CI 0.99-1.0). Regarding absolute reliability, SEM and MDC95 values were 11.8 steps (SEM%=14.7%) and 32.6 steps (MDC95%=40.7%), respectively. The Bland-Altman plot showed a mean difference of -4.72 steps, with the LoA95 ranging from -37.28 to 27.84 steps (Fig. 2).
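As a rough consistency check, the SEM and MDC95 can be back-calculated from the reported limits of agreement using the equations given in the statistical analysis section. The intermediate SD of the differences is not reported in the text, so the value below is derived here under the assumption that the reported limits were obtained from the LoA95 equation:

```latex
\[
SD_{difference} \approx \frac{27.84 - (-37.28)}{2 \times 1.96} \approx 16.61
\]
\[
SEM = \frac{SD_{difference}}{\sqrt{2}} \approx \frac{16.61}{1.414} \approx 11.75,
\qquad
MDC_{95} = 1.96 \times \sqrt{2} \times SEM \approx 32.56
\]
```

These back-calculated values are close to the reported 11.8 and 32.6 steps, with small differences attributable to rounding.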
Fig. 2. Bland-Altman plot of the difference between the number of steps in the Chester Step Test-1 (CST-1) and CST-2 against the mean number of steps in CST-1 and CST-2 in patients with interstitial lung disease (n=66). The dashed horizontal line represents the mean difference, and the solid horizontal lines represent the 95% upper and lower limits of agreement.
The correlation between the number of steps of the best CST and the 6MWD was significant, positive, and strong (rs=0.85, p=0.001) (Fig. 3).
Discussion
Excellent intra-rater and inter-rater reliability were found for the CST. A similar study, which assessed the reliability of the CST in patients with COPD, also showed excellent relative reliability (ICC of 0.99; 95%CI 0.97-0.99).15 This finding indicates that the CST provides consistent results and excellent agreement between healthcare professionals, which allows patients’ results to be compared even when the CST is applied by different raters on different occasions, e.g., before and after a pulmonary rehabilitation (PR) program.11,33 Moreover, it also suggests that only minimal training is required for healthcare professionals to apply the CST.
Our findings suggest that an improvement of more than 32.6 steps is needed to assume that the observed change exceeds the measurement error.18 Although this cut-off is informative, whether such a change is clinically meaningful remains unknown. Moreover, our MDC95% was above the 30% acceptable limit.31 It is likely that this finding was influenced by the high heterogeneity of symptoms and exercise capacity observed in our sample across disease severity subgroups (mild DLCO 28.8%; moderate DLCO 27.3% and severe DLCO 28.8%), which increased the MDC95 value.1,34 Therefore, future studies are needed to determine the minimal clinically important difference of the CST in patients with ILD, ideally using homogeneous samples and both anchor-based methods (i.e., mean change, receiver operating characteristic curves and linear regression analysis) and distribution-based methods (i.e., 0.5 times the SD, SEM, 1.96 times the SEM, MDC95 and effect size).14
In the present study, no evidence of systematic bias was observed. In patients with COPD, the CST showed a smaller mean difference and narrower LoA95 (mean difference of -1.1 steps with LoA95 ranging from -20.2 to 17.9 steps).15 It is likely that the high heterogeneity observed in our sample across disease severity subgroups contributed to the wider LoA95, especially among patients who achieved higher stages on the CST.
The CST has adequate construct validity, as shown by the strong correlation between the number of steps and the 6MWD. Similar results were found for the ISWT in the same population (r=0.76, p<0.0001), while a lower correlation was observed in patients with COPD (r=0.60, p=0.001).13,15 In addition to its adequate validity, the CST presents some advantages over the ISWT and the 6MWT, such as the apparent absence of a learning effect and the minimal space required for its application.35 As reported previously, the 6MWT requires a 30-meter corridor and the ISWT a 10-meter corridor, which are often difficult to find in home and clinical settings.35,36
The CST may also be useful in other contexts and populations. During the COVID-19 pandemic, the need to transfer patients to rehabilitation facilities or discharge them home increased sharply, and so did the demand for home-based and/or remote rehabilitation.37,38 Worldwide, healthcare professionals and researchers were looking for validated exercise tests that were easy and practical to implement and that allowed exercise to be assessed and prescribed in non-clinical settings.39 Step tests emerged as validated measures; however, until now only the 3-minute step test had been used in such settings.39 To the best of the authors' knowledge, the current study is the first to show that the CST is safe, reliable, and valid to be conducted in a community setting. Future studies should be conducted to assess its applicability in other populations, such as patients post-COVID-19.
Assessment of exercise capacity is of paramount importance, as reduced exercise capacity in patients with ILD has been associated with a more guarded prognosis and a poorer health-related quality of life.11,17,1 Clinicians and researchers now have at their disposal a field test that is valid and reliable to assess exercise capacity in patients with ILD, even in limited space settings.
Limitations
This study has some limitations that need to be acknowledged. First, we only assessed construct validity and not criterion validity (i.e., correlation with the gold standard, CPET). Second, we did not evaluate the effects of disease severity and variability on the MDC95, since our sample size was too small to perform subgroup analyses. Third, our study only included patients with clinically stable ILD, and participants with any other significant impairment or disease were excluded; thus, our results cannot be generalized to all patients. Modified versions of the CST may be needed either to assess patients with only mild ILD or to safely assess patients with more severe impairments.
Conclusions
The CST is a reliable and valid test to evaluate exercise capacity in patients with clinically stable ILD. Due to its characteristics, the CST may constitute an appropriate alternative to the 6MWT and the ISWT in limited space environments.
Authors’ contributions: AM obtained the funding, had full access to all data in the study and takes responsibility for the data and the accuracy of data analysis, including and especially any adverse effects. AM and AO conceived the idea. All authors contributed to the design and interpretation of data. AA, AO and PGF contributed to data acquisition. AA performed the analysis and drafted the paper. All authors critically revised the manuscript and approved the final version.
Financial/ nonfinancial disclosures: none declared
Role of sponsors: The sponsors had no role in the design of the study, the collection and analysis of the data, or the preparation of the manuscript.
Other contributions: The authors would like to thank Cátia Paixão, Filipa Machado, Patrícia Rebelo, Liliana Santos and Sara Souto-Miranda for their contribution in data collection.
Funding information: This work was funded by Programa Operacional de Competitividade e Internacionalização – POCI, through Fundo Europeu de Desenvolvimento Regional - FEDER (POCI-01-0145-FEDER-007628 and POCI-01-0145-FEDER-028806), Fundação para a Ciência e Tecnologia (PTDC/SAU-SER/28806/2017) and under the project UIDB/04501/2020.