Arthritis & Rheumatism, Volume 62,
November 2010 Abstract Supplement
Abstracts of the American College of
Rheumatology/Association of Rheumatology Health Professionals
Annual Scientific Meeting
Atlanta, Georgia November 6-11, 2010.
Floor and Ceiling Effects and Choice of Physical Function Instruments.
Krishnan, Eswar; Lingala, Bharathi; Bruce, Bonnie; Fries, James F.
The spectrum of physical function ranges from high-level nursing home care to long-distance running, and improvement anywhere along this continuum represents an improvement in health. Yet our measurement tools, from the SF-36 and the HAQ to the new IRT-based PROMIS tools, contain neither floor items basic enough to assess low levels of function nor ceiling items difficult enough to test above-average function. In many populations, more than half of subjects may be beneath the floor or above the ceiling. As a result, instrument performance might suffer at population extremes, leading to much larger sample size requirements. We sought to estimate the size of this effect and to suggest remedies.
We initially performed a simulation study that estimated the sample size requirements of an IRT-calibrated 8-item questionnaire in 3 separate settings: the general population, and populations in which physical function was 1 standard deviation worse or 1 standard deviation better than in the general population. Based on the results, we then performed a prospective observational study of 451 patients with rheumatoid arthritis (RA). Mean 12-month score changes (D) in the Physical Function short form (PF-10), the PF-10 improved using IRT techniques, and the PROMIS PF short form were measured and used to compute the sample size needed to detect a 2.5% change with 80% power at a significance level of 0.05.
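The sample size step described above can be illustrated with a minimal sketch. It assumes a two-sided normal approximation for a mean within-person change; the abstract does not state which formula was used, and the function name, the score units, and the standard deviation values below are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean_change(delta, sd_change, alpha=0.05, power=0.80):
    """Approximate n needed to detect a mean change `delta`.

    Assumption (not from the abstract): two-sided z-test on within-person
    change scores with standard deviation `sd_change`, in the same units
    as `delta`.
    """
    if delta == 0 or sd_change <= 0:
        raise ValueError("delta must be non-zero and sd_change positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return ceil(((z_alpha + z_beta) * sd_change / abs(delta)) ** 2)


# Hypothetical inputs: a 2.5-point change with three illustrative SDs.
for sd in (5.0, 10.0, 20.0):
    print(sd, sample_size_for_mean_change(2.5, sd))
```

Under this approximation the required n grows with the square of the standard deviation of the change scores, so any loss of measurement precision at the extremes of function translates directly into larger studies.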
Whereas 50 patients were needed to detect a change with 80% power among those with physical function worse than the general population (typical of patient populations), the corresponding sample size in the general population setting was 140. In samples of individuals with physical function 1 standard deviation better than the general population (positive health), the required sample size was 325. In the empirical study, the median estimated sample sizes for those with no measurable disability at baseline (HAQ-DI = 0) and for those in progressively higher baseline disability categories (0.01 < HAQ-DI <= 1.50 and HAQ-DI >= 1.6) were 146, 228 and 284, respectively.
Our data document the profound deterioration in instrument performance when the function of the study population does not match the coverage range of the items used. We did not have sufficient items addressing the floors of physical function to enable precise estimation of change, even in the 154-item PROMIS PF bank. Traditional power calculations in such instances will over-estimate power, and power should be estimated separately in the lower, middle, and upper ranges of severity as well as overall. The immediate need is to extend item bank content toward the extremes and to validate new items in populations with matching levels of impairment. This can lead to the development of new short forms appropriate to the population studied. The extended item banks become large, which in turn argues for an early transition to a Computerized Adaptive Testing (CAT) environment to reduce questionnaire burden while extending sensitive measurement across the range.
To cite this abstract, please use the following information:
Krishnan, Eswar, Lingala, Bharathi, Bruce, Bonnie, Fries, James F.; Floor and Ceiling Effects and Choice of Physical Function Instruments. [abstract]. Arthritis Rheum 2010;62 Suppl 10:1545