Patient-centered assessment of rheumatoid arthritis using a smartwatch and customized mobile app in a clinical setting
Study design
This was a prospective study among participants with moderate to severe RA and 28 controls, matched for age (± 3 years), gender, and race. A sample size of 60 was chosen to facilitate one-on-one follow-ups. RA and control participants were recruited from the rheumatology clinic and general medical clinic of the Reliant Medical Group (Worcester, Massachusetts). All participants were provided with an Apple Watch Series 4 and an iPhone 7 with a pre-loaded, customized, study-specific mobile application. An on-site study coordinator trained each participant individually on how to use the devices and application, perform the guided tests, and complete the PRO measures. Participants were required to complete PRO measures weekly, daily, or twice daily, and guided tests twice daily, for 14 days (Fig. 2; Supplementary Table S3). Data were transmitted in near real time so that study coordinators could monitor compliance and contact participants to encourage task completion if necessary.
Prior to the start of the study, an advisory board of patients with RA (who did not participate in the study) provided input on the study design and mobile app. This was followed by a five-day beta test involving five participants with RA to gather feedback on the usability of the app, the clarity of the instructions, and the feasibility of the assessment scheme. Issues raised in the beta test included difficulty remembering how to charge the devices, inconsistent syncing of data with the backend server, and limitations of the iPhone leg strap. In response, the training was updated with more specific instructions on when to charge the devices, the software was updated to improve data synchronization, and the leg strap was redesigned and accompanied by a detailed pictorial guide on how to attach it.
Study objectives
The following research objectives were investigated:
1. Construct validity: correlations between guided test performance and PRO measure severity.
2. Clinical utility: differences in guided test performance and PRO measure scores between RA and control cohorts.
3. Feasibility: survey assessment completion rates and Apple Watch wear rates in RA and control cohorts.
4. Repeatability and reproducibility: changes in guided test performance and PRO measures over time in RA and control cohorts.
Ethics
All documentation, including the study protocol, any amendments, and informed consent procedures, was reviewed and approved by the Reliant Medical Group Institutional Review Board. All participants provided written informed consent before any research procedures were undertaken. The study was conducted in accordance with the Good Clinical Practice principles of the International Council for Harmonisation and with the Declaration of Helsinki.
Selection of participants
The full list of inclusion and exclusion criteria for participants is provided in Supplementary Table S4. Briefly, participants with RA were recruited by physicians from the Reliant Medical Group during a clinical visit if they had a clinically verified diagnosis of moderate to severe RA, with severity assessed using the Routine Assessment of Patient Index Data 3 (RAPID-3; score ≤ 12: moderate RA; score > 12: severe RA). Controls were outpatients of the Reliant Medical Group and were excluded if they had a previous or current diagnosis of a rheumatologic disease, inflammatory disease, malignancy, or other relevant diseases.
Study assessments
The assessment scheme and example screenshots of the custom mobile application used to collect the assessments are provided in Supplementary Table S3 and Supplementary Figure S3, respectively.
For the guided tests, the iPhone was used to collect accelerometer and gyroscope data while participants performed predefined tests of physical function. The guided tests were designed using clinical and patient feedback to test aspects of participants’ functionality that are likely to be affected by the symptoms most important to patients with RA (i.e., joint pain, stiffness, fatigue, and sleep)2. Participants were instructed to perform each guided test twice daily, once in the morning (immediately after waking) and once in the afternoon, to assess change in stiffness throughout the day. The wrist ROM test is described in detail in the PARADE study5. Briefly, while holding the iPhone pointed upward over the edge of a table, participants flexed and extended their wrist joint to the maximum angle (without exceeding their comfort zone), repeating the movement for 10 seconds. The test was performed once with each hand. For the sit-to-stand test, participants sat on a chair with the iPhone strapped to their right thigh and their arms crossed over their chest. They then stood up and sat down five times at their own pace. The average time taken to go from sitting to standing and from standing to sitting was extracted from accelerometer and gyroscope data. For the lie-to-stand test, participants lay on a bed with their legs extended and the iPhone strapped to their right thigh, then stood up on the floor twice at their own pace. The average time taken to transition from lying down to standing and from standing to lying down was extracted from accelerometer and gyroscope data. Other guided tests included the walking test and the 9-hole peg test (ResearchKit; Apple Inc., CA, USA)5. In the walking test, participants were asked to attach the iPhone to their right thigh and walk in a straight line for 30 seconds.
In the 9-hole peg test, which measures manual dexterity, participants were asked to use two fingers of their left hand to drag a circular “peg” on the iPhone screen to a “hole” elsewhere on the screen, and then to use two fingers of their right hand to remove the peg from the hole. The Apple Watch was also used to continuously collect background accelerometer data to passively measure daily and nightly activity counts; these data are not reported in this article.
PROs were assessed on days 1, 7, and 14 and included the following: Functional Assessment of Chronic Illness Therapy – Fatigue (FACIT-Fatigue) to assess fatigue28; HAQ-DI and 36-Item Short Form Health Survey (SF-36)29 questionnaires to indicate the impact on participants’ quality of life24,30,31; Patient-Reported Outcomes Measurement Information System (PROMIS) Pain Interference to assess how pain disrupts participants’ daily well-being, and PROMIS Sleep Disturbance to assess sleep quality32; and RASIQ to quantify the severity of symptoms and their impact on the participant4. RA-specific assessments (RASIQ, PGA, stiffness) were not performed in controls.
Short questionnaires were administered every day, except for morning stiffness, which was administered on days 2–6 and 8–13. Morning stiffness and stiffness severity were assessed based on responses to questions 11, 12, and 13 of RASIQ4. The JMAP recorded the number and severity of painful joints experienced at a given time, from 55 pre-specified joints displayed on a body map; pain was scored as no pain, mild pain, moderate pain, or severe pain11. Pain VAS assessed the severity of pain on a scale ranging from 0 mm (no pain) to 100 mm (worst pain)33. PGA measured the overall effect of RA on participants and/or disease activity, using a single-item question scored on a VAS34. A global assessment of fatigue over the past 24 hours was measured on a numeric scale ranging from ‘no fatigue’ (0) to ‘as bad as you can imagine’ (10).
Guided testing algorithms
Details of the algorithms for the wrist ROM test, sit-to-stand test, and lie-to-stand test are given in Supplementary Methods S1. An illustrative flowchart of the algorithm for the lie-to-stand test is shown in Supplementary Figure S4; the corresponding flowchart for the wrist ROM test has been reported previously11.
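The study algorithms themselves are specified in Supplementary Methods S1 and are not reproduced here. Purely as an illustration of how a range-of-motion angle might be derived from gyroscope data, the minimal Python sketch below integrates angular velocity about a single axis with the trapezoidal rule and reports the span between the most flexed and most extended positions. The function name, the single-axis simplification, and the absence of drift correction or filtering are all assumptions of this sketch, not features of the published method.

```python
import math
from typing import Sequence


def range_of_motion_deg(
    timestamps_s: Sequence[float],
    angular_velocity_rad_s: Sequence[float],
) -> float:
    """Estimate range of motion (degrees) from single-axis gyroscope samples.

    Hypothetical helper: integrates angular velocity (rad/s) over time with
    the trapezoidal rule, taking the starting orientation as 0 rad, and
    returns the difference between the largest and smallest angle reached.
    Gyroscope drift is ignored (assumption).
    """
    if len(timestamps_s) != len(angular_velocity_rad_s):
        raise ValueError("timestamps and angular velocities must align")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples")

    angle = 0.0          # angle relative to the starting orientation
    lowest = 0.0         # most negative angle seen so far
    highest = 0.0        # most positive angle seen so far
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        mean_rate = 0.5 * (angular_velocity_rad_s[i] + angular_velocity_rad_s[i - 1])
        angle += mean_rate * dt
        lowest = min(lowest, angle)
        highest = max(highest, angle)

    return math.degrees(highest - lowest)
```

For a synthetic 10-second recording sampled at 100 Hz in which the wrist angle follows a sine wave of amplitude 1 rad, this sketch returns a range close to 2 rad (about 115 degrees). Real recordings would likely need drift correction and quality checks before such a value is usable.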
Data quality assessment
Automated and manual data quality assessments were performed throughout the study to ensure that the data analyzed came only from properly conducted tests. Quality control of the wrist ROM test was performed manually, and both manual and algorithmic quality controls were performed for the walking test, the sit-to-stand test, and the lie-to-stand test. Tests that were clearly performed incorrectly were excluded from the analysis.
Statistical analysis
Descriptive statistics were used for demographic and clinical characteristics, PRO measures, and guided tests. Wilcoxon signed-rank tests were used for matched comparisons (i.e., participants with RA vs. controls and morning vs. afternoon) and Wilcoxon rank-sum tests for unmatched comparisons (i.e., participants with moderate vs. severe RA). Nonparametric tests were used because a normal distribution could not be assumed given the small sample size. Trends over time were assessed using univariate mixed-effects models, with study day as a fixed effect and individual differences as random effects. ICCs were calculated to measure the consistency of guided test performance over time for each participant using a two-way mixed-effects, single-rater, consistency convention; a higher ICC indicated more consistent test performance over the study period than a lower ICC. Correlations between PROs and guided tests were assessed using Pearson correlation coefficients, and one-way analysis of variance on ranks was performed using the Kruskal-Wallis test, with Mann-Whitney U tests for post-hoc pairwise comparisons. There was no adjustment for multiplicity, and the study was not designed for formal hypothesis testing; therefore, P values were used for quantification/descriptive purposes only.
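To make the ICC convention concrete, the following Python sketch computes a two-way mixed-effects, single-rater, consistency ICC, often written ICC(3,1), from a complete participants-by-sessions table using the standard mean-squares formula (MS_rows − MS_error) / (MS_rows + (k − 1) × MS_error). It is an illustrative sketch under stated assumptions, not the analysis code used in the study: the function name is hypothetical, and complete data (no missed sessions) are assumed.

```python
from typing import Sequence


def icc_consistency_single(scores: Sequence[Sequence[float]]) -> float:
    """ICC(3,1): two-way mixed effects, single rater, consistency.

    scores[i][j] is the guided-test result for participant i in session j.
    Assumes a complete table: every participant has the same number of
    sessions (hypothetical simplification; missing data are not handled).
    """
    n = len(scores)                      # number of participants (rows)
    k = len(scores[0]) if n else 0       # number of sessions (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC undefined: all scores are identical")
    return (ms_rows - ms_error) / denominator
```

Because this is the consistency form, a systematic shift shared by all participants between sessions (for example, everyone being slower in the morning) does not lower the ICC; only changes in participants' relative ordering or spacing do.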