Electronic data capture in resource-limited settings using the lightweight clinical data acquisition and recording system

We created a metadata-driven electronic data capture (EDC) software for clinical studies. Our goal was a lightweight and scalable system that can capture data from mobile devices and is easy to set up, manage and maintain without profound knowledge of software engineering or other significant resources. The complete source code is written in R and is available on GitHub (https://github.com/hcstubbe/lcarsc).

Deployment

The software can be installed as an R package from CRAN or directly from GitHub using the R package devtools (see supplementary material for detailed instructions) [22]. These methods are sufficient for installing and running the software on a local machine (e.g. a laptop or desktop computer) within a few minutes and do not require advanced IT knowledge. From here, the software can be used for a specific study on the local machine. We used this deployment strategy for the retrospective YEARS study, where we recorded the clinical data set from patient records via a single desktop computer. Alternatively, the software can be obtained as a Docker image from our Docker repository and launched in a Docker container. Similarly, a local deployment on several independent machines can be created using identical configuration templates. Such a deployment strategy enables asynchronous offline data acquisition on several devices without relying on the internet or any network at all (see supplementary material, Fig. S1). After completion of data acquisition, the datasets of each machine are merged.

To deploy the software on a server, additional steps are necessary depending on the study requirements. For worldwide, Transport Layer Security (TLS)-encrypted access with multiple users and secure user authentication, we use ShinyProxy, an open-source Spring Boot-based web application that deploys R/Shiny applications in Docker containers [23].
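
The merge step that concludes the offline multi-machine deployment described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the R implementation of the software: the record layout, the field names `pid` and `visit`, and the function `merge_datasets` are assumptions made for this example. It concatenates the datasets of all machines, tags each record with its source machine, and flags participant/visit pairs that were captured on more than one machine instead of silently overwriting them.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Key = Tuple[str, str]


def merge_datasets(datasets: Dict[str, List[Record]]) -> Tuple[List[Record], List[Key]]:
    """Merge per-machine record lists into one dataset.

    `datasets` maps a machine label to the records captured on that machine.
    Records are keyed by (pid, visit); both field names are assumptions.
    Returns the merged records and the keys that appeared on more than one
    machine and therefore need manual review.
    """
    merged: Dict[Key, Record] = {}
    conflicts: List[Key] = []
    for machine, records in sorted(datasets.items()):
        for record in records:
            key = (str(record["pid"]), str(record["visit"]))
            if key in merged:
                # Same participant and visit recorded twice: keep the first
                # copy and flag the key for review.
                if key not in conflicts:
                    conflicts.append(key)
                continue
            merged[key] = {**record, "source_machine": machine}
    return list(merged.values()), conflicts


# Hypothetical example data from two offline laptops.
laptop_a = [{"pid": "P001", "visit": "baseline", "age": 54}]
laptop_b = [
    {"pid": "P002", "visit": "baseline", "age": 61},
    {"pid": "P001", "visit": "baseline", "age": 55},  # duplicate key
]
merged, conflicts = merge_datasets({"laptop_a": laptop_a, "laptop_b": laptop_b})
```

Keeping the first copy and reporting the conflict is only one possible policy; a study may instead prefer to reject the merge until duplicates are resolved at the source.
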
ShinyProxy isolates each Shiny application in a user-specific Docker container and creates an additional layer of security by separating the application management, handled by ShinyProxy, from the R/Shiny app in each container. For user authentication, we use Keycloak, an open-source software for identity and access management [24]. For managing web traffic and TLS certificates, we use Traefik, an HTTP reverse proxy [25]. This setup requires a Linux server (e.g. Ubuntu Server 22.04 LTS) with at least 4 GB RAM, 4 CPU cores and 50 GB disk space (preferably SSD). In addition, a DNS domain, a sub-domain for Keycloak and an e-mail address are necessary, all of which can easily be obtained from a research institution and/or a DNS domain and e-mail provider at no or very low cost. About one hour is needed to set up the system. This approach is summarized in Fig. 1. A detailed step-by-step description of this setup is given in the supplementary materials.

Figure 1. Examples for the deployment of LCARS-C and LCARS-M. The upper panel depicts a simple local deployment. The lower panel depicts a cloud deployment using Docker Swarm serving a multicenter, multiuser setting.

With the above deployment strategy, we host a server running several studies in parallel: currently, the Post-COVID-Care Study and the URGENT-GI-Database are hosted on this server. Recently, we completed the PREDICT-COVID Study, which was hosted in parallel. Since December 2020, this server deployment has been stable, and we have not observed any downtime, errors, or other software-related problems.

Designing a study

Once the study protocol of a new clinical trial is finalized and approved by all required bodies, the software can be configured according to the study requirements.

At first start, the editor mode is launched. Here, the metadata defining visits and input variables are created. After the metadata have been developed and tested, the application is moved into deployment mode.
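
The idea of metadata that define visits and input variables, combined with separate editor and deployment modes, can be illustrated with a small sketch. It is written in Python purely for illustration and does not reproduce the R code of the software: the metadata layout, the type names, and the names `METADATA`, `validate` and `Study` are all hypothetical. Entries are checked against the metadata and are only kept once the study is in deployment mode.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical metadata: each visit lists its variables and their input types.
METADATA = {
    "baseline": {"age": "integer", "smoker": "checkbox"},
    "follow_up": {"intervention": "text"},
}

# One check per (assumed) input type; bool is excluded from "integer"
# because Python treats bool as a subclass of int.
TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "text": lambda v: isinstance(v, str),
    "checkbox": lambda v: isinstance(v, bool),
}


def validate(visit: str, entry: Dict[str, object]) -> List[str]:
    """Return the problems found when checking `entry` against the metadata."""
    variables = METADATA[visit]
    problems = [f"unknown variable: {name}" for name in entry if name not in variables]
    for name, input_type in variables.items():
        if name in entry and not TYPE_CHECKS[input_type](entry[name]):
            problems.append(f"{name}: expected {input_type}")
    return problems


@dataclass
class Study:
    mode: str = "editor"  # either "editor" or "deployment"
    records: List[dict] = field(default_factory=list)

    def save(self, visit: str, entry: Dict[str, object]) -> bool:
        """Keep an entry only in deployment mode and only if it is valid."""
        if self.mode != "deployment" or validate(visit, entry):
            return False
        self.records.append({"visit": visit, **entry})
        return True


study = Study()
saved_in_editor = study.save("baseline", {"age": 54})  # editor mode: not kept
study.mode = "deployment"
saved_valid = study.save("baseline", {"age": 54, "smoker": False})
saved_invalid = study.save("baseline", {"age": "54"})  # wrong type: rejected
```

Because the form logic reads everything from the metadata, adding a variable or a visit in this sketch means editing `METADATA` only, which mirrors the metadata-driven design at a very small scale.
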
Only in the deployment mode is clinical data recorded permanently (Fig. 2).

Figure 2. Workflow. The diagram shows the workflow for creating a new electronic case report form.

When first starting the metadata development, the user must define which study visits need to be recorded. For instance, the URGENT-GI-Database records a baseline visit (i.e. the hospitalization) and follow-up visits (i.e. each treatment intervention for gastrointestinal bleeding during the hospital stay). The visits and variables are defined in the editor tab (Fig. 3). Each new visit and variable is added and edited through an input form. This form guides the process of creating new variables by allowing only correct user input and by supplying information regarding the respective input fields. If previous studies published their variable sets, these sets can be uploaded into the library. From here, required variables can be added to the respective visits. Alternatively, the complete metadata of a previous study (i.e. visit and variable definitions) can be uploaded directly into the editor, either to create an exact copy of the previous study or to reuse previously developed definitions.

Figure 3. Editor user interface. Interface of the widget editor. Here, widgets and visits are defined.

Once the required visits and variables are defined, the interface can be built in the editor tab and then tested in the preview tab. If the user chooses to add a mobile visit, the mobile preview tab will show the mobile interface. If testing gives the desired results, the application is moved into the deployment mode. Activating the deployment mode should be done with great care. In the current version, a reversal into the editor mode can only be done by the administrator. To protect database integrity, existing visits can no longer be changed.

Collecting clinical data

Clinical data can be stored permanently only after the deployment mode has been activated. Study participants are included in the database (i.e. pseudonymized) using the inclusion tab. After inclusion, clinical data can be recorded for each participant using the documentation tab. The documentation tab provides an overview of the status of documentation for each participant (Fig. 4). The clinical data is entered and edited through an input form, which renders the required input fields for each study visit based on the respective metadata. The system supports all common data inputs: text, integers, floats, checkboxes, time, date, radio buttons, drop-down choice menus, and drop-down choice menus with search function for larger lists or vocabularies such as ICD-10 or ICHI. The input types can be extended easily if needed.

Figure 4. Production user interface. (Left screenshot) Interface of the eCRF in production mode. Note that the bar on top now shows the study’s name and is colored blue. Patient IDs and usernames are hidden; (right screenshot) an entry form showing different types of data input, such as time, text or radio buttons.

Mobile data capture

If mobile data capture is included in the study protocol (e.g. for recording patient-reported outcomes), the LCARS-M app must be deployed together with LCARS-C. LCARS-M is provided as a separate R package to allow separate development cycles. LCARS-M must access the same database as LCARS-C. To render its input interface, it pulls the metadata defined by LCARS-C from this database. If LCARS-M is used by the study center to collect patient information using a mobile device (e.g. a tablet), the study personnel enter the participant’s pseudonymized/anonymized ID (PID) into the tablet. LCARS-M then checks whether the respective PID exists and opens the input form. If LCARS-M is configured to collect data directly from the participant (e.g. from the participant’s smartphone), the participant has to log in to LCARS-M using their smartphone, tablet or computer with login data provided by the study site.

Exporting data

Once a study is completed, the complete study dataset can be downloaded as a zip file. This file contains the clinical dataset, as well as all metadata and an automatically generated codebook. In addition, the administrator can download an aggregated dataset from all visits as a comma-separated values (CSV) file, an Excel (XLSX) file, or an R Data Format (RDS) file.

Use in clinical studies

To date, we have completed four clinical studies using the software. The prospective multicenter PREDICT-COVID-Study investigated the predictive power of the artificial intelligence (AI)-based SACOV-19 predictor and score. The retrospective YEARS-Study investigated the risk of pulmonary embolism in patients hospitalized with acute COVID-19. The prospective Post-COVID-Care (PCC) Study investigated long-lasting signs and symptoms of COVID-19. The retrospective URGENT-GI-Database examined a large cohort of patients with gastrointestinal bleeding. For PREDICT-COVID and the YEARS-Study, two visits were recorded: one baseline and one follow-up visit. We included 124 patients in the PREDICT-COVID-Study. In total, 64 variables were collected for each study participant. In the YEARS-Study, we included 413 participants. Here, we collected a total of 101 variables. During data capture, we did not encounter any technical problems. Missing values in both studies were due to a lack of information in the clinical records or implausible clinical records, but never because of technical problems. The results of both studies were published recently [26,27].
In the PCC Study, we collected up to 679 variables per participant across baseline visits and several follow-up visits, encompassing medical history, current signs and symptoms, laboratory data, several questionnaires, diagnostic procedures, imaging results, specialist consultations, clinical management decisions and smart-watch data. To acquire patient-reported scores and outcomes, we used the software on tablets. In total, 353 participants were included. First results of the PCC Study were published recently or are under review [28,29,30]. For the URGENT-GI-Study, we recorded two different study visits (one baseline and one follow-up visit). Here, we collected 173 variables per patient and included 779 participants. At the time of writing, first publications are being prepared. During all studies, no significant technical problems occurred, and no technical problems were reported by users. For all clinical studies, data were checked for consistency and completeness. Missing data and minor inconsistencies were due to missing or inconsistent clinical records, but never to technical issues of the software.

The ethics committee of the Medical Faculty of the LMU Munich reviewed and approved the Post-COVID-Care Study, the PREDICT-COVID-Study, the URGENT-GI-Study, and the YEARS-Study.

Several other studies are currently ongoing or in the planning stage.
