Clinical Data Interchange Standards Applied to the Study Visuals Data Warehouse

Research Data and Communications Technology (RDCT) provides clinical research data management tools for almost 100 active protocols. Some of them are interventional clinical trials, but most are longer-running observational or natural history investigations that require continuous management and oversight by scientific investigators and their study coordinators to ensure that the data is accurate and consistent.

The Study Visuals Data Warehouse (SVDW) project provides a standardized database to house data from multiple research protocols that reside in study-specific databases on other Clinical Data Management Systems (CDMS), laboratory information management systems (LIMS), or picture archiving and communication systems (PACS) used in medical imaging. The Extract-Transform-Load (ETL) process consolidates the data from the CDMS and standardizes its format to populate the warehouse. The ETL process maps the data using the Clinical Data Interchange Standards Consortium (CDISC) standards and loads it into the warehouse, which is based on the Study Data Tabulation Model (SDTM).
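The mapping step of such an ETL process can be sketched as follows. This is a minimal illustration, not RDCT's actual Pentaho transformation: the source field names (`participant_id`, `gender`, `age_years`), the sex code map, and the way USUBJID is built from the study and subject identifiers are all assumptions made for the example. The target variable names (STUDYID, DOMAIN, USUBJID, SUBJID, SEX, AGE, AGEU) are those of the SDTM Demographics (DM) domain.

```python
# Hypothetical source-to-SDTM sex code map; real studies define their own.
SEX_MAP = {"male": "M", "m": "M", "female": "F", "f": "F"}


def to_dm_row(study_id: str, source: dict) -> dict:
    """Map one source demographics record to an SDTM DM-style row.

    The source keys used here are illustrative assumptions, not the
    layout of any particular CDMS.
    """
    subject_id = str(source["participant_id"]).strip()
    raw_sex = str(source.get("gender", "")).strip().lower()
    raw_age = source.get("age_years")
    has_age = raw_age not in (None, "")
    return {
        "STUDYID": study_id,
        "DOMAIN": "DM",
        # Assumed convention: study id and subject id joined by a hyphen.
        "USUBJID": f"{study_id}-{subject_id}",
        "SUBJID": subject_id,
        # Unmapped or missing values fall back to "U" (unknown).
        "SEX": SEX_MAP.get(raw_sex, "U"),
        "AGE": int(raw_age) if has_age else None,
        "AGEU": "YEARS" if has_age else None,
    }


example = to_dm_row(
    "STUDY01",
    {"participant_id": " 007 ", "gender": "Female", "age_years": "34"},
)
```

Keeping the mapping in one small function per domain makes it easy to review each source-to-SDTM rule with the study's data manager before it is loaded into the warehouse.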

RDCT’s SVDW provides data managers, study coordinators, investigators, and sponsors with a central, standardized electronic system for analyzing and monitoring study progress. User access controls and security mechanisms prevent unauthorized individuals from accessing study data. Research staff interact directly with the warehouse and can customize data queries to understand the data and make informed decisions about the ongoing health and safety of research participants. The use of the SDTM data model also supports effective archiving and data interchange with other systems. The SVDW also promotes efficient development of study reports for the Food and Drug Administration (FDA) or other regulatory authorities.

Data warehouses consolidate data from multiple data sources in a consistent way, even when the operational data is formatted, stored, or maintained in many different ways. This approach separates decision support functions from data collection and data quality management systems. It also enables the use of internationally recognized common data standards for data exchange, integration, and regulatory reporting.
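As a small illustration of consolidating differently formatted operational data, the sketch below normalizes dates from several source systems into the ISO 8601 form that SDTM uses for date variables. The source system names and their date layouts are hypothetical and stand in for whatever each study-specific database actually uses.

```python
from datetime import datetime

# Hypothetical date layouts, one per source system.
SOURCE_DATE_FORMATS = {
    "cdms_a": "%m/%d/%Y",  # e.g. 03/15/2016
    "cdms_b": "%d.%m.%Y",  # e.g. 15.03.2016
    "lims": "%Y%m%d",      # e.g. 20160315
}


def to_iso8601(source_system: str, raw_date: str) -> str:
    """Convert a source-specific date string to an ISO 8601 date (YYYY-MM-DD).

    Raises KeyError for an unknown source system and ValueError for a
    date that does not match that system's declared layout.
    """
    layout = SOURCE_DATE_FORMATS[source_system]
    return datetime.strptime(raw_date.strip(), layout).date().isoformat()


normalized = [
    to_iso8601("cdms_a", "03/15/2016"),
    to_iso8601("cdms_b", "15.03.2016"),
    to_iso8601("lims", "20160315"),
]
```

Declaring one layout per source, rather than guessing the format from each value, avoids silently misreading ambiguous dates such as 03/04/2016.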

Research organizations and publishers increasingly require that investigators release or share data using publicly accessible repositories that are based on diverse data models. The SDTM-based warehouse will simplify publication to these public repositories and make data sharing more efficient. SDTM warehouses also facilitate building standard data visualizations and clinical study reports. The increasing use of standards reduces the training required for data managers and visualization designers because workshops and existing SDTM resources are widely accessible.

For academic research communities in low- to middle-income countries that operate with limited budgets, the use of open-source systems provides a low-cost solution. The warehouse solution leverages widely accepted open-source tools to provide a model that other institutions can use without additional licensing costs. The RDCT clinical data warehouse uses the following open-source tools:

  • PostgreSQL database: PostgreSQL is a powerful open-source object-relational database system. It runs on all major operating systems, including Linux and UNIX.
  • Pentaho Data Integration: the open-source community edition of Pentaho delivers powerful ETL capabilities.
  • Red Hat Enterprise Linux 6.7: the open-source version is freely available as CentOS 7 and is virtually identical in capabilities and features.

The SVDW serves as a standard foundation on which to layer specialized analytics and reporting tools such as data visualization applications and centralized study reporting. This includes clinical study reports on demographic summaries, enrollment trends, patient safety outcomes, and accrual and disposition, as well as page counts and data quality indicators. The SVDW provides a standard system for creating Data and Safety Monitoring Board (DSMB) reports, reducing the need for sophisticated statistical programming skills to produce customized reports on a study-by-study basis.
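A report such as an enrollment trend can be derived from a standardized demographics table with a single query, as in the sketch below. It uses an in-memory SQLite database purely as a stand-in for the PostgreSQL warehouse, a simplified `dm` table, and made-up rows; treating the DM variable RFSTDTC (subject reference start date) as the enrollment date is an assumption of this example, not a statement of how RDCT defines enrollment.

```python
import sqlite3


def enrollment_by_month(conn: sqlite3.Connection, study_id: str) -> list:
    """Count subjects per calendar month of RFSTDTC for one study.

    Assumes RFSTDTC holds ISO 8601 dates, so the first seven
    characters are the year and month (YYYY-MM).
    """
    return conn.execute(
        """
        SELECT substr(RFSTDTC, 1, 7) AS month, COUNT(*) AS n
        FROM dm
        WHERE STUDYID = ? AND RFSTDTC IS NOT NULL
        GROUP BY month
        ORDER BY month
        """,
        (study_id,),
    ).fetchall()


# In-memory stand-in for the warehouse, loaded with illustrative rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dm (STUDYID TEXT, USUBJID TEXT, RFSTDTC TEXT)")
conn.executemany(
    "INSERT INTO dm VALUES (?, ?, ?)",
    [
        ("STUDY01", "STUDY01-001", "2016-01-12"),
        ("STUDY01", "STUDY01-002", "2016-01-28"),
        ("STUDY01", "STUDY01-003", "2016-02-03"),
        ("STUDY01", "STUDY01-004", None),
        ("STUDY02", "STUDY02-001", "2016-01-05"),
    ],
)

trend = enrollment_by_month(conn, "STUDY01")
```

Because every study in the warehouse shares the same table and variable names, the same query can be reused across protocols by changing only the study identifier.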

Standard vocabularies for clinical research data provide a consistent data format that streamlines the review process for data managers, clinical team members, statistical analysts, and regulators. During trials, the standardized model promotes patient safety by reducing the possibility of confusion and ensuring that trial results can be analyzed accurately. Standards are also important because they facilitate aggregation of data to improve signal detection and drug safety.

The data warehouse strategy for CDISC compliance has proven to be an efficient, centralized solution that enables RDCT to meet FDA submission requirements for multiple protocols, with the added benefit of structuring research data into a standardized relational database format. From this substrate, multiple useful downstream analytic business lines can be deployed to support the various requirements and goals of the clinical researchers RDCT supports.