Backup, replication and archiving: what measures should you take to preserve the integrity of your data?

The integrity of regulated data (“data integrity”) has been a joint concern of the authorities(1) and the health industries since well before the first guidelines and guides on the subject appeared. While many experts agree that most aspects of “data integrity” are not new, the focus that the various regulatory authorities now place on it has the merit of calling into question practices that may have drifted over time and with technological developments.

It is up to the laboratories concerned, in particular, to take stock of their data management practices and to guarantee that their data are accurate (“accurate”), recorded at the time of the activity (“contemporaneous”), traceable to their origin (“attributable”), readable (“legible”), preserved in their original form (“original”) and complete (“complete”), and that they remain so throughout the regulatory retention period (“enduring” and “available”).

Systems that generate electronic data are not uniform in the way they manage it: some have advanced storage methods (integrated database, etc.) and native integrity-preservation functions (checksum, backup, archiving), while others are more limited in these respects and require additional external components to secure the data throughout its life cycle.

The purpose of this article is to review some current practices for retaining electronic data on standalone or networked systems that, depending on the case, can meet the main regulatory requirements.


1. Regulatory context
From a practical standpoint, we will use the definition of data proposed by the WHO(2): “All original records and certified copies of original records, including source data and metadata and all subsequent transformations and reports of these data, which are recorded at the time of the GxP activity and allow a complete reconstruction of the GxP activity. Data must be accurately recorded by permanent means at the time of the activity. Data may be contained in paper documents (such as worksheets and laboratory notebooks), electronic records and audit trails, photographs, microfilm or microfiche, audio or video files or any other medium.”

The recently published PIC/S draft guidance (3) states that “data storage must include all original data and metadata, including audit trails, using a secure and validated process.”

If data are backed up or copied, the backups and copies must be subject to the same appropriate level of control to prevent unauthorized access, modification or deletion. For example, a company that backs up data to portable hard drives should prevent data from being deleted from those drives.

Further considerations on data storage and backup include the following:

  • True copies of dynamic electronic records can be made, provided that the entire content (that is, all data and metadata) is included and that the meaning of the original records is preserved.
  • Stored data must be accessible in a legible format; companies may need to maintain suitable equipment and software to access electronically stored data, in the form of backups or copies, throughout the retention period.
  • Routine backup copies must be stored at a remote (physically separate) location in anticipation of a possible disaster.
  • Backup data must remain legible throughout the defined regulatory retention period, even if the software has since been upgraded to a new version or replaced by a better-performing one.
  • Systems must allow the backup and restoration of all data, including metadata and audit trails.

Finally, the FDA(4) uses the term “backup” in § 211.68(b) to refer to a certified copy of the original record that is kept securely throughout the records retention period (see, for example, § 211.180). Backed-up data must be exact, complete and protected from alteration, erasure or accidental loss (§ 211.68(b)). The backup file must contain the data (including associated metadata) in their original format or in a format compatible with it. The FDA's use of the term “backup” is consistent with the term “archive” used in the FDA guidance for industry and FDA staff General Principles of Software Validation.
Temporary backup copies (created, for example, in the event of a computer crash or other service interruption) do not satisfy the requirement of § 211.68(b) to maintain a backup file of the original data.

It appears from these sources that data are, in practice, inseparable from their recording medium; the FDA also clarified in its guidance published in December 2018 that “when generated to meet a regulatory requirement, all data becomes regulated records”(5).

For example, a pH measurement recorded on paper(6) has specific metadata (or context data): date and time of the measurement, sample identification, temperature, probe identification, etc. If these data are transferred to another recording medium (electronically, into a LIMS for example), the same data may lose some metadata and will probably gain others: identity of the person who made the entry, date and time of entry, and an “audit trail” of any changes to the data…

In IT terms, the integrity of electronic data is the property of data that undergo no intentional or accidental alteration or destruction during processing or transmission, and that retain a format allowing them to be used.
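In practice, this property is usually checked with a checksum: a cryptographic fingerprint computed when the data is written is recomputed later, and any difference reveals an alteration. The minimal sketch below uses SHA-256 from Python's standard `hashlib` module; the function names `sha256_of` and `is_intact` are illustrative assumptions, not features of any particular system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_digest: str) -> bool:
    """True if the file content still matches the digest recorded earlier."""
    return sha256_of(path) == expected_digest


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        record = Path(tmp) / "ph_result.csv"
        record.write_text("sample,pH\nS-001,7.02\n")
        reference = sha256_of(record)  # recorded when the data is created
        print(is_intact(record, reference))  # True
        record.write_text("sample,pH\nS-001,7.20\n")  # simulated alteration
        print(is_intact(record, reference))  # False
```

A checksum only detects alteration; it cannot repair it, which is why it is combined with the backup and replication mechanisms described below.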

Techniques for preserving electronic data stored on durable media (hard disk, flash memory…) and their integrity have long been known and implemented; European regulations also address these subjects in Annex 11, mainly in the following sections:

  • 7.1. Data must be protected from possible damage by physical and electronic means. The accessibility, readability and accuracy of the data stored must be checked. Access to data must be guaranteed throughout the retention period.
  • 7.2. Regular backups of all relevant data should be made. The integrity and accuracy of backed-up data, as well as the ability to restore the data, must be checked during validation and monitored periodically.
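The restore check required by section 7.2 can be partly automated. The sketch below uses only the Python standard library and assumes, for simplicity, that a backup is a plain copy of a directory tree (real backup tools use their own formats and restore commands): it restores the backup into a scratch directory and compares the SHA-256 digest of every file with the source. The function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source: Path, backup: Path) -> List[str]:
    """Restore `backup` into a scratch directory and list discrepancies with `source`."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copytree(backup, restored)  # stands in for the real restore step
        expected, actual = tree_digests(source), tree_digests(restored)
    problems = [f"missing: {name}" for name in expected if name not in actual]
    problems += [f"unexpected: {name}" for name in actual if name not in expected]
    problems += [
        f"altered: {name}"
        for name in expected
        if name in actual and expected[name] != actual[name]
    ]
    return problems
```

An empty list means the backup could be restored completely and unaltered; the result can be kept as evidence for the periodic review.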

Finally, section 17 deals with archiving: “Data may be archived. The accessibility, readability and integrity of archived data must be checked. If significant changes are made to the system (for example, a change in computer equipment or software), then the ability to retrieve archived data must be ensured and tested.”


2. Backup and archiving

To distinguish the terms clearly: backup is the operation of duplicating and securing the data contained in a computerized system (usually in a location separate from the main place of use), whereas archiving consists of moving some of these data, once fully processed, to a long-term storage device in order to free up space in the main storage for new data.
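Because archiving removes data from the main storage, the order of operations matters: the original should be deleted only after the archived copy has been verified. A minimal Python sketch of that sequence, assuming file-based data and a mounted archive directory (`archive_file` is a hypothetical helper, not a function of any particular system):

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def archive_file(path: Path, archive_dir: Path) -> Path:
    """Move a fully processed file to long-term storage, verifying the copy first."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / path.name
    if target.exists():
        # Never silently overwrite a record that is already archived.
        raise FileExistsError(f"{target} already archived")
    original_digest = _sha256(path)
    shutil.copy2(path, target)  # copy2 also tries to preserve timestamps
    if _sha256(target) != original_digest:
        target.unlink()
        raise OSError(f"archived copy of {path} does not match the original")
    path.unlink()  # free space in main storage only once the copy is verified
    return target
```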

There are several types of computer backup:

  • Full backup.
  • Incremental backup.
  • Differential backup…
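The three types differ only in the reference date against which changes are detected: a full backup copies everything, a differential backup copies what has changed since the last full backup, and an incremental backup copies what has changed since the last backup of any kind. A minimal Python sketch, assuming the system can list each file with its last modification time (the function name and inputs are hypothetical):

```python
from datetime import datetime
from typing import Dict, List, Optional


def files_to_copy(
    mtimes: Dict[str, datetime],
    kind: str,
    last_full: Optional[datetime],
    last_backup: Optional[datetime],
) -> List[str]:
    """Select the files that a backup of the given kind must copy."""
    if kind == "full":
        reference = None
    elif kind == "differential":
        reference = last_full  # changes since the last full backup
    elif kind == "incremental":
        reference = last_backup  # changes since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    if reference is None:  # nothing to compare with: copy everything
        return sorted(mtimes)
    return sorted(name for name, mtime in mtimes.items() if mtime > reference)


if __name__ == "__main__":
    # Full backup Monday 8 p.m., incremental Tuesday 8 p.m.; we are on Wednesday.
    modified = {
        "a.csv": datetime(2019, 1, 21, 10),  # Monday
        "b.csv": datetime(2019, 1, 22, 10),  # Tuesday
        "c.csv": datetime(2019, 1, 23, 10),  # Wednesday
    }
    full, last = datetime(2019, 1, 21, 20), datetime(2019, 1, 22, 20)
    print(files_to_copy(modified, "differential", full, last))  # ['b.csv', 'c.csv']
    print(files_to_copy(modified, "incremental", full, last))  # ['c.csv']
```

The trade-off appears at restoration time: with incremental backups, the last full backup and every incremental made since must be restored in order, whereas with differential backups the last full backup plus the latest differential are sufficient.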

Backed-up data can be retrieved from their secure storage and transferred back to their original environment; this operation is called restoration.

Regular data backups are intended to preserve data integrity in the event of a major incident affecting the computerized system (for example, permanent corruption of the hard disk).

The diagram below gives an indication of the main performance indicators of a backup process:


The “recovery time objective” (RTO) designates the time required to restart the operational service, even in a partial (degraded) mode, while the “recovery point objective” (RPO) designates the point in time to which data can be recovered from backup, and therefore the maximum amount of work, expressed as a period of time, whose data may be lost. Together, they determine the total interruption of a resource after a major incident. For example, for a laboratory working from 8 a.m. to 6 p.m., if a major incident occurs at noon and the last backup was made the previous evening, the laboratory will have lost the data from 4 hours of work (from 8 a.m. to noon).
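The arithmetic of this example can be made explicit: the data lost is the working time between the last backup and the incident, which is the interval the RPO is meant to bound. A minimal Python sketch, assuming fixed working hours every day (weekends and holidays are ignored for simplicity; the function name is hypothetical):

```python
from datetime import datetime, time, timedelta


def lost_working_time(
    last_backup: datetime,
    incident: datetime,
    day_start: time = time(8),
    day_end: time = time(18),
) -> timedelta:
    """Working time between the last backup and the incident (work to be redone)."""
    lost = timedelta()
    day = last_backup.date()
    while day <= incident.date():
        # Clip each working day to the [last_backup, incident] window.
        start = max(datetime.combine(day, day_start), last_backup)
        end = min(datetime.combine(day, day_end), incident)
        if end > start:
            lost += end - start
        day += timedelta(days=1)
    return lost


if __name__ == "__main__":
    # Backup the previous evening at 8 p.m., incident at noon.
    print(lost_working_time(datetime(2019, 1, 22, 20), datetime(2019, 1, 23, 12)))
    # 4:00:00
```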

Backup also makes it possible to selectively restore data deleted by mistake; in practice, however, this capability is limited and depends mainly on how each system stores its data. Where deleting data is permitted from a regulatory point of view, the deletion must be traced: “Consideration should be given, on the basis of a risk assessment, to including in the computerized system a log (known as an “audit trail”) that records any modification or deletion of GMP-relevant data. Any modification or deletion of GMP-relevant data must be justified and documented”(7). Since deleting electronic data is not recommended, systems should preferably implement logical rather than physical deletion of records.

Logical deletion of a record consists of marking the record as deleted for the application or operating system, while deleting it physically (permanently) only later, when the storage medium is reorganized or defragmented.
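Logical deletion combined with an audit trail can be sketched as a deletion flag plus an append-only log entry recording who deleted what, when and why. The classes below are hypothetical and deliberately minimal; a real system would also protect the audit trail itself against modification and handle the later physical purge, which is not shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Record:
    record_id: str
    value: str
    deleted: bool = False  # logical-deletion flag; the data itself is kept


@dataclass
class AuditEntry:
    timestamp: datetime
    user: str
    action: str
    record_id: str
    reason: str


@dataclass
class RecordStore:
    records: Dict[str, Record] = field(default_factory=dict)
    audit_trail: List[AuditEntry] = field(default_factory=list)

    def add(self, record_id: str, value: str, user: str) -> None:
        self.records[record_id] = Record(record_id, value)
        self._log(user, "create", record_id, "")

    def delete(self, record_id: str, user: str, reason: str) -> None:
        if not reason:
            raise ValueError("a reason is required to delete GMP-relevant data")
        self.records[record_id].deleted = True  # mark only, no physical removal
        self._log(user, "delete", record_id, reason)

    def visible(self) -> List[Record]:
        """Records the application still shows to users."""
        return [r for r in self.records.values() if not r.deleted]

    def _log(self, user: str, action: str, record_id: str, reason: str) -> None:
        entry = AuditEntry(datetime.now(timezone.utc), user, action, record_id, reason)
        self.audit_trail.append(entry)
```

Because the "deleted" record is still physically present, it remains available for review or restoration, and the audit trail documents the justification for its deletion.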


3. Replication

In IT, replication is a process of sharing information to ensure consistency between several redundant data sources, in order to improve reliability, fault tolerance or availability. We speak of data replication when the same data are duplicated on several devices(8).

Data can be replicated across several storage disks in the same server; this is commonly done with RAID (“Redundant Array of Independent Disks”).

RAID 5 is an N + 1 redundancy scheme. Parity, computed at each write, is distributed in rotation across the different disks, so each stripe consists of N data blocks and one parity block. If one of the disks in the array fails, each stripe is missing either a data block or its parity block. If it is the parity block, this does not matter, because no data is missing. If it is a data block, its content can be calculated from the N-1 other data blocks and the parity block. The data of each stripe therefore remain intact: not only does the array keep working, but once the failed disk has been replaced, it can be rebuilt from the data and parity information held on the other disks.
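RAID 5 parity is the byte-wise exclusive OR (XOR) of the data blocks of a stripe, which is why any single missing block equals the XOR of all the others. The minimal Python sketch below illustrates the principle; block contents are made up, and `parity_disk` shows one simple rotation chosen for illustration (actual controllers and operating systems use several different layouts).

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks: List[bytes]) -> bytes:
    """Parity block of a stripe of N data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: List[bytes]) -> bytes:
    """Recover the one missing block of a stripe from all the surviving blocks."""
    return xor_blocks(surviving_blocks)


def parity_disk(stripe_index: int, n_disks: int) -> int:
    """Disk holding the parity of a stripe (simple illustrative rotation)."""
    return (n_disks - 1 - stripe_index) % n_disks


if __name__ == "__main__":
    stripe = [b"GMP-", b"data", b"ok!!"]  # N = 3 data blocks of 4 bytes
    p = parity(stripe)
    # The disk holding stripe[1] fails: the N-1 other data blocks plus parity rebuild it.
    print(rebuild_missing([stripe[0], stripe[2], p]))  # b'data'
```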


While this system theoretically allows for greater availability, its implementation is costly and its data recovery guarantees are not absolute: for example, a second disk failure before the rebuild is complete results in data loss.
Replication can also be performed at the database level, across several servers or data centers.
It should also be noted that the main cloud storage providers include data replication across several data centers in their standard services (“availability zones” at AWS) and that long-term storage costs are particularly low (USD 0.0045 per GB per month, AWS Paris rate as of December 2018).
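To give an order of magnitude, the storage cost over a retention period is a simple product of volume, rate and duration. The sketch below uses the rate quoted above (USD 0.0045 per GB per month) as a default; it assumes a flat rate and deliberately ignores request, retrieval and data transfer fees, which for archive-class storage can be significant, so it is only a lower bound.

```python
def storage_cost_usd(volume_gb: float, years: float, usd_per_gb_month: float = 0.0045) -> float:
    """Storage cost over a retention period at a flat monthly rate (fees excluded)."""
    return volume_gb * usd_per_gb_month * 12 * years


if __name__ == "__main__":
    # 500 GB kept for a 10-year retention period: about USD 270.
    print(round(storage_cost_usd(500, 10), 2))
```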

Replication therefore provides a high level of availability for storage services, since RTO and RPO are practically zero, whereas with backup systems, as seen above, RTO and RPO are always greater than zero and determine, to a greater or lesser extent, how long the service is unavailable.

Note, however, that data deleted on one component are also deleted on all the others. This is not a problem for systems that use logical deletion, but it can be one for more basic systems.
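The difference from a backup can be shown with a toy model: a deletion is propagated to every replica, whereas a point-in-time copy taken earlier still holds the data. The Python class below is hypothetical and ignores everything a real replication system must handle (network, write ordering, failover).

```python
import copy
from typing import Dict, List


class ReplicatedStore:
    """Toy synchronous replication: every write or delete is applied to all replicas."""

    def __init__(self, n_replicas: int = 3) -> None:
        self.replicas: List[Dict[str, str]] = [dict() for _ in range(n_replicas)]

    def put(self, key: str, value: str) -> None:
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key: str) -> None:
        for replica in self.replicas:  # the deletion is replicated as well
            replica.pop(key, None)

    def snapshot(self) -> Dict[str, str]:
        """Point-in-time copy, standing in for a backup."""
        return copy.deepcopy(self.replicas[0])


if __name__ == "__main__":
    store = ReplicatedStore()
    store.put("S-001", "7.02")
    backup = store.snapshot()
    store.delete("S-001")  # e.g. a deletion made by mistake
    print([("S-001" in r) for r in store.replicas])  # [False, False, False]
    print(backup)  # {'S-001': '7.02'}
```

This is why replication complements backup rather than replacing it: it protects against hardware failure, not against erroneous deletions or modifications.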


In conclusion, data backup, archiving and replication technologies are widely available today; they can be implemented by a qualified, independent IT administrator, who should select the most appropriate means for the recording medium and the particular technology of each system. It should also be kept in mind that backed-up and archived data must themselves be reviewed, in particular to check that they can be restored; this review can be carried out together with the periodic review of the system.




Since November 2004, Jean-Louis JOUVE has been the manager and principal consultant of COETIC, an expertise and consultancy company dedicated to regulated industries such as the pharmaceutical and cosmetic industries, medical device manufacturers, biotechnology companies and producers of active pharmaceutical ingredients. Before founding COETIC, Jean-Louis JOUVE was managing director of a company specializing in the computerization of quality processes for regulated companies: more than 50 systems were implemented for around 30 national and international customers during this period. Jean-Louis JOUVE holds an engineering degree from the Ecole Supérieure de Chimie Industrielle de Lyon (CPE LYON) and a Diploma of Advanced Studies (DEA) in Analytical Chemistry from the University of Lyon I.


(1) Statement from FDA Commissioner Scott Gottlieb, M.D., on the agency’s efforts to improve drug quality through vigilant oversight of data integrity and good manufacturing practice, December 12, 2018
(2) WHO, Technical Report Series No. 996, Annex 5, “Guidance on Good Data and Record Management Practices”, May 2016
(3) Good Practices for Data Management and Integrity in Regulated GMP/GDP Environments, PI 041-1 (Draft 3) 30 November 2018
(4) US FDA Data Integrity and Compliance With CGMP Guidance for Industry Questions and Answers December 2018
(5) Under section 704(a) of the Federal Food, Drug, and Cosmetic Act, FDA inspections of manufacturing sites “apply to all things (including records, files, papers, processes, controls and facilities) bearing on whether prescription drugs [and] over-the-counter drugs intended for human use… are adulterated or misbranded… or otherwise bearing on violation of this chapter”. As a result, the FDA routinely requests and reviews records that are not necessarily intended to satisfy a CGMP requirement but that nevertheless contain CGMP information (for example, shipping records or similar records that may be used to reconstruct an activity). Ibid.
(6) If this printout is produced on a thermal (heat-sensitive paper) printer, it may not remain legible (“legible”) throughout the regulatory retention period; a “true copy” of the record is then necessary.
(7) EMA, EU GMP Annex 11, section 9, Audit Trails
(8) Wikipedia