Data replication is the process of storing data in more than one location to improve both availability and accessibility. It involves frequently copying data from one database (or other data store) to another so that all users share the same, current information. Data replication can also reduce the workload on databases where performance is key, such as transactional systems. Data replication software facilitates this process, offering tools to integrate, distribute, centralize, and synchronize data across these various data stores and systems. This allows users to manage growing data volumes while gaining access to real-time information.
In terms of the scale of replication, there can be full replication, in which the whole database is stored at every site. There can also be partial replication, in which only some frequently used fragments of the database are replicated and the rest are not. Data replication tools can also capture and identify changes made to a database, a capability known as change data capture (CDC).
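The difference between full and partial replication, and the role of CDC, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Source`, `Replica`, and `Change` names are hypothetical, and an in-memory list stands in for the change log. Real log-based CDC typically reads the database's own transaction log rather than a list maintained by application code, so treat this as a toy model and not as how any particular product works.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set


@dataclass
class Change:
    """One captured change event (hypothetical format)."""
    seq: int                              # 1-based position in the change log
    table: str
    op: str                               # "insert", "update", or "delete"
    key: Any
    row: Optional[Dict[str, Any]] = None  # None for deletes


class Source:
    """Toy source database that records every write in a change log."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[Any, Dict[str, Any]]] = {}
        self.log: List[Change] = []

    def upsert(self, table: str, key: Any, row: Dict[str, Any]) -> None:
        rows = self.tables.setdefault(table, {})
        op = "update" if key in rows else "insert"
        rows[key] = dict(row)
        self.log.append(Change(len(self.log) + 1, table, op, key, dict(row)))

    def delete(self, table: str, key: Any) -> None:
        # Only log the delete if the row actually existed.
        if self.tables.get(table, {}).pop(key, None) is not None:
            self.log.append(Change(len(self.log) + 1, table, "delete", key))


class Replica:
    """Toy replica. `tables=None` means full replication; a set of table
    names means partial replication of just those fragments."""

    def __init__(self, tables: Optional[Set[str]] = None) -> None:
        self.table_filter = tables
        self.tables: Dict[str, Dict[Any, Dict[str, Any]]] = {}
        self.last_seq = 0                 # last change-log position consumed

    def sync(self, source: Source) -> int:
        """Apply only the changes made since the last sync; return how many."""
        applied = 0
        for change in source.log[self.last_seq:]:
            self.last_seq = change.seq    # advance even when filtered out
            if self.table_filter is not None and change.table not in self.table_filter:
                continue
            target = self.tables.setdefault(change.table, {})
            if change.op == "delete":
                target.pop(change.key, None)
            else:
                target[change.key] = dict(change.row)
            applied += 1
        return applied


source = Source()
source.upsert("orders", 1, {"total": 40})
source.upsert("audit", 1, {"event": "login"})

full = Replica()                          # full replication
partial = Replica(tables={"orders"})      # partial: only the "orders" fragment
full.sync(source)
partial.sync(source)

source.upsert("orders", 1, {"total": 45})
source.delete("audit", 1)
full.sync(source)                         # picks up only the two new changes
partial.sync(source)
```

Because each replica remembers the last log position it consumed, a sync copies only what changed instead of the whole database, which is the core idea behind CDC-driven replication. The partial replica skips changes to tables outside its filter but still advances its position, so it never rereads them.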
Data replication software provides the end user with a graphical interface in which they can centrally manage and monitor their replicated data. As such, infrastructure teams—whether managing servers, virtual machines, databases, or other infrastructure—can use data replication software to improve the availability of data and ensure it is consistent.
Data replication software is similar to server backup software in that both can be used to store a copy of company data. However, server backup is more limited; its main use case is preventing data loss in disaster scenarios, while data replication is broader and is used for any case in which a company might want to have copies of data in different databases, servers, etc. Data replication software is typically used alongside data integration software, which allows businesses to pull data from several sources and formats into one place, and big data processing and distribution software, which offers a way to collect, distribute, store, and manage massive, unstructured data sets in real time.
To qualify for inclusion in the Data Replication category, a product must:
Enable real-time data integration with log-based change data capture
Replicate data/infrastructure across a wide range of databases, data warehouses, and other platforms
Capture and identify changes made to a database (CDC)
Provide an interface for users to monitor data replication