Aerospike XDR (Cross-Datacenter Replication) enables real-time asynchronous data replication between multiple data centers.
XDR ensures low-latency replication, even across geographically distributed sites.
- Only new records are shipped after enabling XDR.
- Existing records are NOT shipped unless rewind is enabled.

Rewind Feature:
- Rewind=ALL: Ship all records, regardless of timestamp.

XDR is considered stable when:
[DC1] ----> [DC2]
4096 partitions, 1024 master partitions

Each XDR source manages transaction queues to track changes.
- XTQ exists per DC, per namespace, per partition.
- Total XTQs = 1024 per partition.
- Each XTQ holds up to 16K elements.
Digest (20 bytes) + LUT (5 bytes) = 25 bytes
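
As a rough sizing aid, the figures above imply an upper bound on queue memory. The sketch below assumes that "16K" means 16,384 elements and that each element costs exactly the 25 bytes listed, ignoring any per-queue overhead. The function name and the example queue count are illustrative only, not Aerospike APIs.

```python
DIGEST_BYTES = 20              # record digest size, from the text
LUT_BYTES = 5                  # last-update time size, from the text
XTQ_MAX_ELEMENTS = 16 * 1024   # assumption: "16K" means 16,384 elements


def xtq_max_bytes(num_queues: int = 1) -> int:
    """Upper bound on bytes held by `num_queues` completely full XTQs."""
    return num_queues * XTQ_MAX_ELEMENTS * (DIGEST_BYTES + LUT_BYTES)


if __name__ == "__main__":
    one = xtq_max_bytes()
    print(f"one full XTQ: {one} bytes ({one / 1024:.0f} KiB)")
    many = xtq_max_bytes(1024)  # hypothetical count of queues on one node
    print(f"1024 full XTQs: {many / 2**20:.0f} MiB")
```

Queues are rarely full in steady state, so this is a worst case rather than a typical footprint.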
| Component | Responsibility |
|---|---|
| DC Thread | Picks digest from queue (FIFO), passes to service thread |
| Service Thread | Reads record, ships it, processes destination response |
| XDR Destination | Receives and writes replicated data |

Only the record data is shipped (metadata is not included).
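
The division of labour in the table can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: plain dictionaries stand in for the source and destination namespaces, a `deque` stands in for one XTQ, and every function name is hypothetical. It does not reflect Aerospike's actual threading or wire protocol.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Digest = bytes
Bins = Dict[str, object]
QueueEntry = Tuple[Digest, int]  # (digest, last-update time)


def dc_thread_pick(xtq: Deque[QueueEntry]) -> Optional[Digest]:
    """DC thread role: take the oldest queued entry (FIFO) and hand over its digest."""
    if not xtq:
        return None
    digest, _lut = xtq.popleft()
    return digest


def destination_write(destination: Dict[Digest, Bins], digest: Digest, bins: Bins) -> bool:
    """Destination role: write the replicated data and acknowledge."""
    destination[digest] = dict(bins)
    return True


def service_thread_ship(digest: Digest, source: Dict[Digest, Bins],
                        destination: Dict[Digest, Bins]) -> bool:
    """Service thread role: read the record, ship it, process the response."""
    bins = source.get(digest)
    if bins is None:
        return False  # toy-model choice: nothing to read, so nothing is shipped
    return destination_write(destination, digest, bins)


def drain(xtq: Deque[QueueEntry], source: Dict[Digest, Bins],
          destination: Dict[Digest, Bins]) -> List[Digest]:
    """Run the pick/ship loop until the queue is empty; return digests shipped, in order."""
    shipped: List[Digest] = []
    while (digest := dc_thread_pick(xtq)) is not None:
        if service_thread_ship(digest, source, destination):
            shipped.append(digest)
    return shipped


if __name__ == "__main__":
    source = {b"a" * 20: {"name": "alice"}, b"b" * 20: {"name": "bob"}}
    xtq = deque([(b"a" * 20, 100), (b"b" * 20, 101)])
    destination: Dict[Digest, Bins] = {}
    print(drain(xtq, source, destination), destination == source)
```

Note that the queue carries only digests and timestamps; the record itself is read at ship time, which is why the service thread needs access to the source data.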
XDR replication is resilient to destination failures, but source failures can cause issues.
- No impact on source XDR.
- Replication resumes once destination recovers.
- If a source node goes down, replication stops for its partitions.
- Failover mechanism:
  - System Metadata (SMD) update: every 30 seconds, the service thread writes into SMD (per namespace).
Hot Key = A record that is frequently written.
🔹 Transaction Queue Overflow Handling
Recoveries can be triggered in the following scenarios:
max-ship-throughput, transaction queues may overflow, causing partitions to enter recovery.

For more details, refer to this XDR KB article.
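
The overflow behaviour can be pictured with a small bounded queue. The sketch below is an illustration, not Aerospike's implementation: the class name is hypothetical, the capacity assumes "16K" means 16,384, and clearing the queue on overflow is a modelling assumption. What the recovery itself does afterwards is not modelled.

```python
from collections import deque
from typing import Deque, Tuple


class PartitionQueue:
    """Toy model of one bounded transaction queue for a single partition."""

    def __init__(self, capacity: int = 16 * 1024) -> None:
        self.capacity = capacity
        self.entries: Deque[Tuple[bytes, int]] = deque()
        self.in_recovery = False

    def enqueue(self, digest: bytes, lut: int) -> bool:
        """Queue a change; on overflow, flag the partition for recovery instead."""
        if self.in_recovery:
            return False  # changes are no longer tracked via the queue
        if len(self.entries) >= self.capacity:
            self.in_recovery = True
            self.entries.clear()  # assumption: queued entries are discarded
            return False
        self.entries.append((digest, lut))
        return True


if __name__ == "__main__":
    q = PartitionQueue(capacity=3)  # tiny capacity to make overflow visible
    results = [q.enqueue(bytes([i]) * 20, 100 + i) for i in range(5)]
    print(results, q.in_recovery)
```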
- If a timeout occurs (record sent but no destination acknowledgment), the record is placed in the retry queue.
- Each partition has its own retry queue.
- Retry queues are per DC, per namespace, per partition.
- Each retry queue holds only 50 elements (a fixed limit that cannot be changed), except during recoveries, where the limit is hardcoded to 1000 elements per partition per DC.
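
The retry limits above can be sketched as a small bounded queue. This is a minimal illustration with hypothetical names. The text does not say what happens to a timed-out record when the queue is already full, so the sketch only reports that condition to the caller.

```python
from collections import deque
from typing import Deque, Optional

NORMAL_RETRY_LIMIT = 50       # fixed limit, from the text
RECOVERY_RETRY_LIMIT = 1000   # limit during recoveries, from the text


class RetryQueue:
    """Toy retry queue for one (DC, namespace, partition) combination."""

    def __init__(self, in_recovery: bool = False) -> None:
        self.limit = RECOVERY_RETRY_LIMIT if in_recovery else NORMAL_RETRY_LIMIT
        self.items: Deque[bytes] = deque()

    def add_timed_out(self, digest: bytes) -> bool:
        """Queue a digest whose shipment timed out; return False if the queue is full."""
        if len(self.items) >= self.limit:
            return False  # handling of a full queue is not described in the text
        self.items.append(digest)
        return True

    def next_retry(self) -> Optional[bytes]:
        """Return the oldest digest awaiting retry, or None if there is none."""
        return self.items.popleft() if self.items else None


if __name__ == "__main__":
    q = RetryQueue()
    accepted = sum(q.add_timed_out(i.to_bytes(20, "big")) for i in range(60))
    print(accepted, RetryQueue(in_recovery=True).limit)
```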
⚠️ Some retries may happen without triggering retry metrics; see this XDR KB article for more details.
max-ship-throughput setting.

- XDR ships only new records unless Rewind is enabled
- XDR queues (XTQ) store transaction records per DC/Namespace/Partition
- Failure handling allows replicas to take over if a source node fails
- LST ensures only the latest records are shipped
- Retry queues help manage failed transactions but are limited to 50 elements per partition