Posted: 2021-07-01 10:21:17
The topic of consistent backups in HBase comes up every now and then. In this article I will outline a scheme that provides timestamp-consistent backups. Consistent backups are possible in HBase. By "consistent" I mean "consistent as of
Original article: (timestamp-) Consistent backups in HBase. Thanks to the original author for sharing.
, {NAME=>
Export can now be used as follows:
hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.include.deleted.rows=true
2147483647 -2147483648
As long as the Export finishes within 2T, a consistent snapshot as of the time the Export was started is created. Otherwise some data might be missing, as it could have been compacted away before the Export had a chance to see it.
Since the backups also copied deleted rows and delete markers, a backup restored to an HBase instance can be queried using a time range (see Scan) to retrieve the state of the data at any arbitrary time.
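To make the role of the delete markers concrete, here is a toy Python model of a single cell's history. It is not the HBase client API: the `Event` type and the `state_as_of` helper are hypothetical names I made up for this sketch. It simply replays puts and delete markers up to a chosen timestamp, which is roughly what a time-range Scan over a restored backup would show.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    ts: int                      # cell timestamp
    kind: str                    # "put" or "delete" (hypothetical labels)
    value: Optional[str] = None  # payload for puts, None for delete markers


def state_as_of(events, t):
    """Return the value visible at time t, or None if absent or deleted.

    Only events with ts <= t are considered. A delete marker hides every
    put whose timestamp is less than or equal to the marker's timestamp.
    """
    visible = [e for e in events if e.ts <= t]
    latest_delete = max(
        (e.ts for e in visible if e.kind == "delete"), default=None
    )
    live_puts = [
        e for e in visible
        if e.kind == "put" and (latest_delete is None or e.ts > latest_delete)
    ]
    if not live_puts:
        return None
    return max(live_puts, key=lambda e: e.ts).value


history = [Event(10, "put", "a"), Event(20, "put", "b"), Event(30, "delete")]
without_marker = [e for e in history if e.kind != "delete"]
```

With the marker present, `state_as_of(history, 25)` yields `"b"` and `state_as_of(history, 35)` yields `None`. If the marker had not been backed up (`without_marker`), the query at time 35 would wrongly return `"b"`.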
Export is currently limited to a single table, but given enough storage in your live cluster this can be extended to multi-table Exports, simply by setting the endTime of all Export jobs to the start time of the first job.
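As a rough sketch of the multi-table idea, the Python helper below builds one Export command line per table, all sharing the same end time. The function name, table names, and output directory are hypothetical, and the start time of 0 is my assumption for illustration. The positional arguments follow Export's usage (table, output directory, versions, start time, end time). Nothing is executed here; the helper only returns argument lists.

```python
EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
MAX_VERSIONS = 2147483647  # Integer.MAX_VALUE: export every version


def export_commands(tables, out_root, end_time_ms, start_time_ms=0):
    """Build one Export argv per table; every job gets the same end time."""
    commands = []
    for table in tables:
        commands.append([
            "hbase", EXPORT_CLASS,
            "-D", "hbase.mapreduce.include.deleted.rows=true",
            table,                   # table to export
            f"{out_root}/{table}",   # hypothetical per-table output directory
            str(MAX_VERSIONS),
            str(start_time_ms),      # assumed start of the time range
            str(end_time_ms),        # shared across all jobs
        ])
    return commands


# Hypothetical tables and a fixed end time (milliseconds since the epoch).
commands = export_commands(["orders", "users"], "/backups/full", 1625134877000)
for argv in commands:
    print(" ".join(argv))
```

The important design point is that the end time is captured once, before the first job starts, and then passed to every job rather than being recomputed per table.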
This same trick can also be used for incremental backups. In that case the TTL has to be large enough to cover the interval between incremental backups.
If, for example, the incremental backup frequency is daily, the TTL above can be set to 2 days (TTL=>172800). Then use Export again:
hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.include.deleted.rows=true
2147483647
The longer TTL guarantees that there will be no gaps that are not covered by the incremental backups.
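The coverage condition can be checked with a few lines of Python. This is a simplified sketch under the assumption that the only requirement is "the gap between consecutive backup start times must not exceed the TTL". The helper name `uncovered_intervals` is hypothetical.

```python
DAY = 86400  # seconds


def uncovered_intervals(backup_times, ttl_seconds):
    """Return (previous, next) backup start times whose gap exceeds the TTL."""
    times = sorted(backup_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > ttl_seconds]


# Daily backups with a 2-day TTL: no gaps.
daily = [0, DAY, 2 * DAY, 3 * DAY]
# Same schedule, but two backups were skipped after day 2.
skipped = [0, DAY, 2 * DAY, 5 * DAY]
```

With `ttl_seconds=172800`, the daily schedule returns an empty list, while the schedule with skipped days reports the interval from day 2 to day 5 as uncovered.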
An example:
Note that in this scenario it does not matter when the backup jobs finish.
The full backup contains only p1. The incremental backup contains p2 and the Delete. p3 is not included in any backup, yet.
The state at T2 (p1) and T5 (p1, p2, delete) can be directly restored. Using time range Scans or Gets the state as of T4 and T3 can also be retrieved, once both backups have been restored into the same HBase instance (you need HBASE-4536 for this to work correctly with Deletes).
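The way events are split between the two backups can be sketched as a small Python model. The concrete event times are my assumption, chosen to be consistent with the description above: p1 at T1, the full backup at T2, p2 at T3, the Delete at T4, the incremental backup at T5, and p3 at T6. Each backup is modeled as a half-open time window, start inclusive and end exclusive. The helper name `backup_contents` is hypothetical.

```python
def backup_contents(events, windows):
    """Assign each event to every backup whose [start, end) window covers it.

    events:  dict of event name -> timestamp
    windows: dict of backup name -> (start, end), end exclusive
    """
    result = {name: [] for name in windows}
    for event, ts in sorted(events.items(), key=lambda kv: kv[1]):
        for name, (start, end) in windows.items():
            if start <= ts < end:
                result[name].append(event)
    return result


# Assumed timeline: T1..T6 represented as the integers 1..6.
events = {"p1": 1, "p2": 3, "delete": 4, "p3": 6}
windows = {"full": (0, 2), "incremental": (2, 5)}
contents = backup_contents(events, windows)
backed_up = {e for names in contents.values() for e in names}
```

Under these assumptions `contents` maps the full backup to p1 and the incremental backup to p2 and the Delete, while p3 falls outside both windows and waits for the next incremental run.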
Finally, if keeping enough data to cover the time between two incremental backups in the live HBase cluster is problematic for your organization, it is also possible to archive HBase's Write Ahead Logs (WAL) and then replay with the built-in WALPlayer (HBASE-5604), but that is for another post.