以下为大致分析的过程,感兴趣的可以看一下。
--启动数据库,有一个mirror出错。
[gpadmin@gpmaster ~]$
gpstart -a
20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args: -a
20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment...
20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Binary Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Catalog Version: ‘201310150‘
20150727:22:28:29:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance in admin mode
20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Setting new master era
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Master Started...
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Shutting down master
20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-2 directory /data/mirror/gpseg0 <<<<<
20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...............................................
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Process results...
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Successful segment starts = 3
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Failed segment starts = 0
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 1 <<<<<<<<
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Successfully started 3 of 3 segment instances, skipped 1 other segments
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-There are 1 segment(s) marked down in the database
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance gpmaster directory /data/master/gpseg-1
20150727:22:29:24:001922 gpstart:gpmaster:gpadmin-[INFO]:-Command pg_ctl reports Master gpmaster instance active
20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[INFO]:-No standby master configured. skipping...
20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Number of segments not attempted to start: 1
20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[INFO]:-Check status of database with gpstate utility
--执行全量恢复,5次尝试超时,提示Unable to connect to database and start transaction。
[gpadmin@gpmaster ~]$
gprecoverseg -F
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -F
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10‘
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 1
20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 2
20150727:22:38:14:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:14:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:15:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 3
20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 4
20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 5
20150727:22:38:30:002478 gprecoverseg:gpmaster:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason=‘Unable to connect to database and start transaction‘) exiting...
--以下为相关节点的配置。
template1=#
select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+-----------+-----------+------------------+------------
1 | -1 | p | p | s | u | 5432 | gpmaster | gpmaster | |
2 | 0 | p | p | c | u | 40000 | gpslave-1 | gpslave-1 | 41000 |
4 | 0 | m | m | s | d | 50000 | gpslave-2 | gpslave-2 | 51000 |
3 | 1 | p | p | s | u | 40000 | gpslave-2 | gpslave-2 | 41000 |
5 | 1 | m | m | s | u | 50000 | gpslave-1 | gpslave-1 | 51000 |
(5 rows)
--在这里将content=1的mirror也停掉,该mirror恢复时应该是没有问题的,将其对比content=0上的mirror,来发现相关问题。
--重新启动数据库。
[gpadmin@gpmaster ~]$
gpstart -a
20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args: -a
20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment...
20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Binary Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:49:42:002958 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Catalog Version: ‘201310150‘
20150727:22:49:42:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance in admin mode
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Setting new master era
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Master Started...
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Shutting down master
20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-2 directory /data/mirror/gpseg0 <<<<<
20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-1 directory /data/mirror/gpseg1 <<<<<
20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait.....
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Process results...
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:- Successful segment starts = 2
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:- Failed segment starts = 0
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 2 <<<<<<<<
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances, skipped 2 other segments
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-There are 2 segment(s) marked down in the database
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance gpmaster directory /data/master/gpseg-1
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-Command pg_ctl reports Master gpmaster instance active
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-No standby master configured. skipping...
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Number of segments not attempted to start: 2
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-Check status of database with gpstate utility
--现在的配置信息。
template1=#
select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+-----------+-----------+------------------+------------
1 | -1 | p | p | s | u | 5432 | gpmaster | gpmaster | |
2 | 0 | p | p | c | u | 40000 | gpslave-1 | gpslave-1 | 41000 |
4 | 0 | m | m | s | d | 50000 | gpslave-2 | gpslave-2 | 51000 |
3 | 1 | p | p | c | u | 40000 | gpslave-2 | gpslave-2 | 41000 |
5 | 1 | m | m | s | d | 50000 | gpslave-1 | gpslave-1 | 51000 |
(5 rows)
--执行全量恢复,并查看相关的详细信息。
[gpadmin@gpmaster ~]$
gprecoverseg -F -v
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -F -v
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Obtaining master‘s port from master data directory
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Read from postgresql.conf port=5432
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Read from postgresql.conf max_connections=200
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Connecting to dbname=template1
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Connecting to dbname=template1
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Validating configuration...
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] got cmd: $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] got cmd: $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] finished cmd: Get segment status cmdStr=‘ssh -o ‘StrictHostKeyChecking no‘ gpslave-1 ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000"‘
had result: cmd had rc=1 completed=True halted=False
stdout=‘‘
stderr=‘mode: PrimarySegment
segmentState:
ChangeTrackingDisabled
--content=0的primary的状态是ChangeTrackingDisabled
dataState: InChangeTracking
faultType: NotInitialized
mode: PrimarySegment
segmentState: ChangeTrackingDisabled
dataState: InChangeTracking
faultType: NotInitialized
‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] finished cmd: Get segment status cmdStr=‘ssh -o ‘StrictHostKeyChecking no‘ gpslave-2 ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000"‘
had result: cmd had rc=1 completed=True halted=False
stdout=‘‘
stderr=‘mode: PrimarySegment
segmentState:
Ready --content=1的primary的状态是Ready
dataState: InChangeTracking
faultType: NotInitialized
mode: PrimarySegment
segmentState: Ready
dataState: InChangeTracking
faultType: NotInitialized
‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Encountered error Not ready to connect to database mode: PrimarySegment
segmentState: ChangeTrackingDisabled
--由此可见,之前无法执行全量恢复是由于content=0的primary有问题,与mirror无关
dataState: InChangeTracking
faultType: NotInitialized
mode: PrimarySegment
segmentState: ChangeTrackingDisabled
dataState: InChangeTracking
faultType: NotInitialized
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 1
^C20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-WorkerPool haltWork()
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker2] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker3] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker4] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker5] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker6] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker7] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker8] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker9] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker10] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker11] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker12] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker13] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker14] haltWork
20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker15] haltWork
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker5] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker3] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker6] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker4] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker2] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker7] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker9] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker10] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker12] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker8] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker11] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker14] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker13] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker15] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] stopping
20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] stopping
User Interrupted
--查看content=0的primary此次启动的日志记录,与下面content=1的primary此次启动的日志记录对比,发现还是CHANGETRACKING有问题。而且gprecoverseg -F执行后没有产生日志记录,说明相应文件只在数据库启动时生效。
[gpadmin@gpslave-1 pg_log]$
cat gpdb-2015-07-27_224945.csv
2015-07-27 22:49:46.228113 CST,"gpadmin","postgres",p2367,th2061358880,"[local]",,2015-07-27 22:49:46 CST,0,,,seg-1,,,,,"FATAL","57M01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994,
2015-07-27 22:49:46.248685 CST,,,p2370,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","mirror transition, primary address(port) ‘gpslave-1(41000)‘ mirror address(port) ‘gpslave-2(51000)‘",,,,,"mirroring role ‘primary role‘ mirroring state ‘change tracking‘ segment
state ‘not initialized‘ process name(pid) ‘filerep main process(2370)‘ filerep state ‘not initialized‘ ",,0,,"cdbfilerep.c",3472,
2015-07-27 22:49:46.252539 CST,,,p2371,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToInsync() found insync_transition_completed:‘false‘ full resync:‘true‘",,,,,,,0,,"cdbresynchronizechangetracking.c",2545,
2015-07-27 22:49:46.252589 CST,,,p2371,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToResync() found resync_transition_completed:‘false‘ full resync:‘true‘",,,,,,,0,,"cdbresynchronizechangetracking.c",2582,
2015-07-27 22:49:46.254921 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2015-07-27 22:46:49 CST",,,,,,,0,,"xlog.c",6372,
2015-07-27 22:49:46.255156 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","checkpoint record is at 0/8C75AD8",,,,,,,0,,"xlog.c",6560,
2015-07-27 22:49:46.255194 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","redo record is at 0/8C75AD8; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6617,
2015-07-27 22:49:46.255219 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","next transaction ID: 0/3051; next OID: 43252",,,,,,,0,,"xlog.c",6621,
2015-07-27 22:49:46.255241 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6624,
2015-07-27 22:49:46.255280 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/8C75B70",,,,,,,0,,"xlog.c",6988,
2015-07-27 22:49:46.255863 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Oldest active transaction from prepared transactions 3051",,,,,,,0,,"xlog.c",6033,
2015-07-27 22:49:46.292261 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",6084,
2015-07-27 22:49:46.292308 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:10:54",,,,,,,0,,"xlog.c",6095,
2015-07-27 22:49:46.294384 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Finished normal startup for clean shutdown case",,,,,,,0,,"xlog.c",7202,
2015-07-27 22:49:46.300763 CST,,,p2373,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Finished BASIC startup integrity checking",,,,,,,0,,"xlog.c",7647,
2015-07-27 22:49:46.303294 CST,,,p2365,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,,0,,"postmaster.c",4640,
2015-07-27 22:49:46.303322 CST,,,p2365,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled
on Feb 10 2015 14:13:47",,,,,,0,,"postmaster.c",4647,
2015-07-27 22:49:47.847981 CST,"gpadmin","template1",p2381,th2061358880,"192.168.22.110","5303",2015-07-27 22:49:47 CST,3054,con5,,seg0,,,x3054,sx1,"LOG","00000","gp_max_distributed_xid returning 1",,,,,,"select gp_max_distributed_xid()",0,,"cdbmaxdistributedxid.c",29,
[gpadmin@gpslave-2 pg_log]$
cat gpdb-2015-07-27_224944.csv
2015-07-27 22:49:45.707976 CST,"gpadmin","postgres",p2302,th-240429280,"[local]",,2015-07-27 22:49:45 CST,0,,,seg-1,,,,,"FATAL","57M01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994,
2015-07-27 22:49:45.726077 CST,,,p2305,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","mirror transition, primary address(port) ‘gpslave-2(41000)‘ mirror address(port) ‘gpslave-1(51000)‘",,,,,"mirroring role ‘primary role‘ mirroring state ‘change tracking‘ segment
state ‘not initialized‘ process name(pid) ‘filerep main process(2305)‘ filerep state ‘not initialized‘ ",,0,,"cdbfilerep.c",3472,
2015-07-27 22:49:45.735230 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","last checkpoint location for generating initial changetracking log 0/CC12DD8",,,,,,,0,,"xlog.c",11560,
2015-07-27 22:49:45.735376 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"WARNING","XX000","unable to read change tracking ‘full log‘ file : Success (cdbresynchronizechangetracking.c:3382)",,,,,,,0,,"cdbresynchronizechangetracking.c",3382,
2015-07-27 22:49:45.735423 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","last changetracked location in changetracking full log 0/0",,,,,,,0,,"xlog.c",11572,
2015-07-27 22:49:45.735475 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","record with zero length at 0/CC12E70",,,,,,,0,,"xlog.c",4155,
2015-07-27 22:49:45.735505 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","scanned through 1 initial xlog records since last checkpoint for writing into the resynchronize change log",,,,,,,0,,"cdbresynchronizechangetracking.c",206,
2015-07-27 22:49:45.736987 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2015-07-27 22:46:49 CST",,,,,,,0,,"xlog.c",6372,
2015-07-27 22:49:45.737260 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","checkpoint record is at 0/CC12DD8",,,,,,,0,,"xlog.c",6560,
2015-07-27 22:49:45.737298 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","redo record is at 0/CC12DD8; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6617,
2015-07-27 22:49:45.737322 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","next transaction ID: 0/3006; next OID: 43252",,,,,,,0,,"xlog.c",6621,
2015-07-27 22:49:45.737343 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6624,
2015-07-27 22:49:45.737382 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/CC12E70",,,,,,,0,,"xlog.c",6988,
2015-07-27 22:49:45.738000 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Oldest active transaction from prepared transactions 3006",,,,,,,0,,"xlog.c",6033,
2015-07-27 22:49:45.743458 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",6084,
2015-07-27 22:49:45.743497 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:10:54",,,,,,,0,,"xlog.c",6095,
2015-07-27 22:49:45.745424 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Finished normal startup for clean shutdown case",,,,,,,0,,"xlog.c",7202,
2015-07-27 22:49:45.751648 CST,,,p2308,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Finished BASIC startup integrity checking",,,,,,,0,,"xlog.c",7647,
2015-07-27 22:49:45.754195 CST,,,p2300,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,,0,,"postmaster.c",4640,
2015-07-27 22:49:45.754222 CST,,,p2300,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled
on Feb 10 2015 14:13:47",,,,,,0,,"postmaster.c",4647,
2015-07-27 22:49:47.296961 CST,"gpadmin","template1",p2316,th-240429280,"192.168.22.110","39217",2015-07-27 22:49:47 CST,3009,con5,,seg1,,,x3009,sx1,"LOG","00000","gp_max_distributed_xid returning 1",,,,,,"select gp_max_distributed_xid()",0,,"cdbmaxdistributedxid.c",29,
--查看/data/primary/gpseg0/pg_changetracking下的文件,就这么三个文件,而造成上面gprecoverseg无法执行的正是其中的CT_METADATA。
[gpadmin@gpslave-1 pg_changetracking]$
ll
total 102900
-rw-------. 1 gpadmin gpadmin 128 Jul 27 22:49 CT_METADATA
--二进制文件,就是这个出问题了的,关于这个文件的信息,我也没有查到。
-rw-------. 1 gpadmin gpadmin 322636 Jul 27 22:46 FILEREP_CONFIG_LOG
--记录配置变化的文件。
-rw-------. 1 gpadmin gpadmin 105028298 Jul 27 22:46 FILEREP_LOG
--记录数据变化的文件。
--对比content=0的primary和content=1的primary,这两个文件还是不同,但具体信息我也不是很清楚。
[gpadmin@gpslave-1 pg_changetracking]$
od -tx1 -tc -Ax CT_METADATA
000000 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
\0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0
000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
000080
[gpadmin@gpslave-2 pg_changetracking]$
od -tx1 -tc -Ax CT_METADATA
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
000080
--将content=0的primary上的CT_METADATA重命名,并将content=1的primary上的CT_METADATA传到content=0的primary上。
[gpadmin@gpslave-2 pg_changetracking]$
scp CT_METADATA gpadmin@gpslave-1:/data/primary/gpseg0/pg_changetracking
CT_METADATA 100% 129 0.1KB/s 00:00
[gpadmin@gpslave-1 pg_changetracking]$
ll
total 102904
-rw-------. 1 gpadmin gpadmin 129 Jul 27 23:09 CT_METADATA
-rw-------. 1 gpadmin gpadmin 128 Jul 27 22:49 CT_METADATA_original
-rw-------. 1 gpadmin gpadmin 322636 Jul 27 22:46 FILEREP_CONFIG_LOG
-rw-------. 1 gpadmin gpadmin 105028298 Jul 27 22:46 FILEREP_LOG
--重启数据库,执行全量恢复成功。
[gpadmin@gpmaster ~]$
gpstart -a
[gpadmin@gpmaster ~]$
gprecoverseg -F
20150727:23:29:48:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -F
20150727:23:29:48:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:23:29:48:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10‘
20150727:23:29:48:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:23:29:48:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:23:29:49:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:23:29:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Performing persistent table check
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Greenplum instance recovery parameters
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:----------------------------------------------------------
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Recovery type = Standard
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:----------------------------------------------------------
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Recovery 1 of 2
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:----------------------------------------------------------
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Synchronization mode = Full
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance host = gpslave-2
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance address = gpslave-2
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance directory = /data/mirror/gpseg0
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance port = 50000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance replication port = 51000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance host = gpslave-1
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance address = gpslave-1
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance directory = /data/primary/gpseg0
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance port = 40000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance replication port = 41000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Target = in-place
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:----------------------------------------------------------
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Recovery 2 of 2
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:----------------------------------------------------------
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Synchronization mode = Full
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance host = gpslave-1
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance address = gpslave-1
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance directory = /data/mirror/gpseg1
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance port = 50000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Failed instance replication port = 51000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance host = gpslave-2
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance address = gpslave-2
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance directory = /data/primary/gpseg1
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance port = 40000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Source instance replication port = 41000
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:- Recovery Target = in-place
20150727:23:30:30:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:----------------------------------------------------------
Continue with segment recovery procedure Yy|Nn (default=N):
>
y
20150727:23:30:34:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-2 segment(s) to recover
20150727:23:30:34:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped
20150727:23:30:36:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20150727:23:30:41:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Cleaning files from 2 segment(s)
..
20150727:23:30:43:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Building template directory
20150727:23:30:46:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Validating remote directories
.
20150727:23:30:47:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Copying template directory file
.
20150727:23:30:48:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Configuring new segments
.
20150727:23:30:49:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Cleaning files
.
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting file move procedure for gpslave-2:/data/mirror/gpseg0:content=0:dbid=4:mode=s:status=d
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting file move procedure for gpslave-1:/data/mirror/gpseg1:content=1:dbid=5:mode=s:status=d
updating flat files
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating configuration with new mirrors
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating mirrors
.
20150727:23:30:51:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting mirrors
20150727:23:30:51:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
............................................................................
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Process results...
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating configuration to mark mirrors up
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating primaries
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
....
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Process results...
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Done updating primaries
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-******************************************************************
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-******************************************************************
[gpadmin@gpmaster ~]$
gpstate -e
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -e
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10‘
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
..
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-Segment Mirroring Status Report
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-All segments are running normally
--总结一下解决方法:当全量恢复gprecoverseg -F无法正常运行,且报错信息为Unable to connect to database. Retrying 1
gprecoverseg failed. (Reason=‘Unable to connect to database and start transaction‘) exiting...时,
从其他正常的主事例上拷贝一份data/primary/gpseg*/pg_changetracking下的CT_METADATA到出问题的主事例上,覆盖原CT_METADATA就可以正常进行全量恢复了。
版权声明:本文为博主原创文章,未经博主允许不得转载。
全量恢复gprecoverseg -F出现Unable to connect to database时的相关分析及解决方法
标签:数据库 greenplum 全量恢复