CDH Tips (1): Databases for Cloudera Manager and Managed Services
Time: 2021-07-01 10:21:17
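To meet business needs, our big data platform has to use Spark for machine learning, data mining, and real-time computation, so we decided to adopt Cloudera Manager 5.2.0 and CDH 5.
I had previously set up Cloudera Manager 4.8.2 with CDH 4. While setting up Cloudera Manager 5.2.0, I found that the Host Monitor and Service Monitor roles could no longer be configured with an external database. At first I assumed I had misconfigured something, but it turned out that the newer Cloudera release changed how these roles store their data. After going through a lot of documentation, I confirmed that in the new version Host Monitor and Service Monitor do not need a database: they use a built-in datastore by default, and this cannot be changed. The Cloudera documentation says: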
Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. For quick, simple installations,
Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the
embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.
Although the embedded database is useful for getting started quickly, you can also use your own PostgreSQL, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases.
The Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server all require databases. The type of data contained in the databases and their estimated sizes are as follows:
- Cloudera Manager - Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (<100 MB) is the most important to back up.
- Activity Monitor - Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
- Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
- Hive Metastore - Contains Hive metadata. Relatively small.
- Sentry Server - Contains authorization metadata. Relatively small.
- Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
- Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively small.
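The Cloudera Manager Service Host Monitor and Service Monitor roles have an internal datastore. (Note: this is the passage stating that in CM5 the Host Monitor and Service Monitor cannot be configured with an external database and can only use the internal datastore, unlike in CM4.)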
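For the roles that do need a database, the setup on an external MySQL server can be sketched roughly as below. This is a minimal illustration rather than an official procedure: the database and user names (`scm`, `amon`, and so on) are hypothetical placeholders, the password is inserted without escaping, and the `GRANT ... IDENTIFIED BY` form assumes a MySQL 5.x server.

```python
# Roles that require a database in CM5, per the list above.
# The database names are hypothetical placeholders, not taken from the article.
ROLE_DATABASES = {
    "Cloudera Manager Server": "scm",
    "Activity Monitor": "amon",
    "Reports Manager": "rman",
    "Hive Metastore": "metastore",
    "Sentry Server": "sentry",
    "Cloudera Navigator Audit Server": "nav",
    "Cloudera Navigator Metadata Server": "navms",
}

# Roles that keep their data in an internal datastore in CM5.
INTERNAL_DATASTORE_ROLES = {"Host Monitor", "Service Monitor"}


def mysql_setup_statements(role, password):
    """Return MySQL 5.x statements that create a database and user for a role.

    Assumption: the user name is the same as the database name, and the
    password contains no quote characters (no escaping is done here).
    """
    if role in INTERNAL_DATASTORE_ROLES:
        raise ValueError(f"{role} uses an internal datastore in CM5")
    if role not in ROLE_DATABASES:
        raise KeyError(f"unknown role: {role}")
    db = ROLE_DATABASES[role]
    return [
        f"CREATE DATABASE {db} DEFAULT CHARACTER SET utf8;",
        f"GRANT ALL ON {db}.* TO '{db}'@'%' IDENTIFIED BY '{password}';",
    ]


if __name__ == "__main__":
    for role in ROLE_DATABASES:
        for statement in mysql_setup_statements(role, "change_me"):
            print(statement)
```

Asking for statements for Host Monitor or Service Monitor raises an error, which mirrors the CM5 behaviour described in the note above.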
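Cloudera Manager offers three installation paths. Path A is an automated installation, while Paths B and C are manual installations using rpm packages or tarballs: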
- Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks to complete and choices to make. In Path A you can optionally choose to create external databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
- Path B and Path C require you to create databases for the Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
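The difference between the paths can be summarised in a small lookup. The sketch below is only an illustration of the rules quoted above; the function name `databases_to_create` and its parameters are made up for this example and are not part of any Cloudera tool.

```python
# Roles for which Path A optionally allows an external database,
# as listed in the quoted documentation.
EXTERNAL_CAPABLE_ROLES = [
    "Activity Monitor",
    "Reports Manager",
    "Hive Metastore",
    "Sentry Server",
    "Cloudera Navigator Audit Server",
    "Cloudera Navigator Metadata Server",
]


def databases_to_create(path, use_external=()):
    """Return the roles whose databases you must create yourself.

    path: "A", "B" or "C" (case-insensitive).
    use_external: for Path A only, the roles you chose to move to an
    external database instead of the embedded PostgreSQL.
    """
    path = path.upper()
    if path == "A":
        unknown = set(use_external) - set(EXTERNAL_CAPABLE_ROLES)
        if unknown:
            raise ValueError(f"no external database option for: {sorted(unknown)}")
        # Everything else is covered by the embedded PostgreSQL database.
        return [role for role in EXTERNAL_CAPABLE_ROLES if role in use_external]
    if path in ("B", "C"):
        # Upper bound: a role only needs its database if it is deployed.
        return ["Cloudera Manager Server"] + EXTERNAL_CAPABLE_ROLES
    raise ValueError(f"unknown installation path: {path}")


if __name__ == "__main__":
    print(databases_to_create("A", ["Hive Metastore"]))
    print(databases_to_create("B"))
```

Host Monitor and Service Monitor never appear in the result, since they use their internal datastore on every path.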
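Of course, you can also install a service and its database on different machines. Large deployments or database administrators may require this setup, for example when an Oracle DBA needs to manage the databases independently.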
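When the database lives on another machine, the Cloudera Manager Server has to be pointed at that host. As far as I know, this is done through the `com.cloudera.cmf.db.*` keys in the server's `db.properties` file; treat the key names as something to verify against your own installation. The sketch below renders such a file; the host name `db01.example.com` and the credentials are hypothetical.

```python
SUPPORTED_DB_TYPES = ("postgresql", "mysql", "oracle")


def render_db_properties(db_type, host, name, user, password, port=None):
    """Render db.properties content for a Cloudera Manager Server database.

    Assumption: the key names follow the com.cloudera.cmf.db.* convention;
    check them against the db.properties file shipped with your version.
    """
    if db_type not in SUPPORTED_DB_TYPES:
        raise ValueError(f"unsupported database type: {db_type}")
    # The host entry may carry an explicit port as host:port.
    host_value = f"{host}:{port}" if port is not None else host
    lines = [
        f"com.cloudera.cmf.db.type={db_type}",
        f"com.cloudera.cmf.db.host={host_value}",
        f"com.cloudera.cmf.db.name={name}",
        f"com.cloudera.cmf.db.user={user}",
        f"com.cloudera.cmf.db.password={password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical remote MySQL host managed by a separate DBA team.
    print(render_db_properties("mysql", "db01.example.com", "scm", "scm", "change_me"))
```

Keeping the host in one generated file makes it easy to move the database later without touching the services themselves.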
- Cloudera Manager Server database
- External databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server
In the next article, I will describe in detail how Cloudera Manager stores its data in databases, and how to configure and tune them.