
Installing and Using DataX for MySQL-to-MySQL Data Synchronization


1. Prerequisites

1.1 JDK installation

  • Download the JDK from the official site; here I install jdk-8u261.

  • Extract it

    <code>sudo mkdir -p /opt/moudle
    sudo tar -zxvf jdk-8u261-linux-x64.tar.gz -C /opt/moudle/
    </code>
  • Set the environment variables (a sketch after this list shows where to put them)

    <code>export JAVA_HOME=/opt/moudle/jdk1.8.0_261
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH
    </code>
  • Reload the configuration

    <code>source /etc/profile
    </code>
  • Check Java

    <code>java -version
    # The following output indicates a successful installation
    java version "1.8.0_261"
    Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
    </code>
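  • Where the export lines live is not spelled out above; a minimal sketch, assuming you want them system-wide in /etc/profile (appending to ~/.bashrc works the same way):

    <code># Append the JDK variables to /etc/profile (paths as used above)
    sudo tee -a /etc/profile > /dev/null <<'EOF'
    export JAVA_HOME=/opt/moudle/jdk1.8.0_261
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH
    EOF
    # Reload so the current shell picks up the new variables
    source /etc/profile
    </code>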

1.2 Python installation

  • Skipped here (the official docs recommend Python >= 2.6.x); a quick version check is sketched below.
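  • A minimal sketch, assuming `python` is on the PATH (datax.py is a Python 2 script, which is why a later step invokes it with python2):

    <code># Check which Python version will run datax.py
    python -V
    </code>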

1.3 Hadoop single-node pseudo-distributed installation

2. Installing DataX

  • DataX is Alibaba's offline synchronization tool for heterogeneous data sources. It provides stable, efficient data synchronization between a wide range of sources, including relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, and FTP (the plugin listing sketch after this list shows what ships in the tarball).


  • Download: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

  • Extract it

    <code>tar -zxvf datax.tar.gz -C /opt/software/
    </code>
  • Run the self-check job

    <code>cd /opt/software/datax/
    bin/datax.py job/job.json
    </code>
  • The following output indicates success (screenshot omitted).

  • The format of /opt/software/datax/job/job.json:
  1. <code class="language-python">{
  2. "content":[
  3. {
  4. "reader":{
  5. "name":"streamreader",# 流式读,根据DataX定义好的设置
  6. "parameter":{
  7. "column":[#把column里所有value读到流当中
  8. {
  9. "type":"string",
  10. "value":"DataX"
  11. },
  12. {
  13. "type":"long",
  14. "value":19890604
  15. },
  16. {
  17. "type":"date",
  18. "value":"1989-06-04 00:00:00"
  19. },
  20. {
  21. "type":"bool",
  22. "value":true
  23. },
  24. {
  25. "type":"bytes",
  26. "value":"test"
  27. }
  28. ],
  29. "sliceRecordCount":100000
  30. }
  31. },
  32. "writer":{
  33. "name":"streamwriter",# 流式写,根据DataX定义好的设置
  34. "parameter":{
  35. "encoding":"UTF-8",
  36. "print":false#打印
  37. }
  38. }
  39. }
  40. ],
  41. "setting":{
  42. "errorLimit":{# errorLimit错误限制
  43. "percentage":0.02,# 最大容忍错误限制百分比2%
  44. "record":0# 容忍错误记录调试 0
  45. },
  46. "speed":{# 控制并发数:通过byte或channel控制,这里默认通过byte控制
  47. "byte":10485760#以 sliceRecordCount乘以byte,打印数据条数占用空间
  48. }
  49. }
  50. }
  51. </code>
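  • The readers and writers available to an installation are simply directories under plugin/; a quick way to see what the tarball ships with (paths assume the install location used above):

    <code># List the bundled reader and writer plugins
    ls /opt/software/datax/plugin/reader/
    ls /opt/software/datax/plugin/writer/
    </code>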

3. Basic usage

3.1 Read data from a stream and print it to the console

  • First, look at the official JSON configuration template

    1. <code class="language-python"># 查看 streamreader --> streamwriter 模版
    2. python /opt/software/datax/bin/datax.py -r streamreader -w streamwriter
    3. # 模版如下:
    4. DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
    5. Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
    6. Please refer to the streamreader document:
    7. https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md
    8. Please refer to the streamwriter document:
    9. https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md
    10. Please save the following configuration as a json file and use
    11. python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
    12. to run the job.
    13. {
    14. "job": {
    15. "content": [
    16. {
    17. "reader": {
    18. "name": "streamreader",
    19. "parameter": {
    20. "column": [],
    21. "sliceRecordCount": ""
    22. }
    23. },
    24. "writer": {
    25. "name": "streamwriter",
    26. "parameter": {
    27. "encoding": "",
    28. "print": true
    29. }
    30. }
    31. }
    32. ],
    33. "setting": {
    34. "speed": {
    35. "channel": ""
    36. }
    37. }
    38. }
    39. }
    40. </code>
  • Write a JSON file based on the template

    1. <code class="language-python">{
    2. "job": {
    3. "content": [
    4. {
    5. "reader": {
    6. "name": "streamreader",
    7. "parameter": {
    8. "column": [
    9. {
    10. "type":"string",
    11. "value":"xujunkai, hello world!"
    12. },
    13. {
    14. "type":"string",
    15. "value":"徐俊凯, 你好!"
    16. },
    17. ],
    18. "sliceRecordCount": "10"#打印次数
    19. }
    20. },
    21. "writer": {
    22. "name": "streamwriter",
    23. "parameter": {
    24. "encoding": "utf-8", #编码方式utf-8
    25. "print": true
    26. }
    27. }
    28. }
    29. ],
    30. "setting": {
    31. "speed": {#控制并发数
    32. "channel": "2"#控制并发2次-->这里因为是打印所以会sliceRecordCount乘以channel 打印20遍。如果设置为mysql真的会进行并发
    33. }
    34. }
    35. }
    36. }
    37. </code>
  • Create a json directory (here under /root) and add the file:

    <code>mkdir json
    cd json/
    vim stream2stream.json
    # paste the JSON above into the file
    </code>
  • Run the job

    <code>/opt/software/datax/bin/datax.py ./stream2stream.json
    </code>
  • The output looks like the following (screenshot omitted); a quick way to count the printed records is sketched after this list.
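  • A minimal check, assuming the column values above: each printed record contains the first column's string, so counting its occurrences in the job output should give sliceRecordCount × channel = 20.

    <code># Count how many records streamwriter printed (expect 10 x 2 = 20)
    python /opt/software/datax/bin/datax.py ./stream2stream.json | grep -c "hello world"
    </code>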

3.2 Bulk insert from MySQL to MySQL

3.2.1 Preparation:
  • Create the database and table on both the reader and writer sides

    1. <code class="language-mysql"># 创建库
    2. create database `testdatax` character set utf8
    3. # 创建表
    4. create table user1w(
    5. id int not null auto_increment,
    6. name varchar(10) not null,
    7. score int not null,
    8. primary key(`id`))engine=InnoDB default charset=utf8;
    9. </code>
  • Write a simple stored procedure to populate the table on the reader side:

    1. <code class="language-mysql">DELIMITER //
    2. create PROCEDURE add_user(in num INT)
    3. BEGIN
    4. DECLARE rowid INT DEFAULT 0;
    5. DECLARE name CHAR(1);
    6. DECLARE score INT;
    7. WHILE rowid < num DO
    8. SET rowid = rowid + 1;
    9. set name = SUBSTRING(‘abcdefghijklmnopqrstuvwxyz‘,ROUND(1+25*RAND()),1);
    10. set score= FLOOR(40 + (RAND()*60));
    11. insert INTO user1w (name,score) VALUES (name,score);
    12. END WHILE;
    13. END //
    14. DELIMITER ;
    15. </code>
  • Insert the data (a row-count check is sketched after this list)

    <code>call add_user(10000);
    </code>
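  • A minimal sanity check, assuming the root/123 credentials used later in this article and a local reader-side MySQL:

    <code># Verify that the stored procedure produced 10000 rows
    mysql -uroot -p123 testdatax -e "SELECT COUNT(*) FROM user1w;"
    </code>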
3.2.2 Look at the MySQL-to-MySQL JSON configuration
  • Run python /opt/software/datax/bin/datax.py -r mysqlreader -w mysqlwriter; the JSON template looks like this:

    1. <code class="language-mysql">{
    2. "job": {
    3. "content": [
    4. {
    5. "reader": {
    6. "name": "mysqlreader", # 读取端,根据DataX定义好的设置
    7. "parameter": {
    8. "column": [], # 读取端需要同步的列
    9. "splitPk": "",# 数据抽取时指定字段进行数据分片
    10. "connection": [
    11. {
    12. "jdbcUrl": [], #读取端连接信息
    13. "table": []# 读取端指定的表
    14. }
    15. ],
    16. "password": "", #读取端账户
    17. "username": "", #读取端密码
    18. "where": ""# 描述筛选条件
    19. }
    20. },
    21. "writer": {
    22. "name": "mysqlwriter", #写入端,根据DataX定义好的设置
    23. "parameter": {
    24. "column": [], #写入端需要同步的列
    25. "connection": [
    26. {
    27. "jdbcUrl": "", # 写入端连接信息
    28. "table": []# 写入端指定的表
    29. }
    30. ],
    31. "password": "", #写入端密码
    32. "preSql": [], # 执行写入之前做的事情
    33. "session": [],
    34. "username": "", #写入端账户
    35. "writeMode": ""# 操作乐星
    36. }
    37. }
    38. }
    39. ],
    40. "setting": {
    41. "speed": {
    42. "channel": ""#指定channel数
    43. }
    44. }
    45. }
    46. }
    47. </code>
  • My configuration JSON:

    1. <code class="language-python">{
    2. "job": {
    3. "content": [
    4. {
    5. "reader": {
    6. "name": "mysqlreader",
    7. "parameter": {
    8. "username": "root",
    9. "password": "123"
    10. "column": ["*"],
    11. "splitPk": "id",
    12. "connection": [
    13. {
    14. "jdbcUrl": [
    15. "jdbc:mysql://读取端IP:3306/testdatax?useUnicode=true&characterEncoding=utf8"
    16. ],
    17. "table": ["user1w"]
    18. }
    19. ]
    20. }
    21. },
    22. "writer": {
    23. "name": "mysqlwriter",
    24. "parameter": {
    25. "column": ["*"],
    26. "connection": [
    27. {
    28. "jdbcUrl": "jdbc:mysql://写入端IP:3306/testdatax?useUnicode=true&characterEncoding=utf8",
    29. "table": ["user1w"]
    30. }
    31. ],
    32. "password": "123",
    33. "preSql": [
    34. "truncate user1w"
    35. ],
    36. "session": [
    37. "set session sql_mode=‘ANSI‘"
    38. ],
    39. "username": "root",
    40. "writeMode": "insert"
    41. }
    42. }
    43. }
    44. ],
    45. "setting": {
    46. "speed": {
    47. "channel": "5"
    48. }
    49. }
    50. }
    51. }
    52. </code>
  • cd into DataX's bin directory and run:

    <code>python2 datax.py /root/json/mysql2mysql.json
    </code>
  • The console prints the synchronization statistics when the job completes. For more configuration options, see the DataX project on GitHub. A row-count comparison is sketched below.
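  • A minimal way to confirm the sync, assuming the root/123 credentials from the config above and the same reader/writer host placeholders; the two counts should match.

    <code># Compare row counts on the reader and writer sides (hosts are placeholders)
    mysql -h <reader-side IP> -uroot -p123 testdatax -e "SELECT COUNT(*) FROM user1w;"
    mysql -h <writer-side IP> -uroot -p123 testdatax -e "SELECT COUNT(*) FROM user1w;"
    </code>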

3.3 Importing data from MySQL into HDFS

  <code># View the mysqlreader --> hdfswriter template
  python /opt/software/datax/bin/datax.py -r mysqlreader -w hdfswriter
  </code>
  • To be continued... (a hedged sketch of what such a job could look like follows)
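  • This part is not covered by the original article; below is a minimal sketch of a possible MySQL-to-HDFS job, reusing the table and credentials from section 3.2. The hdfswriter values (defaultFS, fileType, path, fileName, fieldDelimiter, writeMode) and the NameNode address are illustrative assumptions, not taken from this article; check the hdfswriter documentation before relying on them.

    <code># Sketch only: write a mysql-to-HDFS job file (values below are assumptions, not from this article)
    cat > /root/json/mysql2hdfs.json <<'EOF'
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": "root",
                            "password": "123",
                            "column": ["id", "name", "score"],
                            "connection": [
                                {
                                    "jdbcUrl": ["jdbc:mysql://<reader-side IP>:3306/testdatax?useUnicode=true&characterEncoding=utf8"],
                                    "table": ["user1w"]
                                }
                            ]
                        }
                    },
                    "writer": {
                        "name": "hdfswriter",
                        "parameter": {
                            "defaultFS": "hdfs://<namenode host>:9000",
                            "fileType": "text",
                            "path": "/datax/user1w",
                            "fileName": "user1w",
                            "column": [
                                {"name": "id", "type": "int"},
                                {"name": "name", "type": "string"},
                                {"name": "score", "type": "int"}
                            ],
                            "fieldDelimiter": "\t",
                            "writeMode": "append"
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": "1"
                }
            }
        }
    }
    EOF
    # The HDFS directory referenced in "path" should already exist before running the job
    python2 /opt/software/datax/bin/datax.py /root/json/mysql2hdfs.json
    </code>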
