
Importing SQL Server 2012 Tables into HIVE / HBASE with Sqoop on the Azure Cloud Platform


  • Run the following T-SQL to create Table1.

    CREATE TABLE [dbo].[Table1](

    [ID] [int] NOT NULL,

    [FName] [nvarchar](50) NOT NULL,

    [LName] [nvarchar](50) NOT NULL,

CONSTRAINT [PK_Table_4] PRIMARY KEY CLUSTERED

(

    [ID] ASC

)

) ON [PRIMARY]

GO

  • Run the following T-SQL to populate Table1 with 4 rows.

    INSERT INTO [dbo].[Table1] VALUES (1,'Jhon','Doe'), (2,'Harry','Hoe'), (3,'Carla','Coe'), (4,'Jackie','Joe');

    GO

  • Finally, run the following T-SQL to make sure the table is populated with the sample data. You should see the four rows shown below.

    SELECT * from [dbo].[Table1]
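    Based on the INSERT statement above, the result set should look like this:

    ID  FName   LName
    --  ------  -----
    1   Jhon    Doe
    2   Harry   Hoe
    3   Carla   Coe
    4   Jackie  Joe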


    Now let's follow the steps below to import the rows in Table1 to the HDInsight cluster.

    1. Log in to your HDInsight cluster head node via Remote Desktop (RDP) and double-click the 'Hadoop Command Line' icon on the desktop to open the Hadoop Command Line. RDP access is turned off by default, but you can follow the steps in this blog to enable RDP and then RDP to the head node of your HDInsight cluster.
    2. In the Hadoop Command Line, navigate to the "C:\apps\dist\sqoop-1.4.3.1.3.1.0-06\bin" folder.

      Note: Please verify the path for the Sqoop bin folder in your environment. It may slightly vary from version to version.
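      For example, you can confirm the exact Sqoop folder name from the Hadoop Command Line with a quick directory listing (a sketch, assuming the default C:\apps\dist layout):

      dir C:\apps\dist\sqoop*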

    3. Run the following Sqoop command to import all the rows of table "Table1" from the Windows Azure SQL Database "mfarooqSQLDB" to the HDInsight cluster.

      sqoop.cmd import --connect "jdbc:sqlserver://<SQLDatabaseServerName>.database.windows.net:1433;username=<SQLDatabaseUsername>@<SQLDatabaseServerName>;password=<SQLDatabasePassword>;database=<SQLDatabaseDatabaseName>" --table Table1 --target-dir /user/hdp/SqoopImportTable1

      Once the command executes successfully, the Hadoop Command Line window prints the MapReduce job progress and finishes with a summary such as "Retrieved 4 records."


    4. There are quite a number of tools available to upload/download and view data in WASB. Let's use the Azure Storage Explorer tool. You need to install the tool on your workstation and configure it for your cluster's storage account. Once that is done, open the tool and locate the /user/hdp/SqoopImportTable1 folder. You should see four files, indicating that four map tasks were used. You can select a file and click the 'View' button to see the actual text data. If you prefer a single output file, see the sketch below.
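    By default Sqoop launched four parallel map tasks here, one per output file. If a single output file is preferred, the standard -m (or --num-mappers) option can be added; a sketch using the same placeholders as above:

      sqoop.cmd import --connect "jdbc:sqlserver://<SQLDatabaseServerName>.database.windows.net:1433;username=<SQLDatabaseUsername>@<SQLDatabaseServerName>;password=<SQLDatabasePassword>;database=<SQLDatabaseDatabaseName>" --table Table1 --target-dir /user/hdp/SqoopImportTable1 -m 1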


    Now let's export the same rows from the HDInsight cluster back to SQL Server. Use a different table with the same schema as 'Table1'; otherwise you would get a primary key violation error, since the rows already exist in 'Table1'.

    1. Create an empty table 'Table2' with the same schema as 'Table1'.

      CREATE TABLE [dbo].[Table2](

          [ID] [int] NOT NULL,

          [FName] [nvarchar](50) NOT NULL,

          [LName] [nvarchar](50) NOT NULL,

      CONSTRAINT [PK_Table_2] PRIMARY KEY CLUSTERED

      (

          [ID] ASC

      )

      ) ON [PRIMARY]

      GO

    2. Run the following Sqoop command from the Hadoop Command Line.

    sqoop.cmd export --connect "jdbc:sqlserver://<SQLDatabaseServerName>.database.windows.net:1433;username=<SQLDatabaseUsername>@<SQLDatabaseServerName>;password=<SQLDatabasePassword>;database=<SQLDatabaseDatabaseName>" --table Table2 --export-dir /user/hdp/SqoopImportTable1 --input-fields-terminated-by ","
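    After the export completes, you can verify the result on the SQL side; this query against 'Table2' should return the same four rows that were originally inserted into 'Table1':

    SELECT * from [dbo].[Table2]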

    More sample Sqoop commands:

    Import from a SQL Server on a Windows Azure VM:

    sqoop.cmd import --connect "jdbc:sqlserver://<WindowsAzureVMServerName>.cloudapp.net:1433;username=<SQLServerUserName>;password=<SQLServerPassword>;database=<SQLServerDatabaseName>" --table Table_1 --target-dir /user/hdp/SqoopImportTable

    Export to a SQL Server on a Windows Azure VM:

    sqoop.cmd export --connect "jdbc:sqlserver://<WindowsAzureVMServerName>.cloudapp.net:1433;username=<SQLServerUserName>;password=<SQLServerPassword>;database=<SQLServerDatabaseName>" --table Table_2 --export-dir /user/hdp/SqoopImportTable2 --input-fields-terminated-by ","

    Importing to HIVE from Windows Azure SQL Database:

    C:\apps\dist\sqoop-1.4.2\bin>sqoop.cmd import --connect "jdbc:sqlserver://<WindowsAzureVMServerName>.cloudapp.net:1433;username=<SQLServerUserName>;password=<SQLServerPassword>;database=<SQLServerDatabaseName>" --table Table1 --hive-import

    Note: This will store the files under the hive/warehouse/<TableName> folder in HDFS (for example, hive/warehouse/table1/part-m-00000).
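    As a quick check (a sketch; the Hive CLI location may vary by HDInsight version), you can query the imported table from the head node's Hadoop Command Line:

    hive -e "SELECT * FROM table1;"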

    Run a Sqoop job remotely using the HDInsight SDK PowerShell cmdlets

    To use the HDInsight PowerShell tools you need to install the Windows Azure PowerShell tools first and then install the HDInsight PowerShell tools. You also need to prepare your workstation to use the HDInsight SDK. Please follow the detailed steps in this earlier blog post to install the tools and prepare your workstation.

    Once you have installed and configured the Windows Azure PowerShell tools and the HDInsight SDK, running a Sqoop job is very easy. Please follow the steps below to import all the rows of table "Table2" from the Windows Azure SQL Database "mfarooqSQLDB" to the HDInsight cluster.

    1. Open the Windows Azure PowerShell console on the workstation and run the following cmdlets one at a time.

      Note: You can also use Windows PowerShell ISE to type the code and run it all at once. PowerShell ISE makes editing easy, and you can open the tool from "C:\Windows\System32\WindowsPowerShell\v1.0\powershell_ise.exe".

    2. Set the variables for your Windows Azure Subscription name and the HDInsight cluster name.

      $subscriptionName = "<WindowsAzureSubscriptionName>"

      $clusterName = "<HDInsightClusterName>"

      Select-AzureSubscription $subscriptionName

      Use-AzureHDInsightCluster $clusterName -Subscription $subscriptionName

    3. Define the Sqoop job that we want to run. In this exercise we will import all the rows of table "Table2" that we created earlier in Windows Azure SQL Database.

      $sqoop = New-AzureHDInsightSqoopJobDefinition -Command "import --connect jdbc:sqlserver://<SQLDatabaseServerName>.database.windows.net:1433;username=<SQLDatabaseUsername>@<SQLDatabaseServerName>;password=<SQLDatabasePassword>;database=<SQLDatabaseDatabaseName> --table Table2 --target-dir /user/hdp/SqoopImportTable8"

    4. Run the Sqoop job that we just defined.

      $sqoopJob = Start-AzureHDInsightJob -Subscription $subscriptionName -Cluster $clusterName -JobDefinition $sqoop

    5. Run the following to wait for the completion or failure of the HDInsight job and show its progress.

      Wait-AzureHDInsightJob -Subscription $subscriptionName -WaitTimeoutInSeconds 3600 -Job $sqoopJob

    6. Run the following to retrieve the log output for a job from the storage account associated with a specified cluster.

      Get-AzureHDInsightJobOutput -Cluster $clusterName -Subscription $subscriptionName -StandardError -JobId $sqoopJob.JobId

    If the Sqoop job completes successfully, the Windows Azure PowerShell window shows the job details with the job state reported as completed.
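    The export direction works the same way from PowerShell. Here is a minimal sketch, assuming the same placeholders as above and an empty 'Table2' (the $sqoopExport and $sqoopExportJob variable names are illustrative):

      $sqoopExport = New-AzureHDInsightSqoopJobDefinition -Command "export --connect jdbc:sqlserver://<SQLDatabaseServerName>.database.windows.net:1433;username=<SQLDatabaseUsername>@<SQLDatabaseServerName>;password=<SQLDatabasePassword>;database=<SQLDatabaseDatabaseName> --table Table2 --export-dir /user/hdp/SqoopImportTable1 --input-fields-terminated-by ,"

      $sqoopExportJob = Start-AzureHDInsightJob -Subscription $subscriptionName -Cluster $clusterName -JobDefinition $sqoopExport

      Wait-AzureHDInsightJob -Subscription $subscriptionName -WaitTimeoutInSeconds 3600 -Job $sqoopExportJob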

