当前位置:Gxlcms > 数据库问题 > sql 数据处理时join字段慎重选择--避免出现double数据!

sql 数据处理时join字段慎重选择--避免出现double数据!

时间:2021-07-01 10:21:17 帮助过:2人阅读

count(YYYYMMDD) FROM [DM_ACCN_T1].[dbo].[FS_MS_GFK] --8080945 View Code

我的数据源sql:选取(8110909条记录)

技术分享
SELECT  [GUID]
      ,[YYYYMMDD]
      ,[BRAND]
      ,[BRAND_GROUP]
      ,[MODEL]
      ,[CHANNEL]
      ,[SALES_UNIT]
      ,[SALES_VALUE]
      ,[PC_TYPE]
      ,[DESIGN]
      ,[PROCESSOR_BRAND]
      ,[PROCESSOR]
      ,[PROCESSOR_NUMBE]
      ,[STORAGE_CAPAC]
      ,[RAM_MB]
      ,[DVD_WRITER]
      ,[SIZE_INCH]
      ,[GPU_BRAND]
      ,[GPU_MODEL]
      ,[ONBOARD_GRAPHIC]
      ,[V_RAM]
      ,[OS_PLANTFORM]
      ,[MICROSOFT_OS]
      ,[CATEGORIES]
      ,[PPC_PRICE_BRAND]
      ,[DPC_PRICE_BRAND]
      ,a.[CITY_NAME_CN]
      ,a.[CITY_NAME_EN]
      ,ISNULL(b.CITY_CODE,0)AS CITY_CODE
      ,[BU_GROUP_CODE]
      ,A.CITY_FULL_NAME_CN
      ,B.CITY_FULL_NAME
      ,Form_Factor
,CITY_GROUP
  FROM [DM_ACCN_T1].[dbo].[FS_MS_GFK] A
   LEFT JOIN  [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] B ON A.CITY_NAME_CN=B.CITY_NAME_CN
View Code 技术分享
SELECT 
      count([YYYYMMDD])
  FROM [DM_ACCN_T1].[dbo].[FS_MS_GFK] A
   LEFT JOIN  [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] B ON A.CITY_NAME_CN=B.CITY_NAME_CN
View Code

由此可见,问题出在我的事实表A join 维度表B 的时候出现问题,出现了一对多的问题。

首先,我要查一下维表CB_GEOGRAPHY对于字段CITY_NAME_CN(乡镇名称)是不是有重复,

SELECT a.CITY_NAME_CN,count(1)as num
   FROM [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] a
   --where a.CITY_NAME_CN=N‘九江‘
   group by a.CITY_NAME_CN having count(1)>1 

 技术分享

想一下很容易理解,地理维度表中,不同地区的乡镇名可能出现重名。所以特设了一个CITY_FULL_NAME的字段。那么join之后究竟是哪些字段出现了重复呢,

   SELECT b.CITY_FULL_NAME_CN,B.CITY_NAME_CN FROM [DM_ACCN_T1].[dbo].[FS_MS_GFK] B inner join 
   (SELECT a.CITY_NAME_CN,count(1)as num
   FROM [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] a
   --where a.CITY_NAME_CN=N‘九江‘
   group by a.CITY_NAME_CN having count(1)>1 )aa  ON aa.CITY_NAME_CN=B.CITY_NAME_CN

技术分享

发现这些记录数为:29964,而且全是出在‘九江‘这个地方,8080945-8110909=-29964,BINGO!

故:出现了一对多的情况,join之后出现了DOUBLE的数据,

技术分享
SELECT 
      A.CITY_FULL_NAME_CN
      ,B.CITY_FULL_NAME
      ,A.CITY_NAME_CN
      ,B.CITY_NAME_CN
  FROM [DM_ACCN_T1].[dbo].[FS_MS_GFK] A
   inner JOIN  [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] B ON A.CITY_NAME_CN=B.CITY_NAME_CN
   where b.CITY_NAME_CN=N九江 --59928

SELECT 
      A.CITY_FULL_NAME_CN
      ,B.CITY_FULL_NAME
      ,A.CITY_NAME_CN
      ,B.CITY_NAME_CN
  FROM [DM_ACCN_T1].[dbo].[FS_MS_GFK] A
   LEFT JOIN  [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] B ON A.CITY_FULL_NAME_CN=B.CITY_FULL_NAME
   where b.CITY_NAME_CN=N九江   ----29964
View Code

修改之后,LEFT JOIN  [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] B ON A.CITY_FULL_NAME_CN=B.CITY_FULL_NAME,okay!

技术分享
SELECT  [GUID]
      ,[YYYYMMDD]
      ,[BRAND]
      ,[BRAND_GROUP]
      ,[MODEL]
   ,[CHANNEL]
      ,CASE [CHANNEL]WHEN NON-ITSTHEN3CWHEN INTERNET B2CTHENNCWHEN ASSBTHENTCWHEN CSTHENTCELSETC END AS CHANNEL_TYPE_CODE
      ,[SALES_UNIT]
      ,[SALES_VALUE]
      ,[PC_TYPE]
      ,[DESIGN]
      ,[PROCESSOR_BRAND]
      ,[PROCESSOR]
      ,[PROCESSOR_NUMBE]
      ,[STORAGE_CAPAC]
      ,[RAM_MB]
      ,[DVD_WRITER]
      ,[SIZE_INCH]
      ,[GPU_BRAND]
      ,[GPU_MODEL]
      ,[ONBOARD_GRAPHIC]
      ,[V_RAM]
      ,[OS_PLANTFORM]
      ,[MICROSOFT_OS]
      ,[CATEGORIES]
      ,[PPC_PRICE_BRAND]
      ,[DPC_PRICE_BRAND]
      ,a.[CITY_NAME_CN]
      ,a.[CITY_NAME_EN]
      ,ISNULL(b.CITY_CODE,0)AS CITY_CODE
      ,[BU_GROUP_CODE]
      ,A.[CITY_FULL_NAME_CN]
      ,Form_Factor
      ,isnull(CITY_GROUP,‘‘)as CITY_GROUP
  FROM [DM_ACCN_T1].[dbo].[FS_MS_GFK] A
   LEFT JOIN  [DM_ACCN_T1].[dbo].[CB_GEOGRAPHY] B ON A.CITY_FULL_NAME_CN=B.CITY_FULL_NAME
View Code

 

sql 数据处理时join字段慎重选择--避免出现double数据!

标签:

人气教程排行