在电子商务网站中,假如我们需要分析用户的地区的分布的时候,虽然一般的网站都有记录用户所在省份和地区的字段,但是一般的网站都不会去核实用户的这些信息,因此数
据的准确性就有比较大的偏差,甚至很多用户都不填写这个信息(如果不是必填项的话),那么这个时候用户注册时候的IP地址就显得可靠多了,但是要把IP地址转换成地区并
非易事,倒不是说这个技术很难实现,是因为IP分配规则太乱了,不象邮政编码或者区号那么定的那么死,所以这个就成为了一项比较浩大的工程。IP地址库有商业的可以购买
,也有免费的可以使用。我就是使用的广大网友经常使用的QQWry.Data 这个东东,我先叫人家把这里面的数据整理成我定好的格式(这是一项比较大的工程,3个人弄大概弄了2
天),然后导入到数据仓库中我的地区维度表。
数据仓库中的地区维度的表结构如下:
GeographyKey int
Province varchar(50)
City varchar(50)
IPSections text
我的源系统中的用户的表结构如下:
CustomerID varchar(50)
Account nvarchar(50)
RegisterTime datetime
IPAddress varchar(50)
UpdateTime datetime
数据仓库中的用户的表结构如下:
CustomerKey int
GeographyKey int
RegisterDateKey int
CustomerCodeAlternateKey varchar(50)
Account nvarchar(50)
IPAddress varchar(50)
RegisterTime datetime
UpdateTime datetime
接下来还是使用我们的好帮手SSIS。
其中关键在于中间的那个脚本组件,里面包含一个类IPCompare
![]()
Code
Imports System
Imports System.Net
Public Class IPCompare
Shared Function AreEqual(ByVal IPAddr1 As String, ByVal IPAddr2 As String) As Boolean
Return IPAddressToLongBackwards(IPAddr1) = IPAddressToLongBackwards(IPAddr2)
End Function
Shared Function IsGreater(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) > IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function IsLess(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) < IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function IsGreaterOrEqual(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) >= IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function IsLessOrEqual(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) <= IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function LongToIPAddress(ByVal IPAddr As Long) As String
Return New System.Net.IPAddress(IPAddr).ToString()
End Function
Shared Function IPAddressToLongBackwards(ByVal IPAddr As String) As Long
Dim i As Integer
Dim IpByte As String() = IPAddr.Split(New Char() {"."c})
Dim nUpperBound As Integer = IpByte.GetUpperBound(0)
If (nUpperBound <> 3) Then
IpByte = New String(4 - 1) {}
i = 1
Do While (i <= (3 - nUpperBound))
IpByte((nUpperBound + i)) = "0"
i += 1
Loop
End If
Dim TempByte4 As Byte() = New Byte(4 - 1) {}
i = 0
Do While (i <= 3)
If IPCompare.IsNumeric(IpByte(i)) Then
TempByte4((3 - i)) = CByte((Convert.ToInt32(IpByte(i)) And &HFF))
End If
i += 1
Loop
Return BitConverter.ToUInt32(TempByte4, 0)
End Function
Private Shared Function IsNumeric(ByVal str As String) As Boolean
If (System.Text.RegularExpressions.Regex.IsMatch(str, "^-?\d+$")) Then
Return True
End If
Return False
End Function
End Class
脚本组件里面的代码
![]()
Code
1
' Microsoft SQL Server Integration Services user script component
2
' This is your new script component in Microsoft Visual Basic .NET
3
' ScriptMain is the entrypoint class for script components
4![]()
5
Imports System
6
Imports System.Xml
7
Imports System.Net
8
Imports System.Data
9
Imports System.Data.SqlClient
10
Imports System.Math
11
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
12
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
13![]()
14
Public Class ScriptMain
15
Inherits UserComponent
16![]()
17
Private dataSet As DataSet = New DataSet()
18![]()
19
Public Overrides Sub 输入0_ProcessInputRow(ByVal Row As 输入0Buffer)
20![]()
21
Dim GeographyKey As Int32 = 0
22
Dim ipcurrent As String = String.Empty
23![]()
24
Try
25
If (Row.IPAddress_IsNull()) Then
26
Row.GeographyKey = -1
27
Return
28
End If
29![]()
30
ipcurrent = Row.IPAddress
31![]()
32
If (ipcurrent.Length = 0) Then
33
Row.GeographyKey = -1
34
Return
35
End If
36![]()
37
Dim i As Integer
38
For i = 0 To dataSet.Tables("DimGeography").Rows.Count - 1
39
Dim dtRow As DataRow = dataSet.Tables("DimGeography").Rows(i)
40
Dim ipsections As String()
41
ipsections = dtRow("IPSections").ToString().Split(CChar(";"))
42![]()
43
Dim ipsection As String
44
For Each ipsection In ipsections
45
If (IpBetween(ipcurrent, ipsection)) Then
46
GeographyKey = Convert.ToInt32(dtRow("GeographyKey").ToString())
47
Exit For
48
End If
49
Next
50
If (GeographyKey <> 0) Then
51
Row.GeographyKey = GeographyKey
52
Return
53
End If
54
Next
55
GeographyKey = -1
56
Row.GeographyKey = GeographyKey
57
Catch
58
GeographyKey = -1
59
Row.GeographyKey = GeographyKey
60
End Try
61![]()
62
End Sub
63![]()
64
Public Overrides Sub PreExecute()
65
MyBase.PreExecute()
66![]()
67
Dim conn As OleDb.OleDbConnection
68
Dim adapter As OleDb.OleDbDataAdapter
69
Dim sql As String = "select GeographyKey,IPSections from DimGeography"
70![]()
71
conn = New OleDb.OleDbConnection(Connections.DataCenterDW.ConnectionString + ";pwd=work8520")
72![]()
73
adapter = New OleDb.OleDbDataAdapter(sql, conn)
74![]()
75
Try
76
conn.Open()
77
adapter.Fill(dataSet, "DimGeography")
78
Catch ex As Exception
79![]()
80
Finally
81
conn.Close()
82
End Try
83![]()
84
End Sub
85![]()
86
Private Function IpBetween(ByVal targetip As String, ByVal ipsection As String) As Boolean
87
Dim ipstart As String = ipsection.Split(CChar("-"))(0)
88
Dim ipend As String = ipsection.Split(CChar("-"))(1)
89![]()
90
If (IPCompare.IsGreaterOrEqual(targetip, ipstart) And IPCompare.IsLessOrEqual(targetip, ipend)) Then
91
Return True
92
End If
93![]()
94
Return False
95
End Function
96![]()
97
End Class
98![]()
99![]()
这样就OK了。
附件里面我提供了这个包文件和地区维度表数据的下载,希望能够给朋友们提供一点微薄的帮助。
附件下载