Abstract:
In the past few years, there has been a growing need for accurate geolocation of IP
addresses, which is now a must-have feature of many Internet applications. Automated geolocation
of IP addresses has important applications, including targeted delivery of localized
content over the Internet (news, weather, advertising, restriction of localized content based on
regional policies, etc.), prevention of Internet crimes (credit card and bank fraud, identity
theft, spam, phishing, etc.), and detection and prevention of cyberattacks and
cyberterrorism. Current geolocation algorithms can be divided into several classes according to
the data used to determine the geographic location: database-based (which use a
database of mappings between Internet prefixes and their corresponding geographical locations),
pure-delay based (which take as input the round-trip delays measured from probing hosts,
called landmarks), location-delay based (which use both the geographical locations of the
landmarks and the delays measured from them), and supplementary-information based (which,
in addition to delay and geographical location, use other available information, such as DNS
parsing and geographical and demographic data).
However, network delay has not proved very reliable for geolocation in the past, because
network congestion, queuing delay, and circuitous routes make the relationship between
distance and delay non-linear. This thesis brings important advancements
to two classes of geolocation methods. The first advancement is a family of
pure-delay based algorithms built on a general class of proximity measures. When such
measures are carefully chosen to discard data that carries little information about
the geographical location of a target IP address, the resulting algorithms have improved
accuracy over the existing pure-delay based schemes. The second advancement, belonging
to the location-delay based class of algorithms, is the development of a statistical geolocation
scheme based on the application of kernel density estimation to delay measurements
amongst a set of landmarks. An estimate of the target IP location is then obtained by
maximizing the likelihood of the distances from the target to the landmarks, given the
measured delays. This is achieved by an algorithm which combines gradient ascent and
force-directed methods. We evaluate the proposed geolocation schemes against previous
methods using a measurement framework built on the PlanetLab infrastructure, comparing
the experimental geolocation error of the proposed algorithms with that of the existing
schemes. We find that the proposed algorithms are more accurate than the previously
developed ones.
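
The statistical scheme summarized above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes planar coordinates in kilometers, Gaussian kernels with hand-picked bandwidths, and synthetic training data, and it replaces the combined gradient-ascent and force-directed method with plain gradient ascent using a halving step. All names (conditional_density, log_likelihood, locate) and parameters are hypothetical.

```python
import math

def conditional_density(dist, delay, samples, h_delay=1.0, h_dist=50.0):
    """Kernel estimate of p(dist | delay) from (delay_ms, dist_km) pairs
    measured between landmarks. Bandwidths are assumed values."""
    num = den = 0.0
    for s_delay, s_dist in samples:
        w = math.exp(-0.5 * ((delay - s_delay) / h_delay) ** 2)
        num += w * math.exp(-0.5 * ((dist - s_dist) / h_dist) ** 2)
        den += w
    norm = h_dist * math.sqrt(2.0 * math.pi)
    # Tiny floor keeps the logarithm finite where the estimate underflows.
    return num / (den * norm + 1e-300) + 1e-300

def log_likelihood(point, landmarks, delays, samples):
    """Log-likelihood of the target-to-landmark distances given the delays."""
    total = 0.0
    for (lx, ly), delay in zip(landmarks, delays):
        dist = math.hypot(point[0] - lx, point[1] - ly)
        total += math.log(conditional_density(dist, delay, samples))
    return total

def locate(landmarks, delays, samples, step=100.0, max_iter=500, eps=1.0):
    """Gradient ascent with a halving step, started at the landmark centroid."""
    def ll(px, py):
        return log_likelihood((px, py), landmarks, delays, samples)

    x = sum(p[0] for p in landmarks) / len(landmarks)
    y = sum(p[1] for p in landmarks) / len(landmarks)
    best = ll(x, y)
    for _ in range(max_iter):
        if step < 1e-3:
            break
        # Central-difference estimate of the gradient.
        gx = (ll(x + eps, y) - ll(x - eps, y)) / (2.0 * eps)
        gy = (ll(x, y + eps) - ll(x, y - eps)) / (2.0 * eps)
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            break
        cx, cy = x + step * gx / norm, y + step * gy / norm
        cand = ll(cx, cy)
        if cand > best:
            x, y, best = cx, cy, cand   # accept the uphill move
        else:
            step /= 2.0                 # overshoot: shrink the step
    return x, y

# Synthetic example (assumed data): distance grows by 100 km per ms of delay.
samples = [(0.5 * k, 50.0 * k) for k in range(1, 31)]
landmarks = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0)]
target = (300.0, 400.0)
delays = [math.hypot(target[0] - lx, target[1] - ly) / 100.0
          for lx, ly in landmarks]
print(locate(landmarks, delays, samples))
```

In this toy setting the delay-to-distance relationship is exactly linear and noise-free, so the sketch only shows the mechanics of the estimator. Real measurements would need geographic coordinates, data-driven bandwidth selection, and a more robust optimizer.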