    Internet Inconsistencies with R-Programming

    An Internet Protocol (IP) address is one of several components of the Domain Name System (DNS). IP addresses are commonly written in IPv4 or IPv6 format. Internet directories contain further information about IP addresses, such as approximate geographical location, Internet Service Provider (ISP), Virtual Private Network (VPN) usage, and Autonomous System Number (ASN). If not redacted, these pieces of information can be merged into one collective research platform. This tutorial can help individuals and groups who are interested in detecting internet inconsistencies.

    Prerequisites

    To follow along, the reader needs the following:

    • A computer capable of running Linux and R.
    • A working Linux environment (Kali Linux is used in this tutorial).
    • R-Programming software.
    • Internet access.
    • A basic understanding of how DNS works.
    • Familiarity with installing R libraries and reading their documentation.
    • Some arithmetic experience.

    Goals

    One goal of this tutorial is to acknowledge internet gaps that may affect unaware individuals and groups. An additional goal is to provide probable insights into internet complexities.

    It is also important for readers to understand terms and content within scope.

    Introduction

    In this tutorial, R-Programming is used to statistically analyze data from an IPv4 address. The purpose is to gain understanding about accuracies and inaccuracies from internet activities.

    As a starting point, 45.88.197.212 will be the defined IP address throughout this tutorial.

    Let's get started.

    Linux fundamentals

    Open any Linux shell.

    If you prefer to work without root privileges, run the update with sudo:

    sudo apt update
    

    As a reminder, users with the required permissions can switch to root by entering the following line:

    sudo -i
    

    If you are already working as root, sudo is not needed:

    apt update
    

    Open a new Kali Linux window and enter the following:

    kex
    

    A window similar to the following should appear:

    Screen capture of the kex window

    R-Programming

    Enter the following line to install the Linux version of R:

    sudo apt-get install r-base r-base-dev
    

    The following screens may appear:

    Screen captures of the installation

    Alternatively, a desktop R application such as RStudio can be equally effective.

    Screen capture of RStudio

    If not already installed, the libraries used in this tutorial can be installed as follows:

    install.packages("Rwhois", dependencies = TRUE)
    install.packages("iptools", dependencies = TRUE)
    install.packages("rIP", dependencies = TRUE)
    install.packages("rattle", dependencies = TRUE)
    

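    Information about the registry responsible for the IP address can be found using the library below:

    library(Rwhois)
    # Assumed call: the ARIN server is inferred from the format of the output below.
    whois_query("45.88.197.212", server = "whois.arin.net")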
    

    Partial Output:

    index  key        val
    1      NetRange   45.80.0.0 to 45.95.255.255
    2      CIDR       45.80.0.0/12
    3      NetName    RIPE
    4      NetHandle  NET-45-80-0-0-1
    5      Parent     NET45 (NET-45-0-0-0-0)
    6      NetType    Early Registrations, Transferred to RIPE NCC
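
    The NetRange and CIDR rows in the output above describe the same block of addresses. As a minimal sketch in base R (the helper names `ipv4_to_num`, `num_to_ipv4`, and `cidr_range` are hypothetical and not part of Rwhois), the range can be derived from the CIDR notation and checked against the tutorial's IP address:

```r
# Convert a dotted-quad IPv4 string to a number.
# Doubles are used because the values exceed R's 32-bit integer range.
ipv4_to_num <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Convert a number back to dotted-quad notation.
num_to_ipv4 <- function(n) {
  octets <- (n %/% 256^(3:0)) %% 256
  paste(octets, collapse = ".")
}

# Return the first and last address of a CIDR block such as "45.80.0.0/12".
cidr_range <- function(cidr) {
  parts  <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  prefix <- as.integer(parts[2])
  size   <- 2^(32 - prefix)                 # number of addresses in the block
  first  <- (ipv4_to_num(parts[1]) %/% size) * size
  c(first = first, last = first + size - 1)
}

block <- cidr_range("45.80.0.0/12")
num_to_ipv4(block[["first"]])   # "45.80.0.0"
num_to_ipv4(block[["last"]])    # "45.95.255.255"

# Check that the tutorial's IP address lies inside the block.
ip <- ipv4_to_num("45.88.197.212")
ip >= block[["first"]] && ip <= block[["last"]]   # TRUE
```

    A /12 prefix leaves 20 host bits, so the block holds 2^20 addresses, which is why the range runs from 45.80.0.0 to 45.95.255.255.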

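    The WHOIS server has to correspond to the domain extension (for example, ".us"). If a server name is included in the query, DNS parking name servers can be displayed.

    The following code shows the name servers:

    # whois_query() is assumed to be the Rwhois function the original call belongs to.
    whois_query("asianausa.us", server = "whois.nic.us")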
    

    Partial Output:

    key          val
    Name server  ns1.dns-parking.com
    Name server  ns2.dns-parking.com

    The code shown below can confirm if this IP is valid or not:

    library(iptools)
    iptools::is_valid("45.88.197.212")
    

    Output:

    [1] TRUE
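
    To show what a validity check of this kind involves, here is a minimal base R sketch. The function `is_valid_ipv4` is hypothetical and is not how iptools is implemented; it only tests that the string has four dot-separated numbers between 0 and 255, and the library may treat edge cases such as leading zeros differently.

```r
# Return TRUE when `ip` is a syntactically valid dotted-quad IPv4 address.
is_valid_ipv4 <- function(ip) {
  # Four groups of one to three digits, separated by dots.
  if (!grepl("^[0-9]{1,3}(\\.[0-9]{1,3}){3}$", ip)) {
    return(FALSE)
  }
  octets <- as.integer(strsplit(ip, ".", fixed = TRUE)[[1]])
  all(octets >= 0 & octets <= 255)
}

is_valid_ipv4("45.88.197.212")   # TRUE
is_valid_ipv4("45.88.197.256")   # FALSE: last octet exceeds 255
is_valid_ipv4("45.88.197")       # FALSE: only three octets
```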
    

    To check whether the IP address is using a DNS proxy, use the following command:

    library(rIP)
    proxycheck("45.88.197.212", api_key = proxycheck_api_key())
    

    For an IP address that is not behind a proxy, the result appears as shown below:

    Output:

    [1] "no"
    

    An IP address can be categorized under multiple geographical regions. The next step showcases basic statistics that can be derived from an IPv4 address.

    Basic statistics

    The geographical location of an IP address can be described by many statistical data models. Determining the correct geographical location with high probability can be tough, as various DNS factors have to be considered.

    For example, the IP address 45.88.197.212 is associated with Lithuania, Germany, Cyprus, the Netherlands, and Amsterdam.

    Factors can include:

    • DNS variables found previously in this tutorial.
    • Directories.
    • Privacy redactions.

    A few helpful directories are listed in the table below:

    Directory Name  Information
    RIPE            Réseaux IP Européens (European IP Networks) serves Europe.
    NIC             Server directory for extensions.
    ARIN            American Registry for Internet Numbers serves North America and portions of the Caribbean.
    IANA            Internet Assigned Numbers Authority provides overall directory and registrar information.
    CIRA            Canadian Internet Registration Authority serves Canada.

    Classifying this IP address under a single country is complex. Hostinger International Limited (AS47583) is the hosting provider responsible for the IP addresses from 45.88.197.0 to 45.88.197.255.

    A reverse IP lookup on 45.88.197.212 yields five possible geographical locations:

    • Lithuania (Li)
    • Cyprus (Cyp)
    • Germany (De)
    • Netherlands (Nl)
    • Amsterdam (Am)

    Rattle can generate data models, and a decision tree model can provide a logical breakdown. A manually built data frame for the IP address is shown below:

    Screen capture of the data frame

    Typically, a decision tree treats the branch with the highest value as the optimal choice.

    In this scenario, the locations categorized as less optimal are analyzed as well. Amsterdam, the Netherlands, and Cyprus appeared as the top three choices. Lithuania and Germany seemed to be less optimal.

    Screen capture of the decision tree

    It is possible to evaluate variable importance from a random forest model. Variable importance is shown in the image below:

    Screen capture of the variable importance plot

    With the highest score of the five locations, Lithuania showed the most links to the IP address. Germany also showed some correlation. The Gini-based analysis gave Lithuania the highest variable importance, with a value of 3087.48.
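
    For readers unfamiliar with the Gini measure, the base R sketch below shows how Gini impurity and the impurity decrease of a single split are computed. The counts are hypothetical, invented for illustration only, and are not the tutorial's data frame; the function names `gini_impurity` and `gini_decrease` are also assumptions. A random forest's Gini-based importance accumulates decreases of this kind over the splits made on each variable.

```r
# Gini impurity of a vector of class counts: 1 - sum(p_k^2).
gini_impurity <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# HYPOTHETICAL counts: how many lookup sources reported each location.
# These numbers are invented and are not taken from the tutorial.
reports <- c(Li = 6, Cyp = 1, De = 2, Nl = 1, Am = 0)

gini_impurity(reports)             # mixed labels give an impurity above 0
gini_impurity(c(Li = 10, De = 0))  # a pure node has impurity 0

# Impurity decrease from splitting a parent node into two children,
# weighting each child by its share of the observations.
gini_decrease <- function(parent, left, right) {
  n <- sum(parent)
  gini_impurity(parent) -
    (sum(left) / n) * gini_impurity(left) -
    (sum(right) / n) * gini_impurity(right)
}

# HYPOTHETICAL split that isolates the Lithuania reports from the rest.
left  <- c(Li = 6, Cyp = 0, De = 0, Nl = 0, Am = 0)
right <- reports - left
gini_decrease(reports, left, right)
```

    A larger decrease means the split separates the locations more cleanly, which is the sense in which a variable with a high Gini-based score is considered important.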

    Linux reverse IP lookup

    To verify this result, the IP address can be looked up from the Linux shell:

    sudo curl http://ipinfo.io/45.88.197.212
    

    Output:

    {
      "ip": "45.88.197.212",
      "city": "Kaunas",
      "region": "Kaunas",
      "country": "LT",
      "loc": "54.9027,23.9096",
      "org": "AS47583 Hostinger International Limited",
      "postal": "44001",
      "timezone": "Europe/Vilnius",
      "readme": "https://ipinfo.io/missingauth"
    }
    

    Screen capture of the ipinfo.io lookup in Linux
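
    The fields of the ipinfo.io response shown above can also be read in R. The sketch below uses only base R and a hypothetical helper, `json_field`, that handles flat JSON with string values; for anything more complex, a JSON library such as jsonlite (`jsonlite::fromJSON`) would be the more robust choice.

```r
# The response shown above, stored as text.
response <- '{
  "ip": "45.88.197.212",
  "city": "Kaunas",
  "region": "Kaunas",
  "country": "LT",
  "loc": "54.9027,23.9096",
  "org": "AS47583 Hostinger International Limited",
  "postal": "44001",
  "timezone": "Europe/Vilnius",
  "readme": "https://ipinfo.io/missingauth"
}'

# Extract the string value of one top-level key with a regular expression.
# Returns NA when the key is not present.
json_field <- function(json, key) {
  pattern <- paste0('"', key, '"[[:space:]]*:[[:space:]]*"([^"]*)"')
  hit <- regmatches(json, regexec(pattern, json))[[1]]
  if (length(hit) < 2) NA_character_ else hit[2]
}

country <- json_field(response, "country")   # "LT"
org     <- json_field(response, "org")
asn     <- sub(" .*$", "", org)              # "AS47583"

# Split "latitude,longitude" into two named numbers.
coords <- as.numeric(strsplit(json_field(response, "loc"), ",")[[1]])
names(coords) <- c("lat", "lon")
coords
```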

    The curl command can also list the possible domain names on an IP address. The command below performs a reverse IP lookup.

    sudo curl https://host.io/asianausa.us
    

    Partial Output:

    grh-interviews.online
    recruits-agility.com
    careers-mfc.work
    careers-massiveinsights.work
    grandrivershospital.com
    mindfieldconsulting.work
    careers-mconsulting.work
    grandrivershosp.ca
    interviews-sobeys.com
    interviews-massiveinsights.digital
    morgeesmodcon.com
    

    Did you notice that the domain names listed above belong to companies registered with ARIN and CIRA, without any connection to RIPE?

    This is an internet inconsistency: an IP address held in a European country would not usually host domain names of North American companies.
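
    One simple way to summarize the domain list above is to tabulate its top-level domains. The base R sketch below is only an illustration; the two-letter rule for spotting country-code extensions is a simplification, and the variable names are assumptions.

```r
# Domain names copied from the partial output above.
domains <- c(
  "grh-interviews.online", "recruits-agility.com", "careers-mfc.work",
  "careers-massiveinsights.work", "grandrivershospital.com",
  "mindfieldconsulting.work", "careers-mconsulting.work",
  "grandrivershosp.ca", "interviews-sobeys.com",
  "interviews-massiveinsights.digital", "morgeesmodcon.com"
)

# Keep everything after the last dot as the top-level domain.
tld <- sub("^.*\\.", "", domains)

# Count how often each top-level domain occurs, most frequent first.
tld_counts <- sort(table(tld), decreasing = TRUE)
tld_counts

# Country-code extensions have exactly two letters; only ".ca" qualifies here.
domains[nchar(tld) == 2]   # "grandrivershosp.ca"
```

    The single country-code extension in the list is ".ca", which falls under CIRA rather than a European directory.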

    Code can help identify internet data as either accurate or inaccurate. A statistical coding approach can display a web of DNS relationships. Online identities can be revealed with internet directories and IP lookups.

    Takeaways

    • Statistics can reveal internet inconsistencies.
    • Advanced data models can provide further DNS relationships.
    • Internet registrars play an important role in allocating IP data.

    Happy coding!

    Peer Review Contributions by: Srishilesh P S

    Published on: Mar 26, 2021
    Updated on: Jul 15, 2024