Security

Securing Data with H2O

H2O is an open-source machine learning library with interfaces for several programming languages, including Python. It can be used to analyze descriptive and predictive trends, much like other machine learning libraries in Python. A less obvious advantage is its built-in security options, which many Python libraries do not offer.

Read on to find out how security can be implemented throughout the data analysis cycle when using H2O with Python.

Introduction

In data analytics, security must be considered both internally and externally.

Surrounding a network with cybersecurity protection improves stability, but data inside the analysis program can still remain vulnerable. Cyberthreats and attacks are unpredictable; this tutorial shows several precautions to consider before, during, and after data access.

This tutorial introduces the H2O Python module along with some Bash commands. Note that all commands in this tutorial are run as the root user.

This tutorial follows the data analysis cycle and considers security at three of its stages: the H2O instance, the data file, and the database.

Prerequisites
Installation
Instance Internal Security
Data File Security
Database Security

Prerequisites

A Linux operating system (OS) or Linux container with root access (this tutorial uses Kali Linux).
Basic Bash knowledge.
Python (preferably 3.3+, tutorial uses 3.9).
Prior knowledge of how to install H2O, create H2O instances, and create security files.
Installed via Bash: H2O, Java, keytool, and a Linux text editor.
Python installation of H2O.
Some Java coding knowledge.
Internet access and a web browser.

Installation

This section emphasizes that Bash is appropriate for this tutorial.

Not everyone has a Linux OS. One solution to this compatibility issue is to install a Linux container; options include Ubuntu LTS and Kali Linux.

Both options work for this tutorial, but each is aimed at a different purpose: Kali Linux is known for ethical penetration testing, while Ubuntu LTS is a general-purpose distribution.

Before any of these security mechanisms can work, authentication files must be generated. Refer to the instructions on the H2O documentation website to generate the keystore and truststore files.
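As a rough illustration of what those instructions produce, the Python sketch below builds a keytool command for a self-signed keystore, renders a security.properties file, and builds the command that starts H2O with it. The property names (h2o_ssl_protocol, h2o_ssl_jks_internal, h2o_ssl_jks_password, h2o_ssl_jts, h2o_ssl_jts_password), the -internal_security_conf flag, the file names, and reusing one keystore as the truststore are assumptions based on our reading of H2O's internal security documentation; check them against your H2O version. The password is a placeholder.

```python
import shlex


def keytool_command(keystore, password, alias="h2o-internal"):
    """Build a keytool command that creates an RSA key pair in a JKS keystore."""
    return [
        "keytool", "-genkeypair",
        "-keystore", keystore,
        "-storetype", "JKS",
        "-alias", alias,
        "-keyalg", "RSA",
        "-keysize", "2048",
        "-validity", "365",
        "-dname", "CN=" + alias,  # avoids the interactive name prompts
        "-storepass", password,
        "-keypass", password,
    ]


def render_security_properties(keystore, password):
    """Render the internal-security config (property names are assumed, see lead-in)."""
    return (
        "h2o_ssl_protocol=TLSv1.2\n"
        f"h2o_ssl_jks_internal={keystore}\n"
        f"h2o_ssl_jks_password={password}\n"
        # Assumption: the self-signed keystore doubles as the truststore.
        f"h2o_ssl_jts={keystore}\n"
        f"h2o_ssl_jts_password={password}\n"
    )


def start_command(conf_path, jar="h2o.jar"):
    """Build the java command that starts H2O with the config (flag name assumed)."""
    return ["java", "-jar", jar, "-internal_security_conf", conf_path]


if __name__ == "__main__":
    ks, pw = "h2o-internal.jks", "changeit"  # placeholders only; never hard-code real secrets
    print(shlex.join(keytool_command(ks, pw)))
    print(render_security_properties(ks, pw))
    print(shlex.join(start_command("security.properties")))
```

Printing the commands instead of running them keeps the sketch safe to try; save the properties text as security.properties and run the two commands in Bash once the values look right.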

Instance internal security

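One noticeable sign that SSL security between nodes is active is in the cluster information printed by h2o.init(): the H2O_internal_security value changes from False to True, as shown below.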

Connection without internal security: false


After following the Standalone/AWS and Java instructions, H2O prints a message containing the URL of the running instance; pass this URL to h2o.init() to connect.
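The sketch below pulls that URL out of the startup output and passes it to h2o.init() so Python attaches to the secured instance instead of starting a new one. It assumes the output contains a line of the form "Open H2O Flow in your web browser: <url>", which is what recent H2O releases print; adjust the pattern if your version differs. The helper names are hypothetical.

```python
import re

# Assumed log format, e.g. "Open H2O Flow in your web browser: http://10.0.0.5:54321"
FLOW_URL = re.compile(r"Open H2O Flow in your web browser:\s*(https?://\S+)")


def extract_h2o_url(log_text):
    """Return the first H2O Flow URL found in the startup output, or None."""
    match = FLOW_URL.search(log_text)
    return match.group(1) if match else None


def connect(log_text):
    """Attach to the already running instance (requires the h2o package)."""
    import h2o  # imported lazily so extract_h2o_url works without h2o installed

    url = extract_h2o_url(log_text)
    if url is None:
        raise ValueError("no H2O URL found in the startup output")
    h2o.init(url=url, start_h2o=False)
```

Passing start_h2o=False makes h2o.init() fail loudly if the secured instance is not reachable, rather than silently launching an unsecured local one.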

Connection with internal security: true


Data access should not be universal. If a database is used, permissions can also restrict which users can access specific data, both at login and during an active session.
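To make the idea concrete, here is a minimal deny-by-default permission check in Python. The roles and table names are hypothetical; a real database would normally enforce the same rules itself (for example with SQL GRANT statements) rather than in application code.

```python
# Hypothetical roles mapped to the tables each one may read.
PERMISSIONS = {
    "analyst": {"sales"},
    "admin": {"sales", "customers"},
}


def can_read(role, table):
    """Deny by default: unknown roles and unlisted tables get no access."""
    return table in PERMISSIONS.get(role, set())
```

Denying by default means a misspelled or newly created role gets no data until someone grants it explicitly.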

Data file security

File permissions are one of several ways to control access to a data file.

Read, write, and execute are the standard permission types that can be set. Password protection is another safeguard and can be granted based on user roles. The code below is one of many ways to enforce permissions.

The code below sets the file's mode to group-read only (mode 0o040), which removes all other permissions, including the owner's:

import os, stat
# Replace all permission bits with group-read only (mode 0o040)
os.chmod("./filename.csv", stat.S_IRGRP)
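Because os.chmod() replaces the whole mode, setting only S_IRGRP also locks out the file's owner. A more common pattern, sketched below under the assumption that the owner still needs to edit the file, combines flags so the owner can read and write, the group can only read, and everyone else has no access (mode 0o640). The helper name is hypothetical, and the demo uses a throwaway temporary file.

```python
import os
import stat
import tempfile


def restrict_to_owner_and_group(path):
    """Owner may read and write, group may only read, others get nothing (0o640)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
    # Read the mode back so the caller can confirm what was applied.
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        tmp.write(b"id,value\n1,42\n")
    print(oct(restrict_to_owner_and_group(tmp.name)))  # 0o640
    os.remove(tmp.name)
```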

Another option is to move the file to a folder without general access. The command below moves filename.csv to an offline folder so that it is no longer accessible online:

mv ./filename.csv ./offline/
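The same step can be done from Python. The sketch below, with a hypothetical helper name, creates the destination folder with owner-only access (mode 0o700) before moving the file. Restricting the folder's permissions is our assumption about how "without general access" is enforced; it controls local access, and whether the folder is reachable online still depends on how the machine serves files.

```python
import os
import shutil
import tempfile
from pathlib import Path


def move_to_private_folder(src, folder):
    """Create folder with owner-only access (0o700) and move src into it."""
    folder = Path(folder)
    folder.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(folder, 0o700)  # mkdir's mode is filtered by the umask, so set it explicitly
    return Path(shutil.move(str(src), str(folder / Path(src).name)))


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())  # throwaway directory for the demo
    src = base / "filename.csv"
    src.write_text("id,value\n1,42\n")
    print(move_to_private_folder(src, base / "offline"))
    shutil.rmtree(base)
```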

How much security a dataset warrants can be quantified from the dataset's value, taking into account recovery costs, potential losses, and technology costs.
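One common way to put numbers on this, which the tutorial itself does not prescribe, is annualized loss expectancy (ALE): the single loss expectancy (asset value times the fraction of value lost per incident) multiplied by the expected number of incidents per year. The sketch below computes ALE and checks whether a security control saves more per year than it costs; all figures are hypothetical.

```python
def single_loss_expectancy(asset_value, exposure_factor):
    """SLE: expected cost of one incident; exposure_factor is the fraction of value lost."""
    return asset_value * exposure_factor


def annualized_loss_expectancy(asset_value, exposure_factor, annual_rate):
    """ALE = SLE x ARO, where ARO is the expected number of incidents per year."""
    return single_loss_expectancy(asset_value, exposure_factor) * annual_rate


def control_is_worthwhile(ale_before, ale_after, annual_control_cost):
    """A control pays off if the yearly loss it prevents exceeds its yearly cost."""
    return (ale_before - ale_after) > annual_control_cost
```

For example, a dataset valued at 100,000 with an exposure factor of 0.25 and an expected 0.5 incidents per year has an ALE of 12,500; a control costing 4,000 a year that lowers the ALE to 2,500 prevents 10,000 in expected losses and is worth adopting.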

Database security

Most experts accept that complete (100%) security is never possible, but some simple preventive measures are available before resorting to taking services offline.

SQL, in its many dialects, is a widely used database language. SQL injection is a cyberattack technique for gaining access to, and control of, the tables inside databases. One simple mitigation is to limit user input.
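Limiting user input works best alongside parameterized queries, the standard defense against SQL injection, in which the database driver passes input as data rather than splicing it into the SQL text. The sketch below uses Python's built-in sqlite3 module with a hypothetical customers table; it is a general illustration, not part of the H2O setup.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_customer(conn, name):
    """Look up a customer by name; the ? placeholder keeps name out of the SQL text."""
    cur = conn.execute("SELECT id, name FROM customers WHERE name = ?", (name,))
    return cur.fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_customer(conn, "alice"))         # [(1, 'alice')]
    print(find_customer(conn, "x' OR '1'='1"))  # [] because the payload is matched as a literal name
    conn.close()
```

Had the name been concatenated into the query string, the second call would have returned every row.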

Before a data file is read into an H2O instance, it has very likely been exported from a database.

The following if-then statement, for example, limits user input by detecting frequently used SQL keywords.

If a Java application connects to the database that supplies data to H2O, the code below can detect and reject basic SQL injection attempts:

import java.util.Scanner;

public class Check
{
    public static void main(String[] args)
    {
        Scanner scnr = new Scanner(System.in);
        System.out.println("Please enter input: ");
        // Lowercase the input so "DROP", "Drop", and "drop" are all caught.
        String input = scnr.nextLine().toLowerCase();
        if (!input.contains("alter") && !input.contains("drop")
                && !input.contains("insert") && !input.contains("select")) {
            System.out.println("Valid input. Thank you.");
            // Continue with processing data.
        } else {
            System.out.println("Invalid input. Try again.");
            // Discontinue procedure to process data.
        }
        scnr.close();
    }
}

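This Java application shows how input can be screened in a real SQL injection scenario. Because the input is lowercased and searched with contains(), any input that includes alter, drop, insert, or select, in any letter case and with any surrounding spacing, produces an error message.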

For example, an attempt to retrieve data with select:

Input:

select table

Output:

Invalid input. Try again.
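For readers following along in Python, the same screening can be sketched with a regular expression. The keyword list mirrors the Java example. Matching whole words (\b) is a choice made here, not in the Java code: it still rejects select table, but no longer rejects harmless words such as "dropdown" that contains() would flag. The function name is hypothetical.

```python
import re

# Same keywords as the Java example; extend the list as needed.
BLOCKED_KEYWORDS = ("alter", "drop", "insert", "select")
PATTERN = re.compile(r"\b(" + "|".join(BLOCKED_KEYWORDS) + r")\b", re.IGNORECASE)


def check_input(text):
    """Return the same messages as the Java Check class."""
    if PATTERN.search(text):
        return "Invalid input. Try again."
    return "Valid input. Thank you."


if __name__ == "__main__":
    print(check_input("select table"))     # Invalid input. Try again.
    print(check_input("Quarterly sales"))  # Valid input. Thank you.
```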

It helps to include all SQL commands (for example, update, delete, and union) to block as many harmful attempts as possible. Another helpful tip is to research the libraries you use, for example by reviewing their source code.

Some libraries may include built-in code that is malicious or that introduces vulnerabilities such as memory issues.
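Reviewing a library's source only helps if the copy you install is the copy you reviewed. One way to check this, sketched below, is to compare a downloaded file's SHA-256 hash with the hash its maintainers publish; pip can enforce the same check automatically in hash-checking mode (--require-hashes). The helper names are hypothetical, and the demo hashes a throwaway file.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare against the published SHA-256, ignoring case and surrounding spaces."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # In practice, pass the downloaded archive and the maintainers' published hash.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    print(sha256_of(tmp.name))
    os.remove(tmp.name)
```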

Conclusion

Cyberattacks can (and will) increase as growing technological complexity creates new vulnerabilities.

There are many mechanisms to secure and protect data. Because some attacks will succeed, it is wise to keep alternative options available to manage risk and recover from the impact.

Protecting the H2O instance, data files, and database is one approach to keeping data safe.