arrow left
Back to Developer Education

    Prototyping Machine learning models with Streamlit

    Prototyping Machine learning models with Streamlit

    Are you a machine learning engineer or data scientist? Are the models you have built not served on an application because you do not know how to build a web or mobile application? <!--more--> Do you find the process cumbersome, find yourself saving your model weights, and that's all? Streamlit is a platform that lets you prototype your model(s) with very few lines of code and technical know-how.

    Table of Contents

    1. Introduction to Streamlit
    2. Streamlit Setup
    3. Building Sentiment Analyzer
    4. Conclusion

    Introduction to Streamlit

    Streamlit is an open-source Python framework that enables developers to prototype their ML models on web applications. You can turn your data science scripts into a website with few code lines, saving development time and energy. We’ll dive straight into the installation, setup, and deployment of a text summarizer web app.

    Streamlit Set up

    The installation is straightforward; just run this code snippet below in your command prompt or terminal.

    pip install streamlit
    streamlit hello
    

    To import it, use the following code:

    import streamlit as st
    

    And to run your streamlit app use.

    streamlit run app.py
    

    Building Sentiment Analyzer

    In this tutorial, we build a sentiment analyzer model and deploy it with Streamlit. The first step will be to get a dataset that can we download at this link. Next, let’s create a directory and create a virtual environment. To do that, open your terminal or command prompt and type in the code below.

    $ mkdir textanalyzer
    $ cd textanalyzer
    

    To create a virtual environment, we use the code below. This is to isolate the code environment and avoid errors due to libraries' inter-dependencies.

    $ python3.8 -m venv env
    $ source env/bin/activate
    

    The first line of code creates the virtual environment while the second activates it. Now copy the dataset as it is in the folder and paste it into this textanalyzer directory. Create a file and save it as sentiment_analyzer.py.

    This will be where we will be writing our code to build the Streamlit app. The next thing will be to import all the packages we'll be using for this tutorial.

    import streamlit as st
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.metrics import confusion_matrix
    import numpy as np
    import itertools
    import matplotlib.pyplot as plt
    

    After that, let’s add a header for our web app and a subheader, and to do that, we use the code below.

    st.title("Sentiment Analyzer Based On Text Analysis ")
    st.subheader("Paras Patidar - MLAIT")
    st.write('\n\n')
    

    The next step will be to write a function that will get all the data from our directory and bring them up.

    Your directory should look like this:

    Simple

    @st.cache
    def get_all_data():
     root = "Datasets/"
     with open(root + "imdb_labelled.txt", "r") as text_file:
         data = text_file.read().split('\n')
     with open(root + "amazon_cells_labelled.txt", "r") as text_file:
         data += text_file.read().split('\n')
     with open(root + "yelp_labelled.txt", "r") as text_file:
         data += text_file.read().split('\n')
     return data
    

    Notice the code snippet @st.cache in the first line. This tells Streamlit that whenever the function is called, some requirements should be fulfilled.

    Create a new variable called all_data. all_data calls the function get_all_data we defined earlier.

    all_data = get_all_data()
    

    Next, we create a checkbox using the st.checkbox function to display the dataset. The checkbox should have the values from the all_data variable we created earlier.

    if st.checkbox('Show Dataset'):
     st.write(all_data)
    

    Now we write another function that will carry out data preprocessing, and to achieve that, we use the code below.

    @st.cache
    def preprocessing_data(data):
     processing_data = []
     for single_data in data:
         if len(single_data.split("\t")) == 2 and single_data.split("\t")[1] != "":
             processing_data.append(single_data.split("\t"))
     return processing_data
    
    

    Now let’s create another checkbox that will display show_preprocessed_data and will get the data from the all_data function. The all_data function will then pass it through the preprocessing data function we wrote above.

    if st.checkbox('Show PreProcessed Dataset'):
     st.write(preprocessing_data(all_data))
    

    The next step will be writing a function that will split the data and call it split_data

    @st.cache
    
    def split_data(data):
     total = len(data)
     training_ratio = 0.75
     training_data= []
     evaluation_data = []
     for indice in range(0,total):
         if indice<total*training_ratio:
             training_data.append(data[indice])
         else:
             evaluation_data.append(data[indice])
     return training_data, evaluation_data
    
    

    Now we're going to create two functions. One will be preprocessing_step that will get the data, preprocess it, and then split it. The second one will be training_step that will receive two parameters and vectorize the training text.

    @st.cache
    def preprocessing_step()
     data = get_all_data()
     processing_data = preprocessing_data(data)
     return split_data(processing_data)
    def training_step(data,vectorizer):
     training_text = [data[0] for data in data]
     training_result = [data[1] for data in data]
     training_text = vectorizer.fit_transform(training_text)
     return BernoulliNB().fit(training_text,training_result)
    

    The next step will be passing the training and evaluation data into the "preprocessing_step" function, then choosing a "vectorizer" (in this case, we will be using CountVecotizer) and finally a 'classifier" variable.

    The "Classifier" variable will call the "training_step" function and pass in the "training_data" and vectorizer.

    The last line uses BernoulliNB, a Naive Bayes model; it predicts the probability of the input being classified for all the classes and is used for text classification with the 'Bag of Words' model.

    Finally, we fit our "training_text" and "training_result" into it.

    training_data,evaluation_data = preprocessing_step()
    vectorizer = CountVectorizer(binary='true')
    classifier = training_step(training_data,vectorizer)
    

    After carrying out the above step, the next step will be to write two functions.

    The first will be analyze_text, and it will carry out the analysis by taking in the classifier, vectorizer, and text value.

    The second function will be called print_result and will send a result as positive or negative.

    def analyse_text(classifier,vectorizer,text):
     return text,classifier.predict(vectorizer.transform([text]))
    def print_result(result):
     text,analysis_result = result
     print_text = "Positive" if analysis_result[0]=='1' else "Negative"
     return text,print_text
    

    Now it’s time for us to continue building the interface. We will need an input form, button, and output textbox.

    review = st.text_input("Enter The Review","Write Here...")
    if st.button('Predict Sentiment'):
     result = print_result(analyse_text(classifier,vectorizer,review))
     st.success(result[1])
    else:
     st.write("Press the above button..")
    

    The codes above use Streamlit st.text_input to create a text input form that will display “enter the review, write here,” and the if st.button will create a button that will display “predict sentiment”. The variable result then calls the print_result function and passes in the parameters.

    If the whole process is successful, the st.success(result[1) displays the result else, it passes an error message asking the user to press the button above. Run your app using the command below.

    streamit run sentiment_analyzer.py
    

    This is what you should run on your terminal.

    Simple

    And your application should look like this:

    Simple

    Conclusion

    If you have followed the process, you will see how fun and easy it is to prototype your machine learning models with Streamlit and save yourself a lot of stress. You can explore more advanced examples and take this project forward by deploying it on Heroku or any other hosting platform.


    Peer Review Contributions by: Lalithnarayan C

    Published on: Nov 13, 2020
    Updated on: Jul 15, 2024
    CTA

    Cloudzilla is FREE for React and Node.js projects

    Deploy GitHub projects across every major cloud in under 3 minutes. No credit card required.
    Get Started for Free