How to Build an AI Document Chatbot using Next.js and Gemini API

Have You Ever Wanted to Chat with Your Documents?
In this tutorial, we'll build an AI document chatbot that can process PDF and Word documents, understand their content, and engage in conversations about them using Google's Gemini API.
Using Next.js for our web application and Google's Gemini AI, the chatbot will not only read your documents but also provide summaries and answer questions about their content.
Whether you're a developer looking to get started with AI or trying to build tools for document analysis, this project offers a practical introduction to combining web development with artificial intelligence.
Project Overview
We'll build a full-stack application that allows users to:
- Upload PDF and Word documents
- Process and extract text from documents
- Get AI-generated summaries of uploaded documents
- Chat with an AI about the document's contents
Table of Contents
- Project Setup
- Main Chat Interface
- Type Definitions
- API Routes
- Getting Your Gemini AI API Key
- Running Your Project
- Deploy to Vercel
Prerequisites
- Node.js 18+ installed
- Basic knowledge of React and TypeScript
- Code editor (VS Code recommended)
- Google Cloud account with Gemini API access
- Basic understanding of `async/await` and API calls
Project Setup
1. Create a New Next.js Project
Open up your VS Code terminal and create the project by running this command:
```bash
npx create-next-app@latest gemini-chatbot --typescript --tailwind
```
This command creates a new Next.js project folder called "gemini-chatbot" and configures it with TypeScript and Tailwind CSS.
Next, enter your new project directory using this command:
```bash
cd gemini-chatbot
```
Project Structure
This is how the project structure should look at the end of the article.
```text
gemini-chatbot/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── chat/
│   │   │   │   └── route.ts       # Chat API endpoint
│   │   │   └── process-document/
│   │   │       └── route.ts       # Document processing endpoint
│   │   ├── page.tsx               # Main chat interface
│   │   ├── layout.tsx             # Root layout
│   │   └── globals.css            # Global styles
│   └── types/
│       └── chat.ts                # Type definitions
```
2. Install Required Dependencies
Let's install the required dependencies for the project using this command:
```bash
npm install @google/generative-ai @langchain/community @langchain/google-genai lucide-react
```
- `@google/generative-ai`: Provides access to Google's Gemini AI language models
- `@langchain/community`: Provides document loaders for various file formats and handles text splitting and processing
- `@langchain/google-genai`: Integrates LangChain with Google's Generative AI and enables features like chat memory and chains
- `lucide-react`: Supplies the icon components used in the chat interface
Main Chat Interface (`src/app/page.tsx`)
The `page.tsx` file serves as the main chat interface of our application.
Initial Imports
```typescript
'use client';

import { useState, useRef, useEffect } from 'react';
import { Message } from '@/types/chat';
import { Send, Upload, Loader, Bot } from 'lucide-react';
```
From the code above:
- `'use client'`: Marks this as a client-side component
- `useState`: For managing component state
- `useRef`: For DOM references
- `useEffect`: For side effects
- `Message`: Our custom type for chat messages
- Icon imports from `lucide-react`
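The `Message` type imported above comes from `src/types/chat.ts`. Inferring from how the handlers below use it, a minimal definition could look like this (the exact shape is an assumption; adjust it to match your own file):

```typescript
// src/types/chat.ts — a minimal sketch inferred from how `Message` is
// used in page.tsx; the original project's definition may differ.
export type Role = 'user' | 'assistant' | 'system';

export interface Message {
  role: Role;      // who produced the message
  content: string; // the message text
}
```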
State and Refs
```typescript
export default function Home() {
  // State management
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [documentContext, setDocumentContext] = useState<string>('');

  // Refs
  const messagesEndRef = useRef<HTMLDivElement>(null);
  const fileInputRef = useRef<HTMLInputElement>(null);
```
From the code above:
- `messages`: Array of chat messages
- `input`: Current text input value
- `isLoading`: Loading state indicator
- `error`: Error message state
- `documentContext`: Stores processed document text
- `messagesEndRef`: Reference for auto-scrolling
- `fileInputRef`: Reference for the file input element
Auto-scroll Implementation
```typescript
  // Auto-scroll to bottom of messages
  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };

  useEffect(() => {
    scrollToBottom();
  }, [messages]);
```
From the code above:
- `scrollToBottom`: Function to scroll to the latest message
- `useEffect`: Triggers scroll when messages update
File Upload Handler Function
```typescript
  // Handle file upload
  const handleFileUpload = async (files: FileList | null) => {
    if (!files) return;
    setIsLoading(true);
    setError(null);

    try {
      const allowedTypes = [
        'application/pdf',
        'application/msword',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
      ];
      const uploadedFiles = Array.from(files)
        .filter(file => allowedTypes.includes(file.type));

      for (const file of uploadedFiles) {
        const formData = new FormData();
        formData.append('file', file);

        const response = await fetch('/api/process-document', {
          method: 'POST',
          body: formData,
        });

        if (!response.ok) throw new Error('Failed to process document');

        const { text, summary } = await response.json();
        setDocumentContext(prev => prev + '\n' + text);
        setMessages(prev => [
          ...prev,
          {
            role: 'system',
            content: `Document "${file.name}" has been processed.`
          },
          {
            role: 'assistant',
            content: `Here's a summary of the document:\n\n${summary}`
          }
        ]);
      }
    } catch (error) {
      setError('Failed to process document. Please try again.');
    } finally {
      setIsLoading(false);
    }
  };
```
This code validates file types, processes multiple files sequentially, updates the document context, adds system and summary messages, and handles errors and loading states.
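The handler above posts each file to `/api/process-document`, whose implementation isn't shown in this article. A minimal sketch of `src/app/api/process-document/route.ts` might look like the following — the `WebPDFLoader` import path and the `gemini-1.5-flash` model name are assumptions based on current `@langchain/community` and `@google/generative-ai` releases, and only PDF extraction is shown (a Word loader would slot in the same way):

```typescript
// src/app/api/process-document/route.ts — a minimal sketch, not the
// article's exact implementation. Requires a Next.js runtime and a
// GOOGLE_API_KEY environment variable.
import { NextResponse } from 'next/server';
import { WebPDFLoader } from '@langchain/community/document_loaders/web/pdf';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? '');

export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get('file') as File | null;
  if (!file) {
    return NextResponse.json({ error: 'No file uploaded' }, { status: 400 });
  }

  // Extract text from the uploaded PDF (a DOCX loader would be used
  // for Word files in the same way)
  const loader = new WebPDFLoader(new Blob([await file.arrayBuffer()]));
  const docs = await loader.load();
  const text = docs.map((d) => d.pageContent).join('\n');

  // Ask Gemini for a short summary of the extracted text
  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
  const result = await model.generateContent(
    `Summarize the following document in a few sentences:\n\n${text}`
  );
  const summary = result.response.text();

  // Return both the raw text (for chat context) and the summary
  return NextResponse.json({ text, summary });
}
```

The route returns `{ text, summary }`, matching the fields the upload handler destructures from the response.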
Chat Submission Handler Function
```typescript
  // Handle chat submission
  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isLoading) return;

    const userMessage: Message = {
      role: 'user',
      content: input
    };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setIsLoading(true);

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          messages: [...messages, userMessage],
          documentContext
        }),
      });

      if (!response.ok) throw new Error('Failed to get response');

      const { content } = await response.json();
      setMessages(prev => [...prev, {
        role: 'assistant',
        content
      }]);
    } catch (error) {
      setError('Failed to send message. Please try again.');
    } finally {
      setIsLoading(false);
    }
  };
```
This code block does the following:
- Prevents empty submissions
- Adds user message immediately
- Sends context to API
- Handles API response
- Updates messages with AI response
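The `/api/chat` endpoint this handler calls isn't shown in the article either. A minimal sketch of `src/app/api/chat/route.ts`, under the same assumptions (the model name and prompt wording are placeholders to adapt), could be:

```typescript
// src/app/api/chat/route.ts — a minimal sketch, not the article's exact
// implementation. Requires a Next.js runtime and a GOOGLE_API_KEY
// environment variable.
import { NextResponse } from 'next/server';
import { GoogleGenerativeAI } from '@google/generative-ai';
import { Message } from '@/types/chat';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? '');

export async function POST(request: Request) {
  const { messages, documentContext } = (await request.json()) as {
    messages: Message[];
    documentContext: string;
  };

  // Fold the document text and the chat history into a single prompt
  const history = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
  const prompt =
    `Use the following document as context:\n${documentContext}\n\n` +
    `Conversation so far:\n${history}\n\nassistant:`;

  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
  const result = await model.generateContent(prompt);

  // The frontend destructures `content` from this response
  return NextResponse.json({ content: result.response.text() });
}
```

A single flattened prompt keeps the sketch short; you could instead use the SDK's `startChat`/`sendMessage` API to pass the history as structured turns.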
Getting Your Gemini AI API Key
- Visit Google AI Studio, and sign in with your Google account.
- Click "Get API key", then click "Create API key" to create your API key.
Remember to monitor activity on your app to prevent abuse and overbilling on Google Cloud, and never expose your API key.
Running the Project
1. Environment Variable Setup
Create a `.env.local` file in your project root and paste your Gemini API key into it:

```bash
GOOGLE_API_KEY=paste_your_gemini_api_key_here
```
2. Start the Development Server
```bash
npm run dev
```
3. Open http://localhost:3000 in Your Browser
Deploy to Vercel
To deploy your project to Vercel, you must have a Vercel account (you can sign up with your GitHub account).
Steps to Deploying Your Project
Prepare Your Project
Ensure your project is production-ready:
```bash
npm run build
```
Push to GitHub
To push your project to GitHub, run these commands sequentially:
```bash
# Initialize git repository (if not already done)
git init

# Add all files
git add .

# Commit changes
git commit -m "Initial commit"

# Add your GitHub repository as remote
git remote add origin https://github.com/yourusername/your-repo-name.git

# Push to GitHub
git push -u origin main
```
Deploy to Vercel
Go to your Vercel Dashboard, click "New Project", and import your GitHub repository.
Set the Environment Variables
Add your Gemini API key to environment variables in the imported project settings:
```bash
GOOGLE_API_KEY=your_gemini_api_key_here
```
Visit Your Deployed Site
You can check out the deployed site using the provided link.
Conclusion
This project shows how to build an AI chatbot that can interact with documents using Google's Gemini AI. Along the way, we covered file uploads and processing, calling an AI API, and building a clean chat UI.
The complete code is available in the repository, and you can extend it further by adding features like:
- Authentication
- Document management
- More file format support
- Improved error handling
- Conversation history persistence
- Pricing