How to Build an AI Document Chatbot using Next.js and Gemini API

Have You Ever Wanted to Chat with Your Documents?
In this tutorial, we'll build an AI document chatbot that can process PDF and Word documents, understand their content, and engage in conversations about them using Google's Gemini API.
Using Next.js for our web application and Google's Gemini AI, the chatbot will not only read your documents but also provide summaries and answer questions about their content.
Whether you're a developer looking to get started with AI or trying to build tools for document analysis, this project offers a practical introduction to combining web development with artificial intelligence.
Project Overview
We'll build a full-stack application that allows users to:
- Upload PDF and Word documents
- Process and extract text from documents
- Get AI-generated summaries of uploaded documents
- Chat with an AI about the document's contents
Table of Contents
- Project Setup
- Main Chat Interface
- Type Definitions
- API Routes
- Getting Your Gemini AI API Key
- Running Your Project
- Deploy to Vercel
Prerequisites
- Node.js 18+ installed
- Basic knowledge of React and TypeScript
- Code editor (VS Code recommended)
- Google Cloud account with Gemini API access
- Basic understanding of `async/await` and API calls
Project Setup
1. Create a New Next.js Project
Open up your VS Code terminal and create the project by running this command:
```bash
npx create-next-app@latest gemini-chatbot --typescript --tailwind
```
This command creates a new Next.js project folder called "gemini-chatbot" and configures it with TypeScript and Tailwind CSS.
Next, enter your new project directory using this command:
```bash
cd gemini-chatbot
```
Project Structure
This is how the project structure should look at the end of the article.
```text
gemini-chatbot/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── chat/
│   │   │   │   └── route.ts       # Chat API endpoint
│   │   │   └── process-document/
│   │   │       └── route.ts       # Document processing endpoint
│   │   ├── page.tsx               # Main chat interface
│   │   ├── layout.tsx             # Root layout
│   │   └── globals.css            # Global styles
│   └── types/
│       └── chat.ts                # Type definitions
```
2. Install Required Dependencies
Let's install the required dependencies for the project using this command:
```bash
npm install @google/generative-ai @langchain/community @langchain/google-genai lucide-react
```
- `@google/generative-ai`: Provides access to Google's Gemini AI language models
- `@langchain/community`: Provides document loaders for various file formats and handles text splitting and processing
- `@langchain/google-genai`: Integrates LangChain with Google's Generative AI and enables features like chat memory and chains
- `lucide-react`: Supplies the icon components used in the chat interface
Main Chat Interface (`src/app/page.tsx`)
The `page.tsx` file serves as the main chat interface of our application.
Initial Imports
```typescript
'use client';

import { useState, useRef, useEffect } from 'react';
import { Message } from '@/types/chat';
import { Send, Upload, Loader, Bot } from 'lucide-react';
```
From the code above:
- `'use client'`: Marks this as a client-side component
- `useState`: For managing component state
- `useRef`: For DOM references
- `useEffect`: For side effects
- `Message`: Our custom type for chat messages
- Icon imports from `lucide-react`
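The `Message` type imported above comes from `src/types/chat.ts`. Inferring from how the handlers below use it, a minimal definition could look like this (the exact shape is an assumption; adjust it to match your own file):

```typescript
// src/types/chat.ts — a minimal sketch inferred from how `Message` is
// used in page.tsx; the original project's definition may differ.
export type Role = 'user' | 'assistant' | 'system';

export interface Message {
  role: Role;      // who produced the message
  content: string; // the message text
}
```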
State and Refs
```typescript
export default function Home() {
  // State management
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [documentContext, setDocumentContext] = useState<string>('');

  // Refs
  const messagesEndRef = useRef<HTMLDivElement>(null);
  const fileInputRef = useRef<HTMLInputElement>(null);
```
From the code above:
- `messages`: Array of chat messages
- `input`: Current text input value
- `isLoading`: Loading state indicator
- `error`: Error message state
- `documentContext`: Stores processed document text
- `messagesEndRef`: Reference for auto-scrolling
- `fileInputRef`: Reference for the file input element
Auto-scroll Implementation
```typescript
  // Auto-scroll to bottom of messages
  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };

  useEffect(() => {
    scrollToBottom();
  }, [messages]);
```
From the code above:
- `scrollToBottom`: Function to scroll to the latest message
- `useEffect`: Triggers scroll when messages update
File Upload Handler Function
```typescript
  // Handle file upload
  const handleFileUpload = async (files: FileList | null) => {
    if (!files) return;
    setIsLoading(true);
    setError(null);

    try {
      const allowedTypes = [
        'application/pdf',
        'application/msword',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
      ];
      const uploadedFiles = Array.from(files)
        .filter(file => allowedTypes.includes(file.type));

      for (const file of uploadedFiles) {
        const formData = new FormData();
        formData.append('file', file);

        const response = await fetch('/api/process-document', {
          method: 'POST',
          body: formData,
        });

        if (!response.ok) throw new Error('Failed to process document');

        const { text, summary } = await response.json();
        setDocumentContext(prev => prev + '\n' + text);
        setMessages(prev => [
          ...prev,
          {
            role: 'system',
            content: `Document "${file.name}" has been processed.`
          },
          {
            role: 'assistant',
            content: `Here's a summary of the document:\n\n${summary}`
          }
        ]);
      }
    } catch (error) {
      setError('Failed to process document. Please try again.');
    } finally {
      setIsLoading(false);
    }
  };
```
This code validates file types, processes multiple files sequentially, updates the document context, adds system and summary messages, and handles errors and loading states.
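The handler above posts each file to `/api/process-document`, whose implementation isn't shown in this article. A minimal sketch of `src/app/api/process-document/route.ts` might look like the following — the `WebPDFLoader` import path and the `gemini-1.5-flash` model name are assumptions based on current `@langchain/community` and `@google/generative-ai` releases, and only PDF extraction is shown (a Word loader would slot in the same way):

```typescript
// src/app/api/process-document/route.ts — a minimal sketch, not the
// article's exact implementation. Requires a Next.js runtime and a
// GOOGLE_API_KEY environment variable.
import { NextResponse } from 'next/server';
import { WebPDFLoader } from '@langchain/community/document_loaders/web/pdf';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? '');

export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get('file') as File | null;
  if (!file) {
    return NextResponse.json({ error: 'No file uploaded' }, { status: 400 });
  }

  // Extract text from the uploaded PDF (a DOCX loader would be used
  // for Word files in the same way)
  const loader = new WebPDFLoader(new Blob([await file.arrayBuffer()]));
  const docs = await loader.load();
  const text = docs.map((d) => d.pageContent).join('\n');

  // Ask Gemini for a short summary of the extracted text
  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
  const result = await model.generateContent(
    `Summarize the following document in a few sentences:\n\n${text}`
  );
  const summary = result.response.text();

  // Return both the raw text (for chat context) and the summary
  return NextResponse.json({ text, summary });
}
```

The route returns `{ text, summary }`, matching the fields the upload handler destructures from the response.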
Chat Submission Handler Function
```typescript
  // Handle chat submission
  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isLoading) return;

    const userMessage: Message = {
      role: 'user',
      content: input
    };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setIsLoading(true);

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          messages: [...messages, userMessage],
          documentContext
        }),
      });

      if (!response.ok) throw new Error('Failed to get response');

      const { content } = await response.json();
      setMessages(prev => [...prev, {
        role: 'assistant',
        content
      }]);
    } catch (error) {
      setError('Failed to send message. Please try again.');
    } finally {
      setIsLoading(false);
    }
  };
```
This code block does the following:
- Prevents empty submissions
- Adds user message immediately
- Sends context to API
- Handles API response
- Updates messages with AI response
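The `/api/chat` endpoint this handler calls isn't shown in the article either. A minimal sketch of `src/app/api/chat/route.ts`, under the same assumptions (the model name and prompt wording are placeholders to adapt), could be:

```typescript
// src/app/api/chat/route.ts — a minimal sketch, not the article's exact
// implementation. Requires a Next.js runtime and a GOOGLE_API_KEY
// environment variable.
import { NextResponse } from 'next/server';
import { GoogleGenerativeAI } from '@google/generative-ai';
import { Message } from '@/types/chat';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? '');

export async function POST(request: Request) {
  const { messages, documentContext } = (await request.json()) as {
    messages: Message[];
    documentContext: string;
  };

  // Fold the document text and the chat history into a single prompt
  const history = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
  const prompt =
    `Use the following document as context:\n${documentContext}\n\n` +
    `Conversation so far:\n${history}\n\nassistant:`;

  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
  const result = await model.generateContent(prompt);

  // The frontend destructures `content` from this response
  return NextResponse.json({ content: result.response.text() });
}
```

A single flattened prompt keeps the sketch short; you could instead use the SDK's `startChat`/`sendMessage` API to pass the history as structured turns.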
Getting Your Gemini AI API Key
- Visit Google AI Studio, and sign in with your Google account.
- Click "Get API key", then click "Create API key" to create your API key.
Remember to monitor activity on your app to prevent abuse and overbilling on Google Cloud, and never expose your API key.
Running the Project
1. Environment Variable Setup
Create a `.env.local` file in your project root and paste your Gemini API key into it:

```bash
GOOGLE_API_KEY=paste_your_gemini_api_key_here
```
2. Start the Development Server
```bash
npm run dev
```
3. Open http://localhost:3000 in Your Browser
Deploy to Vercel
To deploy your project to Vercel, you must have a Vercel account (you can sign up with your GitHub account).
Steps to Deploying Your Project
Prepare Your Project
Ensure your project is production-ready:
```bash
npm run build
```
Push to GitHub
To push your project to GitHub, run these commands sequentially:
```bash
# Initialize git repository (if not already done)
git init

# Add all files
git add .

# Commit changes
git commit -m "Initial commit"

# Add your GitHub repository as remote
git remote add origin https://github.com/yourusername/your-repo-name.git

# Push to GitHub
git push -u origin main
```
Deploy to Vercel
Go to your Vercel Dashboard, click "New Project", and import your GitHub repository.
Set the Environment Variables
Add your Gemini API key to environment variables in the imported project settings:
```bash
GOOGLE_API_KEY=your_gemini_api_key_here
```
Visit Your Deployed Site
You can check out the deployed site using the provided link.
Conclusion
This project shows how to build an AI chatbot that can interact with documents using Google's Gemini AI. Along the way, we covered file uploads and processing, calling an AI API, and building a clean chat UI.
The complete code is available in the repository, and you can extend it further by adding features like:
- Authentication
- Document management
- More file format support
- Improved error handling
- Conversation history persistence
- Pricing