Building a RAG-Based Training Center Assistant with Amazon Bedrock and Pinecone (Part 1)

Table of Contents
- Introduction
- Background
- Architecture Overview
- Step-by-Step Implementation
- A. Create Pinecone API Key
- B. Create Pinecone Index
- C. Store Pinecone API Key in AWS Secrets Manager
- D. Create Bedrock Knowledge Base
- E. Create Lambda Function for the Agent to access DynamoDB table
- F. Create S3 Bucket and Upload OpenAPI Schema
- G. Create Bedrock Agent
- H. Create Resource-Based Policy for Lambda Function
- Conclusion
Introduction
In this article, I will guide you through building a Retrieval-Augmented Generation (RAG) application using AWS Bedrock, Pinecone, and Amazon Titan Text G1 - Premier Model. We will create a Training Center Assistant Chatbot capable of answering customer queries about training courses by combining static information (e.g., course descriptions, trainer details) and real-time data (e.g., course availability). This tutorial is divided into two parts: the first part focuses on setting up the infrastructure, and the second part will cover the frontend chatbot application and backend API.
Key Components:
- Amazon Bedrock: For building the RAG agent.
- Pinecone: As the vector store for semantic search.
- Titan Text G1 - Premier: the large language model that generates responses (alternatively, you can use models such as Claude Sonnet).
- Cohere embed-english-v3.0: the embedding model that translates knowledge base text into numerical vectors for semantic search.
- DynamoDB (DDB): For storing real-time course status.
- S3: For storing static course information in CSV format.
Background
The Training Center Assistant is designed to:
- Answer customer queries about training courses.
- Provide static information such as course descriptions, trainer details, and schedules from a CSV file stored in S3.
- Retrieve real-time course availability status from DynamoDB.
- Combine both static and dynamic information to deliver accurate and up-to-date responses.
Architecture Overview
The architecture consists of the following components:
- AWS Bedrock Knowledge Base: Integrates with Pinecone for vector storage and S3 for static data.
- Lambda Function: Invoked by the Bedrock agent's action group to query DynamoDB for real-time course status.
- Pinecone Vector Store: Stores embeddings of course descriptions for semantic search.
- DynamoDB: Stores real-time course availability status.
- S3 Bucket: Hosts the OpenAPI schema and static CSV data for Bedrock Knowledge Base.

Step-by-Step Implementation
A. Create Pinecone API Key
- Go to the Pinecone Console.
- Sign up or log in to your account.
- Navigate to API Keys and create a new API key.
- Securely store the API key for later use.

B. Create Pinecone Index
Create a Pinecone index with the following configuration:
- Dimensions: 1024 (matches the output of Cohere embed-english-v3.0).
- Metric: Cosine similarity.
- Model: Cohere embed-english-v3.0 (if you configure the index by model).
- Cloud Provider: AWS.
- Region: us-east-1.
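If you prefer to script this step, the index can also be created with the Pinecone Python client. This is a minimal sketch, assuming the `pinecone` package (v3+) is installed, your API key is in the `PINECONE_API_KEY` environment variable, and the index name `training-courses` is illustrative:

```python
import os

# Index settings mirroring the console configuration above
INDEX_CONFIG = {
    "name": "training-courses",   # illustrative name
    "dimension": 1024,            # must match Cohere embed-english-v3.0 output
    "metric": "cosine",
    "cloud": "aws",
    "region": "us-east-1",
}

def create_index():
    """Create the serverless index (requires the pinecone package and an API key)."""
    from pinecone import Pinecone, ServerlessSpec
    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    pc.create_index(
        name=INDEX_CONFIG["name"],
        dimension=INDEX_CONFIG["dimension"],
        metric=INDEX_CONFIG["metric"],
        spec=ServerlessSpec(cloud=INDEX_CONFIG["cloud"], region=INDEX_CONFIG["region"]),
    )
```

Getting the dimension right here matters: if the index dimension does not match the embedding model's output, Bedrock ingestion will fail later.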

C. Store Pinecone API Key in AWS Secrets Manager
- Open the AWS Secrets Manager console.
- Choose Store a new secret.
- Select Other type of secret.
- Add a key-value pair:
  - Key: apiKey
  - Value: <your-pinecone-api-key>
- Note the ARN of the secret for use in the Bedrock Knowledge Base configuration.
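The same secret can be created programmatically with boto3. A sketch under assumptions: the secret name `pinecone-api-key` is illustrative, and `store_secret` is not called here because it requires AWS credentials:

```python
import json

def build_secret_payload(pinecone_api_key: str) -> str:
    """Bedrock expects the key under the field name apiKey."""
    return json.dumps({"apiKey": pinecone_api_key})

def store_secret(pinecone_api_key: str) -> str:
    """Create the secret and return its ARN (requires AWS credentials)."""
    import boto3
    sm = boto3.client("secretsmanager")
    resp = sm.create_secret(
        Name="pinecone-api-key",  # illustrative name
        SecretString=build_secret_payload(pinecone_api_key),
    )
    return resp["ARN"]
```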

D. Create Bedrock Knowledge Base
The knowledge base holds frequently asked static information that rarely changes: for example, class information such as the course description, category, duration, and trainer name. A CSV file holding this information is uploaded to S3 and acts as the data source for the Bedrock knowledge base.

- Go to the AWS S3 console, create a bucket, and upload the course_catalog.csv file.
- In the AWS Bedrock Console, navigate to Knowledge Bases and click Create knowledge base with vector store.
- Input the knowledge base name and description.
- Choose Data Source:
  - Select the S3 bucket containing your CSV file with course information.
  - Choose the default chunking strategy.
- Select Embedding Model:
  - Choose Cohere Embed English v3 (its 1024-dimension vectors match the Pinecone index created in step B).
- Input Vector Store Details:
  - Select Pinecone.
  - Provide the Pinecone endpoint URL (e.g., https://training-courses-XXXXX.svc.us-east-1-aws.pinecone.io).
  - Enter the ARN of the secret storing the Pinecone API key (created in step C).
- Input Field Mapping:
  - Text field: text
  - Metadata field: metadata
- Acknowledge the authorization checkbox to allow AWS to access Pinecone on your behalf.


E. Create Lambda Function for the Agent to access DynamoDB table
1. Create DynamoDB Table and add data
In our example, certain dynamic information about the course is stored in DynamoDB. This information includes the number of enrolled students in the classes and the registration status of the class (FULL or OPEN). These details are subject to frequent changes. Consequently, the RAG agent will retrieve the most recent class information through a Lambda function.
We should first create a DynamoDB table named course_status with class_code as the primary key.
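The table can also be created from code with boto3. A sketch, assuming on-demand billing; `create_table` is split from the table definition so the definition can be inspected without AWS credentials:

```python
def course_status_table_spec() -> dict:
    """Table definition: class_code (string) as the partition key, on-demand billing."""
    return {
        "TableName": "course_status",
        "KeySchema": [{"AttributeName": "class_code", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "class_code", "AttributeType": "S"}],
        "BillingMode": "PAY_PER_REQUEST",
    }

def create_table():
    """Create the table (requires AWS credentials)."""
    import boto3
    ddb = boto3.client("dynamodb")
    ddb.create_table(**course_status_table_spec())
```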

You can then run the following script to load sample data into the table. You must have the AWS CLI installed and configured.
#!/bin/bash
# Insert course_status data into DynamoDB table using PartiQL statements
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '101', 'capacity': 30, 'current_students': 25, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '102', 'capacity': 25, 'current_students': 25, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '103', 'capacity': 40, 'current_students': 15, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '104', 'capacity': 35, 'current_students': 30, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '105', 'capacity': 30, 'current_students': 20, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '106', 'capacity': 35, 'current_students': 28, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '107', 'capacity': 30, 'current_students': 30, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '108', 'capacity': 40, 'current_students': 35, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '109', 'capacity': 25, 'current_students': 25, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '110', 'capacity': 45, 'current_students': 40, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '111', 'capacity': 30, 'current_students': 22, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '112', 'capacity': 35, 'current_students': 35, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '113', 'capacity': 40, 'current_students': 38, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '114', 'capacity': 30, 'current_students': 30, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '115', 'capacity': 45, 'current_students': 42, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '116', 'capacity': 35, 'current_students': 25, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '117', 'capacity': 30, 'current_students': 30, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '118', 'capacity': 40, 'current_students': 32, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '119', 'capacity': 25, 'current_students': 20, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '120', 'capacity': 45, 'current_students': 45, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '121', 'capacity': 30, 'current_students': 28, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '122', 'capacity': 35, 'current_students': 35, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '123', 'capacity': 40, 'current_students': 36, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '124', 'capacity': 30, 'current_students': 25, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '125', 'capacity': 45, 'current_students': 45, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '126', 'capacity': 35, 'current_students': 30, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '127', 'capacity': 30, 'current_students': 28, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '128', 'capacity': 40, 'current_students': 40, 'status': 'FULL'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '129', 'capacity': 25, 'current_students': 20, 'status': 'OPEN'}"
aws dynamodb execute-statement --statement "INSERT INTO \"course_status\" VALUE {'class_code': '130', 'capacity': 45, 'current_students': 40, 'status': 'OPEN'}"
2. Lambda Function Python Code
The following Lambda function will later be invoked by the Bedrock agent's action group to query DynamoDB for the real-time course status.
import json
import boto3

client = boto3.client('dynamodb')

def lambda_handler(event, context):
    # Initialize response components up front so the except blocks can use them
    session_attributes = {}
    prompt_session_attributes = {}
    try:
        # Print the incoming event to the CloudWatch logs
        print(event)
        # Get the class code from the agent's request parameters
        class_code = event['parameters'][0]['value']
        ddb_response = client.get_item(
            TableName='course_status',
            Key={'class_code': {'S': class_code}})
        print(ddb_response)
        # Check if the item was found
        if 'Item' in ddb_response:
            course_status = ddb_response['Item']
            response_body = {
                'application/json': {
                    'body': json.dumps(course_status)
                }
            }
            http_status_code = 200
        else:
            response_body = {
                'application/json': {
                    'body': f'No course found with class code: {class_code}'
                }
            }
            http_status_code = 404
        action_response = {
            'actionGroup': event['actionGroup'],
            'apiPath': event['apiPath'],
            'httpMethod': event['httpMethod'],
            'httpStatusCode': http_status_code,
            'responseBody': response_body
        }
        return {
            'messageVersion': '1.0',
            'response': action_response,
            'sessionAttributes': session_attributes,
            'promptSessionAttributes': prompt_session_attributes
        }
    except KeyError as ke:
        print(f"KeyError: {str(ke)}")
        response_body = {
            'application/json': {
                'body': 'Invalid event structure or missing parameter'
            }
        }
        http_status_code = 400
    except Exception as e:
        print(f"Error: {str(e)}")
        response_body = {
            'application/json': {
                'body': 'An unexpected error occurred'
            }
        }
        http_status_code = 500
    # Shared error response path
    action_response = {
        'actionGroup': event.get('actionGroup'),
        'apiPath': event.get('apiPath'),
        'httpMethod': event.get('httpMethod'),
        'httpStatusCode': http_status_code,
        'responseBody': response_body
    }
    return {
        'messageVersion': '1.0',
        'response': action_response,
        'sessionAttributes': session_attributes,
        'promptSessionAttributes': prompt_session_attributes
    }
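To make the handler's contract concrete, here is an illustrative event in the shape Bedrock agents send to action-group Lambda functions. The field values are examples, not captured output from a real agent:

```python
# Illustrative action-group event (abridged); real events carry additional fields
sample_event = {
    "messageVersion": "1.0",
    "actionGroup": "course_assistant_action_group",
    "apiPath": "/courseAsistant",
    "httpMethod": "GET",
    "parameters": [
        {"name": "class_code", "type": "string", "value": "101"}
    ],
}

# The handler reads the class code from the first entry in parameters
class_code = sample_event["parameters"][0]["value"]
```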
3. Lambda IAM Execution Role
Attach the following permissions to the Lambda execution role:
- Bedrock: Full access.
- DynamoDB: Read access to the course_status table.
- Secrets Manager: Access to retrieve the Pinecone API key.
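As a sketch, the DynamoDB read access could be granted with a minimal inline policy like the one below (the account ID is a placeholder; the Bedrock and Secrets Manager permissions above can be attached separately as managed policies):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:YOUR_ACCOUNT:table/course_status"
    }
  ]
}
```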
F. Create S3 Bucket and Upload OpenAPI Schema
The OpenAPI schema is a standardized way to define RESTful APIs. In this project, the OpenAPI schema defines the interface through which the Bedrock Agent invokes the Lambda function. It is a YAML file that will be used in step G when you create the Bedrock agent.
- Create an S3 bucket to store the OpenAPI schema.
- Upload the following courseAssistant.yaml file:
openapi: 3.0.0
info:
  title: Training School Course Assistant - Training Courses status and information
  version: 1.0.0
  description: API for determining the status and information of a specific course based on its class code
paths:
  "/courseAsistant/{class_code}":
    get:
      summary: Get a specific course with information and status
      description: Get a specific course with information and status
      operationId: getCourseInfoStatus
      parameters:
        - name: class_code
          in: path
          description: The class code of the course the customer is asking about
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Successful response containing the course status and information
          content:
            application/json:
              schema:
                type: object
                properties:
                  class_code:
                    type: string
                    description: The class code of the course
                  capacity:
                    type: integer
                    description: The maximum number of students the course can hold
                  current_students:
                    type: integer
                    description: The number of students currently enrolled in the course
                  status:
                    type: string
                    description: The status of the course (FULL, or OPEN for registration)
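Before uploading the file, it is worth sanity-checking that it parses as YAML and carries the top-level keys an OpenAPI document needs. A sketch assuming PyYAML is installed; the inline fragment here is a trimmed stand-in for the full schema:

```python
import yaml  # PyYAML

def validate_openapi_schema(text: str) -> dict:
    """Parse the schema and check for required top-level OpenAPI keys."""
    doc = yaml.safe_load(text)
    for key in ("openapi", "info", "paths"):
        if key not in doc:
            raise ValueError(f"missing required key: {key}")
    return doc

# Minimal self-test with an inline fragment of the schema
fragment = """
openapi: 3.0.0
info:
  title: Training School Course Assistant
  version: 1.0.0
paths:
  "/courseAsistant":
    get:
      operationId: getCourseInfoStatus
"""
schema = validate_openapi_schema(fragment)
```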
G. Create Bedrock Agent
Now we proceed to the most crucial part of this tutorial: creating a Bedrock agent and attaching an action group that invokes the Lambda function.

- Input Agent Details:
  - Name: TrainingCenterAssistant
  - Description: "A chatbot to assist customers with training course queries."
- Create a new Service Role:
  - Make sure the IAM role has permissions for Bedrock, Lambda, and DynamoDB.
- Model Selection:
  - Choose Titan Text G1 - Premier (you can also use Llama or Claude Sonnet).
- Input Instructions for the Agent (this is the detailed prompt for the chatbot):
You are a course assistant for a training school. Your primary responsibilities are to:
1. Help customers find and understand course information by:
   - Searching courses by class code, topic, or keywords from the knowledge base (course catalog)
   - Providing detailed course information including:
     * Course name and code
     * Course description and learning objectives
     * Duration
     * Instructor details
     * Price
   - Remember: available seats and enrollment status (FULL or OPEN for registration) come from the lambda function courseAsistant
2. Assist with course recommendations by:
   - Suggesting relevant courses based on student interests
   - Explaining course relationships
3. Communication guidelines:
   - Always be polite and professional
   - If course information is not found, acknowledge it and offer to search for alternative courses
   - When discussing prices, include information about any available discounts or packages
   - If asked questions outside of course information, politely redirect to appropriate channels
4. Format responses:
   - Present information in a clear, organized manner
   - Break down long responses into sections for better readability
   - Highlight key information such as dates, prerequisites, and availability
5. When uncertain:
   - Acknowledge any limitations in the information available
   - Offer to provide contact information for further assistance
   - Suggest alternative ways to get the requested information
- Add an Action Group for the agent:
  - In the agent details page, click Add Action Group.
  - Configure the Action Group:
    - Action Group Name: Enter course_assistant_action_group as the name.
    - Action Group Type: Select Define with API Schema.
    - Lambda Function: Choose the existing Lambda function you created earlier from the dropdown menu.
    - Action Group Schema: Choose Select an existing schema from S3.
    - Input the S3 URL of the YAML file you uploaded (e.g., s3://your-bucket-name/path/to/your-schema.yaml).
- Review and Save:
  - Save the agent.
  - Confirm the agent is in the "Prepared" status.


H. Create Resource-Based Policy for Lambda Function
A Resource-Based Policy is an IAM policy that is attached to a specific resource (in this case, the Lambda function). It defines who or what can access the resource and under what conditions. In this project, the resource-based policy is used to allow the Bedrock Agent to invoke the Lambda function.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockAgentInvoke",
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock.amazonaws.com"
      },
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:YOUR_ACCOUNT:function:TrainingAssistant",
      "Condition": {
        "ArnLike": {
          "AWS:SourceArn": "arn:aws:bedrock:us-east-1:YOUR_ACCOUNT:agent/AGENT_ID"
        }
      }
    }
  ]
}
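The same permission can be attached from code with boto3's add_permission call. A sketch under assumptions: the function name, account ID, and agent ID are placeholders, and `grant_bedrock_invoke` is not called here because it requires AWS credentials:

```python
def invoke_permission_kwargs(account_id: str, agent_id: str) -> dict:
    """Arguments granting the Bedrock agent permission to invoke the function."""
    return {
        "FunctionName": "TrainingAssistant",  # placeholder function name
        "StatementId": "BedrockAgentInvoke",
        "Action": "lambda:InvokeFunction",
        "Principal": "bedrock.amazonaws.com",
        "SourceArn": f"arn:aws:bedrock:us-east-1:{account_id}:agent/{agent_id}",
    }

def grant_bedrock_invoke(account_id: str, agent_id: str):
    """Apply the permission (requires AWS credentials)."""
    import boto3
    boto3.client("lambda").add_permission(**invoke_permission_kwargs(account_id, agent_id))
```

Scoping the SourceArn to a single agent ID prevents any other Bedrock agent in the account from invoking this function.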
Now you can go to the Bedrock console and test the agent with queries such as:
- "I want to learn Python. Do you have any recommendations?"
- "Who teaches this course?"

Conclusion
In this tutorial, we built a RAG-based Training Center Assistant using AWS Bedrock, Pinecone, and DynamoDB. The application combines static course information from S3 with real-time data from DynamoDB to provide accurate and dynamic responses to customer queries.
In the next article, we will cover:
- Building a frontend chatbot interface.
- Integrating the backend API for seamless communication.
By leveraging AWS Bedrock and Pinecone, you can create powerful, scalable, and intelligent applications that deliver exceptional user experiences. Stay tuned for Part 2!
The source code of the Lambda function for the Bedrock Agent can be found on GitHub.