Building an AI-Driven Document Query Assistant: Using a Non-OpenAI Embedding Model

Yesterday, I published a blog post, “Building an AI-Driven Document Query Assistant: A Step-by-Step Python Tutorial.” On further reflection, I realized that running this assistant over a sizable dataset could get costly, because it relied on OpenAI’s embedding model. To address this, I’ve reconfigured the code to use the nomic-embed-text:latest embedding model, served locally through Ollama.
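Switching Settings.embed_model to OllamaEmbedding means each document chunk and each question is embedded by an HTTP call to your local Ollama server instead of a paid OpenAI request. The snippet below is a minimal standard-library sketch of that kind of request, not the llama_index implementation. It assumes Ollama listens on its default port 11434 and exposes the /api/embeddings endpoint (model and prompt in, an "embedding" list out); newer Ollama releases also offer /api/embed, so check the API documentation for your version. build_embedding_request and embed are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions: Ollama runs locally on its default port and the model below has
# been pulled. /api/embeddings takes {"model", "prompt"} and returns
# {"embedding": [...]} in the Ollama releases this sketch targets.
OLLAMA_URL = "http://localhost:11434"
MODEL = "nomic-embed-text:latest"


def build_embedding_request(text, model=MODEL, base_url=OLLAMA_URL):
    """Build (but do not send) a POST request asking Ollama to embed `text`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, timeout=30):
    """Send the request to the local server and return the vector (list of floats)."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=timeout) as resp:
        return json.loads(resp.read())["embedding"]


# Usage (requires a running Ollama server):
# vector = embed("What does this document say about pricing?")
```

Against a running server, embed() should return a list of floats whose length is the model's embedding dimension; no OpenAI call is involved at this step.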

You still need an OpenAI API key, because the chat_engine relies on an OpenAI LLM (gpt-3.5-turbo) to generate answers. Without it, the chat engine can neither interpret your questions in the context of the conversation nor answer them. So make sure your OpenAI API key is set as OPENAI_API_KEY in your .env file.
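Since indexing runs entirely on Ollama, a missing key may only surface as an error when the chat engine first calls OpenAI. A small startup check makes the problem obvious right away. require_openai_key is a hypothetical helper, not part of the original script; it assumes the key is stored under OPENAI_API_KEY and that load_dotenv('.env') has already run.

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI key from the environment, or raise a readable error.

    Assumes the key is named OPENAI_API_KEY and that load_dotenv('.env') has
    already copied the .env values into the process environment.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add a line like OPENAI_API_KEY=... to your .env file"
        )
    return key
```

You could call require_openai_key() right after load_dotenv('.env') in app.py and, in Streamlit, catch the RuntimeError and show it with st.error.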

Don’t forget to install Ollama and download the nomic-embed-text:latest embedding model (for example with ollama pull nomic-embed-text). If you’re on Linux, you can run

sudo systemctl status ollama.service

to check whether the Ollama service is running. If the nomic-embed-text:latest model is present (ollama list shows the models you have downloaded), the script will use it to create the embeddings.
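If you prefer to check from Python (or you are not on Linux), you can query Ollama’s HTTP API directly: GET /api/tags lists the locally pulled models, the same list ollama list prints. The sketch below assumes the default address used in app.py (http://localhost:11434); check_ollama and model_available are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address, as used in app.py


def model_available(tags, name="nomic-embed-text:latest"):
    """Return True if `name` appears in a parsed /api/tags response body."""
    return any(m.get("name") == name for m in tags.get("models", []))


def check_ollama(name="nomic-embed-text:latest", base_url=OLLAMA_URL, timeout=5):
    """Describe whether Ollama is reachable and whether `name` has been pulled."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.loads(resp.read())
    except OSError as exc:  # URLError, connection refused and timeouts are all OSErrors
        return f"Ollama is not reachable at {base_url}: {exc}"
    if model_available(tags, name):
        return f"{name} is available"
    return f"{name} not found; try: ollama pull {name}"
```

Running print(check_ollama()) before streamlit run app.py tells you whether the embedding step can work at all.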

When you run the script with

streamlit run app.py

you may notice a button labeled “Re-index Documents.” Only click this button if you’ve manually added new documents to your documents folder (./documents). Otherwise, simply wait for the script to finish initializing.
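The automatic check in app.py (document_changes_detected) compares file names only. The pure functions below mirror that set logic in isolation and show a case it misses: a file edited in place keeps its name, so it is not reported, which is one situation where the Re-index Documents button is the way to pick up the change. detect_changes is a hypothetical name, and the mtime-based variant is an assumption of mine, not something app.py does.

```python
def detect_changes(indexed_files, current_files):
    """Mirror the name-based set logic of document_changes_detected() in app.py.

    Returns (new_files, removed_files). Edits that keep a file's name are invisible.
    """
    indexed, current = set(indexed_files), set(current_files)
    return current - indexed, indexed - current


def detect_changes_with_mtime(indexed, current):
    """Hypothetical variant keyed by {name: mtime}; also reports modified files."""
    new = current.keys() - indexed.keys()
    removed = indexed.keys() - current.keys()
    modified = {n for n in current.keys() & indexed.keys() if current[n] != indexed[n]}
    return new, removed, modified
```

To try the variant, you could build current as {f: os.path.getmtime(os.path.join(documents_path, f)) for f in os.listdir(documents_path)} and store that dict in metadata.json instead of a plain list of names.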

Once initialization is complete, you can start asking questions about your documents. The script below is based on the one in the “Building an AI-Driven Document Query Assistant: A Step-by-Step Python Tutorial” post. To integrate it into your existing application, replace the code in your app.py file with the app.py script below.

To recreate the application from scratch, follow the aforementioned tutorial. Once you’ve completed all the steps and replaced the code, your application will be ready to answer questions about your documents.

# code (app.py)
import streamlit as st
import os
from dotenv import load_dotenv
from llama_index.llms.openai import OpenAI
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, load_index_from_storage, StorageContext
from llama_index.core.settings import Settings
import warnings
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json
import traceback
from llama_index.embeddings.ollama import OllamaEmbedding
# Load environment variables (OPENAI_API_KEY is read from .env)
load_dotenv('.env')
# Suppress specific FutureWarnings from huggingface_hub
warnings.filterwarnings("ignore", category=FutureWarning, module='huggingface_hub')
# Define paths
storage_path = './vectorstore'
documents_path = './documents'
# Set the model configuration: OpenAI for chat, local Ollama for embeddings
Settings.llm = OpenAI(model='gpt-3.5-turbo', temperature=0.1)
ollama_embedding = OllamaEmbedding(
    model_name="nomic-embed-text:latest",
    base_url="http://localhost:11434",
    ollama_additional_kwargs={"mirostat": 0},
)
Settings.embed_model = ollama_embedding
# Ensure directories exist
os.makedirs(storage_path, exist_ok=True)
os.makedirs(documents_path, exist_ok=True)
# Initialize the reranker (a local sentence-transformers cross-encoder)
reranker = SentenceTransformerRerank(model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5)
# Initialize the parser
parser = SentenceSplitter()
def document_changes_detected(documents_path, metadata_path):
    # Load existing metadata if available
    if os.path.exists(metadata_path):
        with open(metadata_path, 'r') as f:
            indexed_files = set(json.load(f))
    else:
        indexed_files = set()
    # Get the current set of documents
    current_files = {file for file in os.listdir(documents_path) if os.path.isfile(os.path.join(documents_path, file))}
    # Detect changes (file names only)
    new_files = current_files - indexed_files
    removed_files = indexed_files - current_files
    # Update metadata file if changes are detected
    if new_files or removed_files:
        with open(metadata_path, 'w') as f:
            json.dump(list(current_files), f)
        return True
    return False
def pprint_response(response, show_source=False):
    """Print a response to the terminal (console logging only)."""
    if isinstance(response, str):
        print(response)  # Handle the string directly
    elif response.response is None:
        print("No response.")
    else:
        print(response.response)
        if show_source:
            # llama_index Response objects list their sources in source_nodes
            print("Sources:", [n.node.node_id for n in response.source_nodes])
def enhance_and_rerank_responses(responses, query):
    """Combine reranking and enhancing to select the most comprehensive and relevant response."""
    if not responses:
        return "No responses available."
    # Rerank with the cross-encoder; it expects NodeWithScore objects and a QueryBundle
    nodes = [NodeWithScore(node=TextNode(text=res)) for res in responses]
    reranked_nodes = reranker.postprocess_nodes(nodes, query_bundle=QueryBundle(query_str=query))
    reranked_responses = [n.node.get_content() for n in reranked_nodes]
    # With a single candidate there is nothing to compare against
    if len(reranked_responses) == 1:
        return reranked_responses[0]
    # Select the response with the highest average TF-IDF cosine similarity to all candidates
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(reranked_responses)
    avg_similarity = cosine_similarity(tfidf_matrix).mean(axis=0)
    return reranked_responses[avg_similarity.argmax()]
@st.cache_resource(show_spinner=False)
def initialize(force_reindex=False):
    metadata_path = os.path.join(storage_path, 'metadata.json')
    # Run change detection first so metadata.json stays in sync even on a forced re-index
    changes_detected = document_changes_detected(documents_path, metadata_path)
    if force_reindex or changes_detected:
        documents = SimpleDirectoryReader(documents_path).load_data()
        nodes = parser.get_nodes_from_documents(documents)
        index = VectorStoreIndex(nodes, embed_model=Settings.embed_model)
        index.storage_context.persist(persist_dir=storage_path)
    else:
        storage_context = StorageContext.from_defaults(persist_dir=storage_path)
        index = load_index_from_storage(storage_context)
    return index
def main():
    st.title('Ask the Document')
    # Button to force re-indexing
    force_reindex = st.button("Re-index Documents")
    if force_reindex:
        st.info("Re-indexing triggered...")
        # initialize() is cached by st.cache_resource, so clear the cached index
        # (and the chat engine built on it) before rebuilding
        initialize.clear()
        st.session_state.pop('chat_engine', None)
    try:
        # Check for documents first: an empty folder cannot be indexed or loaded
        if not os.listdir(documents_path):
            st.error("No documents found. Please upload your documents.")
            uploaded_files = st.file_uploader("Upload documents", accept_multiple_files=True, type=['pdf', 'txt', 'docx'])
            if uploaded_files:
                for uploaded_file in uploaded_files:
                    with open(os.path.join(documents_path, uploaded_file.name), "wb") as f:
                        f.write(uploaded_file.getvalue())
                st.rerun()  # Rerun the script after files are uploaded
            return
        # Initialize or reinitialize the index if needed
        index = initialize(force_reindex=force_reindex)
        st.info("Index initialized or loaded successfully.")
        if 'messages' not in st.session_state:
            st.session_state.messages = [{'role': 'assistant', 'content': 'Ask me a question!'}]
        # Keep one chat engine per session so condense_question can use the chat history
        if 'chat_engine' not in st.session_state:
            st.session_state.chat_engine = index.as_chat_engine(chat_mode='condense_question', verbose=True)
        chat_engine = st.session_state.chat_engine
        # st.chat_input returns the question only once, so reruns don't resubmit it
        if prompt := st.chat_input('Your question'):
            st.session_state.messages.append({'role': 'user', 'content': prompt})
        for message in st.session_state.messages:
            with st.expander(f"{message['role'].title()} says:"):
                st.write(message['content'])
        if st.session_state.messages[-1]['role'] != 'assistant':
            question = st.session_state.messages[-1]['content']
            with st.spinner('Thinking...'):
                response = chat_engine.chat(question)
                response_texts = response.response if isinstance(response.response, list) else [response.response]
                best_response = enhance_and_rerank_responses(response_texts, question)
                pprint_response(best_response, show_source=True)  # Log to the terminal
                st.write(best_response)
                st.session_state.messages.append({'role': 'assistant', 'content': best_response})
    except Exception as e:
        st.error("An error occurred during document processing or initialization.")
        st.text(f"Error: {e}")
        st.text(traceback.format_exc())  # To show full traceback in the interface
if __name__ == "__main__":
    main()
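The enhance_and_rerank_responses step reranks candidate answers with the cross-encoder, then returns the answer whose average TF-IDF cosine similarity to all candidates is highest, that is, the most “central” one. In the llama_index version pinned below, chat_engine.chat() returns a single answer string, so the list usually has one element and the selection step simply passes it through. The dependency-free sketch below shows the selection idea on its own; most_central is a hypothetical name, and it deliberately swaps TF-IDF weighting for plain word counts to stay short, so its scores will differ from scikit-learn’s.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Lower-cased word counts: a simplified stand-in for TF-IDF weights."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_central(responses):
    """Return the response with the highest average similarity to all responses.

    As in app.py's matrix mean, each average includes the response's similarity to itself.
    """
    if not responses:
        raise ValueError("need at least one response")
    vectors = [_vector(r) for r in responses]
    averages = [sum(_cosine(v, w) for w in vectors) / len(vectors) for v in vectors]
    return responses[max(range(len(responses)), key=averages.__getitem__)]
```

Given two answers that agree and one outlier, most_central picks one of the two agreeing answers, which is the intent behind “most comprehensive” selection when several candidates exist.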

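requirements.txt (these pins appear to come from a Linux setup: the nvidia-* and triton packages are CUDA dependencies that torch installs on Linux, so on other platforms you may need to remove those lines and let pip pick the right torch build):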

aiohttp==3.9.5
aiosignal==1.3.1
altair==5.3.0
annotated-types==0.6.0
anyio==4.3.0
async-timeout==4.0.3
attrs==23.2.0
beautifulsoup4==4.12.3
black==24.4.2
blinker==1.8.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
dataclasses-json==0.6.5
Deprecated==1.2.14
dirtyjson==1.0.8
distro==1.9.0
exceptiongroup==1.2.1
filelock==3.14.0
frozenlist==1.4.1
fsspec==2024.3.1
gitdb==4.0.11
GitPython==3.1.43
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.23.0
idna==3.7
Jinja2==3.1.4
joblib==1.4.2
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
llama-index==0.10.35
llama-index-agent-openai==0.2.4
llama-index-cli==0.1.12
llama-index-core==0.10.35.post1
llama-index-embeddings-huggingface==0.2.0
llama-index-embeddings-ollama==0.1.2
llama-index-embeddings-openai==0.1.9
llama-index-indices-managed-llama-cloud==0.1.6
llama-index-legacy==0.9.48
llama-index-llms-ollama==0.1.3
llama-index-llms-openai==0.1.18
llama-index-multi-modal-llms-openai==0.1.5
llama-index-program-openai==0.1.6
llama-index-question-gen-openai==0.1.3
llama-index-readers-file==0.1.22
llama-index-readers-llama-parse==0.1.4
llama-parse==0.4.2
llamaindex-py-client==0.1.19
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.2
mdurl==0.1.2
minijinja==2.0.1
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.3
nltk==3.8.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
openai==1.27.0
packaging==24.0
pandas==2.2.2
pathspec==0.12.1
pillow==10.3.0
platformdirs==4.2.1
protobuf==4.25.3
pyarrow==16.0.0
pydantic==2.7.1
pydantic_core==2.18.2
pydeck==0.9.0
Pygments==2.18.0
pypdf==4.2.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
PyYAML==6.0.1
referencing==0.35.1
regex==2024.4.28
requests==2.31.0
rich==13.7.1
rpds-py==0.18.1
safetensors==0.4.3
scikit-learn==1.4.2
scipy==1.13.0
sentence-transformers==2.7.0
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
SQLAlchemy==2.0.30
streamlit==1.34.0
striprtf==0.0.26
sympy==1.12
tenacity==8.3.0
threadpoolctl==3.5.0
tiktoken==0.6.0
tokenizers==0.19.1
toml==0.10.2
tomli==2.0.1
toolz==0.12.1
torch==2.3.0
tornado==6.4
tqdm==4.66.4
transformers==4.40.2
triton==2.3.0
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
watchdog==4.0.0
wrapt==1.16.0
yarl==1.9.4

EssayBoard