
I ran the following code, which is based on LangChain and Chroma and is meant to serve as a client-facing chatbot in a production environment, so I need it to respond within one or two seconds. Instead, this portion alone takes about thirty seconds, which is far too slow for my use case. Any advice on making it faster would be greatly appreciated.

import openai
from langchain.chains import load_qa_with_sources_chain
from langchain.llms import OpenAI

def generate_chain():
    chain = load_qa_with_sources_chain(
        OpenAI(temperature=0, openai_api_key=openai.api_key),
        chain_type="map_reduce"
    )
    return chain

def ask_docs(relevant_documents, query):
    chain = generate_chain()
    sourced_answer_obj = chain(
        {"input_documents": [relevant_document[0] for relevant_document in relevant_documents],
         "question": query}, return_only_outputs=True)
    sourced_answer_str = sourced_answer_obj['output_text'].strip()
    return sourced_answer_str

I tried the code above and expected it to take about a second or less, but it ended up taking about half a minute.

1 Answer

There are a few potential optimizations that can be made to improve the performance of your code. Here are some suggestions:

Avoid regenerating the chain every time the ask_docs function is called: Currently, the generate_chain function is called each time ask_docs is invoked. Generating the chain involves loading the QA model and its associated resources, which can be a time-consuming operation. To improve performance, you can generate the chain once and reuse it for subsequent queries.

For example:

# Define the chain outside of the ask_docs function
chain = generate_chain()

# Call ask_docs function multiple times, reusing the chain
answer1 = ask_docs(relevant_documents1, query1)
answer2 = ask_docs(relevant_documents2, query2)

Load the OpenAI API key outside the generate_chain function: In the generate_chain function, you are loading the OpenAI API key each time it is called. This can be optimized by loading the API key outside of the function and passing it as an argument. This way, you only load the API key once and reuse it when necessary.
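
For example, a minimal sketch that resolves the key a single time at module import, assuming it is stored in the OPENAI_API_KEY environment variable:

import os
import openai

# Read the key once when the module loads (assumes OPENAI_API_KEY is set)
openai.api_key = os.environ["OPENAI_API_KEY"]
openai_api_key = openai.api_key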

Batch the input documents: Rather than invoking the chain once per document, pass all the relevant documents together in a single call, which your ask_docs function already supports. Each invocation with chain_type="map_reduce" runs a map step (one LLM call per document) followed by a combine step, so fewer invocations means fewer round trips to the OpenAI API. The sketch below illustrates the difference.
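
A minimal sketch of the contrast, assuming relevant_documents holds (document, score) tuples such as those returned by Chroma's similarity_search_with_score (the [0] indexing in your code suggests this):

# Slow: one full map/reduce pass per document
answers = [
    chain({"input_documents": [doc], "question": query}, return_only_outputs=True)
    for doc, _score in relevant_documents
]

# Faster: a single pass over the whole batch
answer = chain(
    {"input_documents": [doc for doc, _score in relevant_documents],
     "question": query},
    return_only_outputs=True
)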

Here's an updated version of your code with these optimizations applied:

import openai
from langchain.chains import load_qa_with_sources_chain
from langchain.llms import OpenAI

# Load the OpenAI API key outside the function
openai_api_key = openai.api_key

def generate_chain():
    chain = load_qa_with_sources_chain(
        OpenAI(temperature=0, openai_api_key=openai_api_key),
        chain_type="map_reduce"
    )
    return chain

# Generate the chain once outside the function
chain = generate_chain()

def ask_docs(relevant_documents, query):
    sourced_answer_obj = chain(
        {"input_documents": [doc[0] for doc in relevant_documents],
         "question": query}, return_only_outputs=True)
    sourced_answer_str = sourced_answer_obj['output_text'].strip()
    return sourced_answer_str

By applying these optimizations, you should see noticeably better performance. Adapt the changes to your specific use case and verify the results, for example by timing the call as shown below.
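
A quick way to measure the latency (the query shown is hypothetical):

import time

start = time.perf_counter()
answer = ask_docs(relevant_documents, "What does the report conclude?")
print(f"Answered in {time.perf_counter() - start:.2f}s")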

