I want to stream a Vertex AI response. For that I have prepared the following function, which presumably yields the response in chunks:
import vertexai
from vertexai.language_models import TextGenerationModel

def prompt_ai(prompt):
    vertexai.init(project="XXX-YYYY", location="ZZ-PPPP")
    parameters = {
        "max_output_tokens": 1024,
        "temperature": 0.2,
        "top_p": 0.8,
        "top_k": 40,
    }
    model = TextGenerationModel.from_pretrained("text-bison")
    responses = model.predict_streaming(prompt, **parameters)
    for response in responses:
        yield str(response)
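As a quick sanity check independent of App Engine, a generator like this can be consumed locally to confirm that chunks really arrive incrementally. The `slow_chunks` generator below is a hypothetical stand-in for `prompt_ai` (which needs live Vertex AI credentials), added here only to illustrate the check:

```python
import time

# Stand-in for prompt_ai (which needs live Vertex AI credentials):
# yields chunks with an artificial delay between them.
def slow_chunks():
    for word in ["Tell", "me", "something", "funny"]:
        time.sleep(0.1)
        yield word

start = time.monotonic()
arrivals = [(chunk, round(time.monotonic() - start, 1)) for chunk in slow_chunks()]
print(arrivals)  # each chunk's timestamp is ~0.1 s after the previous one
```

If the timestamps climb chunk by chunk, the generator itself streams fine and any buffering happens further down the pipeline.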
And this FastAPI endpoint, which uses it:
from fastapi.responses import StreamingResponse

@app.post("/search")
async def search(ai_prompt: str):
    return StreamingResponse(prompt_ai(ai_prompt), media_type="text/event-stream")
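One detail worth noting: since the endpoint declares `media_type='text/event-stream'`, a compliant Server-Sent Events stream would frame each chunk as a `data:` event terminated by a blank line, whereas the generator above yields raw text. A minimal sketch of that framing (the `to_sse` helper is hypothetical, not part of FastAPI):

```python
# Hypothetical helper: wrap raw text chunks in SSE "data:" frames,
# the format clients of a text/event-stream response expect.
def to_sse(chunks):
    for chunk in chunks:
        yield f"data: {chunk}\n\n"  # each event ends with a blank line

print(list(to_sse(["hello", "world"])))
# → ['data: hello\n\n', 'data: world\n\n']
```

With framing like this, `StreamingResponse(to_sse(prompt_ai(ai_prompt)), ...)` would emit well-formed SSE events.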
Both of these are deployed on Google App Engine.
But when I call it via the following Python script (on my PC):
import requests

url = "https://myGCPdomain.appspot.com/search"
params = {
    "ai_prompt": "Tell me something funny",
}
headers = {
    "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6Ietc..."
}

response = requests.post(url, params=params, headers=headers, stream=True)
for chunk in response.iter_lines():
    if chunk:
        print(chunk.decode("utf-8"))
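One caveat about this client: `iter_lines()` buffers incoming data until a newline arrives, so even a server that genuinely streams can look like a one-shot response when the chunks contain no newlines. A rough, network-free illustration of that behavior (`iter_lines_like` is a simplified stand-in, not the real requests implementation):

```python
# Simplified stand-in (not the real requests code) for the line
# buffering that response.iter_lines() applies on top of the raw stream.
def iter_lines_like(chunks):
    pending = b""
    for chunk in chunks:
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            yield line  # a line is released only once its newline arrives
    if pending:
        yield pending  # whatever is left when the stream closes

# Three "streamed" chunks, but the first newline shows up only in the last one:
raw = [b"Hello ", b"world", b"!\nsecond line\n"]
print(list(iter_lines_like(raw)))
# → [b'Hello world!', b'second line']
```

`response.iter_content(chunk_size=None)` surfaces chunks as they arrive without waiting for newlines, which makes it easier to tell client-side buffering apart from server-side buffering.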
It should presumably stream the text response as it comes from Vertex AI; instead I am getting it in one shot.
What am I missing here? I appreciate any help.
Note: this isn't a duplicate; the issue is specific to the Google App Engine platform.