I want to stream a Vertex AI response. For that I have prepared the following function, which presumably yields the response in chunks:
import vertexai
from vertexai.language_models import TextGenerationModel

def prompt_ai(prompt):
    vertexai.init(project="XXX-YYYY", location="ZZ-PPPP")
    parameters = {
        "max_output_tokens": 1024,
        "temperature": 0.2,
        "top_p": 0.8,
        "top_k": 40,
    }
    model = TextGenerationModel.from_pretrained("text-bison")
    responses = model.predict_streaming(prompt, **parameters)
    for response in responses:
        yield str(response)
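As a quick sanity check independent of App Engine, a generator like this can be consumed locally to confirm that chunks really arrive incrementally. The `slow_chunks` generator below is a hypothetical stand-in for `prompt_ai` (which needs live Vertex AI credentials), added here only to illustrate the check:

```python
import time

# Stand-in for prompt_ai (which needs live Vertex AI credentials):
# yields chunks with an artificial delay between them.
def slow_chunks():
    for word in ["Tell", "me", "something", "funny"]:
        time.sleep(0.1)
        yield word

start = time.monotonic()
arrivals = [(chunk, round(time.monotonic() - start, 1)) for chunk in slow_chunks()]
print(arrivals)  # each chunk's timestamp is ~0.1 s after the previous one
```

If the timestamps climb chunk by chunk, the generator itself streams fine and any buffering happens further down the pipeline.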
And this FastAPI endpoint, which uses it:
from fastapi.responses import StreamingResponse

@app.post("/search")
async def search(ai_prompt: str):
    return StreamingResponse(prompt_ai(ai_prompt), media_type="text/event-stream")
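One detail worth noting: since the endpoint declares `media_type='text/event-stream'`, a compliant Server-Sent Events stream would frame each chunk as a `data:` event terminated by a blank line, whereas the generator above yields raw text. A minimal sketch of that framing (the `to_sse` helper is hypothetical, not part of FastAPI):

```python
# Hypothetical helper: wrap raw text chunks in SSE "data:" frames,
# the format clients of a text/event-stream response expect.
def to_sse(chunks):
    for chunk in chunks:
        yield f"data: {chunk}\n\n"  # each event ends with a blank line

print(list(to_sse(["hello", "world"])))
# → ['data: hello\n\n', 'data: world\n\n']
```

With framing like this, `StreamingResponse(to_sse(prompt_ai(ai_prompt)), ...)` would emit well-formed SSE events.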
Both of these are deployed on Google App Engine.
But when I call it via the following Python script (on my PC):
import requests

url = "https://myGCPdomain.appspot.com/search"
params = {
    "ai_prompt": "Tell me something funny",
}
headers = {
    "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6Ietc..."
}

response = requests.post(url, params=params, headers=headers, stream=True)
for chunk in response.iter_lines():
    if chunk:
        print(chunk.decode("utf-8"))
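One caveat about this client: `iter_lines()` buffers incoming data until a newline arrives, so even a server that genuinely streams can look like a one-shot response when the chunks contain no newlines. A rough, network-free illustration of that behavior (`iter_lines_like` is a simplified stand-in, not the real requests implementation):

```python
# Simplified stand-in (not the real requests code) for the line
# buffering that response.iter_lines() applies on top of the raw stream.
def iter_lines_like(chunks):
    pending = b""
    for chunk in chunks:
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            yield line  # a line is released only once its newline arrives
    if pending:
        yield pending  # whatever is left when the stream closes

# Three "streamed" chunks, but the first newline shows up only in the last one:
raw = [b"Hello ", b"world", b"!\nsecond line\n"]
print(list(iter_lines_like(raw)))
# → [b'Hello world!', b'second line']
```

`response.iter_content(chunk_size=None)` surfaces chunks as they arrive without waiting for newlines, which makes it easier to tell client-side buffering apart from server-side buffering.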
It should presumably stream the text response as it comes from Vertex AI; instead I am getting it in one shot.
What am I missing here? I appreciate any help.
Note: this isn't a duplicate; the issue is specific to the Google App Engine platform.