I am running some tests with Ollama on my local computer, using Llama 3.2. Each test consists of running a prompt against a document.
I read that after reaching the maximum context, I should restart the session: https://www.reddit.com/r/ollama/comments/1hrsoav/comment/n2bgj2r/
This statement confuses me because I don't know what a session is when I start the server on a local computer.
It makes me wonder whether I should restart the Ollama server each time I run an experiment.
The experiment consists of executing a prompt on a document. I am testing the effects of different prompts and of the context size, so I re-run the test over and over, each time assigning the result to the same variable, like so:
from ollama import chat, ChatResponse, Options

def get_completion(prompt: str, system_prompt="", prefill=""):
    response: ChatResponse = chat(
        model=MODEL_NAME,  # defined elsewhere in the notebook
        options=Options(
            num_predict=2000,  # Ollama's option for the maximum number of generated tokens
            temperature=0.0,
            num_ctx=2048*4,
        ),
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": prefill}
        ]
    )
    return response.message.content
I then save the results:
results = get_completion(PROMPT, SYSTEM_PROMPT, PREFILL)
# Save on file
# Change Prompt
# Repeat
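To make the save / change prompt / repeat steps concrete, here is a minimal sketch of the loop I have in mind. The names `run_experiments`, `complete` and the `results` directory are only illustrative (they are not part of Ollama); `complete` stands for any function that takes a prompt and returns the model's text, such as a wrapper around `get_completion`.

```python
import json
from pathlib import Path
from typing import Callable


def run_experiments(
    prompts: dict[str, str],
    complete: Callable[[str], str],
    out_dir: str = "results",
) -> dict[str, str]:
    """Run each named prompt once and save every result to its own JSON file."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    results: dict[str, str] = {}
    for name, prompt in prompts.items():
        # One call per prompt; this loop passes nothing from earlier iterations.
        result = complete(prompt)
        results[name] = result
        record = {"name": name, "prompt": prompt, "result": result}
        (out_path / f"{name}.json").write_text(
            json.dumps(record, indent=2), encoding="utf-8"
        )
    return results
```

In the notebook this would be called as, for example, `run_experiments({"v1": PROMPT}, lambda p: get_completion(p, SYSTEM_PROMPT, PREFILL))`, so that each prompt variant ends up in a separate file instead of overwriting the same variable.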
However, I am not sure whether the model internally keeps a memory of prior prompts and their results: is the latest prompt independent, or is it biased by prior "chats"?
Should I restart the server each time to ensure that each prompt test is independent of prior tests, or is that not necessary?
I would be grateful if you could clarify what a session is when Ollama runs a model on a local computer and the prompts are executed in Jupyter cells (I mean, the chat is not continuous).