43 questions
0
votes
0
answers
178
views
'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'
I'm trying to run Mistral-Small-3.1-24B-Instruct-2503 in multimodal mode (with image_url) using vLLM, but hitting the tokenizer error: AttributeError: 'MistralTokenizer' object has no attribute '...
3
votes
2
answers
214
views
Multimodal embedding requires video first, then image - why?
I am working with OmniEmbed model (https://huggingface.co/Tevatron/OmniEmbed-v0.1), which is built on Qwen2.5 7B. My goal is to get a multimodal embedding for images and videos. I have the following ...
0
votes
1
answer
2k
views
llama.cpp server and curl requests for multimodal models
I have llama-server up and running on a VPS with Ubuntu 24.04. I can send curl requests from an external IP and get answers for text embedding for instance. Now I want to use multimodal models through ...
4
votes
1
answer
387
views
Cannot run inference with images on llama-cpp-python
I am new to this. I have been trying but could not make the model answer questions about images.
from llama_cpp import Llama
import torch
from PIL import Image
import base64
llm = Llama(
model_path='Holo1-...
0
votes
0
answers
492
views
Struggling to create a multimodal chatbot using CopilotKit
I am trying to build a chatbot leveraging the capabilities of CopilotKit and the GPT-4o model. I am also using a React-based frontend UI, which is also supported by CopilotKit. What's happening is ...
0
votes
2
answers
202
views
How to include image as part of user prompt in haystack 2.X?
I have a chatbot pipeline built with Haystack. I am referring to the Haystack docs to create a pipeline; here is an example pipeline using the prompt builder:
from haystack import Pipeline
...
0
votes
1
answer
1k
views
langchain_ollama attach image to prompt
I'm experimenting with Llama 3.2 Vision 11B and I'm having a bit of a rough time attaching an image, whether it's local or online, to the chat. Here's my Python code:
import io
import base64
import ...
2
votes
0
answers
193
views
Multimodal cross-attention
I am dealing with two embeddings, text and image; both are the last_hidden_state of transformer models (BERT and ViT), so the shapes are (batch, seq, emb_dim). I want to feed text information to image using ...
1
vote
0
answers
120
views
Can't evaluate BLIP2 on a batch of images in parallel
I'm trying to speed up caption generation on a large set of images using BLIP-2. The code below works fine for one image:
prompt = "this is a picture of"
inputs = processor(trainData[...
1
vote
3
answers
3k
views
How to pass online images to the Gemini model?
I am trying to use the Gemini model to generate descriptions for online images, but failed when converting the Pillow image format to the Vertex AI image format. Running the code below encounters this error: ...
2
votes
1
answer
649
views
How to extract image hidden states in LLaVa's transformers (Huggingface) implementation?
I am using the transformers library (Hugging Face) to extract all hidden units of LLaVa 1.5. The Hugging Face documentation shows that it is possible to extract image hidden states from the ...
1
vote
1
answer
2k
views
GCP Gemini API - Send multimodal prompt requests using local image
On this page Google shows sample code for sending multimodal prompt requests (image + text).
import vertexai
from vertexai.generative_models import GenerativeModel, Part
# ...
0
votes
1
answer
119
views
Transformers code works on its own, but breaks when using gradio (device mismatch)
I am attempting to make a gradio demo for nanoLLaVA by @stablequan. I am porting over just the structure of Apache 2.0 licensed code in the Moondream repo.
The nanoLLaVA repo has example code in the ...
5
votes
0
answers
895
views
How to use LLaVa embedding function? Multi-Modal Rag
I'm currently implementing a multi-modal RAG system leveraging LLaVa, Chroma, and LangChain.
However, I'm having a hard time finding the embedding function LLaVa uses. Can anybody help me with that? ...
1
vote
1
answer
318
views
Loading video-LLaVA with Huggingface transformers
When trying to load Video-LLaVA with Hugging Face transformers on Colab, I get this error:
---------------------------------------------------------------------------
HTTPError ...