0 votes · 0 answers · 178 views

I'm trying to run Mistral-Small-3.1-24B-Instruct-2503 in multimodal mode (with image_url) using vLLM, but I'm hitting a tokenizer error: AttributeError: 'MistralTokenizer' object has no attribute '...
asked by weiming
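
This error typically means vLLM loaded the model through its Hugging Face tokenizer path instead of the native Mistral one. A minimal sketch of the setup that usually works, assuming vLLM's OpenAI-compatible server; the exact launch flags for this checkpoint are an assumption:

    # Serve with the native Mistral tokenizer (assumption: this is the missing piece):
    #   vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 \
    #     --tokenizer-mode mistral --config-format mistral --load-format mistral
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)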
3 votes · 2 answers · 214 views

I am working with the OmniEmbed model (https://huggingface.co/Tevatron/OmniEmbed-v0.1), which is built on Qwen2.5 7B. My goal is to get a multimodal embedding for images and videos. I have the following ...
asked by n_arch
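
Without knowing OmniEmbed's exact interface, a common pattern for embedders built on chat backbones is to run the model and pool the last hidden state. A hypothetical sketch; the loading classes, prompt format, and pooling choice are all assumptions, so check the Tevatron examples for the real ones:

    import torch
    from transformers import AutoModel, AutoProcessor
    from PIL import Image

    # Assumption: the checkpoint loads via AutoModel/AutoProcessor with remote code.
    name = "Tevatron/OmniEmbed-v0.1"
    processor = AutoProcessor.from_pretrained(name, trust_remote_code=True)
    model = AutoModel.from_pretrained(name, trust_remote_code=True, torch_dtype=torch.bfloat16)

    image = Image.open("photo.jpg")
    inputs = processor(text="Represent this image.", images=image, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # Mean-pool the final hidden state into one vector (assumption: many embedders
    # instead take the last token's state; the Tevatron code defines the real rule).
    embedding = out.last_hidden_state.mean(dim=1)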
0 votes · 1 answer · 2k views

I have llama-server up and running on a VPS with Ubuntu 24.04. I can send curl requests from an external IP and get answers, for text embedding for instance. Now I want to use multimodal models through ...
asked by user3102556
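
llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, and for vision models images are usually sent as base64 data URIs; whether the server was started with the model's multimodal projector is an assumption here. A sketch:

    import base64
    import requests

    # Assumption: the server was launched with a vision model plus its projector, e.g.
    #   llama-server -m model.gguf --mmproj mmproj.gguf --host 0.0.0.0 --port 8080
    with open("photo.jpg", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = requests.post(
        "http://YOUR_VPS_IP:8080/v1/chat/completions",
        json={"messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }]},
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])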
4 votes · 1 answer · 387 views

I am new to this. I have been trying but could not make the model answer questions about images. from llama_cpp import Llama import torch from PIL import Image import base64 llm = Llama( model_path='Holo1-...
asked by Abhash Rai
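
In llama-cpp-python, images are only seen when the Llama object is constructed with a vision chat handler plus the model's projector GGUF; a plain Llama(model_path=...) ignores them. A sketch using the generic LLaVA-style handler; the file paths, and whether Holo1 is compatible with this particular handler, are assumptions:

    import base64
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    # Assumption: Holo1 ships a compatible mmproj; other models need other handlers.
    chat_handler = Llava15ChatHandler(clip_model_path="mmproj.gguf")
    llm = Llama(model_path="Holo1-7B.Q4_K_M.gguf", chat_handler=chat_handler, n_ctx=4096)

    with open("screenshot.png", "rb") as f:
        data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

    out = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_uri}},
            {"type": "text", "text": "What does this image show?"},
        ],
    }])
    print(out["choices"][0]["message"]["content"])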
0 votes · 0 answers · 492 views

I am trying to build a chatbot leveraging the capabilities of CopilotKit and the GPT-4o model. I am also using a React-based frontend UI, which is also supported by CopilotKit. What's happening is ...
asked by Gaurav Singh
0 votes · 2 answers · 202 views

I have a chatbot pipeline built with Haystack. I am following the Haystack docs to create a pipeline; here is an example of the pipeline using a prompt builder: from haystack import Pipeline ...
asked by Abstract
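
For reference, a minimal prompt-builder pipeline in the style of the Haystack 2.x docs; the generator component and model name are assumptions:

    from haystack import Pipeline
    from haystack.components.builders import PromptBuilder
    from haystack.components.generators import OpenAIGenerator

    # Build the prompt from a Jinja template, then feed it to the generator.
    template = "Answer the question.\nQuestion: {{ question }}\nAnswer:"
    pipe = Pipeline()
    pipe.add_component("prompt_builder", PromptBuilder(template=template))
    pipe.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
    pipe.connect("prompt_builder.prompt", "llm.prompt")

    result = pipe.run({"prompt_builder": {"question": "What is Haystack?"}})
    print(result["llm"]["replies"][0])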
0 votes · 1 answer · 1k views

I'm experimenting with Llama 3.2 Vision 11B and I'm having a bit of a rough time attaching an image, whether it's local or online, to the chat. Here's my Python code: import io import base64 import ...
asked by Za3tour420
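
A sketch of the documented Hugging Face transformers route for this model, assuming that is the API being used (an HTTP-based setup would differ):

    import requests
    import torch
    from PIL import Image
    from transformers import MllamaForConditionalGeneration, AutoProcessor

    model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto")
    processor = AutoProcessor.from_pretrained(model_id)

    # Online image; a local file works the same via Image.open("local.jpg").
    image = Image.open(requests.get("https://example.com/img.jpg", stream=True).raw)

    # The {"type": "image"} placeholder tells the processor where the image goes.
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]}]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(image, text, return_tensors="pt").to(model.device)
    print(processor.decode(model.generate(**inputs, max_new_tokens=100)[0]))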
2 votes · 0 answers · 193 views

I am dealing with two embeddings, text and image; both are the last_hidden_state of transformer models (BERT and ViT), so the shapes are (batch, seq, emd_dim). I want to feed text information to image using ...
asked by m sh
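
One standard way to feed text information into an image representation is cross-attention with the image tokens as queries. A sketch with random tensors standing in for the two last_hidden_state outputs; equal hidden sizes are assumed, so add a linear projection if BERT's and ViT's differ:

    import torch
    import torch.nn as nn

    batch, seq_img, seq_txt, dim = 2, 197, 128, 768
    image = torch.randn(batch, seq_img, dim)  # ViT last_hidden_state
    text = torch.randn(batch, seq_txt, dim)   # BERT last_hidden_state

    # Image tokens attend over text tokens, so text information flows into the
    # image representation; the output shape stays (batch, seq_img, dim).
    attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
    fused, _ = attn(query=image, key=text, value=text)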
1 vote · 0 answers · 120 views

I'm trying to speed up the generation of captions on a large set of images using BLIP-2. The below code for one image works fine: prompt = "this is a picture of" inputs = processor(trainData[...
asked by Paul
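
BLIP-2's processor and generate both accept batches, so the usual speedup is to process many images per call instead of looping one at a time. A sketch, assuming the blip2-opt-2.7b checkpoint and placeholder image paths:

    import torch
    from PIL import Image
    from transformers import Blip2Processor, Blip2ForConditionalGeneration

    processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained(
        "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16, device_map="auto")

    prompt = "this is a picture of"
    images = [Image.open(p) for p in ["a.jpg", "b.jpg"]]  # one batch of images

    # Pass the whole batch at once; the processor pads, generate runs batched.
    inputs = processor(images=images, text=[prompt] * len(images),
                       return_tensors="pt", padding=True).to(model.device, torch.float16)
    with torch.no_grad():
        ids = model.generate(**inputs, max_new_tokens=30)
    captions = processor.batch_decode(ids, skip_special_tokens=True)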
1 vote · 3 answers · 3k views

I'm trying to use the Gemini model to generate descriptions for online images, but I failed when converting the Pillow image format to the Vertex AI image format. Running the code below encounters this error: ...
asked by Koala S
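
The Vertex AI SDK has no constructor that takes a PIL object directly; the usual workaround is to round-trip through bytes and build a Part. A sketch; the model name and placeholder project values are assumptions:

    import io
    import requests
    import vertexai
    from PIL import Image as PILImage
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project="your-project-id", location="us-central1")

    pil_img = PILImage.open(requests.get("https://example.com/img.jpg", stream=True).raw)

    # Serialize the PIL image to PNG bytes, then wrap the bytes as a Part.
    buf = io.BytesIO()
    pil_img.save(buf, format="PNG")
    image_part = Part.from_data(data=buf.getvalue(), mime_type="image/png")

    model = GenerativeModel("gemini-1.5-flash")
    print(model.generate_content([image_part, "Describe this image."]).text)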
2 votes · 1 answer · 649 views

I am using the transformers library (Hugging Face) to extract all hidden units of LLaVA 1.5. The Hugging Face documentation shows that it is possible to extract image hidden states from the ...
asked by Mihir Mehta
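
A sketch of pulling hidden states out of the llava-hf checkpoint with output_hidden_states=True; which positions correspond to image tokens depends on the <image> placeholder in the prompt:

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")

    prompt = "USER: <image>\nWhat is shown here? ASSISTANT:"
    inputs = processor(images=Image.open("img.jpg"), text=prompt,
                       return_tensors="pt").to(model.device)

    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states: one (batch, seq, dim) tensor per language-model layer;
    # the <image> placeholder positions hold the projected image tokens.
    # Newer transformers versions also expose out.image_hidden_states directly.
    print(len(out.hidden_states), out.hidden_states[-1].shape)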
1 vote · 1 answer · 2k views

On this page Google shows sample code for sending multimodal prompt requests (image + text). import vertexai from vertexai.generative_models import GenerativeModel, Part # ...
asked by Matheus Torquato
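
The sample in question follows this general shape; a sketch with placeholder project values, using the Cloud Storage sample image Google's docs commonly reference:

    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project="your-project-id", location="us-central1")

    model = GenerativeModel("gemini-1.5-flash")
    # Images in Cloud Storage can be referenced by URI instead of uploaded bytes.
    image = Part.from_uri(
        "gs://cloud-samples-data/generative-ai/image/scones.jpg",
        mime_type="image/jpeg",
    )
    print(model.generate_content([image, "What is shown in this image?"]).text)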
0 votes · 1 answer · 119 views

I am attempting to make a Gradio demo for nanoLLaVA by @stablequan. I am porting over just the structure of the Apache 2.0 licensed code in the Moondream repo. The nanoLLaVA repo has example code in the ...
asked by CoderCowMoo
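
The Gradio side of such a demo is small; a sketch in which answer() is a hypothetical stand-in for the nanoLLaVA inference call from the model repo:

    import gradio as gr

    def answer(image, question):
        # Hypothetical: replace with the nanoLLaVA generate call on (image, question).
        return f"(model output for: {question})"

    demo = gr.Interface(
        fn=answer,
        inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
        outputs=gr.Textbox(label="Answer"),
        title="nanoLLaVA demo",
    )
    demo.launch()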
5 votes · 0 answers · 895 views

I'm currently implementing a multi-modal RAG system leveraging LLaVA, Chroma, and LangChain. However, I'm having a hard time finding the embedding function LLaVA uses. Can anybody help me with that? ...
asked by Danilo Dresen
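
LLaVA itself does not ship a retrieval embedding function; its vision tower is a CLIP encoder (openai/clip-vit-large-patch14-336 for LLaVA 1.5), so a common workaround is to embed images and queries with that CLIP model directly and store the vectors in Chroma. A sketch:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    name = "openai/clip-vit-large-patch14-336"
    model = CLIPModel.from_pretrained(name)
    processor = CLIPProcessor.from_pretrained(name)

    image_inputs = processor(images=Image.open("img.jpg"), return_tensors="pt")
    text_inputs = processor(text=["a diagram of a pipeline"],
                            return_tensors="pt", padding=True)
    with torch.no_grad():
        # Both land in the shared CLIP space, so text queries can retrieve images.
        img_emb = model.get_image_features(**image_inputs)
        txt_emb = model.get_text_features(**text_inputs)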
1 vote · 1 answer · 318 views

On trying to load Video-LLaVA with Hugging Face on Colab, I get this error: HTTPError ...
asked by Kamakshi Ramamurthy
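
An HTTPError at load time is often a wrong or renamed repo id (or a missing Hugging Face login for gated models). A sketch assuming the transformers-native conversion is the intended checkpoint:

    import torch
    from transformers import VideoLlavaForConditionalGeneration, VideoLlavaProcessor

    # The "-hf" repo is the transformers-compatible conversion; loading the original
    # LanguageBind/Video-LLaVA-7B repo with these classes fails.
    model_id = "LanguageBind/Video-LLaVA-7B-hf"
    processor = VideoLlavaProcessor.from_pretrained(model_id)
    model = VideoLlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")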
