Questions tagged [similarity]
275 questions
3
votes
1
answer
45
views
Is there a fast method for sampling from document embeddings to *maximize* pairwise distances?
I have a large set of document embeddings, and I would like to sample a subset where the median or average pairwise distance is maximized. The idea here is to get a more balanced sample set where long ...
2
votes
0
answers
43
views
In RAG over a large dataset, which similarity measure works, and why? How do you handle the size of the cosine similarity matrix?
If we want to implement RAG for a large dataset, which similarity measure works, and why?
Also, how do we handle the size of the matrix when computing cosine similarity?
1
vote
0
answers
71
views
Content-Based Filtering for Internship Recommendations Without User Ratings—Is It Feasible?
I’m designing a recommendation feature for a student internship platform. Students will explicitly select their interests and skills during registration, and recruiters will post internship ...
4
votes
1
answer
209
views
Calculating weighted cosine similarity between vectors of words
I have two word lists, where each word is representative of each topic. A topic is created from a collection of documents (tweets in this case). Not all words would’ve appeared an equal number of ...
3
votes
1
answer
79
views
Best metric to assess similarity between flight trajectory features
Consider a flight as represented by a dataframe with spatial (latitude, longitude, altitude) ...
0
votes
1
answer
123
views
Synchrony vs Similarity in time series data
I would like to know the difference between synchrony and similarity with respect to time series data. While researching, I found the explanation below.
"Synchrony and similarity are two different ...
0
votes
1
answer
309
views
How does RAG query affect the similarity search?
I have a RAG pipeline where I want to extract a piece of information called "X". In a regular RAG pipeline, there is a query entered by the user. Then, ...
1
vote
0
answers
80
views
Similarity search with text and tabular data
If I have two documents, D1 and D2, and a function f which computes the (normalized) document ...
1
vote
0
answers
72
views
Interpretation of Evaluation Values of Augmented SBERT Training with EmbeddingSimilarityEvaluator()
I trained a bi-encoder to get an Augmented SBERT model, and I have the final training result.
How can I interpret the following output of the final training result?
...
4
votes
1
answer
470
views
Higher level sentence similarity (meaning instead of 'just' embeddings)
I am looking for the correct model or approach for checking whether two sentences have the same meaning.
I know I can use embeddings to check similarity, but that is not what I am after. I ...
0
votes
1
answer
76
views
Using text embeddings directly to compute similarity vs using them as features for a model that predicts similarity
Say you have a problem where you have a query and a set of result documents and you want to rank the result documents according to the query. Say also you have embeddings for the query and for the ...
0
votes
1
answer
158
views
What's the best way to select the right model for document comparison?
We have different pre-trained models like BERT, USE, ELMo, Word2Vec, FastText, etc.
We have documents of different sizes (large, medium, small). Now we want to compute document similarity. How can we ...
6
votes
2
answers
2k
views
How to handle similarity search on vectors with mixed data types?
I think this question is one that many beginners run into and I could not find a decent generic guide for it.
My issue is the following. I want to evaluate similarity of vectors which have mixed data ...
0
votes
0
answers
132
views
Better results in document similarity using Word2Vec
I am trying to cluster similar support tickets in a technical domain. The support tickets are very domain-specific and are written in various styles and lengths, using abbreviations, etc.
I made a training-...
2
votes
1
answer
50
views
Comparing images in N channels
I have an "image" of NxN dimensions in m channels (for reference, m is less than 17) in my training set and validation set. I would like to compare images in the training set with those in ...