Newest 'similarity' Questions - Data Science Stack Exchange

3 votes

1 answer

45 views

Is there a fast method for sampling from document embeddings to maximize pairwise distances?

I have a large set of document embeddings, and I would like to sample a subset where the median or average pairwise distance is maximized. The idea here is to get a more balanced sample set where long ...

Layman

291

asked Nov 11, 2025 at 23:52

2 votes

0 answers

43 views

In RAG, for a large dataset, which similarity measure works? Why? How to handle the problem with the size of the matrix in cosine similarity?

If we want to implement RAG for a large dataset, which similarity measure works? Why? Also, how do we handle the problem with the size of the matrix in cosine similarity?

user10296606

1,906

asked Feb 11, 2025 at 5:15

1 vote

0 answers

71 views

Content-Based Filtering for Internship Recommendations Without User Ratings—Is It Feasible?

I’m designing a recommendation feature for a student internship platform. Students will explicitly select their interests and skills during registration, and recruiters will post internship ...

Amira

11

asked Nov 29, 2024 at 16:37

4 votes

1 answer

209 views

Calculating weighted cosine similarity between vectors of words

I have two word lists, where each word is representative of each topic. A topic is created from a collection of documents (tweets in this case). Not all words would’ve appeared an equal number of ...

Adam_G

141

asked May 24, 2024 at 19:22

3 votes

1 answer

79 views

Best metric to assess similarity between flight trajectory features

Consider a flight as represented by a dataframe with spatial (latitude, longitude, altitude) ...

Droid

131

asked May 3, 2024 at 13:10

0 votes

1 answer

123 views

Synchrony vs Similarity in time series data

I would like to know the difference between synchrony and similarity w.r.t. time-series data. Upon researching, I found the following explanation. "Synchrony and similarity are two different ...

mohammed shoab

1

asked Jan 18, 2024 at 18:16

0 votes

1 answer

309 views

How does RAG query affect the similarity search?

I have a RAG pipeline where I want to extract a piece of information called "X". In a regular RAG pipeline, there is a query entered by the user. Then, ...

ahmedmoh123

3

asked Jan 12, 2024 at 20:26

1 vote

0 answers

80 views

Similarity search with text and tabular data

If I have two documents, D1 and D2, and a function f that computes the (normalized) document ...

CutePoison

520

asked Jan 4, 2024 at 15:35

1 vote

0 answers

72 views

Interpretation of Evaluation Values of Augmented SBERT Training with EmbeddingSimilarityEvaluator()

I am training a bi-encoder to get an Augmented SBERT model, and I get a final training result. How can I interpret the following output of the final training result? ...

Christian01

141

asked Dec 21, 2023 at 18:43

4 votes

1 answer

470 views

Higher level sentence similarity (meaning instead of 'just' embeddings)

I am looking for the correct model / approach for the task of checking whether two sentences have the same meaning. I know I can use embeddings to check similarity, but that is not what I am after. I ...

Rob Audenaerde

143

asked Dec 8, 2023 at 10:39

0 votes

1 answer

76 views

Using text embeddings directly to compute similarity vs using them as features for a model that predicts similarity

Say you have a problem where you have a query and a set of result documents and you want to rank the result documents according to the query. Say also you have embeddings for the query and for the ...

user1893354

183

asked Sep 24, 2023 at 20:45

0 votes

1 answer

158 views

What's the best way to select the right model for document comparison?

We have different pre-trained models like BERT, USE, ELMo, Word2Vec, FastText, etc., and we have documents of different sizes (large, medium, small). Now, we want to compute document similarity. How can we ...

tovijayak

77

asked Jun 16, 2023 at 12:51

6 votes

2 answers

2k views

How to handle similarity search on mixed data types vectors?

I think this question is one that many beginners run into and I could not find a decent generic guide for it. My issue is the following. I want to evaluate similarity of vectors which have mixed data ...

Chapo

63

asked Apr 14, 2023 at 9:57

0 votes

0 answers

132 views

Better results in document similarity using Word2Vec

I am trying to cluster similar support tickets in a technical domain. The support tickets are very domain-specific and are written in various styles and lengths, using abbreviations, etc. I made a training-...

Roland

1

asked Apr 5, 2023 at 14:36

2 votes

1 answer

50 views

Comparing images in N channels

I have an "image" of NxN dimensions in m channels (for reference, m is less than 17) in my training set and validation set. I would like to compare images in the training set with those in ...

Shaz

135

asked Mar 30, 2023 at 13:58
