Questions tagged [sampling]
185 questions
3 votes, 1 answer, 45 views
Is there a fast method for sampling from document embeddings to *maximize* pairwise distances?
I have a large set of document embeddings, and I would like to sample a subset where the median or average pairwise distance is maximized. The idea here is to get a more balanced sample set where long ...
4 votes, 1 answer, 68 views
Bootstrap iterations in Orange data mining
In Orange data mining (GUI), what is the default number of iterations for the data sampler bootstrap? And is there a way to increase it?
1 vote, 1 answer, 325 views
Best way to generate synthetic data
I want to hear your opinions on synthetic data generation: which methods and tools do you use, and why?
3 votes, 1 answer, 81 views
Should data also be sent to the Learner algorithm in Orange?
I see that both of the following arrangements work in Orange to give a score for a model:
and
Both work, but which of the two is the correct method?
Does the selection of model (Tree, ...
4 votes, 1 answer, 73 views
How do I downsample huge datasets with sparse asymptotes?
I'm rendering charts for time-series data composed of millions of records. The charts need to be interactive and support many features, so I need to downsample the data.
The problem I've encountered ...
2 votes, 0 answers, 50 views
Need help with model architecture and sampling negative edges
I am currently training a graph transformer model to develop an AI that can generate edges on an unseen graph (link dependencies between text with historical data).
I divided my ...
2 votes, 0 answers, 86 views
How do I train a model on data where there should be a statistical difference but it can't find it?
I'm trying to create a predictive model for a dataset with continuous input variables and a binary/probability output. The inputs are sensors (up to 400 columns, some of them very irrelevant) which are ...
1 vote, 0 answers, 42 views
Sampling multiple masked tokens through Metropolis–Hastings
I'm trying to replicate the finding of the publication "Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis-Hastings" for obtaining the joint distribution ...
1 vote, 0 answers, 18 views
Optimizing Sampling Strategy to Enhance Uniformity Under Conditional Constraints
I am facing a challenge in a project that involves sampling from a design space defined by 10 variables. I use Latin Hypercube Sampling (LHS) and/or Sobol sequences, and initially, the samples are ...
4 votes, 1 answer, 88 views
Algorithm for picking N random, uniformly distributed samples in an irregular polygon?
Say I want to pick a fixed number of samples from a large 2D dataset, such that they are relatively evenly distributed over the whole sample area. Imagine places in a country - so the border of the data is ...
1 vote, 1 answer, 305 views
Top_p parameter in LangChain
I am trying to understand the top_p parameter in LangChain (nucleus sampling), but I can't seem to grasp it.
Based on this we sort the probabilities and select a ...
1 vote, 1 answer, 292 views
Correct way to take a subset of a dataset?
I am attempting a binary classification problem (using Weka). My dataset has 100,000 rows, 14 attributes (1 output variable). It already takes too long just to open the dataset in Excel, so I just know ...
1 vote, 1 answer, 4k views
Why is 0.7, in general, the default value of temperature for LLMs?
I have recently read through a lot of documentation and articles about Large Language Models (LLMs), and I have come to the conclusion that 0.7 is, most of the time, the default value for the ...
0 votes, 1 answer, 70 views
How to evaluate a model on our data when the model is imported from a library and thus not trained by us?
The company I work for has deployed VADER, a trained rule-based sentiment analyzer model, to predict customers' attitudes. We import the model directly from the nltk library, so we didn't train ...
1 vote, 0 answers, 43 views
Calculating an integral with as few grid points as possible
Suppose I have a function $f\colon [0,1] \to \mathbb{R}$ which may be continuous (it is at least in $L^1$).
I have a sample of $N$ points $\{x_i\}$ taken from the domain $[0,1]$ randomly from some ...