Questions tagged [sampling]

3 votes
1 answer
45 views

I have a large set of document embeddings, and I would like to sample a subset where the median or average pairwise distance is maximized. The idea here is to get a more balanced sample set where long ...
Layman
4 votes
1 answer
68 views

In Orange data mining (GUI), what is the default number of iterations for the data sampler bootstrap? And is there a way to increase it?
lala
1 vote
1 answer
325 views

I want to hear y'all's opinions on synthetic data generation: which methods and tools do you use, and why?
haneulkim
3 votes
1 answer
81 views

I see that both of the following arrangements work in Orange to give a score for a model (images not shown). Both work, but which of the two is the correct method? Does the selection of model (Tree, ...
rnso
4 votes
1 answer
73 views

I'm rendering charts for time series data composed of millions of records. The charts need to be interactive and support many features, so I need to downsample them. The problem I've encountered ...
nathan-w
2 votes
0 answers
50 views

I am currently training a graph transformer model in order to develop an AI that can generate edges on an unseen graph (link dependencies between text with historical data). I divided my ...
lili
2 votes
0 answers
86 views

I'm trying to create a predictive model for a dataset with continuous input variables and a binary/probability output. The inputs are sensors (up to 400 columns, but some are very irrelevant) which are ...
user46124
1 vote
0 answers
42 views

I'm trying to replicate the findings of the publication "Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis-Hastings" for obtaining the joint distribution ...
Chris
1 vote
0 answers
18 views

I am facing a challenge in a project that involves sampling from a design space defined by 10 variables. I use Latin Hypercube Sampling (LHS) and/or Sobol sequences, and initially, the samples are ...
Chris
4 votes
1 answer
88 views

Say I want to pick a fixed number of samples from a large 2D dataset, such that they are relatively evenly distributed over the whole sample area. Imagine places in a country - so the border of the data is ...
barryhunter
1 vote
1 answer
305 views

I am trying to understand the top_p parameter in langchain (nucleus sampling) but I can't seem to grasp it. Based on this we sort the probabilities and select a ...
Labyrinthian
1 vote
1 answer
292 views

I am attempting a binary classification problem (using Weka). My dataset has 100,000 rows and 14 attributes (1 output variable). It already takes too long just to open the dataset in Excel, so I just know ...
FlexMcMurphy
1 vote
1 answer
4k views

I have recently read through a lot of documentation and articles about Large Language Models (LLMs), and I have come to the conclusion that 0.7 is, most of the time, the default value for the ...
jmpion
0 votes
1 answer
70 views

The company I work for has deployed a trained rule-based sentiment analyzer model, VADER, to make predictions on customers' attitudes. We import the model from the nltk library directly, so we didn't train ...
Shelby
1 vote
0 answers
43 views

Suppose I have a function $f\colon [0,1] \to \mathbb{R}$ which is maybe continuous (it's at least in $L^1$). I have a sample of $N$ points $\{x_i\}$ taken from the domain $[0,1]$ randomly from some ...
math_guy