Newest Questions
36,284 questions
3 votes, 1 answer, 287 views
Handling Profanity Censorship in BERTopic
I'm currently working with a dataset in which profanity is censored. Basically, "fuck" would be replaced by 4 heart emojis. Considering I'm trying to run topic modelling with BERTopic, what kind of preprocessing would ...
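A minimal preprocessing sketch, assuming the censor mask is always a run of four heart emojis. The exact code points used below (U+2764 with an optional U+FE0F variation selector, or U+2665) are assumptions to check against the actual data. Collapsing each masked word into one consistent placeholder token keeps the embedding model and the topic representations from treating the emojis as unrelated noise. `normalize_censored` and `[PROFANITY]` are hypothetical names.

```python
import re

# Assumption: hearts are U+2764 (optionally followed by U+FE0F) or U+2665.
# Adjust HEART if the dataset uses different characters.
HEART = "(?:\u2764\ufe0f?|\u2665)"
# Hypothetical placeholder token that every masked word is mapped to.
PLACEHOLDER = "[PROFANITY]"
# Four consecutive hearts, optionally separated by whitespace, count as one masked word.
MASK_RE = re.compile(rf"{HEART}(?:\s*{HEART}){{3}}")


def normalize_censored(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace each run of four censor hearts with a single placeholder token."""
    return MASK_RE.sub(placeholder, text)


if __name__ == "__main__":
    print(normalize_censored("what the \u2764\u2764\u2764\u2764 is this"))
```

The cleaned strings can then be passed to BERTopic as usual (for example `BERTopic().fit_transform(docs)`). Whether to keep the placeholder in the topic words or filter it out afterwards is a separate judgment call.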
1 vote, 0 answers, 8 views
How do I change the default dashboard size in Logi Symphony?
When I create a dashboard in Logi Symphony, the canvas size defaults to 1024px by 768px. I want to change this to 1366px by 768px. Currently, I have to change the size for every new dashboard via ...
4 votes, 1 answer, 172 views
Different classifiers are yielding same metric results, is it normal?
I am trying to implement the strategy of hierarchical classification with chained classifiers and a Bayesian network, as in the paper by Serrano-Pérez & Sucar.
The data in my case are ontologies, ...
0 votes, 0 answers, 10 views
What causes a model to have such an output?
I'm training a CSDI model and the output looks very suspicious.
Too few diffusion steps? Too high a learning rate? Nothing seems to change this behavior. Some normalization issue that I'm not accounting for? I'...
1 vote, 0 answers, 14 views
Image preprocessing for improving the performance of the latest YOLO version
I prepared a dataset for testing YOLO v11.
Now I am struggling because the existing algorithm does not reach the accuracy I hoped for.
I have searched many books online, but I didn't find any proper image dataset ...
4 votes, 2 answers, 359 views
Feature generation strategy
There are several ways to generate new features, such as feature interactions (feature1*feature2) and encodings (target encoding, frequency encoding), etc.
Is it a good idea to generate new features,...
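A minimal sketch of the two cheapest generators mentioned, pairwise products and frequency encoding, on rows stored as plain dicts (an assumption for illustration; the helper names are hypothetical). The frequency table is fit on one set of rows and applied to another, because encodings learned from data should be fit on the training split only; target encoding in particular leaks the label unless it is computed out of fold.

```python
from collections import Counter
from itertools import combinations


def add_interactions(rows, numeric_cols):
    """Return copies of `rows` with a product feature 'a*b' for every pair of numeric columns."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the input rows are not mutated
        for a, b in combinations(numeric_cols, 2):
            new[f"{a}*{b}"] = row[a] * row[b]
        out.append(new)
    return out


def frequency_encode(fit_rows, rows, col):
    """Add '<col>_freq': the relative frequency of each category in `fit_rows`.

    Categories never seen in `fit_rows` get a frequency of 0.0.
    """
    counts = Counter(r[col] for r in fit_rows)
    n = len(fit_rows)
    return [{**r, f"{col}_freq": counts.get(r[col], 0) / n} for r in rows]
```

Each generated column still has to earn its place on a validation split; with many numeric columns the number of pairwise products grows quadratically.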
1 vote, 0 answers, 13 views
Kolmogorov-Arnold Network fitting software that allows exchangeability assumption?
I have a problem in which I'd like to try to approximate an unknown function using a specific version of the Kolmogorov representation theorem. I will have somewhere between at minimum dozens of ...
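For reference, the classical Kolmogorov-Arnold representation theorem states that every continuous function on [0,1]^n can be written with continuous univariate outer functions Phi_q and inner functions phi_{q,p} as in the first line below. One natural way to encode an exchangeability assumption (an illustration only, not necessarily the specific version the question refers to) is to share the inner functions across inputs, phi_{q,p} = phi_q, as in the second line. That form is invariant under any permutation of (x_1, ..., x_n) by construction; whether every symmetric target function admits exactly this restricted form is not claimed here.

```latex
% General Kolmogorov-Arnold representation
f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% Shared inner functions (permutation invariant by construction)
f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q}(x_p) \right)
```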
3 votes, 1 answer, 215 views
First trained model, slight overfitting or still fine?
Hi everyone,
I’m new to machine learning and have just trained my first model. While analyzing the training process, I noticed something I’m unsure how to interpret.
In my plot, both the training loss ...
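A hypothetical helper for reading the two loss curves numerically, assuming per-epoch lists of training and validation loss. It reports the epoch with the lowest validation loss and the train/validation gap at that epoch, and flags likely overfitting only when validation loss has stopped improving for `patience` epochs while training loss keeps falling. A small gap that stays roughly constant is generally not treated as overfitting on its own.

```python
def summarize_curves(train_loss, val_loss, patience=3):
    """Summarize per-epoch loss curves (hypothetical helper).

    Returns the best validation epoch, the validation-minus-training gap there,
    and a flag that is True when validation loss has not improved for at least
    `patience` epochs while training loss is still below its value at the best epoch.
    """
    if len(train_loss) != len(val_loss) or not val_loss:
        raise ValueError("need two non-empty curves of equal length")
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = val_loss[best] - train_loss[best]
    stalled = (len(val_loss) - 1 - best) >= patience
    train_still_falling = train_loss[-1] < train_loss[best]
    return {
        "best_epoch": best,
        "gap_at_best": gap,
        "likely_overfitting": stalled and train_still_falling,
    }
```

The `best_epoch` value is also the natural checkpoint to keep if early stopping is used.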
5 votes, 1 answer, 181 views
How to efficiently merge multiple CSV and JSON files into a single DataFrame using Pandas in Python
I am working with multiple data files in a folder where some files are in CSV format and others are in JSON format. I want to combine all of them into a single DataFrame for further analysis.
Here is ...
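A minimal sketch using standard pandas calls (`pd.read_csv`, `pd.read_json`, `pd.concat`), assuming each JSON file holds a list of records and that columns with the same name mean the same thing across files. `load_folder` and the `source_file` column are hypothetical names. Columns missing from some files are filled with NaN by `pd.concat`.

```python
from pathlib import Path

import pandas as pd


def load_folder(folder):
    """Read every .csv and .json file in `folder` and stack them into one DataFrame."""
    frames = []
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if suffix == ".csv":
            df = pd.read_csv(path)
        elif suffix == ".json":
            # Assumption: the file is a JSON array of records, e.g. [{"a": 1}, ...].
            df = pd.read_json(path, orient="records")
        else:
            continue  # skip any other file types
        df["source_file"] = path.name  # keep provenance for later debugging
        frames.append(df)
    if not frames:
        return pd.DataFrame()
    # One concat at the end; columns are aligned by name, missing values become NaN.
    return pd.concat(frames, ignore_index=True, sort=False)
```

If the JSON files are newline-delimited (one record per line), `pd.read_json(path, lines=True)` is the variant to use instead. Collecting frames in a list and concatenating once avoids repeatedly copying a growing DataFrame inside the loop.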
1 vote, 0 answers, 15 views
Need some guidance implementing a particular diffusion model
I've been attempting to implement the "Retrieval-Augmented Diffusion Model" found here: https://github.com/stanliu96/RATD. The issue for me was that the TCN (Temporal Convolutional Network) used ...
5 votes, 1 answer, 35 views
Using emmeans for an LMEM with a continuous outcome
First time posting, so I might forget important details.
I'm using a dataset in R, on which I performed a linear mixed model with random intercept and slope (lme from nlme package).
Now, this was all ...
2 votes, 0 answers, 20 views
Kaggle competition differentiation among competitors
Given that there are so many Kaggle competitions, how does the winner win, technically? Does he/she invent a completely new algorithm to solve the problem? By now there are 8 million sincere students who ...
2 votes, 1 answer, 22 views
When does Master Data become critical in an ERP project?
I’m currently involved in an ERP project with a focus on Master Data, and I’m interested in your perspective.
In the early phases, a significant part of the work is dedicated to:
analyzing existing ...
6 votes, 2 answers, 303 views
How does Validation work for Time-Series Forecasting?
What is the standard method for splitting time series into train/validation/test in time-series forecasting?
Example: 1000 time series with 300 time steps in total. The forecast horizon is at time step 200 ...
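One common scheme is rolling-origin (expanding-window) validation: every split trains on all time steps before an origin and validates on the next `horizon` steps, so validation data always lies in the future of its training data. A minimal sketch over a shared time axis, assuming all series are aligned on the same steps (for example the 300 above); the function name and parameters are hypothetical.

```python
def rolling_origin_splits(n_steps, horizon, n_folds, min_train):
    """Return (train_range, val_range) pairs over a shared time axis.

    Each fold trains on steps [0, origin) and validates on [origin, origin + horizon).
    Origins advance by `horizon`, so validation windows are consecutive, never
    overlap, and the last one ends exactly at `n_steps`.
    """
    last_origin = n_steps - horizon  # latest origin that still leaves a full horizon
    first_origin = last_origin - (n_folds - 1) * horizon
    if horizon <= 0 or n_folds <= 0 or first_origin < min_train:
        raise ValueError("not enough time steps for the requested folds")
    return [
        (range(0, origin), range(origin, origin + horizon))
        for origin in range(first_origin, last_origin + 1, horizon)
    ]


if __name__ == "__main__":
    for train, val in rolling_origin_splits(300, 20, 3, 100):
        print(f"train [0, {train.stop}) -> validate [{val.start}, {val.stop})")
```

To keep a final untouched test window, one option is to call it with `n_steps=300 - horizon` for the validation folds and hold out the last `horizon` steps for testing. The same index ranges are applied to every series.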
5 votes, 1 answer, 30 views
How should I approach feature selection when working with a very large scraped dataset for regression?
I recently scraped a large dataset from several websites and ended up with around 25–30 potential features that might influence the target variable. The dataset is fairly large (hundreds of thousands ...
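A cheap first-pass filter, sketched under the assumption of numeric feature columns held as plain lists: rank each feature by the absolute Pearson correlation with the target. This only captures linear, one-feature-at-a-time association, so it is a screening step rather than a full selection method; `pearson` and `rank_features` are hypothetical helper names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: no linear association can be measured
    return sxy / sqrt(sxx * syy)


def rank_features(columns, target, threshold=0.0):
    """Return (name, |r|) pairs sorted by absolute correlation, keeping |r| >= threshold."""
    scores = [(name, abs(pearson(values, target))) for name, values in columns.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [(name, s) for name, s in scores if s >= threshold]
```

On hundreds of thousands of rows, the same ranking can be computed with vectorized pandas code, e.g. `df.corrwith(df["target"])` for a hypothetical `target` column.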