Questions tagged [data-mining]
An activity that seeks patterns in large, complex data sets. It usually emphasizes algorithmic techniques, but may also involve any set of related skills, applications, or methodologies with that goal.
1,177 questions
4
votes
1
answer
68
views
Bootstrap iterations in Orange data mining
In Orange data mining (GUI), what is the default number of iterations for the data sampler bootstrap? And is there a way to increase it?
3
votes
1
answer
85
views
Is it possible to make the Python Script widget in Orange both receive input and give output (in the same widget)?
I'm working on a project that involves loop control. When I try to implement it in Orange, I'm unable to connect one widget (Python Script) to another in a loop, as the connection is ...
2
votes
1
answer
50
views
Why do TSclust's diss.MINDIST.SAX() and jmotif's min_dist() give different results for the same SAX strings?
I am working with time series data in R and converting them to symbolic strings using the Symbolic Aggregate Approximation (SAX) algorithm.
I have tried two different R packages for SAX:
TSclust
...
3
votes
1
answer
81
views
Should data also be sent to the Learner algorithm in Orange?
I see that both of the following arrangements work in Orange to give a score for a model:
and
Both of the above work, but which of the two is the correct method?
Does the selection of model (Tree, ...
0
votes
0
answers
35
views
How to properly set up your X matrix for time-series classification
I am making predictions at the entity level, and for simplicity's sake, suppose there is only one feature. My goal is to set up my X matrix such that I can capture changes to the entity over different ...
0
votes
0
answers
31
views
Where can I find a dataset for my dissertation?
I am seeking high-quality datasets for my PhD dissertation on developing data mining models for diabetes prediction and treatment. Given the sensitivity of medical data, I am aware that accessing ...
0
votes
0
answers
22
views
Good approaches to automatically identify change points
Given a sequence like the following, what are the usual approaches to automatically identify all the points where the value suddenly changes by a large amount?
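One simple baseline for this kind of question, sketched under assumptions: flag every position where the step from the previous value is much larger than the typical step. The function name `change_points`, the `threshold` parameter, and the sample series are all hypothetical and only illustrate the idea; the sequence from the question is not reproduced here, and dedicated change point methods may suit real data better.

```python
from statistics import median

def change_points(values, threshold=3.0):
    """Return indices i where values[i] jumps sharply relative to values[i - 1]."""
    # Absolute step between each pair of neighbouring values.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    if not diffs:
        return []
    # Typical step size, taken as the median so a few big jumps do not inflate it.
    scale = median(diffs)
    if scale == 0:
        # Series is flat most of the time: treat every non-zero step as a change.
        return [i + 1 for i, d in enumerate(diffs) if d > 0]
    # Flag steps that exceed the typical step by the chosen factor.
    return [i + 1 for i, d in enumerate(diffs) if d > threshold * scale]

# Made-up series with two level shifts (illustrative data only).
series = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0, 1.0, 1.1]
print(change_points(series))  # [4, 8]
```

The median keeps the scale estimate robust to the jumps themselves; a lower `threshold` flags more points, a higher one fewer.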
5
votes
1
answer
60
views
Analyzing if my email notifications increase or decrease total subscriptions
I am hoping to reach someone who knows how to interpret data; if not, someone with better logic than me would still help :)
I had around 9000 users paying for monthly subscriptions for a service on ...
0
votes
1
answer
61
views
Best method to analyse user sequence
I have a sequence dataset like the following.
These sequences are statuses that were approved by clients, ordered by date/time. A client can get multiple statuses and jump back to the same status ...
3
votes
1
answer
51
views
Name of algorithm that maps a string column to a float column, based on an aggregation with another float column, similar to TF-IDF
The Question
I'm not super familiar with the names of common algorithms in Data Science, and I feel like this would be something that is commonly used, and so should have a name - want to refer to ...
2
votes
1
answer
59
views
Calculating LOF for big data
I have a big dataset (hundreds of millions of records, dozens of GB) and I would like to apply LOF to an anomaly detection problem (testing different methods for academic purposes) ...
3
votes
1
answer
117
views
Can a fact table have a 1:1 relationship with a dimension table?
I am trying to build a small healthcare fact table with the following information
[patientid], [organid], [value]
Each [patientid] is unique to that patient, but there are only 10 available [organid] ...
1
vote
1
answer
43
views
Getting ideas to start my learning
I plan to start my career in data analytics and I need a guideline on how and where to start. If you are willing to give some hints, I'll get some clarity and I'll start my ...
0
votes
1
answer
88
views
How to create a conversation dataset from a website without API?
I am currently doing my thesis on Natural Language Processing and it involves studying how people text online in a community so that it can be used to simulate conversational agents that can mimic ...
5
votes
1
answer
307
views
What is an appropriate individual KPI for AI projects?
I work in the sales department of an electronics component manufacturing company and we do data science projects using traditional algorithms like random forests (success likelihood of design project), ...