12 events
when what by license comment
Feb 10, 2020 at 17:25 vote accept giacomo1488
Feb 6, 2020 at 18:53 answer added Ben A timeline score: 2
Jan 2, 2020 at 17:46 comment added Zchpyvr @giacomo1488 Could you share some more context about the problem you're trying to solve? It sounds like you want to create a 1% sample based on one variable in each line? Does that mean you only care about 2 fields in every line: one for the ID and the other for the variable you're measuring against? It really sounds like a Python script is not the best tool for this...
Dec 30, 2019 at 2:26 comment added AMC I somehow forgot about this question, but I will return to it...
Dec 26, 2019 at 14:50 comment added giacomo1488 Yes, a 1% sample of the 4 million IDs. So I'm extracting the claims for 40,000 people.
Dec 24, 2019 at 16:00 comment added AMC The 4 million individual IDs are used to determine which claims to extract?
Dec 24, 2019 at 15:20 comment added giacomo1488 It is insurance claims data, which is privacy protected, so I don't know of any sample data that's out there. There are ~300 million lines in the file. Each line represents a claim line and has 171 variables that are delimited with *. I make the 1% sample at the person level, using a list of 4 million person IDs represented by integers and contained in idunique_ids_final. Let me know if there's any other useful information I can share.
Dec 23, 2019 at 1:54 comment added AMC Can you share some information about the data itself? Ideally we would have enough to run the program, since matters of performance are so dependent on benchmarking and profiling.
Dec 18, 2019 at 15:00 history tweeted twitter.com/StackCodeReview/status/1207314750960472064
Dec 18, 2019 at 0:43 history edited greybeard CC BY-SA 4.0 decorate code snippets as such, include title from hyperlink
Dec 17, 2019 at 19:45 review First posts
Dec 18, 2019 at 0:43
Dec 17, 2019 at 19:41 history asked giacomo1488 CC BY-SA 4.0