
Questions tagged [data-formats]

40 votes
4 answers
30k views

What are the advantages of HDF compared to alternative formats? What are the main data science tasks where HDF is really suitable and useful?
IgorS
  • 5,484
8 votes
2 answers
129k views

I have a (2M, 23) dimensional numpy array X. It has a dtype of <U26, i.e. unicode string ...
cappy0704
  • 231
4 votes
2 answers
1k views

I am exporting data from an SQL database and importing it into R. This is a two-step process, since I first (automatically) download the data to a hard drive and then import the file with R. ...
Pieter
  • 971
3 votes
1 answer
8k views

I used the code below to download stock data from Yahoo Finance: ...
coding_ninza
3 votes
3 answers
3k views

I am working on a network analysis project at one of our nation's service academies, and I need a little help. As a starting point, we are looking at a citation network that we build by using the ...
Joseph Bosse
3 votes
2 answers
1k views

I am new to data engineering and wanted to know what the best way is to store more than 3000 GB of data for further processing and analysis. I am specifically looking for open-source resources. I ...
user14519285
2 votes
3 answers
5k views

For what it's worth, my particular case is a symmetric matrix, but this question should be answered more generally.
Austin Capobianco
2 votes
2 answers
554 views

I am new to NLP. I found a format named CoNLL, which seems to be a tab-separated file with columns like ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL. I found ...
Ahmad
  • 447
2 votes
1 answer
4k views

To train a CNN, I have stacked arrays of images over observations [observations x width x length]. The dataset is very sparse ($95\%$). What would be an efficient ...
hH1sG0n3
  • 2,156
2 votes
1 answer
110 views

Which treebanks are based on an XML format? What is the advantage of an XML format for a treebank? I think it may affect annotating and querying the treebank. For example, LASSY and Alpino or ...
Ahmad
  • 447
2 votes
3 answers
2k views

I am attempting to work with a very large dataset (~1.5 million lines) for the first time in SAS, and I am having some difficulty. The dataset I have is formatted as a "long" .txt file as follows: ...
roboallen
1 vote
2 answers
876 views

I have data that looks like this: These data are for only one subject; I will have many more. The data will be analyzed in R. For now I am storing them like this: ...
AndriusZ
  • 145
1 vote
2 answers
125 views

I have a list that looks like this: params ['h\x00i\x00', '\x00t\x00h\x00e\x00r\x00e\x00'] All I want is to merge these two elements into the string "hi ...
user96931
  • 113
1 vote
2 answers
207 views

The ICD code scheme has gone through many revisions: https://en.wikipedia.org/wiki/International_Statistical_Classification_of_Diseases_and_Related_Health_Problems Is there any compatibility between ...
user48956
  • 121
1 vote
1 answer
4k views

I am new to data science. I am attempting to write a program using regression techniques, and all of my values are numerical, except for the date and time (UTC), which are written in this format: HH:...
RocketBlaster05
