Questions tagged [data-formats]
The data-formats tag has no summary.
37 questions
40
votes
4
answers
30k
views
What are the advantages of HDF compared to alternative formats?
What are the advantages of HDF compared to alternative formats? What are the main data science tasks where HDF is really suitable and useful?
8
votes
2
answers
129k
views
ValueError: could not convert string to float: '���'
I have a (2M, 23) dimensional numpy array X. It has a dtype of <U26, i.e. unicode string ...
4
votes
2
answers
1k
views
What is the most used format to save data with type information?
I am exporting data from an SQL database and importing it into R. This is a two-step process, since I first (automatically) download the data to a hard drive and then import the file with R.
...
3
votes
1
answer
8k
views
Getting stock data in a disciplined manner from Yahoo Finance
I used the code below to download stock data from Yahoo Finance:
...
3
votes
3
answers
3k
views
Building a Citation Network to Analyze in R
I am working on a network analysis project at one of our Nation's Service Academies, and I need a little help.
As a starting point, we are looking at a citation network that we build by using the ...
3
votes
2
answers
1k
views
Storing a large dataset for processing and analysis
I am new to data engineering and wanted to know: what is the best way to store more than 3000 GB of data for further processing and analysis? I am specifically looking for open source resources. I ...
2
votes
3
answers
5k
views
What is the best file format to store an uncompressed 2D matrix?
For what it's worth, my particular case is a symmetric matrix, but this question should be answered more generally.
2
votes
2
answers
554
views
Why is CoNLL not in XML format?
I am new to NLP. I found a format named CoNLL, which seems to be a tab-separated file,
like
ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL
I found ...
2
votes
1
answer
4k
views
How to efficiently store very large sparse 3D matrices
To train a CNN, I have stacked arrays of images over observations [observations x width x length]. The dataset is very sparse ($95\%$). What would be an efficient ...
2
votes
1
answer
110
views
Advantage of a treebank in XML format
Which treebanks are based on an XML format?
What is the advantage of the XML format for a treebank? I think it may affect annotation and querying of the treebank.
For example, LASSY and Alpino or ...
2
votes
3
answers
2k
views
Transposing every nth row to a column in a large dataset
I am attempting to work with a very large dataset (~1.5 million lines) for the first time in SAS and I am having some difficulty. The dataset I have is formatted as a "long" .txt file as follows:
...
1
vote
2
answers
876
views
Appropriate way to store data in R
I have data that looks like this:
These data are only for one subject. I will have a lot more.
These data will be analyzed in R.
Now I'm storing them like this:
...
1
vote
2
answers
125
views
Python list formatting
I have a list, which looks like this:
params
['h\x00i\x00', '\x00t\x00h\x00e\x00r\x00e\x00']
Now, all I want is to merge these two elements into the string "hi ...
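For readers skimming the listing, here is a minimal sketch of one way to merge such a list. It assumes the `\x00` characters are unwanted padding (as typically left behind when UTF-16 text is read as a single-byte encoding) and that the elements should be joined with a single space. The helper name `strip_nuls` and the `sep` parameter are hypothetical, not taken from the question.

```python
# Example list as shown in the question excerpt.
params = ['h\x00i\x00', '\x00t\x00h\x00e\x00r\x00e\x00']


def strip_nuls(parts, sep=" "):
    """Remove NUL characters from each element, then join with `sep`.

    `sep` defaults to a single space, which is an assumption about
    how the original text was split.
    """
    cleaned = [p.replace("\x00", "") for p in parts]
    return sep.join(cleaned)


merged = strip_nuls(params)
print(merged)
```

If the data comes from a file, decoding it with the correct encoding at read time (for example `encoding="utf-16"`) may avoid the NUL characters in the first place; whether that applies here depends on how `params` was produced.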
1
vote
2
answers
207
views
Are ICD codes forward compatible?
The ICD code scheme has gone through many revisions: https://en.wikipedia.org/wiki/International_Statistical_Classification_of_Diseases_and_Related_Health_Problems
Is there any compatibility between ...
1
vote
1
answer
4k
views
Date time conversion in a CSV column [closed]
I am new to data science. I am attempting to write a program using regression techniques, and all of my values are numerical, except for the date and time (UTC), which are written in this format: HH:...