64
votes
Should UTF-8 CSV files contain a BOM (byte order mark)?
Not for UTF-8, but see the various caveats in the comments.
Unlike with UTF-16/32, it's unnecessary (UTF-8 has no byte order), and the Unicode standard does not recommend it. It's also quite rare to see UTF-8 ...
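To illustrate what the BOM means in practice for a reader, here is a minimal Python sketch. The sample CSV content is made up for illustration. It shows that a UTF-8 BOM is three extra bytes at the start of the file, and that Python's `utf-8-sig` codec strips them on decode while plain `utf-8` leaves U+FEFF attached to the first field.

```python
import codecs
import csv
import io

# Hypothetical sample: the same CSV content with and without a UTF-8 BOM.
plain = "name,city\nZoë,Zürich\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain  # BOM_UTF8 is b"\xef\xbb\xbf"


def read_rows(data: bytes) -> list[list[str]]:
    # "utf-8-sig" drops a leading BOM if present, otherwise acts like "utf-8".
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text)))


rows_plain = read_rows(plain)
rows_bom = read_rows(with_bom)

# Decoding the BOM file as plain "utf-8" keeps U+FEFF glued to the first header.
naive_header = with_bom.decode("utf-8").split("\n")[0].split(",")[0]
```

A reader that always decodes with `utf-8-sig` accepts both variants, which is one way to stay tolerant of files produced by tools that insist on writing the BOM.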
16
votes
Accepted
In what configuration file format do regular expressions not need escaping?
CDATA sections in XML should do.
Here's a Stack Overflow post about it:
https://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean
I remember it took me a while to understand how to use ...
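As a rough sketch of the CDATA idea, the Python snippet below parses a regular expression out of a CDATA section without any XML escaping. The document layout and the element name `pattern` are assumptions made for this example. The one sequence that cannot appear inside a CDATA section is `]]>`, since it ends the section.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config document; the element name "pattern" is an assumption.
xml_text = r"""<config>
  <pattern><![CDATA[^<\w+>\s*&\s*"(\d{2})"$]]></pattern>
</config>"""

root = ET.fromstring(xml_text)

# The parser hands back the CDATA content verbatim: <, > and & are untouched.
pattern_text = root.find("pattern").text
pattern = re.compile(pattern_text)

match = pattern.match('<tag> & "42"')
```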
13
votes
In what configuration file format do regular expressions not need escaping?
This is indeed an interesting question, as commonly the requirements for config file formats are somewhat different, so it's understandable that available formats don't really support this requirement....
11
votes
Should UTF-8 CSV files contain a BOM (byte order mark)?
There still is no widespread convention AFAIK, though certainly UTF-8 is now generally accepted.
The BOM is an awful artifact:
It is invisible (zero-width space).
Some software might break on the ...
10
votes
In what configuration file format do regular expressions not need escaping?
Consider TOML
It handles two different forms of raw strings:
regex = '<\i\c*\s*>'
OR
regex2 = '''I [dw]on't need \d{2} apples'''
8
votes
Accepted
SQLite database as a data interchange format?
It happens to be possible to use SQLite databases as a data interchange format, but it's not a particularly good solution. Why?
You are looking for a compact data exchange format. In contrast, SQLite'...
7
votes
Accepted
How do different file types generally store data?
The reason there are many different file formats is that there are many different goals for the way data is formatted. Some of these are in opposition to each other and some are orthogonal to each ...
7
votes
Accepted
How a standard video file is structured under the hood
Parsing a complete digital film is an immensely complex task. Because you mostly ask about WebM – a container format – I’ll concentrate on that.
You always start with individual streams containing ...
6
votes
How a standard video file is structured under the hood
Video container formats are typically made up of a series of content blocks. A block typically consists of a few marker bytes (important for finding the next block if you get incomplete data while ...
6
votes
In what configuration file format do regular expressions not need escaping?
NestedText is a configuration file format that makes a point of not requiring any escaping or quoting, which makes it very good for applications like this:
# regex examples from:
# https://support....
5
votes
Using source code instead of XML/JSON or other custom serialization schemes and binary file formats
With the question clarified as to handling differences in the structure of configuration from one version to the next, you lose a lot of potential solutions by using a code-as-configuration process. ...
4
votes
How to handle multiple versions of binary file format
Honestly, you have to open the stream and read enough bytes to determine which file format you are actually dealing with first. This solution is not unlike what many graphics tools do to discover the ...
4
votes
Why does GUID Partition Table data lack the sector size?
A software layer which can read and write the GPT has to use low-level ATA/ATAPI commands for it. Hence, it can simply ask the device for its logical sector size using the ATA command "IDENTIFY ...
4
votes
In what configuration file format do regular expressions not need escaping?
Roll your own
Seems a simple enough format; just write your own custom parser to deserialize from a plain text file (perhaps just like your first example) into your object model. This would require ...
3
votes
In what configuration file format do regular expressions not need escaping?
Tab-separated value format (.tsv) is a simple, easy to edit text format that allows any text within a field except TAB and newline characters.
There are no escaping rules in its IANA format spec.
TSV ...
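Because fields cannot contain TAB or newline characters, a reader needs nothing beyond two splits. The Python sketch below assumes `\n` line endings and uses made-up sample rows; `parse_tsv` is a hypothetical helper name.

```python
def parse_tsv(text: str) -> list[list[str]]:
    # No quoting or escaping rules: records split on newlines, fields on TABs.
    return [line.split("\t") for line in text.split("\n") if line]


# Hypothetical sample rows; quotes and backslashes pass through unchanged.
sample = "\n".join([
    "name\tpattern",
    "quoted\t" + r'"[^"]*"',
    "number\t" + r"\d+\.\d*",
]) + "\n"

rows = parse_tsv(sample)
```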
3
votes
Accepted
Selecting the endianness of data in files generated by an embedded system
If your goal isn't interoperability with independently written software, endianness conversion is probably overkill. You're already doing it for network data, which would not strictly be necessary - ...
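For reference, this is what the choice amounts to at the byte level. The Python sketch below packs one arbitrary example value both ways with the `struct` module and shows what a reader sees if it assumes the wrong order.

```python
import struct

value = 0x12345678  # arbitrary example value

little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

# Reading little-endian bytes as big-endian silently yields a different number.
(misread,) = struct.unpack(">I", little)
```

Whichever order is picked, writing it into the format description matters more than the choice itself, since a mismatch produces no error, only wrong values.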
3
votes
Is it possible to extend gifs to support audio?
In theory, you can add any feature to any format.
In practice, GIF images are so widespread and the new feature, while vaguely related, is so far from the core functionality of the standard format ...
2
votes
Should UTF-8 CSV files contain a BOM (byte order mark)?
The only public definition of the CSV format is RFC 4180, and such a CSV file should use UTF-8 encoding and no byte order mark (BOM), because UTF-8 files do not depend on CPU byte order and therefore do not ...
2
votes
Accepted
Design Pattern to extract arbitrary field from arbitrary file format
Regardless of the pattern you use the fundamental thing here is deciding what you're going to model: whatever the file has in it or what your application needs.
There are use cases for either. A text ...
2
votes
Choosing a MagicNumber or Signature for a Binary File Format
Create a 16-byte GUID, and use that. Obviously make very, very sure that you don't lose the GUID. There is no serious chance that someone else creates files unintentionally starting with this GUID.
...
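A minimal Python sketch of the GUID-as-signature idea follows. The GUID value and the helper names `write_file` and `has_signature` are hypothetical; in practice the value would be generated once (for example with `uuid.uuid4()`) and then hard-coded.

```python
import io
import uuid

# Hypothetical signature: generate once, then hard-code it and never change it.
MAGIC = uuid.UUID("3f2b8c1e-9d4a-4e7b-a6c5-0f1e2d3c4b5a").bytes  # 16 bytes


def write_file(stream, payload: bytes) -> None:
    # The signature always comes first, followed by the actual content.
    stream.write(MAGIC)
    stream.write(payload)


def has_signature(stream) -> bool:
    # Compare the first 16 bytes against the expected signature.
    return stream.read(len(MAGIC)) == MAGIC


buffer = io.BytesIO()
write_file(buffer, b"payload")
buffer.seek(0)
ours = has_signature(buffer)
foreign = has_signature(io.BytesIO(b"%PDF-1.7 not our format"))
```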
2
votes
Using source code instead of XML/JSON or other custom serialization schemes and binary file formats
The other answer has already covered the main point why storing configuration in source is a bad idea. But I wanted to add a few points.
Just about every developer should know how JSON works, and ...
1
vote
Accepted
Why do HTML and JS feel slow on smartphones? What could be changed to make them faster?
It's not really to do with HTML itself, and especially not XHTML, but four things:
the document layout is dynamically computed at runtime based on the size of the device viewport.
the attributes (...
1
vote
Using source code instead of XML/JSON or other custom serialization schemes and binary file formats
I disagree with some of the other answers that seem to dismiss the idea of configuring through code outright. After all, that's the general idea behind Lua, which is quite ...
1
vote
Accepted
Getting file format by checking file header
Use memcmp to compare the PNG header against the first bytes of the file. Other than that you can't avoid opening and reading the file.
Maybe you should fseek back to the start, to reset the file ...
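The same check, sketched in Python rather than C for brevity: read the first eight bytes, compare them against the PNG signature, and seek back to the start so later code sees the whole file. `is_png` is a hypothetical helper name.

```python
import io

# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(stream) -> bool:
    # Read just enough bytes to compare, then rewind for whoever reads next.
    header = stream.read(len(PNG_SIGNATURE))
    stream.seek(0)
    return header == PNG_SIGNATURE


png_like = io.BytesIO(PNG_SIGNATURE + b"rest of file")
other = io.BytesIO(b"GIF89a rest of file")

png_result = is_png(png_like)
other_result = is_png(other)
position_after = png_like.tell()
```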
1
vote
What are formats to store geographic maps for a robot to travel point A to point B
To move a robot or calculate a route from point A to point B, you need a graph data structure. A graph is a set of nodes and a set of edges that relate nodes. A path is a sequence of ...
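As a small illustration of the graph idea, the Python sketch below stores a made-up map as an adjacency list and finds a route with the fewest hops using breadth-first search. The node names and the choice of BFS are assumptions for the example; a real map with distances on the edges would more likely call for something like Dijkstra's algorithm.

```python
from collections import deque

# Hypothetical map: each node lists the nodes it is directly connected to.
graph = {
    "A": ["C", "D"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "E": ["D", "B"],
    "B": ["C", "E"],
}


def shortest_path(graph, start, goal):
    # Breadth-first search: the first time we reach the goal, the hop count is minimal.
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal is not reachable from start


route = shortest_path(graph, "A", "B")
```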
1
vote
Accepted
Best file formats for ML training
Since you don't have clear criteria for efficiency, it's hard to name a best practice here. The data formats you are using are all well supported by the PyTorch data loader module, so it should be fine.
...
1
vote
How to handle multiple versions of binary file format
A universal solution is to always keep the most recent version of the structure and read it with BinaryReader.
struct FileHeader
{
ushort version;
uint fieldX; // version 1
uint fieldXX; // added in ...
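The same approach can be sketched in Python with the `struct` module. Several details here are assumptions, since the excerpt is cut off: little-endian layout, no padding, `fieldXX` arriving in version 2, and a default of 0 for files that predate it. `read_header` is a hypothetical helper name.

```python
import struct


def read_header(data: bytes) -> dict:
    # Always fill the newest in-memory structure, whatever version is on disk.
    (version,) = struct.unpack_from("<H", data, 0)
    (field_x,) = struct.unpack_from("<I", data, 2)
    header = {"version": version, "fieldX": field_x, "fieldXX": 0}
    if version >= 2:
        # Assumption: fieldXX was added in version 2 and follows fieldX directly.
        header["fieldXX"] = struct.unpack_from("<I", data, 6)[0]
    return header


# Hypothetical sample headers for an old and a newer file.
old_file = struct.pack("<HI", 1, 7)
new_file = struct.pack("<HII", 2, 7, 9)

old_header = read_header(old_file)
new_header = read_header(new_file)
```

The rest of the program then only ever deals with the newest structure, and version checks stay confined to the reader.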
1
vote
Design Pattern to extract arbitrary field from arbitrary file format
There's no magical solution to this. If the file formats have some version information built into them, you can use that data, feed it into a Factory, and create the appropriate instance of a Strategy ...
1
vote
Fast storage format for huge point clouds (fast read/write)
I'm not an expert in the field, nor do I know your exact constraints/data, but I may have an idea. Or at least something that would be worth trying if the points are spread more or less uniformly.
...