64 votes

Should UTF-8 CSV files contain a BOM (byte order mark)?

Not for UTF-8, but see the various caveats in the comments. It's unnecessary, because UTF-8 has no byte order (unlike UTF-16/32), and the Unicode standard does not recommend it. It's also quite rare to see UTF-8 ...
Kayaman
  • 1,980
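To make the BOM point above concrete, here is a minimal Python sketch of how a reader might tolerate an optional UTF-8 BOM in CSV data. The helper name `strip_utf8_bom` and the sample bytes are made up for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample data: the same CSV header with and without a BOM.
without_bom = "name,city\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# Decoding with plain "utf-8" keeps the BOM as U+FEFF in the first field;
# "utf-8-sig" drops it when present and is harmless when it is absent.
text_plain = with_bom.decode("utf-8")
text_sig = with_bom.decode("utf-8-sig")
```

Reading with `utf-8-sig` accepts both variants, which sidesteps the question on the consuming side; whether to write a BOM still depends on the target software.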
16 votes
Accepted

In what configuration file format do regular expressions not need escaping?

CDATA sections in XML should do. Here's a stackoverflow post about it: https://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean I remember it took me a while to understand how to use ...
Martin Maat
  • 18.6k
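As a rough illustration of the CDATA suggestion above, the following Python sketch reads a regular expression from a CDATA section using the standard library's `xml.etree.ElementTree`. The element names `config` and `pattern` and the sample expression are assumptions made for the example, not taken from the answer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config: the regex sits unescaped inside a CDATA section,
# including characters (<, >, &, ") that would otherwise need XML escaping.
CONFIG = r"""<config>
  <pattern><![CDATA[<\w+\s*>|\d{2}&"quoted"]]></pattern>
</config>"""


def load_pattern(xml_text: str) -> str:
    """Return the raw text of the <pattern> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("pattern")


pattern = load_pattern(CONFIG)
compiled = re.compile(pattern)
```

One caveat: a CDATA section cannot itself contain the sequence `]]>`, so an expression that includes it would still need special handling.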
13 votes

In what configuration file format do regular expressions not need escaping?

This is indeed an interesting question, as commonly the requirements for config file formats are somewhat different, so it's understandable that available formats don't really support this requirement....
Hans-Martin Mosner
11 votes

Should UTF-8 CSV files contain a BOM (byte order mark)?

There is still no widespread convention AFAIK, though certainly UTF-8 is now generally accepted. The BOM is an awful artifact: It is invisible (a zero-width no-break space). Some software might break on the ...
Joop Eggen
  • 2,639
10 votes

In what configuration file format do regular expressions not need escaping?

Consider TOML. It handles two different forms of raw strings: regex = '<\i\c*\s*>' or regex2 = '''I [dw]on't need \d{2} apples'''
JimmyJames
  • 31.1k
8 votes
Accepted

SQLite database as a data interchange format?

It happens to be possible to use SQLite databases as a data interchange format, but it's not a particularly good solution. Why? You are looking for a compact data exchange format. In contrast, SQLite'...
amon
  • 136k
7 votes
Accepted

How do different file types generally store data?

The reason there are many different file formats is that there are many different goals for the way data is formatted. Some of these are in opposition to each other and some are orthogonal to each ...
JimmyJames
  • 31.1k
7 votes
Accepted

How a standard video file is structured under the hood

Parsing a complete digital film is an immensely complex task. Because you mostly ask about WebM – a container format – I’ll concentrate on that. You always start with individual streams containing ...
besc
  • 1,163
6 votes

How a standard video file is structured under the hood

Video container formats are typically made up of a series of content blocks. A block typically consists of a few marker bytes (important for finding the next block if you get incomplete data while ...
Sebastian Redl
6 votes

In what configuration file format do regular expressions not need escaping?

NestedText is a configuration file format that makes a point of not requiring any escaping or quoting, which makes it very good for applications like this: # regex examples from: # https://support....
Kale Kundert
5 votes

Using source code instead of XML/JSON or other custom serialization schemes and binary file formats

With the question clarified as to handling differences in the structure of configuration from one version to the next, you lose a lot of potential solutions by using a code as configuration process. ...
Berin Loritsch
4 votes

How to handle multiple versions of binary file format

Honestly, you have to open the stream and read enough bytes to determine which file format you are actually dealing with first. This solution is not unlike what many graphics tools do to discover the ...
Berin Loritsch
4 votes

Why GUID Partition Table data lack the sector size?

A software layer which can read and write the GPT has to use low-level ATA/ATAPI commands for it. Hence, it can simply ask the device for its logical sector size using the ATA command "IDENTIFY ...
Doc Brown
  • 221k
4 votes

In what configuration file format do regular expressions not need escaping?

Roll your own. Seems a simple enough format; just write your own custom parser to deserialize from a plain text file (perhaps just like your first example) into your object model. This would require ...
Tim Sparkles
3 votes

In what configuration file format do regular expressions not need escaping?

Tab-separated value format (.tsv) is a simple, easy to edit text format that allows any text within a field except TAB and newline characters. There are no escaping rules in its IANA format spec. TSV ...
Jerry101
  • 5,477
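A minimal Python sketch of the TSV idea above: because fields may contain anything except TAB and newline, a reader can split on those two characters and never unescape. The function name `parse_tsv` and the sample rows are hypothetical.

```python
def parse_tsv(text: str) -> list[list[str]]:
    """Split TSV text into rows of fields; no escaping rules are applied."""
    return [line.split("\t") for line in text.split("\n") if line]


# Hypothetical file content: a name column and an unescaped regex column.
TSV_TEXT = (
    "name\tpattern\n"
    "phone\t" + r"\d{3}-\d{4}" + "\n"
    "quoted\t" + r'"[^"]*"' + "\n"
)

rows = parse_tsv(TSV_TEXT)
patterns = {name: pattern for name, pattern in rows[1:]}
```

The trade-off is that an expression can then only match a literal tab via an escape such as `\t` inside the regex itself, since a raw TAB would end the field.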
3 votes
Accepted

Selecting the endianness of data in files generated by an embedded system

If your goal isn't interoperability with independently written software, endianness conversion is probably overkill. You're already doing it for network data, which would not strictly be necessary - ...
Hans-Martin Mosner
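For readers who do decide to pin a byte order in their files, the difference is easy to see in a short Python sketch using the standard `struct` module. The value and variable names are illustrative only; the answer itself does not prescribe either order.

```python
import struct

value = 0x12345678

# "<" forces little-endian, ">" forces big-endian; "I" is an unsigned 32-bit int.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

# Reading back works only when the reader assumes the order that was written.
round_trip = struct.unpack("<I", little)[0]
misread = struct.unpack(">I", little)[0]
```

Writing the format character explicitly keeps the file layout independent of the CPU that produced it, at the cost of a conversion on hosts with the opposite native order.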
3 votes

Is it possible to extend gifs to support audio?

In theory, you can add any feature to any format. In practice, GIF images are so widespread and the new feature, while vaguely related, is so far from the core functionality of the standard format ...
Kilian Foth
2 votes

Should UTF-8 CSV files contain a BOM (byte order mark)?

The only public definition of the CSV format is RFC 4180, and such a CSV file should use UTF-8 encoding and no byte-order mark (BOM) because UTF-8 files do not depend on CPU byte order and therefore do not ...
Mikko Rantalainen
2 votes
Accepted

Design Pattern to extract arbitrary field from arbitrary file format

Regardless of the pattern you use the fundamental thing here is deciding what you're going to model: whatever the file has in it or what your application needs. There are use cases for either. A text ...
candied_orange
2 votes

Choosing a MagicNumber or Signature for a Binary File Format

Create a 16 byte guid, and use that. Obviously make very, very sure that you don't lose the guid. There is no serious chance that someone else creates files unintentionally starting with this guid. ...
gnasher729
  • 49.4k
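The GUID-as-signature suggestion above might look like this in Python. The GUID value shown is an arbitrary placeholder for illustration; in practice it would be generated once (for example with `uuid.uuid4()`) and then hard-coded.

```python
import uuid

# Hypothetical signature: generated once, then fixed forever in the format.
MAGIC = uuid.UUID("3f2b8c1e-9d4a-4e6f-8a57-1c0d2e9b7a64").bytes


def has_magic(data: bytes) -> bool:
    """Return True if data starts with the 16-byte file signature."""
    return data[:len(MAGIC)] == MAGIC


# A file in this made-up format is the signature followed by the payload.
sample_file = MAGIC + b"payload bytes"
```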
2 votes

Using source code instead of XML/JSON or other custom serialization schemes and binary file formats

The other answer has already covered the main point why storing configuration in source is a bad idea. But I wanted to add a few points. Just about every developer should know how json works, and ...
JonasH
  • 6,397
1 vote
Accepted

Why do HTML and JS feel slow on smartphones? What could be changed to make them faster?

It's not really to do with HTML itself, and especially not XHTML, but four things: The document layout is dynamically computed at runtime based on the size of the device viewport. The attributes (...
pjc50
  • 15.3k
1 vote

Using source code instead of XML/JSON or other custom serialization schemes and binary file formats

I do disagree with some of the other answers that seem to generally dispose of the idea that configuring through code could be a thing - After all, that's the general idea behind Lua, which is quite ...
tofro
  • 958
1 vote
Accepted

Getting file format by checking file header

Use memcmp to compare the PNG header against the first bytes of the file. Other than that you can't avoid opening and reading the file. Maybe you should fseek back to the start, to reset the file ...
Spidey
  • 116
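The answer above describes the check in C terms (`memcmp`, `fseek`). A comparable Python sketch, assuming a seekable binary stream, compares the first eight bytes against the PNG signature and then rewinds. The function name `looks_like_png` is made up for the example.

```python
import io

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(stream) -> bool:
    """Check the header of a seekable binary stream, then restore its position."""
    start = stream.tell()
    header = stream.read(len(PNG_SIGNATURE))
    stream.seek(start)  # rewind so later readers see the whole file
    return header == PNG_SIGNATURE


# In-memory stand-ins for files opened with open(path, "rb").
png_like = io.BytesIO(PNG_SIGNATURE + b"rest of file")
other = io.BytesIO(b"GIF89a...")
```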
1 vote

What are formats to store geographic maps for a robot to travel point A to point B

To move robots or calculate routes from point A to point B, you need to use a graph data structure. A graph is a set of nodes and a set of edges that relate nodes. A path is a sequence of ...
Christophe
  • 82.3k
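As a small illustration of the graph idea above, the following Python sketch stores a map as an adjacency list and finds a path with the fewest edges using breadth-first search. The node names and the choice of breadth-first search are assumptions for the example; a real robot map would more likely carry edge weights and use a weighted search.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a list of nodes from start to goal with the fewest edges, or None."""
    queue = deque([start])
    previous = {start: None}  # also serves as the visited set
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical map: each key is a node, each value lists the nodes reachable from it.
GRAPH = {"A": ["C", "D"], "C": ["B"], "D": ["E"], "E": ["B"]}
route = shortest_path(GRAPH, "A", "B")
```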
1 vote
Accepted

Best file formats for ML training

Since you don't have a clear criterion for efficiency, it's hard to name a best practice here. The data formats you are using are all well supported by the PyTorch data loader module, so it should be fine. ...
lennon310
  • 3,242
1 vote

How to handle multiple versions of binary file format

A universal solution is to always keep the most recent version of the structure and use BinaryReader. struct FileHeader { ushort version; uint fieldX; // version 1 uint fieldXX; // added in ...
Konrad
  • 1,569
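The answer above is written for C# and `BinaryReader`. The same idea, always filling the newest structure and reading only the fields that the file's version contains, can be sketched in Python with `struct`. The little-endian layout, the default values, and the assumption that `fieldXX` arrived in version 2 are all guesses made for illustration, since the excerpt is cut off.

```python
import struct


def read_header(data: bytes) -> dict:
    """Parse a header of any known version into the newest in-memory layout."""
    (version,) = struct.unpack_from("<H", data, 0)  # ushort version
    header = {"version": version, "field_x": 0, "field_xx": 0}
    offset = 2
    if version >= 1:
        (header["field_x"],) = struct.unpack_from("<I", data, offset)
        offset += 4
    if version >= 2:  # assumed: the field added later appears from version 2
        (header["field_xx"],) = struct.unpack_from("<I", data, offset)
        offset += 4
    return header


# Hypothetical files: "<" disables padding, so v1 is 6 bytes and v2 is 10.
v1_bytes = struct.pack("<HI", 1, 7)
v2_bytes = struct.pack("<HII", 2, 7, 9)
```

Older files simply leave the newer fields at their defaults, so the rest of the program only ever deals with one structure.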
1 vote

Design Pattern to extract arbitrary field from arbitrary file format

There's no magical solution to this. If the file formats have some version information built into them, you can use that data, feed it into a Factory, and create the appropriate instance of a Strategy ...
Filip Milovanović
1 vote

Fast storage format for huge point clouds (fast read/write)

I'm not an expert in the field, nor do I know your exact constraints/data, but I may have an idea. Or at least something that would be worth trying if the points are spread more or less uniformly. ...
dagnelies
  • 5,503

Only top scored, non community-wiki answers of a minimum length are eligible