Doc Brown

There is nothing wrong with letting a human define the column mapping for each new data provider if the number of different providers is on the order of "some hundreds". Give a human ~five minutes per provider to figure out the correct mapping, and defining the mappings for 100 providers will take approximately one day of work (100 × 5 min ≈ 8 hours). It should be obvious that once a mapping has been determined, it should be saved and reused for that specific provider.
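A minimal sketch of "define once, save, reuse" could look like the following. It assumes one JSON file per provider in a local directory and a fixed set of target fields (`name`, `street`, `zip`, `city`); all function names, field names and the storage layout are hypothetical illustrations, not taken from any particular system.

```python
from __future__ import annotations

import csv
import io
import json
from pathlib import Path

# Hypothetical canonical target fields of an address record.
TARGET_FIELDS = ["name", "street", "zip", "city"]


def save_mapping(store_dir: Path, provider_id: str, mapping: dict[str, str]) -> None:
    """Persist a human-defined mapping {target_field: provider_column} for one provider."""
    missing = [f for f in TARGET_FIELDS if f not in mapping]
    if missing:
        raise ValueError(f"mapping for {provider_id} lacks fields: {missing}")
    store_dir.mkdir(parents=True, exist_ok=True)
    (store_dir / f"{provider_id}.json").write_text(json.dumps(mapping, indent=2))


def load_mapping(store_dir: Path, provider_id: str) -> dict[str, str] | None:
    """Return the saved mapping, or None if a human still has to define one."""
    path = store_dir / f"{provider_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())


def apply_mapping(csv_text: str, mapping: dict[str, str]) -> list[dict[str, str]]:
    """Rename the provider's columns to the target fields using the saved mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    absent = [col for col in mapping.values() if col not in header]
    if absent:
        # The provider probably changed its format: hand the file back to a human.
        raise KeyError(f"columns not found in CSV header: {absent}")
    return [{field: row[col] for field, col in mapping.items()} for row in reader]
```

A `None` from `load_mapping` is the signal to put the provider on the human's to-do list; after that, every further file from the same provider is imported without manual work.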

This will usually be far more effective than investing weeks of work into developing some clever heuristic (see the canonical XKCD).

Of course, the situation changes when data providers change their data format from time to time without actively informing you, so that a human would need to check each and every CSV file again. Still, you can start imports with the previously defined mapping and run some automated sanity checks on whether the result looks like a valid address list. Such a sanity check can be implemented mostly independently of the specific provider, so it is probably worth the time.
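A provider-independent sanity check could be sketched like this. The rules (required non-empty fields, a postal code of 3 to 10 alphanumeric characters containing at least one digit, no digits in the city name) and the 5 % tolerance are hypothetical examples, chosen to catch typical symptoms of a changed format such as swapped columns; real rules would depend on the countries and data involved.

```python
from __future__ import annotations

import re

# Hypothetical plausibility rules, independent of any specific provider.
REQUIRED = ("name", "street", "zip", "city")
ZIP_PATTERN = re.compile(r"^(?=.*\d)[0-9A-Za-z -]{3,10}$")  # at least one digit


def record_problems(rec: dict[str, str]) -> list[str]:
    """List everything that looks wrong with one imported address record."""
    problems = [f"empty {f}" for f in REQUIRED if not rec.get(f, "").strip()]
    zip_code = rec.get("zip", "").strip()
    if zip_code and not ZIP_PATTERN.match(zip_code):
        problems.append(f"implausible zip {zip_code!r}")
    city = rec.get("city", "")
    if any(ch.isdigit() for ch in city):
        # Often a sign that the provider reordered its columns.
        problems.append(f"digits in city {city!r}")
    return problems


def looks_like_address_list(
    records: list[dict[str, str]], max_bad_ratio: float = 0.05
) -> tuple[bool, list[str]]:
    """Return (ok, report); ok is False when too many records fail the checks."""
    if not records:
        return False, ["no records"]
    report = []
    for i, rec in enumerate(records):
        probs = record_problems(rec)
        if probs:
            report.append(f"row {i}: " + ", ".join(probs))
    return len(report) / len(records) <= max_bad_ratio, report
```

Only when `ok` is False does a human have to look at the file; tolerating a small ratio of bad rows avoids alarms caused by a few sloppy records rather than by a format change.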