Azure Storage Table design with multiple query points

Question

I have the following Azure Storage Table.

PositionData table:

PartitionKey: ClientID + VehicleID 
RowKey: GUID 
Properties:  ClientID, VehicleID, DriverID, Date, GPSPosition

Each vehicle will log up to 1,000,000 entities per year per client. Each client could have thousands of vehicles. So, I decided to partition by ClientID + VehicleID so to have small, manageable partitions. When querying by ClientID and VehicleID, the operation performs quickly because we are narrowing the search down to one partition.

PROBLEM:

The problem here is that sometimes I need to query on only ClientID and DriverID. Because it's not possible to perform partial PartitionKey comparisons, every single partition will need to be scanned. This will kill performance.

I can't have a PartitionKey with all ClientID, VehicleID and DriverID because queries will only ever query on VehicleID OR DriverID, never both.

SOLUTION 1:

I considered having a value stored elsewhere which represented a VehicleID and DriverID pair, and then having a ClientID + VehicleDriverPairID PartitionKey, but that would result in hundreds of thousands of partitions and there will be much unioning of data between partitions in my code.

SOLUTION 2:

Have a partition for Client + VehicleID and another partition for Client + DriverID. This means that updating the table is twice as much work (two updates) but both queries will be fast. Also there will be redundant data.

Do any of these solutions sound viable? Other solutions?

Community · Accepted Answer · 2017-05-23 12:40:13Z

I think you have several viable programming solutions that could work - your original solutions as well as those that were posted on SO. As you say, the problem becomes deciding which design is optimal for your case - and also which will be easily understood by other programmers modifying your code in the future.

You didn't really say if vehicles are owned and operated by the clients or by the drivers. I expect it would be rare for clients that own and operate their vehicles to trade vehicles over a year. If vehicles are owned and operated exclusively by drivers I expect they would drive that vehicle most of the time.

When I read your question I thought of partitioning the table by VehicleID + DriverID if the vehicles are owned and operated exclusively by clients. And if the vehicles are owned and operated exclusively by drivers, a partition by ClientID + DriverID would be more efficient.

This approach would be fast, but it would become less accurate or less efficient as more driver-owners or client-owners traded vehicles.

Stack Exchange Network

Azure Storage Table design with multiple query points

1 Answer 1

Hot Network Questions

Azure Storage Table design with multiple query points

1 Answer 1

Related

Hot Network Questions