Skip to main content
deleted 1 character in body
Source Link
RandomDude
  • 372
  • 2
  • 10
{
    'country-id': 1,
    'name': 'USA',
    'states': [
    {
        'coutry'country-id': 1,
        'name': 'Alaska',
        'state-id': 11,
        'cities': [
        {
            'state-id': 11,
            'name': 'Anchorage',
            'city-id': 111,
            'streets':
            [
                {
                    'city-id': 111,
                    'name': 'Ingra St.',
                    'street-id': 1111
                }
            ]
        },
        {
            'state-id': 11,
            'name': 'Fairbanks',
            'city-id': 112,
            'streets': 
            [
                {
                    'city-id': 112,
                    'name': 'Johansen Expy',
                    'street-id': 1121
                }
            ]
        }
        ]
    }, ...
    ]
}
{
    'country-id': 1,
    'name': 'USA'
    'states': [
    {
        'coutry-id': 1,
        'name': 'Alaska',
        'state-id': 11,
        'cities': [
        {
            'state-id': 11,
            'name': 'Anchorage'
            'city-id': 111,
            'streets':
            [
                {
                    'city-id': 111,
                    'name': 'Ingra St.',
                    'street-id': 1111
                }
            ]
        },
        {
            'state-id': 11,
            'name': 'Fairbanks'
            'city-id': 112,
            'streets': 
            [
                {
                    'city-id': 112,
                    'name': 'Johansen Expy',
                    'street-id': 1121
                }
            ]
        }
        ]
    }, ...
    ]
}
{
    'country-id': 1,
    'name': 'USA',
    'states': [
    {
        'country-id': 1,
        'name': 'Alaska',
        'state-id': 11,
        'cities': [
        {
            'state-id': 11,
            'name': 'Anchorage',
            'city-id': 111,
            'streets':
            [
                {
                    'city-id': 111,
                    'name': 'Ingra St.',
                    'street-id': 1111
                }
            ]
        },
        {
            'state-id': 11,
            'name': 'Fairbanks',
            'city-id': 112,
            'streets': 
            [
                {
                    'city-id': 112,
                    'name': 'Johansen Expy',
                    'street-id': 1121
                }
            ]
        }
        ]
    },
    ]
}
Source Link
RandomDude
  • 372
  • 2
  • 10

Store nested json repsonses in relational database

The goal is to collect market data from a free rest api. The response is in json and size per response is above 1MB. I want to get at least updated data once per minute which means about 24h * 60min * 1MB ~ 1.5GB per day of json data. Since i have to collect data over a few weeks in order to start analysing market patterns i have to store the json responses in a space efficient way --> SQLite is my idea so far. Now the ugly part is dealing with nested (4 levels) json and insert it into a relational database.

json example:

{
    'country-id': 1,
    'name': 'USA'
    'states': [
    {
        'coutry-id': 1,
        'name': 'Alaska',
        'state-id': 11,
        'cities': [
        {
            'state-id': 11,
            'name': 'Anchorage'
            'city-id': 111,
            'streets':
            [
                {
                    'city-id': 111,
                    'name': 'Ingra St.',
                    'street-id': 1111
                }
            ]
        },
        {
            'state-id': 11,
            'name': 'Fairbanks'
            'city-id': 112,
            'streets': 
            [
                {
                    'city-id': 112,
                    'name': 'Johansen Expy',
                    'street-id': 1121
                }
            ]
        }
        ]
    }, ...
    ]
}

My approach:

from dataclasses import dataclass
from typing import List


@dataclass
class Country:
    def __init__(self, country_json):
        self.country_id: int = country_json['country-id']
        self.name: str = country_json['name']
        self.states: List(State) = [State(state) for state in country_json['states']]


@dataclass
class State:
    def __init__(self, state_json):
        self.country_id: int = state_json['country_id']
        self.name: str = state_json['name']
        self.state_id: int = state_json['state_id']
        self.cities: List(City) = [City(city) for city in state_json['cities']]


@dataclass
class City:
    def __init__(self, city_json):
        self.state_id: int = city_json['state-id']
        self.name: str = city_json['name']
        self.city_id: int = city_json['city-id']
        self.streets: List(Street) = [Street(street) for street in city_json['street']]


@dataclass
class Street:
    def __init__(self, street_json):
        self.city_id: int = street_json['city-id']
        self.name: str = street_json['name']
        self.street_id: int = street_json['street-id']

Next steps would be to add functions to each dataclass that store the values in a SQlite DB. I am really not sure if my general approach is efficient or if there is a better solution for this kind of problem. My idea in general was that if i build it like this i can easily add other steps e.g. data validation.