fffrost

Fill a pandas dataframe (of predetermined size) with results from a similar dataframe (of size less than or equivalent to the original)

I'd like some feedback/suggestions on how to improve the following. Specifically, I want to know if what I'm doing is reliable and fast, or if there is a better way to accomplish this.

The problem:

I have a dataset containing counts of sales made at different times throughout the day, across different locations/shops. Let's say there are 4 different shops in this data (A, B, C, D) and 4 different time bins in the day [0, 1, 2, 3]. I query this and get back a query dataset, but the issue is that a given query may return no transactions for a certain time bin, or even none at all for a specific shop (maybe there was a rat infestation and it closed for the day).

Nevertheless, the end result must always have the same number of rows (4 locations x 4 time bins = 16), and simply contain zeros where there were no transactions. In other words, I want records for all possible combinations, even if they were not returned by the query itself.

Example:

import pandas as pd

# Specify the complete list of possible time bins
max_timebins = 3
bin_nums = list(range(max_timebins + 1))

# Specify the complete list of shops
shop_ids = ['A', 'B', 'C', 'D']

# Make a dataframe holding every possible (shop, time bin) pair, without the counts
# After set_index this is just a dataframe with an index and no columns... this feels a little strange to me but it worked...
# The level is named 'time_bins', matching the query result below, so both indexes use the same level names
dat = {'shop': [], 'time_bins': []}
for shop in shop_ids:
    dat['shop'] += [shop] * len(bin_nums)
    dat['time_bins'] += bin_nums
df_all = pd.DataFrame(dat)
df_all = df_all.set_index(list(dat.keys()))

# Example of a result of a query
dfq = pd.DataFrame(
    {
        'shop': ['A', 'A', 'A', 'A',
                 'B', 'B',
                 'C', 'C', 'C',
                 'D'],
        'time_bins': [0, 1, 2, 3,
                      0, 3,
                      0, 2, 3,
                      2],
        'counts': [100, 220, 300, 440,
                   500, 660,
                   120, 340, 90,
                   400],
    }
).set_index(['shop', 'time_bins'])


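# Outer-join the (column-less) full grid with the query result on the index,
# then replace the NaNs of the missing rows with 0 and convert back to integers
result_df = pd.concat([df_all, dfq], axis=1).fillna(0).astype(int)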
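
For comparison, here is a minimal sketch of one possible alternative, not necessarily the best one: build the full grid directly with `pd.MultiIndex.from_product` and align the query result to it with `DataFrame.reindex`. It reuses the example data above; the name `full_index` is introduced here purely for illustration, and the sketch assumes the query result has at most one row per (shop, time bin) pair.

```python
import pandas as pd

shop_ids = ['A', 'B', 'C', 'D']
bin_nums = [0, 1, 2, 3]

# Every (shop, time bin) combination, built directly as a MultiIndex.
# `full_index` is a hypothetical name used only in this sketch.
full_index = pd.MultiIndex.from_product(
    [shop_ids, bin_nums], names=['shop', 'time_bins']
)

# The same example query result as in the question.
dfq = pd.DataFrame(
    {
        'shop': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
        'time_bins': [0, 1, 2, 3, 0, 3, 0, 2, 3, 2],
        'counts': [100, 220, 300, 440, 500, 660, 120, 340, 90, 400],
    }
).set_index(['shop', 'time_bins'])

# Align the query result to the full grid; rows missing from the
# query are created and filled with 0 instead of NaN.
result_df = dfq.reindex(full_index, fill_value=0)

print(result_df)
```

Because `fill_value=0` is supplied up front, no NaNs are introduced, so the `fillna(0).astype(int)` round trip is not needed here. Note that `reindex` raises an error if the query result contains duplicate index entries, so duplicates would have to be aggregated first (for example with a `groupby` and `sum`).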