Revisions to Finding specific promotions from two columns [closed]

deleted 7 characters in body

edited Sep 16, 2024 at 23:40

tdy

2.3k
1
10
21

If promotions is large, then vectorize that as well using Series instead of dict(zip(...)):

ranks = pd.Series(range(len(promotions)), index=promotions)  # ~4x faster given 1,000 jobcodes

If promotions is large, then vectorize that as well using Series instead of dict-zip:

ranks = pd.Series(range(len(promotions)), index=promotions)  # ~40x faster given 10K jobcodes
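As a minimal sketch of why the swap is safe (the `codes` column below is a hypothetical sample, not data from the question): `Series.map` accepts a Series as the mapper just as it accepts a dict, so both lookups should give the same ranks, with unknown jobcodes becoming NaN either way. This only checks equivalence; it does not reproduce the timing above, which will depend on data size and pandas version.

```python
import pandas as pd

promotions = ('AGM4', 'GM2', 'ADO3')
codes = pd.Series(['---', 'ADO3', 'AGM4', 'GM2'])  # hypothetical sample column

# Two equivalent lookups: a plain dict and a Series indexed by jobcode.
ranks_dict = dict(zip(promotions, range(len(promotions))))
ranks_series = pd.Series(range(len(promotions)), index=promotions)

mapped_dict = codes.map(ranks_dict)
mapped_series = codes.map(ranks_series)

# Codes absent from `promotions` ('---') map to NaN with either lookup.
print(mapped_series.tolist())
print(mapped_dict.equals(mapped_series))
```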

added 209 characters in body

edited Sep 16, 2024 at 23:26

tdy

If promotions is large, then vectorize that as well using Series instead of dict(zip(...)):

ranks = pd.Series(range(len(promotions)), index=promotions)  # ~4x faster given 1,000 jobcodes

Concrete example:

deleted 29 characters in body

edited Sep 16, 2024 at 21:31

tdy

Here, a simple way to actually vectorize is to map the jobcodes to their numerical rank and compare the ranks (assuming promotions is listed in promotion order, as it is in your example):

import pandas as pd

promotions = ('AGM4', 'GM2', 'ADO3')
df = pd.DataFrame({"PayGroup_prev": ["---", "ADO3", "AGM4", "AGM4", "AGM4", "AGM4", "ADO3"], "PayGroup_cur": ["AGM4", "GM2", "ADO3", "???", "AGM4", "GM2", "ADO3"]})
#   PayGroup_prev  PayGroup_cur
# 0           ---          AGM4
# 1          ADO3           GM2
# 2          AGM4          ADO3
# 3          AGM4           ???
# 4          AGM4          AGM4
# 5          AGM4           GM2
# 6          ADO3          ADO3

promotions_rank = dict(zip(promotions, range(len(promotions))))
# {'AGM4': 0, 'GM2': 1, 'ADO3': 2}

df['PayGroup_prev_rank'] = df['PayGroup_prev'].map(promotions_rank)
df['PayGroup_cur_rank'] = df['PayGroup_cur'].map(promotions_rank)

df['Promoted'] = df['PayGroup_cur_rank'] > df['PayGroup_prev_rank']
#   PayGroup_prev  PayGroup_cur  PayGroup_prev_rank  PayGroup_cur_rank  Promoted
# 0           ---          AGM4                 NaN                0.0     False
# 1          ADO3           GM2                 2.0                1.0     False
# 2          AGM4          ADO3                 0.0                2.0      True
# 3          AGM4           ???                 0.0                NaN     False
# 4          AGM4          AGM4                 0.0                0.0     False
# 5          AGM4           GM2                 0.0                1.0      True
# 6          ADO3          ADO3                 2.0                2.0     False

Here, a simple way to actually vectorize is to map the jobcodes to their numerical ranks and compare the ranks (assuming promotions is listed in promotion order, as it is in your example):

import pandas as pd

promotions = ('AGM4', 'GM2', 'ADO3')
df = pd.DataFrame({"PayGroup_prev": ["---", "ADO3", "AGM4", "AGM4", "AGM4", "AGM4", "ADO3"], "PayGroup_cur": ["AGM4", "GM2", "ADO3", "???", "AGM4", "GM2", "ADO3"]})
#   PayGroup_prev  PayGroup_cur
# 0           ---          AGM4
# 1          ADO3           GM2
# 2          AGM4          ADO3
# 3          AGM4           ???
# 4          AGM4          AGM4
# 5          AGM4           GM2
# 6          ADO3          ADO3

ranks = dict(zip(promotions, range(len(promotions))))
# {'AGM4': 0, 'GM2': 1, 'ADO3': 2}

df['PayGroup_prev_rank'] = df['PayGroup_prev'].map(ranks)
df['PayGroup_cur_rank'] = df['PayGroup_cur'].map(ranks)

df['Promoted'] = df['PayGroup_cur_rank'] > df['PayGroup_prev_rank']
#   PayGroup_prev  PayGroup_cur  PayGroup_prev_rank  PayGroup_cur_rank  Promoted
# 0           ---          AGM4                 NaN                0.0     False
# 1          ADO3           GM2                 2.0                1.0     False
# 2          AGM4          ADO3                 0.0                2.0      True
# 3          AGM4           ???                 0.0                NaN     False
# 4          AGM4          AGM4                 0.0                0.0     False
# 5          AGM4           GM2                 0.0                1.0      True
# 6          ADO3          ADO3                 2.0                2.0     False
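If the helper rank columns are not needed, the same comparison can probably be written as one expression. The sketch below reuses the frame from this answer; the names `promoted` and `unknown` are illustrative assumptions, not part of the original. Rows with a jobcode outside promotions come out False because any comparison against NaN is False, so the optional `unknown` mask is one way to tell those rows apart from genuine non-promotions.

```python
import pandas as pd

promotions = ('AGM4', 'GM2', 'ADO3')
df = pd.DataFrame({
    "PayGroup_prev": ["---", "ADO3", "AGM4", "AGM4", "AGM4", "AGM4", "ADO3"],
    "PayGroup_cur": ["AGM4", "GM2", "ADO3", "???", "AGM4", "GM2", "ADO3"],
})
ranks = dict(zip(promotions, range(len(promotions))))

# Map both columns and compare in one step; no *_rank columns are kept.
promoted = df["PayGroup_cur"].map(ranks).gt(df["PayGroup_prev"].map(ranks))

# Optional: flag rows where either jobcode is missing from `promotions`.
unknown = ~df["PayGroup_prev"].isin(promotions) | ~df["PayGroup_cur"].isin(promotions)

print(promoted.tolist())
print(unknown.tolist())
```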

answered Sep 16, 2024 at 21:14

tdy
