\$\begingroup\$

I have a DataFrame, df, which contains the following data:

| customers | product | Val_id |
|-----------|---------|--------|
| 1         | A       | 1      |
| 2         | B       | X      |
| 3         | C       |        |
| 4         | D       | Z      |

I have been given two rules:

| rule_id | rule_name | product value | priority |
|---------|-----------|---------------|----------|
| 123     | ABC       | A,B           | 1        |
| 456     | DEF       | A,B,D         | 2        |

The requirement is to apply these rules to df in priority order: customers who match rule 1 should not be considered for rule 2. The final DataFrame should have two additional columns, rule_id and rule_name. I have written the code below to achieve this:

    import org.apache.spark.sql.functions.{col, when}

    // Rule 1 (priority 1) is checked first; rule 2 is only evaluated in its otherwise branch.
    val rule_name = when(col("product").isin("A", "B"), "ABC").otherwise(when(col("product").isin("A", "B", "D"), "DEF").otherwise(""))
    val rule_id = when(col("product").isin("A", "B"), "123").otherwise(when(col("product").isin("A", "B", "D"), "456").otherwise(""))
    val df1 = df.withColumn("rule_name", rule_name).withColumn("rule_id", rule_id)
    df1.show()

The final output looks like this:

| customers | product | Val_id | rule_name | rule_id |
|-----------|---------|--------|-----------|---------|
| 1         | A       | 1      | ABC       | 123     |
| 2         | B       | X      | ABC       | 123     |
| 3         | C       |        |           |         |
| 4         | D       | Z      | DEF       | 456     |

Is there a better way to achieve this, adding both columns in a single pass over the dataset instead of going through the entire dataset twice?
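To make the question concrete, here is a rough sketch of one direction I could imagine: keep the rules as data, fold them into a single struct-valued column in priority order, and expand that struct into rule_name and rule_id. The `Rule` case class, the intermediate column name `rule`, the local SparkSession, and the use of empty strings for the blank Val_id are all my own assumptions for illustration. As far as I understand, Spark evaluates chained `withColumn` calls lazily and usually collapses them into one projection, so the two-column version may already be a single pass; the main thing this sketch changes is that each rule condition is written only once.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct, when}

// Hypothetical container for one rule row (not part of the code above).
final case class Rule(id: String, name: String, products: Seq[String], priority: Int)

// Assumption: a local session for experimenting; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[1]").appName("rule-sketch").getOrCreate()
import spark.implicits._

// Sample data from the question; the blank Val_id is modelled as an empty string.
val df = Seq((1, "A", "1"), (2, "B", "X"), (3, "C", ""), (4, "D", "Z"))
  .toDF("customers", "product", "Val_id")

val rules = Seq(
  Rule("123", "ABC", Seq("A", "B"), 1),
  Rule("456", "DEF", Seq("A", "B", "D"), 2)
)

// Value used when no rule matches.
val noMatch: Column = struct(lit("").as("rule_name"), lit("").as("rule_id"))

// foldRight over the priority-sorted rules puts the priority-1 rule outermost,
// so lower-priority rules are only reached through its otherwise branch.
val matched: Column = rules.sortBy(_.priority).foldRight(noMatch) { (rule, fallback) =>
  when(
    col("product").isin(rule.products: _*),
    struct(lit(rule.name).as("rule_name"), lit(rule.id).as("rule_id"))
  ).otherwise(fallback)
}

// One struct column holds both values; select then flattens it into two columns.
val df1 = df
  .withColumn("rule", matched)
  .select(col("customers"), col("product"), col("Val_id"),
    col("rule.rule_name"), col("rule.rule_id"))

df1.show()
```

With this shape, adding a third rule would only mean appending another `Rule` to the sequence rather than editing two nested `when` expressions.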

\$\endgroup\$
