
Using Spark's java_method (and getISOCountries from java.util.Locale), I am trying to get the list of all countries. I get no error, but the returned value looks like [Ljava.lang.String;@5a68a908. When I use a similar method to get currencies, the list is returned correctly.

from pyspark.sql import functions as F
df = spark.range(1).select(
    F.expr("java_method('java.util.Locale', 'getISOCountries')").alias('countries'),
    F.expr("java_method('java.util.Currency', 'getAvailableCurrencies')").alias('currencies'),
)
df.show(truncate=0)
# +----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# |countries                   |currencies                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
# +----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# |[Ljava.lang.String;@5a68a908|[XPF, SDD, KZT, MXN, THB, XCD, KPW, BOB, DZD, PGK, CLF, BYN, LKR, BWP, IQD, UYU, HUF, PEN, SGD, HRK, XAU, MVR, SAR, UYI, ZWN, LVL, NLG, ZAR, ZWD, SLL, PLN, KGS, BIF, RUR, CHE, TRL, TMT, TZS, BTN, AFN, SRG, GIP, KMF, HTG, AYM, JMD, USS, RSD, SEK, XAF, XBD, UGX, YER, GRD, FKP, CAD, HKD, ILS, ADP, TJS, JPY, LUF, ZWL, DEM, ITL, BAM, BYR, ZMW, BRL, SRD, INR, STD, ETB, NPR, GHS, NZD, AWG, RON, RWF, XTS, NGN, COU, NIO, VEB, BEF, BYB, TRY, BGN, MUR, VUV, SYP, XPT, DOP, MRU, XBC, ISK, LTL, BMD, CSD, SZL, CRC, TWD, SDG, GHC, MZN, FIM, ESP, MGA, ALL, USN, XDR, ZMK, XBB, XPD, MTL, SOS, AZN, MMK, SLE, PHP, KWD, XOF, AED, MDL, CHF, CUP, BDT, JOD, BSD, CHW, HNL, XXX, STN, AZM, ANG, BND, VES, SSP, PYG, EEK, NOK, MWK, BZD, SBD, MAD, ROL, TOP, ZWR, VED, EUR, LAK, TPE, CUC, GBP, SHP, KHR, IEP, TTD, CZK, XBA, QAR, BBD, MXV, TND, VEF, OMR, KES, LSL, VND, PTE, XAG, KRW, LRD, GMD, UAH, IRR, SCR, UZS, XUA, FJD, BGL, CNY, DKK, TMM, CLP, ARS, PAB, RUB, GNF, SVC, GTQ, CYP, EGP, KYD, PKR, DJF, FRF, AFA, USD, CVE, MNT, LBP, XSU, SIT, NAD, BOV, GWP, MZM, ATS, MGF, IDR, MKD, WST, AOA, XFO, COP, MRO, XFU, YUM, GYD, MOP, GEL, SKK, AMD, LYD, MYR, ERN, AUD, BHD, CDF]|
# +----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

This is probably because getISOCountries returns an array (String[]), while getAvailableCurrencies returns a Set.

Is there a way to "parse" the returned getISOCountries array/list value so that I would get a proper country list?
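The array-vs-set explanation comes down to how each Java object turns itself into a string. Java arrays do not override toString(), so when java_method converts the returned String[] to a string, it falls back to Object.toString(). That method returns the class name, "@", and the hash code in hex, and `[Ljava.lang.String;` is the JVM's internal name for String[]. A Set inherits AbstractCollection.toString(), which prints its elements as `[a, b, c]`. The Python sketch below only mimics those two formats (the helper names are made up for illustration). It shows that the array's string carries no elements to parse:

```python
def object_to_string(class_name: str, hash_code: int) -> str:
    """Mimic java.lang.Object.toString(): class name, '@', hex hash code."""
    # Integer.toHexString treats the int as unsigned 32-bit, so mask it the same way.
    return f"{class_name}@{hash_code & 0xFFFFFFFF:x}"


def collection_to_string(items) -> str:
    """Mimic java.util.AbstractCollection.toString(): '[a, b, c]'."""
    return "[" + ", ".join(str(item) for item in items) + "]"


# A String[] only exposes its type name and a hash; the country codes are lost.
print(object_to_string("[Ljava.lang.String;", 0x5a68a908))  # prints [Ljava.lang.String;@5a68a908
# A Set lists its elements; Currency.toString() returns the ISO 4217 code.
print(collection_to_string(["XPF", "SDD", "KZT"]))  # prints [XPF, SDD, KZT]
```

In real output, the hash part usually differs from run to run. So no string function applied to the countries column can recover the list. The array has to be converted before it is turned into a string.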

  • I believe this could be posted as a feature request for Spark. java_method (and reflect) always returns a string. Since getISOCountries returns a Java array (which can't be directly represented as a string unless you call Arrays.toString() on it), it will always return that "error" when forced into a string. Disclaimer: this is my understanding, and I know very little Java.
    – samkart
    Commented Jun 14, 2023 at 4:36
  • So I assume you don't want to write a trivial wrapper for getISOCountries() in Java, right?
    – mazaneicha
    Commented Jun 14, 2023 at 13:23
  • @mazaneicha - If I understand correctly, such an option would require me to do some part in Java and then connect the result with this Python/Spark code. That would not feel nice. But I'm starting to think I could write the whole Spark logic in Java instead of Python...
    – ZygD
    Commented Jun 14, 2023 at 14:05
  • Yes, I mean that in Java, converting String[] to List<String> is trivial. But if you can write the whole thing in Java, that would be best.
    – mazaneicha
    Commented Jun 14, 2023 at 14:23

1 Answer


Without using java_method, one way to accomplish this would be something like this:

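>>> from pyspark.sql.functions import lit
>>> # sc is the SparkContext (spark.sparkContext); it is predefined in the pyspark shell
>>> x = sc._jvm.java.util.Locale.getISOCountries()
>>> xx = ', '.join(x)
>>> spark.range(1).withColumn('countries', lit(xx)).show()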
+---+--------------------+
| id|           countries|
+---+--------------------+
|  0|AD, AE, AF, AG, A...|
+---+--------------------+

It's worth noting that "countries" here (like "currencies" in your original df) is a StringType column, not an ArrayType column.
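If you keep string columns, the bracketed text produced by Set.toString() (as in the "currencies" column) and the comma-joined string from this answer can both be split back into a list. Below is a minimal pure-Python sketch. The helper name `parse_java_collection_string` is hypothetical, and it assumes no element contains ", " itself, which holds for ISO country and currency codes:

```python
from typing import List


def parse_java_collection_string(text: str) -> List[str]:
    """Turn "[A, B, C]" (Java collection toString format) or "A, B, C" into a list."""
    text = text.strip()
    # Drop the surrounding brackets added by Java's collection toString(), if present.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # An empty collection prints as "[]"; return [] rather than [""].
    if not text:
        return []
    return text.split(", ")


print(parse_java_collection_string("[XPF, SDD, KZT]"))  # prints ['XPF', 'SDD', 'KZT']
print(parse_java_collection_string("AD, AE, AF"))  # prints ['AD', 'AE', 'AF']
```

Inside Spark, the same idea could be expressed with column functions instead of Python, for example `F.split(F.regexp_replace('currencies', r'^\[|\]$', ''), ', ')` or `F.split('countries', ', ')`, either of which yields an array<string> column. This only works on strings that actually contain the elements. It cannot recover anything from the `[Ljava.lang.String;@...` value.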

  • Thanks! After some tests, I may prefer spark.range(1).withColumn('countries', F.lit(x)). This creates an array of strings, which is probably Spark's equivalent of the Java array.
    – ZygD
    Commented Jun 15, 2023 at 6:05
  • You're welcome, and good luck! It would be great if you could post your final solution, because I have some doubts that F.lit(x) is going to work.
    – mazaneicha
    Commented Jun 15, 2023 at 15:39
  • My final result has 2 lines: x = spark.sparkContext._jvm.java.util.Locale.getISOCountries() and df = spark.range(1).withColumn('countries', F.lit(x)). It probably works because x is of type <class 'py4j.java_collections.JavaArray'>. The final schema (df.dtypes) is [('id', 'bigint'), ('countries', 'array<string>')].
    – ZygD
    Commented Jun 16, 2023 at 3:42
