ZhangZhihui's Blog  

Spark installed on WSL.

Java installed on WSL.

Jupyter ran on WSL:

jupyter-lab --no-browser

 

Code ran in Windows browser:

spark = (SparkSession.builder
                     .master('spark://ZZHPC.localdomain:7077')
                     .appName('Read Data from PostgreSQL')
                     .getOrCreate())

25/11/22 20:06:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

TypeError: 'JavaPackage' object is not callable

 

Remaining likely root causes:

1️⃣ Spark master unreachable (most common)

You're using:

.master("spark://ZZHPC.localdomain:7077")

WSL hostname resolution is often wrong.

Run in WSL:

ping ZZHPC.localdomain

If it resolves to 127.0.1.1 → broken.

Then try:

ping ZZHPC 
hostname -I

If the Spark master is actually using 10.255.255.254, you must connect with:

.master("spark://10.255.255.254:7077")

Hostname mismatch → Java gateway fails → JavaPackage error.
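
As a quick check from the notebook side, here is a minimal Python sketch (using the hostname from the master URL above) to see what it actually resolves to:

import socket

# Resolve the hostname used in the master URL; on WSL/Ubuntu this often maps
# to the loopback alias 127.0.1.1 via /etc/hosts, which breaks the master URL.
print(socket.gethostbyname("ZZHPC.localdomain"))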


2️⃣ No worker registered

Open Spark master UI:

http://ZZHPC.localdomain:8080

(or use the IP version)

Check:

  • Workers = 0 → cluster can't run → JavaPackage errors inside SparkContext

Start worker:

$SPARK_HOME/sbin/start-worker.sh spark://ZZHPC.localdomain:7077

Or using IP:

$SPARK_HOME/sbin/start-worker.sh spark://10.255.255.254:7077
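
If you prefer checking from Python instead of the browser, the standalone master UI usually also serves a JSON status page; the /json path and port 8080 here are assumptions based on the default setup:

import json
import urllib.request

# Query the standalone master's JSON status page and report how many workers registered.
status = json.load(urllib.request.urlopen("http://ZZHPC.localdomain:8080/json/", timeout=5))
print("status:", status.get("status"), "| workers:", len(status.get("workers", [])))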

3️⃣ Spark version mismatch (common in WSL setups)

Run:

pyspark --version

and

$SPARK_HOME/bin/spark-shell --version

If versions differ → classloader breaks → JavaPackage errors.


4️⃣ Python kernel in Jupyter is NOT your WSL Python

This happens if you selected a Windows Python kernel accidentally.

Inside a Jupyter cell, run:

import platform, sys
platform.system(), sys.executable

You should see:

('Linux', '/usr/bin/python3')

If you see:

('Windows', 'C:\\...python.exe')

→ you are using a Windows kernel (wrong!) even though Jupyter was started in WSL.

This will DEFINITELY produce the 'JavaPackage' error.

Fix:
Select the correct kernel:

  • In Jupyter: Kernel → Change Kernel → Python (WSL)

  • Or ensure you launched Jupyter from the virtual environment in WSL (a sketch of registering that venv as a kernel follows below).
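
A minimal sketch of registering the WSL venv as a named Jupyter kernel, assuming ipykernel is installed in that venv; the kernel name and display name are placeholders. Run it from the activated venv (or a cell already using it):

import subprocess
import sys

# Register the current interpreter as a Jupyter kernel so it can be selected
# explicitly in the notebook; --name and --display-name are hypothetical values.
subprocess.run([sys.executable, "-m", "ipykernel", "install", "--user",
                "--name", "zvenv", "--display-name", "Python (WSL zvenv)"],
               check=True)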


🎯 Let’s run 3 quick diagnostics to identify the real cause

Please run these inside your notebook:

Test 1: Check OS of the Python kernel

import platform, sys
platform.system(), sys.executable

Test 2: Check if PySpark can find the Java gateway

import pyspark
print(pyspark.__version__)

Test 3: Check Spark master connectivity

In WSL terminal:

curl http://ZZHPC.localdomain:8080

Then try:

curl http://10.255.255.254:8080

📌 Summary

Since the Jupyter server is in WSL,
you are very close to a working setup.
The remaining likely causes are:

  1. Wrong Python kernel (Windows kernel selected)

  2. Wrong master URL (hostname resolves badly)

  3. Worker not running

  4. Spark version mismatch

Send me the results of the 3 tests,
and I’ll pinpoint the exact cause in one message.

 

You said:
'ping ZZHPC.localdomain' and 'ping ZZHPC' both succeeded, and the output of 'hostname -I' is 172.26.76.164. Connecting with '.master("spark://10.255.255.254:7077")' also failed.
 
ChatGPT said:

Thanks — your results tell me something very important:

  • Your WSL machine IP is 172.26.76.164

  • Spark WARN message earlier showed Spark was using 10.255.255.254

  • Your hostname ZZHPC.localdomain resolves correctly (so DNS works)

  • But the cluster IP (10.255.255.254) does not match your WSL IP (172.26.x.x)

This means:

🚨 Your Spark master is binding to the wrong network interface.

This is the real root cause.

And when PySpark tries to talk to that wrong address, SparkContext fails to initialize → leading to:

TypeError: 'JavaPackage' object is not callable

because the JVM gateway never finishes loading.


Why Spark is using the wrong IP (10.255.x.x)

Your earlier Spark logs said:

Your hostname resolves to 127.0.1.1; using 10.255.255.254 instead (on interface lo)

This is Spark guessing a fallback interface.

But on WSL2,
the correct interface is almost always eth0 with the IP:

172.26.76.164

So your Spark master should be:

spark://172.26.76.164:7077

But Spark is currently binding to:

10.255.255.254

which is an internal interface that PySpark cannot reach.
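
To see both addresses from Python, here is a small sketch: the first line mirrors what a plain hostname lookup tends to return, the second finds the outbound (eth0) address without sending any traffic:

import socket

# What a plain hostname lookup sees; on Ubuntu/WSL this is often 127.0.1.1 from /etc/hosts.
print(socket.gethostbyname(socket.gethostname()))

# The address of the interface that would route outward (eth0 on WSL2), e.g. 172.26.76.164.
# connect() on a UDP socket sends no packets; it only selects a local address.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))
print(s.getsockname()[0])
s.close()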


✅ Solution: Force Spark to bind to the correct IP

Edit (or create) this file:

$SPARK_HOME/conf/spark-env.sh

Add these lines:

SPARK_LOCAL_IP=172.26.76.164 
SPARK_MASTER_HOST=172.26.76.164

Save it.

Then restart Spark:

$SPARK_HOME/sbin/stop-master.sh
$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://172.26.76.164:7077

Verify:

curl http://172.26.76.164:8080

You should now see the master UI showing Workers = 1.
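
You can also confirm the master's RPC port is reachable from the notebook's Python before building the session; a minimal sketch, assuming the master listens on 172.26.76.164:7077:

import socket

# Open and immediately close a TCP connection to the master's RPC port.
with socket.create_connection(("172.26.76.164", 7077), timeout=3):
    print("Spark master RPC port 7077 is reachable")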


Then connect PySpark like this:

spark = (SparkSession.builder
         .master("spark://172.26.76.164:7077")
         .appName("test")
         .getOrCreate())

Now SparkSession should start normally.


🎯 Summary

Your Spark master was binding to a wrong/invalid interface (10.255.x.x).
Fix by adding:

SPARK_LOCAL_IP=172.26.76.164
SPARK_MASTER_HOST=172.26.76.164

Then restart Spark and workers.

After this, the Java gateway will start correctly and the 'JavaPackage' error disappears.


If you want, show me the output of:

ifconfig

and I can confirm the exact interface Spark should bind to.

 
You said:
I did all the steps, but still got the TypeError: 'JavaPackage' object is not callable 

ifconfig output is as below:

frank@ZZHPC:/opt/spark-3.5.7-bin-hadoop3/conf$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.26.76.164  netmask 255.255.240.0  broadcast 172.26.79.255
        inet6 fe80::215:5dff:fe90:982d  prefixlen 64  scopeid 0x20<link>
        ether 00:15:5d:90:98:2d  txqueuelen 1000  (Ethernet)
        RX packets 65969  bytes 58202948 (58.2 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 63637  bytes 4665856 (4.6 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 8876  bytes 7533071 (7.5 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8876  bytes 7533071 (7.5 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

ChatGPT said:

Thanks — the ifconfig output shows that your correct network interface is eth0 with IP 172.26.76.164, and lo is just the loopback.

Since you already set:

SPARK_LOCAL_IP=172.26.76.164
SPARK_MASTER_HOST=172.26.76.164

and restarted the master/worker, PySpark should now be able to connect.

The fact that you still get:

TypeError: 'JavaPackage' object is not callable

tells me that the problem is no longer the network; something else in your setup is broken, most likely one of the following:


1️⃣ Windows vs WSL Python kernel

Even though Jupyter is started in WSL, you must confirm the kernel is WSL Python, not a Windows Python kernel.

In a notebook cell, run:

import platform, sys
platform.system(), sys.executable

Expected:

('Linux', '/usr/bin/python3') # or your WSL venv

If it says 'Windows', PySpark will fail with 'JavaPackage' object is not callable.

If it is Windows, change the kernel:

  • Kernel → Change Kernel → select WSL Python / your WSL virtual environment.


2️⃣ PySpark cannot find Java

Check Java inside WSL:

java -version
echo $JAVA_HOME

  • Should show Java 11/17 (OpenJDK)

  • JAVA_HOME must point to that JDK

If Java is missing or JAVA_HOME is wrong → JVM gateway fails → 'JavaPackage' errors.
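
The same check can be done from inside the notebook kernel, which is the process that actually launches the JVM gateway (a minimal sketch; nothing here is specific to this setup):

import os
import shutil
import subprocess

# JAVA_HOME and the java binary as seen by the kernel process, not the shell.
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))
print("java on PATH:", shutil.which("java"))   # should be a Linux path, not /mnt/c/...

# Print the Java version banner (java -version writes it to stderr).
result = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(result.stderr.strip())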


3️⃣ PySpark version mismatch

Run in WSL Python:

python3 -c "import pyspark; print(pyspark.__version__)"

 

Compare with:

$SPARK_HOME/bin/spark-submit --version

They must match exactly. If they don’t, the JVM fails to load classes → same error.
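
A small sketch that compares the two in one notebook cell, assuming SPARK_HOME is set in the kernel's environment (spark-submit prints its version banner to stderr on most builds):

import os
import re
import subprocess

import pyspark

# Version of the pip-installed PySpark package.
pip_version = pyspark.__version__

# Version of the Spark distribution under SPARK_HOME.
spark_submit = os.path.join(os.environ["SPARK_HOME"], "bin", "spark-submit")
result = subprocess.run([spark_submit, "--version"], capture_output=True, text=True)
match = re.search(r"version\s+(\d+\.\d+\.\d+)", result.stderr + result.stdout)
dist_version = match.group(1) if match else "unknown"

print(pip_version, dist_version, "OK" if pip_version == dist_version else "MISMATCH")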


4️⃣ Minimal test to isolate

Try a minimal PySpark session without specifying master (local mode):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("test") \
    .master("local[*]") \
    .getOrCreate()

print(spark.version)

  • If this works → your PySpark installation and Java are fine

  • If it fails → issue is PySpark/Java installation


🔑 Most common cause at this stage

Based on everything:

  1. You are in WSL, network is fixed → master should be reachable

  2. Java exists in WSL → verify $JAVA_HOME

  3. Most likely culprit: PySpark in your notebook is using a Windows Python kernel, not WSL Python.

Fix that first — this alone almost always resolves 'JavaPackage' object is not callable.


If you want, I can give you exact steps to ensure Jupyter in WSL uses the correct Python kernel so PySpark connects to the WSL Spark master properly.

Do you want me to do that?

 
You said:
Check result of Step 1: 
('Linux', '/home/frank/zvenv/bin/python3') 

Check result of Step 2: 
openjdk version "17.0.16" 2025-07-15
OpenJDK Runtime Environment (build 17.0.16+8-Ubuntu-0ubuntu124.04.1)
OpenJDK 64-Bit Server VM (build 17.0.16+8-Ubuntu-0ubuntu124.04.1, mixed mode, sharing)
/usr/lib/jvm/java-17-openjdk-amd64

Check result of Step 3: 
(zvenv) frank@ZZHPC:~$ python3 -c "import pyspark; print(pyspark.__version__)"
4.0.1

So the cause is probably a version mismatch. I'll install pyspark 3.5.7.

 

ChatGPT said:

Exactly — that explains everything.

Your setup:

Component    Version
Spark        3.5.7
PySpark      4.0.1
Java         17.0.16

PySpark 4.0.1 is meant for Spark 4.x.
Using it with Spark 3.5.7 causes the JVM class mismatch → 'JavaPackage' object is not callable even before you connect to any data.


✅ Correct fix

Install the PySpark version that matches your Spark installation (3.5.7):

pip install --upgrade --force-reinstall pyspark==3.5.7

Then verify:

python3 -c "import pyspark; print(pyspark.__version__)"
# should output 3.5.7

After this, your SparkSession code should work:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("spark://172.26.76.164:7077") \
    .appName("Read Data") \
    .getOrCreate()

print(spark.version)

You should see 3.5.7 and no 'JavaPackage' error.
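
From there, the original goal (reading data from PostgreSQL) becomes a normal JDBC read. A minimal sketch; the driver jar path, database URL, table name, and credentials are all assumptions to adapt to your environment:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://172.26.76.164:7077")
         .appName("Read Data from PostgreSQL")
         # The PostgreSQL JDBC driver must be visible to Spark; the jar path is an assumption.
         .config("spark.jars", "/opt/jars/postgresql-42.7.4.jar")
         .getOrCreate())

df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/mydb")   # hypothetical database
      .option("dbtable", "public.some_table")                   # hypothetical table
      .option("user", "postgres")                               # hypothetical credentials
      .option("password", "postgres")
      .option("driver", "org.postgresql.Driver")
      .load())

df.show(5)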


💡 Tip: Always make sure the PySpark version matches the installed Spark version.
Even a minor mismatch can break the JVM initialization.

 

You said:
The error is gone. Thank you very much.
 
ChatGPT said:

Perfect! 🎉

Glad it’s finally working.

 

posted on 2025-11-22 20:54 by ZhangZhihuiAAA