# Assumes an existing SparkSession named `spark` (e.g. a PySpark shell or Jupyter notebook).
jdbc_url = "jdbc:postgresql://localhost:5432/postgres"
properties = {
    "user": "postgres",
    "password": "postgrespwd",
    "driver": "org.postgresql.Driver"
}

table_name = "player_seasons"

df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)

df.show()

When I ran the above code, I got the following error:

Py4JJavaError: An error occurred while calling o28.jdbc. : java.lang.ClassNotFoundException: org.postgresql.Driver

 

You need the PostgreSQL JDBC JAR.


Step 1: Download the JDBC driver

Go to the PostgreSQL site, or download it directly:

wget https://jdbc.postgresql.org/download/postgresql-42.7.2.jar

You can also use any recent 42.x.x version.
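
If you want to confirm the downloaded file really contains the driver class (this assumes a JDK's jar tool is on your PATH; unzip -l works too):

jar tf postgresql-42.7.2.jar | grep Driver.class

You should see org/postgresql/Driver.class in the listing.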


Step 2: Make the JAR available to Spark

There are three common ways:

Option 1: Copy to Spark jars/ directory

cp postgresql-42.7.2.jar $SPARK_HOME/jars/

Spark automatically loads all JARs in that folder.
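
To double-check that the JAR landed where Spark will look (assuming $SPARK_HOME is set in your shell and points at the installation your notebook uses):

ls $SPARK_HOME/jars/ | grep postgresql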


Option 2: Pass the JAR when creating SparkSession

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("PostgreSQL Test")
    .config("spark.jars", "/home/frank/postgresql-42.7.2.jar")
    .getOrCreate()
)

Now your spark.read.jdbc(...) will find the driver.
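
If you run your code with spark-submit instead of building the session in a notebook, the equivalent is the --jars flag (your_script.py below is a placeholder for your own application file):

spark-submit --jars /home/frank/postgresql-42.7.2.jar your_script.py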


Option 3: Use Maven coordinates (auto-download)

spark = (SparkSession.builder
    .appName("PostgreSQL Test")
    .config("spark.jars.packages", "org.postgresql:postgresql:42.7.2")
    .getOrCreate()
)

Spark will automatically download the JAR from Maven.
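
The downloaded artifact is cached locally (by default under ~/.ivy2/), so later startups reuse it. The spark-submit equivalent is --packages (again with a placeholder script name):

spark-submit --packages org.postgresql:postgresql:42.7.2 your_script.py

Note that spark.jars.packages only takes effect when the session (and its JVM) is first created; setting it on an already-running session does nothing.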


Step 3: Retry reading the table

jdbc_url = "jdbc:postgresql://localhost:5432/postgres"
properties = {
    "user": "postgres",
    "password": "postgrespwd",
    "driver": "org.postgresql.Driver"
}

table_name = "player_seasons"

df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)

df.show()

Now it should work. ✅


⚡ Note

    • Make sure your PostgreSQL instance is reachable from WSL.
      If PostgreSQL is running on Windows, use the Windows host's IP address as seen from WSL, e.g., jdbc:postgresql://<Windows-IP>:5432/postgres
      Inside WSL, localhost points to the WSL environment itself, not to Windows. One way to find the Windows host's IP is sketched right below.
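
On WSL2 with the default NAT networking, the Windows host is usually the default gateway seen from inside WSL (an assumption about your setup; verify it matches), so you can look it up with:

ip route show default | awk '{print $3}'

You can then check that PostgreSQL actually accepts connections on that address, e.g. nc -zv <Windows-IP> 5432. PostgreSQL on Windows must also listen on that interface (listen_addresses in postgresql.conf) and allow the WSL subnet in pg_hba.conf.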

 

jdbc_url = "jdbc:postgresql://172.26.64.1:5432/postgres"
properties = {
    "user": "postgres",
    "password": "postgrespwd",
    "driver": "org.postgresql.Driver"
}

table_name = "player_seasons"

df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)

df.show()

 

I chose Option 1. Do I need to restart Spark after copying the JAR file? I have changed 'localhost' to the Windows IP, but the code still gives the same error.
 
ChatGPT said:

Yes — you must restart Spark after copying the JAR into $SPARK_HOME/jars/.

Here’s why:

  • When you start Spark, it loads all JARs in $SPARK_HOME/jars/ into the JVM classpath.

  • If you copy a new JAR after Spark is already running, the running JVM cannot see it.

  • This is exactly why your code still shows:

java.lang.ClassNotFoundException: org.postgresql.Driver

✅ Correct steps after copying the JAR

  1. Stop any running Spark master or workers.

  2. Start the Spark master and worker again.
     (Standalone-mode commands for steps 1–2 are sketched after this list.)

  3. Restart your Jupyter kernel (or Python process), so a fresh JVM is launched with the new classpath.

  4. Then run your JDBC code again.
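
For a standalone Spark setup, steps 1 and 2 typically use the stock scripts under $SPARK_HOME/sbin (a sketch, assuming Spark 3.1+ with a single local worker and the default master port; older versions use the *-slave.sh script names):

$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/stop-master.sh
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://localhost:7077

If your session runs with master local[*], there are no daemons to restart; restarting the Jupyter kernel in step 3 is enough, since that launches a fresh JVM that rescans $SPARK_HOME/jars/.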

 
[Stage 0:>                                                          (0 + 1) / 1]
+-----------+---+------+------+-------+-------+----------+-----------+------------+---+---+---+---+------+--------+--------+-------+------+-------+------+
|player_name|age|height|weight|college|country|draft_year|draft_round|draft_number| gp|pts|reb|ast|netrtg|oreb_pct|dreb_pct|usg_pct|ts_pct|ast_pct|season|
+-----------+---+------+------+-------+-------+----------+-----------+------------+---+---+---+---+------+--------+--------+-------+------+-------+------+
+-----------+---+------+------+-------+-------+----------+-----------+------------+---+---+---+---+------+--------+--------+-------+------+-------+------+
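
No more ClassNotFoundException: the connection works and Spark prints the table's schema. The result set itself is empty, which just means the player_seasons table returned no rows, not that anything is wrong with the driver setup.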

 
