alex_bn_lee


[1114] Extract a DataFrame from a table on a webpage

Here is the screenshot of the webpage.

Here is the text copied from the page (select all, then copy):

 Legislation and compliance
 News and media
NSW Environment Protection Authority (EPA)
Your environment
Reporting, incidents and recovery programs
Licensing and Regulation
Working together
About us
Public registers
POEO Public Register
Contaminated land record of notices
About the record of notices
List of notified sites
Tips for searching
Disclaimer
Dangerous goods licences
Pesticide licences
Radiation licences
HomePublic registersContaminated land record of notices
Search results
Your search for:	Notice Type: Declaration of Significantly Contaminated Land	  Matched 266 notices relating to 256 sites.
    
Suburb	Address	Site Name	Notices related to this site
ALBURY	616-624 Young STREET	Xpress Service Station	6 former
ALBURY	441 Kiewa STREET	Former Gasworks and surrounding commercial land	2 current and 5 former
ALBURY	161 Fallon STREET	Former Thales Australia site, Albury	2 former
ALEXANDRIA	Off Huntley STREET	Alexandra Canal Sediments	2 current
ALEXANDRIA	10-24 Ralph STREET	Australia Post	2 former
ALEXANDRIA	Sydney Park ROAD	Sydney Park	2 current and 9 former
ARMIDALE	132 Niagara STREET	Former Mobil Depot	4 former
ASHFIELD	132 Liverpool Road STREET	7-Eleven Ashfield	4 former
BANKSMEADOW	16-20 Beauchamp ROAD	Orica Botany Groundwater Project	6 current and 28 former
BATHURST	71 Russell STREET	Former Gasworks	4 former
BEACON HILL	176 Warringah ROAD	Caltex Service Station	3 current and 2 former
BELROSE	56-58 Glen STREET	Glenrose Shopping Centre	2 current and 7 former
BLAKEHURST	390 Princes HIGHWAY	Woolworths Service Station Blakehurst	2 current
BOMADERRY	320 Princes HIGHWAY	Commercial Land	1 current and 4 former
BOOLAROO	13 Main STREET	Incitec Pivot	8 former
BOOLAROO	Lake ROAD	Pasminco Cockle Creek Smelter	3 current and 34 former
BOOROWA	63-69 Marsden STREET	Mobil Service Station	5 former
BOTANY	49-61 Stephen ROAD	Allnex	4 current and 2 former
BOWRAL	Merrigang STREET	Former Gasworks	3 current
BRANXTON	Part of 70 Maitland STREET	Former Service Station Branxton	2 current and 7 former
1 2 3 4 5 6 7 8 9 10 ...
Page 1 of 13
25 February 2025
 


For business and industry
Public registersDuty to notify pollution incidentsRecycling and reuseWasteLegislation and complianceEnvironment protection licencesGuide to licensingDangerous goods
For local government
Information and resources for local government
Contact us
131 555Onlineinfo@epa.nsw.gov.auEPA Office Locations
Accessibility
 
Disclaimer
 
Privacy
 
Copyright
Find us on 
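The copied text holds more than the table. It also has a results summary ("Matched 266 notices relating to 256 sites") and a pager ("Page 1 of 13"), which shows that only the first of several result pages was captured. The sketch below pulls those numbers out with regular expressions. It assumes the page keeps exactly the wording shown above, and the function name `parse_summary` is hypothetical.

```python
import re


def parse_summary(text):
    """Extract the result counts and pager position from copied page text.

    Hypothetical helper: assumes the wording seen on the results page above.
    Any value whose pattern is not found is returned as None.
    """
    notices = re.search(r"Matched (\d+) notices relating to (\d+) sites", text)
    pager = re.search(r"Page (\d+) of (\d+)", text)
    return {
        "notices": int(notices.group(1)) if notices else None,
        "sites": int(notices.group(2)) if notices else None,
        "page": int(pager.group(1)) if pager else None,
        "pages": int(pager.group(2)) if pager else None,
    }


sample = (
    "Your search for:\tNotice Type: Declaration of Significantly Contaminated Land\t"
    "  Matched 266 notices relating to 256 sites.\n"
    "...\n"
    "Page 1 of 13\n"
)
print(parse_summary(sample))
```

Page 1 lists 20 rows. At that page size, 13 pages would hold up to 260 rows, which fits the 256 sites reported.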

Here is the script that copies this text and turns the table into a DataFrame.

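import time
from io import StringIO

import pandas as pd
import pyautogui
import pyperclip

# The browser window showing the results page must be in the foreground.
pyautogui.click(50, 150)        # click a blank area of the page (screen coordinates; adjust for your layout)
time.sleep(0.5)

pyautogui.hotkey('ctrl', 'a')   # select all text on the page
time.sleep(0.5)

pyautogui.hotkey('ctrl', 'c')   # copy the selection to the clipboard
time.sleep(0.5)

text = pyperclip.paste()        # read the clipboard contents into a string

# Locate the table: it starts at the header row and ends before the pager ("Page 1 of 13").
# Searching for "Page" from start_ind avoids matching an earlier "Page" elsewhere on the page.
start_ind = text.find("Suburb\tAddress")
end_ind = text.find("Page", start_ind)

# Extract the table text and drop the last two lines:
# the "1 2 3 ... 10 ..." page links and the empty string after the final newline
text = text[start_ind:end_ind]
text_table = "\n".join(text.split("\n")[:-2])

# Parse the tab-separated text into a pandas DataFrame
df = pd.read_csv(StringIO(text_table), sep='\t')

# Display the DataFrame (works in Jupyter; use print(df) in a plain script)
df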

Here is the output.
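The slicing in the script assumes the table is followed by exactly one line of page links. It also cannot be tested without driving the browser. The sketch below separates the parsing from the pyautogui steps so it can run on any pasted string. It works in three steps:

- It cuts the text at the header row and at a full `Page X of Y` pager line.
- It keeps only the tab-separated lines.
- It splits the "Notices related to this site" strings, such as "2 current and 5 former", into integer columns.

The function names `extract_table` and `split_notice_counts` and the new column names `current` and `former` are hypothetical, not part of the original script. The sketch assumes the page layout shown above.

```python
import re
from io import StringIO

import pandas as pd


def extract_table(text, header_start="Suburb\tAddress"):
    """Hypothetical helper: cut the tab-separated results table out of copied text."""
    start = text.find(header_start)
    if start == -1:
        raise ValueError("table header not found in copied text")
    block = text[start:]
    # Stop at the pager line ("Page 1 of 13"). Matching a whole line avoids
    # cutting at a street or site name that merely contains "Page".
    pager = re.search(r"^Page \d+ of \d+", block, flags=re.MULTILINE)
    if pager:
        block = block[:pager.start()]
    # Keep only lines containing a tab. This drops the "1 2 3 ..." page links
    # and blank lines without relying on a fixed line count.
    lines = [ln.rstrip("\r") for ln in block.split("\n") if "\t" in ln]
    return pd.read_csv(StringIO("\n".join(lines)), sep="\t")


def split_notice_counts(df, col="Notices related to this site"):
    """Hypothetical helper: add integer 'current' and 'former' columns parsed
    from strings like '2 current and 5 former' (a missing part counts as 0)."""
    def count(s, kind):
        m = re.search(rf"(\d+) {kind}", str(s))
        return int(m.group(1)) if m else 0

    out = df.copy()
    out["current"] = out[col].map(lambda s: count(s, "current"))
    out["former"] = out[col].map(lambda s: count(s, "former"))
    return out


sample = (
    "Search results\n"
    "Suburb\tAddress\tSite Name\tNotices related to this site\n"
    "ALBURY\t616-624 Young STREET\tXpress Service Station\t6 former\n"
    "ALBURY\t441 Kiewa STREET\tFormer Gasworks and surrounding commercial land\t2 current and 5 former\n"
    "ALEXANDRIA\tOff Huntley STREET\tAlexandra Canal Sediments\t2 current\n"
    "1 2 3 4 5 6 7 8 9 10 ...\n"
    "Page 1 of 13\n"
)
df = split_notice_counts(extract_table(sample))
print(df[["Suburb", "current", "former"]])
```

On the live page, `extract_table(pyperclip.paste())` could replace the slicing lines in the script above.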

posted on 2025-02-25 12:28  McDelfino