Tutorial: Installing and Running nationalparksdata
This guide explains how to install and use the nationalparksdata Python package from the GitHub repository.
What This Package Does
nationalparksdata gathers National Park Service park data and annual visitation data, then combines it into a clean dataset for analysis.
The package can:
- Scrape fresh National Park data from the NPS API
- Build a final combined dataset
- Save output files as .csv
- Return a pandas DataFrame for analysis
Prerequisites
Before installing, make sure you have:
- Python 3.10 or newer installed
- pip installed
- Internet connection
- A National Park Service API key
Step 1: Install the Package
From inside the folder where you want to save the data, run:
pip install nationalparksdata
This installs the package locally.
Step 2: Get an API Key
This project uses the National Park Service API.
Get a free API key from:
https://www.nps.gov/subjects/developer/get-started.htm
Step 3: Create a .env File
Inside the project folder, create a file named:
.env
Add this line:
API_KEY=your_api_key_here
Example:
API_KEY=abc123xyz456
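Before running the package, it can help to confirm that the .env file is readable and that the key is set. The sketch below uses only the standard library. The helper name read_env_value is hypothetical and is not part of nationalparksdata, and the package may load the file differently.

```python
from pathlib import Path


def read_env_value(env_path, name):
    """Return the value for `name` in a simple KEY=value file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


if __name__ == "__main__":
    # Assumes the script is run from the project folder that holds .env.
    key = read_env_value(".env", "API_KEY")
    print("API_KEY found" if key else "API_KEY missing or empty")
```

If this prints that the key is missing, check that the file is named exactly .env and sits in the folder you run Python from.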
Step 4: Run the Package
Option A: Full Refresh (Recommended)
Runs the scraper, then builds the final dataset.
Create a Python file:
from nationalparksdata import refresh_dataset
df = refresh_dataset()
print(df.head())
Run:
python your_file.py
Option B: Build Dataset From Existing Files Only
If raw files already exist:
from nationalparksdata import build_dataset
df = build_dataset()
print(df.head())
Option C: Run Scraper Only
from nationalparksdata import run_scraper
run_scraper()
Option D: Find Parks With Certain Activities
Pass one or more activities as separate, comma-separated string arguments. The function finds the parks that offer all of the listed activities.
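To make the "all listed activities" rule concrete, here is a minimal sketch of that kind of filter in plain Python. It is an illustration, not the package's implementation: the function name parks_offering_all, the sample data, and the case-insensitive matching are all assumptions.

```python
def parks_offering_all(park_activities, *wanted):
    """Return park names whose activities include every wanted activity."""
    # Assumption: matching ignores upper/lower case.
    wanted_set = {activity.lower() for activity in wanted}
    return sorted(
        name
        for name, activities in park_activities.items()
        if wanted_set <= {activity.lower() for activity in activities}
    )


# Made-up sample data, for illustration only.
sample = {
    "Park A": ["Hiking", "Horseback Riding", "Camping"],
    "Park B": ["Hiking", "Fishing"],
    "Park C": ["Horseback Riding"],
}

print(parks_offering_all(sample, "horseback riding", "hiking"))
# prints ['Park A']
```

Only Park A is kept, because it is the only park that offers both activities. With the package installed, the equivalent call is: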
from nationalparksdata import parks_with_activity
parks_with_activity('horseback riding', 'hiking')
Output Files
The package saves data into the data/ folder.
Typical outputs:
data/base_data.csv
data/csv_dictionary.json
data/final.csv
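After a run, you may want to confirm that these files were written. The sketch below checks for the three file names listed above. The helper missing_outputs is hypothetical and not part of the package, and it assumes the data/ folder is relative to the folder you run it from.

```python
from pathlib import Path

# File names taken from the list of typical outputs.
EXPECTED_OUTPUTS = ("base_data.csv", "csv_dictionary.json", "final.csv")


def missing_outputs(data_dir="data"):
    """Return the expected output files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_OUTPUTS if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing output files:", ", ".join(missing))
    else:
        print("All typical output files are present.")
```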
Using in Jupyter Notebook
You may also use the package in Jupyter:
from nationalparksdata import refresh_dataset
df = refresh_dataset()
df.head()
Troubleshooting
Error: 403 Forbidden
Cause: API key missing or invalid.
Fix:
- Check that the .env file exists
- Verify that the API_KEY= line is correct
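If the file looks right but the error persists, you can test the key directly against the API, outside the package. This is a rough sketch that assumes the public endpoint https://developer.nps.gov/api/v1/parks and the api_key query parameter. Check the NPS API documentation if either has changed. The function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint; not taken from the package itself.
PARKS_URL = "https://developer.nps.gov/api/v1/parks"


def build_check_url(api_key):
    """Build a minimal request URL that asks for a single park record."""
    query = urllib.parse.urlencode({"limit": 1, "api_key": api_key})
    return f"{PARKS_URL}?{query}"


def key_status(api_key, timeout=10):
    """Return the HTTP status code the API gives for this key.

    Network problems (no connection, DNS failure) raise URLError
    and are deliberately not caught here.
    """
    try:
        with urllib.request.urlopen(build_check_url(api_key), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 403 here corresponds to the missing or invalid key case.
        return err.code
```

Calling key_status("your_api_key_here") with your real key should return a success code if the key is accepted. A 403 points back to the key itself rather than to the package.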
Error: Module Not Found
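A frequent cause is that the package was installed for a different Python interpreter than the one running your script. The sketch below, which uses only the standard library, prints the active interpreter and whether it can locate the package. The helper name is_importable is hypothetical.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("nationalparksdata importable:", is_importable("nationalparksdata"))
```

If the result is False, install the package using the interpreter shown in the output.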
Reinstall the package. From the root of the cloned repository, run:
pip install .
Selenium Errors
If browser automation fails, update Chrome and ChromeDriver.
Uninstalling
pip uninstall nationalparksdata
Notes
This package was created for academic and data analysis purposes.