README

An automated, Python-based data pipeline designed to ingest, clean, and analyze heavily formatted multi-table spreadsheets.

This project transforms raw, layout-shifted library administrative logs into organized, query-ready datasets and compiles them into a clean web-based dashboard.

The Challenge

Municipal and public services datasets often come in formatted Excel Workbooks rather than raw database dumps. These reports are designed for human eyes, not computers.

Key Obstacles Solved:

The "All-in-One" Sheet: Multiple distinct tables (Registration, Circulation, Event Attendance, and Computer Usage) were stacked vertically within a single spreadsheet tab. Layout Shifts: Shifting headers, empty rows, definitions, and floating "Totals" rows made simple programmatic loading impossible. Missing Identifiers (Implicit Hierarchy): County labels were only written once at the top of a group of towns, leaving the remaining rows blank (NaN).

The Goal:

Build a resilient script to slice these stacked tables, repair implicit labels, normalize numbers, and package them into an interactive dashboard.

Pipeline Design & Features

The pipeline is written in Python using a clean, readable sequence of extraction, transformation, visualization, and output compilation.

1. Extract & Align (Label Recovery)

Instead of hardcoding index ranges—which break as soon as next year's data adds a town—the script programmatically identifies boundary flags. It then uses forward-filling (ffill()) to cascade implicit county names down to each municipal row.

2. Segment & Normalize (Transform)

Using the dynamically assigned structural markers, the pipeline splits the master document into four clean, normalized tables:

Registration: Town-by-town service populations, library card numbers, and active users.

Circulation: A multi-year tracking index (2019–2023) separating physical vs. digital checkouts.

Event Attendance: Tracking program hosting counts and community attendance metrics.

Computer Use: Evaluating digital divide metrics like overall sessions and workstation minutes.

During this stage, non-numeric values are programmatically isolated, missing entries are filled with zeros, and column names are standardized.

4. Visual Discovery (Analytics & Dashboard)

Using Plotly Express, the engine automatically constructs four interactive visual models:

Ranked Bar Chart: Sorting town branches by total service population.
County Comparison: Grouping and summarizing regional populations.
Active Card Share: A distribution map of library user market share.
Population vs. Active Users: An interactive scatter plot comparing community size to engagement.

The pipeline wraps these figures in a standalone, responsive HTML grid (Library_Data_Dashboard.html), generating a polished executive portal without requiring a web server.

Analytics Highlights (What the Data Revealed)

The Pandemic Pivot

Physical checkouts dropped by -24.4% between 2019 and 2023, while Digital Circulation surged by +87.3% as patrons leaned into ebooks and remote access.

Event Recovery

Community events proved highly resilient, achieving a 114.9% recovery index score relative to pre-pandemic baselines. Workstation Gap: Workstation sessions remained at 51.2% of 2019 levels, suggesting a lingering shift in public computer usage patterns.

Repository Structure

├── data/

│ ├── Library-County-Details-2023.xlsx # Raw multi-table Excel source

│ └── Clean_Library_Registration_2023.csv # Pipeline registration export

├── outputs/

│ └── Library_Data_Dashboard.html # Generated standalone dashboard

├── library_registration_data_part5.ipynb # Main Jupyter Notebook pipeline

├── requirements.txt # Libraries (pandas, openpyxl, plotly)

└── README.md # Project documentation

Running the Project

1. Installation

Clone the repository and install the dependencies: git clone https://github.com/yourusername/library-etl-dashboard.git cd library-etl-dashboard pip install -r requirements.txt

2. Execution

Open the Jupyter Notebook in your preferred environment (VS Code, JupyterLab, or Google Colab) and run all cells.

The notebook will instantly: Load and process Library-County-Details-2023.xlsx. Generate the cleaned .csv database file. Build the interactive visualizations. Export the responsive web dashboard file directly to your workspace.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
Clean_Library_Registration_2023.csv		Clean_Library_Registration_2023.csv
Library-County-Details-2023.xlsx		Library-County-Details-2023.xlsx
Library_Data_Dashboard.html		Library_Data_Dashboard.html
README.md		README.md
Regional_Library_Analytics_Report_2023.xlsx		Regional_Library_Analytics_Report_2023.xlsx
library_registration_data.ipynb		library_registration_data.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

README

An automated, Python-based data pipeline designed to ingest, clean, and analyze heavily formatted multi-table spreadsheets.

The Challenge

Key Obstacles Solved:

The Goal:

Pipeline Design & Features

1. Extract & Align (Label Recovery)

2. Segment & Normalize (Transform)

4. Visual Discovery (Analytics & Dashboard)

Analytics Highlights (What the Data Revealed)

The Pandemic Pivot

Event Recovery

Repository Structure

Running the Project

1. Installation

2. Execution

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

README

An automated, Python-based data pipeline designed to ingest, clean, and analyze heavily formatted multi-table spreadsheets.

The Challenge

Key Obstacles Solved:

The Goal:

Pipeline Design & Features

1. Extract & Align (Label Recovery)

2. Segment & Normalize (Transform)

4. Visual Discovery (Analytics & Dashboard)

Analytics Highlights (What the Data Revealed)

The Pandemic Pivot

Event Recovery

Repository Structure

Running the Project

1. Installation

2. Execution

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages