add rss feed scraping #7

Open
Oliviayuuuu317 wants to merge 13 commits into main from olivia/rss_feed

Oliviayuuuu317 commented Apr 22, 2026

Overview

Added rss_feed.go to scrape recent concert information from the Chimes RSS feed (https://apps.chimes.cornell.edu/music/rss.xml).

Changes Made

  • created models for concert days, time slots, and songs
  • added rss feed scraping and parsing into structured models

Test Coverage

  • ran the main method and printed the scraped results

Summary by CodeRabbit

  • New Features
    • Added an RSS scraper to import concert updates, parse titles/dates, and convert descriptions into Morning/Afternoon/Evening time slots with song/artist/source details.
    • Links imported songs to the correct session, avoiding duplicate associations.
    • Added session-based kudos counting to return per-type totals.
  • Chores
    • Improved database setup by loading environment variables from a .env file and expanding migrations for sessions, songs, session-song links, and kudos.

coderabbitai Bot commented Apr 22, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 84a7a809-4c07-4d44-8dca-ce7a0fa832a6

📥 Commits

Reviewing files that changed from the base of the PR and between 808332f and 124c902.

📒 Files selected for processing (11)
  • Dockerfile
  • README.md
  • cmd/rss_scraper/main.go
  • controllers/notifications.go
  • docker-compose.yml
  • main.go
  • models/rss_feed.go
  • models/sessions.go
  • models/songs.go
  • models/users.go
  • services/notificationservice.go
📝 Walkthrough

A new Go RSS scraper program fetches Cornell's music feed, parses XML items into time-slot grouped structures (Morning, Afternoon, Evening), and persists sessions, songs, and session-song relationships to the database. The system extracts song metadata (title, artist, source) from HTML-formatted descriptions and includes kudo/rating infrastructure.

Changes

RSS Feed Scraper with Sessions and Kudos

| Layer / File(s) | Summary |
| --- | --- |
| Data models and structs: models/rss_feed.go, models/sessions.go, models/songs.go, models/kudos.go | XML-mapped RSS structures (RSS, Channel, Item), domain models for parsed content (ParsedSong, TimeSlot), session and time-of-day enums, song and session-song persistence, and kudo/rating types (KudoType, Kudo, KudoCount). |
| Parsing utilities and functions: models/rss_feed.go | HTML tag stripping via regex, song-line parsing (extracts optional artist via " / " split and source via (from "…") regex), and description-to-slot parsing (splits on breaks/newlines, recognizes Morning/Afternoon/Evening headers, aggregates songs by slot). |
| Database operations and setup: models/sessions.go, models/songs.go, models/kudos.go, models/setup.go | Get-or-create patterns for sessions (by day and time-of-day) and songs (by name and artist), daily session creation helper, kudo-count aggregation query with join and grouping, and database setup updates (.env loading and auto-migration of new models). |
| Main scraper program: cmd/rss_scraper/main.go | Orchestrates RSS fetching via HTTP GET, XML unmarshaling, per-item date parsing, description normalization and parsing into time slots, session and song get-or-create logic, and SessionSong insertion with conflict-do-nothing semantics; includes per-record error logging and parseConcertDate helper. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description covers Overview and Changes Made but lacks Test Coverage details and uses non-standard testing claims without clear repro steps or validation methods. | Clarify test coverage by specifying repro steps, expected output, and how to verify the scraping works correctly with actual or sample RSS data. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'add rss feed scraping' directly and clearly summarizes the main change: adding RSS feed scraping functionality to parse concert information from Chimes' RSS feed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
models/rss_feed.go (2)

104-138: No test coverage.

The PR description notes testing was done by running main and eyeballing output. Since parseSong and parseDescription are pure functions operating on strings, they're easy to unit-test with a small fixture of the feed's description XML. This will lock down the (from "..."), " / ", and <br> handling against future RSS format drift.

Want me to draft rss_feed_test.go with table-driven tests for parseSong and parseDescription?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 104 - 138, The PR lacks unit tests for the
pure string-parsing functions; add a new table-driven test file
(rss_feed_test.go) that exercises parseSong and parseDescription with
representative description HTML snippets from the feed (covering "(from
\"...\")" source parsing, " / " artist separators, <br> line breaks, and edge
cases like missing fields), asserting expected Song and TimeSlot outputs; name
tests clearly, include fixtures for multiple item descriptions, and use subtests
to make failures actionable so future RSS format drift is caught by CI.

72-102: Shadowed current and fragile <br> splitting.

Two readability/robustness concerns:

  1. The outer var current *TimeSlot (line 74) is immediately shadowed by the loop variable current (line 77) used as a string. It happens to work because the pointer is reassigned on line 96 before any read, but reusing the name makes this code confusing and brittle to future edits. Rename one of them (e.g., chunk for the loop variable).
  2. strings.Split(desc, "<br>") won't match <br/>, <br />, or <BR>, which RSS descriptions frequently contain. A single regex split handles all variants.
🔧 Suggested fix
-var tagPattern = regexp.MustCompile(`<[^>]+>`)
+var tagPattern = regexp.MustCompile(`<[^>]+>`)
+var brPattern  = regexp.MustCompile(`(?i)<br\s*/?>`)
@@
-	var lines []string
-	for _, current := range strings.Split(desc, "<br>") {
-		for _, line := range strings.Split(current, "\n") {
-			lines = append(lines, line)
-		}
-	}
+	var lines []string
+	for _, chunk := range brPattern.Split(desc, -1) {
+		lines = append(lines, strings.Split(chunk, "\n")...)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 72 - 102, In parseDescription, avoid
shadowing the outer pointer variable current and make <br> splitting robust:
replace the loop "for _, current := range strings.Split(desc, \"<br>\")" with a
loop over chunks (e.g., "for _, chunk := range ...") and split desc using a
case-insensitive regex that matches "<br>", "<br/>", "<br />" variants (e.g.,
regexp.MustCompile(`(?i)<br\s*/?>`).Split(desc, -1)), then iterate lines by
splitting each chunk on "\n"; ensure you update references to use the new chunk
variable name and keep assigning the pointer current (TimeSlot pointer) only
when creating a header slot.
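
For reference, the tokenization change suggested above can be tried in isolation. The following is only a sketch: splitLines is a hypothetical helper name, and the trimming and blank-line skipping are assumptions added to keep the output readable, not something the PR is known to do at this step.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// brPattern matches <br>, <br/>, <br /> and their upper-case forms.
var brPattern = regexp.MustCompile(`(?i)<br\s*/?>`)

// splitLines is a hypothetical helper: it treats every <br> variant and
// every "\n" as a line break and returns the trimmed, non-empty lines.
func splitLines(desc string) []string {
	var lines []string
	for _, chunk := range brPattern.Split(desc, -1) {
		for _, line := range strings.Split(chunk, "\n") {
			if line = strings.TrimSpace(line); line != "" {
				lines = append(lines, line)
			}
		}
	}
	return lines
}

func main() {
	// Made-up description mixing all break variants.
	desc := "Morning<br/>Song A / Artist A<BR>Song B<br />\nEvening<br>Song C"
	for _, line := range splitLines(desc) {
		fmt.Println(line)
	}
}
```

Naming the loop variable chunk also avoids the shadowing of current noted in point 1.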
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@models/rss_feed.go`:
- Around line 52-69: parseSong currently only applies originPattern to the
title, so lines like `Title / Artist (from "X")` leave Song.Source empty and the
origin stuck in Song.Artist; update parseSong to check originPattern against
both title and artist: first attempt to extract source from title (using
originPattern.FindStringSubmatch and ReplaceAllString), and if not found, try
the same extraction on artist, removing the matched text from artist when found;
ensure you TrimSpace both title and artist before assigning Song.Title and
Song.Artist and prefer the source found in title over one found in artist.
- Around line 104-118: The current main function uses http.Get (which uses
http.DefaultClient with no timeout), panics on errors, and feeds non-2xx
responses straight to xml.Unmarshal for the RSS type; change this to use an
http.Client with a reasonable Timeout (e.g., 10s), replace http.Get with
client.Do(req), check resp.StatusCode and return a descriptive error for any
non-2xx status before reading the body, read and unmarshal into the RSS struct
but propagate errors (return error) instead of calling panic so callers can
handle retries/logging; update the function signature (e.g., fetchRSS or
NewRSSFromURL) to return (*RSS, error) and reference the main function, http.Get
usage, resp.StatusCode check, xml.Unmarshal, and RSS type when making these
changes.
- Around line 1-10: The file currently declares "package main" which conflicts
with other files in the models directory; change the package declaration to
"package models" and refactor by moving only library code (type definitions and
helper functions) into a new models file (e.g., rss.go) under package models,
and create a separate command entrypoint (cmd/rss_scraper/main.go) with "package
main" that imports models and implements main() to fetch the feed and call the
library functions; ensure exported types/functions keep capitalized names used
by the new main and update any imports accordingly (look for the package
declaration and functions that parse/fetch RSS to split between models and the
cmd main).
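
A rough sketch of the fetch function described in the second comment above (client timeout, status check before reading the body, errors returned instead of panics). The names fetchRSS, checkStatus, and parseRSS and the fields on RSS, Channel, and Item are assumptions for illustration; the PR's actual struct definitions are not shown in this review. The example talks to a local stub server rather than the real feed.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// Assumed minimal shapes for the XML-mapped types.
type Item struct {
	Title       string `xml:"title"`
	Description string `xml:"description"`
}

type Channel struct {
	Items []Item `xml:"item"`
}

type RSS struct {
	Channel Channel `xml:"channel"`
}

// checkStatus rejects any non-2xx response.
func checkStatus(status int) error {
	if status < 200 || status > 299 {
		return fmt.Errorf("unexpected HTTP status %d", status)
	}
	return nil
}

// parseRSS unmarshals a feed body and returns the error to the caller.
func parseRSS(body []byte) (*RSS, error) {
	var rss RSS
	if err := xml.Unmarshal(body, &rss); err != nil {
		return nil, fmt.Errorf("parsing RSS: %w", err)
	}
	return &rss, nil
}

// fetchRSS performs the GET with the caller's client (which carries the timeout).
func fetchRSS(client *http.Client, url string) (*RSS, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, fmt.Errorf("building request: %w", err)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("fetching feed: %w", err)
	}
	defer resp.Body.Close()
	if err := checkStatus(resp.StatusCode); err != nil {
		return nil, err
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("reading body: %w", err)
	}
	return parseRSS(body)
}

func main() {
	// Local stub server so the sketch runs without network access.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/rss.xml" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `<rss><channel><item><title>Sample</title></item></channel></rss>`)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 10 * time.Second}
	for _, path := range []string{"/rss.xml", "/missing"} {
		rss, err := fetchRSS(client, srv.URL+path)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("items:", len(rss.Channel.Items))
	}
}
```

In a split layout like the one the third comment proposes, the fetch and parse helpers would presumably be exported from package models and called from cmd/rss_scraper/main.go.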

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb3482ee-4b12-43b5-abae-f65f4c39deb7

📥 Commits

Reviewing files that changed from the base of the PR and between e92d7a3 and fc7ef51.

📒 Files selected for processing (1)
  • models/rss_feed.go

Comment thread models/rss_feed.go Outdated
Comment thread models/rss_feed.go Outdated
Comment thread models/rss_feed.go Outdated

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
models/rss_feed.go (1)

52-102: ⚡ Quick win

Add table-driven tests for parser edge cases.

Given the format-sensitive parsing (parseSong/parseDescription), please add tests for delimiter variants (<br>, <br/>, <br />), source-in-artist cases, and blank/noise lines to prevent regressions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/rss_feed.go` around lines 52 - 102, Add table-driven tests to cover
edge cases for the parseSong and parseDescription functions. Create test cases
that verify different HTML line break variants (such as <br>, <br/>, and <br />
in the parseDescription function), scenarios where the source pattern appears in
the artist portion of the song string in parseSong, and handling of blank or
noise lines throughout the parsing flow. Use Go's testing patterns to define
test slices with input and expected output pairs, then iterate through them to
verify each parser function handles these edge cases correctly and prevent
future regressions.
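
One possible shape for such a table, shown as a standalone program so it runs as-is; in the repository it would more naturally live in rss_feed_test.go with testing.T and t.Run subtests. This is a sketch under assumptions: the parseDescription here is a simplified stand-in (it keeps raw song lines and matches headers exactly), and the runCases helper and the sample inputs are invented for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"strings"
)

// TimeSlot is a simplified stand-in; Songs holds raw song lines here.
type TimeSlot struct {
	Name  string
	Songs []string
}

var brPattern = regexp.MustCompile(`(?i)<br\s*/?>`)

// isHeader assumes headers are exactly one of the three slot names.
func isHeader(line string) bool {
	switch line {
	case "Morning", "Afternoon", "Evening":
		return true
	}
	return false
}

// parseDescription normalizes every <br> variant to "\n", then groups
// non-blank lines under the most recent header. Lines that appear before
// any header are dropped (an assumption of this sketch).
func parseDescription(desc string) []TimeSlot {
	var slots []TimeSlot
	normalized := brPattern.ReplaceAllString(desc, "\n")
	for _, line := range strings.Split(normalized, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			continue
		case isHeader(line):
			slots = append(slots, TimeSlot{Name: line})
		case len(slots) > 0:
			last := &slots[len(slots)-1]
			last.Songs = append(last.Songs, line)
		}
	}
	return slots
}

// cases is the table: one row per edge case named in the review.
var cases = []struct {
	name string
	in   string
	want []TimeSlot
}{
	{"plain br", "Morning<br>Song A<br>Song B",
		[]TimeSlot{{"Morning", []string{"Song A", "Song B"}}}},
	{"self-closing br", "Morning<br/>Song A<br />Song B",
		[]TimeSlot{{"Morning", []string{"Song A", "Song B"}}}},
	{"upper-case br and blank lines", "Evening<BR><br>\n  Song C  \n",
		[]TimeSlot{{"Evening", []string{"Song C"}}}},
	{"two slots", "Morning<br>Song A<br>Afternoon<br>Song B",
		[]TimeSlot{{"Morning", []string{"Song A"}}, {"Afternoon", []string{"Song B"}}}},
	{"noise before first header", "Song X<br>Morning<br>Song A",
		[]TimeSlot{{"Morning", []string{"Song A"}}}},
}

// runCases returns the names of the rows whose output did not match.
func runCases() []string {
	var failed []string
	for _, tc := range cases {
		if got := parseDescription(tc.in); !reflect.DeepEqual(got, tc.want) {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failing cases:", runCases())
}
```

Adding a row for a new feed quirk is then a one-line change, which is the main reason to prefer a table over separate assertions.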
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/rss_feed.go`:
- Around line 77-81: The description parsing loop in the RSS feed tokenization
only splits on the literal string "<br>", but RSS feeds commonly contain "<br/>" and
"<br />" variants that should also be treated as line breaks. Before the nested
loop that splits on newlines, add additional handling to normalize all <br> tag
variants (including "<br/>", "<br />", and "<br>") into a common delimiter so the
subsequent split operations properly tokenize the description text and prevent
headers and songs from incorrectly merging together.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 992650c7-692f-457f-89fd-1fec47def741

📥 Commits

Reviewing files that changed from the base of the PR and between fc7ef51 and 90e650d.

📒 Files selected for processing (1)
  • models/rss_feed.go

Comment thread models/rss_feed.go Outdated
tnt07-t added 5 commits June 21, 2026 00:56
- Import models from tran/chimes_sessions branch
- Add Song, Session, SessionSong, Kudo, KudoType structs
- Implement GetOrCreateSong and GetOrCreateSession helpers
- Add RSS scraper to parse and store concert data + make db insertions idempotent

Dates from RSS were being interpreted as UTC instead of Eastern Time,
causing them to appear one day early in the database. Use ParseInLocation
to correctly parse as America/New_York timezone.

…dOrCreateUser error handling

- Use middleware.UIDFrom + GetUserByFirebaseUID to correctly resolve the
  authenticated user in RegisterFCMToken and SendTestNotification instead
  of c.GetUint(userID) which always returned 0
- Distinguish gorm.ErrRecordNotFound from other DB errors in FindOrCreateUser
  so real DB failures aren't silently treated as missing users

Replace log.Printf with log.Fatalf for DB and Firebase init errors so the
server exits immediately instead of continuing with nil dependencies that
cause panics on the first request.

coderabbitai Bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
models/rss_feed.go (2)

67-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle all <br> variants during description tokenization.

On lines 67-71, splitting only on the literal <br> misses <br/> and <br />, which can merge headers and songs into malformed slots.

Proposed fix
+var brPattern = regexp.MustCompile(`(?i)<br\s*/?>`)
+
 func ParseDescription(desc string) []TimeSlot {
 	var slots []TimeSlot
 	var current *TimeSlot
 
 	var lines []string
-	for _, current := range strings.Split(desc, "<br>") {
-		for _, line := range strings.Split(current, "\n") {
+	for _, segment := range brPattern.Split(desc, -1) {
+		for _, line := range strings.Split(segment, "\n") {
 			lines = append(lines, line)
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/rss_feed.go` around lines 67 - 71, The nested loop structure in the
description tokenization code that splits on literal `<br>` only handles one
variant of HTML break tags and misses `<br/>` and `<br />`, causing headers and
songs to be improperly merged. Before splitting the description in the loop
starting with `for _, current := range strings.Split(desc, "<br>")`, normalize
all HTML break tag variants by using a regex replacement or similar method to
convert all occurrences of `<br>`, `<br/>`, and `<br />` (and their spacing
variations) to a consistent delimiter like newline characters, then perform the
split operations on the normalized string.

43-58: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Extract Source before splitting title/artist.

On lines 46-55, (from "...") is only parsed from the title, so feed lines like Title / Artist (from "X") persist an incorrect Artist and an empty Source.

Proposed fix
 func parseSong(line string) ParsedSong {
 	song := ParsedSong{}
 
+	if match := originPattern.FindStringSubmatch(line); match != nil {
+		song.Source = match[1]
+		line = strings.TrimSpace(originPattern.ReplaceAllString(line, ""))
+	}
+
 	title, artist, found := strings.Cut(line, " / ")
 	if found {
 		song.Artist = strings.TrimSpace(artist)
 	}
 
-	match := originPattern.FindStringSubmatch(title)
-
-	if match != nil {
-		song.Source = match[1]
-		title = strings.TrimSpace(originPattern.ReplaceAllString(title, ""))
-	}
-	song.Title = title
+	song.Title = strings.TrimSpace(title)
 	return song
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/rss_feed.go` around lines 43 - 58, The parseSong function currently
performs the title and artist split using strings.Cut before extracting the
source pattern, which causes the source to be missed if it appears after the
artist portion (e.g., "Title / Artist (from \"X\")"). Move the source pattern
extraction logic using originPattern.FindStringSubmatch and
originPattern.ReplaceAllString to execute before the strings.Cut operation, so
that the source is extracted from the entire input line first, then the cleaned
line is split into title and artist components.
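
The proposed ordering can be checked with a small standalone program. This is a sketch: the regular expression assigned to originPattern is an assumption (the PR's actual pattern is not shown in this review), and the sample lines are placeholders rather than real feed data.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type ParsedSong struct {
	Title  string
	Artist string
	Source string
}

// Assumed pattern for the (from "...") suffix; the real one may differ.
var originPattern = regexp.MustCompile(`\s*\(from "([^"]*)"\)`)

// parseSong extracts the source from the whole line first, then splits
// what remains into title and artist on " / ".
func parseSong(line string) ParsedSong {
	song := ParsedSong{}

	if match := originPattern.FindStringSubmatch(line); match != nil {
		song.Source = match[1]
		line = strings.TrimSpace(originPattern.ReplaceAllString(line, ""))
	}

	title, artist, found := strings.Cut(line, " / ")
	if found {
		song.Artist = strings.TrimSpace(artist)
	}
	song.Title = strings.TrimSpace(title)
	return song
}

func main() {
	lines := []string{
		`Title / Artist (from "X")`, // source after the artist
		`Title (from "X") / Artist`, // source after the title
		`Title`,                     // no artist, no source
	}
	for _, line := range lines {
		fmt.Printf("%+v\n", parseSong(line))
	}
}
```

With this ordering, the first two lines yield the same Title, Artist, and Source regardless of where the (from "...") suffix sits.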
🧹 Nitpick comments (1)
cmd/rss_scraper/main.go (1)

43-43: ⚡ Quick win

Prefer full HTML entity decoding over manual replacement.

Manual replacement misses common entities (for example &quot;), which can degrade song/source parsing quality.

Suggested refactor
 import (
 	"encoding/xml"
+	"html"
 	"io"
@@
-	cleanHTML := strings.NewReplacer("&lt;", "<", "&gt;", ">", "&amp;", "&").Replace(item.Description)
+	cleanHTML := html.UnescapeString(item.Description)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/rss_scraper/main.go` at line 43, The manual HTML entity decoding using
strings.NewReplacer for the cleanHTML variable assignment only handles three
entities (lt, gt, amp) and misses common ones like quot, causing parsing issues.
Replace the strings.NewReplacer call with the html.UnescapeString function from
Go's standard html package, which properly handles all HTML entities. Make sure
to import the html package and update the cleanHTML assignment to use
html.UnescapeString(item.Description) instead.
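
A quick side-by-side of the two approaches on a placeholder description (not real feed data). manualDecode mirrors the replacer quoted in the diff above; decodeEntities is just a hypothetical wrapper around html.UnescapeString.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// manualDecode mirrors the three-entity replacer from the PR.
func manualDecode(s string) string {
	return strings.NewReplacer("&lt;", "<", "&gt;", ">", "&amp;", "&").Replace(s)
}

// decodeEntities delegates to the standard library decoder.
func decodeEntities(s string) string {
	return html.UnescapeString(s)
}

func main() {
	raw := `Morning&lt;br&gt;Song A (from &quot;Film X&quot;) / Artist A`

	// The manual replacer leaves &quot; untouched.
	fmt.Println(manualDecode(raw))

	// UnescapeString also decodes &quot;, so a (from "...") pattern can match.
	fmt.Println(decodeEntities(raw))
}
```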
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/rss_scraper/main.go`:
- Around line 20-29: The HTTP request using http.Get() lacks a deadline and the
response status code is never validated before parsing the body, which means
non-OK responses will be treated as valid RSS payload. Replace the http.Get()
call with http.NewRequestWithContext() using a context with a timeout, then
check that resp.StatusCode is http.StatusOK before proceeding to read and parse
the response body. Additionally, at line 81 where time.LoadLocation() is called,
capture and handle the returned error instead of silently ignoring it, as this
error will cause subsequent time.ParseInLocation() calls to fail without proper
diagnostics.
- Around line 81-82: In the parseConcertDate function, the error returned by
time.LoadLocation is currently being ignored (assigned to underscore), which
means if location loading fails, loc will be nil and time.ParseInLocation will
silently fall back to UTC instead of the intended America/New_York timezone.
Capture the error from time.LoadLocation by assigning it to a named variable
instead of the blank identifier, check if the error is not nil, and return the
error immediately to the caller rather than continuing with the parsing
operation. Since parseConcertDate already has an error return type, this error
propagation will be properly handled by callers.

In `@models/setup.go`:
- Line 42: The Session and Song structs lack unique constraints to prevent
duplicate rows from concurrent scraper runs using read-then-insert patterns. Add
GORM uniqueIndex struct tags to enforce database-level constraints: in the
Session struct, add uniqueIndex tags with a composite constraint name to the Day
and TimeOfDay fields, and in the Song struct, add uniqueIndex tags with a
matching composite constraint name to both the SongName and Artist fields. This
will ensure that the AutoMigrate call in setup.go creates the appropriate unique
constraints on the sessions and songs tables during migration, making
GetOrCreateSession and GetOrCreateSong operations idempotent across concurrent
calls.
- Around line 16-17: In the ConnectDatabase() function, the godotenv.Load() call
ignores its return value, which means any errors from parsing or permission
issues will be silently skipped. Capture the error return value from
godotenv.Load() and explicitly check if it is not nil. If an error occurs, log
it appropriately and return the error to maintain consistency with proper error
handling patterns used elsewhere in the ConnectDatabase() function.

In `@models/songs.go`:
- Around line 3-21: The GetOrCreateSong function has a race condition due to a
non-atomic get-or-create pattern where concurrent scraper runs can create
duplicate songs. To fix this, add a composite unique constraint on the
(song_name, artist) pair to the Song struct using GORM tags (hint: use
uniqueIndex tag), then refactor GetOrCreateSong to use atomic upsert by
replacing the separate First() and Create() calls with a single Create() call
that uses clause.OnConflict with Columns set to the constrained fields and
DoNothing: true, followed by a subsequent First() query to retrieve the actual
song record from the database. Apply the same fix to GetOrCreateSession which
has the identical race condition vulnerability.
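
For illustration, the composite unique constraints might be declared with struct tags along these lines. This is a sketch: the field types and index names are assumptions, and the file does not import GORM, so the tags are inert here and only printed via reflection. The clause.OnConflict upsert itself is not shown.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// Session: sharing one uniqueIndex name across Day and TimeOfDay asks
// GORM's AutoMigrate for a single composite unique index.
type Session struct {
	ID        uint      `gorm:"primaryKey"`
	Day       time.Time `gorm:"uniqueIndex:idx_session_day_time"`
	TimeOfDay string    `gorm:"uniqueIndex:idx_session_day_time"`
}

// Song: same idea for the (song_name, artist) pair.
type Song struct {
	ID       uint   `gorm:"primaryKey"`
	SongName string `gorm:"uniqueIndex:idx_song_name_artist"`
	Artist   string `gorm:"uniqueIndex:idx_song_name_artist"`
}

// gormTag is a hypothetical helper that returns the gorm tag of a field,
// or "" if the field does not exist.
func gormTag(model any, field string) string {
	f, ok := reflect.TypeOf(model).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("gorm")
}

func main() {
	fmt.Println("Session.Day:      ", gormTag(Session{}, "Day"))
	fmt.Println("Session.TimeOfDay:", gormTag(Session{}, "TimeOfDay"))
	fmt.Println("Song.SongName:    ", gormTag(Song{}, "SongName"))
	fmt.Println("Song.Artist:      ", gormTag(Song{}, "Artist"))
}
```

With a database-level constraint in place, a concurrent duplicate insert fails or is skipped rather than creating a second row, which is what makes the get-or-create helpers safe to retry.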

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a85acad-53c3-406e-843b-e8c627e8dd7a

📥 Commits

Reviewing files that changed from the base of the PR and between 90e650d and 808332f.

📒 Files selected for processing (6)
  • cmd/rss_scraper/main.go
  • models/kudos.go
  • models/rss_feed.go
  • models/sessions.go
  • models/setup.go
  • models/songs.go

Comment thread cmd/rss_scraper/main.go Outdated
Comment thread cmd/rss_scraper/main.go Outdated
Comment thread models/setup.go
Comment thread models/setup.go
Comment thread models/songs.go Outdated
tnt07-t added 6 commits June 21, 2026 01:29
Check for error before dereferencing response.FailureCount. When SendMulticast
fails, response is nil, causing a panic. Return early if there's an error.

Copy /usr/share/zoneinfo from the builder to the final distroless image so
the RSS scraper can load timezone data (America/New_York) without errors.

Check LoadLocation error instead of ignoring it. Fail fast if tzdata is
unavailable rather than silently falling back to UTC.

Use context with 30s timeout on HTTP request to prevent indefinite hangs.
Check response status code before parsing body as XML to catch 404/500
errors instead of trying to parse invalid RSS payloads.

tnt07-t (Contributor) left a comment

lgtm!

tnt07-t requested a review from cejiogu June 21, 2026 06:44
3 participants