Johan Lundberg 462fdcfa84 Complete Top 500 Albums project with 100% data coverage and UI improvements

- Fixed Info/Description columns after regenerating CSV with clean Wikipedia data
- Remapped and downloaded missing album covers to match new rankings
- Modified website UI to show all description text without click-to-expand
- Added comprehensive Info/Description for all 500 albums using research
- Created multiple data processing scripts for album information completion
- Achieved 100% data completion with descriptions ending "(by Claude)" for new content
- All albums now have complete metadata and cover art

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-07-01 00:33:47 +02:00

5.1 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This repository contains an analysis of the Top 500 Albums list, using data from Rolling Stone (2020) and Wikipedia (2023), plus a website for exploring the rankings with real album cover artwork. The project includes comparison scripts, data merging tools, and an interactive web interface. Album covers are downloaded via the iTunes Search API.

Data Structure

Primary Files:

rolling_stone_top_500_albums_2020.csv - Original 2020 Rolling Stone list
wikipedia_top_500_albums.csv - Clean 2023 Wikipedia list
top_500_albums_2023.csv - Combined comparison file

File: `rolling_stone_top_500_albums_2020.csv`

Rank: Album ranking (1-500, stored in reverse order with 500 first)
Artist: Artist or band name
Album: Album title
Info: Label and release year (e.g., "Blue Note, 1959")
Description: Detailed description of the album's significance and impact

File: `wikipedia_top_500_albums.csv`

rank: Album ranking (1-500)
artist: Artist or band name
album: Album title

File: `top_500_albums_2023.csv`

Rank: 2023 ranking
Artist: Artist or band name
Album: Album title
Status: Ranking change ("New in 2023", "+10", "-5", "No change")
Info: Label and year (from 2020 data where available)
Description: Album description (from 2020 data where available)
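
The Status values listed above can be turned into numbers for sorting or filtering. The following is a minimal sketch, not code from the repository: the helper name `parse_status` is hypothetical, and it assumes the Status column only ever holds the four forms shown ("New in 2023", a signed integer, or "No change").

```python
def parse_status(status):
    """Convert a Status string into a number.

    Returns None for new entries, 0 for "No change",
    and a signed int for values such as "+10" or "-5".
    """
    text = status.strip()
    if text.lower().startswith("new"):
        return None  # album was not on the earlier list
    if text.lower() == "no change":
        return 0
    return int(text)  # int() accepts a leading "+" or "-"
```

For example, `parse_status("+10")` gives `10` and `parse_status("New in 2023")` gives `None`, so new entries can be separated from ranked movers before any arithmetic.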

Common Tasks

Reading the Data

When working with this data, use standard CSV parsing tools appropriate for the language:

Python: pandas.read_csv() or csv module
JavaScript/Node.js: CSV parsing libraries like csv-parse or papaparse
Command line: Tools like csvkit, awk, or cut
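
As an illustration of the Python option, here is a small sketch using only the standard `csv` module. The function name `load_albums` and the sample row are made up for the example. The sample is placeholder text, not a real row from the data.

```python
import csv
import io


def load_albums(source):
    """Read album rows from an open text stream into a list of dicts."""
    return list(csv.DictReader(source))


# Placeholder sample with the same header as the Rolling Stone file.
# In practice, pass a real file handle instead, for example:
#   open("rolling_stone_top_500_albums_2020.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "Rank,Artist,Album,Info,Description\n"
    '500,Example Artist,Example Album,"Example Label, 1999",Placeholder text\n'
)
rows = load_albums(sample)
```

Using `csv.DictReader` rather than splitting on commas matters here because the Info column contains commas inside quoted fields ("Label, year").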

Data Characteristics

rolling_stone_top_500_albums_2020.csv is encoded in UTF-8
Contains 500 rows (plus header)
Rankings in this file are stored in reverse order (500 to 1)
Descriptions contain rich text about each album's cultural and musical significance
Some entries may contain special characters in artist/album names
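
Because the rows arrive in reverse order, code that expects rank 1 first needs to reorder them. A minimal sketch, assuming rows are dicts as produced by `csv.DictReader` and that the rank column holds integer strings (the helper name `sort_by_rank` is hypothetical):

```python
def sort_by_rank(rows, key="Rank"):
    """Return a new list of rows ordered from rank 1 upward.

    Rank values read from CSV are strings, so they are converted to int
    to avoid "10" sorting before "2".
    """
    return sorted(rows, key=lambda row: int(row[key]))
```

Pass `key="rank"` for the Wikipedia file, which uses lowercase column names.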

Potential Use Cases

This data can be used for:

Building music recommendation systems
Creating data visualizations of music history
Analyzing music trends by decade/genre
Building APIs or web applications to browse the album list
Educational projects about music history
Statistical analysis of the most influential albums

Script Files

Data Processing Scripts

compare_top500_albums.py - Compares 2020 and 2023 lists, generates combined CSV with ranking changes
merge_album_info.py - Merges Info and Description columns from 2020 data into the combined file
download_all_covers.py - Downloads album cover artwork using iTunes Search API (500/500 success rate)

Website Files

index.html - Main website interface with search, filtering, and sorting
script.js - JavaScript for interactivity, state management, and URL sharing
style.css - Responsive styling with CSS Grid and modern design
favicon.ico - Custom favicon for the website
covers/ - Directory containing downloaded album cover images

Important Data Quality Notes

Clean Data Source: Uses wikipedia_top_500_albums.csv (clean Wikipedia data) rather than wikipedia_500_albums.csv (old version with duplicates)
Fixed Duplicates: Previous versions had duplicate "Suicide" entries at ranks 234, 293, and 498. Current version correctly shows only one Suicide entry at rank 498
Column Format: The Wikipedia file uses lowercase column names (rank, artist, album), whereas the other files use title case (Rank, Artist, Album)
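
One way to handle the differing column-name case is to normalize keys right after reading each row. This is a sketch under that assumption, not necessarily how the repository's scripts do it, and `normalize_keys` is a hypothetical helper:

```python
def normalize_keys(row):
    """Return a copy of a CSV row dict with lowercase, trimmed column names."""
    return {key.strip().lower(): value for key, value in row.items()}
```

After normalization, a row from any of the three files can be accessed as `row["rank"]`, `row["artist"]`, and `row["album"]`.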

Technical Implementation

Album Cover Download

Uses iTunes Search API without external dependencies
Implements fuzzy matching for artist/album names
Downloads 600x600 pixel artwork
100% success rate (500/500 albums)
Any failed downloads are logged to failed_downloads.txt
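
The sketch below shows how a lookup against the iTunes Search API might be assembled with the standard library only. It is not taken from download_all_covers.py. The function names are hypothetical, and the size substitution in `artwork_url_600` relies on a widely used but undocumented convention: the `artworkUrl100` field in a search result ends in a size token such as `100x100bb.jpg`, which can be swapped for a larger size. Treat that behaviour as an assumption.

```python
from urllib.parse import urlencode

ITUNES_SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(artist, album, limit=5):
    """Build an album search URL for the iTunes Search API."""
    query = urlencode(
        {"term": f"{artist} {album}", "entity": "album", "limit": limit}
    )
    return f"{ITUNES_SEARCH_URL}?{query}"


def artwork_url_600(result):
    """Derive a 600x600 artwork URL from one search result dict.

    Assumes the result carries an "artworkUrl100" field containing the
    token "100x100"; returns None when the field is missing.
    """
    url = result.get("artworkUrl100")
    if not url:
        return None
    return url.replace("100x100", "600x600")
```

The JSON response can then be fetched with `urllib.request.urlopen` and decoded with `json.load`. Each entry in its `results` list is a dict that can be passed to `artwork_url_600`.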

Website Features

Responsive design with infinite scroll
Search functionality across artist/album names
Filter by ranking status (new, improved, dropped, no change)
Sort by rank, artist, or album
Bookmark functionality with shareable URLs
Individual album sharing with preserved state
Jump-to-rank navigation

Data Comparison Logic

Fuzzy string matching using Python's difflib
Handles artist name variations ("The Beatles" vs "Beatles")
Matches albums with minor title differences
Calculates ranking improvements/drops with +/- notation
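
The comparison logic above can be sketched with `difflib.SequenceMatcher`. This is an illustration under stated assumptions, not the code in compare_top500_albums.py: the function names, the 0.85 threshold, and the stripping of a leading "The" are all choices made for the example. It also assumes "+" means the album moved to a better (numerically smaller) rank.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, trim, and drop a leading "the " so name variants compare equal."""
    text = text.lower().strip()
    if text.startswith("the "):
        text = text[4:]
    return text


def similarity(a, b):
    """Return a 0.0-1.0 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_album(old, new, threshold=0.85):
    """Treat two entries as the same album if artist and title both match closely."""
    return (
        similarity(old["artist"], new["artist"]) >= threshold
        and similarity(old["album"], new["album"]) >= threshold
    )


def rank_status(old_rank, new_rank):
    """Format the change between two ranks in the Status column's notation."""
    if old_rank is None:
        return "New in 2023"
    change = old_rank - new_rank  # positive when the album moved up the list
    if change == 0:
        return "No change"
    return f"{change:+d}"
```

With these definitions, "The Beatles" and "Beatles" normalize to the same string, and an album moving from rank 20 to rank 10 yields "+10".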

Running the Scripts

Python Requirements

All scripts use only the Python standard library (no external dependencies):

urllib for HTTP requests
csv for data processing
json for API responses
re for text processing
difflib for fuzzy matching

Website Deployment

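Serve with a local HTTP server: python -m http.server 8000
A server is required because browsers block the page from loading the CSV files when index.html is opened directly from disk (file:// URLs), which shows up as a CORS error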
No build process needed - pure HTML/CSS/JS

Notes

Rankings in rolling_stone_top_500_albums_2020.csv are stored in reverse order (500 to 1)
Wikipedia data is clean and properly formatted (1-500)
Website preserves filter/sort state in shareable URLs
Cover images use rank-based filenames for easy organization
