top500albums/CLAUDE.md
Johan Lundberg 462fdcfa84 Complete Top 500 Albums project with 100% data coverage and UI improvements
- Fixed Info/Description columns after regenerating CSV with clean Wikipedia data
- Remapped and downloaded missing album covers to match new rankings
- Modified website UI to show all description text without click-to-expand
- Added comprehensive Info/Description for all 500 albums using research
- Created multiple data processing scripts for album information completion
- Achieved 100% data completion with descriptions ending "(by Claude)" for new content
- All albums now have complete metadata and cover art

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-01 00:33:47 +02:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This repository analyzes the Top 500 Albums list using data from Rolling Stone (2020) and Wikipedia (2023), and includes a fully functional website for exploring the rankings with real album cover artwork. It contains comparison scripts, data merging tools, and an interactive web interface; the album covers are downloaded via the iTunes Search API.

Data Structure

Primary Files:

  • rolling_stone_top_500_albums_2020.csv - Original 2020 Rolling Stone list
  • wikipedia_top_500_albums.csv - Clean 2023 Wikipedia list
  • top_500_albums_2023.csv - Combined comparison file

File: rolling_stone_top_500_albums_2020.csv

  • Rank: Album ranking (1-500, stored in reverse order with 500 first)
  • Artist: Artist or band name
  • Album: Album title
  • Info: Label and release year (e.g., "Blue Note, 1959")
  • Description: Detailed description of the album's significance and impact

File: wikipedia_top_500_albums.csv

  • rank: Album ranking (1-500)
  • artist: Artist or band name
  • album: Album title

File: top_500_albums_2023.csv

  • Rank: 2023 ranking
  • Artist: Artist or band name
  • Album: Album title
  • Status: Ranking change ("New in 2023", "+10", "-5", "No change")
  • Info: Label and year (from 2020 data where available)
  • Description: Album description (from 2020 data where available)
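
The Status values listed above can be turned into a category and a numeric change. The following is a minimal sketch, not code from the repository: the function name parse_status is hypothetical, and it assumes that a "+" value means the album moved up the list (improved) and a "-" value means it dropped.

```python
import re

def parse_status(status):
    """Return (category, delta) for one Status cell.

    Assumption: "+10" means the album climbed 10 places, "-5" means it
    fell 5 places. The category names mirror the website's status filter.
    """
    text = status.strip()
    if text == "New in 2023":
        return ("new", None)
    if text == "No change":
        return ("no change", 0)
    match = re.fullmatch(r"([+-])(\d+)", text)
    if match is None:
        raise ValueError(f"Unrecognized status: {status!r}")
    delta = int(match.group(2))
    if match.group(1) == "-":
        delta = -delta
    if delta == 0:
        return ("no change", 0)
    return ("improved" if delta > 0 else "dropped", delta)
```

A row can then be filtered with something like parse_status(row["Status"])[0] == "improved".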

Common Tasks

Reading the Data

When working with this data, use standard CSV parsing tools appropriate for the language:

  • Python: pandas.read_csv() or csv module
  • JavaScript/Node.js: CSV parsing libraries like csv-parse or papaparse
  • Command line: Tools like csvkit, awk, or cut
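
As an illustration of the Python option, here is a small sketch using only the csv module. The helper name read_albums is hypothetical, and the inline sample rows are placeholders standing in for the real file, not actual data from it.

```python
import csv
import io

def read_albums(source):
    """Read album rows from an open text file into a list of dicts."""
    return list(csv.DictReader(source))

# Placeholder rows with the same header as the 2020 file. Quoted fields
# show that commas inside Info or Album values are handled by the parser.
sample = io.StringIO(
    "Rank,Artist,Album,Info,Description\n"
    '2,Artist B,"Album, Part Two","Label B, 1971",Placeholder text\n'
    '1,Artist A,Album One,"Label A, 1959",Placeholder text\n'
)
rows = read_albums(sample)
print(rows[0]["Album"])
```

For the real data, the same function can be given a file opened with open("rolling_stone_top_500_albums_2020.csv", newline="", encoding="utf-8").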

Data Characteristics

  • The file (rolling_stone_top_500_albums_2020.csv) is encoded in UTF-8
  • Contains 500 rows (plus header)
  • Rankings are stored in reverse order (500 to 1)
  • Descriptions contain rich text about each album's cultural and musical significance
  • Some entries may contain special characters in artist/album names
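
Because the rows are stored from 500 down to 1, code that expects ascending order needs to sort them first. A minimal sketch, assuming the rows were parsed into dicts with a string "Rank" value (the helper name sort_by_rank and the sample rows are hypothetical):

```python
def sort_by_rank(rows, rank_key="Rank"):
    """Return rows ordered from rank 1 upward.

    Ranks come out of a CSV parser as strings, so they are converted to
    int for the comparison; otherwise "10" would sort before "2".
    """
    return sorted(rows, key=lambda row: int(row[rank_key]))

# Placeholder rows in the file's stored order (500 first).
stored = [
    {"Rank": "500", "Artist": "Artist C"},
    {"Rank": "10", "Artist": "Artist B"},
    {"Rank": "2", "Artist": "Artist A"},
]
ascending = sort_by_rank(stored)
```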

Potential Use Cases

This data can be used for:

  1. Building music recommendation systems
  2. Creating data visualizations of music history
  3. Analyzing music trends by decade/genre
  4. Building APIs or web applications to browse the album list
  5. Educational projects about music history
  6. Statistical analysis of the most influential albums

Script Files

Data Processing Scripts

  • compare_top500_albums.py - Compares 2020 and 2023 lists, generates combined CSV with ranking changes
  • merge_album_info.py - Merges Info and Description columns from 2020 data into the combined file
  • download_all_covers.py - Downloads album cover artwork using iTunes Search API (500/500 success rate)

Website Files

  • index.html - Main website interface with search, filtering, and sorting
  • script.js - JavaScript for interactivity, state management, and URL sharing
  • style.css - Responsive styling with CSS Grid and modern design
  • favicon.ico - Custom favicon for the website
  • covers/ - Directory containing downloaded album cover images

Important Data Quality Notes

  • Clean Data Source: Uses wikipedia_top_500_albums.csv (clean Wikipedia data) rather than wikipedia_500_albums.csv (old version with duplicates)
  • Fixed Duplicates: Previous versions had duplicate "Suicide" entries at ranks 234, 293, and 498. Current version correctly shows only one Suicide entry at rank 498
  • Column Format: Wikipedia file uses lowercase column names (rank, album, artist) vs title case in other files
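
The column-name difference and the duplicate problem described above can both be checked in a few lines. This is a sketch under stated assumptions rather than the project's own code: normalize_keys and find_duplicates are hypothetical helpers, and the sample rows reproduce the old duplicate "Suicide" ranks only to show the check firing.

```python
from collections import Counter

def normalize_keys(row):
    """Lowercase column names so title-case and lowercase files line up."""
    return {key.strip().lower(): value for key, value in row.items()}

def find_duplicates(rows):
    """Return (artist, album) pairs that appear more than once."""
    counts = Counter(
        (row["artist"].casefold(), row["album"].casefold()) for row in rows
    )
    return sorted(pair for pair, count in counts.items() if count > 1)

# Sample mimicking the old file with the duplicated entry.
old_rows = [
    {"rank": "234", "artist": "Suicide", "album": "Suicide"},
    {"rank": "293", "artist": "Suicide", "album": "Suicide"},
    {"rank": "498", "artist": "Suicide", "album": "Suicide"},
]
duplicates = find_duplicates(old_rows)
```

Running find_duplicates on the clean file should return an empty list if the fix described above holds.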

Technical Implementation

Album Cover Download

  • Uses iTunes Search API without external dependencies
  • Implements fuzzy matching for artist/album names
  • Downloads 600x600 pixel artwork
  • 100% success rate (500/500 albums)
  • Any failed downloads are logged to failed_downloads.txt
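
The lookup step might look roughly like the sketch below, which uses only the standard library. It is an assumption-laden illustration, not the contents of download_all_covers.py: the function names are hypothetical, and the 600x600 step relies on the commonly observed (but not officially documented) pattern that the artworkUrl100 thumbnail URL contains "100x100", which can be swapped for a larger size.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://itunes.apple.com/search"

def build_search_url(artist, album, limit=5):
    """Build an iTunes Search API query for albums matching artist + title."""
    params = {"term": f"{artist} {album}", "entity": "album", "limit": limit}
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"

def upscale_artwork_url(artwork_url, size=600):
    """Rewrite a 100x100 thumbnail URL to request a larger image.

    Assumption: the URL embeds the size as "100x100"; if it does not,
    the URL is returned unchanged.
    """
    return artwork_url.replace("100x100", f"{size}x{size}")

def fetch_results(artist, album):
    """Query the API and return its list of result dicts (network call)."""
    url = build_search_url(artist, album)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response).get("results", [])
```

A caller would pick the best match from fetch_results(...), read its artworkUrl100 field, and pass it through upscale_artwork_url before downloading.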

Website Features

  • Responsive design with infinite scroll
  • Search functionality across artist/album names
  • Filter by ranking status (new, improved, dropped, no change)
  • Sort by rank, artist, or album
  • Bookmark functionality with shareable URLs
  • Individual album sharing with preserved state
  • Jump-to-rank navigation

Data Comparison Logic

  • Fuzzy string matching using Python's difflib
  • Handles artist name variations ("The Beatles" vs "Beatles")
  • Matches albums with minor title differences
  • Calculates ranking improvements/drops with +/- notation
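
A rough sketch of this comparison logic is shown below. It is not the code from compare_top500_albums.py: the helper names, the 0.85 threshold, and the rule of stripping a leading "The" are all assumptions chosen for illustration, as is the convention that a positive change means the album moved up.

```python
import difflib

def normalize(name):
    """Lowercase a name and drop a leading 'The ' before comparing."""
    text = name.casefold().strip()
    if text.startswith("the "):
        text = text[4:]
    return text

def similarity(a, b):
    """Similarity ratio between two names, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_same_album(old, new, threshold=0.85):
    """Treat two entries as the same album if artist and title both match."""
    return (
        similarity(old["artist"], new["artist"]) >= threshold
        and similarity(old["album"], new["album"]) >= threshold
    )

def rank_change(old_rank, new_rank):
    """Format the change in rank; old_rank is None for new entries."""
    if old_rank is None:
        return "New in 2023"
    delta = old_rank - new_rank  # positive when the album moved up
    if delta == 0:
        return "No change"
    return f"{delta:+d}"
```

The threshold trades false matches against missed ones, so it is worth spot-checking borderline pairs by hand.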

Running the Scripts

Python Requirements

All scripts use only the Python standard library (no external dependencies):

  • urllib for HTTP requests
  • csv for data processing
  • json for API responses
  • re for text processing
  • difflib for fuzzy matching

Website Deployment

  • Serve with local HTTP server: python -m http.server 8000
  • An HTTP server is required because browsers block loading CSV files from file:// URLs (CORS restrictions)
  • No build process needed - pure HTML/CSS/JS

Notes

  • Rankings in rolling_stone_top_500_albums_2020.csv are stored in reverse order (500 to 1)
  • Wikipedia data is clean and properly formatted (1-500)
  • Website preserves filter/sort state in shareable URLs
  • Cover images use rank-based filenames for easy organization