# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This repository contains a Top 500 Albums analysis built on data from Rolling Stone (2020) and Wikipedia (2023), plus a website for exploring the rankings with real album cover artwork. The project includes comparison scripts, data merging tools, and an interactive web interface that displays album covers downloaded via the iTunes Search API.

## Data Structure

### Primary Files:
- **`rolling_stone_top_500_albums_2020.csv`** - Original 2020 Rolling Stone list
- **`wikipedia_top_500_albums.csv`** - Clean 2023 Wikipedia list
- **`top_500_albums_2023.csv`** - Combined comparison file

### File: `rolling_stone_top_500_albums_2020.csv`
- **Rank**: Album ranking (1-500, stored in reverse order with 500 first)
- **Artist**: Artist or band name
- **Album**: Album title
- **Info**: Label and release year (e.g., "Blue Note, 1959")
- **Description**: Detailed description of the album's significance and impact
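
The `Info` column packs the label and release year into one string. A minimal sketch for splitting it, assuming every value follows the "Label, YYYY" shape of the example above (`parse_info` is a hypothetical helper, not part of the repository's scripts):

```python
import re
from typing import Optional, Tuple


def parse_info(info: str) -> Tuple[str, Optional[int]]:
    """Split an Info value such as "Blue Note, 1959" into (label, year).

    Assumption: the year is the final four-digit number after the last comma.
    Values that do not fit that shape are returned unchanged with year=None.
    """
    text = info.strip()
    match = re.search(r"^(.*?),\s*(\d{4})\s*$", text)
    if match is None:
        return text, None
    return match.group(1).strip(), int(match.group(2))
```

Returning `None` for the year instead of raising keeps a single malformed row from aborting a pass over all 500 entries.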

### File: `wikipedia_top_500_albums.csv`
- **rank**: Album ranking (1-500)
- **artist**: Artist or band name
- **album**: Album title

### File: `top_500_albums_2023.csv`
- **Rank**: 2023 ranking
- **Artist**: Artist or band name
- **Album**: Album title
- **Status**: Ranking change ("New in 2023", "+10", "-5", "No change")
- **Info**: Label and year (from 2020 data where available)
- **Description**: Album description (from 2020 data where available)
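
The `Status` values map naturally onto the website's filter categories (new, improved, dropped, no change). The sketch below shows one way to classify them. It assumes a positive number means the album moved up the list, which this file does not state explicitly, and `classify_status` is a hypothetical name:

```python
from typing import Optional, Tuple


def classify_status(status: str) -> Tuple[str, Optional[int]]:
    """Turn a Status string into (category, delta).

    Assumption: "+10" means the album improved by 10 places and "-5"
    means it dropped by 5. Unrecognized values yield ("unknown", None).
    """
    text = status.strip()
    if text.lower().startswith("new"):
        return "new", None
    if text.lower() == "no change":
        return "no change", 0
    try:
        delta = int(text)  # int() accepts a leading "+" or "-"
    except ValueError:
        return "unknown", None
    if delta > 0:
        return "improved", delta
    if delta < 0:
        return "dropped", delta
    return "no change", 0
```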

## Common Tasks

### Reading the Data
When working with this data, use standard CSV parsing tools appropriate for the language:
- Python: `pandas.read_csv()` or `csv` module
- JavaScript/Node.js: CSV parsing libraries like `csv-parse` or `papaparse`
- Command line: Tools like `csvkit`, `awk`, or `cut`
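
As an illustration of the Python option, here is a small standard-library sketch that reads a file with a title-case `Rank` column and returns the rows in ascending rank order. `load_albums` is a hypothetical helper, and the column name is a parameter because the Wikipedia file uses lowercase headers:

```python
import csv
from typing import Dict, Iterable, List


def load_albums(lines: Iterable[str], rank_column: str = "Rank") -> List[Dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into dicts,
    sorted so that rank 1 comes first regardless of the order on disk."""
    rows = list(csv.DictReader(lines))
    rows.sort(key=lambda row: int(row[rank_column]))
    return rows


# Typical use (not executed here):
# with open("rolling_stone_top_500_albums_2020.csv", newline="", encoding="utf-8") as f:
#     albums = load_albums(f)
```

Opening the file with `newline=""` is what the `csv` module documentation recommends, so that quoted fields containing line breaks are parsed correctly.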

### Data Characteristics
- `rolling_stone_top_500_albums_2020.csv` is encoded in UTF-8
- It contains 500 rows (plus header)
- Its rankings are stored in reverse order (500 to 1)
- Descriptions contain rich text about each album's cultural and musical significance
- Some entries may contain special characters in artist/album names
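
These characteristics can be checked mechanically after loading a file. The following sketch (a hypothetical `check_ranks` helper, assuming rows are dicts as produced by `csv.DictReader`) reports a wrong row count, duplicate ranks, and missing ranks:

```python
from collections import Counter
from typing import Dict, Iterable, List


def check_ranks(rows: Iterable[Dict[str, str]], rank_column: str = "Rank",
                expected: int = 500) -> List[str]:
    """Return a list of human-readable problems; an empty list means the
    ranks cover 1..expected exactly once, in any order."""
    ranks = [int(row[rank_column]) for row in rows]
    counts = Counter(ranks)
    problems = []
    if len(ranks) != expected:
        problems.append(f"expected {expected} rows, found {len(ranks)}")
    duplicates = sorted(rank for rank, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ranks: {duplicates}")
    missing = sorted(set(range(1, expected + 1)) - set(counts))
    if missing:
        problems.append(f"missing ranks: {missing}")
    return problems
```

Because the check ignores row order, it works for both the reverse-ordered 2020 file and the ascending Wikipedia file.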

## Potential Use Cases

This data can be used for:
1. Building music recommendation systems
2. Creating data visualizations of music history
3. Analyzing music trends by decade/genre
4. Building APIs or web applications to browse the album list
5. Educational projects about music history
6. Statistical analysis of the most influential albums

## Script Files

### Data Processing Scripts
- **`compare_top500_albums.py`** - Compares 2020 and 2023 lists, generates combined CSV with ranking changes
- **`merge_album_info.py`** - Merges Info and Description columns from 2020 data into the combined file
- **`download_all_covers.py`** - Downloads album cover artwork using iTunes Search API (500/500 success rate)

### Website Files
- **`index.html`** - Main website interface with search, filtering, and sorting
- **`script.js`** - JavaScript for interactivity, state management, and URL sharing
- **`style.css`** - Responsive styling with CSS Grid and modern design
- **`favicon.ico`** - Custom favicon for the website
- **`covers/`** - Directory containing downloaded album cover images

## Important Data Quality Notes

- **Clean Data Source**: Uses `wikipedia_top_500_albums.csv` (clean Wikipedia data) rather than `wikipedia_500_albums.csv` (old version with duplicates)
- **Fixed Duplicates**: Previous versions had duplicate "Suicide" entries at ranks 234, 293, and 498. Current version correctly shows only one Suicide entry at rank 498
- **Column Format**: Wikipedia file uses lowercase column names (`rank`, `album`, `artist`) vs title case in other files
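
Code that reads more than one of these files has to bridge the casing difference. One option, sketched here with a hypothetical `normalize_row` helper, is to convert every header to title case as rows are read:

```python
from typing import Dict


def normalize_row(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the row with title-case keys, so that
    "rank"/"artist"/"album" become "Rank"/"Artist"/"Album"."""
    return {key.strip().title(): value for key, value in row.items()}
```

Rows that already use title case pass through unchanged, so the same function can be applied to all three CSV files.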

## Technical Implementation

### Album Cover Download
- Uses iTunes Search API without external dependencies
- Implements fuzzy matching for artist/album names
- Downloads 600x600 pixel artwork
- 100% success rate (500/500 albums)
- Any failed downloads are logged to `failed_downloads.txt`
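
The request side of this process can be sketched with the standard library alone. This is not the code in `download_all_covers.py`. It is an illustration under assumptions: the helper names are hypothetical, and rewriting the `100x100bb` segment of the `artworkUrl100` field to request a larger image is a widely used convention rather than documented API behavior:

```python
from typing import Optional
from urllib.parse import urlencode

ITUNES_SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(artist: str, album: str, limit: int = 5) -> str:
    """Build an iTunes Search API URL that looks for albums matching
    the combined artist and album text."""
    params = {
        "term": f"{artist} {album}",
        "media": "music",
        "entity": "album",
        "limit": limit,
    }
    return f"{ITUNES_SEARCH_URL}?{urlencode(params)}"


def artwork_url(result: dict, size: int = 600) -> Optional[str]:
    """Derive a size x size artwork URL from one search result.

    Assumption: the thumbnail URL contains "100x100bb", which can be
    swapped for the desired dimensions. Returns None if no artwork is listed.
    """
    url = result.get("artworkUrl100")
    if not url:
        return None
    return url.replace("100x100bb", f"{size}x{size}bb")


# Fetching would then look roughly like this (not executed here):
# import json, urllib.request
# with urllib.request.urlopen(build_search_url(artist, album)) as response:
#     results = json.load(response)["results"]
```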

### Website Features
- Responsive design with infinite scroll
- Search functionality across artist/album names
- Filter by ranking status (new, improved, dropped, no change)
- Sort by rank, artist, or album
- Bookmark functionality with shareable URLs
- Individual album sharing with preserved state
- Jump-to-rank navigation

### Data Comparison Logic
- Fuzzy string matching using Python's difflib
- Handles artist name variations ("The Beatles" vs "Beatles")
- Matches albums with minor title differences
- Calculates ranking improvements/drops with +/- notation

## Running the Scripts

### Python Requirements
All scripts use only the Python standard library (no external dependencies):
- `urllib` for HTTP requests
- `csv` for data processing
- `json` for API responses
- `re` for text processing
- `difflib` for fuzzy matching

### Website Deployment
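- Serve with a local HTTP server: `python -m http.server 8000`, then open `http://localhost:8000`
- A server is required because browsers block scripts from loading CSV files when the page is opened directly from disk (`file://`), due to CORS restrictions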
- No build process needed - pure HTML/CSS/JS

## Notes

- Rankings in `rolling_stone_top_500_albums_2020.csv` are stored in reverse order (500 to 1)
- Wikipedia data is clean and properly formatted (1-500)
- Website preserves filter/sort state in shareable URLs
- Cover images use rank-based filenames for easy organization