S&P 500 SECTOR VERIFICATION REPORT

Database: if_invest @ 3.38.180.110:8088
Date: 2025-10-11
Total S&P 500 Stocks: 503


CRITICAL FINDING: SECTOR NAME MISMATCH

Current Database Values (WRONG - Yahoo Finance Categories)

The database currently contains YAHOO FINANCE SECTOR CATEGORIES, not official GICS sectors:

  1. BASIC MATERIALS (20 stocks) - Should be "Materials"
  2. COMMUNICATION SERVICES (25 stocks) - Correct GICS name
  3. CONSUMER CYCLICAL (55 stocks) - Should be "Consumer Discretionary"
  4. CONSUMER DEFENSIVE (37 stocks) - Should be "Consumer Staples"
  5. ENERGY (22 stocks) - Correct GICS name
  6. FINANCIAL (69 stocks) - Should be "Financials" (with 's')
  7. HEALTHCARE (60 stocks) - Should be "Health Care" (two words)
  8. INDUSTRIALS (71 stocks) - Correct GICS name
  9. REAL ESTATE (31 stocks) - Correct GICS name
  10. TECHNOLOGY (82 stocks) - Should be "Information Technology"
  11. UTILITIES (31 stocks) - Correct GICS name

Expected GICS Sector Names (11 Official Sectors)

According to CLAUDE.md and the GICS standard as listed on Wikipedia:

  1. Information Technology (currently "TECHNOLOGY")
  2. Health Care (currently "HEALTHCARE" - one word)
  3. Financials (currently "FINANCIAL" - missing 's')
  4. Consumer Discretionary (currently "CONSUMER CYCLICAL")
  5. Industrials ✓ (correct)
  6. Consumer Staples (currently "CONSUMER DEFENSIVE")
  7. Communication Services ✓ (correct)
  8. Energy ✓ (correct)
  9. Utilities ✓ (correct)
  10. Real Estate ✓ (correct)
  11. Materials (currently "BASIC MATERIALS")

Stock Count Per Sector (Current Database Values)

Sector Name               Stock Count
TECHNOLOGY                82
INDUSTRIALS               71
FINANCIAL                 69
HEALTHCARE                60
CONSUMER CYCLICAL         55
CONSUMER DEFENSIVE        37
UTILITIES                 31
REAL ESTATE               31
COMMUNICATION SERVICES    25
ENERGY                    22
BASIC MATERIALS           20
TOTAL                     503

Industry Breakdown (Current Database - Yahoo Finance Categories)

BASIC MATERIALS (20 stocks)

  • SPECIALTY CHEMICALS: 10
  • AGRICULTURAL INPUTS: 3
  • BUILDING MATERIALS: 2
  • STEEL: 2
  • COPPER: 1
  • GOLD: 1
  • CHEMICALS: 1

COMMUNICATION SERVICES (25 stocks)

  • ENTERTAINMENT: 10
  • TELECOM SERVICES: 5
  • ADVERTISING AGENCIES: 4
  • INTERNET CONTENT & INFORMATION: 4
  • ELECTRONIC GAMING & MULTIMEDIA: 2

CONSUMER CYCLICAL (55 stocks)

  • TRAVEL SERVICES: 6
  • PACKAGING & CONTAINERS: 6
  • RESTAURANTS: 6
  • AUTO PARTS: 5
  • SPECIALTY RETAIL: 4
  • RESIDENTIAL CONSTRUCTION: 4
  • INTERNET RETAIL: 3
  • AUTO MANUFACTURERS: 3
  • APPAREL RETAIL: 3
  • RESORTS & CASINOS: 3
  • HOME IMPROVEMENT RETAIL: 2
  • LODGING: 2
  • FOOTWEAR & ACCESSORIES: 2
  • (and 6 more sub-industries)

CONSUMER DEFENSIVE (37 stocks)

  • PACKAGED FOODS: 9
  • HOUSEHOLD & PERSONAL PRODUCTS: 7
  • DISCOUNT STORES: 5
  • BEVERAGES-NON-ALCOHOLIC: 4
  • FARM PRODUCTS: 3
  • CONFECTIONERS: 2
  • BEVERAGES-BREWERS: 2
  • TOBACCO: 2
  • (and 3 more sub-industries)

ENERGY (22 stocks)

  • OIL & GAS E&P: 10
  • OIL & GAS MIDSTREAM: 4
  • OIL & GAS EQUIPMENT & SERVICES: 3
  • OIL & GAS REFINING & MARKETING: 3
  • OIL & GAS INTEGRATED: 2

FINANCIAL (69 stocks)

  • ASSET MANAGEMENT: 12
  • FINANCIAL DATA & STOCK EXCHANGES: 9
  • BANKS-REGIONAL: 9
  • INSURANCE-PROPERTY & CASUALTY: 9
  • CREDIT SERVICES: 6
  • INSURANCE BROKERS: 6
  • CAPITAL MARKETS: 5
  • BANKS-DIVERSIFIED: 5
  • INSURANCE-LIFE: 4
  • INSURANCE-DIVERSIFIED: 3
  • INSURANCE-REINSURANCE: 1

HEALTHCARE (60 stocks)

  • DIAGNOSTICS & RESEARCH: 11
  • MEDICAL DEVICES: 10
  • MEDICAL INSTRUMENTS & SUPPLIES: 9
  • DRUG MANUFACTURERS-GENERAL: 9
  • HEALTHCARE PLANS: 7
  • BIOTECHNOLOGY: 5
  • MEDICAL DISTRIBUTION: 4
  • MEDICAL CARE FACILITIES: 3
  • DRUG MANUFACTURERS-SPECIALTY & GENERIC: 2

INDUSTRIALS (71 stocks)

  • SPECIALTY INDUSTRIAL MACHINERY: 17
  • AEROSPACE & DEFENSE: 12
  • BUILDING PRODUCTS & EQUIPMENT: 6
  • INTEGRATED FREIGHT & LOGISTICS: 5
  • RAILROADS: 4
  • INDUSTRIAL DISTRIBUTION: 3
  • AIRLINES: 3
  • ENGINEERING & CONSTRUCTION: 3
  • FARM & HEAVY CONSTRUCTION MACHINERY: 3
  • (and 10 more sub-industries)

REAL ESTATE (31 stocks)

  • REIT-RESIDENTIAL: 7
  • REIT-SPECIALTY: 7
  • REIT-RETAIL: 5
  • REIT-INDUSTRIAL: 3
  • REIT-HEALTHCARE FACILITIES: 3
  • REIT-OFFICE: 2
  • REAL ESTATE SERVICES: 2
  • REIT-HOTEL & MOTEL: 1
  • REIT-DIVERSIFIED: 1

TECHNOLOGY (82 stocks)

  • SOFTWARE-APPLICATION: 17
  • SOFTWARE-INFRASTRUCTURE: 16
  • SEMICONDUCTORS: 13
  • INFORMATION TECHNOLOGY SERVICES: 11
  • COMPUTER HARDWARE: 6
  • SCIENTIFIC & TECHNICAL INSTRUMENTS: 5
  • ELECTRONIC COMPONENTS: 4
  • SEMICONDUCTOR EQUIPMENT & MATERIALS: 4
  • COMMUNICATION EQUIPMENT: 4
  • SOLAR: 1
  • CONSUMER ELECTRONICS: 1

UTILITIES (31 stocks)

  • UTILITIES - REGULATED ELECTRIC: 23
  • UTILITIES - INDEPENDENT POWER PRODUCERS: 2
  • UTILITIES - REGULATED GAS: 2
  • UTILITIES - DIVERSIFIED: 2
  • UTILITIES - RENEWABLE: 1
  • UTILITIES - REGULATED WATER: 1

Data Quality Status

  • NULL/Empty Sectors: 0 stocks
  • NULL/Empty Industries: 0 stocks
  • Total Coverage: 100% (503/503 stocks have sector + industry data)

ROOT CAUSE ANALYSIS

Why Database Has Wrong Sector Names

The database contains Yahoo Finance sector categories instead of official GICS sectors because:

  1. Data Source: The collection script uses the yfinance library, which returns Yahoo Finance's proprietary sector classification

  2. Yahoo Finance Mapping: YF uses its own 11-sector system, whose names differ from GICS in six cases:

    • YF "Technology" → GICS "Information Technology"
    • YF "Healthcare" → GICS "Health Care" (space difference)
    • YF "Financial" → GICS "Financials" (plural)
    • YF "Consumer Cyclical" → GICS "Consumer Discretionary"
    • YF "Consumer Defensive" → GICS "Consumer Staples"
    • YF "Basic Materials" → GICS "Materials"
  3. Previous Update Script: The update_gics_from_wikipedia.py script mentioned in CLAUDE.md (2025-10-07) was supposed to fix this, but one of the following appears to have happened:

    • It was run but did not complete successfully
    • Its changes were overwritten by subsequent yfinance-based updates
    • The script exists but was never executed
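
The Yahoo Finance to GICS mapping listed above can be written as a small lookup table. This is a minimal sketch, not the project's actual code: the names YF_TO_GICS and to_gics are hypothetical, and the keys are upper-cased on the assumption that the database stores sector labels in upper case, as the counts in this report suggest.

```python
# Yahoo Finance sector label -> official GICS sector name.
# Only the six mismatched sectors from this report are listed.
YF_TO_GICS = {
    "TECHNOLOGY": "Information Technology",
    "HEALTHCARE": "Health Care",
    "FINANCIAL": "Financials",
    "CONSUMER CYCLICAL": "Consumer Discretionary",
    "CONSUMER DEFENSIVE": "Consumer Staples",
    "BASIC MATERIALS": "Materials",
}


def to_gics(yf_sector: str) -> str:
    """Return the GICS name for a Yahoo Finance label (hypothetical helper).

    Labels with no entry in the table are returned unchanged.
    """
    return YF_TO_GICS.get(yf_sector.strip().upper(), yf_sector)
```

Applying a helper like this in the collection step, before values are written, would keep later yfinance-based updates from reintroducing the Yahoo Finance names.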

REQUIRED ACTION

Immediate Fix Required

The frontend (SP500Page.tsx) filters on the official GICS sector names (Information Technology, Health Care, Financials, etc.), but the database stores Yahoo Finance names (TECHNOLOGY, HEALTHCARE, FINANCIAL, etc.).

This causes ZERO STOCKS TO DISPLAY because sector filtering fails to match.
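
The mismatch can be illustrated by comparing the two label sets directly. This is a sketch under assumptions: the report does not show how SP500Page.tsx compares strings, so both a case-sensitive and a case-insensitive comparison are modelled, and matching_labels is a hypothetical helper.

```python
# Sector labels as stored in the database (from the count table above).
DB_SECTORS = {
    "BASIC MATERIALS", "COMMUNICATION SERVICES", "CONSUMER CYCLICAL",
    "CONSUMER DEFENSIVE", "ENERGY", "FINANCIAL", "HEALTHCARE",
    "INDUSTRIALS", "REAL ESTATE", "TECHNOLOGY", "UTILITIES",
}

# Labels the frontend filters on (official GICS names).
GICS_SECTORS = [
    "Information Technology", "Health Care", "Financials",
    "Consumer Discretionary", "Industrials", "Consumer Staples",
    "Communication Services", "Energy", "Utilities", "Real Estate",
    "Materials",
]


def matching_labels(case_sensitive: bool) -> list[str]:
    """Frontend labels that match at least one stored sector value."""
    if case_sensitive:
        return [g for g in GICS_SECTORS if g in DB_SECTORS]
    return [g for g in GICS_SECTORS if g.upper() in DB_SECTORS]


print(len(matching_labels(case_sensitive=True)))   # 0
print(len(matching_labels(case_sensitive=False)))  # 5
```

Under an exact, case-sensitive comparison none of the 11 labels match, which is consistent with zero stocks being displayed. If the comparison really is case-sensitive, the five upper-case labels whose names already agree with GICS would need the same attention as the six renamed ones.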

Solution Options

Option 1: Update Database to GICS Names (Recommended)

  • Run SQL UPDATE statements to convert Yahoo Finance names to GICS names
  • Preserve industry classifications (the Yahoo Finance industry names can stay)
  • Update the sector column only, and only for the 6 mismatched sectors

Option 2: Update Frontend to Yahoo Finance Names

  • Modify SECTOR_BOXES array in SP500Page.tsx
  • Use "TECHNOLOGY" instead of "Information Technology"
  • Use "HEALTHCARE" instead of "Health Care"
  • Use "FINANCIAL" instead of "Financials"
  • Use "CONSUMER CYCLICAL" instead of "Consumer Discretionary"
  • Use "CONSUMER DEFENSIVE" instead of "Consumer Staples"
  • Use "BASIC MATERIALS" instead of "Materials"

Recommendation: Use Option 1 (update the database) to match the official GICS standard as documented in CLAUDE.md.


SQL Fix Script (Option 1)

-- Update Yahoo Finance sector names to official GICS names
UPDATE invest_product_code
SET sector = 'Information Technology'
WHERE sector = 'TECHNOLOGY' AND is_sp500 = 'Y';

UPDATE invest_product_code
SET sector = 'Health Care'
WHERE sector = 'HEALTHCARE' AND is_sp500 = 'Y';

UPDATE invest_product_code
SET sector = 'Financials'
WHERE sector = 'FINANCIAL' AND is_sp500 = 'Y';

UPDATE invest_product_code
SET sector = 'Consumer Discretionary'
WHERE sector = 'CONSUMER CYCLICAL' AND is_sp500 = 'Y';

UPDATE invest_product_code
SET sector = 'Consumer Staples'
WHERE sector = 'CONSUMER DEFENSIVE' AND is_sp500 = 'Y';

UPDATE invest_product_code
SET sector = 'Materials'
WHERE sector = 'BASIC MATERIALS' AND is_sp500 = 'Y';

-- Verify changes
SELECT sector, COUNT(*) as stock_count
FROM invest_product_code
WHERE is_sp500 = 'Y'
GROUP BY sector
ORDER BY stock_count DESC;

This will affect 323 stocks across 6 sectors (82 + 60 + 69 + 55 + 37 + 20) while leaving the 180 stocks in the 5 correct sectors unchanged (Communication Services, Energy, Industrials, Real Estate, Utilities).
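
The number of rows the UPDATE statements should touch can be cross-checked before running them against the real database. The sketch below rehearses the six renames on an in-memory SQLite table filled with synthetic rows that match the per-sector counts in this report. SQLite is used only for the rehearsal (the production database engine is not stated here), the table is reduced to the two columns the script references, and rehearse is a hypothetical helper.

```python
import sqlite3

# Per-sector stock counts as reported in this document.
COUNTS = {
    "TECHNOLOGY": 82, "INDUSTRIALS": 71, "FINANCIAL": 69, "HEALTHCARE": 60,
    "CONSUMER CYCLICAL": 55, "CONSUMER DEFENSIVE": 37, "UTILITIES": 31,
    "REAL ESTATE": 31, "COMMUNICATION SERVICES": 25, "ENERGY": 22,
    "BASIC MATERIALS": 20,
}

# Old (Yahoo Finance) name -> new (GICS) name, as in the SQL fix script.
RENAMES = {
    "TECHNOLOGY": "Information Technology",
    "HEALTHCARE": "Health Care",
    "FINANCIAL": "Financials",
    "CONSUMER CYCLICAL": "Consumer Discretionary",
    "CONSUMER DEFENSIVE": "Consumer Staples",
    "BASIC MATERIALS": "Materials",
}


def rehearse() -> tuple[int, int]:
    """Apply the renames to synthetic rows; return (rows changed, total rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invest_product_code (sector TEXT, is_sp500 TEXT)")
    rows = [(sector, "Y") for sector, n in COUNTS.items() for _ in range(n)]
    conn.executemany("INSERT INTO invest_product_code VALUES (?, ?)", rows)

    changed = 0
    for old, new in RENAMES.items():
        cur = conn.execute(
            "UPDATE invest_product_code SET sector = ? "
            "WHERE sector = ? AND is_sp500 = 'Y'",
            (new, old),
        )
        changed += cur.rowcount  # rows modified by this UPDATE

    total = conn.execute("SELECT COUNT(*) FROM invest_product_code").fetchone()[0]
    conn.close()
    return changed, total


changed, total = rehearse()
print(changed, total - changed, total)  # 323 180 503
```

Comparing the rehearsed count with the row counts the real UPDATE statements report gives a quick signal if the live data has drifted from the figures in this report.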


End of Report