Individual voters were grouped by unique residential address to create a household-level view.
Metric: The "Likelihood Score" displayed on the map represents the average probability of all registered voters at that address voting in the next municipal election.
Rationale: Household-level aggregation serves two purposes:
It reduces the dataset size to fit within Google My Maps' 2,000-record-per-layer limit.
It provides actionable intelligence for canvassing operations—identifying high-engagement households is more efficient than tracking individual voters.
Example: A household with 3 voters scoring 95%, 88%, and 92% would display an average score of 91.7%.
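The aggregation above can be sketched with pandas; the column names (`address`, `likelihood`) are hypothetical stand-ins for whatever the voter file actually uses:

```python
# Sketch of household-level aggregation: group individual voters by
# residential address and average their likelihood scores (0-100 scale).
import pandas as pd

voters = pd.DataFrame({
    "address": ["123 Main St", "123 Main St", "123 Main St", "456 Oak Ave"],
    "likelihood": [95.0, 88.0, 92.0, 40.0],
})

households = (
    voters.groupby("address", as_index=False)
    .agg(avg_likelihood=("likelihood", "mean"),
         voter_count=("likelihood", "size"))
)
print(households)
```

For the three-voter household in the example, `avg_likelihood` comes out to 91.7 when rounded to one decimal place, matching the figure above.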
Model Architecture
We used a Random Forest classifier, an ensemble learning method that constructs multiple decision trees during training and outputs the majority vote (mode) of their predictions.
Technical Specs:
• Algorithm: Random Forest (scikit-learn 1.7.2)
• Estimators: 100 trees
• Max Depth: 5 levels
• Training/Test Split: 80/20
• Random State: 42 (for reproducibility)
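The specs above map directly onto scikit-learn's constructor arguments. A minimal sketch, using toy data in place of the real voter file:

```python
# Sketch of the classifier configuration listed above. The feature
# matrix here is synthetic; the real features are binary voting-history
# columns plus demographics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(200, 6)).astype(float)  # toy binary features
y = (X[:, 0] + X[:, 1] > 0).astype(int)              # toy target

# 80/20 train/test split, fixed random state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
clf.fit(X_train, y_train)
```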
Why Random Forest?
Non-linear Relationships: Voter behavior isn't linear. Someone who votes in every Presidential election but skips local elections has a different pattern than someone who votes in primaries. Random Forests excel at capturing these complex, non-linear patterns.
Feature Importance: The model can identify which elections are the strongest predictors. For example, voting in the 2023 local election is a much stronger signal than voting in the 2020 Presidential election.
Robustness: Random Forests are resistant to overfitting and handle missing data gracefully—critical when dealing with voters who have incomplete histories (e.g., recently registered).
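The feature-importance point can be read directly off a fitted forest via `feature_importances_`. In this sketch the target is constructed so the (hypothetical) 2023 local-election column dominates, mirroring the claim above:

```python
# Sketch of ranking elections by predictive strength. Feature names
# and data are illustrative, not the report's actual columns.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["GENERAL-2023", "GENERAL-2020"]
X = rng.integers(0, 2, size=(300, 2)).astype(float)
# Toy target driven entirely by the first column, so the 2023
# local-election feature should receive nearly all the importance.
y = X[:, 0].astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(feature_names, clf.feature_importances_),
                key=lambda t: t[1], reverse=True)
print(ranked)
```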
Training Objective
The model was trained to identify the specific behavioral patterns of voters who participate in Local/Municipal Elections (odd-numbered years). These elections have fundamentally different turnout dynamics than Presidential or Midterm elections:
Presidential Elections (2020, 2024): High turnout (~70-80% in Piqua), driven by national media coverage and "casual" voters.
Midterm Elections (2022): Moderate turnout (~50-60%), driven by engaged partisans.
Municipal Elections (2021, 2023): Low turnout (~20-30%), driven by "Super Voters" with strong local civic engagement.
By training specifically on municipal election patterns, the model learns to distinguish between these voter archetypes.
Training Data & Target
Training Window: Historical voting records from 2015–2022.
Target Variable: Did the voter participate in the November 2023 General Election (the most recent municipal election)?
Eligibility Filter: Only voters who were registered before November 7, 2023, were included in the training set. This prevents the model from "learning" that new voters don't vote—they simply weren't eligible.
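The eligibility filter can be sketched as a simple date comparison; the `registration_date` column name is an assumption about the voter-file schema:

```python
# Sketch of the eligibility filter: keep only voters registered before
# Election Day (November 7, 2023), so the model never "learns" that
# not-yet-registered voters don't vote.
import pandas as pd

voters = pd.DataFrame({
    "voter_id": [1, 2, 3],
    "registration_date": pd.to_datetime(
        ["2019-06-01", "2023-10-15", "2024-02-01"]
    ),
})

ELECTION_DAY = pd.Timestamp("2023-11-07")
eligible = voters[voters["registration_date"] < ELECTION_DAY]
```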
Feature Engineering
The model was trained on the following features:
Demographic Features:
Age: Calculated from date of birth. Older voters tend to have higher municipal turnout.
Years_Registered: How long the voter has been registered. Longer registration correlates with civic engagement.
Behavioral Features (Voting History):
Recent General Elections:
2022 General (Midterm): Strong signal for partisan engagement.
2021 General (Local): The single strongest predictor—if they voted in the last local election, they'll likely vote in the next one.
2020 General (Presidential): Weak signal—many "casual" voters participate only in Presidential years.
2019 General (Local): Secondary signal for consistent local engagement.
Primary Elections:
2022 Primary: Indicator of "Super Voter" status—only highly engaged voters participate in primaries.
2021 Primary: Same rationale as above.
Feature Representation:
Each election is encoded as a binary feature (1 = voted, 0 = did not vote).
Example: A voter who participated in 2022 General, 2021 General, and 2022 Primary would have:
• feat_GENERAL-11/08/2022 = 1
• feat_GENERAL-11/02/2021 = 1
• feat_PRIMARY-05/03/2022 = 1
• All other features = 0
Model Performance
Accuracy: 78.5% on the test set (20% of the training data held out for validation).
Precision: 75% (of voters predicted to vote, 75% actually did).
Recall: 80% (of voters who actually voted, the model correctly identified 80%).
F1-Score: 0.77 (harmonic mean of precision and recall).
This performance is strong for a behavioral prediction model, especially given the inherent randomness in voter behavior (illness, weather, last-minute schedule conflicts, etc.).
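These metrics come straight from scikit-learn's `sklearn.metrics` module. A sketch with made-up labels (the numbers below are illustrative, not the report's held-out results):

```python
# Sketch of the held-out evaluation. With 5 true voters and 5 true
# non-voters: 4 true positives, 1 false positive, 1 false negative.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

print(f"accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # 0.80
print(f"precision: {precision_score(y_true, y_pred):.2f}")  # 4/5 = 0.80
print(f"recall:    {recall_score(y_true, y_pred):.2f}")     # 4/5 = 0.80
print(f"f1:        {f1_score(y_true, y_pred):.2f}")         # 0.80
```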
The trained model was applied to the current voter file to generate a probability score (0–100%) for the November 2025 General Election.
Temporal Shift
The model learned the relationship between "Past 3 years of behavior" → "Next local election." To predict 2025, we shift the window forward:
Training Pattern:
2022 General + 2021 Local + 2020 Presidential → Predict 2023 Local
Application Pattern:
2024 General + 2023 Local + 2022 Midterm → Predict 2025 Local
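In practice the shift amounts to relabeling the current file's columns so they line up with the features the model was trained on. The column names below are hypothetical; pandas' `rename` applies the whole mapping simultaneously, so overlapping names (2022 appearing as both a key and a value) don't chain:

```python
# Sketch of the temporal shift: map each "current" election column onto
# the training-era column occupying the same slot in the pattern.
import pandas as pd

shift_map = {
    "feat_GENERAL-11/05/2024": "feat_GENERAL-11/08/2022",  # most recent general
    "feat_GENERAL-11/07/2023": "feat_GENERAL-11/02/2021",  # most recent local
    "feat_GENERAL-11/08/2022": "feat_GENERAL-11/03/2020",  # prior even-year general
}

current = pd.DataFrame({
    "feat_GENERAL-11/05/2024": [1],
    "feat_GENERAL-11/07/2023": [0],
    "feat_GENERAL-11/08/2022": [1],
})
aligned = current.rename(columns=shift_map)
```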
Handling New Voters
Voters who registered after 2023 have no 2023 Local vote on record, so that feature is simply 0. The model handles this gracefully:
If they voted in the 2024 General, the model infers they are engaged and assigns a moderate-to-high probability.
If they have no voting history, the model assigns a low probability based on their demographics (age, years registered).
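Scoring itself is a `predict_proba` call on the shifted feature matrix, with the positive-class probability rescaled to 0–100%. A sketch on toy data (the fitted model and features here are stand-ins, not the production artifacts):

```python
# Sketch of generating 0-100% likelihood scores for the current file.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_train = rng.integers(0, 2, size=(100, 4)).astype(float)
y_train = (X_train.sum(axis=1) >= 2).astype(int)  # toy turnout target
clf = RandomForestClassifier(n_estimators=100, max_depth=5,
                             random_state=42).fit(X_train, y_train)

X_current = np.array([[1.0, 1.0, 1.0, 0.0],   # engaged voter: rich history
                      [0.0, 0.0, 0.0, 0.0]])  # new voter: no history
scores = clf.predict_proba(X_current)[:, 1] * 100  # percent scale
```

As described above, the engaged voter lands at a high score while the no-history voter lands low; both are bounded between 0 and 100.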