123
2026 MCM
Problem C: Data With The Stars
Dancing with the Stars (DWTS) is the American version of an international television franchise based on the British show “Strictly Come Dancing” (“Come Dancing” originally). Versions of the show have appeared in Albania, Argentina, Australia, China, France, India, and many other countries. The U.S. version, the focus of this problem, has completed 34 seasons.
Celebrities are partnered with professional dancers and then perform dances each week. A panel of expert judges scores each couple’s dance, and fans vote (by phone or online) for their favorite couple that week. Fans can vote once or multiple times up to a limit announced each week. Further, fans vote for the star they wish to keep, but cannot vote to eliminate a star. The judge and fan votes are combined in order to determine which couple to eliminate (the lowest combined score) that week. Three (in some seasons more) couples reach the finals and in the week of the finals the combined scores from fans and judges are used to rank them from 1st to 3rd (or 4th, 5th).
There are many possible methods of combining fan votes and judge scores. In the first two seasons of the U.S. show, the combination was based on ranks. Season 2 concerns (due to celebrity contestant Jerry Rice who was a finalist despite very low judge scores) led to a modification to use percentages instead of ranks. Examples of these two approaches are provided in the Appendix.
In season 27, another “controversy” occurred when celebrity contestant Bobby Bones won despite consistently low judges scores. In response, starting in season 28 a slight modification to the elimination process was made. The bottom two contestants were identified using the combined judge scores and fan votes, and then during the live show the judges voted to select which of these two to eliminate. Around this same season, the producers also returned to using the method of ranks to combine judges scores with fan votes as in seasons one and two. The exact season this change occurred is not known, but it is reasonable to assume it was season 28.
Judge scores are meant to reflect which dancers are technically better, although there is some subjectivity in what makes a dance better. Fan votes are likely much more subjective, influenced by the quality of the dance, but also the popularity and charisma of the celebrity. Show producers might actually prefer, to some extent, conflicts in opinions and votes as such occurrences boost fan interest and excitement.
Data with judges scores and contestant information is provided and described below. You may choose to include additional information or other data at your discretion, but you must completely document the sources. Use the data to:
- Develop a mathematical model (or models) to produce estimated fan votes (which are unknown and a closely guarded secret) for each contestant for the weeks they competed.
  - Does your model correctly estimate fan votes that lead to results consistent with who was eliminated each week? Provide measures of the consistency.
  - How much certainty is there in the fan vote totals you produced, and is that certainty always the same for each contestant/week? Provide measures of your certainty for the estimates.
- Use your fan vote estimates with the rest of the data to:
  - Compare and contrast the results produced by the two approaches used by the show to combine judge and fan votes (i.e. rank and percentage) across seasons (i.e. apply both approaches to each season). If differences in outcomes exist, does one method seem to favor fan votes more than the other?
  - Examine the two voting methods applied to specific celebrities where there was “controversy”, meaning differences between judges and fans. Would the choice of method to combine judge scores and fan votes have led to the same result for each of these contestants? How would including the additional approach of having judges choose which of the bottom two couples to eliminate each week impact the results? Some examples you might consider (there may also be others you identified):
    - Season 2 – Jerry Rice, runner up despite the lowest judges scores in 5 weeks.
    - Season 4 – Billy Ray Cyrus was 5th despite last place judge scores in 6 weeks.
    - Season 11 – Bristol Palin was 3rd with the lowest judge scores 12 times.
    - Season 27 – Bobby Bones won despite consistently low judges scores.
  - Based on your analysis, which of the two methods would you recommend using for future seasons and why? Would you suggest including the additional approach of judges choosing from the bottom two couples?
- Use the data including your fan vote estimates to develop a model that analyzes the impact of various pro dancers as well as characteristics for the celebrities available in the data (age, industry, etc). How much do such things impact how well a celebrity will do in the competition? Do they impact judges scores and fan votes in the same way?
- Propose another system using fan votes and judge scores each week that you believe is more “fair” (or “better” in some other way such as making the show more exciting for the fans). Provide support for why your approach should be adopted by the show producers.
- Produce a report of no more than 25 pages with your findings and include a one- to two-page memo summarizing your results with advice for producers of DWTS on the impact of how judge and fan votes are combined with recommendations for how to do so in future seasons.
Data File: 2026_MCM_Problem_C_Data.csv – contestant information, results, and judges scores by week for seasons 1 – 34. The data description is provided in Table 1.
Table 1: Data Description for 2026_MCM_Problem_C_Data.csv
| Variables | Explanation | Example |
|---|---|---|
| celebrity_name | Name of celebrity contestant (Star) | Jerry Rice, Mark Cuban, … |
| ballroom_partner | Name of professional dancer partner | Cheryl Burke, Derek Hough, … |
| celebrity_industry | Star profession category | Athlete, Model, … |
| celebrity_homestate | Star home state (if from U.S.) | Ohio, Maine, … |
| celebrity_homecountry/region | Star home country/region | United States, England, … |
| celebrity_age_during_season | Age of the star in the season | 32, 29, … |
| season | Season of the show | 1, 2, 3, …, 32 |
| results | Season results for the star | 1st Place, Eliminated Week 2, … |
| placement | Final place for the season (1 best) | 1, 2, 3, … |
| weekX_judgeY_score | Score from judge Y in week X | 1, 2, 3, … |
Notes on the data:
- Judges scores for each dance are from 1 (low) to 10 (high).
  a. In some weeks the score reported includes a decimal (e.g. 8.5) because each celebrity performed more than one dance and the scores from each are averaged.
  b. In some weeks, bonus points were awarded (dance offs etc); they are spread evenly across judge/dance scores.
  c. Team dance scores were averaged with scores for each individual team member.
- Judges are listed in the order they scored dances; thus “Judge Y” may not be the same judge from week to week, or season to season.
- The number of celebrities is not the same across the seasons, nor is the number of weeks the show ran.
- Season 15 was the only season to feature an all-star cast of returning celebrities.
- There are occasionally weeks when no celebrity was eliminated, and others where more than one was eliminated.
- N/A values occur in the data set for
  a. the 4th judge score if there is no 4th judge for that week (usually there are 3) and
  b. in weeks that the show did not run in a season (for example, season 1 lasted 6 weeks so N/A values are recorded for weeks 7 thru 11).
- A 0 score is recorded for celebrities who are eliminated. For example, in Season 1 the first celebrity eliminated was Trista Sutter at the end of the Week 2 show. She thus has scores of 0 for the rest of the season (week 3 through week 6).
Appendix: Examples of Voting Schemes
1. COMBINED BY RANK (used in seasons 1, 2, and 28a - 34)
In seasons 1 and 2 judges and fan votes were combined by rank. For example, in season 1, week 4 there were four remaining contestants. Rachel Hunter was eliminated meaning she received the lowest combined rank. In Table 2 the judges scores and ranks are shown, and we created one possible set of fan votes that would produce the correct result. There are many possible values for fan votes that would also give the same results. You should not use these as actual values as this is just one example. Since Rachel was ranked 2nd by judges, in order to finish with the lowest combined score, she has the lowest fan vote (4th place) for a total rank of 6.
Table 2: Example of Combining Judge and Fan Votes by Rank (Season 1, Week 4)
| Contestant | Total Judges Score | Judges Score Rank | Fan Vote* | Fan Rank* | Sum of ranks |
|---|---|---|---|---|---|
| Rachel Hunter | 25 | 2 | 1.1 million | 4 | 6 |
| Joey McIntyre | 20 | 4 | 3.7 million | 1 | 5 |
| John O’Hurley | 21 | 3 | 3.2 million | 2 | 5 |
| Kelly Monaco | 26 | 1 | 2 million | 3 | 4 |
*Fan vote/rank are unknown; these are hypothetical values chosen to produce the correct final ranks.
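The rank rule illustrated in Table 2 can be replayed in a few lines. This is a minimal sketch using the table's hypothetical fan votes (not real data); `rank_desc` is an illustrative helper and assumes there are no ties, as is the case in this table.

```python
# Judges' totals and the hypothetical fan votes (in millions) from Table 2.
judges = {"Rachel Hunter": 25, "Joey McIntyre": 20, "John O'Hurley": 21, "Kelly Monaco": 26}
fans = {"Rachel Hunter": 1.1, "Joey McIntyre": 3.7, "John O'Hurley": 3.2, "Kelly Monaco": 2.0}


def rank_desc(values):
    """Rank 1 goes to the highest value (assumes no ties, as in Table 2)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: pos + 1 for pos, name in enumerate(ordered)}


judge_rank = rank_desc(judges)
fan_rank = rank_desc(fans)
rank_sum = {name: judge_rank[name] + fan_rank[name] for name in judges}

# Under the rank rule, the couple with the largest rank sum is eliminated.
eliminated = max(rank_sum, key=rank_sum.get)
print(rank_sum)
print("eliminated:", eliminated)
```

With real data, ties in the judges' totals are common, so a tie policy (average or dense ranks) would have to be chosen and applied consistently.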
2. COMBINED BY PERCENT (used for season 3 through 27a)
Starting in season 3 scores were combined using percents instead of ranks. An example is shown using week 9 of season 5. In that week, Jennie Garth was eliminated. Again, we artificially created fan votes that produce total percents to correctly lead to that result. The judges’ percent is computed by dividing the total judge score for the contestant by the sum of total judge scores for all 4 contestants. Based on the judges’ percent, Jennie was 3rd. However, adding the percent of the 10 million artificially created fan votes we assigned to the judges’ percent she was 4th.
Table 3: Example of Combining Judge and Fan Votes by Percent (Season 5, Week 9)
| Contestant | Total Judges Score | Judges Score Percent | Fan Vote* | Fan Percent* | Sum of Percents |
|---|---|---|---|---|---|
| Jennie Garth | 29 | 29/117 = 24.8% | 1.1 million | 1.1/10 = 11% | 35.8 |
| Marie Osmond | 28 | 28/117 = 23.9% | 3.7 million | 3.7/10 = 37% | 60.9 |
| Mel B | 30 | 30/117 = 25.6% | 3.2 million | 3.2/10 = 32% | 57.6 |
| Helio Castroneves | 30 | 30/117 = 25.6% | 2 million | 2/10 = 20% | 45.6 |
| Total | 117 | | 10 million | | |
*Fan vote is unknown; the values are hypothetical, chosen to produce the correct final standings.
a The year of the return to the rank-based method is not known for certain; season 28 is a reasonable assumption.
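The percent rule can be checked the same way. The sketch below recomputes the combined percentages from the judges' totals and the hypothetical fan votes of Table 3; the variable names are illustrative only.

```python
# Judges' totals and the hypothetical fan votes (in millions) from Table 3.
judges = {"Jennie Garth": 29, "Marie Osmond": 28, "Mel B": 30, "Helio Castroneves": 30}
fans = {"Jennie Garth": 1.1, "Marie Osmond": 3.7, "Mel B": 3.2, "Helio Castroneves": 2.0}

judge_total = sum(judges.values())  # 117 points
fan_total = sum(fans.values())      # 10 million votes

# Combined score = judges' percent + fan percent.
combined = {
    name: 100 * judges[name] / judge_total + 100 * fans[name] / fan_total
    for name in judges
}

# Under the percent rule, the couple with the smallest combined percent is eliminated.
eliminated = min(combined, key=combined.get)
for name, score in combined.items():
    print(f"{name}: {score:.1f}")
print("eliminated:", eliminated)
```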
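Generate runnable Python code (PyTorch autograd is recommended) for Question 1 of 2026 MCM Problem C: from the judges' scores and the weekly elimination record, infer each contestant's weekly fan vote share p_{i,t}, and report consistency and uncertainty measures. The implementation must follow the mathematical model below.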
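【Input data】
- Read 2026_MCM_Problem_C_Data.csv.
- Key columns: season, celebrity_name, placement, results, weekX_judgeY_score (X = 1..maxWeek, Y = 1..4).
- Scoring rules: each judge's score is 1-10; N/A means that judge slot was not used; once a contestant is eliminated, their scores for all later weeks are recorded as 0 (the last item in the problem's Notes on the data).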
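【Preprocessing: judges' totals and elimination sets per season and week】
For each season s:
- Compute the total score J_{i,t} = sum_y weekt_judgey_score (ignoring NaN).
- Active indicator m_{i,t} = 1[J_{i,t} > 0]; the active set in week t is A_t = {i: m_{i,t} = 1}.
- Last nonzero week of each contestant: L_i = max{t: J_{i,t} > 0}.
- Final week of the season: F = max_i L_i.
- Elimination weeks: for each week t < F, the elimination set is E_t = {i: L_i = t}. If E_t is empty, week t has no elimination.
- Final week t = F: use placement to constrain the final ranking (1 is best).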
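The preprocessing steps above can be sketched on a toy score array. The data here are invented for illustration and the array layout (contestant × week × judge slot) is an assumption; a real run would build the same array from the weekX_judgeY_score columns.

```python
import numpy as np

# Toy season: 4 contestants x 3 weeks x 4 judge slots (NaN = slot not used).
scores = np.full((4, 3, 4), np.nan)
scores[:, :, :3] = [
    [[7, 7, 8], [8, 8, 8], [9, 9, 9]],  # reaches the final
    [[6, 6, 6], [7, 7, 6], [8, 8, 8]],  # reaches the final
    [[5, 6, 5], [6, 5, 5], [0, 0, 0]],  # eliminated after week 2
    [[4, 5, 4], [0, 0, 0], [0, 0, 0]],  # eliminated after week 1
]

J = np.nansum(scores, axis=2)            # J[i, t]; all-NaN weeks sum to 0
active = J > 0                           # m[i, t]
weeks = np.arange(1, J.shape[1] + 1)     # 1-based week numbers
last_week = (active * weeks).max(axis=1) # L_i
F = int(last_week.max())                 # final week of the season

# E_t for every week before the final; an empty list means no elimination.
elim = {int(t): np.flatnonzero(last_week == t).tolist() for t in weeks if t < F}
print(J, last_week, F, elim, sep="\n")
```

Because `np.nansum` returns 0 for an all-NaN slice, weeks in which the season did not run drop out of the active mask automatically.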
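【Combination rule by season (per the problem's Appendix)】
- Percent combination: seasons 3..27.
- Rank combination: seasons 1 and 2, plus seasons 28..34 by assumption (the problem states that it is reasonable to assume the return to ranks began in season 28).
  (Optionally keep a parameter allow_bottom2=True that relaxes the constraint for season >= 28 to reflect the bottom-two judges' save: the eliminated contestant only needs to be in the bottom two.)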
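【Variables and transforms】
- For each season: N contestants and F weeks.
- Unknowns: a logit y_{i,t} ∈ R for every active contestant in every week; a baseline popularity a_i for every contestant; a parameter β (how sensitive fans are to the judges' scores).
- Fan vote share: for each week t, p_{i,t} = exp(y_{i,t}) / sum_{k in A_t} exp(y_{k,t}), with p = 0 for inactive contestants (implemented with a mask).
- Judges' score feature: for each week t, z_{i,t} = (J_{i,t} - mean(J_{A_t,t})) / std(J_{A_t,t}) (set to 0 if std = 0).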
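A minimal NumPy sketch of the two transforms for a single week. The helper names `week_zscore` and `masked_softmax` are hypothetical, and the population standard deviation is an assumption since the text does not say which estimator to use.

```python
import numpy as np


def week_zscore(J_t, active_t):
    """Standardise judges' totals over active contestants; 0 if std is 0."""
    z = np.zeros_like(J_t, dtype=float)
    vals = J_t[active_t]
    sd = vals.std()  # population std (assumption)
    if sd > 0:
        z[active_t] = (vals - vals.mean()) / sd
    return z


def masked_softmax(y_t, active_t):
    """Softmax over active contestants only; inactive shares are exactly 0."""
    logits = np.where(active_t, y_t, -np.inf)
    logits = logits - logits[active_t].max()  # stabilise before exponentiating
    e = np.exp(logits)                        # exp(-inf) = 0 for inactive rows
    return e / e.sum()


J_t = np.array([24.0, 21.0, 27.0, 0.0])  # last contestant already eliminated
active_t = J_t > 0
z = week_zscore(J_t, active_t)
p = masked_softmax(0.5 * z, active_t)    # beta = 0.5, a_i = 0, no noise
print(z, p)
```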
【Prior / error model (energy terms)】
- y_{i,t} = a_i + β z_{i,t} + ε_{i,t}, ε_{i,t} ~ N(0, σ^2)
- a_i ~ N(0, σ_a^2)
- β ~ N(β0, σ_β^2)
Corresponding energy:
E_prior = sum_{t, i in A_t} (y_{i,t} - a_i - β z_{i,t})^2 / (2σ^2) + sum_i a_i^2 / (2σ_a^2) + (β - β0)^2 / (2σ_β^2)
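【Show-rule constraints: penalty energy】
Hyperparameters: margin δ > 0, strengths λ and λF.
- Percent seasons (3..27):
  - q_{i,t} = J_{i,t} / sum_{k in A_t} J_{k,t}
  - c_{i,t} = q_{i,t} + p_{i,t}
  - Elimination week t < F with E_t nonempty: E_t must hold the |E_t| smallest c:
    Φ_t_pct = ReLU( max_{e in E_t} c_{e,t} - min_{j in A_t\E_t} c_{j,t} + δ )^2
  - Final week t = F: c must be ordered according to placement (1 is best; higher c is better):
    Φ_F_pct_final = sum_{i,j in A_F, place(i)<place(j)} ReLU( c_{j,F} - c_{i,F} + δ )^2
- Rank seasons (1, 2, 28..34):
  - Judges' rank r^J_{i,t} = rank_desc(J_{i,t}), 1 is best (ties may use average or dense ranks, as long as the choice is applied consistently).
  - The fan rank uses a smoothed rank:
    r^F~_{i,t}(τ) = 1 + sum_{j in A_t, j≠i} sigmoid( (y_{j,t} - y_{i,t}) / τ )
    where τ is the smoothing temperature.
  - Combined rank sum: s_{i,t} = r^J_{i,t} + r^F~_{i,t}(τ)
  - Elimination week t < F with E_t nonempty:
    Standard (non-bottom-two) constraint: E_t holds the |E_t| largest s:
    Φ_t_rank = ReLU( max_{j in A_t\E_t} s_{j,t} - min_{e in E_t} s_{e,t} + δ )^2
    If allow_bottom2=True, season >= 28 and |E_t| = 1:
    only require the eliminated contestant to be in the bottom two: let s^(2) be the second-largest s; the eliminated contestant e must satisfy s_e >= s^(2):
    Φ_t_rank_bottom2 = ReLU( s^(2) - s_{e,t} + δ )^2
  - Final week t = F: a better placement (1 is best) corresponds to a smaller s:
    Φ_F_rank_final = sum_{i,j in A_F, place(i)<place(j)} ReLU( s_{i,F} - s_{j,F} + δ )^2
Total energy:
E = E_prior + λ * sum_{t<F} Φ_t + λF * Φ_F_final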
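As an illustration of the squared-hinge penalty for a percent-rule week, here is a NumPy sketch. `percent_penalty` is a hypothetical helper, and the example numbers are the hypothetical values from Table 3; in a PyTorch implementation the same expression would be written with tensor operations so that autograd can differentiate it.

```python
import numpy as np


def percent_penalty(J_t, p_t, elim_idx, active_t, delta=1e-3):
    """Squared hinge: every eliminated c must lie below every surviving c by delta."""
    q = np.where(active_t, J_t, 0.0) / J_t[active_t].sum()  # judges' percent
    c = q + p_t                                             # combined score
    keep = active_t.copy()
    keep[elim_idx] = False                                  # survivors A_t \ E_t
    gap = c[elim_idx].max() - c[keep].min() + delta
    return max(gap, 0.0) ** 2


J_t = np.array([29.0, 28.0, 30.0, 30.0])
active_t = J_t > 0
consistent = np.array([0.11, 0.37, 0.32, 0.20])    # eliminated contestant has lowest c
inconsistent = np.array([0.37, 0.11, 0.32, 0.20])  # eliminated contestant has highest c

print(percent_penalty(J_t, consistent, [0], active_t))
print(percent_penalty(J_t, inconsistent, [0], active_t))
```

The penalty is zero only when the observed elimination is reproduced with at least the margin δ, which is what makes it usable as a soft constraint inside the energy.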
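【Sampling: overdamped Langevin / ULA】
- Target distribution: π ∝ exp(-E).
- Iteration: θ_{k+1} = θ_k - eps * grad E(θ_k) + sqrt(2 * eps) * ξ_k, with ξ_k ~ N(0, I)
  θ contains y, a and β (β may be a raw parameter passed through softplus to keep it positive, or simply an unconstrained real).
- Needed: burn-in and thinning; multiple chains (at least 2) are optional.
- PyTorch is strongly recommended: define y, a, β as torch tensors with requires_grad=True and obtain the gradient with autograd.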
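The update rule can be sketched independently of the DWTS energy. The example below uses NumPy with a hand-written gradient on a toy target (a standard normal) instead of PyTorch autograd; this is a simplification for illustration, and the function name `ula` and its defaults are assumptions.

```python
import numpy as np


def ula(grad_energy, theta0, eps=1e-3, steps=5000, burnin=2000, thin=10, seed=0):
    """Unadjusted Langevin: theta <- theta - eps * grad E + sqrt(2 * eps) * noise."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    samples = []
    for k in range(steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - eps * grad_energy(theta) + np.sqrt(2 * eps) * noise
        # Keep every `thin`-th state once burn-in is over.
        if k >= burnin and (k - burnin) % thin == 0:
            samples.append(theta.copy())
    return np.array(samples)


# Toy target: E(theta) = |theta|^2 / 2, i.e. a standard normal, so grad E = theta.
draws = ula(lambda th: th, theta0=np.zeros(3), eps=1e-2, steps=20000)
print(draws.shape, draws.mean(), draws.var())
```

ULA has no accept/reject step, so its stationary distribution is only approximately π; a smaller or decaying eps reduces that bias at the cost of slower mixing.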
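【Suggested hyperparameters (expose them as tunable settings)】
- σ = 1.0, σ_a = 1.0, β0 = 0.5, σ_β = 1.0
- δ = 1e-3
- τ = 0.1 (rank smoothing)
- λ = 50 to 200 (increase it if the results are inconsistent), λF of the same order
- eps = 1e-3 (may decay), steps = 5000, burnin = 2000, thinning = 10
- Fix the random seed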
【Output: vote estimates + uncertainty + consistency】
For each season s, week t and active contestant i, output:
- p_mean = posterior mean of p_{i,t}
- p_ci_low, p_ci_high (2.5% and 97.5% quantiles)
- p_sd
- rel_unc = (p_ci_high - p_ci_low) / (p_mean + 1e-12)
Optional: vote_mean = p_mean * 1e6 (scaling by a fixed total vote count)
Consistency metrics:
- For each elimination week t:
  - Deterministic replay: use p_mean to compute the elimination set E_hat_t under the show's rule
    - percent: take the |E_t| contestants with the smallest c
    - rank: take the |E_t| contestants with the largest s (or, for season >= 28 with bottom two, check whether the actual eliminated contestant is in the bottom two)
  - Record match = 1/0
- overall_acc = mean(match over elimination weeks)
- margin_t:
  percent: min_keep(c) - max_elim(c)
  rank: min_elim(s) - max_keep(s)
- posterior_consistency_prob PC_t = fraction of samples where the penalty Φ_t == 0 (or margin > 0)
Save:
- A long-format CSV: season, week, celebrity_name, p_mean, p_ci_low, p_ci_high, rel_unc, ...
- A season-level summary CSV: acc, mean_PC, mean_margin, etc.
- Optional plots: per-season/global rel_unc distribution, margin distribution, acc bar chart.
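The per-contestant summary statistics listed above can be computed directly from an array of sampled shares. In this sketch the Dirichlet draws are only a stand-in for real sampler output, and `summarize_shares` is a hypothetical helper.

```python
import numpy as np


def summarize_shares(p_samples):
    """p_samples: (n_samples, n_contestants) sampled vote shares for one week."""
    p_mean = p_samples.mean(axis=0)
    lo, hi = np.quantile(p_samples, [0.025, 0.975], axis=0)
    return {
        "p_mean": p_mean,
        "p_sd": p_samples.std(axis=0, ddof=1),
        "p_ci_low": lo,
        "p_ci_high": hi,
        "rel_unc": (hi - lo) / (p_mean + 1e-12),
    }


rng = np.random.default_rng(0)
fake = rng.dirichlet([8.0, 4.0, 2.0], size=2000)  # stand-in for posterior samples
summary = summarize_shares(fake)
for key, value in summary.items():
    print(key, np.round(value, 3))
```

Because rel_unc divides the interval width by the mean, contestants with small shares tend to show larger relative uncertainty even when their absolute interval is narrow.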
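【Engineering requirements】
- Clear code structure: preprocess -> build tensors -> energy -> langevin sampler -> summarize -> export
- Must handle: N/A judge scores, weeks with no elimination, double-elimination weeks, seasons of different lengths, and the mask for 0 scores after elimination.
- Write the code in a Notebook. After writing each cell, run it until it passes, then use its output to decide whether to revise existing cells or add new ones.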
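Implement a Bayesian model in Python on the DWTS data (file: 2026_MCM_Problem_C_Data.csv, with fields season, celebrity_name, results, placement and multiple weekX_judgeY_score columns) to estimate each contestant's weekly fan votes (at least the vote share, and preferably also vote counts under a fixed total-vote scale), and output weekly elimination consistency and uncertainty.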
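Key points about the data and the problem (from the problem statement):
- Judges score each dance from 1 to 10, possibly with decimals; in some weeks bonus points are already spread into the scores.
- The number of judges per week is 3 or 4; N/A means that judge slot was not used.
- After a contestant is eliminated, their scores for later weeks are recorded as 0 (important: this is how the elimination week is inferred).
- Combination rules:
  - Seasons 1-2: combined by rank (judge rank + fan rank; the largest rank sum is eliminated)
  - Seasons 3-27: combined by percent (judge percent + fan percent; the smallest combined value is eliminated)
  - Seasons 28-34: assumed to be combined by rank, with bottom-two + judges' save (identify the bottom two, then the judges eliminate one of them)
  (If the judges' save is not modelled, at least implement the rank combination.)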
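A. Preprocessing
- Read the CSV into a pandas DataFrame. For all weekX_judgeY_score columns:
  - convert N/A or empty values to NaN
  - cast to float
- For each season s:
  - Find all contestants in the season (one row per contestant).
  - Determine the season's number of weeks T_s: parse the week indices from the week1..weekK column names and take the largest week in which at least one contestant of that season has a score > 0 (weeks the season did not run are N/A, so the largest index in the column names can overstate the season length).
- Compute each contestant's weekly judges' total:
  \[
  J_{i,t}=\sum_k \text{score}_{i,t,k} \quad \text{(ignoring NaN)}
  \]
  giving the matrix J[i,t].
- Define the set of active contestants in each week:
  A_t = {i | J[i,t] > 0}
- Infer each contestant's last week with J > 0:
  t_last[i] = max {t | J[i,t] > 0}
  If t_last[i] < T_s, the contestant was eliminated at the end of week t_last[i].
- Build the weekly elimination sets:
  E_t = {i | t_last[i] == t and t < T_s}
  The size may be 0 (no elimination), 1 (single elimination) or >= 2 (multiple elimination).
  Only weeks with |E_t| >= 1 enter the likelihood; weeks with no elimination are skipped (but vote estimates must still be output for them).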
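B. Model (fitted season by season)
Fit each season s separately, with parameters:
- baseline popularity of each contestant, theta[i] (real)
- performance effect alpha (real; a prior leaning positive is suggested, but a plain normal prior is acceptable)
- elimination softness lambda (positive)
- if season >= 28 and the judges' save is enabled: judges' save strength kappa (positive)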
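B1. Generating vote shares (logit-softmax)
For each week t, first standardise the judges' totals:
\[
x_{i,t}=\frac{J_{i,t}-\operatorname{mean}(J_{\cdot,t})}{\operatorname{sd}(J_{\cdot,t})+\epsilon}
\]
(mean and sd are computed over the active set only; epsilon = 1e-6 guards against division by zero)
Utility:
\[
u_{i,t}=\theta_i+\alpha x_{i,t}
\]
Vote share:
\[
p_{i,t}=\frac{\exp(u_{i,t})}{\sum_{j\in A_t}\exp(u_{j,t})}
\]
(Note: compute the softmax with log-sum-exp for numerical stability.)
When outputting vote counts, the total can be fixed at M_t = 10_000_000:
V[i,t] = M_t * p[i,t].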
B2. Likelihood: seasons 3-27 (percent combination)
Judges' percent:
\[
q_{i,t}=J_{i,t}/\sum_{j\in A_t}J_{j,t}
\]
Combined score:
\[
C_{i,t}=q_{i,t}+p_{i,t}
\]
Single-elimination soft likelihood:
\[
P(E=i)=\frac{\exp(-\lambda C_{i,t})}{\sum_{j\in A_t}\exp(-\lambda C_{j,t})}
\]
loglik += log P(E=e)
Multiple elimination (e.g. E={e1,e2}) uses a symmetrised draw without replacement:
P({e1,e2}) = P(e1)P(e2|not e1) + P(e2)P(e1|not e2)
loglik += log P({e1,e2})
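The single- and double-elimination likelihoods can be sketched as follows. The helper names are hypothetical, the combined scores are made-up illustrative numbers, and P(e2 | not e1) is taken to mean the same soft-min distribution renormalised over the remaining contestants, which is one reading of "without replacement".

```python
import numpy as np


def log_softmax(v):
    """Numerically stable log-softmax."""
    v = v - v.max()
    return v - np.log(np.exp(v).sum())


def loglik_single(C, e, lam):
    """log P(E = e) with P proportional to exp(-lam * C)."""
    return log_softmax(-lam * C)[e]


def loglik_pair(C, e1, e2, lam):
    """log[ P(e1) P(e2 | not e1) + P(e2) P(e1 | not e2) ]."""
    def ordered(first, second):
        rest = np.delete(np.arange(len(C)), first)   # contestants left after `first`
        pos = int(np.flatnonzero(rest == second)[0])
        return loglik_single(C, first, lam) + log_softmax(-lam * C[rest])[pos]
    return np.logaddexp(ordered(e1, e2), ordered(e2, e1))


C = np.array([0.36, 0.61, 0.58, 0.46])  # illustrative combined scores
lam = 20.0
print(np.exp(loglik_single(C, 0, lam)))  # most likely single elimination
print(np.exp(loglik_pair(C, 0, 3, lam)))  # most likely double elimination
```

Summing the pair probability over all unordered pairs gives 1, which is a quick sanity check that the symmetrised formula defines a proper distribution.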
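B3. Likelihood: seasons 1-2 and 28-34 (rank combination)
Judges' rank: rJ[i] = rank(-J[i,t]), 1 = highest score; ties use the average rank or the dense rank.
Fan rank: rF[i] = rank(-p[i,t]) (equivalently rank(-u))
Combined:
\[
S_{i,t}=rJ_{i,t}+rF_{i,t}
\]
Single-elimination soft likelihood (the contestant with the largest S is eliminated):
\[
P(E=i)=\frac{\exp(\lambda S_{i,t})}{\sum_{j\in A_t}\exp(\lambda S_{j,t})}
\]
Multiple elimination uses the same symmetrised without-replacement formula as above.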
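B4. Season >= 28: optional judges' save extension (recommended)
Steps:
- First obtain S[i,t] from the rank combination and take the bottom two: the two contestants with the largest S (ties can be broken with jitter).
- Let bottom2 = {a,b}.
- Within the bottom two, the judges tend to eliminate the contestant with the lower judges' score, modelled with a logistic:
  \[
  P(\text{elim}=a \mid \text{bottom2})=\sigma\big(\kappa\,(J_{b,t}-J_{a,t})\big)
  \]
If contestant e is observed to be eliminated:
- if e == a: loglik += log P(elim=a)
- if e == b: loglik += log(1 - P(elim=a))
- if e is not in bottom2: assign a tiny probability epsilon (e.g. 1e-9) to avoid -inf, and record the week as one the model cannot explain.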
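A sketch of the judges' save likelihood for one week. `judges_save_loglik` is a hypothetical helper; ties in S are broken by index order here rather than by jitter, which is a simplifying assumption, and the example numbers are invented.

```python
import math


def judges_save_loglik(S, J, e, kappa, eps=1e-9):
    """Return (log-likelihood of eliminating e, whether e was in the bottom two)."""
    # Bottom two = the two largest combined rank sums (ties broken by index order).
    order = sorted(range(len(S)), key=lambda i: S[i], reverse=True)
    a, b = order[0], order[1]
    if e not in (a, b):
        return math.log(eps), False  # week the model cannot explain
    # P(elim = a | bottom2) = sigmoid(kappa * (J[b] - J[a]))
    p_a = 1.0 / (1.0 + math.exp(-kappa * (J[b] - J[a])))
    return math.log(p_a if e == a else 1.0 - p_a), True


S = [3.0, 5.0, 7.0, 6.0]      # combined rank sums; contestants 2 and 3 are the bottom two
J = [27.0, 24.0, 22.0, 19.0]  # judges' totals; contestant 3 has the lower score
print(judges_save_loglik(S, J, e=3, kappa=0.5))
print(judges_save_loglik(S, J, e=2, kappa=0.5))
print(judges_save_loglik(S, J, e=0, kappa=0.5))
```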
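C. Priors and posterior sampling (gradient-free, so it works with ranks)
Suggested priors:
- theta[i] ~ Normal(0, sigma_theta), sigma_theta ~ HalfNormal(1)
- alpha ~ Normal(0, 1) (or truncated to alpha > 0)
- lambda ~ HalfNormal(5) (positive)
- kappa ~ HalfNormal(5) (positive; only with the judges' save)
Sampling:
- Use a hand-written Metropolis-Hastings sampler (random walk) or PyMC's Metropolis/SMC.
- Run each season independently, e.g. 5000-20000 iterations, 30% burn-in, optional thinning.
- Expose proposal_sd and apply simple adaptation (so that the acceptance rate stays between 0.2 and 0.4).
At each iteration:
- Given the parameters -> compute loglik over all elimination weeks -> add the prior logprior -> obtain logposterior
- MH accept/reject
- Store the parameter sample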
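The sampling loop can be sketched with a random-walk Metropolis sampler on a toy log-posterior. The function name, the adaptation schedule (rescale every 100 burn-in iterations) and the scaling factors are assumptions made for illustration; the real `log_post` would be the season's log-likelihood plus log-prior.

```python
import numpy as np


def metropolis(log_post, x0, n_iter=5000, burn_frac=0.3, proposal_sd=0.5, seed=0):
    """Random-walk Metropolis with crude step-size adaptation during burn-in."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    lp = log_post(x)
    n_burn = int(burn_frac * n_iter)
    chain, accepted = [], 0
    for it in range(1, n_iter + 1):
        prop = x + proposal_sd * rng.standard_normal(x.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # MH accept/reject
            x, lp = prop, lp_prop
            accepted += 1
        if it <= n_burn and it % 100 == 0:
            rate = accepted / 100                 # acceptance over the last 100 steps
            if rate < 0.2:
                proposal_sd *= 0.8
            elif rate > 0.4:
                proposal_sd *= 1.25
            accepted = 0
        if it > n_burn:
            chain.append(x.copy())                # keep post-burn-in samples only
    return np.array(chain)


# Toy target: a 2-D standard normal.
chain = metropolis(lambda x: -0.5 * np.sum(x**2), np.zeros(2), n_iter=10000)
print(chain.shape, chain.mean(axis=0), chain.var(axis=0))
```

Adapting only during burn-in keeps the retained part of the chain a valid Metropolis chain with a fixed proposal.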
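D. Output (estimates, confidence, consistency)
For each season:
- For each posterior sample m, compute p^{(m)}[i,t] for all weeks (active contestants only)
- Output for each i,t:
  - posterior mean: mean(p)
  - 90% CI: the 5% and 95% quantiles
  - if vote counts are reported: V = 1e7 * p, with mean and CI computed the same way
- Consistency metrics:
  - For each elimination week t, compute the posterior predictive probability:
    pi_t = mean_m P^{(m)}(E=observed)
  - Report:
    - mean pi_t, minimum pi_t, quantiles
    - accuracy of the deterministic elimination prediction made with the posterior-mean p_hat
    - optional: mean log score = mean(log(pi_t))
- Save the results as:
  - fan_vote_estimates_season_{s}.csv: contains season, week, celebrity_name, p_mean, p_ci_low, p_ci_high, V_mean, V_ci_low, V_ci_high
  - season_{s}_fit_summary.json: contains accuracy, mean_pi, logscore, etc.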
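Notes:
- When standardising x within a week, use active contestants only
- Compute the softmax with log-sum-exp
- Fix one policy for rank ties (average rank or jitter)
- Use the symmetrised without-replacement probability for multiple-elimination weeks
- Weeks with no elimination do not enter the likelihood, but the posterior distribution of p (determined by the parameters) is still output for them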
Write the code in a Notebook, organised into functions:
- load_and_preprocess()
- build_season_data(season)
- log_posterior(params, season_data)
- metropolis_sampler(...)
- posterior_to_estimates(...)
- evaluate_consistency(...)
After writing each cell, run it until it passes, then use its output to decide whether to revise existing cells or add new ones.
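Implement a "maximum entropy + temporal smoothing" model in Python (pandas + numpy + scipy) that uses 2026_MCM_Problem_C_Data.csv to infer the audience vote share p_{i,t} of every DWTS contestant for every week of every season (plus fan votes obtained by scaling with a fixed total vote count).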
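【Goal】
For each season s:
- Output the vote share p_{i,t} of the contestants still competing each week (sum = 1), and votes = TOT * p;
- Check whether the eliminations/placements obtained from the show's rule (percent/rank/bottom2) match the actual ones;
- Report uncertainty: the weekly entropy, and confidence intervals from bootstrap / repeated solves with multiple initial values.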
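【Data loading and preprocessing】
- Read the csv.
- Find all score columns with the regex r"week(\d+)_judge(\d+)_score".
- For each row (celebrity-season), compute the weekly judges' total:
  J_{i,t} = sum(all judgeY_score for that week, ignoring NA).
- For each season s:
  - contestants = all contestants i in the season
  - T_s = the largest week index for which at least one contestant has J_{i,t} non-NA and > 0
  - For each week t = 1..T_s:
    C_t = {i: J_{i,t} > 0}  # still competing
  - For each week t = 1..T_s-1:
    E_t = C_t \ C_{t+1}  # eliminated at the end of week t (may be empty or contain several)
    R_t = C_t \ E_t  # survivors
  - Final week t = T_s: take the final rank place(i) from the placement column (1 is best). For every pair in C_T with place(i) < place(j), generate an ordering constraint.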
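【Voting rule by season (method by season)】
- seasons 3..27: percent rule
- seasons 1..2: rank rule (the worst contestant is eliminated directly)
- seasons 28..34: rank + bottom two (judges' save):
  Question 1 only requires that the actual eliminated contestant is in the bottom two, not that they are the worst.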
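【Unknowns and parameterisation】
So that p_t lies on the simplex automatically:
- For each contestant i ∈ C_t competing in week t, define a free variable z_{i,t} ∈ R
- Let p_{i,t} = softmax(z_t)_i = exp(z_{i,t}) / sum_{j∈C_t} exp(z_{j,t})
Then p_{i,t} > 0 and the shares sum to 1.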
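【Objective: maximum entropy + temporal smoothing + constraint penalties】
Optimise all weeks of season s jointly, minimising loss =
Σ_t [ -H(p_t) ] + λ Σ_{t>=2} KL(p_t || q_{t-1→t}) + β Σ_t penalty_t + β_final * penalty_final
where:
- Entropy H(p_t) = - Σ_i p_{i,t} log(p_{i,t})
- The previous week's distribution mapped onto this week: q_{t-1→t,i} = p_{i,t-1} / Σ_{j∈C_t} p_{j,t-1} (i∈C_t)
- KL(p||q) = Σ_i p_i log(p_i/q_i)
In the implementation, use eps = 1e-12 to guard against log(0).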
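The entropy and smoothing terms are straightforward to sketch in NumPy. The helper names and the example shares below are illustrative only; `carry_forward` implements the renormalisation of last week's shares over the contestants still competing.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def entropy(p):
    """H(p) = -sum_i p_i log p_i."""
    return -np.sum(p * np.log(p + EPS))


def carry_forward(p_prev, still_in):
    """Renormalise last week's shares over contestants still competing."""
    q = p_prev[still_in]
    return q / q.sum()


def kl(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i)."""
    return np.sum(p * np.log((p + EPS) / (q + EPS)))


p_prev = np.array([0.40, 0.30, 0.20, 0.10])     # week t-1, four contestants
still_in = np.array([True, True, True, False])  # the last one was eliminated
q = carry_forward(p_prev, still_in)
p_t = np.array([0.45, 0.35, 0.20])              # candidate shares for week t

print(q, entropy(p_t), kl(p_t, q))
```

The smoothing term is zero when this week's shares equal last week's renormalised shares, so λ controls how strongly popularity is assumed to persist from week to week.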
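【Turning constraints into penalties (squared hinge)】
Set the margin δ = 1e-6.
penalty_t uses hinge^2 so that solutions are close to "hard-consistent".
(1) Percent seasons (3..27)
- Judges' percent a_{i,t} = J_{i,t} / Σ_{j∈C_t} J_{j,t}
- Combined score S_{i,t} = a_{i,t} + p_{i,t} (larger is safer)
- If the elimination set E_t is nonempty:
  violation viol_t = max_{e∈E_t, r∈R_t} max(0, S_{e,t} - S_{r,t} + δ)
  penalty_t = viol_t^2
- If E_t is empty: penalty_t = 0
(2) Rank seasons (1..2 and 28..34)
- Judges' rank rJ: rank J_{i,t} in descending order, 1 is best; ties use the average rank.
- Soft fan rank (temperature tau; tau = 0.02 is suggested and can be grid-searched):
  sigmoid(x) = 1/(1+exp(-x))
  rF_{i,t} = 1 + Σ_{k≠i} sigmoid((p_{k,t}-p_{i,t})/tau)
- Combined badness B_{i,t} = rJ_{i,t} + rF_{i,t} (larger is worse)
A) Seasons 1..2: the worst contestant is eliminated directly
If E_t is nonempty:
  viol_t = max_{e∈E_t, r∈R_t} max(0, B_{r,t} - B_{e,t} + δ)
  penalty_t = viol_t^2
B) Seasons 28..34: bottom two + judges choose
Question 1 only requires that the actual eliminated contestant e ∈ bottom2(B).
Rank B again with a soft rank to obtain rB (1 is best, n is worst):
  rB_{i,t} = 1 + Σ_{k≠i} sigmoid((B_{i,t}-B_{k,t})/tauB)
  n = |C_t|
Single elimination: require rB_{e,t} >= n-1
  viol_t = max(0, (n-1) - rB_{e,t})
  penalty_t = viol_t^2
Multiple elimination with m = |E_t|: require rB_{e,t} >= n-m for every e∈E_t
  viol_t = max_{e∈E_t} max(0, (n-m) - rB_{e,t})
  penalty_t = viol_t^2
(tauB may be 0.5 or 1.0, or grid-searched)
(3) Final-placement penalty penalty_final
The contestants C_T competing in the final week T have a placement:
- percent: S_{i,T} = a_{i,T} + p_{i,T}
  for every place(i) < place(j), violation = max(0, S_{j,T}-S_{i,T}+δ)
- rank: use B_{i,T} (smaller is better)
  for every place(i) < place(j), violation = max(0, B_{i,T}-B_{j,T}+δ)
Aggregate the violations over all pairs, either by taking the max or by summing their squares, to form penalty_final.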
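The soft rank and the bottom-two penalty for a single-elimination week can be sketched as below. `soft_rank` and `bottom2_penalty` are hypothetical helpers and the example shares and judges' ranks are invented.

```python
import numpy as np


def soft_rank(v, tau, larger_is_better=True):
    """Smooth rank with 1 = best: 1 + sum_{k != i} sigmoid(+/-(v_k - v_i) / tau)."""
    diff = v[None, :] - v[:, None]  # diff[i, k] = v_k - v_i
    if not larger_is_better:
        diff = -diff                # then a larger value gets a larger (worse) rank
    s = 1.0 / (1.0 + np.exp(-diff / tau))
    np.fill_diagonal(s, 0.0)        # exclude the k == i term
    return 1.0 + s.sum(axis=1)


def bottom2_penalty(p, rJ, e, tau=0.02, tauB=0.5):
    """Squared hinge requiring contestant e to have soft rank rB >= n - 1."""
    rF = soft_rank(p, tau)                           # soft fan rank
    B = rJ + rF                                      # combined badness
    rB = soft_rank(B, tauB, larger_is_better=False)  # n = worst
    n = len(p)
    return max(0.0, (n - 1) - rB[e]) ** 2


p = np.array([0.40, 0.30, 0.20, 0.10])  # fan shares
rJ = np.array([1.0, 2.0, 4.0, 3.0])     # judges' ranks
print(np.round(soft_rank(p, 0.02), 3))
print([bottom2_penalty(p, rJ, e) for e in range(4)])
```

A smaller tau makes the soft rank closer to the integer rank but also makes the gradient vanish away from near-ties, which is why the text suggests searching over it.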
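【Optimizer】
- Use scipy.optimize.minimize; L-BFGS-B (without bounds) or Powell is recommended.
- The variables are the z values of all weeks concatenated into one vector; you need to maintain an index mapping:
  for each week t: active_ids = list(C_t)
  z_segment length = len(active_ids)
- Initial values:
  week 1: z = 0 (uniform)
  later weeks: reuse the previous week's z (trimmed to the remaining contestants) or 0
- Suggested hyperparameter grid:
  lambda ∈ {0, 0.1, 0.3, 1, 3}
  beta ∈ {1e2, 1e3, 1e4}
  tau ∈ {0.01, 0.02, 0.05}
  Choose the solution with the smallest constraint violation that also has reasonably large entropy and sensible smoothness.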
【Output】
- A long table votes_estimates.csv with columns:
  season, week, celebrity_name, judge_total, method, p_share, votes (= TOT * p_share with TOT = 1e7),
  entropy_week (H(p_t)/log(n)),
  (optional) combined_score_or_B
- A consistency table consistency.csv:
  season, week, n, eliminated_true(list), eliminated_pred(list),
  metric (percent: bottom-m by S; rank: top-m by B; season28+: bottom2 hit),
  violation_value, margin
- Uncertainty (bootstrap / multi-start):
  - Repeat R = 30 times: use random initial values or add small noise to J (e.g. N(0, 0.05)) and re-solve (with a warm start)
  - For each (i,t), compute the mean, std and 95% interval of p
  - Merge into votes_estimates.csv: p_mean, p_std, p_ci_low, p_ci_high
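【Consistency checks】
- percent: predicted elimination = the m = |E_t| contestants with the smallest S; compare with the true E_t via Jaccard / hit rate; compute margin = min_survivor(S) - max_elim(S)
- rank (1..2): predicted elimination = the m contestants with the largest B; same metrics as above
- rank (28..34): predicted bottom2 = the 2 contestants with the largest B; check whether the actual eliminated contestant is among them (hit rate)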
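The deterministic checks can be sketched with plain Python sets. The helper names and the combined scores are illustrative assumptions; ties are broken by index order.

```python
def predict_eliminated(score, m, worst_is_largest):
    """Indices of the m worst contestants under a combined score."""
    order = sorted(range(len(score)), key=lambda i: score[i], reverse=worst_is_largest)
    return set(order[:m])


def jaccard(a, b):
    """Jaccard similarity of two index sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def percent_margin(S, elim):
    """min_survivor(S) - max_elim(S); positive means the week is reproduced."""
    survivors = [s for i, s in enumerate(S) if i not in elim]
    return min(survivors) - max(S[i] for i in elim)


S = [0.36, 0.61, 0.58, 0.46]  # percent rule: smaller combined score is worse
pred = predict_eliminated(S, m=2, worst_is_largest=False)
print(pred, jaccard(pred, {0, 3}), jaccard(pred, {0, 2}))
print(percent_margin(S, {0, 3}))

B = [2.0, 4.0, 7.0, 6.0]      # rank rule: larger combined badness is worse
bottom2 = predict_eliminated(B, m=2, worst_is_largest=True)
print(bottom2, 3 in bottom2)
```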
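Write the code in a Notebook. After writing each cell, run it until it passes, then use its output to decide whether to revise existing cells or add new ones.
Organise the code into reusable modules: parse_data(), build_season_struct(), loss_function(), solve_season(), bootstrap_uncertainty(), evaluate_consistency().
Make sure it runs end to end: read the csv -> solve -> write the two csv files (+ optional plots).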
