Things I’ve Worked On: Survivor by the Numbers

I wanted a reason to mess around with the Python visualization libraries matplotlib and bokeh, so I decided to look into some information on the TV show Survivor. The results are largely unremarkable, but I’ve created pretty nice-looking graphs (at least compared to what the libraries make when left to their own devices) so I thought I’d show them. Note that I chose day 20 as a common cut-off because merges usually happen around that time.

I may add more graphs from time to time once I get some more time to play around. I’ll also return to add usable Python 2 functions for each graph once I clean them up and make them adaptable to different data. Data are from here (through Survivor: Cambodia).

Age Distribution of Contestants

Blue = All Contestants, Green = Day 20 Contestants, Red = Finalists

See [1] below for relevant code.

Age Breakdown by Days Lasted

See [2] below for relevant code.

Regional Distribution of Contestants

Competitions Won by Finish Position

Finish Position (20=Last, 1=Winner)

Blue = Average, Light Blue = Maximum, Orange = Minimum

Performance vs. Average Votes Against

The radius of the bubbles is proportional to the number of contestants who fit that bin of days.
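Since the bubble radius tracks the number of contestants, the marker area has to grow with the square of that number. As a rough sketch, assuming the chart were drawn with matplotlib's `scatter` (whose `s` argument is a marker size in points squared), the sizes could be computed like this. `marker_sizes` and `max_diameter_pt` are hypothetical names, and the counts are made up for illustration; this is not the code behind the chart above.

```python
def marker_sizes(counts, max_diameter_pt=40.0):
    """Return scatter `s` values whose marker radii scale linearly with counts.

    Hypothetical helper: the largest count gets a marker max_diameter_pt wide.
    """
    largest = float(max(counts))
    # The diameter (and so the radius) is linear in the count...
    diameters = [max_diameter_pt * c / largest for c in counts]
    # ...while `s` is in points squared, so square the diameter.
    return [d ** 2 for d in diameters]


# Made-up counts for three bins of days, for illustration only.
sizes = marker_sizes([1, 2, 4])
print(sizes)  # [100.0, 400.0, 1600.0]
```

These values could then be passed along as `plt.scatter(x, y, s=sizes)`. Passing the raw counts to `s` would instead make the area, not the radius, proportional to the count.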

Appendix

Functions for variations of the above charts are provided below. Note that some elements were sacrificed to make them intelligible, adaptable and relatively simple.

[2]


import numpy as np
import pandas as pd
from bokeh.plotting import figure, show, output_file

def circle_graph(df, index, list_, unit):
    """Takes a pandas df with a label column (index) and a list of columns (3 to 5 for
    best results) with float values representing the unit and creates a circle graph.
    The df should keep its default 0..n-1 integer index, which sets the wedge angles.

    e.g. circle_graph(survivor_df, 'age group', ['15 to 24 y.o.', '25 to 34 y.o.', '35+ y.o.'], '%')
    """

    #Set the colors of the bars in the bar graph based on "Tableau 20" colors.
    tableau20 = [ '#1f77b4',  '#2ca02c','#7f7f7f','#8c564b','#d62728', '#bcbd22', '#9edae5','#17becf',
                 '#98df8a','#9467bd','#aec7e8','#c5b0d5', '#c7c7c7']
    width = 800
    height = 800
    inner_radius = 90
    outer_radius = 290
    big_angle = 2.0 * np.pi / (len(df) + 1)
    small_angle = big_angle / (len(list_) *2 + 1)
    p = figure(plot_width=width, plot_height=height, title="", x_axis_type=None, y_axis_type=None,
    x_range=(-420, 420), y_range=(-420, 420), min_border=0, outline_line_color="white")
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None

    #draw the large wedges
    angles = np.pi/2 - big_angle/2 - df.index.to_series()*big_angle
    p.annular_wedge(0, 0, inner_radius, outer_radius, -big_angle + angles, angles, color='#cfcfcf',)
    
    #find the maximum value for the concentric rings of values,
    #rounded up to the next multiple of 10
    column_max = [max(df[column]) for column in list_]
    label_max = int(max(column_max) + 10 - max(column_max)%10)
    
    #draw the small wedges and labels
    bar_color = {}
    counter = 0
    for column in list_:
        bar_color[column] = tableau20[counter]
        p.annular_wedge(0, 0, inner_radius, 90 + df[column]*(200/float(label_max)),
                -big_angle+angles+(2*counter+1)*small_angle, -big_angle+angles+(2*counter+2)*small_angle,
                color=bar_color[column])
        p.rect([-40], [37-counter*18], width=30, height=13, color=bar_color[column])
        p.text([-15], [37-counter*18], text=[column], text_font_size="9pt", text_align="left", text_baseline="middle")
        counter += 1
    
    #draw the rings and corresponding labels
    labels = np.array(range((label_max+10)//10))*10
    radii = 90 + labels* (200/float(label_max))
    p.circle(0, 0, radius=radii[:-1], fill_color=None, line_color="#E6E6E6")
    p.text(0, radii[:-1], [str(z)+str(unit) for z in labels[:-1]], text_font_size="8pt", text_align="center", text_baseline="middle")
    
    #draw the spokes separating the large wedges
    p.annular_wedge(0, 0, inner_radius-10, outer_radius+10, -big_angle+angles, -big_angle+angles, color="black")
    
    xr = radii[-1]*np.cos(np.array(-big_angle/2 + angles))
    yr = radii[-1]*np.sin(np.array(-big_angle/2 + angles))   
    label_angle=np.array(-big_angle/2+angles)
    label_angle[label_angle < -np.pi/2] += np.pi   
    p.text(xr, yr, df[index], angle=label_angle, text_font_size="9pt", text_align="center", text_baseline="middle")

    output_file("example.html", title="example.py")
    show(p)

test_df = pd.read_csv(r"C:\Users\Zachery McKinnon\Documents\survivor_demographics_agexdayslasted1.csv")
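
circle_graph reads df[index] for the outer labels and df[column] for each name in list_, so the input needs one label column plus one numeric column per name, and any name that is not a column raises a KeyError. The sketch below shows one layout that would fit the docstring's example call. The numbers are made up for illustration, the row labels are hypothetical, and `missing_columns` is a hypothetical helper rather than part of the original code.

```python
import pandas as pd

# Made-up percentages for illustration only; not the data behind the post.
example_df = pd.DataFrame({
    "age group": ["1 to 10 days", "11 to 20 days", "21+ days"],  # hypothetical row labels
    "15 to 24 y.o.": [20.0, 25.0, 15.0],
    "25 to 34 y.o.": [50.0, 45.0, 55.0],
    "35+ y.o.": [30.0, 30.0, 30.0],
})


def missing_columns(df, index, list_):
    """Return the names among index and list_ that are not columns of df."""
    return [name for name in [index] + list(list_) if name not in df.columns]


wanted = ["15 to 24 y.o.", "25 to 34 y.o.", "35+ y.o."]
print(missing_columns(example_df, "age group", wanted))        # []
print(missing_columns(example_df, "age group", ["under 25"]))  # ['under 25']
```

An empty list means a call such as circle_graph(example_df, 'age group', wanted, '%') will find every column it looks up. The default 0..n-1 integer index matters too, because the function multiplies df.index by the wedge angle to place each row.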

One thought on “Things I’ve Worked On: Survivor by the Numbers”

Marco says:

April 15, 2020 at 1:28 pm

Hi Zachery,
I was looking at the article “Things I’ve Worked On: Survivor by the Numbers” (https://zacherymckinnon.com/2016/09/26/things-ive-worked-on-survivor-by-the-numbers/).
I was not able to find the dataset, so I was not able to try the Python code. I also looked at https://survivor.fandom.com/wiki/List_of_Survivor_contestants but
I was not able to find the dataset there. Could you please send it to me or tell me how to get it?

I also tried different types of datasets but was not able to get one working.
What record format do I have to prepare?
I also get an error with: ['15 to 24 y.o.', '25 to 34 y.o.', '35+ y.o.']

KeyError: '15 to 24 y.o.'

Any ideas?
Thanks,
Marco
