Things I’ve Worked On: Survivor by the Numbers

I wanted a reason to mess around with the Python visualization libraries matplotlib and Bokeh, so I decided to look into some data on the TV show Survivor. The results are largely unremarkable, but the graphs came out pretty nice-looking (at least compared to what the libraries make when left to their own devices), so I thought I’d show them. Note that I chose day 20 as a common cut-off because merges usually happen around that time.

I may add more graphs as I find time to play around. I’ll also return to add usable Python 2 functions for each graph once I clean them up and make them adaptable to different data. Data are from here (through Survivor: Cambodia).

Age Distribution of Contestants

suvivor_age

Blue = All Contestants, Green = Day 20 Contestants, Red = Finalists

See [1] below for relevant code.
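For readers who want to try something similar, here is a minimal matplotlib sketch of three overlaid age histograms using the same color scheme. It assumes the chart is a set of overlaid histograms and that "Day 20 Contestants" means anyone who lasted at least 20 days. The rows, the bin width, and the output filename are all invented for illustration and are not the real data or the original code.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented (age, days_lasted, is_finalist) rows; not the real Survivor data.
contestants = [
    (22, 3, False), (27, 39, True), (31, 21, False), (35, 39, True),
    (41, 12, False), (24, 30, False), (29, 6, False), (52, 38, False),
]

all_ages = [age for age, _, _ in contestants]
# Assumption: a "day 20 contestant" is anyone who lasted at least 20 days.
day20_ages = [age for age, days, _ in contestants if days >= 20]
finalist_ages = [age for age, _, is_finalist in contestants if is_finalist]

bins = list(range(15, 66, 10))  # 10-year bins starting at 15 (arbitrary choice)
fig, ax = plt.subplots()
for ages, color, label in [
    (all_ages, "blue", "All Contestants"),
    (day20_ages, "green", "Day 20 Contestants"),
    (finalist_ages, "red", "Finalists"),
]:
    # Semi-transparent bars keep the overlapping groups visible.
    ax.hist(ages, bins=bins, color=color, alpha=0.5, label=label)
ax.set_xlabel("Age")
ax.set_ylabel("Contestants")
ax.legend()
fig.savefig("survivor_age_sketch.png")  # hypothetical output filename
```

Drawing the largest group first and the smallest last keeps the smaller groups from being hidden behind the larger ones.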

Age Breakdown by Days Lasted

graphfinalx-2

See [2] below for relevant code.

Regional Distribution of Contestants 

Competitions Won by Finish Position

competitions_won

Finish Position (20=Last, 1=Winner)

Blue = Average, Light Blue = Maximum, Orange = Minimum
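The three series here are per-position aggregates, which pandas can compute in one step. A minimal sketch, assuming hypothetical column names `finish` and `wins` and a handful of made-up rows rather than the real data:

```python
import pandas as pd

# Made-up rows for illustration only; 'finish' and 'wins' are assumed column names.
toy = pd.DataFrame({
    "finish": [1, 1, 2, 2, 3, 3],
    "wins":   [5, 3, 4, 2, 1, 0],
})

# One row per finish position with the average, maximum and minimum wins.
summary = toy.groupby("finish")["wins"].agg(["mean", "max", "min"])
print(summary)
```

Each column of `summary` would then be one plotted series, with the finish position on the x-axis.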

Performance vs. Average Votes Against

votes_against

The radius of the bubbles is proportional to the number of contestants who fit that bin of days.
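A rough sketch of how such bubble sizes could be derived: bin contestants by days lasted, count each bin, and scale the radius by the count. The column names, the 13-day bin width, the scale factor, and the rows are all assumptions made for this example.

```python
import pandas as pd

# Made-up rows; 'days_lasted' and 'votes_against' are assumed column names.
toy = pd.DataFrame({
    "days_lasted":   [3, 6, 12, 15, 18, 39, 39],
    "votes_against": [5, 4, 6, 7, 3, 0, 2],
})

# Group contestants into 13-day bins (an arbitrary width chosen for this sketch).
toy["day_bin"] = pd.cut(toy["days_lasted"], bins=[0, 13, 26, 39])

# Average votes against and number of contestants in each bin.
per_bin = toy.groupby("day_bin", observed=True)["votes_against"].agg(["mean", "count"])

# Radius proportional to the number of contestants in the bin.
scale = 5.0  # arbitrary display scale
per_bin["radius"] = scale * per_bin["count"]

# matplotlib's scatter sizes markers by area (points squared), so square the radius.
per_bin["area"] = per_bin["radius"] ** 2
print(per_bin)
```

If the chart were drawn with matplotlib's `scatter`, the `area` column is what would go into its `s` argument. Passing the counts directly would make the area, not the radius, proportional to the count.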

Appendix

Functions for variations of the above charts are provided below. Note that some elements were sacrificed to make them intelligible, adaptable and relatively simple.

[2]


import numpy as np
import pandas as pd
from bokeh.plotting import figure, show, output_file

def circle_graph(df, index, list_, unit):
    """Takes a pandas df with an index column and a list of columns (3 to 5 for 
    best results) with float values representing the unit and creates a circle graph.

    e.g. circle_graph(survivor_df, 'age group', ['15 to 24 y.o.', '25 to 34 y.o.', '35+ y.o.'], '%')
    """

    #Set the colors of the bars in the bar graph based on "Tableau 20" colors.
    tableau20 = [ '#1f77b4',  '#2ca02c','#7f7f7f','#8c564b','#d62728', '#bcbd22', '#9edae5','#17becf',
                 '#98df8a','#9467bd','#aec7e8','#c5b0d5', '#c7c7c7']
    width = 800
    height = 800
    inner_radius = 90
    outer_radius = 290
    big_angle = 2.0 * np.pi / (len(df) + 1)
    small_angle = big_angle / (len(list_) *2 + 1)
    p = figure(plot_width=width, plot_height=height, title="", x_axis_type=None, y_axis_type=None,
    x_range=(-420, 420), y_range=(-420, 420), min_border=0, outline_line_color="white")
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None

    #draw the large wedges
    angles = np.pi/2 - big_angle/2 - df.index.to_series()*big_angle
    p.annular_wedge(0, 0, inner_radius, outer_radius, -big_angle + angles, angles, color='#cfcfcf',)
    
    #find the maximum value for the concentric rings of values,
    #then move up to the next multiple of 10 above it
    column_max = [max(df[column]) for column in list_]
    label_max = int(max(column_max) + 10 - max(column_max)%10)
    
    #draw the small wedges and labels
    bar_color = {}
    counter = 0
    for column in list_:
        bar_color[column] = tableau20[counter]
        p.annular_wedge(0, 0, inner_radius, 90 + df[column]*(200/float(label_max)),
                -big_angle+angles+(2*counter+1)*small_angle, -big_angle+angles+(2*counter+2)*small_angle,
                color=bar_color[column])
        #one legend swatch and label per column; x and y must have the same length
        p.rect([-40], [37-counter*18], width=30, height=13, color=bar_color[column])
        p.text([-15], [37-counter*18], text=[column], text_font_size="9pt", text_align="left", text_baseline="middle")
        counter += 1
    
    #draw the rings and corresponding labels
    #(floor division keeps the range() argument an integer on Python 2 and 3)
    labels = np.array(range((label_max+10)//10))*10
    radii = 90 + labels* (200/float(label_max))
    p.circle(0, 0, radius=radii[:-1], fill_color=None, line_color="#E6E6E6")
    p.text(0, radii[:-1], [str(z)+str(unit) for z in labels[:-1]], text_font_size="8pt", text_align="center", text_baseline="middle")
    
    #draw the spokes separating the groups
    p.annular_wedge(0, 0, inner_radius-10, outer_radius+10, -big_angle+angles, -big_angle+angles, color="black")

    #draw the group labels around the outside
    xr = radii[-1]*np.cos(np.array(-big_angle/2 + angles))
    yr = radii[-1]*np.sin(np.array(-big_angle/2 + angles))   
    label_angle=np.array(-big_angle/2+angles)
    label_angle[label_angle < -np.pi/2] += np.pi   
    p.text(xr, yr, df[index], angle=label_angle, text_font_size="9pt", text_align="center", text_baseline="middle")

    output_file("example.html", title="example.py")
    show(p)

test_df = pd.read_csv(r"C:\Users\Zachery McKinnon\Documents\survivor_demographics_agexdayslasted1.csv")
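circle_graph looks columns up by exact name, so the DataFrame needs one label column plus every column named in the list. Below is a sketch of an input with that shape. The column names follow the example call in the docstring. The row labels and percentages are invented, since the contents of the real CSV are not shown here.

```python
import pandas as pd

# Invented percentages and row labels for illustration; not the real data.
example_df = pd.DataFrame({
    "age group": ["Days 1-13", "Days 14-26", "Days 27-39"],
    "15 to 24 y.o.": [30.0, 25.0, 20.0],
    "25 to 34 y.o.": [45.0, 50.0, 55.0],
    "35+ y.o.": [25.0, 25.0, 25.0],
})

value_columns = ["15 to 24 y.o.", "25 to 34 y.o.", "35+ y.o."]

# Check the names up front; a missing one is what raises KeyError inside the function.
missing = [c for c in ["age group"] + value_columns if c not in example_df.columns]
print(missing)
```

With this frame the call would be circle_graph(example_df, 'age group', value_columns, '%'). The function also multiplies df.index by the wedge angle, so it expects the default 0, 1, 2, ... integer index that pd.read_csv and the constructor above produce.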

One thought on “Things I’ve Worked On: Survivor by the Numbers”

  1. Hi Zachery,
    I was looking at the article “Things I’ve Worked On: Survivor by the Numbers” (https://zacherymckinnon.com/2016/09/26/things-ive-worked-on-survivor-by-the-numbers/).
    I was not able to find the dataset, so I was not able to try the Python code. I also looked at https://survivor.fandom.com/wiki/List_of_Survivor_contestants but
    I was not able to find the dataset there. Could you please send it to me or tell me how to get it?

    I also tried different types of datasets, but I was not able to get one working.
    What record format do I have to prepare?
    I also get an error with: ['15 to 24 y.o.', '25 to 34 y.o.', '35+ y.o.']

    KeyError: '15 to 24 y.o.'

    Any ideas?
    Thanks,
    Marco
