Here’s something I wrote in response to many, many questions about running a LOPSA chapter.
While I was doing the number crunching for my previous post, I collected a few other numbers and decided to visualize them.
Here’s ballots cast, grouped by election year.
2005, the first year of LOPSA, was the best. 2009 was unusually low. The average participation is 135 votes, though the two min and max values skew the mean a bit.
Candidates and seats available on one graph:
LOPSA consistently has only one more candidate than there are seats. What this means is that if you run for Board of Directors, you have a non-insignificant chance of winning a seat. This is bad.
Number of votes needed to win:
As you can see, this is actually quite a low number.
My project for today was to do a bit of time series analysis on LOPSA election data. I was only able to obtain the timestamped data for 2011, 2012, and 2013, so it’s not quite as expansive as I would like. However, some interesting results were still to be had.
I had two questions of the data:
1) What impact did an announcement/reminder have on voter turnout?
2) Are there particular times of day that people are more likely to vote?
To start off, the raw data looked like this:
I decided that Excel was the easiest and most effective tool for the job. First, I needed to separate the dates from the times in order to answer my two questions. Some Excel magic solved this nicely:
To get just the date:
This gave me ten characters, starting from the left, which is just the date portion.
To get the time:
Same as above, but coming from the right, and only eight characters. I set the format to 24 hour, for ease of use.
I now had two columns with one piece of data in each, allowing for easier manipulation.
The date was still in MySQL format, but I needed it in a native Excel format:
Now my date was nicely cleaned up. Let’s get back to the time entries.
The time entries are down to the second, which I don’t need. In fact, for my purposes, I only need the time in hourly increments, so I rounded the time to the nearest hour:
=TIME(HOUR(<TIME ENTRY>),MROUND(MINUTE(<TIME ENTRY>),”60″),0)
Awesome. Again, I set this to 24 hour format.
Next, I needed to create bins for my two data sets. Bins are just “buckets”, or intervals.
For date, since I know that the elections only went from 6-1 to 6-15, I created a list of those dates.
For times, I creatde a list of times in 24 hour format, from 00:00 to 23:00.
Now to count the frequencies.
I have the list of dates in column D and my list of times in column F.
Date bins are in column G and time bins in column H.
In column I, I selected one more cell than I had bins (eg, I have 15 voting days, so I selected 16 cells). I then used this formula:
Hit CMD+Enter (on a Mac) to create an array. Result is a list of hits for each bin, with a zero in the last one. Repeat for time bins.
Then it was a simple matter to create the graphs (I used a clustered column graph).
The results were pretty interesting.
Here’s 2011 through 2013, votes grouped by day. (click images to enlarge)
In order to get something useful from this, I went through my email and found when announcements were made about elections and then added a note (I only have data for 2012 and 2013 for this, so 2011 is not pictured):
Now that’s much more interesting. On 2013, it’s pretty clear that making an announcement is closely correlated with higher voter turnout. 2011’s isn’t as clear. Not sure why.
How about times? Note that times are in UTC (subtract 4 hours for EST and 7 hours for PST).
These results are pointing out that most ballots were cast in the afternoon and evening. Correlating these with announcement times would require some more work that I haven’t done.
Just out of curiosity, I ran 2011, 2012, and 2013’s ballot times through together, and came up with this:
This graph seems to agree with the above assessment.
Based on this data, I would conclude a few things:
1) Time voting announcements carefully, knowing that people tend to vote in the afternoon and evening, and not so much in the mornings.
2) Turnout tends to be higher immediately following announcements, with weekends showing low turn-out.
3) The best times for a reminder are for the email to hit the inbox after lunch on a weekday or immediately after work on a weekday. Avoid sending reminders on a weekend or late at night on a weekday.