Starting, Building, and Running a LOPSA Chapter

Here’s something I wrote in response to many, many questions about running a LOPSA chapter.

 

http://mikejulian.com/startinglopsachapter.pdf

Radeon HD 4650 (R730), CentOS 6.4, and Xorg

I’ve got a desktop with a Radeon HD 4650 PCIe (R730 chipset) video card in it, capable of dual DVI outputs. In studying for my RHCE, I’ve got it running CentOS 6, but was having some trouble getting a good resolution out of it (multiple VM console windows on 1024×768 is rough, man).

I spent several hours digging around and finally ran across a link which had the solution: add radeon.modeset=1 to the kernel parameters. I did that, and Xorg immediately picked up the Radeon driver and the new resolutions.

A More General Look at LOPSA Elections

While I was doing the number crunching for my previous post, I collected a few other numbers and decided to visualize them.

Here’s ballots cast, grouped by election year.

Ballots Cast

 

2005, the first year of LOPSA, was the best. 2009 was unusually low. Average participation is 135 votes, though the min and max years skew the mean a bit.

 

 

Candidates and seats available on one graph:

Candidates & Seats

 

LOPSA consistently has only one more candidate than there are seats. What this means is that if you run for the Board of Directors, you have a significant chance of winning a seat. This is bad.

 

Number of votes needed to win:

Votes Needed to Win

 

As you can see, this is actually quite a low number.

Ballot analysis on the past three LOPSA elections

My project for today was to do a bit of time series analysis on LOPSA election data. I was only able to obtain the timestamped data for 2011, 2012, and 2013, so it’s not quite as expansive as I would like. However, some interesting results were still to be had.

I had two questions of the data:

1) What impact did an announcement/reminder have on voter turnout?

2) Are there particular times of day that people are more likely to vote?

 

Digging In

To start off, the raw data looked like this:

2013-06-01 16:04:56
2013-06-01 18:09:21
2013-06-02 00:59:58
2013-06-02 04:03:41
2013-06-02 06:06:08
2013-06-02 09:10:34
2013-06-02 13:57:09

I decided that Excel was the easiest and most effective tool for the job. First, I needed to separate the dates from the times in order to answer my two questions. Some Excel magic solved this nicely:

To get just the date:

=LEFT(<TIMESTAMP ENTRY>,10)

This gave me ten characters, starting from the left, which is just the date portion.

To get the time:

=RIGHT(<TIMESTAMP ENTRY>,8)

Same as above, but coming from the right, and only eight characters. I set the format to 24 hour, for ease of use.

I now had two columns with one piece of data in each, allowing for easier manipulation.
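If spreadsheets aren't your thing, the same split can be sketched in Python by slicing the timestamp string; the timestamps below are made-up examples:

```python
# A hypothetical Python equivalent of the LEFT/RIGHT formulas:
# slice off the first 10 characters for the date and the last 8
# for the time.
timestamps = [
    "2013-06-01 16:04:56",
    "2013-06-02 00:59:58",
]

dates = [ts[:10] for ts in timestamps]   # like =LEFT(ts, 10)
times = [ts[-8:] for ts in timestamps]   # like =RIGHT(ts, 8)

print(dates)  # ['2013-06-01', '2013-06-02']
print(times)  # ['16:04:56', '00:59:58']
```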

The date was still in MySQL format, but I needed it in a native Excel format:

=TEXT(<DATE ENTRY>,"m/d/yy")

Now my date was nicely cleaned up. Let’s get back to the time entries.

The time entries are down to the second, which I don’t need. In fact, for my purposes, I only need the time in hourly increments, so I rounded the time to the nearest hour:

=TIME(HOUR(<TIME ENTRY>),MROUND(MINUTE(<TIME ENTRY>),60),0)

Awesome. Again, I set this to 24 hour format.
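For reference, here's the same nearest-hour rounding outside of Excel, as a small Python sketch (the function name is my own):

```python
from datetime import datetime, timedelta

def round_to_hour(ts):
    # Round a "YYYY-MM-DD HH:MM:SS" timestamp to the nearest hour,
    # matching the TIME/HOUR/MROUND formula: 30+ minutes rounds up.
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    if dt.minute >= 30:
        dt += timedelta(hours=1)
    return dt.strftime("%H:00")

print(round_to_hour("2013-06-01 16:04:56"))  # 16:00
print(round_to_hour("2013-06-02 00:59:58"))  # 01:00
```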

Next, I needed to create bins for my two data sets. Bins are just “buckets”, or intervals.

For date, since I know that the elections only ran from June 1 to June 15, I created a list of those dates.

For times, I created a list of times in 24 hour format, from 00:00 to 23:00.

Now to count the frequencies.

I have the list of dates in column D and my list of times in column F.

Date bins are in column G and time bins in column H.

In column I, I selected one more cell than I had bins (e.g., I have 15 voting days, so I selected 16 cells). I then used this formula:

=FREQUENCY(D2:D146,G2:G16)

Hit Cmd+Enter (on a Mac) to enter it as an array formula. The result is a list of hits for each bin, with a zero in the last cell. Repeat for the time bins.
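The whole binning step can also be sketched in Python with a Counter, which avoids array formulas entirely (the sample dates are made up):

```python
from collections import Counter

# The FREQUENCY step, sketched in Python: count how many ballots
# fall into each date bin.
dates = [
    "2013-06-01", "2013-06-01", "2013-06-02",
    "2013-06-03", "2013-06-03", "2013-06-03",
]

counts = Counter(dates)
for day in sorted(counts):
    print(day, counts[day])
# 2013-06-01 2
# 2013-06-02 1
# 2013-06-03 3
```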

Then it was a simple matter to create the graphs (I used a clustered column graph).

The Results

The results were pretty interesting.

Here’s 2011 through 2013, votes grouped by day. (click images to enlarge)

LOPSA 2011 Election - Votes by day

 

LOPSA 2012 Election - Votes by day

 

LOPSA 2013 Election - Votes by day

 

In order to get something useful from this, I went through my email and found when announcements were made about elections and then added a note (I only have data for 2012 and 2013 for this, so 2011 is not pictured):

LOPSA 2012 Election - Votes by day (with announcement lines)

LOPSA 2013 Election - Votes by day (with announcement lines)

 

Now that's much more interesting. In 2013, it's pretty clear that making an announcement is closely correlated with higher voter turnout. 2012's isn't as clear. Not sure why.

How about times? Note that times are in UTC (subtract 4 hours for EDT and 7 hours for PDT).

LOPSA 2012 Election - Votes by hour

LOPSA 2013 Election - Votes by hour

 

These results show that most ballots were cast in the afternoon and evening. Correlating these with announcement times would require some more work that I haven't done.

Just out of curiosity, I ran 2011, 2012, and 2013's ballot times through together, and came up with this:

Voting Times - 2011-2013

 

This graph seems to agree with the above assessment.

 

Conclusion

Based on this data, I would conclude a few things:

1) Time voting announcements carefully, knowing that people tend to vote in the afternoon and evening, and not so much in the mornings.

2) Turnout tends to be higher immediately following announcements, with weekends showing low turnout.

3) The best time for a reminder is for the email to hit the inbox after lunch on a weekday, or immediately after work on a weekday. Avoid sending reminders on a weekend or late at night.

Pad or strip an IP with zeros

To remove padded zeros from an IP:

In sed:

sed -r 's/^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)(.*)$/\1.\2.\3.\4\5/'

In Python:

import re
reSearch = r"^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)(.*)$"
reReplace = r"\1.\2.\3.\4\5"
re.sub(reSearch, reReplace, "IP HERE")

To pad an IP with zeros:

In sed:

sed -r 's/^|\./&00/g; s/0*([0-9]{3})/\1/g'
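For symmetry with the Python stripping example above, here's a hedged Python sketch of both directions; the function names are my own, not from any library:

```python
import re

def pad_ip(ip):
    # Zero-pad each octet to three digits.
    return re.sub(r"\d+", lambda m: m.group().zfill(3), ip)

def strip_ip(ip):
    # Drop leading zeros from each octet.
    return re.sub(r"\d+", lambda m: str(int(m.group())), ip)

print(pad_ip("10.1.2.30"))          # 010.001.002.030
print(strip_ip("010.001.002.030"))  # 10.1.2.30
```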

Getting Started With Monitoring – Part 3: Components of a Monitoring System

I believe it helps to understand the individual components of monitoring in order to effectively build monitoring systems.

 

Measuring

This is the collection of data, whether by polling a device (remote check scripts) or by the device pushing the data to you (SNMP traps).

 

Recording

Once we have the data, we have to do something with it. The obvious solution is to store it in a time series database or a logging system, depending on what the data is.

 

Logic

Now that we have the data, we need to use it, such as through scripts that compare the data entries to thresholds to determine the health of the service.
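As a rough sketch, that threshold logic might look like this; the thresholds and the "higher is worse" assumption are illustrative, not from any real config:

```python
def check_state(value, warn, crit):
    # Classify a single metric reading against warn/crit thresholds.
    # Assumes higher is worse (e.g. CPU utilization percent).
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"

print(check_state(45, warn=80, crit=95))  # OK
print(check_state(85, warn=80, crit=95))  # WARNING
print(check_state(99, warn=80, crit=95))  # CRITICAL
```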

 

Alerting

We’ve run rules against the data, now we have to tell someone (assuming someone needs to be told). Alerting is about getting the right notifications to the right people in the right ways at the right times. This is actually harder than it sounds. (think: on-call rotation schedules, large teams with specialists, specific admins for specific services, etc)

 

Visualization

Someone is going to want to see a dashboard or a graph of what’s going on. The goal here is to display data in a useful way. State systems are excellent here, as are graphing front-ends for systems like Cacti and Graphite.

Getting Started With Monitoring – Part 2: Types of Monitoring

When people think of monitoring, the first thing that usually comes to mind is Nagios or a similar system, but this is a narrow view of all that is available. Nagios does not do well with logs, for example, and graphing metric data is an add-on. That brings me to what I want to talk about today:

 

State Monitoring
State monitoring is what everyone is familiar with. The old-hand of monitoring systems, Nagios, really cemented things in our minds. A host or service can be one of three states: OK, Warn, and Crit. Applying thresholds to checks allows for easy alerting. State monitoring is excellent at informing you that something has happened, and perfect for dashboards.

Examples of state monitoring systems: Nagios (and derivatives: check_mk, Opsview, Shinken).

 

Event Monitoring
Event monitoring is for logs and the like. It’s very difficult to use logs in a state system, which necessitates another system.

Examples of event monitoring systems: Splunk, graylog2, logstash

 

Metric Monitoring
A metric, at its core, is just a number. Octets on an interface (a counter) are a metric, as are CPU utilization, memory utilization, etc. Metric data is more accurately called time series data.

State monitoring systems are actually metric-based, turning the metric into a state. The Nagios method: a check script runs, hitting a host or service. The value it gets back is a metric, which it compares to the thresholds you have set. The check script informs Nagios which state the value met (OK/warn/crit).
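A minimal sketch of such a check script, using the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL); the CPU metric and thresholds here are made up:

```python
import sys

def main(cpu_pct, warn=80, crit=95):
    # Compare the measured metric to the thresholds and emit a
    # Nagios-style status line plus the matching exit code.
    if cpu_pct >= crit:
        print(f"CRITICAL - CPU at {cpu_pct}%")
        return 2
    if cpu_pct >= warn:
        print(f"WARNING - CPU at {cpu_pct}%")
        return 1
    print(f"OK - CPU at {cpu_pct}%")
    return 0

if __name__ == "__main__":
    # The metric would normally be measured on the host being
    # checked; a hard-coded value stands in for it here.
    sys.exit(main(45))
```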

Storing time series data into a database is best done using a database built for the purpose, called a time series database. SQL-based databases are not well-suited for storage of time series data.

Examples of metric monitoring systems and time series databases: MRTG, Cacti (both using rrdtool), Graphite.

 

I’ll discuss the different components of a monitoring system in Part 3.