Graph ALL THE THINGS with matplotlib

Matplotlib is one of those libraries that everybody loves. And I mean everybody. Google loves it. #python loves it. And #matplotlib definitely loves it.

I found the library as I was looking for something to graph employee performance data for a client. I’ve already posted about that adventure, so I’ll keep this post just to Matplotlib, which was by far the most popular recommendation I saw. On my Ubuntu system, all it took was a quick “sudo apt-get install python-matplotlib”, and it installed all the required dependencies (of which there are many). Be warned though, it takes a while. After that, all you need is a couple of imports and you’re set. I love it when things are simple.

Once I got into using the library, though, I found it to be a bit more complicated. A few times I found that the documentation didn’t help much, and had to visit #matplotlib. That said, once I started fiddling around with it and looked at a few examples on the webpage I started to get the hang of it.

For our client, we just needed a few simple bar graphs of things like revenues per employee for the month and for the year. For the most part, I copy/pasted a basic bar graph example from the matplotlib samples and edited it to my needs, then wrapped it in a graph() function to make it callable. Here’s the whole bit of code handling the graphing of data.
[sourcecode language="python"]
import numpy
import matplotlib.pyplot as plot

def graph(x_keys, bar_values, number, y_label, graph_title, save_name, has_dollars, old_records_save_name):

    ind = numpy.arange(number) # the x locations for the groups
    width = 0.35 # the width of the bars

    fig = plot.figure()
    ax = fig.add_subplot(111)
    new_bars = []

    # align='edge' keeps the left-edge bar placement that the tick offset below assumes
    rects1 = ax.bar(ind, bar_values, width, color='#6699CC', align='edge')

    ax.set_ylabel(y_label)
    ax.set_title(graph_title)
    ax.set_xticks(ind+(width/2.0)) # put each tick under the middle of its bar
    ax.set_xticklabels(x_keys)
    ax.margins(.05)
    ax.set_xlim(-.5, number)
    if has_dollars == "yes":
        # prepend a dollar sign to every y-tick label
        for item in ax.get_yticks():
            new_bars.append("$"+str(item))
        ax.set_yticklabels(new_bars)

    def autolabel(rects):
        # write each bar's value near the top of the bar, centered horizontally
        for rect in rects:
            height = rect.get_height()
            ax.text(rect.get_x()+(rect.get_width()/2.0), .8*height, '%d'%int(height), ha='center', va='bottom')

    autolabel(rects1)
    t = ax.title
    t.set_y(1.07)
    fig = plot.gcf()
    fig.set_size_inches(4,4)
    fig.tight_layout()
    plot.subplots_adjust(wspace = 0.05)
    plot.savefig(save_name, format="png") # current graph
    plot.savefig(old_records_save_name, format="png") # per-month archive copy
[/sourcecode]

The parameters I gave it were x_keys, bar_values, number, y_label, graph_title, save_name, has_dollars and old_records_save_name. Each call saves the graph twice: once as save_name and once as old_records_save_name. That second name was supposed to imply it’s an old graph record. What it actually does is save files according to month, so that when the month changes over, the previous month’s files are left alone and a new one is created. Most of the parameters are self-explanatory: x_keys holds the labels placed along the x-axis, bar_values holds the values to be graphed, number is the number of bars and is used to set the x-axis positions, y_label is the label for the y-axis, graph_title is the graph’s title, save_name is the file’s save name and has_dollars is a flag (the string “yes”) that says whether the graph needs to display a “$” sign for the values it’s graphing.
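The post doesn’t show how the two file names were put together, so here’s a minimal sketch of one way a per-month name could be built. The archive_name helper, the old_records directory and the naming scheme are all made up for illustration; only save_name and old_records_save_name come from the function above.

```python
import datetime
import os


def archive_name(base_name, when=None, directory="old_records"):
    """Build a per-month archive path such as old_records/revenue_2013-04.png.

    Hypothetical helper: the name, directory and format are assumptions.
    """
    when = when or datetime.date.today()
    return os.path.join(directory, "%s_%s.png" % (base_name, when.strftime("%Y-%m")))


# The same month always maps to the same file, so it is overwritten during the
# month, and a new file appears once the month changes over.
save_name = "revenue.png"
old_records_save_name = archive_name("revenue", datetime.date(2013, 4, 17))
```

Both names would then be passed straight into graph() as its save_name and old_records_save_name arguments.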

The new_bars variable is a list I had to create to allow $ signs to be used for y-tick labels, which you see go into effect at the line that starts “if has_dollars”. There, I call the get_yticks function so that I can get all the values that are being used for labels on the y-ticks. Then I cycle through them and prepend a $ sign to each value, then save them all in the list. Finally I call set_yticklabels with the new values as the parameter. It’s not too complicated, but required a little snooping around to see how the y-tick labels were set.
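For comparison, matplotlib’s ticker module can do the same job with a formatter function instead of a hand-built list. This is an alternative technique, not what the function above does, and it’s only a sketch: the dollars function and the revenue numbers are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plot
from matplotlib.ticker import FuncFormatter


def dollars(value, position):
    """Format a tick value as a whole-dollar label (hypothetical helper)."""
    return "$%d" % value


fig = plot.figure()
ax = fig.add_subplot(111)
ax.bar([0, 1, 2], [1200, 1850, 950], 0.35, color="#6699CC")  # made-up values

# The formatter is called for every y-tick, so the labels stay correct
# even if the tick positions change later.
ax.yaxis.set_major_formatter(FuncFormatter(dollars))

fig.canvas.draw()  # make sure the tick labels have been generated
labels = [label.get_text() for label in ax.get_yticklabels()]
```

The trade-off is that the formatter drops the decimals, where the list approach keeps whatever str() gives for each tick value.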

I also had to add a margin to get the bars off of the axes (because it looked ugly), and setting the xlim helped with that too. In the autolabel function, you’ll notice the rect.get_x()+(rect.get_width()/2.0) portion of code. This centers each value label horizontally over its bar; otherwise the labels are offset a bit. Finally, I added the fig.tight_layout() call and the fig.set_size_inches(4,4) call. These just helped to size the image and keep it clean looking. tight_layout() is a built-in method that comes with the newer versions of matplotlib, and hot damn I like it. It really made cleaning up the graphs easy.

All in all, I’d have to recommend matplotlib to anybody looking for a way to graph data. It has some pretty fancy features. The only downside is that the documentation was a little tricky at times, though I think somebody in #matplotlib mentioned that they’re working on that.