Learning my way around gspread, a library for Google Spreadsheets in Python

My Python work has been on hold for a while now as I’ve mostly been focused on Ruby on Rails stuff at work. But I had the opportunity recently to write a script in Python and remembered how much I love that language. The project was simply to create a script that could call out to Google Docs Spreadsheets to get employee and invoice data, and then be able to graph the information to track employee performance. I used two libraries to do that: gspread and matplotlib, both of which I’d wholeheartedly recommend. As the name suggests, gspread was used to get data from Google Spreadsheets. Matplotlib was what I used for graphing. I wound up with a script of around 200 lines (you can find here or here.), and had a lot of fun doing it. This post focuses mainly on gspread and some of the logic in the code I created.

Google docs has its own API (documentation for which can be found here or here. Frankly, it was way more difficult to find than it *should* have been). I used gspread because the documentation for it was so much clearer than google docs’ was. I didn’t mess around with anything in the library more than logging in, opening the spreadsheet and getting all the data from it. The library also provides clean methods to get data.

I got started by following the install directions here. To install, you can paste these three commands in your command line (it worked for me on ubuntu, and i’d assume your own OS of choice will work with the equivalent command to change directory).

[sourcecode language=”bash”]
git clone https://github.com/burnash/gspread.git
cd gspread
python setup.py install
[/sourcecode]

You can also install from Pypi or through easy_install.

THE CODE

I was able to get all the data from google I needed with six lines of code:

[sourcecode language=”python”]
gc = gspread.login(sys.argv[1], sys.argv[2])
sheet=gc.open("Invoices")
timesheet=sheet.worksheet("time")
invoice_sheet=sheet.worksheet("Invoice")
employee_time = timesheet.get_all_values()
invoice_row_data = invoice_sheet.get_all_values()
[/sourcecode]

You start by creating an object that you can then call other methods on, in this case “gc.”  As you can guess, gc.open will open the spreadsheet titled “Invoices”. Gspread also makes it so you can open spreadsheets by url or by key. Within a spreadsheet are one or more worksheets, which we open through “sheet.worksheet(‘time’). This finds the worksheets within the “Invoices” spreadsheet (think of them as different pages of the same spreadsheet). Then I call get_all_values to pull the entire worksheet into memory, which keeps from having to make time-intensive calls out to Google for more information.

You can also pull individual cell data by selecting it by row and column (i.e., row 3, column 1) or by cell name (i.e., cell B1). You do that by calling mycell=timesheet.cell(3, 1) or mycell=timesheet.acell(‘B1’), respectively. I didn’t use these methods in my script, but to view the value of that cell, I believe you’d just call mycell.value

To update cells, you’d do something like mycell.update_acell(‘C4’, ‘Updated Value’) or mycell.update_cell(3, 4, ‘Updated Value’)

Or you can update cells in batches:

myworksheet.update_cells(cells), where cells is a list of cells to update.

I think I liked the library so much because it was so intuitive, and produced such easily readable code. Definitely check it out if you get the time.

THE HARD STUFF

Now, gspread was easy enough to use; the hardest part of the whole script was in parsing the data. And that came in the form of hashes. Inside hashes. Inside lists. Here’s the relevant code:

[sourcecode language=”python”]
employee_invoices={}
total_revenue={}
callbacks={}
service_agreements={}
average_rev={}
# Finds all unique employee names and adds them to the appropriate hash, which were created above
for employee in employee_time_cards:
employee_invoices[employee.employee_name]=[]
total_revenue[employee.employee_name]=0
callbacks[employee.employee_name]=0
service_agreements[employee.employee_name]=0
average_rev[employee.employee_name]=0

# Appends the invoice numbers to the right employee who worked on the invoice, making a list of invoices per employee as a hash
for employee_name in employee_invoices:
for worker in employee_time_cards:
if employee_name==worker.employee_name:
employee_invoices[employee_name].append(worker.invoice_number)

# Searches for invoice numbers and adds the right revenue, callback and other information to a few hashes of employees
# in order to graph the data.
for invoice in invoice_array:
for employee in employee_time_cards:
if employee.invoice_number==invoice.invoice_number:
total_revenue[employee.employee_name]+=invoice.amount
service_agreements[employee.employee_name]+=invoice.agreements_sold
if invoice.callback=="yes":
callbacks[employee.employee_name]+=1
average_rev[employee.employee_name]=(total_revenue[employee.employee_name])/len(employee_invoices[employee.employee_name])
[/sourcecode]

The first five lines of that code sets hashes that will be used to store the data. The five lines after “for employee in employee_time_cards:” creates a key for each employee in the appropriate hash, which were just instantiated. The next for loop is where the difficulties started being had. Employee_invoices is a hash of objects, and employee_time_cards is actually a list. So what I have the script do is go through each of the keys in the employee_invoices hash (which are employee names). For each of those, I have it check the employee_name of each object in employee_time_cards. If the two match, I add that invoice to the employee_invoices hash for the appropriate employee. This gives me a list of invoices each employee has worked on, in the form of a hash.

The next for loop is where things got tricky for me. invoice_array is a list of invoice objects, while employee_time_cards is a hash of employee time card objects. If the invoice number in the list of invoice objects matches an invoice number for an employee, then the program adds that invoice object to one of several *other* hashes. My first implementation of the script didn’t use classes/objects, so the code was extremely more difficult to read. It was full of things like if employee[4]==invoice[2]. Creating classes/objects (at my boss’s suggestion) improved the readability about tenfold. Looking at it now with a little explanation it makes a bit of sense. But it sure didn’t before I implemented objects. Lesson? OOP sure does align programming concepts with human concepts pretty good.

Hopefully that makes some sense. One of my coworkers said he remembered when he learned how to use hashes in hashes in arrays. Seems you can do some pretty cool stuff with it. Data parsing: it’s where the money’s at!

I spent the rest of the script figuring out matplotlib, a highly recommended and popular library for graphing data. But I’ll save that for another post. Till then, check out gspread for some awesome Google interaction.