Friday, May 24, 2013

Decorator for Serializing MongoDB Dictionaries to JSON in Python

Before you read this, let me go on the record: this problem has likely been solved hundreds of times before. So here is yet another take, showing how I serialize Python objects that are returned as the result of a query to a MongoDB database. The biggest issue I was running into is that when you try to serialize something like a dictionary that has a MongoDB ID in it, the data type of that ID is bson.objectid.ObjectId, and the standard JSON encoder does not know how to serialize that type unless you first convert it with something like str(_id).

To address this I first started poking around the inter-tubes and found a really nice Stack Overflow answer written by Shabbyrobe that recursively grabs the keys and values of a class object so you can convert it to a dictionary (http://stackoverflow.com/questions/1036409/recursively-convert-python-object-graph-to-dictionary). This was a good start, and I only needed a couple of small modifications to ensure that any value of type ObjectId gets converted correctly to a string.

The next thing I wanted was to make sure that I didn't have to do anything silly like this.

return [myService.serialize(item) for item in myService.getMongoRecords()]

To address this I put this nifty recursive object walker method into a decorator. With a decorator I could decorate any route method in my Bottlepy application with @ajax, and it would run this recursive object walk and make sure any response I return from a decorated method would have MongoDB IDs properly converted to strings. Here's what that decorator code looks like.

from bson.objectid import ObjectId

def ajax(fn):
   def wrapper(*args, **kwargs):
      def _toDict(obj):
         if isinstance(obj, dict):
            for key in obj.keys():
               obj[key] = _toDict(obj[key])

            return obj

         elif hasattr(obj, "__iter__"):
            return [_toDict(item) for item in obj]

         elif hasattr(obj, "__dict__"):
            return dict([(key, _toDict(value)) for key, value in obj.__dict__.iteritems() if not callable(value) and not key.startswith("_")])

         else:
            if isinstance(obj, ObjectId):
               return str(obj)
            else:
               return obj

      result = fn(*args, **kwargs)
      return _toDict(result)

   return wrapper
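
For readers who want to try the idea without a MongoDB install, here is a minimal sketch of the same walk-and-convert decorator. It is not the code from my application: FakeObjectId is a hypothetical stand-in for bson.objectid.ObjectId, to_plain and get_contact are names made up for this illustration, and the ID check is done first so that ID objects are never mistaken for plain objects. It is written so it should run on Python 3 as well.

```python
import functools


class FakeObjectId(object):
    """Hypothetical stand-in for bson.objectid.ObjectId (assumption for this sketch)."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value


def to_plain(obj, id_types=(FakeObjectId,)):
    """Recursively convert obj into dicts, lists, and scalars, turning IDs into strings."""
    if isinstance(obj, id_types):
        return str(obj)
    if isinstance(obj, dict):
        # Build a new dictionary rather than modifying the one passed in
        return dict((key, to_plain(value, id_types)) for key, value in obj.items())
    if isinstance(obj, (list, tuple, set)):
        return [to_plain(item, id_types) for item in obj]
    if hasattr(obj, "__dict__"):
        # Plain objects: keep public, non-callable attributes only
        return dict(
            (key, to_plain(value, id_types))
            for key, value in vars(obj).items()
            if not callable(value) and not key.startswith("_")
        )
    return obj


def ajax(fn):
    @functools.wraps(fn)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return to_plain(fn(*args, **kwargs))
    return wrapper


@ajax
def get_contact():
    # Hypothetical record shaped like something a MongoDB query might return
    return {"_id": FakeObjectId("519f8c0e"), "tags": [FakeObjectId("a1")], "name": "Bob"}
```

Calling get_contact() here returns a dictionary whose _id and tags values are ordinary strings, which the json module can encode without complaint.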

To use this in my Bottlepy application I would have a method for an AJAX response route that looks something like this.

@route("/ajax/contact/:id", method="GET")
@ajax
def ajax_getContact(id):
   contactService = request.factory.getContactService()
   return contactService.read(id=id)

@route("/ajax/contacts", method="GET")
@ajax
def ajax_getContacts():
   contactService = request.factory.getContactService()
   return contactService.list()

Notice how each of these methods is decorated with our cool decorator? That ensures that the objects and lists I return are ready for proper JSON serialization that is automatically done by the Bottlepy framework. I hope this helps somebody. Not 100% sure this is a great way to do it, but it is solving my immediate problem. Cheers, and happy coding!

Wednesday, May 22, 2013

Comparing Classes in Python

Most object-oriented languages have a means to compare two objects or classes. Java has the equals() method and the Comparator<T> interface. In C# you can override the Equals() method. Python naturally has a way to make your class comparable by using a magic method named __eq__. It takes the object you are comparing against as an argument, as well as the standard reference to self. In it you return True or False to indicate whether the object passed in is equal to self. Here's a sample.

class Sample():
   def __init__(self):
      self.name = ""
      self.age = 0

   def __eq__(self, other):
      return (self.name == other.name and 
       self.age == other.age)

The above class has a constructor method as defined by __init__ that sets up two properties named name and age. It then defines the __eq__ magic method. This is called automatically when you attempt to compare the equality of two objects of type Sample like so.

a = Sample()
a.name = "Bob"
a.age = 11

b = Sample()
b.name = "Julia"
b.age = 12

result = (a == b)
print result

The result of this comparison will be False as the two objects do not have the same name and age properties. And that is how you compare class objects in Python. Cheers, and happy coding!
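
As a follow-up, here is a slightly more defensive sketch of the same idea. The Person class and its constructor arguments are made up for this illustration. It returns NotImplemented when the other object is not a Person (so comparing against a string does not raise an AttributeError), defines __ne__ because Python 2 does not derive it from __eq__, and defines __hash__ so that equal objects can still be used in sets and as dictionary keys.

```python
class Person(object):
    def __init__(self, name="", age=0):
        self.name = name
        self.age = age

    def __eq__(self, other):
        # Let Python fall back to its default handling for unrelated types
        if not isinstance(other, Person):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)

    def __ne__(self, other):
        # Needed on Python 2; harmless on Python 3
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash the same
        return hash((self.name, self.age))


bob = Person("Bob", 11)
other_bob = Person("Bob", 11)
julia = Person("Julia", 12)
```

With these definitions bob == other_bob is True, bob == julia is False, and bob == "Bob" is simply False instead of an error.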

Tuesday, May 21, 2013

Quick Blurb on Python Beaker and Memcached

Real quickly I wanted to post to show how one can configure Beaker for session management using Memcached as the session storage. It took me a minute and a few searches to put it all together, so hopefully this will help someone searching the web.

import os

# SESSION_PATH is assumed to be defined elsewhere as the base directory for session files
SESSION_OPTS = {
   "session.type": "ext:memcached",
   "session.cookie_expires": 14400,
   "session.auto": True,
   "session.url": "127.0.0.1:11211",
   "session.lock_dir": os.path.join(SESSION_PATH, "lock"),
   "session.data_dir": os.path.join(SESSION_PATH, "data")
}

I'm still not sure if the lock and data directories are needed, and this example only shows connecting to a single, localhost Memcached server. Just the same it will help me remember how to do it in the future. Cheers, and happy coding!

Saturday, April 20, 2013

How I Deployed a PHP App via FTP Using Fabric and Git

A project I've been working on for a bit now uses a popular hosting provider to host a PHP application I'm writing. My normal workflow with this project is to make changes, commit my code to my Git repository, push it to GitHub (a private account), then FTP the files manually to the hosting provider. This works fine most of the time, though I'll admit that there are moments when I forget a file or two. Of course the application then blows up and I have to view my Git log to see what files I may have missed when uploading the code.

Today I decided to do something about this little problem. Having recently discovered Fabric, the deployment framework and library for Python, I started crafting a script to automate the tasks of deployment. There are a number of ways to approach this, but for now I decided that I have only a couple of simple requirements.

  • Need to upload files from my latest commit (HEAD)
  • A single command to do the upload
  • Upload is done via FTP

Based on these requirements I needed to learn two additional things: how to use FTP in Python, and how to get a log from Git in Python. I won't go into great detail on how all this works, but I will cover the basics. First I needed to be able to connect to an FTP server. Python has a library called ftplib which offers basic, fairly low-level FTP methods. Out of the gate, the first method I wrote looks like this.

def _connectToFTP():
   print green("** Connecting to the server **")

   ftp = FTP(host=FTP_ADDRESS, user=FTP_USER, passwd=FTP_PASSWORD)
   return ftp

As you can see the FTP class is simple enough to use. Provide an address, user name, and password and you are connected to an FTP server and are given an object to perform further tasks with. Naturally the next task would be to upload files, but before we can do that we have to know what we are uploading. This would be defined as one of the requirements listed above where I wanted to upload the files from my last commit.

To satisfy the requirement for uploading files from my last commit I installed the library gitpython. GitPython, simply put, is a library that allows your Python code to interact with Git repositories. Here I must admit that I struggled, as I failed to comprehend some of the abstractions. As a result I had a hard time getting the results I wanted using just the pure library, so I ended up using its utility class that allows you to execute arbitrary Git commands on the command line and get the results back. This allowed me to get a list of files for a given commit hash.

def _gitLatestFiles():
   print green("** Connecting to Git **")

   g = Git(REPO_ROOT)
   repo = Repo(REPO_ROOT)
   headCommit = repo.head.commit

   print "Head commit revision: %s" % headCommit
   print "Message: %s" % headCommit.message

   result = g.execute(["git", "diff-tree", "--no-commit-id", "--name-only", "-r", str(headCommit)])
   files = result.split("\n")

   return _filterForValidFiles(fileList=files)

def _filterForValidFiles(fileList):
   return [f for f in fileList if f.startswith(("components/", "www/"))]

There are two methods here to get the results I want. The first is _gitLatestFiles(). This method talks to my local Git repository and gets the HEAD commit entry. With it I have a commit hash that I can use to pull logs for. I then use the library's execute() method that allows me to run command line Git and store the results. In my case I wanted to run a diff-tree and get only the names of the files modified for the specified commit hash. The result of the diff-tree is a single string of the modified files, so I needed to split them into a list on a newline character.

The next method you'll see is _filterForValidFiles(). It is called by the _gitLatestFiles() method. This piece of code ensures that the modified files are children of a specific set of directories (components/ and www/). If they reside outside those directories I don't want to deploy them.
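
To make the split-and-filter step concrete without needing a Git repository, here is a small sketch that works on a hard-coded string. The function name deployable_files and the sample file names are made up for this illustration; only the two directory prefixes come from the code above.

```python
DEPLOY_PREFIXES = ("components/", "www/")


def deployable_files(diff_tree_output, prefixes=DEPLOY_PREFIXES):
    """Split diff-tree style output into lines and keep only paths under the given prefixes."""
    names = [line.strip() for line in diff_tree_output.splitlines()]
    # str.startswith accepts a tuple, so one call checks every prefix
    return [name for name in names if name.startswith(prefixes)]


# Hypothetical output, one changed path per line, as --name-only would produce
sample_output = "www/index.php\ncomponents/Contact.php\nbin/fabfile.py\nREADME.md\n"
selected = deployable_files(sample_output)
```

Here selected ends up holding only www/index.php and components/Contact.php; the fabfile and the README are skipped, as are any blank lines left over from the split.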

The final step in this process is to put it all together. We need to connect to the FTP server, get a list of changed files from the HEAD revision in Git, then push those files to the server. Let's see what that code looks like.

from __future__ import with_statement

import fabric
import os
from fabric.api import *
from fabric.colors import green, yellow, red
from ftplib import FTP
from git import *

###############################################################################
# SECTION: Constants
###############################################################################
DATABASE = {
   "local": {
      "userName": "user",
      "password": "password"
   }
}

FTP_ADDRESS = "ftp.something.com"
FTP_USER = "user"
FTP_PASSWORD = "password"
FTP_ROOT_DIR = "/appDirectory"

REPO_ROOT = "../"


###############################################################################
# SECTION: Private methods
###############################################################################
def _connectToFTP():
   print green("** Connecting to the server **")

   ftp = FTP(host=FTP_ADDRESS, user=FTP_USER, passwd=FTP_PASSWORD)
   return ftp

def _gitLatestFiles():
   print green("** Connecting to Git **")

   g = Git(REPO_ROOT)
   repo = Repo(REPO_ROOT)
   headCommit = repo.head.commit

   print "Head commit revision: %s" % headCommit
   print "Message: %s" % headCommit.message

   result = g.execute(["git", "diff-tree", "--no-commit-id", "--name-only", "-r", str(headCommit)])
   files = result.split("\n")

   return _filterForValidFiles(fileList=files)

def _filterForValidFiles(fileList):
   return [f for f in fileList if f.startswith(("components/", "www/"))]


###############################################################################
# SECTION: Actions
###############################################################################

def uploadLatest():
   print ""
   print green("** Upload latest changes **")

   ftp = _connectToFTP()
   files = _gitLatestFiles()

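   for f in files:
      print yellow("Uploading file %s" % f)
      split = os.path.split(f)

      ftp.cwd(os.path.join(FTP_ROOT_DIR, split[0]))

      # Open the local file relative to the repository root and close it once uploaded
      with open(os.path.join(REPO_ROOT, f), "r") as localFile:
         ftp.storlines("STOR %s" % split[1], localFile)

   ftp.quit()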

The last method in the code, uploadLatest(), is the one that glues it all together. As mentioned before, it first connects to the FTP server. Then it gets a list of files to upload from the latest Git commit. After this we iterate over the file names and split each of them into a directory and a file name. We do this so we can issue a CWD (change working directory) command to the FTP server, and have a file name to pass to the STOR FTP command.

The next steps actually change the directory on the FTP server to match the path of the changed file. From here we send the STOR command by calling the storlines method on the FTP library. This method first takes the command to use, which is STOR followed by the name of the file to upload. The second argument is an open file handle to the actual file to upload. In this case we are just using the built-in Python open() method to open up the file in read mode. Finally when all is done we issue a QUIT command by calling the quit() method.
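
The path handling described above can be tried on its own, without an FTP server. This is only a sketch: plan_upload is a hypothetical helper, the sample path is invented, and I use posixpath here (rather than os.path as in the fabfile) on the assumption that the FTP server expects forward slashes regardless of the local operating system.

```python
import posixpath

FTP_ROOT_DIR = "/appDirectory"  # same placeholder value as in the fabfile above


def plan_upload(relative_path, root=FTP_ROOT_DIR):
    """Return (remote_directory, stor_command) for one repository-relative path."""
    directory, file_name = posixpath.split(relative_path)
    remote_directory = posixpath.join(root, directory)
    return remote_directory, "STOR %s" % file_name


remote_directory, command = plan_upload("www/css/site.css")
```

The first value is what would be passed to ftp.cwd(), and the second is the command string handed to storlines().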

All of this code is saved in my /bin folder of the project in a file named fabfile.py. To run this I open up my terminal and execute fab uploadLatest. If you aren't familiar with Fabric take a look at the Fabric site and also my previous blog entry on the subject.

Happy coding!

Thursday, February 14, 2013

Python itertools: Display repeating permutations of a number set

Back in July of 2010 I blogged about displaying a set of repeating permutations of a number set. In this blog post I explained how my daughter had a kid's "spy safe" that had four buttons on it that could be used to enter a passcode. At the time I crafted a bit of Java code to display all possible combinations that could be entered to unlock and open the safe.

Now in my quest to explore the Python itertools module I wanted to try this exercise again. Turns out that Python's itertools module makes this easy with a method called product(). A product is defined as "...a mathematical operation which returns a set (or product set) from multiple sets". To do this the product() method takes one or more iterables and an optional keyword argument named repeat. The repeat value indicates how many times to repeat the iterable in the product. So in our example, since you have to enter 4 numbers for a passcode, we repeat the available numbers four times.

from itertools import product

#
# Available numbers are 1 - 4
#
availableNumbers = range(1, 5)

#
# Enumerate over a product of the available number range.
#
for i, s in enumerate(product(availableNumbers, repeat=4)):
   print "Combination #%s: %s" % (i, s)

This gives us an output that looks like this:

Combination #0: (1, 1, 1, 1)
Combination #1: (1, 1, 1, 2)
Combination #2: (1, 1, 1, 3)
Combination #3: (1, 1, 1, 4)
Combination #4: (1, 1, 2, 1)
Combination #5: (1, 1, 2, 2)
Combination #6: (1, 1, 2, 3)
Combination #7: (1, 1, 2, 4)
Combination #8: (1, 1, 3, 1)
Combination #9: (1, 1, 3, 2)
Combination #10: (1, 1, 3, 3)
Combination #11: (1, 1, 3, 4)
Combination #12: (1, 1, 4, 1)
Combination #13: (1, 1, 4, 2)
Combination #14: (1, 1, 4, 3)
Combination #15: (1, 1, 4, 4)
Combination #16: (1, 2, 1, 1)
Combination #17: (1, 2, 1, 2)
Combination #18: (1, 2, 1, 3)
Combination #19: (1, 2, 1, 4)
Combination #20: (1, 2, 2, 1)
Combination #21: (1, 2, 2, 2)
Combination #22: (1, 2, 2, 3)
Combination #23: (1, 2, 2, 4)
Combination #24: (1, 2, 3, 1)
Combination #25: (1, 2, 3, 2)
Combination #26: (1, 2, 3, 3)
Combination #27: (1, 2, 3, 4)
Combination #28: (1, 2, 4, 1)
Combination #29: (1, 2, 4, 2)
Combination #30: (1, 2, 4, 3)
Combination #31: (1, 2, 4, 4)
Combination #32: (1, 3, 1, 1)
Combination #33: (1, 3, 1, 2)
Combination #34: (1, 3, 1, 3)
Combination #35: (1, 3, 1, 4)
Combination #36: (1, 3, 2, 1)
Combination #37: (1, 3, 2, 2)
Combination #38: (1, 3, 2, 3)
Combination #39: (1, 3, 2, 4)
Combination #40: (1, 3, 3, 1)
Combination #41: (1, 3, 3, 2)
Combination #42: (1, 3, 3, 3)
Combination #43: (1, 3, 3, 4)
Combination #44: (1, 3, 4, 1)
Combination #45: (1, 3, 4, 2)
Combination #46: (1, 3, 4, 3)
Combination #47: (1, 3, 4, 4)
Combination #48: (1, 4, 1, 1)
Combination #49: (1, 4, 1, 2)
Combination #50: (1, 4, 1, 3)
Combination #51: (1, 4, 1, 4)
Combination #52: (1, 4, 2, 1)
Combination #53: (1, 4, 2, 2)
Combination #54: (1, 4, 2, 3)
Combination #55: (1, 4, 2, 4)
Combination #56: (1, 4, 3, 1)
Combination #57: (1, 4, 3, 2)
Combination #58: (1, 4, 3, 3)
Combination #59: (1, 4, 3, 4)
Combination #60: (1, 4, 4, 1)
Combination #61: (1, 4, 4, 2)
Combination #62: (1, 4, 4, 3)
Combination #63: (1, 4, 4, 4)
Combination #64: (2, 1, 1, 1)
Combination #65: (2, 1, 1, 2)
Combination #66: (2, 1, 1, 3)
Combination #67: (2, 1, 1, 4)
Combination #68: (2, 1, 2, 1)
Combination #69: (2, 1, 2, 2)
Combination #70: (2, 1, 2, 3)
Combination #71: (2, 1, 2, 4)
Combination #72: (2, 1, 3, 1)
Combination #73: (2, 1, 3, 2)
Combination #74: (2, 1, 3, 3)
Combination #75: (2, 1, 3, 4)
Combination #76: (2, 1, 4, 1)
Combination #77: (2, 1, 4, 2)
Combination #78: (2, 1, 4, 3)
Combination #79: (2, 1, 4, 4)
Combination #80: (2, 2, 1, 1)
Combination #81: (2, 2, 1, 2)
Combination #82: (2, 2, 1, 3)
Combination #83: (2, 2, 1, 4)
Combination #84: (2, 2, 2, 1)
Combination #85: (2, 2, 2, 2)
Combination #86: (2, 2, 2, 3)
Combination #87: (2, 2, 2, 4)
Combination #88: (2, 2, 3, 1)
Combination #89: (2, 2, 3, 2)
Combination #90: (2, 2, 3, 3)
Combination #91: (2, 2, 3, 4)
Combination #92: (2, 2, 4, 1)
Combination #93: (2, 2, 4, 2)
Combination #94: (2, 2, 4, 3)
Combination #95: (2, 2, 4, 4)
Combination #96: (2, 3, 1, 1)
Combination #97: (2, 3, 1, 2)
Combination #98: (2, 3, 1, 3)
Combination #99: (2, 3, 1, 4)
Combination #100: (2, 3, 2, 1)
Combination #101: (2, 3, 2, 2)
Combination #102: (2, 3, 2, 3)
Combination #103: (2, 3, 2, 4)
Combination #104: (2, 3, 3, 1)
Combination #105: (2, 3, 3, 2)
Combination #106: (2, 3, 3, 3)
Combination #107: (2, 3, 3, 4)
Combination #108: (2, 3, 4, 1)
Combination #109: (2, 3, 4, 2)
Combination #110: (2, 3, 4, 3)
Combination #111: (2, 3, 4, 4)
Combination #112: (2, 4, 1, 1)
Combination #113: (2, 4, 1, 2)
Combination #114: (2, 4, 1, 3)
Combination #115: (2, 4, 1, 4)
Combination #116: (2, 4, 2, 1)
Combination #117: (2, 4, 2, 2)
Combination #118: (2, 4, 2, 3)
Combination #119: (2, 4, 2, 4)
Combination #120: (2, 4, 3, 1)
Combination #121: (2, 4, 3, 2)
Combination #122: (2, 4, 3, 3)
Combination #123: (2, 4, 3, 4)
Combination #124: (2, 4, 4, 1)
Combination #125: (2, 4, 4, 2)
Combination #126: (2, 4, 4, 3)
Combination #127: (2, 4, 4, 4)
Combination #128: (3, 1, 1, 1)
Combination #129: (3, 1, 1, 2)
Combination #130: (3, 1, 1, 3)
Combination #131: (3, 1, 1, 4)
Combination #132: (3, 1, 2, 1)
Combination #133: (3, 1, 2, 2)
Combination #134: (3, 1, 2, 3)
Combination #135: (3, 1, 2, 4)
Combination #136: (3, 1, 3, 1)
Combination #137: (3, 1, 3, 2)
Combination #138: (3, 1, 3, 3)
Combination #139: (3, 1, 3, 4)
Combination #140: (3, 1, 4, 1)
Combination #141: (3, 1, 4, 2)
Combination #142: (3, 1, 4, 3)
Combination #143: (3, 1, 4, 4)
Combination #144: (3, 2, 1, 1)
Combination #145: (3, 2, 1, 2)
Combination #146: (3, 2, 1, 3)
Combination #147: (3, 2, 1, 4)
Combination #148: (3, 2, 2, 1)
Combination #149: (3, 2, 2, 2)
Combination #150: (3, 2, 2, 3)
Combination #151: (3, 2, 2, 4)
Combination #152: (3, 2, 3, 1)
Combination #153: (3, 2, 3, 2)
Combination #154: (3, 2, 3, 3)
Combination #155: (3, 2, 3, 4)
Combination #156: (3, 2, 4, 1)
Combination #157: (3, 2, 4, 2)
Combination #158: (3, 2, 4, 3)
Combination #159: (3, 2, 4, 4)
Combination #160: (3, 3, 1, 1)
Combination #161: (3, 3, 1, 2)
Combination #162: (3, 3, 1, 3)
Combination #163: (3, 3, 1, 4)
Combination #164: (3, 3, 2, 1)
Combination #165: (3, 3, 2, 2)
Combination #166: (3, 3, 2, 3)
Combination #167: (3, 3, 2, 4)
Combination #168: (3, 3, 3, 1)
Combination #169: (3, 3, 3, 2)
Combination #170: (3, 3, 3, 3)
Combination #171: (3, 3, 3, 4)
Combination #172: (3, 3, 4, 1)
Combination #173: (3, 3, 4, 2)
Combination #174: (3, 3, 4, 3)
Combination #175: (3, 3, 4, 4)
Combination #176: (3, 4, 1, 1)
Combination #177: (3, 4, 1, 2)
Combination #178: (3, 4, 1, 3)
Combination #179: (3, 4, 1, 4)
Combination #180: (3, 4, 2, 1)
Combination #181: (3, 4, 2, 2)
Combination #182: (3, 4, 2, 3)
Combination #183: (3, 4, 2, 4)
Combination #184: (3, 4, 3, 1)
Combination #185: (3, 4, 3, 2)
Combination #186: (3, 4, 3, 3)
Combination #187: (3, 4, 3, 4)
Combination #188: (3, 4, 4, 1)
Combination #189: (3, 4, 4, 2)
Combination #190: (3, 4, 4, 3)
Combination #191: (3, 4, 4, 4)
Combination #192: (4, 1, 1, 1)
Combination #193: (4, 1, 1, 2)
Combination #194: (4, 1, 1, 3)
Combination #195: (4, 1, 1, 4)
Combination #196: (4, 1, 2, 1)
Combination #197: (4, 1, 2, 2)
Combination #198: (4, 1, 2, 3)
Combination #199: (4, 1, 2, 4)
Combination #200: (4, 1, 3, 1)
Combination #201: (4, 1, 3, 2)
Combination #202: (4, 1, 3, 3)
Combination #203: (4, 1, 3, 4)
Combination #204: (4, 1, 4, 1)
Combination #205: (4, 1, 4, 2)
Combination #206: (4, 1, 4, 3)
Combination #207: (4, 1, 4, 4)
Combination #208: (4, 2, 1, 1)
Combination #209: (4, 2, 1, 2)
Combination #210: (4, 2, 1, 3)
Combination #211: (4, 2, 1, 4)
Combination #212: (4, 2, 2, 1)
Combination #213: (4, 2, 2, 2)
Combination #214: (4, 2, 2, 3)
Combination #215: (4, 2, 2, 4)
Combination #216: (4, 2, 3, 1)
Combination #217: (4, 2, 3, 2)
Combination #218: (4, 2, 3, 3)
Combination #219: (4, 2, 3, 4)
Combination #220: (4, 2, 4, 1)
Combination #221: (4, 2, 4, 2)
Combination #222: (4, 2, 4, 3)
Combination #223: (4, 2, 4, 4)
Combination #224: (4, 3, 1, 1)
Combination #225: (4, 3, 1, 2)
Combination #226: (4, 3, 1, 3)
Combination #227: (4, 3, 1, 4)
Combination #228: (4, 3, 2, 1)
Combination #229: (4, 3, 2, 2)
Combination #230: (4, 3, 2, 3)
Combination #231: (4, 3, 2, 4)
Combination #232: (4, 3, 3, 1)
Combination #233: (4, 3, 3, 2)
Combination #234: (4, 3, 3, 3)
Combination #235: (4, 3, 3, 4)
Combination #236: (4, 3, 4, 1)
Combination #237: (4, 3, 4, 2)
Combination #238: (4, 3, 4, 3)
Combination #239: (4, 3, 4, 4)
Combination #240: (4, 4, 1, 1)
Combination #241: (4, 4, 1, 2)
Combination #242: (4, 4, 1, 3)
Combination #243: (4, 4, 1, 4)
Combination #244: (4, 4, 2, 1)
Combination #245: (4, 4, 2, 2)
Combination #246: (4, 4, 2, 3)
Combination #247: (4, 4, 2, 4)
Combination #248: (4, 4, 3, 1)
Combination #249: (4, 4, 3, 2)
Combination #250: (4, 4, 3, 3)
Combination #251: (4, 4, 3, 4)
Combination #252: (4, 4, 4, 1)
Combination #253: (4, 4, 4, 2)
Combination #254: (4, 4, 4, 3)
Combination #255: (4, 4, 4, 4)

Python made this super easy, and I almost feel guilty even taking the time to blog about this when I didn't have to do any real work to make it happen! Just the same, I'll still sign off with Happy coding!
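
As a quick sanity check on the output above, here is a small sketch comparing product() with the nested loops it replaces. The helper name codes_with_loops is made up for this illustration. With four buttons and four positions there should be 4 * 4 * 4 * 4 = 256 passcodes, which matches the numbering from #0 to #255.

```python
from itertools import product

buttons = range(1, 5)  # the numbers 1 through 4


def codes_with_loops(values):
    """Build every 4-digit code the long way, with four nested loops."""
    values = list(values)
    return [(a, b, c, d) for a in values for b in values for c in values for d in values]


codes = list(product(buttons, repeat=4))
total = len(codes)
```

Both approaches produce the same tuples in the same order; product() just spares you from writing one loop per position.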

Tuesday, February 12, 2013

Python itertools: Filtering items

Continuing my exploration of the itertools module in Python I wanted to look at the ifilter() method. ifilter() is a method that returns an iterator over the items for which a test function (or lambda expression) returns true. In this sample I have an array of products. I want to get a list of products that have inventory in stock and are allowed to be drop-shipped.

from itertools import ifilter

products = [
   {"id": "SHIRT-BLU15", "category": "T-Shirt", "color": "Blue", "size": "large", "instock": 125, "backorder": 50, "shipTypes": ["tostore", "dropship"], "price": 25.99},
   {"id": "SHIRT-BLU13", "category": "T-Shirt", "color": "Blue", "size": "small", "instock": 105, "backorder": 20, "shipTypes": ["tostore", "dropship"], "price": 23.99},
   {"id": "SHIRT-RED15", "category": "T-Shirt", "color": "Red", "size": "large", "instock": 145, "backorder": 20, "shipTypes": ["tostore"], "price": 26.99},
   {"id": "SHIRT-RED13", "category": "T-Shirt", "color": "Red", "size": "small", "instock": 25, "backorder": 75, "shipTypes": ["tostore"], "price": 20.99},
   {"id": "SHIRT-GRN15", "category": "T-Shirt", "color": "Green", "size": "large", "instock": 102, "backorder": 0, "shipTypes": ["tostore", "dropship"], "price": 21.99},
   {"id": "SHIRT-GRN13", "category": "T-Shirt", "color": "Green", "size": "small", "instock": 0, "backorder": 100, "shipTypes": ["tostore", "dropship"], "price": 21.99},
   {"id": "SHIRT-PUR15", "category": "T-Shirt", "color": "Purple", "size": "large", "instock": 25, "backorder": 60, "shipTypes": ["tostore"], "price": 27.99},
   {"id": "SHIRT-PUR13", "category": "T-Shirt", "color": "Purple", "size": "small", "instock": 0, "backorder": 100, "shipTypes": ["tostore"], "price": 26.99}
]

#
# Give me all shirts that are in stock and allow drop-ship
#
matches = [item for item in ifilter(lambda k: k["instock"] > 0 and "dropship" in k["shipTypes"], products)]

for i in matches:
   print "Item ID %s is in stock and available for drop-ship" % i["id"]

Much like most Python list and array manipulation methods I'm using a list comprehension here. The cool part is the call to ifilter(). The first argument is a lambda expression that checks the current item to see if we have a value greater than zero for instock, and if dropship is an item in the shipTypes array. The end result is an array of items that match, and I loop over that to show what's available.
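
One note for anyone on Python 3: itertools.ifilter() no longer exists there, because the built-in filter() already returns an iterator. Here is a minimal sketch of the same filter in a form that should run on both versions. The trimmed-down product list and the can_drop_ship name are made up for this illustration.

```python
products = [
    {"id": "SHIRT-BLU15", "instock": 125, "shipTypes": ["tostore", "dropship"]},
    {"id": "SHIRT-RED15", "instock": 145, "shipTypes": ["tostore"]},
    {"id": "SHIRT-GRN13", "instock": 0, "shipTypes": ["tostore", "dropship"]},
]


def can_drop_ship(item):
    """True when the item has stock on hand and allows drop-shipping."""
    return item["instock"] > 0 and "dropship" in item["shipTypes"]


# Built-in filter(); wrapping it in list() works on Python 2 and Python 3
with_filter = list(filter(can_drop_ship, products))

# The same test written as a plain list comprehension
with_comprehension = [item for item in products if can_drop_ship(item)]

matching_ids = [item["id"] for item in with_filter]
```

Both forms select the same items, so which one to use is mostly a matter of taste.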

The more I use Python's list comprehensions and the itertools module the more I find I like it! Cheers, and happy coding!

Monday, February 11, 2013

Python itertools: Grouping an array by key

Have you ever had a flat dataset and you needed a portion of the data grouped into a sub-array? I came across such a need this weekend and thought I'd share my experience. In this code sample we'll see how to take an array of dictionaries (similar to what might come out of a database), sort it, then perform a grouping. This data is a list of people and their phone numbers. The trick here is that each person can have one or more phone numbers of different types. For example Jessica Alba has a home, work and cell phone number. What I want to end up with is a single array entry for each person, with a sub-array of all their phone numbers. First let's start with the code.

from itertools import groupby

people = [
   {"firstName": "Adam", "lastName": "Presley", "phoneType": "Home Phone", "phoneNumber": "555-7844"},
   {"firstName": "Jessica", "lastName": "Alba", "phoneType": "Home Phone", "phoneNumber": "555-7833"},
   {"firstName": "Adam", "lastName": "Presley", "phoneType": "Work Phone", "phoneNumber": "555-1122"},
   {"firstName": "Bob", "lastName": "Hope", "phoneType": "Home Phone", "phoneNumber": "555-9987"},
   {"firstName": "Jessica", "lastName": "Alba", "phoneType": "Cell Phone", "phoneNumber": "555-0915"},
   {"firstName": "Jessica", "lastName": "Alba", "phoneType": "Work Phone", "phoneNumber": "555-4821"}
]

people = sorted(people, key=lambda k: k["firstName"])
groupedResult = []

for key, person in groupby(people, lambda k: k["firstName"]):
   row = next(person)

   newRow = dict([(k, v) for k, v in row.items() if k not in ("phoneType", "phoneNumber")])
   newRow["phoneNumbers"] = [{"phoneType": row["phoneType"], "phoneNumber": row["phoneNumber"]}] + [{"phoneType": ph["phoneType"], "phoneNumber": ph["phoneNumber"]} for ph in person]

   groupedResult.append(newRow)



for person in groupedResult:
   print "%s %s:" % (person["firstName"], person["lastName"])

   for phoneNumber in person["phoneNumbers"]:
      print "\t%s: %s" % (phoneNumber["phoneType"], phoneNumber["phoneNumber"])

From the beginning you will see the array of dictionaries as I described above. There are multiple records for each person who has more than one phone number. The first thing I want to do with this array is to sort it, since groupby() only groups consecutive items that share a key, so the records must already be ordered by that key. This is accomplished using the sorted() method. The first argument is the array to sort, and the key argument is a function, or a lambda expression in this case, that specifies what key to use in the sorting. In our case we are going to sort by first name.

Now it is time to loop over our collection. Notice the use of groupby(). This is a nifty method from the itertools module that returns a key and a grouper object. The grouper object is iterable and basically contains the current group of items as grouped by the current key. The next line retrieves our first item from the current group. This I use to seed the new row we are creating.

To create the row I am doing a bit of list comprehension. Using the dict() method to create a dictionary from a list of (key, value) tuples I can create the resulting row, which is a dictionary (very similar to the rows in the people array). In this list comprehension though I am getting all the items from our current row except for the phoneType and phoneNumber keys. I don't want them until the next line.

From here I then create a new key called phoneNumbers (notice the plural) that will house a sub-array of all phone numbers for the current person. We can break this line up into two parts. The first is a list literal that creates an array of a single dictionary, built from the first row we've already retrieved in the variable named row.

[{"phoneType": row["phoneType"], "phoneNumber": row["phoneNumber"]}]

The next list comprehension assembles an array of the remaining rows from the person grouper iterable object.

[{"phoneType": ph["phoneType"], "phoneNumber": ph["phoneNumber"]} for ph in person]

Then you'll notice that we are using an addition operation to combine the two arrays into one. Finally the new row is added to the groupedResult array and we do a quick loop to show off our results.
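
To show why the sort matters, and as a slightly shorter variation on the grouping loop, here is a small sketch. The trimmed-down rows and the helper name group_keys are made up for this illustration. It collects every phone number straight from the group instead of pulling the first row out with next().

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"firstName": "Adam", "phoneType": "Home Phone", "phoneNumber": "555-7844"},
    {"firstName": "Jessica", "phoneType": "Home Phone", "phoneNumber": "555-7833"},
    {"firstName": "Adam", "phoneType": "Work Phone", "phoneNumber": "555-1122"},
]

by_first_name = itemgetter("firstName")


def group_keys(records):
    """Return the key of each group that groupby() produces, in order."""
    return [key for key, _ in groupby(records, key=by_first_name)]


# Without sorting, Adam shows up as two separate groups
unsorted_keys = group_keys(rows)
sorted_keys = group_keys(sorted(rows, key=by_first_name))

# Group the sorted rows and collect each person's phone numbers
grouped = []
for first_name, group in groupby(sorted(rows, key=by_first_name), key=by_first_name):
    phones = [{"phoneType": r["phoneType"], "phoneNumber": r["phoneNumber"]} for r in group]
    grouped.append({"firstName": first_name, "phoneNumbers": phones})
```

Because sorted() is stable, each person's phone numbers stay in the order they appeared in the original data.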

I found the itertools to be not only useful, but very powerful. I have a lot more to learn about what they can do. Cheers, and happy coding!