Get your JUnit XML reports (e.g. from Jasmine) in readable HTML

Whether you do Test Driven Development or just write your tests last, hopefully you have a good unit testing suite covering your code. It is very likely that you end up with unit test results in the JUnit XML format. Here is a short snippet on how to convert your XML reports into readable HTML.

In my current project we have Gradle as a build tool, and since it is easy to use Ant from there, we will use the nice JUnitReport task. The main issue was getting the classpath right, and the solution to that was to redefine the Ant task so that the right path is passed along.

In addition, if you are using Jasmine (e.g. under PhantomJS) which is currently still waiting for HTML reporting and you are using the JUnitXmlReporter, you end up with consolidated testsuites where several testsuite entries will be combined. Here the solution is to explicitly tell the reporter to omit that behaviour. 
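To illustrate what such a conversion involves, independent of the Gradle and Ant setup, here is a minimal Python sketch. It is a stand-in, not the JUnitReport task: the function name and the sample report are made up for illustration, and it assumes the usual `testsuite`/`testcase` layout with `failure`, `error` and `skipped` child elements. Because it walks all `testsuite` elements, it also copes with suites nested inside other suites.

```python
import xml.etree.ElementTree as ET
from html import escape


def junit_xml_to_html(xml_text):
    """Render a JUnit XML report as a simple HTML table (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() finds <testsuite> elements at any depth and includes the
    # root element itself if it is a <testsuite>.
    for suite in root.iter("testsuite"):
        # Only direct children, so nested suites are not counted twice.
        for case in suite.findall("testcase"):
            if case.find("failure") is not None or case.find("error") is not None:
                status = "failed"
            elif case.find("skipped") is not None:
                status = "skipped"
            else:
                status = "passed"
            rows.append("<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
                escape(suite.get("name", "")),
                escape(case.get("name", "")),
                status))
    return ("<table>\n<tr><th>Suite</th><th>Test</th><th>Status</th></tr>\n"
            + "\n".join(rows) + "\n</table>")


# Made-up sample report, for illustration only
report = """<testsuites>
  <testsuite name="Calculator">
    <testcase name="adds"/>
    <testcase name="divides"><failure message="boom"/></testcase>
  </testsuite>
</testsuites>"""
print(junit_xml_to_html(report))
```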

Without further ado here is the Gist:

Continuous Integration for your jQuery plugins

TL;DR If you have tests for Javascript code written in QUnit & Jasmine that depend on the Document Object Model (DOM), here is a way to set up Travis CI using PhantomJS.

My colleagues recently made me aware of a relatively new continuous integration software called Travis CI which, originally built to serve the Ruby community, is a distributed build service able to run code in various languages, including Python & Javascript. As far as I know, it currently only works together with Github, so your code needs to be hosted there.

As Travis' workers (the ones running the actual build) come with node.js included, I played around a bit getting my QUnit tests to run with jsdom and the QUnit node adaptation. While there are some guides out there on how to test your Javascript with node.js, it gets complicated when depending on the DOM, which most likely is the case when you are developing a plugin for jQuery. However, after reading criticism on testing against something that the original system didn't target (or are you running jQuery on the server side?) I gave up on that pretty quickly.

Now, in a different context I came upon PhantomJS, a headless browser stack based on Qt's Webkit. It provides an executable that can work without a graphical system (hence the name headless) and is perfectly suited for testing environments. Ariya, the guy behind PhantomJS, is clearly aware of that and already provides the proper integration for running tests based on QUnit and Jasmine. The test runner is a neat piece of code that just scrapes the QUnit output from the generated HTML. Installing that locally was easy, and running the test suite provides a short output on how many tests were run and how many failed, if any.

The problem was getting PhantomJS running on Travis CI. Travis CI comes with a good set of software (and already includes some of PhantomJS' dependencies), but so far no one has written a cookbook for PhantomJS. However, this guy came up with an easy solution: after all, the worker is just a (virtual) Ubuntu machine and you can install anything on it.

So here is the quick run through: In the .travis.yml which describes the build, we

  • run a small install script setting up the remaining dependency of PhantomJS and PhantomJS itself,
  • start up a virtual framebuffer (xvfb, "headless" is not completely true when on Linux) running on display :99
  • and finally run PhantomJS with the QUnit (alternatively Jasmine) test runner on our test suite.

Here is the full .travis.yml file:



rvm:
  - 1.9.3
before_script:
  - "sudo bash install_phantomjs > /dev/null"
  - sh -e /etc/init.d/xvfb start
script:
  - DISPLAY=:99.0 phantomjs run-qunit.js test/index.html


The first line indicates that we want Ruby version 1.9.3, even though we don't need it. I believe we have to choose some target system, so there it goes.

Here is the install_phantomjs script:



#!/bin/bash
apt-get install libqtwebkit-dev -y
git clone git://github.com/ariya/phantomjs.git
cd phantomjs
qmake-qt4
make
cp bin/phantomjs /usr/local/bin/


We are ready to test this on Travis. If you haven't registered there yet, get an account, set up the hook by visiting your profile page, and commit your own .travis.yml together with the PhantomJS install script and the relevant test runner described above. You should pretty quickly find your project in the build queue on travis-ci.org.

Happy testing!

A simple and light-weight wiki solution for Django

I have been running a wiki with structured data for some months now. It's called CharacterDB and runs on MediaWiki with the SemanticMediaWiki framework. While I was happy to employ one of the best wikis out there (after all Wikipedia runs on MediaWiki) and I have good contacts to some of the guys developing SemanticMediaWiki, I do see some limitations for the task at hand. It makes me want to get rid of the current stack. The two major issues with the solution so far are scalability (> 60,000 entries, the actual count of database entries is far higher due to the RDF triple approach) and the difficulties I have with adjusting the configuration so that the input forms work the way I want.

The natural solution for me was to look into a solution with Django. And more importantly using existing components. Looking into wiki apps for Django I found quite a few candidates - there is a fine comparison available under http://djangopackages.com/grids/g/wikis/. However I didn't like what I saw. What I need is a simple component that would make my models editable by anybody, providing full wiki features. Some of the existing apps implement a full standalone wiki, including authentication, own markup parsers, ...

So the idea of django-wikify was born. I already started coding an initial version that provides simple wiki features. Have a look if you want under https://github.com/cburgmer/django-wikify .

The feature set includes:

  • Edit view for your model
  • Versioned view of instance
  • Paginated instance versions

There are some other goals I want to reach:

  • lightweight code
  • simple integration
  • easy to adapt (esp. templates)
  • get straight to the point, if it's not a wiki feature, it shouldn't be in

django-wikify wouldn't exist without django-reversion, a neat app that adds versioning to your django models. It has seen many improvements lately and is just the right component to build a wiki on. All I needed to build on top was basically view logic.

Wiki markup can easily be integrated using django's native markdown integration. No need to develop any additional code. What is missing on my list is support for subscriptions. That, however, I consider an orthogonal feature.

The current code already provides the minimal working set. To set up a wiki yourself you first need to define a model (after all django-wikify is about giving you the flexibility of your own page model). Here is a simple page model, with a title and content:

from django.db import models
import reversion


class Page(models.Model):
    """Simple Wiki page model"""
    title = models.CharField(max_length=255, primary_key=True)
    content = models.TextField(blank=True)

    def __unicode__(self):
        return self.title


reversion.register(Page)

The example view is very simple. We just want to show the instance and in case it doesn't exist yet, we provide a page that allows the user to create it.

from django.shortcuts import render_to_response
from django.template import RequestContext
from wikify import wikify
from mywiki.models import Page

@wikify(Page)
def page(request, object_id):
    try:
        page = Page.objects.get(pk=object_id)
    except Page.DoesNotExist:
        page = None
    return render_to_response('page.html',
                              {'object_id': object_id,
                               'page': page},
                              context_instance=RequestContext(request))

What you see here is a simple way to use wikify. Just decorate the view and pass the model with it. The only thing you need to take care of is to pass the object's primary key as 'object_id', similar to Django's default views. The project code includes an example django project as a short demo using the code shown here.

The way the @wikify decorator works, you do not need to change your urls.py definition. The action triggered by the user is passed through a GET variable called 'action'. In case you want to provide your own template, just link to the url '?action=edit' and you are done.
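To make the idea of dispatching on a GET variable concrete, here is a minimal, framework-free sketch. It is not django-wikify's actual implementation: `dispatch_on_action`, `FakeRequest` and the `edit` handler are hypothetical names, and the request object is only a stand-in that carries a `GET` dictionary the way Django's request does.

```python
import functools


def dispatch_on_action(handlers):
    """Decorator factory: route ?action=... to a handler, else call the view.

    `handlers` maps action names to callables. All names here are
    hypothetical and not part of django-wikify's API.
    """
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, object_id, *args, **kwargs):
            action = request.GET.get("action")
            if action in handlers:
                return handlers[action](request, object_id)
            # No (or unknown) action: fall through to the decorated view.
            return view(request, object_id, *args, **kwargs)
        return wrapper
    return decorator


class FakeRequest(object):
    """Stand-in for Django's HttpRequest, carrying only a GET dict."""
    def __init__(self, **params):
        self.GET = params


def edit(request, object_id):
    return "edit form for %s" % object_id


@dispatch_on_action({"edit": edit})
def page(request, object_id):
    return "showing %s" % object_id


print(page(FakeRequest(), "Home"))               # showing Home
print(page(FakeRequest(action="edit"), "Home"))  # edit form for Home
```

Because the decision is made inside the wrapper, the URL pattern only ever needs to point at the one decorated view, which is why urls.py can stay untouched.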

Next features on my list are support for a diff view based on my side-by-side diff implementation, and then improving performance through ETag, cache, ... After that, better late than never, unit tests.

Making Deniz a single-file-app

Following up on the previous post, Deniz, the RDF browser written in HTML, Javascript & CSS, can now be distributed as one single file.

This is possible due to


The last step missing was the image embedding part which is nicely solved through https://github.com/nzakas/cssembed. In addition Deniz will now go through the Google Closure Javascript compiler and Yahoo's YUI Compressor for CSS to save bandwidth.

Thanks to the Makefile by Benjamin Lupton (https://github.com/balupton/jquery-sparkle/blob/master/Makefile) it was easy to set the process up for Deniz.

Two steps will build the file:

 
$ make build-update 

to download JAR dependencies, and

 
$ make 

to finally minimize and integrate all contents.

That's it.

Embedding external CSS & Javascript into the base HTML document

So I'm stuck on the train for some hours, why not solve a problem that is far from pressing?

I am developing a web application based only on HTML, CSS & Javascript, called Deniz (http://cburgmer.github.com/deniz/). It's a browser for RDF data and only needs a browser to run in, as it will connect to public data endpoints. So while it is built up from many different sources, it would be nice if the whole application could be delivered in a single file. While this could speed up loading, the main idea here is to distribute just one HTML file.

Looking around there are many services and libraries for compressing and aggregating CSS & JS files, but so far I haven't found a solution specifically for what I try to achieve.

I've now come up with an implementation which parses the DOM tree and looks for elements with references to stylesheets and <script> tags referencing external Javascript code. The program will read in the contents of the referenced files and paste them into the document. This is harder than it initially seems: XHTML, which I assume here, needs to have such data wrapped in a CDATA section. I had to fight with the Python lxml library for some time to get this straight:

  1. The parser needs to be passed "strip_cdata=False" so that CDATA blocks that are read in are preserved.
  2. Code needs to be wrapped in an instance of the CDATA class.
  3. A dirty hack to quote the encapsulated CDATA blocks in multi-line comments to accommodate older browsers:

        html.replace('<![CDATA[', '/*<![CDATA[*/').replace(']]>', '/*]]>*/')
    

  4. While a proper solution would need to parse CSS & Javascript code to quote invalid HTML entities, another dirty hack makes sure that the text '</script>'
    in Javascript strings gets quoted:
            content = (content.replace('</script>"', '</scr" + "ipt>"')
                              .replace("</script>'", "</scr' + 'ipt>'"))
    

Warning: This script is not suited to parse arbitrary JS & CSS. It does, though, work for my task.
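The two string-level hacks from the list above can be tried in isolation. The following sketch wraps the same `replace` calls in two functions; the function names and the sample Javascript line are made up for illustration, and the lxml part of the script is left out.

```python
def protect_script_end(content):
    """Split '</script>' at the end of JS string literals so the HTML
    parser does not end the surrounding <script> element early."""
    return (content.replace('</script>"', '</scr" + "ipt>"')
                   .replace("</script>'", "</scr' + 'ipt>'"))


def comment_out_cdata(html):
    """Hide CDATA markers inside /* ... */ comments for older browsers."""
    return (html.replace('<![CDATA[', '/*<![CDATA[*/')
                .replace(']]>', '/*]]>*/'))


# Made-up Javascript that contains a closing script tag in a string
js = 'document.write("<script src=\'x.js\'></script>");'
safe_js = protect_script_end(js)
print(safe_js)
print(comment_out_cdata('<script><![CDATA[' + safe_js + ']]></script>'))
```

As the warning says, this only catches '</script>' directly before a closing quote; anything else would need a real Javascript parser.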

The source can be found here: http://github.com/cburgmer/deniz/blob/master/embed_media.py

The next step will be to include images as base64 urls.
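For reference, a base64 data URL is just the MIME type plus the encoded bytes. The sketch below uses only the standard library; `to_data_uri` and `embed_image` are hypothetical helpers (not part of embed_media.py), and the six bytes standing in for an image are only the GIF header, not a valid picture.

```python
import base64
import mimetypes


def to_data_uri(data, filename):
    """Build a base64 data: URI for `data` (bytes), guessing the MIME
    type from `filename`."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime, encoded)


def embed_image(css, filename, data):
    """Replace url(filename) in `css` with an inline data: URI."""
    return css.replace("url(%s)" % filename,
                       "url(%s)" % to_data_uri(data, filename))


css = "body { background: url(dot.gif); }"
print(embed_image(css, "dot.gif", b"GIF89a"))
```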

A side-by-side diff view.

Just a short post about a side-by-side diff algorithm I implemented on top of Google's diff-match-patch library. It is modelled after the one Wikipedia uses.

The snippet is posted under http://code.activestate.com/recipes/577784-line-based-side-by-side-diff/

A screenshot that was generated by this algorithm can be seen on http://jsfiddle.net/hRS9N/1/

Python has a side-by-side diff view implementation in difflib, but it sure isn't up to current web standards ( everywhere ...) and not adaptable at all. After all, why would you limit the whole implementation to only return a chunk of HTML if the user knows best how to render the diff's outcome anyway? Also, diff-match-patch is probably way better performance-wise. As described under http://code.google.com/p/google-diff-match-patch/wiki/LineOrWordDiffs there is a quicker way to solve this problem with a similar outcome. However, changes are then calculated and shown in a pretty coarse way. I'll post about an application of this diff later on.
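To show what "returning data instead of HTML" can look like, here is a rough line-based sketch. It is not the recipe linked above and does not use diff-match-patch: it swaps in `difflib.SequenceMatcher` from the standard library purely to keep the example self-contained, and `side_by_side` is a made-up name. The result is a list of (left, right) rows that the caller can render however they like.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def side_by_side(old_lines, new_lines):
    """Return (left, right) row pairs; None marks an empty cell."""
    rows = []
    matcher = SequenceMatcher(None, old_lines, new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' and 'replace' blocks fill both columns, 'delete' only
        # the left one, 'insert' only the right one.
        rows.extend(zip_longest(old_lines[i1:i2], new_lines[j1:j2]))
    return rows


old = ["a", "b", "c"]
new = ["a", "x", "c", "d"]
for left, right in side_by_side(old, new):
    print("%-5s | %s" % (left or "", right or ""))
```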

Using proper timezone information with SuRF

A short note on how best to deal with date, time and datetime objects in SuRF.

Chances are that you are living in a timezone other than UTC (an imaginary timezone) and want to handle time properly. Coming from SQL systems, people might not be used to having a database store additional timezone information (at least MySQL doesn't). RDF stores like AllegroGraph and Virtuoso, however, follow the XML and more precisely the ISO 8601 standard when storing date & time objects and make your life easier.

At least if you follow this short suggestion here.

As I tried to document in http://code.google.com/p/surfrdf/wiki/BackendPeculiarities, Virtuoso and AllegroGraph handle datetime objects differently. When presented with a timezone-less date, Virtuoso assumes the server's timezone, while AllegroGraph uses UTC ("Z"). You are probably using Python and so you have to deal with this, as Python doesn't use timezones out of the box. RDFLib also ignores timezones for now, which will hopefully change once http://code.google.com/p/rdflib/issues/detail?id=169 is implemented.

If you want to make sure the correct timezone is stored, look at the example below. This code uses pytz to get the UTC timezone and stores datetime.now() as UTC. This will make both Virtuoso & AGraph store the same date. If you don't intend to store values as UTC, look into pytz, which has the brilliant timezone support that Python itself is missing.
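A minimal sketch of the idea, under two assumptions: it uses the standard library's `datetime.timezone` as a stand-in for pytz (with pytz the first call would read `datetime.now(pytz.utc)`), and it leaves out the SuRF part entirely, so it only shows how to obtain timezone-aware values that serialize to unambiguous ISO 8601 strings. The date used is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Timezone-aware "now" in UTC; with pytz: datetime.now(pytz.utc)
now_utc = datetime.now(timezone.utc)
print(now_utc.isoformat())

# A local time with an explicit +02:00 offset, normalized to UTC
# before storing so that every backend sees the same instant.
local = datetime(2011, 7, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
as_utc = local.astimezone(timezone.utc)
print(local.isoformat())   # 2011-07-01T12:00:00+02:00
print(as_utc.isoformat())  # 2011-07-01T10:00:00+00:00
```

Both values describe the same instant; only the second one no longer carries the original offset.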

A small fact on the side: AllegroGraph normalizes all timestamps to UTC, so the original offset gets lost. It did give me some headaches, and it might give you some too.

Python private attribute annoyance

Having some Java history I do like the concept of protected and private attributes for hiding the implementation details. I also like the forgiving way of Python when accessing those attributes as it doesn't do any access checking and does allow access to private attributes e.g. for debugging purposes:

This attribute here

 
class A(object): 
    def __init__(self): 
        self.__a = 1 

can be accessed like this

 
a = A() 
print(a._A__a)

The concept of prepending the class's name to the attribute's name is called "name mangling". This is an easy solution for hiding the private value from the interface.

However, I just tripped over a small issue with name mangling here. Consider the following example, which is a common pattern when calculating resource-hungry values:

 
class A(object): 
    def a(self): 
        if not hasattr(self, '__a'): 
            print("generating a")
            self.__a = 1 
        return self.__a 

Now let's run the method:

 
>>> a = A() 
>>> a.a() 
generating a 
1 
>>> a.a() 
generating a 
1 

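Obviously hasattr doesn't check for private attributes as expected, and the value gets recalculated over and over again. Name mangling is only applied to identifiers in the class's source code, not to the string passed to hasattr, so hasattr looks for an attribute literally named '__a', which never exists. What I should actually do is check for '_A__a', which is kind of counterintuitive here. See also http://bugs.python.org/issue8264.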
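For completeness, here are two ways to write the caching pattern so that it works. The first spells out the mangled name in the hasattr check; the second (class `B` is an extra illustration, not from the example above) avoids the string altogether by catching AttributeError.

```python
class A(object):
    def a(self):
        # Inside the class body, self.__a is compiled to self._A__a, but
        # the string passed to hasattr() is not mangled, so the mangled
        # name has to be spelled out here.
        if not hasattr(self, '_A__a'):
            print("generating a")
            self.__a = 1
        return self.__a


class B(object):
    def b(self):
        # Alternative without the mangled string: try the attribute
        # and compute it on AttributeError.
        try:
            return self.__b
        except AttributeError:
            print("generating b")
            self.__b = 1
            return self.__b


obj = A()
obj.a()  # prints "generating a"
obj.a()  # prints nothing, the cached value is reused
```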

Now that was annoying.

Comparing Django ORM, SQLAlchemy & SuRF

The query interface is the part of your favourite ORM that you see most, I believe. So here's an overview of how three ORMs for Python offer querying: Django ORM, SQLAlchemy and SuRF. The former two are well known for SQL, the latter is a relatively new interface to RDF data (queried foremost by SPARQL).

My goal is to see what methods are offered to query data and to compare those to each other. Here I'll be coming from the Django side, comparing equivalent methods to SQLAlchemy and also to SuRF. Don't get me wrong, SQLAlchemy users probably come from a different angle, but then I don't use it often enough to know the full ORM details. For my use case this suffices so far.

The real goal here is to see what needs to and can yet be done for SuRF. I already started developing Django style complex Q queries and slicing and want to see how SQLAlchemy does it. And also what else I need to take care of.

This list is far from complete, nor, as said, does it present an unbiased view. I might extend this list in the future. Feel free to note errors and other important differences.

Sources

On SQLAlchemy

I personally don't have much experience with the SQLAlchemy ORM, so take the examples for SQLAlchemy with a grain of salt. I'll use the query_property here (see http://www.sqlalchemy.org/docs/05/reference/orm/sessions.html) so that it compares more easily to Django. However I don't know if this way is generally accepted in the SQLAlchemy world.

Here is the way the tutorial of SQLAlchemy puts it:

session.query(User).all()

Using the query_property it boils down to:

class MyClass(object):
    query = db_session.query_property()

MyClass.query.all()

On SuRF

SuRF, being an Object Relational Mapper for RDF data, does many things differently. Most importantly, a property always has a list of values: it might be empty, have one value or several. Also some functionality like aggregates was only specified in SPARQL 1.1 and has yet to be implemented by many backend stores. SuRF knows namespaces and thus a property is referenced by its namespace, here "myns" for property "prop".

The comparison

| Django ORM | SQLAlchemy | SuRF | Description |
|---|---|---|---|
| MyClass.objects.all() | MyClass.query.all() | MyClass.all() | All elements |
| MyClass.objects.filter(prop=10) | MyClass.query.filter(MyClass.prop == 10) or MyClass.query.filter_by(prop=10) | MyClass.get_by(myns_prop=10) | Query by parameter |
| MyClass.objects.get(pk=10) | MyClass.query.get(10) | MyClass.get_by(myns_pk=10).one() | Unique key |
| MyClass.objects.get(prop=10) | MyClass.query.filter(MyClass.prop == 10).one() | MyClass.get_by(myns_prop=10).one() | One exact result |
| ? | MyClass.query.filter(MyClass.prop == 10).first() | MyClass.get_by(myns_prop=10).first() | First result |
| MyClass.objects.filter(prop__gt=10) | MyClass.query.filter(MyClass.prop > 10).all() | MyClass.all().filter(myns_prop="(%s > 10)") | Greater than filtering |
| MyClass.objects.filter(prop__in=[1, 2]) | MyClass.query.filter(MyClass.prop.in_([1, 2])) | MyClass.get_by(myns_prop=[1, 2]) | In list |
| MyClass.objects.filter(prop__startswith='somethin') | MyClass.query.filter(MyClass.prop.like('somethin%')) | MyClass.all().filter(myns_prop='regex(%s,"^somethin","i")') | Substring |
| MyClass.objects.exclude(prop=10) | MyClass.query.filter(MyClass.prop != 10) | - | Negative search |
| MyClass.objects.all().count() | MyClass.query.count() | len(MyClass.all()) | Result count |
| MyClass.objects.all().delete() | MyClass.query.delete() | - | Batch removal |
| MyClass.objects.all().exists() | ? | - | Boolean exist |
| MyClass.objects.all().order_by("-prop") | MyClass.query.order_by(desc(MyClass.prop)).all() | MyClass.all().order(ns.MYNS.prop).desc() | Descending ordering |
| default | default | MyClass.all().full() | Preload properties |
| MyClass.objects.all().select_related('prop') | session.query(MyClass).options(joinedload('prop')).all() | - | Eagerly load relations |
| MyClass.objects.aggregate(Avg('prop')) | session.query(func.avg(MyClass.prop)).all() | - | Aggregates |
| MyClass.objects.all()[1:10] | MyClass.query.all()[1:10] | MyClass.all().offset(1).limit(9) | Slicing |
| MyClass.objects.all().only("prop") | session.query(MyClass.prop).all() | - | Performance |
| MyClass.prop.remove(i) | MyClass.prop.remove(i) | MyClass.myns_prop.remove(i) | Remove from one-to-many relationship |
| MyClass.objects.filter(Q(prop='x') \| Q(prop='y')) | MyClass.query.filter(or_(MyClass.prop == 'x', MyClass.prop == 'y')) | - | Complex expression |