I've been researching content management systems, and in the process was reminded of some of the headaches I've seen in the past when dealing with content on a site. Having just created a test to ensure that my JavaScript code was well formed, I started to wonder whether it might be possible to validate all of my content: JavaScript, CSS, and HTML. The answer turns out to be "yes," and thanks to some good work done by others, the process was a lot less painful than I expected.
My goals were that as part of my automated tests, I would be able to ensure that:
- The site JavaScript contained no obvious errors
- The site CSS was correctly formed and valid
- Pages served from the site had valid XHTML
In addition, to avoid some misery I had encountered in earlier projects, I wanted to check a few other things:
- All link (a) tags had valid href attributes
- All img tags had valid src attributes
- All url() attributes in my CSS were valid
I do use the Firefox extensions Firebug and HTMLTidy, but those only operate on the pages loaded in the browser, so they aren't really a substitute for automated validation.
Let's take these in order ...
Javascript
For how the JavaScript checking is done, see my other posting (http://johnwedgwood.blogspot.com/2007/08/javascript-lint-and-assetpackager.html) about using a JavaScript lint program to automatically check JavaScript code for the most obvious errors.
CSS
I really wanted to use a command line tool for this, but could not find one that validated CSS (there is a set of interesting tools for compacting CSS, though). The solution everyone refers to is the W3C online validation tool (http://jigsaw.w3.org/css-validator/). It turns out that there is already a terrific plugin for doing exactly this job: assert_valid_asset (http://www.realityforge.org/articles/2006/03/15/rails-plugin-to-validate-x-html-and-css). It even has a method, assert_valid_css_files, which is almost perfect. It creates a test method for each CSS file, and each test method submits the file contents to the W3C site to be checked.
Unfortunately for me, I have CSS files located in sub-directories, and this method uses the file name as part of the test method name, which doesn't work so well when the file name has a "/" in it. So I took the idea and rolled my own version in test/unit/css_validation_test.rb. It creates a test method for each CSS file, but munges the file name so that it can handle CSS files located in sub-folders.
And of course my new test didn't work at first: it failed to catch the errors that I'd planted to test it. After hunting a bit, I noticed that the check for errors in the W3C response in assert_valid_asset doesn't appear to work (for me at least). Line 128 of assert_valid_asset.rb was looking for a set of nested div elements containing the errors. In my testing, the errors were actually coming back in a table row marked with the error class, so I changed that line to match the response that I was seeing:
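A minimal sketch of that kind of name munging, under stated assumptions: the module, class, and method names below are hypothetical and are not taken from the plugin or from the actual test file, and the checker is a placeholder for whatever submits the file to the validator.

```ruby
# Hypothetical helper, not the plugin's code: builds a safe test method
# name from a CSS path that may include sub-directories.
module CssTestNaming
  module_function

  def method_name_for(path)
    # Drop the extension, then collapse each run of non-alphanumeric
    # characters (including "/") into a single underscore.
    "test_css_" + path.sub(/\.css\z/, "").gsub(/[^A-Za-z0-9]+/, "_")
  end
end

# Hypothetical stand-in for a test case class.
class CssValidationSketch
  # Defines one instance method per CSS file. `checker` is any callable
  # that receives the path (a real test would submit the file for validation).
  def self.define_tests_for(paths, checker)
    paths.each do |path|
      define_method(CssTestNaming.method_name_for(path)) { checker.call(path) }
    end
  end
end
```

One caveat with this approach: two paths that differ only in punctuation (say "a/b.css" and "a_b.css") would map to the same method name, so a real version might want to detect collisions.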
REXML::XPath.each( REXML::Document.new(response.body).root, "//tr[@class='error']") do |element|
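To illustrate what that XPath expression selects, here is a small standalone sketch. The sample markup is invented for the example and only assumes that errors arrive as table rows with class "error"; the real validator response is more elaborate, and the validator_errors helper is hypothetical.

```ruby
require "rexml/document"

# Invented sample response; only the tr class="error" shape matters here.
SAMPLE_VALIDATOR_RESPONSE = <<~XHTML
  <html>
    <body>
      <table>
        <tr class="error"><td>12</td><td>Unknown property colr</td></tr>
        <tr class="ok"><td>13</td><td>fine</td></tr>
      </table>
    </body>
  </html>
XHTML

# Hypothetical helper: returns one "line: message" string per error row.
def validator_errors(body)
  errors = []
  root = REXML::Document.new(body).root
  REXML::XPath.each(root, "//tr[@class='error']") do |element|
    cells = REXML::XPath.match(element, "td").map { |td| td.text.to_s.strip }
    errors << cells.join(": ")
  end
  errors
end
```

A test could then fail whenever validator_errors(response.body) is non-empty and print the collected messages.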
With that done, the code did work for me. All of the errors that I tossed at it were caught, and my CSS was being validated automatically. Here are the contents of test/unit/css_validation_test.rb:
#
Note that there is a set of exceptions listed (files we don't want to check). I moved all my invalid CSS into the file invalid.css and marked this file as one not to be validated. The CSS in there is all related to setting opacity filters for dialog boxes, which browsers apparently support but which is not valid CSS. If you have a set of CSS files from another source that you can't really change, the exception list might be the right place to put them if they are generating errors.
There is also a call to assert_url_valid in there, which I'll cover separately under link checking.
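Before url() references can be checked, they have to be pulled out of the CSS. The sketch below shows one way to do that with a regular expression. It is an assumption about how the extraction could work, not the code from the test file, and the CssUrls module name is hypothetical.

```ruby
# Hypothetical helper: collects the targets of url(...) references in CSS.
module CssUrls
  module_function

  # Returns every url() target in the given CSS text, with optional
  # surrounding quotes and whitespace removed.
  def extract(css)
    css.scan(/url\(\s*(['"]?)(.*?)\1\s*\)/i).map { |_quote, target| target }
  end
end

css = <<~CSS
  body  { background: url("/images/bg.png"); }
  .icon { background-image: url(/images/icon.gif); }
  .logo { background: url( '../images/logo.png' ); }
CSS

targets = CssUrls.extract(css)
# targets now holds the three image paths, in document order
```

Each extracted target could then be handed to a check such as assert_url_valid, after resolving relative paths against the CSS file's own directory.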
HTML
I wanted to be somewhat aggressive, validating every single response from my code during testing. There is a great slideshow and a small body of test helper code at http://blog.spotstory.com/category/plugin/. The test helpers expect that you will have the assert_valid_markup plugin installed, but they turn out to work just fine with the assert_valid_asset plugin as well. What this code does is override the normal "get" and "post" methods used during testing so that the results of those calls are validated, covering both HTML and RSS responses. It uses the assert_valid_markup method defined in the assert_valid_asset plugin to validate the HTML.
Unfortunately the code in the assert_valid_asset package that checks HTML sends it off to the W3C, just as it does for CSS, and for some reason I'm less comfortable about sending my HTML off to the W3C. What I wanted instead was to validate my HTML locally, and to be able to run that check on every response during testing. To do this I hunted down rails-tidy (a plugin), ruby-tidy (a gem), and the tidy library (a native library).
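The general technique of wrapping "get" so that every response is validated can be sketched without Rails. Everything below is a stand-in: FakeResponse, FakeTestCase, and ValidateEveryResponse are hypothetical names, the assert_valid_markup here only records what it was given, and the sketch uses Module#prepend where the original helper code may well use a different mechanism.

```ruby
# Stand-in for a response object; Rails supplies the real one.
FakeResponse = Struct.new(:body, :content_type)

# Stand-in for a functional test case.
class FakeTestCase
  attr_reader :validated

  def initialize
    @validated = []
  end

  # Stand-in for the Rails `get` test helper.
  def get(action)
    FakeResponse.new("<html><body>#{action}</body></html>", "text/html")
  end

  # Stand-in for the plugin's assertion; it only records the body it saw.
  def assert_valid_markup(body)
    @validated << body
  end
end

# Hypothetical wrapper: validates every HTML response that `get` returns.
module ValidateEveryResponse
  def get(*args)
    response = super
    assert_valid_markup(response.body) if response.content_type == "text/html"
    response
  end
end

FakeTestCase.prepend(ValidateEveryResponse)
```

The appeal of this design is that existing tests need no changes: any test that already calls get picks up validation for free.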
rails-tidy: http://blog.cosinux.org/pages/rails-tidy
ruby-tidy: http://rubyforge.org/projects/tidy
tidy library: http://tidy.sourceforge.net/
I'm running on Windows, so I downloaded the tidy DLL and put it somewhere that I could find it, and then configured rails-tidy with the path to the tidy library (see the readme file). With these in place, you can run tidy on HTML files from the command line if you wish. You can also run a rake command to validate each of your views. In addition, you get the nice assert_tidy call, which you can use to validate HTML in your tests. I used assert_tidy to replace the behavior that was sending my HTML off to the W3C, by redefining the assert_html_valid method used by the test helper code:
class Test::Unit::TestCase
At this point, I was feeling pretty great. I actually found a bunch of errors (including a few in my HTML-generating helpers), which reminded me that there was real value in this. When I was done, all the pages served as part of my functional tests were valid, my CSS was validating correctly, and the JavaScript was lint free. There was only one more thing I wanted to do.
Links and Images
I wanted to make sure that my links were valid, and that my images were valid. Normally these are a pain to check, since they require checking by hand, or at least loading the page in the browser and looking for errors. On a content-rich site, that quickly becomes impractical.
There was one thing working in my favor for images: I won't have images linking to external sites (so no fully qualified URLs for images, ever). This means that for images, an external link is an error. This accomplishes two things: (1) it simplifies my checking, and (2) I don't have to worry about fully qualified URLs producing a "mixed content" warning in IE when the image URL includes "http" but the page is being accessed via https.
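The "no external images" rule is easy to express as a predicate. This is a minimal sketch using Ruby's standard URI library; the ImageSrcCheck module and its local? method are hypothetical names, not part of the helper file.

```ruby
require "uri"

# Hypothetical helper: decides whether an img src stays on this site.
module ImageSrcCheck
  module_function

  # True when src has neither a scheme nor a host, so it cannot
  # point at another site. Unparseable values count as invalid.
  def local?(src)
    uri = URI.parse(src)
    uri.scheme.nil? && uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end
```

Note that protocol-relative references such as "//cdn.example.com/x.png" carry a host, so this check treats them as external too.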
There are existing (built-in) rails extensions to Test::Unit::TestCase for finding specific elements in an HTML tree. My code to check links was based on these, and was all placed into a test_asset_validation_helper.rb file. Note that I moved assert_html_valid from test_helper.rb and extended it to provide link checking. The assert_url_valid method is the same one that is called by the CSS checking code.
To summarize the tests for valid links, a link is valid if any one of these is true:
- There is a file in public which matches the link
- It is recognized by the routing engine, and the route refers to a view
- It is recognized by the routing engine, and the action is a method that the controller responds to
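The three rules above can be sketched as a single predicate. To keep the example runnable outside Rails, the routing engine and the controller lookup are replaced by plain Ruby objects: the recognize callable, the controllers hash, and the LinkCheck module are all assumptions made for illustration, not the API used in the helper file.

```ruby
# Hypothetical helper expressing the three link rules in plain Ruby.
module LinkCheck
  module_function

  # public_dir:  path to the site's public folder
  # recognize:   callable standing in for the routing engine; returns
  #              { controller: "...", action: "..." } or nil
  # controllers: hash of controller name => { actions: [...], views: [...] }
  def valid?(url, public_dir:, recognize:, controllers:)
    # Ignore any query string or fragment when matching.
    path = url.split(/[?#]/, 2).first.to_s

    # Rule 1: a file in public matches the link.
    return true if File.file?(File.join(public_dir, path))

    route = recognize.call(path)
    return false unless route

    info = controllers[route[:controller]]
    return false unless info

    # Rule 2: the route refers to a view.
    # Rule 3: the action is a method the controller responds to.
    info[:views].include?(route[:action]) || info[:actions].include?(route[:action])
  end
end
```

In a Rails test the two stand-ins would be backed by the real routing engine and controller classes; the order of the checks mirrors the list above.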
From the file test_asset_validation_helper.rb:
class Test::Unit::TestCase