Archive for 'Tools'

How to include MathML in a Wordpress blog

MathML is a W3C recommendation for putting math directly on the web. That is, a MathML-compliant browser should render the math correctly, and you no longer have to generate small GIFs from your LaTeX equations. Here I show you how to configure your Wordpress blog to serve pages with embedded MathML correctly.

Let’s look at the “Hello World!” example from the W3C spec:

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msup>
      <mfenced>
        <mrow>
          <mi>a</mi>
          <mo>+</mo>
          <mi>b</mi>
        </mrow>
      </mfenced>
      <mn>2</mn>
    </msup>
  </mrow>
</math>

Including this code verbatim in your page won’t work, at least not with Firefox. Why? Right-click in any page in your browser (NOT this one), click “View Page Info” and see what it says under “Type:”. Chances are it says “text/html”. Firefox will simply not render MathML in pages that it treats as plain HTML.

But, you will say, Wordpress pages are strict XHTML, aren’t they? I mean, if I look at the source of my pages it says right there, at the top:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
...

Well yes, it says so. But chances are your Apache server serves this page with a Content-Type header that says ‘text/html’. Try it now:

$ wget --save-headers -O page.html http://url.to.your.blog
$ head page.html
HTTP/1.0 200 OK
...
Content-Type: text/html; charset=utf-8
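If wget isn’t handy, the same check can be scripted. This is a minimal Python sketch: `served_content_type` sends a HEAD request with an Accept header like that of a browser asking for XHTML (the default header value is my assumption, not copied from any particular browser), and `content_type_from_saved` reads the header block from a file saved by the wget command above. Some servers answer HEAD slightly differently from GET, so treat the result as a quick indication.

```python
import urllib.request
from typing import Optional

def content_type_from_saved(text: str) -> Optional[str]:
    """Return the Content-Type from the text of a file saved with `wget --save-headers`."""
    for line in text.splitlines()[1:]:  # skip the "HTTP/1.x 200 OK" status line
        if not line.strip():
            break  # a blank line separates the headers from the body
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            return value.strip()
    return None

def served_content_type(url: str,
                        accept: str = "application/xhtml+xml,text/html;q=0.9") -> Optional[str]:
    """Ask the server directly, sending an (assumed) XHTML-preferring Accept header."""
    request = urllib.request.Request(url, method="HEAD", headers={"Accept": accept})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type")
```

For example, `content_type_from_saved(open("page.html").read())` on the file saved above returns the string after `Content-Type:`.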

So the trick to make it work is to insert some additional PHP in the document’s template that will modify the declared Content-Type. See this blog post for the full, gory details. Here is what my template looks like:

<?php
header("Vary: Accept");
if (isset($_SERVER["HTTP_ACCEPT"]) && stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml")) {
	header("Content-Type: application/xhtml+xml; charset=utf-8");
	// Headers must go out before any output, so the XML prolog is echoed only now.
	echo '<?xml version="1.0" encoding="utf-8"?>', "\n";
	echo '<?xml-stylesheet type="text/xsl" href="/mathml/mathml.xsl"?>', "\n"; ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
           "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
	<head>
<?php } else {
	header("Content-Type: text/html; charset=utf-8"); ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
	<head>
<?php } ?>
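The decision this template makes can be modelled in a few lines, which helps when reasoning about what a given client will receive. This is a Python sketch that mirrors the PHP logic, not WordPress code; the Accept strings in the test and the note are examples.

```python
from typing import Dict, Optional

XHTML = "application/xhtml+xml; charset=utf-8"
HTML = "text/html; charset=utf-8"

def negotiate(accept: Optional[str]) -> Dict[str, str]:
    """Return the response headers the template would send for a given Accept header."""
    # Like PHP's stristr(): a case-insensitive substring test, q-values ignored.
    if accept and "application/xhtml+xml" in accept.lower():
        content_type = XHTML
    else:
        content_type = HTML
    # "Vary: Accept" tells caches that the response depends on the Accept header.
    return {"Vary": "Accept", "Content-Type": content_type}
```

Because this is a plain substring test, a client sending `application/xhtml+xml;q=0` (explicitly refusing XHTML) would still get XHTML; honoring q-values would require real Accept parsing. Also note that wget sends `Accept: */*` by default, so it takes the text/html branch and the wget check above keeps reporting text/html even after this change.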

Note also that you need to change the DOCTYPE declaration, or your browser won’t find the special math entities declared by MathML, such as &Sum;. There is also an xml-stylesheet declaration; I’ll return to that in a moment.

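With this in place, we can now insert, say, the definition of the first system matrix for an equivalent nodal network for building simulation (view the page’s source for the MathML markup):

$$
A =
\begin{pmatrix}
-\frac{1}{C_1}\sum_{i=1}^{n} h_{i,1} & \frac{h_{2,1}}{C_1} & \cdots & \frac{h_{m,1}}{C_1} \\
\frac{h_{1,2}}{C_2} & -\frac{1}{C_2}\sum_{i=1}^{n} h_{i,2} & \cdots & \frac{h_{m,2}}{C_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{h_{1,m}}{C_m} & \frac{h_{2,m}}{C_m} & \cdots & -\frac{1}{C_m}\sum_{i=1}^{n} h_{i,m}
\end{pmatrix}
$$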
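To make the structure of A concrete, here is a small Python sketch that builds it entry by entry. The post does not define the symbols, so I assume h_{i,j} is the heat-transfer coefficient (conductance) between nodes i and j, C_j is the thermal capacitance of node j, there are n nodes in total of which the first m form the matrix, and h_{j,j} = 0 so the diagonal sum only collects neighbours. Indices are 0-based in the code.

```python
from typing import List, Sequence

def system_matrix(h: Sequence[Sequence[float]], C: Sequence[float]) -> List[List[float]]:
    """Build the m-by-m matrix A shown above (assumed meaning of h and C, see lead-in).

    h[i][j] stands for h_{i+1,j+1}: n rows (all nodes), at least m columns.
    C[j] stands for C_{j+1}, one capacitance per matrix node.
    Off-diagonal entries: A[j][k] = h[k][j] / C[j].
    Diagonal entries:     A[j][j] = -(1 / C[j]) * sum of h[i][j] over all n nodes.
    """
    m, n = len(C), len(h)
    A = [[0.0] * m for _ in range(m)]
    for j in range(m):          # row j corresponds to node j+1
        for k in range(m):      # column k corresponds to node k+1
            if k == j:
                A[j][j] = -sum(h[i][j] for i in range(n)) / C[j]
            else:
                A[j][k] = h[k][j] / C[j]
    return A
```

With two matrix nodes and one extra boundary node, `system_matrix([[0.0, 1.0], [1.0, 0.0], [3.0, 2.0]], [2.0, 4.0])` yields `[[-2.0, 0.5], [0.25, -0.75]]`.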


The final thing I need to explain is the xml-stylesheet declaration in my template’s preamble. The mathml.xsl stylesheet is provided by the W3C and is, I believe, meant to offer some protection against, or a warning for, browsers that do not support MathML. You should download all the required XSL files from this W3C page and put them on the same server as your blog (some browsers will not execute XSLT that does not come from the same server as the document).

Those, in a nutshell, are the changes I had to make to my Wordpress installation to enable MathML in my blog posts. I’d love to hear your comments on this.

Wordpress shortcode for syntax highlighting

There’s a nice feature in Wordpress for including source code in your blog posts, but the Codex is not crystal-clear on how to activate it.

According to this article, for example, all you have to do is to insert a [sourcecode] shortcode tag and anything that goes inside that tag will be automatically formatted.

But when I tried that on some Java code that I recently posted, it did not work. Only after a long search did I realize that to enable this feature you must install either the SyntaxHighlighter or the SyntaxHighlighter Plus plugin. Both provide this shortcode, but SyntaxHighlighter Plus seems more advanced. That’s the one I installed, and now it works perfectly:

<some>
  <xml>
    that is now nicely formatted
    and highlighted!
  </xml>
</some>

Canonical data formats, middleware and GCC

These days I’m working on a middleware application that bridges a company’s ERP and its warehouses. The ERP posts messages in a given XML schema; our application reads these messages, transforms them into the schema understood by the warehouse management system, and uploads them to the warehouse’s FTP server.

We use XSLT to transform messages in one schema to messages in the other. In the example above, one XSL file can handle the whole transformation.

But what happens when you deal with more than one schema on either end? Suppose you have on the ERP side one schema for orders, one schema for defining the product catalogue, and so on. And on the warehouse side you might have more than one schema for different kinds of messages.

Say you end up with N schemata on the input side and M on the output side, and suppose (for the sake of argument) that your application must handle every possible combination. If you use one XSL file per transformation, that’s N×M files. If the customer changes one schema on the input side, or adds one (and we have no control over that), then we must revise or write M files.

The classical solution to this combinatorial explosion is the Canonical Data Model messaging pattern. We have defined a common data format for our middleware application, and we transform all incoming messages to this common format before transforming them into the proper outgoing format.
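Here is a minimal Python sketch of the idea. In our application every transformation is an XSL file; here each one is modelled as a small function instead, and all schema and field names (`erp_order_v1`, `ArticleNo`, `wms_a`, ...) are hypothetical. Every message travels source to canonical to target, so N input adapters plus M output adapters cover all N×M combinations.

```python
from typing import Callable, Dict

Message = Dict[str, str]

# Hypothetical input adapters, one per ERP schema: source message -> canonical form.
TO_CANONICAL: Dict[str, Callable[[Message], Message]] = {
    "erp_order_v1": lambda m: {"sku": m["ArticleNo"], "qty": m["Quantity"]},
    "erp_order_v2": lambda m: {"sku": m["item"], "qty": m["amount"]},
}

# Hypothetical output adapters, one per warehouse schema: canonical form -> target message.
FROM_CANONICAL: Dict[str, Callable[[Message], Message]] = {
    "wms_a": lambda c: {"ProductCode": c["sku"], "Units": c["qty"]},
    "wms_b": lambda c: {"code": c["sku"], "count": c["qty"]},
}

def transform(message: Message, source: str, target: str) -> Message:
    """Route a message through the canonical form instead of a direct source->target mapping."""
    canonical = TO_CANONICAL[source](message)
    return FROM_CANONICAL[target](canonical)

def adapters_needed(n: int, m: int) -> Dict[str, int]:
    """Transformations to maintain with point-to-point mappings vs. a canonical model."""
    return {"point_to_point": n * m, "canonical": n + m}
```

With two input and two output schemas, four adapters already cover the four combinations. Adding a third ERP schema means writing one new input adapter instead of two new point-to-point transformations.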

With this solution, whenever a schema changes or is added we only need to revise ONE XSL file. A neat and innovative solution, right? I thought so too, until I listened to this interview about GCC internals.

GCC can compile C, C++, Fortran, Ada, Java (and probably lots more languages) for an amazing number of platforms. How does it avoid the combinatorial explosion when a language changes, or when the definition of a platform changes?

Simple. It uses a canonical data format. More specifically, GCC’s frontend compiles the source code into an intermediate language-neutral and platform-neutral representation called GIMPLE. This representation is then translated by GCC’s backend into platform-specific code. If a language is modified, only the frontend must be revised. If a platform changes, only the backend must be revised.

The GCC folks (and probably many others) had been applying the Canonical Data Model idea for decades before the pattern was recognized as such. And I thought we were being so clever…

Remotely editing files as root with Emacs

I often need to edit files on remote machines or on embedded devices, that is, machines without a monitor and on which a proper editor might not necessarily be installed.

In the past that has always left me with the rather painful choice between vi and nano. I have never invested enough time in learning vi beyond the most basic editing commands, and nano is okay for small edits but hopeless for larger ones.

So I was delighted to learn that you can edit files remotely through ssh with Emacs. If you want to remotely edit aFile on host aHost, open the following file:

/aHost:/path/to/aFile

The built-in Tramp package will take care of the rest. You can even use Dired remotely with this mechanism, an extremely powerful feature.

But what was missing for me was a painless way of editing remote files as root. The Tramp version included in Emacs 22.2.1 is 2.0.57, with which I was unable to edit remote files as root. The latest version of Tramp, 2.1.14, is in my humble opinion far easier to work with.

To install it, just follow the instructions. I created a directory ~/emacs into which I unzipped the Tramp distribution. I compiled it in place and did not bother installing it system-wide, since I am the only user of my system.

Then I added the following to my .emacs file:

;; Load most recent version of Tramp for proxy support
(add-to-list 'load-path "~/emacs/tramp/lisp/")
(require 'tramp)
(add-to-list 'Info-default-directory-list "~/emacs/tramp/info/")

With this in place, suppose you want to edit files on aHost as root. The best approach is to add the following to your .emacs:

;; Setting for working with remote host
(add-to-list 'tramp-default-proxies-alist
'("aHost.*" "root" "/ssh:yourusername@%h:"))

Now editing remote files on aHost is easy, just open the following:

/sudo:aHost:/path/to/aFile

And that’s about it.

Schema validation with LXML on Ubuntu Hardy

LXML is an amazing Python module that picks up where the standard xml.dom(.minidom) left off.

It’s essentially a wrapper around the libxml2 and libxslt libraries, and provides functionality missing from Python’s standard library, including XML validation and XPath.

On a project I’m currently working on I needed a good XML library for Python and ended up trying out lxml. But I simply could not get schema validation to work, and after several wasted hours I realized that Ubuntu Hardy (the distro I’m using) ships the relatively old 1.3.6 version of the python-lxml package.

I’m usually very reluctant to install anything as root that does not come from the “official” repository, but for lxml I made an exception and installed the python-lxml package from the upcoming Intrepid distribution.

Add the following line to your /etc/apt/sources.list file:

deb http://ch.archive.ubuntu.com/ubuntu intrepid main

Then run Synaptic as usual and install python-lxml version 2.1.1. To verify that it works fine, you can test schema validation thus:

>>> from lxml import etree
>>> schema_tree = etree.parse('path_to_schema.xsd')
>>> schema = etree.XMLSchema(schema_tree)
>>> doc = etree.parse('path_to_some_document')
>>> schema.validate(doc)

The last call returns the result of the validation as a boolean (True or False).
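A boolean alone doesn’t tell you why a document fails. This self-contained variant inlines a schema and reads back lxml’s error log after validation. The `order`/`quantity` schema is a made-up example; `etree.XMLSchema`, `validate()` and `error_log` are lxml’s own API.

```python
from lxml import etree

# Hypothetical schema: an <order> element holding one positive <quantity>.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="quantity" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

SCHEMA = etree.XMLSchema(etree.XML(XSD))

def check(xml_text):
    """Validate a document given as a string; return (is_valid, list of error strings)."""
    doc = etree.XML(xml_text)
    is_valid = SCHEMA.validate(doc)
    # error_log holds the messages produced by the most recent validate() call.
    errors = [f"line {e.line}: {e.message}" for e in SCHEMA.error_log]
    return is_valid, errors
```

`check("<order><quantity>-1</quantity></order>")` returns False together with a message explaining that -1 is not a valid positiveInteger.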

Low energy, low cost linux box

There’s a great discussion going on at StackOverflow, where someone asked for suggestions for a low-power, low-cost, high-availability Linux box: exactly the kind of hardware we need for home automation.

Java GNU Scientific Library 0.2 released

I have released version 0.2 of the Java GNU Scientific Library (JGSL) project, its second public alpha release. Please visit the JGSL project website for more information.

This second release provides additional wrapper classes for the GSL stats module (mean, variance, standard deviation, etc.). Feel free to try it out and get back to me for questions/comments.

Monitoring a home automation PC

Lesson learned today: always monitor a machine you intend to let run without interruptions for a long time. And that includes home automation hardware.

I have described elsewhere the steps to install Debian on an embedded PC. I’m still working on this project and intend to install the open-source MisterHouse software on it soon. But first I wanted to get a feel for how the machine’s resources (mainly disk and memory) evolve over time.

So I looked around for open-source monitoring software. There’s a great comparison of monitoring software (some of it proprietary) on Wikipedia, but my feeling was that it essentially boiled down to Cacti and Zabbix, both of which are variations on the PHP+MySQL+agent theme. I already knew Cacti from a previous project, so this time I installed Zabbix on my Soekris box.

Good thing that I did. As you can see on the graph below, over a period of just 10 days the available disk space shrank by almost 2 GB. This sort of thing almost always happens somewhere under /var, and indeed it was caused by MySQL’s habit of logging every single data-altering statement in binary log files under /var/log/mysql.

Disk space evolution after two weeks

After commenting out the relevant lines in /etc/mysql/my.cnf, the problem went away, though I had to restart the Zabbix server (without loss of data, of course). And I’m sure the reader will notice the irony: MySQL, installed together with Zabbix to watch the system for exactly this kind of problem, was itself the cause of the shrinking disk space. Oh well.
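For a quick check without a full Zabbix/MySQL stack, Python’s standard library can report free space, and a straight-line extrapolation gives a rough time until the disk is full. This is only a sketch: linear growth is an assumption (log files rarely grow that evenly), and the free-space figure in the note is made up for illustration.

```python
import shutil
from typing import Optional

def free_gb(path: str = "/var") -> float:
    """Free space on the filesystem holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9

def days_until_full(free_before_gb: float, free_now_gb: float, days: float) -> Optional[float]:
    """Linearly extrapolate when free space reaches zero; None if it is not shrinking."""
    rate = (free_before_gb - free_now_gb) / days  # GB lost per day
    if rate <= 0:
        return None
    return free_now_gb / rate
```

With the roughly 2 GB lost in 10 days seen above (about 0.2 GB per day), a filesystem with a hypothetical 8 GB left would fill up in about 40 days. Running `free_gb()` from cron and logging the result is enough to notice such a trend early.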

Debian installation on a Soekris embedded PC

Ubiquitous home automation will never become a reality unless cheap embedded PCs are available to be the “brains” of the home. Some time ago I came across a company called Soekris Engineering who make relatively cheap embedded PCs, like the one shown below.

net4801

This little guy packs a 20 GB CompactFlash disk, 128 MB of RAM and a 266 MHz x86 processor. Of course, I managed to hose its operating system and had to reinstall it from scratch. Here are the steps I followed to install Debian from my laptop (running Ubuntu 7.10), connected to the Soekris with a null-modem cable.

Minicom setup

Install the minicom package on the host system; you’re going to need it to communicate with the Soekris box during the installation. Here is what my minicom configuration looks like:

Minicom default configuration

I also suggest you run the following before starting minicom:

export LANG=C

If you don’t you might run into strange error reports from minicom.

The Debian installer talks to the serial line at 9600 baud by default, so I suggest you make this the default in the Soekris comBIOS monitor program as well. Once in the monitor, enter

set ConSpeed=9600

and reboot. Change the setting in minicom too.

DHCP setup

We’re going to start the Debian installer through PXE, so first we need a DHCP server configured to provide the right file. Install the dhcp3-server package on the host system and configure it by appending something like the following to /etc/dhcp3/dhcpd.conf:

subnet 192.168.0.0 netmask 255.255.255.0 {
    option domain-name "visnet.ch";
    option domain-name-servers 192.168.0.1;
    option routers 192.168.0.1;
    # Pool of exactly one address (192.168.0.4).
    range 192.168.0.4 192.168.0.4;
    option host-name "misterhouse";
    # Address of the TFTP server (here, the host system).
    next-server 192.168.0.69;
    option root-path "/var/lib/tftpboot";
    filename "/pxelinux.0";
}

Start the DHCP server.

TFTP setup

Install the tftpd-hpa package on the host system. Start the server with:

in.tftpd -l -s /var/lib/tftpboot/
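Before booting the Soekris, you can check that the TFTP server really hands out pxelinux.0. This is a minimal TFTP read request as described in RFC 1350, using only Python’s socket and struct modules. It fetches just the first 512-byte block and never acknowledges it, so the server will retransmit and eventually give up, which is fine for a smoke test. The address 192.168.0.69 comes from the next-server line of the DHCP configuration above; adjust it to your host.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5  # TFTP opcodes (RFC 1350)

def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: 2-byte opcode, filename, NUL byte, transfer mode, NUL byte."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")

def fetch_first_block(server: str, filename: str, port: int = 69,
                      timeout: float = 3.0) -> bytes:
    """Send a read request and return the payload of DATA block 1 (at most 512 bytes)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, port))
        # The server replies from a fresh port (its transfer ID); recvfrom accepts it.
        packet, _ = sock.recvfrom(4 + 512)
    opcode, number = struct.unpack("!HH", packet[:4])
    if opcode == OP_ERROR:
        message = packet[4:].rstrip(b"\0").decode("ascii", "replace")
        raise RuntimeError(f"TFTP error {number}: {message}")
    if opcode != OP_DATA or number != 1:
        raise RuntimeError(f"unexpected TFTP reply: opcode={opcode}, block={number}")
    return packet[4:]
```

`fetch_first_block("192.168.0.69", "pxelinux.0")` should return a full 512-byte block if the file is being served; a timeout or a TFTP error points to a problem with the server or its root directory.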

Download Debian installer

Download the most recent Debian installer from the Debian download site. Untar it to /var/lib/tftpboot. Change the pxelinux.cfg symlink to point to the serial-9600 configuration:

drwxr-xr-x 3 root root 4096 2008-02-28 01:28 debian-installer
lrwxrwxrwx 1 root root 32 2008-05-09 21:45 pxelinux.0 -> debian-installer/i386/pxelinux.0
lrwxrwxrwx 1 root root 46 2008-05-09 22:00 pxelinux.cfg -> debian-installer/i386/pxelinux.cfg.serial-9600

Boot the Soekris

You’re now ready to boot the Soekris box. Enter the comBIOS monitor program and type:

boot f0

Then go through the installation process. Since the installer will download packages from mirror locations, you should turn off the DHCP server on the host system and let the Debian installer configure itself with your LAN’s DHCP server.

I called my system misterhouse.visnet.ch, since I intend to use it to run the open-source MisterHouse home automation program.

At some point you will be asked how you want to partition the 20 GB disk. If you keep the default settings, you run the risk of not being able to boot because of an “Error 18” from GRUB (“Selected cylinder exceeds maximum supported by BIOS”). See here for an explanation. That’s why you should make the first partition a small (about 100 MB) bootable partition mounted at /boot, which keeps the kernel within the part of the disk the BIOS can reach. The rest of the disk can be split between a 400 MB swap partition and a root partition taking up the remainder.

Your setup should look something like this:

Debian installer partitions setup

When prompted, say yes to install GRUB on the master boot record.

You will also be asked which types of software to install. For a standalone, headless mini-server like this one, I chose the following:

Debian software types

Conclusion

That’s pretty much it. When you’re done, the installer will prompt you to reboot the system, which will then boot into Debian. Log in with the username you provided during setup. I strongly recommend you install an SSH server, or you won’t be able to log in other than through the serial console:

aptitude install openssh-server

That should be it. Enjoy your new home automation central system.

Java GNU Scientific Library 0.1.0 released

I have released version 0.1.0 of the Java GNU Scientific Library (JGSL) project, its first public release. You can download it from here.

This first release provides Java wrapper classes for the Special Functions module of the GNU Scientific Library (log, exp, Airy, Bessel, etc.). I’ve run some preliminary tests on the log function that suggest the JGSL version is about 10% faster than Java’s built-in function.