The other day we had a meeting at work with a former colleague (now at QMUL) to discuss general project progress. The topics covered included the somewhat complicated workflow that we're using for doing optical music recognition (OMR) on early printed music sources. It includes mensural notation specific OMR software called Aruspix. Aruspix itself is fairly accurate in its output, but the reason why our workflow is non-trivial is that the sources we're working with are partbooks; that is, each part (or voice) of a multi-part texture is written on its own part of the page, or even on a different page. This is very different to modern score notation in which each part is written in vertical alignment. In these sources, we don't even know where separate pieces begin and end, and they can actually begin in the middle of a line. The aim is to go from the double page scans ("openings") to distinct pieces with their complete and correctly aligned parts.

Anyway, our colleague from QMUL was very interested in this little part of the project and suggested that we spend the afternoon, after the style of good software engineering, formalising the workflow. So that's what we did. During the course of the conversation diagrams were drawn on the whiteboard. However (and this was really the point of this post) I made notes in Haskell. It occurred to me a few minutes into the conversation that laying out some types and the operations over those types that comprise our workflow is pretty much exactly the kind of formal specification we needed.

Here's what I typed:

module MusicalDocuments where

import Data.Maybe

-- A document comprises some number of openings (double page spreads)
data Document = Document [Opening]

-- An opening comprises one or two pages (usually two)
data Opening = Opening (Page, Maybe Page)

-- A page comprises multiple systems
data Page = Page [System]

-- Each part is the line for a particular voice
data Voice = Superius | Discantus | Tenor | Contratenor | Bassus

-- A part comprises a list of musical sybmols, but it may span mutliple systems
--(including partial systems)
data Part = Part [MusicalSymbol]

-- A piece comprises some number of sections
data Piece = Piece [Section]

-- A system is a collection of staves
data System = System [Staff]

-- A staff is a list of atomic graphical symbols
data Staff = Staff [Glyph]

-- A section is a collection of parts
data Section = Section [Part]

-- These are the atomic components, MusicalSymbols are semantic and Glyphs are
--syntactic (i.e. just image elements)
data MusicalSymbol = MusicalSymbol
data Glyph = Glyph

-- If this were real, Image would abstract over some kind of binary format
data Image = Image

-- One of the important properties we need in order to be able to construct pieces
-- from the scanned components is to be able to say when objects of the some of the
-- types are strictly contiguous, i.e. this staff immediately follows that staff
class Contiguous a where
  immediatelyFollows :: a -> a -> Bool
  immediatelyPrecedes :: a -> a -> Bool
  immediatelyPrecedes a b = b `immediatelyFollows` a

instance Contiguous Staff where
  immediatelyFollows :: Staff -> Staff -> Bool
  immediatelyFollows = undefined

-- Another interesting property of this data set is that there are a number of
-- duplicate scans of openings, but nothing in the metadata that indicates this,
-- so our workflow needs to recognise duplicates
instance Eq Opening where
  (==) :: Opening -> Opening -> Bool
  (==) a b = undefined

-- Maybe it would also be useful to have equality for staves too?
instance Eq Staff where
  (==) :: Staff -> Staff -> Bool
  (==) a b = undefined

-- The following functions actually represent the workflow

collate :: [Document]
collate = undefined

scan :: Document -> [Image]
scan = undefined

split :: Image -> Opening
split = undefined

paginate :: Opening -> [Page]
paginate = undefined

omr :: Page -> [System]
omr = undefined

segment :: System -> [Staff]
segment = undefined

tokenize :: Staff -> [Glyph]
tokenize = undefined

recogniseMusicalSymbol :: Glyph -> Maybe MusicalSymbol
recogniseMusicalSymbol = undefined

part :: [Glyph] -> Maybe Part
part gs =
  if null symbols then Nothing else Just $ Part symbols
  where symbols = mapMaybe recogniseMusicalSymbol gs

alignable :: Part -> Part -> Bool
alignable = undefined

piece :: [Part] -> Maybe Piece
piece = undefined

I then added the comments and implemented the part function later on. Looking at it now, I keep wondering whether the types of the functions really make sense; especially where a return type is a type that's just a label for a list or pair.

I haven't written much Haskell code before, and given that I've only implemented one function here, I still haven't written much Haskell code. But it seemed to be a nice way to formalise this procedure. Any criticisms (or function implementations!) welcome.