I wrote this in my notebook last night:

Computational linguistics really has an unfair advantage over algorithmic music analysis (apart from having a more succinct name): computers have an inherent, or at least much more fundamental, understanding of text than they do of music. It's easy to perform lexicographic operations on text because the semantics of the symbols it comprises (i.e. letters) are built into the most fundamental programming libraries.
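A toy Python example of what I mean; nothing here is clever, which is exactly the point — the ordering comes for free:

```python
# Text carries its lexicographic semantics with it: character ordering,
# equality and hashing are defined by the language itself.
words = ["gigue", "allemande", "courante", "sarabande"]

print(sorted(words))             # alphabetical order, no extra code needed
print("allemande" < "courante")  # True: character order is built in
print(ord("A"), ord("B"))        # the underlying code points carry that order
```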

Music, on the other hand, has no such near-binary digital representation; its encoding must be carried out at a much higher level and its semantics must be meticulously explained to the computer. So what would be a good method for this? Could a binary representation of musical symbols be used in which the lexicographic information (e.g. sort order) is practically built in? Unfortunately, the pitch 'A' has more potential attributes than the letter 'A'. For example, in any meaningful context a pitch has a duration and a timbre, and may well have an order of importance within its tonal context. A better method would be to encode entities which can have multiple properties (resigning ourselves to the fact that a high-level, resource-intensive data representation is inevitable) and to write specialised lexicographic functions to go with them.
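A rough sketch of what that might look like in Python. The field names (pitch_class, duration, tonal_weight and so on) are just my own illustrative choices, not any standard encoding:

```python
from dataclasses import dataclass

# An entity with multiple properties, rather than a single built-in symbol.
@dataclass
class Note:
    pitch_class: str       # 'C', 'C#', 'D', ...
    octave: int            # scientific pitch notation octave
    duration: float        # length in quarter notes
    timbre: str = "piano"
    tonal_weight: int = 0  # importance within the local tonal context

# The "lexicographic" behaviour has to be written by hand. Here: order by
# sounding pitch height, then by duration.
PITCH_ORDER = {p: i for i, p in enumerate(
    ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"])}

def pitch_sort_key(note: Note):
    return (note.octave, PITCH_ORDER[note.pitch_class], note.duration)

notes = [Note("A", 4, 1.0), Note("C", 4, 0.5), Note("E", 5, 2.0)]
print(sorted(notes, key=pitch_sort_key))
```

None of that ordering exists until we decide what it should be — which is, I suppose, the whole point.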

Linguists are also given semantic structure in their manuscripts: words, sentences, paragraphs, etc. Musicians don't get quite so much help. Should we encode only the information given in the MS and spend time writing algorithms which try to identify phrases, antecedent-consequent phrases, harmonic progressions, modulations and significant structural boundaries? Or should we encode what we believe to be the semantic structure? The result, in the latter case, would very clearly be a reading or interpretation of the score, not an attempt to reproduce the original. But it may allow other (more interesting?) work to be done if this semantic information is already given.
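One possible (entirely hypothetical) way of keeping the two layers apart: a faithful transcription of the notes, plus interpretive annotations stored alongside and clearly marked as such. Again, the names are mine, purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: int    # index of the first note in the span
    end: int      # index one past the last note
    kind: str     # e.g. "phrase", "modulation", "cadence"
    comment: str = ""

@dataclass
class EncodedPassage:
    notes: list                                       # what the MS actually says
    annotations: list = field(default_factory=list)   # what I read into it

passage = EncodedPassage(notes=["C4", "E4", "G4", "F4", "E4", "D4", "C4"])
passage.annotations.append(Annotation(0, 3, "phrase", "antecedent"))
passage.annotations.append(Annotation(3, 7, "phrase", "consequent"))
```

The transcription stays untouched; the interpretation is a separate, discardable layer.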