In light of my supposition that elements of musical information should be represented as tree elements (i.e. they can't be represented as characters), would there be any sense in a content-based tagging style? Or would an attribute-based tagging style make more sense?

Of course, an attribute-based tagging style has advantages for processing applications (particularly SAX-based tools), but it may also be syntactically more appropriate here, because the markup is the data; it's not, as in conventional text markup, a semantic addition to pre-existing encoded information. Consider that a <note> element may be the finest level of granularity: it doesn't have any inherent content, only attributes. To use a content-based scheme implies that there is (or was) digital content which could exist (or did exist) independently of the markup. This isn't the case.

<!-- content-based: -->

<note>A</note>

<!-- attribute-based: -->

<note pitch="A" />
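
To make the SAX point a bit more concrete, here is a rough Python sketch using the standard library's xml.sax module. It is only an illustration under my own assumptions: the <phrase> wrapper element, the handler class names and the read_pitches helper are all invented for this example and aren't part of any proposed schema.

```python
import xml.sax


class AttributeStyleHandler(xml.sax.ContentHandler):
    """Reads <note pitch="A" />: everything arrives in one startElement event."""

    def __init__(self):
        super().__init__()
        self.pitches = []

    def startElement(self, name, attrs):
        if name == "note":
            self.pitches.append(attrs.get("pitch"))


class ContentStyleHandler(xml.sax.ContentHandler):
    """Reads <note>A</note>: the value has to be pieced together across events."""

    def __init__(self):
        super().__init__()
        self.pitches = []
        self._buffer = None  # a list while inside a <note>, otherwise None

    def startElement(self, name, attrs):
        if name == "note":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver character data in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "note":
            self.pitches.append("".join(self._buffer).strip())
            self._buffer = None


def read_pitches(xml_text, handler):
    """Hypothetical helper: parse a string with the given handler."""
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.pitches


# <phrase> is a made-up wrapper so that each document has a single root.
attribute_doc = '<phrase><note pitch="A" /><note pitch="B" /></phrase>'
content_doc = "<phrase><note>A</note><note>B</note></phrase>"
```

Calling read_pitches(attribute_doc, AttributeStyleHandler()) and read_pitches(content_doc, ContentStyleHandler()) should recover the same pitches from both documents, but the content-based handler needs three callbacks and some state to do what the attribute-based one does in a single callback.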