Music Notation Use Cases

THIS DOCUMENT IS OBSOLETE.

This document is now maintained within the MNX GitHub repository.

You can view the most recent version directly at [1].

For archival purposes, the old contents of this page are shown below.

User Roles

These are musical roles assumed by someone using music notation.

The following roles are not clearly defined yet, but can help us begin collecting use cases based on an intuitive understanding of the terms. They are roles, not distinct people: a single person can occupy multiple roles in the same use case.

Composer (C)
Arranger/Orchestrator (A)
Performer (PF)
Publisher (PU)
Editor (ED)
Engraver (EN)
Musicologist (M)
Student (S)
Instructor or Teacher (T)
Developer (D)
Sound Engineer (SE)

In the use cases below, users playing roles are typically represented by short capitalized abbreviations as shown above, sometimes followed by a distinguishing number.

A lesson from other media types is that, as media move from paper publishing to electronic publishing, people start taking on roles they would previously have left to a specialist, a single person takes on more and more roles, and publishing can be done on a smaller scale and in smaller, more frequent cycles than before. For example, in a printed magazine there is likely to be a clear distinction between the roles of Writer (think Composer), Editor, Engraver, and Publisher. The Writer is not concerned with appearance and font choice. The Editor and Engraver are specialists, and are different people from the Writer. The Publisher has a large role, because the cost of publishing is high. In contrast, with a blog, the same person will choose the words (Writer), review the words (Editor), choose the appearance and fonts (Engraver), and decide to publish (a trivial vestige of the Publisher role). We should expect the same transformation of workflow, and the same merging of roles, in music notation as it becomes digital and online.

User Stories

These are narratives of musical activities performed by one or more parties assuming the User Roles described above. Each one is identified by a category prefix and a unique number.

Please add new stories with the next unused number in the category. Do not delete obsolete stories; instead mark them as such.

Music Creation

MC0: Composer wants to notate a composition and capture it as a digital encoding

C is composing a work and wishes to produce a digital document representing the contents of that work. The semantic content of the document is paramount as it represents C's musical creation. Depending on C's orientation, the nature of the work, the intentions regarding publishing and performance, and the nature of the tool being used to do the encoding, visual and performance facets of the document may also be significant.

MC1: Composer wants to share work with a collaborator using another editing application

C1 is co-composing a work with C2, with the locus of editing switching back and forth between them. C1 and C2 are not using the same notation application. Preservation of semantic, visual and performance data are all important in this case, and the less faithful the transfer, the more work must be done by hand by each party to recover or adjust information that is lost or corrupted.

C1 is working on a compositional sketch that will subsequently be orchestrated by C2. C2 is not using the same notation application as C1, but C2 would like to begin C2's work by modifying C1's score as a starting point to ensure that there are no copying errors. In this case, the preservation of semantic and performance data is primary, while visual details may not matter as much.

MC2: Composer wants to migrate work to another editing application, or archive music as a protection against the future need to migrate.

C may be using a notation application that has become obsolete and wants to begin using a new application. Alternatively, C is concerned that C's notation application may cease to be supported on modern computer hardware and operating systems, preventing C from revising C's work. C would then either need to re-enter the music in another program, or keep an old hardware / software setup around to run the old program. C wants to archive C's music in a format that maintains as much of the semantic, visual, and performance data that was captured in the original application as possible. Then C always has the option to transfer the music to a new program that can read the archival format. C wants to minimize the amount of rework needed, but an absolutely perfect transfer is not necessary. C doesn't want to lose the visual information for the portions of the music that won't change, and doesn't want to lose the playback information that will help C when making revisions.

To be useful, the transfers will need a high level of accuracy and completeness. Furthermore, C needs assurance that the archival format has enough stability and longevity to be readable in the future.

MC3: Arranger/performer wants to convert existing printed sheet music into digital sheet music

PF has custody of a printed edition of some music, and no access to any digital encoding of the work. PF would like to work with the music in a notation application to adapt it to their purposes, perhaps transposing to a more playable key. OMR software is usually developed independently of notation software, so PF's OMR tools produce an intermediate digital format that can be imported into multiple notation applications. This format should preserve as much as possible of what the OMR software could discover, including both semantic and visual data from the original document.

[Note: The use cases in the new "Music Transcription" category replace and extend MC3]

MC4: Editor wants to compare two versions of digital sheet music to confirm they are the same, or see differences

ED receives two different digital sheet music documents which purport to be the same score, maybe from two different proofreaders. ED runs comparison software on the two documents. The comparison can include or ignore differences in layout (focusing only on the unrolled note sequence), in key signature (focusing only on pitches no matter how notated), in per-staff and score-wide annotations, in performance annotations, etc. In any case, implementation details such as differences in insignificant whitespace in the music notation are ignored. The comparison software returns either a concise statement that the documents are identical, or a concise statement of what the differences are (for example, the content of 2 measures added to all parts after a certain measure, or the contents of a new staff present throughout the piece in one document but not the other).
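
As a rough illustration of the comparison described above, the following Python sketch diffs two scores that have already been reduced to per-measure note lists. The measure representation (tuples of pitch and duration strings) and the name compare_measures are assumptions made for this example only; they are not taken from any notation format.

```python
from difflib import SequenceMatcher


def compare_measures(a, b):
    """Return a list of human-readable differences between two measure lists.

    Each score is a list of measures; each measure is a tuple of
    (pitch, duration) pairs. Layout is deliberately absent from this
    hypothetical representation, so layout differences are ignored.
    """
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            report.append(f"{j2 - j1} measure(s) added after measure {i1}")
        elif tag == "delete":
            report.append(f"measure(s) {i1 + 1}-{i2} removed")
        else:  # "replace"
            report.append(f"measure(s) {i1 + 1}-{i2} changed")
    return report or ["documents are identical"]


m1 = (("C4", "1/4"), ("D4", "1/4"))
m2 = (("E4", "1/2"),)
m3 = (("G4", "1/1"),)
extra1 = (("A4", "1/2"),)
extra2 = (("B4", "1/2"),)

print(compare_measures([m1, m2, m3], [m1, m2, extra1, extra2, m3]))
```

Ignoring a given facet (key signature spelling, annotations, and so on) would amount to leaving it out of the reduced representation before the two lists are compared.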

MC5: Arranger/Orchestrator wants to annotate Composer’s score with comments and suggestions

C has created a composition and has engaged A to arrange and orchestrate this piece. C provides A with a digital encoding of the composition. A wishes to annotate C's composition with comments and suggestions. Some of A's suggestions include visual and audio media of performance techniques that may be appropriate for the music. Others include hyperlinks to relevant online resources. A returns these annotations to C.

In general A's suggestions could incorporate any content or media or pointers that might be found in a web page, and should not be limited to plain text or even rich formatted text.

To avoid issues with converting the entire score back and forth, A wishes to separate these annotations from C's original score, capturing them in a distinct document that refers to elements of C's original score using pointers. Consider related use cases for Web Annotation.
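
The idea of a separate annotation document that points into the score can be sketched as follows. This is only an illustration: the element ids, the "target" and "body" keys, and the function name resolve_annotations are all hypothetical and do not correspond to a defined encoding.

```python
def resolve_annotations(score_elements, annotations):
    """Pair each annotation with the score element it points to.

    score_elements: dict mapping element id -> element description.
    annotations: list of dicts with hypothetical "target" and "body" keys.
    Returns (resolved, dangling): annotations whose target exists in the
    score, and annotations whose target id is missing from it.
    """
    resolved, dangling = [], []
    for note in annotations:
        element = score_elements.get(note["target"])
        if element is None:
            dangling.append(note)
        else:
            resolved.append((element, note["body"]))
    return resolved, dangling


score = {"m12-n3": "measure 12, third note", "m20": "measure 20"}
notes = [
    {"target": "m12-n3", "body": "Consider flutter-tonguing here."},
    {"target": "m99", "body": "Link to a reference recording."},
]
resolved, dangling = resolve_annotations(score, notes)
```

Because the score itself is never rewritten, a pointer can go stale when C later edits the piece; reporting dangling targets, as above, is one way to surface that.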

MC6: Composer wants to specify positioning of notational elements for visual clarity

C has composed a piece and has taken pains to arrange notes and other elements for visual clarity by manually adjusting their positions. These layout decisions are desirable to reproduce in some contexts, such as printed output that replicates the page size and orientation in which C was originally working. In other contexts, such as display on a mobile device, these decisions should be disregarded in favor of re-flowing and otherwise adjusting the layout to fit the context.

MC7: Composer wants to specify playback of notational elements for aural clarity

C has notated a piece in an application that allows non-notated performance details of the piece to be captured for more expressive and accurate playback, such as tempo variations and specific note durations. These performance details are desirable to reproduce in some contexts, for instance presentation to a performer learning the piece. They are irrelevant to others, such as printed output.

Revisions to the piece by another composer or editor C2 may result in these performance details becoming irrelevant or requiring deletion.

MC8: Composer wants to interleave textual and musical content.

C has composed a strophic song in which one verse of lyrics is fully notated within the music, and all the others are supplied in a sequence of pure-text verses. C wishes to create a digital encoding of this song that incorporates the other verses in a way that is aware of their status as numbered song verses, rather than arbitrary text.

MC9: Composer wants to distinguish text associated with a whole ensemble (e.g. tempo indications) from text associated with a single part (performance instructions for a single instrument).

C is writing a piece for large ensemble. Some textual directions are associated with the entire score and are intended to be reproduced in every part. Other directions are part-specific. Digital encodings may be used downstream by C's publisher PU to create both individual parts and the full score, so this distinction is essential to show the correct text in the correct context.

MC10: Composer wants to embed multiple temporal renderings of a new piece in the score.

C wants to reduce rehearsal time by demonstrating its performance practice to a performer PF.

MC11: Sound Engineer wants to use an interactive score to access low level information in a DAW.

Clicking on a symbol opens a dialog giving the SE access to (and control over) much lower level information. The use of arbitrary symbols in the DAW's GUI allows the D of the DAW to create more expressive/useful scenarios than can be achieved using the currently ubiquitous space-time icons.

MC12: Composer wants to use a custom symbol to describe some special event.

C wants the score to be as visually expressive as possible, and supplies a custom graphic object.

MC13: Composer of a multimedia work wants to write a score that synchronises sound and vision.

Digital encoding (see MC0) can refer to both audio and visual data. Note also that the C of the sounds need not be the same as the C of the lighting (see MC1).

MC14: Composer wants to use software that allows the use of both Common Western Music Notation and the Web MIDI API

CWMN uses a time model based on tempi and fractional durations. The Web MIDI API has no concept of tempo, and uses indivisible durations (milliseconds). The software handles the conversion from fractions to integers.
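
The conversion from fractional durations to integer milliseconds can be sketched in Python as below. The function names, the choice of a quarter-note beat as the default, and rounding to the nearest millisecond are assumptions made for this illustration.

```python
from fractions import Fraction


def duration_ms(duration, bpm, beat=Fraction(1, 4)):
    """Convert a notated duration (fraction of a whole note) to whole milliseconds.

    bpm counts beats per minute, where one beat lasts `beat` of a whole note.
    """
    beats = Fraction(duration) / beat
    return round(beats * Fraction(60_000) / Fraction(bpm))


def onsets_ms(durations, bpm, beat=Fraction(1, 4)):
    """Return integer onset times for a sequence of notated durations.

    The running total is kept as an exact fraction and rounded only when
    an onset is emitted, so rounding errors do not accumulate.
    """
    total = Fraction(0)
    onsets = []
    for d in durations:
        onsets.append(round(total * Fraction(60_000) / (beat * Fraction(bpm))))
        total += Fraction(d)
    return onsets


triplet_eighth = Fraction(1, 12)
print(duration_ms(Fraction(1, 4), 120))      # a quarter note at 120 bpm
print(onsets_ms([triplet_eighth] * 3, 120))  # three triplet eighths
```

Summing already-rounded durations would drift (three triplet eighths of 167 ms each total 501 ms rather than 500 ms), which is why the sketch rounds onsets from the exact running total instead.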

Music Transcription

MT0: Editor wants to manually transcribe sheet music to digital encoding format

ED has access to some sheet music, either a physical sheet of paper or a scanned image of it. The sheet can contain handwritten or printed music.

ED reads the source music and manually enters a corresponding transcription. The digital output should of course preserve semantic data, and perhaps performance data as well. Visual data is not essential, especially if the source was handwritten music.

Note: in a variant of this use case, multiple users playing the ED role on the Internet are each given the image of, say, just one simple measure and prompted to transcribe it manually (this is rather similar to Google's text reCAPTCHA).

MT1: Editor wants to transcribe printed sheet music to digital encoding format with help of OMR software

ED does not enter the whole transcription from scratch but works in two steps. In step 1, OMR software reads the source image and provides an annotated transcription. In step 2, ED reviews the OMR output and manually corrects the wrong or missing items in the final digital encoding. This approach is worthwhile only when the effort spent in step 2 is significantly lower than in MT0, which depends on many factors, notably initial image quality, music complexity, OMR recognition rate, and OMR interaction abilities.

OMR output needs to provide hints for user review (confidence in each symbol, abnormal voice durations, detected incompatibilities, missing barlines, significant objects with no interpretation, etc.) in order to draw the user's attention to these points. Visual data is key to easily relating any given music symbol to its precise location in the initial image.

ED should use an "OMR UI" specifically meant for OMR validation and correction. As opposed to a standard music editing UI, such an OMR UI should focus on fidelity to the initial image, avoid any over-interpretation of user actions, and even switch off validation until a series of user actions is explicitly completed.

MT2: Editor wants to transcribe sheet music to digital encoding format without manual intervention

This can be seen as a variant of MT1 without step 2 (review). However, it must be considered a use case in its own right because, for large libraries with millions of pages, having human beings spend several minutes reviewing each page is out of reach. See the SIMSSA project regarding the use of (imperfect) OMR data as a hidden navigation layer on top of the source image display.

A side advantage of bypassing human review is that it allows a transcription campaign to be re-launched at minor cost if significant progress is made in OMR technology. Such progress is helped by the openness and modular architecture of the OMR pipeline software...

MT3: Many editors help improve OMR service via collaborative OMR over the web.

This use case extends MT1 to a shareable OMR service used over the web. In this approach, each user review action, whether a validation or a correction, is linked back whenever possible to the underlying OMR item:

If we do have an identified OMR item, then it can be recorded as a representative sample (physical appearance and assigned shape). Samples are accumulated and later used to asynchronously improve the training of shape classifiers. A figure commonly accepted in today's deep learning projects is at least 5000 samples per shape. Such numbers would be easily reached with this collaborative approach.
If we don't have a clearly identified OMR item, then the user could be prompted to select a case from a list of typical errors. That way we could increment a tally of typical errors along with their concrete context. Later, an OMR developer could select one of the most common errors and have immediate access to all the related concrete examples.
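
A tally of typical errors that keeps each report's concrete context could look like the following Python sketch. The class name ErrorTally, the error labels, and the context strings are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import defaultdict


class ErrorTally:
    """Accumulate user-reported OMR error cases together with their context."""

    def __init__(self):
        self._cases = defaultdict(list)

    def report(self, error_kind, context):
        """Record one occurrence of error_kind with an opaque context value."""
        self._cases[error_kind].append(context)

    def most_common(self, n=1):
        """Return the n most frequent error kinds as (kind, count) pairs."""
        ranked = sorted(self._cases.items(), key=lambda kv: len(kv[1]), reverse=True)
        return [(kind, len(contexts)) for kind, contexts in ranked[:n]]

    def examples(self, error_kind):
        """Return every recorded context for one error kind."""
        return list(self._cases.get(error_kind, []))


tally = ErrorTally()
tally.report("missed accidental", "page 3, system 2, measure 14")
tally.report("split beam", "page 1, system 4, measure 9")
tally.report("missed accidental", "page 7, system 1, measure 2")
```

An OMR developer would then call most_common() to pick an error to work on and examples() to retrieve every case reported for it.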

Music Publishing

MP1: Composer wants to submit his music for publication

C is working on a complete composition that will subsequently be edited by ED to prepare for print publication. ED is not using the same notation application as C, but ED needs to begin with exactly what C produced in terms of semantic and visual details. Performance data in this case matters less.

This case benefits from standardizing the way a format is employed by composers and engravers, allowing publishers to accept machine-readable submissions and check them for conformance to some set of publication guidelines. In contrast, formats that permit alternative ways to express the same musical concept are harder to check.

MP2: Editor wants to annotate Composer’s work as part of publication workflow

C composes a piece and notates it by hand, submitting the manuscript to an editor ED. ED prepares a digital edition of the music and sends it to reviewer R for proofreading as a digitally encoded file, along with a scanned copy of the manuscript as a reference. R edits the digital sheet music document to correct errors, using R's own notation software (which is different from ED's). If the corrections include additions or removals, document-level features like page numbers, headers, and footers adjust accordingly. The document is returned to ED for publishing, with no loss of semantic, visual or playback information.

MP3: Publisher wants to keep machine-readable representations of music in a central content management system

PU manages a large repository of works intended to be published in a variety of editions over time. PU must be able to rely on the durability of the digital encoding format used for the works and its independence from present-day proprietary technology. The format must be widely available as an export format from notation applications and also as an import format into other applications used by music consumers.

MP4: Publisher wants to prepare digital editions that can be viewed on any device and also printed

PU manages a large repository of works intended to be published in a wide variety of formats, ranging from print/PDF to interactive digital presentation on multiple devices, some of which may be unknown at present.

Different devices and presentation channels may want to format the music quite differently from the way the original composer or arranger viewed it. PU therefore wants to ensure that semantic information is reliably available for dynamic rendering of notation appropriate to the user's device and presentation. Visual information for scores is available where relevant (e.g. print edition in same format as engraved). Playback information is also needed where relevant, for example to drive playback or assessment in music learning applications.

MP5: Engraver wants detailed control over non-semantic formatting for printed output, while allowing for more flexible rendering on arbitrary devices e.g. mobile screens

An engraver EN is preparing a score for publication by PU. EN's job is to make skilled human formatting decisions that apply to some set of valid contexts, while recognizing that these same decisions may be set aside in other contexts where dynamic rendering of the music may take a different course. For example, EN wants to preserve page and system break decisions for print editions, while recognizing that reflowing for mobile devices may result in different decisions.

Furthermore, EN sees that a single notational element (such as a page break) can sometimes be semantic. EN may want one system break to be imposed by the start of a coda, and another to be driven by readability of the score in a known, specific print format.

EN may also want detailed control over some non-semantic formatting elements for digital output as well as printed output (e.g. positions of accidentals and dots in a chord, beam angles, etc.). Furthermore, EN also wants some control over elements that can break due to reflow, and some control over where systems can break even when the break is non-semantic, e.g. breaking at a particular point is highly discouraged, but possible if that's the only option.

MP6: Publisher wants to decouple semantic formatting of notes and text from physical formatting

PU wishes to maintain a repository of scores that can be published in multiple styles. For instance, a show tune might be published in a Jazz font in a collection of lead sheets, but in a more standard music font as part of a song folio. At a more detailed level, PU may wish to use a particular set of text styles to distinguish tempo indications, expression and performance text in one edition, but a different set of styles in a different edition. PU might even wish to completely suppress certain elements in some editions.

PU would like to be able to create these renderings by combining any given score with a separate specification of how that score is to be styled. This would allow the semantic information in scores to be uncoupled from stylistic decisions that will vary from place to place, and also from time to time. As a long-time publisher PU is aware of the substantial aesthetic changes that have taken place in music notation over many years.

MP7: Publisher wants to create multiple foreign-language editions of a work

PU wishes to simplify the creation and maintenance of a work in multiple languages, minimizing the amount of work necessary to independently revise either the original notated music or the various translations of directions, lyrics, etc.

MP8: Publisher wants to minimize difficulty of managing related material for a work in a unitary fashion

PU maintains multiple assets relating to the same musical work. For instance, one composition is associated with a digital music encoding, a PDF, an audio backing track, a sequence in a MIDI file, and more. PU would like to be able to cross-reference these items, potentially linking to related assets in some way from metadata in the music encoding file. Some of this metadata is unique to PU's business and does not fit into any standardized metadata schema.

MP9: Publisher wants machine-readable identifiers for Composers, Works, in an asset

The publisher PU has traditionally identified the composer of one of their published works by human-readable text, which can differ by language (e.g. "Tchaikovsky", "Tschaikowski", "Чайко́вский"). The title of a work has also been identified by a string of human-readable text, which can differ by convention as well as language (e.g. "Sonata for Piano no. 6 in F major, op. 10 no. 2", "Opus 10 No. 2 Piano Sonata #6"). PU wants to disambiguate this, by tagging the digital score with identifiers to reliable authorities. This helps especially for automated workflows, distribution of electronic goods, and search engine visibility. Reliable, widely-used authorities include MusicBrainz, Wikidata, and other Authority Control. Other projects like IMSLP and Wikipedia also link to these authorities, so it is not necessary to put identifiers from every project in the world into every score.

MP10: Publisher wants machine-readable representation of intellectual property rights in an asset

A publisher PU1 wishes to avoid the cost and complexity of developing PU1's own private IP rights infrastructure and would prefer to incorporate references to an IP authority of some kind. Nascent standards seek to capture IP rights; one example is ODRL.

Another publisher PU2 maintains their own private and confidential rights information such as royalty splits. PU2 would like to incorporate this information directly into the encoding, or refer to it via a pointer referencing PU2's private database.

MP11: Publisher wants to be able to present user-visible notice of credits, copyrights in a way that may vary depending on device form factor

A publisher PU is legally required to include credits and copyrights within renderings of works in their catalog. Because this information has legal and business significance, PU must be able to reliably identify it as semantically distinct from other arbitrary text in the document. The presentation of this information to end users can vary greatly. For example, PU has a mobile score-reading app that shows credits and copyrights in an overlay panel that is only displayed at the user's request.

MP12: Publisher wants to protect their intellectual property.

PU wishes to encrypt musical scores in their catalog in such a way that readable copies are only distributed to users who have permission from the publisher or publisher's representative, e.g. a distributor.

MP13: Publisher wants to automatically create a musical simplification of a full score

Publisher PU creates scores that are consumed by a mixture of music readers and non-readers. PU wishes to be able to create simplified versions of fully notated scores. One such example is the "chord sheet" which contains only the lyrics and chords associated with a song. Often the lyrics and chords are sufficient for non-readers to perform the work if they know the melody already but only need help with the words, or if they are accompanying on a chordal instrument.

MP14: Publisher wants to identify an excerpt or incipit of a work

PU publishes scores in collections whose tables of contents include short renderings of a readily identifiable fragment of each score, for easy reference by performers PF. The parts and range of material included in these excerpts need to be identifiable as part of the digital publication process.

MP15: Publisher wants to encode an organized collection of works in a single document

PU publishes a collection of related works by the same composer; these works further are broken into movements. PU wishes to treat all the elements of this collection within a single document under unitary management, and to apply house styles across the collection in a consistent way.

MP16: Publisher wants to convert existing sheet music into a digital edition

PU has custody of a printed edition of some music, and no access to any digital encoding of the work. PU would like to work with the music in a notation application to create a digital document as the basis for ongoing editorial work and future editions. See notes on MC3 above regarding OMR technology.

Reading, Learning and Performance

RLP1: Performer wants to reformat sheet music to her mobile device's display

Performer PF is reading a trumpet part on a mobile phone in a practice room, and wants to view the part in a way that is optimal given the small size of the display. The part is isolated from a full score and must be reformatted with system breaks completely different from the original and corresponding changes in notational spacing. These changes render any visual non-semantic formatting in the original document irrelevant (or even destructive) to PF's experience.

RLP2: Performer wants to reformat sheet music as per personal preference (e.g. size, page turns, font choice)

PF uses a tablet computer to view a piano composition for practice and performance. PF is visually impaired and wishes to view the score at twice the normal size. PF wishes the score to be reflowed accordingly, and further wants to choose the optimal location of page turns in the piece to suit their personal musical needs.

RLP3: Performer wants to find a particular piece of music to play

PF wishes to search a corpus of digitally encoded music. The publisher of a searchable index PU wishes to easily index large numbers of such digital encodings and reliably extract the material to be indexed.

RLP4: Performer wants to view only his own musical part from a larger ensemble score

PF is performing a flute part that is dynamically rendered from a digital encoding of an orchestral score. PF's part is not presented identically to the Flute staff of the full score: multirests are included to span silent passages, some key signatures and accidentals are spelled in a specific way, and repeat endings exist in the flute part where the full score does not itself repeat. System and page breaks are specific to the part.
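
One of the part-specific transformations mentioned above, collapsing silent passages into multirests, can be sketched in Python as follows. The input representation (None for a silent measure) and the function name collapse_rests are assumptions made for this example.

```python
def collapse_rests(measures):
    """Collapse runs of empty measures into multirest entries.

    measures: list where None marks a silent measure and anything else is
    measure content. Returns a list of ("measure", content) and
    ("multirest", count) tuples in score order.
    """
    part = []
    for content in measures:
        if content is None:
            if part and part[-1][0] == "multirest":
                # Extend the multirest that is already open.
                part[-1] = ("multirest", part[-1][1] + 1)
            else:
                part.append(("multirest", 1))
        else:
            part.append(("measure", content))
    return part


flute = ["theme", None, None, None, "answer", None]
print(collapse_rests(flute))
```

A real part extractor would also have to split a multirest wherever something in the score must remain visible to the player, such as a key change or rehearsal mark; this sketch ignores those cases.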

RLP5: Performer wants to transpose music to suit a specific performance situation

PF is performing a piece for Bb clarinet but only has PF's A clarinet on hand. PF wants to be able to transpose the rendering of PF's score by a semitone up, preserving notational decisions that can survive this transposition (such as whether a note carries an explicit accidental or not).

RLP6: Performer wants to view her own musical part formatted more prominently as part of a larger ensemble score

PF is an alto singer in a chorus. PF receives a digital piano-vocal score, with 4 vocal parts (SATB). PF is accustomed to reading choral works with all parts visible. PF would like to view the score with the alto part rendered in full-size notes with a yellow highlight behind, with the other vocal parts rendered in smaller notes on the same system.

RLP7: Performer wants to hear a synthesized audio rendition of the music being displayed, synchronized with the notation

PF is practicing a bassoon part in an orchestral work. At times PF would like to hear a synthesized audio rendition of PF's own part, for reference. At other times PF would like to play along with an audio rendition of all the other parts in the score. In both cases, PF expects the currently audible place in the score to be visually identified in some way. PF doesn't have an expectation that these audio renditions will approach the richness and fidelity of a recording of an actual orchestra.

RLP8: Performer wants to hear a recorded audio track of the music being displayed, synchronized with the notational display

PF is practicing a bassoon part in an orchestral work. PF would like to hear a recorded audio track of a high quality performance of PF's part and/or other parts in the work for reference. In both cases, PF expects the currently audible place in the score to be visually identified in some way.

The publisher PU for this piece has previously prepared an encoding of the piece along with an audio or video recording, which does not follow a uniform tempo. PU was able to create (automatically or manually) an encoding of the mapping between the musical form of the piece and regions of the audio recording, allowing the two to be synchronized. PF's application is likewise able to employ this same mapping to maintain the correspondence.
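
A mapping between score positions and a recording with non-uniform tempo can be sketched as a list of anchor points with interpolation between them. In the Python example below, the anchor format (beat, seconds), the linear interpolation, and the function name audio_time are assumptions for illustration; the source does not specify how the mapping is encoded.

```python
from bisect import bisect_right


def audio_time(anchors, beat):
    """Map a score position (in beats) to seconds in the recording.

    anchors: list of (beat, seconds) pairs with strictly increasing beats,
    e.g. one per barline. Positions between anchors are linearly
    interpolated; positions outside the anchored range are clamped.
    """
    beats = [b for b, _ in anchors]
    if beat <= beats[0]:
        return anchors[0][1]
    if beat >= beats[-1]:
        return anchors[-1][1]
    i = bisect_right(beats, beat)
    (b0, t0), (b1, t1) = anchors[i - 1], anchors[i]
    return t0 + (t1 - t0) * (beat - b0) / (b1 - b0)


# The first bar takes 2 seconds, the second bar slows down and takes 3 seconds.
anchors = [(0, 0.0), (4, 2.0), (8, 5.0)]
print(audio_time(anchors, 2))
print(audio_time(anchors, 6))
```

The same table read in the other direction (seconds to beats) is what a player application would use to highlight the currently audible place in the score.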

RLP9: Performer wants to hear a synthesized audio rendition of the music being displayed, where the musical form is not present in a standard representation

PF is playing a piece in which the order of playback of measures in the score is not completely determined by symbolic notation or common textual directions. The piece includes unusual repeat instructions (as text in the native language of the composer). Yet the form was actually known to the composer and is accurately captured in the digital encoding of the piece alongside the textual directions, so that PF hears the playback order as the composer intended.

RLP10: Performer wants to listen at slower tempo than written to focus on a difficult portion of music

PF wishes to practice a piece at half tempo in some portions, to aid in learning a number of fast passages.

RLP11: Performer wants to have their performance of some music assessed with respect to the score as a reference

PF would like to practice a part of a musical score while listening to a metronome or backing track, using an application that not only displays the score but assesses the correctness and completeness of PF's performance and provides feedback to PF. This application uses a digital encoding of the score as the basis for assessment, relying on encoded information in the score to determine a rubric for assessment.

RLP12: Performer wants to view a score with all form repeats and jumps “unrolled”

PF would like to practice a musical score on a digital display and avoid the inconvenience of sudden page turns due to repeats, codas, and so on. PF would like the score display to be modified to show a purely linear, monotonically left-to-right representation of music by repeating measures of music in the original encoding of the score.
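
For the simplest case, plain repeat barlines, unrolling can be sketched in Python as below. The measure dictionaries with "number", "repeat_start" and "repeat_end" keys are a hypothetical representation, and the sketch deliberately leaves out voltas, D.C./D.S. jumps and codas.

```python
def unroll(measures):
    """Return measure numbers in playback order for simple repeats.

    measures: list of dicts with a "number" key and optional boolean
    "repeat_start" / "repeat_end" keys. Each repeated section is played
    twice. A repeat end with no matching start jumps back to the first
    measure, or to the measure after the previous repeated section.
    """
    order = []
    start = 0      # index to jump back to at the next repeat end
    taken = set()  # indices of repeat ends that have already been taken
    i = 0
    while i < len(measures):
        measure = measures[i]
        if measure.get("repeat_start"):
            start = i
        order.append(measure["number"])
        if measure.get("repeat_end") and i not in taken:
            taken.add(i)
            i = start
        else:
            if measure.get("repeat_end"):
                start = i + 1  # the next section begins after this one
            i += 1
    return order


score = [
    {"number": 1},
    {"number": 2, "repeat_start": True},
    {"number": 3, "repeat_end": True},
    {"number": 4},
]
print(unroll(score))
```

The resulting list is the linear sequence a display would lay out left to right, with the measures inside the repeat appearing twice.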

RLP13: Performer wants to edit and annotate score during rehearsal for self, or for other performers

PF1 is the first French Horn player in an orchestra section. PF1 wants to edit and annotate PF1's part as well as those of the other players PF2, PF3 in the section to add specific breath marks, phrasing, dynamics and articulation. Over the course of several rehearsals, these annotations are developed, and persist from session to session even as PF1 and PF1's colleagues delete or change them. It is desirable to PF2 and PF3 that PF1's additions be visually distinguishable from those in the original score.

RLP14: Performer wants to apply cuts, then share the modifications with fellow performers

PF1 as a conductor adapts a score for a particular performance by a particular set of musicians. It's common in choral and opera performances to cut entire sections. It's common to discover incorrect notes (missing accidental or wrong duration) and fix them. It's common to ask one voice section to supplement another (e.g. Alto 2s or Bass 1s to sing parts of the Tenor part). The Performer making these adaptations wants to share them with the other Performers in this group for this performance, so that the changes don't have to be dictated verbally and made individually (and perhaps wrongly) by each of the other Performers. The changes can be distributed either as a modified complete score or as a set of modifications to an original score.

RLP15: Performer wants to take advantage of annotations made by other musicians or conductor/section leader

PF1 as a conductor wants to collect individual annotations made by PF1 and also colleagues PF2, PF3 etc. on their parts in the course of rehearsing an ensemble score. PF1 wants to merge all of these annotations back into a single score that represents all of this individual work so it can be made available to others in the future.

RLP16: Performer wants to take advantage of adaptations and annotations by other Performers at other times

PF1 adapts a score. For example, they may correct errors in a particular edition, or apply the cuts typically taken in performing an opera. PF1 posts those adaptations on a web site for others to use freely. PF2 downloads those changes and applies them to their own digital score, which may be based on a different edition with different page breaks. PF3 appreciates one of these sets of changes, but wants to publish a different, derived set of changes. They "fork" PF1's score and start a different set of further changes, also posted on a web site for others to use freely. (Software developers, think "GitHub for musicians".)

RLP17: Performer wants to select from a set of alternative readings of a work where available to display a single score, suppressing display of the other readings

PF has purchased a work published by PU, where PU has incorporated alternative readings as ossias in the digital encoding of the score. PF would like to choose the reading(s) to be used and be able to view a score that incorporates the chosen readings while suppressing the others.

A variant use case might have the alternative readings supplied by other performers or parties besides PU.

RLP18: Performer wants to insert, change or delete a cue in their part

PF plays percussion in an orchestra, and is playing a complex piece with too few cues in the percussion part. PF wants to be able to pull musical excerpts from other parts at a given point in the score and to include this information as a cue notation. PF does not want the cue to always be a verbatim reproduction of the original part from which it is derived.

RLP19: Performer wants to compare two different score editions of the same work, to confirm they are the same, or see differences

PF1 receives a score from PU and an adapted score from a fellow performer PF2. PF1 runs comparison software on the two documents. Comparison can include or ignore differences in layout (focussing only on unrolled note sequence), in key signature (focussing only on pitches no matter how notated), in per-staff and score-wide annotations, in performance annotations, etc. In any case, implementation details such as differences in insignificant whitespace in the music notation are ignored. Comparison software returns a concise statement that the documents are identical, or a concise statement of what the differences are (for example, the content of 2 measures added to all parts after a certain measure, or the contents of a new staff that appears throughout the piece in one document but not the other).
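One way to picture such a comparison is a sequence diff over the unrolled notes of a single part. The sketch below is only an illustration under stated assumptions: each note is reduced to a hypothetical (MIDI pitch, duration) tuple, which deliberately discards layout and enharmonic spelling, and the compare_notes function and its report format are invented for this example. It uses SequenceMatcher from Python's standard difflib module.

```python
from difflib import SequenceMatcher


def compare_notes(first, second):
    """Compare two unrolled note sequences.

    Each note is an assumed (midi_pitch, duration) tuple, so layout and
    pitch spelling are ignored by construction.
    """
    if first == second:
        return ["identical"]
    report = []
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # only report the differences
        report.append(f"{tag}: first[{i1}:{i2}] -> second[{j1}:{j2}]")
    return report


publisher_part = [(60, 1), (62, 1), (64, 2)]
adapted_part = [(60, 1), (62, 1), (65, 1), (64, 2)]
for line in compare_notes(publisher_part, adapted_part):
    print(line)
```

A real comparison tool would need to map these index ranges back to measures and staves to produce the kind of concise, musician-readable statement described above.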

RLP20: Publisher wants to permit some interactivity in a performance app, but maintain completely faithful visual appearance from paper

PU is a publisher whose brand equity is associated with engraving quality. PU does not want any performance app to alter this formatting aside from scaling pages to the current device size.

RLP21: Publisher wants to permit some adaptations and derived works, but make it evident when changes alter what the Publisher considers significant

PU is another publisher whose brand equity is associated with editorial accuracy and engraving quality, but does not want to prevent derived works. They want to supply a signature of some kind, which will make it evident if any of the layers of content which Publisher values are modified. For instance, a Publisher may consider engraving changes related to screen size and orientation acceptable, but want to make changes to note values and pitches evident.
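The idea of protecting only certain layers can be sketched with a digest computed over a chosen subset of the document. This is a minimal sketch, assuming a score can be represented as a dictionary of named layers; the layer names ("notes", "layout") and the layer_digest function are hypothetical. A plain hash only detects change. An actual signature would additionally require the publisher to sign the digest with a private key, which is not shown here.

```python
import hashlib
import json


def layer_digest(score, protected_layers):
    """Hash only the layers the publisher considers significant."""
    subset = {name: score[name] for name in sorted(protected_layers)}
    # Canonical serialization so that key order and whitespace do not matter.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {
    "notes": [["C4", "quarter"], ["D4", "quarter"]],
    "layout": {"page_width": 210, "orientation": "portrait"},
}
reflowed = dict(original, layout={"page_width": 148, "orientation": "landscape"})
altered = dict(original, notes=[["C4", "quarter"], ["D#4", "quarter"]])

published = layer_digest(original, ["notes"])
print(layer_digest(reflowed, ["notes"]) == published)  # layout change only
print(layer_digest(altered, ["notes"]) == published)   # pitch change
```

With this arrangement, an engraving change for a different screen size leaves the digest intact, while a change to pitches or note values makes the mismatch evident.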

RLP22: Performer wants to try out various versions of a mobile score so as to decide what to play.

C has written a score that is graphically mobile. PF has to configure the score in order to decide what to play. For example: Stockhausen's Refrain.

Academic Research / Libraries

ARL1: Musicologist wants to produce a digital edition of a historical work, tracing the evolution of the work and including alternate readings.

M is working on an edition of an early musical work with multiple sources that conflict. M wants to produce an encoding that includes these alternate readings along with textual commentary. Sometimes M favors a particular reading as primary and others as unlikely; in other cases, M feels that readings are of equal relevance to understanding the work and doesn't want to favor one or the other.

ARL2: Musicologist wants to search for a particular theme or motive in a corpus of musical works

As part of M's research, M is interested in tracing the usage of a secular medieval folk tune motif across various early choral works. M wishes to search an online corpus of digital encodings of these works to discover instances of this motif, and pull up search results that render relevant fragments from these works that display the motif with a small amount of surrounding context.
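A motif search of this kind is often made independent of transposition by comparing successive intervals rather than absolute pitches. The sketch below shows that idea only; it assumes each melody is available as a list of MIDI pitch numbers, and the intervals and find_motif functions are hypothetical names. Rhythm, ornamentation and approximate matching are all ignored.

```python
def intervals(pitches):
    """Successive semitone intervals between adjacent pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_motif(melody, motif):
    """Return start indices where the motif occurs at any transposition."""
    target = intervals(motif)
    source = intervals(melody)
    n = len(target)
    if n == 0:
        return []  # a single note has no interval shape to match
    return [i for i in range(len(source) - n + 1) if source[i:i + n] == target]


melody = [60, 62, 64, 65, 67, 69, 71]   # ascending C major scale fragment
motif = [72, 74, 75]                    # whole step then half step, an octave higher
print(find_motif(melody, motif))
```

Each returned index could then be widened by a few notes on either side to render the fragment with the surrounding context that M wants to see.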

ARL3: Musicologist wants to search a corpus of machine readable works for occurrences of words, ornaments or other notational elements.

As part of M's research, M is interested in tracing the usage of a secular medieval folk tune lyric across various early choral works. M wishes to search an online corpus of digital encodings of these works to discover instances of these words, and pull up search results that render relevant fragments from these works that display the lyric with a small amount of surrounding context.

ARL4: Musicologist wants to manage related works or fragments in a single document.

M is encoding an incomplete and fragmentary work from which many portions are missing. Nonetheless the work has demonstrable unity and M wants to encode these in a single digital document, not in a set of separate disconnected documents. M would like each fragment to be identified and separate within the overall document.

ARL5: Musicologist, within one document, wants to reference a specific element, range or position of a work in the same or a different document using some kind of anchor construct.

M is working on a critical digital edition of the Cantigas de Santa Maria (13th c.) drawing on four primary manuscript sources each of which may incorporate hundreds of individual cantigas or songs. M desires to create an encoding of all four manuscripts, each of which is itself a collection of these songs. The same song can be present in multiple manuscripts within the overall collection. Where this occurs, M desires to encode links or pointers between different instances of the same song, and sometimes between parallel passages that differ in various details.

ARL6: Musicologist wants to distinguish editorial inferences from assured, explicit information in a manuscript.

M is preparing an encoding of a damaged manuscript. M can readily infer most of the missing or damaged notation from context, but as a matter of sound scholarship M wishes to identify in the encoding which elements M has supplied, and which are actually an encoding of the physical contents of the manuscript.

ARL7: Musicologist wants to represent original notation in a manuscript along with its conventional understanding in the musical period in which it was notated (e.g. musica ficta).

M is encoding an Elizabethan keyboard work that incorporates ornamental symbols, and which omits "understood" accidentals that would be explicitly notated in the modern notational idiom. M wants to supply additional information for modern readers. In the case of ornaments, M would like to include fully notated rhythms and pitches representing how players of the time would have performed them, and distinguish these as editorial interpretation. Multiple such interpretations may be possible and worth including as alternates. In the case of missing accidentals, M wants to add accidentals as musica ficta (which are by definition editorial in nature).

ARL8: Musicologist wants to attach the notion of uncertainty to readings.

M is transcribing some medieval songs into conventional Western notation. The mensuration of these songs is obscure and open to a great deal of interpretation. M desires to indicate which portions of the transcription are assured, and which are inferred. In some cases the pitch may be known and the rhythm uncertain; in other cases, the reverse.

ARL9: Musicologist wants to cross-reference readings or interpretations with their source material, in this document or elsewhere.

M is encoding a work from a manuscript in mensural notation. M is producing two encoded documents. Document A represents the original source material as it appeared in the manuscript as mensural notation elements, including many errors, alternative duplicated passages, and visual formatting details. Document B represents M's cleaned-up semantic reduction of Document A into modern-day common notation. M wants to carefully cross-reference each part of document B to the corresponding material in document A.

As a similar use case, consider the case where A and B are two different sections of a single master document.

ARL10: Musicologist wants to import a document into an application that is not able to work with alternate readings, uncertainty, etc.

M wants to take a document that encodes much of the above material regarding readings and uncertainty and make use of it within a notation application that only deals with specific notation that lacks these elements.

Education

ED1: Student wants to complete a music theory assignment

Student S is harmonizing a 4-part chorale given a figured bass as a starting point. S will submit a digital encoding of the completed assignment to teacher T.

ED2: Instructor wants to assess completed music theory assignment and share markings/annotations with student

T is examining a submitted 4-part chorale harmonization by S. T makes use of some initial automated tools that identify and annotate typical voice-leading mistakes in S's work. T also looks at a set of answer-key parts that are referenced by the assignment, but which were not viewable by the student, and makes further annotations of problems that were not caught by the automated tools. All of these annotations become part of the digital document that is then returned to the student for further work on the assignment.

ED3: Student wants to learn a song by playing and listening and selecting voices and instruments from the score, in order to hear his/her own voice.

S wants to change tempo and tune (transpose) and to play a difficult part in a loop. S at times wants the piece to be played with a specific subset of parts, emphasizing some with increased volume. At other times S wants to hear the other instruments and deselect his/her own voice.

ED4: Student wants to learn an early Asian music notation (that reads top to bottom on the page)

S has a score that contains at least one temporal rendering. While the score is performing, a cursor moves synchronously over the page to show which symbol is being performed.

ED5: Instructor wants to use an interactive Schenker analysis diagram.

Clicking on buttons in the diagram could either perform the current level or move to a different one.

Accessibility

Note: we should consult current users of accessible music notation before continuing to the next step.

AC1: Users with disabilities wish to interact with a score via accessible input/output modalities.

AC2: Low-vision users wish to view an arbitrary score in Braille notation.

Development using Web, Epub and App Technologies

DEV1: Publishers wish to embed digital renderings of music within online hypertext documents

PU is creating an online music theory curriculum incorporating numerous examples, to be consumed by students S studying traditional Western harmony. This curriculum is available via a web site. Pages on the site incorporate a mix of text, images, audio and video. Crucially, the curriculum pages also include examples by embedding short interactive musical scores that are viewable, printable and playable, in whole or in part. PU must make use of standard Web technology to incorporate the musical materials, so that students do not have to install or launch special-purpose notation software.

DEV2: Publishers wish to embed digital renderings of music in offline electronically published works

PU is creating an e-book of a music theory curriculum (see prior use case). Although the e-book can be viewed offline, it is constructed using standard Web technologies: HTML, CSS, JS and so on. It also incorporates text, images, audio, video and of course interactive musical scores. PU doesn't want to create completely separate versions of online and offline content, so it makes sense for them to create the content once using Web technology, and deliver it either in an online browser or in an offline reader.

DEV3: Developers want to render entire scores dynamically in applications for specific purposes

D is developing an application that generates randomized sight-reading examples at specific skill levels by creating music markup programmatically. A performer PF consumes these examples in a practice session. D would prefer to generate these examples within their application as an in-memory notational document that employs a standard digital encoding, and render them based on this data, without ever surfacing the data as a physical file.

DEV4: Developers want to render specific portions of scores dynamically in response to user actions

D is building an application that will generate human-readable accompaniments to a melody given a harmonic structure, to be used to assist a composer C or arranger A in filling in musical textures quickly. The melody and harmony symbols can be imported from an external score. The user can add new, empty parts and select specific portions of these parts to be filled in with an algorithmically generated accompaniment. The resulting score is a hybrid of imported music and internally generated music, and can be viewed within the application or ultimately exported in a standard encoding. D wishes to generate these portions of the score using a standard digital encoding based on in-memory objects, and render them based on this data.

DEV5: Developers want to display music and respond to gestures that indicate some element of that music

D is building a music theory quiz in which questions are displayed along with an interactive musical excerpt. A student S must indicate one of several elements in a score as a response to the question. The music can be displayed and reflowed in a device-dependent fashion, so it will not suffice to look for gestures that occur in some specific region of the screen or page.

DEV6: User needs to see some element, range or position in notated music highlighted with an app-dependent meaning

D is building a music appreciation application for students S that includes various audio and notation excerpts. As the music plays, various relevant portions in the notation are highlighted. Links within accompanying text also serve to highlight the musical passages or concepts within the score to which they refer.

Cases Requiring Clarification

Editor wants to discard performer annotations from digital sheet music

ED receives a digital music document which contains performance annotation.

Composer wants to share sequencer project with orchestrator to ultimately produce notated music for human performers

C creates a sequencer project that captures C's music in purely performative/gestural terms. For live performance, C desires to communicate some or all of this music to performers that will produce the corresponding parts on physical instruments by reading conventional music notation. It may be desirable to preserve this information wholly or in part across a conversion process that yields a notational model, but we might also consider approaches that link or correlate gestural documents with visual/semantic ones instead of trying to blend this information. (How is digital encoding of notation involved in this story? At what point does the orchestrator ever need to work with a digital encoding of the notation, as opposed to an encoding of the sequencer data?)

Composer wants to generate sheet music in real-time for live performance

Musicologist wants to prepare a modern-notation edition of a work using early notation, starting from machine-readable documents produced for academic research purposes.

Document Content

Separately from the User Stories, there are a set of orthogonal issues concerning the nature of document content. For the most part, user stories can be thought of in relation to almost any notational idiom (and vice versa). Given this many-to-many relationship between stories and content, user stories do not necessarily constrain the scope of the notational content that future versions of MusicXML will cover. Neither does a decision on notational content constrain the space of possible user stories.

For the time being, this section attempts a rough "mission statement" proposing the type of notational content that must be covered. There are many subtleties remaining to be explored and defined, but it's useful to try to state some broad issues, and attempt some rough boundaries.

Proposal: Core Types, Other Types And Extension Points

The proposal presented here is that MusicXML covers the Core Content Types noted below, plus some subset of other content types to be determined. Beyond these, MusicXML covers other cases by employing a set of extensibility points. Such extensibility mechanisms will allow documents to define custom elements whose semantics, rendering and playback are analogous to existing elements in already-supported notation types. Potential examples of such extensions could include:

Notes, articulations, ligatures or neumes employing custom glyphs
Notational prefixes or suffixes to note-like elements (analogous to accidentals or wind performance techniques)
Liaisons between notes using custom graphical connections
Ornaments defining custom performance interpretations
Custom graphical/textual inclusions alongside or interspersed with standard notation

It is proposed that such extensions not attempt to anticipate future directions for music notation, other than permitting the kinds of generalization-by-analogy discussed above, in order to avoid creating an excessive burden on creators, consumers and developers. Nor will the extension mechanism attempt to attach musical semantics to textual or graphical formats already well-described by existing standards (for example HTML or SVG).

The goal behind these constraints is to limit the specification and development effort required to sustain the standard, yet allow authors to add elements within well-understood boundaries. If the standard were literally to be able to represent any kind of music going forward, then literally any amount of effort could be required to implement it.

Notation Types

Common (Western) Music Notation (CMN or CWMN)
Mensural Notation
All guitar tablature types
All types of neume notation
Non-Western music notations
Braille Music notation
Post-CMN 20th century and later: Crumb, Penderecki, Berio, Stockhausen to name a few.
Chord/Lyric Sheets: musical reductions of songs showing only lyrics and aligned chord symbols
Notation for electronic sound synthesis (including additive and subtractive synthesis, modulation and treatment, etc.): The resurgence of interest in recent years in analogue synthesis – and together with it the issues of interfacing between analogue and digital devices – has highlighted the absence of any common notation for this idiom.

Notational Outliers

There are many outliers and examples that test the limits of "standard" notational content types (for example, Don Byrd's well-known examples including Chopin's "Raindrop"). The details of these will be left to a more detailed analysis in the future.

We should note, however, that the literature contains many ad hoc uses (often of ordinary musical symbols) that, while intelligible to most readers, are difficult to represent or reliably interpret in a machine-readable encoding. Some can be dealt with through the extensibility mechanisms suggested above. Others, perhaps not. We propose that it is not the goal of MusicXML to represent the semantics of all music that has been composed to date. Even the set of standard CMN glyphs can be deployed in infinitely many non-standard ways. For some of these works, pure graphical representation (without semantics) will remain the most appropriate vehicle.

See the section on Document Profiles for more discussion of this problem.

Meta-Notation: Classes of Possible Notation Types

One useful question is whether there are constraints on the nature of notation that would suffice to make encodings of such scores useful across most use cases, while still broad enough to include most notation types. We might call these constraints a kind of "meta-notation".

One such proposed definition, from James Ingram, is Performable Notation, which he defines as notation in which an instant of performed time maps to one or more symbols in the graphic.

Document Profiles

Completeness, Consistency and Ambiguity

Music encoding standards encounter at least one large conflict that emerges from a wide range of use cases: there is a fundamental difference in interests between those who need to faithfully represent the content of musical documents, and those who need to consume these same representations using a wide variety of real-world software tools. These two sets of interests differ enormously in their attitude towards incompleteness, inconsistency and ambiguity.

On the one hand, encoders wish to capture as much information as possible about the document at hand. They also wish to avoid encoding information that is not present. This allows encoded documents to fall anywhere on a spectrum in the dimensions of completeness, consistency and ambiguity. Document A may encode a late-20th century piano arrangement of a popular song in orthodox CMN created using notation software, containing complete information on pitch, rhythm, articulation and tempo. Document B may encode a 13th century manuscript with a sequence of known early-music symbols, but whose pitches and rhythms are open to multiple interpretations by different authorities. Document C might capture a 16th century printed work, faithfully including and annotating its typographical errors and suggesting alternative corrections. All of these are reflections of some of the user stories above, and all are legitimate uses of a music encoding.

On the other hand, consumers of the encoded information do not always share these interests. It is true that publishers or musicologists often want to know exactly what the encoder knew (and no more). But performers, educators and students often need to work with music that is reasonably complete, consistent and unambiguous. The word "reasonably" here has a fairly practical definition: the software available to consumers of these documents must be able to tolerate their contents. This further implies that the builders of this software must be able to implement that tolerance, to the extent that it is needed.

Representation, Rendering and Editing

The above section describes a conflict between the need for encoders to represent documents, and for consumers to enjoy the renderings of these same documents through the lens of real-world software. A standard that permits semantic gaps and conflicts imposes an enormous multiplication of effort on software developers attempting to deliver meaningful musical experiences based on that standard. In the worst possible outcome from this situation, developers have to anticipate and make sense of a huge range of invalid inputs for their purposes.

This problem is shared by the most widely used notation standards today, including both MusicXML and MEI among others. These standards allow an enormous range of expression, given the conflicting needs of representation and rendering. However, most software applications today that decode these documents can handle only a limited range of the possible constructs, and only when they fall within various guidelines (which are rarely documented).

Many musical applications need to render a music encoding visually and/or aurally. Short of displaying facsimile images or playing back audio recordings (which are addressed by existing non-musical standards), the semantic musical data in a document is paramount to this goal. In the CMN domain, in particular, an application that provides visual and aural rendering with any degree of flexibility (e.g. reflowing, part separation, tempo control) must rely on the semantic information within a document. If this information is incomplete or contains conflicts, the application needs to make a decision on how to handle the case. While a specification could attempt to prescribe all of these decisions, it may be better to let both providers and consumers declare that documents conform to one set of needs or another.

Perhaps the most constrained category of applications is notation editors. These applications not only must render notation, but must allow an end user to modify it at will. Most of these applications confine their users to some subset of possible information models for notated music, with the result that documents having some types of semantic issues may not even be editable in a given environment.

Proposal: Encoding Profiles

One solution lies in carefully defining a set of profiles to which encoded documents conform. Such profiles already exist in a de facto sense for the current generation of standards, in that a set of popular applications are only able to accept a subset of what current standards allow.

Equipped with profiles, software creators will finally be able to reference specific profiles of MusicXML, test their software against these profiles, and require content providers to supply documents that conform to these profiles. Multiple profiles may exist along different dimensions. For example, one set of profiles may address completeness and consistency. Another set of profiles may address the degree of visual positioning included.

Examples of possible profiles for CMN documents in the semantic completeness/consistency dimension might include the following at a minimum (with substantial work needed to define what some of these terms mean):

Transcription: an accurate representation of the exact contents of some original document including any number of semantic gaps, conflicts and errors. For example, measures may be incomplete or overflowing, accidentals may be absent, entire ranges of the work may even be missing.
Urtext: an attempt to produce a canonical and consistent encoding of a work that is believed by some source to reflect the composer's original intentions.
Standard: a document that contains a complete and consistent semantic encoding of a work using a defined set of notational constructs (possibly diversified by notational content type).
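To make the difference between these profiles concrete, the sketch below checks one narrow aspect of consistency: whether every measure contains exactly the expected number of beats. It is a rough sketch under stated assumptions. The data model (a list of measures, each a list of durations in beats), the function names, and the rule that only the Transcription profile tolerates incomplete or overflowing measures are all assumptions made for illustration, since the profiles themselves remain to be defined.

```python
from fractions import Fraction


def incomplete_measures(measures, beats_per_measure):
    """Return indices of measures whose durations do not sum to the expected length."""
    expected = Fraction(beats_per_measure)
    return [
        i
        for i, durations in enumerate(measures)
        if sum((Fraction(d) for d in durations), Fraction(0)) != expected
    ]


def conforms(measures, beats_per_measure, profile):
    """Assumed rule: only 'Transcription' tolerates incomplete or overflowing measures."""
    if profile == "Transcription":
        return True
    return not incomplete_measures(measures, beats_per_measure)


# Durations in beats; the second measure is half a beat short.
measures = [[1, 1, 1, 1], [1, 1, 1, "1/2"]]
print(incomplete_measures(measures, 4))
print(conforms(measures, 4, "Transcription"))
print(conforms(measures, 4, "Standard"))
```

A consuming application could run checks like this before import and refuse, or ask for repair of, documents that claim a profile they do not satisfy.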

Profiles in the visual completeness dimension might include:

Semantic: no visual styling or positional information is included
Layout: every notational element has complete positional information relating to some specific visual layout and set of styles

Profiles in the performance completeness dimension might include:

Semantic: no performance interpretation is included
Performable: notational elements include timing, duration or other elements affecting their playback