Documentation

Live Practice

The score listens to you play. The playhead follows. When you stop, the system audits your run and writes you a coaching note in plain language.

What it is

Practice (Follow) is a score-following surface. You play; the platform listens through your microphone, advances the score in real time so you don't have to turn pages, and at the end produces a report:

  • Per-note assessment: clean, rushed, dragged, wrong, missed.
  • Per-bar accuracy heatmap.
  • Tempo curve: your tempo over time vs the target.
  • Pitch-class coverage: for chord-heavy passages, which pitches you actually played.
  • A coaching narrative in a teacher voice with concrete suggestions.

Recordings are kept with each session so you can play them back next to the score and hear what you did.

How to start

From a sight-reading passage

  1. Go to /library/sight-reading.
  2. Pick a passage. Each card has two CTAs: UNEB cadence (the 30-second-preview-then-perform flow) and Follow mode →.
  3. Tap Follow mode. The piece loads into Practice; grant mic permission when prompted.
  4. Tap Start. You get a 4-beat count-in, then play.

From your own piece

  1. Open the piece in FORTE Notation.
  2. Tap Practice in the TopRail (desktop) or in the More menu (mobile).
  3. Same flow: grant mic permission, then tap Start.

Practice settings

Before you tap Start, you can adjust three knobs (locked once the run begins):

Tempo
Practice tempo, in BPM. Defaults to the score's tempo. Tap the − / + buttons to change it in steps of 5. Use a slower tempo when learning a piece, then ramp up.
Polyphonic
Off (default): the system tracks one melody line at a time (best for voice, flute, recorder, melody-line piano). On: the system runs a multi-pitch detector in the background to handle chords. Use it for two-handed piano, guitar with chords, or organ.
Any octave
When on, the aligner accepts pitches in any octave as a match (only the pitch class needs to be right). Useful for vocalists practising a piece written outside their range.
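
As an illustration of the Any octave setting, here is a minimal sketch of the match rule. It assumes pitches are represented as MIDI note numbers, which this page does not specify; the function name is hypothetical.

```python
def pitch_matches(played_midi: int, expected_midi: int, any_octave: bool = False) -> bool:
    """Return True if the played note counts as a match for the expected one.

    Assumes MIDI note numbers (60 = middle C); that representation is an
    assumption made for this sketch.
    """
    if any_octave:
        # Notes 12 semitones apart share a pitch class, so compare modulo 12.
        return played_midi % 12 == expected_midi % 12
    # With the setting off, the octave has to be right as well.
    return played_midi == expected_midi
```

With the setting on, a singer who takes a written C5 (72) down to C4 (60) still matches; a C♯ in any octave does not.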

How it actually works

We don't hide the architecture, because it's relevant to interpreting the results.

Layer 1: instant cursor (monophonic)

Your microphone stream is windowed every ~23 ms (about 44 frames per second). An on-device pitch detector finds the dominant pitch in each window. When a stable pitch locks for ~70 ms, an onset event fires. The aligner consumes onsets, matches them against the next expected note, and advances the playhead. That is your cursor: the highlighted note on the staff.
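
To make the "stable pitch locks" step concrete, here is a minimal sketch of turning per-window pitch estimates into onset events. It is an illustration, not the platform's detector: the input format (one MIDI note number or None per window) and the choice to round ~70 ms to three ~23 ms windows are assumptions.

```python
from typing import Iterable, Iterator, Optional, Tuple

WINDOW_MS = 23.0   # approximate spacing between analysis windows (from the text)
LOCK_MS = 70.0     # how long a pitch must hold before an onset fires (from the text)
LOCK_WINDOWS = round(LOCK_MS / WINDOW_MS)  # about 3 windows; rounding is an assumption


def detect_onsets(pitches: Iterable[Optional[int]]) -> Iterator[Tuple[float, int]]:
    """Yield (start_time_ms, pitch) once a pitch has held for LOCK_WINDOWS windows.

    `pitches` holds one entry per window: a MIDI note number, or None when no
    clear pitch was found. Both conventions are assumptions for this sketch.
    """
    current: Optional[int] = None  # pitch of the run we are currently inside
    run_length = 0                 # consecutive windows carrying that pitch
    fired = False                  # whether this run has already produced an onset
    for index, pitch in enumerate(pitches):
        if pitch is not None and pitch == current:
            run_length += 1
        else:
            current, run_length, fired = pitch, 1, False
        if current is not None and not fired and run_length >= LOCK_WINDOWS:
            fired = True
            # Report the time at which the stable run began.
            yield ((index - run_length + 1) * WINDOW_MS, current)
```

Note that the two-window blip in the test data below would never fire, which is the point of the lock. This sketch also cannot separate a repeated note at the same pitch; a real detector would need an energy or attack cue for that.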

Latency is ~100 ms end-to-end, which is good enough for the cursor to feel responsive. The aligner has a 3-note lookahead to handle skipped notes and a forgiveness budget (1.5 × beat duration) before it gives up on a note and marks it missed.
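
The lookahead and forgiveness budget can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the lookahead is read as "the expected note plus the next three", and the budget is timed from the last playhead advance rather than from each note's scheduled time.

```python
from dataclasses import dataclass, field
from typing import List

LOOKAHEAD = 3            # notes scanned beyond the expected one (from the text)
FORGIVENESS_BEATS = 1.5  # beats to wait before giving up on a note (from the text)


@dataclass
class Aligner:
    score: List[int]              # expected MIDI pitches, in order (assumed format)
    beat_ms: float                # duration of one beat at the practice tempo
    position: int = 0             # index of the next expected note
    last_advance_ms: float = 0.0  # time of the last playhead movement
    log: List[str] = field(default_factory=list)

    def on_onset(self, time_ms: float, pitch: int) -> None:
        """Match an onset against the expected note or the lookahead window."""
        window = self.score[self.position:self.position + 1 + LOOKAHEAD]
        if pitch in window:
            skipped = window.index(pitch)
            self.log.extend(["missed"] * skipped)  # notes the player jumped over
            self.log.append("matched")
            self.position += skipped + 1
            self.last_advance_ms = time_ms
        else:
            self.log.append("wrong")               # playhead stays put

    def on_tick(self, time_ms: float) -> None:
        """Give up on the expected note once the forgiveness budget is spent."""
        budget_ms = FORGIVENESS_BEATS * self.beat_ms
        if self.position < len(self.score) and time_ms - self.last_advance_ms > budget_ms:
            self.log.append("missed")
            self.position += 1
            self.last_advance_ms = time_ms
```

At 120 BPM (`beat_ms=500`) the budget is 750 ms, so a note that has not arrived 750 ms after the last advance is logged as missed and the playhead moves on.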

Layer 2: polyphonic follower (sliding window)

In parallel with the instant cursor, the system runs an on-device multi-pitch detector on a 2-second sliding window every 1 second. That's a polyphonic-aware view: it finds multiple simultaneous pitches, not just one. It runs entirely in your browser (~5 MB binary, cached after first use).

The polyphonic pass doesn't override the instant cursor unless the cursor has stalled (no advance for more than 1.5 × beat duration) and the polyphonic detector has high confidence about where you are. That keeps the cursor responsive while still rescuing the playhead when the monophonic path gets confused on a chord.
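
A minimal sketch of the two rules described for this layer: the 2-second window that moves forward every second, and the condition under which the polyphonic result may move the cursor. The function names are hypothetical, and the 0.8 confidence threshold is an assumption, since this page only says "high confidence".

```python
from typing import List, Tuple

STALL_BEATS = 1.5     # stall threshold in beats (from the text)
MIN_CONFIDENCE = 0.8  # hypothetical value; the text does not quantify "high"


def window_spans(duration_s: float, size_s: float = 2.0, hop_s: float = 1.0) -> List[Tuple[float, float]]:
    """List the (start, end) spans, in seconds, of a sliding window over a clip."""
    spans = []
    start = 0.0
    while start + size_s <= duration_s:
        spans.append((start, start + size_s))
        start += hop_s
    return spans


def should_override(ms_since_advance: float, beat_ms: float, confidence: float) -> bool:
    """Let the polyphonic follower move the cursor only when both conditions hold."""
    stalled = ms_since_advance > STALL_BEATS * beat_ms
    return stalled and confidence >= MIN_CONFIDENCE
```

Because consecutive windows overlap by one second, every moment of audio is analysed twice, at the cost of the polyphonic view lagging the instant cursor.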

Layer 3: recording + post-session audit

While you play, your audio is captured locally (Opus in WebM, ~64 kbps). On completion the recording uploads to your account, typically ~240 KB per 30-second run. The session metadata and the per-note assessment log are saved to the database.

Then a second audit pass runs offline with the same polyphonic detector. With no real-time pressure, it can analyse the whole clip. The findings (per-bar accuracy, tempo curve, pitch-class coverage) are persisted to the session row so the detail page can show them without re-running the analysis.

Layer 4: AI coaching narrative

Once the audit is in, the platform sends a structured summary (outcome aggregates, top trouble bars, specific wrong-note moments, timing slips, tempo curve, pitch-class gaps) to a language model. It returns a 2-sentence summary, a 1–2 paragraph narrative, and 1–4 concrete suggestions.

We auto-fetch on the stats screen and on the detail page. There's a regenerate button if the first take doesn't land.

The audio recording itself is never sent to the AI model; only the structured findings (numbers and labels) are. Your audio leaves your device only to land in your account's private storage. The full list of services that handle data on our behalf is at /subprocessors.

Reading the metrics

Clean
You played the right pitch within ±15% of a beat of where it should land. This is the standard to aim for.
Rushed
Right pitch, but you played it earlier than the target tempo expected. The aheadMs in the assessment is how much.
Dragged
Right pitch, late. Symmetrical to rushed.
Wrong
You played a different pitch from what the score has at this position. The aligner stays put, so you can correct yourself by playing the right note (the lookahead of 3 notes catches you back up if you skipped instead).
Missed
No pitch arrived where one was expected, and the forgiveness timeout fired (1.5 × beat duration). The playhead advances anyway so you don't get stuck.
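
The five labels above can be read as one decision rule. The sketch below is an illustration only: the function name and argument layout are hypothetical, and `offset_ms` is assumed to be the played time minus the expected time, so negative values mean early.

```python
from typing import Optional

CLEAN_TOLERANCE_BEATS = 0.15  # "within ±15% of a beat" (from the text)


def assess_note(expected_pitch: int, played_pitch: Optional[int],
                offset_ms: float, beat_ms: float) -> str:
    """Label one expected note as clean, rushed, dragged, wrong, or missed.

    `played_pitch` is None when the forgiveness timeout fired with no onset.
    """
    if played_pitch is None:
        return "missed"
    if played_pitch != expected_pitch:
        return "wrong"
    tolerance_ms = CLEAN_TOLERANCE_BEATS * beat_ms
    if offset_ms < -tolerance_ms:
        return "rushed"   # right pitch, too early
    if offset_ms > tolerance_ms:
        return "dragged"  # right pitch, too late
    return "clean"
```

At 120 BPM a beat lasts 500 ms, so the clean window is 75 ms either side of the expected time. Slowing the practice tempo widens that window in absolute terms.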

The per-bar heatmap

Each bar shows clean / total notes for that bar, coloured by ratio:

  • Green: ≥ 90% clean. Solid.
  • Amber: 60–89% clean. Workable, with rough spots.
  • Red: < 60% clean. Slow it down and revisit.

Hover (or long-press on touch) for the exact count.
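
The colour bands amount to a simple threshold rule, sketched here. The function name is hypothetical, and the handling of bars with no notes (all rests) is an assumption, since this page does not cover that case.

```python
def bar_colour(clean: int, total: int) -> str:
    """Map a bar's clean/total note count to its heatmap colour."""
    if total == 0:
        return "none"  # assumption: bars without notes are left uncoloured
    ratio = clean / total
    if ratio >= 0.90:
        return "green"
    if ratio >= 0.60:
        return "amber"
    return "red"
```

A bar with 7 of 10 notes clean is amber; drop to 5 of 10 and it turns red.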

The tempo curve

The curve is your local tempo at each matched onset, sampled across the run, smoothed with a 3-point moving average. The dashed line is your target. Shapes to watch for:

  • Slowing into cadences: natural for expressive playing, but if it happens consistently across phrases it may mean you're not solid on the cadential figuration.
  • Rushing through dense bars: usually anxiety; slow the metronome by 8% and try again.
  • A step change in the middle: typically at a section break (for example, a B-section taken at a faster tempo). The score may not actually have a tempo change marked; you may be rushing it.
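
For readers who want to see how such a curve can be derived, here is a minimal sketch. It assumes each matched onset carries its time in milliseconds and its score position in beats, and that the ends of the curve average whatever neighbours exist; neither detail is specified on this page.

```python
from typing import List, Tuple


def local_tempos(onsets: List[Tuple[float, float]]) -> List[float]:
    """Local tempo in BPM between consecutive matched onsets.

    Each onset is (time_ms, score_position_in_beats); the format is assumed.
    """
    tempos = []
    for (t0, b0), (t1, b1) in zip(onsets, onsets[1:]):
        if t1 > t0 and b1 > b0:
            # beats elapsed divided by minutes elapsed
            tempos.append(60000.0 * (b1 - b0) / (t1 - t0))
    return tempos


def smooth3(values: List[float]) -> List[float]:
    """Centred 3-point moving average; end points use the neighbours they have."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Three quarter notes played 500 ms, 500 ms, and 400 ms apart give local tempos of 120, 120, and 150 BPM. After smoothing, that jump becomes a gentler rise to 130 and then 135, which is why a single rushed note shows as a bump rather than a spike.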

Pitch-class coverage (polyphonic mode)

For polyphonic instruments, we compare the pitch classes the polyphonic detector heard during your run against the pitch classes the score expected. A high coverage score for D♭ means you covered the pitch class regularly when the score asked for it; a low score means you missed it (or played the wrong chord). Useful for piano students working on left-hand accuracy.

Pitch-class coverage is not a per-note accuracy score. Pianissimo or short notes may register as low coverage even when played correctly. Treat it as a coarse signal that complements the per-bar heatmap.
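
One plausible way to compute such a score is sketched below. It assumes the run is split into events, each pairing the pitch classes the score expected with those the detector heard, and that pitch classes are numbered 0 to 11 from C. The event format and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def pitch_class_coverage(events: Iterable[Tuple[Set[int], Set[int]]]) -> Dict[int, float]:
    """Fraction of the times each expected pitch class was actually heard.

    Each event is (expected_pitch_classes, heard_pitch_classes), with pitch
    classes numbered 0-11 (0 = C, 1 = C♯/D♭, ...). This layout is assumed.
    """
    expected_count: Counter = Counter()
    covered_count: Counter = Counter()
    for expected, heard in events:
        for pc in expected:
            expected_count[pc] += 1
            if pc in heard:
                covered_count[pc] += 1
    return {pc: covered_count[pc] / n for pc, n in expected_count.items()}
```

If the score asks for a D♭ major chord twice and the D♭ is heard only once, D♭ (pitch class 1) scores 0.5 while F and A♭ score 1.0. Extra pitches that were heard but not expected do not lower the score in this sketch.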

Coaching narrative

The AI writes a teacher-voice note based on the structured findings. Format:

  • Summary: 2 short sentences.
  • Narrative: 1–2 paragraphs that reference specific bars, tempos, and moments.
  • Suggestions: 1–4 imperative-voice items. Each one is tied to a concrete observation, not a platitude.

The model is told explicitly: warm but honest, not a cheerleader, name problems specifically. It will sometimes praise (when warranted), sometimes recommend slowing down, sometimes call out a specific wrong note. Use it as a starting point, not a verdict.

Practice history

Every saved session lives at /practice:

  • Recent: last 7 days, chronological.
  • By piece: runs grouped by piece with up/down/flat trend arrows. "You've practised this 6 times; clean rate is up from 64% to 88%."

Tap any row for the detail page: full audit, recording playback, audio download, delete.
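
As a rough illustration of the trend arrows in the By piece view, the sketch below compares the first and latest clean rates for a piece. The comparison rule and the 2-point "flat" band are assumptions; this page does not say how the arrow is chosen.

```python
from typing import Sequence


def trend(clean_rates: Sequence[float], flat_band: float = 0.02) -> str:
    """Return "up", "down", or "flat" for a piece's clean rates, oldest first.

    `flat_band` is a hypothetical tolerance: smaller changes count as flat.
    """
    if len(clean_rates) < 2:
        return "flat"  # a single run has no trend yet
    change = clean_rates[-1] - clean_rates[0]
    if change > flat_band:
        return "up"
    if change < -flat_band:
        return "down"
    return "flat"
```

For the example quoted above, `trend([0.64, 0.70, 0.88])` returns "up".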

Data and storage

  • Recording size: ~240 KB per 30 s of practice at 64 kbps. Capped at 25 MB per session.
  • Where it lives: your account's private storage, accessible only via signed requests from your authenticated session.
  • What gets sent to the AI model: only the structured findings JSON. Never the audio. Never your credentials. Never your name unless you put it in your memory facts.
  • Deletion: delete a session from its detail page; the row and the audio are removed. Idempotent.
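
The recording-size figure follows directly from the bitrate, and the same arithmetic shows roughly how long a session can run before hitting the cap. The sketch assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and a constant bitrate, and it ignores WebM container overhead, so treat the results as estimates.

```python
BITRATE_KBPS = 64    # Opus bitrate (from the text)
SESSION_CAP_MB = 25  # per-session cap (from the text)


def recording_bytes(seconds: float, bitrate_kbps: float = BITRATE_KBPS) -> float:
    """Approximate audio payload size: kilobits per second to bytes."""
    return bitrate_kbps * 1000 / 8 * seconds


def max_minutes(cap_mb: float = SESSION_CAP_MB, bitrate_kbps: float = BITRATE_KBPS) -> float:
    """Approximate recording length, in minutes, that fits under the cap."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return cap_mb * 1_000_000 / bytes_per_second / 60
```

Here `recording_bytes(30)` gives 240,000 bytes, matching the ~240 KB figure, and `max_minutes()` comes out at a little over 52 minutes under these assumptions.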

Honest limits

  • Phone microphones are noisier than tablet or desktop mics. Practice tone tracking is good on tablets, OK on phones, best on a laptop with a USB mic.
  • Polyphonic accuracy is decent for piano right hand + comping left hand, less reliable for dense organ textures or arrangements for two stringed instruments.
  • Latency floor is the browser's; we can't go below ~80 ms in any browser today.
  • Doesn't track timbre. We follow pitches, not playing technique. A coaching note may say "you stumbled at bar 5" without knowing if the stumble was a fingering error or a bow change.

We're continually evaluating what to add. A future version may include: server-side polyphonic re-analysis with a heavier model, per-piece progress charts on the detail page, and an optional video-camera feed for posture / fingering observations.