Transcription Quality

Harmony transcribes recordings with AI speech recognition. This page explains what affects accuracy, what you can do to improve results, and how to handle the most common transcription issues.

What affects accuracy

Several factors affect how accurately a recording is transcribed.

Audio clarity is among the most significant. Recordings made with a high-quality microphone in an environment free from excessive background noise give the AI speech recognition system the best chance of capturing words and phrases correctly. Poor audio quality, such as muffled sound, a distant microphone, or distorted input, can cause the system to miss or misinterpret parts of speech.

Background noise, such as people talking nearby, typing sounds, or environmental sounds like fans and traffic, also has a strong impact on how well the transcription system can distinguish what is being said. The clearer and quieter the environment, the better the results will be.

How individuals speak is another major factor. Speaking clearly and at a moderate pace allows the system to catch more words accurately. If speakers talk too fast, mumble, or trail off at the ends of sentences, the likelihood of transcription errors increases.

Conversations with multiple participants can introduce additional challenges. The more people speaking in a meeting, the harder it becomes to accurately differentiate and transcribe each speaker, especially if several individuals talk at once (known as crosstalk or speaker overlap). When speakers interrupt or speak over one another, it can confuse the AI and reduce accuracy for those segments of the recording.

The use of technical or specialist vocabulary can also affect transcription quality. Industry jargon, product names, or less common terminology may not be well recognized by generic AI models, leading to misinterpretations or incorrect spelling in the transcripts.

Finally, accents and dialects play a subtler role. While modern AI transcription systems are designed to understand a wide range of accents, strong regional accents or unique dialects can still lead to occasional errors, particularly if combined with fast speech or background noise.

Overall, carefully controlling your recording environment and encouraging good meeting etiquette will help achieve the best possible transcription accuracy.

An occasional incorrect word or phrase does not significantly affect insight quality, because the AI models used for insights are designed to handle minor transcription errors.

Improve audio quality

Before the meeting

Participant setup:

  • Use a headset or external microphone instead of a laptop microphone.
  • Find a quiet environment and reduce background noise.
  • Test audio before the meeting starts.
  • Close unused applications that produce sound.
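
One way to test audio ahead of time is to record a few seconds of speech with the microphone you plan to use and check how loud the recording is. The Python sketch below is an illustration only, not a Harmony feature. It assumes a 16-bit PCM WAV file made with any local recording tool, and the function names and thresholds (`quiet_rms_dbfs`, `clip_peak_dbfs`) are hypothetical rules of thumb, not values published by Harmony.

```python
import math
import struct
import wave


def audio_level_report(path):
    """Return peak and RMS level in dBFS for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())

    # Unpack little-endian signed 16-bit samples (channels are interleaved).
    count = len(frames) // 2
    if count == 0:
        raise ValueError("the file contains no audio")
    samples = struct.unpack("<%dh" % count, frames)

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / count) / full_scale

    def to_dbfs(value):
        # Digital silence has no finite dB value; report negative infinity.
        return 20 * math.log10(value) if value > 0 else float("-inf")

    return {"peak_dbfs": to_dbfs(peak), "rms_dbfs": to_dbfs(rms)}


def level_warnings(report, quiet_rms_dbfs=-40.0, clip_peak_dbfs=-1.0):
    """Flag recordings that look too quiet or close to clipping.

    Both thresholds are illustrative assumptions; adjust them to taste.
    """
    warnings = []
    if report["rms_dbfs"] < quiet_rms_dbfs:
        warnings.append("Recording is very quiet: move closer to the microphone.")
    if report["peak_dbfs"] > clip_peak_dbfs:
        warnings.append("Recording may be clipping: lower the input gain.")
    return warnings
```

Calling `level_warnings(audio_level_report("test.wav"))` on a short test recording (the file name is a placeholder) returns an empty list when neither threshold is crossed. A level check like this only catches volume problems; listening back to the recording is still the best way to notice background noise or echo.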

Meeting platform settings:

  • Enable HD or high-quality audio in your meeting platform when available.
  • Use a wired internet connection where possible.

During the meeting

Best practices:

  • Speak at a moderate pace.
  • Avoid talking over one another — let each speaker finish.
  • Mute when not speaking.
  • Stay close to the microphone.

Avoid:

  • Background music or other audio playback.
  • Loud typing during the meeting.
  • Phone speakers in noisy environments.
  • Multiple people sharing one microphone.

Speaker identification

Harmony assigns each detected speaker a code automatically (Speaker A, Speaker B, …). When a speaker can be matched to a known Harmony user or a linked contact, the participant's real name is shown instead. Unmatched speakers fall back to the label User.

To improve matching:

  • Make sure participants are added to the meeting as contacts where possible.
  • Have each speaker say their name once at the start of the call.
  • Reduce speaker overlap so the system can attribute turns to the right person.

If a speaker is mislabelled, you can match them to the correct participant from the conversation detail view. Doing so updates which participant the affected turns are attributed to.

Multi-language meetings

Transcription accuracy varies by language and depends on the AI provider Harmony uses for that language. The most accurate results are typically in English; other languages may have lower accuracy depending on the provider and the audio quality.

Translation of an existing transcript into another language is currently in development. For now, when you click Translate in the transcript toolbar, Harmony shows a "Translation Coming Soon" notice.

If a meeting mixes languages, transcription quality for the secondary language is usually lower than for the primary one. For high-stakes multi-language sessions, prefer separate single-language sessions where practical.

Common errors

Homophones

Words that sound alike are easy to confuse, for example "their" / "there" / "they're", or product names that sound like common words. Harmony does not currently offer a custom-vocabulary feature. The most reliable workaround is to pronounce such words clearly and deliberately, and to resolve the intended meaning from context when reading rather than expecting the transcript to disambiguate.

Proper nouns

Names, companies, and locations are frequently misrecognised, especially when they are uncommon. Pronouncing them slowly or spelling them out improves recognition.

Technical terms

Industry jargon and acronyms can be transcribed phonetically, that is, spelled the way they sound. As long as the term is still recognisable, downstream insights and Companion answers usually still work.
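
If you copy transcript text into your own notes or documents, recurring misrecognitions of proper nouns and technical terms can be tidied up afterwards. The Python sketch below is one possible approach under stated assumptions: it is not a Harmony feature, it works only on text you have already copied out of a transcript, and the function name `apply_corrections` and the example terms are hypothetical. It does not suit homophones such as "their" / "there", where the right choice depends on context.

```python
import re


def apply_corrections(text, corrections):
    """Replace known mis-transcriptions with the intended spelling.

    `corrections` maps a misheard phrase to the term you want instead.
    Matching is case-insensitive and limited to whole words.
    """
    if not corrections:
        return text
    # Try longer phrases first so a multi-word phrase wins over a shorter one.
    phrases = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {phrase.lower(): fix for phrase, fix in corrections.items()}
    # Fall back to the matched text if case folding produces an unknown key.
    return pattern.sub(lambda m: lookup.get(m.group(0).lower(), m.group(0)), text)


# Hypothetical glossary built from errors you have seen in your own transcripts.
glossary = {"cube control": "kubectl", "post gress": "Postgres"}
print(apply_corrections("We ran Cube Control against the post gress cluster.", glossary))
# prints: We ran kubectl against the Postgres cluster.
```

Whole-word matching keeps the glossary from rewriting parts of longer words, but any blanket replacement can still change text you did not intend, so review the result before sharing it.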

Reprocess a conversation

If a transcript has clearly failed or is so far off that it is unusable, the Reprocess flow lets you re-run transcription against the original recording. Open the conversation, use the action menu (⋯) next to the title, and choose the option that restarts from transcription. See Reprocessing insights for the full reprocess flow.

Reporting quality issues

If you see consistent quality problems — for example, the same word always misrecognised, an entire section dropped, or a meeting that processes but produces an unreadable transcript — contact Harmony support with:

  • The conversation reference (workspace and conversation ID, or the link to the meeting).
  • The meeting date and approximate timestamps where the errors occur.
  • A short description of what you expected vs. what the transcript says.
  • Notes on audio quality (headsets, background noise, speaker count, language(s) spoken).

This is the same information Harmony support needs to escalate provider-side issues, so giving it up front shortens triage.
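
If you report issues regularly, the checklist above can be kept as a small template so that nothing is left out. The following Python sketch is purely illustrative: the class `QualityReport` and its field names are hypothetical and only mirror the checklist, and Harmony support does not require any particular format.

```python
from dataclasses import dataclass, fields


@dataclass
class QualityReport:
    # Hypothetical field names, one per item in the checklist above.
    conversation_ref: str = ""
    meeting_date: str = ""
    timestamps: str = ""
    expected_vs_actual: str = ""
    audio_notes: str = ""

    def missing(self):
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_text(self):
        """Format the report as plain text to paste into a support request."""
        labels = {
            "conversation_ref": "Conversation reference",
            "meeting_date": "Meeting date",
            "timestamps": "Approximate timestamps",
            "expected_vs_actual": "Expected vs. transcribed",
            "audio_notes": "Audio notes",
        }
        return "\n".join(
            f"{labels[f.name]}: {getattr(self, f.name)}" for f in fields(self)
        )
```

Checking `missing()` before sending shows which details still need to be filled in, and `to_text()` produces one labelled line per item.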