Provides a visual equivalent of speech and non-speech audio metadata