# Yahweasel's Audio Primer

OK, so you want to record something, but you know nothing about audio, and Yahweasel and Craig are abusive jerks. Where do you begin? Right here, that's where.

Firstly, audio from Craig is always delivered as multiple files, with each file representing a single speaker. This is called multi-track recording (a single stream of audio is called a “track”), and is Craig's primary feature. This is extremely useful, as you can edit or cut parts of each track independently. If one speaker is too quiet, you can increase their volume without affecting anyone else; if another speaker keeps coughing, you can remove that without removing anyone's speech. Ultimately, you will probably need to use an audio editor to mix these tracks into a single audio stream. There are many audio editors in the world, and Audacity is a popular, free one.
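If you're curious what “mixing” actually means, here's a rough sketch. It assumes each track is just a list of samples between -1.0 and 1.0 at the same sample rate; the function name `mix_tracks` and the gain values are made up for illustration, and a real audio editor does a great deal more than this.

```python
def mix_tracks(tracks, gains=None):
    """Mix mono tracks (lists of floats in [-1.0, 1.0]) into a single track.

    `gains` holds one volume multiplier per track, so one speaker can be
    boosted without touching anyone else.
    """
    if gains is None:
        gains = [1.0] * len(tracks)
    length = max((len(track) for track in tracks), default=0)
    mixed = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            mixed[i] += sample * gain
    # Clamp so the summed signal stays inside the valid sample range.
    return [max(-1.0, min(1.0, sample)) for sample in mixed]


# Hypothetical data: a quiet speaker boosted 2x, a loud speaker left alone.
quiet_speaker = [0.125, 0.25, 0.125]
loud_speaker = [0.5, -0.5]
print(mix_tracks([quiet_speaker, loud_speaker], gains=[2.0, 1.0]))
```

Because each speaker is a separate list, changing one gain changes only that speaker, which is the whole point of multi-track recording.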

The multiple files delivered by Craig are packaged in a ZIP file. ZIP is a common archive format; that is, it's used to package several files as one. All modern desktop operating systems have support for ZIP out-of-the-box; mobile users will likely need to search for a ZIP file extractor for their system.
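If you'd rather unpack the ZIP from a script than by double-clicking it, a minimal sketch using Python's standard `zipfile` module might look like this. The function name `extract_tracks` and the file names in the comment are hypothetical; Craig's actual file names may differ.

```python
import zipfile
from pathlib import Path


def extract_tracks(zip_path, out_dir):
    """Extract every file in the archive and return the extracted paths, sorted."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries; keep only real files.
        names = [name for name in archive.namelist() if not name.endswith("/")]
        archive.extractall(out)
    return sorted(out / name for name in names)


# Usage (hypothetical file names):
#   for path in extract_tracks("recording.zip", "tracks"):
#       print(path)
```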

Digital audio data may be encoded in a variety of ways, with a variety of benefits and disadvantages. The classic and simplest technology is a WAV file, or a “raw PCM waveform”. The key word there is “raw”: WAV files are totally raw data, and are huge. So huge, in fact, that the idea of sending them over the Internet to anyone who asks is just infeasible. For this reason, Craig only directly offers WAV files of reduced quality, which should only be used if you have no other option. If you're on a supported operating system, Craig will offer a WAV file “extractor”, which is full-quality compressed files plus a program to convert them to WAV. If you don't trust me to run random software on your computer (and you have no reason to), or if you're not using a supported operating system, use a different format and read on.
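To put a number on “huge”, here's a back-of-the-envelope calculation. The defaults (48 kHz, 16-bit, mono) are my assumptions for illustration rather than a statement of what Craig records at, and the small WAV header is ignored.

```python
def pcm_size_bytes(seconds, sample_rate=48000, bits_per_sample=16, channels=1):
    """Size of raw, uncompressed PCM audio, ignoring the small WAV header."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


one_hour = pcm_size_bytes(3600)
print(one_hour)      # 345600000 bytes for one speaker
print(one_hour * 4)  # 1382400000 bytes for four speakers
```

Under those assumptions, one hour with four speakers comes to well over a gigabyte, before anyone has said anything worth keeping.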

Typically, audio data transmitted over the Internet is transmitted in a compressed format. With audio compression, there are two options: The audio may be losslessly compressed, or lossily compressed. Lossless compression takes more space than lossy compression, much less space than raw data (WAV), and loses absolutely no information relative to a raw WAV file. Lossless compression usually has no disadvantages except its size. The most popular lossless audio compression format is FLAC (Free Lossless Audio Codec), which is Craig's primary format. If your software supports FLAC and you have a fast enough Internet connection, FLAC is what you want. Apple has their own (slightly inferior) lossless audio format, called ALAC. There are a few other lossless formats, but none are as well supported as FLAC and ALAC.

If lossless compression isn't your thing, you want lossy compression. Lossy compression makes audio data smaller by intentionally losing some information: Lossily compressing audio data invariably loses some quality relative to the original. In the world of lossy compression, there are a huge variety of options available. The Moving Picture Experts Group (MPEG) defines an array of formats which are widely supported and considered the international standard. Amongst them is AAC (Advanced Audio Coding), otherwise known simply as MPEG-4 audio, which is the most widely supported modern audio format. AAC has excellent quality at reasonable size. If you can't use FLAC, AAC is what you want.
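The difference between lossless and lossy is easy to see in a toy example. This sketch uses `zlib` as a stand-in for a lossless codec and crude bit-dropping as a stand-in for a lossy one; neither is how FLAC or AAC actually works, and the 8 kHz test tone is just an assumption to keep the data small.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit samples (assumed 8 kHz sample rate).
RATE = 8000
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(RATE)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless stand-in: decompressing gives back every single byte.
lossless = zlib.compress(raw, 9)
restored = zlib.decompress(lossless)

# Lossy stand-in: discard the low 8 bits of every sample, then compress.
coarse = [(s >> 8) << 8 for s in samples]
lossy = zlib.compress(struct.pack(f"<{len(coarse)}h", *coarse), 9)

print("raw bytes:     ", len(raw))
print("lossless bytes:", len(lossless), "identical:", restored == raw)
print("lossy bytes:   ", len(lossy), "identical:", coarse == samples)
```

The lossless round trip reproduces the original exactly; the lossy one cannot, because the discarded bits are gone for good. Real lossy codecs are far smarter about which information they throw away.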

Craig also supports a collection of other lossy formats, namely MPEG-4 HE-AAC, Opus and Ogg Vorbis. HE-AAC is an improved version of AAC, but is slightly less widely supported; the quality of an HE-AAC file will be roughly the same as the AAC file, but it will usually be smaller. Opus is the current darling of lossy compression, is the format used by Discord itself, and is simply the best lossy format available today, but support for it in audio editing software is currently mostly lacking. Opus is also unencumbered by patents, if that's important to you. Ogg Vorbis isn't as widely supported or as good as AAC, but is more widely supported than FLAC or Opus, and is also unencumbered by patents.

The elephant in the room is MP3. To many people, “MP3” is synonymous with “audio file”. First, let's avoid a confusing issue: MP3 is not MPEG-3 audio, and thus is not the immediate predecessor to MPEG-4 audio. MP3 is the third audio layer defined in the first MPEG standard; i.e., it's MPEG-1 (formally, MPEG-1 Audio Layer III). I will call it MPEG-1 from here on so as not to give it more prominence than it deserves. I don't support MPEG-1 for a variety of reasons:

  • One of the details of lossy encoding I haven't gone into is bitrate. Bitrate is simply how many bits (the smallest unit of digital information) are taken by one second of audio data. Bitrate is frequently used as a measure of quality, but that's entirely wrong: No modern audio compression uses a constant number of bits per second (a constant bitrate); they vary, based, sensibly, on how much space is required to encode the information at a constant quality. Some information, in particular silence, simply requires less space because it's less information. MPEG-1, being a quite ancient standard, does use a constant bitrate, which means that, for instance, sudden changes will always sound muddier than sustained notes. Importantly for a voice recorder, this further means that complete silence will take as much space as speaking, which is absurd. Fairly common extensions to MPEG-1 allow variable bitrate (indeed, when Craig was new, I supported MP3 in variable bitrate modes), but I found that support for variable bitrate MPEG-1 in audio editing software is dodgy.

  • MPEG-1 is formally obsolete. Even MPEG doesn't want you using MPEG-1 audio; they obsoleted it. Then they obsoleted the thing that obsoleted it. (As it happens, there never was a finished MPEG-3 standard: the work planned under that name was folded into MPEG-2, so “MP3” has never meant MPEG-3.)

  • Being obsolete, it is hopefully unsurprising that MPEG-1 cannot deliver the same quality as AAC.

  • MPEG-1 is not the file format your software “wants”. All Apple music, for instance, has been delivered in AAC since 2003; your “MP3 player” is an AAC player. Same goes for Android. MP4 files are MPEG-4 files, and so use MPEG-4 audio (AAC). YouTube videos have AAC or Opus audio. Discord and Skype use Opus. All browsers support either AAC or Opus (often both). The only people who use MPEG-1 are people who don't know all the things I've just told you. Or, people who are still rocking their MP3 player from 1999.

  • “MP3” is a symptom of a larger theme of unwillingness to adapt or understand. Did you know that your “gifs” aren't gifs? Your “gifs” are WebP or WebM, as gif is a woefully obsolete and inefficient format. Because only browsers usually care about the format of a “gif file”, the web ecosystem as a whole has been willing to pull that rug out from under you and continue to call them “gifs” even though they're not gifs at all. Unfortunately, because audio software is far more diverse, the same rug cannot be pulled out from under MP3; it is up to users to be sufficiently educated not to be duped by obsolete technology.
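To make the bitrate point from the first bullet concrete, here's a small sketch comparing constant and variable bitrate for a voice recording that is mostly silence. The bitrates (128 kbit/s for speech, 8 kbit/s for silence) and the 20/40 minute split are invented for illustration; real encoders pick their own values.

```python
def cbr_size_bytes(seconds, bitrate_bps):
    """Constant bitrate: every second costs the same number of bits."""
    return seconds * bitrate_bps // 8


def vbr_size_bytes(segments):
    """Variable bitrate: each (seconds, bitrate_bps) segment costs only what it needs."""
    return sum(seconds * bitrate_bps for seconds, bitrate_bps in segments) // 8


# Hypothetical one-hour track: 20 minutes of speech, 40 minutes of silence.
speech, silence = 20 * 60, 40 * 60
print(cbr_size_bytes(speech + silence, 128_000))              # 57600000
print(vbr_size_bytes([(speech, 128_000), (silence, 8_000)]))  # 21600000
```

With constant bitrate, the 40 minutes of silence cost exactly as much per second as the speech; with variable bitrate, they cost next to nothing.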

This concludes my ranting. If you have read this and still have questions, you are free to ask them on Craig's Discord server.

Last Updated: 3/15/2024, 11:29:16 PM