The AIFF (Audio Interchange File Format) music file format was first specified by Apple in 1988 based on the Electronic Arts Interchange File Format (generally just called IFF). Its original design only accounts for PCM encoding of music and offers no compression, lossy or lossless.
By definition, an AIFF file gets divided into “chunks”, which are organized somewhat hierarchically. At the highest level, a whole AIFF file is one chunk, and each chunk begins with a header, a small block of metadata that identifies what kind of chunk it is and how large it is. Generally, an AIFF file begins with a FORM Chunk that has some identifying information and contains two subordinate chunks, the Common Chunk and the Sound Data Chunk.
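To make the chunk layout concrete, here is a minimal sketch in Python of walking the chunks inside a FORM container. It assumes the usual IFF layout as I understand it: a 4-byte ID, a 4-byte big-endian size, then the chunk contents, padded to an even length. The function name `iter_chunks` is my own invention for illustration, not part of any library.

```python
import struct

def iter_chunks(data: bytes):
    """Yield (chunk_id, chunk_body) for each chunk inside an AIFF FORM container."""
    # FORM header: 4-byte ID, 4-byte big-endian size, 4-byte form type
    form_id, form_size, form_type = struct.unpack(">4sI4s", data[:12])
    if form_id != b"FORM" or form_type != b"AIFF":
        raise ValueError("not an AIFF file")
    pos = 12
    end = 8 + form_size  # form_size counts everything after the 8-byte header
    while pos + 8 <= end:
        chunk_id, chunk_size = struct.unpack(">4sI", data[pos:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + chunk_size]
        # Chunks with an odd size are followed by one pad byte
        pos += 8 + chunk_size + (chunk_size & 1)
```

Calling `list(iter_chunks(open(path, "rb").read()))` on a real file should show at least a `b"COMM"` and a `b"SSND"` entry, in whatever order the file stores them.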
The Common Chunk has a unique identifier and its own size. It also contains information about the “sound data”, or music, that remains common throughout the AIFF file: the number of channels of music contained in the file (2 for stereo, etc.); the number of sample “frames” (the channels are interleaved sequentially, and each set of interleaved channels is called a frame); the size of each sample per channel, given in bits (16, 24, 32, etc.) and stored in the file in whole 8-bit bytes (2 bytes for a 16-bit sample, 3 for a 24-bit sample, 4 for a 32-bit sample); and the sample rate, a large number such as 44100, 48000, 88200, etc., corresponding to 44.1kHz, 48kHz, 88.2kHz, etc., respectively. If you have 2 channels with a sample size of 16 bits (2 bytes per channel, so a 4-byte or 32-bit frame) and a sample rate of 44100, each second of music has 2 * 2 * 44100 bytes of data (that’s 176,400 8-bit bytes for every second of music). You can see why uncompressed music files tend to get very large.
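Here is a rough sketch in Python of decoding those Common Chunk fields and redoing the bytes-per-second arithmetic. As I understand the AIFF 1.3 layout, the fields are a 16-bit channel count, a 32-bit frame count, a 16-bit sample size expressed in bits, and the sample rate as an 80-bit extended-precision float. The names `decode_extended` and `parse_comm` are made up for this example.

```python
import struct

def decode_extended(raw: bytes) -> float:
    """Decode a big-endian 80-bit extended float (zero and normal values only)."""
    exponent, mantissa = struct.unpack(">HQ", raw)
    sign = -1.0 if exponent & 0x8000 else 1.0
    exponent &= 0x7FFF
    if exponent == 0 and mantissa == 0:
        return 0.0
    # The 64-bit mantissa carries an explicit integer bit, hence the extra -63
    return sign * mantissa * 2.0 ** (exponent - 16383 - 63)

def parse_comm(body: bytes) -> dict:
    """Parse the 18-byte body of a COMM chunk (header already stripped)."""
    channels, frames, bits = struct.unpack(">hIh", body[:8])
    rate = decode_extended(body[8:18])
    bytes_per_sample = (bits + 7) // 8  # round up to whole bytes
    return {
        "channels": channels,
        "frames": frames,
        "bits": bits,
        "sample_rate": rate,
        "bytes_per_second": int(channels * bytes_per_sample * rate),
    }
```

For 2 channels, 16 bits and 44100 frames per second, `bytes_per_second` comes out as 2 * 2 * 44100 = 176400, matching the figure above.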
If you happen to have an unusual sample size, like 18 bits, the AIFF file considers that a 24-bit sample but “pads” bits 19-24 with 0’s, which neither subtracts nor adds any information to the 18-bit music. Please note that, because of the way that numerical values reside in an AIFF file, by bits 19-24, I really mean bits 5-0. It has to do with the Motorola CPUs used by Apple in 1988. In effect, “padding” this way creates a value more like “001” than “100” and has to do with binary math. I won’t dive into the details, at least not right now.
The Sound Data Chunk also starts with a header with a unique identifier and the size of the chunk’s contents. Following that is the offset, which means where the first frame begins in the actual musical data and is almost always 0 (meaning immediately), then the block size, which only matters when the musical data has to be aligned to fixed-size blocks and is also almost always 0, then the “wave form data”, which is simply the actual music stream itself, with a length, in number of bytes, equal to the number of frames times the number of channels times the number of bytes per sample.
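As a sketch of how the Sound Data Chunk could be read, the Python below skips the two 32-bit fields (offset and block size) and splits the remaining bytes into interleaved frames of signed big-endian integers. The function name `read_frames` is hypothetical, and the channel count and bit depth are assumed to have been read from the Common Chunk beforehand.

```python
import struct

def read_frames(ssnd_body: bytes, channels: int, bits: int) -> list:
    """Split the body of an SSND chunk into frames of signed integers."""
    # The block size is read but unused here; this sketch assumes unaligned data
    offset, block_size = struct.unpack(">II", ssnd_body[:8])
    sound = ssnd_body[8 + offset:]
    width = (bits + 7) // 8          # bytes per sample, rounded up
    frame_bytes = channels * width   # bytes per interleaved frame
    frames = []
    for start in range(0, len(sound) - frame_bytes + 1, frame_bytes):
        frame = tuple(
            int.from_bytes(
                sound[start + c * width:start + (c + 1) * width],
                "big",
                signed=True,
            )
            for c in range(channels)
        )
        frames.append(frame)
    return frames
```

Each returned tuple holds one value per channel, in the order the channels are interleaved in the file. Values are returned exactly as stored, so any padding bits for odd sample sizes are still included.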
There are many other types of chunks used for all sorts of different things by a music program or application, but those chunks are not necessary to store multichannel PCM-encoded music in an AIFF file.
So where, you might ask, is information like the performer, the album title, the cover art, etc., stored? That’s generally in a separate file, a kind of index, that “points”, speaking metaphorically, to all of the individual AIFF files, each one representing a track, for all of the albums in your collection of music (although, speaking hypothetically, some or all of that information could be stored in other, miscellaneous types of chunks in the AIFF file). Even the image files containing the cover art may be in a separate location, “pointed” to by the index.
Of course, AIFF is just one music file format. There is a variation of AIFF, called AIFF-C, that can compress the information, plus FLAC, ALAC, WAV, DFF, DSF, DXD, etc., not to mention the much-maligned formats using lossy compression such as MP3 and M4A. There’s also Roon’s own RAAT, really more of a protocol than a file format, and MQA (which can be stored in, say, a FLAC file because of built-in backwards compatibility).
Most of these formats can be converted into one another (with a loss of information in some cases) because they’re all just numbers, the language of computers. It’s not (exactly) like trying to insert a VHS videocassette into a Betamax player (see https://en.m.wikipedia.org/wiki/Videotape_format_war ). What I call “special” DSP occurs, for example, when you upsample or downsample music in real time or convert DSD files to PCM files on the fly (as the music plays) among other things.
More to follow …
“Please note that, because of the way that numerical values reside in an AIFF file, by bits 19-24, I really mean bits 5-0.”
Absolutely awful thing to write, as the padding has nothing to do with endianness. From your explanation it’s really hard to tell whether the whole value is shifted left by a few bits or whether the 18-bit values remain the same but are stored as 24 bits. Endianness isn’t relevant to how individual bytes are treated, which is why what you wrote was awful.
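For what it’s worth, the AIFF specification describes sample points as left-justified: the whole value is shifted left so the unused low-order bits on the right are zero, independent of byte order. A minimal Python sketch of that interpretation follows; `pack_sample` and `unpack_sample` are names invented for this example.

```python
def pack_sample(value: int, bits: int) -> bytes:
    """Left-justify a signed sample in the smallest whole number of bytes."""
    width = (bits + 7) // 8
    shift = width * 8 - bits  # number of zero pad bits on the right
    return (value << shift).to_bytes(width, "big", signed=True)

def unpack_sample(raw: bytes, bits: int) -> int:
    """Recover the original signed sample from its left-justified form."""
    shift = len(raw) * 8 - bits
    return int.from_bytes(raw, "big", signed=True) >> shift

# An 18-bit sample with value 1 occupies 3 bytes, shifted left by 6 bits
print(pack_sample(1, 18).hex())  # 000040
```

With 18-bit samples the shift is 6, so bits 5-0 of the stored 24-bit value are always zero and the 18 meaningful bits sit in bits 23-6.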