Sound Theory: Digital Audio, WAV, MIDI, and DSMI

At its core, sound requires something that vibrates. To transmit sound you need a medium to carry the vibration and an instrument to detect it. But it all boils down to the vibrations, and these vibrations are characterized by two common measurements:

Amplitude: This is the height of the vibration.

Frequency: This is the number of waves per second, measured in hertz (Hz).
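
For a pure tone, the two measurements combine in the standard sine-wave formula below, where A is the amplitude, f is the frequency in hertz, t is the time in seconds, and T is the period (the time one wave takes):

```latex
y(t) = A \sin(2\pi f t), \qquad T = \frac{1}{f}
```

For example, at f = 440 Hz the period is T = 1/440 s, or about 2.27 ms. Doubling A makes the wave twice as tall; doubling f makes each wave half as long.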

Run this oscilloscope program. Say the letter "e" and hold it for as long as you can. How smooth is the wave? This is about the closest a human can come to producing a pure waveform. Typically our voices create many simultaneous vibrations, which combine into the sounds we know and recognize and show up as bumpy-looking waves on the oscilloscope. Hold the "e" sound and try raising and lowering your volume: as the volume increases or decreases, so does the amplitude of the vibration. Now try changing the pitch: as you raise or lower the frequency, the waves become shorter or longer.
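
A small numeric sketch of the same idea: a pure tone is a single sine wave, while a voice-like sound is several sine waves added together. The 220 Hz fundamental and the two overtones in `voice_like` are assumptions chosen for illustration, not measurements of a real voice.

```python
import math

def tone(t, amplitude, frequency):
    """Value at time t (seconds) of a pure sine wave."""
    return amplitude * math.sin(2 * math.pi * frequency * t)

def voice_like(t):
    """Hypothetical 'e'-like sound: a 220 Hz fundamental plus two weaker overtones.

    The frequencies and amplitudes are illustrative assumptions,
    not measurements of a real voice.
    """
    return tone(t, 1.0, 220) + tone(t, 0.5, 440) + tone(t, 0.25, 660)

# Print one cycle of both waves as a crude text "oscilloscope"
period = 1 / 220
for i in range(16):
    t = i * period / 16
    pure = tone(t, 1.0, 220)
    mixed = voice_like(t)
    print(f"{t * 1000:5.2f} ms  pure {pure:+.2f}  mixed {mixed:+.2f}")
```

The pure column rises and falls smoothly, while the mixed column wobbles because the overtones add their own smaller waves on top of the fundamental.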

Digital Audio
In the real world, sound vibrations are continuous and smooth; this is known as analog. Computers, however, are digital devices. When we digitize sound, what we are really doing is measuring the volume (the amplitude) a fixed number of times per second; this rate is called the sampling rate. If you could increase the sampling rate to infinity, you would have the equivalent of analog.
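
Sampling can be sketched directly: instead of the continuous wave, we keep only its value at evenly spaced moments, sample_rate times per second. The function name `sample_tone` is hypothetical, and the 440 Hz test tone is an arbitrary choice.

```python
import math

def sample_tone(frequency, amplitude, sample_rate, duration):
    """Digitize a sine wave by measuring it sample_rate times per second.

    Sample n is the wave's value at time n / sample_rate seconds.
    """
    count = round(sample_rate * duration)
    return [amplitude * math.sin(2 * math.pi * frequency * n / sample_rate)
            for n in range(count)]

# Ten milliseconds of a 440 Hz tone at a voice-grade and a CD-grade rate
voice = sample_tone(440, 1.0, 11_025, 0.01)
cd = sample_tone(440, 1.0, 44_100, 0.01)
print(len(voice), len(cd))  # 110 441
print(f"{11_025 / 440:.1f} vs {44_100 / 440:.1f} samples per wave")
```

The higher rate captures each wave with roughly four times as many points, which is why it traces the original shape more closely.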

The human ear starts having trouble distinguishing digital from analog sound at sampling rates as low as 11,000 Hz (11 kHz) for voices, and CD-quality audio is sampled at 44,100 Hz (44.1 kHz). This is similar to the effect in animation: the average eye can barely detect a flicker at 15 frames per second, and it is nearly impossible to detect at 30 frames per second.

The other decision to make in digital recording is how many bits to use for each measurement. Typically we use 8 or 16 bits. With 8 bits the volume ranges from -128 to 127 (256 levels); with 16 bits it ranges from -32768 to 32767 (65,536 levels). What matters is the number of steps between dead silence at zero and the upper limit: the more bits, the finer the nuances in the playback. This is very similar to the "jaggies" you see at low screen resolutions. Large blocky pixels are like the 8-bit world, and smaller, refined pixels are like the 16-bit world. As with pictures, whether you notice the difference depends on how good your ears are.
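
A minimal sketch of bit depth in action, assuming sample values are first expressed in the range -1.0 to 1.0 and stored as signed integers, as the paragraph describes (note that 8-bit WAV data is actually stored unsigned, offset by 128). The helper names `quantize` and `dequantize` are hypothetical.

```python
def quantize(value, bits):
    """Map a value in [-1.0, 1.0] to a signed integer of the given bit depth.

    An 8-bit sample spans -128..127; a 16-bit sample spans -32768..32767.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    # Scale to the positive limit, round to the nearest step, and clamp
    q = round(value * hi)
    return max(lo, min(hi, q))

def dequantize(q, bits):
    """Convert a stored integer back to the [-1.0, 1.0] range."""
    return q / ((1 << (bits - 1)) - 1)

x = 0.3337  # an arbitrary sample value
for bits in (8, 16):
    q = quantize(x, bits)
    err = abs(x - dequantize(q, bits))
    print(f"{bits}-bit: {2 ** bits} levels, stored {q}, error {err:.6f}")
```

The rounding error at 16 bits is a few hundred times smaller than at 8 bits: these are the "smaller pixels" of the analogy.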

A WAV file is a format for storing digitized sound that supports a variety of internal compression techniques. It is typically used for very short sound playback, because uncompressed WAV data can be loaded into memory and played back with very little CPU consumption.
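
Python's standard `wave` module can write uncompressed PCM WAV files, which makes a convenient sketch of the format. This example covers only the uncompressed case (mono, 16-bit samples); the compressed encodings a WAV file can carry are not handled. The helper `write_wav` is hypothetical.

```python
import io
import math
import struct
import wave

def write_wav(fileobj, samples, sample_rate=11_025):
    """Write mono 16-bit PCM samples (floats in [-1.0, 1.0]) as a WAV file.

    fileobj may be a filename or a binary file-like object.
    """
    frames = b"".join(
        struct.pack("<h", max(-32768, min(32767, round(s * 32767))))
        for s in samples
    )
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(frames)

# Half a second of a 440 Hz beep, written to memory instead of disk
rate = 11_025
beep = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
buf = io.BytesIO()
write_wav(buf, beep, rate)
print(len(buf.getvalue()))  # 44-byte header + 5512 samples * 2 bytes = 11068
```

Because the data is raw samples, playing it back needs almost no decoding work, which is the low-CPU property mentioned above.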

However, WAV files aren't typically used for music, because good quality requires high sampling rates and songs typically run two to three minutes. Say you're recording in stereo at 44.1 kHz with 16 bits for two minutes. Disregarding the file overhead, the data length would be 2 channels times 16 bits times 44,100 samples per second times 120 seconds = 169,344,000 bits, or about 21 million bytes (roughly 20 MB) of data. So for musical reproduction you need lots of compression or a different way to store the data.
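
The size calculation above as a small function, so other rates and durations can be tried. It counts raw sample data only; the function name is hypothetical.

```python
def pcm_size_bits(channels, bits_per_sample, sample_rate, seconds):
    """Raw PCM data size in bits, ignoring any file-format overhead."""
    return channels * bits_per_sample * sample_rate * seconds

bits = pcm_size_bits(2, 16, 44_100, 120)
print(bits)                                # 169344000 bits
print(bits // 8)                           # 21168000 bytes
print(f"{bits / 8 / 1024 ** 2:.2f} MiB")   # 20.19 MiB
```

For comparison, `pcm_size_bits(1, 8, 11_025, 120) // 8` gives the mono 8-bit, 11 kHz version of the same two minutes: 1,323,000 bytes, about one sixteenth of the size.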

The Musical Instrument Digital Interface (MIDI) is a set of instructions (similar to a musical score) that controls a set of 'instruments' on the sound driver. The MIDI protocol is an entire music description language in binary form: each action in a musical performance is assigned a specific binary code. MIDI was designed for keyboards, so many of the actions are percussion oriented. To sound a note in MIDI, you send a "Note On" message that carries the note number and a "velocity", which determines how loud the note plays. Other MIDI messages select which instrument to play, mix and pan sounds, and control various aspects of electronic musical instruments. The MIDI file format differs slightly from the protocol: it adds timing information to each event so that playback happens in the proper sequence.
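
To show how compact these binary codes are, here is a sketch that builds three common MIDI channel messages as raw bytes. The byte layouts follow the MIDI 1.0 channel-message format: a status byte whose high four bits give the message type and low four bits the channel (0 to 15 here, shown to users as 1 to 16), followed by 7-bit data bytes. The helper names are hypothetical.

```python
def _check(value, limit, name):
    """Reject values that do not fit in a MIDI field."""
    if not 0 <= value <= limit:
        raise ValueError(f"{name} must be in 0..{limit}, got {value}")
    return value

def note_on(channel, note, velocity):
    """Note On: status 0x90 + channel, then note number and velocity."""
    return bytes([0x90 | _check(channel, 15, "channel"),
                  _check(note, 127, "note"), _check(velocity, 127, "velocity")])

def note_off(channel, note, velocity=0):
    """Note Off: status 0x80 + channel, then note number and release velocity."""
    return bytes([0x80 | _check(channel, 15, "channel"),
                  _check(note, 127, "note"), _check(velocity, 127, "velocity")])

def program_change(channel, program):
    """Program Change: status 0xC0 + channel, then the instrument number."""
    return bytes([0xC0 | _check(channel, 15, "channel"),
                  _check(program, 127, "program")])

# Select instrument 0 on channel 0, then play and release middle C (note 60)
msg = program_change(0, 0) + note_on(0, 60, 100) + note_off(0, 60)
print(msg.hex(" "))  # c0 00 90 3c 64 80 3c 00
```

On General MIDI devices, program 0 is the acoustic grand piano, so these eight bytes select a piano and strike one key.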

MIDI is the primary source of music in many popular PC games and CD-ROM entertainment titles, and thousands of MIDI files are available on the Internet for recreational use. Just about every personal computer is now equipped to play Standard MIDI files.

One reason for the popularity of MIDI files is that, unlike digital audio files (.wav, .aiff, etc.) or even compact discs or cassettes, a MIDI file does not need to capture and store actual sounds. Instead, the MIDI file can be just a list of events that describe the specific steps a sound card or other playback device must take to generate certain sounds. As a result, MIDI files are much smaller than digital audio files, and the events are also editable, allowing the music to be rearranged, edited, or even composed interactively, if desired.
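
A minimal sketch of a MIDI file as "just a list of events": it assembles a format-0 Standard MIDI File from (delta time, message) pairs. The delta times use the variable-length encoding of the Standard MIDI File format, and the track ends with the required End of Track meta event. The function names and the three-note melody are hypothetical.

```python
import struct

def vlq(value):
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on earlier bytes
        value >>= 7
    return bytes(reversed(out))

def smf_type0(events, ticks_per_quarter=96):
    """Build a format-0 Standard MIDI File from (delta_ticks, message) pairs."""
    track = b"".join(vlq(delta) + msg for delta, msg in events)
    track += vlq(0) + b"\xFF\x2F\x00"  # End of Track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_quarter)
    return header + b"MTrk" + struct.pack(">I", len(track)) + track

# Hypothetical melody: C, E, G on channel 0, one quarter note each
events = []
for note in (60, 64, 67):
    events.append((0, bytes([0x90, note, 100])))  # Note On, start now
    events.append((96, bytes([0x80, note, 0])))   # Note Off 96 ticks later
data = smf_type0(events)
print(len(data))  # 50
```

With no tempo event, players use the default of 120 beats per minute, so these three notes last 1.5 seconds. The same 1.5 seconds stored as 16-bit stereo 44.1 kHz audio would take 264,600 bytes of sample data, versus 50 bytes here, and each event can still be edited individually.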

What MIDI lacks, however, is control over the instrumentation in the playback. It's like writing music rather than recording it: you don't know how the band will play it back. Recently, the MIDI standard has introduced the Downloadable Sounds (DLS) format. This allows MIDI files to contain standardized samples of musical instruments, sound effects, or even dialogue, which are used to recreate an exact copy of the sound intended by the composer. MIDI files with DLS are an ideal solution for composers who want the predictable playback of digital audio but also need the compactness and interactivity of Standard MIDI Files for delivering their music.

The Digital Sound and Music Interface (DSMI) standard was never particularly popular because, when it was introduced, it required a lot of CPU horsepower to recreate the sound. Even though today's PCs can easily handle the requirements, it is still considered a "fringe" effort. As with the Downloadable Sounds format, this technique uses digital recordings of the instruments. The recorded sounds are then sliced, diced, and sequenced to produce music.
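
DSMI's internals aren't described here, so the following is only a generic sketch of the sample-based idea the paragraph outlines, not DSMI's actual implementation. One recorded instrument sample (faked here as a decaying 220 Hz tone) is repitched by stepping through it faster or slower, and the copies are mixed into an output buffer at scheduled times. Every name, the 11,025 Hz output rate, and the crude nearest-neighbour resampling are assumptions chosen for brevity.

```python
import math

RATE = 11_025  # output sampling rate in Hz (an assumption for this sketch)

def make_sample(freq=220.0, seconds=0.5):
    """Stand-in for a recorded instrument: a decaying 220 Hz tone.

    A real player would load this from a digitized recording.
    """
    n = int(RATE * seconds)
    return [math.sin(2 * math.pi * freq * i / RATE) * math.exp(-4 * i / n)
            for i in range(n)]

def play(sample, semitones):
    """Repitch a sample by stepping through it faster or slower.

    Raising by 12 semitones doubles the step and halves the length.
    Uses nearest-neighbour lookup, the simplest form of resampling.
    """
    step = 2 ** (semitones / 12)
    length = int(len(sample) / step)
    return [sample[int(i * step)] for i in range(length)]

def sequence(sample, pattern, total_seconds):
    """Mix repitched copies of one sample into an output buffer.

    pattern is a list of (start_seconds, semitones) pairs.
    """
    out = [0.0] * int(RATE * total_seconds)
    for start, semitones in pattern:
        offset = int(start * RATE)
        for i, value in enumerate(play(sample, semitones)):
            if offset + i < len(out):
                out[offset + i] += value
    return out

# A hypothetical four-note phrase built from a single recorded sample
instrument = make_sample()
song = sequence(instrument, [(0.0, 0), (0.25, 4), (0.5, 7), (0.75, 12)], 1.5)
print(len(instrument), len(song))  # 5512 16537
```

Every sample of output requires lookups and additions for each active note, which hints at why this approach was CPU-hungry on early PCs.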