AI Generated Music from Audio Wave Data

Introduction

In this article we explore how AI can be used to analyze audio waves and automatically generate music.

Computer Audio
Audio Transformations
My Solution for Music Generation
Existing Solutions
Final Remarks
Credits

My Question

I love music. Growing up, I took drum lessons throughout high school and college. I also make music on my computer using FL Studio and Virtual DJ. Over the years, I have tried to learn a few songs on violin, but I have never had much success, which led me to the following question:

Can a computer play an instrument like a skilled musician?

In order to answer this question, we first need to explore the topic of computer audio.

Computer Audio

I am new to audio engineering and computer-generated music, but I have learned a lot from a few weeks of study. Here are some of the important concepts I picked up:

A wave file (.wav) stores byte data that can be used to represent audio.
In the PCM format, the byte data represents a sequence of signed numeric amplitude values.
Each position in the sequence corresponds to a point in time, and changes in the amplitude values cause your speakers to vibrate.
In a 16-bit format (two bytes per sample), each amplitude value is stored as a pair of bytes. This allows for amplitude values between -32768 and 32767. A frame holds one sample per channel, so a mono frame is two bytes and a stereo frame is four.
In little-endian byte order, the first byte contains the least significant bits and the second byte contains the most significant bits.
Stereo has two audio channels (e.g., left and right), and mono has one audio channel.
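
As a minimal sketch of the concepts above, the following Python decodes 16-bit little-endian PCM bytes into amplitude values using only the standard library. The function names (decode_pcm16, encode_pcm16, read_wav) are made up for illustration and are not taken from the project described in this article.

```python
import struct
import wave


def decode_pcm16(raw: bytes, channels: int = 1) -> list[list[int]]:
    """Decode little-endian signed 16-bit PCM into one sample list per channel."""
    count = len(raw) // 2  # two bytes per sample
    # "<" means little-endian, "h" means signed 16-bit integer
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    # Samples are interleaved frame by frame: L, R, L, R, ... for stereo
    return [list(samples[c::channels]) for c in range(channels)]


def encode_pcm16(samples: list[int]) -> bytes:
    """Encode amplitude values (-32768..32767) as little-endian 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)


def read_wav(path: str):
    """Read a 16-bit PCM wave file; returns (channels, frame_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
        return decode_pcm16(raw, wav.getnchannels()), wav.getframerate()


# The byte pair 0x00 0x80 has low byte 0x00 and high byte 0x80,
# which is 0x8000, the most negative 16-bit value.
print(decode_pcm16(bytes([0x00, 0x80, 0xFF, 0x7F, 0x01, 0x00])))
# prints [[-32768, 32767, 1]]
```

Calling read_wav on a stereo file would return two lists, one per channel, along with the frame rate needed to map sample positions to points in time.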

I was able to apply this knowledge to programmatically read, write, play, copy, and blend audio data from multiple wave files. I was also able to programmatically perform a few simple audio transformations.

Audio Transformations

Audio can be transformed by manipulating the associated sequence of amplitude values.

For example:

To adjust volume, we multiply the amplitude values by a constant factor.
To combine two sound clips, we sum together the amplitude values from their respective sequences.
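
Both transformations can be sketched in a few lines of Python. The helper names here (clamp, adjust_volume, mix) are hypothetical. Clamping the result to the 16-bit range is an assumption added for this sketch, since a scaled or summed value can otherwise fall outside -32768..32767. Padding the shorter clip with silence is likewise just one possible choice.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clamp(value: int) -> int:
    """Keep a value inside the signed 16-bit range (hard clipping)."""
    return max(INT16_MIN, min(INT16_MAX, value))


def adjust_volume(samples, factor):
    """Multiply every amplitude value by a constant factor."""
    return [clamp(round(s * factor)) for s in samples]


def mix(a, b):
    """Sum two clips sample by sample, padding the shorter one with silence."""
    length = max(len(a), len(b))
    a = list(a) + [0] * (length - len(a))
    b = list(b) + [0] * (length - len(b))
    return [clamp(x + y) for x, y in zip(a, b)]


print(adjust_volume([1000, -2000, 30000], 0.5))  # prints [500, -1000, 15000]
print(mix([100, 200, 300], [1, 2]))              # prints [101, 202, 300]
print(mix([30000], [30000]))                     # prints [32767] (clipped)
```

Hard clipping distorts loud passages, so a real mixer might instead scale both clips down before summing.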

My Solution for Music Generation

For my side project, I built an AI program that reads in a song and tries to recreate it using a given instrument.

If the instrument is a violin, then the program first reads in violin samples. Next, the program copies, transforms, and blends together these samples to recreate the original song.

You can watch an example of the recreation process using violin here.

Source: YouTube

Note: The generated violin plays towards the end of the video.

You can also listen to the generated audio here.

Source: SoundCloud

The recreation process is based on an intuitive idea. The violin samples are repeatedly played at differing intervals in many ways to create sounds that match segments of the original song. The AI determines how to effectively carry out this process over a series of rounds. As the number of rounds increases, the generated audio sounds more and more like music.
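
The article does not spell out the algorithm, so the following is only a hypothetical illustration of what a round-based process of this kind could look like. It is not the actual implementation. It swaps in a simple random hill-climbing search: each round overlays the instrument sample at a random time offset with a random gain, and keeps the change only if the result moves closer to the target. The names (recreate, squared_error) and the squared-error measure are all assumptions.

```python
import random


def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recreate(target, sample, rounds, seed=0):
    """Hypothetical sketch: approximate `target` by layering copies of `sample`."""
    rng = random.Random(seed)
    output = [0.0] * len(target)  # start from silence
    best = squared_error(output, target)
    for _ in range(rounds):
        offset = rng.randrange(len(target))  # where to place the sample
        gain = rng.uniform(-1.0, 1.0)        # how loudly to play it
        candidate = list(output)
        for i, s in enumerate(sample):
            if offset + i >= len(candidate):
                break  # the sample runs past the end of the song
            candidate[offset + i] += gain * s
        err = squared_error(candidate, target)
        if err < best:  # keep the placement only if it helps
            output, best = candidate, err
    return output, best


target = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
sample = [1.0, 2.0, 1.0]
for rounds in (0, 50, 500):
    _, err = recreate(target, sample, rounds)
    print(rounds, err)
```

Because a placement is only kept when it lowers the error, the error never increases as rounds are added, which matches the observation that more rounds produce a closer match.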

Existing Solutions

It turns out that there are existing software solutions that convert directly from Audio to MIDI. This allows you to extract the melody from an audio file and transcribe it into a composition. For future work, I hope to investigate various approaches for Audio to MIDI conversion.

Here are some helpful links that I found.

http://akoff.com/audio-to-midi.html

http://www.justinsalamon.com/melody-extraction.html

Final Remarks

Thank you very much for reading! I gladly welcome any thoughts, feedback, or comments.

This was my first serious attempt at AI generated music. In the future, I hope to learn more about existing technologies for music production and audio engineering to refine my approach for music generation.