For this project, we wish to create a program that takes a song chosen by the user and constructs a Minecraft note block circuit that reproduces the song. The input to the system is an audio file shorter than five minutes. Our output is a Minecraft circuit, complete with redstone, repeaters, and note blocks, that reproduces the input audio in Minecraft when activated. To make our output sound more like the input, we also place different Minecraft blocks underneath the note blocks, which changes the instrument each note block plays. A possible application of our project is a song recognition app for a smartphone.
In this project, we began by processing our input so that we would have the data required for training. Our original input was an audio file that we generated using a Digital Audio Workstation, Ableton Live Lite 10. We then converted that audio file via AnthemScore into a readable csv file that could be used as input for our program. Each line of the csv file corresponds to the frequencies of the audio sampled every one hundredth of a second. From the csv file, we know which frequencies are heard at each specific moment in time and how long each frequency is played. For example, if a frequency is present for 50 consecutive lines of our csv file, we know that it sounds for half a second of the input file. From this file, we can then determine how long each note of the song is played, and using each frequency value we can map each note to a specific pitch that can be represented in Minecraft.
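To make this mapping concrete, here is a minimal sketch of the pitch-conversion step. The helper names and the assumption that one dominant frequency is extracted per note are our own illustration; the pitch range used (25 pitches from F♯3 to F♯5) is the range that Minecraft note blocks support.

```python
import math

SAMPLE_PERIOD = 0.01          # one csv line per hundredth of a second
NOTEBLOCK_MIN_MIDI = 54       # F#3, the lowest note block pitch
NOTEBLOCK_MAX_MIDI = 78       # F#5, the highest note block pitch

def freq_to_noteblock_pitch(freq_hz):
    """Map a frequency in Hz to a note block pitch (0-24 right-clicks)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))   # nearest semitone
    midi = max(NOTEBLOCK_MIN_MIDI, min(NOTEBLOCK_MAX_MIDI, midi))
    return midi - NOTEBLOCK_MIN_MIDI

def line_count_to_seconds(lines):
    """50 csv lines -> 0.5 seconds, as in the example above."""
    return lines * SAMPLE_PERIOD

# Example: A4 (440 Hz) held for 50 csv lines
print(freq_to_noteblock_pitch(440.0))   # 15 clicks above F#3
print(line_count_to_seconds(50))        # 0.5
```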
The machine learning model that we are using is a classifier; specifically, we use a random forest algorithm to classify our data. Once we have determined the basic notes that are played, our main algorithm assigns to each note the instrument that best reproduces the sounds gathered from the input file. In Minecraft, the instrument that a note block produces depends on the material below it; for example, placing a note block on top of grass produces a piano sound. Each note can then be represented by one of the five instruments that can be heard in the game (kick drum, snare drum, hi-hat, harp, or bass), chosen by our Random Forest model. The module that we are using is sklearn.ensemble.RandomForestClassifier.
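Below is a minimal sketch of how such a classifier is trained and used. The feature layout (a small vector of spectral amplitudes per note) and the synthetic data are placeholders of our own, not the project's actual feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

INSTRUMENTS = ["kick drum", "snare drum", "hi-hat", "harp", "bass"]

# Placeholder data: one row per detected note, e.g. amplitudes of a few
# frequency bins around the note; real features come from the csv file.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = rng.integers(0, len(INSTRUMENTS), size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict an instrument (and hence the block to place under the
# note block) for each note in the held-out set.
for label in clf.predict(X_test[:3]):
    print(INSTRUMENTS[label])
```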
Evaluating our project is not exactly simple, as we rely on our ears to make sure that the notes are being played correctly. One downside of this project is that when we convert a song to a Minecraft note block circuit, the BPM (beats per minute) of the song will most likely change. This is because the Malmo mission XML provides no way to set the delay (number of ticks) on a redstone repeater, and note block circuits whose repeaters are left at the default delay only allow a BPM of 150. Thus, rather than recording the audio output and comparing it with the original file, the best we could do to evaluate our code was to make sure the note block circuit plays the correct note at the correct time.
The note block circuit that we generate has each note block represent a sixteenth note, and our scale input consisted only of quarter notes. Since a quarter note has the same length as four sixteenth notes, we made sure that each note (a note block) was followed by three silent notes (stone blocks). Our code successfully translated the scale to a note block circuit: all 64 notes were played correctly at the correct time. We did not test different rhythms, but in theory the approach should work as long as the fastest note in the audio file is at most a sixteenth note.
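This expansion from notes to block positions can be sketched as follows; the tuple representation and the helper name are our own illustration rather than our exact drawing code:

```python
def notes_to_blocks(notes):
    """Expand (pitch, length_in_sixteenths) pairs into a block row.

    A quarter note (length 4) becomes one note block followed by
    three silent stone blocks, as described above.
    """
    blocks = []
    for pitch, sixteenths in notes:
        blocks.append(("noteblock", pitch))                  # sounding note
        blocks.extend([("stone", None)] * (sixteenths - 1))  # silence
    return blocks

# Four quarter notes of a scale -> 16 sixteenth-note positions.
print(notes_to_blocks([(0, 4), (2, 4), (4, 4), (5, 4)]))
```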
Our implementation serves as a proof of concept that we can indeed convert information extracted from a musical audio clip into a note block circuit. The main challenge of our project lies in determining which instruments are being played and when. Up to this point we have only tried this with one instrument type, so our main focus now is to ensure that our Random Forest Classifier correctly determines which instruments are being played.
One problem we will likely encounter along the way is making sure that the instruments are played at the correct time, so our most robust solution is to modify the current circuits to be compatible with multiple instruments. A limitation is that the agent must be within a 40-block radius of a note block to be able to hear it, but with the structure we have in mind this shouldn't be a problem. We believe this should not be too daunting a task, as we already have the backbone of the implementation. Another problem we have to account for is making sure that the agent moves along with the note blocks so that the entire song can be heard. This should not be too difficult to overcome either, as the agent just needs to move along at the speed of the circuit, which isn't fast at all.
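As a rough sketch of this agent-following idea, assuming a mission with ContinuousMovementCommands enabled and placeholder values for the walking speed and song length that would need to be tuned to the actual circuit:

```python
import time

def follow_circuit(agent_host, speed_fraction=0.3, song_seconds=60):
    """Walk alongside the note block circuit while the song plays.

    Assumes agent_host is a connected MalmoPython.AgentHost whose
    mission enables ContinuousMovementCommands; speed_fraction and
    song_seconds are placeholders, not measured values.
    """
    agent_host.sendCommand("move {}".format(speed_fraction))
    time.sleep(song_seconds)
    agent_host.sendCommand("move 0")  # stop once the song ends
```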
For this project, we are using AnthemScore's neural network system, which takes an audio file and produces a csv file containing the amplitudes of different frequencies at different time intervals. To parse the csv file, we are using Python's built-in csv module. We are also using the note_block_test file provided in Malmo's Python_Examples as a guide for generating note blocks at different pitches. We used Ableton Live Lite 10 to generate the initial sample songs. Finally, we are using the mltools library to train and apply our random forest classifier.