Building an Audio Visualizer for Razer Chroma Keyboards

Sound is a really fascinating phenomenon. As a human, without even knowing it, you are able to decode and interpret thousands of different sounds from thousands of different sources. Sound is a never-ending source of inspiration for us: from symphonies to binaural beats, also known as ‘digital drugs’ for the brain, playing with sound is definitely fun and entertaining.

Before starting, I think that it is important that we are on the same page when it comes to the physical basics behind this application (if you have some physics background, feel free to skip to the next section).

a — Wave nature of sound

Sound can be described as a vibration that travels through a medium (air, most of the time) and reaches a destination where it can be interpreted (your ear, for example). The vibration of the molecules has properties such as intensity and frequency that, in the end, shape the sound you hear.

The best way to picture it is to think about water and a tiny rock falling into it. You have no trouble imagining the ripples it produces: the same phenomenon happens with sound, which propagates like a water wave, but through the air.

b — Frequency of sound

So far, you know that sound is a wave that travels through a medium and, most importantly, that sound has one or multiple frequencies. When we record sound wave oscillations, with an oscilloscope for example, we get a signal, and more precisely a sinusoidal signal. A sinusoidal signal looks like this:

You can already notice that a sinusoidal signal repeats over time: the point at the top of the signal appears twice. Now, if the time between those two peaks is one second, we say that the signal has a frequency of 1 Hz, i.e. it repeats once per second.

Okay, now that you have those physical concepts under your belt, we can begin to think about how we are going to build our cool project!

a — Defining The Architecture

The goal of our project is quite simple: from Google Chrome, we need to be able to capture the sound (whether it is coming from YouTube, Twitch or any other source), analyze the signal, build an audio equalizer effect out of it, and send it to the keyboard. The project needs to be completely web-based and OS-independent.

As always, here is the final architecture used in the project:

We have made great progress so far: from the simple idea of converting audio signals into lighting effects, we have built an entire architecture that will serve our needs.

b — Defining The Process

As stated before, for technical reasons, our application will be embedded in a Google Chrome Extension, available directly in the browser. The lifecycle of our extension is described below:

The process is quite simple: when a Google Chrome tab is producing audio, the extension can be activated by selecting it in the toolbar and clicking the Play button. From there, a visual audio equalizer is shown in real time. Different themes can be selected and will affect the extension UI as well as the keyboard itself. Clicking the Stop button stops the tab recording and the keyboard effect.

Now as the different parts of our process are fairly decoupled, the next sections will follow the steps described above. In the end, they will be reassembled to build our final product.

a — Capturing Sound From Google Chrome

The first part of our project relies upon being able to capture the sound coming from Google Chrome. As stated before, this part will be built using the Web Audio API. This API is embedded in more and more browsers and provides functions that developers can use to manipulate audio signals. This is exactly what we are going to do here.

The Web Audio API relies on ‘nodes’ which are blocks that have different usages. An AnalyserNode will be used for example to extract Fourier transformations from the signal. A GainNode can be used to amplify the sound, while a BiquadFilterNode can be used to filter out high frequencies.
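
As a quick illustration (a minimal sketch, not taken from the project's actual code), here is how a couple of these nodes could be created and chained together:

```javascript
// Minimal sketch: create a few Web Audio nodes and chain them together.
const audioCtx = new AudioContext();

const filter = audioCtx.createBiquadFilter();
filter.type = 'lowpass';        // attenuate frequencies above the cutoff
filter.frequency.value = 1000;  // 1 kHz cutoff

const gain = audioCtx.createGain();
gain.gain.value = 0.5;          // halve the amplitude

// A source node (e.g. a MediaStreamAudioSourceNode) would then be wired as:
// source.connect(filter).connect(gain).connect(audioCtx.destination);
```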

The Web Audio API is instantiated via an AudioContext object, whose source can either be a media stream (such as a continuous YouTube video or Twitch stream) or a buffer (a pre-downloaded MP3, for example).

In our case, as described in our process, when clicking on the ‘Play’ button, we should fire a capture method from the Chrome API and begin capturing the audio. A first sample of code looks like this:
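
The original snippet is not reproduced here, so below is a minimal sketch of what this step could look like, assuming a Manifest V2 extension that declares the tabCapture permission (all names are illustrative):

```javascript
// Sketch: capture the audio of the current tab and play it back unchanged.
function captureTabAudio() {
  return new Promise((resolve, reject) => {
    chrome.tabCapture.capture({ audio: true, video: false }, (stream) => {
      if (chrome.runtime.lastError || !stream) {
        reject(chrome.runtime.lastError);
        return;
      }
      resolve(stream);
    });
  });
}

captureTabAudio().then((stream) => {
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);
  // For now, the stream goes straight back to the output (speakers / headset).
  source.connect(audioCtx.destination);
});
```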

Capturing the audio from a Google tab (encapsulated in a promise)

In this example, the stream is directly connected to the output (your speakers or headset). As stated in our architecture, we need to insert the analyser in between in order to retrieve relevant signal information.

b — Analyzing The Signal

Before going any further, we need to know a bit more about time-based and frequency-based representations of signals. When you look at an oscilloscope, you are looking at a time-based representation of a signal: the oscilloscope is plotting the amplitude of the signal over time.

Let’s take our previous schema, annotated this time:

In this schema, the time representation highlights the fact that the amplitude of the signal is zero at t = 0. Over the course of a second, it reaches a maximum amplitude before coming back to zero. Now, if you remember correctly, we want to build an equalizer effect, which has the following appearance:

Now, remember when I said that sound can be described as a sinusoidal waveform? It is true... but incomplete.

c — Understanding Frequency Based Representation

Before understanding the utility of the AnalyserNode, we need to dig a bit deeper into signal processing and focus on a concept called the Fourier Transform. In reality, audio signals look like this:

As you can see, it is far from the sinusoid that we described in the first section. But is it that far away from it?

Visually yes, but not mathematically.

In fact, what if I told you that the signal observed above is just the superposition of many sinusoidal signals? When you’re listening to your favourite music, you are physically listening to a superposition of pure sinusoids. This process repeats over time and builds the melody of your song.

With this schema, we have the building blocks of the frequency-based representation. To go from the chaotic signal to the pure sinusoids that compose it, we use a mathematical tool called the Fourier Transform.

Fourier Transforms allow us to recover the original pure sinusoids that make up our signal and to display their distribution over time. A schema speaks a million words.

Does this frequency-based distribution sound familiar to you?

Exactly! This is the equalizer effect. As the signal varies over time, the distribution varies as well, producing a very cool equalizer effect.

d — Building The Almighty Equalizer Effect

Back to our architecture: the AnalyserNode comes in very handy for our project. This node natively performs an FFT (Fast Fourier Transform, a faster way to compute Fourier Transforms) on the provided stream and exposes the frequency distribution of the signal. First, let’s inject the node into our code.
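
The original snippet is not reproduced here; a minimal sketch of what this could look like, assuming the stream obtained from the hypothetical capture code above:

```javascript
// Sketch: insert an AnalyserNode between the tab's audio stream and the output.
const audioCtx = new AudioContext();
const source = audioCtx.createMediaStreamSource(stream); // stream from the tab capture

const analyser = audioCtx.createAnalyser();
analyser.fftSize = 8192; // see the discussion on FFT size below

// source -> analyser -> speakers
source.connect(analyser);
analyser.connect(audioCtx.destination);

// The "memory object": one slot per frequency bin
// (frequencyBinCount === fftSize / 2 === 4096 values).
const frequencyData = new Uint8Array(analyser.frequencyBinCount);
```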

From there, the AnalyserNode is bound to a memory object that is updated every time the audio context receives new data from the audio stream.

As a side note, how often does the audio context receive new data from the stream?

From my experiments, it seems that the Web Audio API refreshes its context at a 144Hz rate, meaning that we get 144 new arrays every single second.
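
In practice, the frequency data has to be read over and over to keep the equalizer in sync. One common way to do this (a sketch, not necessarily the exact approach used in the project) is a requestAnimationFrame loop, reusing the analyser and frequencyData from the sketch above:

```javascript
// Sketch: poll the analyser on every animation frame and refresh the effect.
function update() {
  // Copy the current frequency distribution into our memory object.
  analyser.getByteFrequencyData(frequencyData);

  // ...resample the data and push it to the UI / keyboard here...

  requestAnimationFrame(update);
}
requestAnimationFrame(update);
```

Note that requestAnimationFrame fires at the display refresh rate, which would line up with the 144 arrays per second mentioned above on a 144 Hz screen.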

In order to have a nice equalizer effect on our keyboard, there are some parameters that we need to adjust: the Fourier Transform size (also called FFT size) and the frequency resolution.

Back to the science.

As I explained earlier, sound is the superposition of multiple signals, each one having its own frequency. As humans, we are not able to hear all frequencies; in fact, you can only hear frequencies between 20 Hz and 20 kHz.

Given the range of frequencies you are able to hear, you can intuitively understand that your Fourier transform, i.e. your equalizer, will be distributed over this spectrum, giving you an approximate 20 kHz frequency range to work with.

If you remember the explanation from the previous section, the Fourier Transform can be visualized as a series of ‘bins’ representing the frequency distribution of your signal. This is where the Fourier Transform size comes into play.

The formula we are interested in is the following: frequency resolution = frequency range / FFT size.

In our case, we stated that the frequency range is 20 kHz.

If you were to take a very low FFT size, 2 for example, you would have a frequency resolution of 10 kHz, or two bars on your equalizer. With this resolution you would not be able to grasp the audio fluctuations in an accurate way. On the contrary, if you were to take a big FFT size, 32,768 for example, you would capture many signal fluctuations, but at a much higher computation cost, giving a poor and ‘laggy’ equalizer effect.

As you can see in the sample code above, after many experiments run on this particular project, I chose an FFT size of 8192, giving an approximate 2.6 Hz frequency resolution.

Now that we have all the input parameters for our analyser node, it is time to send the frequency distribution to our keyboard.

e — Let There Be Light

Having the frequency distribution, how can we build an equalizer effect on our keyboard?

First, let’s reduce our physical keyboard to a 22x6 matrix, as most Razer keyboards seem to fit those dimensions.

To build the final equalizer effect, we need to: resample the signal coming from the analyser node down to 22 columns, scale each column against the maximum amplitude to obtain a height between 0 and 6 keys, and send the resulting matrix to the Razer Chroma API.

Why do we need to resample the signal coming from the analyser node? Because we need to reduce the dimensions of the memory object filled by the analyser node down to the keyboard matrix. In short, we need to go from an array of 4096 elements (your FFT size divided by two) to an array of 22 elements.

To get our final columns, we are going to compute local means over chunks of 186 elements (4096 / 22, rounded down) of the initial array (this chunk size is also called the frequency step here).
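
The original snippet is not reproduced here; a rough sketch of this resampling step, reusing the hypothetical frequencyData array from above:

```javascript
// Sketch: reduce the 4096 frequency bins to 22 columns by averaging
// consecutive chunks of the array (one chunk per keyboard column).
const KEYBOARD_COLUMNS = 22;

function resample(frequencyData) {
  const frequencyStep = Math.floor(frequencyData.length / KEYBOARD_COLUMNS); // ~186
  const columns = [];
  for (let col = 0; col < KEYBOARD_COLUMNS; col++) {
    let sum = 0;
    for (let i = col * frequencyStep; i < (col + 1) * frequencyStep; i++) {
      sum += frequencyData[i];
    }
    columns.push(sum / frequencyStep); // local mean for this column
  }
  return columns;
}
```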

Resampling the audio coming from our Fourier transform

Now that we have a resampled array of frequencies that matches our keyboard dimensions, let’s build the final JSON that will be sent to the Razer Chroma API to light up our keyboard.

The final step is to find the maximum amplitude among our resampled frequencies so that it takes up the full height of the keyboard. All the other frequencies, being fractions of the maximum amplitude, are assigned a fraction of that height. Again, a picture speaks a million words.

This maximum amplitude detection was implemented as described below:
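
The original implementation is not reproduced here; a sketch of this scaling step, taking the 22 resampled columns from the previous sketch and producing a 6x22 matrix where a 1 marks a key to light up (variable names are illustrative):

```javascript
// Sketch: scale each column against the maximum amplitude and build
// a 6x22 matrix (1 = key lit, 0 = key off).
const KEYBOARD_ROWS = 6;

function buildKeyboardMatrix(columns) {
  const maxAmplitude = Math.max(...columns) || 1; // avoid dividing by zero on silence

  // Height of each column, between 0 and 6 keys.
  const heights = columns.map((value) =>
    Math.round((value / maxAmplitude) * KEYBOARD_ROWS)
  );

  // Row 0 is the top row of the keyboard, row 5 the bottom one.
  const matrix = [];
  for (let row = 0; row < KEYBOARD_ROWS; row++) {
    matrix.push(heights.map((height) => (height >= KEYBOARD_ROWS - row ? 1 : 0)));
  }
  return matrix;
}
```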

Maximum amplitude and final keyboard array.

That’s it!

From this point, we simply called the Razer Chroma REST API (described here), provided with every Razer Chroma keyboard and shipped with Razer Synapse.
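
For reference, here is a rough sketch of what such a call could look like: the Chroma REST API is initialized with a POST to the local Chroma SDK server, which returns a session URI that then accepts keyboard effects. The endpoint paths, payload fields and BGR colour encoding below reflect my reading of the Chroma REST documentation and should be double-checked against it before use.

```javascript
// Sketch: send the 6x22 matrix to the local Chroma REST server.
async function initChromaSession() {
  const response = await fetch('http://localhost:54235/razer/chromasdk', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      title: 'Audio Visualizer',
      description: 'Keyboard equalizer effect',
      author: { name: 'Your Name', contact: 'https://example.com' },
      device_supported: ['keyboard'],
      category: 'application',
    }),
  });
  const { uri } = await response.json(); // session URI for subsequent calls
  return uri;
}

async function sendKeyboardEffect(uri, matrix, color = 0x00ff00 /* BGR-encoded green */) {
  // CHROMA_CUSTOM expects a 6x22 grid of BGR-encoded integer colours.
  const param = matrix.map((row) => row.map((lit) => (lit ? color : 0)));
  await fetch(`${uri}/keyboard`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ effect: 'CHROMA_CUSTOM', param }),
  });
}
```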

Enhanced version of the keyboard visualizer

This project is by far the most fun and interesting side project I have ever built. Going back to some physics concepts learned during my studies, deepening some aspects of them and finally building a product out of them was a ton of fun.

Science is fun. Physics is fun, and so is programming. I really hope that this article conveyed the message that science isn’t only about writing equations on a blackboard: it is also about experimenting, discovering, imagining new applications and, most of all, having fun.

On a personal note, more and more of you are reading, sharing and liking my articles every day. A warm thank you for that. I have many other projects and articles coming up; follow me and the blog to make sure you don’t miss them.

Until next time,

Antoine.
