The training relies on isolated key presses.
- To run our audio dataset constructor, have a look at `data_recording.py`, which provides the necessary methods to record both microphone audio input (one or more channels) and keyboard press/release events simultaneously.
- A short-time Fourier transform (STFT) is then performed on the data, chunked into small buffers. The STFT is applied to the time frame between a key's press and its release, with a correction that fits each window into a consistent 0.3 s interval.
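A minimal sketch of this per-key STFT step, assuming `scipy` is available and a 44.1 kHz recording; the function name, sample rate, and `nperseg=256` (chosen because it yields 256/2 + 1 = 129 frequency bins) are illustrative assumptions, not the project's actual parameters:

```python
import numpy as np
from scipy.signal import stft

SAMPLE_RATE = 44100        # assumed recording rate
WINDOW_SECONDS = 0.3       # fixed per-key interval from the text

def key_spectrogram(audio, press_sample, sample_rate=SAMPLE_RATE):
    """Cut a fixed 0.3 s window starting at a key press and STFT it.

    `audio` is a 1-D mono signal; stereo channels would be processed
    independently. nperseg=256 gives 129 frequency bins, matching the
    (129, ...) spectrogram shape described below.
    """
    n = int(WINDOW_SECONDS * sample_rate)
    chunk = audio[press_sample:press_sample + n]
    # Zero-pad short chunks so every window covers the same interval.
    if len(chunk) < n:
        chunk = np.pad(chunk, (0, n - len(chunk)))
    _, _, Z = stft(chunk, fs=sample_rate, nperseg=256)
    return np.abs(Z)  # magnitude spectrogram, shape (129, n_frames)
```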
Data is preprocessed in the following manner:
- Overlapping keys are removed by pairwise comparison, using a stack that iterates over the entire sequence of pressed keys
- Keys that were pressed but never released (and vice versa) are omitted
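The two filtering rules above can be sketched with a stack, assuming a hypothetical event format of chronological `(timestamp, key, kind)` tuples (the real logger's format may differ):

```python
def isolate_keys(events):
    """Filter raw (timestamp, key, "press"|"release") events down to
    isolated, non-overlapping key strokes.

    A stack of currently held keys tracks overlap: any key that was down
    while another key was also down is discarded, as is any press without
    a matching release (and vice versa).
    """
    held = []       # stack of [t0, key, overlapped] for keys still down
    isolated = []
    for t, key, kind in sorted(events):
        if kind == "press":
            for h in held:          # a new press taints every held key
                h[2] = True
            held.append([t, key, bool(held)])
        else:  # release: find the most recent matching press
            for i in range(len(held) - 1, -1, -1):
                if held[i][1] == key:
                    t0, _, overlapped = held.pop(i)
                    if not overlapped:
                        isolated.append((t0, t, key))
                    break
            # a release with no matching press is simply ignored
    # keys still in `held` were never released and are dropped too
    return isolated
```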
The spectrograms are generated in the following manner:
- The STFT is applied to each time frame, then all the frames are stacked to form a 2D matrix.
- The spectrograms are exported as numpy arrays of shape (129, 300, channel), with `channel` the number of channels (mono = 1, stereo = 2, etc.).
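A sketch of this export step, again assuming `scipy` with `nperseg=256` (129 frequency bins); padding or trimming the time axis to exactly 300 frames is an illustrative choice, and the real hop length may differ:

```python
import numpy as np
from scipy.signal import stft

N_FREQ, N_FRAMES = 129, 300   # target shape from the text

def export_spectrogram(audio, sample_rate=44100):
    """Turn an (n_samples, n_channels) recording into a
    (129, 300, n_channels) magnitude-spectrogram array."""
    if audio.ndim == 1:
        audio = audio[:, None]          # mono -> single channel
    channels = []
    for c in range(audio.shape[1]):
        _, _, Z = stft(audio[:, c], fs=sample_rate, nperseg=256)
        mag = np.abs(Z)
        # pad or trim the time axis to a fixed 300 frames
        if mag.shape[1] < N_FRAMES:
            mag = np.pad(mag, ((0, 0), (0, N_FRAMES - mag.shape[1])))
        channels.append(mag[:, :N_FRAMES])
    return np.stack(channels, axis=-1)
```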
The model
*Example spectrogram for key "r"*
Note here that the buffering size is 0.5 s; for a fast typist it could be reduced so that a spectrogram does not pick up parasite keystrokes from neighboring presses.
- Best result so far is from training on a Dell mechanical keyboard
Account for data imbalance: when the keys of the training dataset are imbalanced (one key is disproportionately present), apply class weighting to correct for it.
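One common correction, sketched here, is inverse-frequency class weights (the same convention as scikit-learn's `"balanced"` mode); feeding these weights into the training loss is an assumption about how the project would apply them:

```python
import numpy as np

def class_weights(labels):
    """Map each key label to n_samples / (n_classes * count).

    Rare keys get weights above 1, over-represented keys below 1,
    so the weighted loss treats every key roughly equally.
    """
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes, weights))
```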



