A convolutional recurrent neural network (CRNN) combines two of the most prominent neural network families: a convolutional neural network (CNN) followed by a recurrent neural network (RNN). The proposed network is similar to a CRNN but produces better results, particularly for audio signal processing.

## Composition of the network

The network starts with a standard 2D convolutional layer followed by batch normalization, ELU activation, max-pooling, and dropout with a rate of 50%. Three such convolutional blocks are stacked sequentially, each with its own activation. The convolutional blocks are followed by a permute layer and a reshape layer, which a CRNN requires because the CNN and the RNN work with differently shaped features: the convolutional layers operate on 3-dimensional feature maps, whereas the recurrent layers operate on 2-dimensional feature sequences.
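To make the shape bookkeeping concrete, the following minimal sketch traces how a feature map shrinks through three such blocks. The input size (128 frequency bins, 256 time frames, 1 channel), the filter counts (32, 64, 128), the `same` padding, and the 2x2 pooling are all illustrative assumptions, not values taken from the proposed network, and `conv_block_output_shape` is a hypothetical helper.

```python
def conv_block_output_shape(shape, filters, pool=(2, 2)):
    """Shape after one Conv2D -> BatchNorm -> ELU -> MaxPool -> Dropout block.

    Assumes 'same' convolution padding, so only max-pooling changes
    the spatial size; batch norm, ELU, and dropout keep the shape.
    """
    height, width, _ = shape
    # Max-pooling floor-divides each spatial axis by the pool size;
    # the convolution sets the new channel count.
    return (height // pool[0], width // pool[1], filters)


# Assumed input: 128 frequency bins x 256 time frames x 1 channel.
shape = (128, 256, 1)
shapes = [shape]
for filters in (32, 64, 128):  # assumed filter counts
    shape = conv_block_output_shape(shape, filters)
    shapes.append(shape)

print(shapes)
# [(128, 256, 1), (64, 128, 32), (32, 64, 64), (16, 32, 128)]
```

Under these assumptions the convolutional stack ends with a 3-dimensional feature map of shape (16, 32, 128), which is what the permute and reshape layers must turn into a 2-dimensional sequence.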

The permute layer reorders the axes of the feature map, and the reshape layer then converts it into the 2-dimensional form that the RNN accepts. The proposed network uses two bidirectional GRU (gated recurrent unit) layers with "n" GRU cells in each layer, where "n" depends on the number of classes in the classification task. Bidirectional GRU layers are used instead of unidirectional RNN layers because they take into account representations from both past and future timestamps. Combining the representations from both directions lets the network capture time-dimension features more effectively.
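The permute and reshape steps can be illustrated on a single feature map with NumPy. This is a sketch under assumptions: the axis order (frequency, time, channels) and the feature-map size (16, 32, 128) are chosen for illustration, and `to_sequence` is a hypothetical helper rather than part of the proposed network.

```python
import numpy as np


def to_sequence(feature_map):
    """Convert a (freq, time, channels) map into a (time, features) sequence."""
    freq, time, channels = feature_map.shape
    # Permute: move the time axis to the front so each row is one time step.
    permuted = np.transpose(feature_map, (1, 0, 2))  # (time, freq, channels)
    # Reshape: merge frequency and channel axes into one feature axis.
    return permuted.reshape(time, freq * channels)


# Assumed CNN output for one example: 16 freq bins, 32 time steps, 128 channels.
feature_map = np.arange(16 * 32 * 128, dtype=np.float32).reshape(16, 32, 128)
sequence = to_sequence(feature_map)
print(sequence.shape)  # (32, 2048)
```

Each of the 32 rows now holds every frequency and channel value for one time step, which is the per-timestep input a recurrent layer reads in the forward and backward directions.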

Finally, the output of the bidirectional layers is fed to time-distributed dense layers, followed by a fully connected layer.
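Putting the pieces together, one possible Keras sketch of the described layer order is shown below. It is not the authors' implementation: the input shape, filter counts, kernel size, `gru_units`, the activations on the dense layers, and the `Flatten` step before the final fully connected layer are all assumptions made so that the example runs end to end, and `build_crnn` is a hypothetical name.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_crnn(input_shape=(128, 256, 1), n_classes=10, gru_units=32):
    """CRNN sketch: 3 conv blocks -> permute/reshape -> 2 BiGRU -> dense head."""
    inputs = keras.Input(shape=input_shape)  # assumed (freq, time, channels)
    x = inputs
    for filters in (32, 64, 128):  # assumed filter counts
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("elu")(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Dropout(0.5)(x)  # 50% dropout, as described in the text

    # Make time the leading (sequence) axis, then merge freq and channels.
    freq, time, channels = x.shape[1:]
    x = layers.Permute((2, 1, 3))(x)  # (time, freq, channels)
    x = layers.Reshape((time, freq * channels))(x)

    # Two bidirectional GRU layers; return_sequences keeps one output per step.
    x = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(x)
    x = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(x)

    # Time-distributed dense layer, then a fully connected classifier.
    x = layers.TimeDistributed(layers.Dense(n_classes, activation="elu"))(x)
    x = layers.Flatten()(x)  # assumption: flatten before the final layer
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


model = build_crnn()
model.summary()
```

Here `gru_units` stands in for the "n" GRU cells per layer; since the text only says that "n" depends on the number of classes, the default of 32 is a placeholder to be tuned for the task.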