Recently I was looking at Stochastic Neighbor Embedding (SNE) and its t-distributed variant (t-SNE), but I could not find the exact steps for deriving the gradient of the loss function: the t-SNE article contains a few small errors, and the SNE article gives no details at all. So I decided to carry out the derivation myself and share it.