Derivations of SNE and t-SNE

Image taken from original paper

Recently I was looking at Stochastic Neighbor Embedding (SNE) and its t-distributed version (t-SNE), but I could not find the exact steps to derive the gradient of the loss function: the t-SNE article contains small errors, and the SNE article gives no details. So I decided to carry out the derivation myself and share it. I hope it helps those who are studying the same topic. Here you can find the PDF: please let me know if you spot an error! (Picture taken from the t-SNE paper.)

Federico Errica
Research Scientist

My research interests include distributed robotics, mobile computing and programmable matter.