Derivations of SNE and t-SNE
Recently I was studying Stochastic Neighbor Embedding (SNE) and its t-distributed variant (t-SNE), but I could not find the exact steps for deriving the gradient of the loss function: the t-SNE article contains small errors, and the SNE article gives no details. So I decided to carry out the derivation myself and share it. I hope it helps anyone studying the same topic. Here you can find the PDF: please let me know if you spot an error! (Picture taken from the t-SNE paper.)