yakov_a_jerkov | (Reply)

From:

angerona.livejournal.com

OK, I read it. I really am falling asleep, and even when I'm not sleepy I don't understand much about all this, but here, it seems, things aren't actually that tangled. Or maybe that's just what I think in my sleep :).

Here's how I see it: for each image you have a list of values (let's call them "weights" because that's how I think of them) in the penultimate layer of the network (so not the layer that says which class the image falls into, but the one before that).

For each class j, you take all the images that were correctly classified as belonging to that class, and you take all of their activation vectors and you call that collection of vectors "S".

Then you compute a mean for each "weight" in S. So for the 1st weight, you take all of the values this weight takes on across the correctly classified images and you compute their mean, then you do the same for the 2nd weight, and so on. That's your mean "Activation Vector."
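Here's a minimal sketch of that averaging step in plain Python, just to make it concrete. The numbers and the name `mean_activation_vector` are made up for illustration; they aren't from the paper.

```python
def mean_activation_vector(S):
    """Average each position ("weight") across all activation vectors in S."""
    if not S:
        raise ValueError("S must contain at least one activation vector")
    n = len(S)
    # zip(*S) groups the 1st values together, then the 2nd values, and so on.
    return [sum(column) / n for column in zip(*S)]


# Hypothetical penultimate-layer activations for three correctly
# classified images of one class (three "weights" each).
S = [
    [1.0, 4.0, 0.5],
    [3.0, 2.0, 0.5],
    [2.0, 3.0, 2.0],
]
mav = mean_activation_vector(S)
print(mav)  # [2.0, 3.0, 1.0]
```

Each entry of the result is the mean of one "weight" over all the images in S, so the result has the same length as a single activation vector.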

And now you compute p_j, which is essentially a probability based on a Weibull distribution. A quick Google search tells me that the Weibull distribution has 3 parameters: a scale parameter, a shape parameter and a location parameter. I'm having a hard time here, because I don't understand the Weibull distribution (but I'm sure you will), but basically tau_i is a list of "largest distances" for all weights. So for each weight it's the absolute value of the difference between the "furthest outlying" value and the mean value. That's what S_hat means in this case, I think: the most "extreme" value of all those available for that weight, and we are finding the difference between that and the mean.
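A rough sketch of how I read this step, again in plain Python. It follows my interpretation above (one "largest distance" per weight), which may not match the paper exactly. The function names, the data, and the Weibull parameters are all made up; in practice the parameters would be fitted to the distances rather than picked by hand. The one thing I'm sure of is the textbook three-parameter Weibull CDF, F(x) = 1 - exp(-((x - location) / scale)^shape) for x above the location.

```python
import math


def largest_distances(S, mean):
    """For each position, the largest |value - mean| seen across S."""
    return [
        max(abs(value - m) for value in column)
        for column, m in zip(zip(*S), mean)
    ]


def weibull_cdf(x, shape, scale, location=0.0):
    """Three-parameter Weibull CDF; zero at or below the location."""
    if x <= location:
        return 0.0
    return 1.0 - math.exp(-(((x - location) / scale) ** shape))


# Hypothetical activations (two "weights" per image) and their mean.
S = [
    [0.0, 4.0],
    [1.0, 4.0],
    [5.0, 1.0],
]
mean = [2.0, 3.0]

print(largest_distances(S, mean))  # [3.0, 2.0]

# Hand-picked (not fitted) parameters, purely for illustration.
p = weibull_cdf(2.0, shape=1.5, scale=2.0)
print(round(p, 3))  # 0.632, i.e. 1 - 1/e when x equals the scale
```

So a distance equal to the scale parameter always lands at about 0.632, and larger distances push the probability toward 1.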

Does this help any?

Again, I have no idea whether I'm even in the right ballpark :).