Metrics

Created on Wed Jul 13 14:21:16 2016

@author: Jonas Eschle “Mayou36”

raredecay.tools.metrics.precision_measure(n_signal, n_background)[source]

Return the precision measure = \(\frac{n_{signal}}{\sqrt{n_{signal} + n_{background}}}\).

Parameters:
  • n_signal (int or numpy.array) – Number of signal events observed (= tpr; true positive rate)
  • n_background (int or numpy.array) – Number of background events observed as signal (= fpr; false positive rate)
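
As a minimal illustration of the formula above (a hedged sketch, not the library's implementation; the helper name is hypothetical):

    import numpy as np

    # Sketch of the precision measure s / sqrt(s + b); the name
    # precision_measure_sketch is hypothetical, not part of raredecay.
    def precision_measure_sketch(n_signal, n_background):
        n_signal = np.asarray(n_signal, dtype=float)
        n_background = np.asarray(n_background, dtype=float)
        return n_signal / np.sqrt(n_signal + n_background)

    # Example: 100 signal events on top of 400 background events
    print(precision_measure_sketch(100, 400))  # 100 / sqrt(500) ~ 4.47
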
raredecay.tools.metrics.punzi_fom(n_signal, n_background, n_sigma=5)[source]

Return the Punzi Figure of Merit = \(\frac{n_{signal}}{\sqrt{n_{background}} + n_{\sigma}/2}\).

The Punzi FoM is mostly used for the detection of rare decays; it prevents the metric from favouring cuts that remove nearly all the background but leave only very few signal events.

Parameters:
  • n_signal (int or numpy.array) – Number of signal events observed (= tpr; true positive rate)
  • n_background (int or numpy.array) – Number of background events observed as signal (= fpr; false positive rate)
  • n_sigma (int or float) – The number of sigmas
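
A minimal sketch of the formula (hedged; the helper name is hypothetical, not raredecay's implementation):

    import numpy as np

    # Sketch of the Punzi FoM s / (sqrt(b) + n_sigma / 2).
    def punzi_fom_sketch(n_signal, n_background, n_sigma=5):
        n_signal = np.asarray(n_signal, dtype=float)
        n_background = np.asarray(n_background, dtype=float)
        return n_signal / (np.sqrt(n_background) + n_sigma / 2.0)

    # The n_sigma/2 term keeps the FoM finite as the background vanishes,
    # so a cut that removes almost all background is not rewarded without limit.
    print(punzi_fom_sketch(100, 400))  # 100 / (20 + 2.5) ~ 4.44
    print(punzi_fom_sketch(20, 1))     # 20 / (1 + 2.5) ~ 5.71
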
raredecay.tools.metrics.train_similar(mc_data, real_data, features=None, n_checks=10, n_folds=10, clf='xgb', test_max=True, test_shuffle=True, test_mc=False, old_mc_weights=1, test_predictions=False, clf_pred='rdf')[source]

Score for reweighting. Train clf on mc reweighted/real, test on real; minimize score.

Enter two datasets and evaluate the score described below; a dictionary containing the different scores is returned. test_predictions is a further scoring built on top of the train_similar method.

Scoring method description

Idea: A classifier is trained both on the reweighted MC and on the real data of a certain decay, so it learns to distinguish Monte Carlo data from real data. We then let the classifier predict an unbiased test set of real data and see how many events it is able to classify as real. The lower the score, the fewer differences it was able to learn from the training data, hence the more similar the training samples and the better the reweighting (a minimal sketch is given below).

Advantages: It is quite difficult to cheat on this method. Above all, it is robust against single high-weight events (which mcreweighted_as_real is not) and, in general, it seems to be the best scoring so far.

Disadvantages: If you insert a Gaussian distribution around 1.0 as MC and a Gaussian around 1.1 as real data, the score will be bad (around 0.33). So far this has only been observed for “artificial” distributions, though of course we do not know whether it partly affects real distributions as well.
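
The following sketch illustrates the idea under stated assumptions: it is not raredecay's implementation, mc_features, mc_weights and real_features are placeholder arrays (not the HEPDataStorage objects the library uses), and sklearn's GradientBoostingClassifier stands in for the configurable clf. The score is read here as the mean predicted "real" probability on the held-out real fold; the library's exact definition may differ.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import KFold

    def train_similar_sketch(mc_features, mc_weights, real_features, n_folds=10):
        scores = []
        for train_idx, test_idx in KFold(n_splits=n_folds, shuffle=True,
                                         random_state=42).split(real_features):
            # Train on all (reweighted) MC vs. one real-data training fold.
            X = np.vstack([mc_features, real_features[train_idx]])
            y = np.concatenate([np.zeros(len(mc_features)),   # 0 = MC
                                np.ones(len(train_idx))])     # 1 = real
            w = np.concatenate([mc_weights, np.ones(len(train_idx))])
            clf = GradientBoostingClassifier().fit(X, y, sample_weight=w)
            # Mean "real" probability on the unbiased real test fold: close
            # to 0.5 means the classifier could barely tell MC from real,
            # i.e. a good reweighting.
            scores.append(clf.predict_proba(real_features[test_idx])[:, 1].mean())
        return np.mean(scores), np.std(scores)
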

Output explanation

The return value is a dictionary containing several values; of course, only the values that are set to be evaluated are contained. The keys are:

  • 'score' : The average of all train_similar scores (as KFolding is used, there will be n_folds scores). This is the main score.
  • 'score_std' : The standard deviation of a single score, just for curiosity.
  • 'score_max' : The (average of all) “maximum” scores: the train_similar score, but computed on plain mc instead of reweighted mc. It should be higher than the reweighted score.
  • 'score_max_std' : The standard deviation of a single maximum score, just for curiosity.
  • 'score_pred' : The score of the test_predictions method.
  • 'score_mc_pred' : The score of the test_predictions method, but on the predictions of the plain mc instead of the reweighted mc.
Parameters:
  • mc_data (HEPDataStorage) – The reweighted Monte-Carlo data, assuming the new weights are applied already.
  • real_data (HEPDataStorage) – The real data
  • n_checks (int >= 1) – Number of checks to perform. Has to be <= n_folds
  • n_folds (int > 1) – Number of folds the data will be split into
  • clf (str) – The name of a classifier to be used in classify().
  • test_max (boolean) – If True, also test for the “maximum value” by training on mc/real (instead of reweighted mc/real) and testing on real data. The score for plain mc should (most probably, but not necessarily) be higher than the one for reweighted mc/real.
  • old_mc_weights (array-like or 1) – If test_max is True, the weights the mc distribution had before the reweighting are taken to be old_mc_weights. The default is 1.
  • test_predictions (boolean) – If True, try to distinguish the predictions. An advanced feature whose interpretation is not yet fully understood; it somehow yields a very high ROC.
  • clf_pred (str) – The classifier used to distinguish the predictions. Required for test_predictions.
Returns:

out – A dictionary containing the different scores; see the description above.

Return type:

dict
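
A hedged usage sketch: mc_storage and real_storage are placeholder names for HEPDataStorage instances (the reweighted MC and the real data), and the keys follow the output explanation above.

    from raredecay.tools.metrics import train_similar

    scores = train_similar(mc_storage, real_storage, n_checks=10,
                           n_folds=10, clf='xgb', test_max=True)
    print(scores['score'], '+-', scores['score_std'])  # reweighted mc vs. real
    print(scores['score_max'])  # baseline: plain mc vs. real, expected higher
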

raredecay.tools.metrics.similar_dist(predictions, weights=None, true_y=1, threshold=0.5)[source]

Metric to evaluate the predictions on one label only, for the similarity test.

This metric is used inside the mayou_score.

Parameters:
  • predictions (array) – The predictions
  • weights (array-like) – The weights for the predictions
  • true_y ({0, 1}) – The “true” label of the data
  • threshold (float) – The threshold on a prediction to decide whether a point belongs to 0 or 1.
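
Since the exact formula is not spelled out here, the following is only one plausible reading (a hedged sketch, not raredecay's implementation): the weighted fraction of predictions that fall on the true_y side of the threshold.

    import numpy as np

    # Hedged sketch; the actual raredecay metric may differ.
    def similar_dist_sketch(predictions, weights=None, true_y=1, threshold=0.5):
        predictions = np.asarray(predictions, dtype=float)
        if weights is None:
            weights = np.ones_like(predictions)
        labels = (predictions > threshold).astype(int)  # 1 above threshold, else 0
        return np.average(labels == true_y, weights=weights)

    print(similar_dist_sketch([0.9, 0.7, 0.6, 0.2]))  # 3 of 4 above 0.5 -> 0.75
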
raredecay.tools.metrics.mayou_score(mc_data, real_data, features=None, old_mc_weights=1, clf='xgb', splits=2, n_folds=10)[source]

An experimental score using a “loss” function for data similarity.
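
A hedged usage sketch, mirroring the train_similar call above; mc_storage and real_storage are placeholders for HEPDataStorage instances, and the return value is not documented here.

    from raredecay.tools.metrics import mayou_score

    score = mayou_score(mc_storage, real_storage, clf='xgb',
                        splits=2, n_folds=10)
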