Analysis Guide
This comprehensive guide walks you through creating a complete analysis with RDFAnalyzerCore using the config-based approach.
Table of Contents
- Overview
- Analysis Structure
- Step-by-Step Tutorial
- Config-Based Analysis Pattern
- Working with Data
- Event Selection
- Machine Learning Integration
- Corrections and Scale Factors
- Histogramming
- Systematics
- Managing Event Weights
- Analysis Regions
- Cutflow and Event Selection Monitoring
- Complete Example
- Advanced Topics
- Friend Trees
Overview
RDFAnalyzerCore uses a config-driven architecture where:
- Core behavior is controlled by text configuration files
- Plugin managers handle common tasks (BDTs, corrections, triggers)
- Analysis-specific logic is written in C++
- The framework orchestrates everything through the Analyzer class
This approach separates configuration from code, making analyses more maintainable and reproducible.
Analysis Structure
A typical analysis repository has this structure:
MyAnalysis/
├── CMakeLists.txt # Build configuration
├── analysis.cc # Main analysis code
├── cfg/ # Configuration directory
│ ├── main_config.txt # Main configuration
│ ├── output.txt # Branches to save
│ ├── bdts.txt # BDT models (optional)
│ ├── onnx_models.txt # Neural networks (optional)
│ ├── corrections.txt # Scale factors (optional)
│ └── triggers.txt # Trigger configuration (optional)
├── aux/ # Auxiliary files
│ ├── models/ # ML models
│ └── corrections/ # Correction files
└── scripts/ # Helper scripts
└── submit_jobs.py
Step-by-Step Tutorial
Step 1: Create Analysis Directory
cd analyses
mkdir MyAnalysis
cd MyAnalysis
Step 2: Create CMakeLists.txt
# Tell CMake this is an analysis
add_executable(myanalysis analysis.cc)
# Link against the core framework
target_link_libraries(myanalysis
PRIVATE
RDFCore
ROOT::ROOTDataFrame
ROOT::ROOTVecOps
)
# Set output directory
set_target_properties(myanalysis PROPERTIES
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/analyses/MyAnalysis"
)
Step 3: Create Main Configuration File
Create cfg/main_config.txt:
# ============== Input/Output ==============
fileList=/path/to/data.root
saveFile=output/myanalysis.root
metaFile=output/histograms.root
saveTree=Events
# ============== Performance ==============
threads=-1
batch=false
# ============== Output Branches ==============
saveConfig=cfg/output.txt
# ============== Plugins ==============
# Add as needed for your analysis
# bdtConfig=cfg/bdts.txt
# onnxConfig=cfg/onnx_models.txt
# correctionConfig=cfg/corrections.txt
# triggerConfig=cfg/triggers.txt
Step 4: Create Output Branch Configuration
Create cfg/output.txt listing branches to keep:
selected_muon_pt
selected_muon_eta
selected_jet_pt
selected_jet_mass
event_weight
Step 5: Write Analysis Code
Create analysis.cc:
#include <analyzer.h>
#include <ManagerFactory.h>
#include <ConfigurationManager.h>
#include <DataManager.h>
#include <iostream>
int main(int argc, char **argv) {
// Validate arguments
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " <config_file>" << std::endl;
return 1;
}
// Create analyzer with configuration file
Analyzer analyzer(argv[1]);
// Define your analysis variables and cuts here
// (see next sections for examples)
// Save results
analyzer.save();
return 0;
}
Step 6: Build and Run
cd /path/to/RDFAnalyzerCore
source build.sh
cd build/analyses/MyAnalysis
./myanalysis cfg/main_config.txt
Config-Based Analysis Pattern
The config-based approach follows this pattern:
1. Configuration Setup
All behavioral configuration is in text files:
// Configuration is loaded by the Analyzer constructor
Analyzer analyzer("cfg/main_config.txt");
The Analyzer automatically:
- Loads the main configuration
- Initializes plugin managers based on config
- Sets up the data pipeline
- Configures output sinks
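The key=value format used by main_config.txt can be sketched with a minimal parser. This is an illustration of the file format only, not the framework's actual configuration loader; the function name and the skipping of `#` comments and blank lines are assumptions based on the example config shown in Step 3.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical sketch: parse "key=value" lines like those in main_config.txt.
// Lines starting with '#' and blank lines are ignored, mirroring the comments
// in the example configuration. This is NOT the framework's parser.
std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> cfg;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;   // skip comments/blanks
        auto eq = line.find('=');
        if (eq == std::string::npos) continue;          // skip malformed lines
        cfg[line.substr(0, eq)] = line.substr(eq + 1);  // split at first '='
    }
    return cfg;
}
```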
2. Analysis Logic in C++
Your C++ code defines the analysis-specific logic:
// Define variables
analyzer.Define("my_variable", myFunction, {"input1", "input2"});
// Apply cuts
analyzer.Filter("my_cut", cutFunction, {"variable"});
// Book histograms (if using NDHistogramManager)
// Apply ML models (if using OnnxManager/BDTManager)
3. Plugin Integration
Plugins are configured via config files but accessed in C++:
// Get plugin from analyzer
auto* onnxManager = analyzer.getPlugin<IOnnxManager>("onnx");
// Use plugin
onnxManager->applyAllModels();
4. Execution
// Trigger execution and save outputs
analyzer.save();
This separation means:
- Config changes don’t require recompilation
- Code changes don’t affect configuration
- Reproducibility is ensured by versioning configs
- Batch jobs can use different configs with same binary
Working with Data
Defining Variables
The Analyzer provides methods to define new DataFrame columns:
Simple Scalar Variable
analyzer.Define("weight", []() {
return 1.0;
}, {});
Derived Variable
analyzer.Define("jet_pt_gev",
[](float pt_mev) { return pt_mev / 1000.0; },
{"jet_pt"}
);
Variable with Multiple Inputs
analyzer.Define("delta_phi",
[](float phi1, float phi2) {
float dphi = phi1 - phi2;
while (dphi > M_PI) dphi -= 2*M_PI;
while (dphi < -M_PI) dphi += 2*M_PI;
return dphi;
},
{"lepton_phi", "jet_phi"}
);
Vector Variables
analyzer.Define("good_jets",
[](const ROOT::VecOps::RVec<float>& pt,
const ROOT::VecOps::RVec<float>& eta) {
ROOT::VecOps::RVec<int> indices;
for (size_t i = 0; i < pt.size(); i++) {
if (pt[i] > 25.0 && std::abs(eta[i]) < 2.5) {
indices.push_back(i);
}
}
return indices;
},
{"jet_pt", "jet_eta"}
);
Extracting Vector Elements
// Get leading jet pT
analyzer.Define("leading_jet_pt",
[](const ROOT::VecOps::RVec<float>& jet_pt) {
return jet_pt.size() > 0 ? jet_pt[0] : -1.0f;
},
{"jet_pt"}
);
Per-Sample Variables
Define variables that depend on the input sample:
analyzer.DefinePerSample("sample_weight",
[](unsigned int slot, const ROOT::RDF::RSampleInfo& info) {
std::string sampleName = info.AsString();
if (sampleName.find("ttbar") != std::string::npos) {
return 1.2f; // Scale factor for ttbar
}
return 1.0f;
}
);
Event Selection
Applying Filters
Filters reduce the dataset to events passing selection criteria:
// Single cut
analyzer.Filter("pt_cut",
[](float pt) { return pt > 25.0; },
{"jet_pt"}
);
// Multiple conditions
analyzer.Filter("lepton_selection",
[](int n_leptons, float leading_pt) {
return n_leptons >= 2 && leading_pt > 30.0;
},
{"n_leptons", "leading_lepton_pt"}
);
Filter with Named Cut
Named filters appear in cut flow reports:
analyzer.Filter("trigger_passed",
[](bool trigger) { return trigger; },
{"HLT_Mu24"}
);
analyzer.Filter("jet_selection",
[](int n_jets) { return n_jets >= 4; },
{"n_jets"}
);
Vector-Based Selection
// Select events with at least 2 good muons
analyzer.Define("good_muons",
[](const ROOT::VecOps::RVec<float>& pt,
const ROOT::VecOps::RVec<float>& eta,
const ROOT::VecOps::RVec<int>& id) {
// Nonzero returns the indices of elements for which the mask is true
return ROOT::VecOps::Nonzero(
pt > 25.0 && abs(eta) < 2.4 && id == 1
);
},
{"muon_pt", "muon_eta", "muon_id"}
);
analyzer.Filter("two_muons",
[](const ROOT::VecOps::RVec<std::size_t>& good_muons) {
return good_muons.size() == 2;
},
{"good_muons"}
);
Machine Learning Integration
Using ONNX Models (Recommended)
ONNX models provide a standard format for ML models from any framework.
1. Configure ONNX Models
Create cfg/onnx_models.txt:
file=aux/models/classifier.onnx name=ml_score inputVariables=jet_pt,jet_eta,jet_btag runVar=has_jets
Add to main config:
onnxConfig=cfg/onnx_models.txt
2. Define Input Features
// Define all required input features
analyzer.Define("jet_pt", /*...*/);
analyzer.Define("jet_eta", /*...*/);
analyzer.Define("jet_btag", /*...*/);
analyzer.Define("has_jets",
[](int n_jets) { return n_jets > 0; },
{"n_jets"}
);
3. Apply Model
// Get ONNX manager
auto* onnxMgr = analyzer.getPlugin<IOnnxManager>("onnx");
// Apply all configured models
onnxMgr->applyAllModels();
// Now "ml_score" is available as a DataFrame column
4. Use Model Output
// Apply ML-based selection
analyzer.Filter("ml_selection",
[](float score) { return score > 0.8; },
{"ml_score"}
);
// Or use in variables
analyzer.Define("final_weight",
[](float weight, float ml_score) {
return weight * ml_score;
},
{"event_weight", "ml_score"}
);
Multi-Output Models
For models with multiple outputs (e.g., ParticleTransformer):
// Model "transformer" creates: transformer_output0, transformer_output1, ...
onnxMgr->applyModel("transformer");
// Use individual outputs
analyzer.Define("is_top_jet",
[](float score0) { return score0 > 0.7; },
{"transformer_output0"}
);
analyzer.Define("is_qcd_jet",
[](float score1) { return score1 > 0.5; },
{"transformer_output1"}
);
Using BDTs (FastForest)
Similar to ONNX but for FastForest BDT files:
Create cfg/bdts.txt:
file=aux/bdts/signal_bdt.txt name=bdt_score inputVariables=var1,var2,var3 runVar=pass_presel
auto* bdtMgr = analyzer.getPlugin<IBDTManager>("bdt");
bdtMgr->applyAllModels();
Using SOFIE Models (Maximum Performance)
SOFIE (System for Optimized Fast Inference code Emit) generates compiled C++ code from ONNX models for maximum performance.
Key Difference: SOFIE models are compiled at build time, not loaded at runtime.
1. Generate SOFIE C++ Code
import ROOT
from ROOT import TMVA
# Convert ONNX to SOFIE C++ code
parser = TMVA.Experimental.SOFIE.RModelParser_ONNX()
model = parser.Parse("classifier.onnx")
model.Generate()
model.OutputGenerated("ClassifierModel.hxx")
2. Create Wrapper Function
#include "ClassifierModel.hxx" // SOFIE-generated
std::vector<float> classifierInference(const std::vector<float>& input) {
static TMVA_SOFIE_ClassifierModel::Session session;
return session.infer(input.data());
}
3. Register and Use Model
auto* sofieMgr = analyzer.getPlugin<ISofieManager>("sofie");
// Register model manually (not auto-loaded like ONNX/BDT)
auto inferenceFunc = std::make_shared<SofieInferenceFunction>(classifierInference);
sofieMgr->registerModel("classifier", inferenceFunc,
{"jet_pt", "jet_eta", "jet_phi"}, "has_jet");
// Apply model
sofieMgr->applyModel("classifier");
When to use SOFIE:
- Maximum speed is critical
- Model is finalized (won’t change frequently)
- Can rebuild between updates
When to use ONNX:
- Model still being developed
- Need runtime flexibility
- Want to swap models without recompiling
See: SOFIE Implementation Guide for complete details.
Creating ONNX Models for Analysis
From a trained scikit-learn model:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Convert trained model
initial_type = [('float_input', FloatTensorType([None, n_features]))]
onnx_model = convert_sklearn(clf, initial_types=initial_type)
# Save
with open("classifier.onnx", "wb") as f:
f.write(onnx_model.SerializeToString())
From PyTorch:
import torch
# Export trained model
dummy_input = torch.randn(1, n_features)
torch.onnx.export(model, dummy_input, "classifier.onnx",
input_names=['input'],
output_names=['output'])
Corrections and Scale Factors
Using CorrectionManager
CorrectionManager applies corrections using the correctionlib format.
1. Configure Corrections
Create cfg/corrections.txt:
file=aux/corrections/muon_sf.json correctionName=NUM_TightID_DEN_TrackerMuons name=muon_id_sf inputVariables=muon_pt,muon_eta
file=aux/corrections/muon_sf.json correctionName=NUM_TightRelIso_DEN_TightIDandIPCut name=muon_iso_sf inputVariables=muon_pt,muon_eta
Add to main config:
correctionConfig=cfg/corrections.txt
2. Apply Corrections
The CorrectionManager automatically applies all configured corrections, creating output columns.
// Corrections are now available as DataFrame columns
analyzer.Define("total_muon_sf",
[](float id_sf, float iso_sf) {
return id_sf * iso_sf;
},
{"muon_id_sf", "muon_iso_sf"}
);
analyzer.Define("event_weight",
[](float gen_weight, float total_sf) {
return gen_weight * total_sf;
},
{"genWeight", "total_muon_sf"}
);
Creating Correction Files
Use the correctionlib Python library:
import correctionlib
import correctionlib.schemav2 as schema
# Define a correction
corr = schema.Correction(
name="muon_id_sf",
version=1,
inputs=[
schema.Variable(name="pt", type="real"),
schema.Variable(name="eta", type="real"),
],
output=schema.Variable(name="weight", type="real"),
data=... # Your correction data
)
# Create correction set
cset = schema.CorrectionSet(
schema_version=2,
corrections=[corr]
)
# Save to file
with open("corrections.json", "w") as f:
f.write(cset.json())
Jet Energy Corrections (JES/JER) with JetEnergyScaleManager
For CMS-style Jet Energy Scale (JES) and Jet Energy Resolution (JER)
corrections, use the dedicated JetEnergyScaleManager plugin. It
integrates with CorrectionManager for correctionlib evaluations and with
PhysicsObjectCollection for clean jet-collection management.
Full reference: Jet Energy Corrections Guide
Typical CMS NanoAOD JEC workflow
#include <JetEnergyScaleManager.h>
#include <PhysicsObjectCollection.h>
auto* jes = analyzer.getPlugin<JetEnergyScaleManager>("jes");
auto* cm = analyzer.getPlugin<CorrectionManager>("corrections");
// 1. Declare jet/MET column names.
jes->setJetColumns("Jet_pt", "Jet_eta", "Jet_phi", "Jet_mass");
jes->setMETColumns("MET_pt", "MET_phi");
// 2. Build a selected jet collection (any user selection).
analyzer.Define("goodJets",
[](const RVec<float>& pt, const RVec<float>& eta,
const RVec<float>& phi, const RVec<float>& mass) {
return PhysicsObjectCollection(pt, eta, phi, mass,
(pt > 25.f) && (abs(eta) < 2.4f));
},
{"Jet_pt", "Jet_eta", "Jet_phi", "Jet_mass"}
);
// 3. Strip NanoAOD JEC → raw pT.
jes->removeExistingCorrections("Jet_rawFactor");
// 4. Apply new L1L2L3 JEC via correctionlib.
jes->applyCorrectionlib(*cm, "jec_l1l2l3", {"L3Residual"},
"Jet_pt_raw", "Jet_pt_jec");
// 5. Register and apply CMS JES systematic set (reduced: "Total" only).
jes->registerSystematicSources("reduced", {"Total"});
jes->applySystematicSet(*cm, "jes_unc", "reduced",
"Jet_pt_jec", "Jet_pt_jes");
// 6. Propagate JEC to MET (Type-1).
jes->propagateMET("MET_pt", "MET_phi",
"Jet_pt_raw", "Jet_pt_jec",
"MET_pt_jec", "MET_phi_jec");
// 7. Produce corrected PhysicsObjectCollection columns.
jes->setInputJetCollection("goodJets");
jes->defineCollectionOutput("Jet_pt_jec", "goodJets_jec");
jes->defineVariationCollections("goodJets_jec", "goodJets",
"goodJets_variations");
// After execute():
// "goodJets_jec" PhysicsObjectCollection (nominal JEC)
// "goodJets_TotalUp" PhysicsObjectCollection (JES up)
// "goodJets_TotalDown" PhysicsObjectCollection (JES down)
// "goodJets_variations" PhysicsObjectVariationMap (all of the above)
Applying the full CMS JES source set
jes->registerSystematicSources("full", {
"AbsoluteCal", "AbsoluteScale", "AbsoluteMPFBias",
"FlavorQCD", "Fragmentation", "PileUpDataMC",
"PileUpPtRef", "RelativeFSR", "RelativeJEREC1",
"RelativeJEREC2", "RelativeJERHF",
"RelativePtBB", "RelativePtEC1", "RelativePtEC2",
"RelativePtHF", "RelativeBal", "RelativeSample"
});
jes->applySystematicSet(*cm, "jes_unc", "full", "Jet_pt_jec", "Jet_pt_jes");
// Creates 34 variation columns (17 sources × up/down)
Histogramming
Using NDHistogramManager
NDHistogramManager provides sophisticated N-dimensional histogramming with support for systematics and regions.
Basic Histogram
#include <plots.h> // For histInfo, selectionInfo
// Get histogram manager
auto* histMgr = analyzer.getPlugin<INDHistogramManager>("histogram");
// Define histogram metadata
std::vector<histInfo> hists = {
histInfo("h_jet_pt", "jet_pt", "Jet p_{T}", "event_weight", 50, 0.0, 500.0),
histInfo("h_jet_eta", "jet_eta", "Jet #eta", "event_weight", 50, -2.5, 2.5),
};
// Define selection axes (regions, channels, etc.)
std::vector<selectionInfo> selections = {
selectionInfo("channel", 2, 0, 2), // 2 channels
selectionInfo("region", 3, 0, 3), // 3 regions
};
// Names for selection bins
std::vector<std::vector<std::string>> regionNames = {
{"ee", "mumu"}, // Channel names
{"SR", "CR_top", "CR_wjets"} // Region names
};
// Book histograms
histMgr->bookND(hists, selections, "", regionNames);
Defining Selection Variables
// Define channel variable (0 = ee, 1 = mumu)
analyzer.Define("channel",
[](int leading_pdg, int subleading_pdg) {
if (abs(leading_pdg) == 11 && abs(subleading_pdg) == 11) return 0;
if (abs(leading_pdg) == 13 && abs(subleading_pdg) == 13) return 1;
return -1;
},
{"leading_lepton_pdg", "subleading_lepton_pdg"}
);
// Define region variable (0 = SR, 1 = CR_top, 2 = CR_wjets)
analyzer.Define("region",
[](float met, int n_bjets, int n_jets) {
if (met > 50 && n_bjets >= 1) return 0; // SR
if (met < 50 && n_bjets >= 2) return 1; // CR_top
if (met < 50 && n_bjets == 0 && n_jets >= 2) return 2; // CR_wjets
return -1;
},
{"met", "n_bjets", "n_jets"}
);
Saving Histograms
// Trigger execution by accessing histogram pointers
auto& histos = histMgr->GetHistos();
for (auto& h : histos) {
h.GetPtr(); // Force computation
}
// Save histograms to file
std::vector<std::vector<histInfo>> fullHistList = {hists};
histMgr->saveHists(fullHistList, regionNames);
Vector-Based Histograms
Fill one histogram entry per vector element:
// This fills one entry per jet
histInfo("h_all_jets_pt", "jet_pt", "All Jets p_{T}", "event_weight", 50, 0, 500)
Make sure jet_pt is a vector column. The histogram manager automatically expands scalars to match vector sizes.
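The scalar-expansion behaviour can be pictured as follows. This is a conceptual sketch, not NDHistogramManager's implementation: a per-event scalar weight is broadcast so that every element of the vector column is filled with the same weight.

```cpp
#include <utility>
#include <vector>

// Illustrative only (names are hypothetical): when a vector column such as
// jet_pt is histogrammed with a scalar weight column, one (value, weight)
// fill is produced per vector element, all sharing the event weight.
std::vector<std::pair<float, float>> expandFill(const std::vector<float>& values,
                                                float eventWeight) {
    std::vector<std::pair<float, float>> fills;
    for (float v : values) fills.push_back({v, eventWeight});
    return fills;
}
```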
Systematics
SystematicManager Integration
SystematicManager tracks systematic variations and propagates them through the analysis.
Defining Variables with Systematics
// Get systematic manager
auto* sysMgr = analyzer.getSystematicManager();
// Define variable with systematic variations
// dataManager here is the framework's DataManager (declared in DataManager.h),
// whose Define overload accepts a SystematicManager for automatic variations
dataManager->Define("jet_pt_corrected",
[](const ROOT::VecOps::RVec<float>& pt,
const ROOT::VecOps::RVec<float>& jes_up,
const ROOT::VecOps::RVec<float>& jes_down,
const std::string& systematic) {
if (systematic == "jes_up") return pt * jes_up;
if (systematic == "jes_down") return pt * jes_down;
return pt; // Nominal
},
{"jet_pt", "jet_jes_up", "jet_jes_down"},
*sysMgr // Pass systematic manager
);
Registering Systematics
// Register systematic variations
sysMgr->registerSystematic("jes_up");
sysMgr->registerSystematic("jes_down");
sysMgr->registerSystematic("jer_up");
sysMgr->registerSystematic("jer_down");
Histograms with Systematics
When booking histograms with NDHistogramManager, an automatic systematic axis is added:
// Histograms are automatically filled for all systematic variations
// Access via: hist[channel][region][systematic]
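One way to picture the hist[channel][region][systematic] access pattern is as a row-major flattened index over the three axes. The actual internal storage of NDHistogramManager may differ; this sketch only illustrates how a three-axis booking maps onto a flat array of histograms.

```cpp
#include <cstddef>

// Row-major flattening of (channel, region, systematic) into one index.
// Purely illustrative — not NDHistogramManager's internal layout.
std::size_t flatIndex(std::size_t channel, std::size_t region, std::size_t syst,
                      std::size_t nRegions, std::size_t nSysts) {
    return (channel * nRegions + region) * nSysts + syst;
}
```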
Managing Event Weights
WeightManager is a plugin that assembles the nominal event weight and any systematic weight variations from individually registered components. It replaces ad-hoc per-event weight expressions scattered across the analysis with a single, audited weight definition.
Base Weight: Generator Weight and Weight Sum Tracking
In Monte Carlo analyses the first thing to include is the per-event generator
weight (e.g. genWeight in NanoAOD). This must be registered both in the
config file and in your analysis code:
- Config file — tell CounterService which branch holds the generator weight so that it can accumulate the sum-of-weights for normalisation:
counterWeightBranch = genWeight
CounterService then writes two histograms to the meta ROOT file:
  - counter_weightSum_<sample> — sum of genWeight over all events (used as sumW when computing the luminosity normalisation)
  - counter_weightSignSum_<sample> — sum of sign(genWeight), which is used by StitchingDerivationTask to compute per-bin stitching scale factors
- Analysis code — add genWeight as the first scale factor so that the final weight column includes it:
wm->addScaleFactor("genWeight", "genWeight");
For samples with negative-weight events (common in NLO generators) use
sign(genWeight) instead of genWeight itself to avoid sign-weighted distortions
while still properly normalising the sample. In that case set
counterWeightBranch = genWeight in the config (so CounterService tracks the
true sum) and add the sign column as the scale factor:
// Define sign-of-weight column
analyzer.Define("genWeightSign", [](float w){ return w >= 0.f ? 1.f : -1.f; }, {"genWeight"});
wm->addScaleFactor("genWeight", "genWeightSign");
Interaction with Stitching Weights
For inclusive+exclusive sample sets (e.g. DY + DY high-pT), per-event stitching
scale factors are derived by StitchingDerivationTask from the
counter_intWeightSignSum_* histograms written by CounterService. To enable
this:
- In the config file, set counterIntWeightBranch to the integer stitching-bin column (produced by your analysis or CorrectionManager):
counterIntWeightBranch = stitchBin
- After the stitching weights are derived, load the resulting correctionlib JSON and apply it as a per-event scale factor:
// Apply the derived stitching weight
wm->addScaleFactor("stitchSF", "stitch_weight_col");
See LAW_TASKS.md — StitchingDerivationTask for details on deriving these weights from LAW.
Registering WeightManager
Add a WeightManager plugin to your analysis and retrieve it via
getPlugin<WeightManager>:
#include <plugins/WeightManager/WeightManager.h>
auto* wm = analyzer.getPlugin<WeightManager>("weights");
Adding Weight Components
Register the generator weight first, then all additional per-event scale
factor columns and scalar normalization factors before calling
defineNominalWeight():
// Base generator weight — must come first.
wm->addScaleFactor("genWeight", "genWeight");
// Additional per-event scale factor columns (must already exist on the dataframe).
wm->addScaleFactor("leptonSF", "lepton_sf_col");
wm->addScaleFactor("pileupSF", "pu_weight_col");
// Scalar normalization applied uniformly to every event (lumi × xsec / sumW).
// sumW should come from the counterWeightSum histogram in a previous skim.
double normFactor = lumi * xsec / sumW;
wm->addNormalization("lumi_xsec", normFactor);
Registering Weight Variations
For each systematic that changes a scale factor, register the up/down column pair:
wm->addWeightVariation("bTagSF", "bTagSF_up_col", "bTagSF_dn_col");
wm->addWeightVariation("pileupSF", "pu_weight_up", "pu_weight_dn");
Defining Weight Columns
Schedule the weight columns to be defined on the dataframe. The columns
are created just before the event loop starts (execute()):
// Nominal weight: product of all SFs and normalizations.
wm->defineNominalWeight("weight_nominal");
// Varied weights: replace the named SF component with its up/down variant.
wm->defineVariedWeight("bTagSF", "up", "weight_bTagSF_up");
wm->defineVariedWeight("bTagSF", "down", "weight_bTagSF_dn");
wm->defineVariedWeight("pileupSF","up", "weight_pileupSF_up");
wm->defineVariedWeight("pileupSF","down", "weight_pileupSF_dn");
Retrieve the defined column names for use when booking histograms:
std::string nomCol = wm->getNominalWeightColumn(); // "weight_nominal"
std::string upCol = wm->getWeightColumn("bTagSF", "up"); // "weight_bTagSF_up"
Audit Information
After analyzer.run() completes, WeightManager writes per-component
statistics (sum, mean, min, max, negative-event count) to the meta ROOT
output file and to the analysis logger. You can also inspect the entries
programmatically:
for (const auto& entry : wm->getAuditEntries()) {
std::cout << entry.name
<< " sum=" << entry.sumWeights
<< " mean=" << entry.meanWeight
<< " negEvents=" << entry.negativeCount << "\n";
}
Complete WeightManager Example
#include <analyzer.h>
#include <plugins/WeightManager/WeightManager.h>
void myAnalysis() {
Analyzer analyzer("config.txt");
auto* wm = analyzer.getPlugin<WeightManager>("weights");
// Define SF columns on the dataframe first.
analyzer.Define("lepton_sf_col", computeLeptonSF, {"lep_pt", "lep_eta"});
analyzer.Define("bTagSF_col", computeBTagSF, {"jet_pt", "jet_eta"});
analyzer.Define("bTagSF_up_col", computeBTagSFUp, {"jet_pt", "jet_eta"});
analyzer.Define("bTagSF_dn_col", computeBTagSFDn, {"jet_pt", "jet_eta"});
// Register components.
wm->addScaleFactor("leptonSF", "lepton_sf_col");
wm->addScaleFactor("bTagSF", "bTagSF_col");
wm->addNormalization("lumi_xsec", 59.7 * 831.76 / 1e8);
// Register systematic variations.
wm->addWeightVariation("bTagSF", "bTagSF_up_col", "bTagSF_dn_col");
// Schedule column definitions.
wm->defineNominalWeight("weight_nominal");
wm->defineVariedWeight("bTagSF", "up", "weight_bTagSF_up");
wm->defineVariedWeight("bTagSF", "down", "weight_bTagSF_dn");
// Use the nominal weight column when booking histograms.
std::string nomCol = wm->getNominalWeightColumn();
analyzer.run();
}
Analysis Regions
RegionManager declares named analysis regions with an optional hierarchy, providing filtered RDataFrame nodes that other plugins (histogram and cutflow managers) can consume. The main analysis dataframe is not modified; filtered branches are built independently inside the plugin.
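The hierarchy semantics can be modelled in a few lines: an event belongs to a region when its own boolean flag and the flags of every ancestor region are true. This is a toy model of the behaviour described here, not RegionManager's internals; the struct and function names are illustrative.

```cpp
#include <map>
#include <string>

// Toy model of the region hierarchy: each region has a boolean flag column
// and an optional parent (empty string for root regions).
struct RegionDef { std::string flag; std::string parent; };

// An event is in a region iff its flag and all ancestor flags are true.
bool inRegion(const std::map<std::string, RegionDef>& regions,
              const std::map<std::string, bool>& eventFlags,
              const std::string& name) {
    const RegionDef& def = regions.at(name);
    if (!eventFlags.at(def.flag)) return false;
    return def.parent.empty() || inRegion(regions, eventFlags, def.parent);
}
```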
Registering RegionManager
#include <plugins/RegionManager/RegionManager.h>
auto* rm = analyzer.getPlugin<RegionManager>("regions");
Defining Boolean Filter Columns
Define all boolean filter columns on the dataframe before calling
declareRegion() for the first time, because the base node is captured at
that point:
analyzer.Define("isPresel", passPreselection, {"lep_pt", "met"});
analyzer.Define("isSignalRegion", passSignalSelection, {"mva_score"});
analyzer.Define("isControlRegion", passControlSelection, {"mva_score"});
analyzer.Define("isTightSignal", passTightSignal, {"mva_score", "bdt_score"});
Declaring Regions
Declare root regions (no parent) first, then child regions:
// Root region — applied to the base dataframe.
rm->declareRegion("presel", "isPresel");
// Child regions — filtered on top of their parent.
rm->declareRegion("signal", "isSignalRegion", "presel");
rm->declareRegion("control", "isControlRegion", "presel");
// Grandchild region.
rm->declareRegion("tight_signal", "isTightSignal", "signal");
Retrieving Region DataFrames
ROOT::RDF::RNode signalDf = rm->getRegionDataFrame("signal");
ROOT::RDF::RNode controlDf = rm->getRegionDataFrame("control");
Validating the Hierarchy
validate() checks for cycles, missing parent references, and duplicate
names and returns a list of error strings (empty means valid):
auto errors = rm->validate();
if (!errors.empty()) {
for (const auto& e : errors) std::cerr << e << "\n";
}
At analysis startup initialize() runs the same check and throws
std::runtime_error on failure. finalize() writes a region-summary
entry to the meta ROOT output file.
Using Regions with NDHistogramManager
ndh->bindToRegionManager(rm);
// NDHistogramManager now fills histograms separately for each declared region.
Using Regions with CutflowManager
cfm->bindToRegionManager(rm);
// CutflowManager books per-region count actions in a single event-loop pass.
Complete RegionManager Example
The typical pattern is to declare regions and then let the bound plugins (NDHistogramManager, CutflowManager) handle the per-region filling automatically. You do not need to retrieve raw per-region DataFrames; plugins that are bound to the RegionManager iterate over all regions internally.
#include <analyzer.h>
#include <plugins/RegionManager/RegionManager.h>
#include <plugins/NDHistogramManager/NDHistogramManager.h>
#include <plugins/CutflowManager/CutflowManager.h>
int main(int argc, char* argv[]) {
Analyzer analyzer(argv[1]);
auto* rm = analyzer.getPlugin<RegionManager>("regions");
auto* ndh = analyzer.getPlugin<NDHistogramManager>("NDHistogramManager");
auto* cfm = analyzer.getPlugin<CutflowManager>("cutflow");
// Step 1 — define boolean selection columns.
analyzer.Define("isPresel", passPreselection, {"lep_pt", "met"});
analyzer.Define("isSignalRegion", passSignalSelection, {"mva_score"});
analyzer.Define("isControlRegion", passControlSelection, {"mva_score"});
// Step 2 — declare regions.
rm->declareRegion("presel", "isPresel");
rm->declareRegion("signal", "isSignalRegion", "presel");
rm->declareRegion("control", "isControlRegion", "presel");
// Step 3 — bind plugins so they operate per-region automatically.
ndh->bindToRegionManager(rm); // each histogram gains a region axis
cfm->bindToRegionManager(rm); // cutflow tables produced per region
// Step 4 — register cuts (applies to the full dataframe; cfm tracks
// per-region counts separately via the bound RegionManager).
cfm->addCut("presel", "isPresel");
cfm->addCut("signal", "isSignalRegion");
// Step 5 — book config-driven histograms (fills all declared regions).
ndh->bookConfigHistograms();
ndh->saveHists();
// Step 6 — execute.
analyzer.save();
return 0;
}
Cutflow and Event Selection Monitoring
CutflowManager computes a sequential cutflow (events remaining after each successive cut) and an N-1 table (events passing all cuts except one) in a single event-loop pass.
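Both tables can be computed from the per-event boolean cut decisions in one pass, which is what makes the single event loop possible. The sketch below mirrors that computation with illustrative names and types; it is not CutflowManager's API.

```cpp
#include <cstddef>
#include <vector>

// Illustrative single-pass computation of a sequential cutflow and an N-1
// table from per-event boolean cut decisions (not the plugin's internals).
struct CutflowResult {
    std::vector<long long> sequential;  // events after cut 0, cuts 0-1, ...
    std::vector<long long> nMinusOne;   // events passing all cuts except i
};

CutflowResult computeCutflow(const std::vector<std::vector<bool>>& events) {
    if (events.empty()) return {};
    const std::size_t nCuts = events.front().size();
    CutflowResult r{std::vector<long long>(nCuts, 0),
                    std::vector<long long>(nCuts, 0)};
    for (const auto& cuts : events) {
        bool allSoFar = true;
        for (std::size_t i = 0; i < nCuts; ++i) {   // sequential counts
            allSoFar = allSoFar && cuts[i];
            if (allSoFar) ++r.sequential[i];
        }
        for (std::size_t i = 0; i < nCuts; ++i) {   // N-1 counts
            bool passOthers = true;
            for (std::size_t j = 0; j < nCuts; ++j)
                if (j != i && !cuts[j]) { passOthers = false; break; }
            if (passOthers) ++r.nMinusOne[i];
        }
    }
    return r;
}
```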
Registering CutflowManager
#include <plugins/CutflowManager/CutflowManager.h>
auto* cfm = analyzer.getPlugin<CutflowManager>("cutflow");
Adding Cuts
Define all boolean cut columns on the dataframe before the first
addCut() call, since the base node (needed for N-1) is captured then.
addCut() also applies each filter to the main analysis dataframe:
// Define boolean columns first.
analyzer.Define("leptonPtCut", passLeptonPt, {"lep_pt"});
analyzer.Define("metCut", passMET, {"met"});
analyzer.Define("bTagCut", passBTag, {"jet_btag"});
// Register cuts in selection order.
cfm->addCut("lepton_pT", "leptonPtCut");
cfm->addCut("MET", "metCut");
cfm->addCut("b-tagging", "bTagCut");
All addCut() calls must be made before analyzer.run() (i.e. before
execute() is triggered by the framework).
Binding to a RegionManager (Optional)
When a RegionManager is bound, execute() books per-region count actions
alongside the global ones, so global and per-region cutflows are computed in
a single pass:
cfm->bindToRegionManager(rm);
Output Written Automatically
After analyzer.run():
- cutflow — TH1D with sequential event counts after each cut.
- cutflow_nminus1 — TH1D with N-1 counts.
- cutflow_regions — TH2D of regions × cuts (only when bound to a RegionManager).
All three histograms are written to the meta ROOT output file and a summary table is printed via the analysis logger.
Accessing Results Programmatically
// After analyzer.run().
// Global cutflow.
for (const auto& [name, count] : cfm->getCutflowCounts()) {
std::cout << name << ": " << count << " events\n";
}
// N-1 table.
for (const auto& [name, count] : cfm->getNMinusOneCounts()) {
std::cout << name << " (N-1): " << count << " events\n";
}
// Per-region cutflow (requires bindToRegionManager()).
for (const auto& [name, count] : cfm->getRegionCutflowCounts("signal")) {
std::cout << "signal / " << name << ": " << count << " events\n";
}
Complete CutflowManager Example
#include <analyzer.h>
#include <plugins/RegionManager/RegionManager.h>
#include <plugins/CutflowManager/CutflowManager.h>
void myAnalysis() {
Analyzer analyzer("config.txt");
auto* rm = analyzer.getPlugin<RegionManager>("regions");
auto* cfm = analyzer.getPlugin<CutflowManager>("cutflow");
// Define all boolean columns before the first addCut().
analyzer.Define("leptonPtCut", passLeptonPt, {"lep_pt"});
analyzer.Define("metCut", passMET, {"met"});
analyzer.Define("bTagCut", passBTag, {"jet_btag"});
// Register cuts (also applies filters to the main dataframe).
cfm->addCut("lepton_pT", "leptonPtCut");
cfm->addCut("MET", "metCut");
cfm->addCut("b-tagging", "bTagCut");
// Declare regions after all cuts are registered.
analyzer.Define("isSignalRegion", passSignal, {"bdt_score"});
rm->declareRegion("signal", "isSignalRegion");
// Bind for per-region cutflows.
cfm->bindToRegionManager(rm);
analyzer.run();
// Results are written to the meta ROOT file automatically.
}
Complete Example
Here’s a complete analysis putting it all together:
#include <analyzer.h>
#include <ManagerFactory.h>
#include <plots.h>
#include <iostream>
#include <cmath>
// ========== Helper Functions ==========
ROOT::VecOps::RVec<int> selectGoodJets(
const ROOT::VecOps::RVec<float>& pt,
const ROOT::VecOps::RVec<float>& eta,
const ROOT::VecOps::RVec<float>& btag) {
ROOT::VecOps::RVec<int> indices;
for (size_t i = 0; i < pt.size(); i++) {
if (pt[i] > 25.0 && std::abs(eta[i]) < 2.5 && btag[i] > 0.5) {
indices.push_back(i);
}
}
return indices;
}
float calculateMT(float pt, float phi, float met, float met_phi) {
float dphi = phi - met_phi;
while (dphi > M_PI) dphi -= 2*M_PI;
while (dphi < -M_PI) dphi += 2*M_PI;
return std::sqrt(2 * pt * met * (1 - std::cos(dphi)));
}
// ========== Main Analysis ==========
int main(int argc, char **argv) {
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " <config_file>" << std::endl;
return 1;
}
// Initialize analyzer
Analyzer analyzer(argv[1]);
// ===== Define Variables =====
// Good jets selection
analyzer.Define("good_jet_indices", selectGoodJets,
{"jet_pt", "jet_eta", "jet_btag"});
analyzer.Define("n_good_jets",
[](const ROOT::VecOps::RVec<int>& indices) {
return static_cast<int>(indices.size());
},
{"good_jet_indices"});
// Leading jet properties
analyzer.Define("leading_jet_pt",
[](const ROOT::VecOps::RVec<float>& pt,
const ROOT::VecOps::RVec<int>& indices) {
return indices.size() > 0 ? pt[indices[0]] : -1.0f;
},
{"jet_pt", "good_jet_indices"});
// Transverse mass
analyzer.Define("mt_w", calculateMT,
{"lepton_pt", "lepton_phi", "met", "met_phi"});
// ===== Event Selection =====
analyzer.Filter("trigger",
[](bool trig) { return trig; },
{"HLT_IsoMu24"});
analyzer.Filter("one_lepton",
[](int n_lep) { return n_lep == 1; },
{"n_leptons"});
analyzer.Filter("lepton_pt_cut",
[](float pt) { return pt > 30.0; },
{"lepton_pt"});
analyzer.Filter("at_least_4_jets",
[](int n_jets) { return n_jets >= 4; },
{"n_good_jets"});
// ===== Apply ML Models =====
auto* onnxMgr = analyzer.getPlugin<IOnnxManager>("onnx");
if (onnxMgr) {
onnxMgr->applyAllModels();
// Use ML score
analyzer.Define("pass_ml_cut",
[](float score) { return score > 0.7; },
{"ml_score"});
analyzer.Filter("ml_selection",
[](bool pass) { return pass; },
{"pass_ml_cut"});
}
// ===== Apply Corrections =====
analyzer.Define("event_weight",
[](float gen_weight, float lep_sf, float btag_sf) {
return gen_weight * lep_sf * btag_sf;
},
{"genWeight", "lepton_sf", "btag_sf"});
// ===== Define Regions =====
analyzer.Define("region",
[](float mt, int n_bjets) {
if (mt > 50 && n_bjets >= 2) return 0; // Signal region
if (mt < 50 && n_bjets >= 2) return 1; // Control region
return -1;
},
{"mt_w", "n_bjets"});
// ===== Book Histograms =====
auto* histMgr = analyzer.getPlugin<INDHistogramManager>("histogram");
if (histMgr) {
std::vector<histInfo> hists = {
histInfo("h_njets", "n_good_jets", "N_{jets}", "event_weight", 10, 0, 10),
histInfo("h_leading_jet_pt", "leading_jet_pt", "Leading Jet p_{T}",
"event_weight", 50, 0, 500),
histInfo("h_mt", "mt_w", "M_{T}(W)", "event_weight", 40, 0, 200),
};
std::vector<selectionInfo> selections = {
selectionInfo("region", 2, 0, 2),
};
std::vector<std::vector<std::string>> regionNames = {
{"SR", "CR"},
};
histMgr->bookND(hists, selections, "", regionNames);
// Trigger the event loop so the booked histograms are filled
auto& histos = histMgr->GetHistos();
for (auto& h : histos) {
h.GetPtr();
}
// Save
std::vector<std::vector<histInfo>> fullHistList = {hists};
histMgr->saveHists(fullHistList, regionNames);
}
// ===== Save Output =====
analyzer.save();
std::cout << "Analysis complete!" << std::endl;
return 0;
}
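For reference, the calculateMT helper in the example above implements the standard transverse-mass formula for a lepton and missing transverse momentum:

```latex
M_T = \sqrt{\,2\, p_T^{\ell}\, E_T^{\mathrm{miss}} \left(1 - \cos\Delta\phi\right)}
```

where Δφ is the azimuthal angle between the lepton and the missing transverse momentum, wrapped into [-π, π] before taking the cosine, exactly as the while loops do.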
Advanced Topics
Conditional ML Evaluation
Save computation by only running ML models when needed:
// Define condition
analyzer.Define("run_expensive_model",
[](int n_jets, float met) {
return n_jets >= 4 && met > 50.0;
},
{"n_good_jets", "met"});
In config:
file=model.onnx name=ml_score inputVariables=features runVar=run_expensive_model
Chaining Analyses
Use output of one analysis as input to another:
// First analysis: skim and save
analyzer1.save(); // Creates skim_output.root
// Second analysis: load skim
// In config: fileList=skim_output.root
Analyzer analyzer2("second_analysis_config.txt");
analyzer2.Define(...); // Further processing
analyzer2.save();
Per-Sample Processing
Handle different samples with different logic:
analyzer.DefinePerSample("cross_section",
[](unsigned int slot, const ROOT::RDF::RSampleInfo& info) {
std::string sample = info.AsString();
if (sample.find("ttbar") != std::string::npos) return 831.76;
if (sample.find("wjets") != std::string::npos) return 61526.7;
return 1.0;
});
analyzer.Define("event_weight",
[](float gen_weight, double xsec, float n_events) {
return gen_weight * xsec / n_events;
},
{"genWeight", "cross_section", "total_events"});
Note that cross_section is a double (the type returned by the DefinePerSample lambda), so the weight lambda must take it as double.
Snapshot Intermediate Results
Save intermediate processing steps:
// After preselection
analyzer.Filter("preselection", ...);
// Save snapshot
auto df = analyzer.getDataFrame();
df.Snapshot("Events", "preselection_output.root", {"useful_branches"});
// Continue processing
analyzer.Define(...);
Using Helper Functions
Create reusable helper functions:
// In a header file
namespace MyAnalysis {
ROOT::VecOps::RVec<int> selectObjects(...) { /*...*/ }
float computeVariable(...) { /*...*/ }
bool passSelection(...) { /*...*/ }
}
// In analysis
analyzer.Define("selected", MyAnalysis::selectObjects, {...});
Friend Trees
Friend trees let you attach extra branches stored in separate ROOT files to the main event tree without merging the files. The framework adds friend TChains to the main chain before the RDataFrame is created, so all friend branches are immediately accessible by name.
Configuration via submit_config.txt
Add one friendTreeConfig line per friend tree:
friendTreeConfig: calib,Events,/data/calib_files/*.root
friendTreeConfig: pileup,Events,/data/pileup/pu_weights.root
The format is alias,treeName,fileGlob. The alias is used to access friend
branches (e.g. calib.pt).
Configuration via dataset_manifest.yaml
Friend trees can also be specified per dataset under the friend_trees key.
This is useful when different samples require different sidecar files:
datasets:
- name: ttbar_2022
process: ttbar
files:
- /data/ttbar/ttbar_2022.root
friend_trees:
- alias: calib
tree_name: Events
files:
- /data/calib/calib_2022.root
- root://eosserver.cern.ch//eos/calib/calib_remote.root
index_branches: [run, luminosityBlock]
- alias: bTagWeights
tree_name: Events
directory: /data/btag_weights/2022/
globs: [".root"]
antiglobs: ["FAIL"]
FriendTreeConfig Fields
| Field | Type | Description |
|---|---|---|
| alias | string | Name used to access friend branches (e.g. alias.branch). |
| tree_name | string | TTree name inside the friend file(s). Default: "Events". |
| files | list | Explicit ROOT file paths or XRootD URLs. Takes precedence over directory. |
| directory | string | Directory scanned for ROOT files (alternative to files). |
| globs | list | Filename substrings to include when scanning directory. Default: [".root"]. |
| antiglobs | list | Filename substrings to exclude when scanning directory. Default: []. |
| index_branches | list | Branch names for index-based event matching (see below). |
Index-Based Event Matching
By default ROOT matches friend tree entries by sequential position. If file
ordering differs between the main tree and the friend tree, set
index_branches to enable TChain::BuildIndex-based matching:
index_branches: [run, luminosityBlock] # match on run + lumi block
# or
index_branches: [run, event] # match on run + event number
The first two entries are used as the major and minor index keys respectively.
Leave index_branches empty for position-based (sequential) matching.
Best Practices
- Configuration Management
- Version control all config files with Git
- Use descriptive names for configuration files
- Document non-obvious configuration choices
- Code Organization
- Put helper functions in separate headers
- Use namespaces to avoid name collisions
- Add comments explaining physics reasoning
- Performance
- Apply filters early to reduce data volume
- Use conditional ML evaluation
- Leverage ROOT’s multithreading (threads=-1)
- Debugging
- Start with small input files during development
- Use batch=false to see progress bars
- Print intermediate results with Display() or Report()
- Reproducibility
- Pin ML model versions
- Document correction file sources
- Track configuration changes in Git
- Testing
- Test with different input samples
- Verify histogram contents make physical sense
- Check cutflow reports
Troubleshooting
Model Not Applied
Problem: ONNX/BDT model configured but output column doesn’t exist.
Solution: Explicitly call applyModel() or applyAllModels() after defining inputs.
Missing Input Features
Problem: Model application fails with “column not found”.
Solution: Ensure all inputVariables are defined before applying model.
Histogram Empty
Problem: Histograms are booked but contain no entries.
Solution: Check that:
- Filters aren’t rejecting all events
- Weight column exists and isn’t zero
- Region/channel variables are defined correctly
- You called GetPtr() on histograms to trigger computation
Performance Issues
Problem: Analysis runs slowly.
Solution:
- Enable multithreading (threads=-1)
- Apply filters early
- Use batch=true for non-interactive jobs
- Profile with ROOT’s profiling tools
Next Steps
- Understand internals: See ARCHITECTURE.md
- Add custom features: See PLUGIN_DEVELOPMENT.md
- Deploy to batch: Use HTCondor submission scripts in core/python/
- Configuration reference: See CONFIG_REFERENCE.md
Happy analyzing!