FCCAnalyses Scope
The goal of the framework is to aid the user in obtaining the desired physics results from the reconstructed objects
Framework requirements:
Efficiency — Make quick turn-around possible
Flexibility — Allow heavy customization
Ease of use — Should not be hard to start using
Scalable — Handling of large datasets
Key4hep
Set of common software packages, tools, and standards for different detector concepts
Common for FCC, CLIC/ILC, CEPC, EIC, …
Individual participants can mix and match their stack
Main ingredients:
Data processing framework:
Gaudi
Event data model:
EDM4hep
Detector description:
DD4hep
Software distribution:
Spack
The boundaries between the different parts of the analysis stack are becoming increasingly well defined.
Key4hep is an effort of several institutions to develop a common software stack.
EDM4hep I.
Describes event data with a set of standard objects.
Specification in a single YAML file
Generated with the help of Podio
EDM4hep II.
#------------- CalorimeterHit
edm4hep::CalorimeterHit:
  Description: "Calorimeter hit"
  Author : "F.Gaede, DESY"
  Members:
    - uint64_t cellID //detector specific (geometrical) cell id.
    - float energy //energy of the hit in [GeV].
    - float energyError //error of the hit energy in [GeV].
    - float time //time of the hit in [ns].
    - edm4hep::Vector3f position //position of the hit in world coordinates in [mm].
    - int32_t type //type of hit. Mapping of integer types to names via collection parameters "CalorimeterHitTypeNames" and "CalorimeterHitTypeValues".
Current version: v0.8.0
Objects can be extended or new ones created
Bi-weekly discussion:
Indico
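To illustrate how the YAML members above translate into a typed object, here is a minimal Python sketch. It is only an illustration: the actual classes are generated by Podio, and the Python `Vector3f` and `CalorimeterHit` dataclasses and the `total_energy` helper below are hypothetical stand-ins, not part of EDM4hep.

```python
from dataclasses import dataclass, field


@dataclass
class Vector3f:
    """Hypothetical stand-in for edm4hep::Vector3f (three float components)."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


@dataclass
class CalorimeterHit:
    """Illustrative mirror of the members listed in the YAML definition."""
    cellID: int = 0            # detector specific (geometrical) cell id
    energy: float = 0.0        # energy of the hit in GeV
    energyError: float = 0.0   # error of the hit energy in GeV
    time: float = 0.0          # time of the hit in ns
    position: Vector3f = field(default_factory=Vector3f)  # world coordinates in mm
    type: int = 0              # hit type, names mapped via collection parameters


def total_energy(hits):
    """Sum the energy of a collection of hits, in GeV (hypothetical helper)."""
    return sum(hit.energy for hit in hits)


hits = [
    CalorimeterHit(cellID=1, energy=1.5, position=Vector3f(10.0, 0.0, 250.0)),
    CalorimeterHit(cellID=2, energy=0.5),
]
print(total_energy(hits))
```

Each member in the YAML becomes one typed field, and the unit stated in the YAML comment is carried along as documentation only.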
Datasets
A plethora of processes is pre-generated and available from EOS
ROOT RDataFrame
Describes processing of data as actions on table columns
Definitions of new columns
Filter rules
Result definitions (histogram, graph)
The actions are instantly or lazily evaluated
Multi threading is available out of the box
Optimized for bulk processing
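The idea of booking column actions first and running the event loop later can be sketched in plain Python. This is a toy model, not the ROOT API: the `LazyFrame` class is hypothetical, its `Define` and `Filter` methods only echo the RDataFrame naming, and, as a simplification, its `Take` and `Count` methods run the loop immediately when called.

```python
class LazyFrame:
    """Toy table of events with lazily evaluated column actions."""

    def __init__(self, rows, steps=()):
        self._rows = rows            # list of dicts, one per event
        self._steps = tuple(steps)   # booked (kind, name, func) actions

    def Define(self, name, func):
        """Book a new column; nothing is computed yet."""
        return LazyFrame(self._rows, self._steps + (("define", name, func),))

    def Filter(self, func):
        """Book a selection; nothing is computed yet."""
        return LazyFrame(self._rows, self._steps + (("filter", None, func),))

    def _run(self):
        """Event loop: apply the booked steps, in order, to every row."""
        for row in self._rows:
            row = dict(row)  # work on a copy, leave the input untouched
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    row[name] = func(row)
                elif not func(row):
                    keep = False
                    break
            if keep:
                yield row

    def Take(self, name):
        """Result: values of one column for the events passing all filters."""
        return [row[name] for row in self._run()]

    def Count(self):
        """Result: number of events passing all filters."""
        return sum(1 for _ in self._run())


events = [{"pt": 12.0}, {"pt": 4.0}, {"pt": 25.0}]
calls = []  # records every evaluation of the column definition


def double_pt(row):
    calls.append(row["pt"])
    return 2.0 * row["pt"]


booked = LazyFrame(events).Define("pt2", double_pt).Filter(lambda r: r["pt2"] > 20.0)
n_before = len(calls)          # still zero: booking did not touch the data
selected = booked.Take("pt2")  # the event loop runs here
```

Booking returns a new frame each time, so several results can share the same chain of definitions and filters.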
Available libraries
A physics analysis often depends on a multitude of libraries
Libraries integrated into the framework:
ROOT — together with RDataFrame
ACTS — track reconstruction tools
ONNX — neural network exchange format
FastJet — jet finding package
DD4hep — detector description
Delphes — fast simulations
Distribution
The latest FCCAnalyses release, v0.7.0, can be found:
As a package in the stable Key4hep stack
Allows one to quickly put together a small analysis
Limited options for customization
As a tarball/tag from GitHub
The latest/development version of FCCAnalyses can be found:
As a package in the nightlies Key4hep stack
Might easily break
Latest master
By checking out master branch
Allows greater customization
Requires discipline
Hint: Keep your master in sync with upstream (use rebase or merge)
Developments are welcome to be merged :)
master should always be buildable
Platforms: CentOS 7, AlmaLinux 9, Ubuntu 22.04
Ecosystem
Analysis spread through two repositories:
FCCAnalyses
Repository of common tools and algorithms
General analysis code in analyzers
Steering of the analysis (RDataFrame)
Access to the datasample (meta)data
Running over large datasets / on batch
Experimental machinery for case studies
FCCeePhysicsPerformance
Main place for the abstracts
Contains very specific analysis code
Or prototypes of tools of common interest, eventually to be moved to FCCAnalyses
(Proto)package repository
Analysis Architecture I.
One can write and run an analysis in several ways:
Managed mode:
The RDataFrame is managed by the framework
User provides Python analysis script with compulsory attributes
Libraries are loaded automatically
Dataset metadata are loaded from remote location — CVMFS/HTTP server
Batch submission on HTCondor
Customization: Possible at the level of analyzer functions
Intended for: Quick analysis, no advanced analyzer functions
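A managed-mode analysis script could look roughly like the sketch below. All names here are assumptions: the attributes `processList`, `prodTag`, `outputDir`, `nCPUS` and the `RDFanalysis` class with `analysers` and `output` follow the pattern of the FCCAnalyses example scripts as recalled, and should be checked against the release in use. The process name, production tag, output path and column expression are placeholders, and `RecordingFrame` is a hypothetical stand-in for the dataframe so that the sketch runs without ROOT.

```python
# Compulsory attributes read by the framework (names are assumptions).
processList = {
    "my_process": {"fraction": 0.1},   # placeholder process, use 10% of it
}
prodTag = "my/production/tag/"         # placeholder production tag
outputDir = "outputs/stage1"           # placeholder output location
nCPUS = 4                              # number of threads requested


class RDFanalysis:
    """Analysis definition picked up by the framework in managed mode."""

    @staticmethod
    def analysers(df):
        # Book new columns and selections on the dataframe handed in.
        return (
            df.Define("n_particles", "ReconstructedParticles.size()")
              .Filter("n_particles > 1")
        )

    @staticmethod
    def output():
        # Columns to be written to the output file.
        return ["n_particles"]


class RecordingFrame:
    """Hypothetical stand-in that only records what gets booked."""

    def __init__(self):
        self.booked = []

    def Define(self, name, expression):
        self.booked.append(("Define", name, expression))
        return self

    def Filter(self, expression):
        self.booked.append(("Filter", expression))
        return self


# In managed mode the framework would make this call with a real dataframe.
frame = RDFanalysis.analysers(RecordingFrame())
print(frame.booked)
```

The user only describes what to compute and what to keep; creating the dataframe, loading libraries and looking up the dataset are left to the framework.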
Analysis Architecture II.
One can write and run an analysis in several ways:
Standalone mode:
The RDataFrame is managed by the user
Can leverage the FCCAnalyses library of analyzer functions
The analysis can be written as a Python script or C++ program
Loading of the libraries is handled by the user
Dataset metadata have to be handled manually
Batch submission is not provided
Customization: Creation and steering of the RDataFrame
Intended for: Advanced users
Ntupleizer style:
The intent is to create just flat trees and continue without the framework's help
Writing an analyzer
The library of analyzer functions (analyzers) has been written over the years
Analyzers are usually structs which operate on EDM4hep objects
Optional dependencies for analyzers can be FastJet, DD4hep, ACTS
and ONNX
ROOT RDataFrame needs to be aware of the analyzer function
Provided as a string
Compiled in the library
Loaded and JITed by the ROOT.gInterpreter
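The "struct that operates on a collection" pattern can be sketched as follows. In the framework the analyzers are C++ structs; the Python below is only an analogue, and the `Hit` and `SelEnergy` names are hypothetical, not taken from the FCCAnalyses library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical minimal hit, carrying only an energy in GeV."""
    energy: float


@dataclass(frozen=True)
class SelEnergy:
    """Hypothetical analyzer: keep the hits above an energy threshold.

    The threshold is fixed at construction, the collection is passed
    when the object is called, once per event.
    """
    min_energy: float  # threshold in GeV

    def __call__(self, hits):
        return [hit for hit in hits if hit.energy > self.min_energy]


# Configure the analyzer once, then apply it to an event's collection.
select_above_1gev = SelEnergy(min_energy=1.0)
event_hits = [Hit(0.2), Hit(1.5), Hit(3.0)]
kept = select_above_1gev(event_hits)
print([hit.energy for hit in kept])
```

Keeping the configuration in the object and the data in the call lets one analyzer be reused with different settings in the same analysis.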
FCCAnalyses library
Vertexing
ACTS vertex finder
Event variables
Calorimeter hit/cluster variables
Reconstructed/MC particle operations
Flavour tagging
Jet clustering/constituents
Workflow
The complete analysis in managed mode is divided into three steps (example):
analysis_stage1.py , ... — pre-selection stages,
analysis dependent, usually runs on batch
analysis_final.py — final selection, produces final variables
analysis_plots.py — produces plots from histograms/TTrees
or into two steps with the help of Histmaker (example):
The pre-selection stages and final stage are combined together
Plotting stage
EOS Space
Various intermediate files of common interest can be stored at:
/eos/experiment/fcc/ee/analyses_storage/...
in four subfolders:
BSM
EW_and_QCD
flavor
Higgs_and_TOP
Access and quotas:
Read access is granted to anyone
Write access needs to be granted: Ask your convener :)
Total quota for all four directories is 200 TB
At the moment, only part of the quota is allocated
Recent changes
Included in v0.7.0:
External libraries as addons: PR#194
EOS paths accessed through xrootd: PR#202
Case studies (proto)packaging: PR#199
Inclusion of ONNX + Jet flavour tools: PR#188, PR#224
Inclusion of Delphes + Vertexing from Franco Bedeschi: PR#247
New sub-commands — build, pin
2D and 3D histograms: PR#253
Benchmarking and testing of the example analyses
Available in master:
Improvements in dNdx, time and energy smearing: PR#268
Statistical uncertainty and rebin options in the plotter: PR#269
Histmaker: PR#277
Improved crash reports: PR#276
More track utilities: PR#289
Code formatting for the analyzers
Modularization of the python machinery
Physics Results I.
Decay of an HNL into a muon and two jets
BSM/LLP Analysis
Private fork with the customizations applied on top
Run in managed mode
New analyzers, adjustments to the managed mode
Uses a mix of official and private productions
Physics Results II.
H to invisible
Higgs Analysis
Could not find the source code
Uses officially produced samples
Physics Results III.
Tagger on Z to qq events
Example of advanced usage of the framework
Uses a combination of managed mode and custom Python scripts
Leverages recently included libraries: Delphes, ONNX
Uses officially produced samples
Plans
Reliable framework to aid the physics performance studies for FSR
Heads up: Podio frame I/O in Gaudi, PR#100
Heads up: Podio collection ids to hashes, PR#412
Heads up: RNTuple backend, PR#359
Make the framework independent of lxplus/HTCondor
EDM4hep low-level access is unwieldy
Overhaul "standard library" and disentangle the dependencies
Prepare facilities to handle systematics
Find distribution channels which allow as wide customization as possible
Support fullsim detector studies
Support running on distributed systems (DIRAC)
Documentation
There are several sources of documentation
Conclusions & Outlook
The combination of EDM4hep and RDataFrame works well
Low-level access is unwieldy
Modularization and packaging options under way
Started focusing on the full simulation detector studies
Access to the detector description through the framework
More heads are welcome!
Babyface from Toy Story,
Pixar
FCCAnalyses vs. Coffea/Coffea-casa
Provides a similar set of features to FCCAnalyses
Dataframe in coffea, Orchestration in coffea-casa
User interface purely pythonic
Integrated into python package ecosystem
FCCAnalyses is purpose-built for FCC
Integration with SWAN and Dask
FCCAnalyses batch submissions
Updated vertexing
Vertexing done with the help of code from Franco B.
Introduces dependency on Delphes
Introduces new analyzers: SmearedTracksdNdx , SmearedTracksTOF
Simplifies Delphes–EDM4hep unit gymnastics
Adds examples for Bs to Ds K