Reminder: use of EDM4hep in FCC analyses

Juraj Smieško (CERN)

FCC Software Meeting

CERN, 26 Feb 2024

Key4hep

  • Set of common software packages, tools, and standards for different detector concepts
  • Common for FCC, CLIC/ILC, CEPC, EIC, …
  • Individual participants can mix and match their stack
  • Main ingredients:
    • Data processing framework: Gaudi
    • Event data model: EDM4hep
    • Detector description: DD4hep
    • Software distribution: Spack
[Figures: HEP stack; Key4hep design. Source: Frank Gaede]

EDM4hep I.

Describes event data with a set of standard objects.

  • Specification in a single YAML file
  • Generated with the help of Podio
EDM4hep diagram

EDM4hep II.

Example object:


#-------------  CalorimeterHit
edm4hep::CalorimeterHit:
  Description: "Calorimeter hit"
  Author: "EDM4hep authors"
  Members:
    - uint64_t cellID  // detector specific (geometrical) cell id
    - float energy [GeV]              // energy of the hit
    - float energyError [GeV]         // error of the hit energy
    - float time [ns]                // time of the hit
    - edm4hep::Vector3f position [mm] // position of the hit in world coordinates
    - int32_t type                   // type of hit
            
  • Current version: v0.10.5
  • Existing objects can be extended and new ones created
  • Bi-weekly discussion: Indico

EDM4hep 1.0

EDM4hep will reach version 1.0 soon; breaking changes and fixes are being introduced ahead of the release.

Some of the changes/fixes underway:


edm4hep::TrackerHit:
  Description: "Tracker hit interface class"
  Author: "Thomas Madlener, DESY"
  Members:
    - uint64_t cellID // ID of the sensor that created this hit
    - int32_t type // type of the raw data hit
    - int32_t quality // quality bit flag of the hit
    - float time [ns] // time of the hit
    - float eDep [GeV] // energy deposited on the hit
    - float eDepError [GeV] // error measured on eDep
    - edm4hep::Vector3d position [mm] // hit position
  Types:
    - edm4hep::TrackerHit3D
    - edm4hep::TrackerHitPlane
            

New release: FCCAnalyses 0.9, which preserves the state before the EDM4hep 1.0 changes

  • Will arrive in stable Key4hep stack soon

Podio

Generates Event Data Model and serves as I/O Layer

  • Generates EDM from YAML files
  • Employs plain-old-data (POD) data structures
  • I/O machinery consists of three layers
    • POD Layer - actual data structures
    • Object Layer - helps resolve the relations
    • User Layer - full-fledged EDM objects
  • Supports multiple backends:
    • ROOT, SIO, ...
  • Current version: 0.99
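
The three layers can be pictured with a toy model in plain Python. This is only an illustrative sketch, not the code Podio generates: the classes HitData, HitObj, and Hit, the contrib_begin/contrib_end members, and the build_hits helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Toy model of the three layers; all names are hypothetical and only
# loosely mirror what Podio generates in C++.


@dataclass
class HitData:
    """POD layer: plain members only, relations kept as an index range."""
    energy: float
    contrib_begin: int
    contrib_end: int


@dataclass
class HitObj:
    """Object layer: holds the POD together with the resolved relations."""
    data: HitData
    contributions: list = field(default_factory=list)


class Hit:
    """User layer: lightweight handle exposing getters."""

    def __init__(self, obj):
        self._obj = obj

    def getEnergy(self):
        return self._obj.data.energy

    def getContributions(self):
        return list(self._obj.contributions)


def build_hits(hit_data, contribution_energies):
    """Resolve each index range into a list of related values and wrap it."""
    hits = []
    for data in hit_data:
        related = contribution_energies[data.contrib_begin:data.contrib_end]
        hits.append(Hit(HitObj(data, related)))
    return hits


pods = [HitData(1.5, 0, 2), HitData(0.5, 2, 3)]
hits = build_hits(pods, [1.0, 0.5, 0.5])
print([h.getContributions() for h in hits])  # [[1.0, 0.5], [0.5]]
```

The point of the split is that only the POD layer needs to be written to and read from disk; the other two layers are rebuilt on top of it when the data is read back.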

Podio Reader

Constructs the EDM4hep objects for the user

Example usage of Podio Reader in Python:


from podio.root_io import Reader

reader = Reader("one or many input files")
for event in reader.get("events"):
  # collections are retrieved from the event itself
  hits = event.get("hits")
  for hit in hits:
    pass  # ... process the hit here
                  

Datasets

A plethora of processes has been pre-generated and is available from EOS

These samples need to be reprocessed to be usable with EDM4hep 1.0

EOS Space

Intermediate analysis files of common interest can be stored at:
/eos/experiment/fcc/ee/analyses_storage/...

in four subfolders:

  • BSM
  • EW_and_QCD
  • flavor
  • Higgs_and_TOP

Access and quotas:

  • Read access is granted to anyone
  • Write access needs to be granted: Ask your convener :)
  • Total quota for all four directories is 200 TB
  • At the moment, only part of the quota is allocated

ROOT RDataFrame

ROOT RDataFrame Illustration
  • Describes processing of data as actions on table columns
    • Definitions of new columns
    • Filter rules
    • Result definitions (histogram, graph)
  • The actions are lazily evaluated
  • Multithreading is available out of the box
  • Optimized for bulk processing
  • Allows integration of existing C++ libraries
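
The idea of booking actions on columns and evaluating them lazily can be illustrated with a toy model in plain Python. This is not the RDataFrame API: the LazyFrame class and its define, filter, and take methods are hypothetical stand-ins, and here the event loop runs when take is called.

```python
class LazyFrame:
    """Toy stand-in for a data frame whose actions are evaluated lazily."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def define(self, name, func):
        # Only book the new column; nothing is computed yet.
        return LazyFrame(self._rows, self._steps + (("define", name, func),))

    def filter(self, predicate):
        # Only book the filter rule; nothing is computed yet.
        return LazyFrame(self._rows, self._steps + (("filter", None, predicate),))

    def _run(self):
        # Single pass over the rows, applying the booked steps in order.
        for row in self._rows:
            row = dict(row)
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    row[name] = func(row)
                elif not func(row):
                    keep = False
                    break
            if keep:
                yield row

    def take(self, name):
        # Requesting a result triggers the loop.
        return [row[name] for row in self._run()]


calls = []


def total_energy(row):
    calls.append(1)  # record every evaluation
    return sum(row["hit_energies"])


events = [{"hit_energies": [1.0, 2.0]}, {"hit_energies": [0.25]}]
frame = LazyFrame(events).define("e_total", total_energy)
frame = frame.filter(lambda row: row["e_total"] > 1.0)
print(len(calls))  # 0, nothing has been evaluated so far

result = frame.take("e_total")
print(result)      # [3.0]
print(len(calls))  # 2, one evaluation per event
```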

Reading EDM4hep in RDataFrame

  • An EDM4hep collection is read in by RDataFrame directly and presented to the user in the form:
                      
                        
                      
                    
    • This is per event
    • No convenient access to relationships
  • Example of a simple function:
                      
                        
                      
                    
  • In the course of the analysis, the EDM4hep objects slowly decay into more trivial objects
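
As a rough illustration of this "decay", consider a per-event collection that arrives as a vector of POD structs (in C++, a ROOT::VecOps::RVec of a type such as edm4hep::CalorimeterHitData). FCCAnalyses analyzers are written in C++; the Python sketch below only mimics the pattern. The CalorimeterHitData class here is a simplified stand-in with a subset of the members listed earlier, and get_energies and sel_energy are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class CalorimeterHitData:
    """Simplified stand-in for the POD struct of a calorimeter hit."""
    cellID: int
    energy: float  # GeV
    time: float    # ns


def sel_energy(min_energy):
    """Return a function that keeps hits above an energy threshold."""
    def select(hits):
        return [hit for hit in hits if hit.energy > min_energy]
    return select


def get_energies(hits):
    """Reduce a collection of hits to a plain list of floats."""
    return [hit.energy for hit in hits]


# One event: the collection is a list of POD-like structs.
event_hits = [
    CalorimeterHitData(cellID=1, energy=0.8, time=1.2),
    CalorimeterHitData(cellID=2, energy=2.5, time=1.4),
]

# After selection and extraction only trivial objects (floats) remain.
energies = get_energies(sel_energy(1.0)(event_hits))
print(energies)  # [2.5]
```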

Relations

  • One collection can contain one-to-one or one-to-many relations to other collections, e.g.:
    • CaloHit → CaloHitContribution
    • MCParticle → MCParticle
  • Typically relationships between derived objects (Sim. side separated from Reco. side)
  • Example analyzer (FCC Tutorials link):
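
The tutorial analyzer is not reproduced here. As a simplified sketch of the underlying idea, the toy model below assumes that each particle stores a [parents_begin, parents_end) range into a separate flat list of parent indices. The Python MCParticleData class is a stand-in with only three members, and get_parent_indices and the event content are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MCParticleData:
    """Stand-in POD: PDG code plus an index range into the parent list."""
    PDG: int
    parents_begin: int
    parents_end: int


def get_parent_indices(particle_index, particles, parent_ids):
    """Return the indices of the parents of one particle."""
    particle = particles[particle_index]
    return parent_ids[particle.parents_begin:particle.parents_end]


# Toy event: e- e+ -> Z -> mu- mu+
particles = [
    MCParticleData(11, 0, 0),   # 0: e-, no parents (empty range)
    MCParticleData(-11, 0, 0),  # 1: e+, no parents (empty range)
    MCParticleData(23, 0, 2),   # 2: Z, parents are entries 0..1 of parent_ids
    MCParticleData(13, 2, 3),   # 3: mu-, parent is entry 2 of parent_ids
    MCParticleData(-13, 3, 4),  # 4: mu+, parent is entry 3 of parent_ids
]
# Flat list of parent indices shared by all particles in the event.
parent_ids = [0, 1, 2, 2]

for index in (2, 3, 4):
    parents = get_parent_indices(index, particles, parent_ids)
    print(index, [particles[p].PDG for p in parents])
```

Resolving a one-to-many relation therefore takes two collections: the particles themselves and the list of indices they point into.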
                      
                        
                      
                    

Associations

  • One-to-one relationships between two collection types, e.g.:
    • MCParticle ↔ ReconstructedParticle
    • SimTrackerHit ↔ TrackerHit
  • Relationships between Simulation and Reconstruction side
  • Example analyzer: Association between RecoParticle and MCParticle (link):
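
The linked analyzer is not reproduced here. A minimal sketch of the idea, assuming the association is available as two parallel index lists (one pointing into the reconstructed collection, one into the MC collection), could look as follows. The function names and the toy values are hypothetical.

```python
def reco_to_mc(rec_indices, sim_indices):
    """Map reco particle index -> MC particle index from parallel lists."""
    return dict(zip(rec_indices, sim_indices))


def matched_pdg(n_reco, link, mc_pdg):
    """PDG code of the MC particle associated with each reco particle.

    Reco particles without an association get None.
    """
    return [mc_pdg[link[i]] if i in link else None for i in range(n_reco)]


# Toy event: four reco particles, three of them have an association.
rec_indices = [0, 1, 2]
sim_indices = [5, 3, 8]
mc_pdg = [11, -11, 23, 13, 22, -13, 22, 22, 22]

link = reco_to_mc(rec_indices, sim_indices)
print(matched_pdg(4, link, mc_pdg))  # [-13, 13, 22, None]
```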
                      
                        
                      
                    

Documentation

Multiple sources of documentation

Conclusions

  • The primary focus of EDM4hep is reconstruction
  • The current strategy in FCCAnalyses is to let EDM4hep slowly decay into more basic objects/structures
  • Resolving relationships may require working with indices across multiple collections
  • EDM4hep 1.0 is coming soon
    • All pre-generated samples will need to be reprocessed

Backup

Example analysis

The Higgs boson mass and σ(ZH) from the recoil mass with leptonic Z decays (link)