FCCAnalyses

A Framework for FCC Physics Performance Studies

Juraj Smieško (CERN)

PyHEP.dev 2024

Aachen, Germany

29 August 2024

Future Circular Collider

Energy and luminosity upgrade in an integrated program

  • FCC-ee (Z, WW, H, ttbar):
    Highest luminosities at Z, W, ZH among
    proposed Higgs and EW factories with
    indirect discovery potential up to ~ 70 TeV
  • FCC-hh (~100 TeV):
    Direct exploration of next energy frontier (~ x10 LHC) and unparalleled measurements
  • Feasibility Study Report in 2025
  • More than 150 institutes from 30 countries already involved

Tunnel Placement

FCC Tunnel Placement

source: FCC Week 2023 | J. Gutleber

Tunnel Placement

FCC Tunnel Placement

source: FCC Week 2023 | T. Watson

Tunnel Layout

FCC Infrastructure

source: FCC Week 2024 | T. Watson

Timeline

FCC-ee parameters

source: FCC Week 2024 | F. Gianotti, FCC Poster

Parameters of FCC-ee

FCC-ee parameters

source: FCC Week 2024 | F. Gianotti

Parameters of FCC-hh

FCC-ee parameters

source: FCC Week 2024 | F. Gianotti

Detector Concepts for FCC-ee

FCC-ee detector concepts

source: FCC Week 2024 | JA. Hewett

Key4hep

  • Set of common software packages, tools, and standards for different Detector/Collider Concepts
  • Common for FCC, CLIC/ILC, CEPC, EIC, …
  • Individual participants can adjust their stack
  • Main cornerstones:
    • Data processing framework: Gaudi
    • Event data model: EDM4hep
    • Detector description: DD4hep
    • Software distribution: Spack
HEP Stack
Key4hep design

EDM4hep I.

Describes event data with a set of standard objects.

  • Specification in a single YAML file
  • Generated with the help of Podio
EDM4hep diagram

EDM4hep II.

Example object:

#-------------  CalorimeterHit
edm4hep::CalorimeterHit:
  Description: "Calorimeter hit"
  Author: "EDM4hep authors"
  Members:
    - uint64_t cellID                 // detector specific (geometrical) cell id
    - float energy [GeV]              // energy of the hit
    - float energyError [GeV]         // error of the hit energy
    - float time [ns]                 // time of the hit
    - edm4hep::Vector3f position [mm] // position of the hit in world coordinates
    - int32_t type                    // type of hit
  • Current version: v0.99.0
  • Existing objects can be extended and new ones created
  • Bi-weekly discussion: Indico

Podio

Generates Event Data Model and serves as I/O Layer

  • Generates EDM from YAML files
  • Employs plain-old-data (POD) data structures
  • I/O machinery consists of three layers
    • POD Layer - actual data structures
    • Object Layer - helps resolve the relations
    • User Layer - full-fledged EDM objects
  • Supports multiple backends:
    • ROOT, SIO, ...
  • Current version: 1.0.1

Podio Reader

Constructs the EDM4hep objects for the user

Example usage of Podio Reader in Python:
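The code from this slide did not survive extraction. What follows is a minimal sketch of reading EDM4hep objects with the Podio Python reader, assuming the `podio` bindings from a Key4hep environment are available. The helper names `open_events` and `total_hit_energy` are hypothetical and not part of Podio.

```python
def open_events(path, category="events"):
    """Return an iterable of podio frames stored in a ROOT file."""
    # Imported lazily so the helper below stays usable without podio installed.
    from podio.root_io import Reader

    return Reader(path).get(category)


def total_hit_energy(frames, collection_name):
    """Sum the hit energies [GeV] of one collection, once per event."""
    totals = []
    for frame in frames:
        # frame.get() hands back a collection of full EDM4hep objects.
        hits = frame.get(collection_name)
        totals.append(sum(hit.getEnergy() for hit in hits))
    return totals
```

With a Key4hep environment set up, `total_hit_energy(open_events("events.root"), "ECalBarrelHits")` would return one summed energy per event. Both the file name and the collection name are placeholders.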


FCCAnalyses Overview

Analysis framework built on top of ROOT RDataFrame
with input from EDM4hep

  • Dependent on Key4hep Stack
  • Manages input samples
  • Has standard library of functions/closures
  • Runs the dataframe
  • Helps with histograms/plots
  • Registry for the analyses

FCCAnalyses script

  • Typical analysis divided into several stages
  • Results between stages stored in ROOT files
  • Running of the script with: fccanalysis run ana_script.py
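As an illustration of what a stage script might look like, here is a minimal sketch. The attribute names (`processList`, `prodTag`, `outputDir`), the class layout, and the sample, tag, and column names are assumptions based on commonly seen FCCAnalyses examples and may differ between framework versions.

```python
# Samples to process in this stage; the process name is illustrative.
processList = {
    "p8_ee_ZZ_ecm240": {"fraction": 0.1},
}
prodTag = "FCCee/winter2023/IDEA/"  # assumed production tag of pre-generated samples
outputDir = "outputs/stage1"        # where this stage writes its ROOT files


class RDFanalysis:
    """Holds the dataframe definition for this stage."""

    @staticmethod
    def analysers(df):
        # Each Define adds a column computed from already existing ones.
        return df.Define("n_particles", "ReconstructedParticles.size()")

    @staticmethod
    def output():
        # Columns saved to the output file and handed to the next stage.
        return ["n_particles"]
```

The framework, not the user, creates the dataframe and calls the two methods; the script only declares what should happen.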

Input samples

FCCAnalyses manages input ROOT files for the user

  • Analysis operates on named samples (by process name)
  • Pre-generated samples identified with production tag
  • Registry of samples available on the FCC Physics Events website
  • Local samples require input directory path
  • Process dictionary allows further parameters: fraction, chunks, ...
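To make the `fraction` and `chunks` parameters more concrete, the sketch below assumes that `fraction` limits the share of input files that is processed and that `chunks` sets the number of output pieces. Both helpers are hypothetical illustrations of these semantics; the framework's own selection logic may differ.

```python
import math


def select_fraction(files, fraction):
    """Keep the leading share of the files, but at least one of them."""
    n_keep = max(1, math.ceil(len(files) * fraction))
    return files[:n_keep]


def split_into_chunks(files, chunks):
    """Distribute the files over at most `chunks` groups of similar size."""
    chunks = min(chunks, len(files))
    # Round-robin assignment keeps the group sizes within one file of each other.
    return [files[i::chunks] for i in range(chunks)]
```

For ten input files, a fraction of 0.25 keeps three of them, and splitting those into two chunks gives groups of two files and one file.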

Analyzers

Collection of standard functions/closures

  • Users define their dataframe in a class method
  • Output variables registered in a list
  • Additional analyzers JIT compiled
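One way an additional analyzer can be JIT-compiled is through ROOT's interpreter, sketched below using plain ROOT calls. The framework may provide its own mechanism for this; the C++ function `scale_by` and the column names `energy` and `energy_scaled` are hypothetical.

```python
# C++ source of a hypothetical extra analyzer that scales every element of a column.
EXTRA_ANALYZER = """
#include "ROOT/RVec.hxx"

ROOT::VecOps::RVec<float> scale_by(const ROOT::VecOps::RVec<float>& values,
                                   float factor) {
  return values * factor;
}
"""


def declare_extra_analyzer():
    """JIT-compile the analyzer so it can be used in Define expressions."""
    import ROOT  # lazy import: ROOT ships with the Key4hep environment

    return ROOT.gInterpreter.Declare(EXTRA_ANALYZER)


def define_columns(df):
    """Use the JIT-compiled function like any other analyzer."""
    return df.Define("energy_scaled", "scale_by(energy, 1.01f)")
```

`declare_extra_analyzer()` has to run before the dataframe is executed, otherwise the expression string cannot be compiled.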

Running of the RDF

Execution of the dataframe hidden from the user

  • User can affect how the dataframe runs with global attributes
  • Analysis can run locally or on HTCondor
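For orientation, the hidden execution step for a local run might look roughly like the sketch below, written with plain ROOT RDataFrame calls. The function `run_dataframe` and its parameters are hypothetical and this is not the framework's actual code; the tree name `events` is the one podio uses for event data.

```python
def run_dataframe(input_files, analysers, branches, output_path, n_threads=4):
    """Build the dataframe, apply the user's definitions, and write the result."""
    import ROOT  # lazy import: ROOT ships with the Key4hep environment

    # Let RDataFrame process the events with a pool of threads.
    ROOT.EnableImplicitMT(n_threads)

    files = ROOT.std.vector("string")()
    for name in input_files:
        files.push_back(name)

    # The user-supplied callable adds the Define/Filter nodes.
    df = analysers(ROOT.RDataFrame("events", files))

    columns = ROOT.std.vector("string")()
    for name in branches:
        columns.push_back(name)

    # Snapshot triggers the event loop and stores the selected columns.
    df.Snapshot("events", output_path, columns)
```

Running on HTCondor would replace this local call with job submission, which is not sketched here.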

Histograms/Plots

Last two stages of the analysis

  • User specifies output histograms
  • Histograms are combined into plots
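A histogram specification for the final stage might look like the sketch below. The dictionary names `cutList` and `histoList` and their keys are assumptions based on commonly seen FCCAnalyses examples, the column names `Zcand_m` and `Zcand_q` are illustrative, and `bin_width` is a hypothetical helper added only as a sanity check.

```python
# Named selections, written as expressions on the columns of the previous stage.
cutList = {
    "sel0": "Zcand_q == 0",
    "sel1": "Zcand_q == 0 && Zcand_m > 80 && Zcand_m < 100",
}

# One entry per output histogram: source column, axis title, and binning.
histoList = {
    "mz": {"name": "Zcand_m", "title": "m_{Z} [GeV]",
           "bin": 125, "xmin": 0, "xmax": 250},
}


def bin_width(hist):
    """Width of one bin, in the units of the histogram axis."""
    return (hist["xmax"] - hist["xmin"]) / hist["bin"]
```

Each histogram would then be filled once per selection, and the plotting stage combines the results for signal and background samples.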

Key4hep integration

FCCAnalyses is tied to the Key4hep stack

  • Distributed as a Spack package
  • Key4hep environment needed for running
  • EDM4hep objects read directly from ROOT files
  • Building from source expects Key4hep
  • Users pin their analysis to a particular stack version

Integration with Existing Tools

  • Boundary between reconstruction and analysis blurred
    • Especially for full-sim
    • Plan: Develop algorithm on analysis side, then move to reconstruction
  • Many C++ tools/libraries created over the years
    • Most are integrated into the Key4hep stack
    • At the moment we have:
      • ROOT — together with RDataFrame
      • ACTS — track reconstruction tools
      • ONNX — neural network exchange format
      • FastJet — jet finding package
      • DD4hep — detector description
      • Delphes — fast simulations

Analysis registry

Central registry for the FCC-ee analyses

  • FCC-ee analyses are listed in the FCCeePhysicsPerformance repository
  • Experimental: an analysis package can be created for analysis-specific code

Conclusions & Outlook

  • The combination of EDM4hep and RDataFrame works well for the FSR Physics Studies based on Delphes Fastsim
    • Performant
    • Possibility to integrate range of existing (C++) libraries
  • Started focusing on the Geant4 Fullsim detector studies
    • Writing of an analysis without compilation preferred
    • Access to the detector description through the framework
    • Better integration into Python tooling
    • ML integration needs more thought
    • More complex collection relationships are complicated to handle
  • Bi-weekly FCC meeting focused on analysis framework development, but more importantly on the analysis tools