Analysis@FCC

FCCAnalyses framework

Juraj Smieško

CERN

PyHEP.dev 2023

Princeton, 27 July 2023

Introduction

CERN

  • FCCSW
  • FCCAnalyses
  • Phoenix/Web tools

Charles Uni.

  • LAr Calorimeter for FCC-ee
  • TileCal operations
  • TileCal offline DQ tools
  • Pileup mitigation

Comenius Uni.

  • Photon + c-jet / Intrinsic Charm
  • TileCal offline DQ tools

Juraj Smiesko (kjvbrt)

Pronunciation: You-rye

Institute: CERN

Alma Mater: Comenius University, SK

Future Circular Collider

Energy and luminosity upgrade in integrated program

  • FCC-ee (Z, W, H, tt):
    Highest luminosities at Z, W, ZH among
    proposed Higgs and EW factories with
    indirect discovery potential up to ~ 70 TeV
  • FCC-hh (~100 TeV):
    Direct exploration of next energy frontier (~ x10 LHC) and unparalleled measurements
  • Feasibility Status Report in 2025

FCC Placement

FCC Integrated program

See latest FCC Week in London

FCC Detectors


FCCAnalyses Scope

The goal of the framework is to help users obtain the desired results, starting not only from the reconstructed physics objects

Requirements:

  • Efficiency: Make quick turn-around possible
  • Flexibility: Allow heavy customization
  • Ease of use: Should not be hard to start using
  • Scalability: Seamlessly handle datasets from small to large

Key4hep

  • Set of common software packages, tools, and standards for different detector concepts
  • Common for FCC, CLIC/ILC, CEPC, EIC, …
  • Individual participants can mix and match their stack
  • Main ingredients:
    • Data processing framework: Gaudi
    • Event data model: EDM4hep
    • Detector description: DD4hep
    • Software distribution: Spack

EDM4hep I.

Describes event data with a set of standard objects.

  • Specification in a single YAML file
  • Strives to be minimal
  • Generated with the help of Podio

EDM4hep II.

Example object:

#-------------  CalorimeterHit
edm4hep::CalorimeterHit:
  Description: "Calorimeter hit"
  Author : "F.Gaede, DESY"
  Members:
    - uint64_t cellID            //detector specific (geometrical) cell id.
    - float energy               //energy of the hit in [GeV].
    - float energyError          //error of the hit energy in [GeV].
    - float time                 //time of the hit in [ns].
    - edm4hep::Vector3f position //position of the hit in world coordinates in [mm].
    - int32_t type               //type of hit. Mapping of integer types to names via collection parameters "CalorimeterHitTypeNames" and "CalorimeterHitTypeValues".

  • Current version: v0.8.0
  • Objects can be extended or new ones created
  • Bi-weekly discussion: Indico
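
To make the CalorimeterHit specification above more concrete, here is a minimal sketch of the record it describes, written as a plain Python dataclass. This is only an illustration of the data layout. It is not the class that Podio generates, and the helper type Vector3f with fields x, y, z is an assumption made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Vector3f:
    """Illustrative stand-in for edm4hep::Vector3f (assumed fields x, y, z)."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


@dataclass
class CalorimeterHit:
    """Mirrors the members listed in the YAML specification above."""
    cellID: int = 0            # detector specific (geometrical) cell id
    energy: float = 0.0        # energy of the hit in [GeV]
    energyError: float = 0.0   # error of the hit energy in [GeV]
    time: float = 0.0          # time of the hit in [ns]
    position: Vector3f = field(default_factory=Vector3f)  # world coordinates in [mm]
    type: int = 0              # type of hit


# Two hand-made hits, used only to show how the fields are accessed.
hit = CalorimeterHit(cellID=42, energy=1.5, position=Vector3f(10.0, 0.0, 250.0))
total_energy = sum(h.energy for h in [hit, CalorimeterHit(energy=0.5)])
```

In the real data model the objects live in collections and are accessed through the generated interfaces; the sketch only shows which quantities a single hit carries.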

ROOT RDataFrame

  • Describes processing of data as actions on table columns
    • Definitions of new columns
    • Filter rules
    • Result definitions (histogram, graph)
  • The actions are lazily evaluated
  • Multithreading is available out of the box
  • Optimized for bulk processing
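
The idea of booking actions on columns and evaluating them lazily can be illustrated with a small pure-Python toy. This is not ROOT code and not how RDataFrame is implemented: the class LazyFrame and its methods define, filter and sum are hypothetical names invented for this sketch, loosely modelled on the column definitions, filters and results listed above.

```python
class LazyFrame:
    """Toy model of a lazily evaluated, column-oriented event loop."""

    def __init__(self, rows):
        self._rows = rows   # list of dicts, one dict per event
        self._steps = []    # booked steps, in the order they were declared
        self.n_loops = 0    # how many times the event loop actually ran

    def define(self, name, func):
        # Book a new column; nothing is computed yet.
        self._steps.append(("define", name, func))
        return self

    def filter(self, func):
        # Book a selection; nothing is computed yet.
        self._steps.append(("filter", None, func))
        return self

    def _run(self):
        # The event loop: apply all booked steps to every event.
        self.n_loops += 1
        for row in self._rows:
            event = dict(row)
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    event[name] = func(event)
                elif not func(event):
                    keep = False
                    break
            if keep:
                yield event

    def sum(self, column):
        # Asking for a result is what triggers the event loop.
        return sum(event[column] for event in self._run())


events = [{"px": 3.0, "py": 4.0}, {"px": 0.3, "py": 0.4}, {"px": 6.0, "py": 8.0}]
frame = (
    LazyFrame(events)
    .define("pt", lambda e: (e["px"] ** 2 + e["py"] ** 2) ** 0.5)
    .filter(lambda e: e["pt"] > 1.0)
)
loops_before = frame.n_loops  # still zero: only booking has happened so far
pt_sum = frame.sum("pt")      # the loop runs here, once
```

In RDataFrame itself the corresponding calls are Define, Filter and result actions such as Histo1D or Sum, and the event loop is run in compiled C++.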

Integration with Existing Tools

  • Boundary between reconstruction and analysis blurred
    • Especially for full-sim
    • Plan: Develop algorithm on analysis side, then move to reconstruction
  • Many C++ tools/libraries created over the years
    • Most are integrated into the Key4hep stack
    • At the moment we have:
      • ROOT — together with RDataFrame
      • ACTS — track reconstruction tools
      • ONNX — neural network exchange format
      • FastJet — jet finding package
      • DD4hep — detector description
      • Delphes — fast simulations

Distribution

The latest FCCAnalyses release, v0.7.0, can be found:

  • As a package in the stable Key4hep stack
    • Allows a small analysis to be put together quickly
    • Limited options for customization

The latest/development version of FCCAnalyses can be found:

  • As a package in the Key4hep nightlies stack
    • Might easily break
  • By checking out master branch
    • Allows greater customization
    • Requires discipline

Platforms: CentOS 7, AlmaLinux 9, Ubuntu 22.04

Analysis Architecture

One can write and run an analysis in several ways:

  • Managed mode: fccanalysis run my_ana.py
    • The RDataFrame is managed by the framework
    • Analysis script has to contain compulsory attributes
    • Libraries are loaded automatically
    • Dataset metadata are loaded from a remote location (CVMFS/HTTP server)
    • Batch submission on HTCondor
    • Customization: Possible at the level of analyzer functions
    • Intended for: Quick analysis, no advanced analyzer functions
  • Standalone mode: python my_ana.py
    • The RDataFrame is managed by the user
    • Can leverage the FCCAnalyses library of analyzer functions
  • Ntupleizer style

Writing an analyzer function

  • Typically an analyzer is a struct which operates on an EDM4hep object
  • ROOT RDataFrame needs to be made aware of the analyzer function, which can be:
    • Provided as a string
    • Loaded from a file and JIT-compiled by ROOT.gInterpreter
    • Compiled into the library
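
The JIT route can be sketched as follows. The analyzer is a small C++ struct held in a Python string, declared to the interpreter, and then called from a column expression. To keep the sketch self-contained, the analyzer operates on a plain vector of transverse momenta rather than on an EDM4hep object. The namespace MyAnalysis, the struct sel_pt and both Python helper functions are hypothetical names made up for this example.

```python
# C++ source of a hypothetical analyzer, kept as a string for JIT compilation.
ANALYZER_CODE = r"""
#include "ROOT/RVec.hxx"

namespace MyAnalysis {
// Keeps only the transverse momenta above a configurable threshold.
struct sel_pt {
  explicit sel_pt(float min_pt) : m_min_pt(min_pt) {}
  ROOT::VecOps::RVec<float> operator()(const ROOT::VecOps::RVec<float>& pts) const {
    ROOT::VecOps::RVec<float> out;
    for (float pt : pts) {
      if (pt > m_min_pt) out.push_back(pt);
    }
    return out;
  }
  float m_min_pt;
};
}  // namespace MyAnalysis
"""


def declare_analyzer(interpreter):
    """JIT-compile the analyzer; pass ROOT.gInterpreter as `interpreter`."""
    return interpreter.Declare(ANALYZER_CODE)


def count_selected():
    """Use the analyzer in an RDataFrame; requires a working PyROOT setup."""
    import ROOT  # imported here so the module loads without ROOT installed

    declare_analyzer(ROOT.gInterpreter)
    df = (
        ROOT.RDataFrame(3)  # three empty events, filled with toy values below
        .Define("pt", "ROOT::VecOps::RVec<float>{5.f, 20.f, 30.f}")
        # The analyzer is referenced by name inside the expression string.
        .Define("pt_sel", "MyAnalysis::sel_pt(10.f)(pt)")
        .Define("n_sel", "static_cast<int>(pt_sel.size())")
    )
    return df.Sum("n_sel").GetValue()
```

Calling count_selected() is only meaningful in an environment where PyROOT is available, and the analyzer should be declared once per session, since declaring the same struct twice is a redefinition. An analyzer compiled into the library would be used in the expression string in the same way, without the Declare step.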

Documentation

Several documentation types

Questions

  • Interaction of the C++ analyzer functions with the Python in the context of RDataFrame
  • Efficient work with the Podio and EDM4hep data format
  • Ways to distribute analyzer functions among users
  • Large scale management of the pre-generated input samples
  • Integration of facilities needed to support full simulation studies
  • Non-CLI-based modes of interaction with the analysis code