Analysis and visualization

Juraj Smieško

CERN

6th FCC Physics Workshop in Kraków

24 January 2023

Key4hep

  • Set of common software packages, tools, and standards for different detector concepts
  • Common for FCC, CLIC/ILC, CEPC, EIC, …
  • Individual participants can mix and match their stack
  • Main ingredients:
    • Data processing framework: Gaudi
    • Event data model: EDM4hep
    • Detector description: DD4hep
    • Software distribution: Spack

Analysis

Analysis scope

  • Takes reconstructed objects and produces physics results
  • The objects are described in EDM4hep format
  • Input datasets are usually centrally produced
  • Access to the detector description
  • Needs to accommodate different analysis strategies
  • Runs locally and on Batch/Grid

Ecosystem I.

The physics analyses at FCC are spread across two repositories and a storage space:

  • FCCAnalyses
    • Repository of common tools and algorithms
    • General analysis code in analyzers
    • Steering of the analysis (RDataFrame)
    • Access to the (meta)data
    • Running over large datasets / on batch
    • (Proto)package machinery for case studies
  • FCCeePhysicsPerformance
    • Main place for the abstracts
    • Contains very specific analysis code
      • Or prototypes of tools of common interest, to be moved to FCCAnalyses eventually
    • (Proto)package repository
  • Storage space on EOS /eos/experiment/fcc

Ecosystem II.

Supporting repositories:

RDataFrame

  • Describes processing of data as actions on table columns
  • The actions are instantly or lazily evaluated
  • Multithreading is available out of the box
  • Optimized for bulk processing
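
The difference between lazy and instant actions can be illustrated with a toy model in plain Python. This is not the ROOT RDataFrame API: the class LazyTable and its methods are hypothetical stand-ins that only mimic the idea of booking column definitions and filters first and running the event loop when a result is requested.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, float]
Step = Tuple[str, Optional[str], Callable[[Row], object]]


class LazyTable:
    """Toy columnar table (hypothetical, NOT the ROOT RDataFrame API)."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._steps: List[Step] = []  # booked operations, not yet executed
        self.passes = 0               # number of times the event loop has run

    def define(self, name: str, func: Callable[[Row], float]) -> "LazyTable":
        """Book a new column computed from existing ones (lazy)."""
        self._steps.append(("define", name, func))
        return self

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyTable":
        """Book a row selection (lazy)."""
        self._steps.append(("filter", None, predicate))
        return self

    def _loop(self) -> Iterator[Row]:
        """Run all booked steps over every row, in booking order."""
        self.passes += 1
        for original in self._rows:
            row = dict(original)  # do not modify the input data
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    row[name] = func(row)
                elif not func(row):
                    keep = False
                    break
            if keep:
                yield row

    def count(self) -> int:
        """Instant action: triggers the event loop and returns a number."""
        return sum(1 for _ in self._loop())


events = [{"px": 3.0, "py": 4.0}, {"px": 0.3, "py": 0.4}]
table = LazyTable(events)
table.define("pt", lambda r: (r["px"] ** 2 + r["py"] ** 2) ** 0.5)
table.filter(lambda r: r["pt"] > 1.0)

passes_after_booking = table.passes  # nothing has been processed yet
n_selected = table.count()           # the loop runs here
```

Booking the steps costs nothing; only the call to count() walks over the rows, which is what allows several results to share one pass in a real implementation.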

Architecture

  • The analysis is built around ROOT RDataFrame with a rich "standard library"
  • This library (the analyzers) has been written over the years
  • Analyzers are usually structs which operate on EDM4hep objects
  • Optional dependencies for analyzers can be FastJet, DD4hep, ACTS and ONNX
  • Dataset metadata are loaded from a remote location (AFS/HTTP server)
    • Number of events generated, cross-section, ...
  • Python is used for steering, but it is not required
    • An analysis can be written in pure C++
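
The "analyzer as a struct" pattern can be sketched in Python as a small callable object that is configured once and then applied to the collection of each event. All names below (Particle, SelectPt, event_weight) are hypothetical and heavily simplified; they are not EDM4hep types or FCCAnalyses functions. The weight formula is an assumption about how the dataset metadata (cross-section, number of generated events) would typically be used for normalization.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Particle:
    """Hypothetical, reduced stand-in for a reconstructed particle."""
    px: float
    py: float
    pz: float


@dataclass(frozen=True)
class SelectPt:
    """Analyzer object: holds its configuration, acts via the call operator."""
    min_pt: float

    def __call__(self, particles: List[Particle]) -> List[Particle]:
        # Keep particles whose transverse momentum exceeds the threshold.
        return [
            p for p in particles
            if (p.px ** 2 + p.py ** 2) ** 0.5 > self.min_pt
        ]


def event_weight(cross_section_pb: float,
                 luminosity_inv_pb: float,
                 n_generated: int) -> float:
    """Assumed normalization: sigma * L / N_generated."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return cross_section_pb * luminosity_inv_pb / n_generated


select = SelectPt(min_pt=1.0)
event = [Particle(3.0, 4.0, 0.0), Particle(0.3, 0.4, 10.0)]
selected = select(event)

weight = event_weight(cross_section_pb=2.0,
                      luminosity_inv_pb=1000.0,
                      n_generated=4000)
```

Keeping the configuration in the object and the work in the call operator is what lets the same analyzer be booked as a column definition with different thresholds.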

FCCAnalyses library

  • Vertexing
  • ACTS vertex finder
  • Event variables
  • Calorimeter hit/cluster variables
  • Reconstructed/MC particle operations
  • Flavour tagging
  • Jet clustering/constituents

Workflow

  • The analysis is divided into three stages:
    • analysis_stage1.py, ...: pre-selection stages, analysis dependent, usually run on batch
    • analysis_final.py: final selection, produces final variables
    • analysis_plots.py: produces plots from histograms/TTrees
  • The stage files contain objects that are loaded into the "main" function with the help of getattr()
  • The first stage reads the data in EDM4hep format
  • Running on batch is done by running an on-the-fly generated shell script in a subprocess
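
The two mechanisms named above, loading objects from a stage file with getattr() and running a generated shell script in a subprocess, can be sketched as follows. This is a minimal illustration, not the FCCAnalyses implementation: the stage file content, the option names, and the helper functions are assumptions made for the example, and the script is run locally with sh rather than submitted to a batch system.

```python
import importlib.util
import subprocess
import tempfile
from pathlib import Path

# Hypothetical stage file: steering options are plain module-level objects.
STAGE_SOURCE = '''
processList = {"sample_a": {"fraction": 0.5}}
outputDir = "outputs/stage1"
'''


def load_stage(path: Path):
    """Import a stage file from an arbitrary location as a module."""
    spec = importlib.util.spec_from_file_location("analysis_stage", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def read_option(module, name: str, default=None):
    """Fetch an optional steering object, falling back to a default."""
    return getattr(module, name, default)


def run_generated_script(workdir: Path, command: str) -> str:
    """Write a shell script on the fly and run it in a subprocess."""
    script = workdir / "job.sh"
    script.write_text("#!/bin/sh\nset -e\n" + command + "\n")
    result = subprocess.run(["sh", str(script)],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    stage_file = workdir / "analysis_stage1.py"
    stage_file.write_text(STAGE_SOURCE)

    stage = load_stage(stage_file)
    process_list = read_option(stage, "processList", {})
    n_cpus = read_option(stage, "nCPUS", 4)  # not set in the file, default used

    job_output = run_generated_script(
        workdir, "echo processing " + " ".join(process_list))
```

Reading options with a default keeps stage files short, at the cost that a misspelled option name is silently ignored.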

Proto packages

The case-studies machinery allows (semi-)independent analyses to be created

An example analysis is split across several locations:

  • Analysis stages are in examples in FCCAnalyses
  • Abstract and Results in case-studies in FCCeePhysicsPerformance
  • Benchmarks are in tests in FCCAnalyses
  • Analysis specific code in case-studies in FCCeePhysicsPerformance

Areas to improve I.

Needs to be addressed for the FSR:

  • Debugging tools
  • Individual event investigation + visualization
  • Integration with the production system
    • Migration to Dirac
    • Automatic deployment of new samples
  • Rigidity of the FCCAnalyses framework
    • Restrictive predefined stages
    • Weaver analysis example:
      • Analysis "properly" implements only the first stage
      • Requires a custom stage for training/testing
      • Needs a place to store common objects/variables between stages
    • All Python machinery crammed into one module

Areas to improve II.

Nice to have:

  • Error signaling
  • Testing and reorganization of the analyzers
  • Long term reproducibility
  • Benchmarking covering all analyzers and analyses

Visualization

Main use cases

Detector
  • Detector description in DD4hep format
    • Combination of C++ and XML
  • Conversion available to:
    • ROOT, GDML, glTF
  • Can be viewed in:
    • geoDisplay, Geant4, JSROOT, Phoenix
Events
  • Events can come from different sources in EDM4hep format
    • Full simulation: Pythia
    • Fast simulation: Delphes
    • Simple particle gun
  • Storage formats:
    • ROOT, JSON
  • Viewers:
    • Phoenix, CED

Web based visualization

  • Visualize events and detectors
  • Using web based tools
    • Independent of OS graphics
  • Phoenix: Detector independent event display
    • Developed under HSF
    • Written in TypeScript
    • Static application
  • JSROOT
    • Part of the ROOT project
    • Offers possibility to work with ROOT files on the web

Phoenix workflow

  • Separate event data and detector description
  • Events
    • Described in EDM4hep event data format
    • Convert ROOT files into JSON files
    • EDM4hep data structure is kept
  • Detector
    • Detector is described in DD4hep compact files
    • Convert XML into ROOT for JSROOT
    • Convert ROOT into glTF for Phoenix

List of FCC detectors available.

Conclusions & Outlook

  • Choice of ROOT RDataFrame in combination with reading EDM4hep format suits most of the analyses
    • Continually growing list of analyses
  • Large "standard library" is being built up
  • Rigidity of the framework starts to limit more complex analyses
  • Several ways of visualizing event data and detector geometry readily available
  • Necessity of bridging the gap between the FCC stack and the graphic subsystem of the OS
More heads are welcome!