FCCAnalyses

A Core Component of the Emerging Analysis Ecosystem for FCC


28th Conference on Computing in High Energy and Nuclear Physics

Juraj Smieško, FCC Software Team (CERN)

26 May 2026

Bangkok, Thailand

Future Circular Collider

Collider proposal

  • Post HL-LHC collider project at CERN
  • FCC-ee: lepton collider — precision factory
    • ∼91 km tunnel
    • Z pole (√s ≈ 91 GeV), WW (√s ≈ 160 GeV),
      ZH (√s ≈ 240 GeV), tt̅ (√s ≈ 365 GeV)
  • FCC-hh: hadron collider — energy frontier
    • √s ≈ 100 TeV
  • 4 interaction points foreseen in the baseline scenario
  • Feasibility Study Report published (2025)¹; now in Reference Design phase

FCC-ee detector concepts

  • IDEA — drift chamber, dual-readout calorimetry
  • CLD — silicon tracker, SiW calorimetry
    • Adapted from CLIC detector design
  • ALLEGRO — built around high-granularity noble liquid electromagnetic calorimeter
  • ILD — detector proposed for ILC
    • Adapted to the FCC-ee environment
  • ALFA — all-MAPS tracker, GRAiNITA ECAL (new)
    • No full sim yet
  • IDEA, CLD, ALLEGRO, ILD available in Key4hep / k4geo

¹ FCC Feasibility Study Report, Eur. Phys. J. C (2025)

Key4hep Software Stack

Common framework shared across future collider experiments — FCC, EIC, CLIC, ILC, CEPC, MuCol, …

Selected Components

  • EDM4hep — common event data model (PODIO)
  • DD4hep — detector description toolkit
  • Gaudi — event processing framework
  • MC generators packaged: Whizard2, Pythia, KKMCee, Babayaga, …
  • ddsim / k4geo — simulation
  • k4FWCore / k4Rec* — digitization, reconstruction
    • High-level: Pandora PFA, ML-based solutions (flavour tagging, tracking, …)
  • FCCAnalyses — analysis framework (this talk)
  • Built with Spack; distributed via CVMFS (nightly and stable releases)
  • Migration to LCGCMake / LCG Stacks under way
Figure: Key4hep components (stack diagram)

See also: Recent developments in Key4hep — J. M. Carceller et al., this conference

FCCAnalyses

  • Main analysis framework for the FCC collaboration
    • Built on ROOT RDataFrame — declarative, multi-threaded
    • Reads EDM4hep data natively
    • C++ kernels with a Python user interface
  • Served the Feasibility Study Report phase well
    • Fast simulation studies, parametric performance analyses
    • Now evolving to meet the demands of the Reference Design phase — full simulation, larger datasets, stricter reproducibility
  • Manages the full analysis chain
    • Dataset metadata resolution
    • Local & distributed execution
    • Staged and histmaker analysis styles
  • Distributed in the Key4hep stack:
    source /cvmfs/sw.hsf.org/key4hep/setup.sh
  • github.com/HEP-FCC/FCCAnalyses
Figure: Example Higgs recoil analysis graph (RDataFrame)
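
A minimal sketch of the observable behind a Higgs recoil analysis, assuming the standard definition m_recoil² = (√s − E_ll)² − |p_ll|² for a lepton pair recoiling against the rest of the event. It neglects the beam crossing angle, and the function name and the plain-tuple four-vector layout are illustrative choices, not FCCAnalyses code.

```python
import math

def recoil_mass(sqrt_s, leptons):
    """Recoil mass against a set of leptons, each given as (E, px, py, pz) in GeV."""
    e = sum(l[0] for l in leptons)
    px = sum(l[1] for l in leptons)
    py = sum(l[2] for l in leptons)
    pz = sum(l[3] for l in leptons)
    # Four-momentum conservation with the initial state at rest:
    # p_recoil = (sqrt_s - E, -p)
    m2 = (sqrt_s - e) ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    # Clamp small negative values (rounding, resolution) to zero in this sketch
    return math.sqrt(max(m2, 0.0))

# Two massless back-to-back leptons at sqrt(s) = 240 GeV
m_rec = recoil_mass(240.0, [(50.0, 0.0, 0.0, 50.0), (50.0, 0.0, 0.0, -50.0)])
print(m_rec)
```

In an RDataFrame-based analysis the same quantity would typically be one column definition among many in the computation graph.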

Library of Analyzers

Shared building blocks that physics groups contribute to and reuse

  • Analysis is composed from functions/functors operating on RDataFrame columns — ideally small and stateless
    • Particle kinematics, jet finding (FastJet), dN/dx, ML inference (ONNX, TMVA), …
  • Two layers of analyzers:
    • External (ROOT RVec, EDM4hep utils, RAL)
    • FCCAnalyses standard library — encourage upstream contributions
  • Analysis-specific extensions via .hxx header
    • JIT-compiled by ROOT at startup — no extra build step
    • Old CMake-based extension method deprecated
  • Fork development model creates analyzer copies across groups
    • Goal: reduce duplication by upstreaming shared functions
  • podio::DataSource — native EDM4hep reading in RDataFrame
    • Lazy reading being added to ROOTReader and RNTupleReader (podio#949)
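
To illustrate what "small and stateless" means in practice, here is a pure-Python analogue of a column-level function and functor. The real analyzers are C++ functions and functors acting on RDataFrame columns; the names `Particle`, `SelPt` and `get_pt` below are hypothetical stand-ins chosen for this sketch.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Particle:
    px: float
    py: float
    pz: float

@dataclass(frozen=True)
class SelPt:
    """Functor: configured once, then applied to each event's particle collection."""
    min_pt: float

    def __call__(self, particles: List[Particle]) -> List[Particle]:
        # Nothing is mutated, so the same input always gives the same output
        return [p for p in particles if math.hypot(p.px, p.py) > self.min_pt]

def get_pt(particles: List[Particle]) -> List[float]:
    """Plain function: maps a collection column to a column of pT values."""
    return [math.hypot(p.px, p.py) for p in particles]

event = [Particle(3.0, 4.0, 1.0), Particle(30.0, 40.0, 2.0)]
selected = SelPt(min_pt=20.0)(event)
print(get_pt(selected))
```

Keeping the configuration (here `min_pt`) in the constructor and the per-event work in the call is what makes such building blocks easy to share and safe to run multi-threaded.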

Writing and Running an Analysis

  • Analysis is now encapsulated in a Python class
    • User-level CLI arguments are passed through to the analysis
  • Helper functions widely used in practice
    • Multiple RDataFrame Define / Filter calls wrapped into a single operation
    • Often the C++ expression strings are templated
  • Unified command-line interface:
    fccanalysis run ana.py
    fccanalysis final final.py
    fccanalysis plots plots.py
  • Dataset metadata resolved automatically from FCC Physics Events
    • Interface being overhauled to use API calls
  • New Job class (recent)
    • One unit of local work → one output ROOT file
    • Encapsulates RDataFrame setup, event counts, benchmarking info
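
A minimal sketch of how an analysis class and a templated helper might fit together. `RecordingFrame` is a stand-in that only records calls, so the sketch runs without ROOT; in a real analysis an RDataFrame node, which offers `Define` and `Filter`, would take its place. The class layout, the `--min-pt` argument, the `define_kinematics` helper and the C++ names inside the expression strings are assumptions made for illustration, not the confirmed FCCAnalyses interface.

```python
import argparse

class RecordingFrame:
    """Stand-in for an RDataFrame node: records Define/Filter calls in order."""
    def __init__(self, ops=None):
        self.ops = list(ops or [])

    def Define(self, name, expr):
        return RecordingFrame(self.ops + [("Define", name, expr)])

    def Filter(self, expr):
        return RecordingFrame(self.ops + [("Filter", expr)])


def define_kinematics(df, coll, min_pt):
    """Wrap several Define/Filter calls for one collection into a single operation."""
    # The C++ expression strings are built from templates (placeholder names)
    df = df.Define(f"{coll}_sel", f"sel_pt({min_pt})({coll})")
    for var in ("pt", "e"):
        df = df.Define(f"{coll}_{var}", f"get_{var}({coll}_sel)")
    return df.Filter(f"{coll}_sel.size() >= 2")


class Analysis:
    """Analysis encapsulated in a class; user-level CLI arguments are parsed here."""
    def __init__(self, cmdline_args):
        parser = argparse.ArgumentParser(description="User-level arguments")
        parser.add_argument("--min-pt", type=float, default=20.0)
        # Unknown arguments are ignored rather than treated as errors
        self.ana_args, _ = parser.parse_known_args(cmdline_args)

    def analyzers(self, df):
        return define_kinematics(df, "muons", self.ana_args.min_pt)

    def output(self):
        return ["muons_pt", "muons_e"]


ana = Analysis(["--min-pt", "25"])
ops = ana.analyzers(RecordingFrame()).ops
for op in ops:
    print(op)
```

Because the helper only builds strings and chains calls, the same function can be reused for other collections by changing the `coll` argument.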

Two Analysis Styles

Two complementary running modes for different analysis workflows

Staged

  • Analysis split into multiple sequential stages — each writing intermediate output to disk
  • Often requires running algorithms typically done at reconstruction level — e.g. vertexing, jet clustering
  • Specialised ML training stage often inserted between stages — not managed by FCCAnalyses; inference then run back through FCCAnalyses
  • Each stage can be re-run independently from the intermediate files of the previous one
  • Well-suited for analyses working with large datasets where re-running the full chain is expensive
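
The staged style can be pictured as a chain of steps that communicate only through files on disk. The sketch below uses JSON files in a temporary directory purely as a stand-in for intermediate ROOT files; the stage functions, file names and toy selection are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def stage1(events, out_path):
    """Expensive step (e.g. object building): reduce the input and persist it."""
    reduced = [{"n_mu": len(ev["mu_pt"]), "lead_pt": max(ev["mu_pt"], default=0.0)}
               for ev in events]
    out_path.write_text(json.dumps(reduced))

def stage2(in_path, out_path, min_pt):
    """Cheap step: reads only the stage-1 output, so it can be re-run on its own."""
    reduced = json.loads(in_path.read_text())
    selected = [ev for ev in reduced if ev["n_mu"] >= 2 and ev["lead_pt"] > min_pt]
    out_path.write_text(json.dumps(selected))
    return len(selected)

events = [{"mu_pt": [45.0, 30.0]}, {"mu_pt": [12.0]}, {"mu_pt": [22.0, 18.0]}]
workdir = Path(tempfile.mkdtemp())
stage1(events, workdir / "stage1.json")

# Changing a cut only requires re-running stage 2, not the full chain
n_loose = stage2(workdir / "stage1.json", workdir / "stage2_loose.json", min_pt=20.0)
n_tight = stage2(workdir / "stage1.json", workdir / "stage2_tight.json", min_pt=40.0)
print(n_loose, n_tight)
```

The price of this flexibility is the disk space taken by the intermediate output.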

Histmaker

  • Single-pass: histograms filled directly from EDM4hep input
  • No intermediate ntuples — lower disk footprint
  • Uses RDataFrame RunGraphs for concurrent processing of all samples
  • Good for exploratory work and quick iteration on observables
  • Also well suited to analyses with a fixed, well-defined set of observables
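
By contrast, a single-pass style fills every booked histogram while streaming over the input and stores nothing in between. A minimal pure-Python sketch of that idea follows, with a hypothetical `fill_histograms` helper and toy events standing in for EDM4hep input; concurrency across samples is left out.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Binning = Tuple[int, float, float]  # (number of bins, low edge, high edge)

def fill_histograms(events: Iterable[dict],
                    observables: Dict[str, Tuple[Callable[[dict], float], Binning]]
                    ) -> Dict[str, List[int]]:
    """Fill all booked histograms in one pass over the events."""
    hists = {name: [0] * binning[0] for name, (_, binning) in observables.items()}
    for ev in events:  # a generator works too: no intermediate ntuple is kept
        for name, (func, (nbins, lo, hi)) in observables.items():
            x = func(ev)
            if lo <= x < hi:  # underflow and overflow are dropped in this sketch
                idx = min(int((x - lo) / (hi - lo) * nbins), nbins - 1)
                hists[name][idx] += 1
    return hists

def toy_events():
    for m in (88.0, 91.0, 92.5, 125.0):
        yield {"m_ll": m, "n_mu": 2}

booked = {
    "m_ll": (lambda ev: ev["m_ll"], (4, 80.0, 100.0)),
    "n_mu": (lambda ev: float(ev["n_mu"]), (5, 0.0, 5.0)),
}
hists = fill_histograms(toy_events(), booked)
print(hists)
```

Adding an observable means adding one entry to `booked` and re-running the pass, which is what makes this style convenient for quick iteration.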

Future goal: unified interface for both styles — staged and histmaker will converge into a single analysis description

Physics Reach & Detector Performance

FCCAnalyses as the primary tool for Reference Design phase detector benchmarking and physics reach assessment

  • Physics reach assessment
    • Benchmark analyses: e⁺e⁻ → ZH, WW, Z pole, …
    • Results hosted in FCCeePhysicsStudies and FCChhPhysicsPerformance
    • Work has started on defining an analysis-level format and its technical solution
    • Flagship analyses are being nominated to drive software requirements
  • Physics groups under FCC PED (new)
    • Formal physics group structure now in place for the Reference Design phase
    • Groups drive analysis requirements and coordinate physics reach studies
  • Detector performance
    • Multiple concepts studied in parallel: IDEA, CLD, ALLEGRO, ILD, ALFA
    • Full simulation samples from ddsim + k4geo
    • Detector-specific analyzers (dN/dx, track parameters, …)
    • Simultaneous support for fast simulation (Delphes) and full simulation
Figure: FCC PED physics groups (Reference Design phase)

Making FCCAnalyses Distributed

Two complementary approaches from FCCAnalyses' perspective

Integrated within FCCAnalyses

Distributed execution driven from within FCCAnalyses

  • HTCondor — production-ready
    • fccanalysis submit ana.py
    • Widely used in combination with centrally produced samples
  • DIRAC / iLCDirac — grid submission
    • Serious dataset production campaign planned for October
    • iLCDirac application for FCCAnalyses planned
  • RDataFrame distributed module
    • Apache Spark, Dask as possible backends — not yet used at FCC
  • Future: Slurm and other batch platforms
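
Batch submission largely amounts to splitting a sample's input files into independent jobs, each producing one output file. A minimal sketch of such a split, assuming a hypothetical `split_into_jobs` helper and output naming scheme; this is not the actual logic behind `fccanalysis submit`.

```python
from typing import Dict, List

def split_into_jobs(sample: str, files: List[str], n_chunks: int) -> List[Dict]:
    """Distribute input files over at most n_chunks jobs of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    n_chunks = min(n_chunks, len(files))  # avoid jobs without input
    jobs = []
    for i in range(n_chunks):
        jobs.append({
            "sample": sample,
            "inputs": files[i::n_chunks],          # round-robin assignment
            "output": f"{sample}/chunk_{i}.root",  # one output file per job
        })
    return jobs

files = [f"events_{i:03d}.root" for i in range(5)]
jobs = split_into_jobs("example_sample", files, n_chunks=2)
for job in jobs:
    print(job["output"], job["inputs"])
```

Each resulting dictionary holds everything a batch system needs to run one job in isolation, which is also what makes failed jobs cheap to resubmit.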

External workflow management

FCCAnalyses used as an executable orchestrated from outside

  • Flare — b2luigi-based orchestration
    • Handles complex multi-stage analysis pipelines
    • Integrates directly with Key4hep executables
    • Runs on HTCondor, Slurm and LSF — DIRAC (via gbasf2) and others possible
    • See also: Flare: an open source data workflow orchestration tool — C. Harris et al., this conference
  • Other workflow tools possible — Key4hep has no preferred tool

Emerging FCC Computing Model

User analysis currently on CERN EOS + HTCondor — goal is to move away from this towards proper distributed infrastructure

  • Dataset production via DIRAC/iLCDirac
    • Testing campaign ongoing — first serious campaign planned for October
    • Starting efforts for DiracX at FCC (new)
  • Dataset management via Rucio
    • ~1.4 PB registered across 7 sites: CERN, BARI, CNAF, MIT, DESY, GLASGOW, IEP SAS
    • Token-native, role-based setup being finalized
    • ESCAPE xRIDGE data challenge completed — concluding testing phase
    • Open to test users Q3 2026; distributed compute integration Q4 2026
  • FCC Physics Events — dataset metadata portal
    • API endpoint for JSON metadata — FCCAnalyses resolves sample locations via it
    • More APIs planned: dataset request, provenance, catalog access, …
    • Designed to be tooling-agnostic — any framework can consume them
  • Dataset Request System in development
    • Formalises how physics requests enter the production pipeline
    • Plans to integrate with FCC Physics Events
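
How a framework might consume such JSON metadata can be sketched as follows. The payload, its field names (`name`, `files`, `n_events`, `cross_section_pb`), the placeholder values and the `resolve_sample` helper are all invented for illustration; the actual FCC Physics Events schema and endpoint are not shown in these slides.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical response body; a real client would fetch this from the API endpoint
PAYLOAD = """
[
  {"name": "example_zh", "n_events": 1000000, "cross_section_pb": 1.0,
   "files": ["root://example.org//store/zh/events_0.root",
             "root://example.org//store/zh/events_1.root"]},
  {"name": "example_ww", "n_events": 500000, "cross_section_pb": 2.0,
   "files": ["root://example.org//store/ww/events_0.root"]}
]
"""

@dataclass
class Sample:
    name: str
    files: List[str]
    n_events: int
    cross_section_pb: float

def resolve_sample(payload: str, name: str) -> Sample:
    """Look up one sample by name and return its locations and bookkeeping info."""
    for entry in json.loads(payload):
        if entry["name"] == name:
            return Sample(entry["name"], entry["files"],
                          entry["n_events"], entry["cross_section_pb"])
    raise KeyError(f"sample '{name}' not found in metadata")

sample = resolve_sample(PAYLOAD, "example_zh")
print(sample.name, len(sample.files), sample.n_events)
```

Since the exchange format is plain JSON, nothing in such a client is specific to one analysis framework.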
Figure: Naive representation of a GRID-based dataset production system

First Bits of Analysis Ecosystem

First tools and integrations taking shape around FCCAnalyses

  • Analysis registries: FCCeePhysicsStudies and FCChhPhysicsPerformance
    • Community hubs for analysis code, results and documentation
  • eedE — EDM4hep Event Data Explorer
    • Web-based, interactive EDM4hep event content exploration
  • Phoenix — web-based 3D event display
    • Interactive detector and track visualization
  • CMS Combine — statistical analysis framework
    • GSoC 2026: proper integration into the Key4hep stack and FCCAnalyses in progress
  • Improve plotting tools — effort started, may involve external tools
  • Upstream improvements: TupleWriter in k4FWCore, podio::DataSource performance
  • Overcoming silos: Key4hep ↔ PyHEP ↔ DiracOS ↔ Rucio
  • Full simulation flagship analyses will drive requirements in the Reference Design phase
  • Interoperability with the broader HEP software ecosystem is a key goal
Figure: eedE — EDM4hep Event Data Explorer
Figure: Phoenix — CLD detector event display

Conclusions

  • FCCAnalyses is the main analysis framework for FCC, integrated in the Key4hep stack
    • Declarative RDataFrame interface — C++ performance, Python ergonomics
  • The Library of Analyzers concept encourages code sharing across physics working groups and detector concepts
  • Supports Reference Design phase detector benchmarking and physics reach assessment
  • Two complementary analysis styles (staged and histmaker) cover a wide range of workflows — unified interface is a future goal
  • FCC distributed computing capabilities growing: DIRAC/iLCDirac (serious campaign planned for October), Rucio (concluding testing phase)
  • Interoperability with the broader HEP software ecosystem is a key goal

Thanks to All Contributors

FCCAnalyses is a community effort — github.com/HEP-FCC/FCCAnalyses

clementhelsens · kjvbrt · selvaggi · davidjamin · vvolkl · EmanuelPerez · forthommel · BrieucF · jeyserma · jmcarcell · creakyorange969 · zuoxunwu · amanmdesai · gganis · tmadlener · JavierCVilla · gartrog · jalimena · gavinsalam · atishelmanch · kunal2796 · matthewkenzie · IneMEGAmaxi · bistapf · imelnyk1337 · portalesHEP · jacofan · prayagyadav · ShreyasBakare · lipeles

Backup

Open Tasks

High priority

  • Analysis-level data model
    • Identify needed observables & reco algorithms; define suitable file format
  • Reorganize analyzer library
    • Improve coherence and coverage; coordinate with ROOT RDF team
  • ML as first-class citizen
    • Rethink architecture to properly support current ML analysis needs
  • Debug-level visualization
    • Visual debugging at any stage of the event processing chain

Medium priority

  • Event weights & systematic uncertainties support
  • Redesign plotting — quick plots, ratio plots, publication-level
  • Expand distributed computing — GRID, Slurm, GPU resources
  • Robust analysis APIs & full dataset provenance
  • User-driven workflow management
  • Fitting tools integration — CMS Combine, zfit, RooFit, …

Analyzer Library

analyzers/dataframe — standard library of reusable functions and functors

  • Particles
    • MCParticle — generator-level particle access
    • ReconstructedParticle — reco particle kinematics & selection
    • ReconstructedParticle2MC, ReconstructedParticle2Track — associations
  • Tracks
    • ReconstructedTrack, TrackUtils — track parameters & utilities
  • Jets
    • JetClusteringUtils — FastJet interface
    • JetConstituentsUtils, JetFlavourUtils, JetTaggingUtils
  • Vertexing
    • VertexFinderActs, VertexFinderLCFIPlus
    • VertexFitterActs, VertexFitterSimple
  • Smearing & fast simulation
    • SmearObjects, Smearing
  • ML inference
    • WeaverUtils — ONNX-based neural network inference
  • Calorimetry & utilities
    • CaloNtupleizer, EventFilter, Algorithms
  • FCC-hh specific: Analysis_FCChh

Addons — optional components with heavier dependencies

  • FastJet — jet clustering (Valencia plugin, external recombiners)
  • ONNXRuntime — neural network inference, Weaver interface
  • TMVAHelper — ROOT TMVA integration