Workflow Management @ FCC


Scientific Workflow Management Cross-Experiment Retreat

B. François, D. Lange, G. Guerrieri, L. M. Herrmann, J. Smieško, P. Kontaxakis

May 13, 2026 — remote

Future Circular Collider

Collider proposal

  • Post HL-LHC collider project at CERN
  • FCC-ee: lepton collider — precision factory
    • ∼91 km tunnel
    • Z pole (√s ≈ 91 GeV), WW (√s ≈ 160 GeV),
      ZH (√s ≈ 240 GeV), tt̅ (√s ≈ 365 GeV)
  • FCC-hh: hadron collider — energy frontier
    • √s ≈ 100 TeV
  • 4 interaction points foreseen in the baseline scenario
  • Feasibility Study Report published (2025)¹; now in Reference Design phase

FCC-ee detector concepts

  • IDEA — drift chamber, dual-readout calorimetry
  • CLD — silicon tracker, SiW calorimetry
    • Adapted from CLIC detector design
  • ALLEGRO — built around high-granularity noble liquid electromagnetic calorimeter
  • ILD — detector proposed for ILC
    • Adapted to the FCC-ee environment
  • ALFA (new) — all-MAPS tracker, GRAiNITA ECAL
    • No full sim yet
  • IDEA, CLD, ALLEGRO, ILD available in Key4hep / k4geo

¹ FCC Feasibility Study Report, Eur. Phys. J. C (2025)

Key4hep Software Stack

Common framework shared across future collider experiments — FCC, EIC, CLIC, ILC, CEPC, MuCol, …

Selected Components

  • EDM4hep — common event data model (PODIO)
  • DD4hep — detector description toolkit
  • Gaudi — event processing framework
  • MC generators packaged: Whizard2, Pythia, KKMCee, Babayaga, …
  • ddsim / k4geo — simulation
  • k4FWCore / k4Rec* — digitization, reconstruction
    • High-level: Pandora PFA, ML-based solutions (flavour tagging, tracking, …)
  • Built with Spack; distributed via CVMFS (nightly and stable releases)
  • Migration to LCGCMake / LCG Stacks under way

Requirements & Constraints

Now — R&D & Detector Design

  • Physics studies to assess the physics potential of FCC and of the individual detector concepts
  • Multiple detector concepts developed in parallel — frequent re-simulation
  • Scale targets:
    • ∼10⁸ events/year/concept (detector dev) → ∼1.6 PB/year
    • ∼10¹⁰ events/year/concept (physics) → ∼8 PB/year
  • Individual datasets (parametrized simulation / Delphes) already exceeding 10⁸ events (∼600 TB in total)
  • iLCDirac for larger grid productions; ∼75 TB stored today
  • Transitioning from parametrized simulation (Delphes) to full simulation — key motivation for investing in distributed computing
  • Go beyond the CERN-centric approach (HTCondor + EOS + CVMFS)
  • Small community — low barrier to entry is essential
  • Datasets nearly open — accessible to anyone with a CERN account; in contrast to current LHC experiments
  • Possible transition to DiracX (next-generation DIRAC)
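
As a rough cross-check, the scale targets above imply an average size per event. The sketch below only divides the quoted yearly volume by the quoted event count, assuming decimal units (1 PB = 10^15 bytes); the function name is illustrative and the source does not state which data tiers are included in each volume.

```python
def implied_event_size_mb(volume_pb: float, n_events: float) -> float:
    """Average event size in MB implied by a yearly volume and an event count."""
    bytes_per_pb = 1e15  # assumption: decimal petabytes
    bytes_per_mb = 1e6   # assumption: decimal megabytes
    return volume_pb * bytes_per_pb / n_events / bytes_per_mb


# Numbers taken from the scale targets quoted above
detector_dev = implied_event_size_mb(1.6, 1e8)  # detector development target
physics = implied_event_size_mb(8.0, 1e10)      # physics target

print(f"detector dev: ~{detector_dev:.1f} MB/event")
print(f"physics:      ~{physics:.1f} MB/event")
```

Under these assumptions the two targets correspond to roughly 16 MB and 0.8 MB per event, respectively.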

Future — Post-Approval & Beyond

  • Unprecedented statistics across all run stages — Tera-Z (∼5×10¹²), WW, ZH, tt̅ — orders of magnitude beyond LEP
  • LHC-experiment-scale computing model: dedicated WLCG resources, multi-site distributed processing
  • Full dataset provenance required for publication-quality results
  • Automated workflow orchestration at scale — manual intervention not feasible
  • Larger and more diverse community; stricter data management and access control
  • Trigger strategy not yet decided — will strongly shape the data workflow
  • Push for a common computing model across all four experiments — mature and modern solution needed by 2029 when experiment collaborations are formed

Workflow at FCC

Centralized Dataset Production

  • Large-scale MC production on the grid
  • MCGen → Simulation → Reconstruction or MCGen → Delphes
  • Driven by iLCDirac Transformation System
  • Operated by production managers; regular users can also submit grid jobs
  • Output: EDM4hep files registered in DIRAC catalog
  • The homegrown production system used for previous campaigns is being decommissioned

Analysis & Local Productions

  • End-to-end workflow driven by analysts themselves
  • Local or CERN HTCondor batch
  • Analysis framework: FCCAnalyses (RDataFrame)
  • Local orchestration: FLARE and others
  • Key4hep has no single local WM tool mandated
    • Discussions ongoing

The two approaches are complementary: centralized productions provide the datasets consumed in analysis.

Grid Production: iLCDirac

Overview

  • DIRAC: general-purpose distributed computing framework used by LHCb and others
  • iLCDirac: extension tailored for lepton collider experiments — custom job types, VO configuration, metadata conventions for FCC, ILC, CALICE
  • Unified interface to batch farms (iLCDirac), grid, HPCs
  • Data Management: file transfers, metadata-augmented catalog
  • High automation; web portal at ilcdirac.cern.ch

Resources & usage

  • Computing: CERN (working), BARI & CNAF (to be tested)
  • Storage: CERN-DST-EOS, BARI-DISK, CNAF-DISK, GLASGOW-DISK
  • Recent full-sim test productions:
    Higgs samples + flavour tagging; max 427 concurrent jobs
  • ∼75 TB stored across 277 k files in DIRAC catalog

Data Management

Rucio data lake

  • Growing fast — ∼1.4 PB registered across 6 sites (CERN, BARI, CNAF, MIT, DESY, GLASGOW, IEP SAS); more under negotiation
  • Currently unpledged storage — no permanent allocation guarantee yet
  • Rucio and DIRAC currently separate — each storage site must be registered in both systems
  • Token-native, role-based Rucio being tested — ahead of larger experiments
  • ESCAPE xRIDGE data challenge completed (equivalent to a WLCG Data Challenge): winter2023 campaign successfully transferred
  • Concluding testing phase — open to test users soon
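
Because Rucio and DIRAC are currently separate, a site registered in only one of them is easy to miss. The sketch below is a plain set comparison, not a call into either system: the function name is hypothetical, and the site labels are placeholders rather than the actual registration state.

```python
def missing_registrations(rucio_sites, dirac_sites):
    """Report sites that are registered in only one of the two systems."""
    rucio_sites, dirac_sites = set(rucio_sites), set(dirac_sites)
    return {
        "missing_in_dirac": sorted(rucio_sites - dirac_sites),
        "missing_in_rucio": sorted(dirac_sites - rucio_sites),
    }


# Placeholder labels, assumed to be normalised to one naming scheme beforehand
rucio = ["SITE-A", "SITE-B", "SITE-C"]
dirac = ["SITE-A", "SITE-C", "SITE-D"]

report = missing_registrations(rucio, dirac)
print(report)
```

In practice the two systems use different storage element names, so a mapping step between naming schemes would be needed before such a comparison.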

Dataset discovery & requests

  • FCC Physics Events — dataset metadata portal; currently shows datasets from old homegrown production system only
  • Plan: ingest metadata from both DIRAC and Rucio; eventually integrate with Dataset Request System
  • Provides an additional level of control over the publishing of datasets
  • Dataset Request System — homegrown service, in development, for managing physics requests

iLCDirac: Transformation System

Central MC production is driven by Transformations, which automatically create new tasks as input data becomes available.

DataProcessing

  • Defines the full chain:
    MCGen → Simulation → Reconstruction or
    MCGen → Delphes (Parametrized simulation)
  • Automatic job creation, re-submission, input file discovery
  • Consistency checks: each input file treated exactly once
  • Productions implicitly coupled via TransformationID in metadata
  • All information preserved for reproducibility
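
The behaviour described above (new tasks as inputs appear, each input file treated exactly once) can be illustrated with a toy model. This is not the DIRAC or iLCDirac API: the class, its methods and the file names are hypothetical and only mimic the bookkeeping idea.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTransformation:
    """Toy stand-in for a transformation: turns newly seen input files into tasks."""
    transformation_id: int
    seen: set = field(default_factory=set)     # input files already assigned to a task
    tasks: list = field(default_factory=list)  # (task number, input file) pairs

    def update(self, catalog_files):
        """Create one task per input file that has not been handled yet."""
        created = []
        for lfn in sorted(catalog_files):
            if lfn in self.seen:
                continue  # each input file is treated exactly once
            self.seen.add(lfn)
            task = (len(self.tasks) + 1, lfn)
            self.tasks.append(task)
            created.append(task)
        return created


sim = ToyTransformation(transformation_id=1)
first = sim.update(["/toy/gen/a.root", "/toy/gen/b.root"])
second = sim.update(["/toy/gen/a.root", "/toy/gen/b.root", "/toy/gen/c.root"])
print(first)   # tasks for a.root and b.root
print(second)  # only c.root is new on the second pass
```

Calling `update` repeatedly with a growing catalog only ever adds tasks for files that were not seen before, which is the property the consistency checks are meant to guarantee.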

Supported workflows

  • MC Generation: KKMC, Whizard2, Babayaga, BHLumi, GuineaPig
  • Parametrized simulation: Delphes (standalone or Gaudi-algorithm)
  • Full sim: ddsim → Gaudi
  • DataManipulation: file replication & transfer via FTS
  • Pandora PFA calibration service (work in progress)

Analysis & Local Productions

FCCAnalyses

Analysis framework built on RDataFrame; not a workflow manager.


# Local, single-node RDataFrame
fccanalysis run analysis_stage1.py

# HTCondor batch (1 job per chunk)
fccanalysis batch analysis_stage1.py
        
  • Analysis typically split into several stages (pre-selection, analysis, plots)
  • Same script for local and batch — no duplication
  • New Job class: structured execution, metadata tracking, detailed run summary
  • Planned backends: Dask (distributed scaling) and DIRAC (grid submission)
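
The "1 job per chunk" batch mode rests on splitting the input file list into chunks. A minimal sketch of such a split is shown below; it is not the FCCAnalyses implementation, and the function name and file names are made up for illustration.

```python
def split_into_chunks(files, n_chunks):
    """Distribute files over at most n_chunks lists of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not files:
        return []
    n_chunks = min(n_chunks, len(files))  # never create empty chunks
    base, extra = divmod(len(files), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more file
        chunks.append(files[start:start + size])
        start += size
    return chunks


files = [f"events_{i:03d}.root" for i in range(7)]  # hypothetical input files
chunks = split_into_chunks(files, 3)
for job_id, chunk in enumerate(chunks):
    print(job_id, chunk)  # one batch job would process one chunk
```

Keeping the split order-preserving and free of empty chunks makes it straightforward to map each chunk to exactly one batch job.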

FLARE

End-to-end local workflow orchestration based on b2luigi; included in Key4hep stack.

  • DAG-based task scheduling — handles dependencies between production steps
  • Covers full chain: generation → simulation → reconstruction → analysis
  • Designed for local productions and analysis campaigns
  • YAML-based configuration — batch system, study metadata, output paths
  • Analysis scripts named by stage prefix (stage1_, final_, plot_, …)
  • Available on PyPI: pip install hep-flare
  • github.com/CamCoop1/FLARE
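
DAG-based scheduling means a step only runs once everything it depends on has finished. The sketch below illustrates that idea with Python's standard-library `graphlib` rather than FLARE or b2luigi; the stage names follow the chain listed above, and the dependency table itself is an assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the stages in its value set (assumed linear chain)
dependencies = {
    "generation": set(),
    "simulation": {"generation"},
    "reconstruction": {"simulation"},
    "stage1": {"reconstruction"},
    "final": {"stage1"},
    "plot": {"final"},
}

# static_order() yields the stages so that every dependency comes first
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A real workflow manager adds what this sketch leaves out: checking which outputs already exist, submitting to a batch system, and retrying failed tasks.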

Near-term Plans

Dataset production & data management

  • Expand computing to more sites following Rucio data lake footprint (MIT, DESY, GLASGOW, …)
  • Large-scale productions targeted for end of 2026
  • Expand Transformation System to cover multiple detector concepts
  • Support for more MC generators — MadGraph, Sherpa, …
  • Support for beam-induced background overlay
  • Production validation — automated sanity checks on produced datasets
  • Improved dataset provenance tracking
  • Rucio data lake consolidation: token rollout Q3 2026, distributed compute Q4 2026
  • Dataset Request System — formalise how physics requests enter the production pipeline

Analysis & local productions

  • FCCAnalyses: DIRAC backend for grid submission; Slurm and Dask as optional distributed backends
  • Run FCCAnalyses on non-CERN resources — datasets via XRootD
  • RNTuple support in FCCAnalyses and across the event processing chain
  • Improve metadata handling between analysis stages
  • Robust APIs for dataset metadata tracking
  • FLARE included in Key4hep stack (arXiv:2506.16094); other local WM tools welcome — no single tool mandated
  • FCC Physics Events: ingest metadata from DIRAC and Rucio; integrate Dataset Request System

Open Challenges

Provenance & validation

  • No unified solution for tracking full dataset lineage across the production chain
  • Automated validation of produced datasets not yet in place
  • Reproducing old campaigns from scratch is non-trivial
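
One common way to approach lineage tracking is to give every production step a record whose identifier depends on its own configuration and on the identifiers of its parents. The sketch below illustrates that general idea only; it is not an existing FCC, DIRAC or Rucio feature, and all names and configuration values are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of one production step and its parent steps."""
    step: str
    config: dict
    parents: tuple = ()

    @property
    def digest(self) -> str:
        """Hash of this step's config combined with the digests of all parents."""
        payload = {
            "step": self.step,
            "config": self.config,
            "parents": [p.digest for p in self.parents],
        }
        blob = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def lineage(self):
        """Step names from the earliest ancestor down to this record."""
        steps = []
        for parent in self.parents:
            steps.extend(parent.lineage())
        steps.append(self.step)
        return steps


gen = ProvenanceRecord("MCGen", {"generator": "toy", "seed": 1})
sim = ProvenanceRecord("Simulation", {"detector": "toy-detector"}, parents=(gen,))
reco = ProvenanceRecord("Reconstruction", {"version": "toy"}, parents=(sim,))
print(reco.lineage())
print(reco.digest[:12])
```

Because each digest folds in the parents' digests, a change anywhere upstream (for example a different generator seed) changes the identifier of every downstream dataset.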

Infrastructure & integration

  • Token integration across job submission, data management and metadata services — current pain point
  • Rucio and DIRAC integration for FCC — to be coordinated with DIRAC team
  • Possible transition to DiracX (next-generation DIRAC)
  • Lack of monitoring — requirements not yet fully understood; serious productions yet to come

Scale & common model

  • FCC-ee data volumes comparable to LHC — applying that experience to a lepton collider context
  • No standard paradigm yet for end-to-end analysis workflow management
  • Long-term goal: common computing model across future collider experiments

Summary

  • FCC uses Key4hep as its common software stack — shared across five detector concepts and multiple future collider experiments
  • Centralized production via iLCDirac: Transformation System automates MCGen → Sim → Reco; ∼75 TB stored; computing at CERN, with BARI and CNAF to be tested
  • Data managed with Rucio: ∼1.4 PB across 6 sites; token-native setup being finalized
  • Analysis via FCCAnalyses (RDataFrame) and local orchestration with FLARE; DIRAC grid backend planned
  • Near-term: large-scale productions end of 2026, expand to more computing sites, multi-detector & generator support, improved provenance & validation

Open questions — happy to learn from your experience

  • Token integration — how are you handling auth across job submission, data management and metadata services?
  • Dataset provenance — how do you track full lineage across a multi-step production chain?
  • Monitoring — what does your production monitoring look like at scale?
  • Long-term goal — common computing model across future collider experiments