FCCAnalyses Job Class


Analysis Tools and Productions

Juraj Smieško (CERN)

22 April 2026

New Job Class

One unit of local work producing one output ROOT file. Encapsulates the full RDataFrame lifecycle.

  • Replaces scattered RDataFrame setup code in run_fccanalysis.py
  • Parallel file metadata fetching using threads
  • Event restriction: stride and max events
  • Detailed run summary with event counts & sum of weights
  • Revamped metadata writing into the output ROOT file
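Parallel metadata fetching can be pictured with Python's standard `concurrent.futures` module. This is a minimal sketch, not the Job class code. `read_file_metadata` is a hypothetical placeholder that returns fixed numbers so the example runs without ROOT, and the default of 8 worker threads is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def read_file_metadata(path: str) -> dict:
    """Hypothetical per-file reader (placeholder values, no ROOT needed).

    A real reader would open the ROOT file and read its stored
    event count and sum of weights.
    """
    return {"path": path, "raw": 1000, "sow": 1000.0}


def fetch_all_metadata(paths: list, max_workers: int = 8) -> list:
    """Fetch metadata for all input files concurrently.

    executor.map returns results in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(read_file_metadata, paths))


meta = fetch_all_metadata(["a.root", "b.root", "c.root"])
total_raw = sum(m["raw"] for m in meta)
total_sow = sum(m["sow"] for m in meta)
print(total_raw, total_sow)  # 3000 3000.0
```

Threads fit this task because opening input files is dominated by I/O waits rather than Python computation.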

Job Interface


job = Job(input_file_list, analysis_chain, use_data_source=False)

job.setup_output(output_filepath, output_variables)
job.enable_progress_bar()              # optional
job.restrict_events(n_events_max=1000, stride=2)

job.run()      # triggers the RDataFrame event loop
job.finalize() # writes metadata to the output ROOT file

n_events, elapsed = job.get_benchmark_info()
    

Event Counts Tracked in the Job

  • raw-orig / sow-orig
    • original totals from input files
  • raw-ttree
    • events in the input TTree
  • raw-init / sow-init
    • at dataframe creation
  • raw-restricted / sow-restricted
    • after stride / max-events filter
  • raw-final / sow-final
    • after analysis selection

All counts are written as TParameter objects into the fccana/ directory of the output ROOT file.
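The raw counts from dataframe creation onward form a chain in which each stage can only keep or drop events. A hypothetical consistency check over these names might look like the sketch below. It is not part of FCCAnalyses as far as this slide states. The sums of weights are left out on purpose because individual event weights can be negative, so a sum of weights need not shrink along the chain.

```python
# Raw-count stages in processing order; each stage can only drop events.
RAW_CHAIN = ["raw-init", "raw-restricted", "raw-final"]


def check_raw_chain(counts: dict) -> None:
    """Raise ValueError if a later stage reports more events than an earlier one."""
    for earlier, later in zip(RAW_CHAIN, RAW_CHAIN[1:]):
        if counts[later] > counts[earlier]:
            raise ValueError(f"{later} ({counts[later]}) exceeds "
                             f"{earlier} ({counts[earlier]})")


# Illustrative numbers only, not taken from a real job.
check_raw_chain({"raw-init": 20_000, "raw-restricted": 10_000,
                 "raw-final": 4_217})
```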

Job Run Summary


================================ SUMMARY ================================
Elapsed time (HH:MM:SS):                 00:00:03
Number of events processed:              10,000
Events processed per second:             3,012
Sum of weights processed:                9,823
Number of result events:                 4,217
Local number of events reduction factor: 0.4217
Total number of events available:        500,000
Total reduction factor:                  0.008434
=========================================================================
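The derived lines of the summary are plain ratios. The sketch below reproduces them from the example numbers. It illustrates the arithmetic and is not the FCCAnalyses code. The helper name `summarize` is hypothetical, and the elapsed time of 3.32 s is an assumption: the log only shows the rounded 00:00:03, and 10,000 / 3,012 ≈ 3.32 s.

```python
def summarize(n_processed: int, n_result: int, n_available: int,
              elapsed_s: float) -> dict:
    """Recompute the derived quantities shown in the job run summary."""
    return {
        # Events handled by the event loop per wall-clock second.
        "events_per_second": n_processed / elapsed_s,
        # Fraction of processed events that pass the analysis selection.
        "local_reduction": n_result / n_processed,
        # Fraction of all available events that end up in the output.
        "total_reduction": n_result / n_available,
    }


s = summarize(n_processed=10_000, n_result=4_217, n_available=500_000,
              elapsed_s=3.32)
print(f"{s['local_reduction']:.4f}")     # 0.4217
print(f"{s['total_reduction']:.6f}")     # 0.008434
print(f"{s['events_per_second']:,.0f}")  # 3,012
```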
    

Renaming process and production tag

  • process → sample
  • prod_tag → campaign
  • The new names better reflect what is being encoded:
    • campaign: <accelerator>/<season-and-year>/<detector>
    • sample: <generator>_<process>_<energy>
  • All imports updated across the framework
  • New validate_sample_list() function
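The two naming patterns can be written as small string builders. The helper names and the campaign values `fccee`, `winter2023` and `IDEA` are hypothetical illustrations. Reading `p8_ee_ZH_ecm240` as generator `p8`, process `ee_ZH` and energy `ecm240` is an assumption based on the pattern above.

```python
def make_campaign(accelerator: str, season_and_year: str, detector: str) -> str:
    """Build a campaign string: <accelerator>/<season-and-year>/<detector>."""
    return "/".join([accelerator, season_and_year, detector])


def make_sample(generator: str, process: str, energy: str) -> str:
    """Build a sample name: <generator>_<process>_<energy>."""
    return "_".join([generator, process, energy])


print(make_campaign("fccee", "winter2023", "IDEA"))  # fccee/winter2023/IDEA
print(make_sample("p8", "ee_ZH", "ecm240"))          # p8_ee_ZH_ecm240
```

Going the other way is ambiguous. A sample name cannot be split on `_` alone because the process part may itself contain underscores, as `ee_ZH` does.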

validate_sample_list()

Normalises and validates the per-sample dictionary from analysis scripts.

Deprecations

(warns, but still works)

  • input_dir → input-dir
  • output → output-stem

Validated keys

  • input-dir
  • output-stem
  • fraction
  • chunks
  • stride (new)
  • n-events-max (new)

Used by both run_fccanalysis.py and batch.py.
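The following is a minimal sketch of the kind of normalisation validate_sample_list() performs. It assumes that deprecated keys are renamed with a warning and that unknown keys are rejected. The function name `normalise_samples`, the use of Python's `warnings` module and the positive-integer checks on `stride` and `n-events-max` are assumptions, not the FCCAnalyses implementation. The warning text follows the `[DEPRECATED] Please use "X" instead of "Y"!` style.

```python
import warnings

# Deprecated per-sample keys mapped to their replacements.
DEPRECATED_KEYS = {"input_dir": "input-dir", "output": "output-stem"}
VALID_KEYS = {"input-dir", "output-stem", "fraction", "chunks",
              "stride", "n-events-max"}


def normalise_samples(samples: dict) -> dict:
    """Return a copy of `samples` with deprecated keys renamed and checked."""
    result = {}
    for name, params in samples.items():
        clean = {}
        for key, value in params.items():
            if key in DEPRECATED_KEYS:
                new_key = DEPRECATED_KEYS[key]
                # FutureWarning is shown to end users by default.
                warnings.warn(f'[DEPRECATED] Please use "{new_key}" '
                              f'instead of "{key}"!', FutureWarning)
                key = new_key
            if key not in VALID_KEYS:
                raise ValueError(f'Sample "{name}": unknown key "{key}"')
            # Assumed rule: the event controls must be positive integers.
            if key in ("stride", "n-events-max") and (
                    not isinstance(value, int) or value < 1):
                raise ValueError(f'Sample "{name}": "{key}" must be a '
                                 f'positive integer')
            clean[key] = value
        result[name] = clean
    return result


samples = normalise_samples({"p8_ee_ZH": {"input_dir": "/eos/...",
                                          "stride": 2}})
print(samples)  # {'p8_ee_ZH': {'input-dir': '/eos/...', 'stride': 2}}
```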

Per-sample event controls


# analysis_stage1.py
class Analysis:
    samples = {
        "p8_ee_ZH_ecm240": {
            "fraction": 0.5,
            "chunks": 4,
            "stride": 2,           # process every 2nd event
            "n-events-max": 50000, # cap at 50k events
        }
    }
    

Both stride and n-events-max are also available as CLI arguments when running over a test file or an independent sample.
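The following is a minimal sketch of how stride and n-events-max could combine. It assumes the stride starts at the first entry and the cap counts events kept after striding. The slide does not state the order in which the two are applied, and `restricted_entries` is a hypothetical helper.

```python
from typing import Optional


def restricted_entries(n_entries: int, stride: int = 1,
                       n_events_max: Optional[int] = None) -> list:
    """Return the entry indices kept by a stride and an optional event cap.

    Assumption: the stride is applied first (entries 0, stride, 2*stride, ...)
    and n_events_max then limits how many of those entries are kept.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    # Slicing a range with [:None] keeps everything, so no cap is a no-op.
    return list(range(0, n_entries, stride)[:n_events_max])


print(restricted_entries(10, stride=2))                  # [0, 2, 4, 6, 8]
print(restricted_entries(10, stride=2, n_events_max=3))  # [0, 2, 4]
```

Under these assumptions, the p8_ee_ZH_ecm240 settings above would read every second entry until 50,000 events are kept.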

Specifying input and output

Input

  • Sample list in script + (campaign or input_dir):
    
    class Analysis:
        samples = {"p8_ee_ZH": {}}
        campaign = "winter2023"
                
  • Per-sample input directory:
    
    samples = {"p8_ee_ZH": {"input-dir": "/eos/..."}}
                
  • Direct files via CLI:
    
    fccanalysis run ana.py -i file1.root file2.root
    fccanalysis run ana.py -f files.txt
                

Output

  • Output directory in script or CLI:
    
    class Analysis:
        output_dir = "./output/"
        analysis_name = "my_analysis"
                
    
    fccanalysis run ana.py --output-dir ./out/ \
                           -a my_analysis
                
  • Per-sample output stem:
    
    samples = {"p8_ee_ZH": {"output-stem": "ZH"}}
                
  • Direct output file (independent sample):
    
    fccanalysis run ana.py -i file.root \
                           -o result.root
                

New CLI arguments for fccanalysis run

General

  • --output-dir
  • -a / --analysis-name
  • --apply-filepath-rewrites / --no-filepath-rewrites
  • --n-events (alias for --nevents)

Independent sample

  • -s / --sample-name
  • --n-chunks
  • --stride
  • --test-file
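The new options can be declared with Python's `argparse`. This is an illustrative parser, not the actual `fccanalysis` CLI code. The positional `anascript_path` name, the value types and the mutually exclusive group for the two rewrite flags are assumptions. `--test-file` is left out because the slide does not say whether it takes a value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the new `fccanalysis run` options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="fccanalysis run")
    parser.add_argument("anascript_path")
    # General options
    parser.add_argument("--output-dir")
    parser.add_argument("-a", "--analysis-name")
    # Both flags write to one attribute: True, False, or None if neither given.
    rewrites = parser.add_mutually_exclusive_group()
    rewrites.add_argument("--apply-filepath-rewrites", dest="filepath_rewrites",
                          action="store_true", default=None)
    rewrites.add_argument("--no-filepath-rewrites", dest="filepath_rewrites",
                          action="store_false", default=None)
    # --n-events is an alias of --nevents; both store into args.nevents.
    parser.add_argument("--nevents", "--n-events", dest="nevents", type=int)
    # Independent-sample options
    parser.add_argument("-s", "--sample-name")
    parser.add_argument("--n-chunks", type=int)
    parser.add_argument("--stride", type=int)
    return parser


args = build_parser().parse_args(
    ["ana.py", "--n-events", "100", "-a", "my_analysis", "--stride", "2"])
print(args.nevents, args.analysis_name, args.stride)  # 100 my_analysis 2
```

Setting `dest` explicitly on the alias keeps both spellings writing to the same attribute, so scripts that use the old `--nevents` flag keep working.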

Summary

  • New Job class — one unit of local work
  • process renamed to sample
    • process_list → samples
  • prod_tag renamed to campaign
  • New per-sample keys: stride and n-events-max
  • New CLI arguments for fccanalysis run
  • Consistent validation via validate_sample_list()
  • FCCAnalyses PR#514

Backup

Other improvements

  • Deprecation warnings
    All [DEPRECATED] messages now follow a consistent style: [DEPRECATED] Please use "X" instead of "Y"!
  • Test output location
    Output ROOT files from ctest now go to ${CMAKE_BINARY_DIR} instead of the source tree.
  • Man pages updated
    fccanalysis-run(1) documents all new CLI arguments.
    fccanalysis-script(7) documents stride and n-events-max per-sample keys.
  • ROOT namespace fix
    ROOT.Experimental → ROOT
  • Type annotations added throughout Python modules