Reports In Depth
Syntax Summary and Best Practices
// Create a struct for a row of the report; must implement `Serialize`.
#[derive(Serialize)]
struct ReportItem {
// Arbitrary (serializable) data fields.
}
// Define the report item in ixa.
define_report!(ReportItem);
// Somewhere during context initialization, initialize the report:
context.add_report::<ReportItem>("my_report")?;
// To write a row to the report:
context.send_report(
ReportItem {
// A concrete ReportItem instance
}
);
Best practices:
- Only record as much data as you require, because gathering and writing data takes computation time, the data itself takes up space, and post-processing also takes time.
- In particular, avoid a "snapshot the universe" approach to recording simulation state.
- Balance the amount of aggregation you do inside your simulation with the amount of aggregation you do outside of your simulation. There are trade-offs.
- Do not use the -f/--force-overwrite flag in production, to avoid data loss.
- Use indexes / multi-indexes for reports that use queries. (See the chapter on Indexing.)
Introduction
Ixa reports let you record structured data from your simulation while it runs. You can capture events, monitor population states over time, and generate summaries of model behavior. The output is written to CSV files, making it easy to use external tools or existing data analysis pipelines.
The report API takes care of a lot of details so you don't have to.
- One file per report type - Each report you define creates its own CSV file
- Automatic headers - Column names are derived from your report structure
- Hook into global configuration - Control file names, prefixes, output directories, and whether existing output files are overwritten using a configuration file or ixa's command line arguments
- Streaming output - Data is written incrementally during simulation execution
There is also built-in support for reports based on queries and periodic reports that record data at regular intervals of simulation time.
Configuring Report Options
You can configure the reporting system using the report_options() method on
Context. The configuration API uses the builder pattern to set the
options for all of the context's reports at once.
use ixa::prelude::*;
use std::path::PathBuf;
let mut context = Context::new();
context
.report_options()
.file_prefix("simulation_".to_string())
.directory(PathBuf::from("./output"))
.overwrite(true);
The configuration options are:
| Method | Description |
|---|---|
| file_prefix(String) | A prefix added to all report filenames. Useful for distinguishing different simulation runs or scenarios. |
| directory(PathBuf) | The directory where CSV files will be written. Defaults to the current working directory. |
| overwrite(bool) | Whether to overwrite existing files with the same name. Defaults to false to prevent accidental data loss. |
Note that all reports defined in the Context share this configuration.
This configuration is also used by other outputs that integrate with report
options, such as the profiling JSON written by
ixa::profiling::ProfilingContextExt::write_profiling_data().
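To make these options concrete, here is a minimal sketch. It assumes the conventional naming scheme in which a report's file path is the configured directory, then the prefix, then the report's short name with a .csv extension; check the ixa documentation for the exact rule in your version.
// Assuming "<directory>/<prefix><short_name>.csv" naming: with the options
// configured above, the report registered in the syntax summary as "my_report"
// would be written to "./output/simulation_my_report.csv".
context.add_report::<ReportItem>("my_report")?;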
Case Study 1: A Basic Report
Let's imagine we want a basic report that records every infectiousness status
change event (see examples/basic-infection/src/incidence_report.rs). The first
few rows might look like this:
| time | person_id | infection_status |
|---|---|---|
| 0.0 | 986 | I |
| 0.001021019741165120 | 373 | I |
| 0.013085498308028700 | 338 | I |
| 0.02134601331583040 | 542 | I |
| 0.02187737003255150 | 879 | I |
| ⋮ | ⋮ | ⋮ |
As far as ixa's report system is concerned, we really only need four ingredients for a simple report:
- A report item type that will represent one row of data in our report: basically anything that implements serde::Serialize.
- A define_report! macro invocation declaring that the report item is for a report.
- A call to context.add_report(), which readies the output file for writing. This also establishes the filename for the report according to your Context's report configuration. (See the section "Configuring Report Options" for details.)
- One or more calls to context.send_report(), which write lines of data to the output file.
But even for very simple use cases like this one, we will need to "wire up" simulation events, say, to our data collection. Here is what this might look like in practice:
// incidence_report.rs
use ixa::prelude::*;
use serde::Serialize;
// ...any other imports you may need.

/// 1. This struct will represent a row of data in our report.
#[derive(Serialize)]
struct IncidenceReportItem {
    /// The simulation time (in-universe time) at which the transition took place
    time: f64,
    /// The ID of the person whose status changed
    person_id: PersonId,
    /// The new status the person transitioned to
    infection_status: InfectionStatusValue,
}

// 2. Tell ixa that we want a report for `IncidenceReportItem`.
define_report!(IncidenceReportItem);

/// This and other auxiliary types would typically live in a different
/// source file in practice, but we emphasize here how we are "wiring up"
/// the call to `context.send_report()` to the change in a person property.
type InfectionStatusEvent = PersonPropertyChangeEvent<InfectionStatus>;

/// We will want to ensure our initialization function is called before
/// starting the simulation, so let's follow the standard pattern of having
/// an `init()` function for our report module, called from a main
/// initialization function somewhere.
pub fn init(context: &mut Context) -> Result<(), IxaError> {
    // 3. Prepare the report for use. This gives the report the *short name*
    //    `"incidence"`.
    context.add_report::<IncidenceReportItem>("incidence")?;
    // In our example, we will record each transition of infection status.
    context.subscribe_to_event::<InfectionStatusEvent>(handle_infection_status_change);
    Ok(())
}

fn handle_infection_status_change(context: &mut Context, event: InfectionStatusEvent) {
    // 4. Writing a row to the report is as easy as calling
    //    `context.send_report` with the data you want to record.
    context.send_report(IncidenceReportItem {
        time: context.get_current_time(),
        person_id: event.person_id,
        infection_status: event.current,
    });
}
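For reference, the file produced by this module starts roughly as follows; the header row is derived automatically from the field names of IncidenceReportItem, and the file name comes from the short name "incidence" combined with your report options (see "Configuring Report Options").
time,person_id,infection_status
0.0,986,I
0.001021019741165120,373,I
0.013085498308028700,338,I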
This report is event-driven: an InfectionStatusEvent triggers the creation of
a new row in the report. But do we really want to record every change of
infection status? Suppose what we actually care about is transitions from
susceptible to infected. In that case we might modify the code as follows:
fn handle_infection_status_change(context: &mut Context, event: InfectionStatusEvent) {
    // 4. Only write a row if a susceptible person (S) becomes infected (I).
    if (InfectionStatusValue::S, InfectionStatusValue::I)
        == (event.previous, event.current)
    {
        context.send_report(IncidenceReportItem {
            time: context.get_current_time(),
            person_id: event.person_id,
            infection_status: event.current,
        });
    }
}
Report Design Considerations
Separation of Concerns
Notice that we use a property change event to trigger writing to the report in
the example of the previous section. We could have done it differently: Instead
of subscribing an event handler to a property change event, we could have made
the call to context.send_report directly from whatever code changes a person
from "susceptible" to "infected" (sketched below, after the list). But this is a bad idea for several reasons:
- Separation of Concerns & Modularity: The transmission manager, or whatever code is responsible for changing the property value, should not be burdened with responsibilities like reporting that are outside of its purview. Likewise, the code for the report exists in a single place and has a single responsibility.
- Maintainability: Putting the call to context.send_report with the code that makes the property change implicitly assumes that this is the only way the property will ever be changed. But what if we modify how transmission works? We would have to remember to also update every affected call to context.send_report. This explosion in complexity is exactly the problem the event system is meant to solve.
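To make the contrast concrete, here is a sketch of the coupled approach we are advising against. The infect_person function is hypothetical, standing in for whatever transmission code changes the property; the point is that it now has to know about the report as well.
// Hypothetical transmission code that both changes the property *and* writes
// the report row. Every other code path that infects a person would have to
// remember to duplicate the send_report call.
fn infect_person(context: &mut Context, person_id: PersonId) {
    context.set_person_property(person_id, InfectionStatus, InfectionStatusValue::I);
    // Reporting responsibility has leaked into the transmission code:
    context.send_report(IncidenceReportItem {
        time: context.get_current_time(),
        person_id,
        infection_status: InfectionStatusValue::I,
    });
}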
Data Aggregation
You have to decide what data to include in the report and when to collect it. To determine the data sets you need, work backwards from the kinds of analysis and visualizations you want to produce, and avoid writing out data that will never be used downstream of the simulation. Often the most important design question is:
How much aggregation do you do inside the model versus during post-processing after the fact?
Some of the trade-offs you should consider:
- Aggregation in Rust requires more engineering effort. You generally need to work with types and container data structures, which might be unfamiliar to programmers coming from dynamic languages like Python.
- Aggregation in Rust generally executes much faster than post-processing in Python or R.
- In the model you have access to the full context of the data, including input
parameters and person properties—all system state—at the time you are
recording the data. Consequently:
- you can do more sophisticated filtering of data;
- you can do computation or processing using the full context that might be difficult or impossible after the fact.
- Data processing in Python is easy to do and may already be a familiar skill for the model author.
- Relying on post-processing can require writing out very large datasets, possibly many gigabytes, which costs both disk space and processing time.
Case Study 2: A Report With Aggregation
In the Ixa source repository you will find the basic-infection example in the
examples/ directory. You can build and run this example with the following
command:
cargo run --example basic-infection
The incidence report implemented in
examples/basic-infection/src/incidence_report.rs is essentially the report of
the section Case Study 1: A Basic Report above, which records the current
time, PersonId, and infection status every time there is a change in a
person's InfectionStatus property. This obviously results in 2 × 1000 =
2000 rows of data, twice the population size, since each person makes two
transitions in this model.
But suppose what we really want is to plot the count of people having each
InfectionStatusValue at the end of each day over time.
We can easily compute this data from the existing incidence report in a post-processing step. But a more efficient approach is to do the aggregation within the model so that we write exactly the data we need.
| time | susceptible_count | infected_count | recovered_count |
|---|---|---|---|
| 0.0 | 999 | 1 | 0 |
| 1.0 | 895 | 92 | 13 |
| 2.0 | 811 | 157 | 32 |
| 3.0 | 737 | 193 | 70 |
| 4.0 | 661 | 226 | 113 |
| ⋮ | ⋮ | ⋮ | ⋮ |
The MAX_TIME is set to 300 for this model, so this will result in only 301
rows of data (counting "day 0").
Aggregation
We could count how many people are in each category every time we write a row to the report, but it is much faster and more efficient to keep running counts as the simulation proceeds. We use a data plugin for this purpose; the third argument to define_data_plugin! is the initial value of the container:
// Running totals of the number of people in each infection state.
struct AggregateSIRDataContainer {
    susceptible_count: usize,
    infected_count: usize,
    recovered_count: usize,
}

// The data plugin holding the container; every count starts at zero.
define_data_plugin!(AggregateSIRData, AggregateSIRDataContainer,
    AggregateSIRDataContainer {
        susceptible_count: 0,
        infected_count: 0,
        recovered_count: 0,
    }
);
We need to initialize the susceptible_count in an init function:
pub fn init(context: &mut Context) {
// Initialize `susceptible_count` with population size.
let susceptible = context.get_current_population();
let container = context.get_data_mut(AggregateSIRData);
container.susceptible_count = susceptible;
// ...
}
And we need to update these counts whenever a person transitions from one category to another:
fn handle_infection_status_change(context: &mut Context, event: InfectionStatusEvent) {
match (event.previous, event.current) {
(InfectionStatusValue::S, InfectionStatusValue::I) => {
// A person moved from susceptible to infected.
let container = context.get_data_mut(AggregateSIRData);
container.susceptible_count -= 1;
container.infected_count += 1;
}
(InfectionStatusValue::I, InfectionStatusValue::R) => {
// A person moved from infected to recovered.
let container = context.get_data_mut(AggregateSIRData);
container.infected_count -= 1;
container.recovered_count += 1;
}
(_, _) => {
// No other transitions are possible.
unreachable!("Unexpected infection status change.");
}
}
}
We need to wire up this event handler to the InfectionStatusEvent in the
init function:
pub fn init(context: &mut Context) {
// Initialize `susceptible_count` with population size....
// Wire up the `InfectionStatusEvent` to the handler.
context.subscribe_to_event::<InfectionStatusEvent>(handle_infection_status_change);
// ...
}
That is everything needed for the bookkeeping.
Aggregation Inside the Model
Inside the model, the aggregation step often doesn't look like aggregation at all, because we can accumulate values as events occur instead of aggregating values after the fact.
Reporting
Now we tackle the reporting. We need a struct to represent a row of data:
// The report item, one row of data.
#[derive(Serialize)]
struct AggregateSIRReportItem {
time: f64,
susceptible_count: usize,
infected_count: usize,
recovered_count: usize,
}
// Tell ixa it is a report item.
define_report!(AggregateSIRReportItem);
Now we initialize the report with the context, and we add a periodic plan to
write to the report at the end of every day. The complete init() function is:
pub fn init(context: &mut Context) {
// Initialize `susceptible_count` with population size.
let susceptible = context.get_current_population();
let container = context.get_data_mut(AggregateSIRData);
container.susceptible_count = susceptible;
// Wire up the `InfectionStatusEvent` to the handler.
context.subscribe_to_event::<InfectionStatusEvent>(handle_infection_status_change);
// Initialize the report.
context.add_report::<AggregateSIRReportItem>("aggregate_sir_report")
.expect("Failed to add report");
// Write data to the report every simulated day.
context.add_periodic_plan_with_phase(
1.0, // A period of 1 day
write_aggregate_sir_report_item,
ExecutionPhase::Last // Execute the plan at the end of the simulated day.
);
}
The implementation of write_aggregate_sir_report_item is straightforward: We
fetch the values from the data plugin, construct an instance of
AggregateSIRReportItem, and "send" it to the report.
fn write_aggregate_sir_report_item(context: &mut Context) {
let time = context.get_current_time();
let container = context.get_data_mut(AggregateSIRData);
let report_item = AggregateSIRReportItem {
time,
susceptible_count: container.susceptible_count,
infected_count: container.infected_count,
recovered_count: container.recovered_count,
};
context.send_report(report_item);
}
Exercise:
- The aggregate_sir_report is 301 rows long, one row per day until MAX_TIME = 300, but most of the rows are of the form #, 0, 0, 1000, because the entire population has recovered long before we reach MAX_TIME. This is pretty typical of periodic reports: you get a lot of data you don't need at the end of the simulation. Add a simple filter to write_aggregate_sir_report_item so that the aggregate_sir_report only contains the data we actually want, roughly 90 rows (the first ~90 days). Don't just filter on the day, because the "last" day can change with a different random seed or with changes to the model.