Jetstream Data Products

Jetstream writes analysis results and enrollment information to BigQuery. Statistics data and metadata are exported to GCS to make them accessible to the Experimenter console.

BigQuery Datasets

Results Datasets

The datasets that back the Experimenter results dashboards are available in BigQuery in the mozanalysis dataset in moz-fx-data-experiments. Technical documentation is available in the Mozilla data docs.

Monitoring Datasets

Datasets used for monitoring the operation of Jetstream are part of the monitoring dataset in moz-fx-data-experiments.

Error Logs

Jetstream logs errors and warnings encountered during its analysis runs to monitoring.logs. This dataset is used as the basis for the Jetstream error dashboard and for setting up alerts.

The logs table has the following schema:

Column name	Type	Description
`timestamp`	`TIMESTAMP`	Timestamp of when the log event was recorded
`experiment`	`STRING`	Experiment slug for which the event was recorded
`message`	`STRING`	Log message
`log_level`	`STRING`	Log level: ERROR, WARNING
`exception`	`STRING`	Raised exception object
`filename`	`STRING`	Name of the Jetstream code file in which the exception was raised
`func_name`	`STRING`	Name of the Jetstream function in which the exception was raised
`exception_type`	`STRING`	Class name of the raised exception
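
As an illustration of how this schema might be queried, the sketch below builds a parameterised SQL statement for the most recent errors of one experiment. The fully qualified table name is inferred from the project and dataset names mentioned above, and the function name and the parameter names (`@experiment`, `@since`) are assumptions made for this example. Executing the query requires a BigQuery client, which is not shown here.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

# Assumed fully qualified name, pieced together from the text above.
LOGS_TABLE = "moz-fx-data-experiments.monitoring.logs"


def build_error_query(experiment_slug: str, since: datetime) -> Tuple[str, Dict[str, str]]:
    """Return a parameterised SQL string and the values for its parameters.

    Only columns listed in the logs schema are referenced.
    """
    sql = (
        "SELECT timestamp, message, exception_type, filename, func_name\n"
        f"FROM `{LOGS_TABLE}`\n"
        "WHERE experiment = @experiment\n"
        "  AND log_level = 'ERROR'\n"
        "  AND timestamp >= @since\n"
        "ORDER BY timestamp DESC"
    )
    params = {"experiment": experiment_slug, "since": since.isoformat()}
    return sql, params


# Hypothetical experiment slug, used only to show the call.
sql, params = build_error_query(
    "my-experiment", datetime(2024, 1, 1, tzinfo=timezone.utc)
)
print(sql)
print(params)
```

Passing the slug and timestamp as query parameters rather than formatting them into the SQL string avoids quoting problems with unusual slugs.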

Query Cost

The monitoring.query_cost_v1 dataset contains the cost of each query run when analysing experiments. The dataset is updated daily and scrapes the cost information from the BigQuery logs. The query for determining the costs is part of bigquery-etl. The dataset is the basis for the Jetstream cost monitoring dashboard and for alerts that send notifications when an analysis query exceeds a certain cost threshold.

The query_cost_v1 table has the following schema:

Column name	Type	Description
`submission_timestamp`	`TIMESTAMP`	Timestamp of when the query was executed
`destination_table`	`STRING`	Name of the table the query wrote data to
`query`	`STRING`	SQL of the executed query
`total_bytes_processed`	`INT64`	Number of bytes the query processed
`cost_usd`	`FLOAT`	Cost of the query in USD based on BigQuery pricing
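
To make the alerting idea concrete, here is a minimal sketch that works on rows already fetched from this table, represented as plain dictionaries keyed by the column names above. The sample rows, the table names in them, the helper names and the threshold value are all hypothetical; the actual dashboard and alert logic are not described in this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Hypothetical rows shaped like the query_cost_v1 schema (subset of columns).
SAMPLE_ROWS: List[Mapping] = [
    {"destination_table": "statistics_a_weekly", "total_bytes_processed": 10, "cost_usd": 1.5},
    {"destination_table": "statistics_a_weekly", "total_bytes_processed": 20, "cost_usd": 2.5},
    {"destination_table": "enrollments_b", "total_bytes_processed": 5, "cost_usd": 0.25},
]


def cost_by_table(rows: Iterable[Mapping]) -> Dict[str, float]:
    """Sum cost_usd per destination table."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["destination_table"]] += row["cost_usd"]
    return dict(totals)


def over_threshold(rows: Iterable[Mapping], threshold_usd: float) -> List[Mapping]:
    """Return the rows whose individual query cost exceeds the threshold."""
    return [row for row in rows if row["cost_usd"] > threshold_usd]


print(cost_by_table(SAMPLE_ROWS))
print(over_threshold(SAMPLE_ROWS, threshold_usd=2.0))
```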

Experimenter Experiments

For monitoring Nimbus experiments, some common failure cases are exposed as part of the Experiments Enrollments dashboard. These monitoring rules require access to the collected experiment enrollment data, which is available in monitoring.experimenter_experiments_v1. This dataset is part of bigquery-etl and is updated every 10 minutes by fetching data from the Experimenter API.

GCS Data Export

Jetstream exports statistics data and metadata of analysed experiments to the mozanalysis GCS bucket.

Statistics Data

After each analysis run has completed, Jetstream exports the statistics results of each experiment to the statistics sub-directory as JSON. The JSON files follow the naming format:

statistics_<experiment_slug>_<period>.json

Each file contains a JSON object for every row in the corresponding statistics table. The JSON files are pulled in by Experimenter and used for visualizing results on the Experimenter results page.
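
The naming format above can be expressed as a small helper. This is only a sketch: the function names are invented for illustration, "weekly" is a hypothetical period value, and the slug is inserted exactly as given because this document does not say whether Jetstream normalises any characters in it.

```python
def statistics_file_name(experiment_slug: str, period: str) -> str:
    """Build a file name following statistics_<experiment_slug>_<period>.json."""
    return f"statistics_{experiment_slug}_{period}.json"


def statistics_object_path(experiment_slug: str, period: str) -> str:
    """Prefix the file name with the statistics sub-directory of the bucket."""
    return f"statistics/{statistics_file_name(experiment_slug, period)}"


# Hypothetical slug and period, used only to show the call.
print(statistics_object_path("my-experiment", "weekly"))
```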

Metadata

Metadata of analyzed experiments contains information about all metrics and outcomes that are computed during any analysis period. Metadata is written as JSON files to the metadata sub-directory, using the following naming scheme:

metadata_<experiment_slug>.json

Each JSON metadata file contains the following information:

"metrics": {
    "metric_slug": {
        "friendly_name": "Friendly metric name",
        "description": "Metric description defined in mozanalysis or metric-hub",
        "bigger_is_better": true
    }
    // ...
},
"outcomes": {
    "outcome_slug": {
        "slug": "outcome_slug",
        "friendly_name": "Friendly outcome name",
        "description": "Outcome description defined in metric-hub",
        "metrics": [    // metrics computed as part of outcome
            "metric_slug",
            "another_metric_slug"
        ],
        // commit hash of outcome version that was used in analysis
        "commit_hash": "74e45eb4c3bf4ea7f1d65f888a70bfa0f6a86c1e" 
    }
    // ...
}

Metadata of metrics and outcomes is used to show names, descriptions and whether larger numbers are better in the Experimenter results.
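
The following sketch shows one way a consumer could read these fields. It assumes that each metadata file is a single JSON object with the "metrics" and "outcomes" keys at the top level, and it uses a comment-free copy of the example values above as sample input. The helper names and the label wording are hypothetical and are not taken from Experimenter.

```python
import json
from typing import List

# Sample input mirroring the example above, without comments so it parses as JSON.
SAMPLE_METADATA = """
{
  "metrics": {
    "metric_slug": {
      "friendly_name": "Friendly metric name",
      "description": "Metric description defined in mozanalysis or metric-hub",
      "bigger_is_better": true
    }
  },
  "outcomes": {
    "outcome_slug": {
      "slug": "outcome_slug",
      "friendly_name": "Friendly outcome name",
      "description": "Outcome description defined in metric-hub",
      "metrics": ["metric_slug", "another_metric_slug"],
      "commit_hash": "74e45eb4c3bf4ea7f1d65f888a70bfa0f6a86c1e"
    }
  }
}
"""


def describe_metric(metadata: dict, metric_slug: str) -> str:
    """Return a display label for a metric, falling back to the slug if unknown."""
    metric = metadata.get("metrics", {}).get(metric_slug)
    if metric is None:
        return metric_slug
    direction = "higher is better" if metric.get("bigger_is_better") else "lower is better"
    return f"{metric['friendly_name']} ({direction})"


def outcome_metrics(metadata: dict, outcome_slug: str) -> List[str]:
    """Return the metric slugs computed as part of an outcome (empty if unknown)."""
    return list(metadata.get("outcomes", {}).get(outcome_slug, {}).get("metrics", []))


metadata = json.loads(SAMPLE_METADATA)
print(describe_metric(metadata, "metric_slug"))
print(outcome_metrics(metadata, "outcome_slug"))
```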
