Pytest Plugin Quickstart¶

A walkthrough of the runnable demo at python/examples/pytest_plugin/. The demo is a self-contained pytest project that exercises every layer of the plugin's step tree: packages, modules, classes (including nested), parametrize axes, manual substeps, and gate markers. It also includes a test package (tests/pytest_only/) that uses no Sift APIs, to show how the autouse fixtures capture plain pytest tests automatically.

For a conceptual reference (fixtures, ini flags, status semantics), see the Pytest Plugin guide.

Project layout¶

examples/pytest_plugin/
├── conftest.py                            # registers the plugin
├── pyproject.toml                         # pytest knobs + report name/test_case/metadata
├── .env.example                           # credential template
└── tests/
    ├── pytest_only/                       # subpackage step
    │   ├── __init__.py
    │   └── test_pytest_only_demo.py       # plain pytest, no Sift APIs
    └── with_sift/                         # subpackage step
        ├── __init__.py
        └── test_with_sift_demo.py         # measurements, substeps, classes, parametrize, gates

Every Python package (a directory with __init__.py), test file, and test class that sits above a test becomes its own parent step in the report tree.

`conftest.py`¶

A single pytest_plugins declaration loads the plugin. The default sift_client fixture reads SIFT_API_KEY / SIFT_GRPC_URI / SIFT_REST_URI from the environment. Set them in your shell, your CI secret store, or a local .env file (after pip install pytest-dotenv, the .env file is loaded automatically).

conftest.py

"""Project-level conftest for the pytest plugin demo.

A single ``pytest_plugins`` declaration is all that's needed — the plugin's
fixtures, hooks, and CLI options register through standard pytest machinery
from there.

The default ``sift_client`` fixture reads ``SIFT_API_KEY`` / ``SIFT_GRPC_URI``
/ ``SIFT_REST_URI`` from the environment. Set them however you prefer: your CI
secret store, your shell, or a local ``.env`` loaded by ``pytest-dotenv``
(``pip install pytest-dotenv`` and it auto-loads ``.env`` — no code here).
"""

pytest_plugins = ["sift_client.pytest_plugin"]
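Since the default fixture depends on three environment variables, a quick pre-flight check can catch a missing credential before a session starts. The following is a minimal sketch that uses only the standard library; `missing_sift_env` is a hypothetical helper for illustration and is not part of the plugin.

```python
import os

# Variable names taken from the conftest docstring above.
REQUIRED_VARS = ("SIFT_API_KEY", "SIFT_GRPC_URI", "SIFT_REST_URI")


def missing_sift_env(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper; the plugin itself may validate differently.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_sift_env()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Sift credentials are set.")
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise in a unit test.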

`pyproject.toml`¶

Pytest behavior knobs sit under [tool.pytest.ini_options], each commented out and shown at its default value. Uncomment a line and set it to false to opt out of that layer of the step tree. The report's display name, test_case, and free-form metadata are set under [tool.sift.pytest.report]; name and test_case accept template placeholders.

pyproject.toml

# Single config file for the demo. Pytest behavior lives under
# [tool.pytest.ini_options]; Sift report content lives under
# [tool.sift.pytest.report].

[tool.pytest.ini_options]
# Defaults give you the full step tree: every package, module, class, and
# parametrize axis becomes a parent step. These are the available knobs and
# their defaults — uncomment to opt out of a layer.
#
# sift_autouse = true              # autouse fixtures (default: true)
# sift_package_step = true         # Python package (dir with __init__.py) parent step (default: true)
# sift_module_step = true          # module (test file) parent step (default: true)
# sift_class_step = true           # class parent step incl. nested (default: true)
# sift_parametrize_nesting = true  # parametrize parent steps (default: true)
# sift_git_metadata = true         # git repo/branch/commit included on the report (default: true)

[tool.sift.pytest.report]
# Display name for the report. Placeholders: {target} {command} {args}
# {rootdir} {timestamp} {count} {git_repo} {git_branch} {git_commit}.
# Omit to use the default "{target} {timestamp}". {target} reflects what ran,
# from the collected tests, anchored to the project name: e.g.
# project/tests/test_x.py::test_y (single test, [param] stripped),
# project/tests/motor (several files' common dir), or project (whole suite).
name = "pytest-plugin demo ({count} tests) {timestamp}"
# Grouping key across runs (same placeholders available). Omit to default to
# {target} (what ran).
test_case = "pytest-plugin-demo"

[tool.sift.pytest.report.metadata]
# Free-form key/value metadata stamped on every report. Values keep their TOML
# type (string, int, float, bool).
ci_revision = 2
test_source = 'pytest-plugin-demo'
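To make the placeholder syntax concrete, the sketch below expands the demo's name template with Python's built-in `str.format`. This is only an approximation: the plugin's actual renderer is not shown in this page, and the values in `context` are made up for illustration.

```python
# Illustrative only: str.format stands in for the plugin's real renderer.
name_template = "pytest-plugin demo ({count} tests) {timestamp}"

# Hypothetical values; the plugin would supply these from the actual run.
context = {
    "count": 24,
    "timestamp": "2024-01-01T00:00:00Z",
    "target": "pytest_plugin",
}

# Keys that the template does not use (here, "target") are simply ignored.
rendered = name_template.format(**context)
print(rendered)
```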

`.env.example`¶

.env.example

SIFT_API_KEY=your-api-key
SIFT_GRPC_URI=your-org.grpc.example.com
SIFT_REST_URI=https://your-org.rest.example.com

The pytest_only module¶

Plain pytest tests with no sift_client imports, no step fixture, no markers. Each one still becomes a leaf step in the report tree. The plugin's autouse fixtures capture pass/fail automatically.

tests/pytest_only/test_pytest_only_demo.py

"""Plain pytest tests are automatically captured by the plugin as steps.

No imports from ``sift_client`` or fixture usage required. Each test
becomes a step in the report tree: passing tests resolve to ``PASSED``,
failing tests to ``FAILED``. This allows integrating existing tests
with Sift Test Results without modification.
"""

import pytest


def test_passes():
    """Functions become steps in the report tree. The function docstring is used as the step description."""
    assert 1 + 1 == 2


@pytest.mark.parametrize("value", ["v1", "v2"])
def test_parametrize_without_step(value):
    """Parametrized tests are nested under a common step with substeps for each permutation."""
    assert value.startswith("v")


class TestPytestClass:
    """Test classes are turned into parent steps for their methods. Class docstrings are used as the step description."""

    def test_method(self):
        assert True


def test_uses_a_pytest_fixture(tmp_path):
    """Normal pytest fixtures keep working; the plugin doesn't intercept them."""
    (tmp_path / "marker").write_text("ok")
    assert (tmp_path / "marker").read_text() == "ok"


def test_assertion_failure_marks_step_failed():
    """An ``AssertionError`` resolves the Sift step as ``FAILED`` (no traceback attached)."""
    assert 1 + 1 == 3


@pytest.mark.skip(reason="Demonstrating the skip outcome")
def test_skipped():
    """Skipped tests resolve as ``SKIPPED`` in the Sift report."""
    pass


def test_unexpected_exception_marks_step_errored():
    """Non-``AssertionError`` exceptions resolve the Sift step as ``ERROR`` with the traceback attached."""
    raise ValueError("simulated environmental failure")
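The status mapping described in the docstrings above can be sketched without pytest or Sift. This is a simplified model rather than the plugin's implementation: `Skipped` is a hypothetical stand-in for pytest's skip signal, and `classify` is an illustrative name.

```python
class Skipped(Exception):
    """Hypothetical stand-in for pytest's skip signal."""


def classify(test_fn):
    """Map a test callable's outcome to a status name."""
    try:
        test_fn()
    except Skipped:
        return "SKIPPED"
    except AssertionError:
        return "FAILED"  # assertion failures
    except Exception:
        return "ERROR"  # any other exception
    return "PASSED"


def passes():
    assert 1 + 1 == 2


def fails():
    assert 1 + 1 == 3


def skips():
    raise Skipped("Demonstrating the skip outcome")


def errors():
    raise ValueError("simulated environmental failure")


for fn in (passes, fails, skips, errors):
    print(fn.__name__, classify(fn))
```

The order of the `except` clauses matters: `AssertionError` is a subclass of `Exception`, so it has to be matched first.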

The with_sift module¶

Exercises the plugin's full surface: numeric / string / bool bounds, nested step.substep, @pytest.mark.sift_exclude, class steps with docstring descriptions, nested classes, stacked @pytest.mark.parametrize, and step.report_outcome.

tests/with_sift/test_with_sift_demo.py

"""End-to-end demo of the test-results features: measurements, substeps,
exclusion, classes, nested classes, and stacked parametrize."""

import pytest


def test_measurements(step) -> None:
    """Measurements are the first-class way to record numeric, string, or bool bounds criteria and their outcomes. They show up in report steps.
    ``step.measure`` accepts numeric (min/max), string, or bool bounds.
    Choose names that give sufficient context but stay general enough that similar or identical measurements
    across steps or reports can be compared.
    """
    step.measure(name="numeric_value", value=1.5, bounds={"min": 0.0, "max": 2.0})
    step.measure(name="string_label", value="ok", bounds="ok")
    step.measure(name="bool_flag", value=True, bounds=True)

    # Descriptions and metadata can also be provided to measurements.
    step.measure(
        name="numeric_value_2",
        value=1.5,
        bounds={"min": 0.0, "max": 2.0},
        description="Numeric that represents X, Y, Z",
        metadata={"subsystem": "A"},
    )

    # If you plan to link the pytest report to a Sift Run, you can also assign related channels for easy plotting in the app
    step.measure(
        name="numeric_value",
        value=1.5,
        bounds={"min": 0.0, "max": 2.0},
        channel_names=["channel_1", "channel_2"],
    )


def test_substeps(step) -> None:
    """``step.substep(...)`` opens child steps inside one test; substeps nest arbitrarily.
    This can be useful for grouping related measurements or for creating a more natural report structure
    without the need to create a new test, class, etc.

    Metadata can be attached at the step level by passing ``metadata=...`` to
    ``substep``; the same keyword is accepted by ``report_context.new_step``
    and propagates to the resulting ``TestStep``.

    A failed substep marks this step FAILED in the report without raising, so
    the end-of-test ``step.pytest_fail_if_step_failed()`` call is needed here
    too: it folds substep failures (not just direct measurements) into the
    pytest outcome.
    """
    with step.substep(name="phase_1", metadata={"phase_index": 1}) as s1:
        s1.measure(name="value", value=1.0, bounds={"min": 0.0, "max": 2.0})

    with step.substep(name="phase_2", metadata={"phase_index": 2}) as s2:
        with s2.substep(name="phase_2a") as s2a:
            s2a.measure(name="value", value=1.0, bounds={"min": 0.0, "max": 2.0})

    # Fails pytest if any substep above failed; no-op when they all passed.
    step.pytest_fail_if_step_failed()


def test_measure_series(step) -> None:
    """``measure_avg`` and ``measure_all`` are the series variants of ``measure``.

    Both accept a list, numpy array, or pandas series of numeric values.
    ``measure_avg`` records one row holding the mean of the series and
    bounds-checks it. ``measure_all`` evaluates every value individually and
    records one row per out-of-bounds element (in-bounds values are NOT
    recorded, keeping the report compact).
    """
    voltages = [4.95, 5.02, 5.01, 4.98, 5.00]
    step.measure_avg(
        name="voltage_mean",
        values=voltages,
        bounds={"min": 4.9, "max": 5.1},
        unit="V",
    )
    # All values are in-bounds here, so measure_all records nothing extra;
    # change one to e.g. 6.0 to see an out-of-bounds row appear.
    step.measure_all(
        name="voltage_samples",
        values=voltages,
        bounds={"min": 4.9, "max": 5.1},
        unit="V",
    )


def test_failed_measurement_marks_sift_step_failed(step) -> None:
    """An out-of-bounds measurement marks the Sift step as ``FAILED``
    without raising. The pytest test still passes (no assertion, no
    exception); the Sift report records bounds compliance while pytest
    records control flow.

    Use this pattern when measurements are diagnostic data you want to
    collect alongside the test result, even when some readings fall outside
    spec. See ``test_pytest_fail_if_step_failed_at_end`` below for the recommended way
    to also fail pytest when any measurement is out of bounds.
    """
    step.measure(
        name="voltage",
        value=99.0,  # outside the bounds below; marks the step FAILED in Sift
        bounds={"min": 0.0, "max": 10.0},
        unit="V",
    )


def test_pytest_fail_if_step_failed_at_end(step) -> None:
    """Recommended pattern: do every measurement and substep first, then call
    ``step.pytest_fail_if_step_failed()`` once at the end.

    Asserting on individual ``step.measure(...)`` calls raises
    ``AssertionError`` on the first failure, so any measurements after the
    failing one never run and never land in the Sift report. The end-of-test
    call is strictly better for diagnostic completeness: every measurement and
    substep is recorded, including the failures, and the aggregate result is
    then folded into the pytest outcome. It fails via ``pytest.fail`` rather
    than an assertion, so the failed step carries no assertion noise in
    ``error_info``.

    It fails on any failure the report would record: out-of-bounds
    measurements, failed substeps, and ``report_outcome`` failures. The ``b``
    measurement below is deliberately out of bounds. ``c`` still runs and is
    recorded; only the final call fails the test.
    """
    step.measure(name="a", value=1.0, bounds={"min": 0.0, "max": 2.0})
    step.measure(name="b", value=99.0, bounds={"min": 0.0, "max": 2.0})  # out of bounds
    step.measure(name="c", value=1.5, bounds={"min": 0.0, "max": 2.0})  # still recorded
    step.pytest_fail_if_step_failed()


def test_report_level_metadata(step, report_context) -> None:
    """Attach metadata to the run-wide ``TestReport`` via ``report_context.report.update(...)``.

    The same ``update({...})`` pattern works for any field on
    ``TestReportUpdate`` (``run_id``, ``serial_number``, ``part_number``,
    ``system_operator``, ``metadata``, ...). Useful for linking a session
    to a Sift Run or tagging the report with build / operator info at runtime.

    Updating ``metadata`` *replaces* the whole map server-side, so spread the
    report's current metadata first to add keys without dropping the entries
    configured under ``[tool.sift.pytest.report.metadata]`` (or the git
    metadata and auto-recorded ``pytest_command``).
    """
    report_context.report.update(
        {
            "metadata": {
                **report_context.report.metadata,
                "build_id": "v1.2.3",
                "operator": "ci",
            }
        }
    )
    step.measure(name="value", value=1.0, bounds={"min": 0.0, "max": 2.0})


@pytest.mark.sift_exclude
def test_excluded() -> None:
    """``sift_exclude`` runs the test in pytest but produces no Sift step."""
    assert True


class TestClassStep:
    """A test class becomes its own step in the report tree.

    This docstring becomes the description of the ``TestClassStep`` step.
    """

    @pytest.mark.parametrize("axis_a", ["a1", "a2"])
    @pytest.mark.parametrize("axis_b", ["b1", "b2"])
    def test_parametrize(self, step, axis_a: str, axis_b: str) -> None:
        """Stacked parametrize nests outer-to-inner in decorator-on-page order."""
        step.measure(name="value", value=1.0, bounds={"min": 0.0, "max": 2.0})

    class TestNested:
        """Nested classes produce nested class steps."""

        def test_report_outcome(self, step) -> None:
            """``step.report_outcome`` records a non-numeric pass/fail substep."""
            step.report_outcome(name="check", result=True, reason="value matched")


@pytest.fixture(
    scope="class",
    params=["1.4.2", "2.0.0-rc1"],
    ids=["stable", "beta"],
)
def firmware(request) -> str:
    """A class-scoped parametrized fixture: each value re-runs the whole class.

    ``ids=`` gives each value a human-readable label. The plugin uses that label
    for the step (``stable`` / ``beta``) instead of the default ``name=value``
    form (``firmware='1.4.2'``). A callable ``ids=`` factory works too — pytest
    calls it with each value to build the label.
    """
    return request.param


class TestScopedFixtureParam:
    """Higher-scoped parametrized fixtures lift to their scope in the tree.

    The ``firmware`` fixture is class-scoped, so pytest sets it up once per
    value for the whole class. The plugin places its parameter at that scope:
    just inside the class step, wrapping the methods, so each method runs once
    per value. Contrast this with the function-level ``@pytest.mark.parametrize``
    in ``TestClassStep.test_parametrize`` above, whose axes nest UNDER the test
    rather than above its methods. The same rule scales the ladder: a
    module-scoped fixture param lifts above the module's tests, a session-scoped
    one to the report root, and a ``@pytest.mark.parametrize(..., scope=...)``
    follows the scope it names.

    The steps here are named ``stable`` / ``beta`` because ``firmware`` declares
    ``ids=``; without it they would read ``firmware='1.4.2'`` / etc.
    """

    def test_boots(self, step, firmware: str) -> None:
        """Runs once per firmware revision, under that revision's step."""
        step.measure(name="boot_ok", value=True, bounds=True)

    def test_reports_version(self, step, firmware: str) -> None:
        """Also runs once per revision; both methods share each ``firmware`` step."""
        step.measure(name="firmware_rev", value=firmware, bounds=firmware)
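The difference between the two series variants described in `test_measure_series` can be modeled in a few lines. This is a sketch of the documented behavior only, assuming inclusive min/max bounds; `avg_row` and `out_of_bounds_rows` are hypothetical names, and the row shape is invented for illustration.

```python
from statistics import fmean


def within(value, bounds):
    """True when value lies inside the min/max bounds (assumed inclusive)."""
    return bounds["min"] <= value <= bounds["max"]


def avg_row(values, bounds):
    """Model of measure_avg: one row holding the mean and its bounds check."""
    mean = fmean(values)
    return {"value": mean, "passed": within(mean, bounds)}


def out_of_bounds_rows(values, bounds):
    """Model of measure_all: one row per out-of-bounds element only."""
    return [
        {"index": i, "value": v, "passed": False}
        for i, v in enumerate(values)
        if not within(v, bounds)
    ]


bounds = {"min": 4.9, "max": 5.1}
voltages = [4.95, 5.02, 5.01, 4.98, 5.00]

print(avg_row(voltages, bounds))             # a single row for the mean
print(out_of_bounds_rows(voltages, bounds))  # empty: every value is in bounds
print(out_of_bounds_rows(voltages + [6.0], bounds))  # one row for the 6.0 sample
```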

Run it¶

Without Sift credentials¶

cd python/examples/pytest_plugin
pytest --sift-disabled -v

--sift-disabled makes the plugin a no-op transport: step.measure(...) still evaluates bounds and returns a real pass/fail boolean, but nothing contacts Sift and no log file is written. Useful for previewing the report tree or unit-testing measurement logic.
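As a rough mental model of the bounds check that still runs in this mode, consider the sketch below. It is not the plugin's code: `in_bounds` is a hypothetical function, and it assumes that numeric bounds are inclusive, that either side of a min/max pair may be omitted, and that string and bool bounds are exact-match expectations.

```python
def in_bounds(value, bounds):
    """Evaluate a value against numeric (min/max dict), string, or bool bounds."""
    if isinstance(bounds, dict):
        lo = bounds.get("min")
        hi = bounds.get("max")
        if lo is not None and value < lo:
            return False
        if hi is not None and value > hi:
            return False
        return True
    # Assumption: string and bool bounds mean "must equal this value".
    return value == bounds


print(in_bounds(1.5, {"min": 0.0, "max": 2.0}))    # True
print(in_bounds(99.0, {"min": 0.0, "max": 10.0}))  # False
print(in_bounds("ok", "ok"))                       # True
```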

Against a real Sift org¶

cp .env.example .env
# Fill in SIFT_API_KEY / SIFT_GRPC_URI / SIFT_REST_URI
pytest -v

A TestReport shows up in Sift once the session finishes.

Offline (record now, replay later)¶

pytest --sift-offline --sift-output-dir=/tmp/sift-demo -v
# The summary panel prints the exact replay command and log path. Later, from
# anywhere with credentials:
import-test-result-log /tmp/sift-demo/a1b2c3/a1b2c3.jsonl

Expected report tree¶

With the plugin's defaults (every layer enabled), the demo produces:

TestReport (FAILED, since failures propagate up from leaves)
├── pytest_only                         ← package step (FAILED)
│   └── test_pytest_only_demo.py        ← module step (FAILED)
│       ├── test_passes                                              PASSED
│       ├── test_uses_a_pytest_fixture                               PASSED
│       ├── test_assertion_failure_marks_step_failed                 FAILED
│       ├── test_skipped                                             SKIPPED
│       ├── test_unexpected_exception_marks_step_errored             ERROR
│       ├── test_parametrize_without_step
│       │   ├── value='v1'                                           PASSED
│       │   └── value='v2'                                           PASSED
│       └── TestPytestClass
│           └── test_method                                          PASSED
└── with_sift                           ← package step (FAILED)
    └── test_with_sift_demo.py          ← module step (FAILED)
        ├── test_measurements                                        PASSED
        ├── test_substeps                                            PASSED
        │   ├── phase_1
        │   └── phase_2
        │       └── phase_2a
        │   (test_excluded: @sift_exclude, runs in pytest, NOT in tree)
        ├── test_measure_series                                      PASSED
        ├── test_failed_measurement_marks_sift_step_failed           FAILED  (pytest PASSED)
        ├── test_pytest_fail_if_step_failed_at_end                   FAILED  (pytest FAILED)
        ├── test_report_level_metadata                               PASSED
        ├── TestClassStep
        │   ├── test_parametrize
        │   │   ├── axis_a='a1'
        │   │   │   ├── axis_b='b1'                                  PASSED
        │   │   │   └── axis_b='b2'                                  PASSED
        │   │   └── axis_a='a2'
        │   │       ├── axis_b='b1'                                  PASSED
        │   │       └── axis_b='b2'                                  PASSED
        │   └── TestNested
        │       └── test_report_outcome
        │           └── check                                        PASSED
        └── TestScopedFixtureParam              ← class-scoped fixture param
            ├── stable                          ← ids= label (else firmware='1.4.2')
            │   ├── test_boots                                       PASSED
            │   └── test_reports_version                             PASSED
            └── beta
                ├── test_boots                                       PASSED
                └── test_reports_version                             PASSED
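The note that failures propagate up from leaves can be pictured as a recursive roll-up over the tree. The sketch below is one plausible model, not the plugin's algorithm: it assumes a parent is FAILED when any descendant is FAILED or ERROR, SKIPPED when every child is skipped, and PASSED otherwise. The dict-based node shape and the name `roll_up` are hypothetical.

```python
def roll_up(node):
    """Return a node's status, deriving a parent's status from its children."""
    children = node.get("children")
    if not children:
        return node["status"]
    statuses = [roll_up(child) for child in children]
    # Assumption: both FAILED and ERROR children mark the parent FAILED.
    if any(status in ("FAILED", "ERROR") for status in statuses):
        return "FAILED"
    if all(status == "SKIPPED" for status in statuses):
        return "SKIPPED"
    return "PASSED"


module = {
    "name": "test_pytest_only_demo.py",
    "children": [
        {"name": "test_passes", "status": "PASSED"},
        {"name": "test_skipped", "status": "SKIPPED"},
        {"name": "test_assertion_failure_marks_step_failed", "status": "FAILED"},
    ],
}
report = {"name": "TestReport", "children": [{"name": "pytest_only", "children": [module]}]}

print(roll_up(report))  # FAILED
```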

TestScopedFixtureParam shows two things. The first is scope-based placement: the class-scoped firmware fixture's parameter lifts to wrap the class methods (each runs once per value), unlike the function-level @pytest.mark.parametrize in TestClassStep, whose axes nest under the test. Module- and session-scoped fixture params lift higher still (above the module and to the report root, respectively). The second is human-readable labels: firmware declares ids=["stable", "beta"], so the steps use those names instead of the default firmware='1.4.2' form. Both a list and a callable ids= factory work, on parametrize axes as well as fixtures.
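The labeling rule can be sketched as a small function. This is an illustration under assumptions, not the plugin's code: `step_labels` is a hypothetical name, and falling back to the default form when a callable returns None mirrors pytest's own id handling rather than confirmed plugin behavior.

```python
def step_labels(name, values, ids=None):
    """Build one step label per parameter value."""
    defaults = [f"{name}={value!r}" for value in values]
    if ids is None:
        return defaults
    if callable(ids):
        # A callable is invoked once per value; None falls back to the default.
        made = [ids(value) for value in values]
        return [d if m is None else str(m) for d, m in zip(defaults, made)]
    return [str(label) for label in ids]


versions = ["1.4.2", "2.0.0-rc1"]
print(step_labels("firmware", versions))
# ["firmware='1.4.2'", "firmware='2.0.0-rc1'"]
print(step_labels("firmware", versions, ids=["stable", "beta"]))
# ['stable', 'beta']
print(step_labels("firmware", versions, ids=lambda v: "v" + v.split(".")[0]))
# ['v1', 'v2']
```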

The pytest_only module deliberately includes one failing, one skipped, and one erroring test so the demo shows every TestStatus mapping (FAILED for assertions, SKIPPED for skipped tests, ERROR for any other exception).

The with_sift module shows two patterns for handling measurement results. test_failed_measurement_marks_sift_step_failed lets the test keep passing in pytest while the Sift step is FAILED, which is useful when measurements are diagnostic data you want to collect regardless of outcome. test_pytest_fail_if_step_failed_at_end takes every measurement first and then calls step.pytest_fail_if_step_failed() once at the end, so every measurement still lands in the report even when one fails. The end-of-test call is the recommended pattern: it fails via pytest.fail (no assertion noise in error_info), and unlike asserting on an individual step.measure(...) call, it does not short-circuit on the first failure and skip every measurement that follows.

Expected pytest output is 20 passed, 3 failed, 1 skipped.
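The short-circuit behavior is easy to see with a stand-in recorder. The sketch below does not use the plugin: `Recorder` is a hypothetical class that mimics a step by recording each measurement and returning a pass/fail boolean, with inclusive bounds assumed.

```python
class Recorder:
    """Hypothetical stand-in for a step: records every measurement it is given."""

    def __init__(self):
        self.rows = []

    def measure(self, name, value, bounds):
        passed = bounds["min"] <= value <= bounds["max"]
        self.rows.append((name, passed))
        return passed

    @property
    def failed(self):
        return any(not passed for _, passed in self.rows)


READINGS = [("a", 1.0), ("b", 99.0), ("c", 1.5)]
BOUNDS = {"min": 0.0, "max": 2.0}


def assert_each():
    """Asserting per measurement stops at the first failure."""
    rec = Recorder()
    try:
        for name, value in READINGS:
            assert rec.measure(name, value, BOUNDS)
    except AssertionError:
        pass
    return [name for name, _ in rec.rows]


def check_at_end():
    """Recording everything first keeps later measurements in the report."""
    rec = Recorder()
    for name, value in READINGS:
        rec.measure(name, value, BOUNDS)
    return [name for name, _ in rec.rows], rec.failed


print(assert_each())   # ['a', 'b']: 'c' is never measured
print(check_at_end())  # (['a', 'b', 'c'], True)
```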

Flip any of the sift_*_step / sift_parametrize_nesting flags in pyproject.toml to false to collapse a layer.
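To visualize what collapsing a layer means, the sketch below builds the chain of parent steps for one test and drops the layers whose flag is off. It is a simplified model with a hypothetical `step_path` function; the flag names come from the pyproject.toml above, but how the plugin actually assembles the tree is an assumption.

```python
# Flag names from pyproject.toml; defaults are all enabled.
DEFAULTS = {
    "sift_package_step": True,
    "sift_module_step": True,
    "sift_class_step": True,
}


def step_path(package, module, classes, test, flags=None):
    """Return the chain of step names leading to one test."""
    active = {**DEFAULTS, **(flags or {})}
    path = []
    if active["sift_package_step"]:
        path.append(package)
    if active["sift_module_step"]:
        path.append(module)
    if active["sift_class_step"]:
        path.extend(classes)  # nested classes contribute one step each
    path.append(test)  # the test itself is always kept
    return path


args = ("with_sift", "test_with_sift_demo.py", ["TestClassStep", "TestNested"], "test_report_outcome")
print(step_path(*args))
print(step_path(*args, flags={"sift_class_step": False}))
```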

Next steps¶

Pytest Plugin guide: conceptual reference covering fixtures, configuration, report structure, and pass/fail behavior.
The demo's README on GitHub mirrors this page and is the canonical source.