>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> import polars as pl

We'll imagine an elementary school is taking a pet census.
The private census data will have two columns, ``grade`` and ``pet_count``:

>>> lf_domain = dp.lazyframe_domain([
...     dp.series_domain("grade", dp.atom_domain(T=dp.i32)),
...     dp.series_domain("pet_count", dp.atom_domain(T=dp.i32))])

We also need to specify the column we'll be grouping by.

>>> lf_domain_with_margin = dp.with_margin(
...     lf_domain,
...     dp.polars.Margin(
...         by=[pl.col("grade")],
...         invariant="keys",
...         max_length=50))
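
To make ``max_length=50`` concrete, here is a minimal pure-Python sketch (not OpenDP code) of the property we read it as asserting: no single grade contributes more than 50 rows. The helper name ``max_group_length`` and the sample rows are hypothetical, and this interpretation of the margin is an assumption based on the parameter name.

```python
from collections import Counter

def max_group_length(rows, key_index=0):
    """Return the size of the largest group when rows are grouped by one column."""
    counts = Counter(row[key_index] for row in rows)
    return max(counts.values(), default=0)

# Hypothetical (grade, pet_count) rows, for illustration only.
rows = [(0, 0), (0, 0), (1, 1), (2, 1), (2, 9), (2, 1)]

# The margin above would be a valid description of this data
# only if the largest group has at most 50 rows.
assert max_group_length(rows) <= 50
```

A check like this would be run by whoever curates the data. The margin itself is a promise about the data, not something the sketch enforces.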

With that in place, we can plan the Polars computation, using the ``dp`` plugin.

>>> plan = (
...     pl.LazyFrame(schema={'grade': pl.Int32, 'pet_count': pl.Int32})
...     .group_by("grade")
...     .agg(pl.col("pet_count").dp.sum((0, 10), scale=1.0)))
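
Conceptually, the plan above clamps each ``pet_count`` to the bounds ``(0, 10)``, sums within each grade, and adds noise with the requested scale. The following is a pure-Python sketch of that idea under our own assumptions, not OpenDP's implementation: the helper names are hypothetical, and the continuous Laplace sampler shown here is for illustration only and is not safe for real privacy guarantees.

```python
import random
from collections import defaultdict

def clamp(x, lower, upper):
    """Force x into the closed interval [lower, upper]."""
    return max(lower, min(upper, x))

def clamped_sums(rows, bounds):
    """Sum the clamped second column of each row, grouped by the first column."""
    lower, upper = bounds
    sums = defaultdict(int)
    for grade, pet_count in rows:
        sums[grade] += clamp(pet_count, lower, upper)
    return dict(sums)

def laplace_noise(scale, rng):
    """Sample Laplace noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_sums(rows, bounds, scale, rng):
    """Add independent Laplace noise to each group's clamped sum."""
    return {g: s + laplace_noise(scale, rng)
            for g, s in clamped_sums(rows, bounds).items()}

# Hypothetical rows: 15 is clamped down to 10, and -3 is clamped up to 0.
rows = [(0, 0), (2, 1), (2, 15), (2, -3)]
print(clamped_sums(rows, (0, 10)))
print(noisy_sums(rows, (0, 10), scale=1.0, rng=random.Random()))
```

Clamping is what bounds the influence of any one row, which is why the bounds must be chosen without looking at the private data.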

We now have all the pieces to make our measurement function using `make_private_lazyframe`:

>>> dp_sum_pets_by_grade = dp.m.make_private_lazyframe(
...     input_domain=lf_domain_with_margin,
...     input_metric=dp.symmetric_distance(),
...     output_measure=dp.max_divergence(),
...     lazyframe=plan,
...     global_scale=1.0)
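
As a rough back-of-envelope check on what this measurement costs, we can work through the textbook Laplace-mechanism arithmetic by hand. This sketch assumes that adding or removing one row changes a clamped sum by at most ``max(|lower|, |upper|)``, that one row lands in only one grade, and that the effective noise scale is ``scale * global_scale``. The helper names are hypothetical, and OpenDP's own accounting may differ; the measurement's ``map`` is the authoritative source.

```python
def sum_sensitivity(bounds, d_in=1):
    """Worst-case change in a clamped sum when d_in rows are added or removed."""
    lower, upper = bounds
    return d_in * max(abs(lower), abs(upper))

def laplace_epsilon(sensitivity, scale):
    """Privacy loss of the Laplace mechanism: sensitivity divided by noise scale."""
    return sensitivity / scale

sensitivity = sum_sensitivity((0, 10), d_in=1)
epsilon = laplace_epsilon(sensitivity, scale=1.0 * 1.0)  # scale * global_scale
print(sensitivity, epsilon)
```

Under these assumptions, widening the bounds or shrinking the scale both raise the privacy loss, which is the trade-off to keep in mind when choosing them.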

It's only at this point that we need to introduce the private data.

>>> df = pl.from_records(
...     [
...         [0, 0], # No kindergarteners with pets.
...         [0, 0],
...         [0, 0],
...         [1, 1], # Each first grader has 1 pet.
...         [1, 1],
...         [1, 1],
...         [2, 1], # One second grader has chickens!
...         [2, 1],
...         [2, 9]
...     ],
...     schema=['grade', 'pet_count'], orient="row")
>>> lf = pl.LazyFrame(df)
>>> results = dp_sum_pets_by_grade(lf).collect()
>>> print(results.sort("grade")) # doctest: +ELLIPSIS
shape: (3, 2)
┌───────┬───────────┐
│ grade ┆ pet_count │
│ ---   ┆ ---       │
│ i64   ┆ i64       │
╞═══════╪═══════════╡
│ 0     ┆ ...       │
│ 1     ┆ ...       │
│ 2     ┆ ...       │
└───────┴───────────┘
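
The released values are elided because they are random. For reading the example, it may help to know the exact sums that the noise is centered on. The pure-Python sketch below recomputes them from the same nine records; it is for illustration only, since releasing exact sums would defeat the purpose of the measurement. The variable names are our own.

```python
from collections import defaultdict

# The same (grade, pet_count) records used in the example above.
records = [
    (0, 0), (0, 0), (0, 0),
    (1, 1), (1, 1), (1, 1),
    (2, 1), (2, 1), (2, 9),
]

exact_sums = defaultdict(int)
for grade, pet_count in records:
    # Every count already lies within the bounds (0, 10), so clamping changes nothing.
    exact_sums[grade] += pet_count

print(dict(exact_sums))  # {0: 0, 1: 3, 2: 11}
```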