spatial_join/
lib.rs

1//! `spatial-join` provides tools to perform streaming geospatial-joins on geographic data.
2//!
3//! ## Spatial Joins
4//!
5//! Given two sequences of geospatial shapes, `small` and `big`, a
6//! spatial-join indicates which elements of `small` and `big`
7//! intersect. You could compute this yourself using a nested loop,
8//! but like any good spatial-join package, this one uses
9//! [R-trees](https://en.wikipedia.org/wiki/R-tree) to dramatically
10//! reduce the search space.
11//!
12//! We're not limited to intersections only! We can also find pairs
13//! where elements of `small` contain elements of `big` or are within
14//! elements of `big` by passing different values of
15//! [Interaction](./enum.Interaction.html).
16
17//! ## Proximity Maps
18//!
19//! While spatial join is a well known term, proximity map is
20//! not. Given two sequences of shapes `small` and `big`, it just
21//! finds all pairs of items whose distance is less than some
22//! threshold. You set that threshold using the
23//! [`max_distance`](./struct.Config.html#method.max_distance) method
24//! on the [`Config`](./struct.Config.html) struct.
25//!
26//! ## Inputs
27//!
28//! Inputs are sequences of shapes, and shapes must be one of the
29//! following elements from the
30//! [`geo`](https://docs.rs/geo/latest/geo/) crate:
31//! * [points](https://docs.rs/geo/latest/geo/struct.Point.html),
32//! * [lines](https://docs.rs/geo/latest/geo/struct.Line.html),
33//! * [line strings](https://docs.rs/geo/latest/geo/struct.LineString.html),
34//! * [polygons](https://docs.rs/geo/latest/geo/struct.Polygon.html),
35//! * [rectangles](https://docs.rs/geo/latest/geo/struct.Rect.html),
36//! * [triangles](https://docs.rs/geo/latest/geo/struct.Triangle.html), or
37//! * the [Geometry](https://docs.rs/geo/latest/geo/enum.Geometry.html) enum
38//!
39//! `MultiPoint`, `MultiLineString`, and `MultiPolygon` are *not* supported.
40//!
41//! While the [geo] crate makes these types generic over the
42//! coordinate type, `spatial-join` only supports [geo] types
43//! parametrized with [std::f64] coordinate types (i.e.,
44//! `Polygon<f64>`).
45//!
46//! So what kind of sequences can you use?
47//! * slices: `&[T]`,
48//! * vectors: `Vec<T>` or `&Vec<T>`, or
49//! * [`&geo::GeometryCollection`](https://docs.rs/geo/latest/geo/struct.GeometryCollection.html)
50//!
51//! In addition:
52//! * all coordinate values must be finite
53//! * `LineStrings` must have at least two points
54//! * `Polygon` exteriors must have at least three points
55//!
56//! Input that doesn't meet these conditions will return an [error](./enum.Error.html).
57//!
58//! ## Outputs
59//!
60//! [`SpatialIndex::spatial_join`](./struct.SpatialIndex.html#method.spatial_join) returns `Result<impl
61//! Iterator<Item=SJoinRow>, Error>` where
62//! [`SJoinRow`](./struct.SJoinRow.html) gives you indexes into
63//! `small` and `big` to find the corresponding geometries.
64//!
65//! Alternatively, you can use [`SpatialIndex::spatial_join_with_geos`](./struct.SpatialIndex.html#method.spatial_join_with_geos)
66//! which returns `Result<impl Iterator<Item=SJoinGeoRow>, Error>`.
67//! [`SJoinGeoRow`](./struct.SJoinGeoRow.html) differs from
68//! [`SJoinRow`](./struct.SJoinRow.html) only in the addition of `big`
69//! and `small`
70//! [`Geometry`](https://docs.rs/geo/latest/geo/enum.Geometry.html)
71//! fields so you can work directly with the source geometries without
72//! having to keep the original sequences around. This convenience
73//! comes at the cost of cloning the source geometries which can be
74//! expensive for geometries that use heap storage like `LineString`
75//! and `Polygon`.
76//!
77//! In a similar manner, [`SpatialIndex::proximity_map`](./struct.SpatialIndex.html#method.proximity_map) and
78//! [`SpatialIndex::proximity_map_with_geos`](./struct.SpatialIndex.html#method.proximity_map) offer
79//! [`ProxMapRow`](./struct.ProxMapRow.html) and
80//! [`ProxMapGeoRow`](./struct.ProxMapGeoRow.html) iterators in their
81//! return types. These differ from their `SJoin` counterparts only in
82//! the addition of a `distance` field.
83//!
84//! ## Examples
85//!
86//! Here's the simplest thing: let's verify that a point intersects itself.
87//! ```
88//! use spatial_join::*;
89//! use geo::{Geometry, Point};
90//! fn foo() -> Result<(), Error> {
91//!     // Create a new spatial index loaded with just one point
92//!     let idx = Config::new()
93//!         // Ask for a serial index that will process data on only one core
94//!         .serial(vec![Geometry::Point(Point::new(1.1, 2.2))])?;
95//!     let results: Vec<_> = idx
96//!         .spatial_join(
97//!             vec![Geometry::Point(Point::new(1.1, 2.2))],
98//!             Interaction::Intersects,
99//!         )?
100//!         .collect(); // we actually get an iterator, but let's collect it into a Vector.
101//!     assert_eq!(
102//!         results,
103//!         vec![SJoinRow {
104//!             big_index: 0,
105//!             small_index: 0
106//!         }]);
107//!     Ok(())
108//! }
109//! foo();
110//! ```
111//!
112//! For a slightly more complicated, we'll take a box and a smaller
113//! box and verify that the big box contains the smaller box, and
114//! we'll do it all in parallel.
115//! ```
116//! #[cfg(feature = "parallel")] {
117//!     use spatial_join::*;
118//!     use geo::{Coordinate, Geometry, Point, Rect};
119//!     use rayon::prelude::*;
120//!
121//!     fn bar() -> Result<(), Error> {
122//!         let idx = Config::new()
123//!              .parallel(vec![Geometry::Rect(Rect::new(
124//!                  Coordinate { x: -1., y: -1. },
125//!                  Coordinate { x: 1., y: 1. },
126//!              ))])?;
127//!          let results: Vec<_> = idx
128//!              .spatial_join(
129//!                  vec![Geometry::Rect(Rect::new(
130//!                      Coordinate { x: -0.5, y: -0.5 },
131//!                      Coordinate { x: 0.5, y: 0.5 },
132//!              ))],
133//!                  Interaction::Contains,
134//!              )?
135//!              .collect();
136//!          assert_eq!(
137//!              results,
138//!              vec![SJoinRow {
139//!                  big_index: 0,
140//!                  small_index: 0
141//!              }]
142//!          );
143//!          Ok(())
144//!     }
145//!     bar();
146//! }
147//! ```
148//!
149//! ## Crate Features
150//!
151//! - `parallel`
152//!   - Enabled by default.
153//!   - This adds a dependency on
154//!     [`rayon`](https://crates.io/crates/rayon) and provides a
155//!     [`parallel`](./struct.Config.html#method.parallel) method that
156//!     returns a [`ParSpatialIndex`](./struct.ParSpatialIndex.html)
157//!     just like the [`SpatialIndex`](./struct.SpatialIndex.html)
158//!     that [`serial`](./struct.Config.html#method.serial) returns
159//!     except that all the methods return `Result<impl
160//!     ParallelIterator>` instead of `Result<impl Iterator>`.
161//!
162//! ## Geographic
163//!
164//! Right now, this entire crate assumes that you're dealing with
165//! euclidean geometry on a two-dimensional plane. But that's unusual:
166//! typically you've got geographic coordinates (longitude and
167//! latitude measured in decimal degrees). To use the tools in this
168//! package correctly, you should really reproject your geometries
169//! into an appropriate euclidean coordinate system. That might be
170//! require you to do a lot of extra work if the extent of your
171//! geometry sets exceeds what any reasonable projection can handle.
172//!
173//! Alternatively, you can just pretend that geodetic coordinates are
174//! euclidean. For spatial-joins that will mostly work if all of your
175//! geometries steer well-clear of the anti-meridian (longitude=±180
176//! degrees) and the polar regions as well.
177//!
178//! For proximity maps, you'll need to pick an appropriate
179//! `max_distance` value measured in decimal degrees which will be
180//! used for both longitude and latitude offsets
181//! simulataneously. That's challenging because while one degree of
182//! latitude is always the same (about 110 km), one degree of
183//! longitude changes from about 110 km at the equator to 0 km at the
184//! poles. If your geometry sets have a narrow extant and are near the
185//! equator, you might be able to find a `max_distance` value that
186//! works, but that's pretty unlikely.
187//!
188//! ## Performance
189//!
190//! * You'll notice that our API specifies geometry sequences in terms
191//!   of `small` and `big`. In order to construct a spatial index
192//!   object, we have to build a series of R-trees, one per geometry
193//!   type, using bulk loading. This process is expensive
194//!   (`O(n*log(n))`) so you'll probably get better overall performance
195//!   if you index the smaller sequence.
196//! * Because the spatial-join and proximity-map operations are
197//!   implemented as iterators, you can process very large data-sets
198//!   with low memory usage. But you do need to keep both the `small`
199//!   and `large` geometry sequence in memory, in addition to rtrees
200//!   for the `small` sequence. Note that in some cases, specifically
201//!   whenever we're processing a heap-bound element of the `large`
202//!   sequence (i.e., Polygons or LineStrings), we will buffer all
203//!   matching result records for each such `large` geometry.
204//! * If you use a non-zero `max_distance` value, then any
205//!   spatial-join operations will be somewhat slower since
206//!   `max_distance` effectively buffers `small` geometries in the
207//!   r-trees. You'll still get the correct answer, but it might take
208//!   longer. The larger the `max_distance` value, the longer it will
209//!   take.
210//!
211//! ## License
212//!
213//! Licensed under either of
214//!
215//!  * Apache License, Version 2.0
216//!    ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
217//!  * MIT license
218//!    ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
219//!
220//! at your option.
221//!
222//! ## Contribution
223//!
224//! Unless you explicitly state otherwise, any contribution intentionally submitted
225//! for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
226//! dual licensed as above, without any additional terms or conditions.
227//!
228
229use rstar::RTree;
230
231mod structs;
232pub use structs::*;
233
234mod validation;
235
236mod conv;
237
238mod relates;
239
240mod rtrees;
241use rtrees::FakeRegion;
242
243#[derive(Debug)]
244pub struct SpatialIndex {
245    small: SplitGeoSeq,
246    point_tree: RTree<FakeRegion>,
247    line_tree: RTree<FakeRegion>,
248    poly_tree: RTree<FakeRegion>,
249    ls_tree: RTree<FakeRegion>,
250    rect_tree: RTree<FakeRegion>,
251    tri_tree: RTree<FakeRegion>,
252    config: Config,
253}
254
255#[cfg(feature = "parallel")]
256pub struct ParSpatialIndex(SpatialIndex);
257
258mod index;
259
260#[cfg(test)]
261mod naive;
262
263#[cfg(test)]
264mod proptests;
265
266#[cfg(test)]
267mod tests {
268    use std::convert::TryInto;
269
270    use geo::Point;
271    use pretty_assertions::assert_eq;
272
273    #[cfg(feature = "parallel")]
274    use rayon::prelude::*;
275
276    use super::*;
277    use index::*;
278
279    pub fn test_prox_map<Small, Big, E1, E2>(
280        config: Config,
281        small: Small,
282        big: Big,
283        expected: &Vec<ProxMapRow>,
284    ) where
285        Small: TryInto<SplitGeoSeq, Error = E1> + Clone,
286        Big: TryInto<SplitGeoSeq, Error = E2> + Clone,
287        E1: std::any::Any + std::fmt::Debug,
288        E2: std::any::Any + std::fmt::Debug,
289    {
290        //assert!(expected.is_sorted());
291        let small_geoms = sgs_try_into(small.clone())
292            .expect("small conversion")
293            .to_vec();
294        let big_geoms = sgs_try_into(big.clone()).expect("big conversion").to_vec();
295        let expected_geoms: Vec<_> = expected
296            .iter()
297            .map(|pmr| ProxMapGeoRow {
298                big_index: pmr.big_index,
299                small_index: pmr.small_index,
300                distance: pmr.distance,
301                big: big_geoms[pmr.big_index].clone(),
302                small: small_geoms[pmr.small_index].clone(),
303            })
304            .collect();
305        let _expected_geoms2 = expected_geoms.clone();
306
307        let si = config
308            .clone()
309            .serial(small.clone())
310            .expect("construction succeeded");
311        let mut actual = si.proximity_map(big.clone()).unwrap().collect::<Vec<_>>();
312        actual.sort();
313        assert_eq!(actual, *expected);
314
315        let mut actual_geoms = si
316            .proximity_map_with_geos(big.clone())
317            .unwrap()
318            .collect::<Vec<_>>();
319        actual_geoms.sort();
320        assert_eq!(actual_geoms, expected_geoms);
321    }
322
323    #[cfg(feature = "parallel")]
324    pub fn test_par_prox_map<Small, Big, E1, E2>(
325        config: Config,
326        small: Small,
327        big: Big,
328        expected: &Vec<ProxMapRow>,
329    ) where
330        Small: TryInto<Par<SplitGeoSeq>, Error = E1> + Clone,
331        Big: TryInto<Par<SplitGeoSeq>, Error = E2> + Clone,
332        E1: std::any::Any + std::fmt::Debug,
333        E2: std::any::Any + std::fmt::Debug,
334    {
335        let small_geoms = par_sgs_try_into(small.clone())
336            .expect("small conversion")
337            .to_vec();
338        let big_geoms = par_sgs_try_into(big.clone())
339            .expect("big conversion")
340            .to_vec();
341        let expected_geoms: Vec<_> = expected
342            .iter()
343            .map(|pmr| ProxMapGeoRow {
344                big_index: pmr.big_index,
345                small_index: pmr.small_index,
346                distance: pmr.distance,
347                big: big_geoms[pmr.big_index].clone(),
348                small: small_geoms[pmr.small_index].clone(),
349            })
350            .collect();
351        let _expected_geoms2 = expected_geoms.clone();
352
353        let si = config
354            .clone()
355            .parallel(small.clone())
356            .expect("construction succeeded");
357        let mut actual = si.proximity_map(big.clone()).unwrap().collect::<Vec<_>>();
358        actual.sort();
359        assert_eq!(actual, *expected);
360
361        let mut actual_geoms = si
362            .proximity_map_with_geos(big.clone())
363            .unwrap()
364            .collect::<Vec<_>>();
365        actual_geoms.sort();
366        assert_eq!(actual_geoms, expected_geoms);
367    }
368
369    pub fn test_spatial_join<Small, Big, E1, E2>(
370        config: Config,
371        small: Small,
372        big: Big,
373        interaction: Interaction,
374        expected: &Vec<SJoinRow>,
375    ) where
376        Small: TryInto<SplitGeoSeq, Error = E1> + Clone,
377        Big: TryInto<SplitGeoSeq, Error = E2> + Clone,
378        E1: std::any::Any + std::fmt::Debug,
379        E2: std::any::Any + std::fmt::Debug,
380    {
381        let small_geoms = sgs_try_into(small.clone())
382            .expect("small conversion")
383            .to_vec();
384        let big_geoms = sgs_try_into(big.clone()).expect("big conversion").to_vec();
385        let expected_geoms: Vec<_> = expected
386            .iter()
387            .map(|sjr| SJoinGeoRow {
388                big_index: sjr.big_index,
389                small_index: sjr.small_index,
390                big: big_geoms[sjr.big_index].clone(),
391                small: small_geoms[sjr.small_index].clone(),
392            })
393            .collect();
394        let _expected_geoms2 = expected_geoms.clone();
395
396        let si = config
397            .clone()
398            .serial(small.clone())
399            .expect("construction succeeded");
400        let mut actual = si
401            .spatial_join(big.clone(), interaction)
402            .unwrap()
403            .collect::<Vec<_>>();
404        actual.sort();
405        assert_eq!(actual, *expected);
406
407        let mut actual_geoms = si
408            .spatial_join_with_geos(big.clone(), interaction)
409            .unwrap()
410            .collect::<Vec<_>>();
411        actual_geoms.sort();
412        assert_eq!(actual_geoms, expected_geoms);
413    }
414
415    #[cfg(feature = "parallel")]
416    pub fn test_par_spatial_join<Small, Big, E1, E2>(
417        config: Config,
418        small: Small,
419        big: Big,
420        interaction: Interaction,
421        expected: &Vec<SJoinRow>,
422    ) where
423        Small: TryInto<Par<SplitGeoSeq>, Error = E1> + Clone,
424        Big: TryInto<Par<SplitGeoSeq>, Error = E2> + Clone,
425        E1: std::any::Any + std::fmt::Debug,
426        E2: std::any::Any + std::fmt::Debug,
427    {
428        let small_geoms = par_sgs_try_into(small.clone())
429            .expect("small conversion")
430            .to_vec();
431        let big_geoms = par_sgs_try_into(big.clone())
432            .expect("big conversion")
433            .to_vec();
434        let expected_geoms: Vec<_> = expected
435            .iter()
436            .map(|sjr| SJoinGeoRow {
437                big_index: sjr.big_index,
438                small_index: sjr.small_index,
439                big: big_geoms[sjr.big_index].clone(),
440                small: small_geoms[sjr.small_index].clone(),
441            })
442            .collect();
443        let _expected_geoms2 = expected_geoms.clone();
444
445        let si = config
446            .clone()
447            .parallel(small.clone())
448            .expect("construction succeeded");
449        let mut actual = si
450            .spatial_join(big.clone(), interaction)
451            .unwrap()
452            .collect::<Vec<_>>();
453        actual.sort();
454        assert_eq!(actual, *expected);
455
456        let mut actual_geoms = si
457            .spatial_join_with_geos(big.clone(), interaction)
458            .unwrap()
459            .collect::<Vec<_>>();
460        actual_geoms.sort();
461        assert_eq!(actual_geoms, expected_geoms);
462    }
463
464    #[test]
465    fn simple_index_self() {
466        let config = Config::new().max_distance(4.);
467        let small = vec![Point::new(1., 1.)];
468        let big = vec![Point::new(1., 1.)];
469        let expected = vec![ProxMapRow {
470            big_index: 0,
471            small_index: 0,
472            distance: 0.,
473        }];
474        test_prox_map(config, small.clone(), big.clone(), &expected);
475        #[cfg(feature = "parallel")]
476        test_par_prox_map(config, small, big, &expected);
477    }
478
479    #[test]
480    fn self_spatial_join_pair() {
481        let config = Config::new();
482        let pts = vec![
483            geo::Geometry::Point(Point::new(1., 1.)),
484            geo::Geometry::Point(Point::new(22., 22.)),
485        ];
486        let expected = vec![
487            SJoinRow {
488                big_index: 0,
489                small_index: 0,
490            },
491            SJoinRow {
492                big_index: 1,
493                small_index: 1,
494            },
495        ];
496        test_spatial_join(config, &pts, &pts, Interaction::Intersects, &expected);
497        #[cfg(feature = "parallel")]
498        test_par_spatial_join(config, &pts, &pts, Interaction::Intersects, &expected);
499    }
500
501    #[test]
502    fn simple_index_some_other() {
503        let config = Config::new().max_distance(4.);
504        let small = vec![Point::new(1., 1.)];
505        let big = vec![Point::new(2., 1.)];
506        let expected = vec![ProxMapRow {
507            big_index: 0,
508            small_index: 0,
509            distance: 1.0,
510        }];
511        test_prox_map(config, small.clone(), big.clone(), &expected);
512        #[cfg(feature = "parallel")]
513        test_par_prox_map(config, small, big, &expected);
514    }
515
516    #[test]
517    fn simple_index_none() {
518        let config = Config::new().max_distance(0.5);
519        let small = vec![Point::new(1., 1.)];
520        let big = vec![Point::new(2., 1.)];
521        let expected = vec![];
522        test_prox_map(config, small.clone(), big.clone(), &expected);
523        #[cfg(feature = "parallel")]
524        test_par_prox_map(config, small, big, &expected);
525    }
526    // for all pairs of types, verift that prox map finds and doesn't find depending on max_distance
527}