spatial_join/lib.rs
1//! `spatial-join` provides tools to perform streaming geospatial-joins on geographic data.
2//!
3//! ## Spatial Joins
4//!
5//! Given two sequences of geospatial shapes, `small` and `big`, a
6//! spatial-join indicates which elements of `small` and `big`
7//! intersect. You could compute this yourself using a nested loop,
8//! but like any good spatial-join package, this one uses
9//! [R-trees](https://en.wikipedia.org/wiki/R-tree) to dramatically
10//! reduce the search space.
11//!
12//! We're not limited to intersections only! We can also find pairs
13//! where elements of `small` contain elements of `big` or are within
14//! elements of `big` by passing different values of
15//! [Interaction](./enum.Interaction.html).
16
17//! ## Proximity Maps
18//!
19//! While spatial join is a well known term, proximity map is
20//! not. Given two sequences of shapes `small` and `big`, it just
21//! finds all pairs of items whose distance is less than some
22//! threshold. You set that threshold using the
23//! [`max_distance`](./struct.Config.html#method.max_distance) method
24//! on the [`Config`](./struct.Config.html) struct.
25//!
26//! ## Inputs
27//!
28//! Inputs are sequences of shapes, and shapes must be one of the
29//! following elements from the
30//! [`geo`](https://docs.rs/geo/latest/geo/) crate:
31//! * [points](https://docs.rs/geo/latest/geo/struct.Point.html),
32//! * [lines](https://docs.rs/geo/latest/geo/struct.Line.html),
33//! * [line strings](https://docs.rs/geo/latest/geo/struct.LineString.html),
34//! * [polygons](https://docs.rs/geo/latest/geo/struct.Polygon.html),
35//! * [rectangles](https://docs.rs/geo/latest/geo/struct.Rect.html),
36//! * [triangles](https://docs.rs/geo/latest/geo/struct.Triangle.html), or
37//! * the [Geometry](https://docs.rs/geo/latest/geo/enum.Geometry.html) enum
38//!
39//! `MultiPoint`, `MultiLineString`, and `MultiPolygon` are *not* supported.
40//!
41//! While the [geo] crate makes these types generic over the
42//! coordinate type, `spatial-join` only supports [geo] types
43//! parametrized with [std::f64] coordinate types (i.e.,
44//! `Polygon<f64>`).
45//!
46//! So what kind of sequences can you use?
47//! * slices: `&[T]`,
48//! * vectors: `Vec<T>` or `&Vec<T>`, or
49//! * [`&geo::GeometryCollection`](https://docs.rs/geo/latest/geo/struct.GeometryCollection.html)
50//!
51//! In addition:
52//! * all coordinate values must be finite
53//! * `LineStrings` must have at least two points
54//! * `Polygon` exteriors must have at least three points
55//!
56//! Input that doesn't meet these conditions will return an [error](./enum.Error.html).
57//!
58//! ## Outputs
59//!
60//! [`SpatialIndex::spatial_join`](./struct.SpatialIndex.html#method.spatial_join) returns `Result<impl
61//! Iterator<Item=SJoinRow>, Error>` where
62//! [`SJoinRow`](./struct.SJoinRow.html) gives you indexes into
63//! `small` and `big` to find the corresponding geometries.
64//!
65//! Alternatively, you can use [`SpatialIndex::spatial_join_with_geos`](./struct.SpatialIndex.html#method.spatial_join_with_geos)
66//! which returns `Result<impl Iterator<Item=SJoinGeoRow>, Error>`.
67//! [`SJoinGeoRow`](./struct.SJoinGeoRow.html) differs from
68//! [`SJoinRow`](./struct.SJoinRow.html) only in the addition of `big`
69//! and `small`
70//! [`Geometry`](https://docs.rs/geo/latest/geo/enum.Geometry.html)
71//! fields so you can work directly with the source geometries without
72//! having to keep the original sequences around. This convenience
73//! comes at the cost of cloning the source geometries which can be
74//! expensive for geometries that use heap storage like `LineString`
75//! and `Polygon`.
76//!
77//! In a similar manner, [`SpatialIndex::proximity_map`](./struct.SpatialIndex.html#method.proximity_map) and
78//! [`SpatialIndex::proximity_map_with_geos`](./struct.SpatialIndex.html#method.proximity_map) offer
79//! [`ProxMapRow`](./struct.ProxMapRow.html) and
80//! [`ProxMapGeoRow`](./struct.ProxMapGeoRow.html) iterators in their
81//! return types. These differ from their `SJoin` counterparts only in
82//! the addition of a `distance` field.
83//!
84//! ## Examples
85//!
86//! Here's the simplest thing: let's verify that a point intersects itself.
87//! ```
88//! use spatial_join::*;
89//! use geo::{Geometry, Point};
90//! fn foo() -> Result<(), Error> {
91//! // Create a new spatial index loaded with just one point
92//! let idx = Config::new()
93//! // Ask for a serial index that will process data on only one core
94//! .serial(vec![Geometry::Point(Point::new(1.1, 2.2))])?;
95//! let results: Vec<_> = idx
96//! .spatial_join(
97//! vec![Geometry::Point(Point::new(1.1, 2.2))],
98//! Interaction::Intersects,
99//! )?
100//! .collect(); // we actually get an iterator, but let's collect it into a Vector.
101//! assert_eq!(
102//! results,
103//! vec![SJoinRow {
104//! big_index: 0,
105//! small_index: 0
106//! }]);
107//! Ok(())
108//! }
109//! foo();
110//! ```
111//!
112//! For a slightly more complicated, we'll take a box and a smaller
113//! box and verify that the big box contains the smaller box, and
114//! we'll do it all in parallel.
115//! ```
116//! #[cfg(feature = "parallel")] {
117//! use spatial_join::*;
118//! use geo::{Coordinate, Geometry, Point, Rect};
119//! use rayon::prelude::*;
120//!
121//! fn bar() -> Result<(), Error> {
122//! let idx = Config::new()
123//! .parallel(vec![Geometry::Rect(Rect::new(
124//! Coordinate { x: -1., y: -1. },
125//! Coordinate { x: 1., y: 1. },
126//! ))])?;
127//! let results: Vec<_> = idx
128//! .spatial_join(
129//! vec![Geometry::Rect(Rect::new(
130//! Coordinate { x: -0.5, y: -0.5 },
131//! Coordinate { x: 0.5, y: 0.5 },
132//! ))],
133//! Interaction::Contains,
134//! )?
135//! .collect();
136//! assert_eq!(
137//! results,
138//! vec![SJoinRow {
139//! big_index: 0,
140//! small_index: 0
141//! }]
142//! );
143//! Ok(())
144//! }
145//! bar();
146//! }
147//! ```
148//!
149//! ## Crate Features
150//!
151//! - `parallel`
152//! - Enabled by default.
153//! - This adds a dependency on
154//! [`rayon`](https://crates.io/crates/rayon) and provides a
155//! [`parallel`](./struct.Config.html#method.parallel) method that
156//! returns a [`ParSpatialIndex`](./struct.ParSpatialIndex.html)
157//! just like the [`SpatialIndex`](./struct.SpatialIndex.html)
158//! that [`serial`](./struct.Config.html#method.serial) returns
159//! except that all the methods return `Result<impl
160//! ParallelIterator>` instead of `Result<impl Iterator>`.
161//!
162//! ## Geographic
163//!
164//! Right now, this entire crate assumes that you're dealing with
165//! euclidean geometry on a two-dimensional plane. But that's unusual:
166//! typically you've got geographic coordinates (longitude and
167//! latitude measured in decimal degrees). To use the tools in this
168//! package correctly, you should really reproject your geometries
169//! into an appropriate euclidean coordinate system. That might be
170//! require you to do a lot of extra work if the extent of your
171//! geometry sets exceeds what any reasonable projection can handle.
172//!
173//! Alternatively, you can just pretend that geodetic coordinates are
174//! euclidean. For spatial-joins that will mostly work if all of your
175//! geometries steer well-clear of the anti-meridian (longitude=±180
176//! degrees) and the polar regions as well.
177//!
178//! For proximity maps, you'll need to pick an appropriate
179//! `max_distance` value measured in decimal degrees which will be
180//! used for both longitude and latitude offsets
181//! simulataneously. That's challenging because while one degree of
182//! latitude is always the same (about 110 km), one degree of
183//! longitude changes from about 110 km at the equator to 0 km at the
184//! poles. If your geometry sets have a narrow extant and are near the
185//! equator, you might be able to find a `max_distance` value that
186//! works, but that's pretty unlikely.
187//!
188//! ## Performance
189//!
190//! * You'll notice that our API specifies geometry sequences in terms
191//! of `small` and `big`. In order to construct a spatial index
192//! object, we have to build a series of R-trees, one per geometry
193//! type, using bulk loading. This process is expensive
194//! (`O(n*log(n))`) so you'll probably get better overall performance
195//! if you index the smaller sequence.
196//! * Because the spatial-join and proximity-map operations are
197//! implemented as iterators, you can process very large data-sets
198//! with low memory usage. But you do need to keep both the `small`
199//! and `large` geometry sequence in memory, in addition to rtrees
200//! for the `small` sequence. Note that in some cases, specifically
201//! whenever we're processing a heap-bound element of the `large`
202//! sequence (i.e., Polygons or LineStrings), we will buffer all
203//! matching result records for each such `large` geometry.
204//! * If you use a non-zero `max_distance` value, then any
205//! spatial-join operations will be somewhat slower since
206//! `max_distance` effectively buffers `small` geometries in the
207//! r-trees. You'll still get the correct answer, but it might take
208//! longer. The larger the `max_distance` value, the longer it will
209//! take.
210//!
211//! ## License
212//!
213//! Licensed under either of
214//!
215//! * Apache License, Version 2.0
216//! ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
217//! * MIT license
218//! ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
219//!
220//! at your option.
221//!
222//! ## Contribution
223//!
224//! Unless you explicitly state otherwise, any contribution intentionally submitted
225//! for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
226//! dual licensed as above, without any additional terms or conditions.
227//!
228
229use rstar::RTree;
230
231mod structs;
232pub use structs::*;
233
234mod validation;
235
236mod conv;
237
238mod relates;
239
240mod rtrees;
241use rtrees::FakeRegion;
242
243#[derive(Debug)]
244pub struct SpatialIndex {
245 small: SplitGeoSeq,
246 point_tree: RTree<FakeRegion>,
247 line_tree: RTree<FakeRegion>,
248 poly_tree: RTree<FakeRegion>,
249 ls_tree: RTree<FakeRegion>,
250 rect_tree: RTree<FakeRegion>,
251 tri_tree: RTree<FakeRegion>,
252 config: Config,
253}
254
255#[cfg(feature = "parallel")]
256pub struct ParSpatialIndex(SpatialIndex);
257
258mod index;
259
260#[cfg(test)]
261mod naive;
262
263#[cfg(test)]
264mod proptests;
265
266#[cfg(test)]
267mod tests {
268 use std::convert::TryInto;
269
270 use geo::Point;
271 use pretty_assertions::assert_eq;
272
273 #[cfg(feature = "parallel")]
274 use rayon::prelude::*;
275
276 use super::*;
277 use index::*;
278
279 pub fn test_prox_map<Small, Big, E1, E2>(
280 config: Config,
281 small: Small,
282 big: Big,
283 expected: &Vec<ProxMapRow>,
284 ) where
285 Small: TryInto<SplitGeoSeq, Error = E1> + Clone,
286 Big: TryInto<SplitGeoSeq, Error = E2> + Clone,
287 E1: std::any::Any + std::fmt::Debug,
288 E2: std::any::Any + std::fmt::Debug,
289 {
290 //assert!(expected.is_sorted());
291 let small_geoms = sgs_try_into(small.clone())
292 .expect("small conversion")
293 .to_vec();
294 let big_geoms = sgs_try_into(big.clone()).expect("big conversion").to_vec();
295 let expected_geoms: Vec<_> = expected
296 .iter()
297 .map(|pmr| ProxMapGeoRow {
298 big_index: pmr.big_index,
299 small_index: pmr.small_index,
300 distance: pmr.distance,
301 big: big_geoms[pmr.big_index].clone(),
302 small: small_geoms[pmr.small_index].clone(),
303 })
304 .collect();
305 let _expected_geoms2 = expected_geoms.clone();
306
307 let si = config
308 .clone()
309 .serial(small.clone())
310 .expect("construction succeeded");
311 let mut actual = si.proximity_map(big.clone()).unwrap().collect::<Vec<_>>();
312 actual.sort();
313 assert_eq!(actual, *expected);
314
315 let mut actual_geoms = si
316 .proximity_map_with_geos(big.clone())
317 .unwrap()
318 .collect::<Vec<_>>();
319 actual_geoms.sort();
320 assert_eq!(actual_geoms, expected_geoms);
321 }
322
323 #[cfg(feature = "parallel")]
324 pub fn test_par_prox_map<Small, Big, E1, E2>(
325 config: Config,
326 small: Small,
327 big: Big,
328 expected: &Vec<ProxMapRow>,
329 ) where
330 Small: TryInto<Par<SplitGeoSeq>, Error = E1> + Clone,
331 Big: TryInto<Par<SplitGeoSeq>, Error = E2> + Clone,
332 E1: std::any::Any + std::fmt::Debug,
333 E2: std::any::Any + std::fmt::Debug,
334 {
335 let small_geoms = par_sgs_try_into(small.clone())
336 .expect("small conversion")
337 .to_vec();
338 let big_geoms = par_sgs_try_into(big.clone())
339 .expect("big conversion")
340 .to_vec();
341 let expected_geoms: Vec<_> = expected
342 .iter()
343 .map(|pmr| ProxMapGeoRow {
344 big_index: pmr.big_index,
345 small_index: pmr.small_index,
346 distance: pmr.distance,
347 big: big_geoms[pmr.big_index].clone(),
348 small: small_geoms[pmr.small_index].clone(),
349 })
350 .collect();
351 let _expected_geoms2 = expected_geoms.clone();
352
353 let si = config
354 .clone()
355 .parallel(small.clone())
356 .expect("construction succeeded");
357 let mut actual = si.proximity_map(big.clone()).unwrap().collect::<Vec<_>>();
358 actual.sort();
359 assert_eq!(actual, *expected);
360
361 let mut actual_geoms = si
362 .proximity_map_with_geos(big.clone())
363 .unwrap()
364 .collect::<Vec<_>>();
365 actual_geoms.sort();
366 assert_eq!(actual_geoms, expected_geoms);
367 }
368
369 pub fn test_spatial_join<Small, Big, E1, E2>(
370 config: Config,
371 small: Small,
372 big: Big,
373 interaction: Interaction,
374 expected: &Vec<SJoinRow>,
375 ) where
376 Small: TryInto<SplitGeoSeq, Error = E1> + Clone,
377 Big: TryInto<SplitGeoSeq, Error = E2> + Clone,
378 E1: std::any::Any + std::fmt::Debug,
379 E2: std::any::Any + std::fmt::Debug,
380 {
381 let small_geoms = sgs_try_into(small.clone())
382 .expect("small conversion")
383 .to_vec();
384 let big_geoms = sgs_try_into(big.clone()).expect("big conversion").to_vec();
385 let expected_geoms: Vec<_> = expected
386 .iter()
387 .map(|sjr| SJoinGeoRow {
388 big_index: sjr.big_index,
389 small_index: sjr.small_index,
390 big: big_geoms[sjr.big_index].clone(),
391 small: small_geoms[sjr.small_index].clone(),
392 })
393 .collect();
394 let _expected_geoms2 = expected_geoms.clone();
395
396 let si = config
397 .clone()
398 .serial(small.clone())
399 .expect("construction succeeded");
400 let mut actual = si
401 .spatial_join(big.clone(), interaction)
402 .unwrap()
403 .collect::<Vec<_>>();
404 actual.sort();
405 assert_eq!(actual, *expected);
406
407 let mut actual_geoms = si
408 .spatial_join_with_geos(big.clone(), interaction)
409 .unwrap()
410 .collect::<Vec<_>>();
411 actual_geoms.sort();
412 assert_eq!(actual_geoms, expected_geoms);
413 }
414
415 #[cfg(feature = "parallel")]
416 pub fn test_par_spatial_join<Small, Big, E1, E2>(
417 config: Config,
418 small: Small,
419 big: Big,
420 interaction: Interaction,
421 expected: &Vec<SJoinRow>,
422 ) where
423 Small: TryInto<Par<SplitGeoSeq>, Error = E1> + Clone,
424 Big: TryInto<Par<SplitGeoSeq>, Error = E2> + Clone,
425 E1: std::any::Any + std::fmt::Debug,
426 E2: std::any::Any + std::fmt::Debug,
427 {
428 let small_geoms = par_sgs_try_into(small.clone())
429 .expect("small conversion")
430 .to_vec();
431 let big_geoms = par_sgs_try_into(big.clone())
432 .expect("big conversion")
433 .to_vec();
434 let expected_geoms: Vec<_> = expected
435 .iter()
436 .map(|sjr| SJoinGeoRow {
437 big_index: sjr.big_index,
438 small_index: sjr.small_index,
439 big: big_geoms[sjr.big_index].clone(),
440 small: small_geoms[sjr.small_index].clone(),
441 })
442 .collect();
443 let _expected_geoms2 = expected_geoms.clone();
444
445 let si = config
446 .clone()
447 .parallel(small.clone())
448 .expect("construction succeeded");
449 let mut actual = si
450 .spatial_join(big.clone(), interaction)
451 .unwrap()
452 .collect::<Vec<_>>();
453 actual.sort();
454 assert_eq!(actual, *expected);
455
456 let mut actual_geoms = si
457 .spatial_join_with_geos(big.clone(), interaction)
458 .unwrap()
459 .collect::<Vec<_>>();
460 actual_geoms.sort();
461 assert_eq!(actual_geoms, expected_geoms);
462 }
463
464 #[test]
465 fn simple_index_self() {
466 let config = Config::new().max_distance(4.);
467 let small = vec![Point::new(1., 1.)];
468 let big = vec![Point::new(1., 1.)];
469 let expected = vec![ProxMapRow {
470 big_index: 0,
471 small_index: 0,
472 distance: 0.,
473 }];
474 test_prox_map(config, small.clone(), big.clone(), &expected);
475 #[cfg(feature = "parallel")]
476 test_par_prox_map(config, small, big, &expected);
477 }
478
479 #[test]
480 fn self_spatial_join_pair() {
481 let config = Config::new();
482 let pts = vec![
483 geo::Geometry::Point(Point::new(1., 1.)),
484 geo::Geometry::Point(Point::new(22., 22.)),
485 ];
486 let expected = vec![
487 SJoinRow {
488 big_index: 0,
489 small_index: 0,
490 },
491 SJoinRow {
492 big_index: 1,
493 small_index: 1,
494 },
495 ];
496 test_spatial_join(config, &pts, &pts, Interaction::Intersects, &expected);
497 #[cfg(feature = "parallel")]
498 test_par_spatial_join(config, &pts, &pts, Interaction::Intersects, &expected);
499 }
500
501 #[test]
502 fn simple_index_some_other() {
503 let config = Config::new().max_distance(4.);
504 let small = vec![Point::new(1., 1.)];
505 let big = vec![Point::new(2., 1.)];
506 let expected = vec![ProxMapRow {
507 big_index: 0,
508 small_index: 0,
509 distance: 1.0,
510 }];
511 test_prox_map(config, small.clone(), big.clone(), &expected);
512 #[cfg(feature = "parallel")]
513 test_par_prox_map(config, small, big, &expected);
514 }
515
516 #[test]
517 fn simple_index_none() {
518 let config = Config::new().max_distance(0.5);
519 let small = vec![Point::new(1., 1.)];
520 let big = vec![Point::new(2., 1.)];
521 let expected = vec![];
522 test_prox_map(config, small.clone(), big.clone(), &expected);
523 #[cfg(feature = "parallel")]
524 test_par_prox_map(config, small, big, &expected);
525 }
526 // for all pairs of types, verift that prox map finds and doesn't find depending on max_distance
527}