max_tree/
lib.rs

#![deny(missing_docs)]

//! # Max Tree
//!
//! A utility maximizer library based on a maximum tree structure.
//!
//! ### Motivation
//!
//! Make it easier to compare and compose algorithms for utility programming.
//!
//! ### Example: Moon Landing
//!
//! This example is used to improve and push the limits of the library.
//! Why? Because it is fun!
//!
//! First make it work, then gradually make the model more realistic over time.
//!
//! - Source: examples/moon.rs
//! - Description: An experiment to use a greedy optimizer to land a spaceship on the Moon from Earth.
//! - Status: Simplified model works (without crashing), missing lots of features.
//! - PRs: Welcome!
//!
//! ![The Moon](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e1/FullMoon2010.jpg/800px-FullMoon2010.jpg)
//!
//! **If you can make AI land a spaceship on the Moon, then what else is possible?**
//!
//! ### Usage of this library
//!
//! *Notice! This code is for research purposes only and should NEVER be used in safety-critical applications!*
//!
//! **Warning! Improper usage of this library can lead to unsafe AI behavior.**
//!
//! All algorithms that support general unrestricted classical utility theory are unsafe AI designs.
//! In particular, for ASI (Artificial Super Intelligence) design,
//! this library is extremely unsafe without safety verification before use.
//! For more information about safe ASI core design, see [asi_core0](https://github.com/advancedresearch/asi_core0).
//!
//! That said, have fun!
//!
//! ### Introduction
//!
//! A maximum tree stores, for every node, the maximum utility of the node itself or any of its children.
//! It is a convenient data structure for comparing or composing different search algorithms.
//!
//! Once a maximum tree is constructed, searching for the best course of action is trivial.
//! Since each node stores maximum utility, it is easy to compare children and decide what to do.
//!
//! - A leaf stores its own utility as maximum utility
//! - A node is terminal if it stores higher maximum utility than any of its children
//! - A node's own utility is forgotten (overridden) when a child has higher utility
//!
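//! The invariant can be sketched standalone (using a hypothetical `N` type,
//! simpler than this crate's `Node`):
//!
//! ```rust
//! struct N { max: f64, children: Vec<N> }
//!
//! let leaf = |u: f64| N { max: u, children: vec![] };
//! // The node's own utility (1.0) was overridden by its best child (2.0).
//! let node = N { max: 2.0, children: vec![leaf(2.0), leaf(0.5)] };
//! // A terminal node stores higher maximum utility than any child.
//! let terminal = N { max: 3.0, children: vec![leaf(1.0)] };
//! assert!(node.children.iter().any(|c| c.max >= node.max));
//! assert!(terminal.children.iter().all(|c| c.max < terminal.max));
//! ```
//!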
//! ### How to use this library
//!
//! This library contains only a minimal set of features,
//! intended to be used as a core for more advanced custom algorithms:
//!
//! - `Ai::full` does a complete search, finding the global maximum
//! - `Ai::greedy` does a local search, finding a local maximum
//! - `Ai::sub_breadth` constructs children for every available action
//!
//! The `full` and `greedy` algorithms assume determinism and perfect information in the context.
//! Basically, this means they should only be used in simulations or controlled environments.
//!
//! `Ai::sub_breadth` is used as a common sub-procedure by several algorithms.
//!
//! For non-determinism, the maximum utility becomes the maximum expected utility.
//! This requires constructing the maximum tree with custom algorithms.
//! For more information, see "Custom algorithms" below.
//!
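//! Usage can be sketched with a hypothetical integer-counter context,
//! where the goal is to reach the value 3
//! (node data stores the applied delta so `undo` can roll it back):
//!
//! ```rust
//! use max_tree::prelude::*;
//!
//! let mut ai = Ai {
//!     // Utility peaks when the counter reaches 3.
//!     utility: |_: &i32, ctx: &i32| -((ctx - 3) as f64).abs(),
//!     // The agent can increment or decrement the counter.
//!     actions: |_: &i32, _: &i32| vec![1, -1],
//!     // Apply the delta to the context, storing it as node data for rollback.
//!     execute: |_: &i32, a: &i32, ctx: &mut i32| { *ctx += *a; Ok(*a) },
//!     // Undo the delta stored in node data.
//!     undo: |data: &i32, ctx: &mut i32| { *ctx -= *data },
//!     settings: AiSettings::new(5, 0.0000001),
//!     analysis: AiAnalysis::new(),
//! };
//! let mut ctx = 0;
//! let mut root = Node::root(0);
//! ai.full(&mut root, 0, &mut ctx);
//! // Three `+1` steps reach the goal.
//! assert_eq!(root.optimal_path().len(), 3);
//! ```
//!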
//! ### Differences from reward accumulation
//!
//! A maximum tree does not accumulate rewards over actions.
//! This means that only the final reward is optimized.
//!
//! However, it is possible to simulate accumulated rewards.
//!
//! To accumulate rewards, one can use the node data to store utility.
//! Just add the reward to the rewards accumulated so far.
//! The accumulated rewards are stored as maximum utility.
//!
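//! For example, node data can carry the running total
//! (a hypothetical `Data` type, not part of this crate):
//!
//! ```rust
//! // Node data: the total reward accumulated so far.
//! struct Data { acc: f64 }
//!
//! // The utility function reports the accumulated total,
//! // so the maximum tree stores maximum accumulated reward.
//! let utility = |d: &Data| d.acc;
//!
//! let parent = Data { acc: 2.0 };
//! let step_reward = 1.5;
//! let child = Data { acc: parent.acc + step_reward };
//! assert_eq!(utility(&child), 3.5);
//! ```
//!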
//! Optimizing for the final reward has special terminal semantics.
//! For more information, see "Terminal semantics" below.
//!
//! ### Discounting action depth
//!
//! By default, taking more steps to complete the goal is not penalized.
//!
//! Subtracting a tiny amount of utility proportional to depth
//! will make the algorithm prioritize fewer steps to reach the goal.
//!
//! Since this is common behavior, one can activate it by setting
//! `AiSettings::eps_depth` to e.g. `0.0000001`.
//!
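//! The discount is tiny, but it breaks ties between paths of different length
//! (a sketch of the arithmetic applied by `utility_with_settings`):
//!
//! ```rust
//! let eps_depth = 0.0000001_f64;
//! let goal_utility = 1.0_f64;
//! // The same goal reached in fewer steps scores slightly higher.
//! let in_3_steps = goal_utility - eps_depth * 3.0;
//! let in_5_steps = goal_utility - eps_depth * 5.0;
//! assert!(in_3_steps > in_5_steps);
//! ```
//!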
//! ### Custom algorithms
//!
//! When the algorithms included with this library are too limiting,
//! it is possible to write custom algorithms that construct the maximum tree
//! in other ways or perform different kinds of analysis.
//!
//! The maximum tree is designed to be convenient for composing different search algorithms.
//!
//! For example, one can perform posterior safety analysis without side effects in the context.
//!
//! It is also possible to restore the state of the context and continue the search from any node,
//! using a different search algorithm than the one used to construct the tree.
//! The final maximum tree can be used with any analysis algorithm.
//!
//! Under non-determinism or hidden states in the context,
//! the semantics of maximum utility changes slightly.
//! This means that one cannot expect soundness when composing algorithms
//! unless the intersecting semantics is sound when exploring a sub-branch.
//! The intersecting semantics need not hold in general,
//! but it must hold for the particular usage in the application.
//!
//! The most common use case of this library is for contexts where
//! there is perfect information and undoing changes restores the environment perfectly.
//! In most applications, this means simulating the entire world where the AI operates.
//!
//! ### Terminal semantics
//!
//! Under verification for safety, evaluation of the terminal semantics must be included.
//! This library is not safe to use when the terminal semantics of a given application
//! has not been verified for safety.
//!
//! When a node is terminal, which is the case for any global maximum
//! that does not have any children with equal maximum utility,
//! one must pay careful attention to the semantics of achieving that goal.
//!
//! Even when a global maximum is reached,
//! it makes no sense to do so if e.g. the world ends and there is nothing left to do.
//!
//! Reaching infinite utility in an infinitesimal amount of time does not
//! correspond to a [Zen Rational](https://github.com/advancedresearch/path_semantics/blob/master/ai-sequences.md#zen-rationality)
//! human intuition about meaningful goals.
//! The reason for this is that when the true goal among many possible goals is uncertain,
//! one risks excluding the true goal with high probability by optimizing for a single goal.
//! Instead, according to higher order utilitarianism, one should optimize for a cluster of goals
//! where each goal is reachable from any other.
//! For more information, see [Groupoid Assumption of Multi-Goal Optimization](https://github.com/advancedresearch/path_semantics/blob/master/papers-wip/groupoid-assumption-of-multi-goal-optimization.pdf).
//!
//! Although classical utility theory can be used to achieve a single goal,
//! this does not guarantee that achieving the goal is meaningful.
//! It is meaningful if and only if the true goal is reachable from the achieved goal.
//! A true goal is defined in [Naive Zen Logic](https://github.com/advancedresearch/path_semantics/blob/master/papers-wip/naive-zen-logic.pdf) as:
//!
//! ```text
//! true_goal(X) = (goal(X) ? .me) ? me
//! ```
//!
//! It means: the true goal is the goal I believe I would have if I (the AI agent) were smarter.
//!
//! If the AI agent achieves the global maximum and then self-improves,
//! in hindsight it was only meaningful to reach the global maximum if the new goal is reachable.
//! Therefore, achieving the global maximum is meaningful if and only if the true goal is reachable
//! from the state of global maximum.
//!
//! Accumulated rewards can cloud the judgement about terminal semantics.
//! With accumulated rewards, there is nothing that predicts termination
//! when the expected utility from the environment is uncertain.
//! As long as there exists some non-terminating state with positive utility,
//! there exists a course of action that might increase utility.
//! Therefore, most AI agents that optimize accumulated rewards do not need to
//! reason about terminal semantics in the same way as AI agents that optimize for final rewards.
//! However, under self-improvement, accumulated rewards also require higher order reasoning for safety.
//!
//! Since optimizing for the final reward is a strict superset of
//! optimizing for accumulated rewards, the terminal semantics of the first case
//! is a strict superset of the terminal semantics of the latter.
//!
//! One can include a term in the final reward that estimates future potential.
//! If the term for future potential can be negative, then excluding it will lead to unsafety.
//!
//! ## License
//!
//! Licensed under either of
//!  * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
//!  * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
//! at your option.
//!
//! ### Contribution
//!
//! Unless you explicitly state otherwise, any contribution intentionally submitted
//! for inclusion in the work by you shall be dual licensed as above, without any
//! additional terms or conditions.

/// Reexports commonly used objects.
pub mod prelude {
    pub use super::{Ai, AiAnalysis, AiSettings, Node};
}

/// Stores an action node (represented as a maximum tree).
///
/// Each node stores the maximum utility of itself or any of its children.
///
/// A terminal node has higher utility than any of its children.
#[derive(Debug)]
pub struct Node<T, A> {
    /// Stores maximum utility of itself or any children.
    pub max: f64,
    /// Stores node data.
    pub data: T,
    /// Stores child nodes.
    ///
    /// Each child has an associated action.
    /// The associated action identifies the node.
    /// This means the action should be unique among the children.
    /// This invariant is enforced by trusted search algorithms.
    /// Use `check_unique_actions` when the input is not trusted.
    pub children: Vec<(A, Node<T, A>)>,
}
impl<T, A> Node<T, A> {
    /// Creates a new root.
    ///
    /// This sets the utility to `NaN` (not a number).
    /// There are no children; they must be added through search.
    pub fn root(data: T) -> Node<T, A> {
        Node {
            max: std::f64::NAN,
            data,
            children: vec![]
        }
    }

    /// Returns `true` if all actions among children are unique.
    ///
    /// This algorithm does not report which children collide,
    /// since this use case is uncommon (the invariant is enforced by search algorithms).
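    ///
    /// A minimal sketch (a duplicate action `0` among children):
    ///
    /// ```rust
    /// use max_tree::Node;
    ///
    /// let leaf = || Node::<(), u32> { max: 0.0, data: (), children: vec![] };
    /// let unique = Node { max: 0.0, data: (), children: vec![(0, leaf()), (1, leaf())] };
    /// let dup = Node { max: 0.0, data: (), children: vec![(0, leaf()), (0, leaf())] };
    /// assert!(unique.check_unique_actions());
    /// assert!(!dup.check_unique_actions());
    /// ```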
    pub fn check_unique_actions(&self) -> bool
        where A: Eq + std::hash::Hash
    {
        use std::collections::HashSet;

        let mut hash_set = HashSet::new();
        for &(ref a, _) in &self.children {
            // `insert` returns `false` when the action is already present.
            if !hash_set.insert(a) {return false}
        }
        true
    }

    /// Returns `true` if the node is terminal.
    ///
    /// A terminal node has no children with equal or greater utility.
    /// This means that without paying attention to terminal semantics,
    /// an unsafe AI design can lead to a "stranded" situation.
    /// For more information, see the section "Terminal semantics"
    /// in this crate's top level documentation.
    pub fn terminal(&self) -> bool {self.optimal().is_none()}

    /// Returns the optimal course of action, if any.
    ///
    /// Returns `None` if no child has equal or greater utility.
    /// This means the node is terminal.
    pub fn optimal(&self) -> Option<usize> {
        for (i, ch) in self.children.iter().enumerate() {
            if ch.1.max >= self.max {return Some(i)}
        }
        None
    }

    /// Returns the optimal path from the root.
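    ///
    /// A minimal sketch (a single child with equal maximum utility):
    ///
    /// ```rust
    /// use max_tree::Node;
    ///
    /// let child = Node::<(), u32> { max: 2.0, data: (), children: vec![] };
    /// let root = Node { max: 2.0, data: (), children: vec![(0, child)] };
    /// assert_eq!(root.optimal_path(), vec![0]);
    /// ```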
    pub fn optimal_path(&self) -> Vec<usize> {
        let mut node = self;
        let mut res = vec![];
        loop {
            if let Some(i) = node.optimal() {
                node = &node.children[i].1;
                res.push(i);
            } else {
                break;
            }
        }
        res
    }
}

/// AI settings.
pub struct AiSettings {
    /// Maximum depth.
    pub max_depth: usize,
    /// Utility discount from action depth.
    ///
    /// This is usually a small positive number (e.g. `0.000001`).
    pub eps_depth: f64,
    /// Whether to run analysis.
    pub analysis: bool,
    /// Eliminate unexplored actions when using greedy search.
    pub greed_elim: bool,
    /// A limit on estimated memory usage,
    /// causing the search to terminate.
    ///
    /// This limit is only checked occasionally, e.g. after breadth search,
    /// so actual memory usage before termination may exceed the limit.
    pub max_mib: Option<f64>,
}

impl AiSettings {
    /// Creates new settings.
    pub fn new(max_depth: usize, eps_depth: f64) -> AiSettings {
        AiSettings {
            max_depth,
            eps_depth,
            analysis: false,
            greed_elim: true,
            max_mib: None,
        }
    }
}

/// Stores results from analysis.
pub struct AiAnalysis {
    /// Keeps track of the maximum number of nodes.
    pub node_count: usize,
}

impl AiAnalysis {
    /// Creates new AI analysis.
    pub fn new() -> AiAnalysis {
        AiAnalysis {
            node_count: 0,
        }
    }

    /// Estimates the maximum memory usage of nodes in gibibytes (GiB).
    pub fn gib(&self, node_size: usize) -> f64 {
        (self.node_count as f64 * node_size as f64) / 1073741824.0
    }

    /// Estimates the maximum memory usage of nodes in mebibytes (MiB).
    pub fn mib(&self, node_size: usize) -> f64 {
        (self.node_count as f64 * node_size as f64) / 1048576.0
    }

    /// Estimates the maximum memory usage of nodes in kibibytes (KiB).
    pub fn kib(&self, node_size: usize) -> f64 {
        (self.node_count as f64 * node_size as f64) / 1024.0
    }
}

/// AI setup.
///
/// Provides a common setup for different search algorithms.
/// The search algorithm constructs a maximum tree,
/// which can be used to find the optimal course of action.
///
/// The `T` parameter is the type of node data.
/// This is used to store delta changes for undo operations.
/// It also stores data that tracks the internal state of the AI agent,
/// when the AI agent is not interacting with the context (pure search).
/// Node data is stored separately from the context, making it easy
/// to analyse after constructing the maximum tree.
///
/// The `A` parameter is the type of action.
/// It describes the choices the AI agent can make in a specific context.
/// An action modifies the context and must be undone when rolling back changes.
///
/// The `C` parameter is the type of context (environment).
/// This stores the data that is necessary to calculate utility.
/// The context is modified by actions when exploring,
/// but these changes are undone when rolling back changes.
pub struct Ai<T, A, C> {
    /// Calculates utility from data and context.
    pub utility: fn(&T, &C) -> f64,
    /// Returns a list of possible actions.
    pub actions: fn(&T, &C) -> Vec<A>,
    /// Executes an action, returning new node data.
    pub execute: fn(&T, &A, &mut C) -> Result<T, ()>,
    /// Undoes changes made to the context.
    ///
    /// The data required to roll back delta changes
    /// must be stored in node data.
    pub undo: fn(&T, &mut C),
    /// Stores AI settings.
    pub settings: AiSettings,
    /// Stores analysis.
    pub analysis: AiAnalysis,
}

impl<T, A, C> Ai<T, A, C> {
    /// Computes the size of nodes in bytes.
    pub fn node_size(&self) -> usize {
        std::mem::size_of::<Node<T, A>>()
    }

    /// Calculates utility with extra terms computed from settings.
    pub fn utility_with_settings(&self, data: &T, depth: usize, ctx: &C) -> f64 {
        let utility = (self.utility)(data, ctx);
        let discount_step = -self.settings.eps_depth * depth as f64;
        utility + discount_step
    }

    /// Advances the context one step along the optimal path.
    ///
    /// Returns the index of the executed child,
    /// or `None` if the node is terminal or the action failed.
    pub fn update(&mut self, node: &Node<T, A>, ctx: &mut C) -> Option<usize> {
        if let Some(i) = node.optimal() {
            if (self.execute)(&node.data, &node.children[i].0, ctx).is_ok() {
                Some(i)
            } else {
                None
            }
        } else {
            None
        }
    }

    /// A sub-procedure constructing a maximum tree of all available actions.
    ///
    /// Used by other search algorithms.
    pub fn sub_breadth(&mut self, root: &mut Node<T, A>, depth: usize, ctx: &mut C)
        where A: Clone
    {
        root.children.clear();
        let actions = (self.actions)(&root.data, ctx);
        for a in &actions {
            if let Ok(data) = (self.execute)(&root.data, a, ctx) {
                let utility = self.utility_with_settings(&data, depth + 1, ctx);
                if utility > root.max {
                    root.max = utility;
                }

                // Undo changes made to the context to reset state.
                (self.undo)(&data, ctx);

                root.children.push((a.clone(), Node {
                    max: utility,
                    data,
                    children: vec![],
                }));

                if self.settings.analysis {
                    self.analysis.node_count += 1;
                }
            }
        }
    }

    /// Returns `true` when estimated memory usage is exceeded, `false` otherwise.
    ///
    /// Returns `false` when analysis is deactivated.
    pub fn memory_exceeded(&self) -> bool {
        if self.settings.analysis {
            if let Some(limit) = self.settings.max_mib {
                self.analysis.mib(self.node_size()) >= limit
            } else {false}
        } else {false}
    }

    /// Only picks choices that increase utility.
    ///
    /// To find the global maximum, this requires the utility function to be convex
    /// (no local maxima other than the global one).
    pub fn greedy(&mut self, root: &mut Node<T, A>, depth: usize, ctx: &mut C)
        where A: Clone
    {
        if root.max.is_nan() {
            root.max = self.utility_with_settings(&root.data, depth, ctx);
        }

        self.sub_breadth(root, depth, ctx);

        if depth >= self.settings.max_depth { return }
        if self.memory_exceeded() { return }

        if let Some(i) = root.optimal() {
            let i = if self.settings.greed_elim {
                if self.settings.analysis {
                    self.analysis.node_count -= root.children.len() - 1;
                }
                root.children.swap(i, 0);
                root.children.truncate(1);
                0
            } else {i};

            let a = &root.children[i].0;
            if (self.execute)(&root.data, a, ctx).is_ok() {
                let ch = &mut root.children[i].1;
                self.greedy(ch, depth + 1, ctx);

                // Undo changes made to the context to reset state.
                (self.undo)(&ch.data, ctx);

                // Update maximum utility since children have changed.
                if ch.max > root.max {
                    root.max = ch.max;
                }
            }
        }
    }

    /// Performs a full construction of the entire maximum tree.
    pub fn full(&mut self, root: &mut Node<T, A>, depth: usize, ctx: &mut C)
        where A: Clone
    {
        if root.max.is_nan() {
            root.max = self.utility_with_settings(&root.data, depth, ctx);
        }

        self.sub_breadth(root, depth, ctx);

        if depth >= self.settings.max_depth { return }
        if self.memory_exceeded() { return }

        for (ref a, ref mut ch) in &mut root.children {
            if (self.execute)(&root.data, a, ctx).is_ok() {
                self.full(ch, depth + 1, ctx);

                // Undo changes made to the context to reset state.
                (self.undo)(&ch.data, ctx);

                // Update maximum utility since children have changed.
                if ch.max > root.max {
                    root.max = ch.max;
                }
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn optimal_and_terminal() {
        let child = Node::<(), u32> { max: 2.0, data: (), children: vec![] };
        let root = Node { max: 2.0, data: (), children: vec![(0, child)] };
        assert_eq!(root.optimal(), Some(0));
        assert_eq!(root.optimal_path(), vec![0]);
        assert!(!root.terminal());
        assert!(root.children[0].1.terminal());
    }
}