whichlicense_detection 0.0.3

A tool to detect licenses used by the WhichLicense project
# WhichLicense detection

This is a library to facilitate the detection of licenses in source code.

## Usage

### License Detection

#### Gaoya detection
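```rust
use gaoya::minhash::{MinHashIndex, MinHasher32};

let mut gaoya = GaoyaDetection {
    // gaoya's MinHashIndex::new(num_bands, band_width, jaccard_threshold):
    // 42 bands of 3 rows each, with a Jaccard similarity threshold of 0.5
    index: MinHashIndex::new(42, 3, 0.5),
    // the hasher must produce num_bands * band_width = 42 * 3 hashes
    min_hasher: MinHasher32::new(42 * 3),
    // size of the shingles the license text is split into
    shingle_text_size: 50,
};
// load a prebuilt license database...
gaoya.load_from_file("licenses.json");
// ...OR index raw license texts from a folder:
// for l in load_licenses_from_folder("./licenses/RAW") {
//     gaoya.add_plain(l.name, l.text);
// }
```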
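Gaoya is a MinHash locality-sensitive hashing library: MinHash signatures estimate the Jaccard similarity between sets of text shingles, and the `0.5` passed to `MinHashIndex::new` is a threshold on that similarity. The std-only sketch below computes the exact Jaccard similarity that MinHash approximates. It assumes fixed-length character shingles; `shingles` and `jaccard` are hypothetical helpers, and how `GaoyaDetection` actually shingles text (including the unit of `shingle_text_size`) is not documented here.

```rust
use std::collections::HashSet;

/// Split `text` into overlapping character shingles of length `k`.
/// Assumption: shingles are character n-grams; the crate may shingle differently.
fn shingles(text: &str, k: usize) -> HashSet<String> {
    assert!(k > 0, "shingle length must be positive");
    let chars: Vec<char> = text.chars().collect();
    if chars.len() < k {
        // A text shorter than one shingle becomes a single shingle.
        return std::iter::once(text.to_string()).collect();
    }
    chars.windows(k).map(|w| w.iter().collect::<String>()).collect()
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

fn main() {
    let template = "Permission is hereby granted, free of charge, to any person";
    let candidate = "Permission is hereby granted, free of charge, to any person obtaining a copy";
    let sim = jaccard(&shingles(template, 5), &shingles(candidate, 5));
    println!("Jaccard similarity: {sim:.2}");
}
```

MinHash replaces the exact set comparison with short signatures, which is what makes indexing a whole license database practical.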

#### Fuzzyhash-rs detection
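```rust
let mut fuzzy = FuzzyDetection {
    // starts empty; filled by one of the loaders below
    licenses: vec![],
    // minimum confidence a match needs in order to be reported
    min_confidence: 50,
    // keep comparing against other licenses after an exact match
    exit_on_exact_match: false,
};
// load a prebuilt license database...
fuzzy.load_from_file("licenses.json");
// ...OR add raw license texts from a folder:
// for l in load_licenses_from_folder("./licenses/RAW") {
//     fuzzy.add_plain(l.name, l.text);
// }
```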
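One plausible reading of `min_confidence` and `exit_on_exact_match` is sketched below. This is a hypothetical illustration, not the crate's code: `Candidate` and `select` are made-up names, scores are assumed to range from 0 to 100 as with ssdeep-style fuzzy hash comparisons, and the candidate scores in `main` are invented inputs.

```rust
/// Hypothetical scored candidate; the score is assumed to be 0..=100.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    score: u8,
}

/// Keep candidates scoring at least `min_confidence`. When
/// `exit_on_exact_match` is set, stop at the first perfect score.
fn select(candidates: &[Candidate], min_confidence: u8, exit_on_exact_match: bool) -> Vec<Candidate> {
    let mut selected = Vec::new();
    for c in candidates {
        if c.score < min_confidence {
            continue; // below the confidence threshold
        }
        selected.push(c.clone());
        if exit_on_exact_match && c.score == 100 {
            break; // an exact match ends the search early
        }
    }
    selected
}

fn main() {
    let candidates = vec![
        Candidate { name: "MIT".into(), score: 100 },
        Candidate { name: "Apache-2.0".into(), score: 72 },
        Candidate { name: "BSD-3-Clause".into(), score: 31 },
    ];
    // prints MIT (100) and Apache-2.0 (72)
    for c in select(&candidates, 50, false) {
        println!("{} ({})", c.name, c.score);
    }
}
```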

### Pipeline System
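The pipeline system was developed to automatically improve license detection results by allowing further processing, for example when a confidence is too low. Each pipeline receives a confidence score; when its trigger condition is met and its regex check succeeds, it applies an action (such as adding a fixed value) and returns the adjusted confidence.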
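The trigger-and-action model of a pipeline can be sketched in a few lines. This is a hypothetical, simplified model rather than the crate's implementation: `Trigger`, `Action`, `Condition`, `ActionType`, and `run_step` are illustrative names, only the `Always` and `GreaterThan` conditions and the `Add` action used in this README are modeled, and treating `GreaterThan` as a strict comparison is an assumption.

```rust
/// Hypothetical trigger conditions mirroring the ones used in this README.
#[derive(Clone, Copy, Debug)]
enum Condition {
    Always,
    GreaterThan, // assumed to be a strict `>` comparison
}

/// Hypothetical action types; only `Add` appears in this README.
#[derive(Clone, Copy, Debug)]
enum ActionType {
    Add,
}

struct Trigger {
    condition: Condition,
    value: i64, // ignored for `Always`
}

struct Action {
    action: ActionType,
    value: i64,
}

/// Run one pipeline step on `confidence`: when the trigger fires and
/// `check` (e.g. a regex match) succeeds, apply the action; otherwise
/// pass the confidence through unchanged.
fn run_step(trigger: &Trigger, action: &Action, check: impl Fn() -> bool, confidence: i64) -> i64 {
    let fires = match trigger.condition {
        Condition::Always => true,
        Condition::GreaterThan => confidence > trigger.value,
    };
    // `check` is only evaluated when the trigger fires
    if fires && check() {
        match action.action {
            ActionType::Add => confidence + action.value,
        }
    } else {
        confidence
    }
}

fn main() {
    let trigger = Trigger { condition: Condition::GreaterThan, value: 50 };
    let action = Action { action: ActionType::Add, value: 5 };
    let text = "this is a sample license with some text";
    // `str::contains` stands in for a regex match here
    println!("{}", run_step(&trigger, &action, || text.contains("some text"), 95)); // prints 100
    println!("{}", run_step(&trigger, &action, || text.contains("some text"), 40)); // prints 40
}
```

Evaluating the check lazily means an expensive regex is never run when the trigger condition already rules the pipeline out.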

#### Diffing pipeline
The diffing pipeline compares the original license text with the modified license text and collects only the modified parts into a new string. It then checks whether that string matches the provided regex.
![diffing_pipeline_expl_1](https://user-images.githubusercontent.com/30909481/227518673-361c79e8-752e-443b-a76b-58b8f40e2fb3.jpg)
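A minimal, std-only sketch of the diffing step, assuming a word-level diff based on the longest common subsequence (LCS); the crate's actual diff granularity is not documented here. `modified_parts` and `contains_iso_date` are hypothetical helpers, and the latter is a hand-written stand-in for the regex `\d{4}-\d{2}-\d{2}` so that the example needs no external crates. Unlike `\d` in the `regex` crate, it only recognizes ASCII digits.

```rust
/// Collect the words of `modified` that are not part of a longest common
/// subsequence with `original` (word-level diff; an assumption).
fn modified_parts(original: &str, modified: &str) -> String {
    let a: Vec<&str> = original.split_whitespace().collect();
    let b: Vec<&str> = modified.split_whitespace().collect();
    // lcs[i][j] = LCS length of a[i..] and b[j..]
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk the table, keeping words that only appear in `modified`.
    let (mut i, mut j) = (0, 0);
    let mut added = Vec::new();
    while j < b.len() {
        if i < a.len() && a[i] == b[j] {
            i += 1; // word shared by both texts
            j += 1;
        } else if i < a.len() && lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1; // word removed from the original
        } else {
            added.push(b[j]); // word inserted or changed in the modified text
            j += 1;
        }
    }
    added.join(" ")
}

/// Std-only stand-in for the regex `\d{4}-\d{2}-\d{2}` (ASCII digits only).
fn contains_iso_date(s: &str) -> bool {
    let pattern = b"dddd-dd-dd"; // 'd' marks a digit position
    s.as_bytes().windows(pattern.len()).any(|w| {
        w.iter().zip(pattern).all(|(c, p)| match p {
            b'd' => c.is_ascii_digit(),
            _ => c == p,
        })
    })
}

fn main() {
    let original = "this is a sample license created on [enter_license_creation_date_here] copyright Some Company";
    let modified = "this is a sample license created on 2014-01-01 copyright Some Company. and stuff";
    let changes = modified_parts(original, modified);
    println!("{changes}"); // prints "2014-01-01 Company. and stuff"
    println!("{}", contains_iso_date(&changes)); // prints "true"
}
```

Because this comparison is word-based, `Company.` counts as changed even though only a period was appended; a character-level diff would report just the `.`.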

```rust
let diffing_pipeline = DiffingPipeLine {
    regex: String::from(r"\d{4}-\d{2}-\d{2}"), // matches dates such as 2014-01-01
    original_license: String::from("this is a sample license created on [enter_license_creation_date_here] copyright Some Company"),
    modified_license: String::from("this is a sample license created on 2014-01-01 copyright Some Company. and stuff"),
    run_condition: PipelineTriggerInstruction {
        // adjust this to the trigger condition you want
        condition: PipelineTriggerCondition::Always,
        // ignored when the condition is `Always`
        value: 10,
    },
    action: PipeLineAction {
        // the action to take when the regex matches the modified parts
        action: PipelineActionType::Add,
        value: 5,
    },
};

// the inserted date matches the regex, so 5 is added to the input confidence of 10
let result = diffing_pipeline.run(10);
assert_eq!(result, 15);
```

#### Regex pipeline
The regex pipeline checks whether the entire incoming license text, rather than only the modified parts, matches the provided regex.
```rust
let regex_pipeline = RegexPipeLine {
    regex: String::from("some text"),
    license_text: String::from("this is a sample license with some text"),
    run_condition: PipelineTriggerInstruction {
        // only run when the incoming confidence is greater than 50
        condition: PipelineTriggerCondition::GreaterThan,
        value: 50,
    },
    action: PipeLineAction {
        // add 5 to the confidence when the regex matches
        action: PipelineActionType::Add,
        value: 5,
    },
};

// 95 > 50 and the text contains "some text", so the result is 95 + 5
let result = regex_pipeline.run(95);
assert_eq!(result, 100);
```

# Attributions

## ScanCode License data

> The initial database was generated by making use of the license data from the ScanCode toolkit. You do not need to make use of this copyright notice in your project if you choose not to use the ScanCode license database. However, if you do make use of the ScanCode license database, you must include this copyright notice in your project.

Copyright (c) nexB Inc. and others. All rights reserved. ScanCode is a trademark
of nexB Inc. SPDX-License-Identifier: CC-BY-4.0 See
https://creativecommons.org/licenses/by/4.0/legalcode for the license text. See
https://github.com/nexB/scancode-toolkit for support or download. See
https://aboutcode.org for more information about nexB OSS projects.