whichlicense_detection 0.1.0

A license detection tool used by the WhichLicense project

WhichLicense detection

This is a library to facilitate the detection of licenses in source code.

Usage

License Detection

Gaoya detection

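let mut gaoya = GaoyaDetection {
    // MinHash LSH index: 42 bands, band width 3, Jaccard similarity threshold 0.5
    index: MinHashIndex::new(42, 3, 0.5),
    // the number of hashes must equal bands * band width
    min_hasher: MinHasher32::new(42 * 3),
    shingle_text_size: 50,
};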
gaoya.load_from_file("licenses.json");
// OR: 
// for l in load_licenses_from_folder("./licenses/RAW"){
//      gaoya.add_plain(l.name, l.text);
// }

Fuzzyhash-rs Detection

let mut fuzzy = FuzzyDetection {
    licenses: vec![],
    min_confidence: 50,
    exit_on_exact_match: false,
};
fuzzy.load_from_file("licenses.json");
// OR: 
// for l in load_licenses_from_folder("./licenses/RAW"){
//      fuzzy.add_plain(l.name, l.text);
// }

Pipeline System

The pipeline system automatically refines license detection results by running further processing on a match when a condition is met, for example when its confidence is too low.
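
As a rough mental model of this trigger-and-action idea, the following self-contained sketch adjusts a confidence score only when a run condition holds and the stage's own check passed. The names `Trigger`, `Action`, and `adjust` are hypothetical and are not the crate's API; the meaning of `GreaterThan` (compare the incoming confidence against the configured value) is an assumption inferred from the usage examples in this document.

```rust
/// Hypothetical stand-in for a run condition (not the crate's type).
#[derive(Debug, Clone, Copy)]
enum Trigger {
    /// Always run the stage.
    Always,
    /// Run only when the incoming confidence is above the threshold.
    GreaterThan(u32),
}

/// Hypothetical stand-in for the action applied when the stage's check passes.
#[derive(Debug, Clone, Copy)]
enum Action {
    /// Add a fixed amount to the confidence.
    Add(u32),
}

/// Returns the adjusted confidence, or the input unchanged when the
/// trigger does not fire or the stage's check did not pass.
fn adjust(confidence: u32, trigger: Trigger, check_passed: bool, action: Action) -> u32 {
    let fires = match trigger {
        Trigger::Always => true,
        Trigger::GreaterThan(threshold) => confidence > threshold,
    };
    if !(fires && check_passed) {
        return confidence;
    }
    match action {
        Action::Add(amount) => confidence + amount,
    }
}

fn main() {
    // Trigger fires (95 > 50) and the check passed, so 5 is added.
    assert_eq!(adjust(95, Trigger::GreaterThan(50), true, Action::Add(5)), 100);
    // Trigger does not fire (40 is not > 50), so the score is unchanged.
    assert_eq!(adjust(40, Trigger::GreaterThan(50), true, Action::Add(5)), 40);
    // Trigger fires but the check failed, so the score is unchanged.
    assert_eq!(adjust(10, Trigger::Always, false, Action::Add(5)), 10);
}
```

Keeping the condition and the action as separate values means the same check can be reused with different thresholds or adjustments.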

Diffing pipeline

The diffing pipeline extracts only the parts of the license that were modified and collects them into a new string. This string is then checked against the provided regex to see whether the changes match.

let diffing_pipeline = DiffingPipeLine {
    regex: String::from(r"\d{4}-\d{2}-\d{2}"), // date finding regex
    original_license: String::from("this is a sample license created on [enter_license_creation_date_here] copyright Some Company"),
    modified_license: String::from("this is a sample license created on 2014-01-01 copyright Some Company. and stuff"),
    run_condition: PipelineTriggerInstruction {
        // adjust this to the trigger condition you want
        condition: PipelineTriggerCondition::Always,
        // ignored when the condition is Always
        value: 10,
    },
    action: PipeLineAction {
        // the action to take when the regex matches
        action: PipelineActionType::Add,
        value: 5,
    },
};

// the incoming confidence of 10 is raised by 5 because the diff contains a date
let result = diffing_pipeline.run(10);
assert_eq!(result, 15);
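
To make the "modified parts only" step more concrete, here is a minimal std-only sketch of one way such a diff could be computed. It is not the crate's implementation: it uses a word-level longest common subsequence (LCS) as a stand-in for whatever diff algorithm the crate uses, and a hand-written date check in place of the regex. The function names `changed_words` and `contains_iso_date` are hypothetical.

```rust
/// Returns the words of `modified` that are not part of a longest common
/// subsequence with `original`, joined by single spaces.
fn changed_words(original: &str, modified: &str) -> String {
    let a: Vec<&str> = original.split_whitespace().collect();
    let b: Vec<&str> = modified.split_whitespace().collect();

    // lcs[i][j] = length of the LCS of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk both sequences and collect words found only on the modified side.
    let (mut i, mut j) = (0, 0);
    let mut changed = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1; // word removed from the original
        } else {
            changed.push(b[j]); // word added in the modified text
            j += 1;
        }
    }
    changed.extend_from_slice(&b[j..]);
    changed.join(" ")
}

/// Std-only stand-in for a `\d{4}-\d{2}-\d{2}` regex search.
fn contains_iso_date(text: &str) -> bool {
    text.as_bytes().windows(10).any(|w| {
        w.iter().enumerate().all(|(k, c)| match k {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        })
    })
}

fn main() {
    let original = "this is a sample license created on [enter_license_creation_date_here] copyright Some Company";
    let modified = "this is a sample license created on 2014-01-01 copyright Some Company. and stuff";

    let changed = changed_words(original, modified);
    assert_eq!(changed, "2014-01-01 Company. and stuff");
    assert!(contains_iso_date(&changed));
}
```

Because only the changed words are checked, text that the original license already contained cannot trigger the match by itself.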

Regex pipeline

The regex pipeline works by taking the entire (incoming) license text and checking if it matches the regex provided.

let regex_pipeline = RegexPipeLine {
    regex: String::from("some text"),
    license_text: String::from("this is a sample license with some text"),
    run_condition: PipelineTriggerInstruction {
        condition: PipelineTriggerCondition::GreaterThan,
        value: 50,
    },
    action: PipeLineAction {
        action: PipelineActionType::Add,
        value: 5,
    },
};

let result = regex_pipeline.run(95);
assert_eq!(result, 100);

Attributions

ScanCode License data

The initial database was generated from the license data of the ScanCode toolkit. If you use the ScanCode license database, you must include the copyright notice below in your project. If you choose not to use it, the notice is not required.

Copyright (c) nexB Inc. and others. All rights reserved. ScanCode is a trademark of nexB Inc. SPDX-License-Identifier: CC-BY-4.0 See https://creativecommons.org/licenses/by/4.0/legalcode for the license text. See https://github.com/nexB/scancode-toolkit for support or download. See https://aboutcode.org for more information about nexB OSS projects.