rgwml 1.2.90 - Docs.rs

# RGWML (an AI, Data Science & Machine Learning Library designed to minimize developer cognitive load)

***Author: Ryan Gerard Wilson (https://ryangerardwilson.com)***

This library simplifies Data Science, Machine Learning, and Artificial Intelligence operations. It's designed to be graceful, elegant, and BATSHIT fun.

1. Overview
-----------

## `csv_utils`

- **Purpose**: A comprehensive toolkit for CSV file management in AI/ML pipelines.
- **Features**: Offers a powerful suite of tools designed for efficient and flexible handling of CSV files. Key components include:
  - **CsvBuilder**: A versatile builder for creating and manipulating CSV files, facilitating:
    - **Easy Initialization**: Start with a new CSV or load from an existing file.
    - **Custom Headers and Rows**: Set custom headers and add rows effortlessly.
    - **Advanced Data Manipulation**: Rename, drop, and reorder columns, sort data, and apply complex filters like fuzzy matching and timestamp comparisons.
    - **Chainable Methods**: Combine multiple operations in a fluent and readable manner.
    - **Data Analysis Aids**: Count rows, print specific rows, ranges, or unique values for quick analysis.
    - **Flexible Saving Options**: Save your modified CSV to a desired path.
  - **CsvResultCacher**: Cache results of CSV operations, enhancing performance for repetitive tasks.
  - **CsvConverter**: Seamlessly convert various data formats like JSON into CSV, expanding the utility of your data.
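
The chainable style listed above can be illustrated with a minimal sketch. `MiniCsv` and all of its methods are hypothetical stand-ins written for this illustration; they are not rgwml's `CsvBuilder`, and only show how returning `&mut Self` from each method lets calls be combined fluently.

```rust
/// Hypothetical, simplified stand-in for a chainable CSV builder.
/// It is NOT rgwml's CsvBuilder; it only illustrates the fluent pattern.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct MiniCsv {
    headers: Vec<String>,
    rows: Vec<Vec<String>>,
}

impl MiniCsv {
    pub fn new() -> Self {
        Self::default()
    }

    /// Each mutating method returns `&mut Self`, so calls can be chained.
    pub fn set_header(&mut self, header: &[&str]) -> &mut Self {
        self.headers = header.iter().map(|h| h.to_string()).collect();
        self
    }

    pub fn add_row(&mut self, row: &[&str]) -> &mut Self {
        self.rows.push(row.iter().map(|c| c.to_string()).collect());
        self
    }

    /// Drops the named column from the header and from every row.
    pub fn drop_column(&mut self, name: &str) -> &mut Self {
        if let Some(idx) = self.headers.iter().position(|h| h == name) {
            self.headers.remove(idx);
            for row in &mut self.rows {
                if idx < row.len() {
                    row.remove(idx);
                }
            }
        }
        self
    }

    /// Renders the table as CSV text (no quoting or escaping; sketch only).
    pub fn to_csv_string(&self) -> String {
        let mut out = self.headers.join(",");
        for row in &self.rows {
            out.push('\n');
            out.push_str(&row.join(","));
        }
        out
    }
}
```

Because every step hands back the same object, a pipeline reads top to bottom as a single expression, which is the property the `CsvBuilder` examples later in this document rely on.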

## `db_utils`

- **Purpose**: Query various SQL databases with simple, elegant syntax.
- **Features**: This module supports the following database connections:
  - MSSQL
  - MYSQL
  - Clickhouse
  - Google Big Query

## `ai_utils`

- **Purpose**: This module provides simple AI utilities for neural association analysis, as well as for connecting with OpenAI's JSON mode and BATCH processing API.
- **Features**:
  - Use native Rust implementations of Levenshtein distance computation and fuzzy matching for simple AI-like analysis
  - Interact with OpenAI's JSON mode enabled models
  - Interact with OpenAI's BATCH processing enabled models
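
For readers unfamiliar with the Levenshtein distance mentioned above, here is a minimal standalone sketch using only the standard library. The function names are hypothetical, and the 0-100 `similarity_score` is just one simple way to turn a distance into a score; how rgwml computes its own fuzzy scores is not described in this document.

```rust
/// Levenshtein (edit) distance between two strings, counted in Unicode
/// scalar values, using the classic two-row dynamic programming scheme.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical 0-100 similarity score derived from the distance.
/// This is an assumption of the sketch, not rgwml's scoring formula.
pub fn similarity_score(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 100.0;
    }
    100.0 * (1.0 - levenshtein(a, b) as f64 / max_len as f64)
}
```

For instance, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.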

## `api_utils`

- **Purpose**: Gracefully make and cache API calls.
- **Features**: 
  - **ApiCallBuilder**: Make and cache API calls effortlessly, and manage cached data for efficient API usage.
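
The idea of caching API calls can be sketched as a small in-memory memoizer. Everything here is hypothetical: `CachedCaller` is not part of rgwml, and the `fetch` closure stands in for a real HTTP request. `ApiCallBuilder`'s actual interface is not shown in this section.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory cache keyed by request URL. `fetch` stands in for
/// a real HTTP call; nothing here reflects ApiCallBuilder's actual API.
pub struct CachedCaller<F: FnMut(&str) -> String> {
    fetch: F,
    cache: HashMap<String, String>,
}

impl<F: FnMut(&str) -> String> CachedCaller<F> {
    pub fn new(fetch: F) -> Self {
        Self { fetch, cache: HashMap::new() }
    }

    /// Returns the cached response for `url`, calling `fetch` only on a miss.
    pub fn get(&mut self, url: &str) -> &str {
        if !self.cache.contains_key(url) {
            let response = (self.fetch)(url);
            self.cache.insert(url.to_string(), response);
        }
        &self.cache[url]
    }

    /// Drops a cached entry so that the next `get` fetches again.
    pub fn invalidate(&mut self, url: &str) {
        self.cache.remove(url);
    }
}
```

Repeated `get` calls for the same URL hit the map instead of the network, and `invalidate` is the hook for managing stale entries.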

2. `csv_utils`
------------

The `csv_utils` module encompasses a set of utilities designed to simplify various tasks associated with CSV files. These utilities include the `CsvBuilder` for creating and managing CSV files, the `CsvConverter` for transforming JSON data into CSV format, and the `CsvResultCacher` for efficient data caching and retrieval. Each utility is tailored to enhance productivity and ease in handling CSV data in different scenarios.

- CsvBuilder: Offers a fluent interface for creating, analyzing, and saving CSV files. It simplifies interactions with CSV data, whether you are starting from scratch or modifying existing files.

- CsvConverter: Converts JSON data into CSV format. This is particularly useful when data arrives as JSON from an API and needs to be saved as a more accessible and readable CSV file. To use `CsvConverter`, call the `from_json` method with the JSON data and the desired output file path as arguments.

- CsvResultCacher: Uses a data generator function to create or fetch data, saves it to a specified path, and keeps it for a set duration. This helps avoid unnecessary data regeneration. Imagine you have a CSV file that logs daily temperatures. You don't want to generate this file every time you access it, especially if the data doesn't change much during the day.
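
The caching behaviour described for `CsvResultCacher` (generate once, save to a path, reuse until a set duration has passed) can be sketched with the standard library alone. The function `cached_or_generate` and its parameters are hypothetical names chosen for this illustration, not rgwml's API.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Returns the file contents at `path` if the file is younger than `max_age`;
/// otherwise runs `generate`, writes its output to `path`, and returns it.
/// Hypothetical helper: a sketch of the idea, not CsvResultCacher itself.
pub fn cached_or_generate<F>(path: &Path, max_age: Duration, generate: F) -> io::Result<String>
where
    F: FnOnce() -> String,
{
    if let Ok(meta) = fs::metadata(path) {
        let modified = meta.modified()?;
        // If the clock went backwards, treat the file as freshly written.
        let age = SystemTime::now()
            .duration_since(modified)
            .unwrap_or(Duration::ZERO);
        if age < max_age {
            return fs::read_to_string(path);
        }
    }
    // Cache miss or stale file: regenerate and overwrite.
    let data = generate();
    fs::write(path, &data)?;
    Ok(data)
}
```

In the daily-temperatures scenario, `max_age` would be something like `Duration::from_secs(24 * 60 * 60)`, so the generator runs at most once a day.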

### CsvBuilder

#### Instantiation

Example 1: Creating a new object

    use rgwml::csv_utils::CsvBuilder;

    let builder = CsvBuilder::new()
        .set_header(&["Column1", "Column2", "Column3"])
        .add_rows(&[&["Row1-1", "Row1-2", "Row1-3"], &["Row2-1", "Row2-2", "Row2-3"]])
        .save_as("/path/to/your/file.csv");

Example 2: Load from an existing file

    use rgwml::csv_utils::CsvBuilder;

    let builder = CsvBuilder::from_csv("/path/to/existing/file.csv");

Example 3: Load from xls/ xlsx files

    use rgwml::csv_utils::CsvBuilder;
        
    let builder_1 = CsvBuilder::from_xls("/path/to/existing/file.xls", 1); // Loads from the first sheet of the .xls file

    let builder_2 = CsvBuilder::from_xls("/path/to/existing/file.xlsx", 2); // Loads from the second sheet of the .xlsx file

Example 4: Load from raw data

    use rgwml::csv_utils::CsvBuilder;

    let headers = vec!["Header1".to_string(), "Header2".to_string(), "Header3".to_string()];
    let data = vec![
        vec!["Row1-1".to_string(), "Row1-2".to_string(), "Row1-3".to_string()],
        vec!["Row2-1".to_string(), "Row2-2".to_string(), "Row2-3".to_string()],
    ];

    let builder = CsvBuilder::from_raw_data(headers, data);

Example 5: Load from an MSSQL/MYSQL Server query

    use rgwml::csv_utils::CsvBuilder;

    let _ = CsvBuilder::from_mssql_query(            // Also available: .from_mysql_query
        "username", 
        "password", 
        "server", 
        "database", 
        "SELECT * from your_table").await;

    // To load the column description of a particular table into a CsvBuilder object
    let _ = CsvBuilder::get_mssql_table_description(
        "username", 
        "password", 
        "server", 
        "in_focus_database", 
        "table_name").await;

Example 6: Load from an MSSQL/ MYSQL Server query, receiving the data in chunks, collated as a union

    use rgwml::csv_utils::CsvBuilder;

    CsvBuilder::from_chunked_mssql_query_union(    // Also available: .from_chunked_mysql_query_union
        "username",
        "password",
        "server",
        "database",
        "SELECT * from your_table",
        "10000" // Get data in chunks of 10000 rows at a time
        ).await;

Example 7: Load from an MSSQL/ MYSQL Server query, receiving the data in chunks, collated as a bag union

    use rgwml::csv_utils::CsvBuilder;

    CsvBuilder::from_chunked_mssql_query_bag_union(    // Also available: .from_chunked_mysql_query_bag_union
        "username",
        "password",
        "server",
        "database",
        "SELECT * from your_table",
        "10000" // Get data in chunks of 10000 rows at a time
        ).await;
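
The difference between collating chunks as a union and as a bag union can be sketched as follows. This is a standalone illustration under stated assumptions: the helper names are hypothetical, rows are compared on all columns, and first-seen order is kept. How rgwml fetches and orders the chunks is not described here.

```rust
use std::collections::HashSet;

type Row = Vec<String>;

/// Bag (multiset) union: every row from every chunk is kept, duplicates included.
pub fn bag_union(chunks: &[Vec<Row>]) -> Vec<Row> {
    chunks.iter().flatten().cloned().collect()
}

/// Set union: a row already seen (compared on all columns) is skipped.
/// First-seen order is preserved; this ordering is an assumption of the sketch.
pub fn set_union(chunks: &[Vec<Row>]) -> Vec<Row> {
    let mut seen: HashSet<Row> = HashSet::new();
    let mut out = Vec::new();
    for row in chunks.iter().flatten() {
        if seen.insert(row.clone()) {
            out.push(row.clone());
        }
    }
    out
}

/// Small helper to build a row from string literals.
pub fn row(cells: &[&str]) -> Row {
    cells.iter().map(|c| c.to_string()).collect()
}
```

If two chunks both contain the row `2,b`, the bag union keeps both copies while the set union keeps one.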

Example 8: Load from a Clickhouse Server query

    use rgwml::csv_utils::CsvBuilder;

    let _ = CsvBuilder::from_clickhouse_query(  
        "username",
        "password",
        "server",
        "SELECT * from your_table").await;

    // To load the column description of a particular table into a CsvBuilder object
    let _ = CsvBuilder::get_clickhouse_table_description(
        "username",
        "password",
        "server",
        "table_name").await;

Example 9: Load from a Clickhouse Server query, receiving the data in chunks, collated as a union

    use rgwml::csv_utils::CsvBuilder;

    CsvBuilder::from_chunked_clickhouse_query_union(
        "username",
        "password",
        "server",
        "SELECT * from your_table",
        "10000" // Get data in chunks of 10000 rows at a time
        ).await;

Example 10: Load from a Clickhouse Server query, receiving the data in chunks, collated as a bag union

    use rgwml::csv_utils::CsvBuilder;

    CsvBuilder::from_chunked_clickhouse_query_bag_union(
        "username",
        "password",
        "server",
        "SELECT * from your_table",
        "10000" // Get data in chunks of 10000 rows at a time
        ).await;

Example 11: Load from a Google Big Query Server

    use rgwml::csv_utils::CsvBuilder;

    let _ = CsvBuilder::from_google_big_query_query(  
        "path/to/your/json/credentials",
        "SELECT * from your_table").await;

    // To load the column description of a particular table into a CsvBuilder object
    let _ = CsvBuilder::get_google_big_query_table_description(
        "path/to/your/json/credentials",
        "your_project_id",
        "your_dataset_name",
        "your_table_name").await;

Example 12: Load from a Google Big Query Server query, receiving the data in chunks, collated as a union

    use rgwml::csv_utils::CsvBuilder;

    CsvBuilder::from_chunked_google_big_query_query_union(
        "path/to/your/json/credentials",
        "SELECT * from your_table",
        "10000" // Get data in chunks of 10000 rows at a time
        ).await;

Example 13: Load from a Google Big Query Server query, receiving the data in chunks, collated as a bag union

    use rgwml::csv_utils::CsvBuilder;

    CsvBuilder::from_chunked_google_big_query_query_bag_union(
        "path/to/your/credentials",
        "SELECT * from your_table",
        "10000" // Get data in chunks of 10000 rows at a time
        ).await;

Example 14: Load a new instance from an existing instance

    use rgwml::csv_utils::CsvBuilder;

    let builder_instance_1 = CsvBuilder::from_xls("/path/to/existing/file.xls", 1);
    let builder_instance_2 = CsvBuilder::from_copy(builder_instance_1);

#### Manipulating a CsvBuilder Object for Analysis or Saving

    use rgwml::csv_utils::{Exp, ExpVal, CsvBuilder, CsvConverter, CsvResultCacher};

    let _ = CsvBuilder::from_csv("/path/to/your/file.csv")
        .rename_columns(vec![("OLD_COLUMN", "NEW_COLUMN")])
        .drop_columns(vec!["UNUSED_COLUMN"])
        .cascade_sort(vec![("COLUMN".to_string(), "ASC".to_string())])
        .reverse_rows() // Reverses the order of the rows
        .reverse_columns() // Reverses the order of columns
        .where_(
            vec![
                ("Exp1", Exp {
                    column: "customer_type".to_string(),
                    operator: "==".to_string(),
                    compare_with: ExpVal::STR("REGULAR".to_string()),
                    compare_as: "TEXT".to_string() // Also: "NUMBERS", "TIMESTAMPS"
                }),
                ("Exp2", Exp {
                    column: "invoice_data".to_string(),
                    operator: ">".to_string(),
                    compare_with: ExpVal::STR("2023-12-31 23:59:59".to_string()),
                    compare_as: "TEXT".to_string()
                }),
                ("Exp3", Exp {
                    column: "invoice_amount".to_string(),
                    operator: "<".to_string(),
                    compare_with: ExpVal::STR("1000".to_string()),
                    compare_as: "NUMBERS".to_string()
                }),
                ("Exp4", Exp {
                    column: "address".to_string(),
                    operator: "FUZZ_MIN_SCORE_60".to_string(),
                    compare_with: ExpVal::VEC(vec!["public school".to_string()]),
                    compare_as: "TEXT".to_string()
                })
            ],
            "Exp1 && (Exp2 || Exp3) && Exp4",
        )
        .print_row_count()
        .save_as("/path/to/modified/file.csv");
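
The last argument of `.where_()` combines the named expressions with `&&`, `||`, and parentheses. How rgwml parses that string is not described in this document; the following is a minimal sketch of one conventional approach (a recursive-descent evaluator), assuming each expression has already been evaluated to `true` or `false` for the row in question. The `Parser` type and `evaluate` function are hypothetical names.

```rust
use std::collections::HashMap;

/// Recursive-descent evaluator for strings such as "Exp1 && (Exp2 || Exp3)".
/// Grammar (with `&&` binding tighter than `||`, as in Rust):
///   or_expr  := and_expr ("||" and_expr)*
///   and_expr := atom ("&&" atom)*
///   atom     := NAME | "(" or_expr ")"
struct Parser<'a> {
    tokens: Vec<String>,
    pos: usize,
    values: &'a HashMap<&'a str, bool>,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&str> {
        self.tokens.get(self.pos).map(|t| t.as_str())
    }

    fn or_expr(&mut self) -> Result<bool, String> {
        let mut value = self.and_expr()?;
        while self.peek() == Some("||") {
            self.pos += 1;
            let rhs = self.and_expr()?; // always parse the right-hand side
            value = value || rhs;
        }
        Ok(value)
    }

    fn and_expr(&mut self) -> Result<bool, String> {
        let mut value = self.atom()?;
        while self.peek() == Some("&&") {
            self.pos += 1;
            let rhs = self.atom()?;
            value = value && rhs;
        }
        Ok(value)
    }

    fn atom(&mut self) -> Result<bool, String> {
        let token = self.peek().map(|t| t.to_string());
        self.pos += 1;
        match token.as_deref() {
            Some("(") => {
                let value = self.or_expr()?;
                if self.peek() != Some(")") {
                    return Err("missing ')'".to_string());
                }
                self.pos += 1;
                Ok(value)
            }
            Some(name) => self
                .values
                .get(name)
                .copied()
                .ok_or_else(|| format!("unknown expression '{}'", name)),
            None => Err("unexpected end of input".to_string()),
        }
    }
}

/// Evaluates `expr` given the per-row truth value of each named expression.
/// Assumes operators and names are separated by whitespace, as in the examples above.
pub fn evaluate(expr: &str, values: &HashMap<&str, bool>) -> Result<bool, String> {
    let spaced = expr.replace('(', " ( ").replace(')', " ) ");
    let tokens = spaced.split_whitespace().map(String::from).collect();
    let mut parser = Parser { tokens, pos: 0, values };
    let result = parser.or_expr()?;
    if parser.pos != parser.tokens.len() {
        return Err("unexpected trailing tokens".to_string());
    }
    Ok(result)
}
```

A row passes the filter when the combined expression evaluates to `true`; an unknown expression name or unbalanced parenthesis is reported as an error rather than silently treated as `false`.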

#### Chainable Options

    use rgwml::csv_utils::{CalibConfig, CsvBuilder, CsvConverter, CsvResultCacher, Exp, ExpVal, Piv, Train};

    CsvBuilder::from_csv("/path/to/your/file1.csv")
    // A. Calibrating an irregularly formatted file
    .calibrate(
        CalibConfig {
            header_is_at_row: "21".to_string(),
            rows_range_from: ("23".to_string(), "*".to_string())
        }) // sets the row 21 content as the header, and row 23 to last row content as the data

    // B. Setting and adding headers
    .set_header(vec!["Header1", "Header2", "Header3"])
    .add_column_header("NewColumn1")
    .add_column_headers(vec!["NewColumn2", "NewColumn3"])

    // C. Set an Index
    .resequence_id_column("account_id") // Sets the values of the specified column sequentially from 1 onwards, ensuring each entry is uniquely numbered in ascending order until the last row.
    
    // D. Assuming a single row csv, set the value of a column
    .set("column_name", "value")

    // E. Ordering columns
    .order_columns(vec!["Column1", "...", "Column5", "Column2"])
    .order_columns(vec!["...", "Column5", "Column2"])
    .order_columns(vec!["Column1", "Column5", "..."])

    // F. Overriding data from another builder object
    .override_with(other_csv_builder_object)

    // G. Modifying columns
    .drop_columns(vec!["Column1", "Column3"])
    .retain_columns(vec!["Column1", "Column3"])
    .rename_columns(vec![("Column1", "NewColumn1"), ("Column3", "NewColumn3")])

    // H. Adding and modifying rows
    .add_row(vec!["Row1-1", "Row1-2", "Row1-3"])
    .add_rows(vec![vec!["Row1-1", "Row1-2", "Row1-3"], vec!["Row2-1", "Row2-2", "Row2-3"]])
    .update_row_by_row_number(2, vec!["Bob", "36", "San Francisco"])
    .update_row_by_id(2, vec!["Bob", "36", "San Francisco"]) // Updates a row by id in the CSV, assuming the first column is 'id'
    .delete_row_by_row_number(2)
    .delete_row_by_id(2) // Deletes a row by id in the CSV, assuming the first column is 'id'
    .remove_duplicates()
    
    // I. Cleaning/ Replacing Cell values
    .trim_all() // Trims white spaces at the beginning and end of all cells in all columns.
    .replace_all(vec!["Column1", "Column2"], vec![("null", ""), ("NA", "-")]) // In specified columns
    .replace_all(vec!["*"], vec![("null", ""), ("NA", "-")]) // In all columns
    .clean_by_column_parse(
        vec![
            ("Column1".to_string(), vec!["HAS_ONLY_POSITIVE_NUMERICAL_VALUES".to_string(), "HAS_LENGTH:10".to_string()]),
            ("Column3".to_string(), vec!["HAS_MIN_LENGTH:7".to_string()]),
            ("Column3".to_string(), vec!["HAS_MAX_LENGTH:12".to_string()]),
            ("Column4".to_string(), vec!["HAS_VALID_TEN_DIGIT_INDIAN_MOBILE_NUMBER".to_string()]),
            ("Column5".to_string(), vec!["HAS_NO_EMPTY_STRINGS".to_string()]),
            ("Column7".to_string(), vec!["IS_DATETIME_PARSEABLE".to_string()]),
        ]
    )
 
    // J. Limiting and sorting
    .limit(10)
    .limit_distributed_raw(10)  //  limit rows distributed as evenly as possible across the dataset
    .limit_distributed_category(10, "Column7")  //  limit rows distributed as evenly as possible across the dataset, to maximize variance in values of the indicated column
    .limit_rand(10)         // limit rows randomly
    .limit_where(
        10,
        vec![
            ("Exp1", Exp {
                column: "Withdrawal Amt.".to_string(),
                operator: "<".to_string(),
                compare_with: ExpVal::STR("1000".to_string()),
                compare_as: "NUMBERS".to_string() // Also: "TEXT", "TIMESTAMPS"
            }),
            ("Exp2", Exp {
                column: "Withdrawal Type".to_string(),
                operator: "==".to_string(),
                compare_with: ExpVal::STR("Urgent".to_string()),
                compare_as: "TEXT".to_string()
            }),
        ],
        "Exp1 && Exp2",
        "TAKE:FIRST" // Also: TAKE:LAST, TAKE:RANDOM
        )
    .cascade_sort(vec![("Column1".to_string(), "DESC".to_string()), ("Column3".to_string(), "ASC".to_string())])

    // K. Search operations
    .print_contains_search_results("needle") // Prints rows where any cell contains the needle
    .print_not_contains_search_results("needle") // Prints rows where no cell contains the needle
    .print_starts_with_search_results("needle") // Prints rows where any cell starts with the needle
    .print_not_starts_with_search_results("needle") // Prints rows where no cell starts with the needle

    // L. Search operations
    .print_contains_search_results("needle", vec!["*"]) // Prints rows where any cell in all columns contains the needle
    .print_contains_search_results("needle", vec!["column1", "column2"]) // Same as above, but only specific columns targeted
    .print_not_contains_search_results("needle", vec!["*"]) // Prints rows where no cell in all columns contains the needle
    .print_not_contains_search_results("needle", vec!["column1", "column2"]) // Same as above, but only specific columns targeted
    .print_starts_with_search_results("needle", vec!["*"]) // Prints rows where any cell in all columns starts with the needle
    .print_starts_with_search_results("needle", vec!["column1", "column2"]) // Same as above, but only specific columns targeted
    .print_not_starts_with_search_results("needle", vec!["*"]) // Prints rows where no cell in all columns starts with the needle
    .print_not_starts_with_search_results("needle", vec!["column1", "column2"]) // Same as above, but only specific columns targeted
    .print_raw_levenshtein_search_results("needle", 10, ["column1", "column2"]) // Prints rows where cells in column1, column2 have a Levenshtein distance of less than 10 vis-a-vis the needle
    // Vectorized Levenshtein search: compares each needle against successive combinations of
    // consecutive words in the cell values of the indicated columns, taking the needle's word
    // count as the minimum combination length. For instance, if the cell contains
    // "django is a good boy", it compares distances for combinations like "django is", "is a",
    // "a good", "good boy", up to the full cell content, and keeps the closest match.
    // The minimum Levenshtein distance across all needles for that cell value is then the basis
    // for filtering against the specified maximum distance (max_lev_distance). Matching on the
    // proximity of words in this way gives a more contextually relevant search.
    .print_vectorized_levenshtein_search_results(["awesome", "good job"], max_lev_distance, ["column1", "column2"])

    // M. Applying conditional operations
    .where_(
        vec![
            ("Exp1", Exp {
                column: "customer_type".to_string(),
                operator: "==".to_string(),
                compare_with: ExpVal::STR("REGULAR".to_string()),
                compare_as: "TEXT".to_string() // Also: "NUMBERS", "TIMESTAMPS"
            }),
            ("Exp2", Exp {
                column: "invoice_data".to_string(),
                operator: ">".to_string(),
                compare_with: ExpVal::STR("2023-12-31 23:59:59".to_string()),
                compare_as: "TEXT".to_string()
            }),
            ("Exp3", Exp {
                column: "invoice_amount".to_string(),
                operator: "<".to_string(),
                compare_with: ExpVal::STR("1000".to_string()),
                compare_as: "NUMBERS".to_string()
            }),
            ("Exp4", Exp {
                column: "address".to_string(),
                operator: "FUZZ_MIN_SCORE_60".to_string(),
                compare_with: ExpVal::VEC(vec!["public school".to_string()]),
                compare_as: "TEXT".to_string()
            }),
            ("Exp5", Exp {
                column: "status".to_string(),
                operator: "CONTAINS".to_string(), // Also: "DOES_NOT_CONTAIN"
                compare_with: ExpVal::STR("REJECTED".to_string()),
                compare_as: "TEXT".to_string()
            }),
            ("Exp6", Exp {
                column: "status".to_string(),
                operator: "STARTS_WITH".to_string(), // Also: "DOES_NOT_START_WITH"
                compare_with: ExpVal::STR("VERIFIED".to_string()),
                compare_as: "TEXT".to_string()
            }),
        ],
        "Exp1 && (Exp2 || Exp3 || Exp4) && Exp5 && Exp6")
    .where_set(
        vec![
            // Same as .where_()
        ],
        "Exp1 && (Exp2 || Exp3 || Exp4) && Exp5 && Exp6",
        "Column10",
        "IS OKAY")

    // N. Analytical Prints for data inspection
    .print_columns()
    .print_row_count()
    .print_first_row()
    .print_last_row()
    .print_rows_range(2,5) // Shows results per a spreadsheet row range
    .print_rows() // Shows results as per a spreadsheet row range
    .print_rows_where(
        vec![
            // Same as .where_()
        ],
        "Exp1 && (Exp2 || Exp3 || Exp4) && Exp5 && Exp6")
    .print_table() // Prints a truncated table to the terminal
    .print_table_all_rows() // Prints a truncated table to the terminal, with all rows
    .print_cells(vec!["Column1", "Column2"])
    .print_unique("column_name")
    .print_unique_count("column_name")
    .print_column_numerical_analysis(vec!["Column1", "Column2"]) // Prints the min, max, range, mean, median, mode, variance, standard deviation, sum of squared deviations, and list non-numerical values, if any, for each of the indicated columns
    .print_freq(vec!["Column1", "Column2"])
    .print_cascading_freq(vec!["Column1", "Column2"]) // Prints cascading frequency tables for selected columns of a dataset.
    .print_freq_mapped(vec![
            ("Column1", vec![
                ("Delhi", vec!["New Delhi", "Delhi"]),
                ("UP", vec!["Ghaziabad", "Noida"])
            ]),
            ("Column2", vec![("NO_GROUPINGS", vec![])])
        ])
    .print_unique_values_stats(vec!["Column1", "Column2"]) // Prints the number of unique values in a column, along with the mean and median of their frequencies
    .print_count_where(
        vec![
            // Same as .where_()
        ],
        "Exp1 && (Exp2 || Exp3 || Exp4) && Exp5 && Exp6")
    .print_cleanliness_report_by_column_parse(
        vec![
            ("Column1".to_string(), vec!["HAS_ONLY_POSITIVE_NUMERICAL_VALUES".to_string(), "HAS_LENGTH:10".to_string()]),
            ("Column3".to_string(), vec!["HAS_MIN_LENGTH:7".to_string()]),
            ("Column3".to_string(), vec!["HAS_MAX_LENGTH:12".to_string()]),
            ("Column4".to_string(), vec!["HAS_VALID_TEN_DIGIT_INDIAN_MOBILE_NUMBER".to_string()]),
            ("Column5".to_string(), vec!["HAS_NO_EMPTY_STRINGS".to_string()]),
            ("Column7".to_string(), vec!["IS_DATETIME_PARSEABLE".to_string()]),
        ]
    )

    // O. Grouping Data
    .split_as("ColumnNameToGroupBy", "/output/folder/for/grouped/csv/files/") // Groups data by a specified column and saves each group into a separate CSV file in a given folder
    .grouped_index_transform("Column1", "column_1_event_history", vec![("Column2".to_string(), "COUNT_UNIQUE".to_string())]) // Groups data by a specified column and transforms the grouped data into a new column containing serialized JSON strings, such that the result is sorted as per the specified column in ascending order, and the elements of grouped data are arranged in a consistent order of key value pairs as per the original builder object. Supports features: COUNT_UNIQUE, NUMERICAL_MAX, NUMERICAL_MIN, NUMERICAL_SUM, NUMERICAL_MEAN, NUMERICAL_MEDIAN, NUMERICAL_STANDARD_DEVIATION, DATETIME_MAX, DATETIME_MIN, MODE, BOOL_PERCENT for detailed analysis.


    // P. Basic Set Theory Operations 
    
    // P.1.A. UNIONS (WITH CSV FILE)
    .set_bag_union_with_csv_file("/path/to/set_b/file.csv") // Returns a 'bag/ multiset union' 
    .set_union_with_csv_file("/path/to/set_b/file.csv", "UNION_TYPE:NORMAL", vec!["*"]) // Computes a traditional set theory union, where a row is deemed unique based on all its column values
    .set_union_with_csv_file("/path/to/set_b/file.csv", "UNION_TYPE:NORMAL", vec!["Column1", "Column2"]) // Computes a traditional set theory union, where a row is deemed unique based on the uniqueness of the combination of Column1 and Column2
    .set_union_with_csv_file("/path/to/table_b.csv", "UNION_TYPE:LEFT_JOIN", vec!["Column1"]) // Left join using "Column1" as the join column.
    .set_union_with_csv_file("/path/to/table_b.csv", "UNION_TYPE:RIGHT_JOIN", vec!["Column1"]) // Right join using "Column1" as the join column.
    .set_union_with_csv_file("/path/to/table_b.csv", "UNION_TYPE:OUTER_FULL_JOIN", vec!["Column1"]) 

    // P.1.B. UNIONS (WITH CSV BUILDER OBJECT)
    .set_bag_union_with_csv_builder(&other_builder_object) // Returns a 'bag/ multiset union'
    .set_union_with_csv_builder(&other_builder_object, "UNION_TYPE:NORMAL", vec!["*"]) // Computes a traditional set theory union, where a row is deemed unique based on all its column values
    .set_union_with_csv_builder(&other_builder_object, "UNION_TYPE:NORMAL", vec!["Column1", "Column2"]) // Computes a traditional set theory union, where a row is deemed unique based on the uniqueness of the combination of Column1 and Column2
    .set_union_with_csv_builder(&other_builder_object, "UNION_TYPE:LEFT_JOIN", vec!["Column1"]) // Left join using "Column1" as the join column.
    .set_union_with_csv_builder(&other_builder_object, "UNION_TYPE:RIGHT_JOIN", vec!["Column1"]) // Right join using "Column1" as the join column.
    .set_union_with_csv_builder(&other_builder_object, "UNION_TYPE:OUTER_FULL_JOIN", vec!["Column1"])

    // P.2.A. INTERSECTIONS (WITH CSV FILE)
    .set_intersection_with_csv_file("/path/to/set_b/file.csv", vec!["keyColumn1", "keyColumn2"], "INTERSECTION_TYPE:NORMAL")
    .set_intersection_with_csv_file("/path/to/set_b/file.csv", vec!["keyColumn1", "keyColumn2"], "INTERSECTION_TYPE:INNER_JOIN")

    // P.2.B. INTERSECTIONS (WITH CSV BUILDER OBJECT)
    .set_intersection_with_csv_builder(&other_builder_object, vec!["keyColumn1", "keyColumn2"], "INTERSECTION_TYPE:NORMAL")
    .set_intersection_with_csv_builder(&other_builder_object, vec!["keyColumn1", "keyColumn2"], "INTERSECTION_TYPE:INNER_JOIN")

    // P.3.A. DIFFERENCES (WITH CSV FILE)
    .set_difference_with_csv_file("/path/to/set_b/file.csv", "DIFFERENCE_TYPE:NORMAL", vec!["keyColumn1", "keyColumn2"]) 
    .set_difference_with_csv_file("/path/to/set_b/file.csv", "DIFFERENCE_TYPE:SYMMETRIC", vec!["keyColumn1", "keyColumn2"])

    // P.3.B. DIFFERENCES (WITH CSV BUILDER OBJECT)
    .set_difference_with_csv_builder(&other_builder_object, "DIFFERENCE_TYPE:NORMAL", vec!["keyColumn1", "keyColumn2"])
    .set_difference_with_csv_builder(&other_builder_object, "DIFFERENCE_TYPE:SYMMETRIC", vec!["keyColumn1", "keyColumn2"])

    // Q. Append Derivative Columns
    .append_derived_boolean_column(
        "is_qualified_for_discount",
        vec![
            // Same as .where_()
        ],
        "Exp1 && (Exp2 || Exp3 || Exp4) && Exp5 && Exp6")
    .append_derived_category_column(
        "EXPENSE_RANGE",
        vec![
            (
                "< 1000",
                vec![
                    ("Exp1", Exp {
                        column: "Withdrawal Amt.".to_string(),
                        operator: "<".to_string(),
                        compare_with: ExpVal::STR("1000".to_string()),
                        compare_as: "NUMBERS".to_string() // Also: "TEXT", "TIMESTAMPS"
                    }),
                ],
                "Exp1"
            ),
            (
                "1000-5000",
                vec![
                    ("Exp1", Exp {
                        column: "Withdrawal Amt.".to_string(),
                        operator: ">=".to_string(),
                        compare_with: ExpVal::STR("1000".to_string()),
                        compare_as: "NUMBERS".to_string()
                    }),
                    ("Exp2", Exp {
                        column: "Withdrawal Amt.".to_string(),
                        operator: "<".to_string(),
                        compare_with: ExpVal::STR("5000".to_string()),
                        compare_as: "NUMBERS".to_string()
                    }),
                ],
                "Exp1 && Exp2"
            )
        ])
    .append_derived_concatenation_column("NewColumnName", vec!["Column1", " ", "Column2", "@"]) // Items in the vector that are not column names will be concatenated as strings
    .append_derived_openai_analysis_columns(
        vec!["column7", "column9"],     // Names of the columns to be analyzed 
        std::collections::HashMap::from([
            ("noun".to_string(), "extract the noun from the sentence".to_string()),
            ("verb".to_string(), "extract the verb from the sentence".to_string()),
        ]),
        "YOUR_OPEN_AI_API_KEY",
        "gpt-3.5-turbo-0125"            // Any OpenAI model with the JSON mode feature
        )
        .await
    .append_derived_linear_regression_column(
        "predictions",                  // name of new column to store predictions
        vec![                           // predictor combinations/ feature sets - length should be 2x the number of predictors/features
            vec!["90", "good"],         // predictor/ feature values can also be text strings. The model uses a Levenshtein distance based approach to tokenize strings.
            vec!["70", "bad"], 
            vec!["60", "great"], 
            vec!["40", "awful"]
        ], 
        vec![72.0, 65.0, 63.0, 56.0],   // labels mapped to the above predictors
        vec![0.0, 100.0],               // normalization range of minimum and maximum prediction value
        vec!["Column1", "Column7"])     // names of columns whose values are to be used to make predictions as the 'test' data set 
    .append_openai_batch_analysis_columns(
        "YOUR_OPEN_AI_API_KEY",
        "output_file_id"
    )
    .append_fuzzai_analysis_columns(
        "Column1", // Name of column to be analyzed
        "sales_analysis", // Identifier for newly created columns
        vec![
            Train {
                input: "I want my money back".to_string(),
                output: "refund".to_string()
            },
            Train {
                input: "I want a refund immediately".to_string(),
                output: "refund".to_string()
            },
        ],
        "WORD_SPLIT:2", // The minimum length of word combinations that training data is to be broken into
        "WORD_LENGTH_SENSITIVITY:0.8", // Multiplies differences in word length between training data input and the value being analyzed by 0.8
        "GET_BEST:2" // Get the top 2 results, max value is 3
        )
    .append_fuzzai_analysis_columns_with_values_where(
        "Column1", // Name of column to be analyzed
        "sales_analysis", // Identifier for newly created column
        vec![
            Train {
                input: "I want my money back".to_string(),
                output: "refund".to_string()
            },
            Train {
                input: "I want a refund immediately".to_string(),
                output: "refund".to_string()
            },
        ],
        "WORD_SPLIT:2", // The minimum length of word combinations that training data is to be broken into
        "WORD_LENGTH_SENSITIVITY:0.8", // Multiplies differences in word length between training data input and the value being analyzed by 0.8
        "GET_BEST:2", // Get the top 2 results, max value is 3
        vec![
            ("Exp1", Exp {
                column: "Deposit Amt.".to_string(),
                operator: ">".to_string(),
                compare_with: ExpVal::STR("500".to_string()),
                compare_as: "NUMBERS".to_string() // Also: "TEXT", "TIMESTAMPS"
            }),
        ],
        "Exp1", // Filters rows where fuzzai analysis would be applied
        )
    .split_date_as_appended_category_columns("Column10", "%d/%m/%y") // Appends additional columns splitting a date/timestamp into categorization columns by year, month and week

    // R. Pivot Tables
    .pivot_as(
        "/path/to/save/the/pivot/file/as/csv",
        Piv {
            index_at: "month".to_string(),
            values_from: "sales".to_string(),
            operation: "NUMERICAL_MEDIAN".to_string(), // Also: "COUNT", "COUNT_UNIQUE", "NUMERICAL_MAX", "NUMERICAL_MIN", "NUMERICAL_SUM", "NUMERICAL_MEAN", "NUMERICAL_MEDIAN", "NUMERICAL_STANDARD_DEVIATION", "BOOL_PERCENT" (assuming column values of 0 or 1 in 'values_from', calculates the % of 1 values for the segment)
            seggregate_by: vec![  // Set to vec![] if segregation is not required
                ("is_customer", "AS_BOOLEAN".to_string()), // Is appended directly as a segregation column
                ("acquisition_type", "AS_CATEGORY".to_string()) // The unique values of this column are appended as segregation columns
            ],
        })

    // S. Plot charts
    .print_dot_chart("Column3", "Column5") // X axis column followed by the Y axis column
    .print_cumulative_dot_chart("Column3", "Column5") // X axis column followed by the Y axis column
    .print_smooth_line_chart("Column3", "Column5") // X axis column followed by the Y axis column
    .print_cumulative_smooth_line_chart("Column3", "Column5") // X axis column followed by the Y axis column

    // T. Save
    .save_as("/path/to/your/file2.csv")

    // U. Die
    .die() // Gracefully terminates execution of a CsvBuilder chain
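
As a rough illustration of what a pivot such as the `pivot_as` call in step R computes, the sketch below groups rows by an index value and aggregates each group with a median or a percentage of 1 values. It uses only the standard library. The names `median`, `bool_percent`, and `pivot` are hypothetical and are not part of rgwml, whose own implementation may differ.

```rust
use std::collections::BTreeMap;

/// Median of a slice; returns None for an empty slice.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap()); // assumes no NaN values
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Percentage of values equal to 1, assuming the column holds only 0 or 1.
fn bool_percent(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let ones = values.iter().filter(|v| **v == 1.0).count();
    Some(100.0 * ones as f64 / values.len() as f64)
}

/// Groups (index, value) pairs by index and aggregates each group.
fn pivot(rows: &[(&str, f64)], agg: fn(&[f64]) -> Option<f64>) -> BTreeMap<String, f64> {
    let mut groups: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for (index, value) in rows {
        groups.entry(index.to_string()).or_default().push(*value);
    }
    groups
        .into_iter()
        .filter_map(|(key, values)| agg(&values).map(|result| (key, result)))
        .collect()
}

fn main() {
    let sales = [("Jan", 10.0), ("Jan", 30.0), ("Jan", 20.0), ("Feb", 5.0), ("Feb", 15.0)];
    println!("median by month: {:?}", pivot(&sales, median));

    let is_customer = [("Jan", 1.0), ("Jan", 0.0), ("Feb", 1.0), ("Feb", 1.0)];
    println!("percent of 1 values by month: {:?}", pivot(&is_customer, bool_percent));
}
```

Passing the aggregation as a function pointer keeps the grouping logic independent of the operation, which is one plausible way to support a list of operation names like the one shown for `Piv`.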

#### Extract Data

These methods return specific data instead of a mutable CsvBuilder object and hence cannot be chained further.
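
The difference between a chainable method and an extracting method can be sketched with a toy builder. `MiniBuilder` below is purely hypothetical and unrelated to the real `CsvBuilder` internals: `add_row` returns `&mut Self`, so calls can keep chaining, while `get_unique` returns plain data, which ends the chain.

```rust
/// A toy builder used only for illustration; not rgwml's CsvBuilder.
struct MiniBuilder {
    rows: Vec<Vec<String>>,
}

impl MiniBuilder {
    fn new() -> Self {
        MiniBuilder { rows: Vec::new() }
    }

    /// Chainable: mutates the builder and hands back a reference to it.
    fn add_row(&mut self, row: &[&str]) -> &mut Self {
        self.rows.push(row.iter().map(|s| s.to_string()).collect());
        self
    }

    /// Extracting: returns owned data, so nothing builder-like is left to chain on.
    fn get_unique(&self, column: usize) -> Vec<String> {
        let mut seen: Vec<String> = Vec::new();
        for row in &self.rows {
            if let Some(value) = row.get(column) {
                if !seen.contains(value) {
                    seen.push(value.clone());
                }
            }
        }
        seen
    }
}

fn main() {
    let unique = MiniBuilder::new()
        .add_row(&["Delhi", "1"])
        .add_row(&["Noida", "2"])
        .add_row(&["Delhi", "3"])
        .get_unique(0); // the chain ends here with a Vec<String>
    println!("{:?}", unique);
}
```

The rgwml extract methods listed next sit at the end of a chain in the same way as `get_unique` does here.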

    CsvBuilder::from_csv("/path/to/your/file1.csv")

    .get_unique("column_name") // Returns a Vec<String>
    .get("column_name") // Returns cell content as a String, if the csv has been filtered to a single row. See the chainable ".set()" method above to set a value in such a circumstance
    .get_freq(vec!["Column1", "Column2"]) // Returns a HashMap where keys are column names and values are vectors of sorted (value, frequency) pairs.
    .get_freq_mapped(vec![
            ("Column1", vec![
                ("Delhi", vec!["New Delhi", "Delhi"]),
                ("UP", vec!["Ghaziabad", "Noida"])
            ]),
            ("Column2", vec![("NO_GROUPINGS", vec![])])
        ])
    .has_data() // Returns `true` if either headers or data rows are present, `false` otherwise.
    .has_headers() // Returns `true` if headers are present, `false` otherwise.
    .get_headers().unwrap() // Returns an Option<&[String]> containing a reference to the headers if present, `None` otherwise.
    .get_data().unwrap() // Returns an Option<&Vec<Vec<String>>> containing a reference to the data contained in the builder.

    .get_numeric_min("Column1").unwrap() // Returns a String value of the minimum numeric value - assuming all values of the column can be consistently parsed as such
    .get_numeric_max("Column1").unwrap() // Returns a String value of the maximum numeric value - assuming all values of the column can be consistently parsed as such
    .get_datetime_min("Column1").unwrap() // Returns a String value of the minimum datetime value - assuming all values of the column can be consistently parsed as such
    .get_datetime_max("Column1").unwrap() // Returns a String value of the maximum datetime value - assuming all values of the column can be consistently parsed as such
    .get_range("Column1").unwrap() // Returns an `Option<f64>` - the range (difference between the maximum and minimum) of a numerically parseable column.
    .get_sum("Column1").unwrap() // Returns an `Option<f64>` - the sum of all values in a numerically parseable column.
    .get_mean("Column1").unwrap() // Returns an `Option<f64>` - the mean of all values in a numerically parseable column.
    .get_median("Column1").unwrap() // Returns an `Option<f64>` - the median of all values in a numerically parseable column.
    .get_mode("Column1").unwrap() // Returns an `Option<f64>` - the mode of all values in a numerically parseable column.
    .get_variance("Column1").unwrap() // Returns an `Option<f64>` - the variance of all values in a numerically parseable column.
    .get_standard_deviation("Column1").unwrap() // Returns an `Option<f64>` - the standard deviation of all values in a numerically parseable column.
    .get_sum_of_squared_deviations("Column1").unwrap() // Returns an `Option<f64>` - the sum of squared deviations of all values in a numerically parseable column.
    .get_get_non_numeric_values("Column1").unwrap() // Returns an `Option<Vec<String>>` - the non numeric values in a column. 

    // Send data to OpenAI for batch analysis, returning a batch_id as `Result<String, Box<dyn std::error::Error>>`
    .send_columns_for_openai_batch_analysis(
        vec!["column7", "column9"],     // Names of the columns to be analyzed
        std::collections::HashMap::from([
            ("noun".to_string(), "extract the noun from the sentence".to_string()),
            ("verb".to_string(), "extract the verb from the sentence".to_string()),
        ]),
        "YOUR_OPEN_AI_API_KEY",
        "gpt-3.5-turbo-0125",           // Any OpenAI model with the JSON mode feature
        "night_job"                     // Name of the batch
    )
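
To make the shape of the frequency results more concrete, here is a minimal sketch of counting values in a single column and of grouping raw values under labels before counting, in the spirit of `get_freq` and `get_freq_mapped`. The function names are hypothetical, and the sort order (descending frequency, ties broken by value) is an assumption, since the source only states that the pairs are sorted.

```rust
use std::collections::HashMap;

/// Counts each distinct value and returns (value, frequency) pairs.
/// Sort order is an assumption: highest frequency first, ties by value.
fn value_frequencies(values: &[&str]) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for v in values {
        *counts.entry(*v).or_insert(0) += 1;
    }
    let mut pairs: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(value, n)| (value.to_string(), n))
        .collect();
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    pairs
}

/// Relabels values that belong to a group, then counts the labels.
/// Values that match no group are counted under their own name.
fn mapped_frequencies(values: &[&str], groups: &[(&str, Vec<&str>)]) -> Vec<(String, usize)> {
    let relabeled: Vec<&str> = values
        .iter()
        .map(|v| {
            groups
                .iter()
                .find(|(_, members)| members.contains(v))
                .map(|(label, _)| *label)
                .unwrap_or(*v)
        })
        .collect();
    value_frequencies(&relabeled)
}

fn main() {
    let cities = ["New Delhi", "Delhi", "Noida", "Ghaziabad", "Pune"];
    let groups = [
        ("Delhi", vec!["New Delhi", "Delhi"]),
        ("UP", vec!["Ghaziabad", "Noida"]),
    ];
    println!("{:?}", value_frequencies(&cities));
    println!("{:?}", mapped_frequencies(&cities, &groups));
}
```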

### CsvConverter

    use serde_json::json;
    use tokio;
    use rgwml::csv_utils::CsvConverter;
    use rgwml::api_utils::ApiCallBuilder;

    // Function to fetch sales data from an API
    async fn fetch_sales_data_from_api() -> Result<String, Box<dyn std::error::Error>> {
        let method = "POST";
        let url = "http://example.com/api/sales"; // API URL to fetch sales data

        // Payload for the API call
        let payload = json!({
            "date": "2023-12-21"
        });

        // Performing the API call
        let response = ApiCallBuilder::call(method, url, None, Some(payload))
            .execute()
            .await?;

        Ok(response)
    }

    // Main function with tokio's async runtime
    #[tokio::main]
    async fn main() {
        // Fetch sales data and handle potential errors inline
        let sales_data_response = fetch_sales_data_from_api().await.unwrap_or_else(|e| {
            eprintln!("Failed to fetch sales data: {}", e);
            std::process::exit(1); // Exit the program in case of an error
        });

        // Convert the fetched JSON data to CSV
        CsvConverter::from_json(&sales_data_response, "path/to/your/file.csv")
            .expect("Failed to convert JSON to CSV"); // Handle errors in CSV conversion
    }
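
The source does not describe how `CsvConverter::from_json` handles its input internally. As a hedged illustration of the last step of any JSON-to-CSV conversion, the sketch below renders already-parsed, flat records as CSV text with RFC 4180 style quoting. The names `escape_field` and `records_to_csv` are hypothetical, and JSON parsing itself is deliberately left out.

```rust
/// Quotes a field if it contains a comma, a double quote, or a line break,
/// doubling any embedded double quotes (RFC 4180 style).
fn escape_field(field: &str) -> String {
    if field.contains(',') || field.contains('"') || field.contains('\n') || field.contains('\r') {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Renders flat records (lists of key/value pairs) as CSV text.
/// Header order is taken from the first record; missing keys become empty cells.
fn records_to_csv(records: &[Vec<(&str, &str)>]) -> String {
    let headers: Vec<&str> = match records.first() {
        Some(first) => first.iter().map(|(key, _)| *key).collect(),
        None => return String::new(),
    };
    let mut out = headers
        .iter()
        .map(|h| escape_field(h))
        .collect::<Vec<_>>()
        .join(",");
    out.push('\n');
    for record in records {
        let cells: Vec<String> = headers
            .iter()
            .map(|h| {
                record
                    .iter()
                    .find(|(key, _)| key == h)
                    .map(|(_, value)| escape_field(value))
                    .unwrap_or_default()
            })
            .collect();
        out.push_str(&cells.join(","));
        out.push('\n');
    }
    out
}

fn main() {
    let records = vec![
        vec![("date", "2023-12-21"), ("note", "paid, in full")],
        vec![("date", "2023-12-22"), ("note", "pending")],
    ];
    print!("{}", records_to_csv(&records));
}
```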

### CsvResultCacher

    use rgwml::api_utils::ApiCallBuilder;
    use rgwml::csv_utils::{CsvBuilder, CsvResultCacher};
    use serde_json::json;
    use tokio;

    async fn generate_daily_sales_report() -> Result<(), Box<dyn std::error::Error>> {
        async fn fetch_sales_data_from_api() -> Result<String, Box<dyn std::error::Error>> {
            let method = "POST";
            let url = "http://example.com/api/sales"; // API URL to fetch sales data

            let payload = json!({
                "date": "2023-12-21"
            });

            let response = ApiCallBuilder::call(method, url, None, Some(payload))
                .execute()
                .await?;

            Ok(response)
        }

        let sales_data_response = fetch_sales_data_from_api().await?;

        // Convert the JSON response to CSV format using CsvBuilder
        let csv_builder = CsvBuilder::from_api_call(sales_data_response)
            .await
            .unwrap()
            .save_as("/path/to/daily/sales/report/cache.csv");

        Ok(())
    }

    #[tokio::main]
    async fn main() {
        let cache_path = "/path/to/daily/sales/report/cache.csv"; // Same path that generate_daily_sales_report() saves to
        let cache_duration_minutes = 1440; // Cache duration set to 1 day

        let result = CsvResultCacher::fetch_async(
            || Box::pin(generate_daily_sales_report()),
            cache_path,
            cache_duration_minutes,
        ).await;

        match result {
            Ok(_) => println!("Sales report is ready."),
            Err(e) => eprintln!("Failed to generate sales report: {}", e),
        }
    }
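
The idea behind a time-based file cache like the one above can be sketched with the standard library alone: reuse the file if it was modified within the allowed window, otherwise regenerate and overwrite it. This is a synchronous, simplified sketch under that assumption; `is_cache_fresh` and `fetch_with_cache` are hypothetical names, not rgwml functions, and the library's actual freshness check may differ.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Returns true if the file exists and was modified within `max_age_minutes`.
fn is_cache_fresh(path: &Path, max_age_minutes: u64) -> bool {
    let modified = match fs::metadata(path).and_then(|meta| meta.modified()) {
        Ok(time) => time,
        Err(_) => return false, // missing file or unsupported platform: not fresh
    };
    match SystemTime::now().duration_since(modified) {
        Ok(age) => age <= Duration::from_secs(max_age_minutes * 60),
        Err(_) => true, // modification time lies in the future; treat as fresh
    }
}

/// Returns the cached contents when fresh, otherwise regenerates and overwrites them.
fn fetch_with_cache<F>(path: &Path, max_age_minutes: u64, generate: F) -> std::io::Result<String>
where
    F: FnOnce() -> String,
{
    if is_cache_fresh(path, max_age_minutes) {
        return fs::read_to_string(path);
    }
    let fresh = generate();
    fs::write(path, &fresh)?;
    Ok(fresh)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("rgwml_cache_sketch.csv");
    let _ = fs::remove_file(&path); // start from a clean state

    // The first call generates the report; the second is served from the file.
    let first = fetch_with_cache(&path, 1440, || "id,total\n1,100\n".to_string())?;
    let second = fetch_with_cache(&path, 1440, || "id,total\n1,999\n".to_string())?;
    println!("served from cache: {}", first == second);

    fs::remove_file(&path)?;
    Ok(())
}
```

The `1440` here mirrors the one-day duration used in the example above.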

3. `db_utils`
-----------

### Easily query an MSSQL, MYSQL, or Clickhouse server, or Google Big Query, to extract data

    use rgwml::db_utils::DbConnect;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let result_1 = DbConnect::execute_mssql_query( // use `execute_mysql_query` for MYSQL
            "username",
            "password",
            "server/host",
            "database",
            "SELECT * FROM your_table").await?;

        let headers_1 = result_1.0;
        let row_data_1 = result_1.1;

        let result_2 = DbConnect::execute_clickhouse_query(
            "username",
            "password",
            "server/host",
            "SELECT * FROM your_table").await?;

        let headers_2 = result_2.0;
        let row_data_2 = result_2.1;

        let result_3 = DbConnect::execute_google_big_query_query(
            "your/json/credentials/path",
            "SELECT * FROM your_table").await?;

        let headers_3 = result_3.0;
        let row_data_3 = result_3.1;

        Ok(())
    }
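
Each query result above is read as a pair of headers and row data. Assuming, for illustration only, that this pair has the shape `(Vec<String>, Vec<Vec<String>>)`, the sketch below zips headers with each row so that cells can be looked up by column name. `rows_as_records` is a hypothetical helper and no database is contacted.

```rust
use std::collections::HashMap;

/// Pairs each row with the headers to build one name-to-value map per row.
/// If a row is shorter or longer than the headers, the extra items are ignored.
fn rows_as_records(headers: &[String], rows: &[Vec<String>]) -> Vec<HashMap<String, String>> {
    rows.iter()
        .map(|row| headers.iter().cloned().zip(row.iter().cloned()).collect())
        .collect()
}

fn main() {
    // Stand-in for a query result; the tuple shape is an assumption.
    let result: (Vec<String>, Vec<Vec<String>>) = (
        vec!["id".to_string(), "name".to_string()],
        vec![
            vec!["1".to_string(), "Asha".to_string()],
            vec!["2".to_string(), "Ravi".to_string()],
        ],
    );

    let (headers, rows) = result;
    for record in rows_as_records(&headers, &rows) {
        println!("{} -> {}", record["id"], record["name"]);
    }
}
```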

### Easily query a MYSQL server to write data

Use `execute_mysql_write` to run a write statement, such as an `INSERT`, against a MYSQL server.

    use rgwml::db_utils::DbConnect;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let result = DbConnect::execute_mysql_write(
            "username",
            "password",
            "server/host",
            "database",
            "INSERT INTO your_table (column1, column2) VALUES ('value1', 'value2')").await?;

        Ok(())
    }

### Print information on an MSSQL, MYSQL, Clickhouse, or Google Big Query server

    use rgwml::db_utils::DbConnect;

    // Print MSSQL Server Information
    DbConnect::print_mssql_databases("username", "password", "server", "default_database");
    DbConnect::print_mssql_schemas("username", "password", "server", "in_focus_database");
    DbConnect::print_mssql_tables("username", "password", "server", "in_focus_database", "schema");
    DbConnect::print_mssql_table_description("username", "password", "server", "in_focus_database", "table_name");
    DbConnect::print_mssql_architecture("username", "password", "server", "default_database");

    // Print MySQL Server Information
    DbConnect::print_mysql_databases("username", "password", "server", "default_database");
    DbConnect::print_mysql_tables("username", "password", "server", "in_focus_database");
    DbConnect::print_mysql_table_description("username", "password", "server", "in_focus_database", "table_name");
    DbConnect::print_mysql_architecture("username", "password", "server", "default_database");

    // Print Clickhouse Server Information
    DbConnect::print_clickhouse_databases("username", "password", "server");
    DbConnect::print_clickhouse_tables("username", "password", "server", "in_focus_database");
    DbConnect::print_clickhouse_table_description("username", "password", "server", "in_focus_database", "table_name");
    DbConnect::print_clickhouse_architecture("username", "password", "server");

    // Print BigQuery Server Information
    DbConnect::print_google_big_query_datasets("path/to/your/json/credentials", "your_project_id");
    DbConnect::print_google_big_query_tables("path/to/your/json/credentials", "your_project_id", "dataset_name");
    DbConnect::print_google_big_query_table_description("path/to/your/json/credentials", "your_project_id", "dataset_name", "table_name");
    DbConnect::print_google_big_query_architecture("path/to/your/json/credentials", "your_project_id"); // Note: Your json credentials must have READ METADATA access for this to work

4. `ai_utils`
-----------

This library provides simple AI utilities for neural association analysis, as well as connecting with the OpenAI JSON mode and BATCH processing API. 

### 4.1. Rust Native AI Functionalities

This module uses simple Levenshtein distance and fuzzy matching to analyze text against training data, with an emphasis on making the decision-making process understandable, and is optimized for a parallel computing environment.

    use rgwml::ai_utils::{fuzzai, SplitUpto, ShowComplications, WordLengthSensitivity};
    use std::error::Error;

    #[tokio::main]
    async fn main() {
        // Call the fuzzai function with CSV file path
        let fuzzai_result = fuzzai(
            "path/to/your/model/training/csv/file.csv",
            "model_training_input_column_name",
            "model_training_output_column_name",
            "your text to be analyzed against the training data model",
            "your task description: clustering customer complaints",
            SplitUpto::WordSetLength(2), // Minimum number of words in the word combinations that the training input is split into during the analysis
            ShowComplications::False, // Set to True to see the inner workings of the model
            WordLengthSensitivity::Coefficient(0.2), // Set to Coefficient::None to disregard differences in the word length of the training input and the text being analyzed; increase the coefficient to give more weight to matches with similar word length
        ).await.expect("Analysis should succeed");

        dbg!(fuzzai_result);
    }
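
For readers unfamiliar with the underlying technique, here is a minimal, self-contained sketch of Levenshtein distance and a nearest-match lookup over training pairs. It is not the `fuzzai` implementation: the word-combination splitting and word-length sensitivity described above are not modelled, and the names `levenshtein`, `similarity`, and `best_match` are hypothetical.

```rust
/// Levenshtein edit distance between two strings, computed over chars
/// with a two-row dynamic programming table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Similarity in the range 0.0..=1.0, defined here as 1 - distance / longer length.
fn similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

/// Returns the output label of the training input most similar to `text`.
fn best_match<'a>(training: &'a [(&'a str, &'a str)], text: &str) -> Option<(&'a str, f64)> {
    let text = text.to_lowercase();
    training
        .iter()
        .map(|(input, output)| (*output, similarity(&input.to_lowercase(), &text)))
        .max_by(|x, y| x.1.partial_cmp(&y.1).unwrap())
}

fn main() {
    let training = [
        ("I want my money back", "refund"),
        ("Where is my order", "tracking"),
    ];
    println!("{:?}", best_match(&training, "i want my money back now"));
}
```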

### 4.2. OpenAI API Functionalities

#### 4.2.1. OpenAI Synchronous JSON mode

    use rgwml::ai_utils::{get_openai_analysis_json};
    use std::collections::HashMap;

    let customer_feedback = "Your services are great!";
    let mut analysis_query = HashMap::new();
    analysis_query.insert("was_positive".to_string(), "Return true if the sentiment is positive, else return False".to_string());

    let analysis = get_openai_analysis_json(
        customer_feedback,
        analysis_query,
        "your/OpenAI/API/key",
        "gpt-3.5-turbo" // Or any model supporting JSON Mode
    );

    dbg!(analysis); 

#### 4.2.2. OpenAI Asynchronous BATCH mode

    use rgwml::ai_utils::{upload_file_to_openai, create_openai_batch, fetch_and_print_openai_batches, cancel_openai_batch};
    use rgwml::csv_utils::CsvBuilder;
    use std::collections::HashMap;


    let headers = vec!["customer_feedback".to_string(), "resolution_time".to_string()];
    let data = vec![
        vec!["Your services are great!".to_string(), "5".to_string()],
        vec!["Not satisfied with the resolution.".to_string(), "15".to_string()],
    ];

    let mut csv_builder = CsvBuilder::from_raw_data(headers, data);

    let columns_to_analyze = vec!["customer_feedback", "resolution_time"];
    let mut analysis_query = HashMap::new();
    analysis_query.insert("was_positive".to_string(), "Return true if the sentiment is positive, else return False".to_string());
    let api_key = "your_openai_api_key";
    let model = "gpt-3.5-turbo";
    let batch_description = "Positive Sentiment Analysis";

    // Send OpenAI a batch task
    let batch_id = csv_builder.send_data_for_openai_batch_analysis(
        columns_to_analyze,
        analysis_query,
        &api_key,
        model,
        batch_description
    ).await?;

    dbg!(&batch_id);

    // To fetch and print details of all your batch tasks
    let _ = fetch_and_print_openai_batches(api_key).await?;

    // To cancel the batch task
    let _ = cancel_openai_batch(api_key, batch_id).await?;

    // To retrieve an OpenAI batch analysis as a named temp file `Result<NamedTempFile, Box<dyn Error>>`
    let _ = retrieve_openai_batch(api_key, file_id);

5. `api_utils`
------------

Examples across common API call patterns

    use serde_json::json;
    use rgwml::api_utils::ApiCallBuilder;
    use std::collections::HashMap;

    #[tokio::main]
    async fn main() {
        // Fetch and cache post request without headers, with retry mechanism
        let response = fetch_and_cache_post_request().await.unwrap_or_else(|e| {
            eprintln!("Failed to fetch data: {}", e);
            std::process::exit(1);
        });
        println!("Response: {:?}", response);

        // Fetch and cache post request with headers, with retry mechanism
        let response_with_headers = fetch_and_cache_post_request_with_headers().await.unwrap_or_else(|e| {
            eprintln!("Failed to fetch data with headers: {}", e);
            std::process::exit(1);
        });
        println!("Response with headers: {:?}", response_with_headers);

        // Fetch and cache post request with form URL encoded content type, with retry mechanism
        let response_form_urlencoded = fetch_and_cache_post_request_form_urlencoded().await.unwrap_or_else(|e| {
            eprintln!("Failed to fetch form URL encoded data: {}", e);
            std::process::exit(1);
        });
        println!("Form URL encoded response: {:?}", response_form_urlencoded);
    }

    // Example 1: Without Headers, includes retry mechanism
    async fn fetch_and_cache_post_request() -> Result<String, Box<dyn std::error::Error>> {
        let method = "POST";
        let url = "http://example.com/api/submit";
        let payload = json!({
            "field1": "Hello",
            "field2": 123
        });

        let response = ApiCallBuilder::call(method, url, None, Some(payload))
            .maintain_cache(30, "/path/to/post_cache.json") // Uses cache for 30 minutes
            .retries(3, 5) // Retry up to 3 times with a 5-second timeout between retries
            .execute()
            .await?;

        Ok(response)
    }

    // Example 2: With Headers, includes retry mechanism
    async fn fetch_and_cache_post_request_with_headers() -> Result<String, Box<dyn std::error::Error>> {
        let method = "POST";
        let url = "http://example.com/api/submit";
        let headers = json!({
            "Content-Type": "application/json",
            "Authorization": "Bearer your_token_here"
        });
        let payload = json!({
            "field1": "Hello",
            "field2": 123
        });

        let response = ApiCallBuilder::call(method, url, Some(headers), Some(payload))
            .maintain_cache(30, "/path/to/post_with_headers_cache.json") // Uses cache for 30 minutes
            .retries(3, 5) // Retry up to 3 times with a 5-second timeout between retries
            .execute()
            .await?;

        Ok(response)
    }

    // Example 3: With application/x-www-form-urlencoded Content-Type, includes retry mechanism
    async fn fetch_and_cache_post_request_form_urlencoded() -> Result<String, Box<dyn std::error::Error>> {
        let method = "POST";
        let url = "http://example.com/api/submit";
        let headers = json!({
            "Content-Type": "application/x-www-form-urlencoded"
        });
        let payload = HashMap::from([
            ("field1", "value1"),
            ("field2", "value2"),
        ]);

        let response = ApiCallBuilder::call(method, url, Some(headers), Some(payload))
            .maintain_cache(30, "/path/to/post_form_urlencoded_cache.json") // Uses cache for 30 minutes
            .retries(3, 5) // Retry up to 3 times with a 5-second timeout between retries
            .execute()
            .await?;

        Ok(response)
    }
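
The retry behaviour configured by `.retries(3, 5)` can be pictured with a small synchronous helper: run an operation, and on failure wait and try again until the retry budget is spent. This is a sketch under assumptions, not the `ApiCallBuilder` implementation. `with_retries` is a hypothetical name, and treating the first number as retries after an initial attempt is a guess about the semantics.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` once, then retries up to `max_retries` more times on error,
/// sleeping for `delay` between attempts. The closure receives the attempt index.
fn with_retries<T, E, F>(max_retries: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err), // budget spent
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Simulated request that fails twice before succeeding.
    let result: Result<&str, String> = with_retries(3, Duration::from_millis(10), |attempt| {
        calls += 1;
        if attempt < 2 {
            Err(format!("attempt {} failed", attempt))
        } else {
            Ok("response body")
        }
    });
    println!("{:?} after {} calls", result, calls);
}
```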

6. License
----------

This project is licensed under the MIT License - see the LICENSE file for details.