sklears-python 0.1.1: Python bindings for the sklears machine learning library, built with PyO3.

# Sklears Python API Reference

Complete reference documentation for all Sklears Python classes, functions, and modules.

## Table of Contents

1. [Linear Models](#linear-models)
2. [Clustering](#clustering)
3. [Preprocessing](#preprocessing)
4. [Metrics](#metrics)
5. [Model Selection](#model-selection)
6. [Utilities](#utilities)

## Linear Models

### LinearRegression

Ordinary least squares linear regression.

```python
class LinearRegression(fit_intercept=True, copy_x=True)
```

**Parameters:**
- `fit_intercept` : bool, default=True
  Whether to calculate the intercept for this model.
- `copy_x` : bool, default=True
  If True, X will be copied; else, it may be overwritten.

**Attributes:**
- `coef_` : ndarray of shape (n_features,)
  Estimated coefficients for the linear regression problem.
- `intercept_` : float
  Independent term in the linear model.

**Methods:**

#### fit(X, y)
Fit linear model.

**Parameters:**
- `X` : array-like of shape (n_samples, n_features)
  Training data.
- `y` : array-like of shape (n_samples,)
  Target values.

**Returns:**
- `self` : object
  Fitted estimator.

#### predict(X)
Predict using the linear model.

**Parameters:**
- `X` : array-like of shape (n_samples, n_features)
  Samples.

**Returns:**
- `y_pred` : ndarray of shape (n_samples,)
  Returns predicted values.

#### score(X, y)
Return the coefficient of determination R² of the prediction.

**Parameters:**
- `X` : array-like of shape (n_samples, n_features)
  Test samples.
- `y` : array-like of shape (n_samples,)
  True values for X.

**Returns:**
- `score` : float
  R² of `self.predict(X)` with respect to `y`.

**Example:**
```python
import sklears as skl
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

model = skl.LinearRegression()
model.fit(X, y)
print(model.score(X, y))  # 1.0
```

### Ridge

Linear least squares with L2 regularization.

```python
class Ridge(alpha=1.0, fit_intercept=True, copy_x=True)
```

**Parameters:**
- `alpha` : float, default=1.0
  Regularization strength; must be a positive float.
- `fit_intercept` : bool, default=True
  Whether to fit the intercept.
- `copy_x` : bool, default=True
  If True, X will be copied; else, it may be overwritten.

**Example:**
```python
import sklears as skl
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

model = skl.Ridge(alpha=0.5)
model.fit(X, y)
predictions = model.predict(X)
```

### Lasso

Linear Model trained with L1 prior as regularizer.

```python
class Lasso(alpha=1.0, fit_intercept=True, max_iter=1000, tol=1e-4)
```

**Parameters:**
- `alpha` : float, default=1.0
  Constant that multiplies the L1 term.
- `fit_intercept` : bool, default=True
  Whether to fit the intercept.
- `max_iter` : int, default=1000
  The maximum number of iterations.
- `tol` : float, default=1e-4
  The tolerance for the optimization.
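
**Example** (illustrative sketch; assumes `coef_` is exposed as on `LinearRegression`):

```python
import sklears as skl
import numpy as np

# Toy problem with two collinear features; the L1 penalty tends to zero one out
X = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2], [3.0, 0.3]])
y = np.array([0.0, 1.0, 2.0, 3.0])

model = skl.Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)       # sparse coefficient vector
print(model.predict(X))
```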

### LogisticRegression

Logistic Regression classifier.

```python
class LogisticRegression(penalty='l2', C=1.0, fit_intercept=True, max_iter=100)
```

**Parameters:**
- `penalty` : str, default='l2'
  Used to specify the norm used in the penalization. 'l1' or 'l2'.
- `C` : float, default=1.0
  Inverse of regularization strength.
- `fit_intercept` : bool, default=True
  Specifies if a constant should be added to the decision function.
- `max_iter` : int, default=100
  Maximum number of iterations of the solver.

**Methods:**

#### predict_proba(X)
Probability estimates.

**Parameters:**
- `X` : array-like of shape (n_samples, n_features)
  Vector to be scored.

**Returns:**
- `T` : ndarray of shape (n_samples, n_classes)
  Returns the probability of the sample for each class.
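
**Example** (illustrative sketch; assumes `fit` and `predict` follow the same pattern as the regressors above):

```python
import sklears as skl
import numpy as np

# Two well-separated classes (illustrative data)
X = np.array([[0.0, 0.0], [0.5, 0.5], [3.0, 3.0], [3.5, 3.0]])
y = np.array([0, 0, 1, 1])

clf = skl.LogisticRegression(C=1.0, max_iter=200)
clf.fit(X, y)
print(clf.predict(X))        # expected [0 0 1 1] for this separable data
print(clf.predict_proba(X))  # shape (n_samples, n_classes)
```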

## Clustering

### KMeans

K-Means clustering.

```python
class KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=1e-4, random_state=None)
```

**Parameters:**
- `n_clusters` : int, default=8
  The number of clusters to form.
- `init` : str, default='k-means++'
  Method for initialization: 'k-means++' or 'random'.
- `n_init` : int, default=10
  Number of times the k-means algorithm will be run with different centroid seeds.
- `max_iter` : int, default=300
  Maximum number of iterations for a single run.
- `tol` : float, default=1e-4
  Relative tolerance for convergence.
- `random_state` : int, optional
  Random state for reproducible results.

**Attributes:**
- `cluster_centers_` : ndarray of shape (n_clusters, n_features)
  Coordinates of cluster centers.
- `labels_` : ndarray of shape (n_samples,)
  Labels of each point.
- `inertia_` : float
  Sum of squared distances of samples to their closest cluster center.

**Methods:**

#### fit_predict(X)
Compute cluster centers and predict cluster index for each sample.

**Parameters:**
- `X` : array-like of shape (n_samples, n_features)
  New data to transform.

**Returns:**
- `labels` : ndarray of shape (n_samples,)
  Index of the cluster each sample belongs to.

**Example:**
```python
import sklears as skl
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = skl.KMeans(n_clusters=2, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # e.g. [1 1 1 0 0 0]; cluster numbering may differ
```

### DBSCAN

Density-Based Spatial Clustering of Applications with Noise.

```python
class DBSCAN(eps=0.5, min_samples=5, metric='euclidean', algorithm='auto')
```

**Parameters:**
- `eps` : float, default=0.5
  The maximum distance between two samples for one to be considered as in the neighborhood of the other.
- `min_samples` : int, default=5
  The number of samples in a neighborhood for a point to be considered as a core point.
- `metric` : str, default='euclidean'
  The metric to use when calculating distance between instances.
- `algorithm` : str, default='auto'
  The algorithm to be used: 'auto', 'ball_tree', 'kd_tree', or 'brute'.
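
**Example** (illustrative sketch; assumes `fit_predict` is available as on `KMeans`; the exact cluster numbering may differ):

```python
import sklears as skl
import numpy as np

# Two dense groups plus one far-away outlier (illustrative data)
X = np.array([[1.0, 2.0], [2.0, 2.0], [2.0, 3.0],
              [8.0, 7.0], [8.0, 8.0], [25.0, 80.0]])

db = skl.DBSCAN(eps=3.0, min_samples=2)
labels = db.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 -1]; -1 conventionally marks noise points
```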

## Preprocessing

> **Coming Soon:** Preprocessing classes (`StandardScaler`, `MinMaxScaler`, `LabelEncoder`) are not yet exposed in this release. They will be available in a future version.

### StandardScaler *(Coming Soon)*

Standardize features by removing the mean and scaling to unit variance.

```python
# Not yet available - Coming Soon
# class StandardScaler(copy=True, with_mean=True, with_std=True)
```

**Parameters:**
- `copy` : bool, default=True
  If False, try to avoid a copy and do inplace scaling instead.
- `with_mean` : bool, default=True
  If True, center the data before scaling.
- `with_std` : bool, default=True
  If True, scale the data to unit variance.

**Attributes:**
- `mean_` : ndarray of shape (n_features,)
  The mean value for each feature in the training set.
- `scale_` : ndarray of shape (n_features,)
  Per feature relative scaling of the data.

**Methods:**

#### fit_transform(X)
Fit to data, then transform it.

#### inverse_transform(X)
Scale back the data to the original representation.
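
**Interim NumPy-based example** (manual standardization until `StandardScaler` is exposed):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])

# Standardize manually: subtract the per-feature mean, divide by the per-feature std
mean_ = X.mean(axis=0)
scale_ = X.std(axis=0)
X_scaled = (X - mean_) / scale_

print(mean_)     # [0.5 0.5]
print(X_scaled)  # [[-1. -1.] [-1. -1.] [ 1.  1.] [ 1.  1.]]
```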

### MinMaxScaler *(Coming Soon)*

Transform features by scaling each feature to a given range.

```python
# Not yet available - Coming Soon
# class MinMaxScaler(feature_range=(0, 1), copy=True, clip=False)
```

**Parameters:**
- `feature_range` : tuple (min, max), default=(0, 1)
  Desired range of transformed data.
- `copy` : bool, default=True
  Set to False to perform inplace scaling.
- `clip` : bool, default=False
  Set to True to clip transformed values to the provided feature range.

**Attributes:**
- `data_min_` : ndarray of shape (n_features,)
  Per feature minimum seen in the data.
- `data_max_` : ndarray of shape (n_features,)
  Per feature maximum seen in the data.
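
**Interim NumPy-based example** (manual min-max scaling until `MinMaxScaler` is exposed):

```python
import numpy as np

X = np.array([[-1.0, 2.0], [-0.5, 6.0], [0.0, 10.0], [1.0, 18.0]])

# Scale each feature to the default (0, 1) range
data_min_ = X.min(axis=0)
data_max_ = X.max(axis=0)
X_scaled = (X - data_min_) / (data_max_ - data_min_)

print(X_scaled[:, 0])  # [0.   0.25 0.5  1.  ]
```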

### LabelEncoder *(Coming Soon)*

Encode target labels with values between 0 and n_classes-1.

```python
# Not yet available - Coming Soon
# class LabelEncoder()
```

**Attributes:**
- `classes_` : list
  Holds the label for each class.

**Methods:**

#### fit_transform(y)
Fit label encoder and return encoded labels.

#### inverse_transform(y)
Transform labels back to original encoding.
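
**Interim NumPy-based example** (manual label encoding with `np.unique` until `LabelEncoder` is exposed):

```python
import numpy as np

y = np.array(["paris", "tokyo", "paris", "amsterdam"])

# np.unique returns the sorted classes and the integer code of each label
classes_, y_encoded = np.unique(y, return_inverse=True)
print(classes_)    # ['amsterdam' 'paris' 'tokyo']
print(y_encoded)   # [1 2 1 0]

# Inverse transform: index the class array with the codes
print(classes_[y_encoded])  # ['paris' 'tokyo' 'paris' 'amsterdam']
```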

## Metrics

> **Coming Soon:** Metrics functions are not yet exposed in this release; they will be available in a future version. Use `model.score()` for R² evaluation, or compute metrics manually with NumPy as an interim solution.

### Classification Metrics *(Coming Soon)*

#### accuracy_score *(Coming Soon)*

Classification accuracy score. Not yet available.

#### precision_score *(Coming Soon)*

Compute the precision. Not yet available.

#### recall_score *(Coming Soon)*

Compute the recall. Not yet available.

#### f1_score *(Coming Soon)*

Compute the F1 score. Not yet available.

#### confusion_matrix *(Coming Soon)*

Compute confusion matrix. Not yet available.
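
**Interim NumPy-based example** (binary classification metrics computed manually until sklears exposes them):

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Accuracy: fraction of exact matches
accuracy = float(np.mean(y_true == y_pred))

# Confusion matrix: rows are true classes, columns are predicted classes
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

# Precision, recall, and F1 for the positive class (label 1)
tp, fp, fn = cm[1, 1], cm[0, 1], cm[1, 0]
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy:.3f}")    # Accuracy: 0.833
print(f"Precision: {precision:.3f}")  # Precision: 1.000
print(f"Recall: {recall:.3f}")        # Recall: 0.750
print(f"F1: {f1:.3f}")                # F1: 0.857
```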

### Regression Metrics *(Coming Soon)*

#### mean_squared_error *(Coming Soon)*

Mean squared error regression loss. Not yet available.

#### mean_absolute_error *(Coming Soon)*

Mean absolute error regression loss. Not yet available.

#### r2_score *(Coming Soon)*

R² (coefficient of determination) regression score function. Not yet available.

**Interim NumPy-based example:**
```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Compute metrics manually until sklears exposes them
mse = float(np.mean((y_true - y_pred) ** 2))
mae = float(np.mean(np.abs(y_true - y_pred)))
ss_res = float(np.sum((y_true - y_pred) ** 2))
ss_tot = float(np.sum((y_true - np.mean(y_true)) ** 2))
r2 = 1.0 - ss_res / ss_tot

print(f"MSE: {mse:.3f}")  # MSE: 0.375
print(f"MAE: {mae:.3f}")  # MAE: 0.5
print(f"R²: {r2:.3f}")    # R²: 0.948
```

## Model Selection

### train_test_split

Split arrays into random train and test subsets.

```python
train_test_split(X, y=None, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
```

**Parameters:**
- `X` : array-like of shape (n_samples, n_features)
  Training data.
- `y` : array-like of shape (n_samples,), optional
  Target variable.
- `test_size` : float or int, optional
  Proportion of the dataset to include in the test split.
- `train_size` : float or int, optional
  Proportion of the dataset to include in the train split.
- `random_state` : int, optional
  Random state for reproducible output.
- `shuffle` : bool, default=True
  Whether to shuffle the data before splitting.
- `stratify` : array-like, optional
  Data is split in a stratified fashion using this as the class labels.

**Returns:**
- `splitting` : list
  List containing train-test split of inputs.
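
**Example** (illustrative sketch; assumes the conventional `X_train, X_test, y_train, y_test` return order):

```python
import sklears as skl
import numpy as np

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = skl.train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # e.g. (7, 2) (3, 2)
```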

### KFold

K-Fold cross-validator.

```python
class KFold(n_splits=5, shuffle=False, random_state=None)
```

**Parameters:**
- `n_splits` : int, default=5
  Number of folds.
- `shuffle` : bool, default=False
  Whether to shuffle the data before splitting.
- `random_state` : int, optional
  Random state for reproducible output.

**Methods:**

#### split(X, y=None)
Generate indices to split data into training and test set.

#### get_n_splits(X=None, y=None)
Returns the number of splitting iterations in the cross-validator.

**Example:**
```python
import sklears as skl
import numpy as np

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])

kf = skl.KFold(n_splits=2)
for train_index, test_index in kf.split(X):
    print(f"TRAIN: {train_index} TEST: {test_index}")
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```

## Utilities

### get_version()

Get the version of sklears.

**Returns:**
- `version` : str
  Version string.

### get_build_info()

Get build information about sklears.

**Returns:**
- `info` : dict
  Dictionary containing build information including version, features, and dependencies.

### get_hardware_info() *(Coming Soon)*

Get hardware acceleration capabilities. Not yet available in this release.

### benchmark_basic_operations() *(Coming Soon)*

Run basic performance benchmarks. Not yet available in this release.

### set_config(option, value) *(Coming Soon)*

Set global configuration options. Not yet available in this release.

### get_config() *(Coming Soon)*

Get current configuration. Not yet available in this release.

### show_versions() *(Coming Soon)*

Print comprehensive system information. Not yet available in this release.

**Available utilities example:**
```python
import sklears as skl

print(f"Version: {skl.get_version()}")

build_info = skl.get_build_info()
print(f"Build info: {build_info}")
```

## Error Handling

All functions and methods raise appropriate Python exceptions:

- `ValueError` - Invalid parameter values or data shape mismatches
- `RuntimeError` - Internal computation errors
- `TypeError` - Incorrect parameter types

**Example:**
```python
import sklears as skl
import numpy as np

try:
    model = skl.LinearRegression()
    # This will raise an error due to shape mismatch
    model.fit(np.array([[1, 2]]), np.array([1, 2]))
except ValueError as e:
    print(f"Error: {e}")
```

## Performance Notes

- All algorithms are optimized for performance and use SIMD instructions when available
- Memory usage is optimized with zero-copy operations where possible
- Parallel processing is used automatically when beneficial
- Large datasets are handled efficiently with streaming algorithms

For optimal performance:
1. Use contiguous NumPy arrays when possible (see the sketch after this list)
2. Consider data types (float64 vs float32)
3. Enable hardware acceleration features
4. Use appropriate batch sizes for large datasets
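
A short NumPy-only sketch of the first two tips (array sizes are arbitrary):

```python
import numpy as np

# Tip 1: ensure arrays are C-contiguous before handing them to sklears
X = np.asfortranarray(np.random.rand(1000, 10))  # deliberately non C-contiguous
X = np.ascontiguousarray(X)
print(X.flags["C_CONTIGUOUS"])  # True

# Tip 2: choose the dtype deliberately; float32 halves memory relative to float64
X32 = X.astype(np.float32)
print(X.nbytes, X32.nbytes)  # 80000 40000
```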