sparta.nn

class sparta.nn.OperatorBase(raw_module: Module)

Base class of sparse operators.

Examples

python

import torch
import sparta

# Create a dense softmax layer (the module must be instantiated, not passed as a class)
dense_softmax = torch.nn.Softmax(dim=-1)

# Create a mask
mask = torch.rand((2048, 1024)) > 0.99

# Create a sparse softmax layer using the dense layer and the mask
sparse_softmax = sparta.nn.SparseSoftmax(dense_softmax, mask=mask)

# Tune the sparse softmax layer
sparta.tune(sparse_softmax, sample_inputs=[torch.rand((2048, 1024))])

Parameters: raw_module (torch.nn.Module) – The corresponding dense operator.

build(params: Dict, sample_inputs: List, jit: bool = True)

Build the sparse kernel using the specified implementation and configs.

Parameters

params (Dict) – Building parameters. It should be a valid sample of the search space: params['_name'] should be a valid kernel name in self._possible_implementations, and the other key-value pairs in params are the parameters for self._possible_implementations[params['_name']].
sample_inputs (List) – Sample inputs for shape inference.
jit (bool) – Whether to build the kernel in JIT mode.
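The shape of `params` described above can be illustrated with a small check. This is a minimal sketch, not SparTA's own validation: `validate_params` and the `implementations` table are hypothetical stand-ins for `self._possible_implementations`, and the kernel and parameter names are borrowed from the search-space example in this reference.

```python
from typing import Any, Dict, Set


def validate_params(params: Dict[str, Any],
                    possible_implementations: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Split `params` into a kernel name and that kernel's own parameters.

    `possible_implementations` maps each kernel name to the set of parameter
    names it accepts (a hypothetical stand-in for the operator's own table).
    """
    name = params.get('_name')
    if name not in possible_implementations:
        raise ValueError(f'unknown kernel name: {name!r}')
    # Everything except '_name' belongs to the selected kernel
    kernel_params = {k: v for k, v in params.items() if k != '_name'}
    unexpected = set(kernel_params) - possible_implementations[name]
    if unexpected:
        raise ValueError(f'unexpected parameters for {name!r}: {sorted(unexpected)}')
    return kernel_params


# Hypothetical implementation table, for illustration only
implementations = {
    'openai': set(),
    'sparta': {'BLOCK_SIZE_M_VALUE', 'BLOCK_SIZE_K_VALUE', 'BLOCK_SIZE_N_VALUE'},
}
sample = {'_name': 'sparta', 'BLOCK_SIZE_M_VALUE': 32,
          'BLOCK_SIZE_K_VALUE': 64, 'BLOCK_SIZE_N_VALUE': 32}
kernel_params = validate_params(sample, implementations)
```

Here `kernel_params` holds only the three block-size entries, which is the part of `params` that would be handed to the selected implementation.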

forward(*args): Forward function. Calls the corresponding dense operator if not built.
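The dense fallback can be pictured with a toy class. This is only a sketch of the pattern, assuming the operator keeps a reference to the dense callable and switches once a kernel exists; `FallbackOperator` and its attributes are hypothetical and do not mirror the internals of `OperatorBase`.

```python
class FallbackOperator:
    """Toy operator that uses a dense callable until a sparse kernel is built."""

    def __init__(self, dense_fn):
        self._dense_fn = dense_fn
        self._sparse_fn = None  # None means "not built yet"

    def build(self, sparse_fn):
        # In this sketch, "building" just installs the sparse callable
        self._sparse_fn = sparse_fn

    def forward(self, *args):
        if self._sparse_fn is None:
            return self._dense_fn(*args)
        return self._sparse_fn(*args)


op = FallbackOperator(dense_fn=lambda x: ('dense', x))
before = op.forward(1)             # dense path: nothing has been built
op.build(lambda x: ('sparse', x))
after = op.forward(1)              # sparse path: a kernel is installed
```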

get_search_space() → TunableItemCfg

Get the search space of the sparse operator.

Returns: the search space of the sparse operator.
Return type: TunableItemCfg

set_search_space(search_space: Optional[TunableItemCfg] = None)

Input a custom search space to override the default one before tuning.

Examples

python

import torch
import sparta
# TunableItemCfg (used below) must also be imported

# Create a dense linear layer
dense_linear = torch.nn.Linear(1024, 2048)

# Create a mask
weight_mask = torch.rand((2048, 1024)) > 0.99

# Create a sparse linear layer using the dense layer and the mask
sparse_linear = sparta.nn.SparseLinear(dense_linear, weight_mask=weight_mask)

# Set custom search space
search_space_cfg = TunableItemCfg('choice', {
    'openai': {},
    'sparta': {
        'BLOCK_SIZE_M_VALUE': TunableItemCfg('choice', [32, 64]),
        'BLOCK_SIZE_K_VALUE': TunableItemCfg('choice', [32, 64]),
        'BLOCK_SIZE_N_VALUE': TunableItemCfg('choice', [32, 64]),
        'THREAD_SIZE_M_VALUE': TunableItemCfg('choice', [4]),
        'THREAD_SIZE_K_VALUE': TunableItemCfg('choice', [4]),
        'THREAD_SIZE_N_VALUE': TunableItemCfg('choice', [4]),
    },
})
sparse_linear.set_search_space(search_space_cfg)

# Tune the sparse linear layer
sparta.tune(sparse_linear, sample_inputs=[torch.rand((512, 1024))])

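Parameters: search_space (TunableItemCfg, optional) – The custom search space. As in the example above, it is a 'choice' over kernel implementations: each key is an implementation name, and each value is a dictionary whose keys are tunable parameters and whose values list the possible values.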
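To see how many configurations a search space like the one above implies, the grid can be enumerated by hand. The sketch below uses plain dicts and lists as a stand-in for `TunableItemCfg`; `enumerate_grid` is a hypothetical helper, not part of SparTA, and the actual tuner may traverse the space differently.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def enumerate_grid(space: Dict[str, Dict[str, List[Any]]]) -> Iterator[Dict[str, Any]]:
    """Yield one params dict per (implementation, parameter combination)."""
    for impl_name, tunables in space.items():
        keys = list(tunables)
        # product() of zero lists yields one empty tuple, so an implementation
        # without tunable parameters still contributes exactly one sample.
        for values in product(*(tunables[k] for k in keys)):
            yield {'_name': impl_name, **dict(zip(keys, values))}


# Plain-dict mirror of the search space from the example above
space = {
    'openai': {},
    'sparta': {
        'BLOCK_SIZE_M_VALUE': [32, 64],
        'BLOCK_SIZE_K_VALUE': [32, 64],
        'BLOCK_SIZE_N_VALUE': [32, 64],
        'THREAD_SIZE_M_VALUE': [4],
        'THREAD_SIZE_K_VALUE': [4],
        'THREAD_SIZE_N_VALUE': [4],
    },
}
samples = list(enumerate_grid(space))
```

With these values the grid holds one `'openai'` sample plus 2 × 2 × 2 `'sparta'` samples, and each sample has the `'_name'` key that `build` expects.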

tester(params: Dict, sample_inputs: List, jit: bool = False, weight_bk: float = 0.0) → float

Tester function for tuning. It builds the sparse kernel, runs the forward function (and optionally the backward function), and returns the measured time.

Parameters

params (Dict) – Building parameters. It should be a valid sample of the search space.
sample_inputs (List) – Sample inputs for shape inference.
jit (bool) – Whether to test the kernel in JIT mode.
weight_bk (float) – The weight of the backward time in the total time. If set to 0, the backward time is not counted.

Returns

The performance (running latency) of the kernel.

Return type

float
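One way to read `weight_bk` is as the coefficient of the backward time in a weighted sum. The sketch below assumes that formula, which this reference does not spell out; `measure` and `weighted_latency` are hypothetical helpers, and wall-clock timing is used only for illustration (timing GPU kernels would also need device synchronization).

```python
import time
from typing import Callable


def measure(fn: Callable[[], object], repeats: int = 10) -> float:
    """Return the mean wall-clock time of `fn` in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats


def weighted_latency(forward_ms: float, backward_ms: float,
                     weight_bk: float = 0.0) -> float:
    """Combine forward and backward latency (assumed formula).

    With weight_bk == 0 the backward time does not contribute at all.
    """
    return forward_ms + weight_bk * backward_ms


forward_only = weighted_latency(2.0, 4.0)            # backward ignored
half_backward = weighted_latency(2.0, 4.0, 0.5)      # 2.0 + 0.5 * 4.0
```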

class sparta.nn.SparseLinear(raw_module: Linear, input_mask: Optional[Tensor] = None, weight_mask: Optional[Tensor] = None, output_mask: Optional[Tensor] = None)

Sparse linear operator.

Examples

python

import torch
import sparta

# Create a dense linear layer
dense_linear = torch.nn.Linear(1024, 2048)

# Create a mask
weight_mask = torch.rand((2048, 1024)) > 0.99

# Create a sparse linear layer using the dense layer and the mask
sparse_linear = sparta.nn.SparseLinear(dense_linear, weight_mask=weight_mask)

# Tune the sparse linear layer
sparta.tune(sparse_linear, sample_inputs=[torch.rand((512, 1024))])

Parameters

raw_module (torch.nn.Linear) – The corresponding dense linear operator.
input_mask (torch.Tensor) – The input mask tensor with shape (*, in_features). The kernel mode will be “sparse x dense => dense” if the input mask is set.
weight_mask (torch.Tensor) – The weight mask tensor with shape (out_features, in_features). The kernel mode will be “dense x sparse => dense” if the weight mask is set.
output_mask (torch.Tensor) – The output mask tensor with shape (*, out_features). The kernel mode will be “dense x dense => sparse” if the output mask is set.
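The mapping from masks to kernel modes can be summarized in a few lines. This is a sketch under the assumption that exactly one mask is given at a time; `kernel_mode` is a hypothetical helper, and the error raised for zero or several masks is a choice made for this example, not documented behavior.

```python
from typing import Optional


def kernel_mode(input_mask: Optional[object] = None,
                weight_mask: Optional[object] = None,
                output_mask: Optional[object] = None) -> str:
    """Return the kernel mode implied by whichever single mask is set."""
    modes = {
        'sparse x dense => dense': input_mask,
        'dense x sparse => dense': weight_mask,
        'dense x dense => sparse': output_mask,
    }
    selected = [mode for mode, mask in modes.items() if mask is not None]
    if len(selected) != 1:
        # Assumption of this sketch: exactly one operand is sparse
        raise ValueError(f'expected exactly one mask, got {len(selected)}')
    return selected[0]


mode = kernel_mode(weight_mask=[[True, False], [False, True]])
```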

class sparta.nn.SparseSoftmax(raw_module: Softmax, mask: Optional[Tensor] = None)

Sparse softmax operator.

Examples

python

import torch
import sparta

# Create a dense softmax layer (the module must be instantiated, not passed as a class)
dense_softmax = torch.nn.Softmax(dim=-1)

# Create a mask
mask = torch.rand((2048, 1024)) > 0.99

# Create a sparse softmax layer using the dense layer and the mask
sparse_softmax = sparta.nn.SparseSoftmax(dense_softmax, mask=mask)

# Tune the sparse softmax layer
sparta.tune(sparse_softmax, sample_inputs=[torch.rand((2048, 1024))])

Parameters

raw_module (torch.nn.Softmax) – The corresponding dense softmax operator.
mask (torch.Tensor) – The mask with the same shape as the input tensor.
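To make the role of the mask concrete, here is a pure-Python sketch of a masked softmax over one row. It assumes that masked-out positions are excluded from the normalization and produce zeros, which this reference does not state; `masked_softmax` is a hypothetical helper and says nothing about how the sparse kernel is implemented.

```python
import math
from typing import List


def masked_softmax(row: List[float], mask: List[bool]) -> List[float]:
    """Softmax over the positions where mask is True; other positions give 0."""
    kept = [x for x, m in zip(row, mask) if m]
    if not kept:
        # Nothing is kept, so there is nothing to normalize
        return [0.0] * len(row)
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(row, mask)]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([1.0, 2.0, 3.0, 4.0], [True, False, True, False])
```

The kept positions share all of the probability mass, so `probs` sums to 1 while the masked entries stay at zero.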

sparta.nn.tune(module: Module, sample_inputs: List[Tensor], algo: str = 'grid', max_trials: int = 9223372036854775807, tester_kw: Optional[Dict] = None, build_kw: Optional[Dict] = None, tuner_kw: Optional[Dict] = None, verbose: bool = False)

Find, tune and build all sparse operators in the model.

Parameters

module (torch.nn.Module) – A PyTorch module that contains one or more sparse sub-modules.
sample_inputs (List[torch.Tensor]) – Sample input tensors to determine shape parameters.
algo (str, optional) – The algorithm used to search for the best parameters. Defaults to 'grid'.
max_trials (int, optional) – The maximum number of trials to run. Defaults to sys.maxsize.
tester_kw (Dict, optional) – The keyword arguments for the tester. Defaults to None.
build_kw (Dict, optional) – The keyword arguments for the builder (after tuning). Defaults to None.
tuner_kw (Dict, optional) – The keyword arguments for the tuner. Defaults to None.
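The interplay of a grid search and `max_trials` can be sketched as a simple loop. This is an illustration only, assuming the tuner tries candidates in order and keeps the lowest latency; `grid_tune` and `fake_tester` are hypothetical, and the latency model is made up for the example.

```python
import sys
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def grid_tune(candidates: Iterable[Dict[str, Any]],
              tester: Callable[[Dict[str, Any]], float],
              max_trials: int = sys.maxsize) -> Tuple[Optional[Dict[str, Any]], float]:
    """Try candidates in order, stop after max_trials, return the fastest."""
    best_params, best_latency = None, float('inf')
    for trial, params in enumerate(candidates):
        if trial >= max_trials:
            break
        latency = tester(params)
        if latency < best_latency:
            best_params, best_latency = params, latency
    return best_params, best_latency


def fake_tester(params: Dict[str, Any]) -> float:
    # Made-up latency model: pretend a block size of 64 is fastest
    return abs(params['BLOCK_SIZE_M_VALUE'] - 64) + 1.0


candidates = [{'_name': 'sparta', 'BLOCK_SIZE_M_VALUE': m} for m in (32, 64, 128)]
best, latency = grid_tune(candidates, fake_tester)
limited, _ = grid_tune(candidates, fake_tester, max_trials=1)
```

With the default `max_trials` every candidate is tested; with `max_trials=1` only the first one is, so the result can be worse than the true optimum of the grid.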