Welcome to STEM’s documentation!¶
Tensor Overview¶
Synopsis¶
First, we will define a convenience alias:
typealias D = NativeStorage<Double>
Next, we can create a few Tensors:
// construct vector with values
let v1 = Tensor<D>([1, 2, 3])
// construct a Tensor with a specific size (here, a vector of 3 elements)
let v2 = Tensor<D>(Extent(3))
// construct a matrix with values
let v3 = Tensor<D>([[1, 2, 3], [4, 5, 6]])
STEM supports standard linear algebra operators:
// take the dot product (result is a scalar)
let s1 = v1⊙v2
// take the outer product (result is a matrix)
let m1 = v1⊗v2
// add two vectors together
let v4 = v1+v3
// multiply by a scalar
let v5 = 0.5*v1
STEM also supports advanced indexing (similar to Numpy and Matlab):
let v6 = v2[1..<3]
let m2 = m1[1..<3, 0..<2]
As STEM's name implies, N-dimensional Tensors are supported. Both the Vector
and Matrix classes are specializations of the Tensor class. These
specializations allow for simpler construction methods as well as the
use of accelerated libraries such as CBLAS, CUDA, or OpenCL
through function overloading.
Storage¶
All Tensors have an associated Storage class that is responsible for
the allocated memory. The two built-in Storage types are NativeStorage
and CBlasStorage. Other storage types (e.g. CUDA or OpenCL) can
be added without requiring any rewrite of the main library, because the Storage
type determines which functions get called. If no methods have been specified
for a Storage class, the NativeStorage implementation is called by default.
The Storage protocol is defined as:
public protocol Storage {
    associatedtype ElementType:NumericType

    var size:Int { get }
    var order:DimensionOrder { get }

    init(size:Int)
    init(array:[ElementType])
    init(storage:Self)
    init(storage:Self, copy:Bool)

    subscript(index:Int) -> ElementType { get set }

    // returns the order of dimensions to traverse
    func calculateOrder(dims:Int) -> [Int]

    // re-orders a list in the order of dimensions to traverse
    func calculateOrder(values:[Int]) -> [Int]
}
An implementation of Storage determines the allocation through the init
methods, subscript determines how the storage gets indexed, and calculateOrder
allows the Storage to be iterated through in a sequential fashion.
The Tensor class frequently makes use of the generator IndexGenerator to iterate
through the Storage class. This provides a convenient way to access all the
elements without knowing the underlying memory allocation.
To do so, the Tensor class defines the method:
public func indices(order:DimensionOrder?=nil) -> GeneratorSequence<IndexGenerator> {
    if let o = order {
        return GeneratorSequence<IndexGenerator>(IndexGenerator(shape, order: o))
    } else {
        return GeneratorSequence<IndexGenerator>(IndexGenerator(shape, order: storage.order))
    }
}
which can be used like:
func fill<StorageType:Storage>(tensor:Tensor<StorageType>, value:StorageType.ElementType) {
    for i in tensor.indices() {
        tensor.storage[i] = value
    }
}
However, as mentioned previously, if an optimized version for a particular Tensor
operation exists, you can write:
// This will be used if the Tensor's storage type is CBlasStorage for Doubles;
// an alternative can be specified for Floats separately.
func fill(tensor:Tensor<CBlasStorage<Double>>, value:Double) {
    // call custom library
}
| Type | Description |
|---|---|
| NativeStorage | Unaccelerated using row-major memory storage |
| CBlasStorage | CBLAS acceleration using column-major storage |
| GPUStorage | (Not Implemented) GPU acceleration using row-major memory storage |
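For example, switching to the CBLAS-accelerated backend is just a matter of changing the Storage parameter; the rest of the code stays the same (a brief sketch):

// same Tensor API, accelerated Storage
typealias C = CBlasStorage<Double>
let accelerated = Tensor<C>(Extent(100, 100))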
Tensor Class¶
The Tensor class is parameterized by the Storage type, allowing instances
of the class to maintain a pointer to the underlying memory. The Tensor class
also has an instance of ViewType, which allows different views of the same
memory to be constructed, and the array dimIndex, which determines the order
in which the dimensions of the Tensor are traversed. These features allow
multiple Tensors to provide different views of the same memory (e.g. a slice
of a Tensor can be created by changing the ViewType instance, or a
Tensor can be transposed by shuffling dimIndex).
Note
Throughout the documentation Tensor<S> indicates the parameterization of
the Tensor class by Storage type S, and NumericType refers to
S.ElementType (see the section on Storage for details).
Tensor Construction¶
- Tensor<S>(_ shape:Extent): Constructs a tensor with the given shape.
- Tensor<S>([NumericType], axis:Int): Constructs a vector along the given axis.
- Tensor<S>(colvector:[NumericType]): Constructs a column vector (equivalent to Tensor<S>([NumericType], axis:0)).
- Tensor<S>(rowvector:[NumericType]): Constructs a row vector (equivalent to Tensor<S>([NumericType], axis:1)).
- Tensor<S>([[NumericType]]): Constructs a matrix.
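For example, using the D typealias from the synopsis:

// a tensor of a given shape
let t = Tensor<D>(Extent(4, 3))

// column and row vectors
let col = Tensor<D>(colvector: [1, 2, 3])
let row = Tensor<D>(rowvector: [1, 2, 3])

// a 2x3 matrix from nested arrays
let m = Tensor<D>([[1, 2, 3], [4, 5, 6]])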
Indexing¶
STEM supports single indexing as well as slice indexing. Given a Tensor T:
To index element (i, j, k):
let value = T[i, j, k]
T[i, j, k] = value
To index the slices (if:il, jf:jl, kf:kl):
let T2 = T[if...il, jf...jl, kf...kl]
T[if...il, jf...jl, kf...kl] = T2
Views¶
Views in STEM are instances of Tensor that point to the same Storage
as another Tensor but with different bounds and/or ordering of dimensions. Views
are most commonly created whenever slice indexing is used.
A copy of a view can be made by using the copy function.
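A small sketch of the distinction, using the fill and copy functions described in the next section (argument labels may differ slightly from the actual API):

let T = Tensor<D>(Extent(5, 5))

// T2 is a view: it shares Storage with T
let T2 = T[1...3, 1...3]
fill(T2, value: 1)      // the change is visible through T as well

// T3 is an independent copy of the view
let T3 = copy(T2)
fill(T3, value: 2)      // does not affect T or T2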
Tensor Functions¶
Basic Math¶
- add(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)
- add(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>)
- add(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>)
- +(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
- +(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
- +(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>

Adds lhs and rhs together, either placing the result in result or returning it. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- sub(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)
- sub(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>)
- sub(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>)
- -(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
- -(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
- -(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>

Subtracts rhs from lhs, either placing the result in result or returning it. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- mul(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)
- mul(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>)
- mul(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>)
- *(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
- *(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
- *(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>

Performs element-wise multiplication of lhs and rhs, either placing the result in result or returning it. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- div(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)
- div(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>)
- div(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>)
- /(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
- /(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
- /(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>

Performs element-wise division of lhs by rhs, either placing the result in result or returning it. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- iadd(lhs:Tensor<S>, rhs:Tensor<S>)
- iadd(lhs:Tensor<S>, rhs:NumericType)
- iadd(lhs:NumericType, rhs:Tensor<S>)
- +=(lhs:Tensor<S>, rhs:Tensor<S>)
- +=(lhs:Tensor<S>, rhs:NumericType)
- +=(lhs:NumericType, rhs:Tensor<S>)

Adds rhs to lhs in place, storing the result in lhs. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- isub(lhs:Tensor<S>, rhs:Tensor<S>)
- isub(lhs:Tensor<S>, rhs:NumericType)
- isub(lhs:NumericType, rhs:Tensor<S>)
- -=(lhs:Tensor<S>, rhs:Tensor<S>)
- -=(lhs:Tensor<S>, rhs:NumericType)
- -=(lhs:NumericType, rhs:Tensor<S>)

Subtracts rhs from lhs in place, storing the result in lhs. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- imul(lhs:Tensor<S>, rhs:Tensor<S>)
- imul(lhs:Tensor<S>, rhs:NumericType)
- imul(lhs:NumericType, rhs:Tensor<S>)
- *=(lhs:Tensor<S>, rhs:Tensor<S>)
- *=(lhs:Tensor<S>, rhs:NumericType)
- *=(lhs:NumericType, rhs:Tensor<S>)

Multiplies lhs by rhs in place, storing the result in lhs. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- idiv(lhs:Tensor<S>, rhs:Tensor<S>)
- idiv(lhs:Tensor<S>, rhs:NumericType)
- idiv(lhs:NumericType, rhs:Tensor<S>)
- /=(lhs:Tensor<S>, rhs:Tensor<S>)
- /=(lhs:Tensor<S>, rhs:NumericType)
- /=(lhs:NumericType, rhs:Tensor<S>)

Divides lhs by rhs in place, storing the result in lhs. If the shapes of lhs and rhs do not match, either broadcast is used to match their sizes or optimized methods accomplish the same purpose.

- pow(lhs:Tensor<S>, exp:NumericType) -> Tensor<S>
- **(lhs:Tensor<S>, exp:NumericType) -> Tensor<S>

Raises every element of lhs to the power exp.

- exp(op:Tensor<S>) -> Tensor<S>

Returns the element-wise exponentiation of op.

- abs(op:Tensor<S>) -> Tensor<S>

Performs an element-wise abs on op.

- sum(op:Tensor<S>, axis:Int) -> Tensor<S>
- sum(op:Tensor<S>) -> NumericType

When axis is specified, sums op along axis and returns the resulting Tensor. When no axis is specified, returns the sum of the entire Tensor.

- max(op:Tensor<S>, axis:Int) -> Tensor<S>
- max(op:Tensor<S>) -> NumericType

When axis is specified, finds the maximum value of op along axis and returns the resulting Tensor. When no axis is specified, returns the maximum value of the entire Tensor.

- min(op:Tensor<S>, axis:Int) -> Tensor<S>
- min(op:Tensor<S>) -> NumericType

When axis is specified, finds the minimum value of op along axis and returns the resulting Tensor. When no axis is specified, returns the minimum value of the entire Tensor.
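To make the calling conventions concrete, a brief sketch (the values are arbitrary; following the library's convention elsewhere in this documentation, the first argument label is dropped, though the exact labels may differ):

let a = Tensor<D>([[1, 2, 3], [4, 5, 6]])   // 2x3 matrix
let b = Tensor<D>([10, 20, 30])             // broadcast across the rows of a

// operator form: returns a new Tensor
let c = a + b

// function form: writes into a pre-allocated result
let result = Tensor<D>(Extent(2, 3))
add(a, rhs: b, result: result)

// reductions
let total = sum(a)              // sum over the entire Tensor
let colSums = sum(a, axis: 0)   // Tensor summed along axis 0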
Linear Algebra¶
- dot(lhs:Tensor<S>, rhs:Tensor<S>) -> NumericType
- ⊙(lhs:Tensor<S>, rhs:Tensor<S>) -> NumericType

Returns the dot product of two vectors. Both lhs and rhs must be vectors.

- outer(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
- ⊗(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>

Returns the outer product of two vectors. Both lhs and rhs must be vectors.
Tensor Manipulation¶
- fill(op:Tensor<S>, value:NumericType)

Sets all elements of op to value.

- copy(op:Tensor<S>) -> Tensor<S>
- copy(from:Tensor<S>, to:Tensor<S>)

The first form creates a new Tensor and copies op into it. The second form copies from into the already existing Tensor to.

- map(op:Tensor<S>, fn:(NumericType) -> NumericType) -> Tensor<S>

Applies fn to each element of op and returns the resulting Tensor.

- reduce(op:Tensor<S>, fn:(Tensor<S>, Tensor<S>) -> NumericType) -> NumericType

Applies fn across the elements of op and returns a scalar.

- concat(op1:Tensor<S>, op2:Tensor<S>, ..., opN:Tensor<S>, axis:Int)

Concatenates N Tensors along axis.

- vstack(op1:Tensor<S>, op2:Tensor<S>) -> Tensor<S>

Returns a new Tensor that is the composite of op1 and op2 stacked vertically.

- hstack(op1:Tensor<S>, op2:Tensor<S>) -> Tensor<S>

Returns a new Tensor that is the composite of op1 and op2 stacked horizontally.

- broadcast(op1:Tensor<S>, op2:Tensor<S>) -> (Tensor<S>, Tensor<S>)

Returns two Tensors of the same shape, broadcast from op1 and op2. (Broadcasting will be described in much more detail in a future revision.)
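A brief sketch of a few of these functions (argument labels follow the signatures above with the first label dropped, and may differ slightly in the actual API):

let a = Tensor<D>([[1, 2], [3, 4]])
let b = Tensor<D>([[5, 6], [7, 8]])

fill(a, value: 0)                      // a is now all zeros

let doubled = map(b, fn: { $0*2 })     // element-wise transform of b
let stackedV = vstack(b, op2: doubled) // 4x2 result
let stackedH = hstack(b, op2: doubled) // 2x4 result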
Random library¶
To generate random numbers, you first need to create a RandomNumberGenerator:
let rng = RandomNumberGenerator(seed)
The seed argument is optional; if left out, the seed is set from the system's clock.
The random library can be used in two ways:
- Fill an already existing Tensor with random values
- Create a new Tensor that is filled with random values
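A sketch of both patterns (uniform appears in the unit-test example later in this documentation; exactly how the generator is passed, and the name of the fill-style variant, are assumptions):

let rng = RandomNumberGenerator(1234)

// create a new Tensor filled with random values
let r1:Tensor<D> = uniform(Extent(5, 3))

// an already existing Tensor can also be filled in place with random values;
// the exact call is not shown here (see the random library source)
let r2 = Tensor<D>(Extent(5, 3))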
Neural Network library¶
Overview¶
The basic building block is an Op, which is parameterized by the underlying storage type (see the Tensor documentation for more details). Every Op has the function apply, which performs a transform on the Op's inputs, updating the output variable. Ops may have a variable number of inputs and outputs.
Figure 1: A computation graph is formed by connecting multiple Ops together. The inputs are a collection of Ops, and the outputs, the result of the operation applied to the input, are a collection of Tensors.
The design of the Op class is to keep the details of how the optimization is performed separate from the transform itself. For example, the protocol Differentiable provides a method to extend the Op class with the gradient method for stochastic gradient descent optimization. The gradient method returns a new Op whose transform is the derivative of the target Op. This allows a secondary computation graph to easily be constructed:
let linear = Linear<D>(inputSize: 5, outputSize: 3)
let linearGrad = linear.gradient()
Some Ops, like Sequence, are collections of other Ops, and will invoke (via apply) each of the contained Ops in a pre-determined order. For example, a standard neural network can be created with:
let net = Sequence<D>(
    VariableOp<D>(zeros(Extent(5))),
    Linear<D>(inputSize: 5, outputSize: 3),
    Sigmoid<D>(size: 3),
    L2Loss<D>(size: 3))

net.apply()
There are a few important items of note in this example:
- The apply method takes no parameters and has no return value
- The input to the first layer of the network is of the type VariableOp, which is also an Op
- The loss function L2Loss is also an Op
The design decision to make everything an Op allows the creation of the computational graph. In this case, all of the Ops are also Differentiable, and thus you can do:
let netGrad = net.gradient()
Optimization can be performed with the following code:
let params = net.params()
let gradParams = netGrad.params()

for i in 0..<iterations {
    netGrad.reset()

    net.apply()
    netGrad.apply()

    for (param, gradParam) in zip(params, gradParams) {
        param += 0.01*gradParam
    }
}
However, this code can be simplified using an Optimizer:
let alpha = VariableOp<D>(0.01)
let opt = GradientDescentOptimizer(net, alpha: alpha)

for i in 0..<iterations {
    opt.apply()
}
where GradientDescentOptimizer automatically constructs the gradient network and collects the parameters for both the forward and backward sequences.
One of the advantages of having everything be an operation in the computation graph is that the alpha variable can be set dynamically. For example, if a momentum-style optimization is desired, the alpha variable can be computed from the current error.
The Op class¶
The Op class has the following properties:
- id: a unique ID for the instance of the Op
- inputs: a collection of Ops
- output: a collection of Tensors that are the result of the transform

and has the following methods defined:
- apply(): performs the transform on the inputs and stores the results in output
- params(): returns all the parameters of the transform (e.g. if it is a LinearOp, the parameters are weight and bias)
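As a brief sketch of how these fit together (the "input" label for Linear and the use of shape on each parameter are assumptions based on examples elsewhere in this documentation):

let input = VariableOp<D>(zeros(Extent(5)))
let linear = Linear<D>(inputSize: 5, outputSize: 3)
linear.setInput("input", to: input)

// apply() reads from the op's inputs and stores the result in output
linear.apply()

// params() returns the parameters of the transform (weight and bias for Linear)
for param in linear.params() {
    print(param.shape)
}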
OrderedDictionary¶
One important detail about the inputs and outputs of an Op is that they
are maintained by the OrderedDictionary class. An instance of the OrderedDictionary
class maintains a dictionary of (String: [T]), but also provides a means to
traverse the items in the order that they were added. This provides a guarantee on
the order of traversal as well as a means of efficient access (e.g. if
an Op has a specific ordering of inputs, an integer index may be used instead
of a String).
Because each entry is an array of T, a single entry in the OrderedDictionary
may be a collection of items. This provides an easy way to create Ops that
have a variable number of inputs and/or outputs. For example, the AddOp
can take in N inputs and will provide a single output.
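A hypothetical sketch of the access patterns this enables (the type name is real, but the initializer and subscripts shown here are illustrative assumptions, not the documented API):

let a = VariableOp<D>(zeros(Extent(3)))
let b = VariableOp<D>(zeros(Extent(3)))
let c = VariableOp<D>(zeros(Extent(3)))

// entries keyed by label, kept in insertion order
let inputs = OrderedDictionary<Op<D>>()
inputs["input"] = [a]        // a label holding a single item
inputs["terms"] = [b, c]     // a label holding a collection of items

// items can be retrieved by label, or by insertion index for efficiency
let byLabel = inputs["terms"]
let byIndex = inputs[1]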
Op library¶
- Linear(): Performs a linear transformation on its input.
- Sigmoid(): Applies the sigmoid function to each element of the input.
- Tanh(): Applies the tanh function to each element of the input.
- AddOp(): Adds a collection of inputs together.
- MulOp(): Multiplies a collection of inputs together.
- Concat(): Concatenates a series of inputs together.
- L2Loss(): Takes two inputs, value and target, and calculates the squared distance between the two.
Creating a new Op¶
Suppose you wanted to create an Op that takes the log of the input. The Log op can be defined as:
public class Log<S:Storage where S.ElementType:FloatNumericType>: Op<S> {
    public init(size:Int) {
        super.init(inputs: [NoOp<S>()],
                   output: Tensor<S>(Extent(size)),
                   labels: ["input"])
    }

    public override func apply() {
        if output == nil || output!.shape != inputs[0].output!.shape {
            output = Tensor<S>(Extent(inputs[0].output!.shape))
        }

        log(inputs[0].output!, result: output!)
    }
}
where the initialization defines a single input (input) that is not yet connected (the NoOp placeholder), and the output is allocated with the size specified by the parameter. The apply function takes the element-wise log of the input and stores the result in output, reallocating output first if its shape does not match the input.
The gradient of Log is defined as a companion op, LogGrad.
The Log gradient takes two additional inputs: the instance of the Log op it is the gradient of, and gradOutput, which is the gradient of the op's output.
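A rough sketch of what LogGrad might look like, mirroring the structure of Log above (the label ordering and the exact div call are assumptions; since d/dx log(x) = 1/x, the input gradient is gradOutput divided element-wise by the input):

public class LogGrad<S:Storage where S.ElementType:FloatNumericType>: Op<S> {
    // (the real LogGrad also conforms to GradientType; those requirements are omitted here)
    public init(op:Log<S>) {
        super.init(inputs: [op, NoOp<S>(), NoOp<S>()],
                   output: Tensor<S>(Extent(op.output!.shape)),
                   labels: ["op", "input", "gradOutput"])
    }

    public override func apply() {
        // d/dx log(x) = 1/x, so the input gradient is gradOutput / input (element-wise)
        div(inputs[2].output!, rhs: inputs[1].output!, result: output!)
    }
}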
Finally, to allow the gradient of Log to be taken, the class must be extended to conform to Differentiable:
extension Log:Differentiable {
    public func gradient() -> GradientType {
        return LogGrad<S>(op: self)
    }
}
We can change the construction of our network by adding Log into the sequence:
let net = Sequence<D>(
    VariableOp<D>(zeros(Extent(5))),
    Log<D>(size: 5),
    Linear<D>(inputSize: 5, outputSize: 3),
    Sigmoid<D>(size: 3),
    L2Loss<D>(size: 3))
and have the optimization correctly calculate the derivative as before:
let opt = GradientDescentOptimizer(net, alpha: alpha)
Because GradientDescentOptimizer automatically calls gradient on each Op, an instance of LogGrad will be created for each instance of Log.
Testing your Op¶
It is always a good idea to do a gradient check on a newly created Op. You can create a new unit test to do so:
func testLogOpGradient() {
    let eps = 10e-6

    let input = VariableOp<S>(uniform(Extent(10)))
    let gradOutput = VariableOp<S>(zeros(Extent(10)))

    let log = Log<S>(size: 10)
    log.setInput("input", to: input)

    let logGrad = log.gradient() as! LogGrad<S>
    logGrad.setInput("gradOutput", to: gradOutput)

    // test the gradient with respect to the input
    let inputError = checkGradient(log, grad: logGrad, params: input.output, gradParams: logGrad.output, eps: eps)
    XCTAssertLessThan(inputError, eps)
}
Status¶
STEM is very much a work in progress and should not yet be considered stable. This status will be updated as the design of STEM settles down.
If a Tensor library for Swift interests you, please help by contributing code.