Welcome to STEM’s documentation!¶
Contents:
Tensor Overview¶
Synopsis¶
First, we will define a convenience alias:
typealias NativeStorage<Double> D
Next, we can create two vectors:
// construct vector with values
let v1 = Tensor<D>([1, 2, 3])
// construct Tensor with a specific size
let v2 = Tensor<D>(Extent(5, 3))
// construct a matrix with values
let v3 = Tensor<D>([[1, 2, 3], [4, 5, 6]])
STEM supports standard linear algebra operators:
// take the dot product (result is a scalar)
let s1 = v1⊙v2
// take the outer product (result is a matrix)
let m1 = v1⊗v2
// add two vectors together
let v4 = v1+v3
// multiply by a scalar
let v5 = 0.5*v1
STEM also supports advanced indexing (similar to Numpy and Matlab):
let v6 = v2[1..<4]
let m2 = m1[1..<4, 0..<2]
As STEM‘s name implies N-dimensional Tensors are supported. Both the Vector
and Matrix
classes are specializations of the Tensor
class. These
specializations allow for simpler construction methods as well as the
use of accelerated libraries such as CBLAS and CUDA or OpenCL
through function overloading.
Storage¶
All Tensor
s have an associated Storage
class that is responsible for
the allocated memory. The two built-in Storage
types are: NativeStorage
and CBlasStorage
. Other storage types (e.g. CUDA or OpenCL) can
be added without requiring any rewrite of the main library. Because the Storage
type determines which functions get called. If no methods have been specified
for the Storage
class, NativeStorage
will be called by default.
The Storage
protocol is defined as:
public protocol Storage {
associatedtype ElementType:NumericType
var size:Int { get }
var order:DimensionOrder { get }
init(size:Int)
init(array:[ElementType])
init(storage:Self)
init(storage:Self, copy:Bool)
subscript(index:Int) -> ElementType {get set}
// returns the order of dimensions to traverse
func calculateOrder(dims:Int) -> [Int]
// re-order list in order of dimensions to traverse
func calculateOrder(values:[Int]) -> [Int]
}
An implementation of Storage
determines the allocation through the init
methods, subscript
determines how the storage gets indexed, and calculateStride
allows the Storage
to be iterated through in a sequential fashion.
The Tensor
class frequently makes use of the generator IndexGenerator
to iterate
through the Storage
class. This provides a convenient way to access all the
elements without knowing the underyling memory allocation.
To do so, the Tensor
class defined the method:
public func indices(order:DimensionOrder?=nil) -> GeneratorSequence<IndexGenerator> {
if let o = order {
return GeneratorSequence<IndexGenerator>(IndexGenerator(shape, order: o))
} else {
return GeneratorSequence<IndexGenerator>(IndexGenerator(shape, order: storage.order))
}
}
which can be used like:
func fill<StorageType:Storage>(tensor:Tensor<StorageType>, value:StorageType.ElementType) {
for i in tensor.indices() {
tensor.storage[i] = value
}
}
However, as mentioned previously, if an optimized version for a particular Tensor
operation exists, you can write:
// This will be used if the Tensor's storage type is CBlasStorage for doubles,
// an alternative can be specified for Floats separately.
func fill(tensor:Tensor<CBlasStorage<Double>>, value:StorageType.ElementType) {
// call custom library
}
Type | Description |
---|---|
NativeStorage | Unaccelerated using row-major memory storage |
CBlasStorage | CBLAS acceleration using column-major storage |
GPUStorage | (Not Implemented) GPU acceleration using row-memory storage |
Tensor Class¶
The Tensor
class is parameterized by the Storage
type, allowing instances
of the class to maintain a pointer to the underlying memory. The Tensor
class
also has an instance of ViewType
, which allows different views of the same
memory to be constructed, and the array dimIndex
, which determines the order
that the dimensions in the Tensor
are traversed. These features allow for
multiple Tensor
s to provide a different view to the same memory (e.g. a slice
of a Tensor
can be created by changing the ViewType
instance, or a
Tensor
can be transposed by shuffling dimIndex
).
Note
Throughout the documentation Tensor<S>
indicates the parameterization of
the Tensor
class by Storage
type S
, and NumericType
refers to
S.NumericType
(see section on Storage
for details).
Tensor Construction¶
-
Tensor<S>(_ shape:Extent)
Constructs a tensor with the given shape.
-
Tensor<S>([NumericType], axis:Int)
Constructs a vector along the given axis
-
Tensor<S>(colvector:[NumericType])
Constructs a column vector (equivalent to
Tensor<S>([NumericType], axis:0)
)
-
Tensor<S>(rowvector:[NumericType])
Constructs a row vector (equivalent to
Tensor<S>([NumericType], axis:1)
)
-
Tensor<S>([[NumericType]])
Constructs a matrix
Indexing¶
STEM supports single indexing as well as slice indexing. Given a Tensor
T:
To index element (i, j, k):
let value = T[i, j, k]
T[i, j, k] = value
To index the slices (if:il, jf:jl, kf:kl):
let T2 = T[if...il, jf...jl, kf...kl]
T[if...il, jf...jl, kf...kl] = T2
Views¶
Views in STEM are instances of Tensor
that point to the same Storage
as another Tensor
but with different bounds and/or ordering of dimensions. Views
are most commonly created whenever a slice indexing is used.
A copy of a view can be made by using the copy
function.
Tensor Functions¶
Basic Math¶
-
add
(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)¶ -
add
(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>) -
add
(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>) -
+(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
-
+(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
-
+(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>
Adds
lhs
andrhs
together, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
sub
(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)¶ -
sub
(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>) -
sub
(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>) -
-(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
-
-(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
-
-(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>
Subtracts
lhs
fromrhs
, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
mul
(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)¶ -
mul
(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>) -
mul
(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>) -
*(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
-
*(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
-
*(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>
Performs element-wise multiplication between
lhs
andrhs
, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
div
(lhs:Tensor<S>, rhs:Tensor<S>, result:Tensor<S>)¶ -
div
(lhs:Tensor<S>, rhs:NumericType, result:Tensor<S>) -
div
(lhs:NumericType, rhs:Tensor<S>, result:Tensor<S>) -
/(lhs:Tensor<S>, rhs:Tensor<S>) -> Tensor<S>
-
/(lhs:Tensor<S>, rhs:NumericType) -> Tensor<S>
-
/(lhs:NumericType, rhs:Tensor<S>) -> Tensor<S>
Performs element-wise division between
lhs
andrhs
, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
iadd
(lhs:Tensor<S>, rhs:Tensor<S>)¶ -
iadd
(lhs:Tensor<S>, rhs:NumericType) -
iadd
(lhs:NumericType, rhs:Tensor<S>) -
+=(lhs:Tensor<S>, rhs:Tensor<S>)
-
+=(lhs:Tensor<S>, rhs:NumericType)
-
+=(lhs:NumericType, rhs:Tensor<S>)
Adds
lhs
torhs
and stores result inlhs
, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
isub
(lhs:Tensor<S>, rhs:Tensor<S>)¶ -
isub
(lhs:Tensor<S>, rhs:NumericType) -
isub
(lhs:NumericType, rhs:Tensor<S>) -
-=(lhs:Tensor<S>, rhs:Tensor<S>)
-
-=(lhs:Tensor<S>, rhs:NumericType)
-
-=(lhs:NumericType, rhs:Tensor<S>)
Subtracts
rhs
fromlhs
and stores result inlhs
, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
imul
(lhs:Tensor<S>, rhs:Tensor<S>)¶ -
imul
(lhs:Tensor<S>, rhs:NumericType) -
imul
(lhs:NumericType, rhs:Tensor<S>) -
*=(lhs:Tensor<S>, rhs:Tensor<S>)
-
*=(lhs:Tensor<S>, rhs:NumericType)
-
*=(lhs:NumericType, rhs:Tensor<S>)
Multiplies
lhs
byrhs
and stores result inlhs
, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
idiv
(lhs:Tensor<S>, rhs:Tensor<S>)¶ -
idiv
(lhs:Tensor<S>, rhs:NumericType) -
idiv
(lhs:NumericType, rhs:Tensor<S>) -
/=(lhs:Tensor<S>, rhs:Tensor<S>)
-
/=(lhs:Tensor<S>, rhs:NumericType)
-
/=(lhs:NumericType, rhs:Tensor<S>)
Divides
lhs
byrhs
and stores results inlhs
, either place results inresult
or returning its value. If the shape oflhs
andrhs
do not match, will either usebroadcast
to match their size, or use optimized methods to accomplish the same purpose.
-
pow
(lhs:Tensor<S>, exp:NumericType) → Tensor<S>¶ -
**(lhs:Tensor<S>, exp:NumericType) -> Tensor<S>
Raises every element of
lhs
byexp
.
-
exp
(op::Tensor<S>) → Tensor<S>¶ Returns an element-wise exponentiation of
op
.
-
abs
(op:Tensor<S>) → Tensor<S>¶ Performs an elementwise
abs
on Tensor.
-
sum
(op:Tensor<S>, axis:Int) → Tensor<S>¶ -
sum
(op:Tensor<S>) → NumericType When
axis
is specified, sumsop
alongaxis
and returns resulting Tensor. When noaxis
is specified, returns entire Tensor summed.
-
max
(op:Tensor<S>, axis:Int) → Tensor<S>¶ -
max
(op:Tensor<S>) → NumericType When
axis
is specified, find the maximum valueop
acrossaxis
and returns resulting Tensor. When noaxis
is specified, returns maximum value of entire Tensor.
-
min
(op:Tensor<S>, axis:Int) → Tensor<S>¶ -
min
(op:Tensor<S>) → NumericType When
axis
is specified, find the minimum valueop
acrossaxis
and returns resulting Tensor. When noaxis
is specified, returns minimum value of entire Tensor.
Linear Algerbra¶
-
dot
(lhs:Tensor<S>, rhs:Tensor<S>) → NumericType¶ -
⊙(lhs:Tensor<S>, rhs:Tensor<S>) -> NumericType
Returns the dot product between two vectors. Both
lhs
andrhs
must be vectors.
-
outer
(lhs:Tensor<S>, rhs:Tensor<S>) → NumericType¶ -
⊗(lhs:Tensor<S>, rhs:Tensor<S>) -> NumericType
Returns the outer product between two vectors. Both
lhs
andrhs
must be vectors.
Tensor Manipulation¶
-
fill
(op:Tensor<S>, value:NumericType)¶ Sets all elements of
op
tovalue
.
-
copy
(op:Tensor<S>) → Tensor<S>¶ -
copy
(from:Tensor<S>, to:Tensor<S>) First form creates a new Tensor and copies
op
into it. The second form copiesop
into an already existing Tensor.
-
map
(op:Tensor<S>, fn:(NumericType) -> NumericType) → Tensor<S>¶ Applies
fn
to each element of Tensor and returns resulting Tensor.
-
reduce
(op:Tensor<S>, fn:(Tensor<S>, Tensor<S>) -> NumericType) → NumericType¶ Applies
fn
to elements of Tensor, returns a scalar.
-
concat
(op1:Tensor<S>, op2:Tensor<S>, ..., opN:Tensor<S>, axis:Int)¶ Concats N Tensors along
axis
.
-
vstack
(op1:Tensor<S>, op2:Tensor<S>) → Tensor<S>¶ Returns a new Tensor that is a composite of
op1
andop2
vertically stacked.
-
hstack
(op1:Tensor<S>, op2:Tensor<S>) → Tensor<S>¶ Returns a new Tensor that is a composite of
op1
andop2
horizontally stacked.
-
broadcast
(op1:Tensor<S>, op2:Tensor<S>) -> (Tensor<S>, Tensor<S>)¶ Returns two Tensors of the same shape broadcasted from
op1
andop2
. Need to go into much more detail about broadcasting.
Random library¶
To generate a random number you first need to create a RandomNumberGenerator:
let rng = RandomNumberGenerator(seed)
The seed variable is optional, and if left out will be set the system’s clock.
The random library can be used in two ways:
- Fill an already existing Tensor with random values
- Create a new Tensor that is filled with random values
Neural Network library¶
Overview¶
The basic building block is an Op
, which is parameterized by the underyling storage type (see the Tensor
documentation for more details). Every Op
as the function apply
, which performs a transform on the Op
s input updating the output variable. Op
s may have a variable number of inputs and outputs.
The design of the Op
class is keep the details of how the optimization is performed separate from the transform itself. For example, the protocol Differentiable
provides a method to extend the Op
class with the gradient
method for stochastic gradient descent optimization. The gradient
methods returns a new Op
whose transform is the derivative of the target Op
. This allows a secondary computation graph to easily be constructed:
let linear = Linear<D>(inputSize: 5, outputSize: 3)
let linearGrad = linear.gradient()
Some Op
s, like Sequence
are collections of other Op
s, and will invoke (via apply
) each of the contained Op
s in a pre-determined order. For example, a standard neural network can be created with:
let net = Sequence<D>(
VariableOp<D>(zeros(Extent(5))),
Linear<D>(inputSize: 5, outputSize: 3),
Sigmoid<D>(size: 3),
L2Loss<D>(size: 3))
net.apply()
There are a few important items of note in this example:
- The
apply
method takes no parameters and has no return value - The input to the first layer of the network is of the type
Variable
, which is also anOp
- The loss function
L2Loss
is also anOp
The design decision to make everything an Op
allows the creation of the computational graph. In this case, all of the Op
s are also Differentiable
, and thus you can do:
let netGrad = net.gradient()
Optimization can be performed with the following code:
params = net.params()
gradParams = netGrad.params()
for i in 0..<iterations {
netGrad.reset()
net.apply()
netGrad.apply()
for (param, gradParam) in Zip2Sequences(params, gradParams) {
param += 0.01*gradParam
}
}
However, this code can be simplified using an Optimizer
:
let alpha = VariableOp<D>(0.01)
let opt = GradientDescentOptimizer(net, alpha: alpha)
for i in 0..<iterations {
opt.apply()
}
where GradientDescentOptimizer
automatically constructs the gradient network and collects the parmaeters for both the forward and backward sequences.
One of the advantages to having everything be an operation in the computation graph is that the alpha
variable can be set dynamically. For example, if a momentum optimization is desired, the alpha
variable can be computed from the current error.
The Op class¶
The Op
class has the following properties:
- id: unique ID for instance of
Op
- inputs: collection of
Op
s - output: collection
Tensor
s that are a result of the transform
and has the following methods defined:
- apply(): performs transform on inputs and stores results in
output
- params(): returns all the parameters of the transform (e.g. if its a
Linear
Op, then the parameters areweight
andbias
).
OrderedDictionary¶
One important detail about the inputs
and outputs
of an Op
is that it
is maintained by the OrderedDictionary
class. An instance of the OrderedDictionary
class maintains a dictionary of (String
: [T
]), but also provides a method to
traverse the items in the order that they were added. This provides a guarantee in
the order of traversal as well as provide a method for efficient access (e.g. if
an Op
has a specific ordering of inputs
, an integer index may be used instead
of a String
).
By maintaining an array of T
means that a single entry in the OrderedDictionary
may be a collection of items. This provides an easy way to create Op
s that
have a variable number of inputs and/or outputs. For example, the AddOp
can take in N
inputs and will provide a single output.
Op library¶
-
Linear
()¶ Performs a linear transformation on input.
-
Sigmoid
()¶ Applies the sigmoid function to each element of the input.
-
Tanh
()¶ Applies the Tanh function to each element of the input
-
AddOp
()¶ Adds a collection of inputs together
-
MulOp
()¶ Multiplies a collection of inputs together
-
Concat
()¶ Concatenates a series of inputs together
-
L2Loss
()¶ Takes two inputs:
value
andtarget
. Calculates the square distance between the two.
Creating a new Op¶
Suppose you wanted to create an Op
that takes the log of the input. The Log
op can be defined as:
public class Log<S:Storage where S.ElementType:FloatNumericType>: Op<S> {
public init(size:Int) {
super.init( inputs: [NoOp<S>()],
output: Tensor<S>(Extent(size)),
labels: ["input"])
}
public override func apply() {
if output == nil || output!.shape != inputs[0].output!.shape {
output = Tensor<S>(Extent(inputs[0].output!.shape))
}
log(inputs[0].output!, result: output!)
}
}
where the initialization defines a single input (input
) that is currently not defined (the NoOp
) and the output is allocated as the size specified by the parameter. The apply
function finds the maximum value in the input, divides each element of the input by that value, and stores in the result in output
.
The gradient of Log
can be defined as:
The Log
gradient takes two additional inputs: the instance of the Log
op its the gradient of, and gradOutput
, which is the gradient of the op’s output.
Finally, to allow the gradient to be taken of Log
, the class must be extended to Differentiable
:
extension Log:Differentiable {
public func gradient() -> GradientType {
return LogGrad<S>(op: self)
}
}
We can change the construction of our network by adding Log
into the sequence:
let net = Sequence<D>(
VariableOp<D>(zeros(Extent(5))),
Log<D>(size: 5)
Linear<D>(inputSize: 5, outputSize: 3),
Sigmoid<D>(size: 3),
L2Loss<D>(size: 3))
and have the optimization correctly calculate the derivative as before:
let opt = GradientDescentOptimizer(net, alpha: alpha)
because GradientDescentOptimizer
will automatically call gradient
on each Op
, an instance of LogGradient
will be created for each instance of Log
.
Testing your Op¶
It is always a good idea to do a gradient check on a newly created Op
. You can create a new unit test to do so:
func testLogOpGradient() {
let eps = 10e-6
let input = VariableOp<S>(uniform(Extent(10)))
let gradOutput = VariableOp<S>(zeros(Extent(10)))
let log = Log<S>(size: 10)
log.setInput("input", to: input)
let logGrad = log.gradient() as! LogGrad<S>
logGrad.setInput("gradOutput", to: gradOutput)
// test gradient wrt to the input
let inputError = checkGradient(log, grad: logGrad, params: input.output, gradParams: logGrad.output, eps: eps)
XCTAssertLessThan(inputError, eps)
}
Indices and tables¶
Status¶
STEM is very much a work in progress. It should not be considered stable yet. The status will be kept updated for when the design of STEM has settled down.
If a Tensor library for Swift interests you, please help by contributing code.