Documentation of NumericExtensions.jl¶
NumericExtensions.jl is a Julia package that provides high-performance support for numerical computation. This package is an extension of Julia Base; part of the material may be migrated into Base in the future.
NumericExtensions.jl provides a wide range of tools, which include:
- Higher-order functions for mapping, reduction, and map-reduce operations that take typed functors to achieve performance comparable to hand-crafted loops.
- Functions that allow inplace updating and writing results to pre-allocated arrays for mapping, reduction, and map-reduce operations.
- Convenient functions for inplace vectorized computation.
- Vector broadcasting.
- Fast views for operating on contiguous blocks/slices.
Contents:
Package Overview¶
Julia provides a fantastic technical computing environment that allows you to write code that is both performant and generic. However, as Julia is still at an early stage, some functions are not as performant as they could be, and writing computational algorithms directly on top of builtin functions may not give you the best performance. This package provides a variety of tools to address such issues.
Motivating example¶
To see how this package may help you, let's first consider a simple example: computing the sum of squared differences between two vectors. This can be done in Julia in one line as follows
r = sum(abs2(x - y))
While this is simple, the expression involves some unnecessary operations that lead to suboptimal performance: (1) it creates two temporary arrays, to store x - y and abs2(x - y) respectively; (2) it completes the computation in three passes over the data: computing x - y, computing abs2(x - y), and finally computing the sum. Julia provides a mapreduce function which allows you to complete the operation in a single pass without creating any temporaries:
r = mapreduce((x, y) -> abs2(x - y), +, x, y)
However, if you actually run this you will probably find that it is even slower. The culprit here is that the anonymous function (x, y) -> abs2(x - y) is not inlined; it is resolved and called at each iteration. Therefore, to compute this efficiently, one has to write a loop, as below
s = 0.
for i = 1 : length(x)
s += abs2(x[i] - y[i])
end
This is not too bad, until you have more complex needs, e.g. computing this along each row/column of a matrix. Then writing the loops becomes more involved, and it is trickier to implement them in a cache-friendly way.
With this package, we can compute this efficiently without writing loops, as
r = sumfdiff(Abs2Fun(), x, y)
# to compute this along a specific dimension
r = sumfdiff(Abs2Fun(), x, y, dim)
# this package also provides sumsqdiff for this
r = sumsqdiff(x, y)
r = sumsqdiff(x, y, dim)
Here, Abs2Fun is a typed functor provided by this package, which, unlike a normal function, can be properly inlined when passed into a higher-order function (thus incurring zero overhead). This package extends map, foldl, foldr, sum, and so on to accept typed functors, and also introduces additional higher-order functions like sumfdiff and scan to simplify usage in common cases. Benchmarks show that writing in this way is over 9x faster than sum(abs2(x - y)).
Main features¶
Main features of this package are highlighted below:
- Pre-defined functors that cover most typical mathematical computations;
- An easy way for users to define customized functors;
- Extended/specialized methods for map, map!, foldl, foldr, and scan. These methods are carefully optimized, and often result in a 2x - 10x speed-up;
- Additional functions such as map1!, sum!, and scan! that allow inplace updating or writing results to pre-allocated arrays;
- Vector broadcasting computation (supporting both inplace updating and writing results to new arrays);
- Fast shared-memory views of arrays.
Many of the methods are extensions of base functions. Simply adding the statement using NumericExtensions is often enough for a substantial performance improvement. Consider the following code snippet:
using NumericExtensions
x = rand(1000, 1000)
r = sum(x, 2)
Here, the statement using NumericExtensions transparently replaces the method provided in the Base module with the specialized method in NumericExtensions. As a consequence, the statement r = sum(x, 2) becomes 6x faster. Using the additional functions provided by this package can further improve performance.
Functors¶
Motivating example¶
Passing functions as arguments is essential for writing generic algorithms. However, function arguments do not get inlined in Julia (as of the current version), usually resulting in suboptimal performance. Consider the following example:
plus(x, y) = x + y
map_plus(x, y) = map(plus, x, y)
a = rand(1000, 1000)
b = rand(1000, 1000)
# warm up to get the functions compiled
a + b
map_plus(a, b)
# benchmark
@time for i in 1 : 10 map_plus(a, b) end # -- statement (1)
@time for i in 1 : 10 a + b end # -- statement (2)
Run this script on your computer, and you will find that statement (1) is over 20 times slower than statement (2). The reason is that the function argument plus is resolved and called at each iteration of the inner loop within the map function.
This package addresses this issue through typed functors (i.e. function-like objects of specific types) and a set of highly optimized higher-order functions for mapping and reduction. The code above can be rewritten as
using NumericExtensions
# benchmark
@time for i in 1 : 10 map(Add(), a, b) end # -- statement(1)
@time for i in 1 : 10 a + b end # -- statement(2)
Here, using the typed functor Add, statement (1) is even 10% faster than statement (2) in my benchmark.
Functor types¶
Functor is the abstract base type for all functors, which are formally defined as below
abstract Functor{N} # N: the number of arguments
typealias UnaryFunctor Functor{1}
typealias BinaryFunctor Functor{2}
typealias TernaryFunctor Functor{3}
Below is an example that shows how to define a functor that computes the squared difference:
type SqrDiff <: Functor{2} end
NumericExtensions.evaluate(::SqrDiff, x, y) = abs2(x - y)
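Once defined, such a functor can be passed to the package's higher-order functions just like the pre-defined ones. A minimal self-contained sketch (the definition is repeated so the block stands on its own; the input values are for illustration only):

```julia
using NumericExtensions

type SqrDiff <: Functor{2} end
NumericExtensions.evaluate(::SqrDiff, x, y) = abs2(x - y)

x = [1.0, 2.0, 3.0]
y = [0.5, 1.5, 3.5]
map(SqrDiff(), x, y)   # element-wise abs2(x - y): [0.25, 0.25, 0.25]
sum(SqrDiff(), x, y)   # reduces the same terms in one pass, no temporaries
```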
The package also provides the macros @functor1 and @functor2 for defining unary and binary functors, respectively. For example,
# this defines a functor type MyAbsFun, which,
# when evaluated, invokes abs
@functor1 MyAbsFun abs
# this defines a functor type SqrDiff
sqrdiff(x, y) = abs2(x - y)
@functor2 SqrDiff sqrdiff
Pre-defined functors¶
NumericExtensions.jl has defined a series of functors as listed below:
- Arithmetic operators: Add, Subtract, Multiply, Divide, Negate, Pow, Modulo
- Comparison operators: Greater, GreaterEqual, Less, LessEqual, Equal, NotEqual
- Floating-point predicates: IsfiniteFun, IsinfFun, IsnanFun, IsequalFun
- Logical operators: Not, And, Or
- Bitwise operators: BitwiseNot, BitwiseAnd, BitwiseOr, BitwiseXor
- Max and min: MaxFun, MinFun
- Rounding functors: FloorFun, CeilFun, RoundFun, TruncFun, IfloorFun, IceilFun, IroundFun, ItruncFun
- Algebraic functors: AbsFun, Abs2Fun, SqrFun, SqrtFun, CbrtFun, RcpFun, RsqrtFun, RcbrtFun, HypotFun
- Exp and log functors: ExpFun, Exp2Fun, Exp10Fun, LogFun, Log2Fun, Log10Fun, Expm1Fun, Log1pFun
- Trigonometric functors: SinFun, CosFun, TanFun, CotFun, CscFun, SecFun
- Inverse trigonometric functors: AsinFun, AcosFun, AtanFun, Atan2Fun, AcotFun, AcscFun, AsecFun
- Hyperbolic functors: SinhFun, CoshFun, TanhFun, CothFun, CschFun, SechFun
- Inverse hyperbolic functors: AsinhFun, AcoshFun, AtanhFun, AcothFun, AcschFun, AsechFun
- Error functors: ErfFun, ErfcFun, ErfInvFun, ErfcInvFun
- Gamma functors: GammaFun, LgammaFun, LfactFun, DigammaFun
- Beta functors: BetaFun, LbetaFun, EtaFun, ZetaFun
- Airy functors: AiryFun, AiryprimeFun, AiryaiFun, AiryaiprimeFun, AirybiFun, AirybiprimeFun
- Bessel functors: BesseljFun, Besselj0Fun, Besselj1Fun, BesseliFun, BesselkFun
- Fused multiply and add: FMA (i.e. (a, b, c) -> a + b * c)
- Others: LogitFun, LogisticFun, InvLogisticFun, XlogxFun, XlogyFun

Except for the several functors that correspond to operators, most functors are named using the capitalized version of the corresponding math function, so you don't have to look up this list for the names. The collection of pre-defined functors will be extended in the future. Please refer to src/functors.jl for the most up-to-date list.
Mapping¶
NumericExtensions.jl extends map and map! to accept functors for efficient element-wise mapping:
General usage¶
Synopsis:
Let f1, f2, and f3 be unary, binary, and ternary functors, respectively. Generic usage of map and map! is summarized as follows:
map(f1, x)
map(f2, x1, x2)
map(f3, x1, x2, x3)
map!(f1, dst, x)
map!(f2, dst, x1, x2)
map!(f3, dst, x1, x2, x3)
Here, map creates and returns the resultant array, while map! writes results to a pre-allocated array dst and returns it. Each argument can be either an array or a scalar number. At least one argument should be an array, and all array arguments should have compatible sizes.
Examples:
map(AbsFun(), x) # returns abs(x)
map(FMA(), x, y, z) # returns x + y .* z
map!(Add(), dst, x, 2) # writes x + 2 to dst
Additional functions¶
NumericExtensions.jl provides additional functions (map1!, mapdiff, and mapdiff!) to simplify common use:
Synopsis
map1! updates the first argument inplace with the results, mapdiff maps a functor to the difference between two arguments, and mapdiff! writes the results of mapdiff to a pre-allocated array.
map1!(f1, x1) # x1 <-- f1(x1)
map1!(f2, x1, x2) # x1 <-- f2(x1, x2)
map1!(f3, x1, x2, x3) # x1 <-- f3(x1, x2, x3)
mapdiff(f1, x, y) # returns f1(x - y)
mapdiff!(f1, dst, x, y) # dst <-- f1(x - y)
Here, x1 (i.e. the first argument to map1!) must be an array, while x2 and x3 can be either arrays or numbers. Note that mapdiff and mapdiff! use an efficient implementation, which completes the computation in one pass and never creates the intermediate array x - y.
Examples
map1!(Multiply(), x, 2) # multiply x by 2 (inplace)
mapdiff(Abs2Fun(), x, y) # compute squared differences between x and y
mapdiff(AbsFun(), x, 1) # compute |x - 1|
Pre-defined mapping functions¶
Julia already provides vectorized functions for most math computations. In this package, we additionally define several functions for vectorized inplace computation (based on map!), as follows
add!(x, y) # x <- x + y
subtract!(x, y) # x <- x - y
multiply!(x, y) # x <- x .* y
divide!(x, y) # x <- x ./ y
negate!(x) # x <- -x
pow!(x, y) # x <- x .^ y
abs!(x) # x <- abs(x)
abs2!(x) # x <- abs2(x)
rcp!(x) # x <- 1 ./ x
sqrt!(x) # x <- sqrt(x)
exp!(x) # x <- exp(x)
log!(x) # x <- log(x)
floor!(x) # x <- floor(x)
ceil!(x) # x <- ceil(x)
round!(x) # x <- round(x)
trunc!(x) # x <- trunc(x)
In the code above, x must be an array (i.e. an instance of AbstractArray), while y can be either an array or a scalar.
In addition, this package also defines some useful functions using compound functors:
absdiff(x, y) # abs(x - y)
sqrdiff(x, y) # abs2(x - y)
fma(x, y, c) # x + y .* c, where c can be array or scalar
fma!(x, y, c) # x <- x + y .* c
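As a quick illustration, here is a sketch of an iterative-update step written with these inplace functions, so no temporary arrays are allocated (the step size and data below are hypothetical):

```julia
using NumericExtensions

x = rand(1000)   # current estimate
g = rand(1000)   # a gradient-like direction

# x <- x - 0.1 * g, computed in one pass:
fma!(x, g, -0.1)   # i.e. x <- x + g .* (-0.1)

# chained inplace updates: x <- exp(abs(x))
abs!(x)
exp!(x)
```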
Performance¶
For simple functions such as x + y or exp(x), the performance of the map version, such as map(Add(), x, y) or map(ExpFun(), x), is comparable to its Julia counterpart. However, map can accelerate computation considerably in a variety of cases:
- When the result storage has already been allocated (e.g. in iterative updating algorithms), or when you want inplace updating, map! or the pre-defined inplace computation functions can be used to avoid unnecessary memory allocation and garbage collection, which can sometimes be the performance killer.
- When the inner computation consists of two or more steps, map and map! can complete the computation in one pass without creating intermediate arrays, usually resulting in about 2x or even greater speed-up. Benchmarks show that absdiff(x, y) and sqrdiff(x, y) are about 2.2x faster than abs(x - y) and abs2(x - y).
- The script test/benchmark_map.jl runs a series of benchmarks comparing the performance of map and the corresponding Julia vectorized expressions for a variety of computations.
Vector Broadcasting¶
Julia has very nice and performant functions for broadcasting: broadcast and broadcast!, which, however, do not work with functors. For customized computation, you would have to pass in a function argument, which leads to severe performance degradation. NumericExtensions.jl provides vbroadcast and vbroadcast! to address this issue.
Synopsis
vbroadcast(f, x, y, dim) # apply a vector y to vectors of x along a specific dimension
vbroadcast!(f, dst, x, y, dim) # write results to dst
vbroadcast1!(f, x, y, dim) # update x with the results
Here, f is a binary functor, and x and y are arrays such that length(y) == size(x, dim).
Examples
vbroadcast(f, x, y, 1) # r[:,i] = f(x[:,i], y) for each i
vbroadcast(f, x, y, 2) # r[i,:] = f(x[i,:], y) for each i
vbroadcast(f, x, y, 3) # r[i,j,:] = f(x[i,j,:], y) for each i, j
vbroadcast1!(Add(), x, y, 1) # x[:,i] += y for each i
vbroadcast1!(Multiply(), x, y, 2) # x[i,:] .*= y for each i
Difference from broadcast
Unlike broadcast, you have to specify the vector dimension for vbroadcast. The benefit is two-fold: (1) the overhead of figuring out the broadcasting shape information is eliminated; (2) the shape of y can be more flexible.
x = rand(5, 6)
y = rand(6)
vbroadcast(Add(), x, y, 2) # this adds y to each row of x
broadcast(+, x, reshape(y, 1, 6)) # with broadcast, you have to first reshape y into a row
Reduction¶
A key advantage of this package are highly optimized reduction and map-reduction functions, which sometimes lead to over 10x
speed up.
Basic reduction functions¶
The package extends/specializes sum, mean, max, and min, not only providing substantially better performance, but also allowing reduction over function results. It also provides sum!, mean!, max!, and min!, which allow writing results to pre-allocated storage when performing reduction along specific dimensions.
The function sum and its variant forms:
sum(x)
sum(f1, x) # compute sum of f1(x)
sum(f2, x1, x2) # compute sum of f2(x1, x2)
sum(f3, x1, x2, x3) # compute sum of f3(x1, x2, x3)
sum(x, dim)
sum(f1, x, dim)
sum(f2, x1, x2, dim)
sum(f3, x1, x2, x3, dim)
sum!(dst, x, dim) # write results to dst
sum!(dst, f1, x1, dim)
sum!(dst, f2, x1, x2, dim)
sum!(dst, f3, x1, x2, x3, dim)
sumfdiff(f2, x, y) # compute sum of f2(x - y)
sumfdiff(f2, x, y, dim)
sumfdiff!(dst, f2, x, y, dim)
The function mean and its variant forms:
mean(x)
mean(f1, x) # compute mean of f1(x)
mean(f2, x1, x2) # compute mean of f2(x1, x2)
mean(f3, x1, x2, x3) # compute mean of f3(x1, x2, x3)
mean(x, dim)
mean(f1, x, dim)
mean(f2, x1, x2, dim)
mean(f3, x1, x2, x3, dim)
mean!(dst, x, dim) # write results to dst
mean!(dst, f1, x1, dim)
mean!(dst, f2, x1, x2, dim)
mean!(dst, f3, x1, x2, x3, dim)
meanfdiff(f2, x, y) # compute mean of f2(x - y)
meanfdiff(f2, x, y, dim)
meanfdiff!(dst, f2, x, y, dim)
The function max and its variants:
max(x)
max(f1, x) # compute maximum of f1(x)
max(f2, x1, x2) # compute maximum of f2(x1, x2)
max(f3, x1, x2, x3) # compute maximum of f3(x1, x2, x3)
max(x, (), dim)
max(f1, x, dim)
max(f2, x1, x2, dim)
max(f3, x1, x2, x3, dim)
max!(dst, x, dim) # write results to dst
max!(dst, f1, x1, dim)
max!(dst, f2, x1, x2, dim)
max!(dst, f3, x1, x2, x3, dim)
maxfdiff(f2, x, y) # compute maximum of f2(x - y)
maxfdiff(f2, x, y, dim)
maxfdiff!(dst, f2, x, y, dim)
The function min and its variants:
min(x)
min(f1, x) # compute minimum of f1(x)
min(f2, x1, x2) # compute minimum of f2(x1, x2)
min(f3, x1, x2, x3) # compute minimum of f3(x1, x2, x3)
min(x, (), dim)
min(f1, x, dim)
min(f2, x1, x2, dim)
min(f3, x1, x2, x3, dim)
min!(dst, x, dim) # write results to dst
min!(dst, f1, x1, dim)
min!(dst, f2, x1, x2, dim)
min!(dst, f3, x1, x2, x3, dim)
minfdiff(f2, x, y) # compute minimum of f2(x - y)
minfdiff(f2, x, y, dim)
minfdiff!(dst, f2, x, y, dim)
Note: when computing the maximum/minimum along a specific dimension, we use max(x, (), dim) and min(x, (), dim) instead of max(x, dim) and min(x, dim) to avoid ambiguities that would otherwise occur.
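To tie the synopsis together, a brief sketch of these reductions in use (array contents are arbitrary):

```julia
using NumericExtensions

a = rand(4, 5)

sum(Abs2Fun(), a)          # sum of abs2 over all elements, in one pass
sum(Abs2Fun(), a, 1)       # column-wise sums of abs2(a)
max(Abs2Fun(), a)          # largest abs2 value

r = zeros(1, 5)
sum!(r, Abs2Fun(), a, 1)   # same column-wise sums, written into r
```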
Generic folding¶
This package extends foldl and foldr for generic folding.
# suppose length(x) == 4
foldl(op, x) # i.e. op(op(op(x[1], x[2]), x[3]), x[4])
foldr(op, x) # i.e. op(x[1], op(x[2], op(x[3], x[4])))
foldl(Add(), x) # sum over x from left to right
foldr(Add(), x) # sum over x from right to left
You can also use functors to generate terms for folding.
foldl(op, f1, x) # fold over f1(x) from left to right
foldr(op, f1, x) # fold over f1(x) from right to left
foldl(op, f2, x1, x2) # fold over f2(x1, x2) from left to right
foldr(op, f2, x1, x2) # fold over f2(x1, x2) from right to left
foldl(op, f3, x1, x2, x3) # fold over f3(x1, x2, x3) from left to right
foldr(op, f3, x1, x2, x3) # fold over f3(x1, x2, x3) from right to left
foldl_fdiff(op, f, x, y) # fold over f(x - y) from left to right
foldr_fdiff(op, f, x, y) # fold over f(x - y) from right to left
You may also provide an initial value s0
for folding.
foldl(op, s0, x)
foldr(op, s0, x)
foldl(op, s0, f1, x)
foldr(op, s0, f1, x)
foldl(op, s0, f2, x1, x2)
foldr(op, s0, f2, x1, x2)
foldl(op, s0, f3, x1, x2, x3)
foldr(op, s0, f3, x1, x2, x3)
foldl_fdiff(op, s0, f, x, y)
foldr_fdiff(op, s0, f, x, y)
The function foldl also supports reduction along a specific dimension.
foldl(op, s0, x, dim) # fold op over x along dimension dim
foldl(op, s0, f1, x, dim) # fold op over f1(x) along dimension dim
foldl(op, s0, f2, x1, x2, dim) # fold op over f2(x1, x2) along dimension dim
foldl(op, s0, f3, x1, x2, x3, dim) # fold op over f3(x1, x2, x3) along dimension dim
foldl_fdiff(op, s0, f, x, y, dim) # fold op over f(x - y) along dimension dim
# the following statements write results to pre-allocated storage
foldl!(dst, op, s0, x, dim)
foldl!(dst, op, s0, f1, x, dim)
foldl!(dst, op, s0, f2, x1, x2, dim)
foldl!(dst, op, s0, f3, x1, x2, x3, dim)
foldl_fdiff!(dst, op, s0, f, x, y, dim)
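A short sketch of folding in action; the expected values follow directly from the left/right recursions defined above:

```julia
using NumericExtensions

x = [1.0, 2.0, 3.0, 4.0]

foldl(Subtract(), x)   # ((1 - 2) - 3) - 4 = -8.0
foldr(Subtract(), x)   # 1 - (2 - (3 - 4)) = -2.0

# fold over abs(x) with max as the operator, starting from 0.0
foldl(MaxFun(), 0.0, AbsFun(), x)   # = 4.0
```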
Derived reduction functions¶
In addition to these basic reduction functions, we also define a set of derived reduction functions, as follows:
var(x)
var(x, dim)
var!(dst, x, dim)
std(x)
std(x, dim)
std!(dst, x, dim)
sumabs(x) # == sum(abs(x))
sumabs(x, dim)
sumabs!(dst, x, dim)
meanabs(x) # == mean(abs(x))
meanabs(x, dim)
meanabs!(dst, x, dim)
maxabs(x) # == max(abs(x))
maxabs(x, dim)
maxabs!(dst, x, dim)
minabs(x) # == min(abs(x))
minabs(x, dim)
minabs!(dst, x, dim)
sumsq(x) # == sum(abs2(x))
sumsq(x, dim)
sumsq!(dst, x, dim)
meansq(x) # == mean(abs2(x))
meansq(x, dim)
meansq!(dst, x, dim)
dot(x, y) # == sum(x .* y)
dot(x, y, dim)
dot!(dst, x, y, dim)
sumabsdiff(x, y) # == sum(abs(x - y))
sumabsdiff(x, y, dim)
sumabsdiff!(dst, x, y, dim)
meanabsdiff(x, y) # == mean(abs(x - y))
meanabsdiff(x, y, dim)
meanabsdiff!(dst, x, y, dim)
maxabsdiff(x, y) # == max(abs(x - y))
maxabsdiff(x, y, dim)
maxabsdiff!(dst, x, y, dim)
minabsdiff(x, y) # == min(abs(x - y))
minabsdiff(x, y, dim)
minabsdiff!(dst, x, y, dim)
sumsqdiff(x, y) # == sum(abs2(x - y))
sumsqdiff(x, y, dim)
sumsqdiff!(dst, x, y, dim)
meansqdiff(x, y) # == mean(abs2(x - y))
meansqdiff(x, y, dim)
meansqdiff!(dst, x, y, dim)
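For instance, using small vectors whose results are easy to check by hand:

```julia
using NumericExtensions

x = [1.0, -2.0, 3.0]
y = [1.0,  1.0, 1.0]

sumabs(x)         # sum(abs(x))  = 6.0
sumsq(x)          # sum(abs2(x)) = 14.0
sumsqdiff(x, y)   # sum(abs2(x - y)) = 0 + 9 + 4 = 13.0
```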
Although this is quite a large set of functions, the actual code is quite concise, as most of these functions are generated through macros (see src/reduce.jl).
In addition to the common reduction functions, this package also provides a set of statistics functions that are particularly useful in probabilistic or information-theoretic computation, as follows
sumxlogx(x) # == sum(xlogx(x)) with xlogx(x) = x > 0 ? x * log(x) : 0
sumxlogx(x, dim)
sumxlogx!(dst, x, dim)
sumxlogy(x, y) # == sum(xlogy(x,y)) with xlogy(x,y) = x > 0 ? x * log(y) : 0
sumxlogy(x, y, dim)
sumxlogy!(dst, x, y, dim)
entropy(x) # == - sumxlogx(x)
entropy(x, dim)
entropy!(dst, x, dim)
logsumexp(x) # == log(sum(exp(x)))
logsumexp(x, dim)
logsumexp!(dst, x, dim)
softmax!(dst, x) # dst[i] = exp(x[i]) / sum(exp(x))
softmax(x)
softmax!(dst, x, dim)
softmax(x, dim)
For logsumexp and softmax, special care is taken to ensure numerical stability for large values of x; that is, the values are properly shifted during computation. For example, you can safely evaluate logsumexp([1000., 2000., 3000.]) with this package, while log(sum(exp([1000., 2000., 3000.]))) would lead to overflow.
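A sketch contrasting the stable and naive versions on such large inputs:

```julia
using NumericExtensions

x = [1000., 2000., 3000.]

logsumexp(x)       # ~ 3000.0, computed stably by shifting by maximum(x)
log(sum(exp(x)))   # naive version: exp(3000.) overflows to Inf

p = softmax(x)     # a valid probability vector (entries sum to 1)
```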
Weighted Sum¶
Computation of the weighted sum r = w[1]*x[1] + w[2]*x[2] + ... + w[n]*x[n] is common in practice. NumericExtensions.jl directly supports such computation via wsum and wsumfdiff:
wsum(w, x) # weighted sum of x with weights w
wsum(w, f1, x1) # weighted sum of f1(x1) with weights w
wsum(w, f2, x1, x2) # weighted sum of f2(x1, x2) with weights w
wsum(w, f3, x1, x2, x3) # weighted sum of f3(x1, x2, x3) with weights w
wsumfdiff(w, f2, x, y) # weighted sum of f2(x - y) with weights w
These functions also support computing the weighted sums along a specific dimension:
wsum(w, x, dim)
wsum!(dst, w, x, dim)
wsum(w, f1, x1, dim)
wsum!(dst, w, f1, x1, dim)
wsum(w, f2, x1, x2, dim)
wsum!(dst, w, f2, x1, x2, dim)
wsum(w, f3, x1, x2, x3, dim)
wsum!(dst, w, f3, x1, x2, x3, dim)
wsumfdiff(w, f2, x, y, dim)
wsumfdiff!(dst, w, f2, x, y, dim)
Furthermore, wsumabs, wsumabsdiff, wsumsq, and wsumsqdiff are provided to compute weighted sums of absolute values / squares to simplify common use:
wsumabs(w, x) # weighted sum of abs(x)
wsumabs(w, x, dim)
wsumabs!(dst, w, x, dim)
wsumabsdiff(w, x, y) # weighted sum of abs(x - y)
wsumabsdiff(w, x, y, dim)
wsumabsdiff!(dst, w, x, y, dim)
wsumsq(w, x) # weighted sum of abs2(x)
wsumsq(w, x, dim)
wsumsq!(dst, w, x, dim)
wsumsqdiff(w, x, y) # weighted sum of abs2(x - y)
wsumsqdiff(w, x, y, dim)
wsumsqdiff!(dst, w, x, y, dim)
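A minimal sketch of the weighted-sum functions on hand-checkable inputs:

```julia
using NumericExtensions

x = [1.0, 2.0, 3.0]
w = [0.2, 0.3, 0.5]

wsum(w, x)              # 0.2*1 + 0.3*2 + 0.5*3 = 2.3
wsum(w, Abs2Fun(), x)   # weighted sum of abs2(x)
wsumsq(w, x)            # shorthand for the same computation
```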
Performance¶
The reduction and map-reduction functions are carefully optimized. In particular, several tricks lead to performance improvement:
- computation is performed in a cache-friendly manner;
- computation completes in a single pass without creating intermediate arrays;
- kernels are inlined via the use of typed functors;
- inner loops use linear indexing (with pre-computed offset);
- opportunities of using BLAS are exploited.
Generally, many of the reduction functions in this package can achieve 3x - 12x speed up as compared to the typical Julia expression.
We observe further speed-up for certain functions:
- Full reduction with sumabs, sumsq, and dot utilizes BLAS level-1 routines, achieving a 10x to 30x speed-up.
- For var and std, we devise dedicated procedures in which the computational steps are carefully scheduled such that most of the computation is conducted in a single pass. This results in about a 25x speed-up.
Vector Norms and Normalization¶
This package provides functions for evaluating vector norms and normalizing vectors.
Vector Norm Evaluation¶
Synopsis
vnorm(x, p) # compute L-p norm of vec(x)
vnorm(x, p, dim) # compute L-p norm of x along dimension dim
vnorm!(r, x, p, dim) # compute L-p norm along specific dimension and
# write results to r
vnormdiff(x, y, p) # compute L-p norm of vec(x - y)
vnormdiff(x, y, p, dim) # compute L-p norm of x - y along dim
vnormdiff!(r, x, y, p, dim) # compute L-p norm of x - y along dim and write to r
Notes:
- For vnormdiff and vnormdiff!, x or y can be either an array or a scalar.
- When p is 1, 2, or Inf, specialized fast routines are used.
Examples
vnorm(x, 2) # compute L-2 norm of x
vnorm(x, 2, 1) # compute L-2 norm of each column of x
vnorm(x, Inf, 2) # compute L-inf norm of each row of x
vnorm!(r, x, 2, 1) # compute L-2 norm of each column, and write results to r
vnormdiff(x, 2.5, 2) # compute L-2 norm of x - 2.5
vnormdiff(x, y, 1, 2) # compute L-1 norm of x - y for each column
Normalization¶
Normalizing a vector w.r.t. the L-p norm means scaling the vector such that its L-p norm becomes 1.
Synopsis
normalize(x, p) # returns a normalized vector w.r.t. L-p norm
normalize!(x, p) # normalize x w.r.t. L-p norm inplace
normalize!(r, x, p) # write the normalized vector to a pre-allocated array r
normalize(x, p, dim) # returns an array comprised of normalized vectors along dim
normalize!(x, p, dim) # normalize vectors of x along dim inplace w.r.t. L-p norm
normalize!(r, x, p, dim) # write the normalized vectors along dim to r
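For example (the column case assumes non-negative entries, so an L-1 normalized column sums to 1):

```julia
using NumericExtensions

x = [3.0, 4.0]
normalize(x, 2)     # [0.6, 0.8]: the L-2 norm is now 1

a = rand(3, 5)
normalize!(a, 1, 1) # scale each column of a so its L-1 norm is 1
```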
Array Scanning¶
Scanning refers to the operation that applies a binary function recursively: let x be an input vector and f be a binary operator; then the output y is a vector of the same length given by
y[1] = x[1]
y[i] = f(y[i-1], x[i])   for i = 2, ..., length(x)
For example, the function cumsum is a special case of array scanning (with addition as the operator).
This package provides both generic scanning functions, such as scan, scan!, mapscan, and mapscan!, as well as specialized scanning functions, such as cumsum and cumsum!, etc.
Generic Scanning¶
Synopsis
Let op be a binary operator, and f1, f2, and f3 be unary, binary, and ternary functors, respectively.
scan(op, x) # scan x using operator op
mapscan(op, f1, x) # scan f1(x)
mapscan(op, f2, x, y) # scan f2(x, y)
mapscan(op, f3, x, y, z) # scan f3(x, y, z)
scan!(r, op, x) # write the results of scanning of x to r
mapscan!(r, op, f1, x)
mapscan!(r, op, f2, x, y)
mapscan!(r, op, f3, x, y, z)
scan!(op, x) # inplace scanning of x using operator op
Both scan and scan! accept a dimension argument for scanning vectors along a specific dimension.
scan(op, x, dim) # scan vectors of x along dim using operator op
mapscan(op, f1, x, dim) # scan vectors of f1(x) along dim
mapscan(op, f2, x, y, dim) # scan vectors of f2(x, y) along dim
mapscan(op, f3, x, y, z, dim) # scan vectors of f3(x, y, z) along dim
scan!(r, op, x, dim) # write the results of scanning of x to r
mapscan!(r, op, f1, x, dim)
mapscan!(r, op, f2, x, y, dim)
mapscan!(r, op, f3, x, y, z, dim)
scan!(op, x, dim) # inplace scanning of x along dim
Note: mapscan and mapscan! use efficient implementations that scan the terms without creating a temporary array such as f1(x). Hence, mapscan(op, f1, x) is generally faster than scan(op, f1(x)).
Examples
scan(Add(), x) # equivalent to cumsum(x)
scan(Add(), x, 1) # equivalent to cumsum(x, 1)
scan!(Add(), x) # writing cumsum(x) inplace (i.e. back to x)
Common Scanning Functions¶
For some common scanning operations, we provide specific functions to simplify their use.
# cumsum uses Add() as scanning operator
cumsum(x)
cumsum!(r, x)
cumsum!(x)
cumsum(f1, x)
cumsum(f2, x, y)
cumsum(f3, x, y, z)
cumsum!(r, f1, x)
cumsum!(r, f2, x, y)
cumsum!(r, f3, x, y, z)
# cummax uses MaxFun() as scanning operator
cummax(x)
cummax!(r, x)
cummax!(x)
cummax(f1, x)
cummax(f2, x, y)
cummax(f3, x, y, z)
cummax!(r, f1, x)
cummax!(r, f2, x, y)
cummax!(r, f3, x, y, z)
# cummin uses MinFun() as scanning operator
cummin(x)
cummin!(r, x)
cummin!(x)
cummin(f1, x)
cummin(f2, x, y)
cummin(f3, x, y, z)
cummin!(r, f1, x)
cummin!(r, f2, x, y)
cummin!(r, f3, x, y, z)
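A brief sketch of these scanning functions on small inputs, with values that follow from the definitions above:

```julia
using NumericExtensions

x = [1.0, 3.0, 2.0]
y = [2.0, 2.0, 2.0]

cumsum(x)                  # [1.0, 4.0, 6.0]
cummax(x)                  # [1.0, 3.0, 3.0]
cumsum(Abs2Fun(), x)       # cumulative sum of abs2(x): [1.0, 10.0, 14.0]
cumsum(Multiply(), x, y)   # cumulative sum of x .* y: [2.0, 8.0, 12.0]
```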