Welcome to Benchpress¶

Contents:
Quick Start¶Fire up your terminal, and: # Install using PyPi
pip install benchpress --user
# Make the Benchpress binaries available
export PATH=$PATH:$HOME/.local/bin
Specify what to benchmark by implementing a Python script that generates commands: import benchpress as bp
from benchpress.suite_util import BP_ROOT
scripts = [
('X-ray', 'xraysim', ["10*10*1", "20*10*1"]),
('Bean', 'galton_bean_machine', ["10000*10", "20000*10"]),
]
cmd_list = []
for label, name, sizes in scripts:
for size in sizes:
full_label = "%s/%s" % (label, size)
bash_cmd = "python {root}/benchmarks/{script}/python_numpy/{script}.py {size}" \
.format(root=BP_ROOT, script=name, size=size)
cmd_list.append(bp.command(bash_cmd, full_label))
# Finally, we build the Benchpress suite, which is written to `--output`
bp.create_suite(cmd_list)
And run the script: $ python suites/simple_example.py --output my_benchmark.json
Scheduling 'X-ray/10*10*1': 'python xraysim/python_numpy/xraysim.py --size=10*10*1'
Scheduling 'X-ray/20*10*1': 'python xraysim/python_numpy/xraysim.py --size=20*10*1'
Scheduling 'Bean/10000*10': 'python galton_bean_machine/python_numpy/galton_bean_machine.py --size=10000*10'
Scheduling 'Bean/20000*10': 'python galton_bean_machine/python_numpy/galton_bean_machine.py --size=20000*10'
Writing suite file: my_benchmark.json
The result is a JSON file results.json that encapsulate the commands that make up the benchmark suite. Now, use bp-run to run the benchmark suite: $bp-run results.json
Executing 'X-ray/10*10*1'
Executing 'X-ray/20*10*1'
Executing 'Bean/10000*10'
Executing 'Bean/20000*10'
Finally, let’s visualize the results in ASCII: $bp-cli results.json
X-ray/10*10*1: [0.013303, 0.013324, 0.012933] 0.0132 (0.0002)
X-ray/20*10*1: [0.108884, 0.105319, 0.105392] 0.1065 (0.0017)
Bean/10000*10: [0.002653, 0.002553, 0.002616] 0.0026 (0.0000)
Bean/20000*10: [0.005149, 0.005088, 0.005271] 0.0052 (0.0001)
Or as a bar chart: $bp-chart results.json --output results.pdf
Writing file 'results.pdf' using format 'pdf'.
![]() Installation¶Benchpress is distributed via PyPi and Github, which means that you can install it from a Python package (from PyPi) or directly from an unpacked tarball or git-clone from Github.com. Note Benchpress is designed to work, with minimal friction, in an environment where the user has limited system permissions. Such as shared computing environments, clusters and supercomputers. A system-wide installation of Benchpress is therefore untested. However, it should work if write permission is assigned to Benchpress users for the From pypi.python.org¶The following shows how to do a user-mode / local installation: pip install benchpress --user
Extend your export PATH=$PATH:$HOME/.local/bin
When you are done using Benchpress, purging it from your system is as easy as: pip uninstall benchpress
From clone or tarball¶Clone the repos: git clone https://github.com/bh107/benchpress.git
cd benchpress
or download and unpack a tarball: wget https://github.com/bh107/benchpress/archive/master.zip
unzip master.zip
cd bohrium-benchpress
And install it: pip install . --user
When developing Benchpress use PyPi’s developer installation:
The Suite File¶The suite file contains the set of commands to execute and the results of said executions.
In Python, it is easy to create a suite file. You start by creating a list of [{
'label': 'X-ray/10*10*1', # The label of the command
'cmd': 'python xraysim.py 10*10*1', # The command to execute
'env': {} # The environment variables that will be defined before execution
}, {
'label': 'X-ray/20*10*1',
'cmd': 'python xraysim.py 20*10*1',
'env': {}
}]
And then you call JSON schema¶The suite format is defined in json-schema: A example of a suite file that contains two commands; one finished and one pending: {
"creation_date_utc": "2017-05-23T09:02:25.373696",
"cmd_list": [
{
"cmd": "python xraysim.py 10*10*1",
"jobs": [
{
"status": "finished",
"warmup": true,
"results": [
{
"stderr": "",
"success": true,
"stdout": "elapsed-time: 0.013244\n"
}
],
"nruns": 1
}
],
"env": {},
"label": "X-ray/10*10*1"
},
{
"cmd": "python xraysim.py 20*10*1",
"jobs": [
{
"status": "pending",
"warmup": true,
"nruns": 1
}
],
"env": {},
"label": "X-ray/20*10*1"
}
]
}
Usage - Commands¶Benchpress provides a set of binaries for running and visualize the suite file The Suite File. bp-run¶After you have created a benchmark suite file (see. The Suite File), you can then use usage: run.py [-h] [--nruns NRUNS] [--warmup] [--dirty] [--tag TAG] [--slurm]
[--partition PARTITION] [--multi-jobs] [--nice NICE]
suite
Runs a benchmark suite and stores the results in a JSON-file.
positional arguments:
suite Path to the JSON file where the benchmark results will
be read and written. If the file exist, the benchmark
will resume.
optional arguments:
-h, --help show this help message and exit
--nruns NRUNS How many times should each command run.
--warmup Execute one warm up run, before the measured runs
--dirty Do no clean up.
--tag TAG Assign a tag to the result.
SLURM Queuing System:
--slurm Use the SLURM queuing system.
--partition PARTITION
Submit to a specific SLURM partition.
--multi-jobs Submit 'nruns' SLURM jobs instead of one job with
'nruns' number of runs.
--nice NICE The scheduling priority - range is from -10000
(highest priority) to 10000 (lowest priority) where
zero is default. Only privileged users can specify a
negative priority.
bp-cli¶At any point, you can read the result of the finished executions within a benchmark suite file.
usage: cli.py [-h] [-o FILE] [--parse-regex RegEx] [--py-type {float,int,str}]
[--labels-to-include RegEx] [--labels-to-exclude RegEx]
[--label-map RegEx:label,...,RegEx:label] [--csv]
[--csv-separator sep]
FILE
Prints the result of a Benchpress JSON-file.
positional arguments:
FILE JSON file containing results
optional arguments:
-h, --help show this help message and exit
-o FILE, --output FILE
Write output to FILE. (default: None)
--parse-regex RegEx How to parse the result of each run. For each RegEx
match, group one is recorded as a result. (default:
elapsed-time: ([\d.]+))
--py-type {float,int,str}
The Python data type of the parsed results. (default:
float)
--labels-to-include RegEx
All labels that match the RegEx are showed. (default:
.*)
--labels-to-exclude RegEx
All labels that match the RegEx are ignored. (default:
None)
--label-map RegEx:label,...,RegEx:label
Comma separated list of original-to-new-label names
(default: None)
--csv Use the CSV format using 'separator' as the separator.
(default: False)
--csv-separator sep Use the CSV format using 'sep' as the separator.
(default: ,)
bp-cli-series¶usage: cli_series.py [-h] [-o FILE] [--parse-regex RegEx]
[--py-type {float,int,str}] [--labels-to-include RegEx]
[--labels-to-exclude RegEx]
[--label-map RegEx:label,...,RegEx:label]
[--csv-separator sep]
[--meta-key {creation_date_utc,tag}]
FILE_LIST [FILE_LIST ...]
Prints the result of a series of Benchpress JSON-files.
positional arguments:
FILE_LIST JSON file containing results (accept multiple files)
optional arguments:
-h, --help show this help message and exit
-o FILE, --output FILE
Write output to FILE. (default: None)
--parse-regex RegEx How to parse the result of each run. For each RegEx
match, group one is recorded as a result. (default:
elapsed-time: ([\d.]+))
--py-type {float,int,str}
The Python data type of the parsed results. (default:
float)
--labels-to-include RegEx
All labels that match the RegEx are showed. (default:
.*)
--labels-to-exclude RegEx
All labels that match the RegEx are ignored. (default:
None)
--label-map RegEx:label,...,RegEx:label
Comma separated list of original-to-new-label names
(default: None)
--csv-separator sep Use the CSV format using 'sep' as the separator.
(default: ,)
--meta-key {creation_date_utc,tag}
The meta key, which value vary throughout the series
(default: creation_date_utc)
bp-chart¶usage: bar_per_cmd.py [-h] [-o FILE] [--parse-regex RegEx]
[--py-type {float,int,str}] [--labels-to-include RegEx]
[--labels-to-exclude RegEx]
[--label-map RegEx:label,...,RegEx:label]
[--title TITLE] [--warmups WARMUPS] [--ymax YMAX]
[--ymin YMIN] [--ylog] [--ylabel YLABEL]
[--xticklabel-rotation DEGREE] [--no-legend]
[--fontsize FONTSIZE]
[--pyplot-style {seaborn-darkgrid,Solarize_Light2,seaborn-notebook,classic,seaborn-ticks,grayscale,bmh,seaborn-talk,dark_background,ggplot,fivethirtyeight,_classic_test,seaborn-colorblind,seaborn-deep,seaborn-whitegrid,seaborn,seaborn-poster,seaborn-bright,seaborn-muted,seaborn-paper,seaborn-white,fast,seaborn-pastel,seaborn-dark,tableau-colorblind10,seaborn-dark-palette}]
[--mpld3]
FILE
Plots the result of a Benchpress JSON-file (one bar per command)
positional arguments:
FILE JSON file containing results
optional arguments:
-h, --help show this help message and exit
-o FILE, --output FILE
Write output to FILE. (default: None)
--parse-regex RegEx How to parse the result of each run. For each RegEx
match, group one is recorded as a result. (default:
elapsed-time: ([\d.]+))
--py-type {float,int,str}
The Python data type of the parsed results. (default:
float)
--labels-to-include RegEx
All labels that match the RegEx are showed. (default:
.*)
--labels-to-exclude RegEx
All labels that match the RegEx are ignored. (default:
None)
--label-map RegEx:label,...,RegEx:label
Comma separated list of original-to-new-label names
(default: None)
--title TITLE The title of the chart. (default: None)
--warmups WARMUPS Specify the amount of samples from warm-up rounds.
(default: 0)
--ymax YMAX Max value of the y-axis (default: None)
--ymin YMIN Min value of the y-axis (default: None)
--ylog Makes the y-axis logarithmic (default: False)
--ylabel YLABEL Label on the y-axis (default: Elapsed time in seconds)
--xticklabel-rotation DEGREE
The rotation of the labels on the x-axis (default: 90)
--no-legend Hide the legend box (default: False)
--fontsize FONTSIZE Fontsize (default: 12)
--pyplot-style {seaborn-darkgrid,Solarize_Light2,seaborn-notebook,classic,seaborn-ticks,grayscale,bmh,seaborn-talk,dark_background,ggplot,fivethirtyeight,_classic_test,seaborn-colorblind,seaborn-deep,seaborn-whitegrid,seaborn,seaborn-poster,seaborn-bright,seaborn-muted,seaborn-paper,seaborn-white,fast,seaborn-pastel,seaborn-dark,tableau-colorblind10,seaborn-dark-palette}
The matplotlib.pyplot.style to use (default: seaborn-
darkgrid)
--mpld3 Generate mpld3 plots <https://mpld3.github.io>
(default: False)
bp-chart-series¶usage: series_per_cmd.py [-h] [-o FILE] [--parse-regex RegEx]
[--py-type {float,int,str}]
[--labels-to-include RegEx]
[--labels-to-exclude RegEx]
[--label-map RegEx:label,...,RegEx:label]
[--title TITLE] [--warmups WARMUPS] [--ymax YMAX]
[--ymin YMIN] [--ylog] [--ylabel YLABEL]
[--plot-size PLOT_SIZE]
[--xticklabel-rotation DEGREE] [--no-legend]
[--fontsize FONTSIZE] [--pyplot-style PYPLOT_STYLE]
[--meta-key {creation_date_utc,tag}] [--mpld3]
FILE_LIST [FILE_LIST ...]
Plots the result of a Benchpress JSON-file (one bar per command)
positional arguments:
FILE_LIST JSON file containing results (accept multiple files)
optional arguments:
-h, --help show this help message and exit
-o FILE, --output FILE
Write output to FILE. (default: None)
--parse-regex RegEx How to parse the result of each run. For each RegEx
match, group one is recorded as a result. (default:
elapsed-time: ([\d.]+))
--py-type {float,int,str}
The Python data type of the parsed results. (default:
float)
--labels-to-include RegEx
All labels that match the RegEx are showed. (default:
.*)
--labels-to-exclude RegEx
All labels that match the RegEx are ignored. (default:
None)
--label-map RegEx:label,...,RegEx:label
Comma separated list of original-to-new-label names
(default: None)
--title TITLE The title of the chart. (default: None)
--warmups WARMUPS Specify the amount of samples from warm-up rounds.
(default: 0)
--ymax YMAX Max value of the y-axis (default: None)
--ymin YMIN Min value of the y-axis (default: None)
--ylog Makes the y-axis logarithmic (default: False)
--ylabel YLABEL Label on the y-axis (default: Elapsed time in seconds)
--plot-size PLOT_SIZE
Size of the plot compared to total figure (0.1 to 1.0)
(default: 0.7)
--xticklabel-rotation DEGREE
The rotation of the labels on the x-axis (default: 90)
--no-legend Hide the legend box (default: False)
--fontsize FONTSIZE Fontsize (default: 12)
--pyplot-style PYPLOT_STYLE
The matplotlib.pyplot.style to use (default: seaborn-
darkgrid)
--meta-key {creation_date_utc,tag}
The meta key, which value vary throughout the series
(default: creation_date_utc)
--mpld3 Generate mpld3 plot <https://mpld3.github.io>
(default: False)
bp-raw¶usage: raw.py [-h] [-o FILE] [--parse-regex RegEx] [--py-type {float,int,str}]
[--labels-to-include RegEx] [--labels-to-exclude RegEx]
[--label-map RegEx:label,...,RegEx:label] [--no-stdout]
[--no-stderr]
FILE
Raw dump of the stdout and stderr in a Benchpress JSON-file.
positional arguments:
FILE JSON file containing results
optional arguments:
-h, --help show this help message and exit
-o FILE, --output FILE
Write output to FILE. (default: None)
--parse-regex RegEx How to parse the result of each run. For each RegEx
match, group one is recorded as a result. (default:
elapsed-time: ([\d.]+))
--py-type {float,int,str}
The Python data type of the parsed results. (default:
float)
--labels-to-include RegEx
All labels that match the RegEx are showed. (default:
.*)
--labels-to-exclude RegEx
All labels that match the RegEx are ignored. (default:
None)
--label-map RegEx:label,...,RegEx:label
Comma separated list of original-to-new-label names
(default: None)
--no-stdout Don't dump stdout. (default: False)
--no-stderr Don't dump stderr. (default: False)
Reference¶
Benchpress: a benchmark tool and collection¶Benchpress is primarily a tool for running benchmarks and analyze/visualize the result.
Benchpress¶
|
Suite Examples¶NAS Parallel Benchmarks by NASA¶This script generates a Benchpress suite file that will run the NAS benchmark suite <https://www.nas.nasa.gov/publications/npb.html>. It download and compile the NAS benchmark based on the configuration specified with ‘–nas-config’ Example Use the GCC config: python nas_parallel_benchmark.py --nas-config "make.def.gcc_x86" --output nas_suite.json
And then run the suite: bp-run nas_suite.json
Finally, check the result: # Wall clock timings
bp-cli nas_suite.json --parse-regex "Time in seconds\s*=(.+)"
# Mop/s
bp-cli nas_suite.json --parse-regex "Mop/s total\s*=(.+)"
The Source: """
===============================
NAS Parallel Benchmarks by NASA
===============================
This script generates a Benchpress suite file that will run the NAS benchmark suite
<https://www.nas.nasa.gov/publications/npb.html>.
It download and compile the NAS benchmark based on the configuration specified with '--nas-config'
Example
-------
Use the GCC config::
python nas_parallel_benchmark.py --nas-config "make.def.gcc_x86" --output nas_suite.json
And then *run* the suite::
bp-run nas_suite.json
Finally, check the result::
# Wall clock timings
bp-cli nas_suite.json --parse-regex "Time in seconds\s*=(.+)"
# Mop/s
bp-cli nas_suite.json --parse-regex "Mop/s total\s*=(.+)"
"""
import urllib
import tarfile
import tempfile
import subprocess
import os
from os.path import join
import benchpress as bp
from benchpress.argument_handling import args, add_argument
if __name__ == "__main__":
# Adding arguments that are specific to this Benchpress suite
add_argument(
'--nas-config',
default="make.def.gcc_x86",
help="The name of the NAS config to use."
)
# Let's download and unpack the NAS benchmark
tdir = tempfile.mkdtemp()
nas_filename = join(tdir, "NAS.tgz")
print ("Download NAS benchmark to '%s'" % nas_filename)
urllib.urlretrieve("https://www.nas.nasa.gov/assets/npb/NPB3.3.1.tar.gz", nas_filename)
tar = tarfile.open(nas_filename)
tar.extractall(path=tdir)
tar.close()
# Choose NAS config
print ("Using NAS config '%s'" % args().nas_config)
wdir = join(tdir, "NPB3.3.1/NPB3.3-OMP")
os.rename("%s/config/NAS.samples/%s" % (wdir, args().nas_config), "%s/config/make.def" % wdir)
benchmarks = ['bt', 'cg', 'dc', 'ft', 'is', 'lu', 'mg', 'sp', 'ua']
size = "W"
for benchmark in benchmarks:
subprocess.check_call(["make %s CLASS=%s" % (benchmark, size)], shell=True, cwd=wdir)
print ("BUILD FINISHED\n\n")
cmd_list = []
for label in benchmarks:
bash_cmd = "{root}/bin/{label}.{size}.x".format(root=wdir, label=label, size=size)
cmd_list.append(bp.command(bash_cmd, "%s/%s/OMP1" % (label, size), env={'OMP_NUM_THREADS': 1}))
bp.create_suite(cmd_list)
Benchmark Examples¶Benchpress includes a collection of benchmarks: Benchmarks¶
Black Scholes¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
import math
try:
import numpy_force as npf
except ImportError:
import numpy as npf
bench = util.Benchmark("Black Scholes", "samples*iterations")
def model(N):
"""Construct pseudo-data representing price samples?"""
# Create the outside of bohrium
S = npf.random.random(N).astype(bench.dtype) * 4.0 + 58.0 # Price is between 58-62
S = np.array(S)
return S
def CND(X):
"""
Cumulative normal distribution.
Helper function used by BS(...).
"""
(a1, a2, a3, a4, a5) = (0.31938153, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
L = np.absolute(X)
K = 1.0 / (1.0 + 0.2316419 * L)
w = 1.0 - 1.0 / math.sqrt(2 * np.pi) * np.exp(-L * L / 2.) * \
(a1 * K + \
a2 * (K ** 2) + \
a3 * (K ** 3) + \
a4 * (K ** 4) + \
a5 * (K ** 5))
mask = X < 0
w = w * ~mask + (1.0 - w) * mask
return w
def BS(CallPutFlag, S, X, T, r, v):
"""Black Sholes Function."""
d1 = (np.log(S / X) + (r + v * v / 2.) * T) / (v * math.sqrt(T))
d2 = d1 - v * math.sqrt(T)
if CallPutFlag == 'c':
return S * CND(d1) - X * math.exp(-r * T) * CND(d2)
else:
return X * math.exp(-r * T) * CND(-d2) - S * CND(-d1)
def price(S, I, flag='c', X=65.0, dT=(1.0 / 365.0), r=0.08, v=0.3):
"""Model price as a function of time."""
T = dT
Ps = []
N = len(S)
for i in range(I):
P = np.sum(BS(flag, S, X, T, r, v)) / N
Ps.append(P)
T += dT
bench.flush()
return Ps
def main():
N = bench.args.size[0]
I = bench.args.size[1]
S = model(N) # Construct pseudo-data\
bench.start()
R = price(S, I) # Run the model
bench.stop()
bench.pprint()
if __name__ == "__main__":
main()
C99 Omp¶#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <Random123/philox.h>
#include <bp_util.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double* model(int64_t samples)
{
const uint64_t key = 0;
double* data = (double*)malloc(sizeof(double)*samples);
#pragma omp parallel for
for(int64_t count = 0; count<samples; ++count) {
philox2x32_as_1x64_t counter;
counter.combined = count;
uint64_t x_raw = ((philox2x32_as_1x64_t)philox2x32(
counter.orig,
(philox2x32_key_t){ { key } }
)).combined;
double x = x_raw;
x /= 18446744073709551616.000000;
x *= 4.0;
x += 58.0; // Model between 58-62
data[count] = x;
}
return data;
}
// The cumulative normal distribution function
double cnd( double x )
{
double L, K, w;
double const a1 = 0.31938153,
a2 = -0.356563782,
a3 = 1.781477937,
a4 = -1.821255978,
a5 = 1.330274429;
L = fabs(x);
K = 1.0 / (1.0 + 0.2316419 * L);
w = 1.0 - 1.0 / sqrt(2 * M_PI) * exp(-L *L / 2) * (\
a1 * K + \
a2 * pow(K, 2) + \
a3 * pow(K, 3) + \
a4 * pow(K, 4) + \
a5 * pow(K, 5)
);
return (x<0) ? 1.0 - w : w;
}
// The Black and Scholes (1973) Stock option formula
void pricing(double* market, double *prices,
size_t samples, size_t iterations,
char flag, double x, double d_t, double r, double v)
{
double t = d_t;
for(size_t iter=0; iter<iterations; ++iter) {
double res = 0;
#pragma omp parallel for reduction(+:res)
for(size_t sample=0; sample<samples; ++sample) {
double d1 = (log(market[sample]/x) + (r+v*v/2)*t) / (v*sqrt(t));
double d2 = d1-v*sqrt(t);
if (flag == 'c') {
res += market[sample] *cnd(d1)-x * exp(-r*t)*cnd(d2);
} else {
res += x * exp(-r*t) * cnd(-d2) - market[sample] * cnd(-d1);
}
}
t += d_t;
prices[iter] = res / samples;
}
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);
if (bp.args.has_error) {
return 0;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
double* market = model(samples); // Generate pseudo-market data
double* prices = (double*)malloc(sizeof(double)*iterations); // Prices
bp.timer_start();
pricing(
market, prices,
samples, iterations,
'c', 65.0, 1.0 / 365.0,
0.08, 0.3
);
bp.timer_stop();
bp.print("black_scholes(c99_omp)");
if (bp.args.verbose) { // and values.
printf("output: [ ");
for(int i=0; i<iterations; ++i) {
printf("%f ", prices[i]);
}
printf("]\n");
}
free(market);
free(prices);
return 0;
}
C99 Seq¶#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <Random123/philox.h>
#include <bp_util.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double* model(int64_t samples)
{
const uint64_t key = 0;
double* data = (double*)malloc(sizeof(double)*samples);
for(int64_t count = 0; count<samples; ++count) {
philox2x32_as_1x64_t counter;
counter.combined = count;
uint64_t x_raw = ((philox2x32_as_1x64_t)philox2x32(
counter.orig,
(philox2x32_key_t){ { key } }
)).combined;
double x = x_raw;
x /= 18446744073709551616.000000;
x *= 4.0;
x += 58.0; // Model between 58-62
data[count] = x;
}
return data;
}
// The cumulative normal distribution function
double cnd( double x )
{
double L, K, w;
double const a1 = 0.31938153,
a2 = -0.356563782,
a3 = 1.781477937,
a4 = -1.821255978,
a5 = 1.330274429;
L = fabs(x);
K = 1.0 / (1.0 + 0.2316419 * L);
double K2 = K*K;
double K4 = K2*K2;
w = 1.0 - 1.0 / sqrt(2 * M_PI) * exp(-L *L / 2) * (\
a1 * K + \
a2 * K2 + \
a3 * K*K2 + \
a4 * K4 + \
a5 * K*K4
);
return (x<0) ? 1.0 - w : w;
}
// The Black and Scholes (1973) Stock option formula
void pricing(double* market, double *prices,
size_t samples, size_t iterations,
char flag, double x, double d_t, double r, double v)
{
double t = d_t;
for(size_t iter=0; iter<iterations; ++iter) {
double res = 0;
for(size_t sample=0; sample<samples; ++sample) {
double d1 = (log(market[sample]/x) + (r+v*v/2)*t) / (v*sqrt(t));
double d2 = d1-v*sqrt(t);
if (flag == 'c') {
res += market[sample] *cnd(d1)-x * exp(-r*t)*cnd(d2);
} else {
res += x * exp(-r*t) * cnd(-d2) - market[sample] * cnd(-d1);
}
}
t += d_t;
prices[iter] = res / samples;
}
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);
if (bp.args.has_error) {
return 0;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
double* market = model(samples); // Generate pseudo-market data
double* prices = (double*)malloc(sizeof(double)*iterations); // Prices
bp.timer_start();
pricing(
market, prices,
samples, iterations,
'c', 65.0, 1.0 / 365.0,
0.08, 0.3
);
bp.timer_stop();
bp.print("black_scholes(c99_seq)");
if (bp.args.verbose) { // and values.
printf("output: [ ");
for(int i=0; i<iterations; ++i) {
printf("%f ", prices[i]);
}
printf("]\n");
}
free(market);
free(prices);
return 0;
}
Cpp11 Armadillo¶/*
This file is part of Bohrium and Copyright (c) 2012 the Bohrium team:
http://bohrium.bitbucket.org
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#include <iostream>
#include <armadillo>
#include <bp_util.h>
using namespace std;
using namespace arma;
template <typename T>
Col<T> cnd(Col<T> x)
{
size_t samples = x.n_elem;
Col<T> l(samples), k(samples), w(samples);
T a1 = 0.31938153,
a2 =-0.356563782,
a3 = 1.781477937,
a4 =-1.821255978,
a5 = 1.330274429,
pp = 2.5066282746310002; // sqrt(2.0*PI)
l = abs(x);
k = 1.0 / (1.0 + 0.2316419 * l);
w = 1.0 - 1.0 / pp * exp(-1.0*l%l/2.0) % \
(a1*k + \
a2*(pow(k,(T)2)) + \
a3*(pow(k,(T)3)) + \
a4*(pow(k,(T)4)) + \
a5*(pow(k,(T)5)));
uvec mask = x < 0.0;
return w % (-mask) + (1.0-w) % mask;
}
template <typename T>
T* pricing(size_t samples, size_t iterations, char flag, T x, T d_t, T r, T v)
{
T* p = (T*)malloc(sizeof(T)*samples); // Intermediate results
T t = d_t; // Initial delta
Col<T> d1(samples), d2(samples), res(samples);
Col<T> s = randu<Col<T> >(samples)*4.0 +58.0; // Model between 58-62
for(size_t i=0; i<iterations; i++) {
d1 = (log(s/x) + (r+v*v/2.0)*t) / (v*sqrt(t));
d2 = d1-v*sqrt(t);
if (flag == 'c') {
res = s % cnd<T>(d1) -x * exp(-r*t) * cnd<T>(d2);
} else {
res = x * exp(-r*t) * cnd<T>(-1.0*d2) - s*cnd<T>(-1.0*d1);
}
t += d_t; // Increment delta
p[i] = sum(res) / (T)samples; // Result from timestep
}
return p;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);
if (bp.args.has_error) {
return 1;
}
const size_t samples = bp.args.sizes[0];
const size_t iterations = bp.args.sizes[1];
bp.timer_start();
double* prices = pricing(
samples, iterations,
'c', 65.0, 1.0 / 365.0,
0.08, 0.3
);
bp.timer_stop();
bp.print("black_scholes(cpp11_armadillo)");
if (bp.args.verbose) {
cout << ", \"output\": [";
for(size_t i=0; i<iterations; i++) {
cout << prices[i];
if (iterations-1!=i) {
cout << ", ";
}
}
cout << "]" << endl;
}
free(prices);
return 0;
}
Cpp11 Blitz¶#include <blitz/array.h>
#include <random/uniform.h>
#include <bp_util.h>
using namespace blitz;
using namespace ranlib;
template <typename T>
Array<T, 1> cnd(Array<T, 1> & x)
{
int samples = x.numElements();
Array<T, 1> l(samples), k(samples), w(samples), res(samples);
Array<bool, 1> mask(samples);
T a1 = 0.31938153,
a2 =-0.356563782,
a3 = 1.781477937,
a4 =-1.821255978,
a5 = 1.330274429,
pp = 2.5066282746310002; // sqrt(2.0*PI)
l = abs(x);
k = 1.0 / (1.0 + 0.2316419 * l);
w = 1.0 - 1.0 / pp * exp(-1.0*l*l/2.0) * \
(a1*k + \
a2*(pow(k,(T)2)) + \
a3*(pow(k,(T)3)) + \
a4*(pow(k,(T)4)) + \
a5*(pow(k,(T)5)));
mask = x < 0.0;
res = (w * cast<T>(!mask) + (1.0-w)* cast<T>(mask));
return res;
}
template <typename T>
T* pricing(size_t samples, size_t iterations, char flag, T x, T d_t, T r, T v)
{
Array<T, 1> d1(samples), d2(samples), res(samples);
T* p = (T*)malloc(sizeof(T)*samples); // Intermediate results
T t = d_t; // Initial delta
Array<T, 1> s(samples); // Initial uniform sampling
Uniform<T> rand; // values between 58-62
rand.seed((unsigned int)time(0));
s = rand.random() *4.0 +58.0;
for(size_t i=0; i<iterations; i++) {
d1 = (log(s/x) + (r+v*v/2.0)*t) / (v*sqrt(t));
d2 = d1-v*sqrt(t);
if (flag == 'c') {
res = s * cnd(d1) - x * exp(-r * t) * cnd(d2);
} else {
Array<T, 1> tmp1(samples), tmp2(samples);
tmp1 = -1.0*d2;
tmp2 = -1.0*d1;
res = x * exp(-r*t) * cnd(tmp1) - s*cnd(tmp2);
}
t += d_t; // Increment delta
p[i] = sum(res) / (T)samples; // Result from timestep
}
return p;
}
//FLOP count: 2*s+i*(s*8+2*s*23) where s is samples and i is iterations
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);
if (bp.args.has_error) {
return 1;
}
const size_t samples = bp.args.sizes[0];
const size_t iterations = bp.args.sizes[1];
bp.timer_start();
double* prices = pricing(
samples, iterations,
'c', 65.0, 1.0 / 365.0,
0.08, 0.3
);
bp.timer_stop();
bp.print("black_scholes(cpp11_blitz)");
if (bp.args.verbose) {
cout << ", \"output\": [";
for(size_t i=0; i<iterations; i++) {
cout << prices[i];
if (iterations-1!=i) {
cout << ", ";
}
}
cout << "]" << endl;
}
free(prices);
return 0;
}
Cpp11 Eigen¶Error There are issues with the implementation. Compilation errors. /*
This file is part of Bohrium and Copyright (c) 2012 the Bohrium team:
http://bohrium.bitbucket.org
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#include <iostream>
#include <Eigen/Dense>
#include <bp_util.h>
using namespace std;
using namespace Eigen;
template <typename T>
Array<T, Dynamic, 1>& cnd(const Array<T, Dynamic, 1>& x)
{
size_t samples = x.size();
Array<T, Dynamic, 1> l(samples), k(samples), w(samples);
Array<T, Dynamic, 1> mask(samples);
Array<bool, Dynamic, 1> mask_bool(samples);
T a1 = 0.31938153,
a2 =-0.356563782,
a3 = 1.781477937,
a4 =-1.821255978,
a5 = 1.330274429,
pp = 2.5066282746310002; // sqrt(2.0*PI)
l = abs(x);
//k = (T)1.0 / ((T)1.0 + (T)0.2316419 * l);
//k = (T)0.5 / l / (T)0.5;
k = l / (T)0.5;
w = (T)1.0 - (T)1.0 / (T)pp * exp((T)-(T)1.0*l*l/(T)2.0) * \
((T)a1*k + \
(T)a2*(pow(k,(T)2)) + \
(T)a3*(pow(k,(T)3)) + \
(T)a4*(pow(k,(T)4)) + \
(T)a5*(pow(k,(T)5)));
mask_bool= (x < (T)0.0);
mask = mask_bool.cast<T>();
return w * (mask) + ((T)1.0-w);
//return w * (mask) + ((T)1.0-w)* mask;
//return w * (!mask) + (1.0-w)* mask;
}
template <typename T>
T* pricing(size_t samples, size_t iterations, char flag, T x, T d_t, T r, T v)
{
T* p = (T*)malloc(sizeof(T)*samples); // Intermediate results
T t = d_t; // Initial delta
Array<T, Dynamic, 1> d1(samples), d2(samples), res(samples);
Array<T, Dynamic, 1> s = Array<T, Dynamic, 1>::Random(samples)*4.0 +58.0; // Model between 58-62
for(size_t i=0; i<iterations; i++) {
d1 = (log(s/x) + (r+v*v/2.0)*t) / (v*sqrt(t));
d2 = d1-v*sqrt(t);
if (flag == 'c') {
size_t samples = x.size();
Array<T, Dynamic, 1> l(samples), k(samples), w(samples);
Array<T, Dynamic, 1> mask(samples);
Array<bool, Dynamic, 1> mask_bool(samples);
T a1 = 0.31938153,
a2 =-0.356563782,
a3 = 1.781477937,
a4 =-1.821255978,
a5 = 1.330274429,
pp = 2.5066282746310002; // sqrt(2.0*PI)
l = abs(x);
//k = (T)1.0 / ((T)1.0 + (T)0.2316419 * l);
//k = (T)0.5 / l / (T)0.5;
k = l / (T)0.5;
w = (T)1.0 - (T)1.0 / (T)pp * exp((T)-(T)1.0*l*l/(T)2.0) * \
((T)a1*k + \
(T)a2*(pow(k,(T)2)) + \
(T)a3*(pow(k,(T)3)) + \
(T)a4*(pow(k,(T)4)) + \
(T)a5*(pow(k,(T)5)));
mask_bool= (x < (T)0.0);
mask = mask_bool.cast<T>();
res = w * (!mask) + (1.0-w)* mask;
//cnd<T>(d1);
//res = s * cnd<T>(d1) -x * exp(-r*t) * cnd<T>(d2);
} else {
/*
Array<T, Dynamic, 1> tmp1(samples), tmp2(samples);
tmp1 = -1.0*d2;
tmp2 = -1.0*d1;
res = x * exp(-r*t) * cnd<T>(tmp1) - s*cnd<T>(tmp2);
*/
}
t += d_t; // Increment delta
p[i] = res.sum() / (T)samples; // Result from timestep
}
return p;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2); // Grab arguments
if (bp.args.has_error) {
return 1;
}
const size_t samples = bp.args.sizes[0];
const size_t iterations = bp.args.sizes[1];
bp.timer_start(); // Start timer
double* prices = pricing( // Run...
samples, iterations,
'c', 65.0, 1.0 / 365.0,
0.08, 0.3
);
bp.timer_stop(); // Stop timer
bp.print("black_scholes(cpp11_bxx)"); // Print restults..
if (bp.args.verbose) { // ..verbosely.
cout << ", \"output\": [";
for(size_t i=0; i<iterations; i++) {
cout << prices[i];
if (iterations-1!=i) {
cout << ", ";
}
}
cout << "]" << endl;
}
free(prices); // Cleanup
return 0;
}
Cpp11 Omp¶#include <iostream>
#include <sstream>
#include <string>
#include <cmath>
#include <Random123/philox.h>
#include <bp_util.h>
using namespace std;
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double* model(int64_t samples)
{
const uint64_t key = 0;
double* data = (double*)malloc(sizeof(double)*samples);
#pragma omp parallel for
for(int64_t count = 0; count<samples; ++count) {
philox2x32_as_1x64_t counter;
counter.combined = count;
philox2x32_as_1x64_t x_philox;
x_philox.orig = philox2x32(
counter.orig,
(philox2x32_key_t){ { key } }
);
double x = x_philox.combined;
x /= 18446744073709551616.000000;
x *= 4.0;
x += 58.0; // Model between 58-62
data[count] = x;
}
return data;
}
// The cumulative normal distribution function
double cnd( double x )
{
double L, K, w;
double const a1 = 0.31938153,
a2 = -0.356563782,
a3 = 1.781477937,
a4 = -1.821255978,
a5 = 1.330274429;
L = fabs(x);
K = 1.0 / (1.0 + 0.2316419 * L);
w = 1.0 - 1.0 / sqrt(2 * M_PI) * exp(-L *L / 2) * (\
a1 * K + \
a2 * pow(K, 2) + \
a3 * pow(K, 3) + \
a4 * pow(K, 4) + \
a5 * pow(K, 5)
);
return (x<0) ? 1.0 - w : w;
}
// The Black and Scholes (1973) Stock option formula
void pricing(double* market, double *prices,
size_t samples, size_t iterations,
char flag, double x, double d_t, double r, double v)
{
double t = d_t;
for(size_t iter=0; iter<iterations; ++iter) {
double res = 0;
#pragma omp parallel for reduction(+:res)
for(size_t sample=0; sample<samples; ++sample) {
double d1 = (log(market[sample]/x) + (r+v*v/2)*t) / (v*sqrt(t));
double d2 = d1-v*sqrt(t);
if (flag == 'c') {
res += market[sample] *cnd(d1)-x * exp(-r*t)*cnd(d2);
} else {
res += x * exp(-r*t) * cnd(-d2) - market[sample] * cnd(-d1);
}
}
t += d_t;
prices[iter] = res / samples;
}
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);
if (bp.args.has_error) {
return 0;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
double* market = model(samples); // Generate pseudo-market data
double* prices = (double*)malloc(sizeof(double)*iterations); // Prices
bp.timer_start();
pricing(
market, prices,
samples, iterations,
'c', 65.0, 1.0 / 365.0,
0.08, 0.3
);
bp.timer_stop();
bp.print("black_scholes(cpp11_omp)");
if (bp.args.verbose) { // and values.
printf("output: [ ");
for(int i=0; i<iterations; ++i) {
printf("%f ", prices[i]);
}
printf("]\n");
}
free(market);
free(prices);
return 0;
}
Cpp11 Seq¶#include <iostream>
#include <sstream>
#include <string>
#include <cmath>
#include <Random123/philox.h>
#include <bp_util.h>
using namespace std;
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double* model(int64_t samples)
{
const uint64_t key = 0;
double* data = (double*)malloc(sizeof(double)*samples);
for(int64_t count = 0; count<samples; ++count) {
philox2x32_as_1x64_t counter;
counter.combined = count;
philox2x32_as_1x64_t x_philox;
x_philox.orig = philox2x32(
counter.orig,
(philox2x32_key_t){ { key } }
);
double x = x_philox.combined;
x /= 18446744073709551616.000000;
x *= 4.0;
x += 58.0; // Model between 58-62
data[count] = x;
}
return data;
}
// The cumulative normal distribution function
double cnd( double x )
{
double L, K, w;
double const a1 = 0.31938153,
a2 = -0.356563782,
a3 = 1.781477937,
a4 = -1.821255978,
a5 = 1.330274429;
L = fabs(x);
K = 1.0 / (1.0 + 0.2316419 * L);
w = 1.0 - 1.0 / sqrt(2 * M_PI) * exp(-L *L / 2) * (\
a1 * K + \
a2 * pow(K, 2) + \
a3 * pow(K, 3) + \
a4 * pow(K, 4) + \
a5 * pow(K, 5)
);
return (x<0) ? 1.0 - w : w;
}
// The Black and Scholes (1973) Stock option formula
void pricing(double* market, double *prices,
size_t samples, size_t iterations,
char flag, double x, double d_t, double r, double v)
{
double t = d_t;
for(size_t iter=0; iter<iterations; ++iter) {
double res = 0;
for(size_t sample=0; sample<samples; ++sample) {
double d1 = (log(market[sample]/x) + (r+v*v/2)*t) / (v*sqrt(t));
double d2 = d1-v*sqrt(t);
if (flag == 'c') {
res += market[sample] *cnd(d1)-x * exp(-r*t)*cnd(d2);
} else {
res += x * exp(-r*t) * cnd(-d2) - market[sample] * cnd(-d1);
}
}
t += d_t;
prices[iter] = res / samples;
}
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);
if (bp.args.has_error) {
return 0;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
double* market = model(samples); // Generate pseudo-market data
double* prices = (double*)malloc(sizeof(double)*iterations); // Prices
bp.timer_start();
pricing(
market, prices,
samples, iterations,
'c', 65.0, 1.0 / 365.0,
0.08, 0.3
);
bp.timer_stop();
bp.print("black_scholes(cpp11_seq)");
if (bp.args.verbose) { // and values.
printf("output: [ ");
for(int i=0; i<iterations; ++i) {
printf("%f ", prices[i]);
}
printf("]\n");
}
free(market);
free(prices);
return 0;
}
Csharp Numcil¶#region Copyright
/*
This file is part of Bohrium and copyright (c) 2012 the Bohrium:
team <http://www.bh107.org>.
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with Bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#endregion
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using R = NumCIL.Range;
namespace blackscholes
{
using NumCIL.Double;
using DATA = System.Double;
public static class BlackScholesSolverDouble
{
private static NdArray CND(NdArray X)
{
DATA a1 = 0.31938153f, a2 = -0.356563782f, a3 = 1.781477937f, a4 = -1.821255978f, a5 = 1.330274429f;
var L = X.Abs();
var K = 1.0f / (1.0f + 0.2316419f * L);
var w = 1.0f - 1.0f / ((DATA)Math.Sqrt(2 * Math.PI)) * (-L * L / 2.0f).Exp() * (a1 * K + a2 * (K.Pow(2)) + a3 * (K.Pow(3)) + a4 * (K.Pow(4)) + a5 * (K.Pow(5)));
var mask1 = (NdArray)(X < 0);
var mask2 = (NdArray)(X >= 0);
w = w * mask2 + (1.0f - w) * mask1;
return w;
}
private static NdArray BlackScholes(bool callputflag, NdArray S, DATA X, DATA T, DATA r, DATA v)
{
var d1 = ((S / X).Log() + (r + v * v / 2.0f) * T) / (v * (DATA)Math.Sqrt(T));
var d2 = d1 - v * (DATA)Math.Sqrt(T);
if (callputflag)
return S * CND(d1) - X * (DATA)Math.Exp(-r * T) * CND(d2);
else
return X * (DATA)Math.Exp(-r * T) * CND(-d2) - S * CND(-d1);
}
public static NdArray Create(long size)
{
var S = Generate.Random(size);
S = S * 4.0f - 2.0f + 60.0f; //Price is 58-62
return S;
}
public static DATA Sync(NdArray data)
{
return data.Value[0];
}
public static DATA Solve(NdArray data, long years)
{
DATA X = 65.0f;
DATA r = 0.08f;
DATA v = 0.3f;
DATA day=1.0f/years;
DATA T = day;
DATA total = 0.0f;
for (long i = 0; i < years; i++)
{
total += Add.Reduce(BlackScholes(true, data, X, T, r, v)).Value[0] / data.Shape.Length;
T += day;
}
return total;
}
}
}
namespace blackscholes
{
using NumCIL.Float;
using DATA = System.Single;
public static class BlackScholesSolverSingle
{
private static NdArray CND(NdArray X)
{
DATA a1 = 0.31938153f, a2 = -0.356563782f, a3 = 1.781477937f, a4 = -1.821255978f, a5 = 1.330274429f;
var L = X.Abs();
var K = 1.0f / (1.0f + 0.2316419f * L);
var w = 1.0f - 1.0f / ((DATA)Math.Sqrt(2 * Math.PI)) * (-L * L / 2.0f).Exp() * (a1 * K + a2 * (K.Pow(2)) + a3 * (K.Pow(3)) + a4 * (K.Pow(4)) + a5 * (K.Pow(5)));
var mask1 = (NdArray)(X < 0);
var mask2 = (NdArray)(X >= 0);
w = w * mask2 + (1.0f - w) * mask1;
return w;
}
private static NdArray BlackScholes(bool callputflag, NdArray S, DATA X, DATA T, DATA r, DATA v)
{
var d1 = ((S / X).Log() + (r + v * v / 2.0f) * T) / (v * (DATA)Math.Sqrt(T));
var d2 = d1 - v * (DATA)Math.Sqrt(T);
if (callputflag)
return S * CND(d1) - X * (DATA)Math.Exp(-r * T) * CND(d2);
else
return X * (DATA)Math.Exp(-r * T) * CND(-d2) - S * CND(-d1);
}
public static NdArray Create(long size)
{
var S = Generate.Random(size);
S = S * 4.0f - 2.0f + 60.0f; //Price is 58-62
return S;
}
public static DATA Sync(NdArray data)
{
return data.Value[0];
}
public static DATA Solve(NdArray data, long years)
{
DATA X = 65.0f;
DATA r = 0.08f;
DATA v = 0.3f;
DATA day=1.0f/years;
DATA T = day;
DATA total = 0.0f;
for (long i = 0; i < years; i++)
{
total += Add.Reduce(BlackScholes(true, data, X, T, r, v)).Value[0] / data.Shape.Length;
T += day;
}
return total;
}
}
}
Convolve¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
# if Bohrium is installed we use its convolve else we use the convolve from SciPy
try:
raise ImportError
import bohrium as bh
convolveNd = bh.convolve_scipy
except ImportError:
import scipy
from scipy import signal
convolveNd = signal.convolve
bench = util.Benchmark("Convolution Filter (any dimensional)", "<image-size>*<filter-size>*<ndims>*<niters>")
def main():
"""
Convolve filter (any dimensional)
Parameter: `<image-size>*<filter-size>*<ndims>*<niters>`
where image and filter size is the size of each dimension (not their total size).
"""
(image_size, filter_size, ndims, I) = bench.args.size
image = bench.random_array((image_size ** ndims,)).reshape([image_size] * ndims)
image_filter = bench.random_array((filter_size ** ndims,)).reshape([filter_size] * ndims)
bench.start()
for _ in range(I):
R = convolveNd(image, image_filter)
bench.flush()
bench.stop()
bench.save_data({'res': R})
bench.pprint()
if __name__ == "__main__":
main()
Convolve1D¶Python Numpy¶"""
Convolve filter (1. dimensional)
Parameter: `--size<image-size>*<filter-size>*<niters>`.
"""
from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Convolution Filter 1D", "<image-size>*<filter-size>*<niters>")
def main():
(image_size, filter_size, I) = bench.args.size
image = bench.random_array((image_size,))
image_filter = bench.random_array((filter_size,))
bench.start()
for _ in range(I):
R = np.convolve(image, image_filter)
bench.flush()
bench.stop()
bench.save_data({'res': R})
bench.pprint()
if __name__ == "__main__":
main()
Galton Bean Machine¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Galton Bean Machine", "<num_of_beans>*<height>")
def bean(num_beans, height):
return np.sum(np.sign(bench.random_array((num_beans, height)) - 0.5), axis=1)
def main():
num_beans, height = bench.args.size
bench.start()
R = bean(num_beans, height)
bench.stop()
bench.save_data({'res': R})
bench.pprint()
if bench.args.visualize:
from matplotlib import pyplot
bins = 100
pyplot.hist(R, bins)
pyplot.title("Galton Normal distribution")
pyplot.xlabel("Value")
pyplot.ylabel("Frequency")
pyplot.show()
if __name__ == "__main__":
main()
Gameoflife¶Python Numpy¶from __future__ import print_function
"""
Game of Life
------------
"""
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Game of Life", "height*width*iterations*rules")
SURVIVE_LOW = 2
SURVIVE_HIGH = 3
SPAWN = 3
def world(height, width):
state = np.ones((height + 2, width + 2), dtype=bench.dtype)
state[2:-2, 2:-2] = bench.dtype(0)
return state
def world_zeros(height, width):
state = np.zeros((height + 2, width + 2), dtype=bench.dtype)
return state
def play(state, iterations, version=1, visualize=False):
cells = state[1:-1, 1:-1]
ul = state[0:-2, 0:-2]
um = state[0:-2, 1:-1]
ur = state[0:-2, 2:]
ml = state[1:-1, 0:-2]
mr = state[1:-1, 2:]
ll = state[2:, 0:-2]
lm = state[2:, 1:-1]
lr = state[2:, 2:]
def update():
"""
This is the first implementation of the game rules.
"""
neighbors = ul + um + ur + ml + mr + ll + lm + lr # count neighbors
live = neighbors * cells # extract live cells neighbors
stay = (live >= SURVIVE_LOW) & (live <= SURVIVE_HIGH) # find cells the stay alive
dead = neighbors * (cells == 0) # extract dead cell neighbors
spawn = dead == SPAWN # find cells that spaw new life
cells[:] = stay | spawn # save result for next iteration
def update_optimized():
"""
This is an optimized implementation of the game rules.
"""
neighbors = ul + um + ur + ml + mr + ll + lm + lr # Count neighbors
c1 = (neighbors == SURVIVE_LOW) # Life conditions
c2 = (neighbors == SPAWN)
cells[:] = cells * c1 + c2 # Update
if version == 1: # Select the update function
update_func = update
elif version == 2:
update_func = update_optimized
for i in range(iterations): # Run the game
if visualize:
bench.plot_surface(state, "3d", 16, 1, 0)
update_func()
bench.flush()
return state
def main():
(H, W, I, V) = bench.args.size
if V not in [1, 2]:
raise Exception("Unsupported rule-implementation.")
S = world(H, W)
bench.start()
R = play(S, I, V, bench.args.visualize)
bench.stop()
bench.save_data({'res': R})
bench.pprint()
if __name__ == "__main__":
main()
C99 Seq¶Error There are issues with the implementation. In progress… #include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <bp_util.h>
void world(double* state, int height, int width)
{
for (int h=0; h<height; ++h) {
for (int w=0; w<width; ++w) {
state[h*width + w] = (((h < 2) && (h > (height-3))) ||
((w < 2) && (w > (width-3))));
}
}
}
void play(double* state, int height, int width, int iterations, int version)
{
printf("%p %d %d %d %d\n", state, height, width, iterations, version);
}
int main (int argc, char **argv)
{
bp_util_type bp = bp_util_create(argc, argv, 4); // Grab arguments
if (bp.args.has_error) {
return 1;
}
const int height = bp.args.sizes[0];
const int width = bp.args.sizes[1];
const int iter = bp.args.sizes[2];
const int version = bp.args.sizes[3];
size_t grid_size = height*width*sizeof(double);
double *state = (double*)malloc(grid_size);
world(state, height, width);
bp.timer_start();
play(state, height, width, iter, version);
bp.timer_stop();
bp.print("gameoflife(c99_seq)");
free(state);
return 0;
}
Cpp11 Omp¶Error There are issues with the implementation. In progress… #include <iostream>
#include <iomanip>
#include <bp_util.h>
#define HEIGHT 1000
#define WIDTH 1000
static double SURVIVE_LOW = 2.0;
static double SURVIVE_HIGH = 3.0;
static double SPAWN = 3.0;
using namespace std;
template <typename T>
void world(T state)
{
#pragma omp parallel for collapse(2)
for(int h=0; h<HEIGHT+2; ++h) {
for(int w=0; w<WIDTH+2; ++w) {
state[h][w] = (((h<2) and (h>WIDTH-3)) &&
((w<2) and (w>HEIGHT-3)));
}
}
}
template <typename T>
void play(T state, int iterations, int version, int visualize)
{
cout << state << SURVIVE_LOW << SURVIVE_HIGH << SPAWN << endl;
cout << state << iterations << version << visualize << endl;
T* cells, ul, um, ur, ml, mr, ll, lm, lr;
}
template <typename T>
void bench(bp_util_type& bp, const int I, const int V)
{
auto state = new T[HEIGHT][WIDTH]; // Construct matrices
world(state);
bp.timer_start(); // Start timer
play(state, I, V, bp.args.visualize);
bp.timer_stop(); // Stop timer
bp.print("gameoflife(cpp11_omp)"); // Print results..
if (bp.args.verbose) { // ..and value.
cout << fixed << setprecision(10)
<< "Result = " << endl;
}
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 4);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int W = bp.args.sizes[0];
const int H = bp.args.sizes[1];
const int I = bp.args.sizes[2];
const int V = bp.args.sizes[3];
if (HEIGHT != H) {
cout << "Multidimensional arrays in C11 does not support dynamic size, so it has to be: " << H << "." << endl;
return 0;
}
if (WIDTH != W) {
cout << "Multidimensional arrays in C11 does not support dynamic size, so it has to be: " << W << "." << endl;
return 0;
}
bench<double>(bp, I, V);
return 0;
}
Gauss¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Gaussian elimination on matrix a without pivoting", "<size>")
def gauss(a):
"""
Perform Gaussian elimination on matrix a without pivoting
"""
for c in range(1, a.shape[0]):
a[c:, c - 1:] = a[c:, c - 1:] - (a[c:, c - 1] / a[c - 1, c - 1:c])[:, None] * a[c - 1, c - 1:]
bench.flush()
a /= np.diagonal(a)[:, None]
return a
def main():
n = bench.args.size[0]
matrix = bench.load_data()
if matrix is not None:
matrix = matrix['matrix']
else:
matrix = bench.random_array((n, n))
bench.start()
res = gauss(matrix)
bench.stop()
bench.save_data({'matrix': res})
bench.pprint()
if __name__ == "__main__":
main()
Heat Equation¶Python Numpy¶from __future__ import print_function
import numpy as np
from benchpress.benchmarks import util
bench = util.Benchmark("Solving the heat equation using the jacobi method", "height*width*iterations")
def init_grid(height, width, dtype=np.float32):
grid = np.zeros((height + 2, width + 2), dtype=dtype)
grid[:, 0] = dtype(-273.15)
grid[:, -1] = dtype(-273.15)
grid[-1, :] = dtype(-273.15)
grid[0, :] = dtype(40.0)
return grid
def jacobi(grid, epsilon=0.005, max_iterations=None):
def loop_body(grid):
center = grid[1:-1, 1:-1]
north = grid[0:-2, 1:-1]
east = grid[1:-1, 2:]
west = grid[1:-1, 0:-2]
south = grid[2:, 1:-1]
work = 0.2 * (center + north + east + west + south)
delta = np.sum(np.absolute(work - center))
center[:] = work
bench.plot_surface(grid, "2d", 0, 200, -200)
return delta > epsilon
bench.do_while(loop_body, max_iterations, grid)
return grid
def main():
H = bench.args.size[0]
W = bench.args.size[1]
I = bench.args.size[2]
grid = bench.load_data()
if grid is not None:
grid = grid['grid']
else:
grid = init_grid(H, W, dtype=bench.dtype)
bench.start()
grid = jacobi(grid, max_iterations=I)
bench.stop()
bench.save_data({'grid': grid})
bench.pprint()
if __name__ == "__main__":
main()
C99 Omp¶#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <bp_util.h>
/*
Without collapsing
void solve(int height, int width, double *grid, double epsilon, int max_iterations)
{
double *T = (double*)malloc(height*width*sizeof(double));
double delta = epsilon+1.0;
int iterations = 0;
while(delta>epsilon) {
++iterations;
#pragma omp parallel for shared(grid, T) reduction(+:delta)
for(int i=0; i<height-2; ++i)
{
int a = i * width;
const double *up = &grid[a+1];
const double *left = &grid[a+width];
const double *right = &grid[a+width+2];
const double *down = &grid[a+1+width*2];
const double *center = &grid[a+width+1];
double *t_center = &T[a+width+1];
double delta_local = 0;
for(int j=0; j<width-2; ++j)
{
*t_center = (*center + *up + *left + *right + *down) * 0.2;
delta_local += fabs(*t_center - *center);
center++;up++;left++;right++;down++;t_center++;
}
delta += delta_local;
}
#pragma omp parallel for shared(grid, T)
for(int i=0; i<height-2; ++i)
{
int a = i * width;
const double *center = &grid[a+width+1];
double *t_center = &T[a+width+1];
for(int j=0; j<width-2; ++j)
{
*t_center = *center;
++t_center;
++center;
}
}
if (iterations>=max_iterations) {
break;
}
}
free(T);
}*/
void solve(int height, int width, double *grid, double epsilon, int max_iterations)
{
double *T = (double*)malloc(height*width*sizeof(double));
double delta = epsilon+1.0;
int iterations = 0;
while (delta>epsilon) {
++iterations;
#pragma omp parallel for reduction(+:delta) collapse(2)
for (int i=1; i<height-1; ++i) {
for (int j=1; j<width-1; ++j) {
const int a = i * width + j;
const double *up = &grid[a-width];
const double *left = &grid[a-1];
const double *right = &grid[a+1];
const double *down = &grid[a+width];
const double *center = &grid[a];
double *t_center = &T[a];
*t_center = (*up + *down + *center + *left + *right) * 0.2;
delta += fabs(*t_center - *center);
}
}
#pragma omp parallel for collapse(2)
for (int i=1; i<height-1; ++i) {
for (int j=1; j<width-1; ++j) {
const int a = i * width + j;
const double *center = &grid[a];
double *t_center = &T[a];
*t_center = *center;
}
}
if (iterations>=max_iterations) {
break;
}
}
free(T);
}
int main (int argc, char **argv)
{
bp_util_type bp = bp_util_create(argc, argv, 3);
if (bp.args.has_error) {
return 1;
}
const int height = bp.args.sizes[0];
const int width = bp.args.sizes[1];
const int iter = bp.args.sizes[2];
double epsilon = 0.005;
size_t grid_size = height*width*sizeof(double);
double *grid = (double*)malloc(grid_size);
//
// NumaEffects - begin
//
// memset(grid, 0, grid_size); // <--- bad idea.
// memset is sequentiel and will touch the entire
// grid on one numa-node.
//
// Instead of memset, parallel initialization
// is performed with the following loop construct:
//
double* grid_i = grid;
#pragma omp parallel for collapse(2)
for (int i=0; i<height; ++i) {
for (int j=0; j<width; ++j) {
*grid_i = 0;
++grid_i;
}
}
//
// NumaEffects - end
for (int j=0; j<height; j++) {
grid[j*width] = -273.15;
grid[j*width+width-1] = -273.15;
}
for (int j=0; j<width; j++) {
grid[j+(height-1)*width] = -273.15;
grid[j] = 40.0;
}
bp.timer_start();
solve(height, width, grid, epsilon, iter);
bp.timer_stop();
bp.print("heat_equation(c99_omp)");
free(grid);
return 0;
}
C99 Omp Mpi¶#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <bp_util.h>
inline double innerloop(const double* grid, double* T, int width, int i)
{
int a = i * width;
const double *up = &grid[a+1];
const double *left = &grid[a+width];
const double *right = &grid[a+width+2];
const double *down = &grid[a+1+width*2];
const double *center = &grid[a+width+1];
double *t_center = &T[a+width+1];
double delta = 0;
for(int j=0; j<width-2; ++j)
{
*t_center = (*center + *up + *left + *right + *down) * 0.2;
delta += fabs(*t_center - *center);
center++;up++;left++;right++;down++;t_center++;
}
return delta;
}
void openmp(int height, int width, double *grid, int iter)
{
double *T = (double*)malloc(height*width*sizeof(double));
memcpy(T, grid, height*width*sizeof(double));
for(int n=0; n<iter; ++n)
{
double delta=0;
#pragma omp parallel for shared(grid,T) reduction(+:delta)
for(int i=0; i<height-2; ++i)
{
delta += innerloop(grid, T, width, i);
}
memcpy(grid, T, height*width*sizeof(double));
}
free(T);
}
void openmp_mpi(int height, int width, double *grid, int iter)
{
int myrank, worldsize;
MPI_Comm comm;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
int periods[] = {0};
MPI_Cart_create(MPI_COMM_WORLD, 1, &worldsize,
periods, 1, &comm);
double *T = (double*)malloc(height*width*sizeof(double));
memcpy(T, grid, height*width*sizeof(double));
for(int n=0; n<iter; ++n)
{
int p_src, p_dest;
//Send/receive - neighbor above
MPI_Cart_shift(comm,0,1,&p_src,&p_dest);
MPI_Sendrecv(grid+width,width,MPI_DOUBLE,
p_dest,1,
grid,width, MPI_DOUBLE,
p_src,1,comm,MPI_STATUS_IGNORE);
//Send/receive - neighbor below
MPI_Cart_shift(comm,0,-1,&p_src,&p_dest);
MPI_Sendrecv(grid+(height-2)*width, width,MPI_DOUBLE,
p_dest,1,
grid+(height-1)*width,
width,MPI_DOUBLE,
p_src,1,comm,MPI_STATUS_IGNORE);
double delta=0, global_delta;
#pragma omp parallel for shared(grid,T) reduction(+:delta)
for(int i=0; i<height-2; ++i)
{
delta += innerloop(grid, T, width, i);
}
memcpy(grid, T, height*width*sizeof(double));
MPI_Allreduce(&global_delta, &delta, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
}
free(T);
}
int main (int argc, char **argv)
{
MPI_Init(&argc,&argv);
int myrank, worldsize;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
bp_util_type bp = bp_util_create(argc, argv, 3);
if (bp.args.has_error) {
return 1;
}
int height = bp.args.sizes[0];
const int width = bp.args.sizes[1];
const int iter = bp.args.sizes[2];
//Local vertical size. NB: the horizontal size is always the full grid including borders
height = (width-2) / worldsize;
if(myrank == worldsize-1)
height += (width-2) % worldsize;
height += 2;//Add a ghost line above and below
double *grid = malloc(height*width*sizeof(double));
for(int j=0; j<width;j++)
{
if(myrank == 0)
grid[j] = 40.0;
if(myrank == worldsize-1)
grid[j+(height-1)*width] = -273.15;
}
for(int j=1; j<height-1;j++)
{
grid[j*width] = -273.15;
grid[j*width+width-1]= -273.15;
}
MPI_Barrier(MPI_COMM_WORLD);
bp.timer_start();
openmp(height,width,grid,iter);
MPI_Barrier(MPI_COMM_WORLD);
bp.timer_stop();
if (myrank == 0) {
bp.print("heat_equation(c99_omp_mpi)");
}
free(grid);
MPI_Finalize();
return 0;
}
C99 Seq¶#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <bp_util.h>
void sequential(int height, int width, double *grid, int iter)
{
double *T = (double*)malloc(height*width*sizeof(double));
memcpy(T, grid, height*width*sizeof(double));
for(int n=0; n<iter; ++n)
{
double *a = grid;
double *t = T;
double delta=0;
for(int i=1; i<height-1; ++i)
{
double *up = a+1;
double *left = a+width;
double *right = a+width+2;
double *down = a+1+width*2;
double *center = a+width+1;
double *t_center = t+width+1;
for(int j=0; j<width-2; ++j)
{
*t_center = (*center + *up + *left + *right + *down) * 0.2;
delta += fabs(*t_center - *center);
center++;up++;left++;right++;down++;t_center++;
}
a += width;
t += width;
}
#ifdef DEBUG
printf("Delta: %lf\n",delta);
#endif
memcpy(grid, T, height*width*sizeof(double));
}
free(T);
}
int main (int argc, char **argv)
{
bp_util_type bp = bp_util_create(argc, argv, 3);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int height = bp.args.sizes[0];
const int width = bp.args.sizes[1];
const int iter = bp.args.sizes[2];
size_t grid_size = height*width*sizeof(double);
double *grid = (double*)malloc(grid_size);
memset(grid, 0, grid_size);
for(int j=0; j<width;j++)
{
grid[j] = 40.0;
grid[j+(height-1)*width] = -273.15;
}
for(int j=1; j<height-1;j++)
{
grid[j*width] = -273.15;
grid[j*width+width-1]= -273.15;
}
#ifdef DEBUG
for (int i = 0; i<height; i++)
{
for(int j=0; j<width;j++)
{
printf ("%lf ", grid[j+i*width]);
}
printf ("\n");
}
#endif
bp.timer_start();
sequential(height,width,grid,iter);
bp.timer_stop();
#ifdef DEBUG
for (int i = 0; i<height; i++)
{
for(int j=0; j<width;j++)
{
printf ("%lf ", grid[j+i*width]);
}
printf ("\n");
}
#endif
bp.print("heat_equation(c99_seq)");
free(grid);
return 0;
}
Cpp11 Omp¶#include <iostream>
#include <sstream>
#include <string>
#include <cmath>
#include <bp_util.h>
#define xdim 14000
#define ydim 14000
using namespace std;
template <typename T>
void solve(T grid, double epsilon, int max_iterations)
{
T temp = new double[ydim][xdim];
double delta = epsilon+1.0;
int iterations = 0; // Compute the heat equation
while(delta>epsilon) {
++iterations;
#pragma omp parallel for reduction(+:delta) collapse(2)
for (int i=1; i<ydim-1; i++) {
for(int j=1;j<xdim-1;j++) {
temp[i][j] = (grid[i-1][j] + grid[i+1][j] + grid[i][j] + grid[i][j-1] + grid[i][j+1])*0.2;
delta += abs(temp[i][j] - grid[i][j]);
}
}
#pragma omp parallel for collapse(2)
for (int i=1;i<ydim-1; i++) {
for(int j=1;j<xdim-1;j++) {
grid[i][j] = temp[i][j];
}
}
if (iterations>=max_iterations) {
break;
}
}
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 3);// Grab arguments
if (bp.args.has_error) {
return 1;
}
bp.timer_start();
const int width = bp.args.sizes[0];
const int height = bp.args.sizes[1];
if (ydim != height) {
cout << "Multidimensional arrays in C11 does not support dynamic size, so it has to be: " << ydim << "." << endl;
return 0;
}
if (xdim != width) {
cout << "Multidimensional arrays in C11 does not support dynamic size, so it has to be: " << xdim << "." << endl;
return 0;
}
const int max_iterations = bp.args.sizes[2];
double epsilon = 0.005;
auto grid = new double[ydim][xdim];
#pragma omp parallel for collapse(2)
for (int i=0; i<ydim; i++) { // Initialize the grid
for (int j=0;j<xdim;j++) {
grid[i][j] = 0;
}
}
for (int i=0; i<ydim; i++) { // And borders
grid[i][0] = -273.15;
grid[i][xdim-1] = -273.15;
}
for (int i=0; i<xdim; i++) {
grid[0][i] = -273.15;
grid[ydim-1][i] = 40.0;
}
bp.timer_start(); // Start timer
solve(grid, epsilon, max_iterations);
bp.timer_stop();
bp.print("heat_equation(cpp11_omp)");
if (bp.args.verbose) { // and values.
cout << ", \"output\": [";
for (int i=0; i<10; ++i) {
for (int j=0; j<10; ++j) {
cout << grid[i][j] << ", ";
}
cout << endl;
}
cout << "]";
}
return 0;
}
Cpp11 Opencl¶Error There are issues with the implementation. Two known issues: * Implementation compiles (with warning) but execution is untested.
* Implementation does not use ``bp-util`` for argparsing and timing, getting it to run in a suite might be cumbersome...
#include <iostream>
#include <sys/time.h>
#include <CL/cl.hpp>
#include <stdexcept>
#include <sys/time.h>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <utility>
cl::Context context;
std::vector<cl::Device> devices;
cl::CommandQueue commandQueue;
cl::Kernel kernel;
#define TPB 32
#ifndef DTYPE
#define DTYPE double
#endif
#define STR(s) #s
#define xSTR(s) STR(s)
int main (int argc, char** argv)
{
if (argc != 5)
{
std::cout << "Usage: " << argv[0] << " HEAT-EQ-KERNEL-FILE WIDTH HEIGHT ITERATIONS" << std::endl;
return 0;
}
// Commandline arguments
char *opencl_file = argv[1];
unsigned int width = atoi(argv[2]);
unsigned int height = atoi(argv[3]);
unsigned int iter = atoi(argv[4]);
// Setup OpenCL execution system
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
bool foundPlatform = false;
std::vector<cl::Platform>::iterator it;
for (it = platforms.begin(); it != platforms.end(); ++it)
{
try {
cl_context_properties props[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)(*it)(),0};
context = cl::Context(CL_DEVICE_TYPE_GPU, props);
foundPlatform = true;
break;
}
catch (cl::Error e)
{
foundPlatform = false;
}
}
if (foundPlatform)
{
devices = context.getInfo<CL_CONTEXT_DEVICES>();
commandQueue = cl::CommandQueue(context,devices[0],0);
} else {
throw std::runtime_error("Could not find valid OpenCL platform.");
}
// Read source file and compile code
std::ifstream file(opencl_file, std::ios::in);
if (!file.is_open())
{
throw std::runtime_error("Could not open source file.");
}
std::ostringstream srcstrs;
srcstrs << "#define DTYPE " << xSTR(DTYPE) << "\n";
srcstrs << file.rdbuf();
std::string srcstr = srcstrs.str();
#ifdef DEBUG
std::cout << srcstr << std::endl;
#endif
cl::Program::Sources source(1,std::make_pair(srcstr.c_str(),0));
cl::Program program(context, source);
try {program.build(devices);}
catch (cl::Error e)
{
std::cout << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0]) << std::endl;
}
kernel = cl::Kernel(program, "heat_eq_jacobi");
// Set up model
unsigned int w = width+2;
unsigned int h = height+2;
size_t grid_size = w*h*sizeof(DTYPE);
size_t delta_size = w*sizeof(DTYPE);
DTYPE* grid = (DTYPE*)malloc(grid_size);
memset(grid, 0, grid_size);
for(int j=0; j<w;j++)
{
grid[j] = 40.0;
grid[j+(h-1)*w] = -273.15;
}
for(int j=1; j<h-1;j++)
{
grid[j*w] = -273.15;
grid[j*w+w-1]= -273.15;
}
#ifdef DEBUG
for (int i = 0; i<h; i++)
{
for(int j=0; j<w;j++)
{
printf ("%lf ", grid[j+i*w]);
}
printf ("\n");
}
#endif
// Start timing
timeval t_start,t_end;
gettimeofday(&t_start,NULL);
// Set up buffers and copy data
cl::Buffer inDev = cl::Buffer(context, CL_MEM_READ_WRITE, grid_size, NULL);
cl::Buffer outDev = cl::Buffer(context, CL_MEM_READ_WRITE, grid_size, NULL);
cl::Buffer deltaDev = cl::Buffer(context, CL_MEM_READ_WRITE, delta_size, NULL);
cl::Buffer deltaHost = cl::Buffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, delta_size, NULL);
DTYPE* deltaHostPtr = (DTYPE*)commandQueue.enqueueMapBuffer(deltaHost, CL_TRUE, CL_MAP_WRITE, 0, delta_size);
commandQueue.enqueueWriteBuffer(inDev, CL_FALSE, 0, grid_size, grid);
commandQueue.enqueueCopyBuffer(inDev, outDev, 0, 0, grid_size);
// Iterating
for(int n=0; n<iter; ++n)
{
kernel.setArg(0,width);
kernel.setArg(1,height);
kernel.setArg(2,inDev);
kernel.setArg(3,outDev);
kernel.setArg(4,deltaDev);
commandQueue.enqueueNDRangeKernel(kernel, cl::NullRange,
cl::NDRange((width/TPB+1)*TPB), cl::NDRange(TPB));
commandQueue.enqueueReadBuffer(deltaDev, CL_FALSE, 0, delta_size, deltaHostPtr);
commandQueue.finish();
DTYPE delta = 0.0;
for (unsigned int i = 0; i < width; ++i)
delta += deltaHostPtr[i];
#ifdef DEBUG
std::cout << "Delta: " << delta << std::endl;
#endif
cl::Buffer tmp = inDev;
inDev = outDev;
outDev = tmp;
}
commandQueue.enqueueReadBuffer(inDev, CL_TRUE, 0, grid_size, grid);
gettimeofday(&t_end,NULL);
#ifdef DEBUG
for (int i = 0; i<h; i++)
{
for(int j=0; j<w;j++)
{
printf ("%lf ", grid[j+i*w]);
}
printf ("\n");
}
#endif
double d_time = (t_end.tv_sec - t_start.tv_sec) + (t_end.tv_usec - t_start.tv_usec)/1000000.0;
std::cout << argv[0] << " - iter: " << iter << " size: " << width << "*" << height << " elapsed-time: " <<
d_time << std::endl;
return 0;
}
Csharp Numcil¶#region Copyright
/*
This file is part of Bohrium and copyright (c) 2012 the Bohrium:
team <http://www.bh107.org>.
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with Bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#endregion
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using R = NumCIL.Range;
namespace HeatEquation
{
using NumCIL.Double;
using T = System.Double;
public static class HeatEquationSolverDouble
{
public static NdArray Create(long width, long height)
{
var res = Generate.Zeroes(height + 2, width + 2);
res[R.All, R.El(0)] = -273.5f;
res[R.All, R.El(-1)] = -273.5f;
res[R.El(0), R.All] = 40f;
res[R.El(-1), R.All] = -273.5f;
return res;
}
public static Tuple<int, NdArray> Solve(NdArray full, long? fixedIterations = null)
{
var center = full[R.Slice(1, -1), R.Slice(1, -1) ];
var north = full[R.Slice(1, -1), R.Slice(0, -2) ];
var east = full[R.Slice(0, -2), R.Slice(1, -1) ];
var west = full[R.Slice(2, 0), R.Slice(1, -1) ];
var south = full[R.Slice(1, -1), R.Slice(2, 0) ];
if (fixedIterations != null)
{
for(var i =0; i < fixedIterations.Value; i++)
center[R.All] = 0.2f * (center + north + east + west + south);
return new Tuple<int, NdArray>((int)fixedIterations.Value, full);
}
else
{
T epsilon = 0.005f;
T delta = epsilon + 1;
int iteration = 0;
while (epsilon < delta)
{
iteration++;
var work = 0.2f * (center + north + east + west + south);
delta = (work - center).Abs().Sum();
center[R.All] = work;
}
return new Tuple<int, NdArray>(iteration, full);
}
}
}
}
namespace HeatEquation
{
using NumCIL.Float;
using T = System.Single;
public static class HeatEquationSolverSingle
{
public static NdArray Create(long width, long height)
{
var res = Generate.Zeroes(height + 2, width + 2);
res[R.All, R.El(0)] = -273.5f;
res[R.All, R.El(-1)] = -273.5f;
res[R.El(0), R.All] = 40f;
res[R.El(-1), R.All] = -273.5f;
return res;
}
public static Tuple<int, NdArray> Solve(NdArray full, long? fixedIterations = null)
{
var center = full[R.Slice(1, -1), R.Slice(1, -1) ];
var north = full[R.Slice(1, -1), R.Slice(0, -2) ];
var east = full[R.Slice(0, -2), R.Slice(1, -1) ];
var west = full[R.Slice(2, 0), R.Slice(1, -1) ];
var south = full[R.Slice(1, -1), R.Slice(2, 0) ];
if (fixedIterations != null)
{
for(var i =0; i < fixedIterations.Value; i++)
center[R.All] = 0.2f * (center + north + east + west + south);
return new Tuple<int, NdArray>((int)fixedIterations.Value, full);
}
else
{
T epsilon = 0.005f;
T delta = epsilon + 1;
int iteration = 0;
while (epsilon < delta)
{
iteration++;
var work = 0.2f * (center + north + east + west + south);
delta = (work - center).Abs().Sum();
center[R.All] = work;
}
return new Tuple<int, NdArray>(iteration, full);
}
}
}
}
Lattice Boltzmann D2Q9¶Python Numpy¶from __future__ import print_function
"""
The Lattice Boltzmann Methods D2Q9
------------
Channel flow past a cylindrical obstacle, using a LB method
Copyright (C) 2006 Jonas Latt
Address: Rue General Dufour 24, 1211 Geneva 4, Switzerland
E-mail: Jonas.Latt@cui.unige.ch
"""
import numpy as np
from benchpress.benchmarks import util
bench = util.Benchmark("The Lattice Boltzmann Methods D2Q9", "height*width*iterations")
# D2Q9 Lattice constants
t = [4 / 9., 1 / 9., 1 / 9., 1 / 9., 1 / 9., 1 / 36., 1 / 36., 1 / 36., 1 / 36.]
cx = [0, 1, 0, -1, 0, 1, -1, -1, 1]
cy = [0, 0, 1, 0, -1, 1, 1, -1, -1]
opp = [0, 3, 4, 1, 2, 7, 8, 5, 6]
uMax = 0.02 # maximum velocity of Poiseuille inflow
Re = 100 # Reynolds number
def cylinder(height, width, obstacle=True):
assert (height > 2)
assert (width > 2)
lx = width
ly = height
state = {"lx": lx, "ly": ly}
obst_x = lx / 5. + 1 # position of the cylinder; (exact
obst_y = ly / 2. + 0 # y-symmetry is avoided)
obst_r = ly / 10. + 1 # radius of the cylinder
nu = uMax * 2. * obst_r / Re # kinematic viscosity
omega = 1. / (3 * nu + 1. / 2.) # relaxation parameter
col = np.arange(1, ly - 1)
# Place obstacle
if obstacle:
y, x = np.meshgrid(np.arange(ly), np.arange(lx))
obst = (x - obst_x) ** 2 + (y - obst_y) ** 2 <= obst_r ** 2
obst[:, 0] = obst[:, ly - 1] = 1
bbRegion = np.asarray(obst, dtype=np.bool)
else:
bbRegion = None
# We need some of the constants as 3D numpy arrays
t_3d = np.asarray(t)[:, np.newaxis, np.newaxis]
cx_3d = np.asarray(cx)[:, np.newaxis, np.newaxis]
cy_3d = np.asarray(cy)[:, np.newaxis, np.newaxis]
# Initial condition: (rho=0, u=0) ==> fIn[i] = t[i]
fIn = np.array(t_3d.repeat(lx, axis=1).repeat(ly, axis=2))
state['fIn'] = fIn
state['cx_3d'] = cx_3d
state['cy_3d'] = cy_3d
state['col'] = col
state['omega'] = float(omega)
state['bbRegion'] = bbRegion
# We save the visualize limits in order to make it constant
# We find the limit by calculating the first iteration if `ux`
ux = np.sum(cx_3d * fIn, axis=0) / np.sum(fIn, axis=0)
L = ly - 2.0
y = col - 0.5
ux[0, 1:ly - 1] = 4 * uMax / (L ** 2) * (y * L - y ** 2)
state['viz_max'] = ux.max()
state['viz_min'] = ux.min()
return state
def solve(state, timesteps):
# load the ready only state
ly = int(state['ly'])
lx = int(state['lx'])
col = state['col']
cx_3d = state['cx_3d']
cy_3d = state['cy_3d']
bbRegion = state['bbRegion']
omega = state['omega']
def loop_body(fIn):
# Macroscopic variables
rho = np.sum(fIn, axis=0)
ux = np.sum(cx_3d * fIn, axis=0) / rho
uy = np.sum(cy_3d * fIn, axis=0) / rho
# Macroscopic (Dirichlet) boundary conditions
# Inlet: Poisseuille profile
L = ly - 2.0
y = col - 0.5
ux[0, 1:ly - 1] = 4 * uMax / (L ** 2) * (y * L - y ** 2)
uy[0, 1:ly - 1] = 0
# Using a loop instead of "index of indexes"
# t1 = fIn[[0, 2, 4], 0, 1:ly-1].sum(axis = 0)
# t1 = np.zeros(ly-2, bohrium=fIn.bohrium)
t1 = np.zeros(ly - 2)
for i in [0, 2, 4]:
t1 += fIn[i, 0, 1:ly - 1]
# Using a loop instead of "index of indexes"
# t2 = 2 * fIn[[3, 6, 7], 0, 1:ly-1].sum(axis = 0)
# t2 = np.zeros(ly-2, bohrium=fIn.bohrium)
t2 = np.zeros(ly - 2)
for i in [3, 6, 7]:
t2 += fIn[i, 0, 1:ly - 1]
t2 *= 2
rho[0, 1:ly - 1] = 1 / (1 - ux[0, 1:ly - 1]) * (t1 + t2)
# Outlet: Zero gradient on rho/ux
rho[lx - 1, 1:ly - 1] = 4.0 / 3 * rho[lx - 2, 1:ly - 1] - \
1.0 / 3 * rho[lx - 3, 1:ly - 1]
uy[lx - 1, 1:ly - 1] = 0
ux[lx - 1, 1:ly - 1] = 4.0 / 3 * ux[lx - 2, 1:ly - 1] - \
1.0 / 3 * ux[lx - 3, 1:ly - 1]
# fEq = np.zeros((9, lx, ly), bohrium=fIn.bohrium)
fEq = np.zeros((9, lx, ly))
# fOut = np.zeros((9, lx, ly), bohrium=fIn.bohrium)
fOut = np.zeros((9, lx, ly))
for i in range(0, 9):
cu = 3 * (cx[i] * ux + cy[i] * uy)
fEq[i] = rho * t[i] * (1 + cu + 0.5 * cu ** 2 - \
1.5 * (ux ** 2 + uy ** 2))
fOut[i] = fIn[i] - omega * (fIn[i] - fEq[i])
# Microscopic boundary conditions
for i in range(0, 9):
# Left boundary:
fOut[i, 0, 1:ly - 1] = fEq[i, 0, 1:ly - 1] + 18 * t[i] * cx[i] * cy[i] * \
(fIn[7, 0, 1:ly - 1] - fIn[6, 0, 1:ly - 1] - fEq[7, 0, 1:ly - 1] + fEq[6, 0,
1:ly - 1])
# Right boundary:
fOut[i, lx - 1, 1:ly - 1] = fEq[i, lx - 1, 1:ly - 1] + 18 * t[i] * cx[i] * cy[i] * \
(fIn[5, lx - 1, 1:ly - 1] - fIn[8, lx - 1, 1:ly - 1] - \
fEq[5, lx - 1, 1:ly - 1] + fEq[8, lx - 1, 1:ly - 1])
# Bounce back region:
# fOut[i,bbRegion] = fIn[opp[i],bbRegion]
# Using a explict mask
if bbRegion is not None:
masked = fIn[opp[i]].copy() * bbRegion
fOut[i] = fOut[i] * ~bbRegion + masked
# Streaming step
for i in range(0, 9):
# fIn[i] = np.roll(np.roll(fOut[i], cx[i], axis = 0), cy[i], axis = 1)
# Replacing the np.roll() call with:
if cx[i] == 1:
# t1 = np.empty_like(fOut[i], bohrium=fIn.bohrium)
t1 = np.empty_like(fOut[i])
t1[1:] = fOut[i][:-1]
t1[0] = fOut[i][-1]
fOut[i] = t1
elif cx[i] == -1:
# t1 = np.empty_like(fOut[i], bohrium=fIn.bohrium)
t1 = np.empty_like(fOut[i])
t1[:-1] = fOut[i][1:]
t1[-1] = fOut[i][0]
fOut[i] = t1
if cy[i] == 1:
# t1 = np.empty_like(fOut[i], bohrium=fIn.bohrium)
t1 = np.empty_like(fOut[i])
t1[:, 1:] = fOut[i][:, :-1]
t1[:, 0] = fOut[i][:, -1]
fIn[i] = t1
elif cy[i] == -1:
# t1 = np.empty_like(fOut[i], bohrium=fIn.bohrium)
t1 = np.empty_like(fOut[i])
t1[:, :-1] = fOut[i][:, 1:]
t1[:, -1] = fOut[i][:, 0]
fIn[i] = t1
else:
fIn[i] = fOut[i]
bench.plot_surface(ux.T, "2d", 0, state['viz_max'], state['viz_min'])
bench.do_while(loop_body, timesteps, state['fIn'])
def main():
H = bench.args.size[0]
W = bench.args.size[1]
I = bench.args.size[2]
state = bench.load_data()
if state is None:
state = cylinder(H, W)
bench.start()
solve(state, I)
bench.stop()
bench.save_data(state)
bench.pprint()
if __name__ == "__main__":
main()
Leibnitz Pi¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Leibnitz Pi", "<size>*<niters>")
def leibnitz_pi(N):
n = np.arange(0, N)
return np.sum(1.0 / (4 * n + 1) - 1.0 / (4 * n + 3))
def main():
(size, niter) = bench.args.size
bench.start()
pi = 0.0
for _ in range(niter):
pi += 4.0 * leibnitz_pi(size) # Execute benchmark
bench.flush()
pi /= niter
bench.stop()
bench.pprint()
if bench.args.verbose:
print(pi)
if __name__ == "__main__":
main()
C99 Seq¶#include <stdio.h>
#include <bp_util.h>
double leibnitz_pi(int nelements)
{
double sum = 0.0;
for(int n=0; n<nelements; ++n) {
sum += 1.0/(4*n+1) - 1.0/(4*n+3);
}
return sum;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 1);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int nelements = bp.args.sizes[0];
bp.timer_start(); // Start timer
double pi = 4.0*leibnitz_pi(nelements); // Run benchmark
bp.timer_stop(); // Stop timer
bp.print("leibnitz_pi(c99_seq)"); // Print results..
if (bp.args.verbose) { // ..and value.
printf("PI-approximation = %.11f\n", pi);
}
return 0;
}
Cpp11 Omp¶#include <iostream>
#include <iomanip>
#include <bp_util.h>
#include <omp.h>
using namespace std;
double leibnitz_pi(int nelements)
{
double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for(int n=0; n<nelements; ++n) {
sum += 1.0/(4*n+1) - 1.0/(4*n+3);
}
return sum;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 1);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int nelements = bp.args.sizes[0];
bp.timer_start(); // Start timer
double pi = 4.0*leibnitz_pi(nelements); // Run benchmark
bp.timer_stop(); // Stop timer
bp.print("leibnitz_pi(cpp11_omp)"); // Print results..
if (bp.args.verbose) { // ..and value.
cout << fixed << setprecision(11)
<< "PI-approximation = " << pi << endl;
}
return 0;
}
Cpp11 Seq¶#include <iostream>
#include <iomanip>
#include <bp_util.h>
using namespace std;
double leibnitz_pi(int nelements)
{
double sum = 0.0;
for(int n=0; n<nelements; ++n) {
sum += 1.0/(4*n+1) - 1.0/(4*n+3);
}
return sum;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 1);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int nelements = bp.args.sizes[0];
bp.timer_start(); // Start timer
double pi = 4.0*leibnitz_pi(nelements); // Run benchmark
bp.timer_stop(); // Stop timer
bp.print("leibnitz_pi(cpp11_seq)"); // Print results..
if (bp.args.verbose) { // ..and value.
cout << fixed << setprecision(11)
<< "PI-approximation = " << pi << endl;
}
return 0;
}
Lu¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("LU decomposition on the matrix so that A = L*U", "<size>")
def lu(a):
"""
Perform LU decomposition on the matrix `a` so that A = L*U
"""
u = a.copy()
l = np.identity(a.shape[0], a.dtype)
for c in range(1, u.shape[0]):
l[c:, c - 1] = (u[c:, c - 1] / u[c - 1, c - 1:c])
u[c:, c - 1:] = u[c:, c - 1:] - l[c:, c - 1][:, None] * u[c - 1, c - 1:]
bench.flush()
return (l, u)
def main():
n = bench.args.size[0]
matrix = bench.random_array((n, n))
bench.start()
res = lu(matrix)
bench.stop()
bench.pprint()
if __name__ == "__main__":
main()
Magnetic Field Extrapolation¶Python Cython¶import magnetic_field_extrapolation
magnetic_field_extrapolation.main()
Python Numba¶# This Python file uses the following encoding: utf-8
from benchpress.benchmarks import util
import numpy as np
from math import cos, sin, exp, pi
import numba
from numba import cuda
def window(B,a=0.37):
assert (len(B.shape) == 2)
assert (B.shape[0] == B.shape[1])
n = B.shape[0]
wl = np.ones_like(B[0])
b = int(np.ceil((a * (n-1) / 2)))
wl[:b] = 0.5 * (1 + np.cos(pi*(2 * np.arange(b) / (a * (n-1)) - 1)))
wl[-b:] = 0.5 * (1 + np.cos(pi*(2 * np.arange(b-1,-1,-1) / (a * (n-1)) - 1)))
wl *= wl
w = np.sqrt(wl+wl[:,None])
return B*w
def calcB_loop(B_x0, alpha=0.0,
x_min = 0.0, x_max = 0.25,
y_min = 0.0, y_max = 1.0,
z_min = 0.0, z_max = 1.0):
n = len(B_x0)
x = np.linspace(x_min,x_max,num=n,endpoint=False).astype(B_x0.dtype,copy=False)
y = np.linspace(y_min,y_max,num=n).astype(B_x0.dtype,copy=False)
z = np.linspace(z_min,z_max,num=n).astype(B_x0.dtype,copy=False)
u = np.arange(n,dtype=B_x0.dtype)
# Making C
C = 4.0 / (n-1.0)**2 * np.sum(np.sum((B_x0 * np.sin(pi/y_max * u * y[:,None])[:,:,None])[:,None] * np.sin(pi/z_max * u * z[:,None])[:,None],-1),-1)
l = pi**2 * ((u**2 / y_max)[:,None] + (u**2 / z_max))
l[0,0] = 1.0
r = np.sqrt(l - alpha**2)
exprx = np.exp((-r * x[:, None, None]))
# Calculating B
Bx = np.empty((n,n,n),dtype=B_x0.dtype)
By = np.empty((n,n,n),dtype=B_x0.dtype)
Bz = np.empty((n,n,n),dtype=B_x0.dtype)
temp_x = np.empty((n, n), dtype=B_x0.dtype)
temp_y = np.empty((n, n), dtype=B_x0.dtype)
temp_z = np.empty((n, n), dtype=B_x0.dtype)
for i in range(n):
for j in range(n):
for k in range(n):
Bx[k, i, j] = 0
By[k, i, j] = 0
Bz[k, i, j] = 0
for m in range(n):
sincos = np.sin(pi * k * y[i] / y_max) * (m * np.cos(pi * m * z[j] / z_max))
cossin = (k * np.cos(pi * k * y[i] / y_max)) * (np.sin(pi * m * z[j] / z_max))
temp_x[k,m] = C[k,m] * (np.sin(pi * k * y[i] / y_max) * (np.sin(pi * m * z[j] / z_max)))
temp_y[k,m] = C[k,m] / l[k,m] * (alpha * pi / z_max * sincos - r[k,m] * pi / y_max * cossin)
temp_z[k,m] = C[k,m] / l[k,m] * (alpha * pi / y_max * cossin + r[k,m] * pi / z_max * sincos)
for k in range(n):
for m in range(n):
for q in range(n):
Bx[k, i, j] += temp_x[m, q] * exprx[k, m, q]
By[k, i, j] += temp_y[m, q] * exprx[k, m, q]
Bz[k, i, j] += temp_z[m, q] * exprx[k, m, q]
return (Bx, By, Bz)
#@cuda.jit()
@numba.jit(nopython=True)
def loop(n, Bx, By, Bz, u, y, y_max, z, z_max, alpha, C, l, r, temp_x, temp_y, temp_z, exprx):
for i in range(n):
for j in range(n):
for k in range(n):
Bx[k, i, j] = 0
By[k, i, j] = 0
Bz[k, i, j] = 0
for m in range(n):
sincos = sin(pi * k * y[i] / y_max) * (m * cos(pi * m * z[j] / z_max))
cossin = (k * cos(pi * k * y[i] / y_max)) * (sin(pi * m * z[j] / z_max))
temp_x[k,m] = C[k,m] * (sin(pi * k * y[i] / y_max) * (sin(pi * m * z[j] / z_max)))
temp_y[k,m] = C[k,m] / l[k,m] * (alpha * pi / z_max * sincos - r[k,m] * pi / y_max * cossin)
temp_z[k,m] = C[k,m] / l[k,m] * (alpha * pi / y_max * cossin + r[k,m] * pi / z_max * sincos)
for k in range(n):
for m in range(n):
for q in range(n):
Bx[k, i, j] += temp_x[m, q] * exprx[k, m, q]
By[k, i, j] += temp_y[m, q] * exprx[k, m, q]
Bz[k, i, j] += temp_z[m, q] * exprx[k, m, q]
def calcB_numba(B_x0, alpha=0.0,
x_min = 0.0, x_max = 0.25,
y_min = 0.0, y_max = 1.0,
z_min = 0.0, z_max = 1.0):
n = len(B_x0)
x = np.linspace(x_min,x_max,num=n,endpoint=False).astype(B_x0.dtype,copy=False)
y = np.linspace(y_min,y_max,num=n).astype(B_x0.dtype,copy=False)
z = np.linspace(z_min,z_max,num=n).astype(B_x0.dtype,copy=False)
u = np.arange(n,dtype=B_x0.dtype)
# Making C
C = 4.0 / (n-1.0)**2 * np.sum(np.sum((B_x0 * np.sin(pi/y_max * u * y[:,None])[:,:,None])[:,None] * np.sin(pi/z_max * u * z[:,None])[:,None],-1),-1)
l = pi**2 * ((u**2 / y_max)[:,None] + (u**2 / z_max))
l[0,0] = 1.0
r = np.sqrt(l - alpha**2)
exprx = np.exp((-r * x[:, None, None]))
# Calculating B
Bx = np.empty((n,n,n),dtype=B_x0.dtype)
By = np.empty((n,n,n),dtype=B_x0.dtype)
Bz = np.empty((n,n,n),dtype=B_x0.dtype)
temp_x = np.empty((n, n), dtype=B_x0.dtype)
temp_y = np.empty((n, n), dtype=B_x0.dtype)
temp_z = np.empty((n, n), dtype=B_x0.dtype)
loop(n, Bx, By, Bz, u, y, y_max, z, z_max, alpha, C, l, r, temp_x, temp_y, temp_z, exprx)
return (Bx, By, Bz)
def main():
B = util.Benchmark()
if B.inputfn is None:
B_x0 = B.random_array((B.size[0],B.size[1]), dtype=B.dtype)
else:
inputfn = B.inputfn if B.inputfn else '../idl_input-float64_512*512.npz'
sd = { 512:1, 256:2, 128:4, 64:8, 32:16, 16:32, 8:64}
try:
h = sd[B.size[0]]
w = sd[B.size[1]]
except KeyError:
raise ValueError('Only valid sizes are: '+str(sd.keys()))
B_x0 = B.load_array(inputfn, 'input', dtype=B.dtype)[::h,::w]
B.start()
for _ in range(B.size[2]):
Rx, Ry, Rz = calcB_numba(window(B_x0.copy()))
"""
Rx1, Ry1, Rz1 = calcB_loop(window(B_x0.copy()))
assert np.allclose(Rx, Rx1)
assert np.allclose(Ry, Ry1)
assert np.allclose(Rz, Rz1)
"""
B.stop()
B.pprint()
if B.outputfn:
R = Rx+Ry+Rz
B.tofile(B.outputfn, {'res': R, 'res_x': Rx, 'res_y': Ry, 'res_z': Rz})
if __name__ == '__main__':
main()
Python Numpy¶# This Python file uses the following encoding: utf-8
from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
import math
bench = util.Benchmark("Magnetic Field Extrapolation", "<x-size>*<y-size>*<iterations>")
def window(B, a=0.37):
assert (len(B.shape) == 2)
assert (B.shape[0] == B.shape[1])
n = B.shape[0]
wl = np.ones_like(B[0])
b = int(np.ceil((a * (n - 1) / 2)))
wl[:b] = 0.5 * (1 + np.cos(math.pi * (2 * np.arange(b) / (a * (n - 1)) - 1)))
wl[-b:] = 0.5 * (1 + np.cos(math.pi * (2 * np.arange(b - 1, -1, -1) / (a * (n - 1)) - 1)))
wl *= wl
w = np.sqrt(wl + wl[:, None])
return B * w
def calcB(B_x0, alpha=0.0,
x_min=0.0, x_max=0.25,
y_min=0.0, y_max=1.0,
z_min=0.0, z_max=1.0):
n = len(B_x0)
x = np.linspace(x_min, x_max, num=n, endpoint=False).astype(B_x0.dtype, copy=False)
y = np.linspace(y_min, y_max, num=n).astype(B_x0.dtype, copy=False)
z = np.linspace(z_min, z_max, num=n).astype(B_x0.dtype, copy=False)
u = np.arange(n, dtype=B_x0.dtype)
# Making C
C = 4.0 / (n - 1.0) ** 2 * np.sum(np.sum(
(B_x0 * np.sin(math.pi / y_max * u * y[:, None])[:, :, None])[:, None] * np.sin(
math.pi / z_max * u * z[:, None])[:, None], -1), -1)
l = np.pi ** 2 * ((u ** 2 / y_max)[:, None] + (u ** 2 / z_max))
l[0, 0] = 1.0
r = np.sqrt(l - alpha ** 2)
# Calculating B
sincos = np.sin(math.pi / y_max * u * y[:, None])[:, None, :, None] * (u * np.cos(
math.pi / z_max * u * z[:, None]))[None, :, None, :]
cossin = (u * np.cos(math.pi / y_max * u * y[:, None]))[:, None, :, None] * np.sin(
math.pi / z_max * u * z[:, None])[None, :, None, :]
temp_x = C * np.sin(math.pi / y_max * u * y[:, None])[:, None, :, None] * np.sin(math.pi / z_max * u * z[:, None])[
None, :, None, :]
temp_y = C / l * (alpha * math.pi / z_max * sincos - r * math.pi / y_max * cossin)
temp_z = C / l * (alpha * math.pi / y_max * cossin + r * math.pi / z_max * sincos)
exprx = np.exp((-r * x[:, None, None]))
Bx = np.sum(np.sum(temp_x * exprx[:, None, None], -1), -1)
By = np.sum(np.sum(temp_y * exprx[:, None, None], -1), -1)
Bz = np.sum(np.sum(temp_z * exprx[:, None, None], -1), -1)
return (Bx, By, Bz)
def main():
B_x0 = bench.load_data()
if B_x0 is None:
B_x0 = bench.random_array((bench.args.size[0], bench.args.size[1]))
else:
sd = {512: 1, 256: 2, 128: 4, 64: 8, 32: 16, 16: 32}
try:
h = sd[bench.args.size[0]]
w = sd[bench.args.size[1]]
except KeyError:
raise ValueError('Only valid sizes are: ' + str(sd.keys()))
B_x0 = B_x0['input'][::h, ::w]
bench.start()
for _ in range(bench.args.size[2]):
Rx, Ry, Rz = calcB(window(B_x0))
bench.flush()
bench.stop()
bench.pprint()
if __name__ == '__main__':
main()
Montecarlo Pi¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Monte Carlo Pi", "samples*iterations")
def montecarlo_pi(samples):
x = bench.random_array((samples,))
y = bench.random_array((samples,))
m = np.sqrt(x * x + y * y) <= 1.0
return np.sum(m) * 4.0 / samples
def solve(samples, iterations):
acc = 0.0
for _ in range(iterations):
acc += montecarlo_pi(samples)
bench.flush()
acc /= iterations
return acc
def main():
samples, iterations = bench.args.size
bench.start()
res = solve(samples, iterations)
bench.stop()
bench.pprint()
if bench.args.verbose:
print(res)
if __name__ == "__main__":
main()
C99 Omp¶#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <Random123/philox.h>
#include <bp_util.h>
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double montecarlo_pi(uint32_t samples, uint32_t x_index, uint32_t y_index, uint32_t key)
{
uint64_t darts = 0;
#pragma omp parallel for reduction(+:darts)
for (uint32_t eidx=0; eidx<samples; ++eidx) {
uint64_t x_raw = ((philox2x32_as_1x64_t)philox2x32(
((philox2x32_as_1x64_t)( (uint64_t)(x_index + eidx) )).orig,
(philox2x32_key_t){ { key } }
)).combined;
double x = x_raw / 18446744073709551616.000000;
uint64_t y_raw = ((philox2x32_as_1x64_t)philox2x32(
((philox2x32_as_1x64_t)( (uint64_t)(y_index + eidx) )).orig,
(philox2x32_key_t){ { key } }
)).combined;
double y = y_raw / 18446744073709551616.000000;
darts += sqrt(x*x + y*y) <= 1;
}
return (double)darts * (double)4.0 / (double)samples;
}
double run_montecarlo_pi(int64_t samples, int64_t iterations)
{
uint64_t index = 0; // Philox index
const uint64_t key = 0; // Philox key
uint64_t x_index = 0;
uint64_t y_index = 0;
double pi_accu = 0.0; // Accumulation over PI-approximations.
for(int64_t i=0; i<iterations; ++i) {
x_index = index;
index += samples;
y_index = index;
index += samples;
pi_accu += montecarlo_pi(samples, x_index, y_index, key);
}
pi_accu /= iterations; // Approximated value of PI
return pi_accu;
}
int main(int argc, char** argv)
{
bp_util_type bp = bp_util_create(argc, argv, 2);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
bp.timer_start();
double pi = run_montecarlo_pi(samples, iterations);
bp.timer_stop();
bp.print("montecarlo_pi(c99_omp)");
if (bp.args.verbose) {
printf("PI-approximation = %f\n", pi);
}
return 0;
}
C99 Seq¶#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <Random123/philox.h>
#include <bp_util.h>
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double montecarlo_pi(uint32_t samples, uint32_t x_index, uint32_t y_index, uint32_t key)
{
uint64_t darts = 0;
for (uint32_t eidx=0; eidx<samples; ++eidx) {
uint64_t x_raw = ((philox2x32_as_1x64_t)philox2x32(
((philox2x32_as_1x64_t)( (uint64_t)(x_index + eidx) )).orig,
(philox2x32_key_t){ { key } }
)).combined;
double x = x_raw / 18446744073709551616.000000;
uint64_t y_raw = ((philox2x32_as_1x64_t)philox2x32(
((philox2x32_as_1x64_t)( (uint64_t)(y_index + eidx) )).orig,
(philox2x32_key_t){ { key } }
)).combined;
double y = y_raw / 18446744073709551616.000000;
darts += sqrt(x*x + y*y) <= 1;
}
return (double)darts * (double)4.0 / (double)samples;
}
double run_montecarlo_pi(int64_t samples, int64_t iterations)
{
uint64_t index = 0; // Philox index
const uint64_t key = 0; // Philox key
uint64_t x_index = 0;
uint64_t y_index = 0;
double pi_accu = 0.0; // Accumulation over PI-approximations.
for(int64_t i=0; i<iterations; ++i) {
x_index = index;
index += samples;
y_index = index;
index += samples;
pi_accu += montecarlo_pi(samples, x_index, y_index, key);
}
pi_accu /= iterations; // Approximated value of PI
return pi_accu;
}
int main(int argc, char** argv)
{
bp_util_type bp = bp_util_create(argc, argv, 2);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
bp.timer_start();
double pi = run_montecarlo_pi(samples, iterations);
bp.timer_stop();
bp.print("montecarlo_pi(c99_seq)");
if (bp.args.verbose) {
printf("PI-approximation = %f\n", pi);
}
return 0;
}
Cpp11 Omp¶#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <Random123/philox.h>
#include <bp_util.h>
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double montecarlo_pi(uint32_t samples, uint32_t x_index, uint32_t y_index, uint32_t key)
{
uint64_t darts = 0;
#pragma omp parallel for reduction(+:darts)
for (uint32_t eidx=0; eidx<samples; ++eidx) {
philox2x32_as_1x64_t x_counter;
x_counter.combined = x_index + eidx;
philox2x32_as_1x64_t x_philox;
x_philox.orig = philox2x32(
x_counter.orig,
(philox2x32_key_t){ { key } }
);
double x = x_philox.combined / 18446744073709551616.000000;
philox2x32_as_1x64_t y_counter;
y_counter.combined = y_index + eidx;
philox2x32_as_1x64_t y_philox;
y_philox.orig = philox2x32(
y_counter.orig,
(philox2x32_key_t){ { key } }
);
double y = y_philox.combined / 18446744073709551616.000000;
darts += sqrt(x*x + y*y) <= 1;
}
return (double)darts * (double)4.0 / (double)samples;
}
double run_montecarlo_pi(int64_t samples, int64_t iterations)
{
uint64_t index = 0; // Philox index
const uint64_t key = 0; // Philox key
uint64_t x_index = 0;
uint64_t y_index = 0;
double pi_accu = 0.0; // Accumulation over PI-approximations.
for(int64_t i=0; i<iterations; ++i) {
x_index = index;
index += samples;
y_index = index;
index += samples;
pi_accu += montecarlo_pi(samples, x_index, y_index, key);
}
pi_accu /= iterations; // Approximated value of PI
return pi_accu;
}
int main(int argc, char** argv)
{
bp_util_type bp = bp_util_create(argc, argv, 2);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
bp.timer_start();
double pi = run_montecarlo_pi(samples, iterations);
bp.timer_stop();
bp.print("montecarlo_pi(cpp11_omp)");
if (bp.args.verbose) {
printf("PI-approximation = %f\n", pi);
}
return 0;
}
Cpp11 Seq¶#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <Random123/philox.h>
#include <bp_util.h>
typedef union philox2x32_as_1x64 {
philox2x32_ctr_t orig;
uint64_t combined;
} philox2x32_as_1x64_t;
double montecarlo_pi(uint32_t samples, uint32_t x_index, uint32_t y_index, uint32_t key)
{
uint64_t darts = 0;
for (uint32_t eidx=0; eidx<samples; ++eidx) {
philox2x32_as_1x64_t x_counter;
x_counter.combined = x_index + eidx;
philox2x32_as_1x64_t x_philox;
x_philox.orig = philox2x32(
x_counter.orig,
(philox2x32_key_t){ { key } }
);
double x = x_philox.combined / 18446744073709551616.000000;
philox2x32_as_1x64_t y_counter;
y_counter.combined = y_index + eidx;
philox2x32_as_1x64_t y_philox;
y_philox.orig = philox2x32(
y_counter.orig,
(philox2x32_key_t){ { key } }
);
double y = y_philox.combined / 18446744073709551616.000000;
darts += sqrt(x*x + y*y) <= 1;
}
return (double)darts * (double)4.0 / (double)samples;
}
double run_montecarlo_pi(int64_t samples, int64_t iterations)
{
uint64_t index = 0; // Philox index
const uint64_t key = 0; // Philox key
uint64_t x_index = 0;
uint64_t y_index = 0;
double pi_accu = 0.0; // Accumulation over PI-approximations.
for(int64_t i=0; i<iterations; ++i) {
x_index = index;
index += samples;
y_index = index;
index += samples;
pi_accu += montecarlo_pi(samples, x_index, y_index, key);
}
pi_accu /= iterations; // Approximated value of PI
return pi_accu;
}
int main(int argc, char** argv)
{
bp_util_type bp = bp_util_create(argc, argv, 2);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int samples = bp.args.sizes[0];
const int iterations = bp.args.sizes[1];
bp.timer_start();
double pi = run_montecarlo_pi(samples, iterations);
bp.timer_stop();
bp.print("montecarlo_pi(cpp11_seq)");
if (bp.args.verbose) {
printf("PI-approximation = %f\n", pi);
}
return 0;
}
Nbody¶Python Numpy¶from __future__ import print_function
"""
NBody in N^2 complexity
Note that we are using only Newtonian forces and do not consider relativity
Neither do we consider collisions between stars
Thus some of our stars will accelerate to speeds beyond c
This is done to keep the simulation simple enough for teaching purposes
All the work is done in the calc_force, move and random_galaxy functions.
To vectorize the code these are the functions to transform.
"""
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Gaussian elimination on matrix a without pivoting", "<size>")
G = 6.67384e-11 # m/(kg*s^2)
dt = 60 * 60 * 24 * 365.25 # Years in seconds
r_ly = 9.4607e15 # Lightyear in m
m_sol = 1.9891e30 # Solar mass in kg
def diagonal(ary, offset=0):
if ary.ndim != 2:
raise Exception("diagonal only supports 2 dimensions\n")
if offset < 0:
offset = -offset
if (ary.shape[0] - offset) > ary.shape[1]:
ary_diag = ary[offset, :]
else:
ary_diag = ary[offset:, 0]
else:
if ary.shape[1] - offset > ary.shape[0]:
ary_diag = ary[:, offset]
else:
ary_diag = ary[0, offset:]
ary_diag.strides = (ary.strides[0] + ary.strides[1],)
return ary_diag
def random_galaxy(N):
"""Generate a galaxy of random bodies"""
galaxy = { # We let all bodies stand still initially
'm': (bench.random_array((N,)) + 10) * m_sol / 10,
'x': (bench.random_array((N,)) - 0.5) * r_ly / 100,
'y': (bench.random_array((N,)) - 0.5) * r_ly / 100,
'z': (bench.random_array((N,)) - 0.5) * r_ly / 100,
'vx': np.zeros(N, dtype=bench.dtype),
'vy': np.zeros(N, dtype=bench.dtype),
'vz': np.zeros(N, dtype=bench.dtype)
}
if bench.dtype == np.float32:
galaxy['m'] /= 1e10
galaxy['x'] /= 1e5
galaxy['y'] /= 1e5
galaxy['z'] /= 1e5
return galaxy
def move(galaxy, dt):
"""Move the bodies
first find forces and change velocity and then move positions
"""
n = len(galaxy['x'])
# Calculate all dictances component wise (with sign)
dx = galaxy['x'][np.newaxis, :].T - galaxy['x']
dy = galaxy['y'][np.newaxis, :].T - galaxy['y']
dz = galaxy['z'][np.newaxis, :].T - galaxy['z']
# Euclidian distances (all bodys)
r = np.sqrt(dx ** 2 + dy ** 2 + dz ** 2)
diagonal(r)[:] = 1.0
# prevent collision
mask = r < 1.0
r = r * ~mask + 1.0 * mask
m = galaxy['m'][np.newaxis, :].T
# Calculate the acceleration component wise
Fx = G * m * dx / r ** 3
Fy = G * m * dy / r ** 3
Fz = G * m * dz / r ** 3
# Set the force (acceleration) a body exerts on it self to zero
diagonal(Fx)[:] = 0.0
diagonal(Fy)[:] = 0.0
diagonal(Fz)[:] = 0.0
galaxy['vx'] += dt * np.sum(Fx, axis=0)
galaxy['vy'] += dt * np.sum(Fy, axis=0)
galaxy['vz'] += dt * np.sum(Fz, axis=0)
galaxy['x'] += dt * galaxy['vx']
galaxy['y'] += dt * galaxy['vy']
galaxy['z'] += dt * galaxy['vz']
def main():
nbodies, niters = bench.args.size
galaxy = random_galaxy(nbodies)
bench.start()
for i in range(niters):
move(galaxy, dt)
bench.flush()
bench.stop()
bench.pprint()
if __name__ == "__main__":
main()
Csharp Numcil¶#region Copyright
/*
This file is part of Bohrium and copyright (c) 2012 the Bohrium:
team <http://www.bh107.org>.
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with Bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#endregion
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using R = NumCIL.Range;
namespace nbody
{
using NumCIL.Double;
using DATA = System.Double;
public static class nBodySolverDouble
{
//Gravity
private static DATA G = (DATA)1.0;
private static void FillDiagonal(NdArray a, DATA val)
{
long d = a.Shape.Dimensions[0].Length;
a.Reshape(new NumCIL.Shape(new long[] { d }, 0, new long[] { d+1 })).Set(val);
}
private static void CalcForce(Galaxy b)
{
var dx = b.x - b.x[R.NewAxis, R.All].Transposed;
FillDiagonal(dx, 1);
var dy = b.y - b.y[R.NewAxis, R.All].Transposed;
FillDiagonal(dy, 1);
var dz = b.z - b.z[R.NewAxis, R.All].Transposed;
FillDiagonal(dz, 1);
var pm = b.mass - b.mass[R.NewAxis, R.All].Transposed;
FillDiagonal(pm, 0);
var r = (dx.Pow(2) + dy.Pow(2) + dz.Pow(2)).Pow((DATA)0.5);
//In the below calc of the the forces the force of a body upon itself
//becomes nan and thus destroys the data
var Fx = G * pm / r.Pow(2) * (dx / r);
var Fy = G * pm / r.Pow(2) * (dy / r);
var Fz = G * pm / r.Pow(2) * (dz / r);
//The diagonal nan numbers must be removed so that the force from a body
//upon itself is zero
FillDiagonal(Fx,0);
FillDiagonal(Fy,0);
FillDiagonal(Fz,0);
b.vx += Add.Reduce(Fx, 1) / b.mass;
b.vy += Add.Reduce(Fy, 1) / b.mass;
b.vz += Add.Reduce(Fz, 1) / b.mass;
}
private static void Move(Galaxy galaxy)
{
CalcForce(galaxy);
galaxy.x += galaxy.vx;
galaxy.y += galaxy.vy;
galaxy.z += galaxy.vz;
}
public class Galaxy
{
private const DATA XMAX = 500;
private const DATA YMAX = 500;
private const DATA ZMAX = 500;
public NdArray mass;
public NdArray x;
public NdArray y;
public NdArray z;
public NdArray vx;
public NdArray vy;
public NdArray vz;
public Galaxy(long size)
{
this.mass = Generate.Random(size) * (DATA)Math.Pow(10, 6) / (DATA)(4 * Math.PI * Math.PI);
this.x = Generate.Random(size) * 2 * XMAX - XMAX;
this.y = Generate.Random(size) * 2 * YMAX - YMAX;
this.z = Generate.Random(size) * 2 * ZMAX - ZMAX;
this.vx = Generate.Zeroes(size);
this.vy = Generate.Zeroes(size);
this.vz = Generate.Zeroes(size);
}
public DATA Sync()
{
return this.z.Value[0] + this.y.Value[0] + this.x.Value[0];
}
}
public static void Solve(Galaxy galaxy, long steps)
{
for(long i = 0; i < steps; i++)
Move(galaxy);
}
}
}
namespace nbody
{
using NumCIL.Double;
using DATA = System.Double;
public static class nBodySolverDoubleNoTempArrays
{
//Gravity
private const DATA G = (DATA)1.0;
private const DATA MIN_DIST = (DATA)1.0;
private static void FillDiagonal(NdArray a, DATA val)
{
long d = a.Shape.Dimensions[0].Length;
a.Reshape(new NumCIL.Shape(new long[] { d }, 0, new long[] { d+1 })).Set(val);
}
private static void CalcForce(Galaxy b)
{
// Calculate all dictances component wise (with sign)
b.x.Sub(b.x[R.NewAxis, R.All].Transposed, b.dx);
b.y.Sub(b.y[R.NewAxis, R.All].Transposed, b.dy);
b.z.Sub(b.z[R.NewAxis, R.All].Transposed, b.dz);
// Euclidian distances (all bodys)
b.dx.Pow(2, b.r0);
b.dy.Pow(2, b.r1);
b.r0.Add(b.r1, b.r0);
b.dz.Pow(2, b.r1);
b.r0.Add(b.r1, b.r0);
b.r0.Pow((DATA)0.5, b.r0);
FillDiagonal(b.r0, (Double)1.0);
// prevent collition
NumCIL.Double.GreaterThanOrEqual.Apply(b.r0, MIN_DIST, b.mask);
b.mask.ToDouble (b.r1);
b.r1.Mul (b.r0, b.r0);
NumCIL.Boolean.Not.Apply (b.mask, b.mask);
b.mask.ToDouble (b.r1);
b.r1.Mul (MIN_DIST, b.r1);
b.r1.Add (b.r0, b.r0);
//Calculate the acceleration component wise
var m = b.mass[R.NewAxis, R.All].Transposed;
b.dx.Mul(G,b.dx);
b.dy.Mul(G,b.dy);
b.dz.Mul(G,b.dz);
b.dx.Mul(m,b.dx);
b.dy.Mul(m,b.dy);
b.dz.Mul(m,b.dz);
b.r0.Pow(3, b.r0);
b.dx.Div(b.r0, b.dx);
b.dy.Div(b.r0, b.dy);
b.dz.Div(b.r0, b.dz);
// Set the force (acceleration) a body exerts on it self to zero
FillDiagonal(b.dx, 0);
FillDiagonal(b.dy, 0);
FillDiagonal(b.dz, 0);
Double dt = (Double) (60*60*24*365.25); // Years in seconds
b.dx.Reduce<Add>(0, b.t0);
b.t0.Mul(dt, b.t0);
b.vx.Add(b.t0, b.vx);
b.dy.Reduce<Add>(0, b.t0);
b.t0.Mul(dt, b.t0);
b.vy.Add(b.t0, b.vy);
b.dz.Reduce<Add>(0, b.t0);
b.t0.Mul(dt, b.t0);
b.vz.Add(b.t0, b.vz);
}
private static void Move(Galaxy galaxy)
{
CalcForce(galaxy);
galaxy.x.Add(galaxy.vx, galaxy.x);
galaxy.y.Add(galaxy.vy, galaxy.y);
galaxy.z.Add(galaxy.vz, galaxy.z);
}
public class Galaxy
{
private const DATA XMAX = 500;
private const DATA YMAX = 500;
private const DATA ZMAX = 500;
public NdArray mass;
public NdArray x;
public NdArray y;
public NdArray z;
public NdArray vx;
public NdArray vy;
public NdArray vz;
public NdArray dx;
public NdArray dy;
public NdArray dz;
public NdArray pmass;
public NdArray r0;
public NdArray r1;
public NdArray t0;
public NumCIL.Boolean.NdArray mask;
public Galaxy(long size)
{
this.mass = Generate.Random(size) * (DATA)Math.Pow(10, 6) / (DATA)(4 * Math.PI * Math.PI);
this.x = Generate.Random(size) * 2 * XMAX - XMAX;
this.y = Generate.Random(size) * 2 * YMAX - YMAX;
this.z = Generate.Random(size) * 2 * ZMAX - ZMAX;
this.vx = Generate.Zeroes(size);
this.vy = Generate.Zeroes(size);
this.vz = Generate.Zeroes(size);
this.dx = new NdArray(new [] {size, size});
this.dy = new NdArray(new [] {size, size});
this.dz = new NdArray(new [] {size, size});
this.pmass = new NdArray(new [] {size, size});
this.r0 = new NdArray(this.dx.Shape);
this.r1 = new NdArray(this.dx.Shape);
this.t0 = new NdArray(this.x.Shape);
this.mask = new NumCIL.Boolean.NdArray(this.dx.Shape);
}
public DATA Sync()
{
return this.z.Value[0] + this.y.Value[0] + this.x.Value[0];
}
}
public static void Solve(Galaxy galaxy, long steps)
{
for(long i = 0; i < steps; i++)
Move(galaxy);
}
}
}
namespace nbody
{
using NumCIL.Float;
using DATA = System.Single;
public static class nBodySolverFloat
{
//Gravity
private static DATA G = (DATA)1.0;
private static void FillDiagonal(NdArray a, DATA val)
{
long d = a.Shape.Dimensions[0].Length;
a.Reshape(new NumCIL.Shape(new long[] { d }, 0, new long[] { d+1 })).Set(val);
}
private static void CalcForce(Galaxy b)
{
var dx = b.x - b.x[R.NewAxis, R.All].Transposed;
FillDiagonal(dx, 1);
var dy = b.y - b.y[R.NewAxis, R.All].Transposed;
FillDiagonal(dy, 1);
var dz = b.z - b.z[R.NewAxis, R.All].Transposed;
FillDiagonal(dz, 1);
var pm = b.mass - b.mass[R.NewAxis, R.All].Transposed;
FillDiagonal(pm, 0);
var r = (dx.Pow(2) + dy.Pow(2) + dz.Pow(2)).Pow((DATA)0.5);
//In the below calc of the the forces the force of a body upon itself
//becomes nan and thus destroys the data
var Fx = G * pm / r.Pow(2) * (dx / r);
var Fy = G * pm / r.Pow(2) * (dy / r);
var Fz = G * pm / r.Pow(2) * (dz / r);
//The diagonal nan numbers must be removed so that the force from a body
//upon itself is zero
FillDiagonal(Fx,0);
FillDiagonal(Fy,0);
FillDiagonal(Fz,0);
b.vx += Add.Reduce(Fx, 1) / b.mass;
b.vy += Add.Reduce(Fy, 1) / b.mass;
b.vz += Add.Reduce(Fz, 1) / b.mass;
}
private static void Move(Galaxy galaxy)
{
CalcForce(galaxy);
galaxy.x += galaxy.vx;
galaxy.y += galaxy.vy;
galaxy.z += galaxy.vz;
}
public class Galaxy
{
private const DATA XMAX = 500;
private const DATA YMAX = 500;
private const DATA ZMAX = 500;
public NdArray mass;
public NdArray x;
public NdArray y;
public NdArray z;
public NdArray vx;
public NdArray vy;
public NdArray vz;
public Galaxy(long size)
{
this.mass = Generate.Random(size) * (DATA)Math.Pow(10, 6) / (DATA)(4 * Math.PI * Math.PI);
this.x = Generate.Random(size) * 2 * XMAX - XMAX;
this.y = Generate.Random(size) * 2 * YMAX - YMAX;
this.z = Generate.Random(size) * 2 * ZMAX - ZMAX;
this.vx = Generate.Zeroes(size);
this.vy = Generate.Zeroes(size);
this.vz = Generate.Zeroes(size);
}
public DATA Sync()
{
return this.z.Value[0] + this.y.Value[0] + this.x.Value[0];
}
}
public static void Solve(Galaxy galaxy, long steps)
{
for(long i = 0; i < steps; i++)
Move(galaxy);
}
}
}
namespace nbody
{
using NumCIL.Float;
using DATA = System.Single;
public static class nBodySolverFloatNoTempArrays
{
//Gravity
private const DATA G = (DATA)1.0;
private const DATA MIN_DIST = (DATA)1.0;
private static void FillDiagonal(NdArray a, DATA val)
{
long d = a.Shape.Dimensions[0].Length;
a.Reshape(new NumCIL.Shape(new long[] { d }, 0, new long[] { d+1 })).Set(val);
}
private static void CalcForce(Galaxy b)
{
// Calculate all dictances component wise (with sign)
b.x.Sub(b.x[R.NewAxis, R.All].Transposed, b.dx);
b.y.Sub(b.y[R.NewAxis, R.All].Transposed, b.dy);
b.z.Sub(b.z[R.NewAxis, R.All].Transposed, b.dz);
// Euclidian distances (all bodys)
b.dx.Pow(2, b.r0);
b.dy.Pow(2, b.r1);
b.r0.Add(b.r1, b.r0);
b.dz.Pow(2, b.r1);
b.r0.Add(b.r1, b.r0);
b.r0.Pow((DATA)0.5, b.r0);
FillDiagonal(b.r0, (Single)1.0);
// prevent collition
NumCIL.Float.GreaterThanOrEqual.Apply(b.r0, MIN_DIST, b.mask);
b.mask.ToFloat (b.r1);
b.r1.Mul (b.r0, b.r0);
NumCIL.Boolean.Not.Apply (b.mask, b.mask);
b.mask.ToFloat (b.r1);
b.r1.Mul (MIN_DIST, b.r1);
b.r1.Add (b.r0, b.r0);
//Calculate the acceleration component wise
var m = b.mass[R.NewAxis, R.All].Transposed;
b.dx.Mul(G,b.dx);
b.dy.Mul(G,b.dy);
b.dz.Mul(G,b.dz);
b.dx.Mul(m,b.dx);
b.dy.Mul(m,b.dy);
b.dz.Mul(m,b.dz);
b.r0.Pow(3, b.r0);
b.dx.Div(b.r0, b.dx);
b.dy.Div(b.r0, b.dy);
b.dz.Div(b.r0, b.dz);
// Set the force (acceleration) a body exerts on it self to zero
FillDiagonal(b.dx, 0);
FillDiagonal(b.dy, 0);
FillDiagonal(b.dz, 0);
Single dt = (Single) (60*60*24*365.25); // Years in seconds
b.dx.Reduce<Add>(0, b.t0);
b.t0.Mul(dt, b.t0);
b.vx.Add(b.t0, b.vx);
b.dy.Reduce<Add>(0, b.t0);
b.t0.Mul(dt, b.t0);
b.vy.Add(b.t0, b.vy);
b.dz.Reduce<Add>(0, b.t0);
b.t0.Mul(dt, b.t0);
b.vz.Add(b.t0, b.vz);
}
private static void Move(Galaxy galaxy)
{
CalcForce(galaxy);
galaxy.x.Add(galaxy.vx, galaxy.x);
galaxy.y.Add(galaxy.vy, galaxy.y);
galaxy.z.Add(galaxy.vz, galaxy.z);
}
public class Galaxy
{
private const DATA XMAX = 500;
private const DATA YMAX = 500;
private const DATA ZMAX = 500;
public NdArray mass;
public NdArray x;
public NdArray y;
public NdArray z;
public NdArray vx;
public NdArray vy;
public NdArray vz;
public NdArray dx;
public NdArray dy;
public NdArray dz;
public NdArray pmass;
public NdArray r0;
public NdArray r1;
public NdArray t0;
public NumCIL.Boolean.NdArray mask;
public Galaxy(long size)
{
this.mass = Generate.Random(size) * (DATA)Math.Pow(10, 6) / (DATA)(4 * Math.PI * Math.PI);
this.x = Generate.Random(size) * 2 * XMAX - XMAX;
this.y = Generate.Random(size) * 2 * YMAX - YMAX;
this.z = Generate.Random(size) * 2 * ZMAX - ZMAX;
this.vx = Generate.Zeroes(size);
this.vy = Generate.Zeroes(size);
this.vz = Generate.Zeroes(size);
this.dx = new NdArray(new [] {size, size});
this.dy = new NdArray(new [] {size, size});
this.dz = new NdArray(new [] {size, size});
this.pmass = new NdArray(new [] {size, size});
this.r0 = new NdArray(this.dx.Shape);
this.r1 = new NdArray(this.dx.Shape);
this.t0 = new NdArray(this.x.Shape);
this.mask = new NumCIL.Boolean.NdArray(this.dx.Shape);
}
public DATA Sync()
{
return this.z.Value[0] + this.y.Value[0] + this.x.Value[0];
}
}
public static void Solve(Galaxy galaxy, long steps)
{
for(long i = 0; i < steps; i++)
Move(galaxy);
}
}
}
Nbody Nice¶Python Numpy¶Error There are issues with the implementation. Visualization does not work when running with Bohrium since it relies on NumPy masked array. #!/usr/bin/python
# -*- coding: utf-8 -*-
"""
NBody in N^2 complexity
Note that we are using only Newtonian forces and do not consider relativity
Neither do we consider collisions between stars
Thus some of our stars will accelerate to speeds beyond c
This is done to keep the simulation simple enough for teaching purposes
"""
from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
try:
import numpy_force as npf
except ImportError:
import numpy as npf
from nbody_nice_visualization import gfx_init, gfx_show
bench = util.Benchmark("n-body using the NICE method", "<image-size>*<filter-size>*<ndims>*<niters>")
def fill_diagonal(a, val):
d, _ = a.shape # This only makes sense for square matrices
a.shape = d * d # Flatten a without making a copy
a[::d + 1] = val # Assign the diagonal values
a.shape = (d, d) # Return a to its original shape
def calc_force(a, b, dt):
"""
Calculate forces between bodies
F = ((G m_a m_b)/r^2)/((x_b-x_a)/r)
"""
# Ignore division by zero since we fix it explicitely by setting the diagonal in the forces arrays
npf.seterr(divide='ignore', invalid='ignore')
G = 6.673e-11
dx = b['x'] - a['x'][:, None]
dy = b['y'] - a['y'][:, None]
dz = b['z'] - a['z'][:, None]
pm = b['m'] * a['m'][:, None]
#
# For some reason then this pow(T, 0.5) is deadly to performance...
# sqrt(T) is equivalent math, trying it out instead.
#
# This might actually be a neat optimization:
# pow(T, 0.K) => k-root(T)
#
# r = ( dx ** 2 + dy ** 2 + dz ** 2) ** 0.5
r = np.sqrt(dx ** 2 + dy ** 2 + dz ** 2)
Fx = G * pm / r ** 2 * (dx / r)
Fy = G * pm / r ** 2 * (dy / r)
Fz = G * pm / r ** 2 * (dz / r)
# The diagonal nan numbers must be removed so that the force from a body
# upon itself is zero
if a is b:
fill_diagonal(Fx, 0.)
fill_diagonal(Fy, 0.)
fill_diagonal(Fz, 0.)
a['vx'] += np.add.reduce(Fx, axis=1) / a['m'] * dt
a['vy'] += np.add.reduce(Fy, axis=1) / a['m'] * dt
a['vz'] += np.add.reduce(Fz, axis=1) / a['m'] * dt
def move(solarsystem, asteroids, dt):
"""
Move the bodies
first find forces and change velocity and then move positions
"""
calc_force(solarsystem, solarsystem, dt)
calc_force(asteroids, solarsystem, dt)
solarsystem['x'] += solarsystem['vx'] * dt
solarsystem['y'] += solarsystem['vy'] * dt
solarsystem['z'] += solarsystem['vz'] * dt
asteroids['x'] += asteroids['vx'] * dt
asteroids['y'] += asteroids['vy'] * dt
asteroids['z'] += asteroids['vz'] * dt
def random_system(x_max, y_max, z_max, n, b, dtype=npf.float):
"""Generate a galaxy of random bodies"""
solarmass = 1.98892e30
def circlev(rx, ry, rz):
"""Helper function..."""
r2 = npf.sqrt(rx * rx + ry * ry + rz * rz)
numerator = (6.67e-11) * 1e6 * solarmass
return npf.sqrt(numerator / r2)
solarsystem = {}
solarsystem['x'] = npf.random.random(n)
solarsystem['y'] = npf.random.random(n)
solarsystem['z'] = npf.random.random(n) * .01
dist = (1.0 / npf.sqrt(solarsystem['x'] ** 2 + solarsystem['y'] ** 2 + solarsystem['z'] ** 2)) - (
0.8 - npf.random.random() * .1)
solarsystem['x'] = x_max * solarsystem['x'] * dist * npf.sign(.5 - npf.random.random(n))
solarsystem['y'] = y_max * solarsystem['y'] * dist * npf.sign(.5 - npf.random.random(n))
solarsystem['z'] = z_max * solarsystem['z'] * dist * npf.sign(.5 - npf.random.random(n))
magv = circlev(
solarsystem['x'],
solarsystem['y'],
solarsystem['z']
)
absangle = npf.arctan(npf.absolute(solarsystem['y'] / solarsystem['x']))
thetav = npf.pi / 2 - absangle
solarsystem['vx'] = -1 * npf.sign(solarsystem['y']) * npf.cos(thetav) * magv
solarsystem['vy'] = npf.sign(solarsystem['x']) * npf.sin(thetav) * magv
solarsystem['vz'] = npf.zeros(n)
solarsystem['m'] = npf.random.random(n) * solarmass * 10 + 1e20;
solarsystem['m'][0] = 1e6 * solarmass
solarsystem['x'][0] = 0
solarsystem['y'][0] = 0
solarsystem['z'][0] = 0
solarsystem['vx'][0] = 0
solarsystem['vy'][0] = 0
solarsystem['vz'][0] = 0
asteroids = {}
asteroids['x'] = npf.random.random(b)
asteroids['y'] = npf.random.random(b)
asteroids['z'] = npf.random.random(b) * .01
dist = (1.0 / npf.sqrt(asteroids['x'] ** 2 + asteroids['y'] ** 2 + asteroids['z'] ** 2)) - (
npf.random.random() * .2)
asteroids['x'] = x_max * asteroids['x'] * dist * npf.sign(.5 - npf.random.random(b))
asteroids['y'] = y_max * asteroids['y'] * dist * npf.sign(.5 - npf.random.random(b))
asteroids['z'] = z_max * asteroids['z'] * dist * npf.sign(.5 - npf.random.random(b))
magv = circlev(
asteroids['x'],
asteroids['y'],
asteroids['z']
)
absangle = npf.arctan(npf.absolute(asteroids['y'] / asteroids['x']))
thetav = npf.pi / 2 - absangle
asteroids['vx'] = -1 * npf.sign(asteroids['y']) * npf.cos(thetav) * magv
asteroids['vy'] = npf.sign(asteroids['x']) * npf.sin(thetav) * magv
asteroids['vz'] = npf.zeros(b)
asteroids['m'] = npf.random.random(b) * solarmass * 10 + 1e14;
ss = {}
for key in solarsystem:
ss[key] = np.array(solarsystem[key].astype(dtype))
a = {}
for key in asteroids:
a[key] = np.array(asteroids[key].astype(dtype))
return ss, a
def main():
nplanets, nbodies, timesteps = bench.args.size
x_max = 1e18 # Simulation constants
y_max = 1e18
z_max = 1e18
dt = 1e12
solarsystem, asteroids = random_system( # Setup galaxy
x_max,
y_max,
z_max,
nplanets,
nbodies,
bench.dtype
)
if bench.args.visualize: # Init visuals
plt, P3 = gfx_init(x_max, y_max, z_max)
bench.start() # Timer start
for timestep in range(0, timesteps): # Run simulation
if bench.args.visualize and timestep % 10 == 0: # With or without..
gfx_show(plt, P3, solarsystem, asteroids) # ..visuals
move(solarsystem, asteroids, dt)
bench.flush()
bench.stop() # Timer stop
bench.pprint() # Print results..
if bench.args.visualize: # Keep showing visuals
plt.show(block=True)
if __name__ == "__main__":
main()
Csharp Numcil¶#region Copyright
/*
This file is part of Bohrium and copyright (c) 2012 the Bohrium:
team <http://www.bh107.org>.
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with Bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#endregion
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using R = NumCIL.Range;
namespace nice
{
using NumCIL.Double;
using TArray = NumCIL.Double.NdArray;
using TData = System.Double;
public static class NiceSolverDouble
{
//private static readonly Utilities.Generator<TArray, TData> Generate = new Utilities.Generator<TArray, TData>();
private static TData ConvertValue(object o) { return (TData)o; }
//Gravity
private static TData G = (TData)6.673e-11;
// Solar mass
public const TData SOLARMASS = (TData)1.98892e30;
// Discrete Time units
public const TData DT = (TData)1e12;
private static void FillDiagonal(TArray a, TData val)
{
long d = a.Shape.Dimensions[0].Length;
a.Reshape(new NumCIL.Shape(new long[] { d }, 0, new long[] { d+1 })).Set(val);
}
private static void CalcForce(Bodies a, Bodies b)
{
var dx = b.x - a.x[R.NewAxis, R.All].Transposed;
var dy = b.y - a.y[R.NewAxis, R.All].Transposed;
var dz = b.z - a.z[R.NewAxis, R.All].Transposed;
var pm = b.mass * a.mass[R.NewAxis, R.All].Transposed;
if (a == b)
{
FillDiagonal(dx, 1);
FillDiagonal(dy, 1);
FillDiagonal(dz, 1);
FillDiagonal(pm, 0);
}
var r = (dx.Pow(2) + dy.Pow(2) + dz.Pow(2)).Pow((TData)0.5);
//In the below calc of the the forces the force of a body upon itself
//becomes nan and thus destroys the data
var Fx = G * pm / r.Pow(2) * (dx / r);
var Fy = G * pm / r.Pow(2) * (dy / r);
var Fz = G * pm / r.Pow(2) * (dz / r);
//The diagonal nan numbers must be removed so that the force from a body
//upon itself is zero
if (a == b)
{
FillDiagonal(Fx, 0);
FillDiagonal(Fy, 0);
FillDiagonal(Fz, 0);
}
a.vx += Add.Reduce(Fx, 1) / a.mass * DT;
a.vy += Add.Reduce(Fy, 1) / a.mass * DT;
a.vz += Add.Reduce(Fz, 1) / a.mass * DT;
}
private static void Move(Galaxy galaxy)
{
CalcForce(galaxy.SolarSystem, galaxy.SolarSystem);
CalcForce(galaxy.Asteroids, galaxy.SolarSystem);
galaxy.SolarSystem.x += galaxy.SolarSystem.vx * DT;
galaxy.SolarSystem.y += galaxy.SolarSystem.vy * DT;
galaxy.SolarSystem.z += galaxy.SolarSystem.vz * DT;
galaxy.Asteroids.x += galaxy.Asteroids.vx * DT;
galaxy.Asteroids.y += galaxy.Asteroids.vy * DT;
galaxy.Asteroids.z += galaxy.Asteroids.vz * DT;
}
public class Galaxy
{
public Bodies SolarSystem;
public Bodies Asteroids;
public Galaxy(long planets, long asteroids)
{
this.SolarSystem = new SolarSystem(planets);
this.Asteroids = new Asteroids(asteroids);
}
}
public abstract class Bodies
{
public const TData XMAX = (TData)1e18;
public const TData YMAX = (TData)1e18;
public const TData ZMAX = (TData)1e18;
public TArray mass;
public TArray x;
public TArray y;
public TArray z;
public TArray vx;
public TArray vy;
public TArray vz;
protected void Reset(long size)
{
this.x = Generate.Random(size);
this.y = Generate.Random(size);
this.z = Generate.Random(size) * (TData)0.01;
var dist = 1f / Sqrt.Apply((this.x.Pow(2f) + this.y.Pow(2f) + this.z.Pow(2f)));
dist = dist - (TData)(0.8f - (new Random().NextDouble() * 0.1f));
this.x = XMAX * this.x * dist * Sign.Apply(.5f - Generate.Random(size));
this.y = YMAX * this.y * dist * Sign.Apply(.5f - Generate.Random(size));
this.z = ZMAX * this.z * dist * Sign.Apply(.5f - Generate.Random(size));
var magv = Cirklev(this.x, this.y, this.z);
var absangle = Atan.Apply(Abs.Apply(this.y / this.x));
var thetav= (TData)(Math.PI/2) - absangle;
this.vx = (TData)(-1) * Sign.Apply(this.y) * Cos.Apply(thetav) * magv;
this.vy = Sign.Apply(this.x) * Sin.Apply(thetav) * magv;
this.vz = Generate.Zeroes(size);
}
private TArray Cirklev(TArray rx, TArray ry, TArray rz)
{
var r2 = Sqrt.Apply(rx * rx + ry * ry + rz * rz);
var numerator = (TData)((6.67e-11) * 1e6 * SOLARMASS);
return Sqrt.Apply(numerator / r2);
}
}
public class SolarSystem : Bodies
{
public SolarSystem(long size)
{
base.Reset(size);
this.mass = (Generate.Random(size) * (TData)(SOLARMASS * 10)) + (TData)1e20;
this.mass[0]= (TData)1e6 * SOLARMASS;
this.x[0]= 0;
this.y[0]= 0;
this.z[0]= 0;
this.vx[0]= 0;
this.vy[0]= 0;
this.vz[0]= 0;
}
}
public class Asteroids : Bodies
{
public Asteroids(long size)
{
base.Reset(size);
this.mass = (Generate.Random(size) * (TData)(SOLARMASS * 10)) + (TData)1e14;
}
}
public static Galaxy Create(long planets, long asteroids)
{
return new Galaxy(planets, asteroids);
}
public static void Solve(Galaxy galaxy, long steps, bool image_output)
{
if (image_output)
Render(galaxy, 0, steps);
for (long step = 0; step < steps; step++)
{
Move(galaxy);
if (image_output)
Render(galaxy, step + 1, steps);
}
}
private static void Render(Galaxy galaxy, long step, long steps)
{
var image_width = 1024;
var image_height = 1024;
var axz = galaxy.Asteroids.x;// / galaxy.Asteroids.z;
var ayz = galaxy.Asteroids.y;// / galaxy.Asteroids.z;
var sxz = galaxy.SolarSystem.x;// / galaxy.SolarSystem.z;
var syz = galaxy.SolarSystem.y;// / galaxy.SolarSystem.z;
Func<int, int, int, System.Drawing.Rectangle> inflateBox = (x, y, s) => new System.Drawing.Rectangle(new System.Drawing.Point(x - (s/2), y - (s/2)), new System.Drawing.Size(s, s));
var mass_min = Min.Reduce(galaxy.SolarSystem.mass).Value[0];
var mass_max = Max.Reduce(galaxy.SolarSystem.mass).Value[0];
var rangex_max = Math.Max(Max.Reduce(axz).Value[0], Max.Reduce(sxz).Value[0]);
var rangex_min = Math.Min(Min.Reduce(axz).Value[0], Min.Reduce(sxz).Value[0]);
var rangey_max = Math.Max(Max.Reduce(ayz).Value[0], Max.Reduce(syz).Value[0]);
var rangey_min = Math.Min(Min.Reduce(ayz).Value[0], Min.Reduce(syz).Value[0]);
var rangex_diff = rangex_max - rangex_min;
var rangey_diff = rangey_max - rangey_min;
rangex_min = -Bodies.XMAX * 1.2f;
rangey_min = -Bodies.YMAX * 1.2f;
rangex_diff = Bodies.XMAX * 2.4f;
rangey_diff = Bodies.YMAX * 2.4f;
var sx = NumCIL.Int32.Min.Apply(image_width - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((sxz - rangex_min) / rangex_diff) * (image_width - 1))))).AsArray();
var sy = NumCIL.Int32.Min.Apply(image_height - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((syz - rangey_min) / rangey_diff) * (image_height - 1))))).AsArray();
var ax = NumCIL.Int32.Min.Apply(image_width - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((axz - rangex_min) / rangex_diff) * (image_width - 1))))).AsArray();
var ay = NumCIL.Int32.Min.Apply(image_height - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((ayz - rangey_min) / rangey_diff) * (image_height - 1))))).AsArray();
using (var image = new System.Drawing.Bitmap(image_width, image_height))
{
using (var gc = System.Drawing.Graphics.FromImage(image))
gc.Clear(System.Drawing.Color.White);
using (var gc = System.Drawing.Graphics.FromImage(image))
{
for (var s = 0; s < ax.Length; s++)
gc.DrawEllipse(System.Drawing.Pens.DarkGray, inflateBox(ax[s], ay[s], 1));
for (var s = 0; s < sx.Length; s++)
gc.DrawEllipse(System.Drawing.Pens.Red, inflateBox(sx[s], sy[s], 2));
}
image.Save(string.Format("step-{0:0000}.png", step), System.Drawing.Imaging.ImageFormat.Png);
}
Console.WriteLine("Completed step {0} of {1}", step, steps);
}
}
}
namespace nice
{
using NumCIL.Float;
using TArray = NumCIL.Float.NdArray;
using TData = System.Single;
public static class NiceSolverSingle
{
//private static readonly Utilities.Generator<TArray, TData> Generate = new Utilities.Generator<TArray, TData>();
private static TData ConvertValue(object o) { return (TData)o; }
//Gravity
private static TData G = (TData)6.673e-11;
// Solar mass
public const TData SOLARMASS = (TData)1.98892e30;
// Discrete Time units
public const TData DT = (TData)1e12;
private static void FillDiagonal(TArray a, TData val)
{
long d = a.Shape.Dimensions[0].Length;
a.Reshape(new NumCIL.Shape(new long[] { d }, 0, new long[] { d+1 })).Set(val);
}
private static void CalcForce(Bodies a, Bodies b)
{
var dx = b.x - a.x[R.NewAxis, R.All].Transposed;
var dy = b.y - a.y[R.NewAxis, R.All].Transposed;
var dz = b.z - a.z[R.NewAxis, R.All].Transposed;
var pm = b.mass * a.mass[R.NewAxis, R.All].Transposed;
if (a == b)
{
FillDiagonal(dx, 1);
FillDiagonal(dy, 1);
FillDiagonal(dz, 1);
FillDiagonal(pm, 0);
}
var r = (dx.Pow(2) + dy.Pow(2) + dz.Pow(2)).Pow((TData)0.5);
//In the below calc of the the forces the force of a body upon itself
//becomes nan and thus destroys the data
var Fx = G * pm / r.Pow(2) * (dx / r);
var Fy = G * pm / r.Pow(2) * (dy / r);
var Fz = G * pm / r.Pow(2) * (dz / r);
//The diagonal nan numbers must be removed so that the force from a body
//upon itself is zero
if (a == b)
{
FillDiagonal(Fx, 0);
FillDiagonal(Fy, 0);
FillDiagonal(Fz, 0);
}
a.vx += Add.Reduce(Fx, 1) / a.mass * DT;
a.vy += Add.Reduce(Fy, 1) / a.mass * DT;
a.vz += Add.Reduce(Fz, 1) / a.mass * DT;
}
private static void Move(Galaxy galaxy)
{
CalcForce(galaxy.SolarSystem, galaxy.SolarSystem);
CalcForce(galaxy.Asteroids, galaxy.SolarSystem);
galaxy.SolarSystem.x += galaxy.SolarSystem.vx * DT;
galaxy.SolarSystem.y += galaxy.SolarSystem.vy * DT;
galaxy.SolarSystem.z += galaxy.SolarSystem.vz * DT;
galaxy.Asteroids.x += galaxy.Asteroids.vx * DT;
galaxy.Asteroids.y += galaxy.Asteroids.vy * DT;
galaxy.Asteroids.z += galaxy.Asteroids.vz * DT;
}
public class Galaxy
{
public Bodies SolarSystem;
public Bodies Asteroids;
public Galaxy(long planets, long asteroids)
{
this.SolarSystem = new SolarSystem(planets);
this.Asteroids = new Asteroids(asteroids);
}
}
public abstract class Bodies
{
public const TData XMAX = (TData)1e18;
public const TData YMAX = (TData)1e18;
public const TData ZMAX = (TData)1e18;
public TArray mass;
public TArray x;
public TArray y;
public TArray z;
public TArray vx;
public TArray vy;
public TArray vz;
protected void Reset(long size)
{
this.x = Generate.Random(size);
this.y = Generate.Random(size);
this.z = Generate.Random(size) * (TData)0.01;
var dist = 1f / Sqrt.Apply((this.x.Pow(2f) + this.y.Pow(2f) + this.z.Pow(2f)));
dist = dist - (TData)(0.8f - (new Random().NextDouble() * 0.1f));
this.x = XMAX * this.x * dist * Sign.Apply(.5f - Generate.Random(size));
this.y = YMAX * this.y * dist * Sign.Apply(.5f - Generate.Random(size));
this.z = ZMAX * this.z * dist * Sign.Apply(.5f - Generate.Random(size));
var magv = Cirklev(this.x, this.y, this.z);
var absangle = Atan.Apply(Abs.Apply(this.y / this.x));
var thetav= (TData)(Math.PI/2) - absangle;
this.vx = (TData)(-1) * Sign.Apply(this.y) * Cos.Apply(thetav) * magv;
this.vy = Sign.Apply(this.x) * Sin.Apply(thetav) * magv;
this.vz = Generate.Zeroes(size);
}
private TArray Cirklev(TArray rx, TArray ry, TArray rz)
{
var r2 = Sqrt.Apply(rx * rx + ry * ry + rz * rz);
var numerator = (TData)((6.67e-11) * 1e6 * SOLARMASS);
return Sqrt.Apply(numerator / r2);
}
}
public class SolarSystem : Bodies
{
public SolarSystem(long size)
{
base.Reset(size);
this.mass = (Generate.Random(size) * (TData)(SOLARMASS * 10)) + (TData)1e20;
this.mass[0]= (TData)1e6 * SOLARMASS;
this.x[0]= 0;
this.y[0]= 0;
this.z[0]= 0;
this.vx[0]= 0;
this.vy[0]= 0;
this.vz[0]= 0;
}
}
public class Asteroids : Bodies
{
public Asteroids(long size)
{
base.Reset(size);
this.mass = (Generate.Random(size) * (TData)(SOLARMASS * 10)) + (TData)1e14;
}
}
public static Galaxy Create(long planets, long asteroids)
{
return new Galaxy(planets, asteroids);
}
public static void Solve(Galaxy galaxy, long steps, bool image_output)
{
if (image_output)
Render(galaxy, 0, steps);
for (long step = 0; step < steps; step++)
{
Move(galaxy);
if (image_output)
Render(galaxy, step + 1, steps);
}
}
private static void Render(Galaxy galaxy, long step, long steps)
{
var image_width = 1024;
var image_height = 1024;
var axz = galaxy.Asteroids.x;// / galaxy.Asteroids.z;
var ayz = galaxy.Asteroids.y;// / galaxy.Asteroids.z;
var sxz = galaxy.SolarSystem.x;// / galaxy.SolarSystem.z;
var syz = galaxy.SolarSystem.y;// / galaxy.SolarSystem.z;
Func<int, int, int, System.Drawing.Rectangle> inflateBox = (x, y, s) => new System.Drawing.Rectangle(new System.Drawing.Point(x - (s/2), y - (s/2)), new System.Drawing.Size(s, s));
var mass_min = Min.Reduce(galaxy.SolarSystem.mass).Value[0];
var mass_max = Max.Reduce(galaxy.SolarSystem.mass).Value[0];
var rangex_max = Math.Max(Max.Reduce(axz).Value[0], Max.Reduce(sxz).Value[0]);
var rangex_min = Math.Min(Min.Reduce(axz).Value[0], Min.Reduce(sxz).Value[0]);
var rangey_max = Math.Max(Max.Reduce(ayz).Value[0], Max.Reduce(syz).Value[0]);
var rangey_min = Math.Min(Min.Reduce(ayz).Value[0], Min.Reduce(syz).Value[0]);
var rangex_diff = rangex_max - rangex_min;
var rangey_diff = rangey_max - rangey_min;
rangex_min = -Bodies.XMAX * 1.2f;
rangey_min = -Bodies.YMAX * 1.2f;
rangex_diff = Bodies.XMAX * 2.4f;
rangey_diff = Bodies.YMAX * 2.4f;
var sx = NumCIL.Int32.Min.Apply(image_width - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((sxz - rangex_min) / rangex_diff) * (image_width - 1))))).AsArray();
var sy = NumCIL.Int32.Min.Apply(image_height - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((syz - rangey_min) / rangey_diff) * (image_height - 1))))).AsArray();
var ax = NumCIL.Int32.Min.Apply(image_width - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((axz - rangex_min) / rangex_diff) * (image_width - 1))))).AsArray();
var ay = NumCIL.Int32.Min.Apply(image_height - 1, NumCIL.Int32.Max.Apply(0, ((NumCIL.Int32.NdArray)(((ayz - rangey_min) / rangey_diff) * (image_height - 1))))).AsArray();
using (var image = new System.Drawing.Bitmap(image_width, image_height))
{
using (var gc = System.Drawing.Graphics.FromImage(image))
gc.Clear(System.Drawing.Color.White);
using (var gc = System.Drawing.Graphics.FromImage(image))
{
for (var s = 0; s < ax.Length; s++)
gc.DrawEllipse(System.Drawing.Pens.DarkGray, inflateBox(ax[s], ay[s], 1));
for (var s = 0; s < sx.Length; s++)
gc.DrawEllipse(System.Drawing.Pens.Red, inflateBox(sx[s], sy[s], 2));
}
image.Save(string.Format("step-{0:0000}.png", step), System.Drawing.Imaging.ImageFormat.Png);
}
Console.WriteLine("Completed step {0} of {1}", step, steps);
}
}
}
Quasicrystal¶Python Numpy¶# -*- coding: utf-8 -*-
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Quasi Crystal simulation", "k*stripes*image_size*iterations")
def main():
k = bench.args.size[0] # number of plane waves
stripes = bench.args.size[1] # number of stripes per wave
N = bench.args.size[2] # image size in pixels
ite = bench.args.size[3] # iterations
phases = np.arange(0, 2 * np.pi, 2 * np.pi / ite)
image = np.empty((N, N), dtype=bench.dtype)
d = np.arange(-N / 2, N / 2, dtype=bench.dtype)
xv, yv = np.meshgrid(d, d)
theta = np.arctan2(yv, xv)
r = np.log(np.sqrt(xv * xv + yv * yv))
r[np.isinf(r) == True] = 0
tcos = theta * np.cos(np.arange(0, np.pi, np.pi / k))[:, np.newaxis, np.newaxis]
rsin = r * np.sin(np.arange(0, np.pi, np.pi / k))[:, np.newaxis, np.newaxis]
inner = (tcos - rsin) * stripes
cinner = np.cos(inner)
sinner = np.sin(inner)
bench.start()
for phase in phases:
image[:] = np.sum(cinner * np.cos(phase) - sinner * np.sin(phase), axis=0) + k
bench.flush()
bench.stop()
bench.pprint()
if __name__ == "__main__":
main()
Reactiondiffusion¶Csharp Numcil¶#region Copyright
/*
This file is part of Bohrium and copyright (c) 2012 the Bohrium:
team <http://www.bh107.org>.
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with Bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#endregion
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using R = NumCIL.Range;
namespace reactiondiffusion
{
using NumCIL.Double;
using TArray = NumCIL.Double.NdArray;
using TData = System.Double;
public static class ReactionDiffusionSolverDouble
{
private static TArray Laplacian(TArray Z, TData dx)
{
var Ztop = Z[R.R(0, -2), R.R(1, -1)];
var Zleft = Z[R.R(1, -1), R.R(0, -2)];
var Zbottom = Z[R.R(2, 0), R.R(1, -1)];
var Zright = Z[R.R(1, -1), R.R(2, 0)];
var Zcenter = Z[R.R(1, -1), R.R(1, -1)];
return (Ztop + Zleft + Zbottom + Zright - 4 * Zcenter) / (TData)Math.Pow(dx, 2);
}
public static void UpdateBondaries(TArray Z)
{
Z[R.El(0), R.All] = Z[R.El(1), R.All];
Z[R.El(-1), R.All] = Z[R.El(-2), R.All];
Z[R.All, R.El(0)] = Z[R.All, R.El(1)];
Z[R.All, R.El(-1)] = Z[R.All, R.El(-2)];
}
public class Data
{
public TArray U;
public TArray V;
public long Size;
}
public static Data Create(long size)
{
return new Data()
{
U = Generate.Random(new long[] { size, size }),
V = Generate.Random(new long[] { size, size }),
Size = size
};
}
public static TArray Solve(Data data, long iterations, bool image_output)
{
var a = (TData)2.8e-4;
var b = (TData)5e-3;
var tau = (TData)(0.1);
var k = (TData)(-0.005);
var dx = (TData)(2.0/data.Size);
var T = (TData)10.0; // Total time
var dt = (TData)(.9 * Math.Pow(dx,2) / 2); // time step
var n = (long)(T/dt);
var U = data.U;
var V = data.V;
for(var step = 0L; step < iterations; step++)
{
if (image_output)
Plot(data, step, iterations);
// We compute the Laplacian of u and v.
var deltaU = Laplacian(U, dx);
var deltaV = Laplacian(V, dx);
// We take the values of u and v inside the grid.
var Uc = U[R.R(1,-1),R.R(1,-1)];
var Vc = V[R.R(1,-1),R.R(1,-1)];
// We update the variables.
U[R.R(1,-1),R.R(1,-1)] = Uc + dt * (a * deltaU + Uc - Uc.Pow(3) - Vc + k);
V[R.R(1,-1),R.R(1,-1)] = Vc + dt * (b * deltaV + Uc - Vc) / tau;
// Neumann conditions: derivatives at the edges are null.
UpdateBondaries(U);
UpdateBondaries(V);
}
if (image_output)
Plot(data, iterations, iterations);
return U;
}
public static TData Sync(Data data)
{
return data.U.Value[0];
}
public static void Plot(Data data, long step, long steps)
{
var cm = Utilities.Render.ColorMap(System.Drawing.Color.Coral);
var nm = Utilities.Render.Normalize<TData>(-1, 1);
Utilities.Render.Plot<TData>(string.Format("step-{0:0000}.png", step), data.U, (x, y, v) => cm(nm(v)));
}
}
}
namespace reactiondiffusion
{
using NumCIL.Float;
using TArray = NumCIL.Float.NdArray;
using TData = System.Single;
public static class ReactionDiffusionSolverSingle
{
private static TArray Laplacian(TArray Z, TData dx)
{
var Ztop = Z[R.R(0, -2), R.R(1, -1)];
var Zleft = Z[R.R(1, -1), R.R(0, -2)];
var Zbottom = Z[R.R(2, 0), R.R(1, -1)];
var Zright = Z[R.R(1, -1), R.R(2, 0)];
var Zcenter = Z[R.R(1, -1), R.R(1, -1)];
return (Ztop + Zleft + Zbottom + Zright - 4 * Zcenter) / (TData)Math.Pow(dx, 2);
}
public static void UpdateBondaries(TArray Z)
{
Z[R.El(0), R.All] = Z[R.El(1), R.All];
Z[R.El(-1), R.All] = Z[R.El(-2), R.All];
Z[R.All, R.El(0)] = Z[R.All, R.El(1)];
Z[R.All, R.El(-1)] = Z[R.All, R.El(-2)];
}
public class Data
{
public TArray U;
public TArray V;
public long Size;
}
public static Data Create(long size)
{
return new Data()
{
U = Generate.Random(new long[] { size, size }),
V = Generate.Random(new long[] { size, size }),
Size = size
};
}
public static TArray Solve(Data data, long iterations, bool image_output)
{
var a = (TData)2.8e-4;
var b = (TData)5e-3;
var tau = (TData)(0.1);
var k = (TData)(-0.005);
var dx = (TData)(2.0/data.Size);
var T = (TData)10.0; // Total time
var dt = (TData)(.9 * Math.Pow(dx,2) / 2); // time step
var n = (long)(T/dt);
var U = data.U;
var V = data.V;
for(var step = 0L; step < iterations; step++)
{
if (image_output)
Plot(data, step, iterations);
// We compute the Laplacian of u and v.
var deltaU = Laplacian(U, dx);
var deltaV = Laplacian(V, dx);
// We take the values of u and v inside the grid.
var Uc = U[R.R(1,-1),R.R(1,-1)];
var Vc = V[R.R(1,-1),R.R(1,-1)];
// We update the variables.
U[R.R(1,-1),R.R(1,-1)] = Uc + dt * (a * deltaU + Uc - Uc.Pow(3) - Vc + k);
V[R.R(1,-1),R.R(1,-1)] = Vc + dt * (b * deltaV + Uc - Vc) / tau;
// Neumann conditions: derivatives at the edges are null.
UpdateBondaries(U);
UpdateBondaries(V);
}
if (image_output)
Plot(data, iterations, iterations);
return U;
}
public static TData Sync(Data data)
{
return data.U.Value[0];
}
public static void Plot(Data data, long step, long steps)
{
var cm = Utilities.Render.ColorMap(System.Drawing.Color.Coral);
var nm = Utilities.Render.Normalize<TData>(-1, 1);
Utilities.Render.Plot<TData>(string.Format("step-{0:0000}.png", step), data.U, (x, y, v) => cm(nm(v)));
}
}
}
Rosenbrock¶Python Numpy¶from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Rosenbrock", "size*iterations")
def rosen(x):
return np.sum(
100.0 * (x[1:] - x[:-1] ** 2.0) ** 2.0 + \
(1 - x[:-1]) ** 2.0
)
def main():
N, T = bench.args.size # Grab command-line arguments
dataset = np.arange(N, dtype=bench.dtype) / bench.dtype(N)
bench.start() # Sample wall-clock start
res = 0.0
for _ in range(0, T): # Do T trials of..
res += rosen(dataset) # ..executing rosenbrock.
bench.flush()
res /= T
bench.stop() # Sample wall-clock stop
bench.pprint() # Print elapsed wall-clock etc.
if __name__ == "__main__":
main()
C99 Seq¶#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <bp_util.h>
double rosenbrock(int nelements, double* x)
{
double sum = 0.0;
for(int i=0; i<nelements-2; ++i) {
sum += 100.0*pow((x[i+1] - pow(x[i], 2)), 2) + pow((1-x[i]), 2);
}
return sum;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int nelements = bp.args.sizes[0];
const int trials = bp.args.sizes[1];
// Create the pseudo-data
double* dataset = (double*)malloc(sizeof(double)*nelements);
for(int i=0; i<nelements; ++i) {
dataset[i] = i/(double)nelements;
}
bp.timer_start(); // Start timer
double res = 0.0;
for(int i=0; i<trials; ++i) {
res += rosenbrock(nelements, dataset); // Run benchmark
}
res /= trials;
bp.timer_stop(); // Stop timer
bp.print("rosenbrock(c99_seq)"); // Print results..
if (bp.args.verbose) { // ..and value.
printf("Result = %f\n", res);
}
return 0;
}
Cpp11 Omp¶#include <iostream>
#include <iomanip>
#include <cmath>
#include <omp.h>
#include <bp_util.h>
using namespace std;
double rosenbrock(int nelements, double* x)
{
double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for(int i=0; i<nelements-2; ++i) {
sum += 100.0*pow((x[i+1] - pow(x[i], 2)), 2) + pow((1-x[i]), 2);
}
return sum;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int nelements = bp.args.sizes[0];
const int trials = bp.args.sizes[1];
// Create the pseudo-data
double* dataset = (double*)malloc(sizeof(double)*nelements);
#pragma omp parallel for
for(int i=0; i<nelements; ++i) {
dataset[i] = i/(double)nelements;
}
bp.timer_start(); // Start timer
double res = 0.0;
for(int i=0; i<trials; ++i) {
res += rosenbrock(nelements, dataset); // Run benchmark
}
res /= trials;
bp.timer_stop(); // Stop timer
bp.print("rosenbrock(cpp11_omp)");
if (bp.args.verbose) { // ..and value.
cout << fixed << setprecision(11)
<< "Result = " << res << endl;
}
return 0;
}
Cpp11 Seq¶#include <iostream>
#include <iomanip>
#include <cmath>
#include <bp_util.h>
using namespace std;
double rosenbrock(int nelements, double* x)
{
double sum = 0.0;
for(int i=0; i<nelements-2; ++i) {
sum += 100.0*pow((x[i+1] - pow(x[i], 2)), 2) + pow((1-x[i]), 2);
}
return sum;
}
int main(int argc, char* argv[])
{
bp_util_type bp = bp_util_create(argc, argv, 2);// Grab arguments
if (bp.args.has_error) {
return 1;
}
const int nelements = bp.args.sizes[0];
const int trials = bp.args.sizes[1];
// Create the pseudo-data
double* dataset = (double*)malloc(sizeof(double)*nelements);
for(int i=0; i<nelements; ++i) {
dataset[i] = i/(double)nelements;
}
bp.timer_start(); // Start timer
double res = 0.0;
for(int i=0; i<trials; ++i) {
res += rosenbrock(nelements, dataset); // Run benchmark
}
res /= trials;
bp.timer_stop(); // Stop timer
bp.print("rosenbrock(cpp11_seq)");
if (bp.args.verbose) { // ..and value.
cout << fixed << setprecision(11)
<< "Result = " << res << endl;
}
return 0;
}
Shallow Water¶Python Numpy¶from __future__ import print_function
"""
Shallow Water
-------------
So what does this code example illustrate?
Adapted from: http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
"""
import numpy as np
from benchpress.benchmarks import util
bench = util.Benchmark("Shallow Water", "height*width*iterations")
g = 9.80665 # gravitational acceleration
def droplet(height, width, data_type=np.float32):
"""Generate grid of droplets"""
x = np.array(np.linspace(-1, 1, num=width, endpoint=True), dtype=data_type)
y = np.array(np.linspace(-1, 1, num=width, endpoint=True), dtype=data_type)
(xx, yy) = np.meshgrid(x, y)
droplet = height * np.exp(-5 * (xx ** 2 + yy ** 2))
return droplet
def model(height, width, dtype=np.float32):
assert height >= 16
assert width >= 16
m = np.ones((height, width), dtype=dtype)
D = droplet(8, 8) # simulate a water drop
droploc = height // 4
(dropx, dropy) = D.shape
m[droploc:droploc + dropx, droploc:droploc + dropy] += D
droploc = height // 2
(dropx, dropy) = D.shape
m[droploc:droploc + dropx, droploc:droploc + dropy] += D
return {"H": m, "U": np.zeros_like(m), "V": np.zeros_like(m)}
def simulate(state, timesteps):
# FLOP count: i*(12*s + 4*s**2 + 14*s**2 + 9*s**2 + 4*s**2 + 9*s**2 + 14*s**2 + 6*s**2 + 19*s**2 + 19*s**2)
# where s is size and i is iterations
def loop_body(H, U, V, dt=0.02, dx=1.0, dy=1.0):
# Reflecting boundary conditions
H[:, 0] = H[:, 1];
U[:, 0] = U[:, 1];
V[:, 0] = -V[:, 1]
H[:, -1] = H[:, -2];
U[:, -1] = U[:, -2];
V[:, -1] = -V[:, -2]
H[0, :] = H[1, :];
U[0, :] = -U[1, :];
V[0, :] = V[1, :]
H[-1, :] = H[-2, :];
U[-1, :] = -U[-2, :];
V[-1, :] = V[-2, :]
# First half step
# height
Hx = (H[1:, 1:-1] + H[:-1, 1:-1]) / 2 - dt / (2 * dx) * (U[1:, 1:-1] - U[:-1, 1:-1])
# x momentum
Ux = (U[1:, 1:-1] + U[:-1, 1:-1]) / 2 - \
dt / (2 * dx) * ((U[1:, 1:-1] ** 2 / H[1:, 1:-1] + g / 2 * H[1:, 1:-1] ** 2) -
(U[:-1, 1:-1] ** 2 / H[:-1, 1:-1] + g / 2 * H[:-1, 1:-1] ** 2))
# y momentum
Vx = (V[1:, 1:-1] + V[:-1, 1:-1]) / 2 - \
dt / (2 * dx) * ((U[1:, 1:-1] * V[1:, 1:-1] / H[1:, 1:-1]) -
(U[:-1, 1:-1] * V[:-1, 1:-1] / H[:-1, 1:-1]))
# height
Hy = (H[1:-1, 1:] + H[1:-1, :-1]) / 2 - dt / (2 * dy) * (V[1:-1, 1:] - V[1:-1, :-1])
# x momentum
Uy = (U[1:-1, 1:] + U[1:-1, :-1]) / 2 - \
dt / (2 * dy) * ((V[1:-1, 1:] * U[1:-1, 1:] / H[1:-1, 1:]) -
(V[1:-1, :-1] * U[1:-1, :-1] / H[1:-1, :-1]))
# y momentum
Vy = (V[1:-1, 1:] + V[1:-1, :-1]) / 2 - \
dt / (2 * dy) * ((V[1:-1, 1:] ** 2 / H[1:-1, 1:] + g / 2 * H[1:-1, 1:] ** 2) -
(V[1:-1, :-1] ** 2 / H[1:-1, :-1] + g / 2 * H[1:-1, :-1] ** 2))
# Second half step
# height
H[1:-1, 1:-1] -= (dt / dx) * (Ux[1:, :] - Ux[:-1, :]) + (dt / dy) * (Vy[:, 1:] - Vy[:, :-1])
# x momentum
U[1:-1, 1:-1] -= (dt / dx) * ((Ux[1:, :] ** 2 / Hx[1:, :] + g / 2 * Hx[1:, :] ** 2) -
(Ux[:-1, :] ** 2 / Hx[:-1, :] + g / 2 * Hx[:-1, :] ** 2)) + \
(dt / dy) * ((Vy[:, 1:] * Uy[:, 1:] / Hy[:, 1:]) -
(Vy[:, :-1] * Uy[:, :-1] / Hy[:, :-1]))
# y momentum
V[1:-1, 1:-1] -= (dt / dx) * ((Ux[1:, :] * Vx[1:, :] / Hx[1:, :]) -
(Ux[:-1, :] * Vx[:-1, :] / Hx[:-1, :])) + \
(dt / dy) * ((Vy[:, 1:] ** 2 / Hy[:, 1:] + g / 2 * Hy[:, 1:] ** 2) -
(Vy[:, :-1] ** 2 / Hy[:, :-1] + g / 2 * Hy[:, :-1] ** 2))
bench.plot_surface(H, "3d", 0, 0, 5.5)
bench.do_while(loop_body, timesteps, state['H'], state['U'], state['V'])
def main():
H = bench.args.size[0]
W = bench.args.size[1]
I = bench.args.size[2]
state = bench.load_data()
if state is None:
state = model(H, W, dtype=bench.dtype)
bench.start()
simulate(state, I)
bench.stop()
bench.save_data(state)
bench.pprint()
if __name__ == "__main__":
main()
C99 Seq¶// Adapted from: http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
// Saved images may be converted into an animated gif with:
// convert -delay 20 -loop 0 swater*.png swater.gif
#include <math.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <bp_util.h>
#define iH(y,x) (H[(x)+(y)*(n+2)])
#define iU(y,x) (U[(x)+(y)*(n+2)])
#define iV(y,x) (V[(x)+(y)*(n+2)])
#define iHx(y,x) (Hx[(x)+(y)*(n+1)])
#define iUx(y,x) (Ux[(x)+(y)*(n+1)])
#define iVx(y,x) (Vx[(x)+(y)*(n+1)])
#define iHy(y,x) (Hy[(x)+(y)*(n+1)])
#define iUy(y,x) (Uy[(x)+(y)*(n+1)])
#define iVy(y,x) (Vy[(x)+(y)*(n+1)])
int main (int argc, char **argv)
{
bp_util_type bp = bp_util_create(argc, argv, 3);
if (bp.args.has_error) {
return 1;
}
const int n = bp.args.sizes[0];
const int other_n = bp.args.sizes[1];
const int T = bp.args.sizes[2];
if (n != other_n) {
fprintf(stderr, "Implementation only supports quadratic grid.\n");
exit(0);
}
const double g = 9.8; // gravitational constant
const double dt = 0.02; // hardwired timestep
const double dx = 1.0;
const double dy = 1.0;
const int droploc = n/4;
double *H = (double*) malloc((n+2)*(n+2)*sizeof(double));
double *U = (double*) malloc((n+2)*(n+2)*sizeof(double));
double *V = (double*) malloc((n+2)*(n+2)*sizeof(double));
double *Hx = (double*) malloc((n+1)*(n+1)*sizeof(double));
double *Ux = (double*) malloc((n+1)*(n+1)*sizeof(double));
double *Vx = (double*) malloc((n+1)*(n+1)*sizeof(double));
double *Hy = (double*) malloc((n+1)*(n+1)*sizeof(double));
double *Uy = (double*) malloc((n+1)*(n+1)*sizeof(double));
double *Vy = (double*) malloc((n+1)*(n+1)*sizeof(double));
for(int i=0; i<n+2; i++)
for(int j=0; j<n+2; j++)
{
iH(i,j)=0.1;
iU(i,j)=0.1;
iV(i,j)=0.1;
}
for(int i=0; i<n+1; i++)
for(int j=0; j<n+1; j++)
{
iHx(i,j)=0.1;
iUx(i,j)=0.1;
iVx(i,j)=0.1;
iHy(i,j)=0.1;
iUy(i,j)=0.1;
iVy(i,j)=0.1;
}
iH(droploc,droploc) += 5.0;
bp.timer_start();
for(int iter=0; iter < T; iter++)
{
// Reflecting boundary conditions
for(int i=0; i<n+2; i++)
{
iH(i,0) = iH(i,1) ; iU(i,0) = iU(i,1) ; iV(i,0) = -iV(i,1);
iH(i,n+1) = iH(i,n) ; iU(i,n+1) = iU(i,n) ; iV(i,n+1) = -iV(i,n);
iH(0,i) = iH(1,i) ; iU(0,i) = -iU(1,i) ; iV(0,i) = iV(1,i);
iH(n+1,i) = iH(n,i) ; iU(n+1,i) = -iU(n,i) ; iV(n+1,i) = iV(n,i);
}
//
// First half step
//
for(int i=0; i<n+1; i++)
{
for(int j=0; j<n; j++)
{
// height
iHx(i,j) = (iH(i+1,j+1)+iH(i,j+1))/2 - dt/(2*dx)*(iU(i+1,j+1)-iU(i,j+1));
// x momentum
iUx(i,j) = (iU(i+1,j+1)+iU(i,j+1))/2 - \
dt/(2*dx)*((pow(iU(i+1,j+1),2)/iH(i+1,j+1) + \
g/2*pow(iH(i+1,j+1),2)) - \
(pow(iU(i,j+1),2)/iH(i,j+1) + \
g/2*pow(iH(i,j+1),2)));
// y momentum
iVx(i,j) = (iV(i+1,j+1)+iV(i,j+1))/2 - \
dt/(2*dx)*((iU(i+1,j+1) * \
iV(i+1,j+1)/iH(i+1,j+1)) - \
(iU(i,j+1) * \
iV(i,j+1)/iH(i,j+1)));
}
}
for(int i=0; i<n; i++)
{
for(int j=0; j<n+1; j++)
{
//height
iHy(i,j) = (iH(i+1,j+1)+iH(i+1,j))/2 - dt/(2*dy)*(iV(i+1,j+1)-iV(i+1,j));
//x momentum
iUy(i,j) = (iU(i+1,j+1)+iU(i+1,j))/2 - \
dt/(2*dy)*((iV(i+1,j+1) * \
iU(i+1,j+1)/iH(i+1,j+1)) - \
(iV(i+1,j) * \
iU(i+1,j)/iH(i+1,j)));
//y momentum
iVy(i,j) = (iV(i+1,j+1)+iV(i+1,j))/2 - \
dt/(2*dy)*((pow(iV(i+1,j+1),2)/iH(i+1,j+1) + \
g/2*pow(iH(i+1,j+1),2)) - \
(pow(iV(i+1,j),2)/iH(i+1,j) + \
g/2*pow(iH(i+1,j),2)));
}
}
//
// Second half step
//
for(int i=1; i<n+1; i++)
{
for(int j=1; j<n+1; j++)
{
//height
iH(i,j) -= (dt/dx)*(iUx(i,j-1)-iUx(i-1,j-1)) - (dt/dy)*(iVy(i-1,j)-iVy(i-1,j-1));
// x momentum
iU(i,j) -= (dt/dx)*((pow(iUx(i,j-1),2)/iHx(i,j-1) + g/2*pow(iHx(i,j-1),2)) - \
(pow(iUx(i-1,j-1),2)/iHx(i-1,j-1) + g/2*pow(iHx(i-1,j-1),2))) - \
(dt/dy)*((iVy(i-1,j) * iUy(i-1,j)/iHy(i-1,j)) -
(iVy(i-1,j-1) * iUy(i-1,j-1)/iHy(i-1,j-1)));
// y momentum - score
iV(i,j) -= (dt/dx)*((iUx(i,j-1) * iVx(i,j-1)/iHx(i,j-1)) - \
(iUx(i-1,j-1)*iVx(i-1,j-1)/iHx(i-1,j-1))) - \
(dt/dy)*((pow(iVy(i-1,j),2)/iHy(i-1,j) + g/2*pow(iHy(i-1,j),2)) - \
(pow(iVy(i-1,j-1),2)/iHy(i-1,j-1) + g/2*pow(iHy(i-1,j-1),2)));
}
}
}
/* NumPy implementation does not do this!
// res = numpy.add.reduce(numpy.add.reduce(H / n))
double res = 0.0;
for(int i=0;i<n+2;i++)
for(int j=0;j<n+2;j++)
res+=iH(i,j)/n;
*/
bp.timer_stop();
bp.print("shallow_water(c99_seq)");
}
Cpp11 Boost¶// Adapted from: http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
// Saved images may be converted into an animated gif with:
// convert -delay 20 -loop 0 swater*.png swater.gif
#include <math.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <bp_util.h>
#include <boost/multi_array.hpp>
int main (int argc, char **argv)
{
bp_util_type bp = bp_util_create(argc, argv, 3);
if (bp.args.has_error) {
return 1;
}
const int n = bp.args.sizes[0];
const int other_n = bp.args.sizes[1];
const int T = bp.args.sizes[2];
if (n != other_n) {
fprintf(stderr, "Implementation only supports quadratic grid.\n");
exit(0);
}
const double g = 9.8; // gravitational constant
const double dt = 0.02; // hardwired timestep
const double dx = 1.0;
const double dy = 1.0;
const int droploc = n/4;
typedef boost::multi_array<double, 2> array_type;
array_type H(boost::extents[n+2][n+2]);
array_type U(boost::extents[n+2][n+2]);
array_type V(boost::extents[n+2][n+2]);
array_type Hx(boost::extents[n+1][n+1]);
array_type Ux(boost::extents[n+1][n+1]);
array_type Vx(boost::extents[n+1][n+1]);
array_type Hy(boost::extents[n+1][n+1]);
array_type Uy(boost::extents[n+1][n+1]);
array_type Vy(boost::extents[n+1][n+1]);
for(int i=0; i<n+2; i++)
for(int j=0; j<n+2; j++)
{
H[i][j]=0.1;
U[i][j]=0.1;
V[i][j]=0.1;
}
for(int i=0; i<n+1; i++)
for(int j=0; j<n+1; j++)
{
Hx[i][j]=0.1;
Ux[i][j]=0.1;
Vx[i][j]=0.1;
Hy[i][j]=0.1;
Uy[i][j]=0.1;
Vy[i][j]=0.1;
}
H[droploc][droploc] += 5.0;
bp.timer_start();
for(int iter=0; iter < T; iter++)
{
// Reflecting boundary conditions
for(int i=0; i<n+2; i++)
{
H[i][0] = H[i][1] ; U[i][0] = U[i][1] ; V[i][0] = -V[i][1];
H[i][n+1] = H[i][n] ; U[i][n+1] = U[i][n] ; V[i][n+1] = -V[i][n];
H[0][i] = H[1][i] ; U[0][i] = -U[1][i] ; V[0][i] = V[1][i];
H[n+1][i] = H[n][i] ; U[n+1][i] = -U[n][i] ; V[n+1][i] = V[n][i];
}
//
// First half step
//
for(int i=0; i<n+1; i++)
{
for(int j=0; j<n; j++)
{
// height
Hx[i][j] = (H[i+1][j+1]+H[i][j+1])/2 - dt/(2*dx)*(U[i+1][j+1]-U[i][j+1]);
// x momentum
Ux[i][j] = (U[i+1][j+1]+U[i][j+1])/2 - \
dt/(2*dx)*((pow(U[i+1][j+1],2)/H[i+1][j+1] + \
g/2*pow(H[i+1][j+1],2)) - \
(pow(U[i][j+1],2)/H[i][j+1] + \
g/2*pow(H[i][j+1],2)));
// y momentum
Vx[i][j] = (V[i+1][j+1]+V[i][j+1])/2 - \
dt/(2*dx)*((U[i+1][j+1] * \
V[i+1][j+1]/H[i+1][j+1]) - \
(U[i][j+1] * \
V[i][j+1]/H[i][j+1]));
}
}
for(int i=0; i<n; i++)
{
for(int j=0; j<n+1; j++)
{
//height
Hy[i][j] = (H[i+1][j+1]+H[i+1][j])/2 - dt/(2*dy)*(V[i+1][j+1]-V[i+1][j]);
//x momentum
Uy[i][j] = (U[i+1][j+1]+U[i+1][j])/2 - \
dt/(2*dy)*((V[i+1][j+1] * \
U[i+1][j+1]/H[i+1][j+1]) - \
(V[i+1][j] * \
U[i+1][j]/H[i+1][j]));
//y momentum
Vy[i][j] = (V[i+1][j+1]+V[i+1][j])/2 - \
dt/(2*dy)*((pow(V[i+1][j+1],2)/H[i+1][j+1] + \
g/2*pow(H[i+1][j+1],2)) - \
(pow(V[i+1][j],2)/H[i+1][j] + \
g/2*pow(H[i+1][j],2)));
}
}
//
// Second half step
//
for(int i=1; i<n+1; i++)
{
for(int j=1; j<n+1; j++)
{
//height
H[i][j] -= (dt/dx)*(Ux[i][j-1]-Ux[i-1][j-1]) - (dt/dy)*(Vy[i-1][j]-Vy[i-1][j-1]);
// x momentum
U[i][j] -= (dt/dx)*((pow(Ux[i][j-1],2)/Hx[i][j-1] + g/2*pow(Hx[i][j-1],2)) - \
(pow(Ux[i-1][j-1],2)/Hx[i-1][j-1] + g/2*pow(Hx[i-1][j-1],2))) - \
(dt/dy)*((Vy[i-1][j] * Uy[i-1][j]/Hy[i-1][j]) -
(Vy[i-1][j-1] * Uy[i-1][j-1]/Hy[i-1][j-1]));
// y momentum - score
V[i][j] -= (dt/dx)*((Ux[i][j-1] * Vx[i][j-1]/Hx[i][j-1]) - \
(Ux[i-1][j-1]*Vx[i-1][j-1]/Hx[i-1][j-1])) - \
(dt/dy)*((pow(Vy[i-1][j],2)/Hy[i-1][j] + g/2*pow(Hy[i-1][j],2)) - \
(pow(Vy[i-1][j-1],2)/Hy[i-1][j-1] + g/2*pow(Hy[i-1][j-1],2)));
}
}
}
// res = numpy.add.reduce(numpy.add.reduce(H / n))
double res = 0.0;
for(int i=0;i<n+2;i++)
for(int j=0;j<n+2;j++)
res+=H[i][j]/n;
bp.timer_stop();
bp.print("shallow_water(cpp11_boost)");
}
Cpp11 Omp¶// Adapted from: http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
// Saved images may be converted into an animated gif with:
// convert -delay 20 -loop 0 swater*.png swater.gif
#include <math.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <bp_util.h>
#define WIDTH 5000
#define HEIGHT 5000
int main (int argc, char **argv)
{
bp_util_type bp = bp_util_create(argc, argv, 3);
if (bp.args.has_error) {
return 1;
}
const int height = bp.args.sizes[0];
const int width = bp.args.sizes[1];
const int iterations = bp.args.sizes[2];
if (width != WIDTH) {
fprintf(stderr, "Unsupported size, size must be %dx%d\n", WIDTH, HEIGHT);
exit(0);
}
if (height != WIDTH) {
fprintf(stderr, "Unsupported size, size must be %dx%d\n", WIDTH, HEIGHT);
exit(0);
}
if (width != height) {
fprintf(stderr, "Implementation only supports quadratic grid.\n");
exit(0);
}
const double g = 9.8; // gravitational constant
const double dt = 0.02; // hardwired timestep
const double dx = 1.0;
const double dy = 1.0;
const int droploc = WIDTH/4;
auto H = new double[WIDTH+2][WIDTH+2];
auto U = new double[WIDTH+2][WIDTH+2];
auto V = new double[WIDTH+2][WIDTH+2];
auto Hx = new double[WIDTH+1][WIDTH+1];
auto Ux = new double[WIDTH+1][WIDTH+1];
auto Vx = new double[WIDTH+1][WIDTH+1];
auto Hy = new double[WIDTH+1][WIDTH+1];
auto Uy = new double[WIDTH+1][WIDTH+1];
auto Vy = new double[WIDTH+1][WIDTH+1];
#pragma omp parallel for collapse(2)
for(int i=0; i<HEIGHT+2; i++) {
for(int j=0; j<WIDTH+2; j++) {
H[i][j]=0.1;
U[i][j]=0.1;
V[i][j]=0.1;
}
}
#pragma omp parallel for collapse(2)
for(int i=0; i<HEIGHT+1; i++) {
for(int j=0; j<WIDTH+1; j++) {
Hx[i][j]=0.1;
Ux[i][j]=0.1;
Vx[i][j]=0.1;
Hy[i][j]=0.1;
Uy[i][j]=0.1;
Vy[i][j]=0.1;
}
}
H[droploc][droploc] += 5.0;
bp.timer_start();
for(int iter=0; iter < iterations; iter++) {
#pragma omp parallel for
for(int i=0; i<WIDTH+2; i++) { // Reflecting boundary conditions
H[i][0] = H[i][1];
U[i][0] = U[i][1];
V[i][0] = -V[i][1];
H[i][WIDTH+1] = H[i][WIDTH];
U[i][WIDTH+1] = U[i][WIDTH];
V[i][WIDTH+1] = -V[i][WIDTH];
H[0][i] = H[1][i];
U[0][i] = -U[1][i];
V[0][i] = V[1][i];
H[HEIGHT+1][i] = H[HEIGHT][i];
U[HEIGHT+1][i] = -U[HEIGHT][i];
V[HEIGHT+1][i] = V[HEIGHT][i];
}
//
// First half step
//
#pragma omp parallel for collapse(2)
for(int i=0; i<HEIGHT+1; i++) {
for(int j=0; j<WIDTH; j++) {
// height
Hx[i][j] = (H[i+1][j+1]+H[i][j+1])/2 - dt/(2*dx)*(U[i+1][j+1]-U[i][j+1]);
// x momentum
Ux[i][j] = (U[i+1][j+1]+U[i][j+1])/2 - \
dt/(2*dx)*((pow(U[i+1][j+1],2)/H[i+1][j+1] + \
g/2*pow(H[i+1][j+1],2)) - \
(pow(U[i][j+1],2)/H[i][j+1] + \
g/2*pow(H[i][j+1],2)));
// y momentum
Vx[i][j] = (V[i+1][j+1]+V[i][j+1])/2 - \
dt/(2*dx)*((U[i+1][j+1] * \
V[i+1][j+1]/H[i+1][j+1]) - \
(U[i][j+1] * \
V[i][j+1]/H[i][j+1]));
}
}
#pragma omp parallel for collapse(2)
for(int i=0; i<HEIGHT; i++) {
for(int j=0; j<WIDTH+1; j++) {
//height
Hy[i][j] = (H[i+1][j+1]+H[i+1][j])/2 - dt/(2*dy)*(V[i+1][j+1]-V[i+1][j]);
//x momentum
Uy[i][j] = (U[i+1][j+1]+U[i+1][j])/2 - \
dt/(2*dy)*((V[i+1][j+1] * \
U[i+1][j+1]/H[i+1][j+1]) - \
(V[i+1][j] * \
U[i+1][j]/H[i+1][j]));
//y momentum
Vy[i][j] = (V[i+1][j+1]+V[i+1][j])/2 - \
dt/(2*dy)*((pow(V[i+1][j+1],2)/H[i+1][j+1] + \
g/2*pow(H[i+1][j+1],2)) - \
(pow(V[i+1][j],2)/H[i+1][j] + \
g/2*pow(H[i+1][j],2)));
}
}
//
// Second half step
//
#pragma omp parallel for collapse(2)
for(int i=1; i<HEIGHT+1; i++) {
for(int j=1; j<WIDTH+1; j++) {
//height
H[i][j] -= (dt/dx)*(Ux[i][j-1]-Ux[i-1][j-1]) - (dt/dy)*(Vy[i-1][j]-Vy[i-1][j-1]);
// x momentum
U[i][j] -= (dt/dx)*((pow(Ux[i][j-1],2)/Hx[i][j-1] + g/2*pow(Hx[i][j-1],2)) - \
(pow(Ux[i-1][j-1],2)/Hx[i-1][j-1] + g/2*pow(Hx[i-1][j-1],2))) - \
(dt/dy)*((Vy[i-1][j] * Uy[i-1][j]/Hy[i-1][j]) -
(Vy[i-1][j-1] * Uy[i-1][j-1]/Hy[i-1][j-1]));
// y momentum - score
V[i][j] -= (dt/dx)*((Ux[i][j-1] * Vx[i][j-1]/Hx[i][j-1]) - \
(Ux[i-1][j-1]*Vx[i-1][j-1]/Hx[i-1][j-1])) - \
(dt/dy)*((pow(Vy[i-1][j],2)/Hy[i-1][j] + g/2*pow(Hy[i-1][j],2)) - \
(pow(Vy[i-1][j-1],2)/Hy[i-1][j-1] + g/2*pow(Hy[i-1][j-1],2)));
}
}
}
/* NumPy implementation does not do this!?
// res = numpy.add.reduce(numpy.add.reduce(H / n))
double res = 0.0;
#pragma omp parallel for collapse(2) reduction(+:res)
for(int i=0;i<HEIGHT+2;i++) {
for(int j=0;j<WIDTH+2;j++) {
res+=H[i][j]/WIDTH;
}
}
*/
bp.timer_stop();
bp.print("shallow_water(cpp11_omp)");
}
Cpp11 Seq¶// Adapted from: http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
// Saved images may be converted into an animated gif with:
// convert -delay 20 -loop 0 swater*.png swater.gif
#include <math.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <bp_util.h>
#define WIDTH 5000
#define HEIGHT 5000
int main (int argc, char **argv)
{
bp_util_type bp = bp_util_create(argc, argv, 3);
if (bp.args.has_error) {
return 1;
}
const int height = bp.args.sizes[0];
const int width = bp.args.sizes[1];
const int T = bp.args.sizes[2];
if (width != WIDTH) {
fprintf(stderr, "Unsupported size, size must be %dx%d\n", WIDTH, HEIGHT);
exit(0);
}
if (height != WIDTH) {
fprintf(stderr, "Unsupported size, size must be %dx%d\n", WIDTH, HEIGHT);
exit(0);
}
if (width != height) {
fprintf(stderr, "Implementation only supports quadratic grid.\n");
exit(0);
}
const double g = 9.8; // gravitational constant
const double dt = 0.02; // hardwired timestep
const double dx = 1.0;
const double dy = 1.0;
const int droploc = WIDTH/4;
auto H = new double[WIDTH+2][WIDTH+2];
auto U = new double[WIDTH+2][WIDTH+2];
auto V = new double[WIDTH+2][WIDTH+2];
auto Hx = new double[WIDTH+1][WIDTH+1];
auto Ux = new double[WIDTH+1][WIDTH+1];
auto Vx = new double[WIDTH+1][WIDTH+1];
auto Hy = new double[WIDTH+1][WIDTH+1];
auto Uy = new double[WIDTH+1][WIDTH+1];
auto Vy = new double[WIDTH+1][WIDTH+1];
for(int i=0; i<HEIGHT+2; i++) {
for(int j=0; j<WIDTH+2; j++) {
H[i][j]=0.1;
U[i][j]=0.1;
V[i][j]=0.1;
}
}
for(int i=0; i<HEIGHT+1; i++) {
for(int j=0; j<WIDTH+1; j++) {
Hx[i][j]=0.1;
Ux[i][j]=0.1;
Vx[i][j]=0.1;
Hy[i][j]=0.1;
Uy[i][j]=0.1;
Vy[i][j]=0.1;
}
}
H[droploc][droploc] += 5.0;
bp.timer_start();
for(int iter=0; iter < T; iter++) {
for(int i=0; i<WIDTH+2; i++) { // Reflecting boundary conditions
H[i][0] = H[i][1];
U[i][0] = U[i][1];
V[i][0] = -V[i][1];
H[i][WIDTH+1] = H[i][WIDTH];
U[i][WIDTH+1] = U[i][WIDTH];
V[i][WIDTH+1] = -V[i][WIDTH];
H[0][i] = H[1][i];
U[0][i] = -U[1][i];
V[0][i] = V[1][i];
H[HEIGHT+1][i] = H[HEIGHT][i];
U[HEIGHT+1][i] = -U[HEIGHT][i];
V[HEIGHT+1][i] = V[HEIGHT][i];
}
//
// First half step
//
for(int i=0; i<HEIGHT+1; i++) {
for(int j=0; j<WIDTH; j++) {
// height
Hx[i][j] = (H[i+1][j+1]+H[i][j+1])/2 - dt/(2*dx)*(U[i+1][j+1]-U[i][j+1]);
// x momentum
Ux[i][j] = (U[i+1][j+1]+U[i][j+1])/2 - \
dt/(2*dx)*((pow(U[i+1][j+1],2)/H[i+1][j+1] + \
g/2*pow(H[i+1][j+1],2)) - \
(pow(U[i][j+1],2)/H[i][j+1] + \
g/2*pow(H[i][j+1],2)));
// y momentum
Vx[i][j] = (V[i+1][j+1]+V[i][j+1])/2 - \
dt/(2*dx)*((U[i+1][j+1] * \
V[i+1][j+1]/H[i+1][j+1]) - \
(U[i][j+1] * \
V[i][j+1]/H[i][j+1]));
}
}
for(int i=0; i<HEIGHT; i++) {
for(int j=0; j<WIDTH+1; j++) {
//height
Hy[i][j] = (H[i+1][j+1]+H[i+1][j])/2 - dt/(2*dy)*(V[i+1][j+1]-V[i+1][j]);
//x momentum
Uy[i][j] = (U[i+1][j+1]+U[i+1][j])/2 - \
dt/(2*dy)*((V[i+1][j+1] * \
U[i+1][j+1]/H[i+1][j+1]) - \
(V[i+1][j] * \
U[i+1][j]/H[i+1][j]));
//y momentum
Vy[i][j] = (V[i+1][j+1]+V[i+1][j])/2 - \
dt/(2*dy)*((pow(V[i+1][j+1],2)/H[i+1][j+1] + \
g/2*pow(H[i+1][j+1],2)) - \
(pow(V[i+1][j],2)/H[i+1][j] + \
g/2*pow(H[i+1][j],2)));
}
}
//
// Second half step
//
for(int i=1; i<HEIGHT+1; i++) {
for(int j=1; j<WIDTH+1; j++) {
//height
H[i][j] -= (dt/dx)*(Ux[i][j-1]-Ux[i-1][j-1]) - (dt/dy)*(Vy[i-1][j]-Vy[i-1][j-1]);
// x momentum
U[i][j] -= (dt/dx)*((pow(Ux[i][j-1],2)/Hx[i][j-1] + g/2*pow(Hx[i][j-1],2)) - \
(pow(Ux[i-1][j-1],2)/Hx[i-1][j-1] + g/2*pow(Hx[i-1][j-1],2))) - \
(dt/dy)*((Vy[i-1][j] * Uy[i-1][j]/Hy[i-1][j]) -
(Vy[i-1][j-1] * Uy[i-1][j-1]/Hy[i-1][j-1]));
// y momentum - score
V[i][j] -= (dt/dx)*((Ux[i][j-1] * Vx[i][j-1]/Hx[i][j-1]) - \
(Ux[i-1][j-1]*Vx[i-1][j-1]/Hx[i-1][j-1])) - \
(dt/dy)*((pow(Vy[i-1][j],2)/Hy[i-1][j] + g/2*pow(Hy[i-1][j],2)) - \
(pow(Vy[i-1][j-1],2)/Hy[i-1][j-1] + g/2*pow(Hy[i-1][j-1],2)));
}
}
}
/* NumPy implementation does not do this!?
// res = numpy.add.reduce(numpy.add.reduce(H / n))
double res = 0.0;
for(int i=0;i<HEIGHT+2;i++) {
for(int j=0;j<WIDTH+2;j++) {
res+=H[i][j]/WIDTH;
}
}
*/
bp.timer_stop();
bp.print("shallow_water(cpp11_seq)");
}
Csharp Numcil¶#region Copyright
/*
This file is part of Bohrium and copyright (c) 2012 the Bohrium:
team <http://www.bh107.org>.
Bohrium is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.
Bohrium is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the
GNU Lesser General Public License along with Bohrium.
If not, see <http://www.gnu.org/licenses/>.
*/
#endregion
//Adapted from: http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using R = NumCIL.Range;
using NumCIL;
namespace shallow_water
{
public static class ShallowWaterSolver
{
/// <summary>
/// NumCIL equivalent to [:]
/// </summary>
public static readonly R ALL = R.All;
/// <summary>
/// NumCIL equivalent to [1:-1]
/// </summary>
public static readonly R INNER = R.Slice(1, -1);
/// <summary>
/// NumCIL equivalent to [0]
/// </summary>
public static readonly R FIRST = R.El(0);
/// <summary>
/// NumCIL equivalent to [1]
/// </summary>
public static readonly R SECOND = R.El(1);
/// <summary>
/// NumCIL equivalent to [-1]
/// </summary>
public static readonly R LAST = R.El(-1);
/// <summary>
/// NumCIL equivalent to [:-1]
/// </summary>
public static readonly R ZM1 = R.Slice(0, -1);
/// <summary>
/// NumCIL equivalent to [1:]
/// </summary>
public static readonly R SKIP1 = R.Slice(1, 0);
public class DataDouble
{
public readonly long N;
public readonly NumCIL.Double.NdArray H;
public readonly NumCIL.Double.NdArray U;
public readonly NumCIL.Double.NdArray V;
public readonly NumCIL.Double.NdArray Hx;
public readonly NumCIL.Double.NdArray Ux;
public readonly NumCIL.Double.NdArray Vx;
public readonly NumCIL.Double.NdArray Hy;
public readonly NumCIL.Double.NdArray Uy;
public readonly NumCIL.Double.NdArray Vy;
public DataDouble(long n)
{
N = n;
H = NumCIL.Double.Generate.Ones(n + 2, n + 2);
U = NumCIL.Double.Generate.Ones(n + 2, n + 2);
V = NumCIL.Double.Generate.Ones(n + 2, n + 2);
Hx = NumCIL.Double.Generate.Ones(n + 1, n + 1);
Ux = NumCIL.Double.Generate.Ones(n + 1, n + 1);
Vx = NumCIL.Double.Generate.Ones(n + 1, n + 1);
Hy = NumCIL.Double.Generate.Ones(n + 1, n + 1);
Uy = NumCIL.Double.Generate.Ones(n + 1, n + 1);
Vy = NumCIL.Double.Generate.Ones(n + 1, n + 1);
}
}
public static System.Double SolveDouble(long timesteps, DataDouble data)
{
//Convenience indices
R RN = R.El(data.N); //Same as [n]
R RN1 = R.El(data.N + 1); //Same as [n+1]
System.Double g = 9.8f;// gravitational constant
System.Double dt = 0.02f; // hardwired timestep
System.Double dx = 1.0f;
System.Double dy = 1.0f;
long droploc = data.N / 4;
var H = data.H;
var U = data.U;
var V = data.V;
var Hx = data.Hx;
var Ux = data.Ux;
var Vx = data.Vx;
var Hy = data.Hy;
var Uy = data.Uy;
var Vy = data.Vy;
//Splash!!!
H[droploc, droploc] += 5.0f;
for (int i = 0; i < timesteps; i++)
{
H.Flush();
// Reflecting boundary conditions
H[ALL, FIRST] = H[ALL, SECOND];
U[ALL, FIRST] = U[ALL, SECOND];
V[ALL, FIRST] = -V[ALL, SECOND];
H[ALL, RN1] = H[ALL, RN];
U[ALL, RN1] = U[ALL, RN];
V[ALL, RN1] = -V[ALL, RN];
H[FIRST, ALL] = H[SECOND, ALL];
U[FIRST, ALL] = -U[SECOND, ALL];
V[FIRST, ALL] = V[SECOND, ALL];
H[RN1, ALL] = H[RN, ALL];
U[RN1, ALL] = -U[RN, ALL];
V[RN1, ALL] = V[RN, ALL];
//First half-step
//Height
Hx[ALL, R.Slice(0, -1)] = (H[SKIP1,INNER]+H[ZM1,INNER])/2 -
dt/(2*dx)*(U[SKIP1,INNER]-U[ZM1,INNER]);
//x momentum
Ux[ALL, R.Slice(0, -1)] = (U[SKIP1,INNER]+U[ZM1,INNER])/2 -
dt/(2*dx)*((U[SKIP1,INNER].Pow(2)/H[SKIP1,INNER] +
g/2*H[SKIP1,INNER].Pow(2)) -
(U[ZM1,INNER].Pow(2)/H[ZM1,INNER] +
g/2*H[ZM1,INNER].Pow(2)));
// y momentum
Vx[ALL, ZM1] = (V[SKIP1,INNER]+V[ZM1,INNER])/2 -
dt/(2*dx)*((U[SKIP1,INNER] *
V[SKIP1,INNER]/H[SKIP1, INNER]) -
(U[ZM1,INNER] *
V[ZM1,INNER]/H[ZM1,INNER]));
// height
Hy[ZM1,ALL] = (H[INNER,SKIP1]+H[INNER,ZM1])/2 -
dt/(2*dy)*(V[INNER,SKIP1]-V[INNER,ZM1]);
// x momentum
Uy[ZM1,ALL] = (U[INNER,SKIP1]+U[INNER,ZM1])/2 -
dt/(2*dy)*((V[INNER,SKIP1] *
U[INNER,SKIP1]/H[INNER,SKIP1]) -
(V[INNER,ZM1] *
U[INNER,ZM1]/H[INNER,ZM1]));
// y momentum
Vy[ZM1,ALL] = (V[INNER,SKIP1]+V[INNER,ZM1])/2 -
dt/(2*dy)*((V[INNER,SKIP1].Pow(2)/H[INNER,SKIP1] +
g/2*H[INNER,SKIP1].Pow(2)) -
(V[INNER,ZM1].Pow(2)/H[INNER,ZM1] +
g/2*H[INNER,ZM1].Pow(2)));
// Second half step
// height
H[INNER, INNER] = H[INNER, INNER] -
(dt/dx)*(Ux[SKIP1,ZM1]-Ux[ZM1,ZM1]) -
(dt/dy)*(Vy[ZM1,SKIP1]-Vy[ZM1, ZM1]);
// x momentum
U[INNER, INNER] = U[INNER, INNER] -
(dt/dx)*((Ux[SKIP1,ZM1].Pow(2)/Hx[SKIP1,ZM1] +
g/2*Hx[SKIP1,ZM1].Pow(2)) -
(Ux[ZM1,ZM1].Pow(2)/Hx[ZM1,ZM1] +
g / 2 * Hx[ZM1, ZM1].Pow(2))) -
(dt/dy)*((Vy[ZM1,SKIP1] *
Uy[ZM1,SKIP1]/Hy[ZM1,SKIP1]) -
(Vy[ZM1,ZM1] *
Uy[ZM1,ZM1]/Hy[ZM1,ZM1]));
// y momentum
V[R.Slice(1, -1),R.Slice(1, -1)] = V[R.Slice(1, -1),R.Slice(1, -1)] -
(dt/dx)*((Ux[SKIP1,ZM1] *
Vx[SKIP1,ZM1]/Hx[SKIP1,ZM1]) -
(Ux[ZM1,ZM1]*Vx[ZM1,ZM1]/Hx[ZM1,ZM1])) -
(dt / dy) * ((Vy[ZM1, SKIP1].Pow(2) / Hy[ZM1, SKIP1] +
g / 2 * Hy[ZM1, SKIP1].Pow(2)) -
(Vy[ZM1, ZM1].Pow(2) / Hy[ZM1, ZM1] +
g / 2 * Hy[ZM1, ZM1].Pow(2)));
}
//Make sure we have the actual data and use it as a checksum
return NumCIL.Double.Add.Reduce(NumCIL.Double.Add.Reduce(H / data.N)).Value[0];
}
public class DataFloat
{
public readonly long N;
public readonly NumCIL.Float.NdArray H;
public readonly NumCIL.Float.NdArray U;
public readonly NumCIL.Float.NdArray V;
public readonly NumCIL.Float.NdArray Hx;
public readonly NumCIL.Float.NdArray Ux;
public readonly NumCIL.Float.NdArray Vx;
public readonly NumCIL.Float.NdArray Hy;
public readonly NumCIL.Float.NdArray Uy;
public readonly NumCIL.Float.NdArray Vy;
public DataFloat(long n)
{
N = n;
H = NumCIL.Float.Generate.Ones(n + 2, n + 2);
U = NumCIL.Float.Generate.Ones(n + 2, n + 2);
V = NumCIL.Float.Generate.Ones(n + 2, n + 2);
Hx = NumCIL.Float.Generate.Ones(n + 1, n + 1);
Ux = NumCIL.Float.Generate.Ones(n + 1, n + 1);
Vx = NumCIL.Float.Generate.Ones(n + 1, n + 1);
Hy = NumCIL.Float.Generate.Ones(n + 1, n + 1);
Uy = NumCIL.Float.Generate.Ones(n + 1, n + 1);
Vy = NumCIL.Float.Generate.Ones(n + 1, n + 1);
}
}
public static System.Single SolveFloat(long timesteps, DataFloat data)
{
//Convenience indices
R RN = R.El(data.N); //Same as [n]
R RN1 = R.El(data.N + 1); //Same as [n+1]
System.Single g = 9.8f;// gravitational constant
System.Single dt = 0.02f; // hardwired timestep
System.Single dx = 1.0f;
System.Single dy = 1.0f;
long droploc = data.N / 4;
var H = data.H;
var U = data.U;
var V = data.V;
var Hx = data.Hx;
var Ux = data.Ux;
var Vx = data.Vx;
var Hy = data.Hy;
var Uy = data.Uy;
var Vy = data.Vy;
//Splash!!!
H[droploc, droploc] += 5.0f;
for (int i = 0; i < timesteps; i++)
{
H.Flush();
// Reflecting boundary conditions
H[ALL, FIRST] = H[ALL, SECOND];
U[ALL, FIRST] = U[ALL, SECOND];
V[ALL, FIRST] = -V[ALL, SECOND];
H[ALL, RN1] = H[ALL, RN];
U[ALL, RN1] = U[ALL, RN];
V[ALL, RN1] = -V[ALL, RN];
H[FIRST, ALL] = H[SECOND, ALL];
U[FIRST, ALL] = -U[SECOND, ALL];
V[FIRST, ALL] = V[SECOND, ALL];
H[RN1, ALL] = H[RN, ALL];
U[RN1, ALL] = -U[RN, ALL];
V[RN1, ALL] = V[RN, ALL];
//First half-step
//Height
Hx[ALL, R.Slice(0, -1)] = (H[SKIP1,INNER]+H[ZM1,INNER])/2 -
dt/(2*dx)*(U[SKIP1,INNER]-U[ZM1,INNER]);
//x momentum
Ux[ALL, R.Slice(0, -1)] = (U[SKIP1,INNER]+U[ZM1,INNER])/2 -
dt/(2*dx)*((U[SKIP1,INNER].Pow(2)/H[SKIP1,INNER] +
g/2*H[SKIP1,INNER].Pow(2)) -
(U[ZM1,INNER].Pow(2)/H[ZM1,INNER] +
g/2*H[ZM1,INNER].Pow(2)));
// y momentum
Vx[ALL, ZM1] = (V[SKIP1,INNER]+V[ZM1,INNER])/2 -
dt/(2*dx)*((U[SKIP1,INNER] *
V[SKIP1,INNER]/H[SKIP1, INNER]) -
(U[ZM1,INNER] *
V[ZM1,INNER]/H[ZM1,INNER]));
// height
Hy[ZM1,ALL] = (H[INNER,SKIP1]+H[INNER,ZM1])/2 -
dt/(2*dy)*(V[INNER,SKIP1]-V[INNER,ZM1]);
// x momentum
Uy[ZM1,ALL] = (U[INNER,SKIP1]+U[INNER,ZM1])/2 -
dt/(2*dy)*((V[INNER,SKIP1] *
U[INNER,SKIP1]/H[INNER,SKIP1]) -
(V[INNER,ZM1] *
U[INNER,ZM1]/H[INNER,ZM1]));
// y momentum
Vy[ZM1,ALL] = (V[INNER,SKIP1]+V[INNER,ZM1])/2 -
dt/(2*dy)*((V[INNER,SKIP1].Pow(2)/H[INNER,SKIP1] +
g/2*H[INNER,SKIP1].Pow(2)) -
(V[INNER,ZM1].Pow(2)/H[INNER,ZM1] +
g/2*H[INNER,ZM1].Pow(2)));
// Second half step
// height
H[INNER, INNER] = H[INNER, INNER] -
(dt/dx)*(Ux[SKIP1,ZM1]-Ux[ZM1,ZM1]) -
(dt/dy)*(Vy[ZM1,SKIP1]-Vy[ZM1, ZM1]);
// x momentum
U[INNER, INNER] = U[INNER, INNER] -
(dt/dx)*((Ux[SKIP1,ZM1].Pow(2)/Hx[SKIP1,ZM1] +
g/2*Hx[SKIP1,ZM1].Pow(2)) -
(Ux[ZM1,ZM1].Pow(2)/Hx[ZM1,ZM1] +
g / 2 * Hx[ZM1, ZM1].Pow(2))) -
(dt/dy)*((Vy[ZM1,SKIP1] *
Uy[ZM1,SKIP1]/Hy[ZM1,SKIP1]) -
(Vy[ZM1,ZM1] *
Uy[ZM1,ZM1]/Hy[ZM1,ZM1]));
// y momentum
V[R.Slice(1, -1),R.Slice(1, -1)] = V[R.Slice(1, -1),R.Slice(1, -1)] -
(dt/dx)*((Ux[SKIP1,ZM1] *
Vx[SKIP1,ZM1]/Hx[SKIP1,ZM1]) -
(Ux[ZM1,ZM1]*Vx[ZM1,ZM1]/Hx[ZM1,ZM1])) -
(dt / dy) * ((Vy[ZM1, SKIP1].Pow(2) / Hy[ZM1, SKIP1] +
g / 2 * Hy[ZM1, SKIP1].Pow(2)) -
(Vy[ZM1, ZM1].Pow(2) / Hy[ZM1, ZM1] +
g / 2 * Hy[ZM1, ZM1].Pow(2)));
}
//Make sure we have the actual data and use it as a checksum
return NumCIL.Float.Add.Reduce(NumCIL.Float.Add.Reduce(H / data.N)).Value[0];
}
}
}
Snakes And Ladders¶Python Numpy¶Error There are issues with the implementation. Implementation seems broken when running with Bohrium, it looks like it does nothing but copy data back and forth between NumPy and Bohrium. # By Natalino Busa <https://gist.github.com/natalinobusa/4633275>
from __future__ import print_function
from benchpress.benchmarks import util
import numpy as np
bench = util.Benchmark("Snakes and Ladders", "size*iterations")
def nullgame(size, dtype):
p = np.zeros((size + 1, size + 1), dtype=dtype)
for i in range(size + 1):
for j in range(6):
if i + j < size:
p[i][i + j + 1] = 1.0 / 6.0
p[size][size] = 1
p[size - 1][size] = 6.0 / 6.0
p[size - 2][size] = 5.0 / 6.0
p[size - 3][size] = 4.0 / 6.0
p[size - 4][size] = 3.0 / 6.0
p[size - 5][size] = 2.0 / 6.0
p[size - 6][size] = 1.0 / 6.0
return p
def main():
size, iterations = bench.args.size
if bench.args.visualize:
from matplotlib import pyplot
a = np.array(np.zeros(size + 1, dtype=bench.dtype))
p = np.array(nullgame(size, dtype=bench.dtype))
m = p # Initial matrix is p
pr_end = np.array(np.zeros(iterations, dtype=bench.dtype))
bench.start()
for k in range(iterations):
if bench.args.visualize:
# Plot the probability distribution at the k-th iteration
pyplot.figure(1)
pyplot.plot(m[0][0:size])
# Store the probability of ending after the k-th iteration
pr_end[k] = m[0][size]
# Store/plot the accumulated marginal probability at the k-th iteration
a = a + m[0]
bench.flush()
if bench.args.visualize:
pyplot.figure(2)
pyplot.plot(a[0:size])
# calculate the stocastic matrix for iteration k+1
if util.bh_is_loaded_as_np and bench.args.no_extmethods:
m = np.array(np.dot(m.copy2numpy(), p.copy2numpy()))
else:
m = np.dot(m, p)
bench.stop()
bench.pprint()
# plot the probability of ending the game
# after k iterations
if bench.args.visualize:
pyplot.figure(3)
pyplot.plot(pr_end[0:iterations - 1])
# show the three graphs
pyplot.show()
if __name__ == "__main__":
main()
Wisp¶Python Numpy¶# -*- coding: utf-8 -*-
"""
Created on Thu Nov 20 10:47:25 2014
GPL
@author: Rasmus Nordfang
"""
from benchpress.benchmarks import util
import numpy as np
from salt import salt
from temp import temp
from density_imr import dens
from heat_imr import heatcap
bench = util.Benchmark("Wisp", "length*width*steps")
length = 1 # size of the grid unit m
width = 1 # size of the grid unit m
N_L = bench.args.size[0] # number of grids in Length direction
N_W = bench.args.size[1] # number of grids in width direction
n_years = 10 # number of years
# parameters
# Generel Start Values
# temp,[C] density[kg/m^3], depth,[m] mass[kg] Salinity[g/kg]
T_ML = -4;
p_ML = 10;
h_ML = 50;
m_ML = 3000;
S_ML = 33.0 # Mixed Layer
T_PC = 0.1;
p_PC = 20;
h_PC = 350;
m_PC = 2000;
S_PC = 34.5 # PycoCline
T_DP = 5.0;
p_DP = 30;
h_DP = 800;
m_DP = 1000;
S_DP = 32.0 # DeeP atlantic layer
T_AB = 2.0;
p_AB = 40;
h_AB = 3000;
m_AB = 8000;
S_AB = 35.0 # AByssal Layer
T_start = np.array([T_ML, T_PC, T_DP, T_AB]) # temperature
p_start = np.array([p_ML, p_PC, p_DP, p_AB]) # density
m_start = np.array([m_ML, m_PC, m_DP, m_AB]) # mass of layer
S_start = np.array([S_ML, S_PC, S_DP, S_AB]) # Salinity of layer
h = np.array([h_ML, h_PC, h_DP, h_AB]) # depth
hight = np.array([50, 300, 400, 1800]) # [m]
Volume = np.array(length * width * hight) # [m^3]
pres = np.array([60.4, 360, 814, 3030]) # [kg/m^3] general pressure
# time
year = 60 * 60 * 24 * 365 # [s] seconds a year
dt = year / 200 # [s] Generel time stepsize
dt_top = year / 20 # [s] time stepsize for top layers
dt_temp = dt_top / dt # fraction between generel stepsize and top stepsize
time = n_years * year # [s] total time
steps = int(time / dt) # [] number of timesteps
steps = bench.args.size[2]
# define a total matrix for all timesteps. With big calculations (I should maybe save them differently)
T = np.ones(shape=(steps + 1, N_L, N_W, 4));
T[0, :, :, :] = T_start
p = np.ones(shape=(steps + 1, N_L, N_W, 4));
m = np.ones(shape=(steps + 1, N_L, N_W, 4));
m[0, :, :, :] = m_start
S = np.ones(shape=(steps + 1, N_L, N_W, 4));
S[0, :, :, :] = S_start
c_w = np.ones(shape=(steps + 1, N_L, N_W, 4)); # [J / (K * kg)] water specefic heat
V = np.zeros(shape=(steps + 1, N_L, N_W));
V[0, :, :] = 2.0 # [m^3] Ice volume
h_ice = np.ones(shape=(steps + 1, N_L, N_W)); # [m] ice height
x_ice = np.ones(shape=(steps + 1, N_L, N_W)); # percentage ice cover
I_export = 0 # [m^3] volume of ice exportet away
bench.start()
for i in range(steps): # Forward euler
h_ice[i, :, :] = V[i, :, :] # if V > 0.5 -> h_ice = V and h_ice = 1
putmask = V[i, :, :] <= 0.5
h_ice[i, :, :] = ~putmask * h_ice[i, :, :] + putmask * (V[i, :, :] ** 0.5 / 2.0)
x_ice[i, :, :] = ~putmask * x_ice[i, :, :] + putmask * (2 * h_ice[i, :, :])
if i % dt_temp == 0:
p[i, :, :, :] = dens(S[i, :, :, :], T[i, :, :, :], pres[:]) # [kg/m] (3d [kg/m^3])
c_w[i, :, :, :] = heatcap(S[i, :, :, :], T[i, :, :, :], pres[:]) # [J/(kg*C)]
m[i, :, :, :] = hight[:] * p[i, :, :, :] # [kg]
[dT, dV] = temp(T[i, :, :, :], p[i, :, :, :], h, m[i, :, :, :], c_w[i, :, :, :], x_ice[i, :, :],
0 if i < (0.5 * year / dt) else 1, h_ice[i, :, :], I_export)
T[i + 1:i + 1 + dt_temp, :, :, :] = T[i, :, :, :] + dt * dT[:, :, :]
V[i + 1:i + 1 + dt_temp, :, :] = V[i, :, :] + dt * dV[:, :]
S[i + 1:i + 1 + dt_temp, :, :, :] = salt(S[i, :, :, :], p[i, :, :, :], h, x_ice[i, :, :], I_export,
dV / dt) * dt + S[i, :, :, :]
else:
p[i, :, :, 0] = dens(S[i, :, :, 0], T[i, :, :, 0], pres[0])
p[i, :, :, 1] = dens(S[i, :, :, 1], T[i, :, :, 1], pres[1])
c_w[i, :, :, 0] = heatcap(S[i, :, :, 0], T[i, :, :, 0], pres[0]) # [J/(kg*C)]
c_w[i, :, :, 1] = heatcap(S[i, :, :, 1], T[i, :, :, 1], pres[1]) # [J/(kg*C)]
m[i, :, :, 0:2] = hight[0:2] * p[i, :, :, 0:2] # [kg]
# all layers
[dT, dV] = temp(T[i, :, :, :], p[i, :, :, :], h, m[i, :, :, :], c_w[i, :, :, :], x_ice[i, :, :],
0 if i < (0.5 * year / dt) else 1, h_ice[i, :, :], I_export)
T[i + 1, :, :, 0] = dT[:, :, 0] * dt + T[i, :, :, 0]
V[i + 1, :, :] = dV[:, :] * dt + V[i, :, :]
dS = salt(S[i, :, :, :], p[i, :, :, :], h, x_ice[i, :, :], I_export, dV / dt)
S[i + 1, :, :, 0] = dS[:, :, 0] * dt + S[i, :, :, 0]
I_export = 14.22
bench.stop()
bench.pprint()
Xraysim¶Python Numpy¶#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Created on Mon Aug 31 14:14:15 2015
@author: Mads Thoudahl
"""
from xraysimphysics import emptyAAscene, addobjtoscene
from xraysimgeometry import coordsAAscene, raygeometry, detectorgeometry, runAABB
import numpy as np
from material import Material
from scene_objects import snake
from benchpress.benchmarks import util
bench = util.Benchmark("X-ray sim", "<screen-res>*<detector-res>*<iterations>")
class Const:
EPS = 1e-5
invsqr = 1.0 / (np.pi * 4)
def buildscene(
scenedefs, # scenedefinitions
objlist, # list of objects, obj=
verbose=False # printing status updates along the way
):
""" Builds and returns a (sparse) scenegrid, and a (dense) voxel array,
describing what materials are in which voxels
args:
scenedefs in the form [x0,y0,z0,x1,y1,z1,xr,yr,zr], where
*0, and *1 is corners spanning axis aligned scene.
*r is resolution in the three dimensions.
objectlist list of objects in the form
obj = [objref, shape, material, referencesystem, specific geometric describers]
verbose directions to verboseprint std is False
returns:
scenegrid meshgrid [np[x0..x1], np[y0..y1], np[z0..z1]]
list of arrays of shapes [(xr+1), (yr+1), (zr+1)]
scenematarials 3D array of shape (xr,yr,zr) describing which
material inhibits which voxel in the scene """
# create empty scene
scenegrid = coordsAAscene(scenedefs)
scenematerials = emptyAAscene(scenedefs)
if verbose: print("axisaligned scene generated")
# habitate scene with objects
added, discarded = [], []
for obj in objlist:
if addobjtoscene(scenedefs, scenematerials, obj):
added.append(obj[0])
if verbose: print("object {} included".format(obj[0]))
else:
discarded.append(obj[0])
if verbose: print("object {} NOT included".format(obj[0]))
if verbose:
print ("axisaligned scene inhabited with {} objects, {} discarded".format(len(added), len(discarded)))
print ("objects added: {}".format(added))
print ("objects discarded: {}".format(discarded))
return scenegrid, scenematerials
def xraysim(sourcelist,
detectordeflist,
scenegrid,
scenematerials,
materials,
scene_resolution,
detector_resolution):
""" performs the calculations figuring out what is detected
INPUT:
sourcelist: list of np.array([
px,py,pz, position
relative_power, relative to other sources simulated
energy]) MeV
detectorlist: list of np.array([
px0,py0,pz0, position lower left corner
px1,py1,pz1, position upper right corner
res1,res2 ]) resolution of detector
scenegrid: a numpy array 'meshgrid' shaped (xs+1,ys+1,zs+1)
containing absolute coordinates of grid at 'intersections'
as returned by buildscene
scenematerials: a numpy array shaped (xs,ys,zs)
containing information of which MATERIAL inhibits
any voxel, as an integer value.
## for a better model this should also contain
## information of how much of the voxel is filled..
materials: a dict containing all materials used in the scenematerials
as Material objects
scene_resolution: resolution of the scene cubic, e.g. 32 equals 32^3
detector_resolution: resolution of the detectors squared, e.g. 22 equals 22^2
"""
## indexing constants
power = 3
# generate an array of endpoints for rays (at the detector)
detectors = []
for ddef in detectordeflist:
detectors.append(detectorgeometry(ddef))
for source in sourcelist:
# unpack source
sx, sy, sz, spower, senergy = source
rayorigin = sx, sy, sz
# preprocess the scene physics
# building a map of attenuation coefficients
sceneattenuates = np.zeros(scenematerials.shape)
for material_id in materials.keys():
sceneattenuates += (scenematerials == material_id) \
* materials[material_id].getMu(senergy)
ret = []
for pixelpositions, pixelareavector, dshape, result in detectors:
# do geometry
rayudirs, raylengths, rayinverse = raygeometry(rayorigin, pixelpositions)
raydst = runAABB(scenegrid, rayudirs, rayorigin, rayinverse)
# raydst is now to be correlated with material/attenuation grid
t = sceneattenuates[..., np.newaxis] * raydst
# We sums the three dimensions
t = np.sum(t, axis=(0, 1, 2))
dtectattenuates = t.reshape(detector_resolution, detector_resolution)
pixelintensity = ((Const.invsqr * source[power] * np.ones(raylengths.shape[0])[
..., np.newaxis]) / raylengths).reshape(dshape)
area = np.dot(rayudirs, pixelareavector.reshape(3, 1)).reshape(dshape)
result += pixelintensity * area * np.exp(-dtectattenuates)
ret.append(result)
if bench.args.visualize:
low = np.minimum.reduce(result.flatten())
high = np.maximum.reduce(result.flatten())
if util.Benchmark().bohrium:
low = low.copy2numpy()
high = high.copy2numpy()
util.plot_surface(result, "2d", 0, low - 0.001 * low, high - 0.5 * high)
util.plot_surface(result, "2d", 0, low - 0.001 * low, high - 0.5 * high)
# We return only the result of the detectors
return ret
def setup(scene_res, detector_res):
"""Returns a scene to xray
scene_res: resolution of the scene cubic, e.g. 32 equals 32^3
detector_res: resolution of the detectors squared, e.g. 22 equals 22^2
"""
srclist, detectorlist, scenedefs, objectlist = snake(scene_res, detector_res)
# build a model of all materials
materials = Material.initAll()
# build scene
scenegrid, scenematerials = buildscene(scenedefs, objectlist)
return (srclist, detectorlist, scenegrid, scenematerials, materials, scene_res, detector_res)
def main():
scene_res = bench.args.size[0]
detector_res = bench.args.size[1]
iterations = bench.args.size[2]
scene = setup(scene_res, detector_res)
bench.start()
for _ in range(iterations):
detector_results = xraysim(*scene)
bench.flush()
bench.stop()
bench.pprint()
bench.confirm_exit()
if __name__ == '__main__':
main()
|