Building Requirements

This section describes the software dependencies and build requirements for FLUXOS.

Required Dependencies

Dependency	Minimum Version	Description
C++ Compiler	C++11	GCC 7+, Clang 8+, or Intel C++ 18+ recommended
CMake	3.10	Build system generator
Armadillo	9.9	C++ linear algebra library
OpenMP	4.5	Shared-memory parallelization (usually bundled with compiler)
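One way to compare installed tools against the minimums in the table is a small shell check. This is a sketch only: version_ge and check_tool are hypothetical helper names (not part of FLUXOS), and it assumes a sort that supports -V, as GNU coreutils and recent macOS do.

```shell
# Succeed if version $1 is at least version $2 (uses `sort -V` ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Report whether a tool on PATH meets a minimum version.
check_tool() {
    tool=$1
    min=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found"
        return 1
    fi
    # Take the first dotted number printed by `<tool> --version`.
    found=$("$tool" --version 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n1)
    if version_ge "$found" "$min"; then
        echo "$tool: $found (>= $min)"
    else
        echo "$tool: ${found:-unknown version} (< $min required)"
        return 1
    fi
}

check_tool cmake 3.10 || true
# Note: on macOS, g++ is usually Apple Clang, so compare against Clang 8 there.
check_tool g++ 7 || true
```

The same helper can be pointed at any tool that prints its version with --version; Armadillo is a library, so it has to be checked through its headers instead.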

Optional Dependencies

Dependency	Minimum Version	Description
MPI	3.0	Needed only for distributed computing (Open MPI, MPICH, or Intel MPI)
CUDA Toolkit	11.0	Needed only for GPU acceleration (NVIDIA GPUs, Compute Capability 6.0+)
METIS	5.0	Graph partitioning for triangular mesh MPI decomposition
HDF5	1.10	Parallel I/O support
LAPACK/BLAS	Any	High-performance linear algebra (Armadillo backend)

Build Modes

FLUXOS supports several build configurations:

Standard Build (OpenMP only)

cmake -DMODE_release=ON ..
make

MPI+OpenMP Hybrid Build

cmake -DMODE_release=ON -DUSE_MPI=ON ..
make

CUDA GPU Build

cmake -DMODE_release=ON -DUSE_CUDA=ON ..
make

Triangular Mesh Build

cmake -DMODE_release=ON -DUSE_TRIMESH=ON ..
make

Full-Feature Build (Triangular Mesh + GPU + MPI)

cmake -DMODE_release=ON -DUSE_TRIMESH=ON -DUSE_CUDA=ON -DUSE_MPI=ON ..
make

Debug Build

cmake -DMODE_debug=ON ..
make
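The cmake commands above end in "..", so they are meant to be run from a build directory one level below the repository root; the executable path used later in this section (build/bin/fluxos) suggests that directory is named build. A minimal sketch of the full sequence under that assumption follows. build_fluxos is a hypothetical wrapper, and the job count of 4 is arbitrary.

```shell
# Configure and compile out-of-source in ./build.
# Usage: build_fluxos <cmake options...>, run from the repository root.
build_fluxos() {
    if [ ! -f CMakeLists.txt ]; then
        echo "CMakeLists.txt not found: run this from the repository root" >&2
        return 1
    fi
    mkdir -p build || return 1
    # The subshell keeps the caller's working directory unchanged.
    (
        cd build &&
        cmake "$@" .. &&
        make -j4
    )
}

# Example: standard OpenMP-only release build.
build_fluxos -DMODE_release=ON || echo "build did not complete"
```

Passing the options through unchanged keeps the wrapper usable for every build mode listed above, for example build_fluxos -DMODE_debug=ON.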

CMake Options

Option	Default	Description
MODE_release	OFF	Enable release build with optimizations (-O3)
MODE_debug	OFF	Enable debug build with symbols (-g)
USE_MPI	OFF	Enable MPI for distributed computing
USE_CUDA	OFF	Enable CUDA GPU acceleration (requires NVIDIA GPU and CUDA Toolkit 11.0+)
USE_TRIMESH	OFF	Enable unstructured triangular mesh support (Gmsh/Triangle mesh formats)

Note

USE_CUDA and USE_TRIMESH can be combined. When both are enabled, the triangular mesh solver uses GPU acceleration with 7 specialized CUDA kernels.
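After configuring, the values CMake actually recorded for these options can be read back from the cache file it writes in the build directory. The sketch below assumes the build directory is named build; show_fluxos_options is a hypothetical helper name.

```shell
# Print the MODE_* and USE_* options recorded in a CMake cache file.
show_fluxos_options() {
    cache=${1:-build/CMakeCache.txt}
    if [ ! -f "$cache" ]; then
        echo "no cache at $cache (run cmake first)" >&2
        return 1
    fi
    # Cache entries look like NAME:TYPE=VALUE; print them as NAME=VALUE.
    grep -E '^(MODE_|USE_)[A-Za-z0-9_]*:' "$cache" | sed -E 's/:[A-Z]+=/=/'
}

show_fluxos_options || true
```

This is useful when a build directory is reused, because options passed with -D persist in the cache until they are changed or the directory is deleted.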

Compiler Optimization Flags

The release build enables the following optimization flags:

-O3: High-level optimization
-march=native: Generate code for the CPU of the build machine (the binary may not run on a different or older CPU)
-mtune=native: Tune instruction scheduling for the CPU of the build machine
-funroll-loops: Unroll loops whose iteration count can be determined
-ftree-vectorize: Enable automatic vectorization
-fno-math-errno: Do not set errno after math functions, which allows faster inlined code
-flto: Link-time optimization
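Because -march=native depends on the machine doing the compiling, it can help to see what it resolves to before copying a binary to other nodes. A small sketch, assuming GCC is the compiler: native_march is a hypothetical helper, and it prints "unknown" when gcc is missing or does not support the query (for example, when gcc is an alias for Clang).

```shell
# Print the architecture that GCC selects for -march=native.
native_march() {
    arch=""
    if command -v gcc >/dev/null 2>&1; then
        # -Q --help=target lists target options with their effective values.
        arch=$(gcc -march=native -Q --help=target 2>/dev/null |
               awk '$1 == "-march=" { print $2; exit }') || true
    fi
    echo "${arch:-unknown}"
}

echo "-march=native resolves to: $(native_march)"
```

If the build node and the compute nodes report different architectures, building on the target node is the safer choice.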

Platform-Specific Notes

Linux

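Most HPC clusters run Linux and provide dependencies through environment modules. Load the appropriate modules before building; the module names and versions below are examples and vary by site: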

module load gcc/11.2.0
module load cmake/3.20
module load armadillo/11.0
module load openmpi/4.1.1  # if using MPI

macOS

On macOS, install dependencies via Homebrew:

brew install cmake armadillo libomp open-mpi

On Apple Silicon (M-series), use native ARM (arm64) builds of the dependencies rather than x86_64 builds running under Rosetta.

Windows

Windows builds are supported via:

Visual Studio 2019+ with C++11 support
MSYS2/MinGW-w64 with GCC
Windows Subsystem for Linux (WSL) - recommended

Running the Example

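FLUXOS includes a test case in the Working_example/ directory (Rosa Creek watershed, 859 x 618 cells at 2 m resolution). The repository’s bin/ directory is reserved for compiled binaries. The commands below are run from the repository root and assume the executable was built at build/bin/fluxos.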

Regular Mesh:

mkdir -p Results
./build/bin/fluxos Working_example/modset.json

Triangular Mesh:

First generate the triangular mesh and its modset file from the DEM: set mesh_type = "triangular" in the _config dict of the template script, then run it:

cd supporting_scripts/1_Model_Config
python model_config_template.py

Then run FLUXOS with the triangular mesh config produced by the template:

cd <repo_root>
mkdir -p Results
./build/bin/fluxos Working_example/modset_trimesh.json

Visualizing Results in Google Earth

Export simulation results as KMZ files for animated visualization:

# Regular mesh results
python supporting_scripts/2_Read_Outputs/output_supporting_lib/fluxos_viewer.py \
    --results-dir Results --dem Working_example/Rosa_2m.asc --utm-zone 10

# Triangular mesh results
python supporting_scripts/2_Read_Outputs/output_supporting_lib/fluxos_viewer.py \
    --results-dir Results --dem Working_example/Rosa_2m.asc \
    --mesh-type triangular --utm-zone 10

# Open in Google Earth
open fluxos_regular.kmz    # macOS
xdg-open fluxos_regular.kmz  # Linux

Use the time slider in Google Earth to animate through simulation timesteps.

Benchmark Results

Measured on the Rosa Creek example (859 x 618 grid, 5 h simulation, output every 1 h) on an Apple M-series machine:

Configuration	Mesh Type	Wall Time	Output Size
OpenMP (release)	Regular (530K cells)	4.73 s	210 MB (6 x 35 MB .txt)
OpenMP (release)	Triangular (4528 cells)	0.85 s	9.2 MB (6 x 1.5 MB .vtu)
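The figures in the table can be cross-checked with a little arithmetic: the regular grid size follows from the DEM dimensions, and the total output size from six output files. The ratios at the end are derived only from the numbers in the table.

```shell
nx=859
ny=618
cells=$((nx * ny))
echo "regular grid cells: $cells"             # prints 530862, the "530K" in the table
echo "regular output total: $((6 * 35)) MB"   # prints 210 MB

# Ratios need floating point, so use awk.
awk 'BEGIN { printf "wall-time ratio (regular/triangular): %.1f\n", 4.73 / 0.85 }'
awk -v c="$cells" 'BEGIN { printf "cell-count ratio (regular/triangular): %.0f\n", c / 4528 }'
```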

Note

CUDA acceleration requires an NVIDIA GPU (not available on macOS ARM). MPI domain decomposition is available for distributed computing on HPC clusters.

Verifying the Build

After building, verify the executable:

# Check executable exists
ls -la build/bin/fluxos

# Run with the example case
./build/bin/fluxos Working_example/modset.json
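The two checks above can be combined so that the example only runs when the binary is actually present. This is a sketch: check_executable is a hypothetical helper, and treating a non-zero exit status as failure is an assumption about how FLUXOS reports errors.

```shell
# Succeed if the path is a regular executable file.
check_executable() {
    if [ -x "$1" ] && [ ! -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1" >&2
        return 1
    fi
}

# Run the example only when the binary exists, then report its exit status.
if check_executable build/bin/fluxos; then
    mkdir -p Results
    status=0
    ./build/bin/fluxos Working_example/modset.json || status=$?
    echo "fluxos exit status: $status"
fi
```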

Troubleshooting

Armadillo not found:

Ensure Armadillo is installed and its include/library paths are accessible:

# Check Armadillo installation
find /usr -name "armadillo" 2>/dev/null

# Set paths if needed
cmake -DARMADILLO_INCLUDE_DIR=/path/to/include \
      -DARMADILLO_LIBRARY=/path/to/libarmadillo.so ..
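If Armadillo is found but CMake rejects it as too old, the installed version can be read from its headers, which define ARMA_VERSION_MAJOR, ARMA_VERSION_MINOR, and ARMA_VERSION_PATCH in armadillo_bits/arma_version.hpp. A sketch, with arma_version as a hypothetical helper and /usr/include as an assumed install location:

```shell
# Print the Armadillo version found under an include directory.
arma_version() {
    hdr="$1/armadillo_bits/arma_version.hpp"
    if [ ! -f "$hdr" ]; then
        echo "no Armadillo headers under $1" >&2
        return 1
    fi
    # Extract the value of a "#define NAME value" line.
    get() { awk -v name="$1" '$1 == "#define" && $2 == name { print $3 }' "$hdr"; }
    echo "$(get ARMA_VERSION_MAJOR).$(get ARMA_VERSION_MINOR).$(get ARMA_VERSION_PATCH)"
}

# Adjust the path to match your installation (e.g. a Homebrew or module prefix).
arma_version /usr/include || true
```

The directory that works here is also the value to pass as ARMADILLO_INCLUDE_DIR.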

OpenMP not found:

GCC ships with OpenMP support. Apple Clang on macOS does not include the OpenMP runtime, so install libomp and point the build at it:

brew install libomp
export OpenMP_ROOT=$(brew --prefix)/opt/libomp

MPI not found:

Ensure the MPI compiler wrappers are on your PATH:

which mpicc mpicxx

# If using environment modules
module load openmpi

Link-time optimization (LTO) errors:

If LTO causes issues, disable it:

cmake -DMODE_release=ON -DCMAKE_CXX_FLAGS="-O3 -march=native" ..

CUDA not found:

Ensure CUDA Toolkit is installed and nvcc is in your PATH:

# Check CUDA installation
nvcc --version
nvidia-smi

# Set CUDA path if needed
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH

CUDA compute capability mismatch:

If you get architecture-related errors, specify your GPU’s compute capability, written without the decimal point (for example, 7.5 becomes 75):

cmake -DMODE_release=ON -DUSE_CUDA=ON -DCUDA_ARCH=75 ..  # For RTX 2080
cmake -DMODE_release=ON -DUSE_CUDA=ON -DCUDA_ARCH=86 ..  # For RTX 3090
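For GPUs other than the two listed, the value can be derived from the compute capability reported by the driver. The sketch below assumes a reasonably recent NVIDIA driver, since older versions of nvidia-smi do not offer the compute_cap query field; cap_to_arch is a hypothetical helper.

```shell
# Convert a compute capability such as "7.5" to the form used above ("75").
cap_to_arch() {
    echo "$1" | tr -d '.'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query the first GPU; the result is empty if the field is unsupported.
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)
    if [ -n "$cap" ]; then
        echo "suggested option: -DCUDA_ARCH=$(cap_to_arch "$cap")"
    else
        echo "could not query compute capability; look it up for your GPU model"
    fi
else
    echo "nvidia-smi not found"
fi
```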