{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Analyzing the Herten Aquifer with GSTools\n\nThis example is going to be a bit more extensive and we are going to do some\nbasic data preprocessing for the actual variogram estimation. But this example\nwill be self-contained and all data gathering and processing will be done in\nthis example script.\n\n\n## The Data\n\nWe are going to analyse the Herten aquifer, which is situated in Southern\nGermany. Multiple outcrop faces where surveyed and interpolated to a 3D\ndataset. In these publications, you can find more information about the data:\n\n| Bayer, Peter; Comunian, Alessandro; H\u00f6yng, Dominik; Mariethoz, Gregoire (2015): Physicochemical properties and 3D geostatistical simulations of the Herten and the Descalvado aquifer analogs. PANGAEA, https://doi.org/10.1594/PANGAEA.844167,\n| Supplement to: Bayer, P et al. (2015): Three-dimensional multi-facies realizations of sedimentary reservoir and aquifer analogs. Scientific Data, 2, 150033, https://doi.org/10.1038/sdata.2015.33\n|\n\n## Retrieving the Data\n\nTo begin with, we need to download and extract the data. Therefore, we are\ngoing to use some built-in Python libraries. For simplicity, many values and\nstrings will be hardcoded.\n\nYou don't have to execute the ``download_herten`` and ``generate_transmissivity``\nfunctions, since the only produce the ``herten_transmissivity.gz``\nand ``grid_dim_origin_spacing.txt``, which are already present.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nimport gstools as gs\n\nVTK_PATH = os.path.join(\"Herten-analog\", \"sim-big_1000x1000x140\", \"sim.vtk\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def download_herten():\n    \"\"\"Download the data, warning: its about 250MB.\"\"\"\n    import urllib.request\n    import zipfile\n\n    print(\"Downloading Herten data\")\n    data_filename = \"data.zip\"\n    data_url = (\n        \"http://store.pangaea.de/Publications/\"\n        \"Bayer_et_al_2015/Herten-analog.zip\"\n    )\n    urllib.request.urlretrieve(data_url, \"data.zip\")\n    # extract the \"big\" simulation\n    with zipfile.ZipFile(data_filename, \"r\") as zf:\n        zf.extract(VTK_PATH)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def generate_transmissivity():\n    \"\"\"Generate a file with a transmissivity field from the HERTEN data.\"\"\"\n    import shutil\n\n    import pyvista as pv\n\n    print(\"Loading Herten data with pyvista\")\n    mesh = pv.read(VTK_PATH)\n    herten = mesh.point_data[\"facies\"].reshape(mesh.dimensions, order=\"F\")\n    # conductivity values per fazies from the supplementary data\n    cond = 1e-4 * np.array(\n        [2.5, 2.3, 0.61, 260, 1300, 950, 0.43, 0.006, 23, 1.4]\n    )\n    # asign the conductivities to the facies\n    herten_cond = cond[herten]\n    # Next, we are going to calculate the transmissivity,\n    # by integrating over the vertical axis\n    herten_trans = np.sum(herten_cond, axis=2) * mesh.spacing[2]\n    # saving some grid informations\n    grid = [mesh.dimensions[:2], mesh.origin[:2], mesh.spacing[:2]]\n    print(\"Saving the transmissivity field and grid information\")\n    np.savetxt(\"herten_transmissivity.gz\", herten_trans)\n    np.savetxt(\"grid_dim_origin_spacing.txt\", grid)\n    # Some cleanup. You can comment out these lines to keep the downloaded data\n    os.remove(\"data.zip\")\n    shutil.rmtree(\"Herten-analog\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Downloading and Preprocessing\n\nYou can uncomment the following two calls, so the data is downloaded\nand processed again.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# download_herten()\n# generate_transmissivity()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Analyzing the data\n\nThe Herten data provides information about the grid, which was already used in\nthe previous code block. From this information, we can create our own grid on\nwhich we can estimate the variogram. As a first step, we are going to estimate\nan isotropic variogram, meaning that we will take point pairs from all\ndirections into account. An unstructured grid is a natural choice for this.\nTherefore, we are going to create an unstructured grid from the given,\nstructured one. For this, we are going to write another small function\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "herten_log_trans = np.log(np.loadtxt(\"herten_transmissivity.gz\"))\ndim, origin, spacing = np.loadtxt(\"grid_dim_origin_spacing.txt\")\n\n# create a structured grid on which the data is defined\nx_s = np.arange(origin[0], origin[0] + dim[0] * spacing[0], spacing[0])\ny_s = np.arange(origin[1], origin[1] + dim[1] * spacing[1], spacing[1])\n# create the corresponding unstructured grid for the variogram estimation\nx_u, y_u = np.meshgrid(x_s, y_s)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's have a look at the transmissivity field of the Herten aquifer\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.imshow(herten_log_trans.T, origin=\"lower\", aspect=\"equal\")\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Estimating the Variogram\n\nFinally, everything is ready for the variogram estimation. For the unstructured\nmethod, we have to define the bins on which the variogram will be estimated.\nThrough expert knowledge (i.e. fiddling around), we assume that the main\nfeatures of the variogram will be below 10 metres distance. And because the\ndata has a high spatial resolution, the resolution of the bins can also be\nhigh. The transmissivity data is still defined on a structured grid, but we can\nsimply flatten it with :any:`numpy.ndarray.flatten`, in order to bring it into\nthe right shape. It might be more memory efficient to use\n``herten_log_trans.reshape(-1)``, but for better readability, we will stick to\n:any:`numpy.ndarray.flatten`. Taking all data points into account would take a\nvery long time (expert knowledge \\*wink\\*), thus we will only take 2000 datapoints into account, which are sampled randomly. In order to make the exact\nresults reproducible, we can also set a seed.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "bins = gs.standard_bins(pos=(x_u, y_u), max_dist=10)\nbin_center, gamma = gs.vario_estimate(\n    (x_u, y_u),\n    herten_log_trans.reshape(-1),\n    bins,\n    sampling_size=2000,\n    sampling_seed=19920516,\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The estimated variogram is calculated on the centre of the given bins,\ntherefore, the ``bin_center`` array is also returned.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Fitting the Variogram\n\nNow, we can see, if the estimated variogram can be modelled by a common\nvariogram model. Let's try the :any:`Exponential` model.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# fit an exponential model\nfit_model = gs.Exponential(dim=2)\nfit_model.fit_variogram(bin_center, gamma, nugget=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we can visualise some results. For quickly plotting a covariance\nmodel, GSTools provides some helper functions.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ax = fit_model.plot(x_max=max(bin_center))\nax.plot(bin_center, gamma)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "That looks like a pretty good fit! By printing the model, we can directly see\nthe fitted parameters\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(fit_model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With this data, we could start generating new ensembles of the Herten aquifer\nwith the :any:`SRF` class.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Estimating the Variogram in Specific Directions\n\nEstimating a variogram on a structured grid gives us the possibility to only\nconsider values in a specific direction. This could be a first test, to see if\nthe data is anisotropic.\nIn order to speed up the calculations, we are going to only use every 10th datapoint and for a comparison with the isotropic variogram calculated earlier, we\nonly need the first 21 array items.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# estimate the variogram on a structured grid\n# use only every 10th value, otherwise calculations would take very long\nx_s_skip = np.ravel(x_s)[::10]\ny_s_skip = np.ravel(y_s)[::10]\nherten_trans_skip = herten_log_trans[::10, ::10]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With this much smaller data set, we can immediately estimate the variogram in\nthe x- and y-axis\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "gamma_x = gs.vario_estimate_axis(herten_trans_skip, direction=\"x\")\ngamma_y = gs.vario_estimate_axis(herten_trans_skip, direction=\"y\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With these two estimated variograms, we can start fitting :any:`Exponential`\ncovariance models\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "x_plot = x_s_skip[:21]\ny_plot = y_s_skip[:21]\n# fit an exponential model\nfit_model_x = gs.Exponential(dim=2)\nfit_model_x.fit_variogram(x_plot, gamma_x[:21], nugget=False)\nfit_model_y = gs.Exponential(dim=2)\nfit_model_y.fit_variogram(y_plot, gamma_y[:21], nugget=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, the isotropic variogram and the two variograms in x- and y-direction can\nbe plotted together with their respective models, which will be plotted with\ndashed lines.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure()  # new figure\n(line,) = plt.plot(bin_center, gamma, label=\"estimated variogram (isotropic)\")\nplt.plot(\n    bin_center,\n    fit_model.variogram(bin_center),\n    color=line.get_color(),\n    linestyle=\"--\",\n    label=\"exp. variogram (isotropic)\",\n)\n\n(line,) = plt.plot(x_plot, gamma_x[:21], label=\"estimated variogram in x-dir\")\nplt.plot(\n    x_plot,\n    fit_model_x.variogram(x_plot),\n    color=line.get_color(),\n    linestyle=\"--\",\n    label=\"exp. variogram in x-dir\",\n)\n\n(line,) = plt.plot(y_plot, gamma_y[:21], label=\"estimated variogram in y-dir\")\nplt.plot(\n    y_plot,\n    fit_model_y.variogram(y_plot),\n    color=line.get_color(),\n    linestyle=\"--\",\n    label=\"exp. variogram in y-dir\",\n)\n\nplt.legend()\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The plot might be a bit cluttered, but at least it is pretty obvious that the\nHerten aquifer has no apparent anisotropies in its spatial structure.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"semivariogram model (isotropic):\\n\", fit_model)\nprint(\"semivariogram model (in x-dir.):\\n\", fit_model_x)\nprint(\"semivariogram model (in y-dir.):\\n\", fit_model_y)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Creating a Spatial Random Field from the Herten Parameters\n\nWith all the hard work done, it's straight forward now, to generate new\n*Herten-like realisations*\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# create a spatial random field on the low-resolution grid\nsrf = gs.SRF(fit_model, seed=19770928)\nsrf.structured([x_s_skip, y_s_skip])\nax = srf.plot()\nax.set_aspect(\"equal\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "That's pretty neat!\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}