Time offsets

This notebook introduces the concept of time offsets and how they are used in gwrefpy.

This notebook can be downloaded from the source code here.

Time offsets are a way to handle time series with different timestamps. If two time series share the same timestamps, they can be compared directly. If not, we need to introduce a time offset to align them. The offset defines an interval within which all data points are treated as having the same timestamp.

In the following notebook, we will illustrate this concept with a simple example. The example is inspired by section 3.3 in Strandanger (2024).

import gwrefpy as gr
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

Data

Let’s start by declaring the data we will explore as an example. We create Well objects and add them to a Model object. Finally, we plot the data.

obs_ts = pd.Series(
    index=[
        pd.Timestamp("2023-01-07"),
        pd.Timestamp("2023-02-01"),
        pd.Timestamp("2023-02-25"),
    ],
    data=[10.4, 10.7, 10.8],
    name="obs",
)
ref_ts = pd.Series(
    index=[
        pd.Timestamp("2023-01-08"),
        pd.Timestamp("2023-02-03"),
        pd.Timestamp("2023-02-09"),
        pd.Timestamp("2023-02-25"),
        pd.Timestamp("2023-02-28"),
    ],
    data=[8.9, 9.2, 9.3, 9.3, 9.5],
    name="ref",
)
obs = gr.Well("obs", False, obs_ts)
ref = gr.Well("ref", True, ref_ts)

model = gr.Model("offset test")
model.add_well([obs, ref])

fig, (up, down) = plt.subplots(nrows=2, sharex=True)

obs_ts.plot(ax=up, marker="o", linestyle="", label="obs", color="black")
ref_ts.plot(ax=down, marker="o", linestyle="", label="ref", color="blue")
_ = [ax.legend() for ax in (up, down)]
_ = [ax.grid() for ax in (up, down)]
[Figure: the obs series (upper panel) and the ref series (lower panel), plotted as points over time]

As we can see, the wells share some timestamps, but not all. In fact, only one timestamp (2023-02-25) is shared between the two wells. The table below shows an outer join between the two wells.

pd.concat([obs_ts, ref_ts], axis="columns", join="outer")
             obs  ref
2023-01-07  10.4  NaN
2023-01-08   NaN  8.9
2023-02-01  10.7  NaN
2023-02-03   NaN  9.2
2023-02-09   NaN  9.3
2023-02-25  10.8  9.3
2023-02-28   NaN  9.5
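
The single shared timestamp can also be confirmed directly from the two indexes. The snippet below is a small pandas-only sketch; it rebuilds the example series so that it runs on its own, and the variable name `shared` is introduced here purely for illustration.

```python
import pandas as pd

# Same example data as above, rebuilt so the snippet is self-contained.
obs_ts = pd.Series(
    [10.4, 10.7, 10.8],
    index=pd.to_datetime(["2023-01-07", "2023-02-01", "2023-02-25"]),
    name="obs",
)
ref_ts = pd.Series(
    [8.9, 9.2, 9.3, 9.3, 9.5],
    index=pd.to_datetime(
        ["2023-01-08", "2023-02-03", "2023-02-09", "2023-02-25", "2023-02-28"]
    ),
    name="ref",
)

# Timestamps that appear in both indexes (an exact match is required).
shared = obs_ts.index.intersection(ref_ts.index)
print(list(shared.strftime("%Y-%m-%d")))
```

An exact index intersection like this is what a comparison without any offset has to work with, which is why so few pairs are available here.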

Using time offsets

Our two wells have only one shared timestamp, but we know that they have several timestamps that are close to each other. To compare the two time series, we can introduce a time offset.

A time offset is defined as a time interval. All data points within this interval will be considered as having the same timestamp. For example, if we set an offset of 1 day, all data points within 1 day of each other will be considered as having the same timestamp. If multiple data points fall within the same interval, the mean value will be used by default. As of version 0.2.0, median, max and min are also available as aggregation methods within an offset (see the API reference for the Model class and its fit method).

Important

The offset extends in both directions. For example, if we have a timestamp of 2023-02-25 and an offset of 1 day, all data points between 2023-02-24 and 2023-02-26 will be considered as having the same timestamp.
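
To make the two-sided interval concrete, here is a minimal pandas sketch that selects the reference points around 2023-02-25 and aggregates them. It is not gwrefpy's implementation: the helper `window` is hypothetical, and treating both end points of the interval as included is an assumption made for this illustration.

```python
import pandas as pd

# Reference series from the example above.
ref_ts = pd.Series(
    [8.9, 9.2, 9.3, 9.3, 9.5],
    index=pd.to_datetime(
        ["2023-01-08", "2023-02-03", "2023-02-09", "2023-02-25", "2023-02-28"]
    ),
    name="ref",
)


def window(series, center, offset):
    """Hypothetical helper: points of `series` within center +/- offset.

    Both end points are treated as inside the interval (an assumption).
    """
    offset = pd.Timedelta(offset)
    mask = (series.index >= center - offset) & (series.index <= center + offset)
    return series[mask]


center = pd.Timestamp("2023-02-25")
one_day = window(ref_ts, center, "1D")     # 2023-02-24 .. 2023-02-26
three_days = window(ref_ts, center, "3D")  # 2023-02-22 .. 2023-02-28

# Several points in one interval are reduced to a single value.
# The mean is the default described above; the others are the alternatives.
summary = three_days.agg(["mean", "median", "min", "max"])
print(one_day)
print(three_days)
print(summary)
```

With a 1-day offset only the point on 2023-02-25 is selected, while a 3-day offset also picks up the point on 2023-02-28, so the choice of aggregation starts to matter.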

The offset argument can be given as a string or a pandas.Timedelta object. Refer to the pandas documentation for more information about the accepted strings.
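
As a quick illustration of the two input forms, the pandas-only sketch below shows that a string such as "3D" and an explicitly constructed pandas.Timedelta describe the same interval. The variable names are illustrative, and how gwrefpy converts the argument internally is not shown here.

```python
import pandas as pd

# The same 3-day interval, written as a string and with keyword arguments.
as_string = pd.Timedelta("3D")
as_keywords = pd.Timedelta(days=3)
print(as_string == as_keywords)

# A Timedelta supports arithmetic with timestamps, which is how the
# two-sided interval around a data point can be computed.
ts = pd.Timestamp("2023-02-25")
print(ts - as_string, ts + as_string)
```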

We provide a helper function for analyzing multiple time offsets at once: analyze_offsets. This function takes a list of offsets and computes the number of matching data pairs. Let’s see how it works by analyzing offsets of 0, 1, and 3 days.

offsets = ["0D", "1D", "3D"]
results = gr.analyze_offsets(ref, obs, offsets)

results = results.reset_index().set_index("index", drop=True)
results.index.name = "offset"
results
        n_pairs
offset
0D            1
1D            2
3D            3
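
The counts can be cross-checked by hand with plain pandas. The sketch below uses one plausible reading of a "matching pair": an observation timestamp counts if at least one reference timestamp lies within plus or minus the offset, end points included. The function `count_pairs` is hypothetical and is not how analyze_offsets is necessarily implemented; for this data set it gives the same numbers as the table.

```python
import pandas as pd

obs_index = pd.to_datetime(["2023-01-07", "2023-02-01", "2023-02-25"])
ref_index = pd.to_datetime(
    ["2023-01-08", "2023-02-03", "2023-02-09", "2023-02-25", "2023-02-28"]
)


def count_pairs(obs_index, ref_index, offset):
    """Hypothetical helper: number of observation timestamps that have
    at least one reference timestamp within +/- offset (inclusive)."""
    offset = pd.Timedelta(offset)
    return sum(
        bool(((ref_index >= ts - offset) & (ref_index <= ts + offset)).any())
        for ts in obs_index
    )


counts = {
    offset: count_pairs(obs_index, ref_index, offset)
    for offset in ["0D", "1D", "3D"]
}
print(counts)
```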

Let’s plot the width of the offset to understand the results better.

fig, axs = plt.subplots(nrows=len(offsets), ncols=1, sharex=True)

for ax, offset in zip(axs, offsets):
    ax2 = ax.twinx()
    
    for ts, value in obs_ts.items():
        x_start = ts - pd.Timedelta(offset)
        x_end = ts + pd.Timedelta(offset)
        ax.plot([x_start, x_end], [value, value], 
                color="gray", alpha=0.75, linewidth=5)
    
    for ts, value in ref_ts.items():
        x_start = ts - pd.Timedelta(offset)
        x_end = ts + pd.Timedelta(offset)
        ax2.plot([x_start, x_end], [value, value], 
                 color="blue", alpha=0.5, linewidth=5)
    
    obs_ts.plot(ax=ax, marker="o", linestyle="", label="obs", color="black")
    ref_ts.plot(ax=ax2, marker="o", linestyle="", label="ref", color="blue")
    
    ax.set_ylabel("obs", color="black")
    ax2.set_ylabel("ref", color="blue")
    
    ax.tick_params(axis="y", labelcolor="black")
    ax2.tick_params(axis="y", labelcolor="blue")

    ax.set_title(f"{pd.to_timedelta(offset).days} days")
    ax.grid()

fig.tight_layout()
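[Figure: three stacked panels, one per offset (0, 1 and 3 days), showing each data point with a horizontal bar spanning its offset interval; obs on the left axis, ref on the right axis]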

For this simple example, a time offset of 3 days seems reasonable. It yields three matching data pairs, while still being small enough to avoid grouping timestamps that are too far apart.

As mentioned, any points within the offset interval are averaged. The newly constructed timestamps are taken as the mean of the grouped timestamps. Given an offset of 3 days:

  • for the first period (early January), a data pair is formed because the single data point from each well falls within the other's offset interval

  • for the second period (early February), the first data point from the reference well is paired with the data point from the observation well, because their offset intervals intersect

  • for the third period (late February), two data points from the reference well fall within the offset interval of the single data point from the observation well. This means that a single data pair will be formed between the average of the two data points from the reference well and the single data point from the observation well.
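
The three periods described above can be reproduced with a short pandas sketch. This is an illustration under stated assumptions, not gwrefpy's fitting code: the interval end points are treated as inclusive, the column names `ref_mean` and `n_ref` are made up for this example, and each pair is labelled with the observation timestamp for simplicity rather than with the mean of the grouped timestamps.

```python
import pandas as pd

obs_ts = pd.Series(
    [10.4, 10.7, 10.8],
    index=pd.to_datetime(["2023-01-07", "2023-02-01", "2023-02-25"]),
    name="obs",
)
ref_ts = pd.Series(
    [8.9, 9.2, 9.3, 9.3, 9.5],
    index=pd.to_datetime(
        ["2023-01-08", "2023-02-03", "2023-02-09", "2023-02-25", "2023-02-28"]
    ),
    name="ref",
)
offset = pd.Timedelta("3D")

rows = []
for ts, obs_value in obs_ts.items():
    # Reference points within +/- offset of this observation (inclusive).
    in_window = (ref_ts.index >= ts - offset) & (ref_ts.index <= ts + offset)
    grouped = ref_ts[in_window]
    if grouped.empty:
        continue  # no partner within the offset, so no pair is formed
    rows.append(
        {
            "obs_time": ts,
            "obs": obs_value,
            "ref_mean": grouped.mean(),  # default aggregation: mean
            "n_ref": len(grouped),
        }
    )

pairs = pd.DataFrame(rows).set_index("obs_time")
print(pairs)
```

The last row combines two reference values into one mean, which corresponds to the third period; the reference point on 2023-02-09 has no observation within 3 days and does not contribute to any pair.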

offset = "3D"

fig, ax = plt.subplots(figsize=(6, 3))
ax2 = ax.twinx()

for ts, value in obs_ts.items():
    x_start = ts - pd.Timedelta(offset)
    x_end = ts + pd.Timedelta(offset)
    ax.plot([x_start, x_end], [value, value], 
            color="gray", alpha=0.5, linewidth=5)

for ts, value in ref_ts.items():
    x_start = ts - pd.Timedelta(offset)
    x_end = ts + pd.Timedelta(offset)
    ax2.plot([x_start, x_end], [value, value], 
             color="blue", alpha=0.5, linewidth=5)

obs_ts.plot(ax=ax, marker="o", linestyle="", label="obs", color="black")
ref_ts.plot(ax=ax2, marker="o", linestyle="", label="ref", color="blue")

ax.set_ylabel("obs", color="black")
ax2.set_ylabel("ref", color="blue")

ax.tick_params(axis="y", labelcolor="black")
ax2.tick_params(axis="y", labelcolor="blue")

periods = [
    ("2023-01-05", "2023-01-20", "Period 1"),
    ("2023-01-28", "2023-02-12", "Period 2"),
    ("2023-02-20", "2023-03-05", "Period 3")
]

colors = ["lightblue", "lightgreen", "grey"]
y_positions = [0.9, 0.1, 0.1]

for i, (start_date, end_date, label) in enumerate(periods):
    start = pd.Timestamp(start_date)
    end = pd.Timestamp(end_date)
    
    ax.axvspan(start, end, alpha=0.25, color=colors[i], zorder=0)
    
    center = start + (end - start) / 2
    y_min, y_max = ax.get_ylim()
    y_pos = y_min + (y_max - y_min) * y_positions[i]
    ax.text(center, y_pos, label, 
            ha="center", va="center", fontweight="bold",
            bbox=dict(boxstyle="round,pad=0.3", facecolor="white", alpha=0.8))

ax.grid()

plt.xticks(rotation=45)
plt.tight_layout()
[Figure: both series with their 3-day offset bars, with shaded bands marking Period 1, Period 2 and Period 3]

When fitting these two wells with an offset of 3 days, three data pairs will be formed, as explained above.

This concludes this notebook on time offsets in gwrefpy. Happy fitting!