Machine Learning News Hubb
Advertisement Banner
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
Machine Learning News Hubb
No Result
View All Result
Home Artificial Intelligence

More on Road Network Matching. Road network matching shenanigans | by João Paulo Figueira | Feb, 2023

admin by admin
February 10, 2023
in Artificial Intelligence


Road network matching shenanigans

Photo by Denys Nevozhai on Unsplash

The purpose of this article is to complement and correct the previous one on the same subject. In that article, I presented an approach to reconstructing missing map-matched data from the Extended Vehicle Energy Dataset¹ (EVED) [1]. My technique explored the mathematical properties of triangles to find the lost data on a road network database quickly. Unfortunately, a subtle error in the code made it fail under some circumstances.

Here, I present a correction to the code and discuss the results of applying these techniques to reconstruct more than thirty-two thousand trajectories on the EVED.

In the previous article, I illustrated two methods to fit map-matched locations to road network segments using triangular properties. The geometrical concepts that support the idea are adequate, but the code needs to be corrected. The issue occurs on the node filtering part and affects both matching methods, the ratio-based and the distance-based. Let’s look at the original code again using the ratio-based function:

def get_matching_edge(self, latitude, longitude, bearing=None):
loc = np.array([latitude, longitude])
_, r = self.geo_spoke.query_knn(loc, 1)
radius = self.max_edge_length + r[0]
node_idx, dists = self.geo_spoke.query_radius(loc, radius)
nodes = self.ids[node_idx]
distances = dict(zip(nodes, dists))
adjacent_set = set()
graph = self.graph

best_edge = None
for node in nodes:
if node not in adjacent_set:
adjacent_nodes = np.intersect1d(np.array(graph.adj[node]),
nodes, assume_unique=True)

adjacent_set.update(adjacent_nodes)
for adjacent in adjacent_nodes:
edge_length = graph[node][adjacent][0]['length']
ratio = (distances[node] + distances[adjacent]) /
edge_length
if best_edge is None or ratio < best_edge[2]:
best_edge = (node, adjacent, ratio)

if bearing is not None:
best_edge = fix_edge_bearing(best_edge, bearing, graph)
return best_edge

The function uses a set to store all the processed adjacent nodes to avoid repetitions. And it does avoid repetitions, unfortunately. This is where the error lies because by excluding a node, the function is also preventing all other adjacent edges from being tested. This situation is especially prevalent for nodes with multiple links. Correcting the code requires a conceptual change, removing processed road network edges instead of processed nodes.

But another issue surfaced after using the code with a few different trajectories, a black swan. What happens when the input GPS location is close to one of the nodes? In some rare situations, the sampled GPS locations match one of the road network nodes very closely, which induces very subtle floating point instabilities. The result is the selection of a nonsensical edge, ruining the path reconstruction. To solve this issue, I added a default parameter with the minimum distance from the input GPS location to the closest node. When this distance is below one meter (three feet), the function refuses to select an edge and returns an empty value. This solution is moderate as this will happen relatively infrequently, and the ensuing reconstruction mechanism compensates for that.

You can see the corrected code below, featuring the set of tested road network edges and the minimum distance check.

def get_matching_edge(self, latitude, longitude,
bearing=None, min_r=1.0):
best_edge = None
loc = np.array([latitude, longitude])
_, r = self.geo_spoke.query_knn(loc, 1)
if r > min_r:
radius = self.max_edge_length + r[0]
node_idx, dists = self.geo_spoke.query_radius(loc, radius)
nodes = self.ids[node_idx]
distances = dict(zip(nodes, dists))
tested_edges = set()
graph = self.graph
node_set = set(nodes)

for node in nodes:
adjacent_nodes = node_set & set(graph.adj[node])

for adjacent in adjacent_nodes:
if (node, adjacent) not in tested_edges:
edge_length = graph[node][adjacent][0]['length']
ratio = edge_length / (distances[node] + distances[adjacent])

if best_edge is None or ratio > best_edge[2]:
best_edge = (node, adjacent, ratio)
tested_edges.add((node, adjacent))
tested_edges.add((adjacent, node))

if bearing is not None:
best_edge = fix_edge_bearing(best_edge, bearing, graph)
return best_edge

Note that I also changed the rate formula so that it ranges between zero and one, with larger values corresponding to a better fit. Interestingly, the rate-based function works better and faster than the distance-based one.

The first part of the path reconstruction implies applying the function above to every point of the input trajectory and keeping only the unique road network edges. The code below shows a Python function that does that.

def match_edges(road_network, trajectory):
edges = []
unique_locations = set()
edge_set = set()
for p in trajectory:
if p not in unique_locations:
e = road_network.get_matching_edge(*p, min_r=1.0)
if e is not None:
n0, n1, _ = e
edge = (n0, n1)
if edge not in edge_set:
edge_set.add(edge)
edges.append(edge)
unique_locations.add(p)
return edges

The function assigns a road network edge to each unique trajectory location. Figure 1 below shows a sample of such matching.

Figure 1 — The map above shows the input trajectory locations in red and the matched road network edges in blue. (Image source: Author using Folium and OpenStreetMap imagery)

As you can see, there are some gaps between the matched links due to the GPS sampling frequency. Reconstructing the path requires us to fill in these gaps, and we do so by finding the shortest way between the road network edges. The code below does that by keeping together the connected links and filling in the gaps for all the others using OSMnx’s services [2].

def build_path(rn, edges):
path = []
for e0, e1 in pairwise(edges):
if not len(path):
path.append(e0[0])
if e0[0] != e1[0] and e0[1] != e1[1]:
if e0[1] == e1[0]:
path.extend([e0[1], e1[1]])
else:
n0, n1 = int(e0[1]), int(e1[0])
sp = ox.distance.shortest_path(rn, n0, n1)
if sp is not None:
path.extend(sp[1:])
return path

After calling this function on the previous matches, we should have the trajectory fully reconstructed as a sequence of road network edges. We see a continuous result by superimposing the trajectory locations to the patched road network edges, as depicted in Figure 2 below.

Figure 2 — The map above shows the reconstructed link sequence with the original trajectory locations superimposed. (Image source: Author using Folium and OpenStreetMap imagery)

The Case for a Minimum Distance

While reviewing this solution, I found an issue when iterating through the individual trajectories. Figure 3 below details the problem.

Figure 3 — The image above shows a corner case where this road network edge matching system breaks down. One of the trajectory’s locations falls right over one of the nodes, fooling the matching process. (Image source: Author using Folium and OpenStreetMap imagery)

A spurious cycle in the route arises from an unexpected value in the input data for the patching process. We can dig deeper into what is happening by reverting to the view showing which road network edges the algorithm matches to these GPS locations.

Figure 4 below illustrates the situation at the fork, where the matched GPS location coincides (or is extremely close) to one of the road network nodes. This will likely cause some numeric instability yielding a wrong result. A robust algorithm should avoid these situations, but how?

Figure 4 — The matching problem becomes evident once we see the location assignments. Clearly, in the above case, the very close proximity of the GPS location to one of the road network nodes induced instability in the edge selection process. (Image source: Author using Folium and OpenStreetMap imagery)

After considering it, I decided to change both matching algorithms and use the distance to the closest road network node as a criterion to include or exclude the input location. If this distance is smaller than a given threshold (one meter or three feet as default), the match function returns an empty value signaling the calling code that it cannot safely assign a link. This is fine for the following step as it should infer the missing links because that algorithm depends only on road network edges and not the input locations.

Now that we have sanitized the whole algorithm, let us see how it performs on the entire dataset.

To assess how the algorithm behaves on the whole EVED, I created a new standalone script that iterates through all trajectories, matches the edges, generates a path, and compares the original trajectory’s length to the reconstructed path. The Python script records all the differences so we can later analyze them using a standard statistical toolset.

The script core lies in its main loop, detailed in the code below.

def process_trajectories():
rn = download_network()
road_network = RoadNetwork(rn)

state = load_state()
if state is None:
state = {
"trajectories": get_trajectories(),
"errors": []
}

save_counter = 0
trajectories = state["trajectories"]
while len(trajectories) > 0:
trajectory_id = trajectories[0]
trajectory = load_trajectory_points(trajectory_id,
unique=True)
if len(trajectory) > 3:
edges = match_edges(road_network, trajectory)
path = build_path(rn, edges)

if len(path) > 0:
diff = calculate_difference(rn, path, trajectory)
print(f"Trajectory: {trajectory_id}, Difference: {diff}")
state["errors"].append((trajectory_id, diff))

trajectories = trajectories[1:]
state["trajectories"] = trajectories

save_counter += 1
if save_counter % 100 == 0:
save_state(state)

save_state(state)

The function starts by downloading and preprocessing the road network data from OpenStreetMap. Because this is a long-running script, I added a persistent state serialized as a JSON-formatted text file and saved every 100 trajectories. This feature allows the user to interrupt the Python script without losing all the data from already-processed trajectories. The function that calculates the length difference between the original input trajectory and the reconstructed path is displayed below.

def calculate_difference(rn, path, trajectory):
p_loc = np.array([(rn.nodes[n]['y'], rn.nodes[n]['x']) for n in path])
t_loc = np.array([(t[0], t[1]) for t in trajectory])

p_length = vec_haversine(p_loc[1:, 0], p_loc[1:, 1],
p_loc[:-1, 0], p_loc[:-1, 1]).sum()
t_length = vec_haversine(t_loc[1:, 0], t_loc[1:, 1],
t_loc[:-1, 0], t_loc[:-1, 1]).sum()
return p_length - t_length

The code converts both inputs into sequences of geolocations, computes their cumulative lengths, and finally computes their difference. When the script finishes, we can load the state data and analyze it using Pandas.

We see how the difference between both distances distributes in Figure 5 below (named “error“).

Figure 5 — The histogram above shows the distribution of the distance differences between the original input trajectories and the reconstructed paths. The horizontal axis displays distances in meters. (Image source: Author)

To see more detail around zero, I clipped the histogram and created the one in Figure 6 below.

Figure 6 — The histogram above details the difference distribution, highlighting the negative and majority positive differences. Most differences are minor, hinting at a good performance of the matching algorithm. (Image source: Author)

As you can see, there are some differences in the negative part of the spectrum. Still, it is primarily positive, meaning that the reconstructed path lengths are usually longer than trajectories, which is expectable. Remember that trajectory points seldom match road network nodes, meaning that the most common scenario should be a match midway through a segment.

We can also inspect where the mid-fifty percent of differences is by drawing a box-and-whiskers plot, as depicted below in Figure 7.

Figure 7 — The box-and-whiskers plot above shows where the middle 50% of the distribution lies. The 25%, 50%, and 75% percentiles are 13.0, 50.2, and 156.5 meters, respectively. We can also see that the inliers band is relatively narrow, ranging from -202.3 to 371.8 meters. (Image source: Author)

The image above contains essential information on the distribution of differences and the number of outliers. Using Tukey’s fences method, we can classify as outliers 5,332² trajectories out of 32,528. This proportion represents a relatively high percentage value of 16.4%, meaning there are probably better reconstruction methods than this or that there is a sizable amount of poorly processed trajectories. We can look into some of the most dramatic distance difference outliers to try and understand these results.

Outlier Analysis

As previously noted, there are over five-thousand trajectories with poor matching. We can quickly inspect a few to determine why the difference between the sampled trajectory and the reconstruction is so significant. We start with a revealing case, trajectory number 59, depicted below in Figure 8.

Figure 8 — The trajectory displayed above illustrates a map-matching error that led the reconstruction process to build a longer than necessary path. (Image source: Author using Folium and OpenStreetMap imagery)

The case above shows that the map-matching process incorrectly placed three GPS samples on a parallel road, forcing the reconstruction process to find a spurious detour. This type of error might result from the inadequate parameterization of the map-matching process, something to research in a later article.

Finally, we can see a very radical case in Figure 9 below. The sampled GPS locations are so far away from the road that the map-matching process gave up. As you can see, the reconstruction process assigns road network links that bear no relation to the intended trajectory.

Figure 9 — The trajectory pictured above contains numerous errors, and the initial map-matching procedure could not eliminate them. (Image source: Author using Folium and OpenStreetMap imagery)

In this article, I corrected previously published code on road network edge matching and assessed its performance using the EVED dataset. I collected the performance assessment data using a specialized Python script that revealed a relatively high number of matched trajectories with seemingly abnormal deviations in the measured distances. To understand why this result could be better, I will turn next to the original map-matching software, Valhalla, to try and fix the issues above.

  1. The original paper authors licensed the dataset under the Apache 2.0 License (see the VED and EVED GitHub repositories). Note that this also applies to derivative work.
  2. The source of all images and calculations is in a Jupyter notebook, available from the public GitHub repository.

[1] Zhang, S., Fatih, D., Abdulqadir, F., Schwarz, T., & Ma, X. (2022). Extended vehicle energy dataset (eVED): An enhanced large-scale dataset for deep learning on vehicle trip energy consumption. ArXiv. https://doi.org/10.48550/arXiv.2203.08630

[2] Boeing, G. 2017. OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks. Computers, Environment and Urban Systems 65, 126–139. doi:10.1016/j.compenvurbsys.2017.05.004



Source link

Previous Post

Satellite Image Recognition of Buildings with a Drive-Thru | by Taras Skavinskyy | Feb, 2023

Next Post

Real-time Translations with AI – KDnuggets

Next Post

Real-time Translations with AI - KDnuggets

Long-wavelength VCSEL for Consumer Applications

Monitoring Lake Mead drought using the new Amazon SageMaker geospatial capabilities

Related Post

Artificial Intelligence

10 Most Common Yet Confusing Machine Learning Model Names | by Angela Shi | Mar, 2023

by admin
March 26, 2023
Machine Learning

How Machine Learning Will Shape The Future of the Hiring Industry | by unnanu | Mar, 2023

by admin
March 26, 2023
Machine Learning

The Pros & Cons of Accounts Payable Outsourcing

by admin
March 26, 2023
Artificial Intelligence

Best practices for viewing and querying Amazon SageMaker service quota usage

by admin
March 26, 2023
Edge AI

March 2023 Edge AI and Vision Innovation Forum Presentation Videos

by admin
March 26, 2023
Artificial Intelligence

Hierarchical text-conditional image generation with CLIP latents

by admin
March 26, 2023

© 2023 Machine Learning News Hubb All rights reserved.

Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Privacy Policy and Terms & Conditions.

Navigate Site

  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

Newsletter Sign Up.

No Result
View All Result
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

© 2023 JNews - Premium WordPress news & magazine theme by Jegtheme.