Performance objective

Modified 2018-10-30 by julian121266

Lane following (LF / LFV)

Modified 2018-10-31 by julian121266

As a performance indicator for both the “lane following task” and the “lane following task with other dynamic vehicles”, we choose the speed $v(t)$ of the Duckiebot along the road (not perpendicular to it), integrated over time. This measures the distance traveled along the road per episode, where the duration of an episode is fixed. This encourages both faster driving and algorithms with lower latency. An episode means one run of the code from a particular initial configuration.

$$ \objective_{P-LF(V)}(t) = \int_{0}^{t} - v(\tau) \, d\tau $$

The integral is taken over the elapsed time of an episode, up to time $t=T_{eps}$, where $T_{eps}$ is the length of an episode.

We measure this in units of “tiles traveled”:

$$ \objective_{P-LF(V)}(t) = \text{# of tiles traveled} $$
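The integral can be approximated numerically from sampled data. The following is a minimal sketch, not the official scoring code: it assumes that the forward speed and the angle between the heading and the lane direction are sampled at a fixed interval `dt`, and that the tile edge length `TILE_SIZE_M` is known. All names and the value of `TILE_SIZE_M` are hypothetical and chosen for illustration.

```python
import math

# Assumed tile edge length in metres, for illustration only; use your map's value.
TILE_SIZE_M = 0.6


def distance_along_road(speeds, heading_errors, dt):
    """Left Riemann sum of the speed component along the road.

    speeds: forward speed samples in m/s
    heading_errors: angle between heading and lane direction, in radians
    dt: sampling interval in seconds
    """
    return sum(v * math.cos(phi) * dt for v, phi in zip(speeds, heading_errors))


def lf_objective(speeds, heading_errors, dt):
    """Negated distance, following the sign of the integral as written above."""
    return -distance_along_road(speeds, heading_errors, dt)


def tiles_traveled(distance_m, tile_size_m=TILE_SIZE_M):
    """Convert a distance in metres into a (fractional) number of tiles."""
    return distance_m / tile_size_m


# Example: 5 seconds at 0.2 m/s, perfectly aligned with the lane.
speeds = [0.2] * 50
heading_errors = [0.0] * 50
distance = distance_along_road(speeds, heading_errors, dt=0.1)
objective = lf_objective(speeds, heading_errors, dt=0.1)
print(distance, objective, tiles_traveled(distance))
```

Only the component of the speed along the road is counted, so a robot that weaves across the lane accumulates less distance than one driving straight at the same speed.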

Autonomous mobility on demand (AMoD)

Modified 2018-10-30 by julian121266

In an autonomous mobility-on-demand system a coordinated fleet of robotic taxis serves customers in an on-demand fashion. An operational policy for the system must optimize in three conflicting dimensions:

  1. The system must perform at the highest possible service level, i.e., with the smallest possible wait times and journey times.
  2. The system’s operation must be as efficient as possible, i.e., it must reduce its empty mileage to a minimum.
  3. The system’s capital cost must be as low as possible, i.e., the fleet size must be reduced to a minimum.

We consider robotic taxis that can carry one customer. To compare different AMoD system operational policies, we introduce the following variables:

\begin{align*} &d_E &= &\text{ empty distance driven by the fleet} \\ &d_C &= &\text{ occupied distance driven by the fleet} \\ &d_T = d_C + d_E &= &\text{ total distance driven by the fleet} \\ &N &= &\text{ fleet size} \\ &R &= &\text{ number of customer requests served} \\ &w_i &= &\text{ waiting time of request } i\in \{1,...,R\} \\ &W &= &\text{ total waiting time } W = \sum_{i=1}^{R} w_i \end{align*}
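As a rough illustration of how these quantities relate, the sketch below stores the directly measured values and derives $d_T$, $R$ and $W$ from them. The `FleetLog` class and its field names are hypothetical and are not part of the provided simulation environment; the numbers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FleetLog:
    """Hypothetical per-episode summary of an AMoD fleet."""

    empty_distance: float       # d_E
    occupied_distance: float    # d_C
    fleet_size: int             # N
    waiting_times: List[float]  # w_i, one entry per served request

    @property
    def total_distance(self) -> float:
        """d_T = d_C + d_E"""
        return self.occupied_distance + self.empty_distance

    @property
    def requests_served(self) -> int:
        """R, the number of served requests"""
        return len(self.waiting_times)

    @property
    def total_waiting_time(self) -> float:
        """W, the sum of w_i over all served requests"""
        return sum(self.waiting_times)


log = FleetLog(
    empty_distance=12.0,
    occupied_distance=30.0,
    fleet_size=5,
    waiting_times=[60.0, 120.0, 30.0],
)
print(log.total_distance, log.requests_served, log.total_waiting_time)
```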

The provided simulation environment follows the standard reinforcement learning framework: rewards are issued after each simulation step. The (undiscounted) sum of all rewards is the final score. The higher the score, the better the performance.

For the AMoD task, there are three different championships (sub-tasks), which constitute separate competitions. The simulation environment computes the reward value for each category and concatenates them into a vector of length 3, which is then communicated as feedback to the learning agent. The agent can ignore all entries of the reward vector except the one for the category it wishes to maximize.
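The scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the reward values are made up, and `episode_scores` and the index `chosen` are hypothetical names, not part of the simulation environment's interface.

```python
from typing import Iterable, List, Sequence


def episode_scores(step_rewards: Iterable[Sequence[float]]) -> List[float]:
    """Undiscounted sum of the per-step reward vectors, one total per category."""
    totals = [0.0, 0.0, 0.0]
    for reward in step_rewards:
        if len(reward) != 3:
            raise ValueError("expected a reward vector of length 3")
        for k, r in enumerate(reward):
            totals[k] += r
    return totals


# Made-up reward vectors for three simulation steps.
rewards = [(-1.0, -0.5, 0.0), (-2.0, -0.5, 0.0), (-0.5, -1.0, -3.0)]
scores = episode_scores(rewards)

# An agent competing in a single category reads only that entry.
chosen = 1
my_score = scores[chosen]
print(scores, my_score)
```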

The three championships are as follows:
