We introduce a new dataset for Cross-Lingual Vision-Language Navigation.
The XL-R2R dataset is built upon the R2R dataset and extends it with Chinese instructions.
XL-R2R preserves the splits of R2R: the train, val-seen, and val-unseen splits contain both English and Chinese instructions, while the test split contains English instructions only.
Data is formatted as follows:

{
  "distance": float,
  "scan": str,
  "path_id": int,
  "path": [str x num_steps],
  "heading": float,
  "instructions": [str x 3],
}
distance: Length of the path in meters.
scan: Matterport scan id.
path_id: Unique id for this path.
path: List of viewpoint ids (the first is the start location, the last is the goal location).
heading: The agent's initial heading in radians (elevation is always assumed to be zero).
instructions: Three unique natural language strings describing how to find the goal given the start pose.
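As a sketch, a single episode in this format can be parsed with the standard json module. The field values below are illustrative placeholders, not taken from the dataset:

```python
import json

# A minimal sample episode in the XL-R2R format (values are illustrative).
sample = json.loads("""
{
  "distance": 10.5,
  "scan": "example_scan_id",
  "path_id": 42,
  "path": ["vp_start", "vp_mid", "vp_goal"],
  "heading": 3.14,
  "instructions": ["Walk ...", "Go ...", "Head ..."]
}
""")

start_viewpoint = sample["path"][0]             # first viewpoint id: the start location
goal_viewpoint = sample["path"][-1]             # last viewpoint id: the goal location
num_instructions = len(sample["instructions"])  # three instructions per path
```

Each split file is a JSON list of such episodes, so loading a split is a single json.load over the file followed by iteration.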
For the test split, path contains only the first viewpoint (the starting location); a test server hosted by Anderson et al. scores uploaded trajectories.