Code Walk-through
The goal of this page is to briefly introduce you to our raynet library. It is mainly aimed at those of you who want a better understanding of the main modules implemented in our codebase, in order to help you familiarize yourself with it.
Using different datasets as input
We want to be able to deal with multiple datasets; however, not all of them can be parsed using the same format. To this end, we created the Dataset class, which handles various datasets in a generic manner. In principle, a dataset is defined as a collection of scenes, but different datasets follow different folder conventions. So far, we have tested our implementation on two challenging datasets, the Aerial Dataset and the DTU Dataset.
For example, the Aerial Dataset is represented by a directory that contains
one directory for every scene. Every scene is represented by a directory with
two inner directories containing the views and the camera poses. On the
contrary, the DTU Dataset is represented by a directory that contains three
subdirectories, one for the camera poses, one for the raw images of every scene
and one for the ground-truth data of every scene. To this end, we created two
wrapper classes around the Dataset class to handle the corresponding dataset
types: RestrepoDataset and DTUDataset. In order to create an instance of a
Dataset class, it suffices to specify two arguments:
- dataset_directory: The path to the folder containing the dataset.
- select_neighbors_based_on: This argument is used to control how neighbouring views are selected. We provide two options, based either on their geometrical distance or on their order in the file system; they can be selected with distance and filesystem respectively.
For instance, one can create a RestrepoDataset object just by writing:
In [1]: from raynet.common.dataset import RestrepoDataset
In [2]: dataset = RestrepoDataset(
"/path/to/folder/containing/the/probabilistic_reconstruction_data_downsampled/",
"distance"
)
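Similarly, a DTUDataset can be constructed with the same two arguments. Note that the import path and the placeholder path below are assumptions that simply mirror the example above:
In [3]: from raynet.common.dataset import DTUDataset
In [4]: dtu_dataset = DTUDataset(
    "/path/to/folder/containing/the/dtu/dataset/",
    "filesystem"
)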
Every dataset is defined as a set of scenes, and every scene can be identified by a unique scene index. Therefore, the API of the Dataset class is
get_scene(scene_idx)
which creates a Scene object. For the DTU Dataset we use the provided scene indices, while for the Aerial Dataset we map the scenes to indices based on their alphabetical order.
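Continuing the IPython session above, a Scene can then be retrieved from the RestrepoDataset as follows (the scene index 0 is arbitrary):
In [5]: scene = dataset.get_scene(0)  # returns a Scene object for the scene with index 0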
A Scene is defined as a collection of raw images, camera poses and ground-truth data, as well as a bounding box that specifies the borders of the scene. However, similar to the datasets, not all scenes are represented using the same format. Therefore, we implement two wrappers, one for scenes following the format of the Aerial Dataset and one for scenes following the format of the DTU Dataset. Their API is the following:
- get_image(i): Returns the i-th image of the current scene.
- get_images(): Returns a list of Image objects, one for every image of the current scene.
- get_random_image(): Returns an Image object for a random image of the current scene.
- get_image_with_neighbors(i, neighbors): Returns a list of Image objects, where the first is the i-th image and the rest are the neighbouring views to this image. The neighbouring views are selected based on the select_neighbors_based_on argument analysed before.
- get_depth_for_pixel(i, y, x): Returns the ground-truth depth value of pixel (y, x) of the i-th image.
- get_depth_map(i): Returns the ground-truth depth map of the i-th image.
- get_depthmaps(): Returns a list of numpy arrays containing the corresponding depth map for every image in the current scene.
- get_pointcloud(): Returns a Pointcloud object containing the ground-truth point cloud of the current scene.
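For illustration, a short usage sketch continuing the session above; the scene index, the number of neighbours and the view indices are arbitrary, and the second argument of get_image_with_neighbors is assumed here to be the number of neighbouring views:
In [6]: images = scene.get_images()                   # one Image object per view
In [7]: views = scene.get_image_with_neighbors(0, 4)  # view 0 plus its neighbouring views
In [8]: depth_map = scene.get_depth_map(0)            # ground-truth depth map of view 0
In [9]: pointcloud = scene.get_pointcloud()           # ground-truth Pointcloud of the scene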
In case you want to use a dataset that follows a different format, you need to implement your own wrappers around the Dataset and Scene classes based on your requirements; a rough skeleton is sketched below.
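The following is only an illustrative sketch, not library code: the import path for Dataset is an assumption that mirrors the RestrepoDataset import above, and MyDataset/MyScene are hypothetical names whose internals you would have to fill in for your own folder layout.
import os

from raynet.common.dataset import Dataset  # assumed import path


class MyScene(object):
    """Hypothetical Scene-like wrapper exposing the API listed above
    (it could also subclass the library's Scene class)."""
    def __init__(self, scene_directory):
        self.scene_directory = scene_directory

    def get_image(self, i):
        raise NotImplementedError("Load the i-th view from scene_directory")

    def get_depth_map(self, i):
        raise NotImplementedError("Load the ground-truth depth map of view i")

    def get_pointcloud(self):
        raise NotImplementedError("Load the ground-truth point cloud")


class MyDataset(Dataset):
    """Hypothetical Dataset wrapper mapping scene indices to scene folders."""
    def get_scene(self, scene_idx):
        # Map the scene index to the corresponding folder of your dataset and
        # wrap it in a Scene-like object; the path below is a placeholder.
        scene_directory = os.path.join("/path/to/my/dataset", str(scene_idx))
        return MyScene(scene_directory)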
Training different networks
Now that we have analysed how one can use different datasets as inputs, it is also worth mentioning how one can train different networks to perform the 3D reconstruction task. We provide various built-in architectures that can be used to extract similarity features between patches from different views. Part of the code that defines these architectures is shown below. The full code can be found here.
- simple_cnn: Each layer comprises a convolution, spatial batch normalization and a ReLU non-linearity. We repeat this scheme 5 times, but remove the ReLU from the last layer in order to retain information encoded in both the negative and the positive range. With 5 convolutional layers of kernel size 3, the receptive field of this architecture is 11x11.
# Keras imports (assumed here for completeness; see the full code for the
# exact imports used in the library).
from keras.layers import Activation, BatchNormalization, Conv2D
from keras.models import Sequential

# Five 3x3 convolutions with 32 filters each, every one followed by spatial
# batch normalization; the ReLU after the last convolution is omitted.
# input_shape is provided by the surrounding factory function.
common_params = dict(
    filters=32,
    kernel_size=3
)
Sequential([
    Conv2D(input_shape=input_shape, **common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    BatchNormalization()
])
- simple_cnn_ln: This architecture is the same as the above, with the only difference that we have replaced the spatial batch normalization with layer normalization. The receptive field of this architecture is again 11x11.
common_params = dict(
    filters=32,
    kernel_size=3,
)
Sequential([
    Conv2D(input_shape=input_shape, **common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization(),
    Activation("relu"),
    Conv2D(**common_params),
    LayerNormalization()
])
- dilated_cnn_receptive_field_25: For this architecture we also utilize dilated convolutional layers in order to increase the receptive field without increasing the number of parameters. Again we employ ReLU non-linearities and remove the one after the last layer. The receptive field of this architecture is 25x25; the arithmetic behind this number is sketched after the listing below.
# input_shape and kernel_regularizer are provided by the surrounding factory
# function; the third convolution uses dilation_rate=2 to enlarge the
# receptive field without adding parameters.
Sequential([
    Conv2D(
        filters=32,
        kernel_size=5,
        input_shape=input_shape,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer,
        dilation_rate=2
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("relu"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization()
])
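To see where the 25x25 receptive field comes from, the following small sketch (plain Python, not part of the library) accumulates the contribution of each convolution in the stack above:
# Each stride-1 convolution grows the receptive field by dilation * (kernel_size - 1).
layers = [
    (5, 1),  # (kernel_size, dilation_rate)
    (5, 1),
    (5, 2),
    (3, 1),
    (3, 1),
    (3, 1),
    (3, 1),
]
receptive_field = 1
for kernel_size, dilation in layers:
    receptive_field += dilation * (kernel_size - 1)
print(receptive_field)  # prints 25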
- dilated_cnn_receptive_field_25_with_tanh: This architecture is the same as the above, with the only difference that we have replaced the ReLU non-linearities with tanh non-linearities. Again, the receptive field is 25x25.
Sequential([
    Conv2D(
        filters=32,
        kernel_size=5,
        input_shape=input_shape,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=5,
        kernel_regularizer=kernel_regularizer,
        dilation_rate=2
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization(),
    Activation("tanh"),
    Conv2D(
        filters=32,
        kernel_size=3,
        kernel_regularizer=kernel_regularizer
    ),
    BatchNormalization()
])
Inferring 3D Reconstructions
We provide three factories that can be used to test previously trained models.
The multi_view_cnn factory can be used to test a Multi-View CNN model (namely,
without the MRF) and estimates discretized depth maps at uniformly sampled
depth hypotheses. Similarly, the multi_view_cnn_voxel_space factory is the
same as the multi_view_cnn factory, with the only difference that it predicts
discretized depths on the voxel grid defined by the bounding box of the scene.
Finally, the raynet factory can be used to infer the 3D model of a scene using
our end-to-end trainable model. All factories share the same API:
forward_pass(scene, images_range)
Arguments
- scene: A Scene object that specifies the scene to be processed
- images_range: A tuple that specifies the indices of the views of the scene to be used for the reconstruction
Returns
Given a Scene object and an image range that specifies the views to be used for the reconstruction, a predicted depth map is returned for every view.
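For illustration, a hypothetical call is sketched below; how the factory object itself is constructed (and how the trained weights are loaded) is omitted here, and the images_range value is an assumption:
# `model` stands for one of the factories above (multi_view_cnn,
# multi_view_cnn_voxel_space or raynet) with a previously trained network
# loaded; `dataset` is a Dataset instance as created earlier.
scene = dataset.get_scene(0)
depth_maps = model.forward_pass(scene, (0, 10))  # assumed: reconstruct using the first 10 views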