Segments

Segments let you extract a sequence of statements from a data science program, give that sequence a name, and reuse it. The following sections explain how to declare a segment and how to call it.

Declaring a Segment

Minimal Example

Let's look at a minimal example of a segment:

segment loadMovieRatingsSample() {}

This declaration of a segment has the following syntactic elements:

  • The keyword segment.
  • The name of the segment, here loadMovieRatingsSample. This can be any combination of upper- and lowercase letters, underscores, and digits, as long as it does not start with a digit. However, we suggest using lowerCamelCase for the names of segments.
  • The list of parameters (i.e. inputs) of the segment. This is delimited by parentheses. In the example above, the segment has no parameters.
  • The body of the segment, which contains the statements that should be run when the segment is called. The body is delimited by curly braces. In this example, the body is empty, so running this segment does nothing.

Parameters

To make a segment configurable, add parameters (inputs). We will first show how to declare parameters and afterwards how to refer to them in the body of the segment.

Parameter Declaration

Parameters must be declared in the header of the segment, so that callers know which arguments they are expected to pass and so that we can use the parameters in the body of the segment.

In the following example, we give the segment a single parameter with the name nInstances and the type Int.

segment loadMovieRatingsSample(nInstances: Int) {}

More information about parameters can be found in the linked document.

References to Parameters

Within the segment we can access the value of a parameter using a reference. Here is a basic example where we print the value of the nInstances parameter to the console:

segment loadMovieRatingsSample(nInstances: Int) {
    print(nInstances);
}

More information about references can be found in the linked document.

Statements

To describe what should be done when the segment is executed, we need to add statements to its body. The previous example in the section "References to Parameters" already contained a statement, an expression statement to be precise. Here is another example, this time showing an assignment:

segment loadMovieRatingsSample(nInstances: Int) {
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = 1000);
}

More information about statements can be found in the linked document. Note in particular that every statement must end with a semicolon.
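
The assignment example above passes the fixed value 1000 to sample, so the nInstances parameter of the segment has no effect on the result. As a minimal sketch, assuming sample accepts any expression of type Int for its nInstances parameter, the value supplied by the caller could be forwarded with a reference instead:

segment loadMovieRatingsSample(nInstances: Int) {
    // Forward the caller's value instead of hard-coding 1000.
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = nInstances);
}

In the call to sample, the name to the left of = refers to the parameter of sample, while the name to the right is a reference to the parameter of the segment.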

Results

Results (outputs) are used to return values that are produced inside the segment back to the caller. First, we show how to declare the available results of the segment and then how to assign a value to them.

Result Declaration

As with parameters, we first need to declare the available results in the header. This tells callers that they can use these results and reminds us to assign a value to them in the body of the segment. Let's look at an example:

segment loadMovieRatingsSample(nInstances: Int) -> (features: Dataset, target: Dataset) {
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = 1000);
}

We added two results to the segment: The first one is called features and has type Dataset, while the second one is called target and also has type Dataset.

More information about the declaration of results can be found in the linked document.

Assigning to Results

Currently, the program will not compile since we never assigned a value to these results. This can be done with an assignment and the yield keyword:

segment loadMovieRatingsSample(nInstances: Int) -> (features: Dataset, target: Dataset) {
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = 1000);
    yield features = movieRatingsSample.keepAttributes(
        "leadingActor",
        "genre",
        "length"
    );
    yield target = movieRatingsSample.keepAttributes(
        "rating"
    );
}

In the assignment beginning with yield features = we specify the value of the result called features, while the next assignment beginning with yield target = assigns a value to the target result.

The order of the result declarations does not need to match the order of assignment. However, each result must be assigned exactly once. Note that unlike the return in other programming languages, yield does not stop the execution of the segment, which allows assignments to different results to be split across multiple statements.
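
To illustrate these points, here is a sketch of the same segment with the two yield assignments in the opposite order and one further statement between them. It only rearranges statements already shown in this document:

segment loadMovieRatingsSample(nInstances: Int) -> (features: Dataset, target: Dataset) {
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = 1000);
    // target is assigned first, although it is declared second.
    yield target = movieRatingsSample.keepAttributes(
        "rating"
    );
    // Execution continues after a yield, so this statement still runs.
    print(nInstances);
    yield features = movieRatingsSample.keepAttributes(
        "leadingActor",
        "genre",
        "length"
    );
}

Each result is still assigned exactly once, so this variant should satisfy the same rule as the original ordering.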

Visibility

By default, a segment can be imported into any other file and reused there. We say it has public visibility. However, it is possible to restrict the visibility of a segment with modifiers:

internal segment internalSegment() {}

private segment privateSegment() {}

The segment internalSegment is only visible in files that belong to the same package. The segment privateSegment is only visible in the file it is declared in.
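
One possible use of restricted visibility is to hide a helper segment behind a public one. The following sketch assumes that both segments are declared in the same file and that loadDataset returns a Dataset. The names loadRawMovieRatings, ratings, and ratingsSample are hypothetical and chosen only for illustration:

// Only visible in this file, so other files cannot call it.
private segment loadRawMovieRatings() -> (ratings: Dataset) {
    yield ratings = loadDataset("movieRatings");
}

// Publicly visible by default. It may call the private helper
// because both segments are declared in the same file.
segment loadMovieRatingsSample(nInstances: Int) -> (ratingsSample: Dataset) {
    val ratings = loadRawMovieRatings();
    yield ratingsSample = ratings.sample(nInstances = nInstances);
}

Other files could then use loadMovieRatingsSample without depending on how the raw data is loaded.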

Calling a Segment

Inside a pipeline, another segment, or a lambda, we can call a segment, which means the segment is executed when the call is reached. The results of the segment can then be used as needed. In the following example, where we call the segment loadMovieRatingsSample that we defined above, we assign the results to placeholders:

val features, val target = loadMovieRatingsSample(nInstances = 1000);

More information about calls can be found in the linked document.
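
For context, the call shown above could be placed in a pipeline as follows. This is only a sketch: it assumes that a pipeline is declared with the keyword pipeline, followed by a name and a body in curly braces, and the pipeline name analyzeMovieRatings is hypothetical:

pipeline analyzeMovieRatings {
    // The results are assigned to the placeholders in declaration order:
    // features first, then target.
    val features, val target = loadMovieRatingsSample(nInstances = 1000);
}

The placeholders features and target can then be used in later statements of the pipeline.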