Segments

Segments are used to extract a sequence of statements from a data science program to give the sequence a name and make it reusable. In the following discussion we explain how to declare a segment and how to call it.

Declaring a Segment

Minimal Example

Let's look at a minimal example of a segment:

segment loadMovieRatingsSample() {}

This declaration of a segment has the following syntactic elements:

  • The keyword segment.
  • The name of the segment, here loadMovieRatingsSample. This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using lowerCamelCase for the names of segments.
  • The list of parameters (i.e. inputs) of the segment. This is delimited by parentheses. In the example above, the segment has no parameters.
  • The body of the segment, which contains the statements that should be run when the segment is called. The body is delimited by curly braces. In this example, the body is empty, so running this segment does nothing.
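
The naming rule above can be checked mechanically. The following Python sketch only illustrates the rule as stated here (letters, underscores, and digits, not starting with a digit) together with the lowerCamelCase suggestion. The function names are hypothetical, and the actual grammar may impose further constraints, such as reserved keywords, that this sketch does not model.

```python
import re

# Letters, digits, and underscores; must not start with a digit.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule described above."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_lower_camel_case(name: str) -> bool:
    """Return True if `name` follows the suggested lowerCamelCase style."""
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", name) is not None


print(is_valid_name("loadMovieRatingsSample"))    # True
print(is_valid_name("2fast"))                     # False: starts with a digit
print(is_lower_camel_case("Load_Movie_Ratings"))  # False: valid, but not the suggested style
```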

Parameters

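Parameters define the expected inputs of some declaration that can be called. We refer to such declarations as callables. We distinguish between required parameters, which must always be passed, and optional parameters, which have a default value.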

Required Parameters

Required parameters must always be passed when the declaration is called. Let us look at an example:

requiredParameter: Int

Here are the pieces of syntax:

  • The name of the parameter (here requiredParameter). This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using lowerCamelCase for the names of parameters.
  • A colon.
  • The type of the parameter (here Int).

Optional Parameters

Optional parameters have a default value and, thus, need not be passed as an argument unless the default value does not fit. Here is an example:

optionalParameter: Int = 1

These are the syntactic elements:

  • The name of the parameter (here optionalParameter). This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using lowerCamelCase for the names of parameters.
  • A colon.
  • The type of the parameter (here Int).
  • An equals sign.
  • The default value of the parameter (here 1). This must be a constant expression, i.e. something that can be evaluated by the compiler. In particular, calls usually do not fulfill this requirement.

Complete Example

Let us now look at a full example of a segment called doSomething with one required parameter and one optional parameter:

segment doSomething(requiredParameter: Int, optionalParameter: Boolean = false) {
    // ...
}

The interesting part is the list of parameters, which uses the following syntactic elements:

  • An opening parenthesis.
  • A list of parameters, using the syntax described above. Parameters are separated by commas, and a trailing comma is permitted.
  • A closing parenthesis.
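
For readers coming from Python, required and optional parameters behave much like function parameters without and with a default value. The sketch below is a rough analogy only, not code in the language described here. The snake_case names and the stand-in body are hypothetical.

```python
def do_something(required_parameter: int, optional_parameter: bool = False) -> str:
    """Stand-in body that reports which values were received."""
    return f"required={required_parameter}, optional={optional_parameter}"


# The optional parameter falls back to its default value.
print(do_something(1))  # required=1, optional=False

# The default can be overridden when it does not fit.
print(do_something(1, optional_parameter=True))  # required=1, optional=True

# Omitting the required parameter is an error.
try:
    do_something()
except TypeError as error:
    print("missing argument:", type(error).__name__)
```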

Restrictions

Several restrictions apply to the order of parameters and to combinations of the various categories of parameters:

Parameters Old

To make a segment configurable, add parameters (inputs). We will first show how to declare parameters and afterwards how to refer to them in the body of the segment.

Parameter Declaration

Parameters must be declared in the header of the segment so callers know they are expected to pass them as an argument, and so we can use them in the body of the segment.

In the following example, we give the segment a single parameter with the name nInstances and the type Int.

segment loadMovieRatingsSample(nInstances: Int) {}

More information about parameters can be found in the linked document.

References to Parameters

Within the segment we can access the value of a parameter using a reference. Here is a basic example where we print the value of the nInstances parameter to the console:

segment loadMovieRatingsSample(nInstances: Int) {
    print(nInstances);
}

More information about references can be found in the linked document.

Results

Results define the outputs of some declaration when it is called. Here is an example:

result: Int

Here is a breakdown of the syntax:

  • The name of the result (here result). This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using lowerCamelCase for the names of results.
  • A colon.
  • The type of the result (here Int).

Complete Example

Let us now look at a full example of a segment called doSomething with two results:

segment doSomething() -> (result1: Int, result2: Boolean) {
    // ...
}

The interesting part is the list of results, which uses the following syntactic elements:

  • An arrow ->.
  • An opening parenthesis.
  • A list of results, using the syntax described above. Results are separated by commas, and a trailing comma is permitted.
  • A closing parenthesis.

Shorthand Version: Single Result

If the callable produces only a single result, we can omit the parentheses. The following two declarations are, hence, equivalent:

segment doSomething1() -> (result: Int) {}
segment doSomething2() -> result: Int {}

Shorthand Version: No Results

If the callable produces no results, we can usually omit the entire results list. The following two declarations are, hence, equivalent:

segment doSomething1() -> () {}
segment doSomething2() {}

The notable exception is callable types, where the result list must always be specified, even when it is empty.

Results (outputs) are used to return values that are produced inside the segment back to the caller. First, we show how to declare the available results of the segment and then how to assign a value to them.

Result Declaration

As with parameters, we first need to declare the available results in the header. This tells callers that they can use these results and reminds us to assign a value to them in the body of the segment. Let's look at an example:

segment loadMovieRatingsSample(nInstances: Int) -> (features: Dataset, target: Dataset) {
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = 1000);
}

We added two results to the segment: The first one is called features and has type Dataset, while the second one is called target and also has type Dataset.

More information about the declaration of results can be found in the linked document.

Assigning to Results

Currently, the program will not compile since we never assigned a value to these results. This can be done with an assignment and the yield keyword:

segment loadMovieRatingsSample(nInstances: Int) -> (features: Dataset, target: Dataset) {
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = 1000);
    yield features = movieRatingsSample.keepAttributes(
        "leadingActor",
        "genre",
        "length"
    );
    yield target = movieRatingsSample.keepAttributes(
        "rating"
    );
}

In the assignment beginning with yield features = we specify the value of the result called features, while the next assignment beginning with yield target = assigns a value to the target result.

The order of the result declarations does not need to match the order of assignment. However, each result must be assigned exactly once. Note that unlike the return in other programming languages, yield does not stop the execution of the segment, which allows assignments to different results to be split across multiple statements.
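
The rules above (assignments in any order, each result assigned exactly once, and no early exit on yield) can be modeled in Python. This is a hedged sketch, not how the language is implemented: ResultCollector and its methods are hypothetical names, the checks run at runtime here rather than at compile time, and returning the results in declaration order is an assumption made for this illustration.

```python
class ResultCollector:
    """Collects named results and enforces the 'exactly once' rule."""

    def __init__(self, *declared: str) -> None:
        self._declared = declared
        self._values = {}

    def yield_(self, name: str, value: object) -> None:
        if name not in self._declared:
            raise KeyError(f"undeclared result: {name}")
        if name in self._values:
            raise ValueError(f"result assigned twice: {name}")
        # Unlike `return`, storing a result does not end the function.
        self._values[name] = value

    def finish(self) -> tuple:
        missing = [n for n in self._declared if n not in self._values]
        if missing:
            raise ValueError(f"unassigned results: {missing}")
        # Assumption: results are handed back in declaration order.
        return tuple(self._values[n] for n in self._declared)


def load_sample() -> tuple:
    results = ResultCollector("features", "target")
    results.yield_("target", ["rating"])             # assigned first
    results.yield_("features", ["genre", "length"])  # assigned second
    return results.finish()


features, target = load_sample()
print(features)  # ['genre', 'length']
print(target)    # ['rating']
```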

Yielding Results

In addition to the declaration of placeholders, assignments are used to assign a value to a result of a segment or declare results of a block lambda.

Yielding Results of Segments

The following snippet shows how we can assign a value to a declared result of a segment:

segment trulyRandomInt() -> result: Int {
    yield result = 1;
}

The assignment here has the following syntactic elements:

  • The keyword yield, which indicates that we want to assign to a result.
  • The name of the result, here result. This must be identical to one of the names of a declared result in the header of the segment.
  • An = sign.
  • The expression to evaluate (right-hand side).
  • A semicolon at the end.

Statements

In order to describe what should be done when the segment is executed, we need to add statements to its body. The previous example in the section "References to Parameters" already contained a statement - an expression statement to be precise. Here is another example, this time showing an assignment:

segment loadMovieRatingsSample(nInstances: Int) {
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = 1000);
}

More information about statements can be found in the linked document. Note in particular that all statements must end with a semicolon.

Visibility

By default, a segment can be imported in any other file and reused there. We say it has public visibility. However, it is possible to restrict the visibility of a segment with modifiers:

internal segment internalSegment() {}

private segment privateSegment() {}

The segment internalSegment is only visible in files with the same package. The segment privateSegment is only visible in the file it is declared in.

Calling a Segment

Inside a pipeline, another segment, or a lambda, we can then call a segment, which means the segment is executed when the call is reached. The results of the segment can then be used as needed. In the following example, where we call the segment loadMovieRatingsSample that we defined above, we assign the results to placeholders:

val features, val target = loadMovieRatingsSample(nInstances = 1000);

More information about calls can be found in the linked document.