Segments¶
Segments let us extract a sequence of statements from a data science program, give that sequence a name, and make it reusable. In the following discussion we explain how to declare a segment and how to call it.
Declaring a Segment¶
Minimal Example¶
Let's look at a minimal example of a segment:
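A minimal sketch of such a declaration, reconstructed from the elements listed below (the keyword, the name, an empty parameter list, and an empty body), might look like this:

```sds
// A segment with no parameters and an empty body; calling it does nothing.
segment loadMovieRatingsSample() {}
```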
This declaration of a segment has the following syntactic elements:
- The keyword `segment`.
- The name of the segment, here `loadMovieRatingsSample`. This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using `lowerCamelCase` for the names of segments.
- The list of parameters (i.e. inputs) of the segment. This is delimited by parentheses. In the example above, the segment has no parameters.
- The body of the segment, which contains the statements that should be run when the segment is called. The body is delimited by curly braces. In this example, the body is empty, so running this segment does nothing.
Parameters¶
Parameters define the expected inputs of some declaration that can be called. We refer to such declarations as callables. We distinguish between
- required parameters, which must always be passed, and
- optional parameters, which use a default value if no value is passed explicitly.
Required Parameters¶
Required parameters must always be passed when the declaration is called. Let us look at an example:
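Based on the pieces of syntax listed below (name, colon, type), a required parameter could be sketched as:

```sds
// A required parameter: a name, a colon, and a type, with no default value.
requiredParameter: Int
```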
Here are the pieces of syntax:
- The name of the parameter (here `requiredParameter`). This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using `lowerCamelCase` for the names of parameters.
- A colon.
- The type of the parameter (here `Int`).
Optional Parameters¶
Optional parameters have a default value and, thus, need not be passed as an argument unless the default value does not fit. Here is an example:
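Following the syntactic elements listed below (name, colon, type, equals sign, default value), an optional parameter could be sketched as:

```sds
// An optional parameter: like a required one, followed by "=" and a constant default value.
optionalParameter: Int = 1
```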
These are the syntactic elements:
- The name of the parameter (here `optionalParameter`). This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using `lowerCamelCase` for the names of parameters.
- A colon.
- The type of the parameter (here `Int`).
- An equals sign.
- The default value of the parameter (here `1`). This must be a constant expression, i.e. something that can be evaluated by the compiler. In particular, calls usually do not fulfill this requirement.
Complete Example¶
Let us now look at a full example of a segment called `doSomething` with one required parameter and one optional parameter:
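A sketch of such a declaration, reusing the two parameters from the previous sections (the concrete types and the default value are assumptions carried over from those examples):

```sds
// One required parameter followed by one optional parameter.
segment doSomething(requiredParameter: Int, optionalParameter: Int = 1) {}
```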
The interesting part is the list of parameters, which uses the following syntactic elements:
- An opening parenthesis.
- A list of parameters, using the syntax described above. They are separated by commas. A trailing comma is permitted.
- A closing parenthesis.
Restrictions¶
Several restrictions apply to the order of parameters and to combinations of the various categories of parameters:
- After an optional parameter all parameters must be optional.
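To illustrate this restriction, here is a small sketch with hypothetical segment and parameter names. The first declaration respects the ordering rule; the second, left commented out, would violate it:

```sds
// Valid: the optional parameter comes after the required one.
segment validOrder(first: Int, second: Int = 1) {}

// Invalid: a required parameter follows an optional one.
// segment invalidOrder(first: Int = 1, second: Int) {}
```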
Parameters Old¶
To make a segment configurable, add parameters (inputs). We will first show how to declare parameters and afterwards how to refer to them in the body of the segment.
Parameter Declaration¶
Parameters must be declared in the header of the segment so callers know they are expected to pass them as an argument, and so we can use them in the body of the segment.
In the following example, we give the segment a single parameter with name `nInstances` and type `Int`.
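A sketch of this declaration, with the body left empty for now:

```sds
// The header declares a single parameter nInstances of type Int.
segment loadMovieRatingsSample(nInstances: Int) {}
```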
More information about parameters can be found in the linked document.
References to Parameters¶
Within the segment we can access the value of a parameter using a reference. Here is a basic example where we print the value of the `nInstances` parameter to the console:
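A sketch of what this could look like. The function name `print` is an assumption, since the text does not name the function that writes to the console:

```sds
segment loadMovieRatingsSample(nInstances: Int) {
    // nInstances is a reference to the parameter declared in the header.
    // print is assumed to be a function that writes its argument to the console.
    print(nInstances);
}
```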
More information about references can be found in the linked document.
Results¶
Results define the outputs of some declaration when it is called. Here is an example:
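Going by the breakdown below (name, colon, type), a result declaration could be sketched as:

```sds
// A result: a name, a colon, and a type.
result: Int
```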
Here is a breakdown of the syntax:
- The name of the result (here `result`). This can be any combination of upper- and lowercase letters, underscores, and numbers, as long as it does not start with a number. However, we suggest using `lowerCamelCase` for the names of results.
- A colon.
- The type of the result (here `Int`).
Complete Example¶
Let us now look at a full example of a segment called `doSomething` with two results:
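A sketch of such a segment. The result names `result1` and `result2`, their types, and the yielded values are hypothetical and chosen only for illustration:

```sds
// The result list follows the arrow and is delimited by parentheses.
segment doSomething() -> (result1: Int, result2: Int) {
    // Each declared result is assigned exactly once.
    yield result1 = 1;
    yield result2 = 2;
}
```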
The interesting part is the list of results, which uses the following syntactic elements:
- An arrow `->`.
- An opening parenthesis.
- A list of results, using the syntax described above. They are separated by commas. A trailing comma is permitted.
- A closing parenthesis.
Shorthand Version: Single Result¶
If the callable produces only a single result, we can omit the parentheses. The following two declarations are therefore equivalent:
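As a sketch with hypothetical segment names, the two forms could be written as:

```sds
// Full form: the single result is wrapped in parentheses.
segment doSomething1() -> (result: Int) {
    yield result = 1;
}

// Shorthand: the parentheses around the single result are omitted.
segment doSomething2() -> result: Int {
    yield result = 1;
}
```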
Shorthand Version: No Results¶
If the callable produces no results, we can usually omit the entire result list. The following two declarations are therefore equivalent:
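As a sketch with hypothetical segment names, the two forms could be written as:

```sds
// Full form: an explicit, empty result list.
segment doSomething1() -> () {}

// Shorthand: the result list is omitted entirely.
segment doSomething2() {}
```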
The notable exception is callable types, where the result list must always be specified, even when it is empty.
Results (outputs) are used to return values that are produced inside the segment back to the caller. First, we show how to declare the available results of the segment and then how to assign a value to them.
Result Declaration¶
As with parameters, we first need to declare the available results in the header. This tells callers that they can use these results and reminds us to assign a value to them in the body of the segment. Let's look at an example:
segment loadMovieRatingsSample(nInstances: Int) -> (features: Dataset, target: Dataset) {
    // Use the parameter as the sample size instead of a hard-coded value.
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = nInstances);
}
We added two results to the segment: The first one is called `features` and has type `Dataset`, while the second one is called `target` and also has type `Dataset`.
More information about the declaration of results can be found in the linked document.
Assigning to Results¶
Currently, the program will not compile since we never assigned a value to these results. This can be done with an assignment and the `yield` keyword:
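segment loadMovieRatingsSample(nInstances: Int) -> (features: Dataset, target: Dataset) {
    // Use the parameter as the sample size instead of a hard-coded value.
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = nInstances);

    // Assign the input attributes to the result "features".
    yield features = movieRatingsSample.keepAttributes(
        "leadingActor",
        "genre",
        "length"
    );

    // Assign the attribute to predict to the result "target".
    yield target = movieRatingsSample.keepAttributes(
        "rating"
    );
}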
In the assignment beginning with `yield features =`, we specify the value of the result called `features`, while the next assignment, beginning with `yield target =`, assigns a value to the `target` result.
The order of the result declarations does not need to match the order of assignment. However, each result must be assigned exactly once. Note that unlike `return` in other programming languages, `yield` does not stop the execution of the segment, which allows assignments to different results to be split across multiple statements.
Yielding Results¶
In addition to the declaration of placeholders, assignments are used to assign a value to a result of a segment or declare results of a block lambda.
Yielding Results of Segments¶
The following snippet shows how we can assign a value to a declared result of a segment:
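A sketch of such an assignment. Only the result name `greeting` comes from the description below; the segment name `createGreeting`, the type `String`, and the assigned value are hypothetical:

```sds
segment createGreeting() -> greeting: String {
    // Assign a value to the declared result "greeting".
    yield greeting = "Hello";
}
```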
The assignment here has the following syntactic elements:
- The keyword `yield`, which indicates that we want to assign to a result.
- The name of the result, here `greeting`. This must be identical to the name of one of the results declared in the header of the segment.
- An `=` sign.
- The expression to evaluate (right-hand side).
- A semicolon at the end.
Statements¶
In order to describe what should be done when the segment is executed, we need to add statements to its body. The previous example in the section "References to Parameters" already contained a statement - an expression statement to be precise. Here is another example, this time showing an assignment:
segment loadMovieRatingsSample(nInstances: Int) {
    // Use the parameter as the sample size instead of a hard-coded value.
    val movieRatingsSample = loadDataset("movieRatings").sample(nInstances = nInstances);
}
More information about statements can be found in the linked document. Note in particular that all statements must end with a semicolon.
Visibility¶
By default, a segment can be imported in any other file and reused there. We say it has public visibility. However, it is possible to restrict the visibility of a segment with modifiers:
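A sketch of two such declarations. The modifier keywords `internal` and `private` are an assumption inferred from the segment names used in the explanation that follows:

```sds
// Assumed modifier: restricts visibility to files with the same package.
internal segment internalSegment() {}

// Assumed modifier: restricts visibility to the declaring file.
private segment privateSegment() {}
```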
The segment `internalSegment` is only visible in files with the same package. The segment `privateSegment` is only visible in the file it is declared in.
Calling a Segment¶
Inside a pipeline, another segment, or a lambda, we can then call a segment, which means the segment is executed when the call is reached. The results of a segment can then be used as needed. In the following example, where we call the segment `loadMovieRatingsSample` that we defined above, we assign the results to placeholders:
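A sketch of such a call. The pipeline name `example` and the argument value `1000` are hypothetical; the placeholders are named after the results they receive:

```sds
pipeline example {
    // Call the segment and bind its two results, in declaration order, to placeholders.
    val features, val target = loadMovieRatingsSample(nInstances = 1000);
}
```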
More information about calls can be found in the linked document.