Overview
Params provide values to the inputs and outputs definition, and can provide context only values as well.
The information provided in the params is injected after the globals context information, which can be accessed without prefix or using the params.*
prefix.
Top level access
The values of the inputs and the outputs are made available in the top level of the context, this means that when users are developing their Polyaxonfiles, they can use the variables defined in their inputs and outputs without any prefix:
version: 1.1
kind: component
inputs:
- name: intput1
type: str
- name: input2
type: str
run:
kind: job
container:
image: "image:test"
command: ["command"]
args: [
"--param1={{input1}}",
"--param2={{input2}}",
]
Getting all params
The params prefix allows to convert all params to a list of arguments:
version: 1.1
kind: component
inputs:
- name: intput1
type: str
- name: input2
type: str
run:
kind: job
container:
image: "image:test"
command: ["command"]
args: "{{ params.as_args }}"
Params prefix
Sometimes users will need to access the meta-information about their inputs and outputs and not just the values, for instance let take the following example, a component that returns a docker image:
version: 1.1
kind: component
outputs:
- name: destination
type: image
run:
kind: job
connections: ["{{ params.destination.connection }}"]
container:
image: "image:test"
command: ["command"]
args: [{{connections[params.destination.connection].url + '/' + destination}}]
And this is an example of an operation to use with this component:
version: 1.1
kind: component
name: build
params:
destination:
connection: docker-connection
value: "image:tag"
...
This component is using the params to pass the destination value as well as a connection that will be used to authenticate the registry, both the parameter’s value and connection are used in this component:
{{ destination }}
is only a shortcut for exposing inputs/outputs params’ values which also exists on the params
prefix, e.g. {{ params.destination.value }}
.
The params.*
prefix exposes several other information if the user needs to access metadata and not just the value,
these are all the information exposed under the params
prefix for each input/output:
{{params.param-name.value}}
: The value that will be checked against the IO type.{{params.param-name.connection}}
: A connection that is passed with the param.{{params.param-name.as_arg}}
: A string representing an argument representation of the param’s value, this only get injected if the value is not null, i.e.--param-name={{ value }}
or--param-name
if the value is of type boolean.{{params.param-name.as_str}}
: A string representing of the param’s value.{{params.param-name.type}}
: The type of the param based on the Input/Output configuration.
Params resolution order
It’s very important to understand how Polyaxon resolves params to build valid Polyaxonfiles.
Polyaxon resolves params in this order:
- top level params section.
- join params section.
- matrix params sections.
Le’s see the impact of this order with some examples.
Using params in joins
Let’s see first how we can use globals in joins:
kind: operation
...
joins:
- query: "metrics.loss:<0.1, project.name:{{ globals.project_name }}"
sort: "metrics.loss"
params:
tensorboards: {value: "artifacts.tensorboard", contextOnly: true}
In this example we are using the project_name
which is resolved from the current run,
to filter for all runs that we will request the tensorboard path from. (artifacts.tensorboard
requests the lineage information if reported.)
If we decide to template the max loss or to limit the number of runs to fetch the tensorboard path from, we can do the following:
kind: operation
...
joins:
- query: "metrics.loss:<{{ max_loss }}, project.name:{{ globals.project_name }}"
sort: "metrics.loss"
limit: "{{ limit }}"
params:
tensorboards: {value: "artifacts.tensorboard", contextOnly: true}
params:
max_loss:
value: 0.1
contextOnly: True
limit:
value: 5
contextOnly: True
This is not different from setting the values directly, but sometimes you may need to use a variable several times without using magic strings.
To take this to the next level, the params can come from an input definition:
kind: operation
...
joins:
- query: "metrics.loss:<{{ max_loss }}, project.name:{{ globals.project_name }}"
sort: "metrics.loss"
limit: "{{ limit }}"
params:
tensorboards: {value: "artifacts.tensorboard", contextOnly: true}
component:
inputs:
- {name: limit, type: int, value: 5, isOptional: true, description: "top experiments."}
- {name: max_loss, type: float, value: 0.01, isOptional: true, description: "maximum loss."}
Another use case is running a map reduce style DAG, in which case you may need to filter for operation coming from an upstream matrix:
version: 1.1
kind: component
inputs:
- {name: limit, type: int, value: 5, isOptional: true, description: "top experiments."}
- {name: min_accuracy, type: float, value: 0.9, isOptional: true, description: "min accuracy."}
run:
kind: dag
operations:
- name: tune
dagRef: ...
matrix:
kind: random
concurrency: 5
numRuns: 20
params:
learning_rate:
kind: linspace
value: 0.001:0.1:5
dropout:
kind: choice
value: [0.25, 0.3]
conv_activation:
kind: pchoice
value: [[relu, 0.1], [sigmoid, 0.8]]
epochs:
kind: choice
value: [5, 10]
- name: best_models
dagRef: ...
joins:
- query: "metrics.accuracy:>{{ min_accuracy }}, project.name:{{ tune_uuid }}, kind: job"
sort: "-metrics.accuracy"
limit: "{{ limit }}"
params:
uuids: {value: "globals.uuid", contextOnly: true}
learning_rates: {value: "inputs.learning_rate", contextOnly: true}
accuracies: {value: "outputs.accuracy", contextOnly: true}
losses: {value: "outputs.loss", contextOnly: true}
params:
limit:
ref: dag
value: inputs.limit
contextOnly: true
max_loss:
ref: dag
value: inputs.max_loss
contextOnly: true
tune_uuid:
ref: ops.tune
value: globals.uuid
contextOnly: true
This is a more complex example, but we will try to explain how information is passed from the DAG and from the operation to the last operation with joins.
- We declare a component with a DAG runtime. This component has 2 optional inputs.
- This dag has also two operations: one operation for running hyperparameter tuning followed by an operation that fetches the top experiments based on their accuracies.
- To restrict the
best_models
operation to only look for runs in thetune
operation, we are requesting first the uuid as acontextOnly
which allows us to declare
that param and instruct Polyaxon to not validate it against the IO declaration.
- We are also restricting the
best_models
operation to only look for operations with an accuracy higher than the parameter passed to our DAG, with a default equal to 0.9. - Finally, we are only interested in 5 runs or any number we decide to pass to the DAG with the limit input. Again, we use the param with
ref: dag
to request those values and make them available to the operation. - In the end we see that the join itself is exposing several params: the
uuids
of all runs found, theirlearning_rates
theiraccuracies
, and theirlosses
.
Note that we are using contextOnly
for all the params defined in the join.
If the definition of the component dagRef
for the best_models
operation is expecting inputs and outputs,
contextOnly
would not be necessary, and those values will be validated. An example of a such component would have the following IO definition:
inputs:
- {name: uuids, type: uuid, isList: true}
- {name: learning_rates, type: float, isList: true}
- {name: accuracies, type: float, isList: true}
- {name: losses, type: float, isList: true}
It’s important to use isList: true
otherwise, the compiler will raise an error because the join is returning a list and the user is expecting a single value.
Using params and joins in matrix
Similarly, since matrix.params
are the last to be resolved, you can use any param from the main params section or the joins section in your matrix definition.
matrix:
kind: random
concurrency: 5
numRuns: 20
params:
some_choice_param:
kind: choice
value: {{ param_choice }}
params:
param_choice:
value: [foo, bar]
contextOnly: true
A similar analysis can be applied here as well, the param_choice
value can be coming either from an upstream operation or as a dynamic input.