Pipelines

Pipeline

polyaxon.processing.pipelines.Pipeline(mode, name='Pipeline', subgraphs_by_features=None, shuffle=True, num_epochs=None)

Abstract InputPipeline class. All input pipelines must inherit from this. An InputPipeline defines how data is read, parsed, and separated into features and labels.

  • Args:
    • mode: str, specifies whether this is training, evaluation, or prediction. See Modes.
    • name: str, name to give to this pipeline.
    • subgraphs_by_features: dict, maps each feature to the list of modules to call when processing it.
    • shuffle: If true, shuffle the data.
    • num_epochs: Number of times to iterate through the dataset. If None, iterate forever.
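
The shuffle, num_epochs, and subgraphs_by_features arguments can be illustrated with a minimal plain-Python sketch. It does not use polyaxon: the helper names iterate_epochs and apply_subgraphs are hypothetical, and treating each module as an ordinary callable applied in order is an assumption made only for illustration.

```python
import itertools
import random


def iterate_epochs(examples, shuffle=True, num_epochs=None, seed=0):
    """Yield examples epoch by epoch; never stop when num_epochs is None."""
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        current = list(examples)
        if shuffle:
            rng.shuffle(current)  # draw a new order on every pass
        yield from current


def apply_subgraphs(example, subgraphs_by_features):
    """Run each listed feature through its callables, in order (assumed semantics)."""
    processed = dict(example)
    for feature, modules in (subgraphs_by_features or {}).items():
        for module in modules:
            processed[feature] = module(processed[feature])
    return processed


# Two finite passes over the data, in the original order.
two_epochs = list(iterate_epochs([1, 2, 3], shuffle=False, num_epochs=2))

# num_epochs=None iterates forever, so the consumer must cut the stream off.
first_five = list(itertools.islice(iterate_epochs([1, 2, 3], shuffle=False), 5))

# Only the "text" feature has modules attached; "label" passes through untouched.
cleaned = apply_subgraphs(
    {"text": " Hi ", "label": 1},
    {"text": [str.strip, str.lower]},
)
```

With num_epochs=None the generator has no natural end, which is why the sketch bounds it with itertools.islice.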

TFRecordImagePipeline

polyaxon.processing.pipelines.TFRecordImagePipeline(mode, name='TFRecordImagePipeline', subgraphs_by_features=None, shuffle=True, num_epochs=None, data_files=None, meta_data_file=None)

An input pipeline that reads image data from TFRecord files.

  • Args:
    • mode: str, specifies whether this is training, evaluation, or prediction. See Modes.
    • name: str, name to give to this pipeline.
    • subgraphs_by_features: dict, maps each feature to the list of modules to call when processing it.
    • shuffle: If true, shuffle the data.
    • num_epochs: Number of times to iterate through the dataset. If None, iterate forever.

ParallelTextPipeline

polyaxon.processing.pipelines.ParallelTextPipeline(mode, name='ParallelTextPipeline', subgraphs_by_features=None, shuffle=True, num_epochs=None, source_files=None, target_files=None, source_delimiter='', target_delimiter='')

An input pipeline that reads two parallel (line-by-line aligned) text files.

  • Args:
    • mode: str, specifies whether this is training, evaluation, or prediction. See Modes.
    • name: str, name to give to this pipeline.
    • subgraphs_by_features: dict, maps each feature to the list of modules to call when processing it.
    • shuffle: If true, shuffle the data.
    • num_epochs: Number of times to iterate through the dataset. If None, iterate forever.
    • source_files: An array of file names for the source data.
    • target_files: An array of file names for the target data. These must be aligned to the source_files.
    • source_delimiter: A character to split the source text on, e.g. " " (space) for word-level tokens. For character-level training this can be set to the empty string, which the signature above lists as the default.
    • target_delimiter: Same as source_delimiter but for the target text.
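
What "line-by-line aligned" and the two delimiters mean can be sketched in plain Python. This is not the polyaxon implementation: read_parallel, split_tokens, and the output keys source_tokens and target_tokens are hypothetical names chosen for the example, and the temporary files stand in for real source_files and target_files entries.

```python
import os
import tempfile


def split_tokens(line, delimiter):
    """Split on delimiter; an empty delimiter yields single characters."""
    line = line.rstrip("\n")
    return list(line) if delimiter == "" else line.split(delimiter)


def read_parallel(source_file, target_file, source_delimiter, target_delimiter):
    """Yield one dict per aligned line pair from the two files."""
    with open(source_file, encoding="utf-8") as src, \
            open(target_file, encoding="utf-8") as tgt:
        # zip pairs line i of the source with line i of the target and stops
        # at the shorter file, so misaligned inputs are silently truncated.
        for source_line, target_line in zip(src, tgt):
            yield {
                "source_tokens": split_tokens(source_line, source_delimiter),
                "target_tokens": split_tokens(target_line, target_delimiter),
            }


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "source.txt")
    tgt_path = os.path.join(tmp, "target.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("hello world\ngood morning\n")
    with open(tgt_path, "w", encoding="utf-8") as f:
        f.write("hallo welt\nguten morgen\n")
    # Word-level source tokens, character-level target tokens.
    pairs = list(read_parallel(src_path, tgt_path, " ", ""))
```

Here the source side is split on spaces while the target side uses the empty delimiter, so the same pair shows both word-level and character-level tokenization.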

TFRecordSourceSequencePipeline

polyaxon.processing.pipelines.TFRecordSourceSequencePipeline(mode, name='TFRecordSourceSequencePipeline', subgraphs_by_features=None, shuffle=True, num_epochs=None, files=None, source_field='source', target_field='target', source_delimiter='', target_delimiter='')

An input pipeline that reads TFRecord files containing both source and target sequences.

  • Args:
    • mode: str, specifies whether this is training, evaluation, or prediction. See Modes.
    • name: str, name to give to this pipeline.
    • subgraphs_by_features: dict, maps each feature to the list of modules to call when processing it.
    • shuffle: If true, shuffle the data.
    • num_epochs: Number of times to iterate through the dataset. If None, iterate forever.
    • files: An array of file names to read from.
    • source_field: The TFRecord feature field containing the source text.
    • target_field: The TFRecord feature field containing the target text.
    • source_delimiter: A character to split the source text on, e.g. " " (space) for word-level tokens. For character-level training this can be set to the empty string, which the signature above lists as the default.
    • target_delimiter: Same as source_delimiter but for the target text.
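
The role of source_field and target_field can be sketched without TensorFlow by modelling one record as a dict from field name to raw bytes. This is an illustration under that assumption only: parse_record is a hypothetical helper, not part of polyaxon, and a real TFRecord would first have to be decoded into such fields.

```python
def parse_record(record, source_field="source", target_field="target",
                 source_delimiter="", target_delimiter=""):
    """Look up both sequences in one record and tokenize each of them."""

    def tokenize(raw, delimiter):
        text = raw.decode("utf-8")
        # An empty delimiter means character-level tokens.
        return list(text) if delimiter == "" else text.split(delimiter)

    return (
        tokenize(record[source_field], source_delimiter),
        tokenize(record[target_field], target_delimiter),
    )


# A record whose fields do not use the default names "source" and "target".
record = {"src": b"a b c", "tgt": b"xyz"}
source_tokens, target_tokens = parse_record(
    record,
    source_field="src",
    target_field="tgt",
    source_delimiter=" ",
)
```

Unlike the two-file layout, both sequences travel in the same record here, so alignment is guaranteed by construction and only the field names need to be configured.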

ImageCaptioningPipeline

polyaxon.processing.pipelines.ImageCaptioningPipeline(mode, name='ImageCaptioningPipeline', subgraphs_by_features=None, shuffle=True, num_epochs=None, files=None, image_field='image/data', image_format='jpg', caption_ids_field='image/caption_ids', caption_tokens_field='image/caption')

An input pipeline that reads TFRecord files containing images and their captions.

  • Args:
    • mode: str, specifies whether this is training, evaluation, or prediction. See Modes.
    • name: str, name to give to this pipeline.
    • subgraphs_by_features: dict, maps each feature to the list of modules to call when processing it.
    • shuffle: If true, shuffle the data.
    • num_epochs: Number of times to iterate through the dataset. If None, iterate forever.
    • files: An array of file names to read from.
    • image_field: The TFRecord feature field containing the source images.
    • image_format: The image format (file extension), e.g. 'jpg'.
    • caption_ids_field: The TFRecord feature field containing the caption ids.
    • caption_tokens_field: The TFRecord feature field containing the caption tokens.