🌻Workflow

Introduction

The workflow module helps users create reusable and persistable ML assets. It allows users to -

  • Automate and streamline complex tasks, making it easier to manage their assets.

  • Turn their processing and logic functions into Dagster assets with only simple code modifications.

  • Easily modify the workflows to accommodate new requirements as they emerge.

  • Save time and computing costs in the long run.

By building reusable workflows, users can achieve greater efficiency and productivity and reduce the risk of errors and omissions.

Storage

These are the storage utilities for defining an asset's IO Manager, which determines where each asset's result is persisted. Fiat currently supports the following storage services and data formats -

from enum import Enum


class PersistStorage(Enum):
    Local = "local"
    OSS = "oss"  # Alicloud Object Storage Service


class DataFormat(Enum):
    Binary = ".bin"
    JSON = ".json"
    PlainText = ".txt"

If you are using an external object storage service to persist assets, make sure you include the authentication information in the application.json config file.

To get an IO Manager instance, you only need to call the get_io_manager util function.
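As a rough illustration of what an IO Manager does with these data formats, the sketch below persists and reloads one file per asset on local disk. It is a toy stand-in written with the standard library only, not Fiat's implementation and not the get_io_manager signature: the class name LocalIOManagerSketch, its constructor arguments, and roundtrip_demo are all hypothetical, and the method names are only loosely modelled on Dagster's handle_output / load_input.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path


class DataFormat(Enum):
    Binary = ".bin"
    JSON = ".json"
    PlainText = ".txt"


class LocalIOManagerSketch:
    """Hypothetical stand-in, not Fiat's IO Manager: one file per asset."""

    def __init__(self, base_dir, data_format):
        self.base_dir = Path(base_dir)
        self.data_format = data_format

    def _path(self, asset_name):
        # The enum value doubles as the file extension.
        return self.base_dir / f"{asset_name}{self.data_format.value}"

    def handle_output(self, asset_name, value):
        """Persist an asset result and return the path it was written to."""
        path = self._path(asset_name)
        path.parent.mkdir(parents=True, exist_ok=True)
        if self.data_format is DataFormat.JSON:
            path.write_text(json.dumps(value), encoding="utf-8")
        elif self.data_format is DataFormat.PlainText:
            path.write_text(str(value), encoding="utf-8")
        else:
            path.write_bytes(value)  # Binary: value must already be bytes
        return path

    def load_input(self, asset_name):
        """Load a previously persisted asset result."""
        path = self._path(asset_name)
        if self.data_format is DataFormat.JSON:
            return json.loads(path.read_text(encoding="utf-8"))
        if self.data_format is DataFormat.PlainText:
            return path.read_text(encoding="utf-8")
        return path.read_bytes()


def roundtrip_demo():
    # Write an asset to a temporary directory, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        manager = LocalIOManagerSketch(tmp, DataFormat.JSON)
        path = manager.handle_output("features", {"rows": 3})
        return path.name, manager.load_input("features")
```

Calling roundtrip_demo() writes features.json and reads the same dictionary back. A remote backend such as OSS would follow the same two-method shape, with the file operations replaced by authenticated object reads and writes.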

Annotations

With annotations, you can easily create an asset.

Use @as_asset as a decorator on your existing function.

What if you want your asset to depend on another asset? Just add the upstream asset's function name as an input argument. Moreover, you can set a dedicated IO Manager for each asset with io_manager_key.

After defining all of your assets, you need to declare them explicitly in your "Definitions".
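To make the "function name as input argument" convention concrete, here is a minimal standard-library sketch of how a decorator can register assets and resolve upstream dependencies by parameter name. It is not the @as_asset implementation: as_asset_sketch, ASSET_REGISTRY, materialize, and the "oss_json" key are hypothetical names, and the real decorator may take different arguments.

```python
import inspect

# Hypothetical registry: asset name -> (function, io_manager_key)
ASSET_REGISTRY = {}


def as_asset_sketch(io_manager_key="default"):
    """Toy decorator (assumption, not Fiat's API): register a named asset."""
    def decorator(fn):
        ASSET_REGISTRY[fn.__name__] = (fn, io_manager_key)
        return fn
    return decorator


def materialize(name, cache=None):
    """Compute an asset after its upstream assets, matched by parameter name."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn, _io_manager_key = ASSET_REGISTRY[name]
        # Every parameter name is treated as the name of an upstream asset.
        upstream = {
            param: materialize(param, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**upstream)
    return cache[name]


@as_asset_sketch()
def raw_numbers():
    return [1, 2, 3]


@as_asset_sketch(io_manager_key="oss_json")
def doubled_numbers(raw_numbers):
    # The parameter name `raw_numbers` marks the upstream dependency.
    return [n * 2 for n in raw_numbers]
```

Here materialize("doubled_numbers") first computes raw_numbers and then passes its result in. The registry plays the role that the explicit "Definitions" declaration plays in a real project: it is the single place that lists every asset and the IO Manager key attached to it.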

Ray Utils

We also provide a few Ray utilities to make it easy to interact with your Ray computing cluster.

Adhoc Task on Ray

When you use adhoc_task_on_ray, the context manager returns a helper function. The helper accepts a function to be executed on the remote Ray cluster. Fiat Copilot wraps that function into a Ray remote task and sends it to the cluster.

Make sure the function can access its dependencies and data on the remote cluster.
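The "context manager that hands back a helper" pattern can be sketched as follows. This is a stand-in only: a local thread pool plays the role of the Ray cluster so that the example runs without Ray, and adhoc_task_sketch, run, square, and demo are hypothetical names rather than Fiat's API. With Ray itself, the helper would presumably wrap the function with ray.remote(fn), call .remote(...), and collect the result with ray.get(...).

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


@contextmanager
def adhoc_task_sketch(max_workers=2):
    """Hypothetical stand-in: a thread pool plays the role of the Ray cluster."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        def run(fn, *args, **kwargs):
            # Submit the function, then block until its result is available.
            return pool.submit(fn, *args, **kwargs).result()
        yield run
    # Leaving the `with` block shuts the pool down, like closing a connection.


def square(x):
    return x * x


def demo():
    with adhoc_task_sketch() as run:
        return [run(square, n) for n in range(4)]
```

The design point carried over from the description above is that the caller never touches the execution backend directly: it only receives the helper, and the context manager owns setup and teardown.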

Long Running Job Submission

If your project has many dependencies or is fairly complicated, submitting a long-running job may be a better choice. For this purpose, Fiat Copilot provides a programmatic way to submit your job to a remote Ray computing cluster.

You need to define the job description object and provide the target cluster URL.
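For orientation, the sketch below shows the same two ingredients, a job description and a cluster URL, using Ray's own job submission client instead of Fiat Copilot's wrapper. JobDescriptionSketch and its field names are hypothetical and are not Fiat's job description object; the entrypoint "python train.py" is likewise only an example. JobSubmissionClient and submit_job come from Ray's ray.job_submission module.

```python
from dataclasses import dataclass, field


@dataclass
class JobDescriptionSketch:
    """Hypothetical job description; field names are illustrative only."""
    entrypoint: str                            # shell command run on the cluster
    working_dir: str = "./"                    # local directory shipped with the job
    pip: list = field(default_factory=list)    # extra packages to install remotely

    def runtime_env(self):
        # Build the runtime_env dictionary that Ray expects for a job.
        env = {"working_dir": self.working_dir}
        if self.pip:
            env["pip"] = list(self.pip)
        return env


def submit(job, cluster_url):
    """Submit the job to the Ray cluster and return Ray's submission ID."""
    # Imported lazily so the description can be built without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(cluster_url)
    return client.submit_job(
        entrypoint=job.entrypoint,
        runtime_env=job.runtime_env(),
    )


job = JobDescriptionSketch(entrypoint="python train.py", pip=["pandas"])
```

Calling submit(job, cluster_url) requires a reachable cluster; for a local Ray head node the URL is typically the dashboard address, http://127.0.0.1:8265. Declaring dependencies in the job's runtime environment is one way to satisfy the requirement that remote code can find what it needs.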

Track and manage your assets in DagitUI

Start a local Dagster dev server with the Fiat CLI dev run command.

Open your browser and visit http://127.0.0.1:3000.

[Figure: DagitUI]

Then you can materialize your assets and check the results in your storage.

[Figure: Materialization]
[Figure: Check persisted assets' results]
