Building scalable ML workflows
December 22, 2024


Not long ago, I wrote an article introducing Tork, an open source project I have been developing. In short, Tork is a general-purpose, distributed workflow engine suitable for a variety of workloads. At my job, we mainly use it for CPU/GPU-intensive tasks, such as processing digital assets (3D, video, imaging, etc.), and as the CI/CD tooling for an internal PaaS.

Lately, I’ve been thinking about how to leverage Tork to run machine learning workloads. I was particularly inspired by an existing project in this space and wanted to see if I could do something similar, but using plain old Docker images to package the model.

Given that machine learning workloads often consist of different, interdependent stages, such as data preprocessing, feature extraction, model training, and inference, having an engine that can orchestrate these steps is critical. These stages often require different types of computing resources (e.g., CPU for preprocessing, GPU for training) and can benefit greatly from parallelization.
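
To make this concrete, here is a purely hypothetical sketch of what such a multi-stage job could look like in Tork’s YAML format (the task names, images and scripts are made up for illustration; the real, single-task example follows later in this post):

name: Example ML pipeline
tasks:
  - name: Preprocess data
    image: example/preprocess:latest   # hypothetical CPU-bound image
    run: python3 preprocess.py
  - name: Train model
    image: example/train:latest        # hypothetical image, ideally scheduled on a GPU worker
    run: python3 train.py
  - name: Run inference
    image: example/inference:latest    # hypothetical image
    run: python3 inference.py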

Additionally, elasticity is a key requirement when running machine learning workflows. Interruptions, whether due to hardware failures, network issues or resource constraints, can cause significant setbacks, especially for long-running processes such as model training.

These requirements are very similar to those of my other, non-ML workloads, so I decided to put the theory to the test and see what it would take to execute a simple ML workflow on Tork.


The experiment

For the first experiment, let’s try to perform a simple sentiment analysis inference task:

Download the latest Tork binary and untar it:

tar xvzf tork_0.1.109_darwin_arm64.tgz

Start Tork in standalone mode:

./tork run standalone

If all goes well, you should see something like this:

...
10:36PM INF Coordinator listening on http://localhost:8000
...

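You can also verify the coordinator is reachable by hitting its REST API directly. As far as I know it exposes a simple health endpoint (double-check the docs if this differs in your version):

curl -s http://localhost:8000/health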

Next, we need to build a Docker image containing the model and necessary inference scripts. Tork tasks are usually executed within Docker containers.

inference.py

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import os

MODEL_NAME = os.getenv("MODEL_NAME")

def load_model_and_tokenizer(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    return tokenizer, model

def predict_sentiment(text, tokenizer, model):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

    with torch.no_grad():
        outputs = model(**inputs)

    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_label = torch.argmax(predictions, dim=1).item()
    confidence = predictions[0][predicted_label].item()

    return predicted_label, confidence

if __name__ == "__main__":
    tokenizer, model = load_model_and_tokenizer(MODEL_NAME)
    text = os.getenv("INPUT_TEXT")
    label, confidence = predict_sentiment(text, tokenizer, model)
    sentiment_map = {0: "Negative", 1: "Positive"}
    sentiment = sentiment_map[label]
    print(f"{sentiment}")

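Before baking the script into an image, it can be worth a quick local sanity check (this assumes transformers and torch are installed locally; the model weights are downloaded on first use):

MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
INPUT_TEXT="Today is a lovely day" \
python3 inference.py
# should print: Positive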

Dockerfile

FROM huggingface/transformers-pytorch-cpu:latest

WORKDIR /app

COPY inference.py .

# Pre-load the model during image build
RUN python3 -c "from transformers import AutoTokenizer, AutoModelForSequenceClassification; AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english'); AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')"

docker build -t sentiment-analysis .
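
To verify the image works before involving Tork, you can run it directly with Docker (assuming the base image lets you pass a command; the environment variables mirror what the Tork job will set later):

docker run --rm \
  -e MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  -e INPUT_TEXT="Today is a lovely day" \
  sentiment-analysis:latest \
  python3 inference.py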

Next, let’s set up a Tork job to perform inference.

sentiment.yaml

name: Sentiment analysis example
inputs:
  input_text: Today is a lovely day
  model_name: distilbert-base-uncased-finetuned-sst-2-english
output: "{{trim(tasks.sentimentResult)}}"
tasks:
  - name: Run sentiment analysis
    var: sentimentResult
    # the image we created in the previous step, 
    # but can be any image available from Docker hub
    # or any other image registry
    image: sentiment-analysis:latest
    run: |
      python3 inference.py > $TORK_OUTPUT
    env:
      INPUT_TEXT: "{{inputs.input_text}}"
      MODEL_NAME: "{{inputs.model_name}}"

Submit the job. Tork jobs are executed asynchronously; after submitting a job, you get back a job ID that you can use to track its progress:

JOB_ID=$(curl -s \
  -X POST \
  -H "content-type:text/yaml" \
  --data-binary @sentiment.yaml \
  http://localhost:8000/jobs | jq -r .id)
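
The job ID can also be used to fetch the full job record at any time, which is handy for debugging:

curl -s http://localhost:8000/jobs/$JOB_ID | jq .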

Poll the job’s status and wait for completion:

while true; do 
  state=$(curl -s http://localhost:8000/jobs/$JOB_ID | jq -r .state)
  echo "Status: $state"
  if [ "$state" = "COMPLETED" ]; then; 
     break 
  fi 
  sleep 1
done
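
Note that if the job fails, the loop above will spin forever. A slightly more defensive variant (assuming FAILED is among the terminal states Tork reports) could look like this:

while true; do
  state=$(curl -s http://localhost:8000/jobs/$JOB_ID | jq -r .state)
  echo "Status: $state"
  if [ "$state" = "COMPLETED" ] || [ "$state" = "FAILED" ]; then
    break
  fi
  sleep 1
done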

Check job results:

curl -s http://localhost:8000/jobs/$JOB_ID | jq -r .result

Positive

Try changing the input_text in sentiment.yaml and resubmitting the job to get different results.


Next steps

Now that I have this basic proof of concept working on my machine, the next step is to push the Docker image to a registry so that any Tork worker on my production cluster can use it. Overall, this looks like a viable approach.
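
Pushing the image is standard Docker fare; with a hypothetical registry host it would look roughly like this, after which the image reference in sentiment.yaml would point at the registry:

docker tag sentiment-analysis:latest registry.example.com/sentiment-analysis:latest
docker push registry.example.com/sentiment-analysis:latest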

The code for this article can be found on GitHub.

If you are interested in learning more about Tork:

Documentation: https://www.tork.run
Backend: https://github.com/runabol/tork
Web UI: https://github.com/runabol/tork-web
