Building a Python Client

Python is the most common choice for Hypermass clients, especially when the data is being fed into AI models, data science pipelines, or automation scripts.

A simple file poller

This script uses a simple polling loop, watching for non-hidden files in the SUB_DIR directory:

import time
from pathlib import Path

# 🔥 set this to the path configured in your hypermass-config.yaml subscription-targets
SUB_DIR = "<<YOUR PATH HERE>>"

# this function does the heavy lifting, processing the files in order
def scan_and_process():
    files_in_sequence = sorted(
        [f for f in Path(SUB_DIR).iterdir() if f.is_file() and not f.name.startswith(".")],
        key=lambda x: x.stat().st_mtime
    )
    for file_path in files_in_sequence:
        print(f"Processing: {file_path.name}")

        # ⚠️⚠️⚠️ YOUR LOGIC HERE ⚠️⚠️⚠️
        # e.g., data = file_path.read_text()

        # Delete once processed
        file_path.unlink()

print(f"Monitoring Hypermass subscription output folder {SUB_DIR}...")

while True:
    scan_and_process()
    time.sleep(5)

About this code

We suggest starting with this and modifying it to suit your needs. Some notes about the approach:

  • Concise Filtering: The line [f for f in Path(SUB_DIR).iterdir() if f.is_file() and not f.name.startswith(".")] handles both the regular-file check and the exclusion of hidden entries (including the .hypermass metadata directory) in one go.
  • Modification-Time Sorting: By using st_mtime, you ensure that even if the CLI downloads three files in a single second (which can happen), you process them in the order they were finalized on your disk.
  • Restart Tolerant: This approach tolerates disconnections, the program being stopped for a while, and hypermass sync itself being stopped; unprocessed files simply wait on disk until the next scan.
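The modification-time sorting above can be demonstrated in isolation. This is a minimal, self-contained illustration (not Hypermass-specific): files written in a known order come back in write order, not alphabetical order, when sorted by st_mtime.

```python
import tempfile
import time
from pathlib import Path

# Create three files in a deliberately non-alphabetical order, then
# confirm that sorting by st_mtime recovers the write order.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.txt", "c.txt", "a.txt"]:
        Path(tmp, name).write_text("demo")
        time.sleep(0.05)  # ensure distinct modification times

    ordered = sorted(Path(tmp).iterdir(), key=lambda f: f.stat().st_mtime)
    names = [f.name for f in ordered]
    print(names)  # write order, not alphabetical
```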
note

The hypermass sync command uses hidden directories for metadata and temporary files. Don't delete or move the hidden .hypermass directory, as doing so can break the sync process. The code above ignores this directory automatically.

How to productionise this code example

With a few simple steps you can turn this into production-ready code.

Small file subscriptions

For small files, simply load the whole content into memory with file_path.read_text().
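As a sketch, a small-file handler for the "YOUR LOGIC HERE" slot might look like this. It assumes your subscription delivers JSON payloads; the function name and payload format are illustrative, not part of the Hypermass API.

```python
import json
from pathlib import Path

def process_small_file(file_path: Path) -> dict:
    """Hypothetical handler for small files: read the whole file
    into memory and parse it (assumes JSON payloads)."""
    return json.loads(file_path.read_text())
```

Inside scan_and_process(), you would call this before file_path.unlink().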

Large file subscriptions

Dealing with large files typically involves stream processing. Depending on your use case, you may want to pass the file reference to your processing function and use a streaming parser, such as ijson for JSON or the standard-library SAX parser (xml.sax) for XML.
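For example, here is a sketch of streaming a large XML file with the standard-library SAX parser. The handler and the &lt;record&gt; element name are assumptions for illustration; SAX feeds the document through the handler incrementally, so memory use stays flat regardless of file size.

```python
import xml.sax
from pathlib import Path

class RecordCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts <record> elements without
    loading the whole document into memory."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "record":
            self.count += 1

def process_large_xml(file_path: Path) -> int:
    handler = RecordCounter()
    # xml.sax.parse() streams the file through the handler incrementally
    xml.sax.parse(str(file_path), handler)
    return handler.count
```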

Stop on fail

Consider raising an exception to break out of the poller loop if you can't parse a file, leaving that file (and everything after it) on disk. This preserves the file sequence so processing can resume once the issue is fixed.
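A sketch of this stop-on-fail variant, assuming JSON payloads (the parse step is illustrative). The key detail is that the file is only deleted after successful processing, so a failure leaves the sequence intact:

```python
import json
from pathlib import Path

def scan_and_process_strict(sub_dir: str) -> None:
    """Variant of scan_and_process that stops on the first bad file.
    Assumes JSON payloads; swap in your own parse/process step."""
    files_in_sequence = sorted(
        [f for f in Path(sub_dir).iterdir() if f.is_file() and not f.name.startswith(".")],
        key=lambda x: x.stat().st_mtime
    )
    for file_path in files_in_sequence:
        try:
            json.loads(file_path.read_text())
        except ValueError as exc:
            # Leave the file on disk so the sequence is preserved,
            # and surface the failure to stop the poller loop.
            raise RuntimeError(f"Failed to parse {file_path.name}") from exc
        file_path.unlink()  # delete only after successful processing
```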

Scheduling

Scheduling moves the polling into a background thread, allowing other work to live alongside it in the same program. Depending on your needs this may be useful.

The schedule library is a great way to achieve this background polling behaviour with minimal code.

pip install schedule

Here's a snippet of code setting up the scheduler (replacing the while True: loop above)

import schedule
import time
import threading

# 🔥 the scan_and_process() function from above goes here 🔥

def run_scheduler():
    schedule.every(5).seconds.do(scan_and_process)
    while True:
        # schedule only fires jobs when run_pending() is called
        schedule.run_pending()
        time.sleep(1)

threading.Thread(target=run_scheduler, daemon=True).start()

print("Scheduler started in the background...")
# Note: the scheduler thread is a daemon, so your main thread must stay
# alive (doing its own work, or sleeping) for polling to continue.