# Building a Python Client
Python is the most common choice for Hypermass clients, especially when the data is being fed into AI models, data science pipelines, or automation scripts.
## A simple file poller

This script runs a simple polling loop, watching for non-hidden files in `SUB_DIR`:
```python
import time
from pathlib import Path

# 🔥 set this to the path configured in your hypermass-config.yaml subscription-targets
SUB_DIR = "<<YOUR PATH HERE>>"

# this function does the heavy lifting, processing the files in order
def scan_and_process():
    files_in_sequence = sorted(
        [f for f in Path(SUB_DIR).iterdir() if f.is_file() and not f.name.startswith(".")],
        key=lambda x: x.stat().st_mtime,
    )
    for file_path in files_in_sequence:
        print(f"Processing: {file_path.name}")
        # ⚠️⚠️⚠️ YOUR LOGIC HERE ⚠️⚠️⚠️
        # e.g., data = file_path.read_text()
        # Delete once processed
        file_path.unlink()

print(f"Monitoring Hypermass subscription output folder {SUB_DIR}...")
while True:
    scan_and_process()
    time.sleep(5)
```
## About this code

We suggest starting with this and modifying it to suit your needs. Some notes about the approach:

- Concise filtering: the line `[f for f in Path(SUB_DIR).iterdir() if f.is_file() and not f.name.startswith(".")]` handles both the plain-file check and the `.hypermass` metadata exclusion in one go.
- Modification-time sorting: by using `st_mtime`, you ensure that even if the CLI downloads three files in a single second (which can happen), you process them in the order they were finalized on your disk.
- Restart tolerant: this approach tolerates disconnections, the program being stopped for a while, and likewise `hypermass sync` being stopped.
The `hypermass sync` command uses hidden directories for metadata and temporary files. Don't delete or move the hidden `.hypermass` directory, as doing so can break the sync process; the code above ignores it by skipping names that start with a dot.
## How to productionise this code example

With a few simple steps you can turn this into production-ready code.
### Small file subscriptions

For small files, just load the whole content into memory with `file_path.read_text()`.
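For example, if your subscription delivers JSON documents (an assumption for illustration; swap in whatever parser matches your payload), the per-file step can be a one-liner:

```python
import json
from pathlib import Path

def process_small_file(file_path: Path) -> dict:
    # Small files: read the whole payload into memory in one call.
    # Assumes each subscription file is a JSON document.
    return json.loads(file_path.read_text())
```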
### Large file subscriptions

Dealing with large files typically involves stream processing. Depending on your use case, you may want to pass the file reference to your function and use a streaming parser, such as `ijson` for JSON or the standard library's SAX parser for XML.
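As a sketch of the XML case using the standard library's SAX parser: the handler reacts to elements as they stream past, so memory use stays flat regardless of file size. The `record` element name here is a placeholder for whatever your payload actually contains.

```python
import xml.sax
from pathlib import Path

class RecordCounter(xml.sax.ContentHandler):
    """SAX handler that counts <record> elements as they stream past."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # called once per opening tag; the document is never fully in memory
        if name == "record":
            self.count += 1

def process_large_xml(file_path: Path) -> int:
    handler = RecordCounter()
    xml.sax.parse(str(file_path), handler)  # streams the file incrementally
    return handler.count
```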
### Stop on fail

Consider raising an exception to break out of the poller loop when a file can't be parsed. Because files are only deleted after successful processing, the remaining file sequence is preserved on disk for when the issue is fixed.
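One way to sketch this (assuming a JSON payload purely for illustration): parse inside a `try` block and only delete the file after it succeeds, so a bad file halts the loop with it and all later files still on disk in mtime order.

```python
import json
from pathlib import Path

def scan_and_process_strict(sub_dir: str) -> None:
    files_in_sequence = sorted(
        (f for f in Path(sub_dir).iterdir()
         if f.is_file() and not f.name.startswith(".")),
        key=lambda f: f.stat().st_mtime,
    )
    for file_path in files_in_sequence:
        try:
            json.loads(file_path.read_text())  # hypothetical parse step
        except ValueError as exc:
            # Leave this file (and every later one) untouched so the
            # sequence can be replayed once the problem is fixed.
            raise RuntimeError(f"Could not parse {file_path.name}") from exc
        file_path.unlink()  # delete only after successful processing
```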
### Scheduling

Moving the polling into a background thread lets it run alongside other work in the same program, which may be useful depending on your needs. The `schedule` library is a great way to achieve this background polling behaviour with minimal code.
```bash
pip install schedule
```
Here's a snippet that sets up the scheduler, replacing the `while True:` loop above:
```python
import schedule
import time
import threading

# 🔥 the scan_and_process() function from above goes here 🔥

# register the job once, then execute due jobs from a background thread
schedule.every(5).seconds.do(scan_and_process)

def run_scheduler():
    # schedule only runs jobs when run_pending() is called, so the
    # thread must keep calling it in a loop
    while True:
        schedule.run_pending()
        time.sleep(1)

threading.Thread(target=run_scheduler, daemon=True).start()
print("Scheduler started in the background...")
```

Because the thread is a daemon it won't keep the process alive on its own, so make sure the main thread still has work to do (or sleeps in its own loop).