All Posts

2026

Phase 1: Canonical filenames buy zero-config batching

A short naming convention like `NN_NNN_ego.mp4` encodes enough metadata to run the entire batch with one CLI flag. Why this is cheap to add, and what it forces you to *not* build.

05/11/2026 data pipelinecli

Phase 1: Don't use Label Studio Source Storage for local files

A short warning. LS's "Cloud Storage → Source Storage" feature looks like exactly what you want for local data. Use it and you get tens of thousands of phantom tasks that collide with the ones you actually imported.

05/11/2026 label studiodata pipeline

Phase 1: Four signals for review-frame sampling

A uniform "every Nth frame" sampler wastes labeler time on near-identical frames the model already nailed. Four signals do better — segment boundaries, per-segment uniform, low confidence, bbox jumps.

05/11/2026 computer visiondata pipelinelabel studio

Phase 1: Keep tests pure-Python with lazy imports

`mediapipe`, `ultralytics`, `cv2` are slow to import and need model weights at runtime. The trick that keeps the test suite small, fast, and weights-free is putting those imports inside function bodies, not at module top level.

05/11/2026 pythontestingcomputer vision

Phase 1: One module owns every path on disk

Why a video pre-annotation pipeline ends up with one `layout` module that knows where everything lives, and what breaks when six different parts of the codebase each compute paths their own way.

05/11/2026 data pipelinepythonarchitecture

Phase 2: the eval set must never see pre-annotations

The single decision that most human-in-the-loop projects get wrong. If your eval labels were seeded by the model's own predictions, every F1 number you ever report is biased toward the model. The fix is cheap on day one, expensive on day forty.

05/11/2026 evaluationhitlml ops

Phase 2: what counts as ground truth

When you harvest Label Studio exports to fine-tune the next model, treating *everything* in the export as ground truth is how you train the model on its own predictions. The filters worth applying before a single byte goes into a training set.

05/11/2026 computer visionlabel studiofine-tuning

Phase 2: don't retrain on every export

After every batch of human-corrected episodes gets exported from Label Studio, the temptation is to retrain immediately. The reasons not to, and a cheap cadence trigger that actually fires when retraining will help.

05/11/2026 fine-tuninghitlml ops

Phase 2: version the slice, not the snapshot

When you fine-tune model v3, you need to be able to answer "which exported corrections went into it". Snapshotting the whole training set is the obvious answer and the wrong one. Track the inputs and the derivation; the training set is a function of them.

05/11/2026 ml opsreproducibilityhitl

Phase 1: Why one episode becomes three Label Studio projects

A deep dive on the multi-project pattern for video pre-annotation — what forces the split, how one episode fans out, and when not to fight Label Studio's data model.

05/11/2026 computer visionlabel studiodata pipeline

Phase 1: Two fps knobs in a video pre-annotation pipeline

Inference frame rate and review-frame sampling look like one thing and aren't. What each knob actually buys, and what breaks if you treat them as the same.

05/11/2026 computer visiondata pipelinemediapipe

Phase 1: Notes from building a video pre-annotation pipeline

A Phase-1 pipeline for embodied-robot video data — MediaPipe + YOLO inference, action segmentation, Label Studio import — plus the boring path-abstraction decision that kept it from collapsing.

05/11/2026 computer visiondata pipelinelabel studio

CIT CTF 2026 · Debug Disaster: a leaky debug page and a forgotten route

Flask debug=True leaks more than tracebacks — it leaked the source code of a forgotten route that dumps .env in cleartext.

04/26/2026 CTFcit-ctf-2026web

CIT CTF 2026 · A Massive Problem: mass assignment via dict.update

The challenge name spells it out. At register time, record.update(incoming) lets the role field in the request body overwrite the hard-coded default.

04/26/2026 CTFcit-ctf-2026web

CIT CTF 2026: a few writeups worth keeping

I played CIT CTF 2026 over the holiday — this is the index post for a short series of writeups covering Web, Crypto and Misc challenges.

04/26/2026 CTFcit-ctf-2026web

CIT CTF 2026 · Baby Exponent: the most textbook RSA e=3

Public exponent e=3, plaintext small enough that m³ never overflowed the modulus. Integer cube root and done.

04/25/2026 CTFcit-ctf-2026crypto

CIT CTF 2026 · Dog Barking: three bark durations, one custom encoding

78 seconds of dog barks. Three distinct bark durations encode bit 0, bit 1, and the byte separator. Not Morse — a custom code.

04/25/2026 CTFcit-ctf-2026misc

CIT CTF 2026 · Server Components: RCE via Next.js 15 RSC deserialization

package.json pins next@15.0.4 — squarely inside the window for this year's React Server Components deserialization RCE.

04/25/2026 CTFcit-ctf-2026web